WO2012061760A2 - Smartphone-based methods and systems - Google Patents
Smartphone-based methods and systems
- Publication number
- WO2012061760A2 (application PCT/US2011/059412)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- content
- user
- audio
- information
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/445—Program loading or initiating
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
- G06V10/225—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition based on a marking or identifier characterising the area
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/60—Software deployment
- G06F8/61—Installation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q20/00—Payment architectures, schemes or protocols
- G06Q20/30—Payment architectures, schemes or protocols characterised by the use of specific devices or networks
- G06Q20/32—Payment architectures, schemes or protocols characterised by the use of specific devices or networks using wireless devices
- G06Q20/322—Aspects of commerce using mobile devices [M-devices]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q20/00—Payment architectures, schemes or protocols
- G06Q20/30—Payment architectures, schemes or protocols characterised by the use of specific devices or networks
- G06Q20/32—Payment architectures, schemes or protocols characterised by the use of specific devices or networks using wireless devices
- G06Q20/327—Short range or proximity payments by means of M-devices
- G06Q20/3276—Short range or proximity payments by means of M-devices using a pictured code, e.g. barcode or QR-code, being read by the M-device
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/0021—Image watermarking
- G06T1/005—Robust watermarking, e.g. average attack or collusion attack resistant
- G06T1/0064—Geometric transfor invariant watermarking, e.g. affine transform invariant
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/285—Memory allocation or algorithm optimisation to reduce hardware requirements
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/81—Detection of presence or absence of voice signals for discriminating voice from music
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/84—Detection of presence or absence of voice signals for discriminating voice from noise
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/72—Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
- H04M1/724—User interfaces specially adapted for cordless or mobile telephones
- H04M1/72403—User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N1/00—Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
- H04N1/32—Circuits or arrangements for control or supervision between transmitter and receiver or between image input and image output device, e.g. between a still-image camera and its memory or between a still-image camera and a printer device
- H04N1/32101—Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title
- H04N1/32144—Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title embedded in the image data, i.e. enclosed or integrated in the image, e.g. watermark, super-imposed logo or stamp
- H04N1/32149—Methods relating to embedding, encoding, decoding, detection or retrieval operations
- H04N1/32154—Transform domain methods
- H04N1/32187—Transform domain methods with selective or adaptive application of the additional information, e.g. in selected frequency coefficients
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N1/00—Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
- H04N1/32—Circuits or arrangements for control or supervision between transmitter and receiver or between image input and image output device, e.g. between a still-image camera and its memory or between a still-image camera and a printer device
- H04N1/32101—Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title
- H04N1/32144—Display, printing, storage or transmission of additional information, e.g. ID code, date and time or title embedded in the image data, i.e. enclosed or integrated in the image, e.g. watermark, super-imposed logo or stamp
- H04N1/32149—Methods relating to embedding, encoding, decoding, detection or retrieval operations
- H04N1/32309—Methods relating to embedding, encoding, decoding, detection or retrieval operations in colour image data
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L2463/00—Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00
- H04L2463/101—Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00 applying security measures for digital rights management
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2250/00—Details of telephonic subscriber devices
- H04M2250/52—Details of telephonic subscriber devices including functional features of a camera
Definitions
- the present technology primarily concerns sensor-equipped consumer electronic devices, such as smartphones and tablet computers.
- Many apps concern media content. Some are designed to provide on-demand playback of audio or video content, e.g., television shows. Others serve to complement media content, such as by enabling access to extra content (behind-the-scenes clips, cast biographies and interviews, contests, games, recipes, how-to videos), by allowing social network-based features (communicating with other fans, including by Twitter, Facebook and Foursquare, blogs), etc.
- a media-related app may operate in synchrony with the audio or video content, e.g., presenting content and links at time- or event-appropriate points during the content.
- a microphone-equipped user device samples ambient content, and produces content-identifying data from the captured audio. This content-identifying data is then used to look up an app recommended by the proprietor of the content, which app is then installed and launched - with little or no action required by the user.
- each user device becomes app-adapted to the content preferences of the user - thereby becoming optimized to the user's particular interests in the content world.
- this aspect of the present technology is akin to the recommendation features of TiVo, but for apps.
- the user's content consumption habits (and optionally those of the user's social network friends) lead the device to recommend apps that serve the user's interests.
- the "channel" was king, and content played a supporting role (i.e., drawing consumers to the channel, and to its advertising). From the consumer's standpoint, however, these roles should be reversed: content should be primary. Embodiments of the present technology are based on this premise. The user chooses the content, and the delivery mechanism then follows, as a consequence.
- Fig. 1 is a block diagram of a system that can be used in certain embodiments of the present technology.
- Fig. 2 is a representation of a data structure that can be used with the embodiment of Fig. 1.
- Figs. 3-7 detail features of illustrative gaze-tracking embodiments, e.g., for text entry.
- Figs. 8 and 9 detail features of an illustrative user interface.
- Fig. 10 shows a block diagram of a system incorporating principles of the present technology.
- Fig. 11 shows marker signals in a spatial-frequency domain.
- Fig. 12 shows a mixed-domain view of a printed object that includes the marker signals of Fig. 11, according to one aspect of the present technology.
- Fig. 13 shows a corner marker that can be used to indicate hidden data.
- Fig. 14 shows an alternative to the marker signals of Fig. 11.
- Fig. 15 shows a graph representation of data output from a smartphone camera.
- Fig. 16 shows a middleware architecture for object recognition.
- Fig. 17 is similar to Fig. 16, but is particular to the Digimarc Discover implementation.
- Fig. 18 is a bar chart showing impact of reading image watermarks on system tasks.
- Fig. 19 further details performance of a watermark recognition agent running on an Apple iPhone 4 device.
- Fig. 20 shows locations of salient points in first and second image frames.
- Fig. 21 shows histograms associated with geometric alignment of two frames of salient points.
- Fig. 22 shows an image memory in a smartphone, including three color bit planes of 8-bit depth each.
- Fig. 23 shows a similar smartphone memory, but now utilized to store RDF triples.
- Fig. 24 shows some of the hundreds or thousands of RDF triples that may be stored in the memory of Fig. 23.
- Fig. 25 shows the memory of Fig. 23, now populated with illustrative RDF information detailing certain relationships among people.
- Fig. 26 shows some of the templates that may be applied to the Predicate plane of the Fig. 25 memory, to perform semantic reasoning on the depicted RDF triples.
- Fig. 27 names the nine RDF triples within a 3x3 pixel block of memory.
- Fig. 28 shows a store of memory in a smartphone.
- Figs. 29A and 29B depict elements of a graphical user interface that uses data from the Fig. 28 memory.
- Fig. 30 shows use of a memory storing triples, and associated tables, to generate data used in generating a search query report to a user.
- Fig. 31 shows another store of memory in a smartphone, depicting four or more planes of integer (e.g., 8-bit) storage.
- Fig. 32 shows a smartphone displaying an image captured from a catalog page, with a distinctive graphical effect that signals presence of a steganographic digital watermark.
- Figs. 33 and 34 show how a smartphone can spawn tags, presented along an edge of the display, associated with different items in the display.
- Fig. 35 shows information retrieved from a database relating to a watermark-identified catalog page (i.e., object handles for an object shape).
- Fig. 36 shows how detection of different watermarks in different regions of imagery can be signaled to a user.
- Fig. 38 shows an LED-based communication system, incorporating both high bandwidth and low bandwidth channels.
- an illustrative system 12 includes a device 14 having a processor 16, a memory 18, one or more input peripherals 20, and one or more output peripherals 22.
- System 12 may also include a network connection 24, and one or more remote computers 26.
- An illustrative device 14 is a smartphone or a tablet computer, although any other consumer electronic device can be used.
- the processor can comprise a microprocessor such as an Atom or A4 device.
- the processor's operation is controlled, in part, by information stored in the memory, such as operating system software, application software (e.g., "apps"), data, etc.
- the memory may comprise flash memory, a hard drive, etc.
- the input peripherals 20 may include a camera and/or a microphone.
- the peripherals (or device 14 itself) may also comprise an interface system by which analog signals sampled by the camera/microphone are converted into digital data suitable for processing by the system.
- Other input peripherals can include a touch screen, keyboard, etc.
- the output peripherals 22 can include a display screen, speaker, etc.
- the network connection 24 can be wired (e.g., Ethernet, etc.), wireless (WiFi, 4G, Bluetooth, etc.), or both.
- device 14 receives a set of digital content data, such as through a microphone 20 and interface, through the network connection 24, or otherwise.
- the content data may be of any type; audio is exemplary.
- the system 12 processes the digital content data to generate corresponding identification data. This may be done, e.g., by applying a digital watermark decoding process, or a fingerprinting algorithm - desirably to data representing the sonic or visual information itself, rather than to so-called "out-of-band" data (e.g., file names, header data, etc.).
- the resulting identification data serves to distinguish the received content data from other data of the same type (e.g., other audio or other video).
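As a rough illustration of deriving identification data from the sampled signal itself rather than from out-of-band data, the following toy sketch packs one bit per pair of adjacent audio frames, depending on whether frame energy rose. It is not the fingerprinting or watermark-decoding method of the disclosure; the function names, the frame size, and the energy-difference scheme are all assumptions chosen for brevity, and a practical fingerprint would be far more robust.

```python
from typing import List, Sequence


def frame_energies(samples: Sequence[float], frame_size: int = 256) -> List[float]:
    """Sum of squared sample values for each full, non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_size])
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def toy_fingerprint(samples: Sequence[float], frame_size: int = 256) -> int:
    """Pack one bit per adjacent frame pair: 1 if energy rose, else 0.

    Hypothetical illustration only; it uses the signal content and ignores
    file names or header data entirely.
    """
    energies = frame_energies(samples, frame_size)
    bits = 0
    for prev, cur in zip(energies, energies[1:]):
        bits = (bits << 1) | (1 if cur > prev else 0)
    return bits


if __name__ == "__main__":
    clip = [0.0] * 4 + [1.0] * 4 + [0.5] * 4 + [2.0] * 4
    print(bin(toy_fingerprint(clip, frame_size=4)))
```

Because only the direction of energy change is kept, the same clip played at a different volume yields the same bits in this sketch, which hints at why identification derived from the content can survive changes that would defeat a file-name match.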
- the system determines corresponding software that should be invoked.
- One way to do this is by indexing a table, database, or other data structure with the identification data, to thereby obtain information identifying the appropriate software.
- An illustrative table is shown conceptually in Fig. 2.
- the data structure may return identification of a single software program. In that case, this software is launched - if available. (Availability does not require that the software be resident on the device. Cloud-based apps may be available.) If not available, the software may be downloaded (e.g., from an online repository, such as the iTunes store), installed, and launched. (Or, the device can subscribe to a software-as-service cloud version of the app.) Involvement of the user in such action(s) can depend on the particular implementation: sometimes the user is asked for permission; in other implementations such actions proceed without disturbing the user.
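The single-program case described above can be sketched as a table lookup followed by a launch-or-install decision. This is a minimal sketch under stated assumptions: the table contents, the `AppEntry` fields, the example URL, and the action strings are hypothetical, and the functions only report which action would be taken rather than calling any real app-store or OS API.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class AppEntry:
    app_id: str      # hypothetical identifier for the indicated software
    store_url: str   # hypothetical download location


# Hypothetical table in the spirit of Fig. 2: identification data -> software.
CONTENT_TABLE: Dict[str, AppEntry] = {
    "content-0001": AppEntry("example.show.companion",
                             "https://store.example/companion"),
}


def resolve_action(content_id: str, available: Set[str],
                   ask_user: bool = False) -> Optional[str]:
    """Describe what the device would do for the identified content."""
    entry = CONTENT_TABLE.get(content_id)
    if entry is None:
        return None  # no software is associated with this content
    if entry.app_id in available:
        # "Available" may mean resident on the device or usable from the cloud.
        return f"launch {entry.app_id}"
    if ask_user:
        # Some implementations ask for permission before installing.
        return f"prompt-then-install {entry.app_id}"
    return f"install-and-launch {entry.app_id} from {entry.store_url}"


if __name__ == "__main__":
    print(resolve_action("content-0001", available=set()))
```

The `ask_user` flag stands in for the implementation choice noted above between requesting permission and proceeding without disturbing the user.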
- the data structure may identify several different software programs.
- the different programs may be specific to different platforms, in which case, device 14 may simply pick the program corresponding to that platform (e.g., Android G2, iPhone 4, etc.).
- the data structure may identify several alternative programs that can be used on a given platform.
- the device may check to determine which - if any - is already installed and available. If such a program is found, it can be launched. If two such programs are found, the device may choose between them using an algorithm (e.g., most-recently-used; smallest memory footprint; etc.), or the device may prompt the user for a selection. If none of the alternative programs is available to the device, the device can select and download one - again using an algorithm, or based on input from the user. Once downloaded and installed, the application is launched.
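The selection among alternative programs described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Program` record, its fields, and the policy names are hypothetical, and only the two example policies named in the text (most-recently-used, smallest memory footprint) are modeled.

```python
from dataclasses import dataclass


@dataclass
class Program:
    program_id: str             # hypothetical identifier for a candidate program
    platform: str               # platform the program targets
    last_used: float = 0.0      # timestamp of last use (0 if never used)
    footprint_mb: float = 0.0   # approximate memory footprint


def choose_program(candidates, installed_ids, platform, policy="most-recently-used"):
    """Return (program, needs_download) for the given platform.

    Installed programs are preferred; if none is installed, one of the
    alternatives is picked by the same policy and flagged for download.
    """
    on_platform = [p for p in candidates if p.platform == platform]
    if not on_platform:
        return None, False
    installed = [p for p in on_platform if p.program_id in installed_ids]
    pool = installed or on_platform
    if policy == "most-recently-used":
        best = max(pool, key=lambda p: p.last_used)
    else:  # treat any other policy name as "smallest memory footprint"
        best = min(pool, key=lambda p: p.footprint_mb)
    return best, not installed
```

A real device might instead prompt the user when two or more installed programs qualify; that branch is omitted here.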
- the data structure may identify different programs that serve different functions - all related to the content.
- One, for example, may be an app for discovery of song lyrics.
- Another may be an app relating to musician biography.
- Another may be an app for purchase of the content.
- each different class of software may include several alternatives.
- the device may already have an installed application that is technically suited to work with the received content (e.g., to render an MPEG4 or an MP3 file). For certain types of operations, there may be dozens or more such programs that are technically suitable. However, the content may indicate that only a subset of this universe of possible software programs should be used.
- Software in the device 14 may strictly enforce the content-identified software selection.
- the system may treat such software identification as a preference that the user can override.
- the user may be offered an incentive to use the content-identified software.
- the user may be assessed a fee, or other impediment, in order to use software other than that indicated by the content.
- the system may decline to render certain content on a device (e.g., because of lack of suitable app or hardware capability), but may invite the user to transfer the content to another user device that has the needed capability, and may implement such transfer.
- the software may invite the user to instead transfer the imagery to a large format HD display at the user's home for viewing.
- the system may render it in a limited fashion. For example, a video might be rendered as a series of still key frames (e.g., from scene transitions). Again, the system can transfer the content where it can be more properly enjoyed, or - if hardware considerations permit (e.g., screen display resolution is adequate) - needed software can be downloaded and used.
- the indication of software may be based on one or more contextual factors - in addition to the content identification data. (Only two context factors are shown; more or fewer can of course be used.)
- Context is "any information that can be used to characterize the situation of an entity (a person, place or object that is considered relevant to the interaction between a user and an application, including the user and applications themselves)."
- Context information can be of many sorts, including computing context (network connectivity, memory availability, processor type, CPU contention, etc.), user context (user profile, location, actions, preferences, nearby friends, social network(s) and situation, etc.), physical context (e.g., lighting, noise level, traffic, etc.), temporal context (time of day, day, month, season, etc.), history of the above, etc.
- rows 32 and 34 correspond to the same content (i.e., same content ID), but they indicate different software should be used - depending on whether the user's context is indoors or outdoors.
- the software is indicated by a 5 symbol hex identifier; the content is identified by 6 hex symbols.
- Row 36 shows a software selection that includes two items of software - both of which are invoked. (One includes a further descriptor - an identifier of a YouTube video that is to be loaded by software "FF245.") This software is indicated for a user in a daytime context, and for a user in the 20-25 age demographic.
- Row 38 shows user location (zip code) and gender as contextual data.
- the software for this content/context is specified in the alternative (i.e., four identifiers "OR"d together, as contrasted with the "AND" of row 36).
- Rows 40 and 42 show that the same content ID can correspond to different codecs - depending on the device processor (Atom or A4).
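The table of Fig. 2 can be pictured as a lookup keyed on content ID plus context. The sketch below is illustrative only: every identifier, context key, and row value is hypothetical (the figure's actual entries are not reproduced here), and the `combine` field simply records whether the listed software items are all invoked ("AND") or are alternatives ("OR").

```python
# Hypothetical rows: 6-symbol hex content IDs, 5-symbol hex software IDs.
TABLE = [
    {"content_id": "A4F20C", "context": {"setting": "indoors"},
     "software": ["0B3E1"], "combine": "AND"},
    {"content_id": "A4F20C", "context": {"setting": "outdoors"},
     "software": ["7C9D2"], "combine": "AND"},
    {"content_id": "5E11B7", "context": {"processor": "Atom"},
     "software": ["C0DE1"], "combine": "AND"},
    {"content_id": "5E11B7", "context": {"processor": "A4"},
     "software": ["C0DE2"], "combine": "AND"},
    {"content_id": "9D0042", "context": {"zip": "97201", "gender": "F"},
     "software": ["11AA0", "22BB1"], "combine": "OR"},
]


def lookup_software(content_id, context):
    """Return (combine, software_ids) for the first row whose content ID
    matches and whose context conditions are all satisfied, else None."""
    for row in TABLE:
        if row["content_id"] != content_id:
            continue
        if all(context.get(k) == v for k, v in row["context"].items()):
            return row["combine"], row["software"]
    return None
```

For example, `lookup_software("A4F20C", {"setting": "outdoors"})` selects the second row, mirroring how the same content ID can map to different software depending on context.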
- default codecs come bundled with certain media rendering software (e.g., Windows Media Player). If the defaults are unable to handle certain content, the rendering software typically downloads a further codec - again with no input from the parties most concerned.
- the software indicated in table 30 by the content can be a stand-alone app, or a software component - such as a codec, driver, etc.
- the software can render the content, or it can be a content companion - providing other information or functionality related to the content.
- the "software" can comprise a URL, or other data/parameter that is provided to another software program or online service (e.g., a YouTube video identifier).
- all such software identified in the table is chosen by the proprietor (e.g., artist, creator or copyright-holder) of the content with which it is associated.
- the proprietor's control in such matters should be given more deference than, say, that of a content distributor - such as AOL or iTunes.
- the proprietor's choice seems to merit more weight than that of the company providing word processing and spreadsheet software for the device.
- proprietor's selection of software will be based on aesthetics and technical merit. Sometimes, however, commercial considerations come into play. (As artist Robert Genn noted, "'Starving artist' is acceptable at age 20, suspect at age 40, and problematical at age 60.") Thus, for example, if a user's device detects ambient audio by the group The Decemberists, artist-specified data in the data structure 30 may indicate that the device should load the Amazon app for purchase of the detected music (or load the corresponding Amazon web page), to induce sales. If the same device detects ambient audio by the Red Hot Chili Peppers, that group may have specified that the device should load the band's own web page (or another app), for the same purpose. The proprietor can thus specify the fulfillment service for content object-oriented commerce.
- the starving artist problem may best be redressed by an auction arrangement. That is, the device 14 (or remote computer system 26) may announce to an online service (akin to Google AdWords) that the iPod of a user - for which certain demographic profile/context information may be available - has detected the soundtrack of the movie Avatar. A mini auction can then ensue - for the privilege of presenting a buying opportunity to the user. The winner (e.g., EBay) then pays the winning bid amount into an account, from which it is shared with the auction service, the artist, etc. The user's device responds by launching an EBay app through which the user can buy a copy of the movie, its soundtrack, or related merchandise. Pushing such content detection events, and associated context information, to cloud-based services can enable a richly competitive marketplace of responses.
- Universal Music Group may digitally watermark all its songs with an identifier that causes the FFmpeg MP3 player to be identified as the preferred rendering software.
- Dedicated fans of UMG artists soon install the recommended software - leading to deployment of such software on large numbers of consumer devices.
- the widespread use of the FFmpeg MP3 software can be one of the factors they weigh in making a choice.
- the software indicated in table 30 may be changed over time, such as through the course of a song's release cycle.
- the table-specified software may include an app intended to introduce the new band to the public (or a YouTube clip can be indicated for this purpose). After the music has become popular and the band has become better known, a different software selection may be indicated.
- operating system software is provided to perform one or more services specific to content processing or identification.
- an OS application programming interface takes content data as input (or a pointer to a location where the content data is stored), and returns fingerprint data corresponding thereto.
- Another OS service (either provided using the same API, or another) takes the same input, and returns watermark information decoded from the content data.
- An input parameter to the API can specify which of plural fingerprint or watermark processes is to be applied.
- the service may apply several different watermark and/or fingerprint extraction processes to the input data, and return resultant information to the calling program. In the case of watermark extraction, the resultant information can be checked for apparent validity by reference to error correction data or the like.
- the same API can further process the extracted fingerprint/watermark data to obtain XML-based content metadata that is associated with the content (e.g., text giving the title of the work, the name of the artist, the copyright holder, etc.). To do this it may consult a remote metadata registry, such as maintained by Gracenote.
- Such a content-processing API can establish a message queue (e.g., a "listening/hearing queue") to which results of the fingerprint/watermark extraction process (either literally, or the corresponding metadata) are published.
- One or more application programs can monitor (hook) the queue - listening for certain identifiers.
- One app may be alert to music by the Beatles. Another may listen for Disney movie soundtracks.
- the monitoring app - or another - can launch into activity - logging the event, acting to complement the media content, offering a buying opportunity, etc.
- Such functionality can be implemented apart from the operating system.
- One approach is with a publish/subscribe model, by which some apps publish capabilities (e.g., listening for a particular type of audio), and others subscribe to such functions.
- loosely-coupled applications can cooperate to enable a similar ecosystem.
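The queue-and-subscriber arrangement described above can be sketched in a few lines. This is a simplified in-process illustration under stated assumptions: the class name `ListeningQueue`, the metadata dictionary keys, and the callback style are all hypothetical, and a real system would likely deliver events across processes.

```python
class ListeningQueue:
    """Toy publish/subscribe hub for content-identification results."""

    def __init__(self):
        self._subscribers = []  # list of (matches, handler) pairs

    def subscribe(self, matches, handler):
        """Register a handler to run whenever matches(metadata) is true."""
        self._subscribers.append((matches, handler))

    def publish(self, metadata):
        """Deliver metadata to every interested subscriber.

        Returns the number of handlers that were invoked.
        """
        delivered = 0
        for matches, handler in self._subscribers:
            if matches(metadata):
                handler(metadata)
                delivered += 1
        return delivered


# Usage: one app is alert to the Beatles, another to Disney soundtracks.
log = []
queue = ListeningQueue()
queue.subscribe(lambda m: m.get("artist") == "The Beatles",
                lambda m: log.append(("beatles-app", m["title"])))
queue.subscribe(lambda m: m.get("studio") == "Disney",
                lambda m: log.append(("disney-app", m["title"])))
queue.publish({"artist": "The Beatles", "title": "Help!"})
```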
- One application of the present technology is to monitor media to which a user is exposed - as a background process. That is, unlike song identification services such as Shazam, the user need not take any action to initiate a discovery operation to learn the identity of a particular song. (Of course, the user - at some point - must turn on the device, and authorize this background functionality.) Instead, the device listens for a prolonged period - much longer than the 10-15 seconds of Shazam-like services, during the course of the user's day. As content is encountered, it is processed and recognized. The recognition information is logged in the device, and is used to prime certain software to reflect exposure to such content - available the next time the user's attention turns to the device.
- the device may process ambient audio for fifteen minutes, for an hour, or for a day.
- the device may present a listing of content to which the user has been exposed.
- the user may be invited to touch listings for content of interest, to engage in a discovery operation.
- Software associated with this content then launches.
- the device can prime software applications with information that is based, at least in part, on the content identification data.
- This priming may cause, e.g., the YouTube app to show a thumbnail corresponding to a music video for a song heard by the user - readying it for selection.
- a 90 second sample audio clip may be downloaded to the iPod music player app - available in a "Recent Encounters" folder.
- An email from the band might be added to the user's email InBox, and a trivia game app may load a series of questions relating to the band.
- Such data is resident locally (i.e., the user needn't direct its retrieval, e.g., from a web site), and the information is prominent to the user when the corresponding app is next used - thereby customizing these apps per the user's content experiences.
- Social media applications can serve as platforms through which such information is presented, and shared.
- in a Facebook app, for example, an avatar may give a greeting, "I noticed that you experienced the following things today," and then list content to which the user was exposed, e.g., "Billy Liar" by the Decemberists, "Boys Better" by the Dandy Warhols, and the new LeBron James commercial for Nike.
- the app may remind the user of the context in which each was encountered, e.g., while walking through downtown Portland on November 4, 2010 (as determined, e.g., by GPS and accelerometer sensors in the device).
- the Facebook app can invite the user to share any of this content with friends. It may further query whether the user would like discographies for any of the bands, would like full digital copies of the content, is interested in complementary content associated with any, or would like associated app(s) launched, etc.
- the app may similarly report on media encounters, and associated activities, of the user's friends (with suitable permissions).
- Such embodiments assure continuity between artistic intention and delivery; they optimize the experience that the art is intended to create. No longer must the artistic experience be mediated by a delivery platform over which the artist has no control - a platform that may seek attention for itself, potentially distracting from the art in which the user is interested.
- This technology also fosters competition in the app marketplace - giving artists a more prominent voice as to which apps best express their creations. Desirably, a Darwinian effect may emerge, by which app popularity becomes less an expression of branding and marketing budgets, and more a reflection of popularity of the content thereby delivered.
- A familiar example is DVR software, such as from Tivo, that presents a subset of the unabridged electronic program guide, based on apparent user interests.
- the Tivo software notices which television programs have been viewed by the user, invites user feedback in the form of "thumbs-up" or "thumbs-down" rankings, and then suggests future programs of potential interest based on such past behavior and ranking.
- Google's "Priority Inbox" for its Gmail service. Incoming email is analyzed, and ranked in accordance with its potential importance to the user. In making such judgment, Google considers what email the user has previously read, to which email the user has previously responded, and the senders/keywords associated with such mails. Incoming email that scores highly in such assessment is presented at the top of the mail list.
- the company My6sense.com offers a similar service for triaging RSS and Twitter feeds. Again, the software monitors the user's historical interaction with data feeds, and elevates in priority the incoming items that appear most relevant to the user. (In its processing of Twitter feeds, My6sense considers the links the user has clicked on, the tweets the user has marked as favorites, the tweets that the user has retweeted, and the authors/keywords that characterize such tweets.)
- Such principles can be extended to encompass object interactions. For example, if a person visiting a Nordstrom department store uses her smartphone to capture imagery of a pair of Jimmy Choo motorcycle boots, this may be inferred to indicate some interest in fashion, or in motorcycling, or in footwear, or in boots, or in Jimmy Choo merchandise, etc. If the person later uses her smartphone to image River Road motorcycle saddle bags, this suggests the person's interest may more accurately be characterized as including motorcycling. As each new image object is discerned, more information about the person's interests is gleaned. Some early conclusions may be reinforced (e.g., motorcycling), other hypotheses may be discounted. In addition to recognizing objects in imagery, the analysis (which can include human review by crowd-sourcing) can also discern activities. Location can also be noted (either inferred from the imagery, or indicated by GPS data or the like).
- image analysis applied to a frame of imagery may determine that it includes a person riding a motorcycle, with a tent and a forested setting as a background. Or in a temporal series of images, one image may be found to include a person riding a motorcycle, another image taken a few minutes later may be found to include a person in the same garb as the motorcycle rider of the previous frame - now depicted next to a tent in a forested setting, and another image taken a few minutes later may be found to depict a motorcycle being ridden with a forested background. GPS data may locate all of the images in Yellowstone National Park.
- Such historical information - accumulated over time - can reveal recurrent themes and patterns that indicate subjects, activities, people, and places that are of interest to the user. Each such conclusion can be given a confidence metric, based on the system's confidence that the attribute accurately characterizes a user interest. (In the examples just given, "motorcycling" would score higher than "Jimmy Choo merchandise.") Such data can then be used in filtering or highlighting the above-noted feeds of data (and others) with which the user's devices are presented.
- a user may elect to establish a Twitter account that is essentially owned by the user's object-derived profile.
- This Twitter account follows tweets relating to objects the user has recently sensed. If the user has imaged a Canon SLR camera, this interest can be reflected in the profile-associated Twitter account, which can follow tweets relating to such subject. This account can then re-tweet such posts into a feed that the user can follow, or check periodically, from the user's own Twitter account.
- Such object-derived profile information can be used for more than influencing the selection of content delivered to the user via smartphone, television, PC and other content-delivery devices. It can also influence the composition of such content.
- objects with which the user interacts can be included in media mashups for the user's consumption.
- a central character in a virtual reality gaming world frequented by the user may wear Jimmy Choo motorcycle boots. Treasure captured from an opponent may include a Canon SLR camera. Every time the user interacts with an object, this interaction can be published via Twitter, Facebook, etc. (subject to user permission and sharing parameters).
- These communications can also be thought of as "check-ins" in the Foursquare sense, but in this case it is for an object or media type (music, TV, etc.) rather than for a location.
- Social network analysis views relationships using network theory, in which the network comprises nodes and ties (sometimes called edges, links, or connections).
- Nodes are the individual actors within the networks, and ties are the relationships between the actors.
- the resulting graph-based structures can be complex; there can be many kinds of ties between the nodes.
- the relationship ties can include "likes," "owns," etc.
- a particular 3D graph may place people objects in one plane, and physical objects in a parallel plane. Links between the two planes associate people with objects that they own or like. (The default relationship may be "like." "Owns" may be inferred from context, or deduced from other data. E.g., a Camaro automobile photographed by a user, and geolocated at the user's home residence, may indicate an "owns" relationship. Similarly, a look-up of a Camaro license plate in a public database, which indicates the car is registered to the user, also suggests an "owns" relationship.)
- Such a graph will also typically include links between people objects (as is conventional in social network graphs), and may also include links between physical objects. (One such link is the relationship of physical proximity. Two cars parked next to each other in a parking lot may be linked by such a relationship.)
- the number of links to a physical object in such a network is an indication of the object's relative importance in that network. Degrees of association between two different physical objects can be indicated by the length of the network path(s) linking them - with a shorter path indicating a closer degree of association.
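The two graph measures just described, link count as an indication of importance and path length as degree of association, can be sketched with a plain adjacency map and a breadth-first search. The node names and helper functions below are hypothetical, and ties are treated as undirected and untyped for simplicity.

```python
from collections import deque


def add_link(graph, a, b):
    """Add an undirected tie between nodes a and b."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def link_count(graph, node):
    """Number of ties touching a node: a rough importance indicator."""
    return len(graph.get(node, ()))


def path_length(graph, start, goal):
    """Shortest number of ties between two nodes, or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


graph = {}
add_link(graph, "alice", "camaro")      # e.g., an "owns" tie
add_link(graph, "bob", "camaro")        # e.g., a "likes" tie
add_link(graph, "bob", "canon_slr")
```

Here the Camaro has two links, and the Camaro and the Canon SLR are associated through a path of length two (via the person who is tied to both).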
- Some objects may be of transitory interest to users, while others may be of long-term interest. If a user images a particular type of object only once, it likely belongs to the former class. If the user captures images of such object type repeatedly over time, it more likely belongs to the latter class.
- Decisions based on the profile data can take into account the aging of object-indicated interests, so that an object encountered once a year ago is not given the same weight as an object encountered more recently. For example, the system may follow Canon SLR-based tweets only for a day, week or month, and then follow them no longer, unless other objects imaged by the user evidence a continuing interest in Canon equipment or SLR cameras. Each object-interest can be assigned a numeric profile score that is increased, or maintained, by repeated encounters with objects of that type, but which otherwise diminishes over time. This score is then used to weight that object-related interest in treatment of content.
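One way to realize a profile score that grows with repeated encounters and otherwise diminishes over time is exponential decay with a half-life. The text does not specify a decay law, so the half-life form, the 30-day default, and the unit boost per encounter are all assumptions made for illustration.

```python
def decayed_score(score, days_elapsed, half_life_days=30.0):
    """Score after the given number of days with no new encounters.

    Assumes exponential decay: the score halves every half_life_days.
    """
    return score * 0.5 ** (days_elapsed / half_life_days)


def update_on_encounter(score, days_since_last, boost=1.0, half_life_days=30.0):
    """Age the existing score, then add a boost for the new encounter."""
    return decayed_score(score, days_since_last, half_life_days) + boost
```

With these defaults, an interest scored at 4.0 falls to 2.0 after thirty days without a further encounter, while an object imaged again before then is boosted above its aged value.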
- the front-facing camera on a smartphone can be used to speed text entry, in a gaze-tracking mode.
- a basic geometrical reference frame can be first established by having the user look
- the user can indicate an initial letter by gazing at it on a displayed keyboard 102.
- the user can signify selection of the gazed-at letter by a signal such as a gesture, e.g., an eye blink, or a tap on the smartphone body (or on a desk on which it is lying).
- as text is selected, it is added to a message area 103.
- data entry may be speeded (and gaze tracking may be made more accurate) by presenting likely next-letters in an enlarged letter-menu portion 104 of the screen.
- a menu of likely next-letters is presented. An example is shown in Fig. 4, in which the menu takes the form of a hexagonal array of tiles (although other arrangements can of course be used).
- To enter the next letter "l", the user simply looks at the "al" display tile 112, and signifies acceptance by a tap or other gesture, as above.
- the system then updates the screen as shown in Fig. 5.
- the message has been extended by a letter ("Now is the time for al_"), and the menu 104 has been updated to show the most common letter pairs beginning with the letter "l".
- the device solicits a next letter input.
- To enter another "l", the user gazes at the "ll" tile 114, and gestures.
- an alternative embodiment presents five letter pairs and a space, as shown in Fig. 6.
- a keyboard is always displayed on the screen, so the user can select letters from it without the intermediate step of selecting the keyboard tile 110 of Fig. 4.
- instead of the usual keyboard display 102, a variant keyboard display 102a - shown in Fig. 7 - can be used.
- This layout reflects the fact that five characters are not needed on the displayed keyboard, since the five most-likely letters are already presented in the hexagonal menu.
- the five keys are not wholly omitted, but rather are given extra-small keys.
- the 21 remaining letters are given extra-large keys.
- Such arrangement speeds user letter selection from the keyboard, and makes gaze tracking of the remaining keys more accurate.
- a further variant, in which the five letter keys are omitted entirely from the keyboard, can also be used.
- the variant keyboard layout 102a of Fig. 7 omits the usual space bar. Since there is an enlarged menu tile 116 for the space symbol, no space bar in the keyboard 102a is required. In the illustrated arrangement, this area has been replaced with common punctuation symbols.
- a numeric pad can be summoned to the screen by selection of a numeric pad icon - like keyboard tile 110 in Fig. 4. Or a numeric keyboard can be displayed on the screen throughout the message composition operation (like keyboard 102 in Fig. 6).
- One or more of the hexagonal tiles can present a guess of the complete word the user is entering - again based on analysis of a text corpus.
- the corpus used to determine the most common letter pairs, and full word guesses can be user-customized, e.g., a historical archive of all text and/or email messages authored by the user, or sent from the user's device.
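The menu of likely next-letters can be derived by counting letter pairs in a text corpus. The sketch below is one simple way to do so, assuming a plain-string corpus and a menu of five tiles; the function name and parameters are hypothetical.

```python
from collections import Counter


def next_letter_menu(corpus, last_letter, size=5):
    """Return the most common letter pairs that begin with last_letter.

    Pairs are counted over adjacent characters in the lowercased corpus;
    pairs whose second character is not a letter are ignored.
    """
    text = corpus.lower()
    counts = Counter(
        second
        for first, second in zip(text, text[1:])
        if first == last_letter and second.isalpha()
    )
    return [last_letter + second for second, _ in counts.most_common(size)]
```

For instance, with the tiny corpus `"all ball call alto"`, the menu after an "l" lists "ll" ahead of "lt". A user-customized corpus, such as the user's own sent messages, would simply be passed in as `corpus`.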
- the indicated display features can naturally be augmented by other graphical indicia and controls associated with the smartphone functionality being used (e.g., a text-messaging application).
- the user may select from symbols and words presented apart from the smartphone display - such as on a printed page.
- a large-scale complete keyboard and a complete numeric pad can be presented on such a page, and used independently, or in conjunction with a displayed letter menu, like menu 104.
- the smartphone camera can be used to perform the gaze-tracking, and geometrical calibration can be performed by having the user gaze at reference points.
- just as a smartphone can watch a user's eye, and interpret the movements, it can similarly watch a user's hand gestures, and interpret them as well.
- the result is a sign language interpreter.
- Sign languages (American and British sign languages being the most dominant) comprise a variety of elements - all of which can be captured by a camera, and identified by suitable image analysis software.
- a sign typically includes handform and orientation aspects, and may also be characterized by a location (or place of articulation), and movement.
- Manual alphabet (fingerspelling) gestures are similar, and are employed mostly for proper names and other specialized vocabulary.
- An exemplary sign language analysis module segments the smartphone-captured imagery into regions of interest, by identifying contiguous sets of pixels having chrominances within a gamut associated with most skin tones.
- the thus-segmented imagery is then applied to a classification engine that seeks to match the hand configuration(s) with a best match within a database library of reference handforms.
- sequences of image frames are processed to discern motion vectors indicating the movement of different points within the handforms, and changes to the orientations over time. These discerned movements are likewise applied to a database of reference movements and changes to identify a best match.
- textual meanings associated with the discerned signs are retrieved from the database records and can be output - as words, phonemes or letters - to an output device, such as the smartphone display screen.
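The first step of the analysis module described above, flagging pixels whose chrominance falls within a skin-tone gamut, can be sketched per pixel as follows. The RGB-to-CbCr conversion uses the standard full-range BT.601 (JPEG) coefficients; the Cb and Cr threshold ranges are commonly cited illustrative values and are an assumption here, not figures from the text. Grouping the flagged pixels into contiguous regions of interest is not shown.

```python
def rgb_to_cbcr(r, g, b):
    """Convert 8-bit RGB to Cb, Cr chrominance (full-range BT.601)."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def skin_mask(image, cb_range=(77, 127), cr_range=(133, 173)):
    """Return a boolean mask marking pixels inside the assumed gamut.

    image is a list of rows, each row a list of (r, g, b) tuples.
    """
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            cb, cr = rgb_to_cbcr(r, g, b)
            in_gamut = (cb_range[0] <= cb <= cb_range[1]
                        and cr_range[0] <= cr <= cr_range[1])
            mask_row.append(in_gamut)
        mask.append(mask_row)
    return mask
```

A calibration gesture of the kind discussed elsewhere in this description could be used to narrow `cb_range` and `cr_range` to the particular user.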
- the best-match data from the database is not output in raw form.
- the database identifies for each sign a set of candidate matches - each with a confidence metric.
- the system software then considers what combination of words, phonemes or letters is most likely in that sequential context - giving weight to the different confidences of the possible matches, and referring to a reference database detailing word spellings (e.g., a dictionary), and identifying frequently signed word-pairs and -triples. (The artisan will recognize that similar techniques are used in speech recognition systems - to reduce the likelihood of outputting nonsense phrases.)
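Combining per-sign confidences with word-pair frequencies can be sketched as a search over candidate sequences. The scoring rule below (product of confidences, multiplied by smoothed pair counts) and the exhaustive search are assumptions chosen for brevity; they are not the method of any particular recognizer, and a practical system would use a more efficient search for long sequences.

```python
from itertools import product


def best_sequence(candidates_per_sign, pair_counts, smoothing=1.0):
    """Pick the most plausible word sequence.

    candidates_per_sign: one list per sign of (word, confidence) tuples.
    pair_counts: dict mapping (word1, word2) to how often that pair occurs.
    """
    best, best_score = None, float("-inf")
    for combo in product(*candidates_per_sign):
        score = 1.0
        for _, confidence in combo:
            score *= confidence
        # Reward adjacent words that are frequently signed together.
        for (w1, _), (w2, _) in zip(combo, combo[1:]):
            score *= pair_counts.get((w1, w2), 0) + smoothing
        if score > best_score:
            best, best_score = [word for word, _ in combo], score
    return best


signs = [
    [("thank", 0.6), ("think", 0.4)],
    [("yew", 0.55), ("you", 0.45)],
]
pairs = {("thank", "you"): 50}
```

With these hypothetical inputs, the pair frequency outweighs the slightly higher raw confidence of "yew", so the sequence "thank you" is chosen.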
- the recognition software can also benefit by training. If the user notes an incorrect interpretation has been given by the system to a sign, the user can make a sign indicating that a previous sign will be repeated for re-interpretation. The user then repeats the sign. The system then offers an alternative interpretation - avoiding the previous interpretation (which the system infers was incorrect). The process may be repeated until the system responds with the correct interpretation (which may be acknowledged with a user sign, such as a thumbs-up gesture). The system can then add to its database of reference signs the just-expressed signs - in association with the correct meaning.
- the system interprets a sign, and the user does not challenge the interpretation, then data about the captured sign imagery can be added to the reference database - in association with that interpretation.
- the system learns to recognize the various presentations of certain signs.
- the same technique allows the system to be trained, over time, to recognize user-specific vernacular and other idiosyncrasies.
- standard sign language can be augmented to give the image analysis software some calibration or reference information that will aid understanding.
- a user may begin with a gesture such as extending fingers and thumbs from an outwardly-facing palm (the typical sign for the number '5') and then returning the fingers to a fist.
- This allows the smartphone to identify the user's fleshtone chrominance, and determine the scale of the user's hand and fingers.
- the same gesture, or another, can be used to separate concepts - like a period at the end of a sentence. (Such punctuation is commonly expressed in American Sign Language by a pause. An overt hand gesture, rather than the absence of a gesture, is a more reliable parsing element for machine vision-based sign language interpretation.)
- the interpreted sign language can be output as text on the smartphone display.
- the text can be simply stored (e.g., in an ASCII or Word document), or it can be output through a text-to-speech converter, to yield audible speech.
- the text may be input to a translation routine or service (e.g., Google translate) to convert it to another language - in which it may be stored, displayed or spoken.
- the smartphone may employ its proximity sensor to detect the approach of a user's body part (e.g., hands), and then capture frames of camera imagery and check them for a skin-tone chrominance and long edges (or other attributes that are characteristic of hands and/or fingers). If such analysis concludes that the user has moved hands towards the phone, the phone may activate its sign language translator.
- Apple's FaceTime communications software can be adapted to activate the sign language translator when the user positions hands to be imaged by a phone's camera. Thereafter, text counterparts to the user's hand gestures can be communicated to the other party(ies) to which the phone is linked, such as by text display, text-to-speech conversion, etc.
- a smartphone is equipped to rapidly capture identification from plural objects, and to make same available for later review.
- Fig. 8 shows an example.
- the application includes a large view window that is updated with streaming video from the camera (i.e., the usual viewfinder mode). As the user pans the camera, the system analyzes the imagery to discern any identifiable objects. In Fig. 8, there are several objects bearing barcodes within the camera's field of view.
- the processor analyzes the image frame starting at the center - looking for identifiable features.
- when it finds an identifiable feature (e.g., the barcode 118), it overlays bracketing 120 around the feature, or highlights the feature, to indicate to the user what part of the displayed imagery has caught its attention.
- a "whoosh" sound is then emitted from the device speaker, and an animated indicia moves from the bracketed part of the screen to a History 122 button at the bottom.
- the animation can be a square graphic that collapses to a point down at the History button.
- a red-circled counter 124 that is displayed next to the History button indicates the number of items thus-detected and placed in the device History (7, in this case).
- After thus-processing barcode 118, the system continues its analysis of the field of view for other recognizable features. Working out from the center it next recognizes barcode 126, and a similar sequence of operations follows.
- the counter 124 is incremented to "8." It next notes barcode 128 - even though it is partially outside the camera's field of view. (Redundant encoding of certain barcodes enables such decoding.)
- the time elapsed for recognizing and capturing data from the three barcodes into the device history, with the associated user feedback (sound and animation effects) is less than 3 seconds (with 1 or 2 seconds being typical).
- By tapping the History button 122, a scrollable display of previously-captured features is presented, as shown in Fig. 9.
- each entry includes a graphical indicia indicating the type of feature that was recognized, together with information discerned from the feature, and the time the feature was detected. (The time may be stated in absolute fashion, or relative to the present time; the latter is shown in Fig. 9.)
- the features detected by the system needn't be found in the camera data.
- Fig. 9 shows that the phone also sensed data from a near field chip (e.g., an RFID chip) - indicated by the "NFC" indicia.
- the user can recall this History list, and tap indicia of interest.
- the phone then responds by launching a response corresponding to that feature (or by presenting a menu of several available features, from which the user can select).
- a button control 130 on the application UI toggles such functionality on and off. In the indicated state, detection of multiple features is enabled. If the user taps this control, its indicia switches to "Multiple is Off."
- if the phone detects a feature in this mode, the system adds it to the History (as before), and immediately launches a corresponding response. For example, it may invoke web browser functionality and load a web page corresponding to the detected feature.
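The capture behavior described above can be sketched as follows. This is only an illustrative sketch: the `Feature` and `History` names, the pixel coordinates, and the rule that single mode stops after the first feature are assumptions made for the example, not details taken from the specification.

```python
from dataclasses import dataclass, field
import math
import time


@dataclass
class Feature:
    kind: str        # e.g. "barcode", "watermark", "NFC"
    payload: str     # information discerned from the feature
    x: float = 0.0   # position in the frame, in pixels (assumed)
    y: float = 0.0


@dataclass
class History:
    multiple: bool = True                       # state of the "Multiple" toggle
    entries: list = field(default_factory=list)

    def capture(self, features, frame_w, frame_h, now=None):
        """Log features working out from the frame center.

        Returns the payloads whose responses should launch immediately
        (empty while multiple-feature mode is on).
        """
        now = time.time() if now is None else now
        cx, cy = frame_w / 2, frame_h / 2
        ordered = sorted(features, key=lambda f: math.hypot(f.x - cx, f.y - cy))
        launched = []
        for f in ordered:
            self.entries.append((f.kind, f.payload, now))
            if not self.multiple:
                launched.append(f.payload)      # single mode: respond at once
                break
        return launched

    @property
    def counter(self):
        """Value shown in the counter next to the History button."""
        return len(self.entries)
```

Storing an absolute timestamp with each entry leaves the display free to show either absolute or relative times.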
- Another aspect of the present technology involves smartphone-based state machines, which vary their operation in response to sensor input.
- Application 12/797,503 details how a blackboard data structure is used for passing data between system components. The following discussion provides further information about an illustrative embodiment.
- a physical sensor monitors a sensor, and feeds data from it to the blackboard.
- the camera and microphone of a smartphone are particular types of physical sensors, and may be generically termed "media sensors."
- Some sensors may output several types of data. For example, an image sensor may output a frame of pixel data, and also an AGC (automatic gain control) signal.
- a logical sensor obtains data - typically from the blackboard - and uses it to calculate further data. This further data is also commonly stored back in the blackboard. (The recognition agents discussed in application 12/797,503 are examples of logical sensors. Another is an inference engine.) In some cases the same physical data may pass through multiple stages of logical sensor refinement during processing.
- Modules which produce or consume media content may require some special functionality, e.g., to allow format negotiation with other modules. This can include querying a recognition agent for its requested format (e.g., audio or video, together with associated parameters), and then obtaining the corresponding sensor data from the blackboard.
- the first two lines simply indicate that a frame of video data, and associated AGC data (which may be, e.g., an average luminance value across the frame), are written to the blackboard from the camera.
- the third line shows that associated handset movement data - as sensed by the smartphone accelerometer system - is also written to the blackboard.
- the table indicates that the Data_Frame data that was previously stored to the blackboard is applied to an image classifier (a variety of logical sensor), resulting in classification data that is stored in the blackboard.
- classification data can be of various sorts. One type of classification data is color saturation. If a frame has very low color saturation, this indicates it is not a color scene, but is more likely printed text on a white background, or a barcode. The illustrative data flow will not activate the watermark detector if the Data_Classification data indicates the scene is likely printed text or barcode - although in other implementations, watermarks may be read from black and white, or greyscale, imagery. Another classifier distinguishes speech from music, e.g., so that a song recognition process does not run when spoken audio is input.
- the fifth line indicates that the just-derived classification data, together with the AGC and accelerometer data, are recalled from the blackboard and applied to a watermark inference module (another logical sensor) to yield a frame quality metric, which is written back to the blackboard.
- the watermark inference module uses the input data to estimate the likelihood that the frame is of a quality from which a watermark - if present - can be decoded. For example, if the AGC signal indicates that the frame is very dark or very light, then it is improbable that a watermark is recoverable. Ditto if the accelerometer data indicates that the smartphone is being accelerated when the frame of imagery was captured. (The accelerometer data is typically compensated for gravity.) Likewise if the classifier indicates it is a low-saturation set of data.
- the sixth line shows that the just-determined frame quality metric is provided - together with a frame of captured imagery - to a watermark reader (recognition agent). If the frame quality exceeds a threshold, the watermark reader will attempt to decode a watermark from the imagery. The result of such attempt is stored in the ReadResult data (e.g., "1" indicates a watermark was successfully decoded; a "0" indicates that no watermark was found), and the decoded watermark payload - if any - is stored as the WM_ID.
- this metric can be used as a priority value that dynamically controls operation of the watermark decoder - based on system context. If the system is busy with other operations, or if other context - such as battery charge - makes the decoding operation costly, then a frame with a low quality metric will not be watermark-processed, so as not to divert system resources from higher-priority processes.
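The data flow just described (classifier, then watermark inference module, then watermark reader) can be sketched with a plain dictionary standing in for the blackboard. Only `Data_Frame`, `Data_Classification`, `ReadResult` and `WM_ID` are names from the text; the other keys, the numeric cut-offs, and the use of a precomputed saturation value are assumptions made for illustration.

```python
def classify(frame_saturation):
    """Image classifier (a logical sensor): flag low-saturation frames."""
    return "text_or_barcode" if frame_saturation < 0.1 else "color_scene"


def frame_quality(classification, agc_luma, accel_g):
    """Watermark inference module: crude 0..1 estimate of decodability."""
    if classification == "text_or_barcode":
        return 0.0
    if agc_luma < 30 or agc_luma > 225:   # frame very dark or very light
        return 0.1
    if abs(accel_g) > 0.5:                # handset accelerating: likely blur
        return 0.2
    return 0.9


def read_image_watermark(bb, decode, threshold=0.5):
    """Run the steps against a dict-based blackboard `bb`.

    `decode` is a caller-supplied function returning a payload or None.
    """
    bb["Data_Classification"] = classify(bb["Data_Saturation"])
    bb["Data_FrameQuality"] = frame_quality(
        bb["Data_Classification"], bb["Data_AGC"], bb["Data_Accel"])
    if bb["Data_FrameQuality"] > threshold:
        payload = decode(bb["Data_Frame"])
        bb["ReadResult"], bb["WM_ID"] = (1, payload) if payload is not None else (0, None)
    else:
        # Frame judged too poor: do not spend resources on a decode attempt.
        bb["ReadResult"], bb["WM_ID"] = 0, None
    return bb
```

Raising `threshold` when the system is busy or the battery is low is one way the quality metric could act as a context-dependent priority value.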
- the installed modules are enumerated in a configuration file. These modules are available for instantiation and use at runtime.
- the configuration file also details one or more scenarios (e.g., ReadImageWatermark - as detailed above, and FingerprintAudio) - each of which specifies a collection of modules that should be used for that scenario.
- the application initializes middleware that specifies a particular scenario(s) to invoke.
- the middleware configuration and scenarios are typically loaded from an XML configuration file.
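A configuration of this kind might look like the sketch below. The element names, attribute names, module names and type codes are all hypothetical; the text states only that installed modules and scenarios are enumerated in an XML file.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout; the real schema is not given in the text.
CONFIG = """
<middleware>
  <modules>
    <module name="Camera" type="PS"/>
    <module name="ImageClassifier" type="LS"/>
    <module name="WatermarkReader" type="RA"/>
    <module name="Microphone" type="PS"/>
    <module name="AudioFingerprinter" type="RA"/>
  </modules>
  <scenario name="ReadImageWatermark">
    <use module="Camera"/>
    <use module="ImageClassifier"/>
    <use module="WatermarkReader"/>
  </scenario>
  <scenario name="FingerprintAudio">
    <use module="Microphone"/>
    <use module="AudioFingerprinter"/>
  </scenario>
</middleware>
"""


def load_config(xml_text):
    """Return (installed modules, scenarios), rejecting unknown module names."""
    root = ET.fromstring(xml_text.strip())
    installed = {m.get("name"): m.get("type") for m in root.iter("module")}
    scenarios = {}
    for s in root.iter("scenario"):
        names = [u.get("module") for u in s.iter("use")]
        missing = [n for n in names if n not in installed]
        if missing:
            raise ValueError(f"scenario {s.get('name')} uses unknown modules {missing}")
        scenarios[s.get("name")] = names
    return installed, scenarios
```

Checking each scenario against the installed list at load time surfaces a misconfigured scenario before any module is instantiated.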
- the illustrative system is coded in C/C++ (e.g., using Visual Studio 2010), and follows the architecture shown in Fig. 10.
- the middleware - comprising the blackboard, together with an event controller, and a middleware state machine - is implemented in a dynamic link library (DLL).
- the Fig. 10 system employs standardized interfaces through which different system components communicate.
- communication between system applications (above) and the middleware is effected through APIs, which define conventions/protocols for initiating and servicing function calls.
- communication between the middleware and the sensor modules (which are typically implemented as DLLs that are dynamically loaded at runtime by the middleware) is effected through a service provider interface (SPI).
- the illustrated blackboard can store data in a variety of manners, e.g., key-value pairs, XML, ontologies, etc.
- the exemplary blackboard stores data as key-value pairs, and these are accessed using push and pull APIs. Concurrency control is handled by pessimistic locking - preventing processes from accessing data while it is in use by another process.
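A minimal sketch of such a key-value blackboard follows. It assumes a single lock guarding the whole store, which is the simplest form of pessimistic locking; the class and method names are illustrative and are not the API of the described system.

```python
import threading


class Blackboard:
    """Key-value store guarded by one lock (a simple pessimistic scheme)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def push(self, key, value):
        with self._lock:        # a writer blocks until the store is free
            self._data[key] = value

    def pull(self, key, default=None):
        with self._lock:        # readers wait too, so access is never concurrent
            return self._data.get(key, default)
```

A per-entry lock would reduce contention between unrelated modules, at the cost of more bookkeeping.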
- the blackboard data types include data blobs in addition to discrete data elements (e.g., integers and strings).
- each data entry has several items of associated data
- the values that are stored in the blackboard are of the following representative data types:
- Video statistics (frame rate, AGC, focal distance, etc.)
- Video classification result
- Data is written to and read from the blackboard through functions that are supported by both the API and SPI. Such functions are familiar to artisans and include (with the parentheticals denoting values passed as part of the API/SPI call): BB CreateEntry (name, source, type, size), returns Handle
- the modules publish status information to the blackboard using a common set of named entries.
- Each name is created by using the pattern PREFIX + "_" + MODULE NAME.
- the prefixes include:
- API and SPI functions also include Initialize, Uninitialize, LoadScenario, Start, Stop and Pause, through which the relevant DLLs are initialized (or uninitialized), and different scenarios are configured and started/stopped/paused.
- the event controller module of the Fig. 10 middleware deals with the various priorities, processing complexity and processing frequency of different SPIs.
- an image watermark decoder RA is processor intensive, but it operates on discrete frames, so frames can be ignored if other SPIs need time to run. (E.g., in performing the WatermarkImageRead scenario on a stream of images, i.e., a video stream, various frames can be dropped - thereby scaling execution to the available resources, and preventing the system from becoming bogged down).
- an audio watermark decoder RA can be less processor intensive, but it needs to process audio data in an uninterrupted stream. That is, when a stream of audio data is available, the audio watermark RA should take precedence over other SPIs.
- the event controller may periodically interrupt audio processing to allow image processing if a high quality frame of imagery is available.
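The arbitration between audio and image processing can be sketched as a small decision function. The function name, the quality scale and the cut-off value are assumptions; the sketch only illustrates the stated policy that audio normally takes precedence, frames may be dropped, and a high-quality frame may briefly pre-empt audio.

```python
def next_task(audio_pending, frame_quality, quality_cutoff=0.8):
    """Pick what the event controller runs next.

    audio_pending: True if unprocessed audio blocks are queued.
    frame_quality: quality metric of the newest frame, or None if no frame.
    """
    if frame_quality is not None and frame_quality >= quality_cutoff:
        return "image"   # a good frame is worth interrupting audio for
    if audio_pending:
        return "audio"   # keep the audio stream uninterrupted; drop the frame
    if frame_quality is not None:
        return "image"   # nothing else to do, so try the frame anyway
    return "idle"
```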
- each module includes a data structure detailing information about the module's priority needs and limitations, execution frequency, etc.
- a sample of the data in such a structure follows: Name
- PS PS, MS, LS, RA
- Modules and applications can further issue a blackboard trigger function, with a corresponding trigger value (or trigger value range), which causes the middleware (e.g., the blackboard or the event controller) to issue such module/application a notification/message when certain data in the blackboard meets the trigger value criterion.
- one trigger is if the ReadImageWatermark operation returns a ReadResult value of "1," signifying a successful watermark read.
- Another trigger is if a music recognition module identifies the theme music to the television show Grey's Anatomy. By such function, the module/application can remain dormant until alerted of the presence of certain data on the blackboard.
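A trigger mechanism of this kind can be sketched as below, assuming numeric values and an inclusive trigger range. The `TriggerBoard` class and its method names are hypothetical.

```python
class TriggerBoard:
    """Blackboard that notifies subscribers when a value meets a criterion."""

    def __init__(self):
        self._data = {}
        self._triggers = []   # (key, low, high, callback)

    def add_trigger(self, key, low, high, callback):
        """Register interest in `key` taking a value in [low, high]."""
        self._triggers.append((key, low, high, callback))

    def push(self, key, value):
        self._data[key] = value
        for k, low, high, callback in self._triggers:
            if k == key and low <= value <= high:
                callback(key, value)   # wake the dormant module/application
```

A module registered with `add_trigger("ReadResult", 1, 1, handler)` does no work until a successful read is posted.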
- each of the sensors can publish data and status information to the blackboard, and this information can be retrieved and used by other modules and by different applications which, in turn, publish their respective results to the blackboard.
- raw data from a single physical sensor can be successively processed and augmented with other information, and reasoned-with, to perform highly complex operations.
- ReadImageWatermark is a simple example of the multi-phase processing that such system enables.
- Fig. 16 is another architectural view of middleware for media and physical object recognition.
- This flexible architecture can also be used to deliver more contextual information to the application.
- This design includes a blackboard, a sensor bank, a recognition Agent (RA) bank, an inference engine, and an event controller.
- the blackboard is central to the architecture. It is a shared repository through which system components communicate. No direct communication is typically allowed among any other system components.
- An exemplary blackboard is virtually structured into separate sections, each dedicated to a given type of data. For example, there is one section for audio data and another for imagery.
- the sensor bank provides inputs to the blackboard and may include a camera, microphone, accelerometer, gyroscope, ambient light sensor, GPS, etc.
- the RA bank may include image and audio watermark readers, a fingerprint reader, a barcode reader, etc.
- Each sensor and RA contains a private knowledge base for estimating the cost of achieving its assigned task and the quality of its results. This supports extensibility, enabling sensors or RAs to be added or removed with no impact on other components of the system.
- the event controller coordinates the overall operation of the system.
- the inference engine monitors the content of the blackboard and infers contextual data to optimize the use of the resources in the sensor and RA banks.
- the inference engine writes inferred data to the blackboard.
- a minimum number of sensors is typically active to provide input to the blackboard. Upon input from any component, the blackboard may signal the change to all components. Each sensor and RA then assesses whether it can help resolve the identity of a detected object. A sensor can help resolve the identity by providing relevant and more accurate data. A sensor or RA uses its knowledge base to estimate the cost and quality of its solution and writes that data to the blackboard. The event controller activates the sensor(s) or RA(s) estimated to produce optimal results most economically. Sensors and the inference engine continue to update the blackboard to ensure that the most suitable module(s) is always engaged. The system continues this process until the object's identity is resolved. In this scenario, the event controller optimizes the use of power and resources in the system. For example, if the lighting level is poor or the device is in vigorous motion, the camera is not used and neither the image watermark reader nor the barcode reader is used for identification.
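One way to picture the selection step is a greedy choice over the cost and quality estimates that modules post to the blackboard. The greedy rule, the budget parameter and the module names are assumptions for illustration; the text does not specify how the event controller weighs the estimates.

```python
def choose_modules(bids, budget):
    """Select modules by estimated quality per unit cost.

    bids: {module: (estimated_cost, estimated_quality)}, costs assumed > 0.
    budget: total cost the event controller is willing to spend.
    """
    ranked = sorted(bids.items(), key=lambda kv: kv[1][1] / kv[1][0], reverse=True)
    chosen, spent = [], 0.0
    for name, (cost, quality) in ranked:
        if quality <= 0:
            continue                 # module expects no useful result
        if spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return chosen
```

In poor light or vigorous motion, the camera-based agents would post a quality estimate of zero and so would never be activated.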
- Fig. 17 shows the middleware architecture of the Digimarc Discover application. It is implemented in C/C++ and assembly language and optimized for the iPhone and Android platforms.
- the Digimarc Discover application integrates RAs for digital watermarks, barcodes, and audio fingerprints.
- the primary difference between the architecture depicted in Fig. 17 and that shown in Fig. 16 is the absence of a formal inference engine, and a more limited role for the blackboard. Although some mechanisms are implemented to decide whether to process a media sample, a full inference engine is not required. Also, the blackboard is used essentially as a means for moving media data (audio and video), along with some sensor data (accelerometer, gyroscope, etc.). The blackboard keeps all captured data synchronized and queued for consumption by the RAs regardless of their sampling rates. When an RA is ready for processing, it requests a media sample from a corresponding data queue in the blackboard. The blackboard then provides the RA with the media data and associated sensor data.
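The queueing role of the blackboard in this arrangement can be sketched with a bounded queue whose entries pair each media sample with the sensor readings captured alongside it. The class name, field names and queue length are assumptions.

```python
from collections import deque


class MediaQueue:
    """Bounded queue of media samples with their associated sensor data."""

    def __init__(self, maxlen=30):
        self._q = deque(maxlen=maxlen)   # oldest samples fall off when full

    def put(self, timestamp, media, sensors):
        self._q.append({"t": timestamp, "media": media, "sensors": dict(sensors)})

    def get(self):
        """Called by an RA when it is ready; returns None if nothing is queued."""
        return self._q.popleft() if self._q else None
```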
- When an RA begins processing the media sample, it may use any of the sensor data attached to the sample. The RA first decides whether to process the sample or not. It undertakes its identification task only if it is relatively confident of reaching a correct identification. Otherwise, the RA aborts the operation and waits for the next media sample. An RA may use a logical sensor to tune its identification parameters. If successful, the RA returns its result to the application through the middleware. To provide an attractive user experience, the RAs should quickly process large amounts of data for the best chance at a positive object/media identification. Because identification requires such a volume of data, appropriate integration with the operating system is desirable. This integration is typically tuned based on the particular audio and video capturing process used. Without such integration, the blackboard may not have enough data for the RAs to perform detection, and the chances of obtaining a correct identification are reduced. Using multiple RAs at once can exacerbate the problem.
- the iPhone version of the Digimarc Discover application relies on Apple's Grand Central Dispatch threading facility, which permits large-scale multi-threading of the application with low thread latency. Audio and video streams are recorded in separate threads, and each RA runs in its own thread. Overhead did not observably increase with the number of threads. The benefits even on the iPhone's single-core processor far outweigh possible drawbacks. RA processing is generally driven by the audio and video capture threads, but this can vary depending on the type of RAs in use.
- Video is delivered to an application as a sequence of frames, each one a complete image.
- audio is delivered as blocks of data in a raw byte stream. While a video frame can stand alone, a single audio block is often useless for audio identification.
- Most audio identification technologies need at least several seconds (equal to many blocks) of audio data for an identification.
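The need to gather many blocks before identification can be sketched as a simple accumulator. The sample rate, sample width and window length below are placeholder values, not figures from the text.

```python
class AudioAccumulator:
    """Collect raw audio blocks until a full analysis window is available."""

    def __init__(self, sample_rate=16000, bytes_per_sample=2, seconds=3.0):
        self.needed = int(sample_rate * bytes_per_sample * seconds)
        self._buf = bytearray()

    def add_block(self, block):
        """Append one raw block; return a full window once enough has arrived."""
        self._buf.extend(block)
        if len(self._buf) < self.needed:
            return None
        window = bytes(self._buf[:self.needed])
        del self._buf[:self.needed]      # keep any surplus for the next window
        return window
```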
- RAs watermark detector and barcode reader
- Excluding one RA from examining an image can significantly improve performance.
- the Digimarc Discover application uses a classifier to decide whether the barcode reader should process an image. Since barcodes are nearly always printed in black and white, the classifier inspects the saturation levels of images and excludes those with significant amounts of color.
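A saturation-based gate of this kind can be sketched as follows. It assumes RGB pixels and uses the HSV saturation from Python's standard `colorsys` module; the threshold is an arbitrary illustrative value, and the actual classifier is not described in this detail.

```python
import colorsys


def mean_saturation(pixels):
    """Mean HSV saturation; pixels are (r, g, b) with components in 0..255."""
    total = 0.0
    count = 0
    for r, g, b in pixels:
        total += colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[1]
        count += 1
    return total / count if count else 0.0


def barcode_reader_should_run(pixels, max_saturation=0.15):
    """Skip the barcode reader for frames with significant color."""
    return mean_saturation(pixels) <= max_saturation
```

In practice the statistic would likely be computed on a subsampled frame so that the classifier itself stays cheap.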
- a fingerprint-based Gracenote music recognition RA is controlled by reference to a speech classifier, which avoids calling the Gracenote RA when microphone audio is classified as speech.
- the ImageWatermarkRead scenario employs data from the smartphone accelerometer - preventing an attempted watermark read if the image would likely include excessive motion blur.
- other smartphone logical sensors including other position/motion sensors, as well as sensors of focal distance, automatic white balance, automatic gain control, and ISO, can be used to identify low quality frames, so that smartphone resources are not needlessly consumed processing poor quality input.
- implementations employ a formal representation of context, and an artificial intelligence-based inference engine.
- sensors and RAs may be conceived as knowledge sources.
- Any logical sensor may be regarded, in a sense, as an inferencing module.
- a light sensor can detect low light, and infer that the smartphone is in the dark. Such inference can be used to issue a signal that turns on a torch to increase the illumination.
- a more sophisticated arrangement employs several modules in the smartphone. For example, the light sensor may detect low light, and the microphone may detect a rustling noise.
- the system may infer the smartphone is in a user's pocket, in which case it may be pointless to turn on the camera's torch. Still more complex arrangements can employ one or more system modules, together with user history data and/or external resources.
- the system may determine, by an audio fingerprinting module, an external database, and user history, that the smartphone user has watched 15 minutes of Season 4, Episode 6, of the television show Grey's Anatomy, and that Sara Ramirez - the actress who plays surgeon Callie Torres - is one of the user's favorite actresses, causing the smartphone to present a link to Ramirez's Wikipedia entry high in a list of menu options displayed to the user during this part of the episode.
- the efficacy of recognition is highly affected by the speed of the RAs and the middleware. A variety of improvements can be taken in this regard.
- Each RA can be modeled with a Receiver Operating Characteristic (ROC) curve at a given aggregate platform utilization profile. Ideally a mobile device profile for each instance of an RA is employed to inform system design.
- the Digimarc image watermark RA is optimized for the iPhone platform by implementing the FFT, log-polar, and non-linear filtering stages in assembly language to take advantage of the NEON registers in the iPhone's A4 processor.
- the image watermark RA processes 128x128-pixel blocks with a depth of 8 bits, and it can run as a single-block or 4-block detector.
- the execution time of the NEON implementation decreased by 20% for both marked and unmarked frames, as shown in the table below. Increased rejection speed for unmarked frames yields higher throughput, which in turn increases the rate of recognition attempts by the RA.
- the NEON implementation generates particular benefits when the registers' SIMD capability is used to process 4 image blocks concurrently.
- This 4-block approach enables use of a variety of pre-filters to increase the operational envelope of the RA (discussed below) and thus improve the user experience.
- Mobile platforms sometimes offer multiple APIs for a given system service.
- the choice of API can impact resource utilization and the resulting task mix on the platform. In some instances, the choice may impact the sensor's throughput.
- Fig. 18 shows the task mix before and after using the Preview API to retrieve image frames (builds 1.0.8 and 1.11, respectively).
- Using the Preview API dramatically reduces the time spent rendering the frame and frees up time for image watermark recognition (shown as "Decoding WM").
- Using the Preview API also allowed other system and application threads more use of the processor. While affording the OS more time to service other threads is certainly a benefit to the system as a whole, the increase in throughput from 11 to 14 frames per second is of more direct value to the user. The throughput increase also increases the rate of attempts by the RA to recognize an object.
- recognition rates can be captured as a function of first-order environmental factors, for a specific user scenario.
- throughput can be measured to normalize the recognition rates as a function of time.
- the following table shows the results of using optimized RAs with both the iOS 4.0 Preview and UIGetScreenImage APIs to retrieve frames from the iPhone Video Queue.
- the primary environmental factors are distance to the print, lighting, and pose.
- a robotic cell was built that repeatedly measures their impacts on recognition rates.
- Fig. 19 displays the results for the two versions, showing which frames resulted in a successfully decoded watermark (or payload) and which did not.
- the improved sampling algorithm materially increased the range of distances over which the watermark could be detected and the payload recovered.
- the detailed Digimarc Discover platform is designed, based on real world usage scenarios, to provide content identification and reduce the associated complexity of building mobile discovery applications.
- the platform is architected to be extensible and allow the addition or removal of any type of recognition agent without impacting the system. Efficient utilization of operating system resources, and optimization of recognition agents, allows consumer-pleasing system performance.
- in some implementations this middleware includes a formal inference engine that adapts use of sensors and recognition agents based on user and device context, while others use more informal types of inferencing.
- Linked Data as an Atomic Construct of Mobile Discovery
- Smartphone sensors may be regarded as producing data about context.
- such information is represented in a semantically expressive manner, such as by a collection of data triples in the Resource Description Framework (RDF) knowledge representation language.
- Semantic triples commonly express relationships involving two data elements, or express an attribute concerning a single data element.
- parameters can still be assigned values, but the triples are semantically related to other information - imbuing them with meaning.
- a variety of ontological models (RDFS, OWL, etc.) can be used to formally describe the semantics of these triples and their relationship to each other.
- the LAT (latitude) parameter may still be assigned a floating point datum, but by reference to other triples, the computer can understand that this LAT datum refers to a position on the Earth.
- Such understanding allows powerful inferencing. (For example, a dataset that places an object at latitude 45 at one instant of time, and at latitude 15 two seconds later, can be understood to be suspect.) Semantic web technologies enable smartphones to reason based on contextual information and other data presented in such form.
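The latitude example can be worked through numerically. The sketch below assumes about 111.2 km per degree of latitude (from a mean Earth radius of 6371 km) and an arbitrary plausibility limit of 1200 km/h; both the function names and the limit are illustrative.

```python
KM_PER_DEGREE_LAT = 111.2   # approximate; assumes mean Earth radius of 6371 km


def implied_speed_kmh(lat1, t1, lat2, t2):
    """North-south speed implied by two (latitude, time-in-seconds) fixes."""
    hours = abs(t2 - t1) / 3600.0
    distance_km = abs(lat2 - lat1) * KM_PER_DEGREE_LAT
    if hours == 0:
        return float("inf") if distance_km else 0.0
    return distance_km / hours


def is_suspect(lat1, t1, lat2, t2, max_kmh=1200.0):
    """Flag a pair of fixes that no ordinary vehicle could produce."""
    return implied_speed_kmh(lat1, t1, lat2, t2) > max_kmh
```

For latitude 45 followed by latitude 15 two seconds later, the implied distance is about 3,336 km, or roughly six million km/h, so the dataset is flagged.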
- Sensed context triples can be stored in graph form, where a sensor makes a collection of assertions about itself and its output data.
- One such graph (a tree) is shown in Fig. 15.
- {ImageSensor3DF12_ImageTime2011040118060103_Pixel(0,0); HasGreenValue; 32}
- a tree structure can include a unique name-space identifier in a root or other fundamental node (as in Fig. 15), and other nodes can then be inferentially so-labeled. Trees have a long history as a data organizing construct, and a rich collection of tree-related techniques (e.g., sorting, pruning, memory optimization, etc.) can be applied.
- a blackboard data structure can serve as a database for RDF triples.
- every pixel location in a captured image is expressed as one or more triples.
- sensor systems are configured to output their data as streams of triples.
- Predicates of triples may, themselves, be subjects of other triples.
- "HasRedValue" is a predicate in the example above, but may also be a subject in a triple like {HasRedValue;
- Recognition agents can use such data triples to trigger, or suspend, their operation. For example, if incoming pixel triples are dark, then no optical recognition agents (e.g., barcode reader, watermark decoder, OCR engine) should be run. Expressing data in fine-grained fashion (e.g., down to the level of triples asserting particular pixel values) allows similarly fine-grained control of recognition agents.
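Gating agents on pixel triples can be sketched with an in-memory list of (subject, predicate, object) tuples and a wildcard pattern match, a simple stand-in for a real triple store. The subject names, the use of green values alone, and the darkness cut-off are assumptions for illustration.

```python
def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def optical_agents_may_run(triples, min_mean=40):
    """Suspend optical agents when the asserted green values are dark."""
    values = [o for _s, _p, o in match(triples, p="HasGreenValue")]
    return bool(values) and sum(values) / len(values) >= min_mean
```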
- a triple may provide a pointer to memory that contains a collection of pixels, such as a center 16x16 pixel block within an image frame. Still higher, a triple may provide a pointer to a memory location that stores a frame of imagery. Assertions about the imagery at this memory location can be made through a series of triples, e.g., detailing its size (e.g., 640 x 480 pixels), its color representation (e.g., YUV), its time of capture, the sensor with which it was captured, the geolocation at which it was captured, etc.
- a system processor can then take action on the imagery using the stored assertions. For example, it can respond to a query from a user - or from another system process - to locate a frame of imagery captured from a particular location, or captured at a particular time. The system can then perform an operation (e.g., object recognition) on the thus-identified frame of imagery. In another example, the system can discern how to compute the average luminance of a frame using, in part, knowledge of its form of color representation from stored RDF data. (In a YUV image, Y denotes luminance, so averaging Y across all pixels of a frame yields average luminance. In an RGB image, in contrast, luminance at each pixel is a weighted sum of R, G and B values; these weighted sums can then be averaged across the frame to obtain average luminance.)
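The luminance computation can be sketched as below. The RGB weights are the common Rec. 601 luma coefficients, which is an assumption since the text says only "a weighted sum"; the function name and the tuple-based pixel layout are likewise illustrative.

```python
def average_luminance(pixels, color_representation):
    """Average luminance of a frame given its stored color representation.

    pixels: list of 3-tuples; color_representation: "YUV" or "RGB".
    """
    if not pixels:
        raise ValueError("empty frame")
    if color_representation == "YUV":
        lumas = [y for y, _u, _v in pixels]          # Y already is luminance
    elif color_representation == "RGB":
        lumas = [0.299 * r + 0.587 * g + 0.114 * b   # Rec. 601 weights (assumed)
                 for r, g, b in pixels]
    else:
        raise ValueError(f"unsupported representation: {color_representation}")
    return sum(lumas) / len(lumas)
```

The `color_representation` argument plays the part of the stored assertion about the frame's format.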
- Software e.g., the ICP state machine in application 12/797,503, with the middleware arrangements detailed above
- context may indicate that both a barcode reading agent and a watermark decoding agent should be active.
- the barcode reader may prefer luminance data, but less preferably could use RGB data and derive luminance therefrom.
- the watermark decoder may require full color imagery, but is indifferent whether it is provided in RGB, YUV, or some other format.
- the system software can weigh the different needs and preferences of the different recognition agents, and configure the sensor system accordingly.
- middleware serves as a negotiating proxy between different agents, e.g., soliciting preference-scored lists of possible data types, scoring different combinations, and making a selection based on the resultant different scores.
- the software would direct the sensor system to output YUV data, since such data is directly suitable for the watermark decoder, and because the Y channel (luminance) data can be directly used by the barcode reader.
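The negotiation can be sketched as scoring each candidate format by the summed preferences of the active agents. The numeric scores and the rule that a missing format means "unusable" are assumptions chosen to mirror the barcode/watermark example.

```python
def negotiate_format(preferences):
    """Pick the sensor output format with the best combined preference.

    preferences: {agent: {format: score}}; a format an agent omits is
    treated as unusable by that agent.
    """
    formats = set.intersection(*(set(p) for p in preferences.values()))
    if not formats:
        return None
    # sorted() makes tie-breaking deterministic (alphabetical).
    return max(sorted(formats),
               key=lambda f: sum(p[f] for p in preferences.values()))
```

With a barcode reader scoring `{"Y": 1.0, "YUV": 1.0, "RGB": 0.5}` and a watermark decoder scoring `{"YUV": 1.0, "RGB": 1.0}`, the function selects YUV, matching the outcome described above.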
- smartphones may be regarded as having logical sensors.
- Logical sensors may both consume context data, and produce context data, and typically comprise software processes - either on the smartphone, or in the cloud. Examples run a wide gamut, from code that performs early-stage recognition (e.g., here's a blob of pixels that appear to be related; here's a circular shape), to full-on inference driven sensors that report the current activity of the user (e.g., Tony is walking, etc.).
- Such context data again can be stored as a simple graph, where the logical sensor makes one or more assertions about the subject (e.g., subject Smartphone_Owner_Tony;
- SPARQL can be used to access triples in the database, enabling detailed queries to be maintained.
- Logical sensors can naturally use - as inputs - data other than smartphone sensor data and its derivatives. Sensors in the environment, for example, can be sources of input. User calendar data or email data may also be used. (A sound sensed - or an object viewed - at a time that the user is scheduled to be in a meeting may be indicated as having occurred in the presence of the other meeting attendee(s).) Information obtained from social media networks (e.g., via a Facebook or LinkedIn web API) can similarly be provided as input to a logical sensor, and be reflected in an RDF output triple.
- the recognition agents detailed in application 12/797,503 can embody state-machines and associated algorithms to recognize specific content/object types, in support of particular applications.
- the applications represent goal-driven usage models. They interface with the detailed intuitive computing platform to perform specific tasks, by leveraging one or more recognition agents. E.g., decode a watermark, recognize a song; read a barcode.
- the intuitive computing platform detailed in application 12/797,503 uses sensors to generate context that can inform software agents - both local and in the cloud - about how to better complete their tasks.
- Jena is an open source Java framework for semantic web applications (originally developed by Hewlett-Packard) that provides an RDF API, reading/writing of RDF/XML, N3, and N-Triples, an OWL API, and a SPARQL query engine.
- One adaptation of Jena for mobile handsets is μ-Jena, from the Polytechnic of Milan. (Alternative implementations can use Androjena or Mobile RDF.)
- the intuitive computing platform detailed in application 12/797,503 manages traffic from applications to the recognition agents, and arbitrates resource contention of both logical and physical sensors.
- the blackboard data structure can be used to enable such inter-process communication, and maintain information about system status (e.g., battery state).
- An example of inter-process communication via the blackboard is a watermark decoder that senses inadequate luminance in captured imagery, and wants the smartphone torch to be turned-on. It may post a triple to the blackboard (instead of making an OS system call) requesting such action.
- One such triple may be:
- a torch control process may monitor the blackboard for such triples, and turn the torch on when same occur. Or, if battery power is low, such a process may wait until two or more recognition agents are waiting for the torch to be illuminated (or until other indicia of urgency is found), and only then turn it on.
- the watermark decoder may detect that the torch has been turned on by a SPARQL query that searches the blackboard for a triple indicating that the torch is powered. This query returns a response when the torch is illuminated, un-blocking the watermark decoding agent, and allowing it to run to completion.
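The torch example can be illustrated with a toy in-memory blackboard. Everything named here is an assumption made for illustration: the `Blackboard` class, the subject/predicate/object strings, and the simple wildcard `match` method, which stands in for a SPARQL query against a real triple store.

```python
class Blackboard:
    """A toy in-memory triple store standing in for the shared blackboard."""

    def __init__(self):
        self.triples = []

    def post(self, subject, predicate, obj):
        self.triples.append((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        pattern = (subject, predicate, obj)
        return [t for t in self.triples
                if all(p is None or p == v for p, v in zip(pattern, t))]


def torch_control_step(board, battery_low, min_requests_when_low=2):
    """One pass of a hypothetical torch control process.

    Turns the torch on when requested, but if battery power is low waits
    until at least min_requests_when_low distinct agents are asking.
    Returns True if the blackboard shows the torch as on.
    """
    requesters = {s for s, _p, _o in board.match(predicate="Requests",
                                                 obj="Torch_On")}
    needed = min_requests_when_low if battery_low else 1
    if len(requesters) >= needed and not board.match("Torch", "Has_State", "On"):
        board.post("Torch", "Has_State", "On")
    return bool(board.match("Torch", "Has_State", "On"))
```

A requesting agent would post its request triple and then poll (or block on) `board.match("Torch", "Has_State", "On")`, proceeding once a result is returned.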
- GeoNames are among the many sources for such data.
- Phone sensor data (e.g., GPS) can be applied to the GeoNames or DBpedia services, to obtain corresponding textual geo-labels.
- the context data needn't derive from the user's own smartphone.
- Low-level sensor information collected/donated by others using their mobile devices (e.g., in the same locale and time period) can be used as well (subject to appropriate privacy safeguards).
- data from nearby stationary sensors such as road cameras maintained by government entities, etc.
- the same locale is, itself, context/application dependent, and may comprise, e.g., within a threshold distance - such as 100 m, 1 km or 10 km; within the same geographic entity - such as town or city; etc.
- time-proximity can be threshold-bounded, such as data collected within the past 10 seconds, 10 minutes, hour, etc.
- Such information can be directly integrated into the local blackboard so that device agents can operate on the information.
- data can include audio, samples of wireless signals available in the area to help identify location, etc., etc.
- Web 2.0 notions of data and resources are used with tangible objects and/or related keyvector data, and associated information.
- Linked data refers to arrangements promoted by Sir Tim Berners-Lee for exposing, sharing and connecting data via de-referenceable URIs on the web. (See, e.g., T.B. Lee, Linked Data,
- URIs are used to identify tangible objects and associated data objects.
- HTTP URIs are used so that these objects can be referred to and looked up ("de-referenced") by people and user agents.
- useful information e.g., structured metadata
- This useful information desirably includes links to other, related URIs - to improve discovery of other related information and tangible objects.
- RDF Resource Description Framework
- the subject of the triple is a URI identifying the described resource.
- the predicate indicates what kind of relation exists between the subject and object.
- the predicate is typically a URI as well - drawn from a standardized vocabulary relating to a particular domain.
- the object can be a literal value (e.g., a name or adjective), or it can be the URI of another resource that is somehow related to the subject.
- The Web Ontology Language (OWL) is one, and uses a semantic model that provides compatibility with the RDF schema.
- SPARQL is a query language for use with RDF expressions - allowing a query to consist of triple patterns, together with conjunctions, disjunctions, and optional patterns.
- items of data captured and produced by mobile devices are each assigned a unique and persistent identifier.
- These data include elemental key vectors, segmented shapes, recognized objects, information obtained about these items, etc.
- Each of these data is enrolled in a cloud-based registry system, which also supports related routing functions. (The data objects, themselves, may also be pushed to the cloud for long term storage.)
- Related assertions concerning the data are provided to the registry from the mobile device.
- each data object known to the local device is instantiated via data in the cloud.
- a user may sweep a camera, capturing imagery. All objects (and related data) gathered, processed and/or identified through such action are assigned identifiers, and persist in the cloud. A day or a year later, another user can make assertions against such objects (e.g., that a tree is a white oak, etc.). Even a quick camera glance at a particular place, at a particular time, is memorialized indefinitely in the cloud.
- Such content in this elemental cloud-based form, can be an organizing construct for collaboration.
- Naming of the data can be assigned by the cloud-based system. (The cloud based system can report the assigned names back to the originating mobile device.) Information identifying the data as known to the mobile device (e.g., clump ID, or UID, noted above) can be provided to the cloud-based registry, and can be memorialized in the cloud as another assertion about the data.
- a partial view of data maintained by a cloud-based registry can include:
- ImageData#94D6BDFA623 was_Provided_From_Device iPhone 3Gs DD69886
- ImageData#94D6BDFA623 was_Captured_at_Time November 30, 2009, 8:32:16 pm
- ImageData#94D6BDFA623 was_Captured_at_Place 45.51N 122.67W
- ImageData#94D6BDFA623 was_Produced_by_Algorithm Canny
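Registry assertions of this kind lend themselves to queries built from triple patterns. The sketch below is a simplified stand-in for a SPARQL basic graph pattern (a conjunction of patterns), written in plain Python. Terms beginning with "?" are variables. The second data item and its "Sobel" algorithm value are hypothetical additions used only to show filtering.

```python
REGISTRY = [
    ("ImageData#94D6BDFA623", "was_Captured_at_Time", "November 30, 2009, 8:32:16 pm"),
    ("ImageData#94D6BDFA623", "was_Produced_by_Algorithm", "Canny"),
    # Hypothetical second item, not from the source text.
    ("ImageData#HYPOTHETICAL1", "was_Produced_by_Algorithm", "Sobel"),
]


def match_pattern(triples, pattern, binding):
    """Yield extended variable bindings for each triple matching the pattern."""
    for triple in triples:
        new = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check consistency with an earlier binding.
                if new.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield new


def query(triples, patterns):
    """Return all bindings that satisfy every pattern (a conjunction)."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings
                    for b2 in match_pattern(triples, pattern, b)]
    return bindings


results = query(REGISTRY, [
    ("?item", "was_Produced_by_Algorithm", "Canny"),
    ("?item", "was_Captured_at_Time", "?time"),
])
```

Here `results` holds a single binding, pairing the Canny-derived item with its capture time. A production system would more likely issue an equivalent SPARQL query against an RDF store.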
- the mobile device provides data allowing the cloud-based registry to instantiate plural software objects (e.g., RDF triples) for each item of data the mobile device processes, and/or for each physical object or feature found in its camera's field of view.
- I am Canny data I am based on imagery captured at a certain place and time
- these attributes can be linked with data posted by other devices - allowing for the acquisition and discovery of new information not discernible by a user's device from available image data and context alone.
- John's phone may recognize a shape as a building, but not be able to discern its street address, or learn its tenants. Jane, however, may work in the building. Due to her particular context and history, information that her phone earlier provided to the registry in connection with building-related image data may be richer in information about the building, including information about its address and some tenants. By similarities in geolocation information and shape information, the building about which Jane's phone provided information can be identified as likely the same building about which John's phone provided information.
- Locations (e.g., determined by place, and optionally also by time) that have a rich set of assertions associated with them provide for new discovery experiences.
- a mobile device can provide a simple assertion, such as GPS location and current time, as an entry point from which to start a search or discovery experience within the linked data, or other data repository.
- access or navigation of assertions in the cloud can be influenced by sensors on the mobile device.
- John may be permitted to link to Jane's assertions regarding the building only if he is within a specific proximity of the building as determined by GPS or other sensors (e.g., 10m, 30m, 100m, 300m, etc.). This may be further limited to the case where John either needs to be stationary, or traveling at a walking pace as determined by GPS, accelerometers/gyroscopes or other sensors (e.g., less than 100 feet, or 300 feet, per minute).
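A sensor-based access restriction of this sort might be sketched as below. The haversine great-circle formula and the mean Earth radius are standard; the function names, the default thresholds (100 m and 300 feet per minute, both drawn from the example values above), and the assumption that speed has already been estimated from GPS or inertial sensors are illustrative choices.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points (degrees)."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def may_access(user_pos, subject_pos, speed_ft_per_min,
               max_distance_m=100.0, max_speed_ft_per_min=300.0):
    """Allow access only if the user is near the subject and moving slowly.

    user_pos, subject_pos: (latitude, longitude) tuples in degrees.
    speed_ft_per_min: user speed as estimated by device sensors.
    """
    near = haversine_m(*user_pos, *subject_pos) <= max_distance_m
    slow = speed_ft_per_min < max_speed_ft_per_min
    return near and slow
```

A stationary user has a speed of zero and so passes the second test; a user in a passing car fails it even while briefly within range, which is the drive-by case the restriction is meant to exclude.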
- Such restrictions based on data from sensors in the mobile device can reduce unwanted or less relevant assertions (e.g., spam, such as advertising), and provide some security against remote or drive-by (or fly-by) mining of data. (Various arrangements can be employed to combat spoofing of GPS or other sensor data.)
- assertions stored in the cloud may be accessed (or new assertions about subjects may be made) only when the two involved parties share some trait, such as proximity in geolocation, time, social network linkage, etc. (The latter can be demonstrated by reference to a social network data store, such as Facebook or LinkedIn, showing that John is socially linked to Jane, e.g., as friends.)
- Such use of geolocation and time parallels social conventions; i.e., when large groups of people gather, the spontaneous interaction that occurs can be rewarding, as there is a high likelihood that the members of the group have a common interest, trait, etc.
- Ability to access, and post, assertions, and the enablement of new discovery experiences based on the presence of others follows this model.
- Location is a frequent clue that sets of image data are related. Others can be used as well.
- elephants e.g., in a preserve
- facial features including scars, wrinkles and tusks
- the researcher's smartphone may submit facial feature vectors for an elephant to a university database, which exists to associate facial vectors with an elephant's name.
- a greater wealth of information may be revealed, e.g., dates and locations of prior sightings, the names of other researchers who have viewed the elephant, etc. Again, once correspondence between data sets is discerned, this fact can be memorialized by the addition of further assertions to the registry.
- AKT Advanced Knowledge Technologies
- Thing the class “Thing”
- "Tangible-Thing" includes everything from software to sub-atomic particles, both real and imaginary (e.g., Mickey Mouse's car).
- "Tangible-Thing" has subclasses including "Location," "Geographical-Region," "Person," "Transportation-Device," and "Information-Bearing-Object." (This vocabulary can be extended to provide identification for objects expected to be encountered in connection with the present technology.)
Mixed-Domain Displays
- a smartphone presents a display that includes both natural imagery captured by the camera, as well as transform-domain information (e.g., in the spatial-frequency, or Fourier, domain) based on camera-captured imagery.
- Embodiments of the present technology reveal this transform domain-based information to the viewer.
- Fig. 11 shows an exemplary spatial-frequency domain view of a reference signal 210 that is added to printed host imagery, with the real components represented by the horizontal axis, and the imaginary components represented by the vertical axis (the so-called "u,v" plane).
- the illustrated reference signal comprises pentagonal constellations 212 of spatial domain impulses at frequencies (i.e., distances from the origin) that are too high for humans to perceive, but that are detectable in data produced by the image sensor in a smartphone camera.
- the corresponding spatial-frequency domain view of the host imagery is not shown, but would typically comprise signal scattered throughout the u,v plane, but mostly concentrated along the horizontal and vertical axes.
- the markers 212 are centered on a circle 215.
- the limit of human vision is shown by a smaller circle 217.
- Features composed of spatial-frequency components outside of circle 217 (e.g., markers 212) are too high in frequency to be discernible to human viewers.
- If the markers 212 were lower in spatial frequency, they would correspond to a pixel pattern that is akin to a fine herringbone weave. At higher frequencies, however, the eye can't distinguish a weave pattern. Rather, the weave dissolves into apparent flatness.
- While four pentagonal marker constellations 212 are shown, of course a lesser or greater number can also be used. Similarly, the markers needn't be pentagonal in form.
- When a smartphone camera detects reference pattern 210, it can thereby discern the relative distance between the camera and the printed object, and any rotation and tilt of the camera relative to the object. For example, if the camera is moved closer to the object, the enlarged image components are sensed as having lower component spatial frequencies. Thus, the pentagonal markers move closer to the origin. If the camera is rotated (relative to the orientation at which the reference signal was originally encoded in the host imagery), the pentagonal markers appear similarly rotated. If the camera is tilted - so that part of the printed imagery is closer to the sensor than other parts of the printed imagery - the pattern of pentagons is skewed. (No longer do their centers 214 fall on a circle 215 centered about the u,v origin; instead, they fall on an ellipse.)
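The geometric relationships just described can be illustrated with a simplified estimator. It assumes that the marker centers have already been located in the u,v plane and matched, in order, to their known reference positions; a practical detector would need to establish that correspondence itself. The function name and the "radial spread" tilt indicator are hypothetical.

```python
import math


def pose_from_markers(reference, detected):
    """Estimate magnification, rotation and a tilt indicator.

    reference, detected: equal-length lists of (u, v) marker centers, with
    detected[i] assumed to correspond to reference[i].
    Returns (magnification, rotation_degrees, radial_spread).
    """
    ref_r = [math.hypot(u, v) for u, v in reference]
    det_r = [math.hypot(u, v) for u, v in detected]
    mean_ref = sum(ref_r) / len(ref_r)
    mean_det = sum(det_r) / len(det_r)
    # A closer camera enlarges the image, lowering spatial frequencies, so
    # markers nearer the origin imply greater magnification.
    magnification = mean_ref / mean_det
    # Average the per-marker angular offsets on the unit circle.
    sin_sum = cos_sum = 0.0
    for (ru, rv), (du, dv) in zip(reference, detected):
        d = math.atan2(dv, du) - math.atan2(rv, ru)
        sin_sum += math.sin(d)
        cos_sum += math.cos(d)
    rotation_degrees = math.degrees(math.atan2(sin_sum, cos_sum))
    # Centers on a circle have equal radii; unequal radii suggest an ellipse,
    # and hence tilt.
    radial_spread = (max(det_r) - min(det_r)) / mean_det
    return magnification, rotation_degrees, radial_spread
```

A radial spread near zero is consistent with a plan view; larger values indicate that the marker centers no longer lie on a circle.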
- Fig. 12 shows an exemplary smartphone display 220.
- the smartphone is imaging part of a cereal box - the artwork 222 of which occupies most of the screen.
- Superimposed on the screen is a half-plane depiction of the detected reference signal, including the top two pentagonal reference markers.
- the illustrated display also includes two fixed target regions 224 - outlined in circular dashed lines.
- the transform domain overlay is presented at a visibility (strength) that varies with strength of the detected reference signal. If no reference signal is detected (e.g., by a detection metric output by a pattern detector), then no overlay is presented. With stronger signals, the overlaid marker signals are presented with greater contrast - compared to the background image 222. In some embodiments, the markers are presented with coloration that varies in chrominance or luminosity, depending on strength of the detected reference signal.
- the spatial-frequency representation of the captured imagery is thresholded, so that any spatial-frequency component below a threshold value is not displayed. This prevents the display from being degraded by a Fourier domain representation of the captured cereal box artwork 222. Instead, the only overlaid signal corresponds to the marker signals.
- the spatial-frequency data may be high-pass spectrally-filtered, so only image components that are above a threshold spatial frequency (e.g., the spatial frequency indicated by circle 217 in Fig. 11) are shown.
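The two filtering options above (magnitude thresholding and high-pass spectral filtering) can be combined in one small routine. The sketch assumes the spatial-frequency data is already available as a mapping from (u, v) coordinates to magnitudes; the function and parameter names are hypothetical.

```python
import math


def overlay_components(spectrum, magnitude_threshold, min_frequency=0.0):
    """Select the spatial-frequency components to draw in the overlay.

    spectrum: {(u, v): magnitude} for the analyzed image region.
    magnitude_threshold: components weaker than this are not displayed.
    min_frequency: optional high-pass cutoff, as a radius from the u,v
        origin (e.g., the radius of the limit-of-vision circle).
    """
    return {
        (u, v): mag
        for (u, v), mag in spectrum.items()
        if mag >= magnitude_threshold and math.hypot(u, v) >= min_frequency
    }
```

Leaving `min_frequency` at zero gives thresholding alone; setting it to the radius of circle 217 additionally suppresses strong low-frequency content from the host artwork.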
- the circular target regions 224 are not essential. Other visual guides can be presented, or they can be omitted entirely. In the latter case, the user may be instructed to position the phone so that the markers 224 are even (i.e., horizontally-across). If the transformed data is spectrally-filtered (as described in the preceding paragraph), then the user may be instructed to position the phone towards- or away- from the subject until the markers just appear.
- the five points of the markers 212 look a bit like little pixie figures - a head, two hands and two feet, especially when rendered in color. The user can thus be instructed to "look for the pixie people." Their appearance can be made particularly noticeable by giving the five component elements of each marker different colors, and changing the colors over time - yielding an engaging, shimmering effect.
- the spatial-frequency information is shown in a rectangular box 226.
- this box also serves to define a rectangular sub-region of pixels within the artwork 222, on which the transform domain analysis is performed. That is, instead of converting the entire frame of imagery into the Fourier domain, only those pixels within the box 226 are so-converted. This reduces the burden on the phone processor.
- the box 226 may be regarded as the fovea region - the sub-region of pixels on which the processor focuses its attention as it helps the user optimally position the phone.
- the luminance of pixels in region 226 can be slightly increased or decreased - to further highlight the region to the user.
- Digital watermarks are normally imperceptible. This is desirable because they can be encoded into fine artwork and other graphics without introducing any visible change.
- this advantage has an associated disadvantage: potential users of the encoded data are uncertain whether any watermarked data is present.
- this disadvantage has sometimes been redressed by use of a small visual logo, printed at a corner of the encoded visual artwork, to indicate that the artwork is watermark-encoded.
- the presence of digitally watermarked information is visually cued by making the visual watermark pattern subtly visible.
- If the spatial-frequency elements comprising a watermark are low enough in frequency, they produce a pattern akin to a weave (e.g., a herringbone weave, in the case of regular pentagonal markers). In some applications, such a woven background pattern is not
- Smartphone-based systems can be used to capture imagery of such distinctive patterns, decode the watermarked information, and take corresponding action(s).
- such a watermark includes a first set of spatial frequency components that are within the range of human vision (i.e., inside circle 217 of Fig. 11), and a second set of spatial frequency components that are beyond the range of human vision (i.e., outside circle 217 of Fig. 11).
- the former can include components that are pseudo-randomly distributed in the u,v plane to define a corresponding pattern in the pixel domain that is akin to the surface appearance of handmade paper - which commonly includes a random pattern based on the distribution of pulp fibers in such paper.
- This first set of spatial frequency components can be used repeatedly across all types of documents - producing a characteristic pattern that users can eventually come to recognize as clueing the presence of encoded information.
- This consistent pattern can be used by the smartphone watermark detector (1) to quickly identify the presence of a watermark, and optionally (2) to determine translation, scale and/or rotation of the captured imagery - relative to its originally encoded state.
- the second set of spatial frequency components, in this particular embodiment, conveys some or all of the watermark payload information. This information varies from document to document.
- the clueing pattern may even take the form of a distinctive script or typeface - used to indicate the presence of hidden information.
- a font may include serifs that include a distinctive extension feature - such as a curl or twist or knot on the right side.
- printing that includes encoded watermark data may include a distinctive border.
- One is a framing rectangle defined by three fine lines.
- Another is a set of two or four similar corner markers - such as the one shown in Fig. 13.
- such a border- or corner-marking is not present in the original physical medium, but is rendered as an on-screen graphic overlay that is triggered by smartphone detection of a signal (e.g., the Fig. 11 or 14 signal) encoded in the medium.
- the lines of such overlaid marking are rendered in a somewhat blurred fashion if the smartphone is at a sub-optimal viewing pose, and are increasingly rendered in-focus as the user moves the smartphone to a more optimum viewing pose.
- If the phone is positioned optimally (e.g., with plan view of the watermarked subject, at a distance of six inches), then the lines are presented in crisp, sharp form.
- (Software in the phone translates information about the optimality of the viewing pose into a visual paradigm that is somewhat familiar to certain users - the dependence of focus on distance.)
- There may be three conceptual "layers" through which information is presented to a user. These may be termed the visual, flag, and link layers.
- the visual layer is a human-perceptible clue that there is digital watermark information present. As just-noted, these can take different forms. One is a logo, typeface, border, or other printed indicia that indicates the presence of encoded information. Another is a visible artifact (e.g., weave-like patterning) that is introduced in printed content as part of the watermarking process.
- the flag layer is an indicia (typically transitory) that is presented to the user as a consequence of some initial digital image processing.
- One example is the "pixie people" referenced above.
- Another is the "proto-baubles" discussed in application 12/797,503. Others are discussed in application 12/774,512.
- the flag layer serves as a first glimmer of electronic recognition that there is, in fact, a watermark present. (The flag layer may optionally serve as an aid to guide the user in positioning the smartphone camera for an optimized watermark read.)
- the link layer comprises the information presented to the user after the watermark is decoded. This commonly involves indexing a resolver database with a decoded watermark payload (e.g., a large number) to learn what behavior is associated with that watermark, and then initiating that behavior.
- devices that receive watermark-encoded media signals can act to decode the watermark data, and relay it onward by another data channel.
- the audio and/or video of the programming may be encoded with digital watermark information, e.g., that identifies the program.
- a consumer may be using a smartphone or tablet computer while watching the video programming on the television, and it may be advantageous for the
- a first device (e.g., a television or set-top box) decodes watermark data from a content stream. It then relays this data - by a different channel - to a second device (e.g., a smartphone).
- a decoder in a television receives programming and decodes, from the audio component, an audio watermark. It then re-transmits the decoded watermark data to nearby smartphones via Bluetooth wireless technology. These smartphones thus receive the watermark data (using their built-in Bluetooth receivers) free of ambient room noise interference.
- Another wireless data channel by which decoded watermark information can be relayed is the NFC radio protocol (which presently operates at 13.56 MHz).
- NFC systems typically include a receiver (e.g., a smartphone) that acts to power a nearby passive NFC chip/emitter by magnetic coupling, and then receive a resulting weak RF response emitted by the chip.
- the same smartphone NFC circuitry can receive signals that are transmitted by a powered 13 MHz transmitter - with which a television, set-top box, or other device may be equipped.
- the lowest standard NFC data rate, 106 kbits/second, is more than adequate for watermark-relating service (and is sufficiently broadband to allow highly redundant error-correction coding of the relayed data - if desired).
- Still another data channel for relaying decoded watermark data between devices is WiFi, e.g., according to the 802.11b, 802.11g, or 802.11n standards.
- IR communications such as the sort by which televisions and remote controls commonly communicate.
- the television or set-top box, etc.
- the television is typically the emitter of the IR radiation, rather than the receiver.
- IR communications systems commonly use a wavelength of 940 nm.
- the data is communicated by modulating a carrier signal, e.g., 36 kHz, in the case of the popular RC-5 protocol.
- each button on a remote control corresponds to a 14-bit code transmission, with which the carrier signal is modulated when the button is pressed.
- Watermark data can be conveyed in similar fashion, e.g., by using groups of 14-bit codes (thereby allowing existing decoding hardware to be adapted for such use).
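Conveying a watermark payload as a group of 14-bit codes amounts to splitting the payload into 14-bit chunks and reassembling them at the receiver. The sketch below shows only that arithmetic. It treats all 14 bits of each code as freely usable for data, which is an assumption: RC-5 frames reserve some of their bits for framing and addressing, so an adaptation of existing decoding hardware may have fewer data bits available per code. Function names are hypothetical.

```python
CODE_BITS = 14
CODE_MASK = (1 << CODE_BITS) - 1  # 0x3FFF


def payload_to_codes(payload, payload_bits):
    """Split an integer payload into 14-bit codes, most significant first."""
    if payload < 0 or payload >> payload_bits:
        raise ValueError("payload does not fit in payload_bits")
    n_codes = -(-payload_bits // CODE_BITS)  # ceiling division
    return [(payload >> (CODE_BITS * i)) & CODE_MASK
            for i in reversed(range(n_codes))]


def codes_to_payload(codes):
    """Reassemble the payload from codes produced by payload_to_codes."""
    value = 0
    for code in codes:
        value = (value << CODE_BITS) | (code & CODE_MASK)
    return value
```

A 40-bit payload, for instance, occupies three codes, with the leading code carrying the top 12 bits padded with zeros.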
- the television (or set-top box) advertises - to other devices - the availability of decoded watermark data using the Bonjour service.
- Bonjour is an implementation of Zeroconf - a service discovery protocol. Bonjour locates devices on a local network, and identifies services that each offers, using multicast Domain Name System service records. This software is built into the Apple Mac OS X operating system, and is also included in the Apple "Remote" application for the iPhone, where it is used to establish connections to iTunes libraries via WiFi. Bonjour is also used by TiVo to locate digital video recorders and shared media libraries. Using Bonjour, the first device advises other devices on the network of the availability of the watermark data, and provides parameters allowing the other devices to obtain such data.
- a first device e.g., a television or set-top box
- the first device may send the fingerprint data to a database system.
- the database system tries to find a close match among stored reference data, to thereby access metadata associated with the fingerprint-identified content.
- This metadata can then be sent back to the originating first device.
- This first device relays this metadata on to the second device via the data channel.
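The database step, finding a close match among stored reference fingerprints, can be illustrated with a nearest-neighbor lookup. The text does not say how fingerprints are represented or compared; the sketch assumes fixed-length binary fingerprints held as integers and compared by Hamming distance, with a hypothetical `max_distance` cutoff for deciding that nothing matches.

```python
def hamming(a, b):
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")


def lookup_metadata(fingerprint, reference_db, max_distance=4):
    """Return metadata for the closest stored fingerprint, or None.

    reference_db: {reference_fingerprint: metadata}.
    max_distance: largest Hamming distance still accepted as a match.
    """
    best = min(reference_db,
               key=lambda ref: hamming(fingerprint, ref),
               default=None)
    if best is None or hamming(fingerprint, best) > max_distance:
        return None
    return reference_db[best]
```

The returned metadata is what the first device would relay to the second device over the data channel. A large database would replace the linear scan with an indexed search.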
- a smartphone is used in connection with a personal shopping service.
- a service-oriented retail establishment such as the Apple stores found in certain shopping districts.
- a consumer browsing in such a store may use a smartphone to express curiosity about a product (e.g., a MacBook Pro computer). This may involve capturing an image of the MacBook Pro, or otherwise sensing identification information (e.g., from an RFID or NFC chip on the device, or from a barcode or watermark on associated signage).
- the smartphone sends a signal to a service indicating the consumer's interest.
- the phone may wirelessly (e.g., by WiFi or Bluetooth) send the image, or the sensed identification information, to a back office store computer that is running shopper service application software. With the transmitted product information, the phone also sends to the back office computer an identifier of the consumer.
- This consumer identifier can be a name, telephone number, Apple customer number (e.g., iTunes login identifier), or Facebook (or other social network) login identifier, etc.
- the shopper service application software retrieves profile information, if any, associated with that shopper.
- This profile information can include the person's history with Apple - including purchasing history, a list of registered Apple software, and information about other shopper-Apple encounters.
- the shopper service application software enters the consumer in a queue for personal service. If there are several customers ahead in the queue, the software predicts the wait time the shopper will likely experience before service, and sends this information to the consumer (e.g., by a text message to the user's phone).
- the store may provide the customer (e.g., the customer's smartphone or other computer device) with engaging content to help pass the time. For example, the store may grant the shopper unlimited listening/viewing rights to songs, video and other media available from the iTunes media store. Free downloads of a limited number of content items may be granted. Such privileges may continue while the shopper remains in or near the store.
- the software sends the shopper an alert, including the assistant's name, and a picture of the assistant.
- a distilled version of the shopper's profile information - giving highlights in abbreviated textual form - was provided to the shopping assistant (e.g., to the assistant's smartphone), to give background information that may help the assistant provide better service.
- the assistant then approaches the customer, and greets him or her by name - ready to answer any questions about the MacBook Pro.
- the queue for personal service may not be strictly first-come, first-served. Instead, shoppers with a history of Apple purchases may be given priority - and bumped ahead of others in the queue, in accordance with the value of their past Apple purchases.
- the shopper service software applies some safeguards to assure that new customers are not always bumped down in priority each time an existing Apple customer enters the store.
- the queue may be managed so that a limited number of priority customers (e.g., two) is granted placement in the queue ahead of a new customer. After two priority customers are bumped ahead of the new customer, the next priority customer is inserted in the queue after the new customer (but ahead of other new customers who have not yet been twice-bumped).
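The bump-limited queue described above might be realized as follows. The `Customer` record and `enqueue` function are hypothetical names. For simplicity the sketch treats priority as a yes/no flag and keeps priority customers in arrival order among themselves, rather than ranking them by the value of past purchases.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    priority: bool = False
    bumps: int = 0  # times a priority customer has been placed ahead


def enqueue(queue, customer, max_bumps=2):
    """Insert a customer into the service queue (front of list is served first).

    New customers join the back. A priority customer moves forward past
    non-priority customers, but never past one who has already been bumped
    max_bumps times, and never past an earlier priority customer.
    """
    position = len(queue)
    if customer.priority:
        while position > 0:
            ahead = queue[position - 1]
            if ahead.priority or ahead.bumps >= max_bumps:
                break
            position -= 1
        # Everyone now behind the insertion point has been bumped once more.
        for bumped in queue[position:]:
            bumped.bumps += 1
    queue.insert(position, customer)
```

With the default limit of two, a third priority arrival lands directly behind a twice-bumped new customer but still ahead of later new customers who have not yet been bumped twice, matching the behavior described in the text.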
- Queue management can depend on factors in addition to (or other than) past transaction history with Apple. Mining of public and commercial databases allows compilation of useful demographic profile information about most shoppers. If the shopper service computer determines that a customer who just entered the store appears to be the DMV registrant of a late-model Lexus automobile, that customer may be given a priority position in the queue ahead of an earlier customer who, DMV records indicate, drives an old Yugo. (Or, the store may adopt the opposite policy.)
- the foregoing functionality may be implemented via an application program downloaded to the customer's smartphone, or as a web service to which the customer is directed. Or, much of the functionality may be implemented by text (picture) messaging arrangements - with the store optionally providing links that invoke other standard smartphone software (e.g., a web browser or iTunes software).
- a smartphone is used to quickly identify accessories that are useful with certain electronic devices.
- An illustrative scenario is a shopper who enters an electronics retailer, such as Fry's, looking for a protective case for her HTC Thunderbolt smartphone.
- the store has a wall of smartphone cases.
- the shopper would scrutinize each different package - looking for an indication of the smartphone(s) for which that case is suited. This may require removing many of the cases from the wall and turning the packages over - reading fine print. Frustration quickly ensues.
- the retailer makes available a software tool, which may be downloaded to the user's smartphone (or other device). Or the tool may be offered as a web service.
- the user is invited to indicate what they are looking for, such as by a dropdown menu that may include Accessories (cases, chargers, etc.).
- a further dialog inquires about the product for which accessories are sought.
- the user enters (or selects from a dropdown menu) "HTC Thunderbolt.” (The artisan will recognize that this information may be gleaned in many other ways - the particular implementation of this data collection phase can be adapted to the particular store context.)
- Once the store software has collected data identifying the customer's mission, as identifying accessories for an HTC Thunderbolt phone, it then searches a database to identify all products in its inventory that are compatible with such device. This may be done by text-searching datasheets for store products, to identify those that have related keywords. Or, the vendors of accessories may make such compatibility information available to the store in a standardized form - such as by a listing of UPC codes, or other such identifiers for each product with which an accessory is compatible.
- the store downloads a list of identifiers of compatible products to the shopper's device.
- the software advises the shopper to physically scan the display of protective smartphone cases (which is found mid-way down aisle 8B, if the shopper is not already there), and informs the shopper that the phone will display a green light (or output another confirmatory signal) for those accessories compatible with the HTC Thunderbolt.
- the scanning mechanism can be of various sorts - again depending on the context.
- the product packages may each be equipped with an RFID or NFC chip, which serves to electronically identify the product to a smartphone when the phone is brought into close proximity. (NFC readers will soon be standard features of most smartphones.) Or, image recognition techniques can be used. (Although numerous, there is a limited number of protective cases on the wall, each with different packaging.
- the store computer can download visual fingerprint data, such as SIFT or SURF data, or other characteristic information by which the smartphone can visually identify a particular package from this limited universe, by analysis of streaming camera data.)
- the smartphone applies imagery captured by its camera to a watermark detector, which extracts plural-bit data encoded into the artwork of the product packaging. Or barcode reading can be used.
- the phone harvests identifiers from nearby products
- the previously-downloaded list of identifiers for compatible devices is checked for matches. If the identifier of a scanned product is found among the downloaded list of compatible products, a suitable indication is output to the user.
- the smartphone acts in a manner akin to a Geiger counter.
- As the customer moves the phone along the displayed protective cases, it issues a signal to draw the customer's attention to particular items of interest (i.e., those cases adapted to protect the HTC Thunderbolt phone).
- the user can then focus her inquiry on other considerations (e.g., price and aesthetics), rather than puzzling over the basic question of which cases are suitable candidates for purchase.
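The on-phone matching step can be illustrated with a short sketch. The function name `scan_feedback` and the identifier strings are hypothetical; real identifiers might be UPC codes, watermark payloads or NFC values, depending on the scanning mechanism.

```python
def scan_feedback(scanned_ids, compatible_ids):
    """Pair each scanned product identifier with True if it is on the downloaded list."""
    compatible = set(compatible_ids)  # set lookup keeps each check cheap
    return [(pid, pid in compatible) for pid in scanned_ids]


# Example: the phone would show its confirmatory signal for the second item only.
results = scan_feedback(["case-001", "case-002"], ["case-002", "case-117"])
```

The same check could equally run on the store computer, with the phone sending each sensed identifier instead of holding the list locally.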
- the store needn't download a list of compatible identifiers to the smartphone.
- the smartphone can send sensed identifiers to the store computer, which can then match such identifiers against a list of compatible products.
- a list of compatible products needn't be generated in advance.
- the store computer can receive scanned identifiers from the customer's smartphone and then determine, on-the-fly, if the scanned product is compatible (e.g., by then-recalling and checking data associated with that product for an indication that the HTC Thunderbolt phone is one of the products with which it is compatible).
- the detection of product identifiers from sensed packaging needn't be performed by the phone.
- camera imagery may be streamed from the phone to the store computer, where it can be processed (e.g., by pattern-, watermark- or barcode-recognition techniques) to obtain an associated identifier.
- shelf tags or other markings can also serve as the basis for product identification.
- a shelf tag may bear the store's proprietary SKU number.
- the reference data by which compatibility is indicated (e.g., a product's datasheet) may identify products by UPC code rather than by SKU number.
- the system may need to look-up the UPC code from the sensed SKU number in determining compatibility.
- Computational photography refers to image processing techniques that algorithmically alter captured image data to yield images of enhanced form.
- One example is image deblurring.
- Image blur is a particular problem with smartphone cameras, due to the necessarily small size of the camera aperture, which limits the amount of light delivered to the sensor, thus requiring lengthened exposure times.
- Lengthened exposure times require the user to hold the camera steady for longer periods - increasing the risk of motion blur.
- the light weight of such phones also increases the risk of motion blur - they lack the inertial stability that heavier cameras, such as SLRs, offer.
- Blur can be introduced by phenomena other than motion.
- lens optics typically focus on subjects within a particular focal plane and depth of field. Objects that are outside the focused field are blurred (so-called "defocus blur”).
- Blur functions can be characterized mathematically and, once characterized, can be counteracted by application of an inverse function.
- blur functions cannot usually be measured directly; rather, they typically must be estimated and iteratively refined.
- Recovering the blur function from a blurred image is an uncertain endeavor, since the blurred image alone typically provides only a partial constraint.
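For orientation, the usual textbook formulation of this problem is given here; it is not taken from the specification, and the symbols are assumptions of notation. B is the captured (blurred) image, I the unknown sharp image, K the blur function and N sensor noise:

```latex
B = K \ast I + N
\qquad\Longleftrightarrow\qquad
\hat{B}(u,v) = \hat{K}(u,v)\,\hat{I}(u,v) + \hat{N}(u,v)

% If K were known and noise negligible, an inverse filter would give
\hat{I}(u,v) \approx \frac{\hat{B}(u,v)}{\hat{K}(u,v)}
```

In blind deconvolution both K and I are unknown, so the single observation B admits many (K, I) pairs. This is why additional constraints are needed to choose among them.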
- To help disambiguate between alternate original images, and better estimate the associated blur function (generally a blur "kernel")
- known reference information is introduced into scenes that may be imaged by cameras (e.g., smartphones), to provide image priors that allow image enhancement.
- the prior information can be used in the spatial-frequency domain (where it appears as pentagonal constellations of impulse functions), or in the pixel domain (where it appears as a characteristic weave pattern - too high in frequency to be discerned by human viewers but detectable from camera-captured imagery).
- marker signals may be tailored in frequency to optimize their utility with respect to blur compensation. They may also be tailored in form. For example, instead of markers composed of five impulse functions - as in Fig. 11, a blur-redressing marker signal may comprise a lesser number of elements, such as one or two. Similarly, instead of impulse function components, such markers may be comprised of elongated segments, arranged horizontally, vertically, and/or at intermediate angles - to help improve robustness in the presence of motion blur. An example is the pattern 302 shown in Fig. 14.
- a watermark signal can include various sets of signal elements.
- One set can comprise a set of registration signals. These are encoded relatively strongly, and enable the translation, scale and rotation of the watermarked imagery to be determined. Once these parameters are known, a thus-informed watermark detector can then recover a second set of elements, which are more numerous (and are typically more weakly encoded), that convey most (or all) of the watermark payload data.
- the marker signals of Figs. 11 and 14 can be used in a manner like the registration signals of patent 6,590,996, to determine affine parameters about the captured imagery. And they also can serve the dual purpose of providing image priors, for blur correction.
- blind deconvolution is applied to a blurred image, using the subliminal markers provided by patterns 210/302 as image priors. Iterative correction is applied to the image to reduce the blur effect - seeking to restore the image to a sharper form. (Assessing the intensity of the blur-corrected Fourier domain marker signals is one metric that can be used.)
- a watermark reading operation is then performed on the blur-compensated imagery - allowing recovery of the plural-bit payload information.
- a virtuous cycle results - the marker signals are useful in deblurring the image, and the resulting deblurred image yields better decoded-watermark results.
- the watermark payload can include various bits that convey statistics about the original imagery.
- image statistics have been used in the prior art as image priors to aid in removing blur.
- a problem with the prior art is obtaining reliable image statistics - when only a blurred image is available.
- a digital watermark can provide a channel by which such information can be reliably conveyed, from the image to the deblurring system.
- the marker signals 210/302 can themselves convey information.
- the phases of the component marker elements can be selectively inverted to convey a limited number of bits.
- One image statistic that can be conveyed in this manner is average luminance of the original artwork. This statistic offers a constraint that is useful in assessing the accuracy of different iterated blur solutions.
- the cereal box artwork depicted in Fig. 12 may comprise an array of 6 x 4 watermark tiles, allowing statistics for 24 different spatial regions to be conveyed.
- a watermark signal may be added that escapes attention because of its chrominance.
- the human eye, for example, is relatively insensitive to yellow.
- known marker patterns may be inserted at lower frequencies, if printed in yellow.
- other inks that are generally outside the realm of human perception, but detectable by image sensors, can also be used.
- online photo repositories such as Flickr and Facebook may routinely check uploaded imagery for watermarks. Whenever watermarks are found, the service can employ such signals in computational photography methods to enhance the imagery.
- subliminal marker signals can aid a camera's auto-focus system in determining where focus should be established.
- a smartphone camera captures a sequence of image frames (e.g., in a streaming capture-, or video- mode). During each frame, motion of the phone is sensed - such as by the phone's 3D gyroscope and/or accelerometer. Selected ones of the stream of image frames (i.e., selected based on low phone motion) are then aligned and combined, and output as an enhanced image.
- Such an enhanced image can be applied, e.g., to a digital watermark detector.
- the image enhancement allows the detector to output the decoded information more quickly (since it needn't work as long in recovering marginal signals), and allows for more robust watermark recovery (e.g., decoding despite poor illumination, image corruption, and other challenges).
- a motion threshold can be set (e.g., in gyroscope-sensed degrees of rotation per second of time), and frames having motion below that threshold can be combined. (Or, in another view, frames having motion above that threshold are disregarded.)
- the number of frames to be combined can be set in advance (e.g., use the first six frames that meet the threshold criterion), or the technique can utilize all frames in the sequence that pass such test.
- Another option is to set a threshold in terms of target frame count (e.g., ten), and then select - from the captured sequence of frames - the target number of frames that have the lowest values of motion data (of whatever value).
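The two frame-selection strategies just described can be sketched as below. The function names are hypothetical, and `motions` is assumed to be a list holding one gyroscope-derived motion value (e.g., degrees per second) per captured frame.

```python
def select_by_threshold(motions, threshold, max_frames=None):
    """Indices of frames with motion below the threshold, in capture order.

    If max_frames is given, only the first max_frames qualifying frames are kept.
    """
    picked = [i for i, m in enumerate(motions) if m < threshold]
    return picked if max_frames is None else picked[:max_frames]


def select_stillest(motions, target_count):
    """Indices of the target_count frames with the lowest motion, whatever their values."""
    order = sorted(range(len(motions)), key=lambda i: motions[i])
    return sorted(order[:target_count])
```

The first function covers both the "first six that qualify" and the "all that qualify" variants. The second always returns the requested number of frames, even when the phone was never very still.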
- the combination of frames can be by simple averaging. Or, weighted averaging can be used.
- the weight assigned to each frame can depend on the associated motion data. Desirably, the weighting is more particularly based on relationships between the frames' respective motion data, so that the "stiller" a frame, the more it contributes to the average.
- $k_A = \left[\mathrm{Motion}(\mathrm{Frame}_{MIN}) / \mathrm{Motion}(\mathrm{Frame}_{A})\right]^{X}$, where $k_A$ is the weighting factor for Frame "A"; $\mathrm{Motion}(\mathrm{Frame}_{A})$ is the motion, in degrees per second, of Frame "A"; $\mathrm{Motion}(\mathrm{Frame}_{MIN})$ is the minimum motion among all of the frames in the selected set; and $X$ is an exponential ratio-ing factor.
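A small sketch of this weighting, under stated assumptions: frames are represented as flat lists of pixel values that are already aligned, all motion values are strictly positive, and the function names are illustrative only.

```python
def frame_weights(motions, x=1.0):
    """Weight k_A = (min motion / motion of frame A) ** x for each frame.

    The stillest frame gets weight 1.0; larger x penalizes motion more strongly.
    Assumes every motion value is greater than zero.
    """
    m_min = min(motions)
    return [(m_min / m) ** x for m in motions]


def weighted_average(frames, weights):
    """Per-pixel weighted mean of equally sized frames (flat lists of pixel values)."""
    total = sum(weights)
    n = len(frames[0])
    return [sum(w * f[i] for f, w in zip(frames, weights)) / total for i in range(n)]
```

For motions of 1, 2 and 4 degrees per second and x = 1, the weights are 1.0, 0.5 and 0.25, so the stillest frame contributes four times as much as the shakiest.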
- the preferred embodiments of these just-noted applications discern the pose of the smartphone relative to the page by reference to registration signal components of a watermark signal encoded in the page.
- the payload of this watermark is used to access a database containing auxiliary information related to the page.
- This auxiliary information is then overlaid on top of the imagery captured from the page, at a position on the screen that is dependent on the discerned pose.
- Earlier-cited application 13/011,618 teaches a somewhat different arrangement, in which the user taps on a portion of an imaged page presented on the smartphone screen.
- a watermark payload decoded from the captured imagery is sent to a database, which returns page layout information corresponding to the page being viewed.
- the page layout data was earlier exported from publishing software used when composing the page, and stored in the database.
- the phone determines the coordinates on the physical page indicated by the user's tap (e.g., 4 inches down, and 6 inches to the right, of the upper left corner of the printed page).
- auxiliary information relating to that particular portion of the page is identified, and presented on the smartphone screen.
- location of the smartphone relative to the page is not determined by reference to registration components of the watermark signal. Instead, the decoded watermark payload is sent to a remote server (database), which returns information about the page. Unlike application 13/011,618, however, the returned information is not page layout data exported from the publishing software. Instead, the database returns earlier-stored reference data about salient points (features) that are present on the page.
- the salient points may be identified simply in terms of their coordinates on the original page, e.g., by inches down and across from a top corner of the page. Additionally or alternatively, other information - typically feature vectors - can be provided. Instead of identifying individual, unrelated points, the information returned from the database may characterize a constellation of salient points.
- the smartphone can use this knowledge about reference salient points on the page being viewed in various ways. For example, it can identify which particular part of the page is being imaged, by matching salient points identified by the database with salient points found within the phone's field of view.
- the auxiliary data presented to the user can also be a function of the salient points.
- the smartphone can transmit to a remote server a list of the identified salient points that are matched within the phone's field of view. Since this subset serves to precisely localize the region of the page being viewed, auxiliary information corresponding particularly to that region (e.g., corresponding to a particular article of interest to the user) can be returned to the phone.
- a larger set of auxiliary data (e.g., corresponding to the entirety of the page, or to all pages in the newspaper) can be returned from the database in response to the watermark payload.
- the smartphone can then select from among this larger set of data, and present only a subset that corresponds to the particular page excerpt being imaged (as determined by salient points). As the user moves the phone to image different parts of the object, different subsets can quickly be presented.
- any overlaid information can be geometrically registered with the underlying imagery, e.g., with a rotation, scale, translation, and/or affine- or perspective-warp that matches the smartphone's view of the page.
- the database also returns scale and rotation data, related to salient point information provided to the smartphone.
- the database may return a numeric value useful to indicate which direction is towards the top of the imaged object (i.e., vertical). This value can express, e.g., the angle between vertical, and a line between the first- and last-listed salient points.
- the database may return a numeric value indicating the distance - in inches - between the first- and last-listed salient points, in the scale with which the object (e.g., newspaper) was originally printed. (These simple illustrations are exemplary only, but serve to illustrate the concepts.)
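As one illustration of how these two database values might be used, the sketch below derives the image scale and in-plane rotation from the observed pixel positions of the first- and last-listed salient points. The function name, the clockwise-from-vertical angle convention, and the image coordinate system (y increasing downward) are assumptions made for this example.

```python
import math


def view_scale_and_rotation(p_first, p_last, ref_angle_deg, ref_inches):
    """Return (pixels_per_inch, rotation_deg) of the captured view.

    p_first, p_last: (x, y) pixel positions of the first/last-listed salient points.
    ref_angle_deg:   database angle between vertical and the first-to-last line,
                     assumed measured clockwise from "up" on the printed object.
    ref_inches:      database distance between the two points as printed.
    """
    dx = p_last[0] - p_first[0]
    dy = p_last[1] - p_first[1]
    pixels_per_inch = math.hypot(dx, dy) / ref_inches
    # Angle of the observed line, clockwise from image "up" (y grows downward).
    observed = math.degrees(math.atan2(dx, -dy))
    # Wrap the difference into the range [-180, 180).
    rotation = (observed - ref_angle_deg + 180.0) % 360.0 - 180.0
    return pixels_per_inch, rotation
```

A rotation of zero means the top of the object points toward the top of the frame. A perspective-correct overlay would need more than two points.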
- the salient points returned from the database can also serve as guides in sizing and positioning graphical indicia - such as boxes, borders, menus, etc.
- the smartphone may be instructed to render a bounding box on the phone display - sized just large enough to encompass salient points numbered 5, 32, 44 and 65, with edges parallel to the display edges.
- the salient points can similarly serve as in-object guideposts by reference to which other information can be sized, or presented.
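The bounding-box instruction can be sketched as follows. It assumes the phone has already located the numbered salient points in display coordinates; the `bounding_box` name, the `margin` parameter and the sample coordinates are illustrative only.

```python
def bounding_box(points, ids, margin=0):
    """Smallest display-aligned box (left, top, right, bottom) enclosing the given points.

    points: dict mapping salient point number -> (x, y) display position.
    ids:    the salient point numbers the box must encompass.
    """
    xs = [points[i][0] for i in ids]
    ys = [points[i][1] for i in ids]
    return (min(xs) - margin, min(ys) - margin, max(xs) + margin, max(ys) + margin)


# Hypothetical display positions for points 5, 32, 44 and 65 (plus an unrelated point 7).
located = {5: (10, 20), 32: (40, 25), 44: (15, 60), 65: (35, 5), 7: (100, 100)}
box = bounding_box(located, [5, 32, 44, 65])
```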
- reference salient point information is also useful in determining intrinsic parameters of the camera's lens system, such as focal length. Typically, such specs are available from the manufacturer, or are available in metadata output by the camera (e.g., in EXIF data). However, if unknown, lens parameters can be determined empirically from analysis of images containing known salient points, as is familiar to artisans in the field of photogrammetry. (Others may consult reference works, such as the book by Hartley, "Multiple View Geometry in Computer Vision," Cambridge University Press, 2004, and the thesis by Pollefeys, "Self-Calibration and Metric 3D Reconstruction from Uncalibrated Image Sequences," Catholic University of Leuven, 1999, in implementing such methods.)
- the registration components of the watermark signal are not used; only the payload of the watermark is employed.
- other data-conveying mechanisms may alternatively be used, such as barcodes, OCR, Near Field Communication chips (RFIDs), etc.
- a smartphone equipped with an NFC reader senses a plural-symbol identifier from the NFC chip of the poster - which serves to identify the poster.
- This poster-identifying information is transmitted by the phone to a database, which returns salient points associated with the poster. The user can then interact with the poster in a position-dependent manner.
- a user can image different areas of the poster with the smartphone camera.
- the phone identifies salient points in the captured imagery, and matches them with salient points returned from the database in response to submission of the NFC poster-identifying data.
- the smartphone discerns what excerpt of the poster is being imaged (and, if desired, the phone's pose relative to the poster).
- Auxiliary information particularly corresponding to such excerpt is then presented to the user (as a geometrically-registered screen overlay, if desired).
- a user can be presented one response if viewing a first part of the poster, and a different response if viewing a second part of the poster.
- such salient point methods can serve as highly accurate location determination methods - much finer in resolution than, e.g., GPS.
- a venue that includes a poster.
- the position of a fixed point on the poster e.g., its center
- the position of the reference point is determined in advance, and such information is stored in a database record identified by the payload of an NFC chip included in the poster (or is encoded as part of the chip's data payload).
- the position of the reference point may be expressed in various forms, such as latitude/longitude/elevation (geolocation data), or simply by its location relative to salient points of the poster (e.g., at the center of the poster, or the upper left corner).
- the location data can also include pose information, e.g., the compass direction the poster is facing, and its horizontal and vertical tilt, if any (in degrees).
- a user sensing this NFC chip obtains the location coordinates of the poster, as well as salient point information relating to the poster artwork, from the database.
- the smartphone analyzes imagery captured from the phone's current viewpoint, and discerns the phone's pose relative to the poster (e.g., three inches to right of center, four inches down, and 24 inches from the poster, viewing upward at an inclination of ten degrees, rightward at an angle of 20 degrees, with the phone inclined four degrees clockwise to the poster).
- by using this salient point-determined pose information in conjunction with the known position of the poster, the phone's absolute 6D pose is determined.
- Salient points - sometimes known as interest points, or local features - are familiar from content-based image retrieval (CBIR) and other image-based technologies. Generally speaking, such points are locations in an image where there is a significant local variation with respect to one or more chosen image features - making such locations distinctive and susceptible to detection. Such features can be based on simple parameters such as luminance, color, texture, etc., or on more complex metrics (e.g., difference of Gaussians). Each salient point can be represented by data indicating its location within the image, the orientation of the point, and/or a feature vector representing information associated with that location.
- Salient points may correspond to individual pixels (or sub-pixel locations within an image), but salient point detectors typically focus on 2D structures, such as corners, or consider gradients within square areas of pixels. Salient points are one particular type of local image descriptors.
- salient points used by the SIFT or SURF algorithms can be used. That is, in response to receipt of a watermark, NFC, or other object identifier from a smartphone, a remote server/database can return a set of SIFT or SURF data corresponding to that object.
- SIFT Scale-Invariant Feature Transform
- SURF Speeded Up Robust Features
- reference salient point data for the object is determined (typically by a proprietor or publisher of the object from analysis of a file from which the object is printed), and this data is stored in a database in association with an identifier for that object (e.g., an NFC identifier, or watermark or barcode payload, etc.).
- the salient point data may not be determined and stored in advance.
- a user may capture imagery from a poster, decode a watermark payload, and capture salient point information.
- the smartphone may find that there is no salient point reference information previously stored for that object.
- the smartphone may then be requested by the database to provide the information discerned by the phone, to which the smartphone can respond by transferring its salient point information to the database for storage.
- the smartphone may additionally send information relating to the phone-object pose.
- the watermark detector in the phone may provide affine transform parameters characterizing the scale, rotation and translation of its object viewpoint - as determined by reference to the registration signal components included in the watermark signal.
- an image processing algorithm executed by the phone processor may discern at least some aspect(s) of pose information by reference to apparent distortion of a known item depicted within the field of view (e.g., edges of a square 2D barcode).
- the phone may send the database the captured image data, and such pose estimation methods can be performed by a processor associated with the database - rather than at the phone.
- pose data can be determined otherwise (e.g., by acoustic echo techniques, accelerometer/gyroscope/magnetometer sensor data, radio-based location, etc.).
- a processor associated with the database can process the phone-submitted salient point information, normalize it to reduce or remove pose-related distortions, and store same as reference data for later use. (Or such normalization may be performed by the smartphone, before providing the salient point information to the database for storage.) This normalized salient point information can then serve as reference information when a second smartphone thereafter queries the database to obtain reference salient point information for that object.
- data about edges of the object - sensed from the phone-captured imagery - can be stored in the database.
- such information is geometrically related to the salient point information, so that the salient points can serve to indicate, e.g., distances from different edges of the object.
- a copy of the page imagery itself can be returned - with or without associated salient point data.
- Watermark detection commonly proceeds by first estimating translation, rotation and scale of the watermarked object by reference to registration signal components of the watermark (e.g., a known constellation of impulses in the spatial frequency domain). The captured imagery is next processed to remove these estimated affine distortions. Finally, a watermark decoding algorithm is applied to the processed imagery.
- the pose of the imaged object relative to the camera is estimated through use of reference salient points - as discussed above.
- corrective adjustments (e.g., affine counter-distortions) are then applied to the captured imagery.
- the watermark decoding algorithm is then applied to the corrected imagery.
- a very small set of salient points can suffice for such purpose (e.g., three points).
- Graphical indicia which are commonly found in printed materials (e.g., a recycling symbol, or company logos, or even square barcodes) are well suited for such purpose.
- the rectangular outline of a typical magazine page, of typical dimensions, can also suffice.
- Watermark signals are typically small in amplitude, and can be degraded by image noise - such as arises from low-light exposures. Other image operations similarly suffer from image noise (e.g., fingerprint-based image recognition). Image noise can be decreased by lengthening the exposure interval, but so-doing increases the risk of motion blur.
- multiple image frames of a scene are captured, such as by a smartphone in a video capture mode.
- Each frame, independently, may have a poor signal-to-noise ratio.
- This signal-to-noise ratio is improved by geometrically aligning multiple frames by reference to their common salient points, and then averaging the aligned frames.
- the composite frame thus-obtained is lower in noise than the component frames, yet this advantage is achieved without the risk of motion blur.
- Such a composite frame can then be submitted to a watermark detector for watermark decoding, or used otherwise.
- Such method works by identifying the salient points in each of the frames (e.g., using the SURF technique). Corresponding points are then matched between frames. The movement of the points between frames is used to quantify the transform by which one frame has changed to yield the next. These respective transforms are then reversed to align each of the frames to a common reference (which may be, e.g., the middle frame in a sequence of five frames). The aligned frames are then averaged.
- the video capture mode permits certain assumptions that facilitate rapid execution of the method. For example, the frame-to-frame translational movement of salient points is small, so in searching a subject frame to identify a salient point from a prior frame, the entire subject frame needn't be searched. Instead, the search can be limited to a small bounded neighborhood (e.g., 32 x 32 pixels) centered on the position of the point in the prior frame.
- the feature vectors for the points can omit the customary orientation information.
- the scale factor of the imagery captured in the sequence of frames is likely to be relatively uniform - again constraining the search space that must be considered in finding matching points.
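The bounded-neighborhood restriction can be illustrated with a short filter. The `in_window` name and the tuple representation of point positions are assumptions of this sketch; the 32-pixel default follows the example size given above.

```python
def in_window(prev_pos, candidates, size=32):
    """Keep only candidate points inside a size x size window centered on prev_pos.

    prev_pos:   (x, y) position of the salient point in the prior frame.
    candidates: (x, y) positions of salient points found in the subject frame.
    """
    half = size / 2
    return [c for c in candidates
            if abs(c[0] - prev_pos[0]) <= half and abs(c[1] - prev_pos[1]) <= half]


# Only the first candidate lies within 16 pixels of (100, 100) on both axes.
nearby = in_window((100, 100), [(110, 90), (120, 100), (100, 117)])
```

Feature-vector comparison then needs to consider only the surviving candidates, rather than every point in the frame.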
- a particular matching algorithm starts with salient points conventionally identified in first and second frames.
- An exemplary frame may have 20-400 salient points.
- for each salient point in the first frame, a Euclidean distance is computed between its feature vector and the feature vector of each salient point in the second frame.
- a point in the second frame with the closest Euclidean distance is identified as a candidate match.
- a point in the second frame may be identified as a candidate match to two or more points in the first frame. Such candidate matches are discarded. Also discarded are candidate matches where the computed Euclidean distance exceeds a threshold. (An absolute value threshold may be used, or the algorithm may discard the candidate matches based on the largest ten percent of distance values.) A set of candidate matches remains.
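The candidate-matching steps above can be sketched in a few lines. Feature vectors are assumed to be equal-length numeric tuples, the second frame is assumed to contain at least one point, and only the absolute-threshold variant is shown; `match_points` and `max_dist` are illustrative names.

```python
import math


def match_points(desc1, desc2, max_dist):
    """Return (i, j) pairs matching point i of frame 1 to point j of frame 2.

    desc1, desc2: lists of feature vectors (equal-length tuples) for each frame.
    max_dist:     absolute threshold on feature-vector distance.
    """
    # Nearest frame-2 feature vector for every frame-1 point.
    candidates = []
    for i, d1 in enumerate(desc1):
        j, dist = min(((j, math.dist(d1, d2)) for j, d2 in enumerate(desc2)),
                      key=lambda t: t[1])
        candidates.append((i, j, dist))
    # Count how many frame-1 points claimed each frame-2 point.
    hits = {}
    for _, j, _ in candidates:
        hits[j] = hits.get(j, 0) + 1
    # Discard ambiguous (multiply claimed) and too-distant candidates.
    return [(i, j) for i, j, dist in candidates if hits[j] == 1 and dist <= max_dist]
```

The brute-force comparison is quadratic in the number of points, which is tolerable for a few hundred points per frame.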
- Fig. 20 shows the location of the remaining salient points, in both the first and second frames. As can be seen, points near the center of the frame closely coincide. Further away, there is some shifting - some due to slightly different scale between the two image frames (e.g., the user moved the camera closer to the subject), and some due to translation (e.g., the user jittered the camera a bit).
- the transformation between the first and second frames is characterized by a scale factor, and by a translation (in X- and Y-).
- Scale is estimated first. This is done by scaling the second frame of remaining salient points by various amounts, and then examining a histogram of distances between the scaled point locations, and their nearest counterparts in the first frame.
- Fig. 21 shows the results for scale factors of 1.01, 1.03, 1.05, and 1.07. As can be seen, a scale of 1.05 yields the best peak.
- the second frame of remaining salient points is then scaled in accordance with the determined scale value (1.05). Distances (in X- and Y-) between the scaled point locations, and their nearest counterparts in the first frame, are then computed, and the median values of X- and Y- offset are then computed. This completes the first approximation of the transformation characterizing the alignment of the second image relative to the first.
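A minimal sketch of this two-stage estimate follows. Several details are assumptions rather than statements of the described method: points are scaled about the frame center, "best peak" is taken to mean the tallest histogram bin, distances are binned at one-pixel width, and all function names are hypothetical.

```python
import math
from statistics import median


def nearest(p, pts):
    """Point in pts closest to p."""
    return min(pts, key=lambda q: math.dist(p, q))


def scale_points(pts, s, center):
    """Scale point locations by factor s about the given center."""
    cx, cy = center
    return [(cx + s * (x - cx), cy + s * (y - cy)) for x, y in pts]


def estimate_scale(pts1, pts2, scales, center, bin_width=1.0):
    """Try each scale on frame-2 points; keep the one whose histogram of
    nearest-counterpart distances has the tallest single bin."""
    best_scale, best_peak = None, -1
    for s in scales:
        hist = {}
        for p in scale_points(pts2, s, center):
            b = int(math.dist(p, nearest(p, pts1)) // bin_width)
            hist[b] = hist.get(b, 0) + 1
        peak = max(hist.values())
        if peak > best_peak:
            best_scale, best_peak = s, peak
    return best_scale


def estimate_offset(pts1, pts2, s, center):
    """Median X and Y offsets from the scaled frame-2 points to frame-1 counterparts."""
    dxs, dys = [], []
    for p in scale_points(pts2, s, center):
        q = nearest(p, pts1)
        dxs.append(q[0] - p[0])
        dys.append(q[1] - p[1])
    return median(dxs), median(dys)
```

At the correct scale, the remaining misalignment is a pure translation, so every point shares nearly the same distance to its counterpart and the histogram collapses into one tall bin. The median makes the offset estimate tolerant of a few stray matches.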
- This approximation can be further refined, if desired.
- One suitable technique is by discarding those candidate point-pairs that don't yet align within a threshold distance after applying the determined scale and X-, Y- offsets.
- An affine transform, based on the determined scale and offsets, is then perturbed in an iterative fashion, to identify a transformation that yields the best least-squares fit between the still-retained candidate points.
- 500 frames of a digitally watermarked photograph were captured in low light using a smartphone's video capture mode. Individually, 25% of the 500 frames could be processed to read an encoded digital watermark.
- A successful watermark read was achieved with one or more frames in each sequence of five frames 49% of the time. If successive groups of five frames were averaged without any alignment, the results dropped to 18%. If, however, each sequence of five frames was aligned and averaged as described above (using the third frame as a reference, against which the others were matched), a successful watermark read was achieved 61% of the time.
- While the described procedure enhanced the success of watermark reading operations, such processing of multiple image frames - based on salient point alignment and averaging - can similarly yield low-noise, sharp images for other purposes (including consumer enjoyment).
- Another method that can be used with the foregoing arrangement, or independently, is to note the smartphone's motion sensor data corresponding to the instant that each frame of a video sequence was captured. If the sensor data (e.g., from a 3D accelerometer or gyroscope) indicates movement above a threshold value, then the corresponding frame of imagery can be discarded, and not used in an averaging operation.
- the threshold can be adaptive, e.g., by discarding two frames out of each sequence of ten having the highest motion values.
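- Both the fixed and the adaptive variants of this frame-discarding step can be sketched briefly. The function and parameter names are assumptions for illustration; `motion` is taken to hold one motion value per captured frame, however that value is derived from the sensor data.

```python
def select_frames(motion, threshold=None, drop_per_group=2, group_size=10):
    """Return the indices of frames to keep for the averaging operation."""
    if threshold is not None:
        # Fixed threshold: discard frames whose motion exceeds it.
        return [i for i, m in enumerate(motion) if m <= threshold]

    keep = []
    for start in range(0, len(motion), group_size):
        group = list(range(start, min(start + group_size, len(motion))))
        # Adaptive: within each group, discard the frames having the
        # highest motion values.
        ranked = sorted(group, key=lambda i: motion[i], reverse=True)
        worst = set(ranked[:drop_per_group])
        keep.extend(i for i in group if i not in worst)
    return keep
```

Note that this sketch also drops `drop_per_group` frames from a final, shorter group; an implementation might instead scale the number dropped to the group length.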
- Audio watermarks are increasingly being used to provide network services in association with audio and audio-video content.
- One example is the Grey's Anatomy Sync application offered by ABC Television, and available for download from the Apple App Store. This iPad app allows viewers of the Grey's Anatomy program to interact with other fans (e.g., by chat functionality), and obtain episode-related content (e.g., actor biographies, quizzes, etc.) in real-time, while watching the program.
- audio watermarks in music content can allow listeners to interact with other fans, and obtain related information.
- Audio watermark information is typically woven into the content itself - a very low level, noiselike signal that is inseparable from the audio data. Removal of the watermark typically is very difficult, or impossible.
- audio watermark information is conveyed in a separate audio channel, so that such information can be rendered - or not, depending on the desires of the user, or on other circumstances.
- One example format is Dolby TrueHD, which can convey 24-bit audio in each of 8 discrete audio channels.
- An exemplary implementation is a home audio system, using an audiophile's 5.1 or 7.1 surround sound system, with associated watermark data conveyed on an additional channel.
- the user can instruct whether the watermark channel should be rendered, or not. If rendering is selected, the receiver mixes the watermark data into one or more of the speaker channels (e.g., the front left and right speakers).
- the amplitude of the watermark is usually not changed in the mixing, but some implementations may additionally give the user some ability to vary the amplitude of the mixed watermark.
- Another implementation looks forward to the day that audio is delivered to consumers in the native multi-track form in which it was recorded, allowing users to create their own mixes. (E.g., a consumer who is fond of saxophone may accentuate the saxophone track in a 16-track recording of a band, and may attenuate a drum track, etc.) Again, in such implementation the user is given the opportunity of including the watermark signal in the final audio mix, or leaving it out - depending on whether or not the user plans to utilize network services or other features enabled by the watermark.
- a single watermark track is provided.
- multiple tracks can be used.
- One such embodiment has a basic watermark track, and plural further tracks.
- Each of the further tracks is a data channel, which specifies an amplitude component of the watermark that should be associated with a corresponding audio (instrument) track.
- the amplitude data channels are scaled in accordance with the user-set amplitude of the corresponding audio (instrument) track, and the scaled amplitude data from all such channels are then summed to yield a net scale factor for the watermark signal.
- the watermark signal is then dynamically adjusted in amplitude in accordance with this scale factor (e.g., by multiplying), so that the watermark amplitude optimally corresponds to the amplitudes of the various audio tracks that comprise the aggregate audio.
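- The scale-factor computation described above amounts to a weighted sum followed by a multiplication, as in the following sketch. The names are hypothetical, and the sketch assumes one amplitude component per instrument track per watermark sample.

```python
def watermark_scale(components, user_gains):
    """Net scale factor for the watermark signal.

    components: watermark amplitude component for each instrument track,
                as carried on the corresponding data channel.
    user_gains: the user-set amplitude of each instrument track.
    """
    # Scale each component by the user's gain for its track, then sum.
    return sum(c * g for c, g in zip(components, user_gains))


def mix_watermark(watermark, components_per_sample, user_gains):
    """Adjust the basic watermark track sample by sample."""
    return [
        w * watermark_scale(components, user_gains)
        for w, components in zip(watermark, components_per_sample)
    ]
```

For example, if the user doubles the first of three tracks and mutes the third, components of 0.5, 0.25 and 0.25 yield a net scale factor of 1.25.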
- the tracks of audio can be stored on a computer readable medium from which the consumer electronic device reads them and processes them, as above. Or the tracks may be streamed to the device, such as by a cable or online delivery service, and buffered briefly in a memory before being read and processed.
- a smartphone may capture an image frame that depicts several different objects in a shared context.
- An example is a department store advertisement that features a variety of products within a single photographic image.
- Another is a page of classified advertising.
- Such documents including plural different objects may be referred to as "composite subjects."
- each object that forms part of the composite subject may be associated with a different electronic response (e.g., a corresponding online web page, or other triggered action).
- the foregoing can be achieved by determining an identifier associated with the composite subject, transmitting it to a data store, and receiving in reply an authored page that includes data from which a rendering of some or all of the original composite subject can be produced.
- This received page can define different clickable (tappable) regions.
- the user taps on a particular object of interest shown on the rendered page, and the smartphone responds by instituting a response associated with that region, using techniques known in the art (such as via familiar HTML hypertext markup that renders an image as a hyperlink).
- the image presented on the smartphone screen may not be imagery captured by the smartphone camera. Instead, it typically is a page (file) delivered to the smartphone from a remote store. However, it shows a version of the same composite subject with which the user is interacting (albeit usually in a "plan" view - free from any perspective distortion in the smartphone-captured imagery).
- the file delivered to the smartphone may present the composite subject at a native
- the originally-captured imagery may resolve a depicted subject at 50 pixels per inch, whereas the delivered file may provide a resolution of 72 pixels per inch.
- a feature on the printed page that might span 10 pixels in the originally-captured imagery may span 14 pixels in the delivered file.
- the user can employ known touchscreen gestures, including pinching, swiping, etc., to change the display magnification, and traverse the page to bring a desired excerpt into view.
- each object depicted in the composite subject is encoded with its own machine readable code (e.g., a digital watermark, barcode, etc.). If any of these is decoded, and its payload is sent to the remote server, the system responds with the same composite page data in return (i.e., multiple input payloads all resolve to the same output page).
- the entire composite subject may be encoded with a single identifier (e.g., a digital watermark that spans the full composite subject, or a single barcode on a printed page that depicts several objects). Again, the system can respond to transmission of such a single identifier by returning page data for rendering on the smartphone display.
- individual objects depicted in the composite subject, or other excerpts of the composite subject may be recognized by image fingerprint techniques, such as SURF. Again, such identification can map to an identifier for that subject, which can be associated with a corresponding electronic page for rendering.
- the smartphone may discern an identifier from the composite subject without use of the smartphone camera, e.g., by detecting an identifier from an NFC or RFID chip conveyed by the composite subject, using a corresponding detector.
- the electronic page presented on the smartphone for user interaction may visually correspond, to different degrees, with the physical page that launched the experience.
- the electronic page may be indistinguishable from the physical page (except, e.g., it may be presented from a different viewpoint, such as from a plan - rather than an oblique - perspective).
- the electronic page may be visually similar but not identical. For example, it may be of lower resolution, or it may present the page with a smaller color palette, or with other stylized graphical effect, etc.
- the system replicates - on a smartphone screen - a version of a composite subject being viewed by the user, but with clickable/tappable regions that link to corresponding resources, or that trigger corresponding behaviors.
- the single frame - although it may depict multiple objects - maps to a single electronic page in response.
- the user can then unambiguously indicate which object is of interest by a tap (or by alternative user interface selection).
- a related problem can arise in certain implementations of streaming mode detectors (detailed above).
- the smartphone camera may capture images of many other objects that form part of the composite subject (about which the user may have no interest), yet the smartphone may decode machine-readable identifiers from each.
- the smartphone may disable operation of certain modules (e.g., watermark and barcode decoders, NFC readers, etc.) when the phone's motion sensors (e.g., accelerometers, gyroscopes and/or magnetometers) indicate more than a threshold degree of motion. For example, if the phone senses movement exceeding two, four or six inches per second, it may suppress operation of such modules. (Some motion occurs just due to natural hand jitter.) The phone may resume module operation when the motion drops below the threshold value, or below a different threshold (e.g., one inch per second). By such arrangement, decoding of unintended identifiers is suppressed.
Print-to-Web Payoffs, e.g., for Newspapers
- print media - such as newspapers and magazines - can be digitally watermarked to embed hidden payload data.
- When payload data is sensed by a suitable watermark detection program on a smartphone (e.g., the Digimarc Discover app), it causes the smartphone to present an associated "payoff," such as to display associated online content.
- If a newspaper digitally watermarks a large number of its daily photographs, or a large number of its daily articles, it can become a logistical challenge for the publisher to specify an appropriate payoff for each photograph/article.
- some publishers arrange for all of their watermarked images simply to link back to the home page of the publication's online presence (e.g., the www<dot>nytimes<dot>com web page).
- the publisher may specify that the payoff for a print article is simply the online version of the same article.
- a newspaper article (or image) is associated with a more valuable payoff, with little or no effort.
- Before the newspaper is delivered to subscribers (but after a watermark ID has been assigned to an article), an operator types a few (e.g., 2-5) keywords that are associated with the article (e.g., Obama, Puerto Rico; or Stanley Cup, Bruins). These keywords are stored in a database at a central computer system, in association with the watermark payload ID with which the article is digitally watermarked.
- the payoff is a Google (or other provider, such as Bing) search based on the keywords entered for that article.
- the Google keyword search may be used as a default, or a backstop, payoff, in case the publisher does not specify any other payoff.
- the keywords are stored in association with the article, and a Google search based on the stored keywords is initially specified as the payoff for the article. Thereafter, however, the newspaper publisher, or the writer of the article, may change the stored data to specify a different payoff. (Sometimes an author-specified online payoff is submitted to the publisher with the article text, in which case this author-specified payoff can be used as the online payoff from the beginning.)
- the keywords are not entered manually, but rather are extracted from the text of the article, e.g., as by tag cloud techniques.
- A tag cloud ranks nouns in an article by frequency - possibly discarding "noise words."
- Co-occurrence methods can be used to identify phrases of two words or more. The most frequently-occurring terms are stored as article keywords. Such techniques are familiar to artisans.
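- A simplified sketch of such keyword extraction follows. It is an approximation under stated assumptions: it does not tag parts of speech, so it ranks all non-noise words rather than nouns only; the noise-word list is a small hypothetical sample; and a two-word phrase is recognized simply when the same adjacent pair recurs.

```python
import re
from collections import Counter

# Hypothetical sample of "noise words"; a real list would be much longer.
NOISE_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "for",
               "is", "was", "at", "by", "with", "his", "her", "it"}


def extract_keywords(text, count=3, min_phrase_count=2):
    """Return the `count` most frequently occurring terms in `text`."""
    words = re.findall(r"[a-z]+", text.lower())
    singles = Counter(w for w in words if w not in NOISE_WORDS)
    # Co-occurrence: count adjacent pairs in which neither word is noise.
    pairs = Counter(
        f"{a} {b}"
        for a, b in zip(words, words[1:])
        if a not in NOISE_WORDS and b not in NOISE_WORDS
    )
    # Recurring pairs are kept as phrases.
    terms = Counter({p: n for p, n in pairs.items() if n >= min_phrase_count})
    phrase_words = {w for phrase in terms for w in phrase.split()}
    # Single words are added unless already covered by a phrase.
    for word, n in singles.items():
        if word not in phrase_words:
            terms[word] = n
    return [term for term, _ in terms.most_common(count)]
```

Applied to a short article that mentions the Bruins and the Stanley Cup several times, this yields "bruins" and "stanley cup" as its leading terms.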
- the system can predict likely keywords based on the article author. A popular sportswriter for the Oregonian commonly writes about the Trailblazers basketball team. Paul Krugman of The New York Times commonly writes about the economy. Google searches based on such keywords can be a suitable default payoff even in the absence of any particular information about their articles' contents.
- Still another method extracts semantic information from imagery, such as by pattern matching or facial recognition. For example, known methods can be used to identify depictions of famous people, and familiar landmarks, in newspaper photographs. Names discerned through use of such techniques can be stored as keywords for such photographs.
- Discerning the contents of imagery is aided if the automated system has some knowledge of location information relating to the image.
- the Oregonian newspaper frequently publishes imagery including faces of local and state officials. Their faces may be difficult to match with reference facial data drawn from faces around the world.
- knowing that such imagery is being published in the Oregonian gives the recognition system a further clue that can be used to identify depicted people/landmarks, i.e., check first for matches with facial data associated with Oregon.
- automatically-generated keywords may be reviewed by an operator, who can supervise such output and revise same, if the automatically-generated keywords seem inappropriate or inadequate.
- a database at the central computer system associates watermark payload IDs with details of associated payoffs.
- This database may be maintained by the newspaper publisher, or by another party.
- the database record also specifies an HTML 5 template.
- When the database is interrogated by the smartphone app, which provides a decoded watermark ID, the database pulls the HTML 5 template, inserts the associated keywords, and returns it to the smartphone. The smartphone app then renders the screen display in accordance with the returned HTML template.
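- The template-filling step can be illustrated with a short sketch. Everything specific here is an assumption made for the example: the in-memory `PAYOFFS` dictionary standing in for the database, the payload ID 1001, the template markup, and the use of a search URL with a `q` parameter as the keyword search link.

```python
from html import escape
from string import Template
from urllib.parse import quote_plus

# Hypothetical database records, keyed by watermark payload ID.
PAYOFFS = {
    1001: {
        "keywords": ["Obama", "Puerto Rico"],
        "template": Template(
            "<!DOCTYPE html><html><body>"
            "<h1>$title</h1>"
            '<a href="$url">Search results</a>'
            "</body></html>"
        ),
    },
}


def render_payoff(watermark_id):
    """Pull the record's template and insert the associated keywords."""
    record = PAYOFFS[watermark_id]
    keywords = record["keywords"]
    # Build the keyword search link (URL form assumed for illustration).
    url = "https://www.google.com/search?q=" + quote_plus(" ".join(keywords))
    return record["template"].substitute(
        title=escape(", ".join(keywords)),
        url=escape(url),
    )
```

Keeping the template in the record, rather than in the app, lets the publisher change the presentation without updating the smartphone software.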
- the database may query Google with the keywords, and return to the smartphone a completed form of the HTML page, which already has the Google search results included.
- the Google search that is presented by the smartphone app may be domain-limited, such as to the New York Times web site, and to non-competing domains (e.g., Wikipedia, US government web sites, etc.)
- a Google search with the keywords "Obama" and "Puerto Rico" yields a list of results headed by news reports of his visit (published by The New York Times, The National, Al Jazeera, etc.). Lower in the search results, however, is a YouTube link showing Obama dancing at a rally.
- the HTML 5 code can observe the traffic to and/or from the app, and may indicate to the database which link(s) the user pursues. Based on the number of users who click on the dancing link, the system may revise the payoff so that this result appears higher in the list of search results.
- the database may similarly learn of viewer interest in links relating to Obama drinking "cerveza" in Puerto Rico, and eating "platanos."
- the results may then exclude the formerly top-ranked news accounts.
- the database can run plural Google queries - one with the original keywords; one with those keywords and "dancing;" one with those keywords and "cerveza;" and one with those keywords and "platanos."
- the remote system can then combine the results - based on indicated user popularity of the different subjects, and return these modified results to smartphones that thereafter link from the article. These later users may then see search results headed by the YouTube video.
- the order in which the links are presented on the smartphone app can be tailored to correspond to their apparent popularity among the newspaper's readers.
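- One simple way to tailor link order to apparent popularity is to sort by observed click counts, as sketched below. The class and method names, and the link labels in the usage note, are hypothetical.

```python
from collections import Counter


class PayoffStats:
    """Tracks which links readers pursue for one article."""

    def __init__(self):
        self.clicks = Counter()

    def record_click(self, link):
        # Called when the app reports that a user followed `link`.
        self.clicks[link] += 1

    def rank(self, links):
        # Most-clicked links first. Python's sort is stable, so links with
        # equal counts keep their original search-result order.
        return sorted(links, key=lambda link: self.clicks[link], reverse=True)
```

With 40 recorded clicks on a "youtube-dancing" link and 5 on a "news-nyt" link, `rank(["news-nyt", "news-national", "youtube-dancing"])` places the video first and leaves the two news links in their original relative order.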
- objects may be recognized in imagery (e.g., by watermarking or fingerprinting), and the smartphone may present tags (aka icons or baubles) in association with such displayed objects.
- the association may be "sticky." That is, if the field of view displayed on the smartphone screen is changed, then the displayed tags move with the apparent motion of the objects with which they are respectively associated.
- the moving tags can prove problematic if the user wishes to tap one, e.g., to trigger an associated action. This typically requires holding the camera with at least one hand, while simultaneously tapping a potentially moving target on the screen.
- a tag associated with a recognized object in displayed imagery is presented in a fixed position on the screen, together with a visual indication linking the tag with the object to which it corresponds.
- the smartphone may be programmed to present a distinctive graphical effect on the area of the captured imagery where a watermark seems to be found. This effect may comprise, e.g., shimmering, chrominance or luminance oscillation, overlaid graphical features, etc.
- Fig. 32 shows one such arrangement. (The overlaid stars vary in brightness or position with a frequency of several Hz - indicating to the user that there is more here than meets the eye. In Fig. 32, the watermark signal was detected across the displayed imagery.)
- the user may take an action instructing the smartphone to complete a watermark reading operation on the captured imagery.
- Such action may be a touch screen gesture, a shake of the phone, a touch of a physical button, a spoken command, etc.
- the phone may operate in a mode in which it automatically undertakes watermark reading whenever a possible watermark signal is detected.
- the smartphone completes a watermark reading operation. It then transmits the decoded watermark payload to a remote database station/web service.
- the remote database station uses the received watermark payload to access a database record containing associated information, which it then transmits back to the smartphone.
- this information returned to the smartphone causes the phone to present a display like that shown in Fig. 33. That is, the smartphone spawns three tags at the bottom edge of the screen - where they can be conveniently tapped with the user's thumb.
- One tag corresponds to the blouse worn by the woman depicted in the captured imagery; a second corresponds to the woman's shorts; and the third corresponds to the woman's handbag.
- In Fig. 33 there are two types of visual indications that conceptually link each tag with a corresponding object.
- One is a tether line, extending from the tag to the object.
- Another is an object-customized tag.
- the tag desirably remains fixed at the bottom edge of the screen, together with the bottom end of the associated tether line.
- the top end of the tether line moves to track the object, and to maintain a persistent visual association between the object and the tag.
- Compare Figs. 33 and 34. The user has moved the smartphone between these two Figures, yielding a different view of the catalog page.
- the woman wearing the blouse and shorts has moved rightward in the displayed field of view, and the handbag has moved out of sight.
- the blouse and shorts tags remain stationary at the bottom of the screen.
- the tops of the tether lines move to track the moving blouse and shorts objects. (The tag for the handbag, which has moved out of sight of the camera, disappears in Fig. 34.)
- the depicted tether line arrangement thus employs a dual form of iconography.
- the second visual indication linking each tag to a respective object is the distinctive graphical tag artwork indicating the nature of the object to which it corresponds. It will be recognized that in the depicted arrangement, the tether lines are not needed, because the distinctive tags, alone, symbolize the different objects in the image (blouse, shorts and handbag) - providing the requisite visual indication of association. But in other embodiments, the tags can be generic and identical to each other, in which case the tether lines provide a suitable visual association.
- the watermark payload (or fingerprint data) sent to the web service enables access to a database record containing information about the particular catalog, and page, being viewed.
- the information returned from the database may include reference image data characterizing particular features in the image. This information may comprise one or more thumbnail images or maps - defining the different object shapes (blouse, shorts, handbag, etc.).
- this information may comprise image fingerprint data, such as data identifying features by which the depicted object(s) may be recognized and tracked. Additionally, or alternatively, this information may comprise data defining object "handles" - locations in or at the edges of the object shapes where the upper ends of the tether lines can terminate. Fig. 35 shows one such shape (for the shorts) that defines three handle locations (indicated by "X"s).
- the information returned from the database is typically authored by a publisher of the catalog.
- the publisher specifies that the Fig. 32 image includes three objects that should be provided with user-selectable interactivity, via linked tags.
- Information about the three objects is stored in the database and provided to the phone (e.g., shape data, like Fig. 35, or fingerprint data - such as SURF), allowing these objects to be pattern-matched and tracked as the view moves, and the attachment handle points are identified.
- the publisher-specified data further defines the particular icon shapes that are to be presented at the bottom of the screen in association with the three different objects.
- the software may analyze the bottom edge of the imagery to identify where to best place the tags. This decision can be based on evaluation of different candidate locations, such as by identifying edges within that region of the imagery. Desirably, the tags should not be placed over strong edges, as this placement may obscure perceptually relevant features of the captured imagery. Better to place the tags over relatively "quiet," or uniform, parts of the image - devoid of strong edges and other perceptually salient features, where the obstruction will likely matter less. Once a tag is initially placed, however, it is desirably left in that location - even if the underlying captured imagery shifts - so as to ease user interaction with such tag.
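- The evaluation of candidate tag locations can be sketched as a search for the window with the least edge energy along the bottom of the frame. This is one possible measure, assumed for illustration: the sum of absolute differences between neighboring luminance values. The function names are hypothetical, and `image` is taken to be a list of rows of luminance values.

```python
def edge_energy(image, x0, y0, width, height):
    """Sum of absolute neighbor differences inside one candidate window."""
    total = 0
    for y in range(y0, y0 + height):
        for x in range(x0, x0 + width):
            if x + 1 < x0 + width:
                total += abs(image[y][x + 1] - image[y][x])  # horizontal
            if y + 1 < y0 + height:
                total += abs(image[y + 1][x] - image[y][x])  # vertical
    return total


def place_tag(image, tag_width, tag_height, step=1):
    """Return (x, y) of the quietest tag position along the bottom edge."""
    rows, cols = len(image), len(image[0])
    y0 = rows - tag_height
    candidates = range(0, cols - tag_width + 1, step)
    # min() returns the leftmost position when several are equally quiet.
    x0 = min(candidates,
             key=lambda x: edge_energy(image, x, y0, tag_width, tag_height))
    return x0, y0
```

Consistent with the passage above, such a search would be run only when a tag is first placed; afterwards the tag stays where it is even if the imagery beneath it shifts.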
- the tops of the tether lines desirably follow the moving features - stretching and repositioning as needed, akin to rubber bands.
- moving of tags may be required, such as when additional objects come into view - necessitating the presentation of additional tags.
- the software seeks to maintain the original tags in as close to their original positions as possible, while still accommodating new tags. This may involve making the original tags smaller in size.
- Tether line(s), if used, are routed using an algorithm that identifies a simple curved path from the tag to the nearest handle on the corresponding object. Different paths, to different object handles, can be evaluated, and a selection of a route can be based on certain criteria (e.g., minimizing crossings of strong edges; crossing strong edges at near a 90 degree angle - if unavoidable; identifying a route that yields a curve having a visually pleasing range - such as a curve angle of close to 25 degrees; identifying a route that approaches a handle on the edge of the object from outside the object, etc.)
- the color of a tether line may be adapted based on the captured imagery over which it is overlaid, so as to provide suitable contrast.
- the tether line route and color may be re-evaluated, and such line may terminate at a different handle on a given object in some circumstances. (This is the case, e.g., with the tether line connecting the shorts to the corresponding icon. In Fig. 33, a handle on the right side of the woman's shorts is employed; in Fig. 34, a handle on the left side is used.)
- Fig. 36 shows the case in which the smartphone detects one watermark encoded in the region of imagery encompassing the woman's clothing, and another encoded in the region of imagery encompassing her handbag.
- the smartphone modifies the region of imagery depicting the woman's shorts and blouse to present one particular graphical effect (shown as an overlaid star pattern), and it modifies the region of imagery depicting the woman's handbag to present a different graphical effect (shown as overlaid circles).
- a software API may be activated by such detection, and output the pixel coordinates of the apparent center of the watermarked region (perhaps with other information, such as radius, or vertical and horizontal extent, or vector area description).
- the API, or other software may fetch a software script that defines what graphical effect should be presented for this particular watermark (e.g., for this payload, or for a watermark found in this area).
- the script can provide effects such as a magnifying glass, bubbles, wobbles, fire animation, etc., etc. - localized to the region where the API reports the watermark appears to be located.
- the smartphone begins to display on the screen a tether line starting in, or at the edge of, the watermarked region, and animates the line to snake towards the edge of the screen.
- the smartphone has sent the decoded watermark to the remote database, and received in response information allowing it to finalize an appropriate display response - such as presenting a handbag icon where the animated line ends. (It may also snap the upper end of the line to a handle point defined by the received information.)
- the phone transmits the imagery (or fingerprint data based thereon) to a remote system/web service.
- the remote system computes a fingerprint (if not already provided by the phone), and seeks to identify a matching reference fingerprint in a database. If a match is found, associated information in the database serves to identify the object/scene depicted by the imagery (e.g., a particular catalog and page). Once such identification has been performed, the behavior detailed above can proceed.
- the smartphone interprets as a command to freeze the currently-displayed image view, and/or maintain the presently-displayed tags on the screen.
- Such functionality allows the user to point the camera at a catalog page to obtain the corresponding tags, and thereafter reposition the phone to a more convenient position for interacting with the image/tags.
- imagery obtained otherwise such as received from a storage device, or from across a network.
- While object identification is performed in the detailed arrangement by watermarking (or image fingerprinting), other embodiments can be based on other forms of identification, such as barcodes, glyphs, RFID/NFC chips, etc.
- Smartphones are increasingly being equipped with graphics processing units (GPUs) to speed the rendering of complex screen displays, e.g., for gaming, video, and other image-intensive applications.
- GPU chips are processor chips characterized by multiple processing cores, and an instruction set that is commonly optimized for graphics.
- each core is dedicated to a small neighborhood of pixel values within an image, e.g., to perform processing that applies a visual effect, such as shading, fog, affine transformation, etc.
- GPUs are usually also optimized to accelerate exchange of image data between such processing cores and associated memory, such as RGB frame buffers.
- Image data processed by GPUs is commonly expressed in three component planes, such as Red/Green/Blue, or YUV.
- frame buffers also differ from program storage memory in that frame buffers are configured to enable rapid swapping of buffered data to the device screen for display (e.g., by appropriate interface hardware).
- the pixel size of a frame buffer usually has a 1:1 correspondence with the pixel size of the display screen (e.g., a smartphone with a 960 x 640 pixel display screen would commonly have one or more 960 x 640 frame buffers).
- While GPUs had their genesis in speeding graphics processing, they also have been applied to other uses.
- mainframe supercomputers rely on GPUs for massive parallelism.
- GPU vendors, such as NVIDIA, are providing software tools that allow specified functions from a normal "C" language computer program to be run on their GPU-equipped video cards.
- certain embodiments repurpose smartphone hardware provided for graphics purposes (e.g., GPUs and RGB frame buffers) for use instead with RDF triples, such as for searching and semantic reasoning.
- Fig. 22 is a conceptual view of a memory used to store a frame of image data, which may have dimensions of 128 x 128 pixels.
- Each pixel has three component color values: one red, one green, one blue.
- An illustrative pixel is shown with RGB color values {43,35,216}. These data correspond to a pixel having a color akin to royal blue.
- This conceptual arrangement maps well to the storage of RDF triples. Instead of storing the three components of pixel representations, this memory serves to store the three components of RDF triples - commonly called the Subject, the Predicate, and the Object.
- the data stored in image memory locations typically comprise 8-bit values (i.e., one each for red, green, blue). Each value can be an integer in the range of 0-255. When repurposed for RDF use, the RDF components are similarly expressed as integer codes in the range of 0-255.
- An auxiliary data structure, such as a table, can map different 8-bit RDF codes to associated strings, integers, or real number values.
- a first table may map people's names to integer codes, e.g. :
- Such a table may be dedicated to a single component of the RDF triples (e.g., the Subject data), or it can serve two or more.
- the data may be all of the same type (e.g., people's names), or data of different types may be included. Not every 8-bit code need be mapped to a corresponding datum.
- a further table is used to associate 8-bit codes with different Predicates involving people, e.g.: 0
- BOB HasChild TED can thus be expressed as the triple of 8-bit codes {6,3,4}. It will be recognized that the meanings of the first and third codes (6 and 4) are indicated by Table I, while the meaning of the second code (3) is indicated by Table II.
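- The encoding of a triple into three 8-bit codes, and its storage across three memory planes, can be sketched as follows. Only the three table entries stated in the text above (BOB = 6, TED = 4, HasChild = 3) are included; the function names and the use of three `bytearray` planes to stand in for a 128 x 128 frame buffer are assumptions made for illustration.

```python
# Table entries stated in the text; the full tables hold more rows.
PEOPLE_CODES = {"BOB": 6, "TED": 4}      # Table I (Subjects and Objects)
PREDICATE_CODES = {"HasChild": 3}        # Table II (Predicates)

WIDTH, HEIGHT = 128, 128


def new_memory():
    """Three planes, repurposed as Subject, Predicate and Object planes."""
    return [bytearray(WIDTH * HEIGHT) for _ in range(3)]


def encode(subject, predicate, obj):
    """Map a textual triple to its three 8-bit codes."""
    return (PEOPLE_CODES[subject], PREDICATE_CODES[predicate], PEOPLE_CODES[obj])


def store_triple(memory, row, col, triple):
    """Write one code into each plane at the same (row, col) location."""
    for plane, code in zip(memory, triple):
        plane[row * WIDTH + col] = code  # bytearray rejects values outside 0-255


def load_triple(memory, row, col):
    """Read the three codes stored at (row, col)."""
    return tuple(plane[row * WIDTH + col] for plane in memory)
```

Because each plane element is a single byte, the 0-255 code range noted earlier is enforced by the storage itself.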
- Fig. 23 shows the same memory arrangement as Fig. 22, but now repurposed for RDF use.
- the 8-bit integer codes {6,3,4} are stored in corresponding memory locations in the three planes - which now represent Subjects, Predicates and Objects.
- FIG. 24 shows a few of potentially thousands of such triples that may be stored in the memory.
- the royal blue pixel is stored in the memory at a location that corresponds to its position of desired presentation in a rendered image.
- frame buffers in smartphones typically have a one-to-one mapping with pixel elements in the display.
- the position at which the royal blue data {43,35,216} is stored in memory affects where, in the picture that will be rendered from the memory, that blue pixel appears.
- RDF triple data in the Fig. 23 memory is processed to apply a semantic reasoning rule.
- the reasoning infers additional relationship information between people.
- FIG. 25 shows a small excerpt of a smartphone memory, populated with a few RDF triples. (Both the 8-bit codes, and the corresponding text, are depicted for explanatory
- the RDF data asserts that Alice has a sister Helen. Moreover, the data asserts that Bob has two sisters (Mary and Sue), a child (Ted), and two brothers (Chuck and John).
- the GPU is programmed to apply rule-based reasoning to discern a new type of relationship between individuals - that of being an uncle.
- an illustrative rule for this new relationship may be: if a person has both a child and a brother, then the brother is the child's uncle.
- the rule may be expressed as:
- the memory is conceptually divided into 3x3 blocks (302, 304, etc.) - each devoted to a different RDF Subject. This is shown by the dark lines in Fig. 25. Up to nine different triple assertions about each RDF Subject can be stored in such a 3x3 block.
- the GPU's operation table is loaded with instructions to execute the above procedure.
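The uncle rule can be sketched for a single Subject block as follows. This is an illustration under stated assumptions, not the procedure of the specification: the codes 3 (HasChild) and 5 (HasBrother) follow the codes used elsewhere in this discussion, the code 7 for HasUncle is hypothetical, and so are the person codes 2 and 9 used in the example block.

```python
HAS_CHILD = 3    # code used for HasChild in the {6,3,4} example
HAS_BROTHER = 5  # code assumed for HasBrother
HAS_UNCLE = 7    # hypothetical code for the inferred Predicate
EMPTY = (0, 0, 0)


def infer_uncles(block):
    """Apply the uncle rule to one block.

    `block` is a list of up to nine (subject, predicate, object) code
    triples that all share the same Subject; empty cells are (0, 0, 0).
    Returns new (child, HAS_UNCLE, brother) triples.
    """
    children = [o for (_, p, o) in block if p == HAS_CHILD]
    brothers = [o for (_, p, o) in block if p == HAS_BROTHER]
    # Every brother of the Subject is an uncle of every child of the Subject.
    return [(child, HAS_UNCLE, brother) for child in children for brother in brothers]


# Subject 6 (BOB) with one child (4 = TED) and two brothers (codes 2 and 9
# are made up for this example), padded to nine cells.
bob_block = [(6, 5, 2), (6, 5, 9), (6, 3, 4)] + [EMPTY] * 6
new_assertions = infer_uncles(bob_block)
```

A block with no child or no brother yields an empty list, which is the case the later screening checks are designed to detect cheaply.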
- the foregoing procedure can sometimes be shortened by imposing a further spatial constraint on the storage of triples in the memory. Namely, in addition to grouping triples with the same Subject together in a common block, the triples are also ordered within the block based on their Predicate codes. Such sorting often allows the nine predicates to be checked for a particular code without an exhaustive search.
- In Fig. 25, the Predicates are listed in descending order, starting with the upper left cell of each block.
- the check can stop when a code less than "5" is encountered.
- the fifth Predicate checked in block 304 of Fig. 25 (i.e., the center cell in the 3x3 block)
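The early-exit check on a sorted block can be sketched as below. The function name and the sample Predicate codes are hypothetical (in particular, the code 7 is made up); the only behaviour taken from the text is that the scan can stop once a code smaller than the sought code is seen.

```python
def block_has_predicate(predicates_desc, wanted):
    """Scan Predicate codes that are sorted in descending order.

    Returns (found, cells_checked). The scan stops as soon as a code
    smaller than `wanted` is seen, because no later cell can match.
    """
    checked = 0
    for code in predicates_desc:
        checked += 1
        if code == wanted:
            return True, checked
        if code < wanted:
            return False, checked
    return False, checked


# Hypothetical block: nine Predicate codes, descending, 0 for empty cells.
sample_block = [7, 7, 5, 5, 3, 0, 0, 0, 0]
```

For this sample, a search for code 4 ends at the fifth cell (the centre of the 3x3 block) instead of examining all nine.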
- Figs. 26 and 27 show a subset of the templates involved. In these templates, a blank box indicates "don't care." (Fig. 27 simply gives letter names to each of the triples in the block, to ease reference when discussing the templates of Fig. 26.)
- the GPU core checks the 3x3 Predicate plane of a block in the triple memory (e.g., 304) against each of the templates, to identify matching code patterns. For each match, a new "HasUncle" assertion is generated.
- just as sorting the triples in the Fig. 25 memory can aid the first-defined procedure, it can similarly aid the template-based procedure.
- the number of required templates can be halved by sorting the triples by Predicate.
- Some implementations of the present technology make use of number theory, to help or speed reasoning.
- a number theory procedure may be applied first - as a check to determine whether there is any "HasUncle” assertion to be discerned from input data in a block. Only if this preliminary check is affirmative is the template matching procedure, or another such procedure, invoked.
- the block does not have both a 3 and a 5. It therefore cannot generate any "HasUncle” assertions, and the template-matching procedure (or other such procedure) can be skipped as moot.
- the same multiplication product can also be used to screen for presence of one or more
- Prime codes There are 54 different primes among the integers 0-255. If these prime codes are assigned to Predicates that may be ANDed together by semantic reasoning rules (with such assignment perhaps skipping other values, as in Table II), then the presence (or co-presence) of any group of them within a block of Predicate data can be determined by checking whether the product of all nine Predicate codes is evenly divisible by the product of the group of primes. (E.g., to check for the co-occurrence of 2, 3 and 11, check for divisibility by 66.)
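The divisibility screen can be sketched in a few lines. This is a minimal illustration that assumes every assigned Predicate code is prime and that empty cells hold 1 rather than 0 (a 0 would zero the product and make every check succeed); the function name is hypothetical.

```python
from math import prod


def may_contain_all(predicate_codes, required_primes):
    """Screen a block for the co-presence of several prime Predicate codes.

    Assumes each assigned Predicate code is prime and that empty cells
    hold 1, so they do not affect the product.
    """
    product = prod(predicate_codes)
    return product % prod(required_primes) == 0


block_a = [2, 3, 11, 1, 1, 1, 1, 1, 1]  # contains 2, 3 and 11
block_b = [2, 3, 7, 1, 1, 1, 1, 1, 1]   # lacks 11
```

Python integers are unbounded, so the product is exact here. On fixed-width hardware the product of nine 8-bit codes can exceed 64 bits (251 to the ninth power is roughly 3.9e21), so an implementation would need a wider accumulator or a different formulation.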
- the GPU may not have one core for each 3x3 block of triple data.
- the memory may have 1000 3x3 blocks of triple data, while the GPU may have only 200 cores.
- the GPU may apply the prime-screening procedure to the first 200 blocks, and copy blocks found to have "HasUncle" relations to a frame buffer.
- the process repeats for the second, third, fourth, and fifth 200 blocks, with copies of blocks determined to have "HasUncle" relations being added to the frame buffer.
- the earlier-detailed pattern matching (or another procedure) is run on the blocks in the frame buffer (all of which are known to have latent "HasUncle" relations), to generate the new assertions.
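The batched screening can be sketched as follows, with an ordinary loop standing in for the parallel GPU cores. The function name is hypothetical, the prime codes 3 and 5 are assumed to denote HasChild and HasBrother, and empty cells are assumed to hold 1.

```python
from math import prod

HAS_CHILD, HAS_BROTHER = 3, 5  # prime codes assumed for the two Predicates
NUM_CORES = 200                # cores assumed available per pass


def screen_in_batches(blocks, num_cores=NUM_CORES):
    """Return the blocks that pass the prime screen, plus the pass count.

    `blocks` is a list of blocks; each block is a list of nine Predicate
    codes, with 1 standing in for empty cells.
    """
    frame_buffer = []
    passes = 0
    for start in range(0, len(blocks), num_cores):
        batch = blocks[start:start + num_cores]  # one block per core
        passes += 1
        for block in batch:  # conceptually runs in parallel on the GPU
            if prod(block) % (HAS_CHILD * HAS_BROTHER) == 0:
                frame_buffer.append(block)  # latent "HasUncle" relation
    return frame_buffer, passes


# 1000 synthetic blocks; every fourth one holds both a 3 and a 5.
blocks = [[3, 5] + [1] * 7 if i % 4 == 0 else [3] + [1] * 8 for i in range(1000)]
candidates, passes = screen_in_batches(blocks)
```

Only the blocks collected in `candidates` would then be handed to the more expensive template-matching step.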
- Product-of-primes is one type of number theory that can be applied. There are many others. Another class involves additive number theory. Consider the following table of predicate codes, for a simple example:
- This table is sparse; most 8-bit codes are reserved from assignment in order to yield desired number theory results when Predicates are combined. In fact, the only codes in this table are 1, 10 and 100.
- This assignment of Predicate values enables another check of whether one or more "HasUncle” relationships may be reasoned from a given block of triples.
- the nine Predicate codes in a block are summed. (Again, a "0" is used for any empty cells.)
- This particular sparse assignment of integers is designed so that, if there is at least one "HasChild" Predicate, and at least one "HasBrother" Predicate, each of the last two decimal digits of the sum will be non-zero.
- a GPU core performs this check and, if it is met, the GPU can then further process the block to extract the new "HasUncle" assertion(s), such as with one of the above-described procedures.
- a variant of this additive procedure can also check for one or more "HasAunt" relationships. In this check, a value of 100 is first subtracted from the sum. The core then checks that (1) the result is positive; and (2) the last digit of the result is non-zero. If these conditions are met, then one or more "HasAunt" relationships can be asserted from the data.
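The two additive checks can be sketched as below. The assignment HasChild = 1, HasBrother = 10, HasSister = 100 is an assumption inferred from the checks described above (the table itself is not reproduced here), and the function names are hypothetical. With at most nine cells per block, no decimal digit can overflow into the next.

```python
HAS_CHILD, HAS_BROTHER, HAS_SISTER = 1, 10, 100  # assumed assignment


def may_have_uncle(predicate_codes):
    """True if the last two decimal digits of the sum are both non-zero."""
    total = sum(predicate_codes)  # empty cells contribute 0
    return total % 10 != 0 and (total // 10) % 10 != 0


def may_have_aunt(predicate_codes):
    """True if (sum - 100) is positive and its last digit is non-zero."""
    result = sum(predicate_codes) - 100
    return result > 0 and result % 10 != 0


# Two sisters, one child, two brothers, four empty cells: sum is 221.
bob_codes = [100, 100, 1, 10, 10, 0, 0, 0, 0]
# A single sister and nothing else: sum is 100.
alice_codes = [100, 0, 0, 0, 0, 0, 0, 0, 0]
```

In effect each decimal digit of the sum counts the occurrences of one Predicate, so a single addition answers several presence questions at once.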
- the reasoning rules may involve more than two Predicates.
- the number of triples in each block may be different than nine. Indeed, uniform block organization of memory is not required; some implementations may have blocks of varying sizes, or dispense with block organization altogether.
- the GPU cores may access overlapping areas of memory (e.g., overlapping blocks).
- Each plane of the triple memory may have a bit-depth other than 8 (e.g., 16).
- the data may be grouped by identity of Predicate or Object, rather than identity of Subject.
- the selection of particular 8-bit codes to assign to different Predicates (or Subjects or Objects) will often depend on the particular context.
- the triple store contains information about vehicles for sale.
- This information may have been automatically downloaded to a smartphone's memory in response to a user capturing an image of a vehicle section of a classified advertising publication, using a smartphone camera (see, e.g., application 13/079,327).
- the 8-bit Subject codes may correspond to text strings identifying different vehicles, e.g.:
- This Table IV may be regarded as the main Subject table. Associated with each of these Subjects will typically be multiple Predicates and Objects.
- the 8-bit Predicate codes, and their associated meanings, may be:
- Table V may be regarded as the main Predicate Table. (The Predicates in this table are chosen for purposes of illustration. Many implementations will employ standardized vocabularies, such as those of established OWL ontologies.)
- the 8-bit Object codes, and their associated meanings, may be:
- Table VI may be regarded as the main Object table.
- triples in the smartphone memory can express assertions using 8-bit codes selected from these three vocabulary tables, e.g.
- some entries in the main Object table may be used only with one of the entries in the main Predicate table. For example, Object code 62 (e.g., 6-7 tons) may be used only with the HasGrossVehicleWeight predicate. Other entries in the Object table may be used with several entries in the Predicate table. For example, Object codes 2-5 might be used both with the HasPassengerCapacity predicate, and with the HasDoors predicate. (Similarly, Object codes 17-19 might be used both with the HasExteriorColor predicate, and with the HasUpholsteryColor predicate.)
- the number of possible Object values exceeds the 256 that can be accommodated in the 8-bit memory plane.
- each of 250 vehicles may have both a different price and different telephone number associated with it, i.e., 500 different values.
- an Object code of "0" can be specified in such triples.
- This different structure may be identified by the Predicate name (or its number equivalent).
- the triple {3,4,0} concerns the price of the Winnebago.
- the Object code "0" indicates that the price is not indicated by a value indexed by an 8-bit code in the main Object table (i.e., Table VI above). Instead, the "0" directs the smartphone to consult auxiliary memory table #4 (referring to Predicate value 4).
- Auxiliary memory table #4 may have the prices for all the vehicles, associated with their corresponding Subject codes (given in parentheses for ease of understanding), e.g.:
- auxiliary tables may be sorted by the associated Object values (here, price), rather than the Subject codes - to facilitate searching.
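The redirection through an Object code of 0 can be sketched as follows. The dictionary layout and function names are hypothetical. The $24,500 price for Subject 5 and the Object entries 17 and 7 are quoted elsewhere in this discussion; the price shown for Subject 3 is made up for illustration.

```python
HAS_PRICE = 4

# Main Object table: two entries quoted elsewhere in the text.
MAIN_OBJECTS = {17: "White", 7: "6"}

# Auxiliary tables, indexed first by Predicate code, then by Subject code.
# The entry for Subject 3 is a made-up value.
AUX_TABLES = {HAS_PRICE: {5: 24500, 3: 61000}}


def resolve_object(triple):
    """Return the Object value of a triple, following the 0 redirection."""
    subject, predicate, obj = triple
    if obj == 0:
        # Code 0: the value lives in the auxiliary table named by the Predicate.
        return AUX_TABLES[predicate][subject]
    return MAIN_OBJECTS[obj]


def aux_sorted_by_value(predicate):
    """List (subject, value) pairs ordered by value, to ease range searches."""
    return sorted(AUX_TABLES[predicate].items(), key=lambda item: item[1])
```

Sorting by value rather than by Subject code lets a range query stop as soon as the values pass the upper limit.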
- the smartphone GPU can near-instantly filter the data stored in the main Subject-Predicate- Object memory to identify vehicles with certain sought-for parameters (i.e., those expressed in the main Object table). For example, if the user is interested in (1) trucks, (2) that can seat 4-6 passengers, these parameters can be entered using a conventional smartphone graphical user interface (GUI), and the results can be quickly determined.
- One illustrative GUI presents drop-down menus, or scrollable selection wheels, that are populated with literals drawn from the Predicate and Object main tables.
- An auxiliary GUI table may be used to facilitate the display of information, e.g., to provide plain English counterparts to the Predicates, and to indicate the particular codes by which searches can be keyed.
- Figs. 28, 29A and 29B show an example.
- One or more tables 400, or other data structure(s), store information used in generating GUI menus.
- a sequence of GUI menus, 402, 404, etc., is presented on the smartphone screen, and enables the user to enter desired search parameters.
- the illustrated GUI 402 has a first scrollable window portion 420 in which different menu legends from column 410 of table 400 are selectably displayed. As depicted, the user has scrolled to the "What are you looking for?" option.
- a second scrollable window 422 is populated with second level menu choices that correspond to the selection shown in window portion 420, as determined by reference to table 400. For example, since the user has scrolled window portion 420 to "What are you looking for?" the smartphone responds by presenting choices such as "Car," "Truck," "Motorcycle," and "Other" in the second window portion 422. These particular text strings are drawn from column 412 of table 400, where they correspond to the "What are you looking for?" top level menu. As depicted, the user has scrolled the window 422 to indicate "Truck."
- the GUI 402 further includes a button 424 that the user can tap to enter more search parameters.
- the user can tap a "Get Results" button 426 that presents results of a search based on the user-entered parameter(s).
- the GUI stores the values just-entered by the user (i.e., "What are you looking for?" and "Truck"), or 8-bit code values associated with such values, and then allows the user to interact with window 420, and then window 422, again. This time the user selects "What passenger capacity?" from window 420.
- the smartphone knows to populate the second window
- a flag (not shown) in table 400 can signal to the software that it should render a second window 422a, in which the user can specify an upper range limit, when the "What passenger capacity?" menu option is selected. (The original window 422 then serves as a lower range limit.)
- In Fig. 29B, the user has scrolled window 422 to "4," and window 422a to "6." The user is thus interested in trucks that can seat between 4 and 6 passengers.
- the user can then request search results by tapping the "Get Results” button 426.
- the search of the triple store can commence. (Alternatively, it may have commenced earlier, i.e., when the user completed entry of a first search parameter ("Truck") by tapping button 424. That is, the search can be conducted in a series of successive screening operations, so that when the user taps the "Get Results" button, only the final parameter needs to be searched within a previously-determined set of interim search results.)
- Table 400 indicates how the smartphone processor should search the stored data to identify vehicles meeting the user's search criteria.
- row 432 of table 400 indicates that this corresponds to a Predicate code of 12 (HasVehicleType), and an Object code of 115 (Truck).
- the GPU searches the memory for triples that meet these criteria.
- such a search can use thresholding, an operation at which GPU cores excel. That is, the memory can be filtered to identify triples having Predicates greater than 11 and less than 13.
- the interim results from this initial operation - which comprise all triples with the HasVehicleType Predicate - may be copied to a new frame buffer. (Or triples not meeting this threshold test can be set to {0,0,0} - "black" in image processing terms.)
- triples may be identified by this step for further processing - typically one triple for each of the vehicles, e.g., {1,12,112} - the Honda CRX; {2,12,115} - the Ford Ranger; {3,12,116} - the Winnebago; (the Toyota Tacoma); {4,12,17} - the Toyota Sienna; etc.
- a second search is then conducted across these interim results (e.g., in the frame buffer) - this time to identify triples having Object code 115 (i.e., for "Truck" objects). Triples that don't have an Object code of 115 can be deleted (or set to "black").
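The two-phase screening can be sketched with ordinary list operations standing in for GPU thresholding. The function names are hypothetical, and the sample memory is illustrative: the triple (5, 12, 115) for the Toyota Tacoma is an assumption consistent with the codes discussed in the text, not a value quoted from it.

```python
BLACK = (0, 0, 0)
HAS_VEHICLE_TYPE = 12
TRUCK = 115


def threshold_predicate(triples, low, high):
    """Keep triples whose Predicate lies strictly between low and high."""
    return [t if low < t[1] < high else BLACK for t in triples]


def match_object(triples, code):
    """Blank out every surviving triple whose Object code differs from `code`."""
    return [t if t != BLACK and t[2] == code else BLACK for t in triples]


# Illustrative memory contents (see the assumptions stated above).
memory = [(1, 12, 112), (3, 12, 116), (5, 12, 115), (5, 1, 17), (5, 2, 7)]

# Phase 1: Predicates greater than 11 and less than 13.
interim = threshold_predicate(memory, HAS_VEHICLE_TYPE - 1, HAS_VEHICLE_TYPE + 1)
# Phase 2: Object code 115 ("Truck") within the interim results.
final = match_object(interim, TRUCK)
matching_subjects = sorted({t[0] for t in final if t != BLACK})
```

Blanking rather than deleting keeps every triple at its original position, which matches the "set to black" option and preserves the block layout.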
- From the results of this second phase of search, the smartphone knows the Subject code for the vehicle matching the user's query: 5. (There is one match in this example, but in other instances, there may be several matches.) The smartphone next prepares search result information for presentation to the user. This result-reporting phase of operation is illustrated by reference to Fig. 30.
- Knowing which Subject code(s) corresponds to the vehicle meeting the user's queries, the smartphone now identifies all triples in the memory having a Subject code of 5. Multiple triples are found - a few of which are shown in Fig. 30 (e.g., {5,1,17}, {5,2,7}, {5,3,58}, etc.).
- the main Subject, Predicate and Object tables (tables IV, V and VI, above) are consulted for the strings or other values associated with the respective Subject, Predicate and Object Codes.
- the first triple, {5,1,17}, indicates "2007 Toyota Tacoma HasExteriorColor White."
- the second triple, {5,2,7}, indicates "2007 Toyota Tacoma HasPassengerCapacity 6."
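Decoding code triples back into readable assertions amounts to three table lookups, as in this minimal sketch. The table excerpts contain only the entries quoted in the two examples above; the dictionary and function names are hypothetical.

```python
# Excerpts of the main Subject, Predicate and Object tables, limited to
# the entries that appear in the two examples.
SUBJECTS = {5: "2007 Toyota Tacoma"}
PREDICATES = {1: "HasExteriorColor", 2: "HasPassengerCapacity"}
OBJECTS = {17: "White", 7: "6"}


def describe(triple):
    """Turn a (subject, predicate, object) code triple into text."""
    subject, predicate, obj = triple
    return f"{SUBJECTS[subject]} {PREDICATES[predicate]} {OBJECTS[obj]}"


lines = [describe(t) for t in [(5, 1, 17), (5, 2, 7)]]
```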
- the smartphone software fills a template form, which may label different data with plain English titles (e.g., "Color" instead of "HasExteriorColor"), and presents a listing (e.g., including all available parameters from the Predicate table V, above) to the user on the smartphone screen.
- the phone may use different templates based on certain parameters, e.g., the template used for a Truck may be different than that used for a Car.
- the templates may be obtained, as needed, from cloud storage, or they may be resident in smartphone memory.
- auxiliary tables corresponding to the Predicates (e.g., Auxiliary table #4 provides HasPrice values).
- the software populates the form with information indicating that the price of the Toyota Tacoma is $24,500, and the seller's phone number is 503-555-1234.
- Some parameters may not be specified in the data downloaded with the triples into the smartphone, but may instead be pulled from remote triple stores, e.g., in the cloud (or from Google-like text searches). For example, EPA mileage is a government statistic that is readily available on-line, and can be obtained to augment the other vehicle information.
- An exemplary screen presenting results of such a user query may include one or more photographs (e.g., obtained from a URL indicated by the HasLinkForMoreInfo Predicate), together with text composed using the referenced template form.
- Such text may read, e.g.:
- a single vehicle may be detailed per screen display, with additional vehicles brought into view by a swiping motion across the touch screen display. More details about presenting such information are found, e.g., in application 13/079,327.
- triple stores utilizing more than three 8-bit data planes can be used.
- Some images are stored in 4-plane representations, e.g., Cyan/Magenta/Yellow/Black, or RGBA - where the A stands for alpha, or transparency.
- While generally regarded as an enhanced form of "triple," such set of data may also be called a "quadruple," or more generically an "N-tuple" (where N = 4).
- a fourth 8-bit data plane enables various features.
- Fig. 31 shows a portion of an illustrative memory that includes the three 8-bit planes discussed earlier, together with a fourth 8-bit plane dedicated to storage of codes for price. This memory is virtually organized into 4x4 blocks - each dedicated to a different Subject. The depicted excerpt details codes associated with Subject 5 (the Toyota Tacoma truck).
- the triple with Predicate code 4 (i.e., HasPrice) has 255 for its Object code.
- a code of 255 instructs the software to refer to a further 8-bit plane for an associated code (the particular plane being indicated by the Predicate code).
- the associated code is 218.
- a table can associate different 8-bit codes with different values. It is advantageous, in some implementations, to assign the price codes in a sorted order, e.g., with smaller codes corresponding to smaller prices.
- a sample table may be:
- An advantage of this arrangement is that it facilitates searching, since the techniques detailed above - exploiting the GPU's speed at processing 8-bit integers from image storage - can be utilized.
- the user interface of Fig. 29B can inquire "What price?" A pair of windows 422, 422a then presents controls through which the user can scroll among actual prices of vehicles detailed in the memory - setting a lowest price and a highest price. Thresholding, or other such GPU operation, is then applied to the corresponding codes in the Price memory plane to quickly identify Subjects meeting the specified price criteria.
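Because price codes are assigned in ascending price order, a price range maps to a contiguous code range, and the search reduces to a threshold on 8-bit values. The sketch below illustrates this with hypothetical data: the price table is made up except that code 218 is assumed to denote the $24,500 Tacoma price, and the price plane is reduced to one code per Subject for brevity.

```python
from bisect import bisect_left, bisect_right

# Hypothetical price table: codes assigned in ascending price order.
PRICE_BY_CODE = {216: 22000, 217: 23750, 218: 24500, 219: 26900}

# Price plane, reduced to one code per Subject (Subject code -> price code).
PRICE_PLANE = {1: 216, 2: 217, 5: 218, 3: 219}


def price_range_to_codes(lowest, highest):
    """Translate an inclusive price range into the matching 8-bit codes."""
    codes = sorted(PRICE_BY_CODE)
    prices = [PRICE_BY_CODE[c] for c in codes]  # ascending, by construction
    lo = bisect_left(prices, lowest)
    hi = bisect_right(prices, highest)
    return codes[lo:hi]


def subjects_in_price_range(lowest, highest):
    """Threshold the price plane between the lowest and highest codes."""
    codes = price_range_to_codes(lowest, highest)
    if not codes:
        return []
    lo_code, hi_code = codes[0], codes[-1]
    return sorted(s for s, c in PRICE_PLANE.items() if lo_code <= c <= hi_code)
```

The translation from prices to codes happens once per query; the per-cell work is then only the two integer comparisons that a thresholding operation performs.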
- FIG. 31 shows another such code plane - dedicated to Engine size (which corresponds to Predicate 6). Again, storage of corresponding codes in this 8-bit plane allows search queries to be executed rapidly.
- GPU shaders typically are sync'd with the display screens they drive. Even the modest GPU in the iPhone 4 phone refreshes its 640 x 960 pixel display screen at about 25 frames per second.
- most - or even all - of the Predicates may have their own plane of 8-bit memory for storage of codes, like those depicted for Price and Engine Size in Fig. 31.
- It will be recognized that searching is facilitated by assigning Object codes to express a semantic ordering. This is clear from the foregoing example concerning passenger capacity, where the different numeric values are ordered in ascending fashion, with corresponding ascendency of the associated 8-bit codes. This enables range-based searching by specifying upper and lower codes, and performing a thresholding operation. A similar ordering can be effected with parameters that are not purely numeric. For example, colors may be ordered in a semantic manner, e.g., based on corresponding wavelength maxima, and/or intensity or luminance.
- all of the blues may have similar codes, and all the reds may have similar codes, with the blue and red codes being spaced apart from each other in a 0-255 code space.
- Range-based color searching can then readily be performed. (E.g., a user may select "Navy Blue" in window 422 of Fig. 29B, and select "Aquamarine" in window 422a, and vehicles having any color code between the color codes of these two range limits are identified.)
- the smartphone can invert these triples, and present the resulting Subjects (e.g., BLUE, RED, etc.) in a GUI, such as in windows 422 and 422a of Fig. 29B.
- the two values selected in windows 422 and 422a do not define a range of parameters, but rather define two different values that are ORed together in the search, so that triples meeting either value are selected.
- the phone may query the user to learn some facts, such as the number of miles the user drives per year. (This information may be available elsewhere, such as in a user profile stored on a networked computer, or in a database in the user's present car.) If the user responds to this query by indicating that 50,000 miles is a typical annual mileage, the phone may employ semantic reasoning to discern that per-mile vehicle operating costs are likely of importance to the user. With this inferred information, the phone may decide to render results of user-directed vehicle searches by presenting vehicles having the highest fuel economy first among the search results (absent other instruction from the user).
- the phone can reason using other data. For example, semantic reasoning can be used to conclude that a vehicle with a 1300 cc engine likely has better fuel economy than a vehicle with a 4.7 liter engine. Similarly, such reasoning, or a networked knowledge base, may indicate that diesel engines tend to have better fuel economy than gas engines. Again, such knowledge can inform presentation of the search results - simply based on the fact that the user drives 50,000 miles per year.
- Augmented reality techniques are known for recognizing image features, and overlaying information such as labels.
- the superimposed overlay may be geometrically registered with the image feature(s), so that as the image features move within the field of view, the overlay moves with a corresponding motion.
- Another aspect of the present technology builds on such techniques by providing augmentation in the form of a background, rather than an overlay.
- An exemplary image is the Beatles' Abbey Road record album, depicting the four Beatles walking across a crosswalk on Abbey Road.
- the four Beatles may be excerpted from the image. Two images can thereby be formed - a first image with just the four Beatles (surrounded by a void, or a uniform color - such as white), and a second image with just the background (which may have a void (or white) where the Beatles were, or not).
- the first image may be printed on a substrate, and a smartphone is used to capture imagery of the substrate, e.g., in a video capture mode.
- Software in the smartphone determines the pose of the camera relative to the first image. With this information, the software geometrically warps the second
- the phone then composites the two images - the phone-captured imagery of the four Beatles, and the background - warped to provide the original backdrop of the image.
- the two images complement each other to present a unified image that appears like the original album cover, as if viewed from the phone's pose relative to the substrate.
- background images may be used, instead of the original.
- the background image may depict Broadway in Times Square, New York.
- the excerpted Beatles - imaged from the printed substrate (or from an electronic display screen) may be superimposed on the new background image - which again is warped and scaled so that it appears with the same pose as the camera relative to the substrate.
- the augmentation is more akin to an underlay than to the traditional augmented reality overlay.
- Geometric warping and registration of the background image to match the substrate-camera pose can be done in various ways, such as using digital watermarks, salient image points, etc. If the first image has a QR code or other barcode, such feature can itself be used to discern pose information. Such techniques are further detailed elsewhere in this disclosure.
- the second (background) image can be modified based on changes in pose - to give a 3D effect. For example, additional background scenery may move into the frame if the user pans the camera. If the user tips the camera to point more downwardly, more of the street imagery can come into view (and some sky imagery recedes out of the top of the frame). As the camera pose changes, certain features of the second image become occluded - or become revealed - by changed perspective of nearer features depicted in the second image. Some such embodiments employ a 3D model to generate the background image - computing appropriate 2D views based on the phone's viewpoint.
- contours of such subject can be determined by reference to a database.
- the camera can then stitch the second image around the first image - occluding portions of the first image that are outside the database-defined contours of the main subject of the image (e.g., the four Beatles).
- the phone software provides a response that is appropriate to the location of the tap. If the user taps on John Lennon, content related to John Lennon is presented. Such taps invoke this behavior regardless of whether the tapped part of the display depicts imagery actually captured by the phone camera, or whether it depicts other imagery laid-in by the phone as an augmentation.
- the phone software outputs X- and Y- locations of the user's tap, which are then mapped to a particular location in the displayed imagery. Content corresponding to such location in the presented display of imagery can then be determined by known ways, such as by indexing a database with the tap coordinates, by decoding a watermark at that region, etc.
Linking Displays to Mobile Devices
- a watermark is embedded in an image/content/advertisement/video/user interface (e.g., a web page) that is to be presented on a display device, such as an LCD monitor.
- the embedding can be performed by the display device, by an associated computer, or by a remote source of the imagery.
- the watermark is readable with a detector present in a smartphone or other mobile device.
- the payload from the watermark logically links, through a table or other data structure, to a source of information that corresponds to the presented display.
- the information may be the URL address for the page.
- Advantages over other techniques include real estate savings (for an image displayed on screen, the watermark does not take up any additional space), embedding costs (cheaper than printed barcodes), all-digital workflow, covert feature (where required), communication channel between displayed content and mobile device. Applications are many - a few examples are detailed below.
- One application concerns mapping. Suppose a user is looking for directions on a desktop/laptop by using a mapping tool such as MapQuest, Yahoo Maps or Google Maps. After the desired map/directions are presented on the screen display (which is watermarked), the user points a mobile phone at the map/directions displayed on the screen.
- On reading the encoded watermark, the phone obtains a URL for the displayed directions, and loads the same page (using either a WiFi internet connection or through a communication link such as GPRS). At that point the user is ready to go with the map/directions directly on the mobile phone.
- the smartphone can directly link the map/directions with the GPS functionality, without having to manually enter all the location/address information.
- the payload information decoded by the phone from the watermarked desktop screen display can be transferred to the GPS device using a wireless (e.g. Bluetooth) connection.
- Another application concerns facilitating E-commerce.
- a person is looking at an ad for a shoe on their desktop/laptop, and this ad is watermarked. Pointing the mobile phone at the ad could directly take the person to a "checkout" page displayed on the mobile phone.
- Another application concerns syncing imagery.
- a user may like a particular image shown on a desktop screen, and want it on their smartphone. This can be accomplished by simply capturing an image of the screen display, using the phone. The phone decodes the watermark, and uses the payload thereby extracted to obtain a copy of the image from its original location.
- calendar syncing can be accomplished by capturing an image from a calendar program (e.g., Microsoft Outlook) on a desktop display.
- the phone decodes the watermark payload, and by reference to this information, obtains data to sync a local calendar with the displayed Outlook calendar.
- Another application is a visual bookmark.
- a user is viewing a web page on a desktop/laptop, and wants to bookmark that page for further browsing on the mobile phone (say on the commute home). If the web page has a watermark, the user can just point the phone at the page, and the bookmark for the page (or its corresponding mobile version) would automatically appear on the mobile phone.
- Yet another application concerns active links.
- the watermark can facilitate "active links." That is, just pointing the mobile device at the web page (or other relevant display) and reading the watermark automatically triggers an action - either on the mobile device, or on the computer connected to the display (through a wireless link).
- the foregoing concepts can be extended to video, to enable reading of dynamically changing watermarks by pointing the mobile device to a video streaming on a display screen.
- the assignee's published patent application 20100205628 notes the desirability of being able to transfer a game, or entertainment content, from one computer system (e.g., a desktop computer) to another computer system (e.g., a smartphone), without losing the user's place in the game/content flow. By such arrangements, a user can seamlessly continue an activity despite switching devices.
- the display data presented on a computer's screen is routinely digitally watermarked with an app-state-variant payload. That is, a display driver or other module in the computer regularly steganographically encodes the displayed data with a multi-bit identifier. This identifier is changed occasionally (e.g., every frame, or every 1-10 seconds, or at irregular intervals - such as when a threshold amount of change has taken place in the program or computer state). Each time the identifier is changed, the computer writes data that enables the "full state" of the computer system, or of a program being displayed on the screen, to be recovered.
- the data store in which this information is written can include several entries - one providing a base data state, and others providing successive updates (akin to how a video frame is sometimes encoded simply with data detailing its difference from a prior frame).
- a database (which can be as simple as a look-up table) identifies which part of the stored data is needed to recreate the device state corresponding to each watermark payload.
- Fig. 37 further details such an arrangement.
- a first computer e.g., a desktop computer
- This state data is desirably adequate to recreate the computer's state (or that of the program being displayed) on a different device.
- a new watermark ID is assigned (223, 224, etc.), and the screen display is thereafter encoded with this identifier.
- a corresponding update is made to a watermark look-up table.
- the stored state data is commonly of variable length (indicated by the lengths of the rectangles in Fig. 37). Occasionally a large block of data will be written ("Base Data” in Fig. 37). Subsequent blocks of stored data can simply be differential updates to the base data. After a further interval, another new, large, base data block may be written (e.g., "Base Data 38").
- the stored state data in this example, is written to a linear memory, with consecutive addresses (corresponding to the horizontal axis in Fig. 37).
- the "Base Data 37" is stored beginning at memory address 1004, and continues up through 1012.
- a first update, "37A", is stored beginning at address 1013, and continues up through 1016.
- a second update, "37B", is stored beginning at address 1017, and continues up through 1023. This continues with a third update, "37C."
- Base Data 38 a new block of base data
- a consumer uses a smartphone to take a picture of a display screen in which watermark 223 is encoded
- the consumer's smartphone decodes the watermark, and inputs it to a data structure (local or remote) that provides address information where corresponding state data is stored.
- the data structure returns the memory range 1004-1012. (Fig. 38 shows such a watermark lookup table.)
- the smartphone retrieves this range of data and, using it, recreates the executing program as it existed when watermark 223 was first encoded, albeit on a different device.
- the Fig. 38 table returns a memory range that encompasses the corresponding base data ("Base Data 37") and also extends to include the differential update "37A." Thus, it returns the memory address range 1004-1016. Again, the consumer's smartphone can use this information to recreate - on the smartphone - the execution state that existed on the desktop computer when watermark 224 was first encoded.
- Fig. 37 graphically shows the different memory ranges associated - in the Fig. 38 watermark look-up table - with each of the different watermark payloads.
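The look-up and recovery steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the table entries for payloads 223 and 224 use the address ranges given in the text, while the block layout, the dictionary-based state, and the names `WATERMARK_TABLE`, `BLOCKS` and `recover_state` are assumptions made for the example.

```python
# Look-up table: watermark payload -> (first address, last address), inclusive.
# The two ranges come from the text; the table structure itself is hypothetical.
WATERMARK_TABLE = {
    223: (1004, 1012),   # Base Data 37 only
    224: (1004, 1016),   # Base Data 37 plus update 37A
}

# Hypothetical contents of the linear memory: (start, end, kind, payload).
BLOCKS = [
    (1004, 1012, "base", {"level": 1, "score": 0}),
    (1013, 1016, "diff", {"score": 150}),
    (1017, 1023, "diff", {"level": 2}),
]

def recover_state(watermark_id):
    """Rebuild state from the base block plus any differential updates in range."""
    first, last = WATERMARK_TABLE[watermark_id]
    state = {}
    for start, end, kind, payload in BLOCKS:
        if start >= first and end <= last:
            if kind == "base":
                state = dict(payload)      # a base block resets the state
            else:
                state.update(payload)      # a differential block overrides changed fields
    return state
```

Storing only differences after each base block keeps the per-watermark cost small, at the price of reading a longer address range for later payloads.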
- LED office lighting is being used as an optical carrier for data signals - akin to an optical DSL network - communicating with optical modems attached to desktop computers.
- the Greenchip line of lighting by NXP Semiconductor includes LED lights (sometimes termed “SSLs” - solid state lights) with integrated IPv6 connectivity. That is, every light has its own internet address.
- “JenNet”-IP network software provides wireless connectivity for the LED devices.
- JenNet is a 6LoWPAN mesh-under tree network employing IEEE 802.15.4-based networking.
- an LED's operating parameters can be changed based on IPv6 data transmitted across the wiring network.
- LED lighting can communicate with smartphones, and other camera-equipped devices.
- the luminance or chrominance of the illumination is varied, to a human-imperceptible degree, to convey additional data. These subtle variations are reflected in imagery captured by the smartphone camera.
- a watermark decoding process executed by the smartphone processor then extracts the encoded information from the camera data.
- smartphone cameras capture image frames at less than 100 frames per second (more typically, 10 - 30 fps). But while small, this data rate nonetheless can convey useful information. If the illumination is modulated in two or more of the different color channels sensed by common smartphones - red, green, and blue - somewhat higher data rates can be achieved.
- it may be desirable to maintain constant luminance - despite color modulation. This can be accomplished by modulating two of the color channels to convey data, and modulating the third channel as needed to compensate for the luminance change due to the other colors, yielding a constant net luminance. Due to the eye's different sensitivity to different wavelengths of light, luminance is most dependent on the amount of green, and is least dependent on the amount of blue.
- An illustrative embodiment may vary red and blue to convey data, and vary green for luminance-compensation.
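The luminance compensation just described can be illustrated with a short sketch. The luma weights below are the Rec. 601 coefficients, which are an assumption here; the text only states that green contributes most and blue least. The function names and the 1% modulation depth are likewise illustrative.

```python
# Assumed luma weights (Rec. 601); not specified in the source text.
W_R, W_G, W_B = 0.299, 0.587, 0.114

def luma(r, g, b):
    """Weighted sum approximating perceived luminance."""
    return W_R * r + W_G * g + W_B * b

def compensate_green(d_red, d_blue):
    """Green change that cancels the luminance change caused by red and blue changes."""
    return -(W_R * d_red + W_B * d_blue) / W_G

def modulate(rgb, bit_red, bit_blue, depth=0.01):
    """Nudge red and blue up (bit 1) or down (bit 0); adjust green to hold luminance."""
    r, g, b = rgb
    d_red = depth if bit_red else -depth
    d_blue = depth if bit_blue else -depth
    return (r + d_red, g + compensate_green(d_red, d_blue), b + d_blue)
```

Because green carries the largest weight, only a small green adjustment is needed to offset the red and blue changes.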
- the watermark data payload can be represented using an error-correcting code, such as BCH or convolutional ("trellis") coding, to provide robustness against data errors.
- the resulting time-varying luminance or chrominance change can be applied to existing LED control signals (whether the LED is modulated with high speed data or not) to effect broadcast to proximate camera sensors.
- cameras decode plural bits of digital watermark data from a single frame of imagery, e.g., detecting slight differences in luminance or chrominance between different spatial parts (e.g., pixels) of the imagery.
- the present application decodes plural bits of digital watermark data from a sequence of frames, detecting slight differences in luminance or chrominance over time. Within a single frame, all parts of the captured imagery may be similarly influenced by the LED lighting signal.
- the watermark can be decoded from signals output from a single pixel, or from plural pixels, or from all of the pixels.
- the decoding application can sum or average the luminance and/or chrominance across all of the camera pixels, and analyze this aggregate signal for variations caused by the watermark encoding.
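A minimal sketch of this temporal decoding follows. It assumes one bit per captured frame and a simple comparison against the sequence mean; a real decoder would also need synchronization and error correction, which the sketch omits. The frame representation (a list of rows of luminance values) and the function names are assumptions.

```python
def frame_mean(frame):
    """Average value over all pixels of one frame (a list of rows of luminance values)."""
    total = sum(sum(row) for row in frame)
    count = sum(len(row) for row in frame)
    return total / count

def decode_bits(frames):
    """Hypothetical decoder: 1 if a frame is brighter than the sequence mean, else 0."""
    means = [frame_mean(f) for f in frames]
    baseline = sum(means) / len(means)
    return [1 if m > baseline else 0 for m in means]
```

Averaging over every pixel discards spatial detail, which is acceptable here because all parts of the frame are similarly influenced by the lighting signal.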
- LED bulb fixtures in the shoe section may be encoded with one particular identifier; those in the menswear section may be encoded with a different identifier.
- the phone can determine its location within a store.
- Another application is an LED lighting fixture equipped with a microphone or camera to sense data from the ambient media environment, and extract information based on the sensed environment.
- This may comprise detecting an audio watermark, or generating audio fingerprint data and recognizing a song based thereon, or recognizing a person's face from captured imagery, etc. Data related to this extracted information is then encoded in light emitted from the lighting fixture.
- LED automobile headlights can be modulated to convey - to oncoming vehicles - parameters of the automobile's operation, such as its speed and compass bearing.
- Such signal is preferably decoded using a light sensor system built into the car, rather than a user's smartphone.
- Such sensor system can be configured to capture data at a higher bandwidth than is possible with smartphone cameras.
- the headlamps may be encoded at a data rate of 1000 or 10,000 bits/second, and the sensor system can be configured to decode such rates.
- outdoor illumination at a business or residence address (e.g., a front porch light) can be encoded with data such as a street number
- the user can take action on it.
- the phone can give the user directions to another location within the store, or can present coupons for merchandise nearby.
- the phone can signal a warning if the oncoming vehicle is traveling at a rate more than ten percent above the speed limit (or the user's own speed).
- the street number can be presented on the user's phone, or the name of a business/resident at that location can be looked-up from public databases.
- light sources other than general purpose LED lighting can be controlled in the manner just described.
- television and laptop lighting can be modulated in this fashion.
- chrominance modulation may be unsuitable for color-critical television scenes (e.g., depicting skintone)
- other information displays are forgiving of chrominance variations (e.g., desktop color, web page backgrounds, etc.).
- Fig. 38 shows an illustrative embodiment employing luminance LED modulation.
- a light fixture-mounted device 320 includes one or more LEDs 322, a DC power supply 324, and a modulation arrangement 326.
- the depicted arrangement also includes a JenNet IPv6 remote control system (including a logic block 327 and an associated modulator 329), although this is not essential.
- the power supply 324 is conventional, and converts fixture AC power (e.g., 120 volts) into a DC voltage suitable for the LED(s) 322. (Although not particularly shown, the same power supply can provide needed voltage(s) to the modulation arrangement 326 and the JenNet system 327.)
- the modulation arrangement 326 includes a data receiver 332 that receives an input data signal 333, e.g., conveyed to the device by a radio or audio signal, and sensed by an antenna 328 or a microphone 330.
- the data receiver provides appropriate decoding (e.g., a watermark extraction process, in the case of an audio signal) to provide binary output data.
- This data is input to a convolutional encoder 334, which provides an output signal to a modulator 336, which varies the DC signal applied to the LED(s) 322 accordingly.
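For illustration, a convolutional encoder of the kind referenced above can be sketched in a few lines. The text does not give the code rate or generator polynomials, so the rate-1/2, constraint-length-3 code with generators 7 and 5 (octal) used below is an assumption chosen because it is a common textbook example.

```python
def conv_encode(bits):
    """Rate-1/2, constraint-length-3 convolutional encoder (generators 7 and 5, octal)."""
    s1 = s2 = 0                      # two-bit shift register, initially zero
    out = []
    for b in bits:
        out.append(b ^ s1 ^ s2)      # generator 111: current bit and both stored bits
        out.append(b ^ s2)           # generator 101: current bit and oldest stored bit
        s1, s2 = b, s1               # shift the register
    return out
```

Each input bit yields two output bits, so the redundancy that aids error recovery halves the payload rate.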
- the system 320 typically employs red/green/blue/white LED sources, which are driven with tri-stimulus pulse width modulation (PWM) control signals at a frequency of 1 kHz - 30 kHz.
- the durations of the driving pulses are lengthened and/or shortened to effect steganographic encoding of the data signal 333.
- the changes to pulse lengths are less than 25%, and may be less than 10%, 5% or 2%. Larger changes are acceptable if both positive and negative changes are made, e.g., corresponding to "1" and "0" outputs from the convolutional encoder, since their time average is typically zero.
- the particular modulation percentage depends on the application being served, and can be determined by simple experimentation. (E.g., for a given convolutional encoder, increase the percentage change to the PWM driving signals until unwanted visual effects just begin to appear under the most demanding illumination conditions - such as nighttime, and then reduce the percentage change until these effects are imperceptible.)
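The pulse-length adjustment can be sketched as below. The nominal pulse width of 50 microseconds and the 5% modulation depth are illustrative assumptions (the text gives only the 1 kHz - 30 kHz drive range and the upper bounds on the change), as is the function name.

```python
def pwm_pulse_widths(coded_bits, nominal_us=50.0, depth=0.05):
    """Lengthen the pulse for a 1 and shorten it for a 0 by the same fraction."""
    widths = []
    for b in coded_bits:
        factor = (1.0 + depth) if b else (1.0 - depth)
        widths.append(nominal_us * factor)
    return widths
```

With equal numbers of 1 and 0 outputs, the positive and negative changes cancel, so the time-averaged pulse width (and hence brightness) stays at its nominal value.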
- the data receiver 332 of Fig. 32 is replaced with a GPS receiver, or other location-sensing module.
- GPS Global Positioning System
- the light source emits illumination encoded with geolocation data.
- the system 320 does not employ a data receiver 332, but instead is hard-coded with a fixed plural-bit data payload (which may be set, e.g., by a ROM or a dip-switch arrangement).
- a payload can serve as a unique identifier for the system.
- when a receiving smartphone senses illumination from such system, and decodes the plural-bit identifier, the phone can transmit the identifier to a remote database (e.g., over the internet), which returns associated information (e.g., a house number, a store department name, etc.) for the phone's use.
- the data to be steganographically conveyed (i.e., at a bit rate sensible by a smartphone) is conveyed over the power-lines.
- PLC power line communication (e.g., PDSL or BPL)
- the technology employed by the Greenchip line of devices can be used.
- just as a smartphone can serve as a receiver of LED-based optical communication signals, it can similarly serve as a transmitter of such signals.
- Most smartphones include an LED “torch” to illuminate camera-captured scenes.
- Such an LED can be modulated, using the arrangements detailed above, to convey data optically from the phone.
- In contrast to Bluetooth and other short range communications technologies, such LED communication affords some measure of privacy, since a clear line of sight is typically required.
- a receiving system responds to the LED signals with responsive data.
- This data response can include information indicating the strength of the received optical signal (e.g., a number corresponding to a signal-to-noise metric).
- the originating phone can then reduce its LED driving power so as to provide an adequate, but not excessive, received signal strength at the second device. In addition to saving power, such reduction of LED driving current in this fashion further reduces the capability of unintended optical receivers to eavesdrop.
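This feedback loop can be sketched as a simple step controller. The target signal-to-noise value, step size, power limits and function name are all assumptions for illustration; the text specifies only that the reported signal strength is used to reduce driving power to an adequate, but not excessive, level.

```python
def adjust_drive_power(power, reported_snr_db, target_snr_db=12.0, step=0.1,
                       min_power=0.05, max_power=1.0):
    """Step the LED drive power toward the lowest level that still meets the target SNR."""
    if reported_snr_db > target_snr_db:
        power -= step          # signal stronger than needed: back off
    elif reported_snr_db < target_snr_db:
        power += step          # signal too weak: raise power
    return max(min_power, min(max_power, power))
```

The originating phone would call this once per response message, so the drive level converges over several exchanges rather than in one jump.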
- This responsive data sent back to the originating smartphone can be conveyed by wireless, optically, or otherwise.
- a method that includes receiving, in a device, data representing content (e.g., audio content).
- An algorithm is applied to the received data to derive identification data therefrom.
- a particular software program is selected for use with the content. This selecting is performed by a processor in the device that is configured to perform such act.
- the selected software program is then launched.
- selection of the particular software program used with the data is based on identification data derived from the data itself.
- a further method again involves receiving, in a device, data representing content (e.g., audio content), and applying an algorithm to the received data to derive identification data therefrom.
- a favored software program for use with the content is selected. This selecting is performed by a processor in the device that is configured to perform such act.
- the favored software program is then indicated to a user.
- the favored software program used with the data is selected based on identification data derived from the data itself.
- Yet another method involves receiving, in a device, data representing content (e.g., audio content).
- This content is of a type that can be rendered by first software in the device.
- the device indicates that second software should be used instead.
- a creator of the content designated the second software as preferred, for that creator's content.
- Still another method includes, with a microphone of a device, receiving audio from an ambient environment.
- An algorithm is applied to audio data representing the received audio to derive identification data therefrom.
- a corresponding fulfillment service is selected to be used for commerce involving the audio.
- first received audio leads to selection of a first fulfillment service to be used for commerce involving the first audio
- second, different, received audio leads to selection of a second, different fulfillment service to be used for commerce involving the second audio.
- a further method involves identifying content in a user's ambient environment (e.g., audio), the content having been earlier sampled by a user device.
- Information about the identified content is then published, optionally with user context information.
- one or more third party bids are received.
- a determination is made as to a winning bid, and a third party associated therewith.
- Software identified with the determined third party is then enabled to present a buying opportunity to the user on the user device.
- Yet another method includes receiving a set of digital data, corresponding to received content of a given type (e.g., audio, imagery or video). Identification data corresponding to the received content is generated, by which the received content can be distinguished from other content of the given type. By reference to the identification data, N application program(s) associated with the received content and not associated with all other content of the given type are determined, where N is one or more. Action is then taken with one or more of the determined application program(s).
- OS operating system
- the OS software serves to configure a processor to provide OS services to requesting application programs through defined interfaces.
- These OS services include deriving content identification data from signal data representing visual or auditory information, and publishing same for use by one or more application programs that subscribe to same through a defined interface.
- the apparatus includes a processing device and a memory.
- the memory contains operating system (OS) instructions that configure the processing device to provide OS services to requesting application programs through defined interfaces.
- the apparatus further includes a sensing system for sensing ambient visual and/or sonic stimulus from an environment of the device and for producing content data corresponding thereto.
- One of the OS services derives content identification data from the produced content data, and publishes same to requesting application programs that subscribe through a defined interface.
- Another aspect of the technology is an apparatus that includes a processing device and a memory.
- the memory contains operating system (OS) instructions that configure the processing device to provide OS services to requesting application programs through defined interfaces.
- One of these defined interfaces provides a content identification message queue that publishes content identification data corresponding to visual or sonic stimulus processed by the device, to which queue one or more application programs can subscribe.
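A publish/subscribe queue of this kind might be sketched as follows. The class is a hypothetical stand-in for an OS service, not an actual operating system API; the class name, method names and the keying of subscriptions by content type are assumptions.

```python
from collections import defaultdict

class ContentIdQueue:
    """Hypothetical stand-in for an OS-level content identification message queue."""

    def __init__(self):
        self._subscribers = defaultdict(list)   # content type -> list of callbacks

    def subscribe(self, content_type, callback):
        """Register an application callback for one type of content."""
        self._subscribers[content_type].append(callback)

    def publish(self, content_type, content_id):
        """Deliver newly derived identification data to every subscriber of that type."""
        for callback in self._subscribers[content_type]:
            callback(content_id)
```

Placing identification in a shared service means the content is analyzed once, and each subscribing application receives only the results it asked for.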
- Another aspect of the technology is a method that includes sensing sonic or visual stimuli from a user's ambient environment for a prolonged period (e.g., more than 15 minutes, an hour, or 6 hours), thereby noting content to which the user has been exposed during the period.
- a configured hardware processor is used to generate content identification data corresponding to several different items of content to which the user has been exposed during this period. Then, in the course of a subsequent user interaction with a user interface, the method identifies to the user, by title, plural different items of content noted from the user's environment, and enables the user to select among plural different actions with respect to at least one of the identified items of content.
- Another method involves sensing sonic or visual stimuli from a user's ambient environment, thereby noting content to which the user has been exposed.
- a configured hardware processor is used to generate content identification data corresponding to at least one such item of content.
- Two or more different software applications are then primed with information based, at least in part, on such generated content identification data.
- a further method involves receiving content data representing audio, image and/or video content using a consumer electronic device, where the content data is received in a file format of a type that the electronic device is capable of rendering to produce a content-related experience, by reason of a first software rendering program earlier installed or otherwise made available for use with the device.
- a proprietor of the content data for producing a content-related user experience with the received content data is determined.
- the method determines that none of these determined software rendering programs is presently installed or available for use with the device, and consequently refuses to render the received content data on the device, despite the capable first software rendering program.
- the consumer electronic device respects the prescription of the proprietor regarding the software rendering program(s) by which a content-related user experience is to be delivered to the user.
- Yet another method concerns rendering art using a user's device, where the art has been produced by a proprietor thereof.
- the art is represented in a data format that can be rendered by several different software programs.
- the method further includes consulting proprietor-specified metadata, using a programmed processor, to determine a subset of these programs that may be utilized to render the art data, and rendering the art data with one program of this subset.
- Another aspect of the technology is an apparatus including a processing device, a computer readable storage medium that stores several software programs for configuring the processing device, and a sensing system for generating data representing ambient visual or sonic content in an environment of the apparatus. At least one of the software programs is stored in the computer readable storage medium as a result of the apparatus sensing certain ambient content from the apparatus environment.
- a portable device (e.g., a smartphone)
- a processor
- a memory containing operating system software and plural software programs.
- a first of these programs serves to analyze data representing sonic or visual content to produce corresponding identification data, and publishes results of such analysis to an output.
- a second of these programs serves to monitor the output of the first program for certain content to which the second program is attuned, and take action when the certain content is noted.
- Still another method involves capturing imagery of a subject from a viewing pose, and presenting the captured imagery on a screen of a portable device. Optimality of the viewing pose is assessed by reference to the captured imagery. The method then overlays, on the presentation of captured imagery, an indicia that indicates a result of the assessment. This overlaid indicia changes with changes in the assessed optimality.
- a further method involves capturing plural frames of imagery using a portable device, the portable device having an image sensor and a second sensor. A subset of at least two of the captured frames is selected, based on data from the second sensor. These selected frames are then combined. The combination can make use of salient points, and the combined output can be provided to a digital watermark decoder in the portable device.
- a portable device (e.g., a smartphone)
- a processor (e.g., a central processing unit)
- a memory
- plural sensors including at least a camera and a microphone.
- the device components cooperate to define plural recognition agents, such as an audio fingerprint recognition agent, a barcode recognition agent and an image watermark recognition agent.
- the agents are configured to read data from and write data to a blackboard data structure of the memory.
- This blackboard data structure is also configured to receive data from the sensors and to compile queues of stored image data and audio data that the agents process.
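The blackboard arrangement can be illustrated with a short sketch. It assumes each agent scans a queue without consuming it, so that several agents (e.g., barcode and watermark) can examine the same image data; the class, function and field names are hypothetical.

```python
from collections import deque

class Blackboard:
    """Shared store: sensors append data, agents read it and write back findings."""

    def __init__(self):
        self.queues = {"image": deque(), "audio": deque()}  # sensor data awaiting processing
        self.results = []                                    # findings posted by agents

    def post_sensor_data(self, kind, data):
        self.queues[kind].append(data)

def run_agent(blackboard, name, kind, recognizer):
    """Apply one recognition agent to every queued item of the given kind."""
    for item in list(blackboard.queues[kind]):
        found = recognizer(item)
        if found is not None:
            blackboard.results.append((name, found))
```

Because agents communicate only through the shared structure, a new recognition agent can be added without changing the existing ones.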
- Yet another method involves determining network relationships between a person and plural objects and, from these network relationships, discerning an object's relative importance to the person. An action is then taken based on the discerned relative importance.
- Still another method involves, with a portable device, capturing imagery of a person's hand gestures.
- the imagery is processed to discern sign language from the gestures, and an action is taken based thereon.
- a further method includes defining an information-carrying pattern having first and second spatial frequency portions, and printing this pattern on a substrate.
- the first portion defines a visible background weave pattern that cues human viewers to the presence of encoded information
- the second portion includes spatial frequencies that are beyond the range of human vision, and conveys at least certain of the encoded information.
- Yet another method involves, in a first device, receiving audio or image content data having payload data encoded by digital watermarking. This payload data is decoded, and then at least part is transmitted to a second device by a technique other than digital watermarking.
- Still a further method involves, in a home entertainment system, receiving audio or image content, and determining media fingerprint information therefrom. Media fingerprint information is then wirelessly transmitted to a portable device (e.g., smartphone) of a user to whom the content is being rendered by the home entertainment system.
- Another method involves, in a retail store, receiving a signal indicating a product interest by a shopper in the store, and then taking an action based on the received signal.
- This action includes at least one of: entering the shopper in a queue for service, informing the shopper of a wait time for service, providing the shopper information about a retail store staffer who will provide service, and providing the shopper an enticement for waiting for service. (Such enticement may be entertainment content for which a charge would otherwise be assessed.)
- a further method involves, by reference to data sensed from a first product displayed in a retail store by a shopper's smartphone, determining whether a second product is compatible with the first product, and then taking an action based on such determination.
- Still another method involves capturing imagery from a digitally watermarked object, where the digital watermark includes known reference features. Then, these known reference features are used in estimating a blur kernel by which the captured imagery may be processed to reduce image blur.
- Yet another method involves, by reference to information sensed by a user's portable device (e.g., smartphone), determining an interest of the user, and then taking an action based on the determined interest.
- Still another method involves displaying an arrangement of text characters on a screen of a portable device from which a user can select; sensing a user's gaze towards a part of the screen to make a text selection; and then changing the arrangement of characters on the screen from which the user can next select.
- a further method involves capturing a video sequence of image data using a smartphone camera; identifying a first machine readable symbology in the captured data; storing information related to the first identified symbology; identifying a second machine readable symbology in the captured data; and storing information related to the second identified symbology.
- the device operates in a streaming mode to collect information related to machine readable symbologies found in the captured video sequence.
- Another aspect of the technology is a system including a processor, a memory, one or more sensor sub-systems, and a touchscreen.
- the system is characterized in that the device is configured to store information from a sensor sub-system in the memory as RDF triples, and the processor is configured to take action based on the stored RDF triples.
- Still another aspect of the technology is a method that includes receiving sensor data, representing the sensed information for storage as RDF triples, and then processing the stored RDF triples.
- Another aspect of the technology is a device including a camera, a processor, and a memory.
- the memory stores a set of imagery captured by the camera, and also stores plural RDF triples that make assertions about the set of imagery.
- the device processor is configured to perform an operation on the stored set of imagery, by reference to one or more of the stored RDF triples.
- Another method enables a first computer process to request a service from one or more other computer processes, and includes posting a request to a data structure, where the request is expressed as an RDF triple.
- Yet another method involves receiving audio information at a consumer electronic device, where the audio information was provided by a content distributor, and includes at least first and second audio tracks.
- a control signal is received that sets a ratio between the first and second tracks.
- the audio is then rendered to a consumer in accordance with this ratio.
- the second audio track includes a noise-like digital watermark signal that conveys auxiliary information.
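One way to render two tracks in accordance with such a ratio is a linear blend, sketched below. Interpreting the ratio as the second track's share of the mix (0.0 to 1.0) is an assumption, as are the function name and the representation of audio as lists of samples.

```python
def mix_tracks(first, second, ratio):
    """Blend two equal-length sample lists; ratio is the second track's share, 0.0 to 1.0."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    return [(1.0 - ratio) * a + ratio * b for a, b in zip(first, second)]
```

A ratio of 0.0 renders only the first track, while larger values raise the level of the watermark-bearing second track relative to the first.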
- Another aspect of the technology is a computer readable medium that contains content data which, if rendered by a first device, controls a function of a second device.
- This content data includes at least first and second tracks of audio information.
- the first track includes audio for consumer enjoyment
- the second track includes a noise-like digital watermark signal that conveys control data for the second device.
- an apparatus that includes a memory that stores content information, including data corresponding to at least first and second tracks of audio.
- An audio portion is provided for rendering the content information to an output device (e.g., a speaker).
- This audio portion includes a control operable by a user to set a level at which the second track should be rendered relative to the first track.
- the stored data that corresponds to the second track of audio represents a noise-like digital watermark signal for controlling a function of a second apparatus.
- Another aspect of the technology is a method that includes receiving a first set of data corresponding to imagery captured by a camera-equipped portable device, where the imagery depicts a subject.
- Steganographically-encoded digital watermark payload data is decoded from the first set of data.
- a second set of data is obtained. This second set of data includes salient point data corresponding to the subject.
- Another aspect of the technology is a computer readable medium containing software instructions that, if executed by a computing device, cause the computing device to perform operations including: receiving a first set of data corresponding to imagery depicting a subject; decoding steganographically-encoded digital watermark payload data from the first set of data; and by reference to the payload data, obtaining a second set of data.
- the second set of data includes salient point data corresponding to the subject.
- a further aspect of the technology is a method that includes, in a portable device, receiving identification data sensed from a passive chip device associated with an object. By reference to this identification data, a second set of data is obtained. This second set of data includes salient point data corresponding to the object.
- a portable device that includes a memory and a processing portion, disposed in a pocket-sized housing.
- the memory includes a first portion that stores code data representing plural code-based semantic triples, where each of the triples includes first, second and third codes, one of the codes being a subject code, one being a predicate code, and one being an object code.
- the memory further includes a second portion that stores literal data associating different text literals with different of the codes.
- the processing portion further includes a GPU that processes code data in the first portion of the memory, to perform operations on information represented by the text literals in the second portion of the memory.
- a portable device that includes a memory and a processing portion, disposed in a pocket-sized housing.
- the memory includes a frame buffer portion that stores code data representing plural code-based semantic triples, where each triple includes a subject code, a predicate code, and an object code.
- the memory further includes a second portion that stores literal data associating different text literals with different of the codes.
- the processing portion includes a GPU that processes code data in the first portion of the memory, to perform semantic reasoning involving concepts expressed using the text literals in the second portion of the memory.
- a further aspect of the technology is a portable device that includes a memory and a processing portion, disposed in a pocket-sized housing.
- the processing portion is configured by instructions stored in the memory, and includes a GPU.
- the memory stores code data representing plural code-based semantic triples, each triple including a subject code, a predicate code, and an object code.
- the instructions configure the processing portion to apply a series of templates to the stored code data, to perform operations based on the triples.
- Another aspect of the technology is a method that includes receiving a set of data corresponding to imagery captured by a camera-equipped portable device, where the imagery depicts a subject.
- Identification data is determined based on the received set of data.
- a second set of data is obtained, where this second set of data includes plural N-tuples, each including subject, predicate and object elements. These N-tuples are processed, and an action is taken based on such processing.
- a further aspect of the technology is a portable device that includes a touch-screen, a memory and a processing portion, disposed in a pocket-sized housing.
- the memory contains data representing plural code-based semantic triples, where each triple includes a subject code, a predicate code, and an object code.
- the memory further includes a portion containing text literals that indicate respective textual meanings for different of the codes.
- the memory also contains software instructions that configure the processing portion to present, on the touch-screen, a user interface that is populated with certain of the text literals, through which a user can define a query of the code-based semantic triples.
- a portable device that includes a touch-screen, a memory and a processing portion, disposed in a pocket-sized housing.
- the memory is logically organized as plural planes of storage, containing plural data N-tuples, where elements of the N-tuples correspond to text literals.
- the processing portion includes a GPU configured to operate on the data N-tuples as part of responding to a user query entered via a touchscreen user interface.
- Another method involves receiving data corresponding to a user query.
- A first plane-thresholding test is applied to values stored in one plane of an N-tuple data store, where the data store is logically organized to include N planes of data, storing plural data N-tuples, where N is at least 3.
- This plane-thresholding test identifies a first subset of the stored data N-tuples that meet a first aspect of the user query.
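A plane-thresholding test of this kind might look as follows. This is only a sketch under stated assumptions: N is taken as 3, the planes are plain Python lists rather than GPU memory, and the plane names, sample codes, and helpers `threshold_plane` and `select` are hypothetical.

```python
# Hypothetical sketch of a plane-thresholding test over an N-plane store (N = 3).
# Plane contents and codes are made-up sample data.

SUBJECT_PLANE = [1, 1, 2, 4]
PREDICATE_PLANE = [10, 11, 11, 10]
OBJECT_PLANE = [2, 3, 3, 1]

def threshold_plane(plane, low, high):
    """Return a mask marking positions whose value lies in [low, high]."""
    return [low <= value <= high for value in plane]

def select(mask):
    """Gather the full N-tuples at positions where the mask is set."""
    return [
        (SUBJECT_PLANE[i], PREDICATE_PLANE[i], OBJECT_PLANE[i])
        for i, keep in enumerate(mask)
        if keep
    ]

# First test: predicate code equal to 11, expressed as a band of width one.
first_mask = threshold_plane(PREDICATE_PLANE, 11, 11)
first_subset = select(first_mask)
```

Because each element of an N-tuple sits at the same index in its own plane, a test on one plane yields a mask that selects whole tuples.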
- Still another method involves receiving a first set of data corresponding to imagery captured by a camera-equipped portable device, where the imagery depicts a subject at a first resolution. Identification data is determined based on the first set of data.
- a second set of data is obtained, where the second set of data corresponds to imagery that also depicts the subject.
- the second set of data depicts the subject from a different perspective than the first set of data, and may depict the subject at a second resolution higher than the first resolution.
- Yet another method involves receiving NFC identification data sensed from an object and, by reference to the identification data, obtaining a set of image data that depicts the object.
- a further method concerns transferring application state between first and second devices, each device including a processor and a display screen.
- The method includes, using a camera in the first device, sensing imagery presented on a display screen of the second device. Digital watermark data is decoded from the sensed imagery. Based on the decoded digital watermark data, stored data is identified that memorializes an application state of the second processor. This stored data can then be used to initialize the first processor to the application state.
- Still another method involves a device having a processor and a display screen.
- the method includes occasionally storing application state data, and digitally watermarking imagery to encode an identifier therein, where the identifier facilitates access to the stored application state data. This digitally watermarked imagery may then be presented on the display screen.
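The state hand-off described in the two preceding methods can be sketched end to end. The least-significant-bit embedding below is a toy stand-in chosen for brevity; it is not a robust digital watermark and is not the technique the source uses. `STATE_STORE`, `ID_BITS`, the identifier value, and the pixel data are all hypothetical.

```python
# Toy sketch: hand application state between devices via an identifier hidden
# in displayed imagery. The LSB embedding is a stand-in, NOT a real watermark.

STATE_STORE = {}   # hypothetical shared store, e.g. a remote service
ID_BITS = 16       # assumed identifier width

def store_state(identifier, state):
    """Save a snapshot of application state under the identifier."""
    STATE_STORE[identifier] = dict(state)

def embed_identifier(pixels, identifier):
    """Write the identifier into the least significant bits of the first pixels."""
    marked = list(pixels)
    for i in range(ID_BITS):
        bit = (identifier >> i) & 1
        marked[i] = (marked[i] & ~1) | bit
    return marked

def decode_identifier(pixels):
    """Recover the identifier from the least significant bits."""
    return sum((pixels[i] & 1) << i for i in range(ID_BITS))

# Second device: save its state and present marked imagery on its screen.
store_state(0x2A7, {"app": "video", "position_s": 312})
screen = embed_identifier([128] * 64, 0x2A7)

# First device: sense the screen, decode the identifier, look up the state.
restored = STATE_STORE[decode_identifier(screen)]
```

Only the identifier travels through the imagery; the state itself is fetched from the store, which keeps the embedded payload small.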
- Yet another method involves, using a camera in a portable device, sensing light (e.g., ambient light). An identifier is decoded from the sensed light, and information is then obtained based on the decoded identifier.
- a further method involves generating a signal that encodes the direction and/or velocity of a moving vehicle, and controlling one or more exterior lights on the vehicle (e.g., headlamps) in accordance with such signal.
- Another method involves sensing data from an environment illuminated by a lighting fixture, and processing the sensed data to determine information corresponding thereto. Light emitted from the lighting fixture is then encoded in accordance with the determined information.
- Still another method involves, with a microphone of a device, receiving audio from a user's ambient environment over a prolonged interval (e.g., at least 15 minutes, an hour, or 6 hours).
- An algorithm is then applied to data representing the received audio to derive identification data therefrom.
- By reference to the derived identification data, at least first and second different items of audio content to which the user was exposed during the interval are determined.
- software is primed for later use by the user.
- This priming includes receiving first and second items of auxiliary content at the device from one or more remote repositories.
- The first item of auxiliary content is associated with the first determined item of audio content, and the second item of auxiliary content is associated with the second determined item of audio content.
- the user can then be presented a listing that identifies the determined first and second items of audio content.
- an action can be taken in accordance with the primed software, using a corresponding item of auxiliary content.
- Yet another method involves, with a microphone of a portable device, receiving audio from a user's ambient environment over a prolonged interval. The user's location during this interval is sensed.
- An algorithm is applied to data representing the received audio to derive identification data therefrom.
- At least first and second different items of audio content to which the user was exposed during the interval are determined.
- the determined first and second different items of audio content are identified to the user, and a user signal entered through the user interface is received selecting the first item.
- an identification of the first item of audio content is posted to an online social network, together with data indicating the user's sensed location.
- Another aspect of the technology is a system that includes a processor configured by instructions in a memory. These instructions include operating system software.
- the operating system software provides an audio identification service that receives input audio data, or a pointer to stored audio data, and returns metadata corresponding thereto to an audio identification message queue of the operating system software.
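An audio identification service that posts results to a message queue could be sketched as below. This is an assumption-laden illustration: a SHA-1 hash of the raw bytes stands in for a real, robust audio fingerprint, and `REFERENCE_DB`, `audio_id_queue`, and `identify_audio` are hypothetical names rather than any operating system's API.

```python
import hashlib
import queue

def fingerprint(audio: bytes) -> str:
    """Stand-in fingerprint: a hash of the raw bytes (assumption, not robust)."""
    return hashlib.sha1(audio).hexdigest()

# Hypothetical reference database keyed by fingerprint.
REFERENCE_DB = {
    fingerprint(b"sample-song-bytes"): {
        "title": "Sample Song",
        "artist": "Example Artist",
    }
}

# Plays the role of the audio identification message queue.
audio_id_queue = queue.Queue()

def identify_audio(audio: bytes) -> None:
    """Look up metadata for the input audio and post it to the queue."""
    metadata = REFERENCE_DB.get(fingerprint(audio), {"title": None})
    audio_id_queue.put(metadata)

identify_audio(b"sample-song-bytes")
result = audio_id_queue.get_nowait()
```

Returning metadata through a queue rather than as a direct return value lets any interested consumer read identification results without the caller knowing who they are.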
- a further method involves frames of imagery depicting a first subject, captured using a camera portion of a mobile phone device, and displayed on a screen of the device.
- the method includes producing an identifier related to the first subject, by analyzing captured image data.
- a first tag is displayed on the screen with a first frame of imagery. This first tag is located at a first position on the screen, and is user-selectable to initiate an action associated with the first subject.
- a visual indicia is presented on the screen that associates the first subject as depicted in the first frame, with the displayed first tag.
- the method further includes displaying this first tag on the screen with a second frame of imagery that depicts the first subject. The first subject is depicted at different locations within the first and second frames of imagery, yet the first tag is displayed at the same first position on the screen with both the first and second frames of imagery.
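The fixed-position tag behavior can be sketched per frame. The coordinates, the `Point` type, and the choice of a leader line as the visual indicia are assumptions for illustration; the source does not specify how the indicia is drawn.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    x: int
    y: int

# Fixed screen position for the tag (assumed value).
TAG_POSITION = Point(20, 400)

def render_overlay(subject_position: Point) -> dict:
    """Describe one frame's overlay: the tag, and a line linking it to the subject."""
    return {
        "tag_at": TAG_POSITION,
        # Endpoints of a hypothetical leader line from tag to subject.
        "indicia": (TAG_POSITION, subject_position),
    }

# The subject moves between frames; the tag does not.
frame1 = render_overlay(Point(150, 120))
frame2 = render_overlay(Point(210, 180))
```

Keeping the tag stationary gives the user a stable touch target, while the indicia alone tracks the subject's motion.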
- Another method involves identifying physical objects depicted in imagery captured by a camera portion of a user's mobile phone, to thereby discern physical object preference information relating to the user.
- Physical object preference profile data based thereon is then stored in a memory.
- Electronic media may thereafter be processed, based at least in part on the stored physical object preference profile data.
- Yet another method involves, with a camera portion of a portable device, capturing a sequence of imagery.
- One excerpt of the sequence is analyzed to discern calibration information, where the calibration information includes fleshtone data characterizing color of a person's hand depicted in this excerpt.
- Other excerpts of the sequence are analyzed to identify American Sign Language handforms depicted in the imagery, by matching captured imagery with stored reference handform data. Words, phonemes and/or letters may then be output to an output device, based on said analysis.
- Particularly contemplated smartphones include the Apple iPhone 4, and smartphones following Google's Android specification (e.g., the Verizon Droid Eris phone, manufactured by HTC Corp., and the Motorola Droid 2 phone).
- the term “smartphone” (or “cell phone”) should be construed to encompass all such devices, even those that are not strictly-speaking cellular, nor telephones.
- Each includes one or more processors, one or more memories (e.g., RAM), storage (e.g., a disk or flash memory), a user interface (which may include, e.g., a keypad, a TFT LCD or OLED display screen, touch or other gesture sensors, a camera or other optical sensor, a compass sensor, a 3D magnetometer, a 3-axis accelerometer, a 3-axis gyroscope, one or more microphones, etc., together with software instructions for providing a graphical user interface), interconnections between these elements (e.g., buses), and an interface for communicating with other devices (which may be wireless, such as GSM, CDMA, W-CDMA, CDMA2000, TDMA, EV-DO, HSDPA, WiFi, WiMax, or Bluetooth, and/or wired, such as through an Ethernet local area network, a T-1 internet connection, etc.).
- The processes and system components detailed in this specification may be implemented as instructions for computing devices, including general purpose processor instructions for a variety of programmable processors, including microprocessors (e.g., the Atom and A4), graphics processing units (GPUs, such as the nVidia Tegra APX 2600), and digital signal processors (e.g., the Texas Instruments TMS320 series devices), etc. These instructions may be implemented as software, firmware, etc. These instructions can also be implemented in various forms of processor circuitry, including programmable logic devices, field programmable gate arrays (e.g., the Xilinx Virtex series devices), field programmable object arrays, and application specific circuits - including digital, analog and mixed analog/digital circuitry.
- Execution of the instructions can be distributed among processors and/or made parallel across processors within a device or across a network of devices. Processing of content signal data may also be distributed among different processor and memory devices. "Cloud" computing resources can be used as well. References to "processors," "modules" or "components" should be understood to refer to functionality, rather than requiring a particular form of implementation.
- the service by which content owners ascribe certain attributes and experiences to content typically uses software on the user device - either in the OS or as application software. Alternatively, this service can be implemented - in part - using remote resources.
- Software and hardware configuration data/instructions are commonly stored as instructions in one or more data structures conveyed by tangible media, such as magnetic or optical discs, memory cards, ROM, etc., which may be accessed across a network.
- Some embodiments may be implemented as embedded systems - a special purpose computer system in which the operating system software and the application software are indistinguishable to the user (e.g., as is commonly the case in basic cell phones).
- the functionality detailed in this specification can be implemented in operating system software, application software and/or as embedded system software.
- data structures used by the present technology may be distributed. For example, different record labels may maintain their own data structures for music in their respective catalogs.
- a system may need to navigate a series of intermediate data structures (often hierarchical) to locate the one with needed information. (One suitable arrangement is detailed in Digimarc's patent 6,947,571.) Commonly accessed information may be cached at servers in the network - much like DNS data - to speed access.
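Navigating intermediate data structures, with caching of commonly accessed results, might be sketched as follows. The two-level hierarchy, the label and track names, and the `resolve` helper are hypothetical; a real deployment would place the cache at network servers rather than in one process.

```python
# Hypothetical hierarchy: a root directory points to per-label catalogs.
ROOT = {"label-a": "catalog-a", "label-b": "catalog-b"}
CATALOGS = {
    "catalog-a": {"track-1": {"title": "First Song"}},
    "catalog-b": {"track-9": {"title": "Ninth Song"}},
}

_cache = {}
stats = {"walks": 0}   # counts full walks of the hierarchy, to show the cache working

def resolve(label, track):
    """Return track metadata, consulting the cache before walking the hierarchy."""
    key = (label, track)
    if key not in _cache:
        stats["walks"] += 1
        catalog = CATALOGS[ROOT[label]]   # root -> catalog -> record
        _cache[key] = catalog[track]
    return _cache[key]

first = resolve("label-a", "track-1")
second = resolve("label-a", "track-1")   # served from the cache
```

The second call returns without touching `ROOT` or `CATALOGS`, which is the same saving that cached DNS answers provide.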
- Although reference was made to GPUs, this term is meant to include any device that includes plural hardware cores operable simultaneously.
- Intel, for example, uses the term "Many Integrated Core," or Intel MIC, to indicate such class of device.
- Most contemporary GPUs have instruction sets that are optimized for graphics processing.
- the Apple iPhone 4 device uses a PowerVR SGX 535 GPU (included in a system-on-a-chip configuration, with other devices).
- Most such devices include, in their instruction sets, instructions that are tailored to work with 3- or 4- plane imagery, to accelerate the building of images in a frame buffer intended for output to a display. For example, many instructions take data triples as input, and provide data triples as output. Other instructions facilitate operating on spatial neighborhoods of data values in stored image frames.
- While reference has been made to NFC chips, it will be recognized that this encompasses all manner of chips - including those known by other names (e.g., RFID chips) - that issue a signal conveying a plural-bit identifier when interrogated by an NFC reader.
- Publish/subscribe functionality can be implemented not just in a device, but across a network.
- An ad hoc network may be formed among users in a common location, such as in a theatre.
- Content recognition information generated by one user's smartphone may be published to the ad hoc network, and others in the network can subscribe and take action based thereon.
- Apple's Bonjour software can be used in an exemplary implementation of such arrangement.
- Bonjour is Apple's implementation of Zeroconf - a service discovery protocol.
- Bonjour locates devices on a local network, and identifies services that each offers, using multicast Domain Name System service records. (This software is built into the Apple Mac OS X operating system, and is also included in the Apple "Remote" application for the iPhone - where it is used to establish connections to iTunes libraries via WiFi.)
- Bonjour services are implemented at the application level largely using standard TCP/IP calls, rather than in the operating system. Apple has made the source code of the Bonjour multicast DNS responder - the core component of service discovery - available as a Darwin open source project.
- the project provides source code to build the responder daemon for a wide range of platforms, including Mac OS X, Linux, *BSD, Solaris, and Windows.
- Apple provides a user-installable set of services called Bonjour for Windows, as well as Java libraries. Bonjour can also be used in other embodiments of the present technology, involving communications between devices and systems.
- Other software can alternatively be used to exchange data between devices. Examples include Universal Plug and Play (UPnP) and its successor Devices Profile for Web Services (DPWS). These are other protocols implementing zero configuration networking services, through which devices can connect, identify themselves, advertise available capabilities to other devices, share content, etc. Other implementations may use object request brokers, such as CORBA implementations (e.g., the one included in IBM WebSphere).
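The publish/subscribe arrangement described above, in which content recognition information from one smartphone is shared with others nearby, can be sketched in miniature. This is an in-process stand-in only: the `AdHocBus` class, the topic name, and the message fields are hypothetical, and no actual networking or service discovery (Bonjour, UPnP, etc.) is performed.

```python
from collections import defaultdict

class AdHocBus:
    """In-process stand-in for a publish/subscribe channel shared by nearby devices."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callback to be invoked for each message on the topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver the message to every current subscriber of the topic."""
        for callback in self._subscribers[topic]:
            callback(message)

bus = AdHocBus()
received = []

# A second device subscribes; the first publishes what it recognized.
bus.subscribe("content-recognized", received.append)
bus.publish("content-recognized", {"device": "phone-A", "title": "Example Film"})
```

The publisher never learns who is listening, so devices can join or leave the ad hoc group without the others changing their behavior.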
- Examples of audio fingerprinting are detailed in patent publications 20070250716, 20070174059 and 20080300011 (Digimarc), 20080276265, 20070274537 and 20050232411 (Nielsen), 20070124756 (Google), 7,516,074 (Auditude), and 6,990,453 and 7,359,889 (both Shazam).
- Examples of image/video fingerprinting are detailed in patent publications 7,020,304 (Digimarc), 7,486,827 (Seiko-Epson), 20070253594 (Vobile), 20080317278 (Thomson), and 20020044659 (NEC).
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Human Computer Interaction (AREA)
- Business, Economics & Management (AREA)
- Signal Processing (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Engineering & Computer Science (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- General Business, Economics & Management (AREA)
- Strategic Management (AREA)
- Computer Networks & Wireless Communication (AREA)
- Development Economics (AREA)
- Economics (AREA)
- Marketing (AREA)
- Information Transfer Between Computers (AREA)
- User Interface Of Digital Computer (AREA)
- Telephone Function (AREA)
- Stored Programmes (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Priority Applications (26)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011800641753A CN103329147A (en) | 2010-11-04 | 2011-11-04 | Smartphone-based methods and systems |
JP2013537885A JP6054870B2 (en) | 2010-11-04 | 2011-11-04 | Smartphone-based method and system |
CA2815944A CA2815944C (en) | 2010-11-04 | 2011-11-04 | Smartphone-based methods and systems |
KR1020137014291A KR102010221B1 (en) | 2010-11-04 | 2011-11-04 | Smartphone-based methods and systems |
EP11838908.9A EP2635997A4 (en) | 2010-11-04 | 2011-11-04 | Smartphone-based methods and systems |
US13/299,140 US8819172B2 (en) | 2010-11-04 | 2011-11-17 | Smartphone-based methods and systems |
US13/552,310 US9105083B2 (en) | 2010-11-04 | 2012-07-18 | Changing the arrangement of text characters for selection using gaze on portable devices |
US13/552,319 US20120282911A1 (en) | 2010-11-04 | 2012-07-18 | Smartphone-Based Methods and Systems |
US13/552,302 US8620772B2 (en) | 2010-11-04 | 2012-07-18 | Method and portable device for locating products of interest using imaging technology and product packaging |
US13/552,233 US20120284593A1 (en) | 2010-11-04 | 2012-07-18 | Smartphone-Based Methods and Systems |
US13/552,337 US9202254B2 (en) | 2010-11-04 | 2012-07-18 | Variable processing of both image and audio data based on processor utilization |
US13/552,265 US9424618B2 (en) | 2010-11-04 | 2012-07-18 | Smartphone-based methods and systems |
US13/552,279 US20120284122A1 (en) | 2010-11-04 | 2012-07-18 | Smartphone-Based Methods and Systems |
US13/552,251 US20130169838A1 (en) | 2010-11-04 | 2012-07-18 | Smartphone-Based Methods and Systems |
US14/157,108 US9330427B2 (en) | 2010-11-04 | 2014-01-16 | Smartphone-based methods and systems |
US14/328,558 US20140324596A1 (en) | 2010-11-04 | 2014-07-10 | Smartphone-based methods and systems |
US14/341,441 US9292895B2 (en) | 2009-10-28 | 2014-07-25 | Device, system and method for recognizing inputted data using memory having blackboard data structure |
US14/460,719 US9484046B2 (en) | 2010-11-04 | 2014-08-15 | Smartphone-based methods and systems |
US14/947,008 US9595258B2 (en) | 2011-04-04 | 2015-11-20 | Context-based smartphone sensor logic |
US15/139,671 US9830950B2 (en) | 2010-11-04 | 2016-04-27 | Smartphone-based methods and systems |
US15/338,045 US10971171B2 (en) | 2010-11-04 | 2016-10-28 | Smartphone-based methods and systems |
US15/711,357 US10199042B2 (en) | 2011-04-04 | 2017-09-21 | Context-based smartphone sensor logic |
US15/823,168 US10181339B2 (en) | 2010-11-04 | 2017-11-27 | Smartphone-based methods and systems |
US16/247,305 US10658007B2 (en) | 2010-11-04 | 2019-01-14 | Smartphone-based methods and systems |
US16/262,634 US10510349B2 (en) | 2011-04-04 | 2019-01-30 | Context-based smartphone sensor logic |
US16/709,463 US10930289B2 (en) | 2011-04-04 | 2019-12-10 | Context-based smartphone sensor logic |
Applications Claiming Priority (22)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US41021710P | 2010-11-04 | 2010-11-04 | |
US61/410,217 | 2010-11-04 | ||
US201161449529P | 2011-03-04 | 2011-03-04 | |
US61/449,529 | 2011-03-04 | ||
US201161467862P | 2011-03-25 | 2011-03-25 | |
US61/467,862 | 2011-03-25 | ||
US201161471651P | 2011-04-04 | 2011-04-04 | |
US61/471,651 | 2011-04-04 | ||
US201161479323P | 2011-04-26 | 2011-04-26 | |
US61/479,323 | 2011-04-26 | ||
US201161483555P | 2011-05-06 | 2011-05-06 | |
US61/483,555 | 2011-05-06 | ||
US201161485888P | 2011-05-13 | 2011-05-13 | |
US61/485,888 | 2011-05-13 | ||
US201161501602P | 2011-06-27 | 2011-06-27 | |
US61/501,602 | 2011-06-27 | ||
US13/174,258 US8831279B2 (en) | 2011-03-04 | 2011-06-30 | Smartphone-based methods and systems |
US13/174,258 | 2011-06-30 | ||
US13/207,841 | 2011-08-11 | ||
US13/207,841 US9218530B2 (en) | 2010-11-04 | 2011-08-11 | Smartphone-based methods and systems |
US13/278,949 US9183580B2 (en) | 2010-11-04 | 2011-10-21 | Methods and systems for resource management on portable devices |
US13/278,949 | 2011-10-21 |
Related Parent Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/174,258 Continuation-In-Part US8831279B2 (en) | 2009-10-28 | 2011-06-30 | Smartphone-based methods and systems |
US13/278,949 Continuation-In-Part US9183580B2 (en) | 2009-10-28 | 2011-10-21 | Methods and systems for resource management on portable devices |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/299,140 Continuation-In-Part US8819172B2 (en) | 2009-10-28 | 2011-11-17 | Smartphone-based methods and systems |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2012061760A2 WO2012061760A2 (en) | 2012-05-10 |
WO2012061760A3 WO2012061760A3 (en) | 2012-06-28 |
Family
ID=46025145
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2011/059412 WO2012061760A2 (en) | 2009-10-28 | 2011-11-04 | Smartphone-based methods and systems |
Country Status (7)
Country | Link |
---|---|
US (1) | US9183580B2 (en) |
EP (1) | EP2635997A4 (en) |
JP (2) | JP6054870B2 (en) |
KR (1) | KR102010221B1 (en) |
CN (2) | CN103329147A (en) |
CA (1) | CA2815944C (en) |
WO (1) | WO2012061760A2 (en) |
Cited By (35)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2013179086A1 (en) * | 2012-05-29 | 2013-12-05 | Nokia Corporation | Supporting the provision of services |
US8819172B2 (en) | 2010-11-04 | 2014-08-26 | Digimarc Corporation | Smartphone-based methods and systems |
JP2014157595A (en) * | 2013-02-15 | 2014-08-28 | Fuji Xerox Co Ltd | Method and program for identifying medium or function, article including marker, and marker arrangement method |
WO2015039139A1 (en) * | 2013-09-16 | 2015-03-19 | The Electric Fan Company | Distributed, Unfolding, Embedded Transaction and Inventory Apparatuses, Methods and Systems |
US9099080B2 (en) | 2013-02-06 | 2015-08-04 | Muzak Llc | System for targeting location-based communications |
US9218530B2 (en) | 2010-11-04 | 2015-12-22 | Digimarc Corporation | Smartphone-based methods and systems |
US9367886B2 (en) | 2010-11-04 | 2016-06-14 | Digimarc Corporation | Smartphone arrangements responsive to musical artists and other content proprietors |
JP2016541066A (en) * | 2013-12-03 | 2016-12-28 | デ ソウザ、アントニオ フェレイラ | Electronic inquiry system and verification of authenticity, validity and restrictions of national driver's license (CNH), vehicle registration certificate (CRV) and vehicle registration license certificate (CRLV) using data reading technology through approximation |
US20170076290A1 (en) * | 2014-05-16 | 2017-03-16 | Sten Corfitsen | System and method for performing payments from a vehicle |
US10642449B2 (en) | 2016-06-12 | 2020-05-05 | Apple Inc. | Identifying applications on which content is available |
WO2020242840A1 (en) * | 2019-05-24 | 2020-12-03 | Universal City Studios Llc | Systems and methods for providing in-application messaging |
US10930289B2 (en) | 2011-04-04 | 2021-02-23 | Digimarc Corporation | Context-based smartphone sensor logic |
CN112511848A (en) * | 2020-11-09 | 2021-03-16 | 网宿科技股份有限公司 | Live broadcast method, server and computer readable storage medium |
US11049094B2 (en) | 2014-02-11 | 2021-06-29 | Digimarc Corporation | Methods and arrangements for device to device communication |
US11057682B2 (en) | 2019-03-24 | 2021-07-06 | Apple Inc. | User interfaces including selectable representations of content items |
US11070889B2 (en) | 2012-12-10 | 2021-07-20 | Apple Inc. | Channel bar user interface |
CN113379794A (en) * | 2021-05-19 | 2021-09-10 | 重庆邮电大学 | Single-target tracking system and method based on attention-key point prediction model |
US11194546B2 (en) | 2012-12-31 | 2021-12-07 | Apple Inc. | Multi-user TV user interface |
US11245967B2 (en) | 2012-12-13 | 2022-02-08 | Apple Inc. | TV side bar user interface |
US11290762B2 (en) | 2012-11-27 | 2022-03-29 | Apple Inc. | Agnostic media delivery system |
US11297392B2 (en) | 2012-12-18 | 2022-04-05 | Apple Inc. | Devices and method for providing remote control hints on a display |
US11461397B2 (en) | 2014-06-24 | 2022-10-04 | Apple Inc. | Column interface for navigating in a user interface |
US11467726B2 (en) | 2019-03-24 | 2022-10-11 | Apple Inc. | User interfaces for viewing and accessing content on an electronic device |
US11520858B2 (en) | 2016-06-12 | 2022-12-06 | Apple Inc. | Device-level authorization for viewing content |
US11609678B2 (en) | 2016-10-26 | 2023-03-21 | Apple Inc. | User interfaces for browsing content from multiple content applications on an electronic device |
US11683565B2 (en) | 2019-03-24 | 2023-06-20 | Apple Inc. | User interfaces for interacting with channels that provide content that plays in a media browsing application |
US11720229B2 (en) | 2020-12-07 | 2023-08-08 | Apple Inc. | User interfaces for browsing and presenting content |
US11797606B2 (en) | 2019-05-31 | 2023-10-24 | Apple Inc. | User interfaces for a podcast browsing and playback application |
US11843838B2 (en) | 2020-03-24 | 2023-12-12 | Apple Inc. | User interfaces for accessing episodes of a content series |
US11863837B2 (en) | 2019-05-31 | 2024-01-02 | Apple Inc. | Notification of augmented reality content on an electronic device |
US11899895B2 (en) | 2020-06-21 | 2024-02-13 | Apple Inc. | User interfaces for setting up an electronic device |
US11934640B2 (en) | 2021-01-29 | 2024-03-19 | Apple Inc. | User interfaces for record labels |
US11962836B2 (en) | 2019-03-24 | 2024-04-16 | Apple Inc. | User interfaces for a media browsing application |
US12105942B2 (en) | 2014-06-24 | 2024-10-01 | Apple Inc. | Input device and user interface interactions |
US12149779B2 (en) | 2022-02-18 | 2024-11-19 | Apple Inc. | Advertisement user interface |
Families Citing this family (123)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120322428A1 (en) | 2004-09-30 | 2012-12-20 | Motedata Inc. | Network of tags |
US8515052B2 (en) | 2007-12-17 | 2013-08-20 | Wai Wu | Parallel signal processing system and method |
US8903847B2 (en) | 2010-03-05 | 2014-12-02 | International Business Machines Corporation | Digital media voice tags in social networks |
US20120224711A1 (en) * | 2011-03-04 | 2012-09-06 | Qualcomm Incorporated | Method and apparatus for grouping client devices based on context similarity |
US8688090B2 (en) | 2011-03-21 | 2014-04-01 | International Business Machines Corporation | Data session preferences |
US20120246238A1 (en) | 2011-03-21 | 2012-09-27 | International Business Machines Corporation | Asynchronous messaging tags |
US20120244842A1 (en) | 2011-03-21 | 2012-09-27 | International Business Machines Corporation | Data Session Synchronization With Phone Numbers |
SG185147A1 (en) * | 2011-04-08 | 2012-11-29 | Creative Tech Ltd | A method, system and electronic device for at least one of efficient graphic processing and salient based learning |
US9117074B2 (en) | 2011-05-18 | 2015-08-25 | Microsoft Technology Licensing, Llc | Detecting a compromised online user account |
US9087324B2 (en) * | 2011-07-12 | 2015-07-21 | Microsoft Technology Licensing, Llc | Message categorization |
US8885882B1 (en) | 2011-07-14 | 2014-11-11 | The Research Foundation For The State University Of New York | Real time eye tracking for human computer interaction |
US9065826B2 (en) | 2011-08-08 | 2015-06-23 | Microsoft Technology Licensing, Llc | Identifying application reputation based on resource accesses |
US10474858B2 (en) | 2011-08-30 | 2019-11-12 | Digimarc Corporation | Methods of identifying barcoded items by evaluating multiple identification hypotheses, based on data from sensors including inventory sensors and ceiling-mounted cameras |
US9367770B2 (en) | 2011-08-30 | 2016-06-14 | Digimarc Corporation | Methods and arrangements for identifying objects |
US9235799B2 (en) | 2011-11-26 | 2016-01-12 | Microsoft Technology Licensing, Llc | Discriminative pretraining of deep neural networks |
GB2497951A (en) * | 2011-12-22 | 2013-07-03 | Nokia Corp | Method and System For Managing Images And Geographic Location Data |
US20130254265A1 (en) * | 2012-03-20 | 2013-09-26 | Alexandra Chemla | System and mechanisms for transferring user selected content to a recipient |
US9510141B2 (en) | 2012-06-04 | 2016-11-29 | Apple Inc. | App recommendation using crowd-sourced localized app usage data |
WO2013184383A2 (en) * | 2012-06-04 | 2013-12-12 | Apple Inc. | App recommendation using crowd-sourced localized app usage data |
WO2013188807A2 (en) | 2012-06-14 | 2013-12-19 | Digimarc Corporation | Methods and systems for signal processing |
CA2878466C (en) | 2012-07-17 | 2019-04-16 | Myron Frederick Zahnow | System, apparatus and method for activity guidance and monitoring |
US20140025537A1 (en) * | 2012-07-23 | 2014-01-23 | Cellco Partnership D/B/A Verizon Wireless | Verifying accessory compatibility with a mobile device |
US9305559B2 (en) | 2012-10-15 | 2016-04-05 | Digimarc Corporation | Audio watermark encoding with reversing polarity and pairwise embedding |
US9401153B2 (en) | 2012-10-15 | 2016-07-26 | Digimarc Corporation | Multi-mode audio recognition and auxiliary data encoding and decoding |
US9224184B2 (en) | 2012-10-21 | 2015-12-29 | Digimarc Corporation | Methods and arrangements for identifying objects |
US8874924B2 (en) * | 2012-11-07 | 2014-10-28 | The Nielsen Company (Us), Llc | Methods and apparatus to identify media |
US9477925B2 (en) | 2012-11-20 | 2016-10-25 | Microsoft Technology Licensing, Llc | Deep neural networks training for speech and pattern recognition |
US20140196117A1 (en) * | 2013-01-07 | 2014-07-10 | Curtis John Schwebke | Recovery or upgrade of a cloud client device |
US9146990B2 (en) * | 2013-01-07 | 2015-09-29 | Gracenote, Inc. | Search and identification of video content |
US8990638B1 (en) * | 2013-03-15 | 2015-03-24 | Digimarc Corporation | Self-stabilizing network nodes in mobile discovery system |
US20140337905A1 (en) * | 2013-05-09 | 2014-11-13 | Telefonaktiebolaget L M Ericsson (Publ) | System and method for delivering extended media content |
US9251549B2 (en) * | 2013-07-23 | 2016-02-02 | Verance Corporation | Watermark extractor enhancements based on payload ranking |
KR102165818B1 (en) | 2013-09-10 | 2020-10-14 | 삼성전자주식회사 | Method, apparatus and recovering medium for controlling user interface using a input image |
US10277330B2 (en) * | 2013-09-19 | 2019-04-30 | Radius Universal Llc | Fiber optic communications and power network |
US9600474B2 (en) | 2013-11-08 | 2017-03-21 | Google Inc. | User interface for realtime language translation |
JP6222830B2 (en) * | 2013-12-27 | 2017-11-01 | マクセルホールディングス株式会社 | Image projection device |
JP2015127897A (en) * | 2013-12-27 | 2015-07-09 | ソニー株式会社 | Display control device, display control system, display control method, and program |
US9351060B2 (en) | 2014-02-14 | 2016-05-24 | Sonic Blocks, Inc. | Modular quick-connect A/V system and methods thereof |
US9613638B2 (en) * | 2014-02-28 | 2017-04-04 | Educational Testing Service | Computer-implemented systems and methods for determining an intelligibility score for speech |
WO2015139026A2 (en) | 2014-03-14 | 2015-09-17 | Go Tenna Inc. | System and method for digital communication between computing devices |
US10108748B2 (en) | 2014-05-30 | 2018-10-23 | Apple Inc. | Most relevant application recommendation based on crowd-sourced application usage data |
US9913100B2 (en) | 2014-05-30 | 2018-03-06 | Apple Inc. | Techniques for generating maps of venues including buildings and floors |
CN104023208B (en) | 2014-06-05 | 2019-07-05 | 北京小鱼在家科技有限公司 | A kind of device and method of automatic monitoring and autonomic response |
US9402161B2 (en) | 2014-07-23 | 2016-07-26 | Apple Inc. | Providing personalized content based on historical interaction with a mobile device |
US10200439B2 (en) * | 2014-07-29 | 2019-02-05 | Sap Se | In-memory cloud triple store |
US10045427B2 (en) * | 2014-09-29 | 2018-08-07 | Philips Lighting Holding B.V. | System and method of autonomous restore point creation and restoration for luminaire controllers |
CN105787485B (en) * | 2014-12-25 | 2019-11-26 | 联想(北京)有限公司 | The device and method for identifying clicking operation |
US11363460B1 (en) * | 2015-03-03 | 2022-06-14 | Amazon Technologies, Inc. | Device-based identification for automated user detection |
CN106162928A (en) * | 2015-04-03 | 2016-11-23 | 松翰科技股份有限公司 | Information transmission system and method |
US9529500B1 (en) | 2015-06-05 | 2016-12-27 | Apple Inc. | Application recommendation based on detected triggering events |
US9652196B2 (en) * | 2015-06-29 | 2017-05-16 | Microsoft Technology Licensing, Llc | Smart audio routing management |
KR102429427B1 (en) * | 2015-07-20 | 2022-08-04 | 삼성전자주식회사 | Image capturing apparatus and method for the same |
US20170034551A1 (en) * | 2015-07-29 | 2017-02-02 | GM Global Technology Operations LLC | Dynamic screen replication and real-time display rendering based on media-application characteristics |
CN105187295B (en) * | 2015-08-06 | 2019-05-17 | 广州华多网络科技有限公司 | A kind of method and client, server and system for realizing that bubble is shown in client |
US10331944B2 (en) * | 2015-09-26 | 2019-06-25 | Intel Corporation | Technologies for dynamic performance of image analysis |
JP6696149B2 (en) * | 2015-10-29 | 2020-05-20 | 富士通株式会社 | Image generation method, image generation program, information processing device, and display control method |
US10740409B2 (en) | 2016-05-20 | 2020-08-11 | Magnet Forensics Inc. | Systems and methods for graphical exploration of forensic data |
EP3458970A4 (en) | 2016-05-20 | 2019-12-04 | Roman Czeslaw Kordasiewicz | Systems and methods for graphical exploration of forensic data |
US20170351651A1 (en) * | 2016-06-01 | 2017-12-07 | Intel Corporation | Smart bookmark device and bookmark synchronization system |
US10595169B2 (en) | 2016-06-12 | 2020-03-17 | Apple Inc. | Message extension app store |
KR102493607B1 (en) * | 2016-06-15 | 2023-02-01 | 삼성전자주식회사 | Electronic device for supporting the fingerprint verification and operating method thereof |
CN106874817A (en) * | 2016-07-27 | 2017-06-20 | 阿里巴巴集团控股有限公司 | Two-dimensional code identification method, equipment and mobile terminal |
CN106921728A (en) | 2016-08-31 | 2017-07-04 | 阿里巴巴集团控股有限公司 | A kind of method for positioning user, information-pushing method and relevant device |
US20180089281A1 (en) * | 2016-09-29 | 2018-03-29 | Convida Wireless, Llc | Semantic query over distributed semantic descriptors |
WO2018142228A2 (en) | 2017-01-19 | 2018-08-09 | Mindmaze Holding Sa | Systems, methods, apparatuses and devices for detecting facial expression and for tracking movement and location including for at least one of a virtual and augmented reality system |
US10515474B2 (en) | 2017-01-19 | 2019-12-24 | Mindmaze Holding Sa | System, method and apparatus for detecting facial expression in a virtual reality system |
US10943100B2 (en) | 2017-01-19 | 2021-03-09 | Mindmaze Holding Sa | Systems, methods, devices and apparatuses for detecting facial expression |
EP3568804A2 (en) | 2017-02-07 | 2019-11-20 | Mindmaze Holding S.A. | Systems, methods and apparatuses for stereo vision and tracking |
CN109255564B (en) * | 2017-07-13 | 2022-09-06 | 菜鸟智能物流控股有限公司 | Pick-up point address recommendation method and device |
CN108021954B (en) | 2017-11-01 | 2020-06-05 | 阿里巴巴集团控股有限公司 | Method and device for starting business process |
CN107958667A (en) * | 2017-11-20 | 2018-04-24 | 北京云知声信息技术有限公司 | The mobile terminal protective case and method for controlling mobile terminal of application can quickly be started |
CN108196952B (en) * | 2017-12-05 | 2020-06-05 | 阿里巴巴集团控股有限公司 | Resource allocation method, device and equipment |
US11328533B1 (en) | 2018-01-09 | 2022-05-10 | Mindmaze Holding Sa | System, method and apparatus for detecting facial expression for motion capture |
JP6855401B2 (en) * | 2018-02-08 | 2021-04-07 | ヤフー株式会社 | Generation device, generation method, and generation program |
US10944669B1 (en) | 2018-02-09 | 2021-03-09 | GoTenna, Inc. | System and method for efficient network-wide broadcast in a multi-hop wireless network using packet echos |
CN108764392B (en) | 2018-04-25 | 2020-07-17 | 阿里巴巴集团控股有限公司 | Service processing method, device and equipment |
EP4130941A1 (en) * | 2018-05-04 | 2023-02-08 | Google LLC | Hot-word free adaptation of automated assistant function(s) |
US10134022B1 (en) * | 2018-06-07 | 2018-11-20 | Capital One Services, Llc | Transaction terminals for automated billing |
CN110209923B (en) * | 2018-06-12 | 2023-07-25 | 中国人民大学 | Topic influence user pushing method and device |
US11145313B2 (en) * | 2018-07-06 | 2021-10-12 | Michael Bond | System and method for assisting communication through predictive speech |
WO2020023909A1 (en) | 2018-07-27 | 2020-01-30 | GoTenna, Inc. | Vine™: zero-control routing using data packet inspection for wireless mesh networks |
EP3621039A1 (en) * | 2018-09-06 | 2020-03-11 | Tata Consultancy Services Limited | Real time overlay placement in videos for augmented reality applications |
CN109451139B (en) * | 2018-09-13 | 2020-11-20 | 腾讯科技(深圳)有限公司 | Message transmission method, terminal, device, electronic equipment and readable medium |
CN109615423B (en) | 2018-11-29 | 2020-06-16 | 阿里巴巴集团控股有限公司 | Service processing method and device |
KR102160189B1 (en) * | 2018-11-30 | 2020-09-25 | 인천대학교 산학협력단 | Electronic device that provides a user interface for supporting the coloring of objects within an animation and operating method thereof |
US10805690B2 (en) | 2018-12-04 | 2020-10-13 | The Nielsen Company (Us), Llc | Methods and apparatus to identify media presentations by analyzing network traffic |
US11126861B1 (en) | 2018-12-14 | 2021-09-21 | Digimarc Corporation | Ambient inventorying arrangements |
CN109448836A (en) * | 2018-12-27 | 2019-03-08 | 重庆科技学院 | A method of for detecting the robot and self-service examination of medical treatment & health |
US11340758B1 (en) * | 2018-12-27 | 2022-05-24 | Meta Platforms, Inc. | Systems and methods for distributing content |
EP3935882A4 (en) | 2019-03-08 | 2022-11-16 | Gotenna Inc. | Method for utilization-based traffic throttling in a wireless mesh network |
CN109822595A (en) * | 2019-03-25 | 2019-05-31 | 杭州纳茵特科技有限公司 | A kind of educational robot |
US11786694B2 (en) | 2019-05-24 | 2023-10-17 | NeuroLight, Inc. | Device, method, and app for facilitating sleep |
BR112021024551A2 (en) | 2019-06-28 | 2022-01-18 | Dolby Laboratories Licensing Corp | Video content type metadata for high dynamic range |
US20210034907A1 (en) * | 2019-07-29 | 2021-02-04 | Walmart Apollo, Llc | System and method for textual analysis of images |
RU2724445C1 (en) * | 2019-08-23 | 2020-06-23 | Самсунг Электроникс Ко., Лтд. | Eye position monitoring device and method |
US11698676B2 (en) | 2019-08-23 | 2023-07-11 | Samsung Electronics Co., Ltd. | Method and electronic device for eye-tracking |
KR102252084B1 (en) * | 2019-09-19 | 2021-05-17 | 주식회사 스켈터랩스 | Method, device and computer readable storage medium for user modeling based on blackboard |
CN110991450A (en) * | 2019-12-03 | 2020-04-10 | 厦门亿合恒拓信息科技有限公司 | Method for collecting power grid equipment information and computer readable storage medium |
CN111177587B (en) * | 2019-12-12 | 2023-05-23 | 广州地理研究所 | Shopping street recommendation method and device |
CN111081270B (en) * | 2019-12-19 | 2021-06-01 | 大连即时智能科技有限公司 | Real-time audio-driven virtual character mouth shape synchronous control method |
CN111127094B (en) * | 2019-12-19 | 2023-08-25 | 秒针信息技术有限公司 | Account matching method and device, electronic equipment and storage medium |
EP3839880A1 (en) | 2019-12-20 | 2021-06-23 | Koninklijke Philips N.V. | A system for performing ambient light image correction |
US10839181B1 (en) | 2020-01-07 | 2020-11-17 | Zebra Technologies Corporation | Method to synchronize a barcode decode with a video camera to improve accuracy of retail POS loss prevention |
CN113296767B (en) * | 2020-04-07 | 2024-07-02 | 阿里巴巴集团控股有限公司 | UI component generation method and device and user interface processing method and device |
US11501470B2 (en) | 2020-05-27 | 2022-11-15 | Microsoft Technology Licensing, Llc | Geometric encoding of data |
CN111667571B (en) * | 2020-06-08 | 2021-09-17 | 南华大学 | Nuclear facility source item three-dimensional distribution rapid reconstruction method, device, equipment and medium |
US11321797B2 (en) * | 2020-08-25 | 2022-05-03 | Kyndryl, Inc. | Wearable watermarks |
CN111930846B (en) * | 2020-09-15 | 2021-02-23 | 支付宝(杭州)信息技术有限公司 | Data processing method, device and equipment |
CN112183401A (en) * | 2020-09-30 | 2021-01-05 | 敦泰电子(深圳)有限公司 | Image acquisition method, chip and image acquisition device |
CN114553618A (en) * | 2020-11-27 | 2022-05-27 | 赛万特科技有限责任公司 | Method and device for controlling equipment, intelligent household equipment, system and storage medium |
EP4039620A1 (en) * | 2021-02-03 | 2022-08-10 | ATS Automation Tooling Systems Inc. | System and method for rotary drive curved track in a conveyor system |
US20220383025A1 (en) * | 2021-05-26 | 2022-12-01 | International Business Machines Corporation | Augmented reality translation of sign language classifier constructions |
CN113360820B (en) * | 2021-05-29 | 2024-03-08 | 北京网聘信息技术有限公司 | Page display method, system, equipment and storage medium |
CN115600162A (en) * | 2021-07-09 | 2023-01-13 | 华为云计算技术有限公司(Cn) | Method, device and related equipment for adding watermark in data |
CN113657116B (en) * | 2021-08-05 | 2023-08-08 | 天津大学 | Social media popularity prediction method and device based on visual semantic relationship |
CN113836397B (en) * | 2021-09-02 | 2024-07-12 | 桂林电子科技大学 | Recommendation method for personalized feature modeling of shopping basket |
CN113946701B (en) * | 2021-09-14 | 2024-03-19 | 广州市城市规划设计有限公司 | Dynamic updating method and device for urban and rural planning data based on image processing |
US11941860B2 (en) * | 2021-09-24 | 2024-03-26 | Zebra Technologies Corporation | Computational load mitigation for image-based item recognition |
US11977858B2 (en) | 2022-02-07 | 2024-05-07 | T-Mobile Usa, Inc. | Centralized intake and capacity assessment platform for project processes, such as with product development in telecommunications |
US20240078684A1 (en) * | 2022-09-01 | 2024-03-07 | Qualcomm Incorporated | Global motion modeling for automotive image data |
CN117771664B (en) * | 2024-01-03 | 2024-06-07 | 广州创一网络传媒有限公司 | Interactive game projection method of self-adaptive projection surface |
CN118113805B (en) * | 2024-04-29 | 2024-06-25 | 山东省国土测绘院 | Geographic information survey calibration method and system based on deep learning |
CN118229242B (en) * | 2024-05-24 | 2024-08-27 | 中国电子科技集团公司第十五研究所 | Office system management method and device suitable for big data |
Family Cites Families (51)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6311214B1 (en) | 1995-07-27 | 2001-10-30 | Digimarc Corporation | Linking of computers based on optical sensing of digital data |
JP4242529B2 (en) * | 1999-10-27 | 2009-03-25 | オリンパス株式会社 | Related information presentation device and related information presentation method |
KR100330134B1 (en) * | 1999-12-29 | 2002-03-27 | 류명선 | System and method for alarming speed limit and guiding traffic information |
US6728312B1 (en) * | 2000-04-13 | 2004-04-27 | Forgent Networks, Inc. | Adaptive video decoding and rendering with respect to processor congestion |
AU2002213069A1 (en) * | 2000-10-10 | 2002-04-22 | Discrete Wireless, Inc. | System and methods for conserving wireless resources |
US6748360B2 (en) * | 2000-11-03 | 2004-06-08 | International Business Machines Corporation | System for selling a product utilizing audio content identification |
US6722569B2 (en) | 2001-07-13 | 2004-04-20 | Welch Allyn Data Collection, Inc. | Optical reader having a color imager |
EP2112804A3 (en) * | 2001-08-15 | 2009-12-02 | Precache Inc. | Packet routing via payload inspection and subscription processing in a publish-subscribe network |
CN1589433A (en) | 2001-11-19 | 2005-03-02 | 皇家飞利浦电子股份有限公司 | Method and system for allocating a budget surplus to a task |
AU2002357029A1 (en) * | 2001-11-30 | 2003-06-17 | A New Voice, Inc. | Method and system for contextual prioritization of unified messages |
WO2005010817A1 (en) * | 2003-07-24 | 2005-02-03 | Olympus Corporation | Image processing device |
US8615487B2 (en) * | 2004-01-23 | 2013-12-24 | Garrison Gomez | System and method to store and retrieve identifier associated information content |
US20060107219A1 (en) | 2004-05-26 | 2006-05-18 | Motorola, Inc. | Method to enhance user interface and target applications based on context awareness |
EP1774686A4 (en) | 2004-08-06 | 2012-08-08 | Digimarc Corp | Fast signal detection and distributed computing in portable computing devices |
JP2006091980A (en) | 2004-09-21 | 2006-04-06 | Seiko Epson Corp | Image processor, image processing method and image processing program |
US20100036717A1 (en) | 2004-12-29 | 2010-02-11 | Bernard Trest | Dynamic Information System |
EP1851646A2 (en) * | 2005-01-06 | 2007-11-07 | Tervela Inc. | Intelligent messaging application programming interface |
US7873974B2 (en) * | 2005-09-19 | 2011-01-18 | Sony Corporation | Identification of television programming using a portable wireless device |
US7319908B2 (en) | 2005-10-28 | 2008-01-15 | Microsoft Corporation | Multi-modal device power/mode management |
US8849821B2 (en) * | 2005-11-04 | 2014-09-30 | Nokia Corporation | Scalable visual search system simplifying access to network and device functionality |
WO2007130688A2 (en) * | 2006-05-10 | 2007-11-15 | Evolution Robotics, Inc. | Mobile computing device with imaging capability |
US7787697B2 (en) | 2006-06-09 | 2010-08-31 | Sony Ericsson Mobile Communications Ab | Identification of an object in media and of related media objects |
US7680959B2 (en) * | 2006-07-11 | 2010-03-16 | Napo Enterprises, Llc | P2P network for providing real time media recommendations |
US8565815B2 (en) * | 2006-11-16 | 2013-10-22 | Digimarc Corporation | Methods and systems responsive to features sensed from imagery or other data |
US7991157B2 (en) * | 2006-11-16 | 2011-08-02 | Digimarc Corporation | Methods and systems responsive to features sensed from imagery or other data |
CN101246486B (en) * | 2007-02-13 | 2012-02-01 | 国际商业机器公司 | Method and apparatus for improved process of expressions |
US8788529B2 (en) * | 2007-02-26 | 2014-07-22 | Microsoft Corp. | Information sharing between images |
US20080243806A1 (en) * | 2007-03-26 | 2008-10-02 | Roger Dalal | Accessing information on portable cellular electronic devices |
US7912444B2 (en) * | 2007-04-23 | 2011-03-22 | Sony Ericsson Mobile Communications Ab | Media portion selection system and method |
JP5380789B2 (en) * | 2007-06-06 | 2014-01-08 | ソニー株式会社 | Information processing apparatus, information processing method, and computer program |
US8279946B2 (en) * | 2007-11-23 | 2012-10-02 | Research In Motion Limited | System and method for providing a variable frame rate and adaptive frame skipping on a mobile device |
US9224150B2 (en) * | 2007-12-18 | 2015-12-29 | Napo Enterprises, Llc | Identifying highly valued recommendations of users in a media recommendation network |
KR20090091549A (en) * | 2008-02-25 | 2009-08-28 | 주식회사 케이티 | Apparatus and method for producing a photo included the location information of a subject |
US8417259B2 (en) * | 2008-03-31 | 2013-04-09 | At&T Mobility Ii Llc | Localized detection of mobile devices |
US8036417B2 (en) | 2008-06-11 | 2011-10-11 | Eastman Kodak Company | Finding orientation and date of hardcopy medium |
US20090315886A1 (en) * | 2008-06-19 | 2009-12-24 | Honeywell International Inc. | Method to prevent resource exhaustion while performing video rendering |
CA2734613C (en) | 2008-08-19 | 2020-06-09 | Digimarc Corporation | Methods and systems for content processing |
US8805110B2 (en) | 2008-08-19 | 2014-08-12 | Digimarc Corporation | Methods and systems for content processing |
US8520979B2 (en) | 2008-08-19 | 2013-08-27 | Digimarc Corporation | Methods and systems for content processing |
EP2324475A1 (en) * | 2008-08-26 | 2011-05-25 | Dolby Laboratories Licensing Corporation | Robust media fingerprints |
US9788043B2 (en) * | 2008-11-07 | 2017-10-10 | Digimarc Corporation | Content interaction methods and systems employing portable devices |
US20100135417A1 (en) * | 2008-12-02 | 2010-06-03 | Asaf Hargil | Processing of video data in resource constrained devices |
JP2010134649A (en) * | 2008-12-03 | 2010-06-17 | Canon Inc | Information processing apparatus, its processing method, and program |
JP5315111B2 (en) * | 2009-03-31 | 2013-10-16 | 株式会社エヌ・ティ・ティ・ドコモ | Terminal device, information presentation system, and terminal screen display method |
CN101533506B (en) * | 2009-04-24 | 2012-01-04 | 西安电子科技大学 | Robust image double-watermarking method |
US8886206B2 (en) | 2009-05-01 | 2014-11-11 | Digimarc Corporation | Methods and systems for content processing |
US9197736B2 (en) | 2009-12-31 | 2015-11-24 | Digimarc Corporation | Intuitive computing methods and systems |
US8121618B2 (en) | 2009-10-28 | 2012-02-21 | Digimarc Corporation | Intuitive computing methods and systems |
US8694533B2 (en) * | 2010-05-19 | 2014-04-08 | Google Inc. | Presenting mobile content based on programming context |
US8781152B2 (en) * | 2010-08-05 | 2014-07-15 | Brian Momeyer | Identifying visual media content captured by camera-enabled mobile device |
US8359020B2 (en) | 2010-08-06 | 2013-01-22 | Google Inc. | Automatically monitoring for voice input based on context |

Date | Country | Application number | Publication | Status |
---|---|---|---|---|
2011-10-21 | US | US13/278,949 | US9183580B2 | Not active: Expired - Fee Related |
2011-11-04 | JP | JP2013537885A | JP6054870B2 | Active |
2011-11-04 | CN | CN2011800641753A | CN103329147A | Active: Pending |
2011-11-04 | WO | PCT/US2011/059412 | WO2012061760A2 | Active: Application Filing |
2011-11-04 | CA | CA2815944A | CA2815944C | Active |
2011-11-04 | CN | CN201710150469.7A | CN107103316B | Active |
2011-11-04 | KR | KR1020137014291A | KR102010221B1 | Active: IP Right Grant |
2011-11-04 | EP | EP11838908.9A | EP2635997A4 | Not active: Withdrawn |
2016-12-01 | JP | JP2016234336A | JP6572468B2 | Active |
Non-Patent Citations (1)
Title |
---|
See references of EP2635997A4 * |
Cited By (49)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8819172B2 (en) | 2010-11-04 | 2014-08-26 | Digimarc Corporation | Smartphone-based methods and systems |
US9218530B2 (en) | 2010-11-04 | 2015-12-22 | Digimarc Corporation | Smartphone-based methods and systems |
US9367886B2 (en) | 2010-11-04 | 2016-06-14 | Digimarc Corporation | Smartphone arrangements responsive to musical artists and other content proprietors |
US10930289B2 (en) | 2011-04-04 | 2021-02-23 | Digimarc Corporation | Context-based smartphone sensor logic |
WO2013179086A1 (en) * | 2012-05-29 | 2013-12-05 | Nokia Corporation | Supporting the provision of services |
US10223107B2 (en) | 2012-05-29 | 2019-03-05 | Nokia Technologies Oy | Supporting the provision of services |
US11290762B2 (en) | 2012-11-27 | 2022-03-29 | Apple Inc. | Agnostic media delivery system |
US11070889B2 (en) | 2012-12-10 | 2021-07-20 | Apple Inc. | Channel bar user interface |
US11317161B2 (en) | 2012-12-13 | 2022-04-26 | Apple Inc. | TV side bar user interface |
US11245967B2 (en) | 2012-12-13 | 2022-02-08 | Apple Inc. | TV side bar user interface |
US11297392B2 (en) | 2012-12-18 | 2022-04-05 | Apple Inc. | Devices and method for providing remote control hints on a display |
US11194546B2 (en) | 2012-12-31 | 2021-12-07 | Apple Inc. | Multi-user TV user interface |
US11822858B2 (en) | 2012-12-31 | 2023-11-21 | Apple Inc. | Multi-user TV user interface |
US9858596B2 (en) | 2013-02-06 | 2018-01-02 | Muzak Llc | System for targeting location-based communications |
US9424594B2 (en) | 2013-02-06 | 2016-08-23 | Muzak Llc | System for targeting location-based communications |
US9099080B2 (en) | 2013-02-06 | 2015-08-04 | Muzak Llc | System for targeting location-based communications |
US9317872B2 (en) | 2013-02-06 | 2016-04-19 | Muzak Llc | Encoding and decoding an audio watermark using key sequences comprising of more than two frequency components |
JP2014157595A (en) * | 2013-02-15 | 2014-08-28 | Fuji Xerox Co Ltd | Method and program for identifying medium or function, article including marker, and marker arrangement method |
WO2015039139A1 (en) * | 2013-09-16 | 2015-03-19 | The Electric Fan Company | Distributed, Unfolding, Embedded Transaction and Inventory Apparatuses, Methods and Systems |
US10595100B2 (en) | 2013-09-16 | 2020-03-17 | The Electric Fan Company | Distributed, unfolding, embedded transaction and inventory apparatuses, methods and systems |
EP3047445A4 (en) * | 2013-09-16 | 2017-01-25 | The Electric Fan Company | Distributed, Unfolding, Embedded Transaction and Inventory Apparatuses, Methods and Systems |
JP2016541066A (en) * | 2013-12-03 | 2016-12-28 | デ ソウザ、アントニオ フェレイラ | Electronic inquiry system and verification of authenticity, validity and restrictions of national driver's license (CNH), vehicle registration certificate (CRV) and vehicle registration license certificate (CRLV) using data reading technology through approximation |
US11049094B2 (en) | 2014-02-11 | 2021-06-29 | Digimarc Corporation | Methods and arrangements for device to device communication |
US20170076290A1 (en) * | 2014-05-16 | 2017-03-16 | Sten Corfitsen | System and method for performing payments from a vehicle |
US12086186B2 (en) | 2014-06-24 | 2024-09-10 | Apple Inc. | Interactive interface for navigating in a user interface associated with a series of content |
US12105942B2 (en) | 2014-06-24 | 2024-10-01 | Apple Inc. | Input device and user interface interactions |
US11461397B2 (en) | 2014-06-24 | 2022-10-04 | Apple Inc. | Column interface for navigating in a user interface |
US11543938B2 (en) | 2016-06-12 | 2023-01-03 | Apple Inc. | Identifying applications on which content is available |
US10642449B2 (en) | 2016-06-12 | 2020-05-05 | Apple Inc. | Identifying applications on which content is available |
US11520858B2 (en) | 2016-06-12 | 2022-12-06 | Apple Inc. | Device-level authorization for viewing content |
US11609678B2 (en) | 2016-10-26 | 2023-03-21 | Apple Inc. | User interfaces for browsing content from multiple content applications on an electronic device |
US11966560B2 (en) | 2016-10-26 | 2024-04-23 | Apple Inc. | User interfaces for browsing content from multiple content applications on an electronic device |
US11467726B2 (en) | 2019-03-24 | 2022-10-11 | Apple Inc. | User interfaces for viewing and accessing content on an electronic device |
US11962836B2 (en) | 2019-03-24 | 2024-04-16 | Apple Inc. | User interfaces for a media browsing application |
US11683565B2 (en) | 2019-03-24 | 2023-06-20 | Apple Inc. | User interfaces for interacting with channels that provide content that plays in a media browsing application |
US11750888B2 (en) | 2019-03-24 | 2023-09-05 | Apple Inc. | User interfaces including selectable representations of content items |
US11445263B2 (en) | 2019-03-24 | 2022-09-13 | Apple Inc. | User interfaces including selectable representations of content items |
US12008232B2 (en) | 2019-03-24 | 2024-06-11 | Apple Inc. | User interfaces for viewing and accessing content on an electronic device |
US11057682B2 (en) | 2019-03-24 | 2021-07-06 | Apple Inc. | User interfaces including selectable representations of content items |
WO2020242840A1 (en) * | 2019-05-24 | 2020-12-03 | Universal City Studios Llc | Systems and methods for providing in-application messaging |
US11797606B2 (en) | 2019-05-31 | 2023-10-24 | Apple Inc. | User interfaces for a podcast browsing and playback application |
US11863837B2 (en) | 2019-05-31 | 2024-01-02 | Apple Inc. | Notification of augmented reality content on an electronic device |
US11843838B2 (en) | 2020-03-24 | 2023-12-12 | Apple Inc. | User interfaces for accessing episodes of a content series |
US11899895B2 (en) | 2020-06-21 | 2024-02-13 | Apple Inc. | User interfaces for setting up an electronic device |
CN112511848A (en) * | 2020-11-09 | 2021-03-16 | 网宿科技股份有限公司 | Live broadcast method, server and computer readable storage medium |
US11720229B2 (en) | 2020-12-07 | 2023-08-08 | Apple Inc. | User interfaces for browsing and presenting content |
US11934640B2 (en) | 2021-01-29 | 2024-03-19 | Apple Inc. | User interfaces for record labels |
CN113379794A (en) * | 2021-05-19 | 2021-09-10 | 重庆邮电大学 | Single-target tracking system and method based on attention-key point prediction model |
US12149779B2 (en) | 2022-02-18 | 2024-11-19 | Apple Inc. | Advertisement user interface |
Also Published As
Publication number | Publication date |
---|---|
CN107103316A (en) | 2017-08-29 |
KR20130118897A (en) | 2013-10-30 |
WO2012061760A3 (en) | 2012-06-28 |
EP2635997A2 (en) | 2013-09-11 |
JP2014505896A (en) | 2014-03-06 |
CA2815944A1 (en) | 2012-05-10 |
JP6572468B2 (en) | 2019-09-11 |
US9183580B2 (en) | 2015-11-10 |
KR102010221B1 (en) | 2019-08-13 |
US20120134548A1 (en) | 2012-05-31 |
CN107103316B (en) | 2020-11-03 |
CA2815944C (en) | 2019-09-17 |
JP6054870B2 (en) | 2016-12-27 |
JP2017108401A (en) | 2017-06-15 |
CN103329147A (en) | 2013-09-25 |
EP2635997A4 (en) | 2015-01-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10658007B2 (en) | | Smartphone-based methods and systems |
US10521873B2 (en) | | Salient point-based arrangements |
CA2815944C (en) | | Smartphone-based methods and systems |
US9240021B2 (en) | | Smartphone-based methods and systems |
US9225822B2 (en) | | Channelized audio watermarks |
US9367886B2 (en) | | Smartphone arrangements responsive to musical artists and other content proprietors |
US9218530B2 (en) | | Smartphone-based methods and systems |
US20120154633A1 (en) | | Linked Data Methods and Systems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 11838908; Country of ref document: EP; Kind code of ref document: A2 |
| | ENP | Entry into the national phase | Ref document number: 2815944; Country of ref document: CA |
| | REEP | Request for entry into the European phase | Ref document number: 2011838908; Country of ref document: EP |
| | WWE | WIPO information: entry into national phase | Ref document number: 2011838908; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2013537885; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 20137014291; Country of ref document: KR; Kind code of ref document: A |