KR20110043775A

KR20110043775A - Methods and systems for content processing

Info

Publication number: KR20110043775A
Application number: KR1020117006167A
Authority: KR
Inventors: Geoffrey B. Rhoads; Tony F. Rodriguez; John D. Lord; Brian T. MacIntosh; Nicole Rhoads; William Y. Conwell
Original assignee: Digimarc Corporation
Priority date: 2008-08-19
Filing date: 2009-08-19
Publication date: 2011-04-27
Also published as: KR20160136467A; KR101763132B1; CA2734613C; CN102216941B; KR101680044B1; EP2313847A1; EP2313847A4; WO2010022185A1; CA2734613A1; CN102216941A

Abstract

Mobile phones and other portable devices are equipped with a variety of technologies by which existing functionality can be improved and new functionality can be provided. Some aspects relate to visual search capabilities and to determining appropriate actions in response to different image inputs. Other aspects relate to the processing of image data. Still other aspects relate to metadata generation, processing and provision. Still other aspects relate to user interface improvements. Other aspects relate to imaging architectures in which a mobile phone's image sensor is one in a chain of stages that act in succession on packetized instructions/data to capture imagery and later process it. Still other aspects relate to the distribution of processing tasks between the mobile device and remote resources ("the cloud"). Elemental image processing (e.g., simple filtering and edge detection) can be performed on the mobile phone, while other operations can be referred out to remote service providers. The remote service providers can be selected using techniques such as reverse auctions, through which they compete for processing tasks. Many other features and arrangements are also detailed.

Description

METHODS AND SYSTEMS FOR CONTENT PROCESSING

Introduction

Certain aspects of the technology detailed herein are introduced in FIG. 0. A user's mobile phone captures imagery (either in response to a user command, or automatically), and objects within the scene are recognized. Information associated with each object is identified and made available to the user through a scene-registered interactive visual "bauble" that is graphically overlaid on the imagery. The bauble may itself present information, or it may simply be an indicia that the user can tap at the indicated location to obtain a lengthier listing of related information, or to launch a related function/application.

In the illustrated scene, the camera has recognized the face in the background as "Bob" and annotated the image accordingly. A billboard promoting a Godzilla movie has been recognized, and a bauble saying "Show Times" has been blitted onto the display, inviting the user to tap for screening information.

The phone has recognized the user's car from the scene, and has also identified another vehicle in the picture by make and year. Both are noted by overlaid text. A restaurant has also been identified, and an initial review from a collection of reviews ("Jane's review: Very good!") is shown. Tapping brings up more reviews.

In one particular arrangement, this scenario is implemented as a cloud-side service assisted by local device object recognition core services. Users may leave notes on both fixed and mobile objects. Tapped baubles can trigger other applications. Social networks can keep track of object relationships, forming a virtual "web of objects."

In early roll-out, the class of recognizable objects will be limited but useful. Object identification events will primarily fetch and associate public domain information and social-web connections to the baubles. Applications employing barcodes, digital watermarks, facial recognition, OCR, etc., can help support initial deployment of the technology.

Later, the arrangement is expected to evolve into an auction market, in which paying enterprises want to place their own baubles (or associated information) onto the screens of highly targeted demographics of users. User profiles, together with the input visual stimuli (aided, in some cases, by GPS/magnetometer data), are fed into a Google-esque mix-master in the cloud, which matches buyers of mobile device-screen real estate to users requesting the baubles.

Eventually, such functionality may become so ubiquitous as to enter the common lexicon, as in "I'll try to get a Bauble on that" or "See what happens if you Viewgle that scene."

Background

Digimarc's patent 6,947,571 shows a system in which a cell phone camera captures content (e.g., image data) and processes it to derive an identifier related to the imagery. This derived identifier is submitted to a data structure (e.g., a database), which indicates corresponding data or actions. The cell phone then displays responsive information, or takes responsive action. Such a sequence of operations is sometimes referred to as "visual search."

Related technologies are shown in patent publications 20080300011 (Digimarc), 7,283,983 and WO07/130688 (Evolution Robotics), 20070175998 and 20020102966 (DSPV), 20060012677, 20060240862 and 20050185060 (Google), 20060056707 and 20050227674 (Nokia), 20060026140 (ExBiblio), 6,491,217, 20020152388, 20020178410 and 20050144455 (Philips), 20020072982 and 20040199387 (Shazam), 20030083098 (Canon), 20010055391 (Qualcomm), 20010001854 (AirClic), 7,251,475 (Sony), 7,174,293 (Iceberg), 7,065,559 (Organnon Wireless), 7,016,532 (Evryx Technologies), 6,993,573 and 6,199,048 (Neomedia), 6,941,275 (Tune Hunter), 6,788,293 (Silverbrook Research), 6,766,363 and 6,675,165 (BarPoint), 6,389,055 (Alcatel-Lucent), 6,121,530 (Sonoda), and 6,002,946 (Reber/Motorola).

It is an object of the present invention to provide methods and systems for content processing.

Aspects of the presently detailed technology concern improvements to such technologies, moving towards the goal of intuitive computing: devices that can see and/or hear, and infer the user's desire in that sensed context.

FIG. 0 is a diagram showing an exemplary embodiment incorporating certain aspects of the technology detailed herein.
FIG. 1 is a high-level view of an embodiment incorporating aspects of the present technology.
FIG. 2 shows some of the applications that a user may request a camera-equipped cell phone to perform.
FIG. 3 identifies some of the commercial entities in an embodiment incorporating aspects of the present technology.
FIGS. 4, 4A and 4B conceptually illustrate how pixel data, and derivatives, are applied in different tasks and packaged into packet form.
FIG. 5 shows how different tasks may have certain image processing operations in common.
FIG. 6 illustrates how common image processing operations can be identified, and used to configure cell phone processing hardware to perform these operations.
FIG. 7 shows how a cell phone can send certain pixel-related data across an internal bus for local processing, and send other pixel-related data across a communications channel for processing in the cloud.
FIG. 8 shows how the cloud processing in FIG. 7 allows far more "intelligence" to be applied to a task desired by a user.
FIG. 9 details how keyvector data is distributed to different external service providers, who perform services in exchange for compensation, which is handled in consolidated fashion for the user.
FIG. 10 shows an embodiment incorporating aspects of the present technology, noting how cell phone-based processing is suited for simple object identification tasks, such as template matching, whereas cloud-based processing is suited for complex tasks, such as data association.
FIG. 10A shows an embodiment incorporating aspects of the present technology, noting that the user experience is optimized by performing visual keyvector processing as close to the sensor as possible, and administering traffic to the cloud as low in the communications stack as possible.
FIG. 11 illustrates that tasks referred for external processing can be routed to a first group of service providers who routinely perform certain tasks for the cell phone, or can be routed to a second group of service providers who compete on a dynamic basis for processing tasks from the cell phone.
FIG. 12 expands on the concepts of FIG. 11, e.g., showing how a bid filter and broadcast agent software module may oversee a reverse auction process.
FIG. 13 is a high-level block diagram of a processing arrangement incorporating aspects of the present technology.
FIG. 14 is a high-level block diagram of another processing arrangement incorporating aspects of the present technology.
FIG. 15 shows an illustrative range of image types that may be captured by a cell phone camera.
FIG. 16 shows a particular hardware implementation incorporating aspects of the present technology.
FIG. 17 illustrates aspects of a packet used in an exemplary embodiment.
FIG. 18 is a block diagram illustrating an implementation of the SIFT technique.
FIG. 19 is a block diagram illustrating, e.g., how packet header data can be changed during processing, through use of a memory.
FIG. 19A shows a prior art architecture from the robotic Player Project.
FIG. 19B shows how various factors can influence how different operations may be handled.
FIG. 20 shows an arrangement in which a cell phone camera and a cell phone projector share a lens.
FIG. 20A shows a reference platform architecture that can be used in embodiments of the present technology.
FIG. 21 shows an image of a desktop telephone captured by a cell phone camera.
FIG. 22 shows a collection of similar images found in a repository of public images, by reference to features discerned from the image of FIG. 21.
FIGS. 23-28A and 30-34 are flow diagrams detailing methods incorporating aspects of the present technology.
FIG. 29 is an arty shot of the Eiffel Tower, captured by a cell phone user.
FIG. 35 is another image captured by a cell phone user.
FIG. 36 is an image of the underside of a telephone, discovered using methods according to aspects of the present technology.
FIG. 37 shows part of the physical user interface of one style of cell phone.
FIGS. 37A and 37B illustrate different linking topologies.
FIG. 38 is an image captured by a cell phone user, depicting an Appalachian Trail trail marker.
FIGS. 39-43 detail methods incorporating aspects of the present technology.
FIG. 44 shows the user interface of one style of cell phone.
FIGS. 45A and 45B illustrate how different dimensions of commonality may be explored through use of a user interface control of a cell phone.
FIGS. 46A and 46B detail a particular method incorporating aspects of the present technology, by which keywords such as Prometheus and Paul Manship are automatically determined from a cell phone image.
FIG. 47 shows some of the different data sources that may be consulted in processing imagery according to aspects of the present technology.
FIGS. 48A, 48B and 49 show different processing methods according to aspects of the present technology.
FIG. 50 identifies some of the different processing that may be performed on image data, in accordance with aspects of the present technology.
FIG. 51 shows an illustrative tree structure that can be employed in accordance with certain aspects of the present technology.
FIG. 52 shows a network of wearable computers (e.g., cell phones) that can cooperate with each other, e.g., in a peer-to-peer network.
FIGS. 53-55 detail how a glossary of signs can be identified by a cell phone, and used to trigger different actions.
FIG. 56 illustrates aspects of prior art digital camera technology.
FIG. 57 details an embodiment incorporating aspects of the present technology.
FIG. 58 shows how a cell phone can be used for scene and display affine parameters.
FIG. 59 illustrates certain state machine aspects of the present technology.
FIG. 60 illustrates how even "still" imagery can include temporal, or motion, aspects.
FIG. 61 shows some metadata that may be involved in an implementation incorporating aspects of the present technology.
FIG. 62 shows an image that may be captured by a cell phone camera user.
FIGS. 63-66 detail how the image of FIG. 62 can be processed to convey semantic metadata.
FIG. 67 shows another image that may be captured by a cell phone camera user.
FIGS. 68 and 69 detail how the image of FIG. 67 can be processed to convey semantic metadata.
FIG. 70 shows an image that may be captured by a cell phone camera user.
FIG. 71 details how the image of FIG. 70 can be processed to convey semantic metadata.
FIG. 72 is a chart showing aspects of the human visual system.
FIG. 73 shows different low, mid and high frequency components of an image.
FIG. 74 shows a newspaper page.
FIG. 75 shows the layout of the FIG. 74 page, as set by layout software.
FIG. 76 details how user interaction with imagery captured from printed text may be enhanced.
FIG. 77 illustrates how semantic conveyance of metadata can have a progressive aspect, akin to JPEG2000 and the like.
FIG. 78 is a block diagram of a prior art thermostat.
FIG. 79 is an exterior view of the thermostat of FIG. 78.
FIG. 80 is a block diagram of a thermostat employing certain aspects of the present technology ("ThingPipe").
FIG. 81 is a block diagram of a cell phone employing certain aspects of the present technology.
FIG. 82 is a block diagram by which certain operations of the thermostat of FIG. 80 are explained.
FIG. 83 shows a cell phone display depicting an image captured from a thermostat, overlaid with certain touch-screen targets that the user can touch to increase or decrease the thermostat temperature.
FIG. 84 is similar to FIG. 83, but shows a graphical user interface for use on a phone without a touch screen.
FIG. 85 is a block diagram of an alarm clock employing aspects of the present technology.
FIG. 86 shows a screen of an alarm clock user interface that may be presented on a cell phone, in accordance with one aspect of the technology.
FIG. 87 shows a screen of a user interface detailing nearby devices that may be controlled through use of the cell phone.

The present specification details a diversity of technologies, assembled over an extended period of time, to serve a variety of different objectives. Yet they relate together in various ways, and so are presented collectively in this single document.

This varied, interrelated subject matter does not lend itself to a straightforward presentation. The reader's indulgence is therefore requested, as this narrative occasionally proceeds in nonlinear fashion among the assorted topics and technologies.

Each portion of this specification details technology that desirably incorporates technical features detailed in other portions. Thus, it is difficult to identify "a beginning" from which this disclosure should logically begin. That said, we simply dive in.

Mobile Device Object Recognition and Interaction Using Distributed Network Services

There is presently a huge disconnect between the unfathomable volume of information contained in high quality image data streaming from a mobile device camera (e.g., in a cell phone), and the ability of that mobile device to process this data to whatever end. "Off device" processing of visual data can help handle this fire hose of data, especially when a multitude of visual processing tasks may be desired. These issues become even more critical once "real time object recognition and interaction" is contemplated, where a user of the mobile device expects virtually instantaneous results and augmented reality graphic feedback on the mobile device screen as that user points the camera at a scene or object.

In accordance with one aspect of the present technology, a distributed network of pixel processing engines serves such mobile device users and meets most qualitative "human real time interactivity" requirements, generally with feedback in much less than one second. Implementation desirably provides certain basic features on the mobile device, including a rather intimate relationship between the image sensor's output pixels and the native communications channel available to the mobile device. Certain basic levels of "content filtering and classification" of the pixel data on the local device, followed by attaching routing instructions to the pixel data as specified by the user's intentions and subscriptions, lead to an interactive session between the mobile device and one or more "cloud based" pixel processing services. The key word "session" further indicates fast responses transmitted back to the mobile device. For some services marketed as "real time" or "interactive," a session essentially represents a duplex, generally packet-based, communication in which several outgoing "pixel packets" and several incoming response packets (which may be pixel data updated with the processed data) may occur every second.

Business factors and good old competition are at the heart of the distributed network. Users can subscribe to, or otherwise tap into, any external services they choose. The local device itself and/or the carrier service provider to that device can be configured as the user chooses, routing filtered and pertinent pixel data to specified object interaction services. Billing mechanisms for such services can plug directly into existing cell and/or mobile device billing networks, whereby users get billed and service providers get paid.
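The routing of filtered pixel data to user-chosen services can be pictured with a minimal sketch. It assumes, purely for illustration, that on-device classification has already labeled each piece of pixel data with a content class, and that the user's subscriptions map each class to one or more services. The class names, the service names and the `route` function are hypothetical; none of them come from this specification.

```python
from collections import defaultdict

# Hypothetical user subscriptions: content class -> services chosen by the user.
SUBSCRIPTIONS = {
    "barcode": ["barcode-lookup.example"],
    "face": ["face-match.example", "social-web.example"],
}

def route(classified_items, subscriptions=SUBSCRIPTIONS):
    """Group filtered pixel data by destination service.

    classified_items: iterable of (content_class, pixel_bytes) pairs produced
    by on-device filtering and classification (not modelled here).
    Items whose class has no subscribed service stay on the device.
    """
    outbound = defaultdict(list)
    local = []
    for content_class, pixels in classified_items:
        services = subscriptions.get(content_class, [])
        if not services:
            local.append((content_class, pixels))
        for service in services:
            outbound[service].append(pixels)
    return dict(outbound), local
```

In this sketch a single captured item may fan out to several services, which loosely mirrors the idea that a user can tap into any number of external services; billing and transport are deliberately left out.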

But let's back up a bit. The addition of camera systems to mobile devices has ignited an explosion of applications. The primordial application certainly must be ordinary people simply snapping quick visual aspects of their environment and sharing such pictures with friends and family.

The fanning out of applications from that starting point arguably hinges on a set of core plumbing features inherent in mobile cameras. In short (and non-exhaustively, of course), such features include: a) high quality pixel capture and low level processing; b) better local device CPU and GPU resources for on-device pixel processing with subsequent user feedback; c) structured connectivity into "the cloud"; and, importantly, d) a specific traffic monitoring and billing infrastructure. FIG. 1 is one graphic perspective on some of these plumbing features of what might be called a visually intelligent network. (Conventional details of a cell phone, such as the microphone, A/D converter, modulation and demodulation systems, IF stages, cellular transceiver, etc., are not shown for clarity of illustration.)

It is all well and good to get better CPUs and GPUs, and more memory, on mobile devices. However, cost, weight and power considerations seem to favor getting "the cloud" to do as much of the "intelligence" heavy lifting as possible.

Relatedly, it seems that there should be a common denominator set of "device-side" operations performed on visual data that will serve all cloud processes, including certain formatting, elemental graphic processing, and other rote operations. Similarly, it seems there should be a standardized basic header and addressing scheme for the resulting back-and-forth communication traffic (typically packetized) with the cloud.
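To make the idea of a standardized basic header concrete, here is a small sketch of one possible fixed-size header for packetized traffic. The specification does not define a layout at this point, so every field (version, data-type code, service address, session identifier, payload length) and both function names are assumptions chosen only for illustration.

```python
import struct

# Hypothetical 12-byte header in network byte order:
# version (1 byte), data-type code (1), service address (2),
# session id (4), payload length (4).
HEADER = struct.Struct("!BBHII")

def pack_packet(version, data_type, service_addr, session_id, payload):
    """Prefix the payload with the fixed-size header."""
    header = HEADER.pack(version, data_type, service_addr, session_id, len(payload))
    return header + payload

def unpack_packet(packet):
    """Split a packet into its header fields and payload, checking the length."""
    version, data_type, service_addr, session_id, length = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return version, data_type, service_addr, session_id, payload
```

A fixed-size header of this kind lets any device-side or cloud-side stage read the addressing fields without parsing the payload, which is one plausible reading of the "common denominator" the paragraph calls for.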

This conceptualization is akin to the human visual system. The eye performs baseline operations, such as colorimetry, and it optimizes the necessary information for transmission along the optic nerve to the brain. The brain does the real cognitive work. And there is feedback the other way too, with the brain sending information that controls muscle movement: where to point the eyes, scanning the lines of a book, controlling the iris (brightness), etc.

FIG. 2 depicts a non-exhaustive but illustrative list of visual processing applications for mobile devices. Again, it is not hard to see analogies between this list and the fundamentals of how the human visual system and the human brain operate. How "optimized" the human visual system is relative to any given object recognition task is a well-studied academic area, and the general consensus is that the eye-retina-optic nerve-cortex system is pretty darn wonderful in how efficiently it serves a vast array of cognitive demands. This aspect of the technology relates to how similarly efficient and broadly enabling elements can be built into mobile devices, mobile device connections and network services, all with the goal of serving the applications depicted in FIG. 2, as well as those new applications that may show up as the technology dance continues.

Perhaps the central difference between the human analogy and mobile device networks must surely revolve around the basic concept of "the marketplace," where buyers buy better and better things so long as businesses know how to profit accordingly. Any technology that aims to serve the applications listed in FIG. 2 must assume that hundreds, if not thousands, of business entities will be developing the fine details of specific commercial offerings, with the expectation of profiting, one way or another, from those offerings. Yes, a few behemoths will dominate the main lines of cash flow in the overall mobile industry, but it is equally certain that niche players will continually be developing niche applications and services. Thus, this disclosure describes how a marketplace for visual processing services can develop, whereby business interests across the spectrum have something to gain. FIG. 3 attempts a crude categorization of some of the business interests applicable to the global business ecosystem operative at the time of this filing.

FIG. 4 moves toward the abstract in introducing the technology aspect now under consideration. Here we find a highly abstracted bit of information, derived from some batch of photons that impinged on some form of electronic image sensor, with a multitude of waiting consumers of that lowly bit. FIG. 4A then quickly introduces the intuitively familiar concept that single bits of visual information are not worth much outside of their role in both spatial and temporal groupings. This core concept is well exploited in modern video compression standards such as MPEG7 and H.264.

The "visual" character of the bits may be far removed from the visual domain by certain processing (consider, e.g., the vector strings representing eigenface data). Thus, we sometimes use the term "keyvector data" (or "keyvector strings") to refer collectively to raw sensor/stimulus data (e.g., pixel data) and/or to processed information and associated derivatives. A keyvector may take the form of a container in which such information is conveyed (e.g., a data structure such as a packet). A tag or other data can be included to identify the type of information (e.g., JPEG image data, or eigenface data), or the data type may be evident from the data or from context. One or more instructions or operations may be associated with keyvector data, either expressly detailed in the keyvector or implied. For certain types of keyvector data, an operation may be implied by default (e.g., for JPEG data it may be "store the image"; for eigenface data it may be "match this eigenface template"). Or an implied operation may depend on context.
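The container described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `KeyVector`, the tag strings, and the `resolve_operation` helper are hypothetical and do not come from the disclosure; only the two default operations mirror the examples in the text.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical default operations, keyed by data-type tag. The two entries
# mirror the examples in the text; the tag spellings are assumptions.
DEFAULT_OPERATIONS = {
    "jpeg": "store the image",
    "eigenface": "match this eigenface template",
}


@dataclass
class KeyVector:
    payload: bytes                    # raw sensor data or a processed derivative
    data_type: Optional[str] = None   # optional tag identifying the payload type
    operation: Optional[str] = None   # expressly stated operation, if any

    def resolve_operation(self, context_default: Optional[str] = None) -> Optional[str]:
        """Return the explicit operation, else the type default, else a context default."""
        if self.operation is not None:
            return self.operation
        if self.data_type in DEFAULT_OPERATIONS:
            return DEFAULT_OPERATIONS[self.data_type]
        return context_default
```

For example, `KeyVector(b"...", "jpeg").resolve_operation()` falls back to the type default, while an untagged keyvector relies on whatever default the caller derives from context.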

FIGS. 4A and 4B also introduce a central player in this disclosure: the packaged and address-labeled pixel packet, into the body of which keyvector data is inserted. The keyvector data may be a single patch, a collection of patches, or a time series of patches/collections. A pixel packet may be smaller than a kilobyte, or it may be much larger. It may convey information about an isolated patch of pixels excerpted from a larger image, or it may convey a massive Photosynth of Notre Dame cathedral.

(As presently conceived, a pixel packet is an application layer construct. When actually pushed around a network, however, it may be broken into smaller portions, as the transport layer constraints of the network may require.)

FIG. 5 is a segue diagram, still at an abstract level but pointing toward the concrete. A list of user-defined applications, such as that shown in FIG. 2, will map to a state-of-the-art inventory of pixel processing methods and approaches that can accomplish each and every application. These pixel processing methods break down into common and not-so-common component sub-tasks. Object recognition textbooks are filled with a wide variety of approaches and terminologies, which bring a sense of order to what at first glance might appear to be a bewildering array of "unique requirements" associated with the applications shown in FIG. 2. (In addition, multiple computer vision and image processing libraries, such as OpenCV and CMVision, discussed below, have been created that identify and render functional operations, which can be considered "atomic" functions within object recognition paradigms.) But FIG. 5 attempts to show that there is indeed a set of common steps and processes shared between visual processing applications. The differently shaded pie slices are meant to illustrate that certain pixel operations belong to a specific class and may differ only in low-level variables or optimizations. The size of the overall pie (thought of in a logarithmic sense, so that a pie twice the size of another may represent, for example, 10 times more flops) and the relative size of each slice represent degrees of commonality.

FIG. 6 takes a major step toward the concrete, sacrificing simplicity in the process. Here we see a top portion labeled "Resident Call-Up Visual Processing Services," which represents the full list of applications from FIG. 2 that a given mobile device may be aware of, or fully enabled to perform. The idea is that not all of these applications have to be active all of the time, and hence only some subset of services is actually "turned on" at any given moment. As a one-time configuration activity, the turned-on applications negotiate to identify their common component tasks, in the element labeled "Common Processes Sorter." This first generates an overall common list of the pixel processing routines available for on-device processing, chosen from a library of elemental image processing routines (e.g., FFT, filtering, edge detection, resampling, color histogramming, log-polar transform, etc.). Generation of corresponding flow gate configuration/software programming information follows, which literally loads library elements into properly ordered places in a field programmable gate array set-up, or otherwise configures a suitable processor to perform the required component tasks.
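One way to picture the negotiation step is as a set union over the component tasks of the turned-on applications, filtered against the on-device library. The following is a minimal sketch under that assumption; the function name, the routine names, and the two example applications are hypothetical, and the actual sorter in FIG. 6 may work quite differently.

```python
# Hypothetical on-device library of elemental routines; the names follow the
# examples given in the text, and the ordering is an assumption.
DEVICE_LIBRARY = ["fft", "filtering", "edge_detection", "resampling",
                  "color_histogram", "log_polar"]


def common_routines(active_apps, library=DEVICE_LIBRARY):
    """Return the library routines needed by at least one active application,
    in library order, each listed once."""
    needed = set()
    for routines in active_apps.values():
        needed.update(routines)
    return [name for name in library if name in needed]


# Hypothetical turned-on applications and the component tasks each requires.
apps = {
    "barcode_reader": ["filtering", "edge_detection"],
    "face_matcher": ["resampling", "log_polar", "filtering"],
}
```

The resulting list is what would then be handed to the configuration step, so a routine shared by several applications (here, filtering) is loaded only once.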

FIG. 6 also depicts the image sensor followed by a universal pixel segmenter. This pixel segmenter breaks the massive stream of imagery from the sensor into manageable spatial and/or temporal blobs (e.g., akin to MPEG macroblocks, wavelet transform blocks, 64 x 64 pixel blocks, etc.). After the torrent of pixels has been broken into chewable chunks, they are fed into the newly programmed gate array (or other hardware), which performs the elemental image processing tasks associated with the selected applications. (Such arrangements are further detailed below, in an exemplary system employing "pixel packets.") Various output products are sent to a routing engine, which refers the elementally processed data (e.g., keyvector data) to other resources (internal and/or external) for further processing. This further processing is typically more complex than that already performed. Examples include making associations, deriving inferences, pattern and template matching, etc. This further processing can be highly application-specific.
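The spatial part of the segmenter can be illustrated with a simple tiling routine. This is a sketch only, assuming fixed 64 x 64 tiles as in one of the examples above and allowing smaller tiles at the right and bottom edges; the function name and the tuple layout are hypothetical.

```python
def segment(width, height, block=64):
    """Yield (x, y, w, h) tiles covering a width x height frame.

    Tiles are block x block pixels, except at the right and bottom edges,
    where they may be smaller.
    """
    for y in range(0, height, block):
        for x in range(0, width, block):
            yield (x, y, min(block, width - x), min(block, height - y))
```

A 160 x 64 frame, for instance, yields two full tiles and one 32-pixel-wide edge tile, and the tile areas sum to the frame area.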

(Consider a promotional game from Pepsi that invites the public to take part in a treasure hunt in a state park. Based on internet-distributed clues, people try to find a hidden carton of soda to earn a $500 prize. Participants must download a special application from the Pepsi-dot-com web site (or the Apple App Store), which serves to distribute the clues (which may also be published on Twitter). The downloaded application also has a prize verification component, which processes image data captured by the users' cell phones to identify a special pattern with which the hidden carton is uniquely marked. SIFT object recognition is used (discussed below), with the SIFT feature descriptors for the special package conveyed with the downloaded application. When an image match is found, the cell phone immediately reports it wirelessly to Pepsi. The winner is the user whose cell phone first reports detection of the specially marked carton. In the arrangement of FIG. 6, some of the component tasks of the SIFT pattern matching operation are performed by elemental image processing in the configured hardware; the others are referred for more specialized processing, either internal or external.)

FIG. 7 widens the picture to a generic distributed pixel services network view, in which local device pixel services and "cloud based" pixel services have a kind of symmetry in how they operate. The router in FIG. 7 takes care of how any given packaged pixel packet is sent to the appropriate pixel processing location, whether local or remote (the style of fill pattern denotes different component processing functions; only a few of the processing functions required by the enabled visual processing services are depicted). Some of the data shipped to cloud-based pixel services may first have been processed by local device pixel services. The circles indicate that the routing functionality may have components in the cloud: nodes that serve to distribute tasks to active service providers and collect results for transmission back to the device. In some implementations these functions may be performed at the edge of the wireless network, e.g., by modules at wireless service towers, to ensure the fastest operation. Results collected from the active external service providers and the active local processing stages are fed back to pixel service manager software, which then interacts with the device user interface.

FIG. 8 is an expanded view of the lower right portion of FIG. 7. It represents the moment where Dorothy's shoes turn red, and shows why distributed pixel services provided by the cloud, as opposed to the local device, will probably win out for all but the most mundane object recognition tasks.

Object recognition in its richer forms is based on visual association rather than strict template matching rules. If we had all been taught that the capital letter "A" will always strictly follow some pre-historic form that never changes (a universal template image, if you will), then fairly clean and locally prescriptive methods could be placed in a mobile imaging device to get it to reliably read a capital A whenever that ordained form "A" is presented to the camera. 2D and even 3D barcodes in many ways follow this template-like approach to object recognition, and for contained applications involving such objects, local processing services can largely get the job done. But even in the barcode example, flexibility in the growth and evolution of overt visual coding targets calls for an architecture that does not force "code upgrades" onto countless devices every time there is some advance in the overt symbology art.

At the other end of the spectrum, arbitrarily complex tasks can be imagined, e.g., referring to a network of supercomputers the task of predicting the apocryphal typhoon caused by the flutter of a butterfly's wings halfway around the world, if the application requires it. Oz beckons.

FIG. 8 attempts to illustrate this radical extra dimensionality of pixel processing in the cloud, as opposed to on the local device. This virtually goes without saying (or without a picture), but FIG. 8 is also a segue to FIG. 9, where Dorothy gets back to Kansas and is happy about it.

FIG. 9 is all about cash, cash flow, and happy humans who use the cameras on their mobile devices and get highly meaningful results back from their visual queries, all the while paying a monthly bill. It turns out that the Google "AdWords" auction genie is out of the bottle. Behind the scenes of a mobile user's moment-by-moment visual scans of the immediate visual environment are hundreds and thousands of micro-decisions, pixel routings, results comparisons, and micro-auctioned channels back to the mobile device user for the very good they are "truly" looking for, whether they know it or not. This last point is deliberately cheeky, in that searching of any kind is inherently open-ended and magical at some level, and part of the fun of searching in the first place is that surprisingly new associations are part of the results. The search user knows only after the fact what they were truly looking for. The system, represented in FIG. 9 as the carrier-based financial tracking server, now sees the addition of our networked pixel services module and its role in facilitating pertinent results being sent back to the user, all the while monitoring use of the services in order to populate the monthly bill and send the proceeds to the proper entities.

(As detailed further elsewhere, the money flow need not be exclusively to remote service providers. Other money flows can arise, such as to users or other third parties, e.g., to induce or reward certain actions.)

FIG. 10 focuses on the functional division of processing, illustrating how tasks in the nature of template matching can be performed on the cell phone itself, whereas more sophisticated tasks (in the nature of data association) are desirably referred to the cloud for processing.

Elements of the foregoing are distilled in FIG. 10A, which shows an implementation of aspects of the technology as a physical matter of (usually) software components. The two ovals in the figure highlight the symmetric pair of software components involved in setting up a "human real-time" visual recognition session between a mobile device and the generic cloud or service providers, together with the associated data associations and visual query results. The oval on the left refers to "keyvectors," and more specifically "visual keyvectors." As noted, this term can encompass everything from simple JPEG-compressed blocks all the way through log-polar transformed facial feature vectors, and anything in between and beyond. The point of a keyvector is that the essential raw information of a given visual recognition task has been optimally pre-processed and packaged (and possibly compressed). The oval on the left assembles these packets and typically inserts some addressing information by which they will be routed. (Final addressing may not be possible, as the packet may ultimately be routed to remote service providers whose details are not yet known.) Desirably, this processing is performed as close to the raw sensor data as possible, such as by processing circuitry integrated on the same substrate as the image sensor, which is responsive to software instructions stored in memory or provided from another stage in packet form.

The oval on the right administers the remote processing of keyvector data, e.g., arranging appropriate services, directing traffic flow, etc. Desirably, this software process is implemented as low in the communications stack as possible, generally on a "cloud side" device, access point, cell tower, etc. (When real-time visual keyvector packets stream over a communications channel, the lower in the communications stack they are identified and routed, the smoother the "human real-time" look and feel of a given visual recognition task will be.) The remaining high-level processing needed to support this arrangement is included in FIG. 10A for context, and can generally be performed through native mobile and remote hardware capabilities.

FIGS. 11 and 12 illustrate the concept that some providers of cloud-based pixel processing services may be established in advance, in a pseudo-static fashion, whereas other providers may periodically vie for the privilege of processing a user's keyvector data through participation in a reverse auction. In many implementations, these latter providers compete each time a packet is available for processing.

Consider a user who snaps a cell phone picture of an unfamiliar car, wanting to learn its make and model. Various service providers may compete for this business. A startup vendor may offer to perform the recognition for free, in order to build its brand and collect data. Imagery submitted to this service returns information simply indicating the car's make and model. Consumer Reports may offer an alternative service, which provides make and model data and also provides technical specifications for the car. However, it may charge 2 cents for the service (or the cost may be bandwidth-based, e.g., 1 cent per megapixel). Edmunds or JD Powers may offer still another service, which provides data like that of Consumer Reports but pays the user for the privilege of providing the data. In exchange, the vendor is given the right to have one of its partners send the user a text message promoting goods or services. The payment may take the form of a credit on the user's monthly cell phone voice/data service bill.

Using criteria specified by the user, stored preferences, context, and other rules/heuristics, a query router and response manager (in the cell phone, in the cloud, distributed, etc.) determines whether a packet of data needing processing should be handled by one of the service providers in the stable of static standbys, or whether it should be offered to providers on an auction basis, in which case the module arbitrates the outcome of the auction.

The static standby service providers may be identified when the phone is initially programmed, and reconfigured only when the phone is reprogrammed. (For example, Verizon may specify that all FFT operations on its phones be routed to a server that it provides for this purpose.) Or the user may be able to periodically identify preferred providers for certain tasks, as through a configuration menu, or specify that certain tasks should be referred for auction. Some applications may emerge in which static service providers are favored; the task may be so mundane, or one provider's services so unparalleled, that competition for the provision of services is not warranted.

In the case of services referred to auction, some users may exalt price above all other considerations. Others may insist on domestic data processing. Others may want to stick to service providers that meet "green," "ethical," or other standards of corporate practice. Others may prefer richer data output. Weightings of the different criteria can be applied by the query router and response manager in making the decision.
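The weighting of criteria mentioned above could, for instance, be a simple weighted sum. The sketch below assumes each offer is described by numeric criterion values and each user by a weight per criterion (negative for criteria to minimize, such as price); all names and numbers are hypothetical illustrations, not part of the disclosure.

```python
def score(offer, weights):
    """Weighted sum of an offer's criterion values; missing criteria count as 0."""
    return sum(w * offer.get(name, 0.0) for name, w in weights.items())


def choose(offers, weights):
    """Return the name of the highest-scoring provider."""
    return max(offers, key=lambda name: score(offers[name], weights))


# Hypothetical offers: price in cents, domestic processing (1.0 = yes),
# and a 0..1 rating of how rich the returned data is.
offers = {
    "provider_a": {"price_cents": 0.0, "domestic": 0.0, "richness": 0.2},
    "provider_b": {"price_cents": 2.0, "domestic": 1.0, "richness": 0.9},
}

# Two hypothetical user profiles.
price_first = {"price_cents": -1.0, "domestic": 0.1, "richness": 0.1}
domestic_first = {"price_cents": -0.1, "domestic": 5.0, "richness": 1.0}
```

With these numbers, the price-sensitive profile selects the free provider, while the profile that insists on domestic processing selects the paid one.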

In some circumstances, one input to the query router and response manager may be the user's location, so that a different service provider may be selected when the user is at home in Oregon than when the user is vacationing in Mexico. In other instances, a required turnaround time is specified, which may disqualify some vendors and make others more competitive. In some instances the query router and response manager need not decide at all, e.g., if cached results identifying a service provider selected in a previous auction are still available and have not passed a "freshness" threshold.
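The cached-result shortcut can be sketched as a small lookup keyed by task, with an age check standing in for the "freshness" threshold. The class and method names, and the choice of seconds as the time unit, are assumptions made for illustration.

```python
class ProviderCache:
    """Remembers the provider chosen for a task in an earlier auction."""

    def __init__(self, max_age_s):
        self.max_age_s = max_age_s   # hypothetical freshness threshold, in seconds
        self._entries = {}           # task -> (provider, time stored)

    def store(self, task, provider, now):
        self._entries[task] = (provider, now)

    def lookup(self, task, now):
        """Return the cached provider if still fresh, else None (run a new auction)."""
        entry = self._entries.get(task)
        if entry is None:
            return None
        provider, stored_at = entry
        return provider if now - stored_at <= self.max_age_s else None
```

A `None` result signals that the router must fall back to its normal decision process; passing `now` explicitly keeps the sketch deterministic and easy to test.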

Pricing offered by the vendors may change with processing load, bandwidth, time of day, and other considerations. In some embodiments the providers may be informed of offers submitted by competitors (using known trust arrangements that assure data integrity) and given the opportunity to make their own offers more enticing. Such a bidding war may continue until no bidder is willing to change the offered terms. The query router and response manager (or, in some implementations, the user) then makes a selection.
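A bidding war of this kind can be modeled as repeated rounds that end when a full round passes with no changed offer. The sketch below assumes a deliberately simple strategy (each bidder undercuts the current lowest offer by a fixed step, down to a private floor) and integer prices in cents; the strategy, the names, and the tie-breaking rule are all hypothetical.

```python
def bidding_war(opening, floors, step=1):
    """Run undercutting rounds until no bidder changes its offer.

    opening: bidder -> opening price in cents (lower wins)
    floors:  bidder -> lowest price that bidder will accept
    Returns (winner, final offers). Ties go to the alphabetically first bidder.
    """
    offers = dict(opening)
    changed = True
    while changed:
        changed = False
        for name in sorted(offers):
            best = min(offers.values())
            target = best - step
            # A bidder that is not already lowest undercuts, if its floor allows.
            if offers[name] > best and target >= floors[name]:
                offers[name] = target
                changed = True
    winner = min(sorted(offers), key=lambda n: offers[n])
    return winner, offers
```

Because every change strictly lowers the best offer and each bidder has a floor, the loop is guaranteed to terminate.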

For expository convenience and visual clarity, FIG. 12 shows a software module labeled "Bid Filter and Broadcast Agent." In most implementations this forms part of the query router and response manager module. The bid filter module decides which vendors, from a universe of possible vendors, should be given a chance to bid on a processing task. (The user's preference data, or historical experience, may indicate that certain service providers should be disqualified.) The broadcast agent module then communicates with the selected bidders to inform them of a user task for processing, and provides the information they need to make a bid.

Desirably, the bid filter and broadcast agent do at least some of their work before the data is available for processing. That is, as soon as a prediction can be made about an operation that the user is likely to request soon, these modules start working to identify a provider to perform the service expected to be required. A few hundred milliseconds later, the user keyvector data may actually be available for processing (if the prediction turns out to be accurate).

Sometimes, as with Google's present AdWords system, the service providers are not consulted at each user transaction. Instead, each provides bidding parameters, which are stored and consulted whenever a transaction is considered, to determine which service provider wins. These stored parameters may be updated occasionally. In some implementations the service provider pushes updated parameters to the bid filter and broadcast agent whenever they are available. (The bid filter and broadcast agent may serve a large population of users, such as all Verizon subscribers in area code 503, all subscribers to an ISP in a community, or all users at the domain well-dot-com, etc.; or more localized agents may be employed, such as one for each cell phone tower.)

If there is a lull in traffic, a service provider may discount its services for a brief period. The service provider may thus transmit (or post) a message stating that it will perform eigenvector extraction on an image file of up to 10 megabytes for 2 cents until Unix time 1244754176 (seconds since the Unix epoch, Coordinated Universal Time), after which the price will return to 3 cents. The bid filter and broadcast agent updates a table of stored bidding parameters accordingly.
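The table update described above can be sketched as follows, using the figures from the example (2 cents until Unix time 1244754176, then 3 cents, for files of up to 10 megabytes). The class, the table key, and the provider name are hypothetical; treating the deadline as inclusive and a megabyte as 1024 x 1024 bytes are also assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PriceEntry:
    regular_cents: int
    discount_cents: int
    discount_until: int   # Unix time (seconds since the epoch, UTC); assumed inclusive
    max_bytes: int

    def price_at(self, now: int, size_bytes: int) -> Optional[int]:
        """Return the price in cents, or None if the file exceeds the size limit."""
        if size_bytes > self.max_bytes:
            return None
        return self.discount_cents if now <= self.discount_until else self.regular_cents


# Table of stored bidding parameters, keyed by (provider, operation).
table = {}
table[("provider_x", "eigenvector_extraction")] = PriceEntry(
    regular_cents=3,
    discount_cents=2,
    discount_until=1244754176,
    max_bytes=10 * 1024 * 1024,
)
```

Storing the expiry as an absolute timestamp means the agent needs no follow-up message when the discount lapses; the regular price simply applies to any transaction considered after that instant.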

(The reader is presumed to be familiar with the reverse auction arrangements used by Google to place advertisers' sponsored ads on web search results pages. An illustrative description is provided in Levy, "Secret of Googlenomics: Data-Fueled Recipe Brews Profitability," Wired Magazine, May 22, 2009.)

In other implementations, the broadcast agent polls the bidders, communicating relevant parameters and soliciting bid responses, whenever a transaction is offered for processing.

Once a prevailing bidder is determined, and data is available for processing, the broadcast agent transmits the keyvector data (and other parameters as may be appropriate to the particular task) to the winning bidder. The bidder then performs the requested operation and returns the processed data to the query router and response manager. This module logs the processed data and attends to any necessary accounting (e.g., crediting the service provider with the appropriate fee). The response data is then forwarded back to the user device.

In a variant arrangement, one or more of the competing service providers actually performs some or all of the requested processing, but "teases" the user (or the query router and response manager) by presenting only partial results. Given a taste of what is available, the user (or the query router and response manager) may be induced to make a different choice than the relevant criteria/heuristics would otherwise indicate.

The function calls sent to external service providers, of course, need not provide the ultimate result sought by the consumer (e.g., identifying a car, or translating a menu listing from French to English). They can be component operations, such as calculating an FFT, performing a SIFT procedure or a log-polar transform, computing a histogram or eigenvectors, identifying edges, etc.

In time, it is expected that a rich ecosystem of expert processors will emerge, serving myriad processing requests from cell phones and other thin client devices.

Importance of Monetary Flow

Additional business models can be enabled, involving the subsidization of consumed remote services by the service providers themselves, in exchange for user information (e.g., for audience measurement), or in exchange for action taken by the user, such as completing a survey, visiting specific sites or locations in a store, etc.

Services may likewise be subsidized by third parties, such as a coffee shop that derives value by providing a differentiating service to its customers in the form of free/discounted usage of remote services while they are seated in the shop.

In one arrangement, an economy is enabled in which a currency of remote processing credits is created and exchanged between users and remote service providers. This may be entirely transparent to the user and managed as part of a service plan, e.g., with the user's cell phone or data service provider. Or it can be exposed as a very explicit aspect of certain embodiments of the present technology. Service providers and others may award credits to users for taking actions, or for being part of a frequent-user program, to build allegiance with specific providers.

As with other currencies, users may choose to explicitly donate, save, exchange, or generally barter credits as needed.

Considering these points in further detail, a service may pay a user for opting in to an audience measurement panel. For example, The Nielsen Company may provide services to the public, such as identification of television programming from audio or video samples submitted by consumers. These services may be provided free to consumers who agree to share some of their media consumption data with Nielsen (such as by serving as an anonymous member of a city's audience measurement panel), and provided on a fee basis to others. Nielsen may offer, for example, 100 units of credit (micropayments or other value) to participating consumers each month, or may provide credit each time the user submits information to Nielsen.

In another example, a consumer may be rewarded for accepting commercials, or commercial impressions, from a company. If a consumer goes into the Pepsi Center in Denver, the consumer may receive a reward for each Pepsi-branded experience encountered. The amount of the micropayment may scale with the amount of time that the consumer interacts with the different Pepsi-branded objects (including audio and imagery) in the venue.

Large brand owners are not the only ones who can provide credits to individuals. Credits can be routed to friends and social/business acquaintances. To illustrate, a user of Facebook may share credit (redeemable for goods/services, or exchangeable for cash) from his Facebook page, enticing others to visit or linger. In some cases, the credit can be made available only to people who navigate to the Facebook page in a certain manner, such as by linking to the page from the user's business card or from another launch page.

As another example, consider a Facebook user who has earned, paid for, or otherwise received credit that can be applied to certain services, such as downloading songs from iTunes, music recognition services, or identifying clothes that go with particular shoes (for which an image has been submitted). These services may be associated with the particular Facebook page, so that friends can invoke the services from that page, essentially spending the host's credit (again, with suitable authorization or invitation by that hosting user). Likewise, friends may submit images to a facial recognition service accessible through an application associated with the user's Facebook page. Images submitted in this fashion are analyzed for faces of the host's friends, and identification information is returned to the submitter, e.g., through a user interface presented on the originating Facebook page. Again, the host may be assessed a fee for each such operation, but may allow only authorized friends to avail themselves of the service free of charge.

Credits and payments can also be routed to charities. A viewer exiting a theatre after a particularly poignant movie about poverty in Bangladesh may capture an image of an associated movie poster, which serves as a portal for donations to a charity that serves the poor in Bangladesh. Upon recognizing the movie poster, the cell phone can present a graphical/touch user interface through which the user spins dials to specify the amount of a charitable donation, which at the conclusion of the transaction is transferred from a financial account associated with the user to one associated with the charity.

More on a Particular Hardware Arrangement

As noted above and in the cited patent documents, there is a need for generic object recognition by mobile services. Some approaches to specialized object recognition have emerged, and these have given rise to specific data processing approaches. However, no architecture has been proposed that goes beyond specialized object recognition toward generic object recognition.

Visually, a generic object recognition arrangement requires access to good raw visual data, preferably free of device quirks, scene quirks, user quirks, etc. Developers of systems built around object identification will best prosper and serve their users by concentrating on the object identification task at hand, rather than on the myriad existing roadblocks, resource sinks, and third-party dependencies that currently must be confronted.

As noted, virtually all object identification techniques can make use of, or even rely upon, a pipe to "the cloud."

The "cloud" can include anything external to the cell phone. An example is a nearby cell phone, or plural phones on a distributed network. Unused processing power on such other phone devices can be made available for hire (or for free) to call upon as needed. The cell phones of the implementations detailed herein can scavenge processing power from such other cell phones.

Such a cloud may be ad hoc, e.g., other cell phones within Bluetooth range of the user's phone. The ad hoc network can be extended by having such other phones also extend the local cloud to further phones that they can reach by Bluetooth, but the user cannot.
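The hop-by-hop extension of an ad hoc cloud can be pictured as a breadth-first search over "who is in radio range of whom." The sketch below is only an illustration under stated assumptions: the function name `reachable_cloud`, the `links` dictionary, and the `max_hops` limit are hypothetical, and no real Bluetooth API is involved.

```python
from collections import deque


def reachable_cloud(links, origin, max_hops):
    """Return {phone: hop_count} for every phone reachable from `origin`.

    links    -- dict mapping each phone to the set of phones within its radio range
    origin   -- the user's own phone
    max_hops -- how many relays the ad hoc cloud may be extended through
    """
    hops = {origin: 0}
    queue = deque([origin])
    while queue:
        phone = queue.popleft()
        if hops[phone] == max_hops:
            continue  # do not extend the cloud beyond the hop limit
        for neighbor in links.get(phone, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[phone] + 1
                queue.append(neighbor)
    del hops[origin]  # the user's own phone is not part of the external cloud
    return hops
```

With `max_hops=1` the result is just the phones in direct range; raising the limit admits phones that only an intermediate phone can reach.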

The "cloud" can also comprise other computational platforms, such as set-top boxes; processors in automobiles, thermostats, HVAC systems, wireless routers, local cell phone towers and other wireless network edges (including the processing hardware for their software-defined radio equipment), etc. Such processors can be used in conjunction with more traditional cloud computing resources, as offered by Google, Amazon, etc.

(In view of certain users' concerns about privacy, the phone desirably has a user-configurable option indicating whether the phone may refer data to cloud resources for processing. In one arrangement, this option has a default value of "No," which limits functionality and reduces battery life but also limits privacy concerns. In another arrangement, this option has a default value of "Yes.")

Desirably, image-responsive techniques should produce a short-term "result or answer," which generally requires some level of interactivity with the user, measured in fractions of a second for truly interactive applications, or in a few seconds or fractions of a minute for nearer-term "I'm patient to wait" applications.

As for the objects in question, they can be divided into various categories, including (1) generic passive (clues to basic searches), (2) geographic passive (at least you know where you are, and can hook into geographic-specific resources), (3) "cloud supported" passive, as with "identified/enumerated objects" and their associated sites, and (4) active/controllable, a la ThingPipe (a reference to technology detailed below, such as WiFi-equipped thermostats and parking meters).

An object recognition platform should not, it seems, be conceived in the classic "local device and local resources only" software mentality. It may, however, be conceived as a local device optimization problem. That is, the software on the local device, and its processing hardware, should be designed in contemplation of their interaction with off-device software and hardware. The same applies to the balance and interplay of control functionality, pixel-crunching functionality, and application software/GUI provided on the device, versus off the device. (In many implementations, certain databases useful for object identification/recognition will reside remote from the device.)

In a particularly preferred arrangement, such a processing platform employs image processing near the sensor, optimally on the same chip, with at least some processing tasks desirably performed by dedicated, special-purpose hardware.

Consider FIG. 13, which shows an architecture of a cell phone 10 in which an image sensor 12 feeds two processing paths. One, 13, is tailored for the human visual system and includes processing such as JPEG compression. The other, 14, is tailored for object recognition. As discussed, some of this processing may be performed by the mobile device, while other processing may be referred to the cloud 16.

FIG. 14 takes an application-centric view of the object recognition processing path. Some applications reside wholly in the cell phone. Other applications reside wholly outside the cell phone, e.g., simply taking keyvector data as stimulus. More common are hybrids, such as where some processing is done in the cell phone, other processing is done externally, and the application software orchestrating the process resides in the cell phone.

To illustrate further discussion, FIG. 15 shows a range 40 of some of the different types of images 41-46 that may be captured by a user's cell phone. A few brief (and incomplete) comments about some of the processing that may be applied to each image are provided in the following paragraphs.

Image 41 depicts a thermostat. A steganographic digital watermark 47 is textured or printed on the thermostat's case. (The watermark is shown as visible in FIG. 15, but is typically imperceptible to the viewer.) The watermark conveys information intended for the cell phone, allowing the phone to present a graphical user interface through which the user can interact with the thermostat. A bar code or other data carrier can alternatively be used. Such technology is further detailed below.

Image 42 depicts an item including a barcode 48. This barcode conveys Universal Product Code (UPC) data. Other barcodes may convey other information. The barcode payload is not primarily intended for reading by a user's cell phone (in contrast to watermark 47), but it nonetheless may be used by the cell phone to help determine an appropriate response for the user.

Image 43 shows a product that may be identified without reference to any explicit machine-readable information (such as a barcode or watermark). A segmentation algorithm may be applied to edge-detected image data to distinguish the apparent image subject from the apparent background. The image subject may be identified through its shape, color and texture. Image fingerprinting may be used to identify reference images having similar labels, and metadata associated with those other images may be harvested. SIFT techniques (discussed below) may be employed for such pattern-based recognition tasks. Specular reflections in low-texture regions may tend to indicate that the image subject is made of glass. Optical character recognition can be applied for further information (reading the visible text). All of these clues can be employed to identify the depicted item and to help determine an appropriate response for the user.

Additionally (or alternatively), similar-image search systems such as Google Similar Images and Microsoft Live Search can be employed to find similar images, and their metadata can then be harvested. (As of this writing, these services do not directly support upload of a user picture to find similar web pictures. However, the user can post the image to Flickr (using Flickr's cell phone upload functionality), and it will soon be found and processed by Google and Microsoft.)

Image 44 is a snapshot of friends. Facial detection and recognition may be employed (i.e., to indicate that there are faces in the image, to identify particular faces, and to annotate the image with metadata accordingly, e.g., by reference to user-associated data maintained by Apple's iPhoto service, Google's Picasa service, Facebook, etc.). Some facial recognition applications can be trained for non-human faces, e.g., cats, dogs, and animated characters including avatars. Geolocation and date/time information from the cell phone may also provide useful information.

People wearing sunglasses pose a challenge for some facial recognition algorithms. Identification of those individuals can be aided by their association with persons whose identities can be determined more easily (e.g., by conventional facial recognition). That is, by identifying other group pictures in iPhoto/Picasa/Facebook/etc. that include one or more of the latter individuals, the other individuals depicted in those photographs become candidates who may also be present in the subject image. These candidate persons form a much smaller universe of possibilities than is normally provided by unbounded iPhoto/Picasa/Facebook/etc. data. The facial vectors discernible from the sunglass-wearing face in the subject image can then be compared against this smaller universe of possibilities to determine a best match. If, in the usual case of recognizing a face, a score of 90 is required to be considered a match (out of an arbitrary top match score of 100), then in searching such a group-constrained set of images a score of 70 or 80 might suffice. (Where, as in image 44, two persons are depicted without sunglasses, the occurrence of both of these individuals in a photo with one or more other individuals may increase its relevance to such an analysis, implemented, e.g., by increasing a weighting factor in a matching algorithm.)
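The group-constrained search described above can be sketched as follows. This is an illustration under stated assumptions, not a face recognition implementation: match scores are taken as given (0 to 100), the names and the `albums` structure are hypothetical, and the relaxed threshold of 75 is simply one value picked from the "70 or 80" range in the text.

```python
def best_match(scores, threshold):
    """Return the highest-scoring name if it meets `threshold`, else None."""
    if not scores:
        return None
    name = max(scores, key=scores.get)
    return name if scores[name] >= threshold else None


def identify(scores, companions, albums, open_threshold=90, group_threshold=75):
    """Identify a hard-to-read face using the people it appears with.

    scores     -- {name: match score 0..100} for the obscured face vs. all known people
    companions -- set of people already identified in the same photo
    albums     -- list of sets; each set holds the people seen together in an earlier photo
    """
    # Candidates: anyone who has previously been photographed with a companion.
    candidates = set()
    for people in albums:
        if people & companions:
            candidates |= people - companions

    # Within the smaller candidate universe a lower score is accepted.
    constrained = {name: s for name, s in scores.items() if name in candidates}
    hit = best_match(constrained, group_threshold)
    if hit is not None:
        return hit

    # Otherwise fall back to the unconstrained search with the stricter threshold.
    return best_match(scores, open_threshold)
```

Note that a stranger with a slightly higher raw score is passed over in favor of a known associate who clears the relaxed threshold, which is the effect the paragraph describes.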

Image 45 shows part of the statue of Prometheus in Rockefeller Center, NY. Its identification can follow teachings detailed elsewhere in this specification.

Image 46 is a landscape depicting the Maroon Bells mountain area in Colorado. This image subject may be recognized by reference to geolocation data from the cell phone, in conjunction with geographic information services such as GeoNames or Yahoo!'s GeoPlanet.
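One way to picture recognition by geolocation is a nearest-named-feature lookup. The sketch below is a hedged illustration only: it does not call GeoNames or GeoPlanet, the `GAZETTEER` list is a tiny hypothetical stand-in for such a service, its coordinates are approximate, and the 25 km cutoff is an arbitrary assumption.

```python
import math

# Hypothetical stand-in for a geographic information service.
# Coordinates are approximate and for illustration only.
GAZETTEER = [
    ("Maroon Bells", 39.071, -106.989),
    ("Pepsi Center, Denver", 39.749, -105.008),
    ("Rockefeller Center, NY", 40.759, -73.979),
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_feature(lat, lon, gazetteer, max_km=25.0):
    """Return the name of the closest listed feature within `max_km`, else None."""
    if not gazetteer:
        return None
    name, flat, flon = min(gazetteer, key=lambda f: haversine_km(lat, lon, f[1], f[2]))
    return name if haversine_km(lat, lon, flat, flon) <= max_km else None
```

A phone reporting a position a few kilometers from the listed coordinates would be tagged "Maroon Bells", giving candidate metadata before any pixel analysis is attempted.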

(It should be understood that techniques noted above in connection with processing of one of the images 41-46 in FIG. 15 can likewise be applied to others of the images. Moreover, it should be understood that while in some respects the depicted images are ordered according to ease of identifying the subject and formulating a response, in other respects they are not. For example, although landscape image 46 is depicted to the far right, its geolocation data is strongly correlated with the metadata "Maroon Bells." Thus, this particular image presents an easier case than that presented by many other images.)

In one embodiment, such processing of imagery occurs automatically, without express user instruction each time. Subject to network connectivity and power constraints, information can be gleaned continuously from such processing and may be used in processing subsequently captured images. For example, an earlier image in a sequence that includes photograph 44 may show members of the depicted group without sunglasses, simplifying identification of the persons later depicted with sunglasses.

FIG. 16, Etc., Implementation

FIG. 16 gets into the details of a particular implementation that incorporates certain of the features discussed earlier. (Other discussed features can be implemented by the artisan within this architecture, based on the provided disclosure.) In this data-driven arrangement 30, operation of a cell phone camera 32 is controlled dynamically in accordance with packet data sent by a setup module 34, which in turn is controlled by a control processor module 36. (Control processor module 36 may be the cell phone's primary processor, or an auxiliary processor, or this function may be distributed.) The packet data further specifies operations to be performed by a subsequent chain of processing stages 38.

In one particular implementation, setup module 34 dictates, on a frame-by-frame basis, the parameters that are to be employed by camera 32 in gathering an exposure. Setup module 34 also specifies the type of data the camera is to output. These instructional parameters are conveyed in a first field 55 of a header portion 56 of a data packet 57 corresponding to that frame (FIG. 17).

For example, for each frame, the setup module 34 may issue a packet 57 whose first field 55 instructs the camera about, e.g., the length of the exposure, the aperture size, the lens focus, the depth of field, etc. Module 34 may further author the field 55 to specify that the sensor is to sum sensor charges to reduce resolution (e.g., producing a frame of 640 x 480 data from a sensor capable of 1280 x 960), output data only from red-filtered sensor cells, output data only from a horizontal line of cells across the middle of the sensor, output data only from a 128 x 128 patch of cells in the center of the pixel array, etc. The camera instruction field 55 may further specify the exact time at which the camera is to capture data, so as to allow, e.g., desired synchronization with ambient lighting (as detailed later).
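Several of the readout options just listed are simple array operations, sketched below on a frame held as a list of rows. This is only an illustration: real sensors sum charge in hardware before digitization, the sketch assumes a monochrome frame (it ignores the color filter layout), and the function names are hypothetical.

```python
def bin_2x2(frame):
    """Sum each 2 x 2 block of cells, halving resolution in both directions."""
    rows, cols = len(frame), len(frame[0])
    if rows % 2 or cols % 2:
        raise ValueError("frame dimensions must be even for 2 x 2 binning")
    return [
        [frame[r][c] + frame[r][c + 1] + frame[r + 1][c] + frame[r + 1][c + 1]
         for c in range(0, cols, 2)]
        for r in range(0, rows, 2)
    ]


def center_patch(frame, size):
    """Return the size x size patch of cells at the center of the frame."""
    rows, cols = len(frame), len(frame[0])
    top, left = (rows - size) // 2, (cols - size) // 2
    return [row[left:left + size] for row in frame[top:top + size]]


def middle_line(frame):
    """Return the single horizontal line of cells across the middle of the frame."""
    return frame[len(frame) // 2]
```

Applied to a frame of 960 rows by 1280 columns, `bin_2x2` yields 480 rows by 640 columns, matching the 1280 x 960 to 640 x 480 example in the text.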

Each packet 57 issued by setup module 34 may include different camera parameters in the first header field 55. Thus, a first packet may cause camera 32 to capture a full-frame image with an exposure time of 1 millisecond. A next packet may cause the camera to capture a full-frame image with an exposure time of 10 milliseconds, and a third may dictate an exposure time of 100 milliseconds. (Such frames may later be processed in combination to yield a high dynamic range image.) A fourth packet may instruct the camera to down-sample data from the image sensor and combine signals from differently color-filtered sensor cells to output a 4 x 3 array of grayscale luminance values. A fifth packet may instruct the camera to output data only from an 8 x 8 patch of pixels at the center of the frame. A sixth packet may instruct the camera to output only five lines of image data, from the top, bottom, middle, mid-upper and mid-lower rows of the sensor. A seventh packet may instruct the camera to output data only from blue-filtered sensor cells. An eighth packet may instruct the camera to disregard any auto-focus instructions and instead capture a full frame at infinity focus. And so on.

Each such packet 57 is provided from setup module 34 across a bus or other data channel 60 to a camera controller module associated with the camera. (The details of a digital camera, including an array of photosensor cells, associated analog-to-digital converters and control circuitry, etc., are well known to artisans and so are not belabored.) Camera 32 captures digital image data in accordance with the instructions in the header field 55 of the packet and stuffs the resulting image data into a body 59 of the packet. It also deletes the camera instructions 55 from the packet header (or otherwise marks header field 55 in a manner permitting it to be disregarded by subsequent processing stages).
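The packet life cycle described so far (setup module authors a header, camera fills the body and removes its own instructions) can be sketched as below. This is a minimal illustration under stated assumptions: the `Packet` class, its field names, the instruction keys, and `fake_camera` are all hypothetical stand-ins, and no actual image capture takes place.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Packet:
    camera_field: Optional[Dict[str, Any]]        # first header field (55): camera instructions
    stage_fields: List[Dict[str, Any]] = field(default_factory=list)  # further header fields
    body: Optional[Any] = None                    # packet body (59), filled by the camera


def bracket_packets(exposures_ms=(1, 10, 100)):
    """Author one packet per frame, each with a different exposure time."""
    return [
        Packet(camera_field={"exposure_ms": ms, "region": "full_frame"})
        for ms in exposures_ms
    ]


def fake_camera(packet):
    """Stand-in for the camera: 'capture' per the instructions, then consume them."""
    instructions = packet.camera_field
    # A real camera would expose the sensor here; this sketch only records the settings used.
    packet.body = {"captured_with": dict(instructions)}
    packet.camera_field = None  # delete the camera instructions from the header
    return packet
```

Running the three bracketed packets through `fake_camera` leaves three bodies captured at 1, 10 and 100 milliseconds, with the camera field removed so that later stages see only their own instructions.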

When the packet 57 was authored by the setup module 34, it also included a series of further header fields, each specifying how a corresponding, successive post-sensor stage 38 should process the captured data. As shown in FIG. 16, there are several such post-sensor processing stages 38.

The camera 32 outputs the image-stuffed packet it has produced (a pixel packet) onto a bus or other data channel 61, which conveys it to the first processing stage 38.

Stage 38 examines the header of the packet. Since the camera deleted the instruction field 55 that conveyed the camera instructions (or marked it to be ignored), the first header field encountered by the control portion of stage 38 is field 58a. This field details the parameters of an operation to be applied by stage 38 to the data in the body of the packet.

For example, field 58a may specify the parameters of an edge detection algorithm to be applied by stage 38 to the packet's image data (or simply that such an algorithm should be applied). It may further specify that stage 38 is to replace the original image data in the body of the packet with the resulting edge-detected set of data. (Replacement of data, rather than appending, may be indicated by the value of a single-bit flag in the packet header.) Stage 38 performs the requested operation (which may involve configuring programmable hardware in certain implementations). The first stage 38 then deletes the instructions 58a from the packet header 56 (or marks them to be ignored), and outputs the processed pixel packet for action by the next processing stage.

The control portion of the next processing stage (which here comprises stages 38a and 38b, discussed later) examines the header of the packet. Since field 58a was deleted (or marked to be ignored), the first field encountered is field 58b. In this particular packet, field 58b may instruct the second stage not to perform any processing on the data in the body of the packet, but instead simply to delete field 58b from the packet header and pass the packet to the next stage.

The next field of the packet header may instruct the third stage 38c to perform 2D FFT operations on the image data found in the packet body, based on 16 x 16 blocks. It may further direct the stage to hand off the processed FFT data to a wireless interface, for internet transmission to address 216.239.32.10, accompanied by specified data (detailing, e.g., the task to be performed on the received FFT data by the computer at that address, such as texture classification). It may further direct the stage to hand off a single 16 x 16 block of FFT data, corresponding to the center of the captured image, to the same or a different wireless interface for transmission to address 12.232.235.27, again accompanied by corresponding instructions about its use (e.g., search an archive of stored FFT data for a match, and return information if a match is found; also, store this 16 x 16 block in the archive with an associated identifier). Finally, the header authored by the setup module 34 may instruct stage 38c to replace the body of the packet with the single 16 x 16 block of FFT data dispatched to the wireless interface. As before, the stage also edits the packet header to delete (or mark) the instructions to which it responded, so that the header instruction field for the next processing stage is the first to be encountered.
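Block-based transform processing of the kind assigned to stage 38c can be illustrated as follows. This sketch makes two loud assumptions: it substitutes a naive 2D discrete Fourier transform (via `cmath`) for a true FFT, which yields the same coefficients but far more slowly, and the function names `dft2` and `blockwise_dft` are hypothetical. The block size is a parameter; 16 corresponds to the 16 x 16 blocks in the text.

```python
import cmath


def dft2(block):
    """Naive 2D DFT of a square block (an FFT would give the same coefficients faster)."""
    n = len(block)
    out = [[0j] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0j
            for y in range(n):
                for x in range(n):
                    angle = -2 * cmath.pi * (u * y + v * x) / n
                    total += block[y][x] * cmath.exp(1j * angle)
            out[u][v] = total
    return out


def blockwise_dft(image, size):
    """Tile the image into size x size blocks and transform each one.

    Returns a dict keyed by (block_row, block_col). Partial blocks at the
    right and bottom edges are skipped in this sketch.
    """
    rows, cols = len(image), len(image[0])
    result = {}
    for top in range(0, rows - size + 1, size):
        for left in range(0, cols - size + 1, size):
            block = [row[left:left + size] for row in image[top:top + size]]
            result[(top // size, left // size)] = dft2(block)
    return result
```

A stage that must dispatch only one block (such as the block at the center of the frame) could select the corresponding key from the returned dictionary and place that block alone in the packet body.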

In other arrangements, the addresses of the remote computers are not hard-coded. For example, the packet may include a pointer to a database record or memory location (in the phone or in the cloud) that contains the destination address. Or, stage 38c may be directed to hand off the processed pixel packet to a query router and response manager (e.g., FIG. 7). This module examines the pixel packet to determine what type of processing is next required, and routes it to an appropriate provider (which may be in the cell phone if resources permit, or in the cloud, either among a stable of static providers or to a provider identified through an auction). The provider returns the requested output data (e.g., texture classification information, and information about any matching FFT in the archive), and processing continues per the next item of instruction in the pixel packet header.

The data flow continues through as many functions as a particular operation may require.

In the particular arrangement illustrated, each processing stage 38 strips out of the packet header the instructions on which it acted. The instructions are ordered in the header in the sequence of the processing stages, so this removal allows each stage to look to the first instructions remaining in the header for direction. Other arrangements can, of course, alternatively be employed. (For example, a module may insert new information into the header, at the front, the tail, or elsewhere in the sequence, based on processing results. This amended header then controls the packet flow and the processing that follows.)
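The strip-and-forward behavior can be sketched as below. The dictionary layout of the packet, the `OPERATIONS` table, the toy one-dimensional edge detector, and the `replace_body` flag name are all hypothetical choices made for illustration; the flag stands in for the single-bit replace-versus-append indicator mentioned earlier.

```python
def edge_detect(samples):
    """Toy 1-D edge detector: absolute difference between neighbors."""
    return [abs(b - a) for a, b in zip(samples, samples[1:])]


# Hypothetical table mapping instruction names to stage operations.
OPERATIONS = {
    "edge_detect": edge_detect,
    "pass_through": lambda data: data,
    "negate": lambda data: [-v for v in data],
}


def run_stage(packet):
    """Act on the first remaining header field, then strip it from the header."""
    instruction = packet["header"].pop(0)
    result = OPERATIONS[instruction["op"]](packet["body"])
    if instruction.get("replace_body", True):
        packet["body"] = result  # replace the original data
    else:
        packet.setdefault("attachments", []).append(result)  # append instead
    return packet


packet = {
    "header": [
        {"op": "edge_detect"},   # plays the role of field 58a
        {"op": "pass_through"},  # plays the role of field 58b
        {"op": "negate"},        # a later stage
    ],
    "body": [10, 10, 50, 50, 20],
}

# Each stage sees only the instruction left at the front of the header.
while packet["header"]:
    run_stage(packet)
```

Because every stage removes its own instruction, no stage needs to know its position in the chain; it simply reads whatever field is first.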

In addition to outputting data for the next stage, each stage 38 may further have an output 31 that provides data back to the control processor module 36. For example, processing undertaken by one of the local stages 38 may indicate that a camera setting such as focus or exposure should be adjusted to optimize the suitability of an upcoming frame of captured data for a particular type of processing (e.g., object identification). This focus/exposure information can be used as predictive setup data for the camera the next time a frame of the same or a similar type is captured. The control processor module 36 can set up a frame request using a filtered or time-series prediction sequence of focus information from previous frames, or from a subset of those frames.

Error and status reporting functions may also be accomplished using the outputs 31. Each stage may also have one or more other outputs 33 for providing data to other processes or modules, locally within the cell phone or remotely ("in the cloud"). Data (in packet form, or in another format) may be directed to such outputs in accordance with instructions in the packet 57, or otherwise.

For example, a processing module 38 may make a data flow selection based on the result of some processing it has performed. For instance, if an edge detection stage identifies a sharp, high-contrast image, the outgoing packet may be routed to an external service provider for FFT processing. That provider may return the resulting FFT data to other stages. However, if the image has poor edges (such as being out of focus), the system may not want FFT and subsequent processing to be performed on the data. Thus, the processing stages can cause branches in the data flow, depending on parameters of the processing (such as identified image features).

Instructions specifying such conditional branching can be included in the header of the packet 57, or they can be provided otherwise. FIG. 19 shows one arrangement. Instructions 58d, originally in the packet 57, specify a condition, and specify a location in memory 79 from which replacement subsequent instructions (58e' - 58g') can be read and substituted into the packet header if the condition is met. If the condition is not met, execution proceeds according to the header instructions already in the packet.
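One way to picture this conditional substitution is the sketch below. The names `MEMORY_79`, `apply_branch`, `edge_strength`, `min_value` and `memory_key`, and the threshold test itself, are hypothetical; the specification does not define how the condition is encoded.

```python
# Hypothetical contents of memory 79: replacement instruction sequences by key.
MEMORY_79 = {
    "sharp_image_plan": [{"op": "fft"}, {"op": "texture_classify"}],
}


def apply_branch(packet, measurements, memory=MEMORY_79):
    """Consume a branch instruction (in the role of field 58d).

    If the named measurement meets the threshold, the remaining header
    instructions are replaced by the sequence read from memory; otherwise
    the instructions already in the packet are left in place.
    """
    branch = packet["header"].pop(0)
    if measurements.get(branch["metric"], 0) >= branch["min_value"]:
        packet["header"] = list(memory[branch["memory_key"]])
    return packet


def make_packet():
    return {
        "header": [
            {"metric": "edge_strength", "min_value": 0.5,
             "memory_key": "sharp_image_plan"},
            {"op": "compress"},  # default instruction if the condition fails
        ],
        "body": None,
    }


sharp = apply_branch(make_packet(), {"edge_strength": 0.9})
blurry = apply_branch(make_packet(), {"edge_strength": 0.1})
```

Here `sharp` ends up with the replacement instructions from memory, while `blurry` continues with the instructions it already carried.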

In other arrangements, other variations can be employed. For example, all of the possible conditional instructions can be provided in the packet. In another arrangement, a packet architecture is still used, but one or more of the header fields do not contain explicit instructions. Rather, they simply point to memory locations from which the corresponding instructions (or data) are retrieved, e.g., by the corresponding processing stage 38.

Memory 79 (which can include a cloud component) can also facilitate adaptation of the processing flow even if conditional branching is not employed. For example, a processing stage may yield output data that determines the parameters of a filter or other algorithm to be applied by a later stage (e.g., a convolution kernel, a time delay, a pixel mask, etc.). Such parameters may be identified by the earlier processing stage (e.g., determined/calculated, and stored) in memory, and recalled for use by the later stage. In FIG. 19, for example, processing stage 38 generates parameters that are stored in memory 79. A subsequent processing stage 38c later retrieves these parameters and uses them in executing its assigned operation. (Information in the memory can be labeled to identify the module/provider from which it originated, or to which it is destined <if known>, or other addressing arrangements can be used.) Thus, the processing flow can adapt to circumstances and parameters that were not known at the time the control processor module 36 originally directed the setup module 34 to author the packet 57.

In one particular embodiment, each of the processing stages 38 comprises hardware circuitry dedicated to a particular task. The first stage 38 may be a dedicated edge-detection processor. The third stage 38c may be a dedicated FFT processor. Other stages may be dedicated to other processes. These may include DCT, wavelet, Haar, Hough, and Fourier-Mellin transform processors, filters of different sorts (e.g., Wiener, low pass, bandpass, highpass), and stages for performing all or part of operations such as facial recognition, optical character recognition, computation of eigenvalues, extraction of shape, color and texture feature data, barcode decoding, watermark decoding, object segmentation, pattern recognition, age and gender detection, emotion classification, orientation determination, compression, decompression, log-polar mapping, convolution, interpolation, decimation/down-sampling/anti-aliasing; correlation, performing square-root and squaring operations, matrix multiplication, perspective transformation, butterfly operations (combining results of smaller DFTs into a larger DFT, or decomposing a larger DCT into sub-transforms), etc.

These hardware processors can be field-configurable, instead of dedicated. Thus, each of the processing blocks of FIG. 16 may be dynamically reconfigurable, as circumstances warrant. At one instant a block may be configured as an FFT processing module. The next instant it may be configured as a filter stage, etc. One moment the hardware processing chain may be configured as a barcode reader; the next it may be configured as a facial recognition system, etc.

Such hardware reconfiguration information can be downloaded from the cloud, or from services such as the Apple App Store. And the information need not reside statically on the phone once downloaded; it can be summoned from the cloud/App Store whenever needed.

Given increasing broadband availability and speed, the hardware reconfiguration data can be downloaded to the cell phone each time it is turned on or otherwise initialized, or whenever a particular function is initialized. This removes the dilemma of many different versions of an application being deployed in the market at any given time (depending on when different users last downloaded updates), and the problems that companies confront in supporting disparate versions of products in the field. Each time a device or application is initialized, the latest version of all or selected functions is downloaded to the phone. And this works not just for full system functionality, but also for components such as hardware drivers, software for hardware layers, etc. At each initialization, the hardware is configured anew with the latest version of the applicable instructions. (Code used during initialization can be downloaded for use at the next initialization.) Some updated code may be downloaded and dynamically loaded only when particular applications require it, such as to configure the hardware of FIG. 6 for specialized functions. The instructions can also be tailored to particular platforms; e.g., an iPhone device may employ different accelerometers than an Android device, and application instructions may be varied accordingly.

In some embodiments, the respective purpose processors may be chained in a fixed order. The edge detection processor may be first, the FFT processor may be third, and so on.

Alternatively, the processing modules may be interconnected by one or more busses (and/or a crossbar arrangement or other interconnection architecture) that permit any stage to receive data from any stage, and to output data to any stage. Another interconnect method is a network on a chip (effectively, a packet-based LAN; similar to a crossbar in adaptability, but programmable by network protocols). Such arrangements can also support having one or more stages process data iteratively, taking their output as input in order to perform further processing.

One iterative processing arrangement is shown by stages 38a/38b in FIG. 16. The output from stage 38a can be taken as input to stage 38b. Stage 38b can be instructed to do no processing on the data, but simply to apply it back to the input of stage 38a. This can loop as many times as desired. When the iterative processing by stage 38a is completed, its output can be passed to the next stage 38c in the chain.

In addition to simply serving as a pass-through stage, stage 38b can perform its own type of processing on the data processed by stage 38a. Its output can be applied to the input of stage 38a. Stage 38a can be instructed to apply its processing again to the data produced by stage 38b, or to pass it through. Any serial combination of stage 38a/38b processing can thus be achieved.

The roles of stages 38a and 38b in the foregoing can also be reversed.

In this fashion, stages 38a and 38b can be operated to (1) apply stage 38a processing one or more times to the data; (2) apply stage 38b processing one or more times to the data; (3) apply any combination and sequence of stage 38a and 38b processing to the data; or (4) simply pass the input data to the next stage, without processing.
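The four modes of operation enumerated above amount to applying an arbitrary sequence of two operations. A minimal sketch follows; the arithmetic performed by `stage_38a` and `stage_38b` is arbitrary and hypothetical, as is the idea of encoding the loop as a `schedule` string.

```python
def stage_38a(data):
    """Arbitrary stand-in for stage 38a processing: double each value."""
    return [v * 2 for v in data]


def stage_38b(data):
    """Arbitrary stand-in for stage 38b processing: add one to each value."""
    return [v + 1 for v in data]


def run_loop(data, schedule):
    """Apply stage 38a ('a') and stage 38b ('b') in the order given.

    An empty schedule passes the data through unprocessed.
    """
    stages = {"a": stage_38a, "b": stage_38b}
    for name in schedule:
        data = stages[name](data)
    return data


only_a = run_loop([1, 2], "aa")       # mode (1): stage 38a applied twice
only_b = run_loop([1, 2], "bb")       # mode (2): stage 38b applied twice
mixed = run_loop([1, 2], "aba")       # mode (3): a combination and sequence
passed = run_loop([1, 2], "")         # mode (4): simple pass-through
```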

The camera stage can be incorporated into an iterative processing loop. For example, to gain focus-lock, a packet may be passed from the camera to a processing module that assesses focus. (Examples may include an FFT stage, looking for high frequency image components; an edge detector stage, looking for strong edges; etc. Sample edge detection algorithms include Canny, Sobel, and differential. Edge detection is also useful for object tracking.) An output from such a processing module can loop back to the camera's controller module and vary a focus signal. The camera captures a subsequent frame with the varied focus signal, and the resulting image is again provided to the processing module that assesses focus. This loop continues until the processing module reports that focus within a threshold range has been achieved. (The packet header, or a parameter in memory, can specify an iteration limit, e.g., specifying that the iterating should terminate and output an error signal if no focus meeting the specified requirement is achieved within ten iterations.)
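The focus loop with an iteration limit can be sketched as follows. Everything here is a toy under stated assumptions: `simulated_capture` is a hypothetical stand-in for the camera, `sharpness` is a crude edge-strength measure, and the hill-climbing rule (reverse direction when the score drops) is just one simple search strategy, not one prescribed by the specification.

```python
def sharpness(frame):
    """Crude focus measure: sum of absolute differences between neighbors."""
    return sum(abs(b - a) for a, b in zip(frame, frame[1:]))


def simulated_capture(focus, best_focus=7):
    """Toy camera: edge contrast falls off as focus departs from best_focus."""
    contrast = max(0, 100 - 20 * abs(focus - best_focus))
    return [0, 0, contrast, contrast]


def focus_loop(start_focus, threshold, max_iterations=10):
    """Capture, assess focus, adjust, and repeat until locked or out of tries."""
    focus, step, previous = start_focus, 1, None
    for iteration in range(1, max_iterations + 1):
        score = sharpness(simulated_capture(focus))
        if score >= threshold:
            return {"status": "locked", "focus": focus, "iterations": iteration}
        if previous is not None and score < previous:
            step = -step  # the last move made things worse: reverse direction
        previous = score
        focus += step
    # Iteration limit reached without meeting the requirement: report an error.
    return {"status": "error", "focus": focus, "iterations": max_iterations}


locked = focus_loop(start_focus=4, threshold=100)
failed = focus_loop(start_focus=30, threshold=100)
```

In the first call the loop converges in a few frames; in the second the starting point is so far from focus that the limit of ten iterations is reached and an error status is returned.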

While the discussion has focused on serial data processing, image or other data may be processed in two or more parallel paths. For example, the output of stage 38d may be applied to two subsequent stages, each of which starts a respective branch of a fork in the processing. Those two chains can be processed independently thereafter, or data resulting from such processing can be combined, or used in conjunction, in a subsequent stage. (Each of those processing chains, in turn, can be forked.)

As noted, a fork will commonly appear much earlier in the chain. That is, in most implementations, a parallel processing chain will be employed to produce imagery for human (as opposed to machine) consumption. Thus, parallel processing may fork immediately following the camera sensor 12, as shown by juncture 17 in FIG. 13. The processing for the human visual system 13 includes operations such as noise reduction, white balance, and compression. The processing for object identification 14, in contrast, may include the operations detailed in this specification.

When an architecture involves forked or other parallel processes, the different modules may finish their processing at different times. They may output data as they finish, asynchronously, as the pipeline or other interconnection network permits. When the pipeline/network is free, the next module can transfer its completed results. Flow control may involve some arbitration, such as giving one path or data a higher priority. Packets may convey priority data that determines their precedence in case arbitration is needed. For example, many image processing operations/modules make use of Fourier domain data, such as is produced by an FFT module. The output from an FFT module may thus be given a high priority, and precedence over others in arbitrating data traffic, so that the Fourier data that may be needed by other modules can be made available with minimum delay.
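Priority-based arbitration of this kind can be sketched with a priority queue. The `Arbiter` class, the `priority` and `source` keys, and the convention that a lower number means higher priority are hypothetical choices for illustration.

```python
import heapq
import itertools


class Arbiter:
    """Grants the shared interconnect to the highest-priority waiting packet."""

    def __init__(self):
        self._queue = []
        self._arrival = itertools.count()

    def submit(self, packet):
        # Lower number means higher priority. The arrival counter breaks ties
        # in first-come order and keeps the packets themselves from being compared.
        heapq.heappush(
            self._queue, (packet["priority"], next(self._arrival), packet)
        )

    def grant(self):
        """Return the next packet allowed onto the interconnect, or None."""
        return heapq.heappop(self._queue)[2] if self._queue else None


arbiter = Arbiter()
arbiter.submit({"source": "edge", "priority": 5})
arbiter.submit({"source": "histogram", "priority": 5})
arbiter.submit({"source": "fft", "priority": 0})  # Fourier data goes first

grant_order = []
while (packet := arbiter.grant()) is not None:
    grant_order.append(packet["source"])
```

Although the FFT output arrived last, it is granted the interconnect first; packets of equal priority are served in arrival order.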

In other implementations, some or all of the processing stages are not dedicated purpose processors, but rather are general purpose microprocessors programmed by software. In still other implementations, the processors are hardware-reconfigurable. For example, some or all may be field programmable gate arrays, such as Xilinx Virtex series devices. Alternatively they may be digital signal processing cores, such as Texas Instruments TMS320 series devices.

Other implementations can include PicoChip devices, such as the PC302 and PC312 multicore DSPs. Their programming model allows each core to be coded independently (e.g., in C), and then to communicate with others over an internal interconnect mesh. The associated tools particularly lend themselves to use of such processors in cellular equipment.

Still other implementations may employ configurable logic on an ASIC. For example, a processor can include a region of configurable logic, mixed with dedicated logic. This allows configurable logic in a pipeline, with dedicated pipeline or bus interface circuitry.

An implementation can also include one or more modules with a small CPU and RAM, with programmable code space for firmware and workspace for processing; essentially a dedicated core. Such a module can perform fairly extensive computations, configurable as needed by the process that is using the hardware at the time.

All such devices can be deployed in a bus, crossbar or other interconnection architecture that again permits any stage to receive data from, and output data to, any stage. (An FFT or other transform processor implemented in this fashion may be reconfigured dynamically to process blocks of 16 x 16, 64 x 64, 4096 x 4096, 1 x 64, 32 x 128, etc.)

In certain implementations, some processing modules are replicated, permitting parallel execution on parallel hardware. For example, several FFTs may be processed simultaneously.

In a variant arrangement, a packet conveys instructions that serve to reconfigure the hardware of one or more of the processing modules. As a packet enters a module, the header causes the module to reconfigure its hardware before the image-related data is accepted for processing. The architecture is thus configured on the fly by packets (which may or may not convey image-related data). Packets can similarly convey firmware to be loaded into a module having a CPU core, or into an application- or cloud-based layer; likewise with software instructions.
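Reconfiguration on packet arrival can be modeled in software as below. This sketch is purely illustrative: real hardware reconfiguration would load a bitstream or firmware rather than swap a Python function, and the names `ReconfigurableStage`, `LIBRARY` and the `reconfigure` header key are hypothetical.

```python
def low_pass(samples):
    """Two-tap moving average."""
    return [(a + b) / 2 for a, b in zip(samples, samples[1:])]


def threshold(samples):
    """Binarize: 1 where the sample is at least 5, else 0."""
    return [1 if v >= 5 else 0 for v in samples]


# Hypothetical library of configurations the module can assume.
LIBRARY = {"low_pass": low_pass, "threshold": threshold}


class ReconfigurableStage:
    def __init__(self, library):
        self.library = library
        self.role = None
        self.operation = None

    def accept(self, packet):
        """Reconfigure first (if the header asks), then process any body data."""
        role = packet.get("reconfigure")
        if role is not None and role != self.role:
            self.role = role
            self.operation = self.library[role]
        if packet.get("body") is not None and self.operation is not None:
            packet["body"] = self.operation(packet["body"])
        return packet


stage = ReconfigurableStage(LIBRARY)
first = stage.accept({"reconfigure": "low_pass", "body": [0, 10, 10, 0]})
second = stage.accept({"reconfigure": "threshold"})  # configuration-only packet
third = stage.accept({"body": [2, 7, 5]})            # uses the current configuration
```

The second packet carries no image-related data at all; it only changes what the module does to the packets that follow.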

The module configuration instructions may be received over a wireless or other external network; they need not always be resident on the local system. If the user requests an operation for which local instructions are not available, the system can request the reconfiguration data from a remote source.

Instead of conveying the configuration data/instructions themselves, the packet may simply convey an index number, pointer, or other address information. This information can be used by the processing module to access a corresponding memory store from which the needed data/instructions can be retrieved. As with a cache, if the local memory store is not found to contain the needed data/instructions, they can be requested from another source (e.g., by accessing an external network).
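The cache-like lookup described here can be sketched as follows. The function names `make_resolver` and `fetch_remote`, and the string placeholders standing in for configuration data, are hypothetical.

```python
def make_resolver(local_store, fetch_remote):
    """Build a lookup that tries the local store first, then a remote source."""

    def resolve(index):
        if index not in local_store:
            # Local miss: request the data from elsewhere and keep a copy.
            local_store[index] = fetch_remote(index)
        return local_store[index]

    return resolve


remote_requests = []


def fetch_remote(index):
    """Hypothetical stand-in for a request across an external network."""
    remote_requests.append(index)
    return f"config-for-{index}"


resolve = make_resolver({7: "fft-16x16"}, fetch_remote)
results = [resolve(7), resolve(42), resolve(42)]
```

Index 7 is served locally; index 42 triggers a single remote request, after which the second lookup is served from the local store.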

Such arrangements bring dynamic routing capability down to the hardware layer, configuring the module as data arrives at it.

Parallelism is widely employed in graphics processing units (GPUs). Many computer systems employ GPUs as auxiliary processors to handle operations such as graphics rendering. Cell phones increasingly include GPU chips to allow the phones to serve as gaming platforms; these can be employed to advantage in certain implementations of the present technology. (By way of example and not limitation, a GPU can be used to perform bilinear and bicubic interpolation, projective transformations, filtering, etc.)

In accordance with another aspect of the present technology, a GPU is used to correct for lens aberrations and other optical distortion.

Cell phone cameras often display optical non-linearities, such as barrel distortion, focus anomalies at the perimeter, etc. This is particularly a problem when decoding digital watermark information from captured imagery. With a GPU, the image can be treated as a texture map, and applied to a correction surface.

Typically, texture mapping is used to put a picture of bricks or a stone wall onto a surface, e.g., of a dungeon. Texture memory data is referenced and mapped onto a plane or polygon as it is drawn. In the present context, it is the image that is applied to a surface. The surface is shaped so that the image is drawn with an arbitrary, correcting transform.

Steganographic calibration signals in a digitally watermarked image can be used to discern the distortion by which the image has been transformed. (See, e.g., Digimarc's patent 6,590,996.) Each patch of a watermarked image can be characterized by affine transformation parameters, such as translation and scale. An error function for each location in the captured frame can thereby be derived. From this error information, a corresponding surface can be devised: when the distorted image is projected onto it by the GPU, the surface causes the image to appear in its counter-distorted, original form.

A lens can be characterized in this way with a reference watermark image. Once the associated correction surface has been devised, it can be reused with other imagery captured through that optical system (since the associated distortion is fixed). Other imagery can be projected onto this correction surface by the GPU to correct the lens distortion. (Different focal depths and apertures may require characterization of different correction functions, since the optical path through the lens may differ.)

When a new image is captured, it can initially be rectilinearized to remove the keystone/trapezoidal perspective effect. Once rectilinearized (e.g., re-squared relative to the camera lens), local distortions can be corrected by using the GPU to map the rectilinearized image onto the correction surface.

Thus, the correction model is in essence a polygon surface, where the tilts and elevations correspond to focus irregularities. Each region of the image has a local transform matrix that allows correction of that piece of the image.
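The idea of one local transform per image region can be sketched on the CPU as below. This is a simplified stand-in, not the GPU texture-mapping path the text describes: the grid layout, the `Affine` tuple ordering, and the 16-pixel region size are all assumptions made for illustration.

```python
from typing import List, Tuple

# Local affine parameters (a, b, tx, c, d, ty), so that
# x' = a*x + b*y + tx and y' = c*x + d*y + ty.
Affine = Tuple[float, float, float, float, float, float]


def correct_point(x: float, y: float, grid: List[List[Affine]],
                  region_size: int) -> Tuple[float, float]:
    """Apply the local affine correction of the region containing (x, y)."""
    row = min(int(y) // region_size, len(grid) - 1)
    col = min(int(x) // region_size, len(grid[0]) - 1)
    a, b, tx, c, d, ty = grid[row][col]
    return (a * x + b * y + tx, c * x + d * y + ty)


identity: Affine = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0)
shift_right: Affine = (1.0, 0.0, 2.0, 0.0, 1.0, 0.0)

# A 2 x 2 grid of 16-pixel regions; only the top-right region needs a shift.
grid = [[identity, shift_right],
        [identity, identity]]

untouched = correct_point(4, 4, grid, 16)
shifted = correct_point(20, 4, grid, 16)
```

A GPU would additionally interpolate between neighboring regions as it draws the surface; this sketch applies each region's matrix without blending.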

The same arrangement can likewise be used to correct distortion of a lens in an image projection system. Before projection, the image is mapped, like a texture, onto a correction surface synthesized to counteract the lens distortion. When the image so processed is projected through the lens, the lens distortion counteracts the correction surface distortion applied earlier, so that a corrected image is projected from the system.

Reference was made to depth of field as one of the parameters that camera 32 can use in gathering exposures. Although a lens can focus precisely at only one distance, the decrease in sharpness is gradual on either side of the focused distance. (The depth of field depends on the point spread function of the optics, including the lens focal length and aperture.) As long as the captured pixels yield information useful for the intended operation, they need not be in perfect focus.

Sometimes focusing algorithms hunt for focus but fail to achieve it, wasting cycles and battery life. In some instances it is better simply to grab frames at a series of different focus settings. A search tree of focus depths, or depths of field, may be used. This is particularly useful where an image may include multiple subjects of potential interest, each at a different plane. The system may capture one frame focused at 6 inches and another focused at 24 inches. The different frames may reveal that there are two objects of interest within the field of view, one better captured in one frame and the other better captured in the other. Or the 24-inch-focused frame may be found to have no useful data, while the 6-inch-focused frame includes enough discriminatory frequency content to show that there are two or more subject image planes. Based on the frequency content, one or more frames with other focus settings may then be captured. Or a region in the 24-inch-focused frame may have one set of Fourier attributes, and the same region in the 6-inch-focused frame may have a different set; from the difference between the two frames a next trial focus setting may be identified (e.g., at 10 inches), and a further frame may be captured at that setting. Feedback is applied, not necessarily to obtain perfect focus lock, but in accordance with search criteria to decide about further captures that may reveal additional useful detail. The search may fork and branch, depending on the number of subjects discerned and the associated Fourier (and other) information, until satisfactory information about all subjects has been gathered.
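The comparison of frames taken at different focus settings can be sketched as follows. This is a rough illustration, assuming a simple sum of squared pixel differences as a stand-in for the Fourier attributes mentioned in the text; the region names and the tiny test images are hypothetical, and the step that proposes a next trial focus setting is not modeled.

```python
from typing import Dict, List

Image = List[List[int]]


def sharpness(region: Image) -> int:
    """Sum of squared horizontal pixel differences (a crude proxy for
    high-frequency content; not an actual Fourier analysis)."""
    return sum((row[i + 1] - row[i]) ** 2
               for row in region for i in range(len(row) - 1))


def best_focus_per_region(frames: Dict[float, Dict[str, Image]]) -> Dict[str, float]:
    """For each named region, pick the focus distance whose frame is sharpest."""
    best: Dict[str, float] = {}
    regions = next(iter(frames.values())).keys()
    for name in regions:
        best[name] = max(frames, key=lambda focus: sharpness(frames[focus][name]))
    return best


sharp = [[0, 255, 0, 255], [255, 0, 255, 0]]
blurry = [[120, 130, 120, 130], [130, 120, 130, 120]]
frames = {
    6.0: {"left": sharp, "right": blurry},   # frame focused at 6 inches
    24.0: {"left": blurry, "right": sharp},  # frame focused at 24 inches
}
result = best_focus_per_region(frames)
```

Here the two frames reveal two subjects at different planes: the left region is best served by the 6-inch frame and the right region by the 24-inch frame.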

A related approach is to capture and buffer plural frames while the camera lens system is adjusting to an intended focus setting. Analysis of the frame finally captured at the intended focus may suggest that intermediate-focus frames would reveal useful information, e.g., about subjects that were not earlier apparent or significant. One or more of the frames captured and buffered earlier can then be recalled and processed to provide information whose significance was not initially recognized.

Camera control can also be responsive to spatial coordinate information. By using geolocation data and orientation (e.g., from a magnetometer), the camera can check that it is capturing an intended target. The camera set-up module may request images not just with certain exposure parameters, but also of certain subjects or locations. When the camera is in the correct position to capture a specific subject (which may have been previously specified by the user, or identified by a computer process), one or more frames of image data can be captured automatically. (In some arrangements, the orientation of the camera is controlled by stepper motors or other electromechanical arrangements, so that the camera can autonomously set its azimuth and elevation to capture image data from a particular direction and so capture a desired subject. Electronic or fluid steering of the lens direction can also be used.)

As noted, the camera setup module may instruct the camera to capture a sequence of frames. In addition to benefits such as synthesizing high dynamic range imagery, such frames can be aligned and combined to obtain super-resolution images. (As is known in the art, super-resolution can be achieved by diverse methods. For example, the frequency content of the images can be analyzed, the images related to each other by linear transform, affine-transformed into correct alignment, and then overlaid and combined. Among other applications, this can be used in decoding digital watermark data from imagery. If the subject is too far from the camera to obtain satisfactory image resolution normally, the resolution may be doubled by such super-resolution techniques to obtain the higher resolution needed for successful watermark decoding.)

In the exemplary embodiment, each processing stage substituted the results of its processing for the input data contained in the packet when received. In other arrangements, the processed data can be added to the packet body while the data originally present is retained. In that case the packet grows during processing, as more information is added. This may be disadvantageous in some contexts, but it can also provide advantages. For example, it may avoid the need to fork a processing chain into two packets or two threads. Sometimes both the original and the processed data are useful to a subsequent stage. For example, an FFT stage may add frequency domain information to a packet containing original pixel domain imagery. Both may be used by a subsequent stage, e.g., in performing sub-pixel alignment for super-resolution processing. Likewise, a focus metric may be extracted from imagery and used, together with the accompanying image data, by a subsequent stage.
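The difference between a stage that substitutes its result and a stage that appends to the packet can be sketched like this. `Packet`, `run_stage` and the body keys are hypothetical names chosen for illustration, and built-in functions stand in for real stages such as an FFT.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class Packet:
    header: Dict[str, Any]
    body: Dict[str, Any] = field(default_factory=dict)


def run_stage(packet: Packet, name: str, source: str,
              op: Callable[[Any], Any], replace: bool = False) -> Packet:
    """Run one stage on body[source]; either replace the input or append."""
    result = op(packet.body[source])
    if replace:
        del packet.body[source]  # substituting stage: the input is discarded
    packet.body[name] = result   # appending stage: the packet grows
    return packet


# Appending: the original data stays available to later stages.
pkt = Packet(header={"frame": 1}, body={"pixels": [3, 1, 2]})
run_stage(pkt, "sorted", "pixels", sorted)  # stand-in for, e.g., an FFT stage
run_stage(pkt, "peak", "pixels", max)       # a later stage still sees the pixels

# Substituting: only the result survives.
pkt2 = Packet(header={"frame": 2}, body={"pixels": [1, 2]})
run_stage(pkt2, "total", "pixels", sum, replace=True)
```

With appending stages, a single packet can feed a later stage that needs both the original and the derived data, at the cost of a larger packet.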

It will be recognized that the detailed arrangements can be used to control the camera to generate different types of image data on a per-frame basis, and to control subsequent stages of the system to process each such frame differently. Thus, the system may capture a first frame under conditions selected to optimize green watermark detection, a second frame under conditions selected to optimize barcode reading, a third frame under conditions selected to optimize facial recognition, etc. Subsequent stages may be directed to process each of these frames differently, in order to best extract the data sought. All of the frames may be processed to sense illumination variations. Every other frame may be processed to assess focus, e.g., by computing 16 x 16 pixel FFTs at nine different locations within the image frame. (Or there may be a fork that allows all frames to be assessed for focus, and the focus branch may be disabled when not needed, or reconfigured to serve another purpose.) And so on.
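A per-frame plan of this kind can be sketched as a small scheduling function. The profile names, step labels, and the three-frame cycle are assumptions for illustration; only the rules "all frames are checked for illumination" and "every other frame is checked for focus" come from the text.

```python
from typing import Dict, List

CAPTURE_CYCLE = ["watermark", "barcode", "face"]  # hypothetical profile names


def plan_frame(frame_number: int) -> Dict[str, object]:
    """Return the capture profile and processing steps for one frame."""
    steps: List[str] = ["illumination"]      # applied to all frames
    if frame_number % 2 == 0:
        steps.append("focus_fft_16x16_x9")   # every other frame
    profile = CAPTURE_CYCLE[frame_number % len(CAPTURE_CYCLE)]
    steps.append(f"extract_{profile}")       # frame-specific extraction
    return {"frame": frame_number, "profile": profile, "steps": steps}


plans = [plan_frame(n) for n in range(4)]
```

Frames 0 through 3 cycle through the watermark, barcode and face profiles and then wrap around, with the focus step appearing only on even-numbered frames.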

In some implementations, frame capture can be tuned to capture the steganographic calibration signals present in a digital watermark signal, without regard to successful decoding of the watermark payload data itself. For example, the captured image data can be of low resolution: sufficient to discern the calibration signals, but insufficient to discern the payload. Or the camera can expose the image without regard to human perception, e.g., overexposing so that image highlights are washed out, or underexposing so that other parts of the image are indistinguishable. Such an exposure may nevertheless be adequate to capture the watermark orientation signal. (Feedback can of course be used to capture one or more subsequent image frames that redress one or more shortcomings of a previous image frame.)

Some digital watermarks are embedded in specific color channels (e.g., blue), rather than across colors as a modulation of image luminance (see, e.g., commonly owned patent application 12/337,029 to Reed). In capturing a frame that includes such a watermark, exposure can be selected to yield maximum dynamic range in the blue channel (e.g., 0-255 in an 8-bit sensor), without regard to the exposure of other colors in the image. One frame may be captured to maximize the dynamic range of one color, such as blue, and a later frame may be captured to maximize the dynamic range of another color channel, such as yellow (i.e., along the red-green axis). These frames may then be aligned, and the blue-yellow difference determined. The frames may have wholly different exposure times, depending on lighting, subject, etc.
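The blue-yellow difference between two aligned frames can be sketched as below. This is a minimal illustration that assumes yellow can be approximated as the mean of the red and green values, and that the two frames have already been aligned and exposure-normalized; neither assumption is specified by the text.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def blue_yellow_difference(blue_frame: List[Pixel],
                           yellow_frame: List[Pixel]) -> List[float]:
    """Per-pixel blue minus yellow, with yellow taken as the mean of R and G.

    The blue value comes from the frame exposed for blue, and the yellow
    value from the frame exposed for yellow (an assumed convention).
    """
    if len(blue_frame) != len(yellow_frame):
        raise ValueError("frames must be aligned to the same size")
    return [b[2] - (y[0] + y[1]) / 2 for b, y in zip(blue_frame, yellow_frame)]


blue_exposure = [(10, 10, 200), (12, 8, 100)]
yellow_exposure = [(180, 160, 5), (90, 110, 7)]
diff = blue_yellow_difference(blue_exposure, yellow_exposure)
```

In practice the differing exposure times of the two frames would need to be compensated before the subtraction is meaningful.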

Desirably, the system has an operational mode in which it captures and processes imagery even when the user is not intending to "snap" a picture. If the user presses the shutter button, the otherwise-scheduled image capture/processing operations may be suspended, and a consumer photo-taking mode can take precedence. In this mode, capture parameters and processes designed to enhance human visual system aspects of the image can be used instead.

(It will be recognized that the particular embodiment shown in FIG. 16 creates packets before any image data is collected. In contrast, FIG. 10A and its associated discussion do not refer to packets existing before the camera. Either arrangement can be used in either embodiment. That is, in FIG. 10A, packets may be established prior to the capture of image data by the camera, in which case the visual keyvector processing and packaging module serves to insert the pixel data (or, more typically, sub-sets or super-sets of the pixel data) into earlier-formed packets. Similarly, in FIG. 16, the packets need not be created until after the camera has captured image data.)

As noted earlier, one or more of the processing stages can be remote from the cell phone. One or more pixel packets can be routed to the cloud (or through the cloud) for processing. The results can be returned to the cell phone, or forwarded to another cloud processing stage (or both). Once back at the cell phone, one or more further local operations may be performed. Data may then be sent back out to the cloud. Processing can thus alternate between the cell phone and the cloud. Eventually, result data is usually presented to the user back at the cell phone.

Applicants expect that different vendors will offer competing cloud services for specialized processing tasks. For example, Apple, Google and Facebook may each offer cloud-based facial recognition services. A user device would transmit a packet of processed data for processing. The header of the packet would indicate the user, the requested service, and, optionally, micropayment instructions. (Again, the header could convey an index or other identifier by which a desired transaction is looked up in a cloud database, or which serves to arrange an operation or a sequence of processes for some transaction, such as a purchase, a posting on Facebook, or a face- or object-recognition operation. Once such an indexed transaction arrangement is initially configured, it can be invoked simply by sending the cloud a packet containing the image-related data and an identifier indicating the desired operation.)

At the Apple service, for example, a server may examine the incoming packet, look up the user's iPhoto account, access facial recognition data for the user's friends from that account, compute facial recognition features from the image data conveyed with the packet, determine a best match, and return result information (e.g., the name of a depicted individual) to the originating device.

At the IP address for the Google service, a server may undertake similar operations, but would refer to the user's Picasa account. The same applies to Facebook.

Identifying a face from among the faces of dozens or hundreds of known friends is easier than identifying the faces of strangers. Other vendors may offer services of the latter sort. For example, L-1 Identity Solutions, Inc. maintains databases of images from government-issued credentials, such as drivers' licenses. With appropriate permissions, it may offer facial recognition services that draw on such databases.

Other processing operations can similarly be performed remotely. One is a barcode processor, which takes processed image data sent from the mobile phone and applies a decoding algorithm particular to the type of barcode present. A service may support one, a few, or dozens of different types of barcode. The decoded data may be returned to the phone, or the service provider can access further data indexed by the decoded data, such as product information, instructions, purchase options, etc., and return that further data to the phone. (Or both can be provided.)

Another service is digital watermark reading. Another is optical character recognition (OCR). An OCR service provider may further offer translation services, e.g., converting processed image data into ASCII symbols and then submitting the ASCII words to a translation engine so that they are rendered in a different language. Other services are sampled in FIG. 2. (Practicality prevents listing the myriad other services, and component operations, that may also be provided.)

The output from the remote service provider is commonly returned to the cell phone. In many instances the remote service provider will return processed image data. In some cases it may return ASCII or other such data. Sometimes, however, the remote service provider may produce other forms of output, including audio (e.g., MP3) and/or video (e.g., MPEG4 and Adobe Flash).

Video returned to the cell phone from the remote provider may be presented on the cell phone display. In some implementations such video presents a user interface screen, inviting the user to touch or gesture within the displayed presentation to select information or an operation, or to issue an instruction. Software in the cell phone can receive such user input and undertake responsive operations, or present responsive information.

In still other arrangements, the data provided back to the cell phone from the remote service provider can include JavaScript or other such instructions. When run by the cell phone, the JavaScript provides a response associated with the processed data that was referred to the remote provider.

Remote processing services can be provided under a variety of financial models. An Apple iPhone service plan may be bundled with a variety of remote services at no additional cost, e.g., iPhoto-based facial recognition. Other services may bill on a per-use, monthly subscription, or other usage plan.

Some services will doubtless be highly branded and marketed. Others may compete on quality; still others on price.

As noted, stored data may indicate preferred providers for different services. These may be explicitly identified (e.g., send all FFT operations to the Fraunhofer Institute service), or they can be specified by other attributes. For example, a cell phone user may direct that all remote service requests be routed to the providers ranked fastest in a periodically updated survey of providers (e.g., by Consumers Union). The cell phone can periodically check the published results for this information, or the results can be checked dynamically when a service is requested. Another user may specify that service requests be routed to the service providers with the highest customer satisfaction scores, again by reference to an online rating resource. Still another user may specify that requests be routed to the providers with the highest customer satisfaction scores, but only if the service is provided for free; otherwise they are routed to the lowest-cost provider. Combinations of these arrangements, and others, are of course possible. In a particular case the user may specify a particular service provider, trumping any selection that would be made by the stored profile data.
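Stored routing preferences of this kind can be sketched as a small selection function. The rule shown is one possible reading, assumed for illustration: use the top-rated provider when its service is free, fall back to the cheapest provider otherwise, and let an explicit user choice override both. The `Provider` fields and the provider names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Provider:
    name: str
    satisfaction: float  # score from an online rating resource (assumed scale)
    price: float         # cost per request; 0.0 means free


def choose_provider(providers: List[Provider], explicit: str = "") -> Provider:
    """Pick a provider: explicit choice, else top-rated if free, else cheapest."""
    if explicit:
        for p in providers:
            if p.name == explicit:
                return p  # the user's choice trumps the stored rule
    top_rated = max(providers, key=lambda p: p.satisfaction)
    if top_rated.price == 0.0:
        return top_rated
    return min(providers, key=lambda p: p.price)


offers = [Provider("A", 4.8, 0.05), Provider("B", 4.1, 0.0), Provider("C", 3.9, 0.02)]
by_rule = choose_provider(offers)
by_user = choose_provider(offers, explicit="A")
```

Here the top-rated provider charges a fee, so the stored rule falls back to the cheapest one, while an explicit request for provider "A" is honored regardless.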

In still other arrangements, the user's request for service can be externally posted, and several service providers may express interest in performing the requested operation. Or the request can be sent to several specific service providers for proposals (e.g., to Amazon, Google and Microsoft). The different providers' responses (pricing, other terms, etc.) may be presented to the user, who selects among them, or a selection may be made automatically based on previously stored rules. In some cases, one or more competing service providers can be given user data with which they start performing, or wholly perform, the subject operation before a service provider selection is finally made. This gives such providers a chance to speed their response times and to encounter additional real-world data. (See also the earlier discussion of remote service providers, including auction-based services, e.g., in connection with FIGS. 7-12.)

As indicated elsewhere, certain external service requests may pass through a common hub (module), which is responsible for distributing the requests to appropriate service providers. Reciprocally, results from certain external service requests may similarly be routed through a common hub. For example, payloads decoded by different service providers from different digital watermarks (or payloads decoded from different barcodes, or fingerprints computed from different content objects) may be referred to a common hub, which may compile statistics and aggregate information (akin to Nielsen's monitoring services, surveying consumer encounters with different data). Besides the decoded watermark data (barcode data, fingerprint data), the hub may also (or alternatively) be provided with a quality or confidence metric associated with each decoding/computing operation. This may help reveal packaging issues, print issues, media corruption issues, etc., that need consideration.
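A hub that aggregates decoded payloads together with their confidence metrics can be sketched as follows. The `Hub` class, its method names, the payload identifiers, and the 0-to-1 confidence scale are all assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


class Hub:
    """Collects decode reports and summarizes them per payload."""

    def __init__(self) -> None:
        self._reports: Dict[str, List[float]] = defaultdict(list)

    def report(self, payload: str, confidence: float) -> None:
        # Record one decoding operation and its confidence metric.
        self._reports[payload].append(confidence)

    def summary(self) -> Dict[str, Tuple[int, float]]:
        # Maps payload -> (number of encounters, mean confidence).
        return {p: (len(c), mean(c)) for p, c in self._reports.items()}

    def low_confidence(self, threshold: float) -> List[str]:
        # Payloads whose average confidence falls below the threshold,
        # which may hint at print, packaging or media problems.
        return sorted(p for p, c in self._reports.items() if mean(c) < threshold)


hub = Hub()
hub.report("wm-001", 0.9)
hub.report("wm-001", 0.7)
hub.report("bc-042", 0.2)
stats = hub.summary()
flagged = hub.low_confidence(0.5)
```

Counting encounters per payload gives the survey-style statistics, while the confidence average singles out items that consistently decode poorly.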

Pipe Manager

In the FIG. 16 implementation, communications to and from the cloud are facilitated by a pipe manager 51. This module (which may be realized as the cell phone-side portion of the query router and response manager of FIG. 7) performs a variety of functions related to communicating across a data pipe 52. (It will be recognized that pipe 52 is a data construct that may comprise a variety of communication channels.)

One function performed by pipe manager 51 is to negotiate for needed communication resources. The cell phone can use a variety of communication networks and commercial data carriers, e.g., cellular data, WiFi, Bluetooth, etc., any or all of which may be utilized. Each may have its own protocol stack. In one respect the pipe manager 51 interacts with the respective interfaces for these data channels, determining the availability of bandwidth for different data payloads.

For example, the pipe manager may alert the cellular data carrier's local interface and network that a payload will be ready for transmission starting in about 450 milliseconds. It may further specify the size of the payload (e.g., two megabits), its character (e.g., block data), and a needed quality of service (e.g., data throughput rate). It may also specify a priority level for the transmission, so that in the event of a conflict the interface and network can service this transmission ahead of lower-priority data exchanges.
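An advance notice of this kind, and the priority ordering it enables, can be sketched with a heap-based queue. The `Announcement` fields mirror the attributes named in the text, but the class names, the numeric priority convention (lower is served first), and the tie-break on ready time are assumptions.

```python
import heapq
from dataclasses import dataclass, field
from typing import List


@dataclass(order=True)
class Announcement:
    priority: int                           # lower number = served first
    ready_in_ms: int                        # when the payload will be ready
    size_bits: int = field(compare=False)   # e.g., 2_000_000 for two megabits
    character: str = field(compare=False)   # e.g., "block" or "burst"


class ChannelQueue:
    """Orders announced transmissions by priority, then by ready time."""

    def __init__(self) -> None:
        self._heap: List[Announcement] = []

    def announce(self, item: Announcement) -> None:
        heapq.heappush(self._heap, item)

    def next_to_send(self) -> Announcement:
        return heapq.heappop(self._heap)


queue = ChannelQueue()
queue.announce(Announcement(priority=5, ready_in_ms=100,
                            size_bits=500_000, character="burst"))
queue.announce(Announcement(priority=1, ready_in_ms=450,
                            size_bits=2_000_000, character="block"))
first_out = queue.next_to_send()
```

Even though the two-megabit block is announced second and becomes ready later, its higher priority puts it ahead of the lower-priority burst when the two conflict.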

The pipe manager knows the expected size of the payload from information provided by the control processor module 36. (In the illustrated embodiment, the control processor module specifies the particular processing that will yield the payload, and so it can estimate the size of the resulting data.) The control processor module can also predict the character of the data, e.g., whether it will be available as a fixed block or intermittently in bursts, the rate at which it will be provided for transmission, etc. The control processor module 36 can also predict the time at which the data will be ready for transmission. The priority information, too, is known by the control processor module. In some instances the control processor module sets the priority level autonomously. In other instances the priority level is dictated by the user, or by the particular application being serviced.

For example, a user may expressly signal through the cell phone's graphical user interface, or a particular application may regularly require, that an image-based action be processed immediately. This may be the case, for example, where further action from the user is expected based on the results of the image processing. In other cases the user may expressly signal, or a particular application may normally permit, that an image-based action can be performed whenever convenient (e.g., when needed resources have low or nil utilization). This may be the case, for example, if a user is posting a snapshot to a social networking site such as Facebook, and would like the image annotated with names of the depicted individuals through facial recognition processing. Intermediate prioritization (expressed by the user, or by the application) can also be employed, e.g., processing within a minute, ten minutes, an hour, a day, etc.

In the illustrated arrangement, the control processor module 36 informs the pipe manager of the expected data size, character, timing and priority, so that the pipe manager can use these in negotiating for the desired service. (In other embodiments, more or less information can be provided.)

If the carrier and interface can meet the pipe manager's request, further data exchange may ensue to prepare for the data transmission and to ready the remote system for the expected operation. For example, the pipe manager may establish a secure socket connection with the particular computer in the cloud that is to receive the data payload, and identify the user. If the cloud computer is to perform a facial recognition operation, it may prepare for the operation by retrieving from Apple/Google/Facebook the facial recognition features, and associated names, for friends of the specified user.

Thus, in addition to preparing a channel for the external communication, the pipe manager enables pre-warming of the remote computer, to ready it for the expected service request. (The service request may not follow.) In some instances the user may operate the shutter button, and the cell phone may not know what operation will follow. Will the user request a facial recognition operation? A barcode decoding operation? Posting of the image to Flickr or Facebook? In some cases the pipe manager, or the control processor module, may pre-warm several processes. Or it may predict, based on past experience, what operation will be undertaken, and warm the appropriate resources. (E.g., if the user performed facial recognition operations following the last three shutter operations, there is a good chance the user will request facial recognition again.) The cell phone may actually start performing component operations for several of the possible functions before any has been selected, particularly those operations whose results may be useful to several of the functions.
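The prediction from past experience described above can be sketched very simply. The function below is an illustrative assumption, not the specification's method: it looks at the operations that followed the most recent shutter presses and returns the most frequent ones as candidates for pre-warming. The names `predict_operations`, `window` and `max_warm` are hypothetical.

```python
from collections import Counter


def predict_operations(history, window=3, max_warm=2):
    """Return up to `max_warm` operations to pre-warm.

    `history` lists, oldest first, the operation that followed each past
    shutter press. Only the last `window` entries are considered, and the
    most frequent operations in that window are returned first.
    """
    recent = history[-window:]
    counts = Counter(recent)
    return [op for op, _ in counts.most_common(max_warm)]


# Facial recognition followed the last three shutter operations.
to_warm = predict_operations(["barcode", "face", "face", "face"])
```

With that history, only facial recognition is returned. Raising `max_warm` corresponds to the alternative of pre-warming several processes when the recent history is mixed.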

Pre-warming can also involve resources within the cell phone: configuring processors, loading caches, etc.

The situation just reviewed contemplates that the desired resources are ready to handle the expected traffic. In another situation the pipe manager may report that the carrier is unavailable (e.g., because the user is in a region of impaired radio service). This information is reported to the control processor module 36, which may change the schedule of image processing, buffer results, or take other responsive action.

If other, conflicting, data transfers are underway, the carrier or interface may respond to the pipe manager that the requested transmission cannot be accommodated, e.g., at the requested time or with the requested quality of service. In this case the pipe manager may report this to the control processor module 36. The control processor module may abort the process that was to give rise to the two-megabit data service requirement, and reschedule it for later. Alternatively, the control processor module may decide that the two-megabit payload can be generated as originally scheduled, with the results buffered locally for transmission when the carrier and interface are able to do so. Or other action may be taken.

Consider a business gathering in which the participants assemble for a group photo before dinner. The user may want all faces in the photo to be recognized immediately, so that they can be quickly reviewed to avoid the embarrassment of not recalling a colleague's name. Even before the user operates the cell phone's shutter button, the control processor module causes the system to process frames of image data, identifying apparent faces in the field of view (e.g., oval shapes with two seeming eyes in the expected positions). These may be highlighted by rectangles on the cell phone's viewfinder (screen) display.

While current cameras have picture-taking modes based on lens/exposure profiles (e.g., close-up, night-time, beach, landscape, snow scenes, etc.), imaging devices may additionally (or alternatively) have different image-processing modes. One mode may be selected by the user to obtain names of people depicted in a photo (e.g., through facial recognition). Another mode may be selected to perform optical character recognition of text found in an image frame. Another may trigger operations relating to purchasing a depicted item. The same goes for selling a depicted item; for obtaining information about a depicted object, scene or person (e.g., from Wikipedia, a social network, or a manufacturer's web site); and for establishing a ThinkPipe session with the item or a related system, etc.

These modes may be selected by the user in advance of operating a shutter control, or afterwards. In other arrangements, plural shutter controls (physical or GUI) are provided to the user, each invoking a different one of the available operations. (In still other embodiments, the device infers what operation(s) is/are likely desired, rather than having the user expressly indicate it.)

If the user at the business gathering takes a group shot depicting twelve individuals, and requests the names on an immediate basis, the pipe manager 51 may report back to the control processor module (or application software) that the requested service cannot be provided. Due to a bottleneck or other constraint, the manager 51 may report that identification of only three of the depicted faces can be accommodated within the quality-of-service parameters considered to constitute an "immediate" basis. Another three faces may be recognized within two seconds, and recognition of the full set of faces may be expected in five seconds. (This may be due to a constraint at the remote service provider, rather than at the carrier, per se.)

The control processor module 36 (or application software) may respond to this report in accordance with an algorithm, or by reference to a rule set stored in a local or remote data structure. The algorithm or rule set may conclude that, for facial recognition operations, delayed service should be accepted on whatever terms are available, and that the user should be alerted (through the device GUI) that there will be a delay of about N seconds before full results are available. Optionally, the reported cause of the expected delay may also be exposed to the user. Other service exceptions may be handled differently: in some cases the operation is aborted, rescheduled, or routed to a less-preferred provider, and/or the user is not alerted.
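A stored rule set of the kind mentioned above might be as simple as a lookup table keyed by operation. The sketch below is only one possible reading; the table `RULES`, the action names, the default rule and the message wording are all assumptions introduced for illustration.

```python
# Hypothetical rule set: operation -> (action on a service exception, alert user?)
RULES = {
    "face_recognition": ("accept_delay", True),
    "background_upload": ("reschedule", False),
}
DEFAULT_RULE = ("abort", True)  # assumed fallback for operations not listed


def respond_to_report(operation, expected_delay_s):
    """Return (action, user_message) for a reported service exception."""
    action, alert = RULES.get(operation, DEFAULT_RULE)
    if not alert:
        return action, None
    if action == "accept_delay":
        return action, f"Full results expected in about {expected_delay_s} s"
    return action, f"Request could not be serviced now ({action})"


# Twelve faces requested; the full set is expected in about five seconds.
decision = respond_to_report("face_recognition", 5)
```

Here the facial recognition request accepts the delayed service and produces a message for the GUI, while the hypothetical background upload is rescheduled silently.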

In addition to considering the ability of the local device interface to the network, and the ability of the network/carrier, to handle the forecast data traffic (within specified parameters), the pipe manager may also query resources out in the cloud, to ensure that they are able to perform whatever services are requested (within specified parameters). These cloud resources can include, e.g., data networks and remote computers. If any responds in the negative, or with a service-level qualification, this too can be reported back to the control processor module 36, so that appropriate action can be taken.

In response to any communication from the pipe manager 51 indicating possible trouble in servicing the expected data flow, the control processor module 36 may issue corresponding instructions to the pipe manager and/or other modules, as necessary.

In addition to the just-detailed tasks of negotiating in advance for needed services, and setting up appropriate data connections, the pipe manager can also act as a flow control manager: orchestrating the transfer of data from the different modules out of the cell phone, and reporting errors back to the control processor module 36.

While the foregoing discussion has focused on outbound data traffic, there is a similar flow inbound, back to the cell phone. The pipe manager (and control processor module) can help manage this traffic as well, providing services complementary to those discussed in connection with outbound traffic.

In some embodiments, there may be a pipe manager counterpart module 53 out in the cloud, cooperating with the pipe manager 51 in the cell phone in performance of the detailed functionality.

Software Embodiment of Control Processor and Pipe Manager

Research in the field of autonomous robotics shares some challenges with the scenarios described herein, in particular that of enabling a system of sensors to communicate data to local and remote processes, resulting in action being taken locally. In the case of robotics, this involves moving a robot out of harm's way; in the present case it is most commonly focused on providing a desired experience based on the image, sound, etc. encountered.

As opposed to performing simple operations such as obstacle avoidance, aspects of the present technology seek to provide higher levels of semantics, and hence richer experiences, based on sensor input. A user pointing a camera at a poster does not want to know the distance to the wall; the user is much more inclined to want to know about the content of the poster: if it concerns a movie, where it is showing, reviews, what their friends think, etc.

Despite such differences, architectural approaches from robotic toolkits can be adapted for use in the present context. One such robotic toolkit is the Player Project, a set of free software tools for robot and sensor applications, available as open source from sourceforge-dot-net.

An illustration of the Player Project architecture is shown in FIG. 19A. The mobile robot (which typically has a relatively low performance processor) communicates with a fixed server (with a relatively higher performance processor) using a wireless protocol. Various sensor peripherals are coupled to the mobile robot (client) processor through respective drivers and an API. Likewise, services may be invoked by the server processor from software libraries, through another API. (The CMU CMVision library is shown in FIG. 19A.)

(In addition to the basic tools for interfacing robotic equipment to sensors and service libraries, the Player Project includes "Stage" software that simulates a population of mobile robots moving in a 2D environment, with various sensors and processing, including visual blob detection. "Gazebo" extends the Stage model to 3D.)

With such a system architecture, new sensors can quickly be put to use by providing driver software that interfaces with the robot API. Similarly, new services can be readily plugged in through the server API. The two Player Project APIs provide standardized abstractions, so that the drivers and services need not concern themselves with the particular configuration of the server or robot (and vice versa).

(FIG. 20A, discussed below, also provides a layer of abstraction between the sensors, the locally available operations, and the externally available operations.)

Certain embodiments of the present technology can be implemented using a local process and remote process paradigm akin to that of the Player Project, connected by a packet network and by inter-process and intra-process communication constructs familiar to artisans (e.g., named pipes, sockets, etc.). Above the communication minutiae is a protocol by which the different processes may communicate; this may take the form of a message passing paradigm and message queue, or a more network-centric approach in which collisions of keyvectors are addressed after the fact (re-transmission, dropping if timely in nature, etc.).

In such embodiments, data from sensors on the mobile device (e.g., microphone, camera) can be packaged in keyvector form, with associated instructions. The instruction(s) associated with the data may not be express; they can be implicit (such as Bayer conversion) or session specific, based on context or user desires (in a photo-taking mode, face detection may be presumed).

In a particular arrangement, keyvectors from each sensor are created and packaged by device driver software processes that abstract the hardware-specific details of the sensor and provide a fully formed keyvector adhering to a selected protocol.

The device driver software can then place the formed keyvector on an output queue unique to that sensor, or in a common message queue shared by all the sensors. Regardless of approach, local processes can consume the keyvectors and perform the needed operations before placing the resultant keyvectors back on the queue. Those keyvectors that are to be processed by remote services are then placed in packets and transmitted directly to a remote process for further processing, or to a remote service that distributes the keyvectors, similar to a router. It should be clear to the reader that commands to initialize or set up any of the sensors or processes in the system can be distributed in a similar fashion from a control process (e.g., box 36 in FIG. 16).
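The queue-based flow just described can be illustrated with a single-threaded sketch. Everything named here is an assumption made for illustration: the `KeyVector` fields, the `LOCAL_OPS` table (whose byte-reversing "Bayer conversion" is only a placeholder transform), and the `send_remote` callback standing in for packetizing and transmission.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class KeyVector:
    """Hypothetical keyvector: sensor data plus the instructions still pending."""
    sensor: str
    data: bytes
    instructions: list = field(default_factory=list)


# Operations this device can run locally; the transform is a stand-in only.
LOCAL_OPS = {"bayer_conversion": lambda d: d[::-1]}


def run(q, send_remote):
    """Drain the shared queue: run local steps, hand remote steps to the network."""
    finished = []
    while not q.empty():
        kv = q.get()
        if not kv.instructions:
            finished.append(kv)  # nothing left to do
        elif kv.instructions[0] in LOCAL_OPS:
            op, rest = kv.instructions[0], kv.instructions[1:]
            # Place the resultant keyvector back on the queue.
            q.put(KeyVector(kv.sensor, LOCAL_OPS[op](kv.data), rest))
        else:
            send_remote(kv)  # would be placed in packets and transmitted
    return finished


shared = queue.Queue()
shared.put(KeyVector("camera", b"abc", ["bayer_conversion", "face_recognition"]))
shared.put(KeyVector("microphone", b"xyz"))
sent = []
done = run(shared, sent.append)
```

In this run the camera keyvector is converted locally, re-queued, and then handed off for remote facial recognition, while the microphone keyvector, which carries no instructions, simply completes. A per-sensor output queue would work the same way with one `Queue` per driver.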

Branch Prediction; Commercial Incentives

The technology of branch prediction arose to meet the needs of increasingly complex processor hardware; it allows processors with lengthy pipelines to fetch data and instructions (and, in some cases, to execute the instructions) without waiting for conditional branches to be resolved.

A similar science can be applied in the present context: predicting what action a human user will take. For example, as discussed above, the just-detailed system may "pre-warm" certain processors or communication channels in anticipation that certain data or processing operations will be forthcoming.

When a user removes an iPhone from her purse (exposing the sensor to increased light) and lifts it to eye level (as sensed by the accelerometers), what is she about to do? Reference can be made to past behavior to make a prediction. Particularly relevant may be what the user did with the phone camera the last time it was used; what the user did with the phone camera at about the same time yesterday (and at that time a few weeks ago); what the user last did at about the same location; etc. Corresponding actions can be taken in anticipation.

If the user's latitude/longitude corresponds to a location within a video rental store, that helps. Image recognition on the artwork of a DVD box can be anticipated. To speed possible recognition, reference data (perhaps SIFT or other feature recognition data) should be downloaded for candidate DVDs and stored in the cell phone cache. Recent releases are good prospects (except those rated G, or rated high for violence; stored profile data indicates the user has no history of watching such movies). So are movies that she has watched in the past (as indicated by historical rental records, which are also available to the phone).

If the user's position corresponds to a downtown street, and magnetometer and other position data indicate she is looking north, inclined up from the horizontal, what is likely to be of interest? Even without image data, a quick reference to online resources such as Google Streetview can suggest she is looking at business signage along 5th Avenue. Perhaps feature recognition reference data for this geography should be downloaded into the cache, for rapid matching against the image data to be acquired.

To speed performance, the cache should be loaded in a rational fashion, so that the most likely object is considered first. Google Streetview for that location includes metadata indicating that 5th Avenue has signs for a Starbucks, a Nordstrom store, and a Thai restaurant. Stored profile data for the user reveals that she visits Starbucks daily (she has its branded loyalty card); that she is a frequent clothes shopper (albeit with a Macy's, rather than a Nordstrom, charge card); and that she has never eaten at a Thai restaurant. Perhaps the cache should be loaded so as to most quickly identify the Starbucks sign, followed by Nordstrom, followed by the Thai restaurant.
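The rational cache loading described above amounts to sorting candidate reference data by how likely each object is to interest this user. The sketch below assumes a per-user affinity score between 0 and 1; the scores, the function name `cache_load_order`, and the idea of reducing the profile to a single number are illustrative assumptions only.

```python
def cache_load_order(candidates, affinity):
    """Sort candidate objects so the likeliest match is loaded (and tried) first.

    `affinity` maps an object name to an assumed likelihood score derived from
    profile data; objects with no score are treated as least likely.
    """
    return sorted(candidates, key=lambda name: affinity.get(name, 0.0), reverse=True)


# Signs reported for the location, in arbitrary order.
signs = ["Nordstrom", "Thai restaurant", "Starbucks"]
# Invented scores: daily Starbucks visits, frequent clothes shopping, no Thai dining.
profile_affinity = {"Starbucks": 0.9, "Nordstrom": 0.4, "Thai restaurant": 0.0}
load_order = cache_load_order(signs, profile_affinity)
```

With these assumed scores the order is Starbucks, then Nordstrom, then the Thai restaurant, matching the order suggested in the text.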

Low resolution imagery captured for presentation on the viewfinder fails to trigger the camera feature that highlights probable faces (e.g., for exposure optimization purposes). That helps. There is no need to pre-warm the complex processing associated with facial recognition.

She touches the virtual shutter button, capturing a frame of high resolution imagery, and image analysis gets underway, trying to recognize what is in the field of view, so that the camera application can overlay graphical links related to objects in the captured frame. (Or this may happen without user action; the camera may be watching proactively.)

In one particular arrangement, visual "baubles" (FIG. 0) are overlaid on the captured imagery. Tapping on any of the baubles pulls up a screen of information, such as a ranked list of links. Unlike Google web search, which ranks search results in an order based on aggregate user data, the camera application attempts a ranking customized to the user's profile. If a Starbucks sign or logo is found in the frame, the Starbucks link gets the top position for this user.

If signs for Starbucks, Nordstrom, and the Thai restaurant are all found, links would normally be presented in that order (per the user's preferences inferred from profile data). However, the cell phone application may have a capitalistic bent and be willing to promote a link by a position or two (although perhaps not to the top position) if circumstances warrant. In the present case, the cell phone routinely sends IP packets to the web servers at the addresses associated with each of the links, alerting them that an iPhone user has recognized their corporate signage from a particular latitude/longitude. (Other user data may also be provided, if privacy considerations and user permissions allow.) The Thai restaurant server responds in an instant, offering the next two customers 25% off any one item (the restaurant's point of sale system indicates only four tables are occupied and no order is pending; the cook is idle). The restaurant server offers three cents if the phone will present the discount offer to the user in its presentation of search results, or five cents if it will also promote the link to second place in the ranked list, or ten cents if it will do that and make it the only discount offer presented in the results list. (Starbucks also responded with an incentive, but not as attractive a one.) The cell phone quickly accepts the restaurant's offer, and payment is quickly made, either to the user (e.g., defraying the monthly phone bill) or, more likely, to the phone carrier (e.g., AT&T). Links are presented to Starbucks, the Thai restaurant, and Nordstrom, in that order, with the restaurant's link noting the discount for the next two customers.
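The tiered offer and the bounded promotion described above can be sketched as follows. The tier names, the `promote` and `apply_offer` helpers, and the policy of never displacing the top link are assumptions chosen to mirror the example; the text does not say which tier the phone accepts, so the usage below simply picks the five-cent tier for illustration.

```python
# Payment in cents for each tier of the restaurant's offer, as given in the text.
TIERS_CENTS = {"show_discount": 3, "promote_to_second": 5, "exclusive_discount": 10}


def promote(ranked, link, target_index, protect_top=True):
    """Move `link` up to `target_index`, never above slot 1 if the top is protected."""
    if protect_top:
        target_index = max(target_index, 1)
    if target_index >= ranked.index(link):
        return list(ranked)  # already at or above the target position
    result = [x for x in ranked if x != link]
    result.insert(target_index, link)
    return result


def apply_offer(ranked, link, tier):
    """Return (presented order, payment in cents) for an accepted tier."""
    payment = TIERS_CENTS[tier]
    if tier == "show_discount":
        return list(ranked), payment  # order unchanged, discount text only
    return promote(ranked, link, target_index=1), payment


baseline = ["Starbucks", "Nordstrom", "Thai restaurant"]
presented, cents = apply_offer(baseline, "Thai restaurant", "promote_to_second")
```

The presented order becomes Starbucks, the Thai restaurant, then Nordstrom, with the user's top preference left in place.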

Google's AdWord technology has already been noted. It decides, based on factors including an auction-determined payment, which ads to present as sponsored links adjacent to the results of a Google web search. Google has adapted this technology to present ads on third party web sites and blogs, based on the particular contents of those sites, in a service termed AdSense.

In accordance with another aspect of the present technology, the AdWord/AdSense technology is extended to visual image search on cell phones.

Consider a user located in a small bookstore who snaps a picture of the Warren Buffett biography Snowball. The book is quickly recognized, but rather than presenting a corresponding Amazon link atop the list (as may occur with a regular Google search), the cell phone recognizes that the user is located in an independent bookstore. Context-based rules consequently dictate that it present a non-commercial link first. Top ranked of this type is a Wall Street Journal review of the book, which goes to the top of the presented list of links. Decorum, however, only goes so far. The cell phone passes the book title or ISBN (or the image itself) to Google AdSense or AdWords, which identifies sponsored links to be associated with that object. (Google may independently perform its own image analysis on any provided imagery. In some cases it may pay for such cell phone-submitted imagery, since Google has a knack for exploiting data from diverse sources.) Per Google, Barnes and Noble has the top sponsored position, followed by alldiscountbooks-dot-net. The cell phone application may present these sponsored links in a graphically distinct manner to indicate their origin (e.g., in a different part of the display, or in a different color), or it may insert them alternately with the non-commercial search results, e.g., at positions two and four. The AdSense revenue collected by Google can again be shared with the user, or with the user's carrier.

In some embodiments, the cell phone (or Google) also pings the servers of the companies for which links will be presented, helping them track their physical-world-based online visibility. The pings can include the location of the user and an identification of the object that prompted the ping. When alldiscountbooks-dot-net receives the ping, it may check its inventory and find that it has a significant overstock of Snowball. As in the example given earlier, it may offer an extra payment for some extra promotion (e.g., including "We have 732 copies, cheap!" in the presented link).

In addition to offering an incentive for a more prominent search listing (e.g., higher in the list, or augmented with additional information), a company may also offer additional bandwidth to serve information to a customer. For example, a user may capture video imagery from an electronic billboard and want to download a copy to show to friends. The user's cell phone identifies the content as a popular clip of user-provided content (e.g., by reference to an encoded watermark), and finds that the clip is available from several sites, the most popular being YouTube, followed by MySpace. To induce the user to link to MySpace, MySpace may offer to upgrade the user's baseline wireless service from 3 megabits per second to 10 megabits per second, so that the video downloads in roughly one-third of the time. This upgraded service can apply only to the video download, or it can last longer. The link presented on the screen of the user's cell phone can be amended to highlight the availability of the faster service. (Again, MySpace may make an associated payment.)

Sometimes alleviating a network bottleneck requires opening a bandwidth throttle on the cell phone end of the wireless link. Or the bandwidth service change must be requested, or authorized, by the cell phone. In such a case, MySpace can tell the cell phone application to take the steps needed for higher bandwidth service, and MySpace will rebate the extra associated costs to the user (or to the carrier, for the benefit of the user's account).

In some arrangements, the quality of service (e.g., bandwidth) is managed by pipe manager 51. Instructions from MySpace may request that the pipe manager start requesting augmented service quality, and start setting up the expected high bandwidth session, even before the user selects the MySpace link.

In some scenarios, vendors may negotiate preferential bandwidth for their content. MySpace may, for example, make a deal with AT&T that all MySpace content delivered to AT&T phone subscribers be delivered at 10 megabits per second, even though most subscribers normally receive only 3 megabits per second service. The higher quality service may be highlighted to the user in the presented links.

From the foregoing it will be recognized that, in certain embodiments, the information presented by a mobile phone in response to visual stimuli is a function of both (1) the user's preferences, and (2) third party competition, perhaps based on the user's demographic profile. Users who are demographically identical, but who have different tastes in food, will likely be presented with different baubles, or associated information, when viewing a street crowded with restaurants. Users with identical tastes in food and other preference information, but differing in a demographic factor (e.g., age, gender), may likewise be presented with different baubles/information, because vendors, etc., are willing to pay differently for different eyeballs.

Modeling of User Behavior

With the aid of knowledge of a particular physical environment, a specific place and time, and the behavior profiles of expected users, simulation models of human-computer interaction with the physical world can be based on tools and techniques from fields as diverse as robotics and audience measurement. An example might be the number of mobile devices expected in a museum at a particular time; the particular sensors that such devices are likely to be using; and what stimuli are expected to be captured by those sensors (e.g., where the users are pointing the camera, what the microphone is hearing, etc.). Additional information can include assumptions about social relationships between users: Are they likely to share common interests? Are they within common social circles that are likely to share content, share experiences, or want to create location-based experiences such as wiki-maps (cf. Barricelli, "Map-Based Wikis as Contextual and Cultural Mediators," MobileHCI, 2009)?

In addition, modeling can range from generalized heuristics derived from observations at past events (e.g., how many people used their cell phone cameras to capture imagery from the Portland Trail Blazers' scoreboard during a basketball game, etc.), to more evolved predictive models that are based on innate human behavior (e.g., people are more likely to capture imagery from a scoreboard during overtime than during a game's half-time).

Such models can inform many aspects of the experience for users, as well as for the business entities involved in provisioning and measuring the experience.

These latter entities may consist of the traditional value chain participants involved in event production, and the arrangements involved in measuring interaction and monetizing it: event planners, producers, and technicians on the creation side, and the associated rights societies (ASCAP, Directors Guild of America, etc.) on the royalty side. From a measurement perspective, both sampling-based techniques from opt-in users and devices, and census-driven techniques, can be utilized. Metrics for more static environments may range from Revenue Per Unit (RPU) created by digital traffic on the digital service provider network (how much bandwidth is being consumed), to more evolved models of Click Through Rates (CTR) for particular sensor stimuli.

For example, the Mona Lisa painting in the Louvre is likely to have a much higher CTR than other paintings in the museum. This informs matters such as priority for content provisioning: content related to the Mona Lisa should be cached as close to the edge of the cloud as possible, if not pre-loaded onto the mobile device itself when the user approaches or enters the museum. (Of equal importance, of course, is the role that CTR plays in monetizing the experience and environment.)
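By way of illustration only, the CTR-driven provisioning priority described above might be sketched as follows. The `Exhibit` record, the byte budgets, and the three tier names are hypothetical and are not taken from the specification; the sketch simply places the highest-CTR content in the tier closest to the user that still has room.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exhibit:
    name: str
    expected_ctr: float   # hypothetical estimate: clicks per sensed impression
    content_bytes: int    # size of the associated content bundle


def plan_provisioning(exhibits, device_budget, edge_budget):
    """Greedy tiering: visit exhibits from highest to lowest expected CTR and
    place each in the closest tier (device, then edge cache) that has room.
    Anything that fits in neither tier is left on the origin servers."""
    plan = {"device": [], "edge": [], "origin": []}
    remaining = {"device": device_budget, "edge": edge_budget}
    for ex in sorted(exhibits, key=lambda e: e.expected_ctr, reverse=True):
        for tier in ("device", "edge"):
            if ex.content_bytes <= remaining[tier]:
                remaining[tier] -= ex.content_bytes
                plan[tier].append(ex.name)
                break
        else:
            plan["origin"].append(ex.name)
    return plan
```

With a small device budget, only the Mona Lisa content would be pre-loaded onto the phone, while lower-CTR works would be held at the edge cache or fetched on demand.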

Consider a school group that enters a sculpture museum having a garden with a collection of Rodin works. The museum may provide content related to Rodin and his works on servers or infrastructure (e.g., router caches) that serve the garden. Moreover, because the visitors comprise a pre-established social group, the museum may expect some social connectivity. So the museum may enable sharing capabilities (e.g., ad hoc networking) that might not otherwise be used. If one student queries the museum's online content to learn more about a particular Rodin sculpture, the system may accompany delivery of the detailed information with a prompt inviting the student to share this information with others in the group. The museum server can suggest particular "friends" of the student with whom such information might be shared, if such information is publicly accessible from Facebook or another social networking data source. In addition to names of friends, such a social networking data source can also provide device identifiers, IP addresses, profile information, etc., for the student's friends, which may be leveraged to assist the dissemination of educational material to others in the group. These other students may find this particular information relevant, since it was of interest to another student in their group, even if the original student's name is not identified. If the original student is identified with the conveyed information, this may heighten the information's interest to others in the group.

(Detection of a socially-linked group may be inferred from review of the museum's network traffic. For example, if a device sends packets of data to another device, and the museum's network handles both ends of the communication (dispatch and delivery), then there is an association between two devices in the museum. If the devices are not ones that have historical patterns of network usage, e.g., those of employees, then the system can conclude that two visitors to the museum are socially connected. If a web of such communications is detected, involving several unfamiliar devices, then a social group of visitors can be discerned. The size of the group can be gauged by the number of different participants in such network traffic. Demographic information about the group can be inferred from the external addresses with which data is exchanged: middle schoolers may have a high incidence of MySpace traffic; college students may communicate with external addresses at a university domain; senior citizens may demonstrate a different traffic profile. All such information can be employed in automatically adapting the information and services provided to the visitors, as well as in providing useful information to the museum's administration and marketing departments.)
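The group inference described above can be sketched, under stated assumptions, as a connected-components search over device-to-device flows. The function name, the `(source, destination)` flow tuples, and the `known_devices` set (standing in for devices with a historical usage pattern, such as employee devices) are hypothetical illustrations rather than elements of the specification.

```python
from collections import defaultdict


def find_visitor_groups(flows, known_devices):
    """Group unfamiliar devices that exchange traffic inside the venue.

    flows: iterable of (src, dst) device ids for which the venue network
           handled both ends of the communication.
    known_devices: ids with historical usage patterns (e.g., employees);
                   any flow touching one of these is ignored.
    Returns a list of sets of device ids, largest group first.
    """
    adjacency = defaultdict(set)
    for src, dst in flows:
        if src == dst or src in known_devices or dst in known_devices:
            continue
        adjacency[src].add(dst)
        adjacency[dst].add(src)

    groups, seen = [], set()
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first walk collects every device reachable from `start`.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        groups.append(component)
    return sorted(groups, key=len, reverse=True)
```

The size of each returned set corresponds to the gauged size of a visitor group; demographic inference from external addresses would be a separate step and is not shown.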

Consider other situations. One is halftime at the U.S. football Super Bowl, featuring a headline performer (e.g., Bruce Springsteen, or Prince). The show may cause hundreds of fans to capture pictures or audio-video of the event. Another context with predictable public behavior is the end of an NBA championship basketball game. Fans may want to commemorate the final-buzzer excitement: the scoreboard, streamers and confetti falling from the ceiling, etc. In such cases, actions that can be taken to prepare, or to optimize delivery of, content or experience should be taken. Examples include rights clearance for associated content; rendering virtual worlds and other synthesized content; throttling down routine, time-insensitive network traffic; queuing commercial resources that may be invoked as people purchase souvenir books/music from Amazon (caching pages, authenticating users to financial sites); propagating links to post-game interviews (some pre-built/edited and ready to go); caching the Twitter feeds of the star players; buffering video from the city center showing hometown crowds watching on a Jumbotron display and erupting with joy at the buzzer; etc. In short, anything relating to the experience or to follow-on actions should be prepared/cached in advance, where possible.

Stimuli for sensors (audio, visual, tactile, odor, etc.) that are most likely to instigate user action and attention are much more valuable from an advertising standpoint than stimuli less likely to instigate such action (similar to the economic principles on which Google's AdWords ad-serving system is based). Such factors and metrics directly influence advertising models, through auction models well understood by those skilled in the art.

Multiple delivery mechanisms exist for advertising delivery by third parties, leveraging known protocols such as VAST. VAST (Digital Video Ad Serving Template) is a standard issued by the Interactive Advertising Bureau that establishes a reference communication protocol between scriptable video rendering systems and ad servers, as well as an associated XML schema. For example, VAST helps standardize the serving of video ads to independent web sites (replacing old-style banner ads), commonly based on a bit of JavaScript included in the web page code; this code also aids in tracking traffic and managing cookies. VAST can also insert promotional messages in the pre-roll and post-roll viewing of other video content delivered by the web site. The web site owner does not concern itself with selling or running the advertisements, yet at the end of the month the web site owner receives payment based on audience viewership/impressions. In similar fashion, physical stimuli presented to users in the real world, and sensed by mobile technology, can be the basis for payments to the parties involved.

Dynamic environments in which the stimuli presented to users and their mobile devices can be controlled (such as video displays, as contrasted with static posters) provide new opportunities for the measurement and utilization of metrics such as CTR.

Background music, content on digital displays, illumination, etc., can be modified to maximize CTR and shape traffic. For example, illumination on particular signage can be increased, or can flash, as a targeted individual passes by. Similarly, when a flight from Japan lands at the airport, digital signage, music, etc., can all be modified, either overtly (changing the advertising to suit the interests of the expected audience) or covertly (changing the linked experience to take the user to the Japanese-language website), to maximize the CTR.

Mechanisms may likewise be introduced to contend with rogue or unapproved sensor stimuli. Within the confined spaces of a business park site, stimuli (posters, music, digital signage, etc.) that do not adhere to the intentions or policies of the property owner, or of the entity responsible for a domain, may need to be managed. This can be accomplished through the use of simple, geography-specific blocking mechanisms (not dissimilar to region coding on DVDs), indicating that all attempts within specific GPS coordinates to route a keyvector to a specific place in the cloud must be mediated by a routing service or gateway managed by the domain owner.
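A minimal sketch of such geography-specific mediation follows, assuming a rectangular latitude/longitude fence that does not cross the antimeridian. The `GeoFence` class, the `next_hop` function, and the host names in the usage note are hypothetical; the specification does not prescribe a particular fence shape or routing interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoFence:
    south: float   # minimum latitude, degrees
    west: float    # minimum longitude, degrees
    north: float   # maximum latitude, degrees
    east: float    # maximum longitude, degrees
    gateway: str   # routing service managed by the domain owner

    def contains(self, lat, lon):
        return self.south <= lat <= self.north and self.west <= lon <= self.east


def next_hop(lat, lon, destination, fences):
    """Return where a keyvector captured at (lat, lon) should be sent first.

    Inside any fence, the request goes to that fence's gateway, which may
    then forward, filter, or refuse it. Outside all fences, the request
    goes directly to the requested destination."""
    for fence in fences:
        if fence.contains(lat, lon):
            return fence.gateway
    return destination
```

For example, with a fence around a business park whose gateway is `gateway.park.example`, a request captured inside the park for `ocr.cloud.example` would first be delivered to the park's gateway, while the same request made a few kilometers away would be routed directly.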

Other options include filtering the resultant experience. Is it age appropriate? Does it run counter to pre-existing advertising or branding arrangements, such as a Coca-Cola advertisement being delivered to a user inside the Pepsi Center during a Denver Nuggets game?

This may be accomplished on the device as well, through the use of content rules, such as the MovieLabs Content Recognition Rules related to conflicting media content (cf. www.movielabs-dot-com/CRR), parental controls provided by carriers to the device, or adherence to DMCA Automatic Take Down Notices.

Under various rights management paradigms, licenses play a key role in determining how content can be consumed, shared, modified, etc. A result of extracting meaning from the stimuli presented to the user (and the user's mobile device), and/or from the location in which the stimuli are presented, can be the issuance by third parties of a license to desired content or experiences (games, etc.). To illustrate, consider a user at a rock concert in an arena. The user may be granted a temporary license to preview and listen to all music tracks by the performing artist (and/or others) on iTunes. However, such a license may persist only during the concert, or only from when the doors open until the headline act begins its performance, or only while the user is in the arena, etc. Thereafter, the license ends.

Similarly, passengers disembarking from an international flight may be granted location-based or time-limited licenses to translation services or navigation services (e.g., an augmented reality system that overlays directions to baggage claim, restrooms, etc., on camera-captured scenes) for their mobile devices, while they are passing through customs, while they are in the airport, for 90 minutes after their arrival, etc.
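The time-limited and location-based licenses discussed above might be modeled, purely as a sketch, by a validity window combined with an optional on-site requirement. The `TemporaryLicense` class and its fields are hypothetical; how a device establishes that it is "on site" (GPS, venue WiFi, etc.) is outside the sketch and is passed in as a boolean. Combining both conditions in one license is an assumption, since the text presents them as alternatives.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TemporaryLicense:
    valid_from: datetime
    valid_until: datetime
    requires_on_site: bool = False   # True for a location-bound license

    def permits(self, now, on_site):
        """True if the license covers use at time `now`, given whether the
        device is currently inside the licensed location."""
        in_window = self.valid_from <= now <= self.valid_until
        return in_window and (on_site or not self.requires_on_site)
```

An airport navigation license, for instance, could be issued with `valid_until` set to `arrival + timedelta(minutes=90)` and `requires_on_site=True`, so that it lapses either when the window closes or when the passenger leaves the airport.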

Such arrangements can serve as metaphors for experiences, and as filtering mechanisms. One embodiment in which the sharing of experiences is triggered by sensor stimuli is through broadcast social networks (e.g., Twitter) and syndication protocols (e.g., RSS web feeds/channels). Other users, entities or devices can subscribe to such broadcasts/feeds as the basis for subsequent communication (social, information retrieval, etc.), for logging of activities (e.g., a person's daily journal), or for measurement (audience, etc.). Traffic associated with such networks/feeds can also be measured by devices at a particular location, allowing users to traverse in time to learn who was communicating at a particular point in time. This enables searching for and mining additional information, e.g.: Was my friend here last weekend? Was someone from my peer group here? What content was consumed? Such traffic also enables real-time monitoring of how users share experiences. Monitoring "tweets" about a performer's song selection during a concert may cause the performer to alter the songs to be played for the remainder of the concert. The same is true for brand management. For example, if users share their opinions about a car during a car show, live keyword filtering on the traffic can allow the brand owner to re-position certain products for maximum effect (e.g., the new model of the Corvette should spend more time on the spinning platform, etc.).

More on Optimization

Predicting the user's action or intent is one form of optimization. Another form involves configuring the processing so as to improve performance.

To illustrate one particular arrangement, consider again the common services classifier of FIG. 6. Which keyvector operations should be performed locally, or remotely, or as a hybrid of some sort? In what order should keyvector operations be performed? Etc. The mix of expected operations, and their scheduling, should be arranged in a fashion appropriate to the processing architecture being used, the circumstances, and the context.

One step in the process is to determine which operations need to be performed. This determination can be based on express requests from the user, historical patterns of usage, context and status, etc.

Many operations are high level functions, which involve a number of component operations performed in a particular order. For example, optical character recognition may require edge detection, followed by region-of-interest segmentation, followed by template pattern matching. Facial recognition may involve skin tone detection, Hough transforms (to identify oval-shaped areas), identification of feature locations (pupils, corners of the mouth, nose), eigenface calculation, and template matching.

The system can identify the component operations that may need to be performed, and the order in which their respective results are required. Rules and heuristics can be applied to help determine whether these operations should be performed locally or remotely.

For example, at one extreme, the rules may specify that simple operations, such as color histograms and thresholding, should generally be performed locally. At the other extreme, complex operations may usually default to outside providers.

Scheduling can be determined based on which operations are preconditions to other operations. This can also influence whether an operation is performed locally or remotely (local performance may provide quicker results, allowing subsequent operations to be started with less delay). The rules may seek to identify the operation whose output(s) is used by the greatest number of subsequent operations, and perform this operation first (its respective precedent(s) permitting). Operations that are preconditions to successively fewer other operations are performed successively later. The operations, and their sequence, may be conceived as a tree structure, with the most globally important operations performed first, and operations of lesser relevance to other operations performed later.
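One way to sketch the scheduling rule just described is a topological ordering in which, among the operations whose prerequisites are already satisfied, the operation with the most direct consumers runs first. The `schedule` function and the dependency graph in the usage note are illustrative assumptions; counting direct consumers only (rather than all downstream operations) and breaking ties alphabetically are simplifications chosen for the sketch.

```python
import heapq


def schedule(prereqs):
    """Order operations so that prerequisites run first and, among ready
    operations, the one whose output feeds the most other operations runs
    earliest. `prereqs` maps an operation name to the names it depends on."""
    deps = {op: set(d) for op, d in prereqs.items()}
    for d in list(deps.values()):
        for p in d:
            deps.setdefault(p, set())   # operations named only as prerequisites

    consumers = {op: set() for op in deps}
    for op, d in deps.items():
        for p in d:
            consumers[p].add(op)

    unmet = {op: len(d) for op, d in deps.items()}
    # Min-heap keyed on negated consumer count; name breaks ties deterministically.
    ready = [(-len(consumers[op]), op) for op in deps if unmet[op] == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        _, op = heapq.heappop(ready)
        order.append(op)
        for nxt in consumers[op]:
            unmet[nxt] -= 1
            if unmet[nxt] == 0:
                heapq.heappush(ready, (-len(consumers[nxt]), nxt))

    if len(order) != len(deps):
        raise ValueError("cyclic prerequisites")
    return order
```

Given a hypothetical graph in which both region-of-interest segmentation and a Hough transform consume the output of edge detection, edge detection would be scheduled ahead of a color histogram that feeds only skin tone detection.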

Such determinations, however, may also be tempered (or dominated) by other factors. One is power. If the cell phone battery is low, or if an operation will involve a significant drain on a low capacity battery, this can tip the balance in favor of having the operation performed remotely.

Another factor is response time. In some instances, the limited processing capability of the cell phone may mean that processing locally is slower than processing remotely (e.g., where a more robust, parallel architecture might be available to perform the operation). In other instances, the delays of establishing communication with a remote server, and establishing a session, may make local performance of an operation quicker. Depending on user demands, and the needs of other operation(s), the speed with which results are returned may or may not be important.
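The response-time trade-off above can be illustrated with a simple estimate, offered only as a sketch. All parameter names (work units, processing rates, payload size, uplink rate, session setup delay) are hypothetical modeling choices, and the model ignores the time needed to return the result.

```python
def estimate_local_s(work_units, local_rate):
    """Seconds to run the operation on the phone (work units / units per second)."""
    return work_units / local_rate


def estimate_remote_s(work_units, remote_rate, payload_bits, uplink_bps, setup_s):
    """Seconds to run the operation remotely: session setup, plus upload of
    the input data, plus remote compute time."""
    return setup_s + payload_bits / uplink_bps + work_units / remote_rate


def faster_venue(work_units, local_rate, remote_rate,
                 payload_bits, uplink_bps, setup_s):
    """Return 'local' or 'remote', whichever has the lower estimate
    (ties go to local, which avoids using the network)."""
    local = estimate_local_s(work_units, local_rate)
    remote = estimate_remote_s(work_units, remote_rate,
                               payload_bits, uplink_bps, setup_s)
    return "local" if local <= remote else "remote"
```

Under these assumptions, a light operation is dominated by the fixed setup and upload delays and stays on the phone, while a heavy operation amortizes those delays and is quicker on a faster remote processor.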

Still another factor is user preferences. As noted elsewhere, the user may set parameters influencing where, and when, operations are performed. For example, a user may specify that an operation may be referred for remote processing by a domestic service provider, but that if none is available, the operation should be performed locally.

Routing constraints are another factor. Sometimes the cell phone will be in a WiFi or other service area (e.g., in a concert arena) in which the local network provider places limits or conditions on the remote service requests that may be accessed through that network. At a concert where photography is forbidden, for example, the local network may be configured to block access to external image processing service providers for the duration of the concert. In this case, services normally routed for external execution should be performed locally.

Yet another factor is the particular hardware with which the cell phone is equipped. If a dedicated FFT processor is available in the phone, intensive FFT operations are sensibly performed locally. If only a feeble general-purpose CPU is available, an intensive FFT operation is most likely referred out for external execution.

A related factor is current hardware utilization. Even if the cell phone is equipped with hardware well suited to a particular task, that hardware may be so busy and backlogged that the system refers the next task of this sort to an external resource for completion.

Another factor may be the length of the local processing chain and the risk of a stall. Pipelined processing architectures can stall for intervals while waiting for data needed to complete an operation. Such a stall can cause all subsequent operations to be similarly delayed. The risk of a stall can be assessed (e.g., from historical patterns, or from knowledge that completing an operation requires other data, such as a result from another external process, whose timely availability is not assured). If the risk is great enough, the operation may be referred for external processing to avoid stalling the local processing chain.

Yet another factor is connectivity status. Is a reliable, high-speed network connection established? Or are packets being dropped, or is the network slow (or wholly unavailable)?

Geographical considerations of different sorts can also be factors. One is network proximity to the service provider. Another is whether the cell phone has unlimited access to the network (as in a home region) or is on a pay-per-use arrangement (as when roaming in another country).

Information about the remote service provider(s) can also be a factor. Does the service provider offer immediate turnaround, or are requested operations placed in a long queue behind other users awaiting service? Once the provider is ready to process the job, what speed of execution is expected? Costs can also be key factors, together with other attributes of importance to the user (e.g., whether the service provider meets "green" standards of environmental responsibility). A great many other factors can also be considered, as may be appropriate in particular contexts. Sources for such data can include the various elements shown in the illustrative block diagrams, as well as external resources.

A conceptual illustration of the foregoing is provided in FIG. 19B.

Based on the various factors, a determination is made as to whether an operation should be performed locally or remotely. (The same factors can be evaluated to determine the order in which operations should be performed.)

In some embodiments, the different factors can be quantified by scores, which can be combined in polynomial fashion to yield an overall score indicating how an operation should be handled. Such an overall score serves as a metric indicating the relative suitability of the operation for remote or external processing. (A similar scoring approach can be employed to choose between different service providers.)
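The scoring approach can be sketched as follows. This is a minimal illustration, not the specification's implementation: the factor names, the 0-to-1 score range, the coefficients, and the threshold are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Factor:
    name: str
    score: float             # assumed range: 0.0 (favors local) to 1.0 (favors remote)
    weight: float            # hypothetical coefficient of the linear term
    weight_sq: float = 0.0   # hypothetical coefficient of the squared term


def remote_suitability(factors):
    """Combine the factor scores polynomially into one overall score."""
    return sum(f.weight * f.score + f.weight_sq * f.score ** 2 for f in factors)


def choose_execution(factors, threshold=0.5):
    """Return 'remote' when the overall score reaches the threshold, else 'local'."""
    return "remote" if remote_suitability(factors) >= threshold else "local"


# Illustrative values only.
factors = [
    Factor("response_time", score=0.8, weight=0.3),
    Factor("connection_state", score=0.9, weight=0.3),
    Factor("local_hardware_busy", score=0.5, weight=0.2, weight_sq=0.2),
]
decision = choose_execution(factors)
```

The same `remote_suitability` function could be evaluated once per candidate service provider, with the highest-scoring provider selected.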

Depending on changing circumstances, a given operation may be performed locally at one instant and remotely at a later instant (or vice versa). Or the same operation may be performed on two sets of keyvector data at the same time, one locally and one remotely.

Although described in the context of determining whether an operation should be performed locally or remotely, the same factors can influence other matters as well. For example, they can also be used in deciding what information is conveyed by keyvectors.

Consider a circumstance in which a cell phone is to perform OCR on captured imagery. With one set of factors, unprocessed pixel data from the captured image may be sent to a remote service provider to make the OCR determination. Under a different set of factors, the cell phone may perform initial processing such as edge detection, package the edge-detected data in keyvector form, and route it to an external provider to complete the OCR operation. Under still another set of factors, the cell phone may perform all of the component OCR operations up to the last (template matching) and send out data only for that last operation. (Under yet another set of factors, the OCR operation may be completed wholly by the cell phone, or different component operations may be performed alternately by the cell phone and remote service provider(s), and so on.)

Reference was made to routing constraints as one possible factor. This is a particular example of a more general factor: external business rules. Consider the earlier example of a user attending an event at the Pepsi Center in Denver. The Pepsi Center may provide wireless communication services to patrons through its own WiFi or other network. Naturally, the Pepsi Center is reluctant to have its network resources used for the benefit of competitors such as Coca-Cola. The host network may thus influence the cloud services that its patrons can use (e.g., by making some inaccessible, or by giving lower priority to data traffic of certain types or with certain destinations). The domain owner may exert control over what operations a mobile device is capable of performing. This control can influence the local/remote decision, as well as the type of data conveyed in keyvector packets.

Another example is a gym, which may want to discourage use of cell phone cameras, e.g., by impeding access to remote service providers for imagery, as well as to photo sharing sites such as Flickr and Picasa. Still another example is a school which, for privacy reasons, may want to discourage facial recognition of its students and staff. In such a case, access to facial recognition service providers can be blocked, or granted only on a moderated, case-by-case basis. Venues may find it difficult to stop individuals from using cell phone cameras, or from using them for particular purposes, but they can take various actions to impede such use (e.g., by denying services that would promote or facilitate it).

The following outline identifies other factors that may be relevant in determining which operations are performed where, and in what sequence:

1. Scheduling optimization of keyvector processing units based on numerous factors:

  o Operation mix: which operations consist of similar atomic instructions (MicroOps, Pentium II, etc.)

  o Stall states: which operations will generate stalls due to:
    ㆍ Waiting for external keyvector processing
    ㆍ Poor connectivity
    ㆍ User input
    ㆍ Change in user focus

  o Cost of operation, based on:
    ㆍ Published cost
    ㆍ Expected cost based on the state of an auction
    ㆍ State of battery and power mode
    ㆍ Power profile of the operation (is it expensive?)
    ㆍ Past history of power consumption
    ㆍ Opportunity cost, given the current state of the device, e.g., which other processes should take priority, such as a voice call, GPS navigation, etc.
    ㆍ User preferences, e.g., "I want a 'green' provider or an open source provider"
    ㆍ Legal uncertainties (e.g., certain providers may be at greater risk of patent infringement liability, e.g., due to their use of an allegedly patented method)

  o Domain owner influence:
    ㆍ Privacy concerns in particular physical venues, such as no face recognition at schools
    ㆍ Predetermined content-based rules prohibiting particular operations against particular stimuli
      ㆍ Voiceprint matching against broadcast songs, highlighting the use of other singers (Milli Vanilli's Grammy award was revoked when officials learned that the actual vocals on the subject recording were performed by other singers)

  o All of the above influence scheduling, and the ability to execute keyvectors out of order based on the optimal path to the desired outcome
    ㆍ Uncertainty in a long chain of operations, making it difficult to predict the need for subsequent keyvector operations (akin to deep pipelines and branch prediction in processors); difficulties may be due to weak metrics on keyvectors
      ㆍ Past behavior
      ㆍ Location (GPS indicates that the device is in rapid motion) and the pattern of GPS movements
      ㆍ Whether there is a pattern of exposure to a stimulus, such as a user walking through an airport terminal being repeatedly exposed to the CNN broadcast presented at each gate
      ㆍ Proximity sensors indicating that the device is in a pocket, etc.
      ㆍ Other approaches, such as Least Recently Used (LRU), can be used to track how infrequently a given keyvector operation resulted in, or contributed to, the desired effect (recognition of a song, etc.)

For other related pipelined or otherwise time-consuming operations, particular embodiments may undertake some suitability testing before committing processing resources for what may be more than a threshold number of clock cycles. A simple suitability test is to make sure the image data is potentially useful for the intended purpose, as contrasted with data that can be quickly disqualified from analysis, for example, a frame that is all black (e.g., a frame captured in the user's pocket). Adequate focus can also be checked quickly before committing to an extended operation.
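A suitability test of the all-black kind might look like the sketch below. The frame representation (a list of rows of 8-bit luminance values) and both thresholds are assumptions made for illustration; the specification does not prescribe them.

```python
def is_mostly_black(frame, dark_level=16, max_bright_fraction=0.01):
    """Return True if almost no pixel in the frame exceeds dark_level.

    frame: list of rows, each a list of 8-bit luminance values (assumed layout).
    """
    total = 0
    bright = 0
    for row in frame:
        for value in row:
            total += 1
            if value > dark_level:
                bright += 1
    if total == 0:
        return True  # an empty frame carries nothing worth processing
    return bright / total <= max_bright_fraction


def worth_processing(frame):
    """Cheap gate to run before committing to an expensive operation."""
    return not is_mostly_black(frame)


pocket_frame = [[0] * 8 for _ in range(8)]
scene_frame = [[(x * 32) % 256 for x in range(8)] for _ in range(8)]
```

Because the check touches each pixel once and does no arithmetic beyond a comparison, it costs far less than the operations it guards.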

(It will be recognized that certain aspects of the technology discussed above have antecedents that are evident in hindsight. For example, considerable work has been put into instruction optimization for pipelined processors. Also, some devices have allowed user configuration of power settings, such as user-selectable deactivation of a power-hungry GPU in certain Apple notebooks to extend battery life.)

The above-discussed determination of an appropriate instruction mix (e.g., by the common service classifier of FIG. 6) particularly considered certain issues arising in pipelined architectures. Different principles can apply in embodiments in which one or more GPUs are available. These devices typically have hundreds or thousands of scalar processors adapted for parallel execution, so the costs of execution (time, stall risk, etc.) are small. Branch prediction can be handled by not predicting: instead, the GPU processes all of the potential outcomes of a branch in parallel, and the system uses whichever output corresponds to the actual branch condition once it becomes known.

To illustrate, consider facial recognition. When its camera is activated in a user photo-shoot mode, a GPU-equipped cell phone may invoke instructions that configure 20 clusters of scalar processors in the GPU. (Such a cluster is sometimes termed a "stream processor.") In particular, each cluster is configured to perform a Hough transform on a small tile from a captured image frame, looking for one or more elliptical shapes that may be candidate faces. The GPU thus processes the entire frame in parallel, with 20 concurrent Hough transforms. (Many of the stream processors probably will not find anything, but processing speed does not suffer.)

When these GPU Hough transform operations complete, the GPU may be reconfigured into a smaller number of stream processors, each dedicated to analyzing one candidate elliptical shape to determine the positions of the eye pupils, the nose location, and the distance across the mouth. For any ellipse that yields useful candidate facial information, the associated parameters are packaged in keyvector form and transmitted to a cloud service, which checks the keyvectors of analyzed facial parameters against known templates, e.g., of the user's Facebook friends. (Or such checking could be performed by the GPU, or by another processor in the cell phone.)

(It is interesting to note that this facial recognition, like others detailed in this specification, distills the volume of data from, e.g., millions of pixels (bytes) in the originally captured image down to a keyvector that may comprise a few tens, hundreds, or thousands of bytes. This smaller parcel of information, with its denser information content, is routed more quickly for processing, sometimes externally. Communication of the distilled keyvector information takes place over a channel with corresponding bandwidth capability, which keeps the arrangement reasonable in cost and practical to implement.)

Contrast the just-described GPU implementation of face detection with such an operation as it might be implemented on a scalar processor. Performing Hough-transform-based ellipse detection across the entire image frame is prohibitive in terms of processing time: much of the effort would be for naught, and it would delay other tasks assigned to the processor. Instead, such an implementation typically has the processor examine pixels as they come from the camera, looking for those with color within an expected "skin tone" range. Only if a region of skin tone pixels is identified is a Hough transform then attempted on that excerpt of the image data. In similar fashion, attempting to extract facial parameters from detected ellipses is done in a laborious serial fashion, often yielding no useful result.
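The skin-tone gating described for the scalar case can be sketched as below. The RGB thresholds are a commonly cited rule-of-thumb and are an assumption here, not values from the specification; the frame layout (rows of `(r, g, b)` tuples) and `min_pixels` are likewise hypothetical. The Hough transform itself is not shown: the sketch only decides whether, and on which excerpt, it would be attempted.

```python
def is_skin_tone(r, g, b):
    """Rule-of-thumb RGB skin test (assumed thresholds, for illustration only)."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)


def skin_region(frame, min_pixels=4):
    """Return the bounding box (top, left, bottom, right) of skin-tone pixels.

    Returns None when too few skin-tone pixels are found, in which case the
    costly ellipse detection would simply be skipped for this frame.
    """
    rows, cols = [], []
    for y, row in enumerate(frame):
        for x, (r, g, b) in enumerate(row):
            if is_skin_tone(r, g, b):
                rows.append(y)
                cols.append(x)
    if len(rows) < min_pixels:
        return None
    return (min(rows), min(cols), max(rows), max(cols))


BLUE, SKIN = (20, 30, 200), (200, 140, 120)
frame = [[BLUE] * 6 for _ in range(6)]
for y in (2, 3):
    for x in (1, 2, 3):
        frame[y][x] = SKIN
region = skin_region(frame)   # excerpt on which a Hough transform would run
```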

Ambient light

Many artificial light sources do not provide consistent illumination. Most exhibit temporal variation in intensity (luminance) and/or color. These variations commonly track the AC power frequency (50/60 or 100/120 Hz), but sometimes do not. For example, fluorescent tubes can give off infrared illumination that varies at a ~40 kHz rate. The emitted spectra depend on the particular lighting technology. Organic LEDs for domestic and industrial lighting sometimes use distinct color mixtures (e.g., blue and amber) to make white. Others may employ more traditional red/green/blue clusters, or blue/UV LEDs with phosphors.

In one particular implementation, a processing stage 38 monitors, e.g., the average intensity, redness, greenness, or other coloration of the image data contained in the bodies of packets. This intensity data can be applied to an output 33 of that stage. Along with the image data, each packet can convey a timestamp indicating the particular time (absolute, or based on a local clock) at which the image data was captured. This time data, too, can be provided on output 33.

A synchronization processor 35 coupled to such an output 33 can examine the variation in frame-to-frame intensity (or color), as a function of timestamp data, to discern its periodicity. Moreover, this module can predict the next time instant at which the intensity (or color) will have a maximum, minimum, or other particular state. A phase-locked loop may control an oscillator that is synchronized to mirror the periodicity of an aspect of the illumination. More typically, a digital filter computes a time interval that is used to set, or compare against, timers, optionally with software interrupts. A digital phase-locked loop or delay-locked loop can also be used. (A Kalman filter is commonly used for this type of phase locking.)
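As a rough illustration of what such a synchronization module computes, the sketch below estimates the period from timestamped intensity samples by locating local maxima, then extrapolates the next maximum. It deliberately swaps the phase-locked loop or Kalman filter named above for simple peak picking; the function names and the synthetic 100 Hz test signal are assumptions made for the example.

```python
import math


def find_peaks(samples):
    """samples: list of (timestamp_s, intensity). Return timestamps of local maxima."""
    peaks = []
    for i in range(1, len(samples) - 1):
        prev_v, v, next_v = samples[i - 1][1], samples[i][1], samples[i + 1][1]
        if v > prev_v and v >= next_v:
            peaks.append(samples[i][0])
    return peaks


def estimate_period(peaks):
    """Mean spacing between successive peaks, or None if fewer than two peaks."""
    if len(peaks) < 2:
        return None
    return (peaks[-1] - peaks[0]) / (len(peaks) - 1)


def predict_next_maximum(samples, now):
    """Extrapolate the first intensity maximum expected after time `now`."""
    peaks = find_peaks(samples)
    period = estimate_period(peaks)
    if period is None or period <= 0:
        return None
    cycles = math.floor((now - peaks[-1]) / period) + 1
    return peaks[-1] + max(cycles, 1) * period


# Synthetic 100 Hz flicker sampled every millisecond (illustrative data only).
samples = [(i / 1000, 100 + 20 * math.cos(2 * math.pi * 100 * i / 1000))
           for i in range(60)]
next_peak = predict_next_maximum(samples, now=0.059)
```

Peak picking only works when the samples are dense relative to the flicker; at ordinary frame rates the observed variation is aliased, so the estimate reflects the apparent period seen by the camera. Noisy data would call for the filtering approaches the text mentions.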

Control processor module 36 can poll the synchronization module 35 to determine when a lighting condition is expected to have a desired state. With this information, control processor module 36 can direct setup module 34 to capture a frame of data under lighting conditions favorable for a particular purpose. For example, if the camera is imaging an object suspected of having a digital watermark encoded in the green color channel, processor 36 may direct camera 32 to capture a frame of imagery at an instant when green illumination is expected to be at a maximum, and direct processing stages 38 to process that frame for detection of such a watermark.

The camera phone may be equipped with plural LED light sources that are usually operated together to produce a flash of white light illumination on a subject. Operated individually or in different combinations, however, they can cast different colors of light on the subject. The phone processor may control the component LED sources individually, to capture frames with non-white illumination. If capturing an image that is to be read to decode a green-channel watermark, only green illumination may be applied when the frame is captured. Or the camera may capture plural successive frames, with different LEDs illuminating the subject. One frame may be captured over 1/250th of a second with a corresponding period of red-only illumination; a subsequent frame may be captured over 1/100th of a second with a corresponding period of green-only illumination. These frames may be analyzed separately, or may be combined, e.g., for analysis in the aggregate. Or a single image frame can be captured over an interval of 1/100th of a second, with the green LED activated for that entire interval, and the red LED activated for 1/250th of a second during that 1/100th second interval. The instantaneous ambient illumination can be sensed (or predicted, as above), and the component LED colored light sources can be operated in a responsive manner (e.g., to counteract the orangeness of tungsten illumination by adding blue illumination from a blue LED).

Other notes; projectors

While a packet-based, data-driven architecture is shown in FIG. 16, a variety of other implementations are of course possible. Such alternative architectures are straightforward to the artisan, based on the details given.

The artisan will appreciate that the arrangements and details noted above are arbitrary. Actual choices of arrangement and detail will depend on the particular application being served, and most likely will differ from those noted. (To cite but a trivial example, FFTs need not be performed on 16 x 16 blocks, but can be done on 64 x 64, 256 x 256, the whole image, etc.)

Similarly, it will be recognized that the body of a packet can convey an entire frame of data, or just excerpts (e.g., a 128 x 128 block). Image data from a single captured frame may thus span a series of several packets. Different excerpts within a common frame may be processed differently, depending on the packet in which they are conveyed.

Moreover, a processing stage 38 may be instructed to break a packet into multiple packets, such as by splitting image data into 16 tiled smaller sub-images. Thus, more packets may be present at the end of the system than were produced at the beginning.
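Splitting one packet into 16 tile packets might be sketched as follows. The packet layout (a dict with `header` and `body` keys) and the `tile_row` / `tile_col` header fields are hypothetical; the sketch also assumes the image dimensions divide evenly by the grid size.

```python
def split_into_tiles(packet, grid=4):
    """Split one packet's image body into grid*grid tile packets (16 by default).

    packet: {"header": dict, "body": list of pixel rows} (assumed layout).
    Each output packet copies the original header and records its tile position.
    """
    image = packet["body"]
    height, width = len(image), len(image[0])
    tile_h, tile_w = height // grid, width // grid
    tiles = []
    for ty in range(grid):
        for tx in range(grid):
            top, left = ty * tile_h, tx * tile_w
            body = [row[left:left + tile_w] for row in image[top:top + tile_h]]
            header = dict(packet["header"], tile_row=ty, tile_col=tx)
            tiles.append({"header": header, "body": body})
    return tiles


packet = {"header": {"frame_id": 7},
          "body": [[y * 8 + x for x in range(8)] for y in range(8)]}
tiles = split_into_tiles(packet)
```

Recording the tile position in each header lets a later stage treat excerpts of the same frame differently, or reassemble them.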

In like fashion, a single packet may contain a collection of data from a series of different images (e.g., images taken sequentially with different focus, aperture, or shutter settings; a particular example is a set of focus regions from five images taken with focus bracketing or depth-of-field bracketing, whether overlapping, abutting, or disjoint). This set of data may then be processed by later stages, either as a set, or through a process that selects one or more excerpts of the packet payload that meet specified criteria (e.g., a focus sharpness metric).

In the particular example detailed, each processing stage 38 generally substituted the result of its processing for the data originally received in the body of the packet. In other arrangements this need not be the case. For example, a stage may output a result of its processing to a module outside the depicted processing chain, e.g., on an output 33. (Or, as noted, a stage may retain the originally received data in the body of the output packet, and augment it with further data, such as the result(s) of its processing.)

Reference was made to determining focus by reference to DCT frequency spectra, or edge-detected data. Many consumer cameras perform a simpler form of focus check, simply by determining the intensity difference (contrast) between pairs of adjacent pixels. This difference peaks at correct focus. Such an arrangement can naturally be used in the detailed arrangements. (Again, advantages can accrue from performing such processing on the sensor chip.)
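The adjacent-pixel contrast check can be sketched in a few lines. The frame layout (rows of luminance values), the use of horizontal neighbors only, and the function names are assumptions for illustration.

```python
def contrast_metric(frame):
    """Sum of absolute intensity differences between horizontally adjacent pixels."""
    return sum(abs(row[x + 1] - row[x])
               for row in frame
               for x in range(len(row) - 1))


def sharpest(frames):
    """Index of the frame with the highest contrast metric (best focus candidate)."""
    return max(range(len(frames)), key=lambda i: contrast_metric(frames[i]))


blurred = [[96, 160, 96, 160]] * 2   # soft edges: small neighbor differences
sharp = [[0, 255, 0, 255]] * 2       # crisp edges: large neighbor differences
best = sharpest([blurred, sharp])
```

Applied to a focus-bracketed series of frames of the same scene, `sharpest` picks the frame at which the metric peaks.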

Each stage typically conducts a handshaking exchange with an adjoining stage each time data is passed to, or received from, that stage. Such handshaking is routine to the artisan familiar with digital system design, so is not belabored here.

The detailed arrangements contemplated a single image sensor. However, in other embodiments, multiple image sensors can be used. In addition to enabling conventional stereoscopic processing, two or more image sensors enable or enhance many other operations.

One function that benefits from multiple cameras is distinguishing objects. To cite a simple example, a single camera is unable to distinguish a human face from a picture of a face (e.g., as may be found in a magazine, on a billboard, or on an electronic display screen). With spaced-apart sensors, in contrast, the 3D aspect of the subject can readily be discerned, allowing a picture to be distinguished from a person. (Depending on the implementation, it may be the 3D aspect of the person that is actually discerned.)

Another function that benefits from multiple cameras is refinement of geolocation. From differences between two images, a processor can determine the device's distance from landmarks whose location may be precisely known. This allows refinement of other geolocation data available to the device (e.g., from WiFi node identification, GPS, etc.).
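One standard way to turn the difference between two views into a distance is the pinhole stereo relation Z = f * B / d, where f is the focal length in pixels, B the spacing between the cameras, and d the disparity of the landmark between the two images. The specification does not state which method is used; the sketch below assumes rectified, parallel cameras, and its parameter names and example values are hypothetical.

```python
def distance_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Distance to a landmark from stereo disparity: Z = f * B / d.

    Assumes two rectified, parallel cameras separated by baseline_m meters.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite distance")
    return focal_length_px * baseline_m / disparity_px


# Illustrative numbers: 800 px focal length, cameras 5 cm apart, 4 px disparity.
landmark_distance_m = distance_from_disparity(800.0, 0.05, 4.0)
```

With the short baseline available on a phone, disparity shrinks quickly with range, so such an estimate is most useful for nearby landmarks.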

Just as a cell phone may have one, two (or more) sensors, such a device may also have one, two (or more) projectors. Individual projectors are being deployed in cell phones by CKing (the N70 model, distributed by China Vision) and Samsung (the MPB200). LG and others have shown prototypes. (These projectors are understood to use Texas Instruments electronically steerable digital micro-mirror arrays, in conjunction with LED or laser illumination.) Microvision offers the PicoP display engine, which can be integrated into a variety of devices to yield projector capability, using a micro-electro-mechanical scanning mirror (in conjunction with laser sources and an optical combiner). Other suitable projection technologies include 3M's liquid crystal on silicon (LCOS) and DisplayTech's ferroelectric LCOS systems.

Use of two projectors, or two cameras, gives differentials of projection or viewing, providing additional information about the subject. In addition to stereo features, it also enables local image correction. For example, consider two cameras imaging a digitally watermarked object. One camera's view of the object gives one measure of a transform that can be discerned from the object's surface (e.g., by encoded calibration signals). This information can be used to correct the view of the object by the other camera, and vice versa. The two cameras can iterate, yielding a comprehensive characterization of the object surface. (One camera may view a better-illuminated region of the surface, or see some edges that the other camera cannot. One view may thus reveal information that the other does not.)

If a reference pattern (e.g., a grid) is projected onto a surface, the shape of the surface is revealed by distortions of the pattern. The FIG. 16 architecture can be expanded to include a projector, which projects a pattern onto an object for capture by the camera system. (Operation of the projector can be synchronized with operation of the camera, e.g., by control processor module 36, with the projector activated only when needed, since it imposes a significant battery drain.) Processing of the resulting image by modules 38 (local or remote) provides information about the surface topology of the object. This 3D topology information can be used as a clue in identifying the object.

In addition to providing information about the 3D configuration of the object, shape information allows a surface to be virtually remapped to any other configuration, e.g., flat. Such remapping serves as a sort of normalization operation.

In one particular arrangement, system 30 operates a projector to project a reference pattern into the camera's field of view. While the pattern is being projected, the camera captures a frame of image data. The resulting image is processed to detect the reference pattern, and from it to characterize the 3D shape of the imaged object. Subsequent processing then follows, based on the 3D shape data.

(In connection with such arrangements, the reader is referred to the Google book-scanning patent, 7,508,978, which employs related principles. That patent details a particularly useful reference pattern, among other relevant disclosures.)

If the projector uses collimated laser illumination (as does the PicoP Display Engine), the pattern will be in focus regardless of the distance to the object onto which it is projected. This can be used as an aid in adjusting the focus of a cell phone camera onto an arbitrary subject. Because the projected pattern is known in advance by the camera, the captured image data can be processed to optimize detection of the pattern, such as by correction. (Or the pattern can be selected to facilitate detection, such as a checkerboard that appears strongly at a single frequency in the image frequency domain when properly focused.) Once the camera is adjusted for optimum focus of the known, collimated pattern, the projected pattern can be discontinued, and the camera can then capture a properly focused image of the subject onto which the pattern was projected.
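
The focusing aid described above can be sketched as a sweep over lens positions that keeps the position at which the known pattern appears sharpest. Everything named here is hypothetical: `simulated_capture` stands in for a real camera, the sharpness measure (sum of squared neighbour differences) is one common choice rather than the one the specification requires, and the pattern is reduced to a one-dimensional checkerboard row for brevity.

```python
from typing import Callable, Sequence


def sharpness(row: Sequence[float]) -> float:
    """Sum of squared differences between neighbouring pixels (higher is sharper)."""
    return sum((b - a) ** 2 for a, b in zip(row, row[1:]))


def best_focus(capture: Callable[[int], Sequence[float]],
               positions: Sequence[int]) -> int:
    """Return the lens position whose captured row of the pattern is sharpest."""
    return max(positions, key=lambda p: sharpness(capture(p)))


def simulated_capture(position: int, true_focus: int = 3) -> Sequence[float]:
    """Stand-in camera: a 1-D checkerboard whose contrast falls off with defocus."""
    contrast = 1.0 / (1 + abs(position - true_focus))
    return [0.5 + 0.5 * contrast * (1 if i % 2 else -1) for i in range(16)]


chosen_position = best_focus(simulated_capture, range(7))
```

After the sweep, the projector would be switched off and the subject captured at `chosen_position`.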

Synchronous detection can also be employed. The pattern can be projected during capture of one frame, and then turned off for capture of the next. The two frames can then be subtracted. The imagery common to the two frames generally cancels, leaving the projected pattern at a much higher signal-to-noise ratio.
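
A minimal sketch of this frame subtraction follows, assuming a static scene and frames held as lists of pixel rows. The scene and pattern values are made up for illustration.

```python
def subtract_frames(pattern_on, pattern_off):
    """Pixel-wise difference of two equally sized frames (lists of rows)."""
    return [[a - b for a, b in zip(row_on, row_off)]
            for row_on, row_off in zip(pattern_on, pattern_off)]


# Hypothetical data: the scene is common to both frames;
# the pattern is present only in the first.
scene = [[10, 50, 90], [20, 60, 100]]
pattern = [[0, 30, 0], [30, 0, 30]]
frame_on = [[s + p for s, p in zip(scene_row, pattern_row)]
            for scene_row, pattern_row in zip(scene, pattern)]
frame_off = scene

recovered_pattern = subtract_frames(frame_on, frame_off)
```

With real imagery, sensor noise and motion between the two frames would leave some residue, so the cancellation is only approximate.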

A projected pattern can be used to determine correct focus for several subjects in the camera's field of view. A child may pose in front of the Grand Canyon. The laser-projected pattern allows the camera to focus on the child in a first frame, and on the background in a second frame. These frames can then be composited, taking from each the portion that is properly in focus.
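
One simple way to composite two differently focused frames is to take each pixel from whichever frame shows more local contrast at that location. The sketch below does this on one-dimensional rows; the contrast measure and the sample data are assumptions, and a practical implementation would work on 2-D regions and smooth the seams.

```python
def local_sharpness(row, i):
    """Absolute contrast between pixel i and its immediate neighbours."""
    left = row[i - 1] if i > 0 else row[i]
    right = row[i + 1] if i < len(row) - 1 else row[i]
    return abs(row[i] - left) + abs(row[i] - right)


def composite(frame_a, frame_b):
    """Take each pixel from the frame that is locally sharper at that position."""
    return [a if local_sharpness(frame_a, i) >= local_sharpness(frame_b, i) else b
            for i, (a, b) in enumerate(zip(frame_a, frame_b))]


# Hypothetical rows: the left half images the child, the right half the background.
child_focused = [0, 9, 0, 9, 5, 5, 5, 5]        # sharp left, blurred right
background_focused = [4, 5, 4, 5, 0, 9, 0, 9]   # blurred left, sharp right

merged = composite(child_focused, background_focused)
```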

If a lens arrangement is used in the cell phone's projector system, it can also be used for the cell phone's camera system. A mirror can be controllably moved to steer the camera or the projector to the lens. Or a beam-splitter arrangement 80 can be used (FIG. 20). Here the body of a cell phone 81 incorporates a lens 82, which provides light to a beam-splitter 84. Part of the illumination is routed to the camera sensor 12. The other part of the optical path goes to a micro-mirror projector system 86.

Lenses used in cell phone projectors typically have larger apertures than those used in cell phone cameras, so the camera may gain significant performance advantages (e.g., enabling shorter exposures) by use of such a shared lens. Or, reciprocally, the beam-splitter 84 can be asymmetrical, not favoring both optical paths equally. For example, the beam-splitter can be a partially-silvered element that couples a smaller fraction (e.g., 2%, 8%, or 25%) of externally incident light to the sensor path 83. The beam-splitter may thus serve to couple a larger fraction (e.g., 98%, 92%, or 75%) of the illumination from the micro-mirror projector externally, for projection. By this arrangement the camera sensor 12 receives light of an intensity conventional for a cell phone camera (notwithstanding the larger aperture lens), while the light output from the projector is only slightly dimmed by the lens-sharing arrangement.

In another arrangement, a camera head is separate, or detachable, from the cell phone body. The cell phone body is carried in the user's pocket or purse, while the camera head is adapted for looking out over the user's pocket (e.g., in a form factor akin to a pen, with a pocket clip, and with a battery in the pen barrel). The two communicate by Bluetooth or another wireless arrangement, with capture instructions sent from the phone body and image data sent from the camera head. Such a configuration allows the camera to constantly survey the scene in front of the user, without requiring that the cell phone be removed from the user's pocket or purse.

In a related arrangement, a strobe light for the camera is separate, or detachable, from the cell phone body. The light (which may incorporate LEDs) can be placed near the image subject, providing illumination from a desired angle and distance. The strobe can be fired by a wireless command issued by the cell phone camera system.

(Those skilled in optical system design will recognize a number of alternatives to the arrangements particularly noted.)

Some of the advantages that accrue from having two cameras can be realized by having two projectors (with a single camera). For example, the two projectors can project alternating or otherwise distinguishable patterns (e.g., simultaneous, but of differing color, pattern, polarization, etc.) into the camera's field of view. By noting how the two patterns, projected from different points, differ when presented on an object and viewed by the camera, stereoscopic information can again be discerned.

Many usage models are enabled through use of a projector, including new sharing models (see Greaves, "View & Share: Exploring Co-Present Viewing and Sharing of Pictures using Personal Projection," Mobile Interaction with the Real World 2009). Such models employ the image created by the projector itself as a trigger to initiate a sharing session, ranging from overt triggers, such as a commonly understood symbol (an "open" sign), to covert triggers that are machine readable. Sharing can also occur through ad hoc networks utilizing peer-to-peer applications or a server-hosted application.

Other output from mobile devices can be similarly shared. Consider keyvectors. One user's phone may process an image with Hough transform and other eigenface extraction techniques, and then share the resulting keyvector of eigenface data with others in the user's social circle (either by pushing it to them, or by allowing them to pull it). One or more of these socially-affiliated devices may then perform facial template matching that yields an identification of a formerly unrecognized face in the imagery captured by the original user. Such an arrangement takes a personal experience and makes it a public experience. Moreover, the experience can become viral, with the keyvector data shared, essentially without bounds, with a great number of further users.
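
The template matching that a friend's device might run on a received keyvector can be sketched as a nearest-neighbour search. The representation used here (a short list of floats compared by Euclidean distance against named templates, with a rejection threshold) is an assumption for illustration; the specification does not fix the keyvector format or the matching rule.

```python
import math


def nearest_identity(keyvector, templates, max_distance):
    """Return the name of the closest stored template, or None if none is close enough."""
    best_name, best_dist = None, float("inf")
    for name, template in templates.items():
        dist = math.dist(keyvector, template)  # Euclidean distance
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_distance else None


# Hypothetical templates held on a friend's phone, and a keyvector shared with it.
friend_templates = {"alice": [0.9, 0.1, 0.3], "bob": [0.2, 0.8, 0.5]}
shared_keyvector = [0.85, 0.15, 0.3]

match = nearest_identity(shared_keyvector, friend_templates, max_distance=0.2)
```

The threshold keeps a device from reporting a match for a face that none of its templates actually resembles.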

Selected Other Arrangements

In addition to the arrangements detailed earlier, another hardware arrangement suitable for use with certain implementations of the present technology uses the Mali-400 ARM graphics multiprocessor architecture, which includes plural fragment processors that can be devoted to the different types of image processing tasks referenced in this document.

The standards group Khronos has issued OpenGL ES2.0, which defines hundreds of standardized graphics function calls for systems that include multiple CPUs and multiple GPUs (a direction in which cell phones are increasingly migrating). OpenGL ES2.0 attends to routing different operations to different processing units, with such details being transparent to the application software. It thus provides a consistent software API usable with all manner of GPU/CPU hardware.

In accordance with another aspect of the present technology, the OpenGL ES2.0 standard is extended to provide a standardized graphics processing library not just across different CPU/GPU hardware, but also across different cloud processing hardware, again with such details being transparent to the calling application.

Increasingly, Java Specification Requests (JSRs) have been defined to standardize certain Java-implemented tasks. JSRs are increasingly designed for efficient implementation on top of OpenGL ES2.0 class hardware.

In accordance with a still further aspect of the present technology, some or all of the image processing operations noted in this specification (facial recognition, SIFT processing, watermark detection, histogram processing, etc.) can be implemented as JSRs, providing standardized implementations that are suitable across diverse hardware platforms.

In addition to supporting cloud-based JSRs, the extended standards specification can also support the query router and response manager functionality detailed earlier, including both static and auction-based service providers.

Akin to OpenGL is OpenCV, a computer vision library available under an open source license, which permits coders to invoke a variety of functions without regard to the particular hardware being used to perform them. (An O'Reilly book, Learning OpenCV, documents the library extensively.) A counterpart, NokiaCV, provides similar functionality specialized for the Symbian operating system (e.g., Nokia cell phones).

OpenCV provides support for a large variety of operations, including high-level tasks such as facial recognition, gesture recognition, motion tracking/understanding, and segmentation, as well as an extensive assortment of more atomic, elemental vision/image processing operations.

CMVision is another package of computer vision tools that can be employed in certain embodiments of the present technology; this one was compiled by researchers at Carnegie Mellon University.

Still another hardware architecture makes use of a field programmable object array (FPOA) arrangement, in which hundreds of diverse 16-bit "objects" are arrayed in a gridded node fashion, each able to exchange data with neighboring devices through very high bandwidth channels. (The PicoChip devices referenced earlier are of this class.) The functionality of each object can be reprogrammed, as with FPGAs. Again, different image processing tasks can be performed by different FPOA objects. These tasks can be redefined on the fly, as needed (e.g., an object may perform SIFT processing in one state, FFT processing in another state, log-polar processing in a further state, etc.).

(Although many grid arrangements of logic devices are based on "nearest neighbor" interconnects, additional flexibility can be achieved by use of a "partial crossbar" interconnect. See, e.g., patent 5,448,496 (Quickturn Design Systems).)

Also in the realm of hardware, certain embodiments of the present technology employ "extended depth of field" imaging systems (see, e.g., patents 7,218,448, 7,031,054 and 5,748,371). Such arrangements include a mask in the imaging path that modifies the optical transfer function of the system so that it is insensitive to the distance between the object and the imaging system. The image quality is then uniformly poor over the depth of field. Digital post-processing of the image compensates for the mask modifications, restoring image quality while retaining the increased depth of field. Using such technology, the cell phone camera can capture imagery in which nearer and farther subjects are all in focus (i.e., with greater high-frequency detail), without the longer exposures that would normally be required. (Longer exposures exacerbate problems such as hand jitter and subject motion.)

In the arrangements detailed here, shorter exposures allow higher quality imagery to be provided to the image processing functions without enduring the temporal delay created by optical/mechanical focusing elements, and without requiring input from the user as to which elements of the image should be in focus. This provides a much more intuitive experience, since the user can simply point the imaging device at the desired target without worrying about focus or depth of field settings. Similarly, the image processing functions can leverage all of the pixels in the captured image/frame, since all are expected to be in focus. In addition, new metadata about identified objects, or about groupings of pixels related by depth within the frame, can yield simple "depth map" information, setting the stage for 3D video capture and storage of video streams using emerging standards for the transmission of depth information.

In some implementations, the cell phone may have the capability to perform a given operation locally, but may decide instead to have it performed by a cloud resource. The decision of whether to process locally or remotely can be based on "costs," including bandwidth costs, external service provider costs, power costs to the cell phone battery, intangible costs in consumer (dis)satisfaction caused by delayed processing, etc. For example, if the user is running low on battery power and is at a location far from a cell tower (so that the cell phone runs its RF amplifier at maximum output when transmitting), then sending a large block of data for remote processing may consume a significant fraction of the battery's remaining life. In such a case, the phone may decide to process the data locally, or to send it for remote processing later, when the phone is closer to a cell site or the battery has been recharged. A set of stored rules can be applied to the relevant variables to establish a net "cost function" for the different approaches (e.g., process locally, process remotely, defer processing), and these rules may indicate different outcomes depending on the states of these variables.
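
To make the idea of a net cost function concrete, the sketch below scores the three approaches named above and picks the cheapest. The state variables, the weights, and the formulas are all illustrative assumptions; the specification only says that stored rules are applied to the relevant variables.

```python
from dataclasses import dataclass


@dataclass
class DeviceState:
    battery_fraction: float  # 0.0 (empty) to 1.0 (full)
    signal_fraction: float   # 0.0 (far from tower) to 1.0 (strong signal)
    payload_mb: float        # size of the data block to be processed
    urgency: float           # 0.0 (can wait) to 1.0 (user is waiting)


def choose_approach(state: DeviceState) -> str:
    """Return 'local', 'remote' or 'defer', whichever has the lowest assumed cost."""
    battery = max(state.battery_fraction, 0.05)  # avoid division by zero
    # Transmitting costs more when the signal is weak and the battery is low.
    transmit_cost = state.payload_mb * (1.0 - state.signal_fraction) / battery
    costs = {
        "local": 2.0 / battery,                # CPU drain on the handset
        "remote": 0.5 + transmit_cost,         # service fee plus RF energy
        "defer": 1.0 + 30.0 * state.urgency,   # user dissatisfaction from waiting
    }
    return min(costs, key=costs.get)


low_battery_far_patient = choose_approach(DeviceState(0.1, 0.1, 5.0, 0.0))
low_battery_far_urgent = choose_approach(DeviceState(0.1, 0.1, 5.0, 1.0))
full_battery_near_urgent = choose_approach(DeviceState(1.0, 0.9, 5.0, 1.0))
```

With these particular weights, a phone that is low on battery and far from a tower defers the work when the user can wait and processes locally when the user cannot, while a well-charged phone with a strong signal sends the work to the cloud.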

An appealing "cloud" resource is the processing capability found at the edges of wireless networks. Cellular networks, for example, include tower stations that are, in large part, software-defined radios, employing processors to perform digitally some or all of the operations traditionally performed by analog transmitting and receiving radio circuits, such as mixers, filters, demodulators, etc. Even smaller cell stations, so-called "femtocells," typically have powerful signal processing hardware for such purposes. The PicoChip processors noted earlier, and other field programmable object arrays, are widely deployed in such applications.

Radio signal processing and image signal processing have many commonalities, e.g., employing FFT processing to convert sampled data to the frequency domain, applying various filtering operations, etc. Cell station equipment, including its processors, is designed to meet peak consumer demands. This means that significant processing capability is often left unused.

In accordance with another aspect of the present technology, this spare radio signal processing capability at cellular tower stations (and at other edges of wireless networks) is repurposed for image (and/or audio or other) signal processing on behalf of consumer wireless devices. Since an FFT operation is the same whether it processes sampled radio signals or image pixels, the repurposing is often straightforward; configuration data for the hardware processing cores need not be changed much, if at all. And because 3G/4G networks are so fast, a processing task can be dispatched quickly from a consumer device to a cell station processor, and the results returned with similar speed. In addition to the speed and computational power that such repurposing of cell station processors affords, another benefit is reduced power consumption in the consumer devices.
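
The point that one transform routine serves both kinds of data can be illustrated with a small discrete Fourier transform applied first to radio samples and then to a row of image pixels. The naive O(n²) DFT here is a stand-in that produces the same values an FFT would; the sample data are hypothetical.

```python
import cmath


def dft(samples):
    """Naive discrete Fourier transform; an FFT computes the same values faster."""
    n = len(samples)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples))
            for k in range(n)]


# Hypothetical inputs: a sampled radio tone and one row of a checkerboard image.
radio_samples = [1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]
image_row = [255, 0, 255, 0, 255, 0, 255, 0]

radio_spectrum = dft(radio_samples)   # energy concentrated in bins 2 and 6
image_spectrum = dft(image_row)       # energy concentrated in bins 0 and 4
```

Nothing in `dft` depends on where the samples came from, which is why the same processing core can be handed either workload.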

Before sending image data for processing, a cell phone can quickly query the cell tower station with which it is communicating, to confirm that the station has sufficient unused capacity to undertake the intended image processing operation. This query can be sent by the packager/router of FIG. 10; the local/remote router of FIG. 10A; the query router and response manager of FIG. 7; the pipe manager 51 of FIG. 16; etc.

Alerting the cell tower/base station to forthcoming processing requests and/or bandwidth requirements allows the cell site to better allocate its processing and bandwidth resources in anticipation of meeting such needs.

Cell sites are at risk of becoming bottlenecks when they undertake service operations that exhaust their processing or bandwidth capacity. When this occurs, they must triage, unexpectedly throttling back the processing/bandwidth provided to one or more users so that others can be served. This sudden change in service is undesirable, because changing the parameters with which a channel was originally established (e.g., the bit rate at which video can be delivered) forces the data services using the channel to reconfigure their own parameters (e.g., requiring ESPN to provide a lower quality video feed). Renegotiating such details after the channel and services have been set up invariably causes glitches, e.g., stuttering video delivery, dropped syllables in phone calls, etc.

To avoid the need for these unpredictable bandwidth throttlings, and the resulting service impairments, cell sites tend to adopt a conservative strategy, allocating bandwidth/processing resources parsimoniously in order to reserve capacity for possible peak demands. But this approach impairs the quality of service that might otherwise normally be provided, sacrificing typical service in anticipation of the unexpected.

In accordance with this aspect of the present technology, a cell phone sends alerts to the cell tower station, specifying bandwidth or processing needs that it anticipates will be forthcoming. In effect, the cell phone asks to reserve a bit of future service capacity. The tower station still has a fixed capacity. However, knowing that a particular user will need, e.g., a bandwidth of 8 Mbit/s for 3 seconds, commencing in 200 milliseconds, allows the cell site to take that anticipated demand into account as it serves other users.

Consider a cell site that has an excess (allocable) channel capacity of 15 Mbit/s, and that normally allocates a 10 Mbit/s channel to a new video service user. If the site knows that a cell camera user has requested a reservation for an 8 Mbit/s channel starting in 200 milliseconds, and a new video service user meanwhile requests service, the site may allocate the new video service user a channel of 7 Mbit/s, rather than the usual 10 Mbit/s. By initially setting up the new video service user's channel at the slower bit rate, the service impairments associated with cutting back bandwidth during an ongoing channel session are avoided. The capacity of the cell site is the same, but it is now allocated in a manner that reduces the need to throttle back the bandwidth of existing channels mid-transmission.
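
The arithmetic of this example can be written as a one-line allocation rule: grant the usual channel size, capped by whatever spare capacity remains after pending reservations are set aside. The function name and signature are hypothetical.

```python
def allocate_new_channel(spare_mbit_s, reserved_mbit_s, usual_mbit_s):
    """Size a new channel without touching capacity already promised to reservations."""
    available = max(spare_mbit_s - sum(reserved_mbit_s), 0.0)
    return min(usual_mbit_s, available)


# Figures from the example: 15 Mbit/s spare, one 8 Mbit/s reservation, 10 Mbit/s usual.
grant_with_reservation = allocate_new_channel(15.0, [8.0], 10.0)
grant_without_reservation = allocate_new_channel(15.0, [], 10.0)
```

With the reservation in place the new user is offered 7 Mbit/s from the start; without it, the usual 10 Mbit/s.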

In another situation, the cell site may determine that it has excess capacity at present, but expects to be more heavily burdened in half a second. In this case it may use the present excess capacity to speed throughput to one or more video subscribers, e.g., those for whom it has collected several packets of video data in a buffer memory, ready for delivery. These video packets may be sent through the enlarged channel now, in anticipation that the video channel will be slowed in half a second. Again, this is practical because the cell site has useful information about future bandwidth demands.

The service reservation message sent from the cell phone may also include a priority indicator. This indicator can be used by the cell site to determine the relative importance of meeting the request on the stated terms, in case arbitration between conflicting service demands is required.
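
A reservation message carrying a priority indicator, and a simple arbitration among conflicting requests, might look like the sketch below. The field names, the convention that a larger number means higher priority, and the greedy highest-priority-first rule are all assumptions. For brevity the sketch also treats every request as overlapping in time, so the `start_ms` and `duration_ms` fields are carried but not used.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    user: str
    mbit_s: float      # requested bandwidth
    start_ms: int      # delay until the bandwidth is needed
    duration_ms: int   # how long it is needed
    priority: int      # larger means more important (assumed convention)


def arbitrate(requests, capacity_mbit_s):
    """Grant requests in descending priority order while capacity remains."""
    granted, remaining = [], capacity_mbit_s
    for request in sorted(requests, key=lambda r: r.priority, reverse=True):
        if request.mbit_s <= remaining:
            granted.append(request.user)
            remaining -= request.mbit_s
    return granted


requests = [
    Reservation("A", 8.0, 200, 3000, priority=1),
    Reservation("B", 8.0, 200, 3000, priority=5),
    Reservation("C", 4.0, 200, 3000, priority=3),
]
granted_users = arbitrate(requests, capacity_mbit_s=12.0)
```

Here the 8 Mbit/s, 3 second, 200 millisecond figures echo the reservation example given earlier; user A loses out because B and C outrank it and together consume the 12 Mbit/s assumed to be available.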

Anticipatory service requests from such cell phones can also allow the cell site to provide higher quality sustained service than would normally be allocated.

Cell sites are understood to employ statistical models of usage patterns and to allocate bandwidth accordingly. The allocations are typically set conservatively, in anticipation of realistic worst-case usage scenarios, e.g., encompassing scenarios that occur 99.99% of the time. (Some theoretically possible scenarios are sufficiently improbable that they may be disregarded in bandwidth allocations. However, on the rare occasions when such improbable scenarios occur, as when thousands of subscribers sent cell phone picture messages from Washington DC during the Obama inauguration, some subscribers may simply not receive service.)

The statistical models on which site bandwidth allocations are based are understood to treat subscribers, in part, as unpredictable actors. Whether a particular subscriber requests service in the forthcoming seconds (and what particular service is requested) has a random aspect.

The larger the randomness in a statistical model, the larger the extremes tend to be. If reservations, or forecasts of future demands, are routinely submitted by, e.g., 15% of subscribers, then the behavior of those subscribers is no longer random. The worst-case peak bandwidth demand on a cell site then involves not 100% of the subscribers acting randomly, but only 85%. Actual reservation information can be employed for the other 15%. Hypothetical extremes in peak bandwidth usage are thus moderated.

With lower peak usage scenarios, more generous allocations of present bandwidth can be granted to all subscribers. That is, if a portion of the user base sends alerts to the site reserving future capacity, then the site may predict that the realistic peak demand that may be forthcoming will still leave the site with unused capacity. In this case it may grant a camera cell phone user a 12 Mbit/s channel, instead of the 8 Mbit/s channel stated in the reservation request, and/or may grant a video user a 15 Mbit/s channel instead of the normal 10 Mbit/s channel. Such usage forecasting can thus allow the site to grant higher quality services than would normally be the case, since bandwidth reserves need be held for fewer unpredictable actors.

Anticipatory service requests can also be communicated from the cell phone (or the cell site) to other cloud processes that are expected to be involved in the requested services, allowing them to similarly allocate their resources in advance. Such anticipatory service requests may also serve to alert the cloud process to pre-warm associated processing. Additional information may be provided from the cell phone, or elsewhere, for this purpose, such as encryption keys, image dimensions (e.g., to configure an FPOA to serve as an FFT processor for a 1024 x 768 image, to be processed in 16 x 16 tiles, and to output coefficients for 32 spectral frequency bands), etc.

In turn, the cloud resource may alert the cell phone to any information it expects might be requested from the phone in performance of the expected operation, or to any action it might request the cell phone to perform, so that the cell phone can similarly anticipate its own forthcoming actions and prepare accordingly. For example, the cloud process may, under certain conditions, request a further set of input data, such as when it assesses that the data originally provided is not sufficient for the intended purpose (e.g., the input data may be an image without sufficient focus resolution, or without enough contrast, or needing further filtering). Knowing in advance that the cloud process may request such further data can allow the cell phone to consider this possibility in its own operation, e.g., keeping processing modules configured in a certain filter manner longer than might otherwise be the case, reserving an interval of sensor time to possibly capture a replacement image, etc.

Anticipatory service requests (or the possibility of conditional service requests) generally relate to events that may commence in a few tens or hundreds of milliseconds, occasionally in a few seconds. Situations in which the action will commence tens or hundreds of seconds in the future will be rare. However, while the period of advance warning may be short, significant advantages can be derived: if the randomness of the next second is reduced, each second, then system randomness can be reduced considerably. Moreover, the events to which the requests relate can themselves be of longer duration, such as transmitting a large image file, which may take ten seconds or more.

Regarding advance set-up (pre-warming), desirably any operation that takes more than a threshold interval of time to complete (e.g., a few hundred milliseconds, a millisecond, ten microseconds, etc., depending on implementation) should be prepared for in advance, if possible. (In some instances, of course, the anticipated service is never requested, in which case such preparation may be for naught.)

In another hardware arrangement, the cell phone processor may selectively activate a Peltier device or other thermoelectric cooler coupled to the image sensor, in circumstances where thermal image noise (Johnson noise) is a potential problem. For example, if the cell phone detects a low light condition, it may activate a cooler on the sensor to try to enhance the image signal-to-noise ratio. Or the image processing stages can examine captured imagery for artifacts associated with thermal noise, and if such artifacts exceed a threshold, the cooling device can be activated. (One approach captures a patch of imagery, such as a 16 x 16 pixel region, twice in quick succession. Absent random factors, the two patches should be identical, i.e., perfectly correlated. The deviation of the correlation from 1.0 is a measure of noise, presumably thermal noise.) A short interval after the cooling device is activated, a substitute image can be captured, the interval depending on the thermal response time of the cooler/sensor. Likewise, if cell phone video is captured, a cooler may be activated, since the increased switching activity by circuitry on the sensor increases its temperature, and thus its thermal noise. (Whether to activate a cooler can also be application dependent; e.g., the cooler may be activated when capturing imagery from which watermark data may be read, but not activated when capturing imagery from which barcode data may be read.)
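The two-patch noise check described above might be sketched as below. This is a minimal illustration, not the patent's implementation: the Pearson correlation is one reasonable choice of correlation measure, and the function names and the 0.05 threshold are assumptions made for the example.

```python
import math


def patch_correlation(a, b):
    """Pearson correlation of two equally sized patches given as flat pixel lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A perfectly flat patch has no defined correlation; treat identical
        # flat patches as fully correlated and anything else as uncorrelated.
        return 1.0 if a == b else 0.0
    return cov / math.sqrt(var_a * var_b)


def should_activate_cooler(patch1, patch2, max_deviation=0.05):
    """True when the correlation falls short of 1.0 by more than the threshold.

    max_deviation is a hypothetical tuning value, not a figure from the text.
    """
    return (1.0 - patch_correlation(patch1, patch2)) > max_deviation
```

Two captures of a 16 x 16 region would each be passed as a 256-element list; identical captures give a correlation of 1.0 and leave the cooler off.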

As noted, packets in the FIG. 16 arrangement can convey a variety of instructions and data, in both the header and the packet body. In a further arrangement a packet can additionally, or alternatively, contain a pointer to a cloud object, or to a record in a database. The cloud object/database record may contain information such as object properties useful for object recognition (e.g., fingerprint or watermark properties for a particular object).

Once the system has read a watermark, the packet can contain the watermark payload, and the header (or body) can contain one or more database references through which that payload can be associated with related information. A watermark payload read from a business card may be looked up in one database; a watermark decoded from a photograph may be looked up in another. The system may apply multiple different watermark decoding algorithms to a single image (e.g., MediaSec, Digimarc ImageBridge, Civolution, etc.). Depending on which application performed a particular decoding operation, the resulting watermark payload can be sent to a corresponding destination database. (Likewise with different barcodes, fingerprint algorithms, eigenface technologies, etc.) The destination database address can be included in the application, or in configuration data. (Commonly, the addressing is performed indirectly, with an intermediate data store containing the address of the ultimate database, permitting relocation of the database without changing each cell phone application.)

The system may perform an FFT on captured image data to obtain frequency domain information, and then feed that information to several watermark decoders operating in parallel, each applying a different decoding algorithm. When one of the applications extracts valid watermark data (e.g., as indicated by ECC information computed from the payload), the data is sent to a database corresponding to that format/technology of watermark. Plural such database pointers can be included in a packet and used conditionally, depending on which watermark decoding operation (or barcode reading operation, or fingerprint calculation, etc.) yields useful data.
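The conditional use of database pointers can be illustrated with a short sketch. Everything named here is hypothetical: the two stand-in decoders, the `db://` pointer strings, and the dictionary used to represent frequency domain data are placeholders, not real watermark algorithms or addresses.

```python
from concurrent.futures import ThreadPoolExecutor


# Stand-in decoders (hypothetical): each inspects the frequency domain data
# and returns a payload string, or None when no valid watermark is found.
def decode_format_a(spectrum):
    return "A-1234" if spectrum.get("a_mark") else None


def decode_format_b(spectrum):
    return "B-5678" if spectrum.get("b_mark") else None


DECODERS = {"format_a": decode_format_a, "format_b": decode_format_b}

# Database pointers carried in the packet, one per watermark format (made up).
DB_POINTERS = {
    "format_a": "db://example/format_a",
    "format_b": "db://example/format_b",
}


def route_payloads(spectrum):
    """Run all decoders in parallel; return (database, payload) for each success."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, spectrum) for name, fn in DECODERS.items()}
        results = {name: future.result() for name, future in futures.items()}
    return [
        (DB_POINTERS[name], payload)
        for name, payload in results.items()
        if payload is not None
    ]
```

Only the pointers whose decoder produced a payload are used; the others are simply ignored for that image.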

Similarly, the system may send a facial image to an intermediary cloud service, in a packet containing an identifier of the user (but not containing the user's Apple iPhoto, or Picasa, or Facebook user name). The intermediary cloud service can take the provided user identifier and use it to access a database record from which the user's names on these other services are obtained. The intermediary cloud service can then route the facial image data to Apple's server, with the user's iPhoto user name; to Picasa's service, with the user's Google user name; and to Facebook's server, with the user's Facebook user name. Those respective services can then perform facial recognition on the imagery and return the names of persons identified from the user's iPhoto/Picasa/Facebook accounts (directly to the user, or through the intermediary service). The intermediary cloud service, which may serve large numbers of users, can keep informed of the current addresses for relevant servers (and alternate proximate servers, in case the user is away from home), rather than have each cell phone try to keep such data updated.
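A sketch of the intermediary's lookup step follows. The directory contents, user identifier, account names and server URLs are all invented for illustration; a real service would hold these in a database and would actually transmit the image rather than return a plan.

```python
# Hypothetical record store: user identifier -> account names on other services.
ACCOUNT_DIRECTORY = {
    "user-001": {
        "iphoto": "jsmith_apple",
        "picasa": "jsmith_google",
        "facebook": "john.smith",
    },
}

# Hypothetical current server addresses, maintained centrally by the intermediary
# so that individual phones need not track them.
SERVER_ADDRESSES = {
    "iphoto": "https://faces.iphoto.example",
    "picasa": "https://faces.picasa.example",
    "facebook": "https://faces.facebook.example",
}


def plan_routing(user_id):
    """Return (server address, account name) pairs for forwarding a face image."""
    accounts = ACCOUNT_DIRECTORY.get(user_id, {})
    return [
        (SERVER_ADDRESSES[service], name)
        for service, name in accounts.items()
        if service in SERVER_ADDRESSES
    ]
```

The phone supplies only `user-001`; the per-service names never need to be stored on, or sent from, the handset.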

Facial recognition applications can be employed not just to identify persons, but also to identify relationships between individuals depicted in imagery. For example, data maintained by iPhoto/Picasa/Facebook may contain not just facial recognition features and associated names, but also terms indicating relationships between the named faces and the account owner (e.g., father, boyfriend, sibling, pet, roommate, etc.). Thus, instead of simply searching a user's image collection for, e.g., all pictures of "David Smith," the user's collection may also be searched for all pictures depicting "sibling."

Application software in which photos are reviewed can present differently colored frames around different recognized faces, in accordance with associated relationship data (e.g., blue for siblings, red for boyfriends, etc.).

In some arrangements, the user's system can access such information stored in accounts maintained by the user's network "friends." A face that cannot be recognized by the facial recognition data associated with the user's account at Picasa may be recognized by consulting the Picasa facial recognition data associated with the account of the user's friend "David Smith." Relationship data indicated by David Smith's account can similarly be used to present and organize the user's photos. The initially unrecognized face may thus be labeled with an indicator showing that the person is David Smith's roommate. This essentially remaps the relationship information (e.g., mapping "roommate," as indicated in David Smith's account, to "David Smith's roommate" in the user's account).

The embodiments detailed above were generally described in the context of a single network. However, plural networks may commonly be available to a user's phone (e.g., WiFi, Bluetooth, possibly different cellular networks, etc.). The user may choose between these alternatives, or the system may apply stored rules to do so automatically. In some instances, a service request may be issued (or results may be returned) across several networks in parallel.

Reference Platform Architecture

The hardware in cell phones was originally introduced for specific purposes. The microphone, for example, was used only for voice transmission over the cellular network: it fed an A/D converter that fed a modulator in the phone's radio transceiver. The camera was used only to capture snapshots. And so on.

As additional applications arose that used such hardware, each application had to develop its own way of talking to the hardware. Diverse software stacks arose, each specialized so that a particular application could interact with a particular piece of hardware. This poses an impediment to application development.

This problem compounds when cloud services and/or specialized processors are added to the mix.

To alleviate such difficulties, some embodiments of the present technology can employ an intermediate software layer that provides a standard interface with which, and through which, hardware and software can interact. Such an arrangement is shown in FIG. 20A, with the intermediate software layer labeled "Reference Platform."

In this diagram, hardware elements are shown in dashed boxes, including processing hardware at the bottom and peripherals on the left. The box "IC HW" is "intuitive computing hardware," and comprises the earlier-discussed hardware that supports the different processing of image-related data, such as modules 38 in FIG. 16, the configurable hardware of FIG. 6, etc. DSP is a general purpose digital signal processor, which can be configured to perform specialized operations; CPU is the phone's primary processor; GPU is a graphics processing unit. OpenCL and OpenGL are APIs through which graphics processing services (performed on the CPU and/or GPU) can be invoked.

Different specialized technologies are in the middle, such as one or more digital watermark decoders (and/or encoders), barcode reading software, optical character recognition software, etc. Cloud services are shown on the right, and applications are at the top.

The reference platform establishes a standard interface through which different applications can interact with hardware, exchange information, and request services (e.g., by API calls). Similarly, the platform establishes a standard interface through which the different technologies can be accessed, and through which they can send and receive data to and from other system components. Likewise with cloud services, for which the reference platform may also attend to the details of identifying a service provider, whether by reverse auction, heuristics, etc. In cases where a service is available both from a technology in the cell phone and from a remote service provider, the reference platform may also attend to weighing the costs and benefits of the different options, and deciding which should handle a particular service request.
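The weighing of on-phone against remote options could take many forms; one simple possibility is a weighted score, sketched below. The `Option` fields, the weights and the scoring formula are assumptions made for illustration and are not specified in the text.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """One way of fulfilling a service request (all fields hypothetical)."""
    name: str
    latency_ms: float  # expected time to a result
    cost: float        # e.g., fee or battery drain, in arbitrary units
    quality: float     # expected result quality, 0.0 to 1.0


def choose_provider(options, latency_weight=0.001, cost_weight=1.0):
    """Pick the option with the best quality after latency and cost penalties."""
    def score(option):
        return (option.quality
                - latency_weight * option.latency_ms
                - cost_weight * option.cost)
    return max(options, key=score)


local = Option("on-phone OCR", latency_ms=50, cost=0.0, quality=0.6)
remote = Option("remote OCR", latency_ms=400, cost=0.1, quality=0.9)
print(choose_provider([local, remote]).name)  # on-phone OCR
```

A reverse auction would simply populate the `cost` and `quality` fields from the bids received before the same comparison is made.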

With this arrangement, the different system components do not need to concern themselves with the details of other parts of the system. An application may call for the system to read text from an object in front of the cell phone. It need not concern itself with the particular control parameters of the image sensor, nor with the image format requirements of the OCR engine. An application may call for a read of the emotion of a person in front of the cell phone. A corresponding call is passed to whatever technology in the phone supports such functionality, and the results are returned in a standardized form. When an improved technology becomes available, it can be added to the phone, and through the reference platform the system takes advantage of its enhanced capabilities. Thus, growing/changing collections of sensors, and growing/evolving sets of service providers, can be set to the tasks of deriving meaning from input stimulus (audio as well as visual, e.g., speech recognition) through use of such an adaptable architecture.

Arasan Chip Systems, Inc. offers a Mobile Industry Processor Interface UniPro Software Stack, a layered, kernel-level stack intended to ease integration of certain technologies into cell phones. That arrangement may be extended to provide the functionality detailed above. (The Arasan protocol is focused primarily on transport layer issues, but involves layers down to hardware drivers as well. The Mobile Industry Processor Interface Alliance is a large industry group working to advance cell phone technologies.)

Leveraging Existing Image Collections, E.g., for Metadata

Collections of publicly available imagery and other content are becoming more prevalent. Flickr, YouTube, Photobucket (MySpace), Picasa, Zooomr, Facebook, Webshots and Google Images are just a few. Often, these resources can also serve as sources of metadata, either expressly identified as such, or inferred from data such as file names, descriptions, etc. Sometimes geo-location data is also available.

An illustrative embodiment according to one aspect of the present technology works as follows. A cell phone captures a picture of an object or scene, such as a desk telephone, as shown in FIG. 21. (The image may also be acquired in other manners, such as being transmitted from another user or downloaded from a remote computer.)

As a preliminary operation, known image processing operations may be applied to the captured image, e.g., to correct color or contrast, to perform ortho-normalization, etc. Known image object segmentation or classification techniques may also be used to identify an apparent subject region of the image and isolate it for further processing.

The image data is then processed to determine characterizing features that are useful in pattern matching and recognition. Color, shape, and texture metrics are commonly used for this purpose. Images may also be grouped based on layout and eigenvectors (the latter being particularly popular for facial recognition). Many other technologies can of course be employed, as noted elsewhere in this specification.

(Uses of vector characterizations/classifications and other image/video/audio metrics in recognizing faces, imagery, video, audio and other patterns are well known and suited for use in connection with certain embodiments of the present technology. See, e.g., patent publications 20060020630 and 20040243567 (Digimarc), 20070239756 and 20020037083 (Microsoft), 20070237364 (Fuji Photo Film), 7,359,889 and 6,990,453 (Shazam), 20050180635 (Corel), 6,430,306, 6,681,032 and 20030059124 (L-1 Corp.), 7,194,752 and 7,174,293 (Iceberg), 7,130,466 (Cobion), 6,553,136 (Hewlett-Packard), and 6,430,307 (Matsushita), and the journal references cited at the end of this disclosure. When used in conjunction with recognition of entertainment content such as audio and video, such features are sometimes termed content "fingerprints" or "hashes.")

After feature metrics for the image are determined, a search is conducted through one or more publicly accessible image repositories for images with similar metrics, thereby identifying apparently similar images. (As part of its image ingest process, Flickr and other such repositories may calculate eigenvectors, color histograms, keypoint descriptors, FFTs, or other classification data on images at the time they are uploaded by users, and collect same in an index for public search.) The search may yield the collection of apparently similar telephone images found in Flickr, shown in FIG. 22.

Metadata is then harvested from Flickr for each of these images, and the descriptive terms are parsed and ranked by frequency of occurrence. In the depicted set of images, for example, the descriptors harvested by this operation, and their incidence of occurrence, may be as follows:

Cisco (18)

Phone (10)

Telephone (7)

VOIP (7)

IP (5)

7941 (3)

Phones (3)

Technology (3)

7960 (2)

7920 (1)

7950 (1)

Best Buy (1)

Desk (1)

Ethernet (1)

IP-phone (1)

Office (1)

Pricey (1)

Sprint (1)

Telecommunications (1)

Uninett (1)

Work (1)
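A frequency ranking of this kind can be sketched with a counter. The sketch assumes each similar image contributes a list of descriptor strings, that terms are compared case-insensitively, and that a term is counted once per image; none of these details is stated in the text.

```python
from collections import Counter


def rank_terms(metadata_per_image):
    """Count, for each descriptor, how many images carry it; most frequent first."""
    counts = Counter()
    for terms in metadata_per_image:
        # Lower-case and de-duplicate so a term counts once per image (assumption).
        counts.update({term.lower() for term in terms})
    return counts.most_common()


# Toy input standing in for metadata harvested from three similar images.
harvested = [["Cisco", "Phone"], ["cisco", "VOIP"], ["Cisco"]]
print(rank_terms(harvested)[0])  # ('cisco', 3)
```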

From this aggregated set of inferred metadata, it may be assumed that the terms with the highest count values (i.e., the terms that occur most frequently) are the terms that most accurately characterize the user's FIG. 21 image.

The inferred metadata can be augmented or enhanced, if desired, by known image recognition/classification techniques. Such technology seeks to provide automatic recognition of objects depicted in images. For example, by recognizing a TouchTone keypad layout and a coiled cord, such a classifier may label the FIG. 21 image using the terms Telephone and Facsimile Machine.

If not already present in the inferred metadata, the terms returned by the image classifier can be added to the list and given a count value. (An arbitrary value, e.g., 2, may be used, or a value dependent on the classifier's reported confidence in the discerned identification can be employed.)

If the classifier yields one or more terms that are already present, the position of the term(s) in the list may be elevated. One way to elevate a term's position is by increasing its count value by a percentage (e.g., 30%). Another way is to increase its count value to one greater than the next-higher term that is not discerned by the image classifier. (Since the classifier returned the term "Telephone" but not the term "Cisco," this latter approach could rank the term Telephone with a count value of "19," one above Cisco.) A variety of other techniques for augmenting/enhancing the inferred metadata with that resulting from the image classifier are straightforward to implement.

A revised list of metadata, resulting from the foregoing, may be as follows:

Telephone (19)

Cisco (18)

Phone (10)

VOIP (7)

IP (5)

7941 (3)

Phones (3)

Technology (3)

7960 (2)

Facsimile machine (2)

7920 (1)

7950 (1)

Best Buy (1)

Desk (1)

Ethernet (1)

IP-phone (1)

Office (1)

Pricey (1)

Sprint (1)

Telecommunications (1)

Uninet (1)

Work (1)
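The two count-adjustment rules described before the list can be sketched as follows. This is only an illustration under stated assumptions: the function name `boost_counts`, its parameters, and the starting counts are hypothetical (a starting count of 7 for Telephone is assumed), and the "outrank" rule is read here as placing the term one above the highest-count term the classifier did not return, which reproduces the Telephone (19) example.

```python
def boost_counts(counts, classifier_terms, new_term_count=2,
                 mode="outrank", pct=0.30):
    """Return a copy of `counts` adjusted for terms an image classifier returned.

    All names and defaults here are illustrative assumptions, not part of
    the described system.
    """
    updated = dict(counts)
    for term in classifier_terms:
        if term not in updated:
            # New term: add it with an arbitrary count value (e.g., 2).
            updated[term] = new_term_count
        elif mode == "percent":
            # Existing term: raise its count by a percentage (e.g., 30%).
            updated[term] = round(updated[term] * (1 + pct))
        else:
            # Existing term: set its count to one more than the highest
            # count among terms the classifier did NOT return.
            others = [c for t, c in updated.items() if t not in classifier_terms]
            if others:
                updated[term] = max(updated[term], max(others) + 1)
    return updated


# Assumed starting counts for a few of the inferred terms.
counts = {"Cisco": 18, "Phone": 10, "Telephone": 7, "VOIP": 7, "IP": 5}
revised = boost_counts(counts, ["Telephone", "Facsimile machine"])
ranked = sorted(revised.items(), key=lambda kv: kv[1], reverse=True)
```

With these assumed inputs, Telephone moves to the top of the ranking and Facsimile machine enters the list with a count of 2, matching the revised list.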

The list of inferred metadata can be restricted to those terms that have the highest apparent reliability, e.g., the highest count values. A subset of the list comprising, e.g., the top N terms, or the terms in the top Mth percentile of the ranked list, may be used. This subset can be associated with the FIG. 21 image in a metadata repository for that image, as inferred metadata.

In the present example, if N = 4, the terms Telephone, Cisco, Phone and VOIP are associated with the FIG. 21 image.
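Restricting the list to its most reliable terms might look like the sketch below. The helper names `top_n` and `top_fraction` are hypothetical, and the fraction-based variant is only one possible reading of "top Mth percentile".

```python
import math


def top_n(counts, n):
    """Return the n terms with the highest count values, best first."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [term for term, _ in ranked[:n]]


def top_fraction(counts, fraction):
    """Return the terms in the top `fraction` (0..1) of the ranked list."""
    keep = math.ceil(len(counts) * fraction)
    return top_n(counts, keep)


# Counts taken from the head of the revised list.
revised_counts = {"Telephone": 19, "Cisco": 18, "Phone": 10,
                  "VOIP": 7, "IP": 5, "7941": 3}
subset = top_n(revised_counts, 4)
```

Here `subset` holds the four terms named in the N = 4 example.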

Once a list of metadata is assembled for the FIG. 21 image (by the foregoing procedures, or others), a variety of operations can be undertaken.

One option is to submit the metadata, along with the captured content or data derived from the captured content (e.g., the FIG. 21 image, or image feature data such as eigenvectors, color histograms, keypoint descriptors, FFTs, machine readable data decoded from the image, etc.), to a service provider that acts on the submitted data and provides a response to the user. Shazam, Snapnow (now LinkMe Mobile), ClusterMedia Labs, Snaptell (now part of Amazon's A9 search service), Mobot, Mobile Acuity, Nokia Point & Find, Kooaba, TinEye, iVisit's SeeScan, Evolution Robotics' ViPR, IQ Engine's oMoby, and Digimarc Mobile are a few of several commercially available services that capture media content and provide a corresponding response; others are detailed in the earlier-cited patent publications. By accompanying the content data with the metadata, the service provider can make a more informed judgment as to how it should respond to the user's submission.

The service provider, or the user's device, can submit the metadata descriptors to one or more other services, e.g., a web search engine such as Google, to obtain a richer set of auxiliary information that may help better discern/infer/intuit the response desired by the user. Or the information obtained from Google (or other such database resource) can be used to augment/refine the response delivered by the service provider to the user. (In some cases, the metadata, possibly accompanied by the auxiliary information received from Google, can allow the service provider to produce an appropriate response to the user without requiring the image data at all.)

In some cases, one or more images obtained from Flickr may be substituted for the user's image. This may be done, for example, if a Flickr image appears to be of higher quality (judged by sharpness, illumination histogram, or other measures), and if the image metrics are sufficiently similar. (Similarity can be judged by a distance measure appropriate to the metrics being used. One embodiment checks whether the distance measure is below a threshold. If several alternate images pass this screen, the closest image is used.) Or substitution may be used in other circumstances. The substituted image can then be used instead of (or in addition to) the captured image in the arrangements detailed herein.
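The substitution screen just described (quality check, distance threshold, closest candidate wins) can be illustrated with a small sketch. The metric vectors, quality scores, Euclidean distance, and the name `pick_substitute` are all assumptions chosen for illustration; any distance measure suited to the metrics in use could take their place.

```python
import math


def pick_substitute(user_metrics, user_quality, candidates, max_distance):
    """Return the closest higher-quality candidate within max_distance, or None.

    Each candidate is a dict with hypothetical keys "id", "metrics" (a
    feature vector) and "quality" (a scalar quality score).
    """
    best = None
    for cand in candidates:
        distance = math.dist(user_metrics, cand["metrics"])
        # Screen: must be similar enough AND of higher apparent quality.
        if distance < max_distance and cand["quality"] > user_quality:
            if best is None or distance < best[0]:
                best = (distance, cand)
    return best[1] if best else None


candidates = [
    {"id": "a", "metrics": (0.25, 0.50, 0.25), "quality": 0.9},
    {"id": "b", "metrics": (0.21, 0.50, 0.29), "quality": 0.8},
    {"id": "c", "metrics": (0.90, 0.05, 0.05), "quality": 0.95},  # too different
    {"id": "d", "metrics": (0.20, 0.50, 0.30), "quality": 0.3},   # lower quality
]
chosen = pick_substitute((0.20, 0.50, 0.30), 0.5, candidates, max_distance=0.1)
```

Candidates "a" and "b" both pass the screen; "b" is chosen because it is closest to the user's image.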

In one such arrangement, the substitute image data is submitted to the service provider. In another, data for several substitute images are submitted. In still another, the original image data, together with one or more alternative sets of image data, are submitted. In the latter two cases, the service provider can use the redundancy to help reduce the chance of error, helping ensure that an appropriate response is provided to the user. (Or the service provider can treat each submitted set of image data individually, and provide plural responses to the user. Client software on the cell phone can then assess the different responses and pick between them (e.g., by a voting arrangement), or combine the responses for the user.)

Instead of substitution, one or more related public image(s) may be composited or merged with the user's cell phone image. The resulting hybrid image can then be used in the different contexts detailed in this disclosure.

A still further option is to use apparently-similar images gleaned from Flickr to inform enhancement of the user's image. Examples include color correction/matching, contrast correction, glare reduction, removal of foreground/background objects, etc. By such an arrangement, for example, a system may discern that the FIG. 21 image has foreground components (apparently Post-It notes) on the telephone that should be masked or disregarded. The user's image data can be enhanced accordingly, and the enhanced image data used thereafter.

Relatedly, the user's image may suffer some impediment, e.g., depicting its subject from an odd perspective, or with poor lighting. This impediment may cause the user's image not to be recognized by the service provider (i.e., the image data submitted by the user does not appear to match any image data in the database being searched). Either in response to such a failure, or proactively, data from similar images identified from Flickr may be submitted to the service provider as alternatives, in the hope that they work better.

Another approach, one that opens up many further possibilities, is to search Flickr for one or more images with similar image metrics, and collect metadata as described herein (e.g., Telephone, Cisco, Phone, VOIP). Flickr is then searched a second time, based on the metadata. Plural images with similar metadata can thereby be identified. Data for these further images (including images with a variety of different perspectives, different lighting, etc.) can then be submitted to the service provider, even though they may "look" different from the user's cell phone image.

When doing metadata-based searches, identity of metadata may not be required. For example, in the second search of Flickr just referenced, four terms of metadata may have been associated with the user's image: Telephone, Cisco, Phone and VOIP. A match may be regarded as an instance in which a subset (e.g., three) of these terms is found.

Another approach is to rank matches based on the rankings of shared metadata terms. An image tagged with Telephone and Cisco would thus be ranked as a better match than an image tagged with Phone and VOIP. One adaptive way to rank a "match" is to sum the counts for the metadata descriptors of the user's image (e.g., 19 + 18 + 10 + 7 = 54), and then tally the count values for the terms shared with a Flickr image (e.g., 35, if the Flickr image is tagged with Cisco, Phone and VOIP). The ratio can then be computed (35/54, about 65%) and compared to a threshold (e.g., 60%). In this case, a "match" is found. A variety of other adaptive matching techniques can be devised by the artisan.
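The count-ratio test in the preceding paragraph is simple enough to show directly. The function name `match_ratio` is hypothetical; the counts and the 60% threshold come from the example above.

```python
def match_ratio(user_counts, candidate_tags):
    """Fraction of the user's total metadata count shared with a candidate."""
    total = sum(user_counts.values())
    shared = sum(count for term, count in user_counts.items()
                 if term in candidate_tags)
    return shared / total if total else 0.0


user_counts = {"Telephone": 19, "Cisco": 18, "Phone": 10, "VOIP": 7}

# A Flickr image tagged Cisco, Phone and VOIP shares 18 + 10 + 7 = 35 of 54.
ratio = match_ratio(user_counts, {"Cisco", "Phone", "VOIP"})
is_match = ratio >= 0.60
```

The same function also orders candidates: an image tagged Telephone and Cisco scores 37/54, well above one tagged Phone and VOIP at 17/54.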

The above examples searched Flickr for images based on similarity of image metrics, and optionally on similarity of textual (semantic) metadata. Geolocation data (e.g., GPS tags) can also be used to get a metadata toe-hold.

If the user captures an arty, abstract shot of the Eiffel Tower from amid the metalwork, or from another unusual vantage point (e.g., FIG. 29), it may not be recognized, from image metrics, as the Eiffel Tower. But GPS information captured with the image identifies the location of the image subject. Public databases (including Flickr) can be employed to retrieve textual metadata based on GPS descriptors. Inputting the GPS descriptors for the photograph yields the textual descriptors Paris and Eiffel.

Google Images, or another database, can be queried with the terms Eiffel and Paris to retrieve other, perhaps more conventional, images of the Eiffel Tower. One or more of those images can be submitted to the service provider to drive its process. (Alternatively, the GPS information from the user's image can be used to search Flickr for images from the same location, yielding imagery of the Eiffel Tower that can be submitted to the service provider.)

Although GPS is gaining in camera-metadata deployment, most imagery presently in Flickr and other public databases is missing geolocation information. But GPS information can be automatically propagated across a collection of imagery that shares visible features (judged by image metrics such as eigenvectors, color histograms, keypoint descriptors, FFTs, or other classification techniques), or that has a metadata match.

To illustrate, if the user takes a cell phone picture of a city fountain, and the image is tagged with GPS information, it can be submitted to a process that identifies matching Flickr/Google images of that fountain on a feature-recognition basis. To each of those images the process can add the GPS information from the user's image.

A second level of searching can also be employed. From the set of fountain images identified in the first search based on similarity of appearance, metadata can be harvested and ranked, as above. Flickr can then be searched a second time, for images having metadata that matches within a specified threshold (e.g., as reviewed above). To those images, too, GPS information from the user's image can be added.
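The two-level propagation just described can be sketched over an in-memory collection standing in for Flickr. Everything here is hypothetical: the record fields, the `looks_similar` callback, the 60% metadata threshold, and the `gps_inferred` flag, which is added only to mark propagated coordinates as inferred rather than camera-captured.

```python
from collections import Counter


def propagate_gps(user_image, collection, looks_similar, min_ratio=0.6):
    """Copy the user's GPS tag onto visually and metadata-matched images."""
    # Level 1: images that resemble the user's image in appearance.
    first = [img for img in collection if looks_similar(user_image, img)]

    # Harvest and count the tags found on the level-1 images.
    counts = Counter(tag for img in first for tag in img["tags"])
    total = sum(counts.values())

    # Level 2: other images whose tags match within the threshold.
    second = []
    for img in collection:
        if img in first or total == 0:
            continue
        shared = sum(c for tag, c in counts.items() if tag in img["tags"])
        if shared / total >= min_ratio:
            second.append(img)

    for img in first + second:
        if img.get("gps") is None:
            img["gps"] = user_image["gps"]
            img["gps_inferred"] = True  # assumption: flag propagated data
    return first, second


user = {"id": "u", "feature": 1.0, "gps": (45.52, -122.68)}  # made-up values
photos = [
    {"id": "a", "feature": 1.1, "tags": ["fountain", "plaza"]},
    {"id": "b", "feature": 0.9, "tags": ["fountain", "downtown"]},
    {"id": "c", "feature": 5.0, "tags": ["fountain", "plaza", "downtown"]},
    {"id": "d", "feature": 7.0, "tags": ["dog"]},
]
first, second = propagate_gps(
    user, photos, lambda u, i: abs(u["feature"] - i["feature"]) < 0.5)
```

In this toy data, images "a" and "b" match by appearance, "c" matches only by harvested metadata, and "d" is left untouched.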

Alternatively, or in addition, a first set of images in Flickr/Google similar to the user's image of the fountain can be identified, not by pattern matching but by GPS matching (or by both). Metadata can be harvested and ranked from these GPS-matched images. Flickr can then be searched a second time for a second set of images with similar metadata. To this second set of images, GPS information from the user's image can be added.

Another approach to geolocating imagery is to search Flickr for images having similar image characteristics (e.g., gist, eigenvectors, color histograms, keypoint descriptors, FFTs, etc.), and to assess geolocation data in the identified images to infer the probable location of the original image. See, e.g., Hays, et al., IM2GPS: Estimating geographic information from a single image, Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, 2008. Techniques detailed in the Hays paper are suited for use in conjunction with certain embodiments of the present technology (including the use of probability functions to quantify the uncertainty of inferential techniques).

When geolocation data is captured by the camera, it is highly reliable. Metadata (location or otherwise) that is authored by the proprietor of the image is also generally reliable. However, when metadata descriptors (geolocation or semantic) are inferred or estimated, or authored by a stranger to the image, uncertainty and other issues arise.

Desirably, such intrinsic uncertainty should be memorialized in some fashion so that later users of the metadata (human or machine) can take this uncertainty into account.

One approach is to segregate uncertain metadata from device-authored or creator-authored metadata. For example, different data structures can be used. Or different tags can be used to distinguish such classes of information. Or each metadata descriptor can have its own sub-metadata, indicating the author, creation date, and source of the data. The author or source field of the sub-metadata may hold a data string indicating that the descriptor was inferred, estimated, deduced, etc., or such information may be a separate sub-metadata tag.

Each uncertain descriptor may be given a confidence metric or rank. This data may be determined by the public, either expressly or inferentially. An example is the case where a user sees a Flickr picture she believes to be from Yellowstone, and adds a "Yellowstone" location tag together with a 95% confidence tag (her estimate of certainty about the contributed location metadata). She may add an alternate location metatag, indicating "Montana," together with a corresponding 50% confidence tag. (The confidence tags need not sum to 100%. Just one tag can be contributed, with a confidence less than 100%. Or several tags can be contributed, possibly overlapping, as in the case of Yellowstone and Montana.)

If several users contribute metadata of the same type to an image (e.g., location metadata), the combined contributions can be assessed to generate aggregate information. Such information may indicate, for example, that 5 of 6 users who contributed metadata tagged the image as Yellowstone, with an average 93% confidence; that 1 of 6 users tagged the image as Montana, with 50% confidence; and that 2 of 6 users tagged the image as Glacier National Park, with an average 15% confidence, etc.
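One way to compute such aggregate figures is sketched below. The record layout (user, tag, confidence) and the individual confidence values are invented so that they reproduce the averages quoted above; `summarize` is a hypothetical helper name.

```python
from collections import defaultdict


def summarize(contributions):
    """Aggregate (user, tag, confidence) records into per-tag statistics."""
    users = {user for user, _, _ in contributions}
    by_tag = defaultdict(list)
    for _, tag, confidence in contributions:
        by_tag[tag].append(confidence)
    return {
        tag: {
            "contributors": len(values),
            "of_users": len(users),
            "mean_confidence": sum(values) / len(values),
        }
        for tag, values in by_tag.items()
    }


# Invented sample data: six users, some contributing more than one tag.
contributions = [
    ("u1", "Yellowstone", 95), ("u2", "Yellowstone", 90),
    ("u3", "Yellowstone", 95), ("u4", "Yellowstone", 90),
    ("u5", "Yellowstone", 95), ("u1", "Montana", 50),
    ("u5", "Glacier National Park", 10), ("u6", "Glacier National Park", 20),
]
summary = summarize(contributions)
```

Because a user may contribute several tags, the per-tag contributor counts can add up to more than the number of users, as in the 5, 1 and 2 of 6 example.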

Inferential determination of metadata reliability can be performed, either when express estimates made by contributors are not available, or routinely. An example of this is the FIG. 21 photo case, in which metadata occurrence counts are used to judge the relative merit of each item of metadata (e.g., Telephone = 19 or 7, depending on the method used). Similar methods can be used to rank reliability when several metadata contributors offer descriptors for a given image.

Crowd-sourcing techniques are known for parceling image-identification tasks out to online workers and collecting the results. However, prior art arrangements are understood to seek simple, short-term consensus on identification. Better, it seems, is to quantify the diversity of opinion collected about image contents (and optionally its variation over time, and information about the sources relied upon), and to use that richer data to enable automated systems to make more nuanced decisions about imagery, its value, its relevance, its use, etc.

To illustrate, known crowd-sourcing image identification techniques may identify the FIG. 35 image with the identifiers "soccer ball" and "dog." These are the consensus terms from one or several viewers. Disregarded, however, may be information about the long tail of alternative descriptors, e.g., summer, Labrador, football, tongue, afternoon, evening, morning, fescue, etc. Also disregarded may be demographic and other information about the persons (or processes) that served as metadata identifiers, or the circumstances of their assessments. A richer set of metadata may associate with each descriptor a set of sub-metadata detailing this further information.

The sub-metadata may indicate, for example, that the tag "football" was contributed by a 21 year old male in Brazil on June 18, 2008. It may further indicate that the tags "afternoon," "evening" and "morning" were contributed by an automated image classifier at a university in Texas that made these judgments on July 2, 2008, based, e.g., on the angle of illumination on the subjects. Those three descriptors may also have associated probabilities assigned by the classifier, e.g., 50% for afternoon, 30% for evening, and 20% for morning (each of these percentages may be stored as a sub-metatag). One or more of the metadata terms contributed by the classifier may have a further sub-tag pointing to an on-line glossary that aids in understanding the assigned terms. For example, such a sub-tag may give the URL of a computer resource that associates the term "afternoon" with a definition, or synonyms, indicating that the term means noon to 7 pm. The glossary may further indicate a probability density function, indicating that the mean time meant by "afternoon" is 3:30 pm, the median time is 4:15 pm, and the term has a Gaussian function of meaning spanning the interval from noon to 7 pm.
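A descriptor carrying its own sub-metadata could be represented along the following lines. This is a minimal sketch: the class name, field names, and the `derivation` vocabulary are assumptions, and the glossary URL is left unset because the text does not give one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Descriptor:
    """A metadata term plus sub-metadata about who contributed it and how."""
    term: str
    contributor: str                      # who or what supplied the term
    contributed_on: str                   # ISO date of the contribution
    derivation: str = "human"             # e.g. "human" or "classifier"
    probability: Optional[float] = None   # classifier-assigned likelihood
    glossary_url: Optional[str] = None    # pointer to an on-line glossary


descriptors = [
    Descriptor("football", "21 year old male, Brazil", "2008-06-18"),
    Descriptor("afternoon", "image classifier, Texas university",
               "2008-07-02", "classifier", 0.50),
    Descriptor("evening", "image classifier, Texas university",
               "2008-07-02", "classifier", 0.30),
    Descriptor("morning", "image classifier, Texas university",
               "2008-07-02", "classifier", 0.20),
]


def most_probable(items, derivation):
    """Return the highest-probability descriptor of a given derivation."""
    scored = [d for d in items
              if d.derivation == derivation and d.probability is not None]
    return max(scored, key=lambda d: d.probability, default=None)


best = most_probable(descriptors, "classifier")
```

Keeping the derivation and probability alongside each term lets a later consumer, human or machine, weigh classifier output differently from human-contributed tags.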

Expertise of the metadata contributors may also be reflected in sub-metadata. The term "fescue" may have sub-metadata indicating that it was contributed by a 45 year old plant seed farmer in Oregon. An automated system can conclude that this metadata term was contributed by a person having unusual expertise in a relevant knowledge domain, and may therefore treat the descriptor as highly reliable (albeit perhaps not highly relevant). This reliability determination can be added to the metadata collection, so that other reviewers of the metadata can benefit from the automated system's assessment.

Assessment of the contributor's expertise can also be self-made by the contributor. Or it can be made otherwise, e.g., by reputational rankings using collected third party assessments of the contributor's metadata contributions. (Such reputational rankings are known, e.g., from public assessments of sellers on eBay, and of book reviewers on Amazon.) Assessments may be field-specific, so a person may be judged (or self-judged) to be knowledgeable about plant types, but not so knowledgeable about dog breeds. Again, all such information is desirably memorialized in sub-metatags (including sub-sub-metatags, when the information is about a sub-metatag).

More information about crowd-sourcing, including the use of contributor expertise, etc., is found in Digimarc's published patent application 20070162761.

Returning to the case of geolocation descriptors (which may be numeric, e.g., latitude/longitude, or textual), an image may accumulate, over time, a lengthy catalog of contributed geographic descriptors. An automated system (e.g., a server at Flickr) may periodically review the contributed geotag information, and distill it to facilitate public use. For numeric information, the process can apply known clustering algorithms to identify clusters of similar coordinates, and average them to generate a mean location for each cluster. For example, a photo of a hot spring may be tagged by some people with latitude/longitude coordinates in Yellowstone, and by others with latitude/longitude coordinates of Hell's Gate Park in New Zealand. These coordinates thus form two distinct clusters that would be averaged separately. If 70% of the contributors placed the coordinates in Yellowstone, the distilled (averaged) value may be given a confidence of 70%. Outlier data can be maintained, but given a low probability commensurate with its outlier status. Such distillation of the data by a proprietor can be stored in metadata fields that are readable by the public, but not writable.
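The distillation step can be sketched with a deliberately simple greedy clustering, used here only as a stand-in for the "known clustering algorithms" the text mentions. The radius, the function name `distill_geotags`, and the sample coordinates (rough, illustrative values for the two regions) are assumptions.

```python
def distill_geotags(points, radius_deg=1.0):
    """Group (lat, lon) tags into clusters and report mean location and share.

    Greedy assignment: a point joins the first cluster whose running mean is
    within radius_deg in both latitude and longitude. Longitude wrap-around
    at +/-180 degrees is ignored in this sketch.
    """
    clusters = []
    for lat, lon in points:
        for members in clusters:
            mean_lat = sum(p[0] for p in members) / len(members)
            mean_lon = sum(p[1] for p in members) / len(members)
            if abs(lat - mean_lat) <= radius_deg and abs(lon - mean_lon) <= radius_deg:
                members.append((lat, lon))
                break
        else:
            clusters.append([(lat, lon)])

    summary = [
        {
            "lat": sum(p[0] for p in members) / len(members),
            "lon": sum(p[1] for p in members) / len(members),
            "confidence": len(members) / len(points),
        }
        for members in clusters
    ]
    return sorted(summary, key=lambda c: c["confidence"], reverse=True)


# Illustrative tags: seven near Yellowstone, three near Rotorua, New Zealand.
tags = [(44.46, -110.83), (44.47, -110.84), (44.45, -110.82), (44.46, -110.85),
        (44.48, -110.83), (44.44, -110.81), (44.46, -110.83),
        (-38.06, 176.36), (-38.07, 176.35), (-38.05, 176.37)]
distilled = distill_geotags(tags)
```

Each cluster's confidence is simply its share of the contributed tags, so the larger cluster receives 70% and the smaller one is retained with a lower value.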

The same or other approaches can be used with added textual metadata, e.g., it can be accumulated and ranked based on frequency of occurrence, to give a sense of relative confidence.

The technology detailed in this specification finds numerous applications in contexts involving watermarking, bar-coding, fingerprinting, OCR-decoding, and other approaches for obtaining information from imagery. Consider again the FIG. 21 cell phone photo of a desk phone. Flickr can be searched based on image metrics to obtain a collection of subject-similar images (e.g., as detailed above). A data extraction process (e.g., watermark decoding, fingerprint calculation, barcode- or OCR-reading) can be applied to some or all of the resulting images, and the information gleaned thereby can be added to the metadata for the FIG. 21 image, and/or submitted to a service provider with image data (for the FIG. 21 image, and/or for related images).

From the collection of images found in the first search, text or GPS metadata can be harvested, and a second search can be conducted for similarly-tagged images. From the text tags Cisco and VOIP, for example, a search of Flickr may find a photo of the underside of the user's phone, bearing OCR-readable data, as shown in FIG. 36. Again, the extracted information can be added to the metadata for the FIG. 21 image, and/or submitted to a service provider to enhance the response it is able to provide to the user.

As just shown, a cell phone user may be given the ability to look around corners and under objects, by using one image as a portal to a large collection of related images.

User Interface

Referring to FIGS. 44 and 45A, cell phones and related portable devices 110 typically include a display 111 and a keypad 112. In addition to a numeric (or alphanumeric) keypad there is often a multi-function controller 114. One popular controller has a center button 118 and four surrounding buttons 116a, 116b, 116c and 116d (also shown in FIG. 37).

An illustrative usage model is as follows. A system responds to an image 128 (either optically captured or wirelessly received) by displaying a collection of related images to the user on the cell phone display. For example, the user captures an image and submits it to a remote service. The service determines image metrics for the submitted image (possibly after pre-processing, as detailed above), and searches (e.g., Flickr) for visually similar images. These images are transmitted to the cell phone (e.g., by the service, or directly from Flickr), and they are buffered for display. The service can prompt the user, e.g., by instructions presented on the display, to repeatedly press (or press and hold) the right-arrow button 116b on the four-way controller to view a sequence of pattern-similar images (130, FIG. 45A). Each time the button is pressed, another one of the buffered apparently-similar images is displayed.

By techniques like those described earlier, or otherwise, the remote service can also search for images that are similar in geographic location to the submitted image. These too can be sent to the cell phone and buffered. The instructions may advise the user to press the left-arrow button 116d of the controller to review these GPS-similar images (FIG. 45A, 132).

Similarly, the service can search for images that are similar in metadata to the submitted image (e.g., based on textual metadata inferred from other images identified by pattern matching or GPS matching). Again, these images can be sent to the phone and buffered for immediate display. The instructions may advise the user to press the up-arrow button 116a of the controller to view these metadata-similar images (FIG. 45A, 134).

Thus, by pressing the right, left and up buttons, the user can review images that are similar to the captured image in appearance, location or metadata descriptors.

Whenever such review reveals a picture of particular interest, the user can press the down button 116c. This action identifies the currently viewed picture to the service provider, which can then repeat the process with that user-selected picture as the base image. Button presses then enable review of images that are similar to the new base image in appearance (116b), location (116d) or metadata (116a).

This process can continue indefinitely. At some point the user can press the center button 118 of the four-way controller. This action submits the then-displayed image to a service provider for further action (e.g., triggering a corresponding response, as disclosed in the earlier-cited documents). This action may involve a different service provider than the one that supplied all of the alternative imagery, or they can be the same. (In the latter case, the finally selected image need not be sent to the service provider, since that provider knows all of the images buffered by the cell phone and can track which image is currently being displayed.)
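The four-way controller behavior described above can be sketched as a small state machine. This is only an illustration under stated assumptions: the class name `ImageBrowser`, the `fetch_feeds` callback (standing in for the remote service) and the string button names are hypothetical and do not appear in the specification.

```python
# Hypothetical button-to-dimension mapping, following the text:
# right 116b = appearance, left 116d = location, up 116a = metadata.
DIMENSION_BY_BUTTON = {"right": "appearance", "left": "location", "up": "metadata"}


class ImageBrowser:
    def __init__(self, base_image, fetch_feeds):
        # fetch_feeds is an assumed callback standing in for the remote service:
        # it maps a base image id to {dimension: [buffered image ids]}.
        self.fetch_feeds = fetch_feeds
        self._rebase(base_image)

    def _rebase(self, image):
        """Make `image` the base image and buffer its three feeds."""
        self.base_image = image
        self.displayed = image
        self.feeds = self.fetch_feeds(image)
        self.positions = {dim: 0 for dim in DIMENSION_BY_BUTTON.values()}

    def press(self, button):
        """Handle one button press; return (action, image id)."""
        if button in DIMENSION_BY_BUTTON:
            dim = DIMENSION_BY_BUTTON[button]
            feed = self.feeds.get(dim, [])
            if self.positions[dim] < len(feed):
                # Show the next buffered image in this dimension.
                self.displayed = feed[self.positions[dim]]
                self.positions[dim] += 1
        elif button == "down":
            # Down button 116c: the displayed picture becomes the new base image.
            self._rebase(self.displayed)
        elif button == "center":
            # Center button 118: submit the displayed image for further action.
            return ("submit", self.displayed)
        return ("show", self.displayed)
```

Note that each dimension keeps its own position, so pressing right and then left does not undo the right press; it shows the first image of a different feed.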

The dimensions of information browsing just detailed (similar-appearance images; similar-location images; similar-metadata images) can be different in other embodiments. Consider, for example, an embodiment that takes an image of a house (or a latitude/longitude) as input and returns the following sequences of images: (a) houses for sale nearest in location to the input-imaged house; (b) houses for sale nearest in price to the input-imaged house; and (c) houses for sale nearest in features (e.g., bedrooms/bathrooms) to the input-imaged house. (The set of houses displayed may be constrained, e.g., by zip code, metropolitan area, school district, or other qualifier.)

Another example of this user interface technique is the presentation of search results from eBay for auctions listing Xbox 360 game consoles. One dimension can be price (e.g., pressing button 116b yields a sequence of screens showing Xbox 360 auctions, starting with the lowest-priced); another can be the seller's geographic proximity to the user (nearest to farthest, shown by pressing button 116d); another can be the time until the end of the auction (shortest to longest, presented by pressing button 116a). Pressing the middle button 118 can load the full web page of the auction being displayed.
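In the auction example, each button amounts to a different sort key over the same result set. A minimal sketch follows, assuming hypothetical field names (`price`, `distance_km`, `seconds_left`) and invented sample data; no real eBay API is used.

```python
# Each browsing dimension is a sort key; the button labels follow the text.
SORT_KEY_BY_BUTTON = {
    "116b": lambda a: a["price"],         # lowest price first
    "116d": lambda a: a["distance_km"],   # nearest seller first
    "116a": lambda a: a["seconds_left"],  # soonest-ending auction first
}


def sequence_for(button, auctions):
    """Return auction ids in the order their screens would be presented."""
    return [a["id"] for a in sorted(auctions, key=SORT_KEY_BY_BUTTON[button])]


# Invented sample data for illustration only.
AUCTIONS = [
    {"id": "x1", "price": 199.0, "distance_km": 850.0, "seconds_left": 3600},
    {"id": "x2", "price": 149.0, "distance_km": 12.0, "seconds_left": 86400},
    {"id": "x3", "price": 175.0, "distance_km": 300.0, "seconds_left": 600},
]
```

For instance, `sequence_for("116b", AUCTIONS)` orders the sample auctions by ascending price, while `sequence_for("116a", AUCTIONS)` orders them by time remaining.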

A related example is a system that responds to a user-captured image of a car by identifying the car (using image features and associated database(s)), searching eBay and Craigslist for similar cars, and presenting the results on the screen. Pressing button 116b presents screens of information about cars offered for sale nationwide (e.g., including image, seller location and price), ordered by similarity to the input image (same model year/same color first, then nearest model years/colors). Pressing button 116d yields such a sequence of screens, but limited to the user's state (or metropolitan region, or a 50 mile radius of the user's location, etc.). Pressing button 116a yields such a sequence of screens, again limited geographically, but this time presented in order of ascending price (rather than nearest model year/color). Again, pressing the middle button loads the full web page (eBay or Craigslist) of the car last displayed.

Another embodiment is an application that helps people recall names. A user sees a familiar person at a party but cannot remember his name. Surreptitiously, the user snaps a picture of the person, and the image is forwarded to a remote service provider. The service provider extracts facial recognition parameters and searches for similar-appearing faces on social networking sites (e.g., Facebook, MySpace, LinkedIn), or in a separate database containing facial recognition parameters for images on those sites. (The service may provide the user's sign-on credentials to the sites, allowing it to search information that is not otherwise publicly accessible.) Names and other information about similar-appearing persons located by the search are returned to the user's cell phone to help refresh the user's memory.

Various UI procedures are contemplated. When data is returned from the remote service, the user may press button 116b to scroll through matches in order of closest similarity, regardless of geography. Thumbnails of the matched individuals, with associated name and other profile information, can be displayed, or just full-screen images of the person can be presented, with the name overlaid. When the familiar person is recognized, the user may press button 118 to load the full Facebook/MySpace/LinkedIn page for that person. Alternatively, instead of presenting images with names, just a textual list of names may be presented, e.g., all on a single screen, ordered by similarity of face match; SMS text messaging can suffice for this last arrangement.

Pressing button 116d may scroll through matches in order of closest similarity, limited to people who list their residence as within a certain geographic proximity (e.g., same metropolitan area, same state, same campus, etc.) of the user's present location or the user's reference location (e.g., home). Pressing button 116a may yield a similar display, but limited to persons who are "friends" of the user within a social network (or who are friends of friends, or who are within another specified degree of separation of the user).

A related arrangement is a law enforcement tool in which an officer captures an image of a person and submits it to a database containing facial portrait/eigenvalue information from government driver license records and/or other sources. Pressing button 116b causes the screen to display a sequence of images/biographical dossiers about persons nationwide having the closest face matches. Pressing button 116d causes the screen to display a similar sequence, but limited to persons within the officer's state. Button 116a yields such a sequence, but limited to persons within the metropolitan area in which the officer is working.

Instead of three dimensions of information browsing (buttons 116b, 116d, 116a, e.g., for similar-appearing images/similarly located images/similar metadata-tagged images), more or fewer dimensions can be employed. FIG. 45B shows browsing screens in just two dimensions. (Pressing the right button yields a first sequence 140 of information screens; pressing the left button yields a different sequence 142 of information screens.)

Instead of two or more distinct buttons, a single UI control can be employed to navigate the available dimensions of information. A joystick is one such device. Another is a roller wheel (or scroll wheel). The portable device 110 of FIG. 44 has a roller wheel 124 on its side, which can be rolled up or rolled down. It can also be pressed in to make a selection (e.g., akin to buttons 116c or 118 of the earlier-discussed controller). Similar controls are available on many mice.

In most user interfaces, opposing buttons (e.g., right button 116b and left button 116d) navigate the same dimension of information, just in opposite directions (e.g., forward/reverse). In the particular interface discussed above, it will be recognized that this is not the case (although in other implementations it may be). Pressing the right button 116b and then the left button 116d does not return the system to its original state. Instead, pressing the right button gives, e.g., a first similar-appearing image, and pressing the left button gives, e.g., a first similarly located image.

Sometimes it is desirable to navigate through the same sequence of screens, but in the reverse of the order just reviewed. Various interface controls can be employed to do this.

One is a "Reverse" button. The device 110 of FIG. 44 includes a variety of buttons not yet discussed (e.g., buttons 120a-120f around the periphery of the controller 114). Any of these, if pressed, can serve to reverse the scrolling order. By pressing button 120a, for example, the scrolling (presentation) direction associated with the nearby button 116b can be reversed. So if button 116b normally presents items in order of increasing cost, activation of button 120a can cause the function of button 116b to switch, e.g., to presenting items in order of decreasing cost. If, in reviewing the screens resulting from use of button 116b, the user "overshoots" and wants to reverse direction, she can press button 120a and then press button 116b again. The screen(s) earlier presented then appear in reverse order, starting from the present screen.

Alternatively, operation of such a button (e.g., 120a or 120f) can cause the opposite button 116d to scroll back, in reverse order, through the screens presented by activation of button 116b.

A textual or symbolic prompt can be overlaid on the display screen in all of these embodiments, informing the user of the dimension of information being browsed and the direction (e.g., browsing by cost: increasing).

In still other arrangements, a single button can perform multiple functions. For example, pressing button 116b can cause the system to start presenting a sequence of screens, e.g., showing pictures of houses for sale nearest the user's location, presenting one every 800 milliseconds (an interval set by preference data entered by the user). Pressing button 116b a second time can cause the system to stop the sequence, displaying a static screen of a house for sale. Pressing button 116b a third time can cause the system to present the screens in reverse order, starting with the static screen and going backwards through the screens earlier presented. Repeated operation of buttons 116a, 116b, etc., can operate likewise (but control different sequences of information, e.g., houses closest in price and houses closest in features).
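The single-button behavior just described (first press plays forward, second press pauses, third press plays in reverse) can be sketched as follows. The class name, the `tick` method and the idea of an external timer calling `tick` once per interval are assumptions made for illustration; a fourth press is assumed here to resume forward play, which the text does not specify.

```python
class SlideshowButton:
    """One button cycling through: play forward -> pause -> play in reverse."""

    def __init__(self, screens, interval_ms=800):
        self.screens = screens
        self.interval_ms = interval_ms  # user-preference interval (800 ms in the text)
        self.index = -1                 # no screen shown yet
        self.step = 0                   # +1 forward, 0 paused, -1 reverse
        self.presses = 0

    def press(self):
        self.presses += 1
        # 1st press: forward, 2nd: pause, 3rd: reverse, then the cycle repeats.
        self.step = {1: +1, 2: 0, 0: -1}[self.presses % 3]

    def tick(self):
        """Advance one interval; return the screen to display, or None."""
        nxt = self.index + self.step
        if 0 <= nxt < len(self.screens):
            self.index = nxt            # stay on the end screens rather than wrap
        return self.screens[self.index] if self.index >= 0 else None
```

A timer firing every `interval_ms` milliseconds would call `tick()` and draw the returned screen; clamping at either end is a design choice of this sketch, not something the text prescribes.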

In arrangements in which the presented information stems from a process applied to a base image (e.g., a picture snapped by the user), this base image may be presented on the display throughout, e.g., as a thumbnail in a corner of the display. Or a button on the device (e.g., 126a or 120b) can be operated to immediately recall the base image to the display.

Touch interfaces are gaining in popularity, such as in products available from Apple and Microsoft (detailed, e.g., in Apple's patent publications 20060026535, 20060026536, 20060250377, 20080211766, 20080158169, 20080158172, 20080204426 and 20080174570, and Microsoft's patent publications 20060033701, 20070236470 and 20080001924). Such technologies can be employed to enhance and extend the user interface concepts just reviewed, allowing greater degrees of flexibility and control. Each button press noted above can have a counterpart gesture in the vocabulary of the touch screen system.

For example, different touch-screen gestures can invoke display of the different types of image feeds just reviewed. A brushing gesture to the right, for example, may present a rightward-scrolling series of image frames 130 of imagery having similar visual content (with the initial speed of scrolling dependent on the speed of the user gesture, and with the scrolling speed decelerating, or not, over time). A brushing gesture to the left may present a similar leftward-scrolling display of imagery 132 having similar GPS information. A brushing gesture upward may present an upward-scrolling display of imagery 134 similar in metadata. At any point the user can tap one of the displayed images to make it the base image, with the process repeating.

Other gestures can invoke still other actions. One such action is displaying overhead imagery corresponding to the GPS location associated with a selected image. The imagery can be zoomed in/out with other gestures. The user can select for display photographic imagery, map data, data from different times of day or different dates/seasons, and/or various overlays (topographic, places of interest, and other data, as is familiar from Google Earth), etc. Icons or other graphics may be presented on the display depending on the contents of particular imagery. One such arrangement is detailed in Digimarc's published application 20080300011.

"Curbside" or "street-level" imagery, rather than overhead imagery, can also be displayed.

It will be recognized that certain embodiments of the present technology include a shared general structure. An initial set of data (e.g., an image, or metadata such as descriptors or geocode information, or image metrics such as eigenvalues) is presented. From this, a second set of data (e.g., images, or image metrics, or metadata) is obtained. From that second set of data, a third set of data is compiled (e.g., images with similar image metrics or similar metadata, or image metrics, or metadata). Items from the third set of data can be used as a result of the process, or the process may continue, e.g., by using the third set of data in determining a fourth set of data (e.g., a set of descriptive metadata can be compiled from the images of the third set). This can continue, e.g., determining a fifth set of data from the fourth (e.g., identifying a collection of images that have metadata terms from the fourth data set). A sixth set of data can be obtained from the fifth (e.g., identifying clusters of GPS data with which images in the fifth set are tagged), and so on.

The sets of data can be images, or they can be other types of data (e.g., image metrics, textual metadata, geolocation data, decoded OCR-, barcode-, watermark-data, etc.).

Any data can serve as the seed. The process can start with image data, or with other information, such as image metrics, textual metadata (also known as semantic metadata), geolocation information (e.g., GPS coordinates), decoded OCR/barcode/watermark data, etc. From a first type of information (image metrics, semantic metadata, GPS info, decoded info), a first set of information-similar images can be obtained. From that first set, a second, different type of information (image metrics/semantic metadata/GPS/decoded info, etc.) can be gathered. From that second type of information, a second set of information-similar images can be obtained. From that second set, a third, different type of information (image metrics/semantic metadata/GPS/decoded info, etc.) can be gathered. From that third type of information, a third set of information-similar images can be obtained. And so on.

Thus, while the illustrated embodiments generally start with an image and then proceed by reference to its image metrics, entirely different combinations of acts are also possible. The seed can be the payload from a product barcode. This can generate a first collection of images depicting the same barcode. This can lead to a set of common metadata. That can lead to a second collection of images based on that metadata. Image metrics may be computed from this second collection, and the most prevalent metrics can be used to search for and identify a third collection of images. The images thus identified can be presented to the user using the arrangements noted above.
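The barcode-seeded chain above (barcode, then images, then common metadata, then more images, then image metrics, then a third collection) can be illustrated with a toy in-memory corpus. Everything in this sketch is an assumption for illustration: the corpus, the field names, and especially the reduction of "image metrics" to a single discrete label, which a real system would replace with numeric feature comparison.

```python
from collections import Counter

# Toy corpus; every field name and value is invented for illustration.
CORPUS = [
    {"id": "a", "barcode": "0123", "tags": {"cola", "can"}, "metric": "red"},
    {"id": "b", "barcode": "0123", "tags": {"cola", "picnic"}, "metric": "red"},
    {"id": "c", "barcode": None, "tags": {"cola", "bottle"}, "metric": "red"},
    {"id": "d", "barcode": None, "tags": {"picnic"}, "metric": "green"},
    {"id": "e", "barcode": None, "tags": {"truck"}, "metric": "red"},
]


def images_with_barcode(payload):
    return [im for im in CORPUS if im["barcode"] == payload]


def common_tags(images, min_count=2):
    """Tags that appear on at least `min_count` of the given images."""
    counts = Counter(tag for im in images for tag in im["tags"])
    return {tag for tag, n in counts.items() if n >= min_count}


def images_with_any_tag(tags):
    return [im for im in CORPUS if im["tags"] & tags]


def browse_from_barcode(payload):
    first = images_with_barcode(payload)    # first collection: same barcode
    shared = common_tags(first)             # common metadata
    second = images_with_any_tag(shared)    # second collection: shares that metadata
    if not second:
        return []
    # Most prevalent "metric" in the second collection drives the third search.
    metric = Counter(im["metric"] for im in second).most_common(1)[0][0]
    third = [im for im in CORPUS if im["metric"] == metric]
    return [im["id"] for im in third]
```

Each stage alternates between a set of images and a type of information derived from it, which is the general structure the text describes; the final list is what would be handed to the browsing interface.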

Certain embodiments of the present technology may be regarded as employing an iterative, recursive process by which information about one set of images (a single image in many initial cases) is used to identify a second set of images, which may in turn be used to identify a third set of images, etc. The function by which each set of images is related to the next relates to a particular class of image information, e.g., image metrics, semantic metadata, GPS, decoded info, etc.

In other contexts, the relationship between one set of images and the next is a function not just of one class of information, but of two or more. For example, a seed user image may be examined for both image metrics and GPS data. From these two classes of information a collection of images can be determined: images that are similar in both some aspect of visual appearance and location. Other pairings, triplets, etc., of relationships can naturally be employed in the determination of any of the successive sets of images.
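Selecting images that are similar in two classes of information at once can be sketched as a conjunction of two distance tests. The sketch below assumes each image carries a small numeric feature vector (`metrics`) and a latitude/longitude; the thresholds, field names and the use of Euclidean and great-circle (haversine) distance are illustrative choices, not taken from the specification.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def similar_in_both(seed, candidates, max_metric_dist=0.5, max_km=1.0):
    """Ids of candidates close to the seed in both appearance and location."""
    return [
        c["id"]
        for c in candidates
        if euclidean(seed["metrics"], c["metrics"]) <= max_metric_dist
        and haversine_km(seed["lat"], seed["lon"], c["lat"], c["lon"]) <= max_km
    ]
```

Adding a third class of information (e.g., shared metadata terms) would simply add a third condition to the filter.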

Further Discussion

Some embodiments of the present technology analyze a consumer cell phone picture and heuristically determine information about the picture's subject. For example, is it a person, place or thing? From this high-level determination, the system can better formulate what type of response might be sought by the consumer, making operation more intuitive.

For example, if the subject of the photo is a person, the consumer might be interested in adding the depicted person as a Facebook "friend." Or sending a text message to that person. Or publishing an annotated version of the photo to a web page. Or simply learning who the person is.

If the subject is a place (e.g., Times Square), the consumer might be interested in the local geography, maps, and nearby attractions.

If the subject is a thing (e.g., the Liberty Bell or a bottle of beer), the consumer may be interested in information about the object (e.g., its history, others who use it), or in buying or selling the object, etc.

Based on the image type, an illustrative system/service can identify one or more actions that it expects the consumer will find most appropriately responsive to the cell phone image. One or all of these can be undertaken and cached on the consumer's cell phone for review. For example, scrolling a thumbwheel on the side of the cell phone may present a succession of different screens, each with different information responsive to the image subject. (Or a screen may be presented that queries the consumer as to which of a few possible actions is desired.)

In use, the system can monitor which of the available actions is chosen by the consumer. The consumer's usage history can be employed to refine a Bayesian model of the consumer's interests and desires, so that future responses can be better customized to the user.
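One simple way to keep a running model of which actions a consumer tends to choose is to count selections and smooth the counts. The sketch below assumes add-one (Laplace) smoothing over a fixed list of actions; the class name and the action labels are hypothetical, and the specification does not say which model is actually used.

```python
from collections import Counter


class ActionPreferences:
    """Per-user action counts with add-one smoothing (an assumed, minimal model)."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.counts = Counter()

    def record(self, action):
        """Note that the consumer chose `action`."""
        if action not in self.actions:
            raise ValueError(f"unknown action: {action}")
        self.counts[action] += 1

    def probability(self, action):
        """Smoothed estimate that the consumer will choose `action` next."""
        total = sum(self.counts.values())
        return (self.counts[action] + 1) / (total + len(self.actions))

    def ranked(self):
        """Actions ordered from most to least likely; ties keep their original order."""
        return sorted(self.actions, key=lambda a: -self.probability(a))
```

The ranking could decide which response screens are prepared and cached first for a given image type.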

These concepts will be clearer by example (aspects of which are depicted, e.g., in FIGS. 46 and 47).

Processing a Set of Sample Images

Assume a tourist snaps a photo of the Prometheus statue at Rockefeller Center in New York using a cell phone or other mobile device. Initially, it is just a bunch of pixels. What to do?

Assume the image is geocoded with location information (e.g., latitude/longitude in XMP- or EXIF-metadata).

From the geocode data, a search of Flickr can be undertaken for a first set of images taken from the same (or nearby) location. Perhaps there are 5 or 500 images in this first set.

Metadata from this set of images is collected. The metadata can be of various types. One is words/phrases from a title given to an image. Another is information in metatags assigned to the image, usually by the photographer (e.g., naming the photo subject and certain attributes/keywords), but additionally by the capture device (e.g., identifying the camera model, the date/time of the photo, the location, etc.). Another is words/phrases in a narrative description of the photo authored by the photographer.

Some metadata terms may be repeated across different images. Descriptors common to two or more images can be identified (clustered), and the most popular terms may be ranked. (Such a listing is shown at "A" in FIG. 46A. Here, and in other metadata listings, only partial results are given for expository convenience.)
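The ranking of shared descriptors can be sketched in a few lines. This is a minimal illustration, assuming each image's metadata has already been reduced to a list of terms; the function name, the lowercase normalization and the tie-breaking rule are choices made for this sketch.

```python
from collections import Counter


def rank_shared_terms(images_metadata, min_images=2):
    """Rank terms by how many images carry them, keeping terms on >= min_images images.

    images_metadata is a list with one list of terms per image.
    """
    counts = Counter()
    for terms in images_metadata:
        # Count each term once per image, ignoring case and repeats within an image.
        counts.update({t.lower() for t in terms})
    shared = [(term, n) for term, n in counts.items() if n >= min_images]
    # Most popular first; alphabetical order breaks ties.
    return sorted(shared, key=lambda tn: (-tn[1], tn[0]))
```

Applied to the metadata of the geographically matched image set, the head of the returned list corresponds to the kind of ranked listing the text refers to.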

From the metadata, and from other analysis, it may be possible to determine which images in the first set are likely person-centric, which are place-centric, and which are thing-centric.

Consider the metadata with which a set of 50 images may be tagged. Some of the terms relate to place. Some relate to the persons depicted in the images. Some relate to objects.

Place-Centric Processing

Terms that relate to place can be identified using various techniques. One is to use a database of geographic information to look up location descriptors near a given geographic position. Yahoo's GeoPlanet service, for example, when queried with the latitude/longitude of Rockefeller Center, returns a hierarchy of descriptors such as "Rockefeller Center", "10024" (a zip code), "Midtown Manhattan", "New York", "Manhattan", "New York" and "United States".

On request, the same service can return the names of adjoining/sibling neighborhoods/features, e.g., "10017", "10020", "10036", "Theater District", "Carnegie Hall", "Grand Central Station", "American Folk Art Museum", etc.

Given a set of latitude/longitude coordinates or other location information, nearby street names can be obtained from a variety of mapping programs.

A glossary of nearby place-descriptors can be compiled in this manner. The metadata obtained from the set of Flickr images can then be analyzed, by reference to the glossary, to identify the terms that relate to place (e.g., those that match terms in the glossary).

Consideration then turns to the use of this place-related metadata in the reference set of images collected from Flickr.

Some images may have no place-related metadata. These images are likely person-centric or object-centric, rather than place-centric.

Other images may have metadata that is exclusively place-related. These images are likely place-centric, rather than person-centric or object-centric.

In between are images that have both place-related metadata and other metadata. Various rules can be devised and used to assign each such image a relative relevance to place.

One rule looks at the number of metadata descriptors associated with an image and determines the fraction that is found in the glossary of place-related terms. This is one metric.
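As an illustration of this first metric, the sketch below computes the percentage of an image's descriptors that match a place glossary. The function name `place_fraction`, the 0-100 scaling, and exact case-insensitive matching are assumptions; a practical system would likely need fuzzier matching.

```python
def place_fraction(descriptors, place_glossary):
    """Percentage (0-100) of an image's descriptors found in the place glossary."""
    normalized = [d.strip().lower() for d in descriptors if d.strip()]
    if not normalized:
        return 0.0  # no metadata at all: no evidence of place-centricity
    glossary = {g.strip().lower() for g in place_glossary}
    hits = sum(1 for d in normalized if d in glossary)
    return 100.0 * hits / len(normalized)

# Hypothetical glossary compiled for the area around Rockefeller Center.
glossary = {"Rockefeller Center", "Midtown Manhattan", "New York"}
m1 = place_fraction(["Rockefeller Center", "Elizabeth", "new york", "happy"], glossary)
```

Here two of the four descriptors are place terms, so the metric evaluates to 50.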

Another looks at where in the metadata the place-related descriptors appear. If they appear in the image title, they are likely more relevant than if they appear at the end of a long narrative description of the photograph. The placement of the place-related metadata is another metric.

Consideration can also be given to the particularity of the place-related descriptor. A descriptor such as "New York" or "USA" may be less indicative that an image is place-centric than a more particular descriptor, such as "Rockefeller Center" or "Grand Central Station". This can yield a third metric.

A related, fourth metric considers the frequency of occurrence (or improbability) of a term, either within the collected metadata alone or within a superset of that data. From this standpoint, "RCA Building" is more relevant than "Rockefeller Center" because it occurs much less frequently.

These and other metrics can be combined to assign each image in the set a place score indicating its potential place-centricity.

The combination can be a straight sum of the four factors, each ranging from 0 to 100. More likely, however, some metrics will be weighted more heavily. The following equation, employing metrics M1, M2, M3 and M4, can be used to yield a score S, with the factors A, B, C and D and the exponents W, X, Y and Z determined experimentally or by Bayesian techniques:

S = (A*M1)^W + (B*M2)^X + (C*M3)^Y + (D*M4)^Z
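A minimal sketch of such a combination follows, assuming the weighted form S = (A*M1)^W + (B*M2)^X + (C*M3)^Y + (D*M4)^Z that the named factors and exponents imply. The function name `place_score` and the default values (which reduce to the straight sum) are placeholders; the text says the real values would be found experimentally or by Bayesian techniques.

```python
def place_score(m1, m2, m3, m4,
                factors=(1.0, 1.0, 1.0, 1.0),
                exponents=(1.0, 1.0, 1.0, 1.0)):
    """Combine four metrics (each 0-100) into a single place score S."""
    metrics = (m1, m2, m3, m4)
    for m in metrics:
        if not 0 <= m <= 100:
            raise ValueError("each metric is expected in the range 0-100")
    # S = (A*M1)^W + (B*M2)^X + (C*M3)^Y + (D*M4)^Z
    return sum((a * m) ** w for a, m, w in zip(factors, metrics, exponents))

straight_sum = place_score(50, 80, 20, 10)                      # defaults: plain sum
weighted = place_score(50, 80, 20, 10, factors=(2, 1, 1, 1))    # M1 counts double
```

With non-unit factors or exponents the score is no longer bounded by 400, so scores are best compared within one parameter setting or rescaled afterwards.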

Person-Centric Processing

A similar approach can be used to estimate the person-centricity of each image in the set obtained from Flickr.

As in the example just given, a glossary of relevant terms can be compiled, this time of terms associated with people. In contrast to the place-name glossary, the person-name glossary can be global rather than associated with a particular locale. (However, different glossaries may be appropriate for different countries.)

Such a glossary can be compiled from various sources, including telephone directories, lists of the most popular names, and other reference works in which names appear. The list may begin: "Aaron, Abigail, Adam, Addison, Adrian, Aidan, Aiden, Alex, Alexa, Alexander, Alexandra, Alexis, Allison, Alyssa, Amelia, Andrea, Andrew, Angel, Angelina, Anna, Anthony, Antonio, Ariana, Arianna, Ashley, Aubrey, Audrey, Austin, Autumn, Ava, Avery..."

First names alone can be considered, or last names can be considered too. (Some names may be either a place name or a person name. Searching for adjoining first/last names and/or adjoining place names can help distinguish ambiguous cases. For example, Elizabeth Smith is a person; Elizabeth, NJ is a place.)

Personal pronouns and the like can also be included in such a glossary (e.g., he, she, him, her, his, our, I, me, myself, we, they, them, mine, their). Nouns identifying people and personal relationships can also be included (e.g., uncle, sister, daughter, grandfather, boss, student, employee, wedding, etc.).

Adjectives and adverbs that are usually applied to people may also be included in the person-term glossary (e.g., happy, boring, blonde, etc.), as can the names of objects and attributes that are usually associated with people (e.g., t-shirt, backpack, sunglasses, tanned, etc.). Verbs associated with people can also be used (e.g., surfing, drinking).

(In this last group, as in some others, there are terms that could also apply to object-centric images rather than person-centric ones. The term "sunglasses" may appear in metadata for an image depicting sunglasses alone; "happy" may appear in metadata for an image depicting a dog. There are also cases where a person-term may also be a place-term (e.g., Boring, Oregon). In more sophisticated embodiments, glossary terms can be associated with respective confidence metrics, by which any results based on such terms may be discounted or otherwise acknowledged to have different degrees of uncertainty.)

As before, if an image is not associated with any person-related metadata, the image can be judged unlikely to be person-centric. Conversely, if all of the metadata is person-related, the image is likely person-centric.

For other cases, metrics like those reviewed above can be assessed and combined to yield a score indicating the relative person-centricity of each image, e.g., based on the number, placement, particularity and/or frequency/improbability of the person-related metadata associated with the image.

While analysis of metadata gives useful information about whether an image is person-centric, other techniques can also be used, either alternatively or in conjunction with metadata analysis.

One technique is to analyze the image looking for contiguous areas of skin-tone colors. Such areas characterize many person-centric images, but are less frequently found in images of places and objects.

A related technique is facial recognition. This science has advanced to the point where even inexpensive point-and-shoot digital cameras can quickly and reliably identify faces within an image frame (e.g., to focus or expose the image based on such subjects).

(Face-finding technology is detailed, e.g., in patents 5,781,650 (University of Central Florida), 6,633,655 (Sharp), 6,597,801 (Hewlett-Packard) and 6,430,306 (L-1 Corp.), and in Yang et al., Detecting Faces in Images: A Survey, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 1, January 2002, pp. 34-58, and Zhao et al., Face Recognition: A Literature Survey, ACM Computing Surveys, 2003, pp. 399-458.)

Facial recognition algorithms can be applied to the set of reference images obtained from Flickr, to identify those that have evident faces and to identify the portions of the images corresponding to the faces.

Of course, many photos have faces depicted incidentally within the image frame. While all images having a face could be identified as person-centric, most embodiments use further processing to provide a more refined assessment.

One form of further processing is to determine the percentage of the image frame's area occupied by the identified face(s). The higher the percentage, the more likely the image is person-centric. This is another metric that can be used in determining an image's person-centric score.
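The face-area metric can be sketched as below. The function name and the `(x, y, width, height)` bounding-box convention are assumptions, as is the simplification that detected face boxes do not overlap; the face detector itself is outside the scope of the sketch.

```python
def face_area_percent(frame_width, frame_height, face_boxes):
    """Percentage of the frame covered by face boxes given as (x, y, width, height).

    Assumes the boxes do not overlap; overlapping boxes would be double-counted.
    """
    frame_area = frame_width * frame_height
    if frame_area <= 0:
        raise ValueError("frame dimensions must be positive")
    face_area = sum(w * h for _x, _y, w, h in face_boxes)
    return min(100.0, 100.0 * face_area / frame_area)

# Two hypothetical detections in a 1000 x 500 frame.
pct = face_area_percent(1000, 500, [(0, 0, 200, 250), (300, 0, 100, 250)])
```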

Another form of further processing is to look for (1) the presence of one or more faces in the image, together with (2) person-descriptors in the metadata associated with the image. In this case, the facial recognition data can be used as a "plus" factor to increase an image's person-centric score that is based on metadata or other analysis. (The "plus" can take various forms. For example, a score (on a 0-100 scale) can be increased by 10, or by 10%, or by half the remaining distance to 100, etc.)
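The three forms of "plus" mentioned above can be sketched as follows. The function name `apply_plus`, the mode names, and the clamping to the 0-100 range are assumptions for illustration.

```python
def apply_plus(score, mode="add", amount=10.0):
    """Boost a 0-100 score using one of the three 'plus' forms described above."""
    if mode == "add":                 # increase by a fixed number of points
        boosted = score + amount
    elif mode == "percent":           # increase by a percentage of the current score
        boosted = score * (1 + amount / 100.0)
    elif mode == "half_remaining":    # move half the remaining distance to 100
        boosted = score + (100.0 - score) / 2.0
    else:
        raise ValueError(f"unknown mode: {mode}")
    return min(100.0, max(0.0, boosted))  # keep the result on the 0-100 scale
```

For a starting score of 60, the three modes give 70, 66 and 80 respectively; the half-remaining form has the property that it can never push a score past 100.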

Thus, for example, a photo tagged with "Elizabeth" metadata is more likely to be a person-centric photo if the facial recognition algorithm finds a face within the image than if no face is found.

(Conversely, the absence of any face in an image can be used as a "plus" factor to increase the confidence that the image subject is of a different type, e.g., a place or an object. Thus, an image tagged with Elizabeth as metadata, but lacking any face, is more likely to relate to a place named Elizabeth or to something named Elizabeth, such as a pet.)

Still more confidence in the determination can be assumed if the facial recognition algorithm identifies a face as female and the metadata includes a female name. Such an arrangement, of course, requires that the glossary (or another data structure) have data that associates genders with at least some names.

(Still more sophisticated arrangements can be implemented. For example, the age of the depicted person(s) can be estimated using automated techniques (e.g., as detailed in patent 5,781,650, to the University of Central Florida). Names found in the image metadata can also be processed to estimate the age of the person(s) so named. This can be done using public domain information about the statistical distribution of a name as a function of age (e.g., from published Social Security Administration data, and from web sites that detail the most popular names from birth records). Thus, the names Mildred and Gertrude may be associated with an age distribution that peaks at age 80, whereas Madison and Alexis may be associated with an age distribution that peaks at age 8. Finding a statistically likely correspondence between a metadata name and an estimated person age can further increase the person-centric score for an image. A statistically unlikely correspondence can be used to decrease the person-centric score. (Estimated information about the age of a subject in the consumer's image can also be used to tailor the intuited response(s), as can information about the subject's gender.))

Just as detection of a face in an image can be used as a "plus" factor in a score based on metadata, the presence of person-centric metadata can be used as a "plus" factor to increase a person-centric score based on facial recognition data.

Of course, if no face is found in an image, this information can be used to reduce the person-centric score for the image (perhaps down to zero).

Object-Centric Processing

Object-centric images are the third type of image that may be found in the set of images obtained from Flickr in the present example. There are various techniques by which an object-centric score for an image can be determined.

One technique relies on metadata analysis, using principles like those detailed above. A glossary of nouns can be compiled, either from the universe of Flickr metadata or from some other corpus (e.g., WordNet), and ranked by frequency of occurrence. Nouns associated with places and persons can be removed from the glossary. The glossary can then be used in the manners identified above to analyze the images' metadata and yield a score for each image.

Another approach uses pattern matching to identify object-centric images, matching each against a library of known object-related images.

Still another approach is based on the earlier-determined scores for person-centric and place-centric. An object-centric score can be assigned in inverse relationship to the other two scores (i.e., if an image scores low for being person-centric and low for being place-centric, it can be assigned a high score for being object-centric).
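The text does not specify the form of the inverse relationship. The sketch below shows one possible choice, assumed purely for illustration: the product of the two complements, rescaled to 0-100, so the object score is high only when both other scores are low.

```python
def object_score_from_others(person_score, place_score):
    """Assumed inverse relation: high only when both other scores (0-100) are low."""
    for s in (person_score, place_score):
        if not 0 <= s <= 100:
            raise ValueError("scores are expected in the range 0-100")
    return (100.0 - person_score) * (100.0 - place_score) / 100.0
```

Under this assumption, scores of 10 (person) and 20 (place) yield 72, while a score of 100 in either category forces the object score to 0. Other monotone-decreasing forms, such as 100 minus the larger of the two scores, would satisfy the description equally well.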

Such techniques may be combined or used individually. In any event, a score is produced for each image, tending to indicate whether the image is more or less likely to be object-centric.

Further Processing of the Sample Set of Images

The foregoing techniques can produce three scores for each image in the set, indicating the rough confidence/probability/likelihood that the image is (1) person-centric, (2) place-centric, or (3) object-centric. These scores need not add up to 100% (although they may). Sometimes an image may score high in two or more categories. In such a case the image may be regarded as having multiple relevance, e.g., as depicting both a person and an object.

The set of images downloaded from Flickr may next be segregated into groups, e.g., A, B and C, depending on whether each image is identified as primarily person-centric, place-centric, or object-centric. However, since some images may have split probabilities (e.g., an image may have some indicia of being place-centric and some indicia of being person-centric), identifying an image wholly by its highest score ignores useful information. It is preferable to calculate a weighted score for the set of images, taking each image's respective scores in the three categories into account.

A sample of images from Flickr, all taken near Rockefeller Center, may suggest that 60% are place-centric, 25% are person-centric, and 15% are object-centric.
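One way such a set-level distribution might be computed from per-image scores is sketched below. The normalization scheme (each image contributes one unit of weight, split across categories in proportion to its three scores) and the names used are assumptions; the text does not prescribe a formula.

```python
CATEGORIES = ("person", "place", "object")

def category_distribution(image_scores):
    """Return the percentage share of each category across a set of images.

    image_scores: list of dicts mapping category name to a 0-100 score.
    """
    totals = {c: 0.0 for c in CATEGORIES}
    for scores in image_scores:
        image_total = sum(scores.get(c, 0.0) for c in CATEGORIES)
        if image_total == 0:
            continue  # image carries no evidence for any category
        for c in CATEGORIES:
            # Split this image's single unit of weight across its categories.
            totals[c] += scores.get(c, 0.0) / image_total
    grand_total = sum(totals.values())
    if grand_total == 0:
        return totals
    return {c: 100.0 * v / grand_total for c, v in totals.items()}

dist = category_distribution([
    {"person": 0, "place": 100, "object": 0},
    {"person": 50, "place": 50, "object": 0},
])
```

An image with split scores therefore contributes partially to each category rather than being counted wholly under its highest score.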

This information gives useful insight into the tourist's cell phone image, even without regard to the contents of the image itself (except its geocoding). That is, chances are good that the image is place-centric, with less likelihood that it is person-centric, and still less probability that it is object-centric. (This ordering can be used to determine the order of subsequent stages of processing, allowing the system to give more quickly the responses that are most likely to be appropriate.)

This type-assessment of the cell phone photo can be used, alone, to help determine an automated action provided to the tourist in response to the image. However, further processing can better assess the image's contents, and thereby allow a more particularly tailored action to be intuited.

Similarity Assessments and Metadata Weighting

Within the set of co-located images collected from Flickr, images that are place-centric will tend to have a different appearance than images that are person-centric or object-centric, yet will tend to have some similarity within the place-centric group. Place-centric images may be characterized by straight lines (e.g., architectural edges), by repetitive patterns (windows), or by large areas of uniform texture and similar color near the top of the image (sky).

Images that are person-centric will also tend to have a different appearance than the other two classes of image, yet have common attributes within the person-centric class. For example, person-centric images will usually have faces, generally characterized by an ovoid shape with two eyes and a nose, areas of skin tones, etc.

Although object-centric images are perhaps the most diverse, images from any given geography may tend to have unifying attributes or features. Photos geocoded at a horse track will depict horses with some frequency; photos geocoded from Independence National Historical Park in Philadelphia will tend to depict the Liberty Bell regularly.

By determining whether the cell phone image is more similar to the place-centric, person-centric, or object-centric images in the set of Flickr images, more confidence in the subject of the cell phone image can be achieved (and a more accurate response can be intuited and provided to the consumer).

A fixed set of image assessment criteria can be applied to distinguish images in these categories. However, the detailed embodiment determines such criteria adaptively. In particular, this embodiment examines the set of images and determines which image features/characteristics/metrics most reliably (1) group like-categorized images together (similarity); and (2) distinguish differently categorized images from each other (difference). Among the attributes that may be measured and checked for similarity/difference behavior within the set of images are dominant color; color diversity; color histogram; dominant texture; texture diversity; texture histogram; edginess; wavelet-domain transform coefficient histograms and dominant wavelet coefficients; frequency-domain transform coefficient histograms and dominant frequency coefficients (which may be calculated in different color channels); eigenvalues; keypoint descriptors; geometric class probabilities; symmetry; percentage of image area identified as facial; image autocorrelation; low-dimensional "gists" of the image; etc. (Combinations of such metrics may be more reliable than the characteristics individually.)

One way to determine which metrics are most salient for these purposes is to compute a variety of different image metrics for the reference images. If the results for a particular metric are clustered within a category of images (e.g., if, for place-centric images, the color histogram results cluster around particular output values), and if images in other categories have few or no output values near that clustered result, then the metric appears well suited for use as an image assessment criterion. (Clustering is commonly performed using an implementation of the k-means algorithm.)
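The selection idea can be illustrated with a simple separation ratio. Note that this sketch deliberately swaps in a variance ratio (spread of the category means divided by the average spread inside each category) in place of the k-means clustering mentioned above, because it fits in a few lines; the function name and sample values are hypothetical.

```python
from statistics import mean, pvariance

def separation_ratio(values_by_category):
    """Between-category variance of means divided by mean within-category variance.

    A high ratio means the metric clusters tightly within each category
    while keeping the categories far apart.
    """
    groups = list(values_by_category.values())
    within = mean([pvariance(g) for g in groups])
    between = pvariance([mean(g) for g in groups])
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within

# Hypothetical metric outputs for reference images already scored by category.
edginess = {"place": [45, 50, 55], "person": [10, 12, 14], "object": [20, 22, 24]}
brightness = {"place": [10, 50, 90], "person": [20, 50, 80], "object": [15, 50, 85]}

edginess_ratio = separation_ratio(edginess)
brightness_ratio = separation_ratio(brightness)
```

In this made-up data, edginess separates the three categories well, while brightness has the same mean in every category and so offers no discrimination.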

In the set of images from Rockefeller Center, the system may determine that an edginess score of >40 is reliably associated with images that score high as place-centric; that a facial area score of >15% is reliably associated with images that score high as person-centric; and that a color histogram with a local peak in the gold tones, together with a frequency content for yellow that peaks at low image frequencies, is somewhat associated with images that score high as object-centric.

The analysis techniques found most useful for grouping/distinguishing the different categories of images can then be applied to the user's cell phone image. The results can then be analyzed for proximity, in a distance-measure sense (e.g., in a multi-dimensional space), to the characterizing features associated with the different categories of images. (This is the first time that the cell phone image itself has been processed in this particular embodiment.)
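The proximity analysis can be sketched as a nearest-centroid comparison in a feature space. Euclidean distance, the two-feature space (edginess, face-area percentage), and the centroid values are all assumptions chosen for illustration.

```python
import math

def nearest_category(image_features, category_centroids):
    """Return (category, distance) for the centroid closest to image_features."""
    distances = {
        name: math.dist(image_features, centroid)
        for name, centroid in category_centroids.items()
    }
    best = min(distances, key=distances.get)
    return best, distances[best]

# Hypothetical characterizing features per category: (edginess, face-area %).
centroids = {
    "place": (55.0, 2.0),
    "person": (12.0, 30.0),
    "object": (22.0, 1.0),
}
category, distance = nearest_category((25.0, 0.0), centroids)
```

The per-category distances could also be converted into scores on a 0-100 scale rather than reduced to a single winner.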

Using such techniques, the cell phone image may score 60 for object-centric, 15 for place-centric, and 0 for person-centric (on a scale of 0-100). This is a second, better set of scores that can be used to classify the cell phone image (the first being the statistical distribution of co-located photos found in Flickr).

The user's cell phone image may next be compared for similarity with individual images in the reference set. The similarity metrics identified earlier can be used, or different measures can be applied. The time or processing devoted to this task can be apportioned across the three image categories based on the just-determined scores. For example, the process may spend no time judging similarity with reference images classed as 100% person-centric, and instead concentrate on judging similarity with reference images classed as object- or place-centric (with more effort, e.g., four times as much, applied to the former than to the latter). A similarity score is generated for most of the images in the reference set (excluding those assessed as 100% person-centric).

Consideration then returns to metadata. Metadata from the reference images is again assembled, this time weighted in accordance with each image's respective similarity to the cell phone image. (The weighting can be linear or exponential.) Since metadata from similar images is weighted more heavily than metadata from dissimilar images, the resulting set of metadata is tailored to correspond more closely to the cell phone image.

From the resulting set, the top N (e.g., 3) metadata descriptors may be used. Alternatively, the descriptors that, on a weighted basis, make up an aggregate M% of the metadata set may be used.
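The similarity weighting and the two selection rules (top N, or an aggregate M% of the total weight) can be sketched as follows. Linear weighting is used; the function names and the sample similarities are hypothetical.

```python
from collections import defaultdict

def weighted_descriptors(reference_images):
    """Rank descriptors by summed similarity weight.

    reference_images: iterable of (similarity, descriptors) pairs, where
    similarity measures closeness to the cell phone image.
    """
    weights = defaultdict(float)
    for similarity, descriptors in reference_images:
        for term in set(descriptors):      # count a term once per image
            weights[term] += similarity    # linear weighting
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))

def top_n(ranked, n=3):
    """Keep the n highest-weighted descriptors."""
    return ranked[:n]

def top_share(ranked, percent):
    """Keep the highest-weighted descriptors that together reach percent of total weight."""
    threshold = sum(w for _, w in ranked) * percent / 100.0
    chosen, running = [], 0.0
    for term, weight in ranked:
        if running >= threshold:
            break
        chosen.append((term, weight))
        running += weight
    return chosen

refs = [
    (0.9, ["Rockefeller Center", "Prometheus"]),
    (0.5, ["Rockefeller Center", "Skating rink"]),
    (0.1, ["taxi"]),
]
ranked = weighted_descriptors(refs)
```

An exponential weighting could be obtained by replacing `similarity` with, for example, `math.exp(k * similarity)` for some chosen constant `k`.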

In the example given, the metadata so identified may comprise "Rockefeller Center", "Prometheus" and "Skating rink", with respective scores of 19, 12 and 5 (see "B" in FIG. 46B).

With this weighted set of metadata, the system can begin determining which responses are most appropriate for the consumer. In the exemplary embodiment, however, the system continues by further refining its assessment of the cell phone image. (The system may also begin determining appropriate responses while it undertakes this further processing.)

Processing a second set of reference images

At this point the system is better informed about the cell phone image. Not only is its location known; so are its likely type (object-centric) and its most probably relevant metadata. This metadata can be used to obtain a second set of reference images from Flickr.

In the exemplary embodiment, Flickr is queried for images having the identified metadata. The query can be geographically limited to the cell phone's location, or a broader (or unlimited) geography may be searched. (Or the query may be run twice, so that half of the images are co-located with the cell phone image and the others are remote, etc.)

The search may first look for images tagged with all of the identified metadata. In this case, 60 images are found. If more images are desired, Flickr may be searched for the metadata terms in different pairings, or individually. (In these latter cases, the distribution of selected images may be chosen so that the occurrence of metadata in the results corresponds to the respective scores of the different metadata terms, e.g., 19/12/5.)
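One way to choose a distribution of images that tracks the 19/12/5 scores is proportional allocation with largest-remainder rounding. The sketch below is an assumption about how such a split could be computed; the text does not prescribe a rounding rule, and `allocate_by_score` is a hypothetical name.

```python
def allocate_by_score(scores, total_images):
    """Split total_images across terms in proportion to their scores,
    using largest-remainder rounding so the counts sum exactly."""
    total_score = sum(scores.values())
    if total_score <= 0 or total_images <= 0:
        return {term: 0 for term in scores}
    exact = {t: total_images * s / total_score for t, s in scores.items()}
    counts = {t: int(x) for t, x in exact.items()}  # round down first
    leftover = total_images - sum(counts.values())
    # Hand the remaining images to the terms with the largest fractional parts.
    by_remainder = sorted(scores, key=lambda t: exact[t] - counts[t], reverse=True)
    for term in by_remainder[:leftover]:
        counts[term] += 1
    return counts

scores = {"Rockefeller Center": 19, "Prometheus": 12, "Skating rink": 5}
quota = allocate_by_score(scores, 100)
```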

Metadata from this second set of images can be harvested, clustered, and ranked ("C" in FIG. 46B). (Noise words such as "and," "of," and "or" can be eliminated. Words that describe only the camera or the type of photography can also be disregarded, e.g., "Nikon," "D80," "HDR," "black and white," etc. Month names can also be removed.)
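The filtering and ranking step can be illustrated with a simple counter. The word lists below are short, hypothetical samples seeded from the examples in the text, and the sketch ranks by raw tag frequency only; it does not attempt the clustering that the text also mentions.

```python
from collections import Counter

# Sample exclusion lists; a real deployment would need fuller lists.
NOISE_WORDS = {"and", "of", "or", "the", "a", "in"}
CAMERA_WORDS = {"nikon", "d80", "hdr", "black and white"}
MONTHS = {
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
}
EXCLUDED = NOISE_WORDS | CAMERA_WORDS | MONTHS

def rank_tags(tag_lists):
    """Count tags across images, skipping excluded words, most frequent first."""
    counts = Counter()
    for tags in tag_lists:
        for tag in tags:
            key = tag.strip().lower()
            if key and key not in EXCLUDED:
                counts[key] += 1
    return counts.most_common()

ranked = rank_tags([
    ["Prometheus", "Nikon", "December"],
    ["prometheus", "of", "statue"],
])
```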

The analysis performed earlier, in which each image in the first set was classified as person-centric, place-centric or object-centric, can be repeated on the images in the second set. Appropriate image metrics for determining similarity/difference within and between the classes of this second image set can be identified (or the earlier measures can be employed). These measures are then applied, as before, to generate refined scores for the user's cell phone image as being person-centric, place-centric or object-centric. By reference to the second set of images, the cell phone image may score 65 for object-centric, 12 for place-centric, and 0 for person-centric. (These scores may be combined with the earlier-determined scores, e.g., by averaging, if desired.)

As before, the similarity between the user's cell phone image and each image in the second set can be determined. Metadata from each image can then be weighted according to the corresponding similarity measure. The results can then be combined to yield a set of metadata weighted by image similarity.

Some of the metadata, often including some highly ranked terms, will be of relatively low value in determining image-appropriate responses to present to the consumer. "New York" and "Manhattan" are examples. Generally, relatively unusual metadata descriptors are more useful.

A measure of "unusualness" can be computed by determining the frequency of different metadata terms within a relevant corpus, such as Flickr image tags (globally, or within a geolocated region), image tags from the photographers who submitted the respective images, words in an encyclopedia, or words in Google's index of the web. The terms in the weighted metadata list can be further weighted according to their unusualness (i.e., a second weighting).

The result of this successive processing may be the list of metadata shown at "D" in FIG. 46B (each term shown with its respective score). This information (optionally together with a tag indicating the person/place/object determination) allows responses to the consumer to be well correlated with the cell phone photo.
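The second weighting by unusualness, based on term frequency in a corpus, could be approximated with an inverse-frequency measure of the kind used in text retrieval. This is a substitute technique chosen for illustration, not a formula given in the text; the corpus counts and corpus size below are invented.

```python
import math

def unusualness(term, corpus_counts, corpus_size):
    """Inverse-frequency measure: terms that are rarer in the corpus score higher."""
    count = corpus_counts.get(term, 0)
    return math.log((corpus_size + 1) / (count + 1))

def second_weighting(weighted_terms, corpus_counts, corpus_size):
    """Multiply each similarity-weighted score by the term's unusualness."""
    rescored = {
        term: score * unusualness(term, corpus_counts, corpus_size)
        for term, score in weighted_terms.items()
    }
    return sorted(rescored.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical first-pass scores and corpus tag counts.
first_pass = {"new york": 30, "prometheus": 12}
tag_counts = {"new york": 500_000, "prometheus": 2_000}
twice_weighted = second_weighting(first_pass, tag_counts, corpus_size=1_000_000)
```

In this made-up example, the common tag "new york" starts with the higher score but ends up ranked below the rarer tag "prometheus" after the second weighting.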

It will be recognized that this set of inferred metadata for the user's cell phone photo was compiled entirely by automated processing of other images obtained from public sources such as Flickr, in conjunction with other public resources (e.g., listings of names and places). The inferred metadata can naturally be associated with the user's image. More importantly for the present application, however, it can help a service provider decide how best to respond to the submission of the user's image.

Determining appropriate responses for the consumer

Referring to FIG. 50, the system just described can be viewed as one particular application of an "image juicer" that receives image data from a user and applies different forms of processing to gather, compute, and/or infer information that can be associated with the image.

As the information is discerned, it can be forwarded by a router to different service providers. These providers may be arranged to handle different types of information (e.g., semantic descriptors, image texture data, keypoint descriptors, eigenvalues, color histograms, etc.) or different classes of images (e.g., photos of friends, photos of a soda can, etc.). Outputs from these service providers are sent to one or more devices (e.g., the user's cell phone) for presentation or later reference. The discussion now turns to how these service providers decide which responses may be appropriate for a given set of input information.

One approach is to establish a taxonomy of image subjects and corresponding responses. A tree structure can be used, with an image first classified into one of a few high-level groups (e.g., person/place/object), and each group then divided into further subgroups. In use, an image is assessed through different branches of the tree until the limits of the available information allow no further progress. The actions associated with the terminal leaf or node of the tree are then taken.

Part of a simple tree structure is shown in FIG. 51. (Each node spawns three branches, but this is for illustration only; more or fewer branches can of course be used.)
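The tree-based taxonomy described above can be sketched as a small data structure with a descent routine that stops when the available labels run out. The `Node` class, the label strings, and the sample tree are hypothetical; only the idea of descending as far as the information allows, then taking the actions at that node, comes from the text.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    actions: list = field(default_factory=list)
    children: dict = field(default_factory=dict)  # label -> Node

def descend(root, labels):
    """Follow the inferred labels down the tree; stop at the deepest
    node that the available information can reach."""
    node = root
    for label in labels:
        child = node.children.get(label)
        if child is None:
            break  # no finer classification is possible
        node = child
    return node

# A tiny illustrative tree (structure and action names are assumptions).
food = Node("food", ["buy online", "nutrition info", "map of nearby stores"])
obj = Node("object", ["show general information"], {"food": food})
person = Node("person", ["attempt face recognition"])
root = Node("image", ["open-ended search"], {"object": obj, "person": person})

leaf = descend(root, ["object", "food", "breakfast cereal"])
```

Here the third label has no matching branch, so the descent ends at the "food" node and its three actions are the ones taken.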

If the subject of the image is inferred to be an item of food (e.g., if the image is associated with food-related metadata), three different screens of information can be cached on the user's phone. The first starts an online purchase of the depicted item at an online vendor. (The choice of vendor, and the payment/shipping details, can be obtained from user profile data.) The second screen shows nutritional information about the product. The third screen presents a map of the local area, identifying stores that sell the depicted product. The user switches among these responses using a roller wheel 124 on the side of the phone (FIG. 44).

If the subject is inferred to be a photo of a family member or friend, one screen presented to the user gives the option of posting a copy of the photo to the user's Facebook page, annotated with the likely name(s) of the person(s). (The names of persons depicted in a photo can be determined by submitting the photo to the user's account at Picasa. Picasa performs facial recognition operations on submitted user images and correlates facial eigenvectors with individual names provided by the user, thereby compiling a user-specific database of facial recognition information for friends and others depicted in the user's earlier images. Picasa's facial recognition is understood to be based on technology detailed in patent 6,356,659 to Google. Apple's iPhoto software and Facebook's Photo Finder software include similar facial recognition functionality.) Another screen starts a text message to the individual, with the address information obtained from the user's address book, indexed by the Picasa-determined identity. The user can pursue any or all of the presented options by switching between the associated screens.

If the subject appears to be a stranger (e.g., not recognized by Picasa), the system will first attempt to recognize the person using publicly available facial recognition information. (Such information can be extracted from photos of known persons. VideoSurf is one vendor with a database of facial recognition features for actors and other persons. L-1 Corp. maintains databases of driver's license photos and associated data that may, with appropriate safeguards, be employed for facial recognition purposes.) The screen(s) presented to the user can show reference photos of the matched persons (together with a "match" score), as well as dossiers of associated information compiled from the web and other databases. A further screen gives the user the option of sending a "Friend" invitation to the recognized person on MySpace, or on another social networking site where the recognized person is found to have a presence. A still further screen details the degree of separation between the user and the recognized person. (For example, my brother David has a classmate Steve, who is the son of the depicted person.) Such relationships can be determined from association information published on social networking sites.

Of course, the response options contemplated for the different sub-groups of image subjects may meet most user desires, but some users will want something different. Thus, at least one alternative response to each image may be open-ended, e.g., allowing the user to navigate to different information, or to specify a desired response, making use of whatever image/metadata processed information is available.

One such open-ended approach is to submit the twice-weighted metadata noted above (e.g., "D" in FIG. 46B) to a general-purpose search engine. Google, per se, is not necessarily best for this function, because current Google searches require that all search terms be found in the results. Better is a search engine that does fuzzy searching and responds to differently weighted keywords, not all of which need be found. The results can indicate different apparent relevance depending on which keywords are found, where they are found, etc. (A result that includes "Prometheus" but lacks "RCA Building" would be ranked as more relevant than a result that includes the latter but lacks the former.)
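A toy version of weighted keyword matching, in which not every term must be present, is sketched below. The weights assigned to the two keywords are invented (the scores at "D" in FIG. 46B are not reproduced in this text), and substring matching stands in for whatever matching a real search engine would use.

```python
def relevance(document_text, weighted_terms):
    """Sum the weights of the keywords found in the document;
    a missing keyword contributes nothing but does not disqualify it."""
    text = document_text.lower()
    return sum(w for term, w in weighted_terms.items() if term.lower() in text)

def rank_results(documents, weighted_terms):
    """Order documents by relevance, dropping those that match no keyword."""
    scored = [(relevance(doc, weighted_terms), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored if score > 0]

# Hypothetical weights: the rarer term carries the larger weight.
keywords = {"Prometheus": 5.0, "RCA Building": 2.0}
pages = [
    "History of the RCA Building",
    "The Prometheus statue by Paul Manship",
    "An unrelated page",
]
ranked_pages = rank_results(pages, keywords)
```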

The results from such a search can be clustered by other concepts. For example, some results may be clustered because they share the theme "art deco." Others may be clustered because they deal with the corporate history of RCA and GE. Others may be clustered because they concern the works of the architect Raymond Hood. Others may be clustered as relating to 20th century American sculpture, or to Paul Manship. Other concepts found to produce distinct clusters may include John Rockefeller, the Mitsubishi Group, Columbia University, Radio City Music Hall, the Rainbow Room restaurant, etc.

Information from these clusters can be presented to the user on successive UI screens, e.g., after the screens on which the prescribed information/actions are presented. The order of these screens can be determined by the sizes of the information clusters or by the keyword-determined relevance.

A still further response is to present the user with a Google search screen, pre-populated with the twice-weighted metadata as search terms. The user can then delete terms that are not relevant to his or her interest, and add other terms, so as to quickly execute a web search leading to the information or action the user desires.

In some embodiments, the system response may depend on people with whom the user has a "friend" relationship in a social network, or on some other indicator of trust. For example, if little is known about user Ted, but a rich set of information is available about Ted's friend Alice, that rich set of information may be employed in determining how to respond to Ted in connection with a given content stimulus.

Similarly, if user Ted is a friend of user Alice, and Bob is a friend of Alice, then information relating to Bob may be used in determining an appropriate response to Ted.
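Finding the friends, and friends of friends, whose information might inform a response to Ted amounts to a bounded breadth-first search over the social graph. The sketch below assumes the graph is available as a plain adjacency mapping; the `max_hops` limit and the extra user "Carol" are illustrative assumptions.

```python
from collections import deque

def contacts_within(graph, user, max_hops=2):
    """Breadth-first search returning {person: hops} for everyone
    reachable from `user` in at most max_hops friend links."""
    seen = {user: 0}
    queue = deque([user])
    while queue:
        person = queue.popleft()
        if seen[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in graph.get(person, ()):
            if friend not in seen:
                seen[friend] = seen[person] + 1
                queue.append(friend)
    del seen[user]  # the user is not his own contact
    return seen

friends = {
    "Ted": ["Alice"],
    "Alice": ["Ted", "Bob"],
    "Bob": ["Alice", "Carol"],  # Carol is a made-up third-hop contact
    "Carol": ["Bob"],
}
trusted = contacts_within(friends, "Ted")
```

The hop count could also serve as a weight, so that Alice's information counts for more than Bob's when choosing a response for Ted.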

The same principles can be employed even if Ted and Alice are strangers, provided there is another basis for implicit trust. While basic profile similarity is one possible basis, a better one is the sharing of an unusual attribute (or, better still, several). Thus, for example, if Ted and Alice both share the traits of being fervent supporters of Dennis Kucinich for president and being devotees of pickled ginger, then information relating to one may be used in determining an appropriate response to present to the other.

The arrangements just described provide powerful new functionality. However, the "intuiting" of the responses likely desired by the user relies largely on the system designers. They consider the different types of images that may be encountered, and dictate the responses (or selections of responses) that they believe will best satisfy the users' likely desires.

In this respect the above-described arrangements are akin to early indexes of the web, such as Yahoo!, in which teams of people generated taxonomies of information for which users might search, and then manually located web resources that could satisfy the different search requests.

Eventually the web overwhelmed such manual efforts at organization. Google's founders were among those who recognized that an untapped wealth of information about the web could be obtained by examining the links between pages and the actions of users in navigating those links. Understanding of the system thus came from data within the system, rather than from an external perspective.

In like fashion, manually crafted trees of image classifications/responses will probably someday be seen as an early stage in the development of image-responsive technologies. Eventually such approaches will be eclipsed by arrangements that rely on machine understanding derived from the system itself and its use.

One such technique simply examines which response screen(s) are selected by users in particular contexts. As such usage patterns become evident, the most popular responses can be moved earlier in the sequence of screens presented to the user.

Likewise, if patterns become evident in the use of the open-ended search query option, such an action can become a standard response and be moved higher in the presentation queue.

The usage patterns can be tailored along various dimensions of context. Males between 40 and 60 years of age in New York may, after capturing a snapshot of a statue by a 20th century sculptor, show interest in different responses than females between 13 and 16 years of age in Beijing. Most persons snapping a photo of a food processor in the weeks before Christmas may be interested in finding the cheapest online vendor of the product; most persons snapping a photo of the same object in the week after Christmas may be interested in listing the item for sale on eBay or Craigslist, etc. Desirably, usage patterns are tracked with as many demographic and other descriptors as possible, so as to be most predictive of user behavior.
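Reordering response screens from observed selections, keyed by context, can be sketched with per-context counters. The class name, the form of the context key, and the screen names are all hypothetical; the sketch only shows the most-selected screens moving to the front for a given context.

```python
from collections import Counter, defaultdict

class ResponseRanker:
    """Track which response screens users pick in each context and
    present the most popular ones first (illustrative sketch only)."""

    def __init__(self, default_order):
        self.default_order = list(default_order)
        self.selections = defaultdict(Counter)  # context -> screen -> count

    def record(self, context, screen):
        self.selections[context][screen] += 1

    def order_for(self, context):
        counts = self.selections.get(context, Counter())
        # sorted() is stable, so unpicked screens keep their default order.
        return sorted(self.default_order, key=lambda s: -counts[s])

ranker = ResponseRanker(["buy online", "nutrition info", "store map"])
context = ("food", "New York", "age 40-60")  # any hashable descriptor tuple
ranker.record(context, "store map")
ranker.record(context, "store map")
ranker.record(context, "nutrition info")
ordered = ranker.order_for(context)
```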

More sophisticated techniques can also be applied, drawing on the rich sources of expressly and inferentially linked data now available. These include not only the web and personal profile information, but all manner of other digital data that we touch and in which we leave traces, e.g., cell phone billing statements, credit card statements, shopping data from Amazon and eBay, Google search history, browsing history, cached web pages, cookies, email archives, phone message archives from Google Voice, travel reservations on Expedia and Orbitz, music collections on iTunes, cable television subscriptions, Netflix movie choices, GPS tracking information, social network data and activities, activities and postings on photo sites such as Flickr and Picasa and on video sites such as YouTube, the dates and times at which these records were logged, etc. (our "digital life log"). Moreover, this information is potentially available not just for the user, but also for the user's friends/family, for others having demographic similarities with the user, and ultimately for everyone else (with appropriate anonymization and/or privacy safeguards).

The network of interrelationships between these data sources is smaller than the network of web links analyzed by Google, but is perhaps richer in the diversity and types of links. From it can be mined a wealth of inferences and insights that can help inform what a particular user is likely to want done with a particular snapped image.

Artificial intelligence techniques can be applied to this data-mining task. One class of such techniques is natural language processing (NLP), a field that has advanced significantly in recent years.

One example is the Semantic Map compiled by Cognition Technologies, Inc., a database that can be used to analyze words in context in order to discern their meaning. This functionality can be used, e.g., to resolve homonym ambiguity in the analysis of image metadata (e.g., does "bow" refer to part of a ship, a ribbon adornment, a performer's thank-you, or the complement to an arrow? Proximity to terms such as "Carnival cruise," "satin," "Carnegie Hall" or "hunting" can provide the likely answer). Patent 5,794,050 (FRCD Corp.) details the underlying technologies.
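The "bow" example can be illustrated with a very small cue-word lookup: each sense carries a set of cue terms, and the sense sharing the most cues with the image's other tags wins. This is a deliberately naive stand-in, not the Semantic Map or the technique of the cited patent; the cue sets extend the four examples in the text with invented entries.

```python
# Cue terms per sense of "bow"; entries beyond the text's four examples are invented.
BOW_SENSES = {
    "part of a ship": {"carnival cruise", "ship", "deck"},
    "ribbon adornment": {"satin", "gift"},
    "performer's thank-you": {"carnegie hall", "concert"},
    "complement to an arrow": {"hunting", "archery"},
}

def disambiguate(context_tags, senses=BOW_SENSES):
    """Pick the sense whose cue terms overlap most with the other tags.
    Returns None when no cue term is present."""
    tags = {tag.lower() for tag in context_tags}
    best, best_overlap = None, 0
    for sense, cues in senses.items():
        overlap = len(tags & cues)
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best

meaning = disambiguate(["bow", "Satin", "wedding"])
```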

The understanding of meaning gained through NLP techniques can also be used to augment image metadata with other relevant descriptors, which can serve as additional metadata in the embodiments detailed herein. For example, a close-up image tagged with the descriptor "hibiscus stamens" can, through NLP techniques, be further tagged with the term "flower." (As of this writing, Flickr has 460 images tagged with "hibiscus" and "stamen" but lacking "flower.")

Patent 7,383,169 (Microsoft) details how dictionaries and other large works of language can be processed by NLP techniques to compile lexical knowledge bases that serve as formidable sources of such "common sense" information about the world. This common sense knowledge can be applied in the metadata processing detailed herein. (Wikipedia is another reference source that can serve as the basis for such a knowledge base. Our digital life log is yet another, one that yields insights unique to us as individuals.)

When applied to our digital life log, NLP techniques can reach nuanced understandings of our historical interests and actions, information that can be used to model (predict) our present interests and forthcoming actions. This understanding can be used to decide dynamically what information should be presented, or what action should be undertaken, in response to a particular user capturing a particular image (or to another stimulus). Truly intuitive computing will then have arrived.

Other comments

While the image/metadata processing detailed above takes many words to describe, it need not take much time to perform. Indeed, much of the processing of reference data, compilation of glossaries, etc., can be done off-line, before any input image is presented to the system. Flickr, Yahoo! or other service providers can periodically compile and pre-process reference sets of data for various locales, so that they are quickly available when needed to respond to an image query.

In some embodiments, other processing activities will be started in parallel with those detailed above. For example, if initial processing of the first set of reference images suggests that the snapped image is place-centric, the system can request likely-useful information from other resources before processing of the user image is finished. To illustrate, the system may immediately request a street map of the nearby area, together with a satellite view, a street view, a mass transit map, etc. Likewise, a page of information about nearby restaurants can be compiled, together with another page detailing nearby movies and show-times, and a further page with the local weather forecast. These can all be sent to the user's phone and cached for later display (e.g., by scrolling a thumb wheel on the side of the phone).
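Starting the requests for likely-useful pages while image analysis is still running can be sketched with Python's `asyncio`. The `fetch` and `analyze_image` coroutines are placeholders that merely sleep; a real system would call remote services and an actual image pipeline, and the resource names are taken loosely from the examples above.

```python
import asyncio

async def fetch(resource, delay=0.01):
    """Placeholder for a network request to a remote resource."""
    await asyncio.sleep(delay)
    return f"{resource} page"

async def analyze_image():
    """Placeholder for the remaining image analysis."""
    await asyncio.sleep(0.02)
    return "place-centric"

async def respond():
    resources = ["street map", "satellite view", "restaurants", "weather"]
    # Launch the prefetches first so they run while analysis continues.
    prefetch = [asyncio.create_task(fetch(r)) for r in resources]
    classification = await analyze_image()
    pages = await asyncio.gather(*prefetch)
    cache = dict(zip(resources, pages))  # would be sent to the phone and cached
    return classification, cache

classification, cache = asyncio.run(respond())
```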

These actions can likewise be undertaken before any image processing occurs, based simply on the geocode data accompanying the cell phone image.

While geocoding data accompanying the cell phone image was used in the arrangement particularly described, this is not essential. Other embodiments can select sets of reference images based on other criteria, such as image similarity. (This may be determined by various metrics, as indicated above and as detailed below. Known image classification techniques can also be used to determine which of several classes of images the input image falls into, so that similarly classified images can then be retrieved.) Another criterion is the IP address from which the input image is uploaded. Other images uploaded from the same, or geographically proximate, IP addresses can be sampled to form the reference sets.

Even in the absence of geocode data for the input image, the reference sets of imagery may nonetheless be compiled based on location. Location information for the input image can be inferred by various indirect techniques. The wireless service provider through which the cell phone image is relayed may identify the particular cell tower from which the tourist's transmission was received. (If the transmission originated through another wireless link, such as WiFi, its location may also be known.) The tourist may have used his credit card an hour earlier at a Manhattan hotel, allowing the system (with appropriate privacy safeguards) to infer that the picture was taken somewhere near Manhattan. Sometimes the features depicted in an image are so iconic that a quick search for similar images in Flickr can locate the user (e.g., as being at the Eiffel Tower, or at the Statue of Liberty).
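The indirect location clues listed above can be combined as a simple fallback chain. The priority order and the dictionary keys below are assumptions made for illustration; the text lists the clues but does not rank them.

```python
def infer_location(evidence):
    """Return (source, location) from the first available clue,
    or None when no clue yields a location."""
    # Assumed priority: most direct clue first.
    priority = ["geocode", "cell_tower", "wifi", "recent_transaction", "iconic_match"]
    for source in priority:
        location = evidence.get(source)
        if location is not None:
            return source, location
    return None

# No geocode, but a credit card was used at a Manhattan hotel an hour earlier.
guess = infer_location({"geocode": None, "recent_transaction": "Manhattan"})
```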

GeoPlanet was cited as one source of geographic information. However, a number of other geographic information databases can alternatively be used. GeoNames-dot-org is one. (It will be recognized that the "-dot-" convention, and the omission of the usual http preamble, are used to prevent the Patent Office's reproduction of this text from being rendered as a live hyperlink.) In addition to providing place names for a given latitude/longitude (at the levels of neighborhood, city, state, and country), and providing parent, child, and sibling information for geographic divisions, GeoNames' free data (available as a web service) also provides functions such as finding the nearest intersection, finding the nearest post office, finding the surface elevation, etc. Still another option is Google's GeoSearch API, which allows retrieval of, and interaction with, data from Google Earth and Google Maps.

It will be recognized that archives of aerial imagery are growing exponentially. Part of such imagery is from a straight-down perspective, but off-axis the imagery becomes increasingly oblique. From two or more different oblique views of a location, a 3D model can be created. As the resolution of such imagery increases, sufficiently rich sets of data are available, for some locations, that a view of a scene as if taken from ground level may be synthesized. Such views can be matched with street-level photos, and metadata from one can augment metadata for the other.

As shown in FIG. 47, the embodiment particularly described above made use of various resources, including Flickr, a database of person names, a word frequency database, etc. These are just a few of the many different information sources that might be employed in such arrangements. Other social networking sites, shopping sites (e.g., Amazon, eBay), weather and traffic sites, online thesauruses, caches of recently visited web pages, browsing history, cookie collections, Google, other digital repositories (as detailed herein), etc., can all provide a wealth of additional information that can be applied to the intended tasks. Some of this data reveals information about the user's interests, habits, and preferences: data that can be used to better infer the contents of the snapped picture, and to better tailor the intuited response(s).

Likewise, while FIG. 47 shows a few lines interconnecting the different items, these are illustrative only. Different interconnections can naturally be employed.

The arrangements detailed in this specification are a particular few out of the myriad that may be employed. Most embodiments will differ from the ones detailed. Some actions will be omitted, some will be performed in different orders, some will be performed in parallel rather than serially (and vice versa), some additional actions may be included, etc.

One additional action is to refine the just-detailed process by receiving user-related input, e.g., after the processing of the first set of Flickr images. For example, the system may have identified "Rockefeller Center," "Prometheus," and "Skating rink" as relevant metadata for the user-snapped image. The system may query the user as to which of these terms is most relevant (or least relevant) to his/her particular interest. The further processing (e.g., further search, etc.) can be focused accordingly.

Within an image presented on a touch screen, the user may touch a region to indicate an object of particular relevance within the image frame. Image analysis and subsequent acts can then focus on the identified object.

Some of the database searches can be iterative/recursive. For example, results from one database search can be combined with the original search inputs and used as inputs to a further process.
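The iterative search just described can be sketched as follows. This is only an illustration under stated assumptions: `iterative_search`, `toy_search`, and `TOY_DB` are hypothetical names, and the "database" is a small in-memory dictionary rather than any real service.

```python
def iterative_search(seed_terms, search_fn, max_rounds=3):
    """Repeatedly feed search results back in as additional query terms."""
    terms = set(seed_terms)
    for _ in range(max_rounds):
        results = set(search_fn(terms))
        merged = terms | results      # combine results with the original inputs
        if merged == terms:           # nothing new was learned; stop early
            break
        terms = merged
    return terms


# Hypothetical stand-in for a database: each term maps to related terms.
TOY_DB = {
    "rockefeller center": ["prometheus", "skating rink"],
    "prometheus": ["statue"],
}


def toy_search(terms):
    """Look up every query term and return all related terms found."""
    found = []
    for term in terms:
        found.extend(TOY_DB.get(term, []))
    return found
```

Calling `iterative_search(["rockefeller center"], toy_search)` widens the term set on each round until a round adds nothing new or `max_rounds` is reached.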

It will be recognized that much of the foregoing processing is fuzzy. Much of the data may be in terms of metrics that have no absolute meaning, but are relevant only to the extent they differ from other metrics. Many such different probabilistic factors can be assessed and then combined: a statistical stew. Artisans will recognize that the particular implementation suitable for a given situation may be largely arbitrary. However, through experience and Bayesian techniques, more informed manners of weighting and using the different factors can be identified and eventually used.
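One simple way such factors might be combined is a weighted average, sketched below. The specification does not prescribe a formula, so this is an assumed illustration: the function name `combine_factors`, the factor names, and the weights are all hypothetical, and each score is assumed to be normalized to the range 0 to 1.

```python
def combine_factors(scores, weights):
    """Weighted average of per-factor scores, each assumed to lie in [0, 1]."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        raise ValueError("at least one factor needs a non-zero weight")
    weighted_sum = sum(scores[name] * weights[name] for name in scores)
    return weighted_sum / total_weight


# Hypothetical factors for one candidate reference image.
example_scores = {"histogram_similarity": 1.0, "metadata_overlap": 0.0}
example_weights = {"histogram_similarity": 3, "metadata_overlap": 1}
combined = combine_factors(example_scores, example_weights)
```

The weights are the "largely arbitrary" part; with experience they could be tuned, for instance by fitting them to outcomes the user confirmed.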

If the Flickr archive is large enough, the first set of images in the arrangement detailed above may be selectively chosen to be more likely similar to the subject image. For example, Flickr can be searched for images taken at about the same time of day. Lighting conditions will then be roughly similar, e.g., so that matching a night scene to a daylight scene is avoided, and shadow/shading conditions will be similar. Likewise, Flickr can be searched for images taken in the same season/month. Issues such as the seasonal disappearance of the ice skating rink at Rockefeller Center, and snow on a winter landscape, can thus be mitigated. Similarly, if the camera/phone is equipped with a magnetometer, inertial sensor, or other technology permitting its bearing (and/or azimuth/elevation) to be determined, then Flickr can be searched for shots with this degree of similarity too.
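The time-of-day and season screening can be sketched as a comparison of capture timestamps. This is a minimal sketch: the function name and the default thresholds (two hours, one month) are assumptions chosen for illustration, not values from the specification.

```python
from datetime import datetime


def similar_capture_time(subject, candidate, max_hours=2, max_months=1):
    """True if two capture times are close in hour of day and in month of year."""
    # Hours and months wrap around, so measure the shorter way round the cycle.
    hour_gap = abs(subject.hour - candidate.hour)
    hour_gap = min(hour_gap, 24 - hour_gap)
    month_gap = abs(subject.month - candidate.month)
    month_gap = min(month_gap, 12 - month_gap)
    return hour_gap <= max_hours and month_gap <= max_months


subject_time = datetime(2008, 12, 20, 21, 0)          # a December evening
winter_night = datetime(2007, 1, 5, 23, 30)           # January, late evening
summer_night = datetime(2008, 7, 1, 21, 0)            # July, same hour
```

Here `winter_night` passes the screen for `subject_time` and `summer_night` does not, even though the years differ, because only hour of day and month are compared.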

Moreover, the sets of reference images collected from Flickr desirably comprise images from many different sources (photographers), so that they do not tend toward use of the same metadata descriptors.

Images collected from Flickr may be screened for adequate metadata. For example, images with no metadata (except, perhaps, an arbitrary image number) may be removed from the reference set(s). Likewise, images with fewer than 2 (or 20) metadata terms, or without a narrative description, may be disregarded.
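The screening step might look like the sketch below. The record layout (a dictionary with `tags` and `description` keys) is an assumption made for illustration and is not the format of any particular image service.

```python
def has_adequate_metadata(image, min_terms=2):
    """Keep an image only if it has enough tags and a narrative description."""
    tags = [tag for tag in image.get("tags", []) if tag.strip()]
    description = (image.get("description") or "").strip()
    return len(tags) >= min_terms and bool(description)


def screen_reference_set(images, min_terms=2):
    """Drop images whose metadata is too sparse to be useful."""
    return [image for image in images if has_adequate_metadata(image, min_terms)]


# Hypothetical records: only the first one survives screening.
candidates = [
    {"id": 1, "tags": ["prometheus", "statue"], "description": "Statue at night"},
    {"id": 2, "tags": ["IMG_0042"], "description": ""},
    {"id": 3, "tags": [], "description": None},
]
kept = screen_reference_set(candidates)
```

Raising `min_terms` to 20 gives the stricter variant mentioned in the text.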

Flickr is often mentioned in this specification, but other collections of content can of course be used. Images in Flickr commonly have specified license rights for each image. These include "all rights reserved," as well as a variety of Creative Commons licenses, through which the public can make use of the imagery on different terms. Systems detailed herein can limit their searches through Flickr to imagery meeting specified license criteria (e.g., disregard images marked "all rights reserved").

Other image collections are in some respects preferable. For example, the database at images.google-dot-com seems better than Flickr at ranking images based on metadata relevance.

Flickr and Google maintain image archives that are publicly accessible. Many other image archives are private. Embodiments of the present technology find application with both, including some hybrid contexts in which both public and proprietary image collections are used (e.g., Flickr is used to find an image based on a user image, and the Flickr image is submitted to a private database to find a match and determine a corresponding response for the user).

Similarly, while reference was made to services such as Flickr for providing data (e.g., images and metadata), other sources can of course be used.

One alternative source is an ad hoc peer-to-peer (P2P) network. In one such P2P arrangement, there may optionally be a central index, with which peers can communicate when searching for desired content, and to which they detail the content they have available for sharing. The index may include metadata and metrics for images, together with pointers to the nodes at which the images themselves are stored.

The peers may include cameras, PDAs, and other portable devices, from which image information may be available nearly instantly after it has been captured.

In the course of the methods detailed herein, certain relationships are discovered between images (e.g., similar geolocation; similar image metrics; similar metadata, etc.). These data are generally reciprocal, so if the system discovers, during processing of Image A, that its color histogram is similar to that of Image B, then this information can be stored for later use. If a later process involves Image B, the earlier-stored information can be consulted to discover that Image A has a similar histogram, without analyzing Image B. Such relationships are akin to virtual links between the images.
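A store for such reciprocal relationships can be sketched as a symmetric mapping, in which recording a link from Image A to Image B also records the link from B to A. The class name `RelationshipStore`, the relation label, and the score are hypothetical; the specification does not name a data structure.

```python
from collections import defaultdict


class RelationshipStore:
    """Remembers pairwise image relationships so they need not be recomputed."""

    def __init__(self):
        # image id -> {(other image id, relation name): similarity score}
        self._links = defaultdict(dict)

    def record(self, image_a, image_b, relation, score):
        """Store the relationship in both directions, since it is reciprocal."""
        self._links[image_a][(image_b, relation)] = score
        self._links[image_b][(image_a, relation)] = score

    def related(self, image_id, relation):
        """Return {other image id: score} for one kind of relationship."""
        links = self._links.get(image_id, {})
        return {other: score for (other, rel), score in links.items() if rel == relation}


store = RelationshipStore()
# Discovered while processing Image A; Image B itself is never analyzed.
store.record("image-A", "image-B", "color_histogram", 0.93)
from_b = store.related("image-B", "color_histogram")
```

A later process that starts from Image B finds Image A in `from_b` by lookup alone.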

For such relationship information to maintain its utility over time, it is desirable that the images be identified in a persistent manner. If a relationship is discovered while Image A is on a user's PDA and Image B is on a desktop somewhere, a means should be provided to identify Image A even after it has been transferred to the user's MySpace account, and to track Image B after it has been archived to an anonymous computer in a cloud network.

Images can be assigned Digital Object Identifiers (DOI) for this purpose. The International DOI Foundation has implemented the CNRI Handle System so that such resources can be resolved to their current location through the web site at doi-dot-org. Another alternative is for the images to be assigned, and digitally watermarked with, identifiers tracked by the Digimarc For Images service.

If several different repositories are being searched for imagery or other information, it is often desirable to adapt the query to the particular database being used. For example, different facial recognition databases may use different facial recognition parameters. To search across multiple databases, technologies such as those detailed in Digimarc's published patent applications 20040243567 and 20060020630 can be employed to ensure that each database is probed with an appropriately tailored query.

Frequent reference has been made to images, but in many cases other information may be used in lieu of the image information itself. In different applications, image identifiers, characterizing eigenvectors, color histograms, keypoint descriptors, FFTs, associated metadata, decoded barcode or watermark data, etc., may be used instead of the imagery per se (e.g., as a data proxy).

While the earlier example spoke of geocoding by latitude/longitude data, in other arrangements the cell phone/camera may provide location data in one or more other reference systems, such as Yahoo's GeoPlanet ID, the Where on Earth ID (WOEID).

Location metadata can be used to identify other resources in addition to similarly located imagery. Web pages, for example, can have geographic associations (e.g., a blog may concern the author's neighborhood; a restaurant's web page is associated with a particular physical address). The web service GeoURL-dot-org is a location-to-URL reverse directory that can be used to identify web sites associated with particular geographies.

GeoURL supports a variety of location tags, including its own ICBM meta tags, as well as Geo Tags. Other systems that support geotagging include RDF, the Geo microformat, and the GPSLongitude/GPSLatitude tags commonly used in XMP and EXIF camera metainformation. Flickr uses a syntax established by Geobloggers, e.g.:

geotagged

geo:lat = 57.64911

geo:lon = 10.40744
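Tags in the form just shown can be read back into a coordinate pair with a few lines of code. This is a sketch only: `parse_geotags` is a hypothetical helper, and it assumes the tags arrive as a plain list of strings.

```python
def parse_geotags(tags):
    """Return (lat, lon) from geo:lat / geo:lon tags, or None if either is absent."""
    coords = {}
    for tag in tags:
        key, sep, value = tag.partition("=")
        key = key.strip().lower()
        if sep and key in ("geo:lat", "geo:lon"):
            try:
                coords[key] = float(value)   # float() tolerates surrounding spaces
            except ValueError:
                return None                  # malformed number: treat as untagged
    if "geo:lat" in coords and "geo:lon" in coords:
        return coords["geo:lat"], coords["geo:lon"]
    return None


position = parse_geotags(["geotagged", "geo:lat = 57.64911", "geo:lon = 10.40744"])
```

Tags other than `geo:lat` and `geo:lon`, such as the bare `geotagged` marker, are ignored.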

In processing metadata, it is sometimes helpful to clean up the data prior to analysis, as referenced above. The metadata may also be examined for its dominant language, and if it is not English (or the other particular language of the implementation), the metadata and the associated image may be removed from consideration.

While the earlier-detailed embodiment sought to identify the image subject as being one of a person/place/thing, so that a correspondingly different action could be taken, analysis/identification of the image within other classes can naturally be employed. A few examples of the countless other class/type groupings include animal/vegetable/mineral; golf/tennis/football/baseball; male/female; wedding ring detected/wedding ring not detected; urban/rural; rainy/clear; day/night; child/adult; summer/autumn/winter/spring; car/truck; consumer product/non-consumer product; can/box/bag; natural/man-made; suitable for all ages/parental advisory for children 13 and below/parental advisory for children 17 and below/adult only; etc.

Sometimes different analysis engines may be applied to the user's image data. These engines can operate sequentially or in parallel. For example, FIG. 48A shows an arrangement in which, if an image is identified as person-centric, it is next referred to two other engines. One identifies the person as family, friend, or stranger. The other identifies the person as child or adult. The latter two engines work in parallel, after the first has completed its work.

Sometimes engines can be employed without any certainty that they are applicable. For example, FIG. 48B shows engines performing family/friend/stranger and child/adult analyses at the same time the person/place/thing engine is undertaking its analysis. If the latter engine determines that the image is likely a place or thing, the results of the first two engines will likely not be used.
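The speculative arrangement of FIG. 48B can be sketched with a thread pool: every engine starts at once, and the person-specific results are kept only if the subject turns out to be a person. The function `analyze` and the lambda engines are hypothetical stand-ins for real classifiers.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze(image, classify_subject, person_engines):
    """Run the subject classifier and all person engines concurrently."""
    with ThreadPoolExecutor() as pool:
        subject_future = pool.submit(classify_subject, image)
        person_futures = {
            name: pool.submit(engine, image) for name, engine in person_engines.items()
        }
        subject = subject_future.result()
        result = {"subject": subject}
        if subject == "person":
            # Only now are the speculative results consulted.
            for name, future in person_futures.items():
                result[name] = future.result()
    return result


# Hypothetical engines that return fixed answers.
engines = {
    "relationship": lambda image: "friend",
    "age_group": lambda image: "adult",
}
person_result = analyze("photo.jpg", lambda image: "person", engines)
place_result = analyze("photo.jpg", lambda image: "place", engines)
```

The sequential arrangement of FIG. 48A would instead submit the person engines only after the subject classifier returns "person", trading latency for less wasted work.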

(Specialized online services can be used for certain types of image discrimination/identification. For example, one web site may provide an airplane recognition service: when an image of an aircraft is uploaded to the site, it returns an identification of the plane by make and model. (Such technology can follow the teachings, e.g., of Sun, The Features Vector Research on Target Recognition of Airplane, JCIS-2008 Proceedings; and Tien, Using Invariants to Recognize Airplanes in Inverse Synthetic Aperture Radar Images, Optical Engineering, Vol. 42, No. 1, 2003.) The arrangements detailed herein can refer imagery that appears to be of aircraft to such a site, and use the returned identification information. Or all input imagery can be referred to such a site; most of the returned results will be ambiguous and will not be used.)

FIG. 49 shows that different analysis engines may provide their outputs to different response engines. Often the different analysis engines and response engines may be operated by different service providers. The outputs from these response engines can then be consolidated/coordinated for presentation to the consumer. (This consolidation may be performed by the user's cell phone, assembling inputs from different data sources; or such a task can be performed by a processor elsewhere.)

One example of the technology detailed herein is a homebuilder who takes a cell phone image of a drill that needs a spare part. The image is analyzed, the drill is identified by the system as a Black and Decker DR250B, and the user is provided with various information/action options. These include reviewing photos of drills with similar appearance, reviewing photos of drills with similar descriptors/features, reviewing the user's manual for the drill, seeing a parts list for the drill, buying the drill new from Amazon or used from eBay, listing the builder's drill on eBay, buying parts for the drill, etc. The builder chooses the "buying parts" option and proceeds to order the necessary part (FIG. 41).

Another example is a person shopping for a home. She snaps a photo of a house. The system refers the image both to a private database of MLS information and to a public database such as Google. The system responds with a variety of options, including reviewing photos of the nearest houses offered for sale; reviewing photos of houses listed for sale that are closest in value to the pictured home and within the same zip code; reviewing photos of houses listed for sale that are most similar in features to the pictured home and within the same zip code; neighborhood and school information; etc. (FIG. 43).

In another example, a first user snaps an image of Paul Simon at a concert. The system automatically posts the image to the user's Flickr account, together with metadata inferred by the procedures detailed above. (The name of the artist may have been found in a Google search for the user's geolocation; e.g., a Ticketmaster web page revealed that Paul Simon was playing that venue that night.) A moment later, the first user's picture is encountered by a system processing a second concert-goer's photo of the same event, taken from a different vantage point. The second user is shown the first user's photo as one of the system's responses to the second photo. The system may also alert the first user that another picture of the same event, from a different viewpoint, is available for review on his cell phone if he presses a certain button twice.

In many such arrangements, it will be recognized that "the content is the network." Associated with each photo, or with each subject depicted in a photo (or with any other item of digital content or information expressed therein), is a set of data and attributes that serve as implicit or express links to actions and other content. The user can navigate from one to the next, navigating between nodes on a network.

Television shows are rated by the number of viewers, and academic papers are judged by the number of later citations. Abstracted to a higher level, it will be recognized that such "audience measurement" for physical or virtual content is a census of the links that associate it with other physical or virtual content.

While Google is limited to the analysis and exploitation of links between digital content, the technology detailed herein allows the analysis and exploitation of links between physical content as well (and between physical and electronic content).

Known cell phone cameras and other imaging devices typically have a single "shutter" button. However, the device may be provided with different actuator buttons, each invoking a different operation on the captured image information. By this arrangement, the user can indicate, at the outset, the type of action intended (e.g., identify faces in the image per Picasa or VideoSurf information, and post to my Facebook page; or try to identify the person depicted, and send a "friend request" to that person's MySpace account).

Rather than multiple actuator buttons, the function of a sole actuator button can be controlled in accordance with other UI controls on the device. For example, repeated pressing of a function select button can cause different intended operations to be displayed on the screen of the UI (just as familiar consumer cameras have different photo modes, such as close-up, beach, nighttime, portrait, etc.). When the user then presses the shutter button, the selected operation is invoked.

One common response (which may need no confirmation) is to post the image on Flickr or on social network site(s). Metadata inferred by the processes detailed herein can be saved in conjunction with the imagery (qualified, perhaps, as to its confidence).

In the past, the "click" of a mouse served to trigger a user-desired action. That action identified an X-Y location coordinate on a virtual landscape (e.g., a desktop screen) that indicated the user's express intention. Going forward, this role will increasingly be served by the "snap" of a shutter, capturing a real landscape from which the user's intention is inferred.

Business rules can dictate a response appropriate to a given situation. These rules and responses may be determined by reference to data collected by web indexers, such as Google, using intelligent routing.

Crowdsourcing is not generally suitable for real-time implementations. However, inputs that stymie the system and fail to yield a corresponding action (or yield actions from which the user selects none) can be referred offline for crowdsource analysis, so that the next time such an input is presented, it can be handled better.

Image-based navigation systems present a different topology than is familiar from web page-based navigation systems. FIG. 57A shows that web pages on the internet relate in a point-to-point fashion. For example, web page 1 may link to web pages 2 and 3. Web page 3 may link to page 2. Web page 2 may link to page 4. Etc. FIG. 57B shows the contrasting network associated with image-based navigation. The individual images are linked to a central node (e.g., a router), which then links to further nodes (e.g., response engines) in accordance with the image information.

The "router" here does not simply route an input packet to a destination determined by address information conveyed with the packet, as in the familiar case of internet traffic routers. Rather, the router takes image information and decides what to do with it, e.g., to which responsive system the image information should be referred.

Routers can be stand-alone nodes on a network, or they can be integrated with other devices. (Or their functionality can be distributed between such locations.) A wearable computer may have a router portion (e.g., a set of software instructions), which takes image information from the computer and decides how it should be handled. (For example, if it recognizes the image information as being an image of a business card, it may OCR the name, phone number, and other data, and enter them into a contacts database.) The particular response for different types of input image information can be determined by a registry database, e.g., of the sort maintained by a computer's operating system, or otherwise.
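The router-plus-registry idea can be sketched as a lookup from a recognized image type to a response engine. Everything named here is hypothetical: `ImageRouter`, the type label `business_card`, and the toy classifier and engine are assumptions made to illustrate content-based (rather than address-based) routing.

```python
class ImageRouter:
    """Decides which response engine handles an image, based on its content."""

    def __init__(self, classify):
        self._classify = classify   # callable: image info -> type label
        self._registry = {}         # type label -> response engine

    def register(self, image_type, engine):
        """Associate a response engine with one type of image information."""
        self._registry[image_type] = engine

    def route(self, image_info):
        """Classify the image and hand it to the registered engine, if any."""
        engine = self._registry.get(self._classify(image_info))
        if engine is None:
            return None             # no engine registered for this type
        return engine(image_info)


# Hypothetical classifier and engine working on already-extracted text.
def toy_classifier(image_info):
    return "business_card" if "phone" in image_info else "unknown"


def add_contact(image_info):
    return {"action": "add_contact", "name": image_info["name"], "phone": image_info["phone"]}


router = ImageRouter(toy_classifier)
router.register("business_card", add_contact)
card_response = router.route({"name": "A. Example", "phone": "555-0100"})
```

New image types are supported by registering further engines, without changing the routing logic.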

Likewise, while response engines can be stand-alone nodes on a network, they can also be integrated with other devices (or their functions can be distributed). A wearable computer may have one or several different response engines that act on information provided by the router portion.

FIG. 52 shows an arrangement employing several computers (A-E), some of which may be wearable computers (e.g., cell phones). The computers include the usual complement of processor, memory, storage, input/output, etc. The storage or memory can contain content such as images, audio and video. The computers can also include one or more routers and/or response engines. Stand-alone routers and response engines may also be coupled to the network.

The computers are networked, as shown schematically by link 150. This connection can be made by any known networking arrangement, including the internet and/or wireless links (WiFi, WiMax, Bluetooth, etc.). Software in at least certain of the computers includes a peer-to-peer (P2P) client, which makes at least some of that computer's resources available to other computers on the network, and reciprocally enables that computer to employ certain resources of the other computers.

Through the P2P client, computer A may obtain image, video and audio content from computer B. Sharing parameters on computer B can be set to determine which content is shared, and with whom. Data on computer B may specify, for example, that some content is to be kept private; that some may be shared with known parties (e.g., a tier of social network "friends"); and that the rest may be freely shared. (Other information, such as geographic position information, may also be shared, subject to such parameters.)

In addition to setting sharing parameters based on party, the sharing parameters may also specify sharing based on the age of the content. For example, content/information older than a year might be shared freely, and content older than a month might be shared with a tier of friends (or in accordance with other rule-based restrictions). In other arrangements, fresher content might be the type most liberally shared. For example, content captured or stored within the past hour, day or week might be shared freely, and content from within the past month or year might be shared with friends.

An exception list can identify content, or one or more classes of content, that is treated differently from the rules detailed above (e.g., never shared or always shared).
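The party-based and age-based sharing rules, together with the exception list, might be combined as in the following sketch. It implements only the first example policy (older than a year: free; older than a month: friends); the function name, the item dictionary layout, and the choice to let the exception list override the age rules are assumptions made for illustration.

```python
from datetime import datetime, timedelta

def may_share(item: dict, requester: str, friends: set, now: datetime,
              never: frozenset = frozenset(),
              always: frozenset = frozenset()) -> bool:
    """Decide whether `item` may be shared with `requester`.

    `item` is assumed to carry an "id" and a "captured" datetime.
    """
    # The exception list is checked first, so it overrides the age rules.
    if item["id"] in never:
        return False
    if item["id"] in always:
        return True
    age = now - item["captured"]
    if age > timedelta(days=365):   # older than a year: shared freely
        return True
    if age > timedelta(days=30):    # older than a month: friends tier only
        return requester in friends
    return False                    # fresher content stays private
```

The alternative arrangement in the text, where fresher content is shared most liberally, would simply invert the age comparisons.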

In addition to sharing content, the computers can also share their respective router and response engine resources across the network. Thus, for example, if computer A does not have a response engine suitable for a certain type of image information, it can pass the information to computer B for handling by B's response engine.
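A minimal sketch of this hand-off follows. The `Computer` class, its `engines` mapping, and the in-process `peers` list are hypothetical stand-ins; a real arrangement would reach peers over the network link rather than by direct method access.

```python
class Computer:
    """Toy model of a networked computer that owns some response engines."""

    def __init__(self, name, engines):
        self.name = name
        self.engines = engines   # image type -> callable response engine
        self.peers = []          # other computers reachable over the network

    def handle(self, image_type, payload):
        """Return (name of the computer that handled it, result)."""
        # Prefer a local response engine when one exists.
        if image_type in self.engines:
            return self.name, self.engines[image_type](payload)
        # Otherwise pass the information to the first peer that can handle it.
        for peer in self.peers:
            if image_type in peer.engines:
                return peer.name, peer.engines[image_type](payload)
        return None, None        # nobody on the network can respond
```

With computer A holding no engines and computer B holding a "barcode" engine, `a.handle("barcode", data)` is answered by B.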

It will be recognized that such a distributed architecture has a number of advantages in terms of reduced cost and increased reliability. Additionally, the "peer" groupings can be defined geographically, e.g., as computers that find themselves within a particular spatial environment (e.g., an area served by a particular WiFi system). The peers can thus establish dynamic, ad hoc subscriptions to content and services from nearby computers. When a computer leaves that environment, the session ends.

Some researchers foresee the day when all of our experiences are captured in digital form. Indeed, Gordon Bell at Microsoft has compiled a digital archive of his recent existence through his technologies CyberAll, SenseCam and MyLifeBits. Bell's archive includes recordings of all telephone calls, video of daily life, captures of all TV and video watched, an archive of all web pages visited, map data of all places visited, polysomnograms for his sleep apnea, and so on. (For further information see, e.g., Bell, A Digital Life, Scientific American, March 2007; Gemmell, MyLifeBits: A Personal Database for Everything, Microsoft Research Technical Report MSR-TR-2006-23; Gemmell, Passive Capture and Ensuing Issues for a Personal Lifetime Store, Proceedings of The First ACM Workshop on Continuous Archival and Retrieval of Personal Experiences (CARPE '04), pp. 48-55; Wilkinson, Remember This, The New Yorker, May 27, 2007. See also the other references cited on Gordon Bell's Microsoft Research web page and on the ACM Special Interest Group web page for CARPE (Capture, Archival & Retrieval of Personal Experiences).)

Certain embodiments incorporating aspects of the present technology are well suited for use with such experiential digital content, either as input to a system (i.e., the system responds to the user's present experience), or as a resource from which metadata, habits, and other attributes can be mined (including service in the role of the Flickr archive in the embodiments detailed earlier).

In embodiments that employ personal experience as an input, it is initially desirable to have the system trigger and respond only when the user so desires, rather than running freely all the time (which is currently prohibitive from the standpoint of processing, memory and bandwidth).

The user's desire can be expressed by a deliberate action, e.g., pushing a button, or making a gesture with the head or hand. The system then takes data from the current experiential environment and provides candidate responses.

More interesting, perhaps, are systems that determine the user's interest through biological sensors. Electroencephalography, for example, can be used to generate a signal that triggers the system's response (or triggers one of several different responses, e.g., in response to different stimuli in the current environment). Skin conductivity, pupil dilation, and other autonomic physiological responses can also be sensed optically or electrically and provide a triggering signal to the system.

Eye tracking technology can be employed to identify which object in a field of view captured by an experiential-video sensor is of interest to the user. If Tony is sitting in a bar, and his eye falls on a bottle of unusual beer in front of a nearby woman, the system can identify his point of focal attention and focus its own processing efforts on the pixels corresponding to that bottle. With a signal from Tony, such as two quick eye-blinks, the system can launch an effort to provide candidate responses based on that beer bottle, perhaps also informed by other data gathered from the environment (time of day, date, ambient audio, etc.) as well as Tony's own personal profile data. (Gaze recognition and related technology is disclosed, e.g., in Apple's patent publication 20080211766.)

The system may quickly identify the beer as Doppelbock, e.g., by pattern matching from the image (and/or OCR). With that identifier it finds other resources indicating that the beer originates from Bavaria, where it is brewed by monks of the order of St. Francis of Paola. Its 9% alcohol content is also distinctive.

By checking the personal experiential archives that friends have made available to Tony, the system learns that his friend Geoff is fond of Doppelbock and most recently drank a bottle in a pub in Dublin. Tony's glancing encounter with the bottle is logged in his own experiential archive, where Geoff may later come across it. The fact of the encounter may also be relayed in real time to Geoff in Prague, helping to populate an ongoing data feed about his friends' activities.

The bar may also provide an experiential data server, to which Tony is wirelessly granted access. The server maintains an archive of digital data captured in the bar and contributed by patrons. The server may also be primed with related metadata and information that the management considers of interest to its patrons, such as the Wikipedia page on the brewing methods of the monks of St. Paul, which brands will be arriving in the coming weeks, or what the day's specials are. (Per user preference, some users require that their data be cleared when they leave the bar; others permit the data to be retained.) Tony's system may routinely check the local environment's experiential data server to see what odd bits of information might be found. This time it shows that the woman at barstool 3, the woman with the Doppelbock, has among her friends a Tom <last name encoded>. Tony's system recognizes that Geoff's circle of friends (which Geoff makes available to his friends) includes the same Tom.

A few seconds after his double blink, Tony's cell phone vibrates on his belt. Flipping it open and turning the scroll wheel on its side, Tony reviews a series of screens on which the system presents the information it has gathered, with the information it deems most useful to Tony shown first.

Equipped with knowledge of this Tony-Geoff-Tom connection (closer than the usual six degrees of separation), and primed with trivia about her Doppelbock beer, Tony picks up his glass and walks down the bar. (Additional details that can be employed in such arrangements, including user interfaces and visualization techniques, can be found in Dunekacke, "Localized Communication with Mobile Devices," MobileHCI, 2009.)

While P2P networks such as BitTorrent have permitted the sharing of audio, image and video content, arrangements like that shown in FIG. 52 allow networks to share a contextually richer set of experiential content. A basic tenet of P2P networks is that, even in the face of technologies that mine the long tail of content, the vast majority of users are interested in similar content (the score of tonight's NBA game, the current episode of Lost, etc.), and that, given sufficient bandwidth and protocols, the most efficient mechanism for delivering similar content to users is not to send individual streams, but to piece the content together based on what your "neighbors" on the network have. This same mechanism can be used to provide metadata that enhances an experience, such as drinking a Doppelbock at the bar, or watching a highlight of tonight's NBA game on a phone while at the bar. The protocol used in the ad hoc network described above might leverage P2P protocols either with the experience server providing a peer registration service (similar to early P2P networks), or in a true P2P modality, in which all devices in the ad hoc network advertise what experiences (metadata, content, social connections, etc.) they have available, whether for free, for payment, or for barter of information in kind. Apple's Bonjour software is well suited for this sort of application.

Within this fabric, Tony's cell phone may simply retrieve the information on Doppelbock by posting the question to the peer network, and receive a wealth of information from a variety of devices within the bar, or from the experience server, without ever knowing the source. Similarly, the experience server may also act as a data recorder, recording the experiences of those within the ad hoc network and giving those experiences persistence in time and place. Geoff may visit the same bar at some point in the future and see what threads of communication or connections his friend Tony made two weeks earlier, or possibly even leave a note for Tony to retrieve the next time he is at the bar.

The ability to mine the social threads represented by traffic on the network can also enable the proprietors of the bar to augment the experiences of their patrons by orchestrating interactions or introductions. This may involve people with shared interests, singles, etc., or may take the form of gaming, by allowing people to opt in to theme-based games in which patrons piece together clues to find the true identity of someone in the bar or to unravel a mystery (similar to the board game Clue). Finally, demographic information relating to audience measurement is of material value to proprietors as they consider which beers to stock next, where to advertise, and so on.

Further Discussion

Certain portable devices, such as the Apple iPhone, offer single-button access to pre-defined functions. Among these are viewing the prices of favorite stocks, viewing a weather forecast, and viewing a general map of the user's location. Additional functions are available, but the user must undertake a series of further manipulations, e.g., to reach a favorite web site.

An embodiment of certain aspects of the present technology allows these further manipulations to be shortcut by capturing distinctive imagery. Capturing an image of the user's hand may link the user to a babycam back home, which delivers real-time video of a newborn in a crib. Capturing an image of a wristwatch may load a map showing traffic conditions along some part of the user's route home. Such functionality is shown in FIGS. 53-55.

A user interface for the portable device includes a set-up/training phase that allows the user to associate different functions with different visual signs. The user is prompted to capture a picture, and to enter the URL and name of the action that is to be associated with the depicted object. (The URL is one type of response; others can also be used, such as launching a JAVA application.)

The system then characterizes the snapped image by deriving a set of feature vectors by which similar images can be recognized (e.g., through pattern/template matching). The feature vectors are stored in a data structure (FIG. 55) in association with the function name and the associated URL.

In this initial training phase, the user may capture several images of the same visual sign, perhaps from different distances and perspectives, and with different lighting and backgrounds. The feature extraction algorithm processes the collection to extract a feature set that captures the shared similarities of all of the training images.

The extraction of image features, and the storage of the data structure, can be performed at the portable device or at a remote device (or in a distributed fashion).

In later operation, the device checks each image it captures for correspondence with one of the stored visual signs. If one is recognized, the corresponding action can be launched. Otherwise, the device responds with the other functions available to the user upon capturing a new image.
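The look-up step can be sketched as a nearest-neighbor search over stored feature vectors. The specification does not fix a matching algorithm; Euclidean distance, the `max_distance` cut-off, the three-element vectors, and the example.com URLs below are all assumptions chosen for illustration.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_sign(features, glossary, max_distance=0.5):
    """Return the glossary entry closest to `features`, or None if too far."""
    best_name, best_dist = None, float("inf")
    for name, entry in glossary.items():
        d = distance(features, entry["features"])
        if d < best_dist:
            best_name, best_dist = name, d
    if best_dist <= max_distance:
        return glossary[best_name]
    return None  # no stored sign recognized; fall back to normal camera use

# Hypothetical data structure in the spirit of FIG. 55:
# sign name -> feature vector plus the associated response URL.
GLOSSARY = {
    "hand":       {"features": [0.9, 0.1, 0.2], "url": "http://example.com/babycam"},
    "wristwatch": {"features": [0.1, 0.8, 0.7], "url": "http://example.com/traffic"},
}
```

A new capture whose features lie near the stored "hand" vector returns the babycam entry; a capture far from every stored sign returns `None`.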

In another embodiment, the portable device is equipped with two or more shutter buttons. Manipulating one button captures an image and executes an action, based on the closest match between the captured image and a stored visual sign. Manipulating another button captures an image without undertaking such an action.

The device UI can include a control that presents a visual glossary of signs to the user, as shown in FIG. 54. When it is activated, thumbnails of the different visual signs are presented on the device display in association with the names of the functions stored earlier, reminding the user of the defined vocabulary of signs.

The control that launches this glossary of signs can itself be an image. One image suitable for this function is a generally featureless frame. An all-dark frame can be obtained by operating the shutter with the lens covered. An all-light frame can be obtained by operating the shutter with the lens pointed at a light source. Another substantially featureless frame (of intermediate density) can be obtained by imaging a patch of skin, wall, or sky. (To be substantially featureless, the frame should be closer to featureless than to a match with any of the other stored visual signs. In other embodiments, "featureless" can be concluded if the image has a texture metric below a threshold value.)
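The specification does not name a particular texture metric. As one simple possibility, the sketch below uses the population variance of the pixel values, with an arbitrary threshold of 25; both the metric and the threshold are assumptions.

```python
from statistics import pvariance

def is_featureless(pixels, threshold=25.0):
    """Treat a frame as featureless when its pixel variance is below threshold.

    `pixels` is a flat sequence of 8-bit gray values; variance is used here
    only as a stand-in for whatever texture metric an implementation adopts.
    """
    return pvariance(pixels) < threshold
```

A patch of wall with values such as 119 to 121 has near-zero variance and counts as featureless, while a high-contrast pattern does not.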

(The concept of triggering an operation by capturing an all-light frame can be extended to any device function. In some embodiments, repeated all-light exposures alternately toggle the function on and off. The same applies to all-dark and intermediate-density frames. A threshold can be set, either by the user with a UI control or by the manufacturer, to establish how "light" or "dark" such a frame must be in order to be interpreted as a command. For example, the 8-bit (0-255) pixel values from a million-pixel sensor can be summed. If the sum is less than 900,000, the frame may be regarded as all-dark. If it is greater than 254 million, the frame may be regarded as all-light. And so on.)
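For a million-pixel sensor with 8-bit values, the largest possible sum is 1,000,000 × 255 = 255,000,000, so the two example thresholds correspond to a mean pixel value below 0.9 (all-dark) or above 254 (all-light). The sketch below applies the same thresholds, scaled by pixel count so that it also works on small test frames; the scaling and the toggle class are illustrative assumptions.

```python
# Example thresholds from the text, stated for a one-million-pixel sensor.
SENSOR_PIXELS = 1_000_000
DARK_SUM = 900_000          # sum below this -> all-dark
LIGHT_SUM = 254_000_000     # sum above this -> all-light

def classify_frame(pixels):
    """Classify a flat sequence of 8-bit pixel values by its summed brightness."""
    n = len(pixels)
    total = sum(pixels)
    # Cross-multiplied comparison: equivalent to scaling thresholds by n / 1e6.
    if total * SENSOR_PIXELS < DARK_SUM * n:
        return "all-dark"
    if total * SENSOR_PIXELS > LIGHT_SUM * n:
        return "all-light"
    return "other"

class ToggleOnAllLight:
    """Hypothetical function that flips on/off with each all-light exposure."""

    def __init__(self):
        self.enabled = False

    def on_frame(self, pixels):
        if classify_frame(pixels) == "all-light":
            self.enabled = not self.enabled
        return self.enabled
```

Two successive all-light exposures therefore turn the hypothetical function on and then off again, while ordinary frames leave its state unchanged.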

Another of the featureless frames can trigger a different special response. It can cause the portable device to launch all of the stored functions/URLs in the glossary (or, e.g., a certain five or ten of them). The device can cache the resulting frames of information and present them in succession when the user operates one of the phone controls, such as button 116b or scroll wheel 124 in FIG. 44, or makes a certain gesture on a touch screen. (This function can be invoked by other controls as well.)

The third of the featureless frames (i.e., dark, white, or mid-density) can send the device's location to a map server, which can then transmit back multiple map views of the user's location. These views may include aerial views and street map views at different zoom levels, together with nearby street-level imagery. Each of these frames can be cached at the device and quickly reviewed by turning a scroll wheel or using another UI control.

The user interface desirably includes controls for deleting visual signs and for editing the name/function assigned to each. The URLs can be defined by typing on a keypad, or by navigating in some other way to a desired destination and then saving that destination as the response corresponding to a particular image.

Training of the pattern recognition engine can continue through use, with successive images of the different visual signs each serving to refine the template model by which that visual sign is defined.

It will be recognized that a great variety of different visual signs can be defined using resources that are commonly available to the user. A hand can define many different signs, with the fingers arranged in different positions (fist, one through five fingers, thumb-forefinger OK sign, open palm, thumbs-up, American Sign Language signs, etc.). Apparel and its components (e.g., shoes, buttons) can also be used, as can jewelry. Features from common surroundings (e.g., a telephone) may also be used.

In addition to launching particular favorite operations, such techniques can be used as a user interface technique in other situations. For example, a software program or web service may present a list of options to the user. Rather than manipulating a keyboard to enter, e.g., choice #3, the user may capture an image of three fingers, visually symbolizing the selection. The software recognizes the three-finger symbol as meaning the digit 3 and inputs that value to the process.

If desired, visual signs can form part of authentication procedures, e.g., for accessing a bank or social-networking web site. For example, after entering a sign-on name or password at a site, the user may be shown a stored image (to confirm that the site is authentic) and then be prompted to submit an image of a particular visual type (defined earlier by the user, but not explicitly named in the site's prompt). The web site checks features extracted from the just-captured image for correspondence with the expected response before permitting the user to access the web site.

Other embodiments can respond to a sequence of snapshots taken within a certain period (e.g., 10 seconds): a grammar of imagery. An image sequence of "wristwatch," "four fingers," "three fingers" can set an alarm clock function on the portable device to ring at 7 o'clock.
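The text does not spell out how "four fingers" and "three fingers" yield 7 o'clock. The sketch below assumes, purely for illustration, that finger counts following a "wristwatch" sign are summed, and that the whole sequence must fall within the 10-second window; the sign names and the returned tuple are likewise hypothetical.

```python
WINDOW_SECONDS = 10.0

# Hypothetical mapping from recognized finger signs to digits.
DIGITS = {
    "one finger": 1, "two fingers": 2, "three fingers": 3,
    "four fingers": 4, "five fingers": 5,
}

def interpret(snapshots):
    """Interpret a list of (timestamp_seconds, sign_name) as a command.

    Returns ("set_alarm", hour) or None when no rule of the grammar applies.
    """
    if len(snapshots) < 2:
        return None
    # All snapshots must fall within the allowed time window.
    if snapshots[-1][0] - snapshots[0][0] > WINDOW_SECONDS:
        return None
    signs = [sign for _, sign in snapshots]
    # Rule (assumed): "wristwatch" followed only by finger signs sets an alarm
    # at the hour given by the sum of the finger counts.
    if signs[0] == "wristwatch" and all(s in DIGITS for s in signs[1:]):
        return ("set_alarm", sum(DIGITS[s] for s in signs[1:]))
    return None
```

Under this assumed rule, "wristwatch," "four fingers," "three fingers" captured over six seconds yields an alarm at hour 7, and the same sequence spread over more than ten seconds is ignored.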

In still other embodiments, the visual signs may be gestures that include motion, captured by the portable device as a sequence of frames (e.g., video).

Context data (e.g., indicating the user's geographic location, time of day, month, etc.) can also be used to tailor the response. For example, when the user is at work, the response to a certain visual sign may be to fetch an image from a security camera at the user's home. At home, the response to the same sign may be to fetch an image from a security camera at work.
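Context-dependent responses can be modeled by keying the response table on both the sign and the context. The table below is a minimal sketch of the security-camera example; the sign name, location labels, and response identifiers are hypothetical.

```python
# (visual sign, current location) -> response; all identifiers illustrative.
RESPONSES = {
    ("security_sign", "work"): "fetch_home_camera_image",
    ("security_sign", "home"): "fetch_work_camera_image",
}

def respond(sign, location):
    """Pick the response for a recognized sign given the user's location."""
    return RESPONSES.get((sign, location))  # None when no rule is defined
```

Other context fields such as time of day or month could be added to the key in the same way.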

In this embodiment, as in others, the response need not be visual. Audio or other output (e.g., tactile, smell, etc.) can of course be employed.

The technology just described allows a user to define a glossary of visual signs and corresponding customized responses. An intended response can be invoked quickly by imaging a readily available subject. The captured image can be of low quality (e.g., overexposed or blurry), since it only needs to be classified among, and distinguished from, a relatively small universe of alternatives.

Visual Intelligence Pre-Processing

Another aspect of the present technology is to perform one or more visual intelligence pre-processing operations on image information captured by a camera sensor. These operations may be performed without a user request, and before other image processing operations that the camera customarily performs.

FIG. 56 is a simplified diagram showing certain of the processing performed in an exemplary camera, such as a cell phone camera. Light impinges on an image sensor comprising an array of photodiodes. (CCD or CMOS sensor technologies are commonly used.) The resulting analog electrical signals are amplified and converted to digital form by analog-to-digital (A/D) converters. The outputs of these converters provide the image data in its most raw, or "native," form.

The foregoing operations are typically performed by circuitry formed on a common substrate, i.e., "on-chip." Before other processes can access the image data, one or more further operations are commonly performed.

One such further operation is Bayer interpolation (de-mosaicing). Each photodiode of the sensor array typically captures only a single color of light (red, green or blue, R/G/B) because of a color filter array. This array consists of a tiled 2 x 2 pattern of filter elements: one red, one blue diagonally opposite it, and the other two green. Bayer interpolation effectively "fills in the blanks" of the sensor's resulting R/G/B mosaic pattern, e.g., by providing a red signal at a location where there is a blue filter.

Another common operation is white balance correction. This process adjusts the intensities of the component R/G/B colors so that certain colors (especially neutral colors) are rendered correctly.
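As an illustration of adjusting the R/G/B intensities, the following is a minimal sketch that assumes the "gray-world" heuristic (the scene is taken to average out to a neutral gray). The specification does not name a particular white balance method; the function name `gray_world_balance` and the sample pixel values are hypothetical.

```python
def gray_world_balance(pixels):
    """Scale R, G and B so their averages become equal (gray-world assumption).

    pixels: list of (r, g, b) tuples with 8-bit values.
    Returns the balanced pixels and the per-channel gains that were applied.
    """
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    target = sum(means) / 3.0
    # A channel with zero average is left untouched to avoid dividing by zero.
    gains = [target / m if m > 0 else 1.0 for m in means]
    balanced = [
        tuple(min(255, round(p[c] * gains[c])) for c in range(3))
        for p in pixels
    ]
    return balanced, gains


# Two gray patches photographed under a reddish light (hypothetical values).
balanced, gains = gray_world_balance([(200, 100, 100), (100, 50, 50)])
```

After balancing, each patch has equal red, green and blue values again, which is what rendering a neutral color correctly means in this simple model.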

Other operations that may be performed include gamma correction and edge enhancement.

Finally, the processed image data is typically compressed to reduce storage requirements. JPEG compression is most commonly used.

The processed, compressed image data is then stored in a buffer memory. Only at this point is the image information commonly available to other processes and services of the cell phone (e.g., by calling a system API).

One such process that is commonly invoked with this processed image data is to present the image to the user on the screen of the camera. The user can then assess the image and decide, e.g., whether (1) to save it to the camera's memory card, (2) to transmit it in a picture message, (3) to delete it, etc.

Until the user instructs the camera (e.g., through a button-based user interface or a graphical control), the image stays in the buffer memory. Without further instruction, the only use made of the processed image data is to display it on the screen of the cell phone.

FIG. 57 shows an exemplary embodiment of the presently discussed aspect of the technology. After the analog signals are converted into native digital form, one or more other processes are performed.

One such process is to perform a Fourier transform (e.g., an FFT) on the native image data. This converts the spatial-domain representation of the image into a frequency-domain representation.

A Fourier-domain representation of the native image data can be useful in various ways. One is to screen the image for likely barcode data.

One familiar 2D barcode is a checkerboard-like array of light and dark squares. The size of the component squares, and thus their repetition spacing, gives a pair of notable peaks in the Fourier-domain representation of the image at a corresponding frequency. (The peaks may be phase-spaced ninety degrees in the UV plane, if the pattern recurs at the same frequency in both the vertical and horizontal directions.) These peaks extend significantly above other image components at nearby image frequencies, often with a magnitude two to five or ten times (or more) that of nearby image frequencies. If the Fourier transform is performed on tiled patches from the image (e.g., patches of 16 x 16 pixels, or 128 x 128 pixels, etc.), it may be found that certain patches lying wholly within a barcode portion of the image frame have essentially no signal energy except at this characteristic frequency.
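The patch-level behavior described above can be sketched as follows. This is an illustrative screen only, written with a brute-force DFT in pure Python so that it is self-contained; a real implementation would use an FFT. The function names, the choice of four peaks, and the `min_fraction` threshold are assumptions, not values taken from this specification.

```python
import cmath


def dft2_energy(patch):
    """Energy |F(u, v)|^2 of a small square patch, by brute-force 2-D DFT."""
    n = len(patch)
    twiddle = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n)]
    energy = {}
    for u in range(n):          # u: vertical frequency index
        for v in range(n):      # v: horizontal frequency index
            acc = 0j
            for y in range(n):
                for x in range(n):
                    acc += patch[y][x] * twiddle[(u * y + v * x) % n]
            energy[(u, v)] = abs(acc) ** 2
    return energy


def screen_for_2d_barcode(patch, top=4, min_fraction=0.9):
    """Return (is_candidate, peak_locations) for one image patch.

    A patch is flagged when nearly all of its non-DC energy sits in a few
    frequency bins, as expected for a patch lying wholly inside a
    checkerboard-like barcode.
    """
    energy = dft2_energy(patch)
    energy.pop((0, 0))                  # ignore average brightness (DC term)
    total = sum(energy.values())
    if total <= 1e-6:                   # flat patch: nothing to screen
        return False, []
    peaks = sorted(energy, key=energy.get, reverse=True)[:top]
    fraction = sum(energy[p] for p in peaks) / total
    return fraction >= min_fraction, sorted(peaks)


# 16 x 16 checkerboard with 2-pixel squares, and a plain horizontal ramp.
checker = [[255 * (((x // 2) + (y // 2)) % 2) for x in range(16)] for y in range(16)]
ramp = [[x for x in range(16)] for y in range(16)]
checker_hit, checker_peaks = screen_for_2d_barcode(checker)
ramp_hit, _ = screen_for_2d_barcode(ramp)
```

For the checkerboard, the energy concentrates at the bins whose frequency matches the square spacing, while the ramp spreads its energy over many bins and is rejected.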

As shown in FIG. 57, the Fourier transform information can be analyzed for telltale signs associated with an image of a barcode. A template-like approach can be used. The template can comprise a set of parameters against which the Fourier transform information is tested, to see whether the data has indicia associated with a barcode-like pattern.

If the Fourier data is consistent with an image depicting a 2D barcode, corresponding information can be routed for further processing (e.g., sent from the cell phone to a barcode-responsive service). This information can comprise the native image data and/or the Fourier transform information derived from the image data.

In the former case, the full image data need not be sent. In some embodiments, a down-sampled version of the image data can be sent, e.g., at one-fourth the resolution in both the horizontal and vertical directions. Or only the patches of image data most likely to depict part of a barcode pattern can be sent. Or, conversely, the patches of image data least likely to depict a barcode can be withheld. (These may be patches that have no peak at the characteristic frequency, or that have a lower amplitude there than at nearby frequencies.)

The transmission can be prompted by the user. For example, the camera UI may ask the user whether the information should be directed for barcode processing. In other arrangements, the transmission is dispatched immediately upon a determination that the image frame matches the template, indicating possible barcode data. No user action is involved.

The Fourier transform data can likewise be tested for signs of other image subjects. A 1D barcode, for example, is characterized by a significant amplitude component at a high frequency (going "across the pickets") and another significant amplitude spike at a low frequency (going along the pickets). ("Significant" again means two or more times the amplitude of nearby frequencies, as noted above.) Other image contents can also be characterized by reference to their Fourier-domain representation, and corresponding templates can be devised. Fourier transform data is also commonly used in computing fingerprints used for automated recognition of media content.

The Fourier-Mellin (F-M) transform is also useful in characterizing various image subjects/components, including the barcodes noted above. The F-M transform has the advantage of being robust to scale and rotation of the image subject (scale/rotation invariance). In an exemplary embodiment, if the scale of the subject increases (as by moving the camera closer), the F-M transform pattern shifts up; if the scale decreases, the F-M pattern shifts down. Similarly, if the subject is rotated clockwise, the F-M pattern shifts to the left. (The particular directions of the shifts can be tailored depending on the implementation.) These attributes make F-M data important in recognizing patterns that may be affine-transformed, as in facial recognition, character recognition, object recognition, etc.
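The shifting behavior can be motivated by the standard scaling and rotation properties of the 2D Fourier transform. The derivation below is an illustrative sketch; the symbols f, g, a, \alpha, \rho and \theta are introduced here and do not appear in the specification.

```latex
% g is the subject f enlarged by a factor a > 0 and rotated by an angle \alpha
g(\mathbf{x}) = f\!\left(R_{\alpha}^{-1}\mathbf{x}/a\right)
\quad\Longrightarrow\quad
G(\mathbf{k}) = a^{2}\,F\!\left(a\,R_{\alpha}^{-1}\mathbf{k}\right)

% In polar frequency coordinates (r, \theta), with \rho = \ln r:
\lvert G\rvert(\rho,\theta) = a^{2}\,\lvert F\rvert\!\left(\rho + \ln a,\;\theta - \alpha\right)
```

In the log-polar plane, a change of scale therefore appears as a displacement along the \rho axis and a rotation as a displacement along the \theta axis, apart from the overall factor a^2. Which way the pattern moves on a display depends on the axis conventions chosen, consistent with the remark that the directions are implementation dependent.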

The arrangement shown in FIG. 57 applies a Mellin transform to the output of the Fourier transform process, to yield F-M data. The F-M data can then be screened for attributes associated with different image subjects.

For example, text is characterized by plural symbols of approximately similar size, composed of strokes in a foreground color that contrasts with a larger background field. Vertical edges tend to dominate (albeit slightly inclined in italics), with significant energy also found in the horizontal direction. Spacings between strokes usually fall within a fairly narrow range.

These attributes manifest themselves as characteristics that tend to fall reliably within certain boundaries in the F-M transform space. Again, a template can define tests by which the F-M data is screened to indicate the likely presence of text in the captured native image data. If the image is determined to include likely text, it can be dispatched to a service that handles this type of data (e.g., an optical character recognition, or OCR, engine). Again, the image (or a variant of the image) can be sent, or the transform data can be sent, or some other data.

Just as text manifests itself with a certain set of characteristic attributes in the F-M data, so do faces. The F-M data output from the Mellin transform can be tested against a different template to determine the likely presence of a face within the captured image.

Likewise, the F-M data can be examined for telltale signs that the image data conveys a watermark. A watermark orientation signal is a distinctive signal present in some watermarks that can serve as a sign that a watermark is present.

In the examples just given, as in others, the templates may be compiled by testing with known images (e.g., "training"). By capturing images of many different text presentations, the resulting transform data can be examined for attributes that are consistent across the sample set, or (more likely) that fall within bounded ranges. These attributes can then be used as the template by which images containing likely text are identified. (Likewise for faces, barcodes, and other types of image subjects.)
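One simple way to realize such a template is to record, for each measured attribute, the range it spans across the training samples. The sketch below assumes that approach; the attribute names (`vertical_edge_energy`, `stroke_spacing`), the sample values and the `margin` parameter are hypothetical.

```python
def train_template(samples, margin=0.1):
    """Build a template of bounded ranges from attribute measurements.

    samples: list of dicts mapping attribute name to measured value, one dict
    per known training image. Each range is widened by `margin` times its
    span, so that unseen images slightly outside the training data still pass.
    """
    template = {}
    for name in samples[0]:
        values = [sample[name] for sample in samples]
        low, high = min(values), max(values)
        pad = (high - low) * margin
        template[name] = (low - pad, high + pad)
    return template


def matches(template, features):
    """True when every attribute falls inside its trained range."""
    return all(low <= features[name] <= high
               for name, (low, high) in template.items())


# Hypothetical attribute measurements from three known text images.
text_samples = [
    {"vertical_edge_energy": 0.60, "stroke_spacing": 4.0},
    {"vertical_edge_energy": 0.70, "stroke_spacing": 5.0},
    {"vertical_edge_energy": 0.65, "stroke_spacing": 4.5},
]
text_template = train_template(text_samples)
likely_text = matches(text_template, {"vertical_edge_energy": 0.68, "stroke_spacing": 4.2})
not_text = matches(text_template, {"vertical_edge_energy": 0.30, "stroke_spacing": 4.2})
```

A separate template would be trained in the same way for faces, barcodes, and other subject types.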

FIG. 57 shows that a variety of different transforms can be applied to the image data. These are generally shown as being performed in parallel, although one or more can be performed sequentially, either all operating on the same input image data, or with one transform using the output of a previous transform (as is the case with the Mellin transform). Although not all are shown (for clarity of illustration), outputs from each of the other transform processes can be examined for characteristics that suggest the presence of a certain image type. If found, related data is then sent to a service appropriate to that type of image information.
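The flow of transforms, screens and services can be sketched as a small dispatcher. This is only an illustration of the routing idea: the `toy_fourier` and `toy_mellin` functions are placeholders that do not compute real transforms, and all names, thresholds and service labels are hypothetical.

```python
def run_screens(image, transforms, screens, services):
    """Apply transforms, screen each output, and dispatch matches to services.

    transforms: dict name -> (function, source), where source is None to
                operate on the input image or the name of an earlier transform
                whose output is used (dicts keep insertion order).
    screens:    list of (transform_name, test_function, service_name).
    services:   dict service_name -> callable receiving the screened data.
    """
    outputs = {}
    for name, (func, source) in transforms.items():
        data = image if source is None else outputs[source]
        outputs[name] = func(data)
    dispatched = []
    for transform_name, test, service_name in screens:
        if test(outputs[transform_name]):
            services[service_name](outputs[transform_name])
            dispatched.append(service_name)
    return dispatched


# Placeholder "transforms": stand-ins that only produce a number to screen.
def toy_fourier(image):
    return {"peak_ratio": max(image) / (sum(image) / len(image))}


def toy_mellin(fourier_out):
    return {"text_score": fourier_out["peak_ratio"] / 10.0}


received = []
transforms = {"fourier": (toy_fourier, None), "mellin": (toy_mellin, "fourier")}
screens = [
    ("fourier", lambda d: d["peak_ratio"] >= 2.0, "barcode"),
    ("mellin", lambda d: d["text_score"] >= 0.5, "ocr"),
]
services = {
    "barcode": lambda d: received.append(("barcode", d)),
    "ocr": lambda d: received.append(("ocr", d)),
}
dispatched = run_screens([1, 1, 1, 9], transforms, screens, services)
```

Here the second transform consumes the output of the first, and only the screen that passes causes data to be sent to its service.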

In addition to Fourier transform and Mellin transform processes, processes such as eigenface (eigenvector) calculation, image compression, cropping, affine distortion, filtering, DCT transform, wavelet transform, Gabor transform, and other signal processing operations can be applied (all are regarded as transforms). Others are noted elsewhere in this specification, and in the documents incorporated by reference. Outputs from these processes are then tested for characteristics indicating that the chance the image depicts a certain class of information is greater than random chance.

The outputs from some processes may be input to other processes. For example, an output from one of the boxes labeled ETC in FIG. 57 is provided as an input to the Fourier transform process. This ETC box can be, for example, a filtering operation. Sample filtering operations include median, Laplacian, Wiener, Sobel, high-pass, low-pass, bandpass, Gabor, signum, etc. (Digimarc's patents 6,442,284, 6,483,927, 6,516,079, 6,614,914, 6,631,198, 6,724,914, 6,988,202, 7,013,021 and 7,076,082 show various such filters.)

Sometimes a single service may handle different data types, or data that passes different screens. In FIG. 57, for example, a facial recognition service may receive F-M transform data, or eigenface data. Or it may receive image information that has passed one of several different screens (e.g., its F-M transform passed one screen, or its eigenface representation passed a different screen).

In some cases, data can be sent to two or more different services.

Although not essential, it is desirable that some or all of the processing shown in FIG. 57 be performed by circuitry integrated on the same substrate as the image sensors. (Some of the operations may be performed by programmable hardware, either on the substrate or off it, responsive to software instructions.)

While the foregoing operations are described as immediately following conversion of the analog sensor signals to digital form, in other embodiments such operations can be performed after other processing operations (e.g., Bayer interpolation, white balance correction, JPEG compression, etc.).

Some of the services to which information is sent may be provided locally in the cell phone. Or they can be provided by a remote device, with which the cell phone establishes a link that is at least partly wireless. Or such processing can be distributed among various devices.

(While described in the context of conventional CCD and CMOS sensors, this technology is applicable regardless of sensor type. Thus, for example, Foveon and panchromatic image sensors can alternatively be used. So can high dynamic range sensors, and sensors using Kodak's Truesense Color Filter Pattern (which adds panchromatic sensor pixels to the usual Bayer array of red/green/blue sensor pixels). Sensors with infrared output data can also be used to advantage. For example, sensors that output infrared image data (in addition to visible image data, or instead of it) can be used to identify faces and other image subjects by their temperature differences, which helps in segmenting image subjects within the frame.)

It will be recognized that devices employing the FIG. 57 architecture have, essentially, two parallel processing chains. One processing chain produces data to be rendered into perceptual form for use by human viewers. It typically includes at least one of a de-mosaic processor, a white balance module, a JPEG image compressor, etc. The second processing chain produces data to be analyzed by one or more machine-implemented algorithms, and in the illustrative example includes a Fourier transform processor, an eigenface processor, etc.

Such processing architectures are further detailed in application 61/176,739, cited earlier.

By arrangements such as the foregoing, one or more appropriate image-responsive services can begin formulating candidate responses to the visual stimulus before the user has even decided what to do with the captured image.

Further Comments on Visual Intelligence Pre-Processing

While static image pre-processing was discussed in connection with FIG. 57 (and FIG. 50), such processing can also include temporal aspects, such as motion.

Motion is most commonly associated with video, and the techniques detailed herein can be used when capturing video content. However, motion/temporal implications are also present with "still" imagery.

For example, some image sensors are read sequentially, from the top row to the bottom row. During the reading operation, the image subject may move within the image frame (i.e., due to camera movement or subject movement). An exaggerated view of this effect is shown in FIG. 60, which depicts an imaged "E" captured as the sensor is moved to the left. The vertical stroke of the letter is farther from the left edge of the image frame at the bottom than at the top, due to movement of the sensor while the pixel data is being clocked out.
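The size of this skew follows from simple arithmetic: each row is read a little later than the one above it, so it sees the subject displaced a little further. The sketch below assumes a constant row readout time and a constant apparent subject speed in the frame; the function name and the numeric values are hypothetical.

```python
def rolling_shutter_offsets(rows, line_time_s, apparent_speed_px_per_s):
    """Horizontal displacement (in pixels) of the subject in each sensor row.

    Row r is read out r * line_time_s seconds after row 0, so a subject that
    appears to drift sideways at a constant speed is displaced in proportion
    to the row index.
    """
    return [apparent_speed_px_per_s * r * line_time_s for r in range(rows)]


# Hypothetical numbers: 480 rows, 50 microseconds per row, and a subject that
# appears to drift right at 500 pixels per second (sensor moving left).
offsets = rolling_shutter_offsets(480, 50e-6, 500.0)
bottom_row_skew = offsets[-1]
```

With these assumed numbers the bottom row is displaced by roughly 12 pixels relative to the top row, which is the slant seen in the vertical stroke of the "E".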

The phenomenon also arises when the camera assembles data from several frames to generate a single "still" image. Often unknown to the user, many consumer imaging devices rapidly capture plural frames of image data, and composite different aspects of the data together (using software provided, e.g., by FotoNation, Inc., now Tessera Technologies, Inc.). For example, the device may take three exposures: one exposed to optimize the appearance of faces detected in the image frame, another exposed in accordance with the background, and a third exposed in accordance with the foreground. These are melded together to create a pleasing montage. (In another example, the camera captures a burst of frames and, in each, determines whether persons are smiling or blinking. It may then select different faces from different frames to yield a final image.)

Thus, the distinction between video and still imagery is no longer simply a device modality, but is becoming a user modality.

Detection of motion can be accomplished in the spatial domain (e.g., by reference to movement of feature pixels between frames), or in a transform domain. Fourier transform and DCT data are exemplary. The system may extract the transform-domain signature of an image component and track its movement across different frames, thereby identifying its motion. One illustrative technique deletes, e.g., the lowest N frequency coefficients, leaving just the high-frequency edges, etc. (The highest M frequency coefficients may be disregarded as well.) A thresholding operation is performed on the magnitudes of the remaining coefficients, zeroing those below a value (such as 30% of the mean). The resulting coefficients serve as the signature for that image region. (The transform may be based, e.g., on tiles of 8 x 8 pixels.) When a pattern corresponding to this signature is found at a nearby location within another (or the same) image frame (using known similarity testing, such as correlation), movement of that image region can be identified.
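The illustrative technique can be sketched with a brute-force 8 x 8 DCT in pure Python. Several details are assumptions made for the sketch, not taken from the specification: coefficients are ordered from low to high frequency by the sum of their indices, N = 3, M = 0, the search covers shifts of up to 2 pixels, and the test frames are synthetic.

```python
import math


def dct2(tile):
    """Orthonormal 2-D DCT-II of a square tile (brute force, fine for 8 x 8)."""
    n = len(tile)
    cos = [[math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n)]
           for k in range(n)]
    scale = [math.sqrt((1.0 if k == 0 else 2.0) / n) for k in range(n)]
    return [[scale[u] * scale[v] * sum(tile[y][x] * cos[u][y] * cos[v][x]
                                       for y in range(n) for x in range(n))
             for v in range(n)] for u in range(n)]


def signature(tile, drop_lowest=3, drop_highest=0, rel_threshold=0.3):
    """Drop the lowest N (and highest M) coefficients, then zero small ones."""
    n = len(tile)
    coeffs = dct2(tile)
    # Assumed ordering: low to high frequency by u + v (zigzag-like).
    order = sorted(((u, v) for u in range(n) for v in range(n)),
                   key=lambda p: (p[0] + p[1], p))
    kept = order[drop_lowest:len(order) - drop_highest]
    mean_mag = sum(abs(coeffs[u][v]) for u, v in kept) / len(kept)
    return [coeffs[u][v] if abs(coeffs[u][v]) >= rel_threshold * mean_mag else 0.0
            for u, v in kept]


def correlation(a, b):
    """Normalized correlation of two signatures (1.0 for identical ones)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0


def crop(frame, top, left, size):
    return [row[left:left + size] for row in frame[top:top + size]]


def find_motion(frame_a, frame_b, top, left, size=8, search=2):
    """Return the (dy, dx) shift whose tile in frame_b best matches frame_a's."""
    ref = signature(crop(frame_a, top, left, size))
    best_score, best_shift = -2.0, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > len(frame_b) or x + size > len(frame_b[0]):
                continue
            score = correlation(ref, signature(crop(frame_b, y, x, size)))
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift


# Synthetic texture, and the same texture moved down 1 pixel and right 2 pixels.
frame_a = [[(7 * x * x + 13 * y * y + 3 * x * y) % 17 for x in range(16)]
           for y in range(16)]
frame_b = [[frame_a[(y - 1) % 16][(x - 2) % 16] for x in range(16)]
           for y in range(16)]
shift = find_motion(frame_a, frame_b, top=4, left=4)
```

Because the signature discards the lowest coefficients, the match depends on edge-like detail rather than on overall brightness.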

Image Conveyance of Semantic Information

In many systems it is desirable to perform a set of processing steps (like those detailed above) that extract information about the incoming content (e.g., image data) in a scalable (e.g., distributed) manner. This extracted information (metadata) is then desirably packaged to facilitate subsequent processing (which may be application-specific, or more computationally intense, and can be performed within the originating device or by a remote system).

A rough analogy is user interaction with Google. Bare search terms are not sent to a Google mainframe as if from a dumb terminal. Instead, the user's computer formats a query as an HTTP request, including the internet protocol address of the originating computer (indicative of location), and makes available cookie information by which user language preferences, desired safe-search filtering, etc., can be discerned. This structuring of relevant information serves as a precursor to Google's search process, allowing Google to perform the search more intelligently and to provide faster and better results to the user.

FIG. 61 shows some of the metadata that may be involved in an exemplary system. The information types in the left-most column may be computed directly from the native image data signals taken from the image sensor. (As noted, some or all of these can be computed using processing arrangements integrated with the sensor on a common substrate.) Additional information may be derived by reference to these basic data types, as shown by the second column of information types. This further information may be produced by processing in the cell phone, or external services can be employed (e.g., the OCR recognition service shown in FIG. 57 can be within the cell phone, or can be a remote server, etc.; similarly with the operations shown in FIG. 50).

How can this information be packaged to facilitate subsequent processing? One alternative is to convey it in the "alpha" channel of common image formats.

Most image formats represent imagery by data conveyed in plural channels, or byte-planes. In RGB, for example, one channel conveys red luminance, a second conveys green luminance, and a third conveys blue luminance. Similarly with CMYK (the channels respectively conveying cyan, magenta, yellow, and black information). Likewise with YUV, commonly used with video (a luma, or brightness, channel: Y, and two color channels: U and V), and with LAB (also brightness, with two color channels).

These imaging constructs are commonly extended to include an additional channel: alpha. The alpha channel is provided to convey opacity information, indicating the extent to which background subjects are visible through the imagery.

While commonly supported by image processing file structures, software and systems, the alpha channel is not much used (except, most notably, in computer-generated imagery and radiology). Certain implementations of the present technology use the alpha channel to transmit information derived from image data.

The different channels of image formats commonly have the same size and bit-depth. In RGB, for example, the red channel may convey 8-bit data (allowing values of 0-255 to be represented) for each pixel in a 640 x 480 array. Likewise with the green and blue channels. The alpha channel in such arrangements is also commonly 8 bits, and co-extensive with the image size (e.g., 8 bits x 640 x 480). Every pixel thus has a red value, a green value, a blue value, and an alpha value. (The composite image representation is commonly known as RGBA.)
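The RGBA layout can be illustrated by packing the four 8-bit values of one pixel into a 32-bit word. The byte order used here (red in the most significant byte) is an assumption for the sketch; actual formats differ in channel order.

```python
def pack_rgba(r, g, b, a):
    """Pack four 8-bit channel values into one 32-bit word (R in the top byte)."""
    return (r << 24) | (g << 16) | (b << 8) | a


def unpack_rgba(word):
    """Recover the (r, g, b, a) values from a packed 32-bit word."""
    return ((word >> 24) & 0xFF, (word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF)


pixel = pack_rgba(255, 128, 0, 3)
channels = unpack_rgba(pixel)

# Storage for a 640 x 480 RGBA frame at 8 bits per channel.
frame_bytes = 640 * 480 * 4
alpha_bits = 8 * 640 * 480
```

The alpha plane alone thus offers 8 x 640 x 480 bits per frame that can carry derived information alongside the picture.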

A few of the many ways the alpha channel can be used to convey information derived from the image data are shown in FIGS. 62-71, and discussed below.

FIG. 62 shows a picture that a user may snap with a cell phone. A processor in the cell phone (on the sensor substrate or elsewhere) may apply an edge detection filter (e.g., a Sobel filter) to the image data, yielding an edge map. Each pixel of the image is either determined to be part of an edge, or not. So this edge information can be conveyed in just one of the eight bit planes available in the alpha channel. Such an alpha channel payload is shown in FIG. 63.
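A minimal sketch of this step follows: a Sobel filter marks edge pixels, and the resulting one-bit map is written into a single bit plane of the alpha channel, leaving the other seven planes untouched. The threshold, the use of |gx| + |gy| as the gradient measure, the choice of bit plane 0, and the function names are assumptions for illustration.

```python
def sobel_edges(gray, threshold=128):
    """Return a 0/1 edge map for a grayscale image (border pixels are left 0)."""
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            if abs(gx) + abs(gy) >= threshold:
                edges[y][x] = 1
    return edges


def write_bit_plane(alpha, bits, plane=0):
    """Store a 0/1 map in one bit plane of an 8-bit alpha channel."""
    mask = 1 << plane
    return [[(a & ~mask & 0xFF) | (bit << plane) for a, bit in zip(arow, brow)]
            for arow, brow in zip(alpha, bits)]


def read_bit_plane(alpha, plane=0):
    """Extract the 0/1 map stored in one bit plane of the alpha channel."""
    return [[(a >> plane) & 1 for a in row] for row in alpha]


# A 5 x 6 test image: dark left half, bright right half (one vertical edge).
gray = [[0, 0, 0, 200, 200, 200] for _ in range(5)]
edges = sobel_edges(gray)
alpha = write_bit_plane([[0b10100000] * 6 for _ in range(5)], edges)
recovered = read_bit_plane(alpha)
```

Because only one bit plane is touched, whatever the other seven planes carry survives the write unchanged.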

The cell phone camera may also apply known techniques to identify faces within the image frame. The red, green and blue image data from pixels corresponding to facial regions can be combined to yield a greyscale representation, and this representation can be included in the alpha channel, e.g., in aligned correspondence with the identified faces in the RGB image data. An alpha channel conveying both edge information and greyscale faces is shown in FIG. 64. (An 8-bit greyscale is used for faces in the illustrated embodiment, although a shallower bit-depth, such as 6 or 7 bits, can be used in other arrangements, freeing other bit planes for other information.)
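The shallower bit-depth variant mentioned in the parenthetical can be sketched as follows: a 7-bit greyscale value for face pixels occupies the upper seven bit planes of the alpha channel, and a one-bit edge flag occupies the lowest plane. The luma weights, the rectangular face boxes and all names are assumptions for illustration; face detection itself is outside the sketch.

```python
def gray7(r, g, b):
    """7-bit greyscale value from 8-bit RGB, using common luma weights."""
    luma = (299 * r + 587 * g + 114 * b) // 1000   # 0..255
    return luma >> 1                               # 0..127


def build_alpha(rgb, edges, face_boxes):
    """Compose an alpha plane: bit 0 is the edge flag, bits 1-7 are face grey.

    rgb:        2-D list of (r, g, b) tuples.
    edges:      2-D list of 0/1 edge flags, same size as rgb.
    face_boxes: list of (top, left, bottom, right) boxes, bottom/right
                exclusive, assumed to come from a separate face detector.
    """
    h, w = len(rgb), len(rgb[0])
    alpha = [[edges[y][x] & 1 for x in range(w)] for y in range(h)]
    for top, left, bottom, right in face_boxes:
        for y in range(top, bottom):
            for x in range(left, right):
                alpha[y][x] = (gray7(*rgb[y][x]) << 1) | (edges[y][x] & 1)
    return alpha


rgb = [[(200, 100, 50), (10, 10, 10)],
       [(0, 0, 0), (255, 255, 255)]]
edges = [[1, 0],
         [0, 1]]
alpha = build_alpha(rgb, edges, face_boxes=[(0, 0, 1, 1)])
face_gray = alpha[0][0] >> 1    # recovered 7-bit grey value of the face pixel
edge_flag = alpha[0][0] & 1     # recovered edge flag of the same pixel
```

A receiving process can separate the two payloads with a shift and a mask, as the last two lines show.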

The camera may also perform operations to locate the eyes and mouth in each detected face. Markers indicating the scale and positions of these detected features can be transmitted in the alpha channel. A simple form of marker is a "smiley face" bit-mapped icon, with the eyes and mouth of the icon located at the positions of the detected eyes and mouth. The scale of the face can be indicated by the length of the iconic mouth, or by the size of a surrounding oval (or the spacing between the eye markers). The tilt of the face can be indicated by the angle of the mouth (or the angle of the line between the eyes, or the tilt of the surrounding oval).

If the cell phone processing yields a determination of the genders of the persons depicted in the image, this too can be represented in the extra image channel. For example, the oval line circumscribing the depicted face of a female may be made dashed or otherwise patterned. The eyes may be represented as cross-hairs or Xs instead of blackened circles, etc. The ages of depicted persons may also be approximated and indicated similarly. The processing may also classify each person's emotional state from visual facial clues, and an indication such as surprised/happy/sad/angry/ambiguous can be represented. (See, e.g., "A simple approach to facial expression recognition," Proceedings of the 2007 Int'l Conf on Computer Engineering and Applications, Queensland, Australia, 2007, pp. 456-461. See also patent publications 20080218472 (Emotiv Systems, Pty) and 20040207720 (NTT DoCoMo).)

When a determination has some uncertainty (such as guessing gender, age range, or emotion), a confidence metric output by the analysis process can also be represented in an iconic fashion, such as by the width of a line, or by the scale or selection of pattern elements.

FIG. 65 shows different pattern elements that can be used to denote different information, including gender and confidence, in an auxiliary image plane.

The portable device may also perform operations culminating in optical character recognition of alphanumeric symbols and strings depicted in the image data. In the illustrated example, the device may recognize the string "LAS VEGAS" in the picture. This determination can be memorialized by a PDF417 2D barcode added to the alpha channel. The barcode can be at the position of the OCR'd text in the image frame, or elsewhere.

(PDF417 is exemplary only. Other barcodes, such as 1D, Aztec, Datamatrix, High Capacity Color Barcode, Maxicode, QR Code, Semacode, and ShotCode, or other machine-readable data symbologies, such as OCR fonts and data glyphs, can naturally be used. Glyphs can be used both to convey arbitrary data and to form halftone image depictions. See in this regard Xerox's patent 6,419,162, and Hecht, "Printed Embedded Data Graphical User Interfaces," IEEE Computer Magazine, Vol. 34, No. 3, 2001, pp. 47-55.)

FIG. 66 shows an alpha channel representation of some of the information determined by the device. All of this information is structured in a manner that allows it to be conveyed within a single bit plane (of the eight bit planes) of the alpha channel. Information resulting from others of the processing operations (e.g., the analyses shown in FIGS. 50 and 61) may be conveyed in this same bit plane, or in others.

While FIGS. 62-66 illustrate a variety of information that can be conveyed in the alpha channel, and different representations of it, still more are shown in the example of FIGS. 67-69. These involve a cell phone picture of a new GMC truck and its owner.

Among other processing, the cell phone in this example processed the image data to recognize the model, year and color of the truck, to recognize the text on the truck grill and on the owner's t-shirt, to recognize the owner's face, and to recognize areas of grass and sky.

The sky was recognized by its position at the top of the frame, by its color histogram being within a threshold distance of expected norms, and by its weak spectral composition at certain frequency coefficients (e.g., a substantially "flat" region). The grass was recognized by its texture and color. (Other techniques for recognizing these features are taught, e.g., in Batlle, "A review on strategies for recognizing natural objects in colour images of outdoor scenes," Image and Vision Computing, Vol. 18, Issues 6-7, May 2000, pp. 515-530; Hayashi, "Fast Labelling of Natural Scenes Using Enhanced Knowledge," Pattern Analysis & Applications, Vol. 4, No. 1, March 2001, pp. 20-27; and Boutell, "Improved semantic region labeling based on scene context," IEEE Int'l Conf. on Multimedia and Expo, July 2005. See also patent publications 20050105776 and 20050105775 (Kodak).) Trees could have been recognized similarly.

The human face in the image was detected using arrangements like those commonly employed in consumer cameras. Optical character recognition was performed on a data set resulting from application of an edge detection algorithm to the input image, followed by Fourier and Mellin transforms. (While the texts GMC and LSU TIGERS were found, the algorithm failed to identify other text on the t-shirt, and text on the tires. With additional processing time, some of this missed text might have been decoded.)

The truck was first classified as a vehicle, then as a truck, and then finally identified, by pattern matching, as a Dark Crimson Metallic 2007 GMC Sierra Z-71 with extended cab. (This detailed identification was obtained through use of known reference truck images, from resources such as the GM trucks web site, Flickr, and a fan site devoted to identifying vehicles in Hollywood motion pictures: IMCDB-dot-com. Another approach to make and model recognition is detailed in Zafar, "Localized Contourlet Features in Vehicle Make and Model Recognition," Proc. SPIE, Vol. 7251, 725105, 2009.)

FIG. 68 shows an illustrative graphical, bitonal representation of the discerned information, as added to the alpha channel of the FIG. 67 image. (FIG. 69 shows the different planes of the composite image: red, green, blue, and alpha.)

The portion of the image area detected as depicting grass is indicated by a uniform array of dots. The image area depicting sky is represented as a grid of lines. (If trees had been particularly identified, they could have been labeled using one of the same patterns, but with a different size, spacing, etc. Or an entirely different pattern could have been used.)

The identification of the truck as a Dark Crimson Metallic 2007 GMC Sierra Z-71 with extended cab is encoded in a PDF417 2D barcode, scaled to the size of the truck and masked by its shape. Because PDF417 encodes information redundantly, with error-correction features, the portions of the rectangular barcode that are lost to the masking do not prevent the encoded information from being recovered.

The face information is encoded in a second PDF417 barcode. This second barcode is oriented at 90 degrees relative to the truck barcode, and is scaled differently, to help downstream decoders distinguish the two distinct symbols. (Other orientations could be used, and in some cases are preferable, e.g., 30 degrees, 45 degrees, etc.)

The facial barcode is oval in shape, and may be outlined with an oval border (although this is not depicted). The center of the barcode is placed at the mid-point between the person's eyes. The width of the barcode is twice the distance between the eyes. The height of the oval barcode is four times the distance between the mouth and a line joining the eyes.
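
The placement rules just stated can be worked through numerically. The sketch below is illustrative only: it assumes eye and mouth positions are available as (x, y) pixel coordinates, and the function name `face_barcode_geometry` is hypothetical.

```python
import math


def face_barcode_geometry(left_eye, right_eye, mouth):
    """Return (center, width, height) of the oval face barcode.

    center: mid-point between the eyes
    width:  2 x the distance between the eyes
    height: 4 x the distance from the mouth to the line joining the eyes
    """
    (lx, ly), (rx, ry), (mx, my) = left_eye, right_eye, mouth
    eye_dx, eye_dy = rx - lx, ry - ly
    eye_dist = math.hypot(eye_dx, eye_dy)
    if eye_dist == 0:
        raise ValueError("eye positions must differ")
    center = ((lx + rx) / 2.0, (ly + ry) / 2.0)
    # Perpendicular distance from the mouth to the line through both eyes.
    mouth_dist = abs(eye_dx * (my - ly) - eye_dy * (mx - lx)) / eye_dist
    return center, 2.0 * eye_dist, 4.0 * mouth_dist


# Eyes 40 pixels apart on a level line, mouth 30 pixels below that line.
center, width, height = face_barcode_geometry((100, 100), (140, 100), (120, 130))
```

For this example the barcode is centered at (120, 100), 80 pixels wide and 120 pixels high.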

The payload of the facial barcode conveys information discerned from the face. In basic embodiments, the barcode simply indicates the apparent presence of a face. In more sophisticated embodiments, eigenvectors computed from the facial image can be encoded. If a particular face is recognized, information identifying the person can be encoded. If the processor makes a judgment about the likely gender of the subject, this information can be conveyed in the barcode too.

Persons appearing in imagery captured by consumer cameras and cell phones are not random: a significant percentage are recurring subjects, e.g., the owner's children, spouse, friends, the user himself or herself, etc. There are often many previous images of these recurring subjects distributed among devices owned or used by the owner, e.g., PDA, cell phone, home computer, network storage, etc. Many of these images are annotated with the names of the persons depicted. From such reference images, sets of characterizing facial vectors can be computed and used to identify subjects in new photos. (As noted, Google's Picasa service works on this principle to identify persons in a user's photo collection; Facebook and iPhoto do likewise.) Such a library of reference facial vectors can be checked to try to identify the person depicted in the FIG. 67 photograph, and the identification can be represented in the barcode. (The identification can comprise the person's name, and/or other identifier(s) by which the matched face is known, e.g., an index number in a database or contact list, a telephone number, a Facebook user name, etc.)

Text recognized from regions of the FIG. 67 image is added to the corresponding regions of the alpha channel frame, presented in a reliably decodable OCR font. (OCR-A is depicted, although other fonts may be used.)

A variety of further information may be included in the FIG. 68 alpha channel. For example, locations in the frame where the processor suspects text is present, but where OCRing did not successfully decode alphanumeric symbols (on the tires perhaps, or other characters on the person's shirt), can be identified by adding a corresponding visual clue (e.g., a pattern of diagonal lines). An outline of the person (rather than just an indication of his face) can also be detected by the processor, and indicated by a corresponding border or fill pattern.

While the examples of FIGS. 62-66 and FIGS. 67-69 show various different ways of representing semantic metadata in the alpha channel, still more techniques are shown in the example of FIGS. 70 and 71. Here a user has captured a snapshot of a child at play (FIG. 70).

The child's face is turned away from the camera, and is captured with poor contrast. However, even with this limited information, the processor makes a likely identification by referring to the user's previous images: the user's firstborn child, Matthew Doe (who seems to be found in countless of the user's archived photos).

As shown in FIG. 71, the alpha channel in this example conveys an edge-detected version of the user's image. Superimposed over the child's head is a substitute image of the child's face. This substitute image can be selected for its composition (e.g., depicting two eyes, nose and mouth) and better contrast.

In some embodiments, each person known to the system has an iconic facial image that serves as a visual proxy for the person in different contexts. For example, some PDAs store contact lists that include facial images of the contacts. The user (or the contacts) provides facial images that are easily recognized, i.e., iconic. These iconic facial images can be scaled to match the head of the person depicted in an image, and added to the alpha channel at the corresponding facial location.

Also included in the alpha channel depicted in FIG. 71 is a 2D barcode. This barcode can convey other information discerned from processing of the image data, or otherwise available (e.g., the child's name, a color histogram, exposure metadata, how many faces were detected in the picture, the ten largest DCT or other transform coefficients, etc.).

To make the 2D barcode as robust as possible to compression and other image processing operations, its size need not be fixed, but can instead be scaled dynamically based on circumstances, such as image characteristics. In the depicted embodiment, the processor analyzes the edge map to identify regions with uniform edginess (i.e., within a thresholded range). The largest such region is selected. The barcode is then scaled and placed to occupy a central area of this region. (In subsequent processing, the edginess where the barcode was substituted can be largely recovered by averaging the edginess at the center points adjacent to the four barcode sides.)
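
The recovery step mentioned in the parenthetical can be illustrated with a short sketch. It assumes a per-pixel edginess measure stored as a list of rows, a barcode occupying an axis-aligned rectangle that does not touch the frame border, and a hypothetical function name; none of these details come from the text above.

```python
def estimate_covered_edginess(edginess, box):
    """Estimate the edginess hidden under a barcode by averaging the four
    samples just outside the mid-points of its top, bottom, left and right
    sides. `box` is (left, top, right, bottom) in inclusive pixel coords."""
    left, top, right, bottom = box
    if top < 1 or left < 1 or bottom + 1 >= len(edginess) or right + 1 >= len(edginess[0]):
        raise ValueError("box must leave a one-pixel margin inside the frame")
    cx, cy = (left + right) // 2, (top + bottom) // 2
    samples = [
        edginess[top - 1][cx],      # just above the top side
        edginess[bottom + 1][cx],   # just below the bottom side
        edginess[cy][left - 1],     # just left of the left side
        edginess[cy][right + 1],    # just right of the right side
    ]
    return sum(samples) / 4.0


# Toy 5 x 5 edginess map; the barcode covers the central 3 x 3 block.
grid = [[0, 0, 10, 0, 0],
        [0, 0, 0, 0, 0],
        [30, 0, 0, 0, 40],
        [0, 0, 0, 0, 0],
        [0, 0, 20, 0, 0]]
estimate = estimate_covered_edginess(grid, (1, 1, 3, 3))
```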

In another embodiment, region size is tempered with edginess in determining where to place a barcode: low edginess is preferred. In this alternative embodiment, a smaller region of lower edginess may be chosen over a larger region of higher edginess. The size of each candidate region, minus a scaled value of the edginess in that region, can serve as a metric to determine which region should host the barcode. This is the arrangement used in FIG. 71, resulting in placement of the barcode in a region to the left of Matthew's head, rather than in a larger, but edgier, region to the right.
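
The metric described here (region size minus a scaled edginess value) can be sketched as below. The candidate regions, their areas and edginess values, and the weight are all invented for illustration; the text does not specify how edginess is measured or scaled.

```python
def pick_barcode_region(regions, edginess_weight):
    """Return the candidate whose (area - weight * edginess) score is highest."""
    def score(region):
        return region["area"] - edginess_weight * region["edginess"]
    return max(regions, key=score)


# Hypothetical candidates: area in pixels, edginess as a 0..1 fraction.
candidates = [
    {"name": "left_of_head", "area": 9000, "edginess": 0.05},
    {"name": "right_of_head", "area": 12000, "edginess": 0.40},
]

# Scores: 9000 - 20000 * 0.05 = 8000 versus 12000 - 20000 * 0.40 = 4000.
chosen = pick_barcode_region(candidates, edginess_weight=20000)
```

With the weight set to zero the metric reduces to the largest-region rule of the preceding embodiment, and the larger right-hand region would be chosen instead.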

Although the FIG. 70 photo is relatively "edgy" (as contrasted, e.g., with the FIG. 62 photo), much of the edginess may be irrelevant. In some embodiments the edge data is filtered so that only the principal edges (e.g., those indicated by continuous line contours) are preserved. Within the vacant regions of the resulting filtered edge map, the processor can convey additional data. In one arrangement the processor inserts a pattern to indicate the particular color histogram bin into which the image colors of that region fall. (In a 64-bin histogram, requiring 64 different patterns, bin 2 may encompass colors in which the red channel has values of 0-63, the green channel has values of 0-63, and the blue channel has values of 64-127, etc.) Other image metrics can be conveyed similarly.
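
One way to arrive at 64 bins is to quantize each 8-bit channel to four levels (4 x 4 x 4 = 64). The sketch below assumes that scheme, with blue varying fastest and bins numbered from 1, which reproduces the bin 2 example given above; the ordering of the remaining bins is an assumption.

```python
def histogram_bin(r, g, b):
    """Map an 8-bit RGB color to a bin number in 1..64, using four levels
    (0-63, 64-127, 128-191, 192-255) per channel, blue varying fastest."""
    for value in (r, g, b):
        if not 0 <= value <= 255:
            raise ValueError("channel values must be in 0..255")
    return (r // 64) * 16 + (g // 64) * 4 + (b // 64) + 1


# Red 0-63, green 0-63, blue 64-127 falls in bin 2.
example_bin = histogram_bin(10, 20, 100)
```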

Instead of using different patterns to indicate different data, the vacant regions of a filtered edge map can be filled with a noise-like signal, steganographically encoded to convey the histogram (or other information) as digital watermark data. (A suitable watermarking technology is detailed in Digimarc's patent 6,590,996.)

It will be recognized that some of the information in the alpha channel, if visually presented to a human in graphical form, conveys useful information. From FIG. 63 a human can discern a man embracing a woman in front of a sign stating "WELCOME TO Fabulous LAS VEGAS NEVADA." From FIG. 64 the human can see grayscale faces and an outline of the scene. From FIG. 66 the person can additionally identify a barcode conveying some information, and two smiley face icons showing the positions of faces.

Likewise, a viewer to whom the frame of graphical information in FIG. 68 is rendered can identify an outline of a person, can read LSU TIGERS from the person's t-shirt, and can make out what appears to be the outline of a truck (aided by the clue of the GMC text where the truck's grill would be).

From presentation of the FIG. 71 alpha channel data, a human can identify a child sitting on the floor, playing with toys.

The barcode in FIG. 71, like the barcode in FIG. 66, conspicuously indicates to an inspecting human the presence of information, but not its content.

Other graphical content in the alpha channel may not be informative to a human upon inspection. For example, if the child's name is steganographically encoded as a digital watermark in a noise-like signal in FIG. 71, even the presence of information in that noise may go undetected by the person.

The foregoing examples detail some of the diversity of semantic information that can be stuffed into the alpha channel, and the diversity of representation constructs that can be employed. Of course, this is just a small sampling; the artisan can quickly adapt these teachings to the needs of particular applications, yielding many other different embodiments. Thus, for example, any of the information that can be extracted from an image can be memorialized in the alpha channel using arrangements akin to those disclosed herein.

It will be recognized that image-related information can be added to the alpha channel at different times, by different processors, at different locations. For example, the sensor chip in a portable device may have on-chip processing that performs certain analyses and adds the resulting data to the alpha channel. The device may have another processor that performs further processing, on the image data and/or on the results of the earlier analyses, and adds a representation of those further results to the alpha channel. (These further results may be based, in part, on data acquired wirelessly from a remote source. For example, a consumer camera may link by Bluetooth to the user's PDA, to obtain facial information from the user's contact files.)

The composite image file may be transmitted from the portable device to an intermediate network node (e.g., at a carrier such as Verizon, AT&T, or T-Mobile, or at another service provider), which performs additional processing and adds its results to the alpha channel. (With its more capable processing hardware, such an intermediate network node can perform more complex, resource-intensive processing, such as more sophisticated facial recognition and pattern matching. With its higher-bandwidth network access, such a node can also employ a variety of remote resources to augment the alpha channel with additional data, e.g., links to Wikipedia entries (or the Wikipedia content itself), information from telephone database and image database lookups, etc.) The thus-supplemented image may then be forwarded to an image query service provider (e.g., SnapNow, MobileAcuity, etc.), which can continue the processing and/or instruct a responsive action based on the information thus provided.

The alpha channel may thus convey an iconic view of what all preceding processing has discerned or learned about the image. Each subsequent processor can readily access this information, and contribute still more. All of this stays within existing workflow channels and the constraints of long-established file formats.

In some embodiments, the provenance of some or all of the discerned/inferred data is indicated. For example, stored data may indicate that the OCRing which yielded certain text was performed by a Verizon server having a unique identifier, such as a MAC address of 01-50-F3-83-AB-CC or a network identifier of PDX-LA002290.corp.verizon-dot-com, on August 28, 2008, at 8:35 pm. Such information can be stored in the alpha channel, in header data, in a remote repository to which a pointer is provided, etc.

Different processors may contribute to different bit planes of the alpha channel. A capture device may write its information to bit plane #1. An intermediate node may store its contributions in bit plane #2. Certain bit planes may be available for shared use.

Or different bit planes may be allocated to different classes or types of semantic information. Information relating to faces or persons in the image may always be written to bit plane #1. Information relating to places may always be written to bit plane #2. Edge map data may always be found in bit plane #3, together with color histogram data (e.g., represented in 2D barcode form). Other content labeling (e.g., grass, sand, sky) may be found in bit plane #4, together with OCR'd text. Textual information, such as related links or textual content obtained from the web, may be found in bit plane #5. (ASCII symbols may be included as bit patterns, e.g., with each symbol taking 8 bits in the plane. Robustness to subsequent processing can be enhanced by allocating two or more bits in the image plane to each bit of ASCII data. Convolutional coding and other error correcting technologies can also be employed for some or all of the image plane information. So, too, can error correcting barcodes.)
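
Allocating several plane bits to each bit of ASCII data can be sketched with simple repetition and a majority vote on decoding. This is one possible scheme chosen for illustration (the text equally allows convolutional coding or other error correction), and the function names are hypothetical. An odd repeat count is assumed so that votes cannot tie.

```python
def encode_text(text, repeat=3):
    """Expand ASCII text into a flat list of 0/1 values, most significant
    bit first, writing every data bit `repeat` times."""
    bits = []
    for byte in text.encode("ascii"):
        for shift in range(7, -1, -1):
            bits.extend([(byte >> shift) & 1] * repeat)
    return bits


def decode_text(bits, repeat=3):
    """Recover the text by majority vote over each group of `repeat` bits."""
    group = 8 * repeat
    chars = []
    for start in range(0, len(bits) - group + 1, group):
        value = 0
        for i in range(8):
            chunk = bits[start + i * repeat: start + (i + 1) * repeat]
            value = (value << 1) | (1 if 2 * sum(chunk) > repeat else 0)
        chars.append(chr(value))
    return "".join(chars)


plane_bits = encode_text("LAS VEGAS")

# Simulate damage from later processing: flip two isolated plane bits.
damaged = list(plane_bits)
damaged[0] ^= 1
damaged[100] ^= 1
recovered = decode_text(damaged)
```

Each character occupies 24 plane bits here instead of 8, which is the cost of tolerating one flipped bit per group of three.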

An index to the information conveyed in the alpha channel can be compiled, e.g., in an EXIF header associated with the image, allowing subsequent systems to speed their interpretation and processing of such data. The index can employ XML-like tags, specifying the types of data conveyed in the alpha channel, and optionally other information (e.g., their locations).

Locations can be specified as the position of the upper-most bit (or upper-left-most bit) in the bit-plane array, e.g., by X-, Y-coordinates. Or a rectangular bounding box can be specified by reference to a pair of corner points (e.g., specified by X, Y coordinates), detailing the region where the information is represented.

In the example of FIG. 66, the index may convey information such as:

<MaleFace1> AlphaBitPlane1 (637,938) </MaleFace1>

<FemaleFace1> AlphaBitPlane1 (750,1012) </FemaleFace1>

<OCRTextPDF417> AlphaBitPlane1 (75,450)-(1425,980) </OCRTextPDF417>

<EdgeMap> AlphaBitPlane1 </EdgeMap>

This index thus indicates that a male face is found in bit plane #1 of the alpha channel, with its top pixel at location (637,938); a female face is similarly represented, with its top pixel located at (750,1012); OCR'd text encoded as a PDF417 barcode is found in bit plane #1, in the rectangular area with corner points (75,450) and (1425,980); and bit plane #1 also includes an edge map of the image.
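
A downstream system might read such an index along the following lines. This is a minimal sketch that assumes exactly the tag layout of the four example entries (a tag name, a bit-plane label, and zero, one or two coordinate pairs); the regular expressions and the `parse_index` name are illustrative and do not represent a defined format.

```python
import re

# <Tag> body </Tag>, tolerating a space after "</".
ENTRY = re.compile(r"<(\w+)>\s*(.*?)\s*</\s*\1>", re.S)
PLANE = re.compile(r"AlphaBitPlane(\d+)")
POINT = re.compile(r"\((\d+)\s*,\s*(\d+)\)")


def parse_index(text):
    """Return one dict per index entry: tag name, bit plane number, and the
    coordinate pairs (one point, two box corners, or none)."""
    entries = []
    for tag, body in ENTRY.findall(text):
        plane = PLANE.search(body)
        entries.append({
            "tag": tag,
            "plane": int(plane.group(1)) if plane else None,
            "points": [(int(x), int(y)) for x, y in POINT.findall(body)],
        })
    return entries


INDEX_TEXT = """
<MaleFace1> AlphaBitPlane1 (637,938) </MaleFace1>
<FemaleFace1> AlphaBitPlane1 (750,1012) </FemaleFace1>
<OCRTextPDF417> AlphaBitPlane1 (75,450)-(1425,980) </OCRTextPDF417>
<EdgeMap> AlphaBitPlane1 </EdgeMap>
"""

entries = parse_index(INDEX_TEXT)
```

An entry with one point marks a feature location, an entry with two points marks a bounding box, and an entry with no points (the edge map) applies to the whole plane.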

More or less information can naturally be provided. A different form of index, with less information, may specify, e.g.:

<AlphaBitPlane1> Face,Face,PDF417,EdgeMap </AlphaBitPlane1>

This form of index simply indicates that bit plane #1 of the alpha channel includes two faces, a PDF417 barcode, and an edge map.

An index with more information may specify data including the rotation angle and scale factor for each face, the LAS VEGAS payload of the PDF417 barcode, the angle of that barcode, confidence factors for subjective determinations, names of recognized persons, a lexicon or glossary detailing the semantic significance of each pattern used in the alpha channels (e.g., the patterns of FIG. 65, and the graphical labels used for sky and grass in FIG. 68), the sources of auxiliary data (e.g., of the superimposed child's face in FIG. 71, or the remote reference image data that served as the basis for the conclusion that the truck in FIG. 67 is a Sierra Z71), etc.

As can be seen, the index can convey information that is also conveyed in the bit planes of the alpha channel. Generally, different forms of representation are used in the alpha channel's graphical representations than in the index. For example, in the alpha channel the femaleness of the second face is represented by the '+'s used to represent the eyes; in the index the femaleness is represented by the XML tag <FemaleFace1>. Redundant representation of information can serve as a check on data integrity.

Sometimes header information, such as EXIF data, becomes separated from the image data (e.g., when the image is converted to a different format). Instead of conveying index information in a header, a bit plane of the alpha channel, e.g., bit plane #1, can serve to convey the index information. One such arrangement encodes the index information as a 2D barcode. The barcode may be scaled to fill the frame, to provide maximum robustness to possible image degradation.

In some embodiments, some or all of the index information is replicated in different data stores. For example, it may be conveyed both in EXIF header form and as a barcode in bit plane #1. Some or all of the data may also be maintained remotely, such as by Google or other web storage "in the cloud." Address information conveyed by the image can serve as a pointer to this remote storage. The pointer (which can be a URL, but more commonly is a UID or an index into a database which, when queried, returns the current address of the sought-for data) can be included within the index, and/or in one or more of the bit planes of the alpha channel. Or the pointer can be steganographically encoded within the pixels of the image data (in some or all of the composite image planes) using digital watermarking technology.

In still other embodiments, some or all of the information described above as stored in the alpha channel can additionally, or alternatively, be stored remotely, or encoded within the image pixels as a digital watermark. (The picture itself, with or without the alpha channel, can also be replicated in remote storage, by any device in the processing chain.)

Some image formats include more than the four planes detailed above. Geospatial imagery and other mapping technologies commonly represent data in formats that extend to a half-dozen or more information planes. For example, multispectral space-based imagery may have separate image planes devoted to (1) red, (2) green, (3) blue, (4) near infrared, (5) mid-infrared, (6) far infrared, and (7) thermal infrared. The techniques detailed above can convey derived/inferred image information using one or more of the auxiliary data planes available in such formats.

As an image moves between processing nodes, some of the nodes may overwrite data inserted by earlier processing. Although not essential, the overwriting processor may copy the overwritten information to remote storage, and include a link or other reference to it in the image, the index, or the alpha channel, in case it is later needed.

When representing information in the alpha channel, consideration may be given to the degradations to which this channel may be subjected. JPEG compression, for example, commonly discards high frequency details that do not meaningfully contribute to a human's perception of an image. Such discarding of information based on the human visual system, however, can work to disadvantage when applied to information that is present for other purposes (although human viewing of the alpha channel is certainly possible and, in some cases, useful).

To combat such degradation, the information in the alpha channel can be represented by features that are unlikely to be regarded as visually irrelevant. Different types of information may be represented by different features, so that the most important persist through even severe compression. Thus, for example, the presence of faces in FIG. 66 is signified by bold ovals. The locations of the eyes may be less relevant, so they are represented by smaller features. The patterns shown in FIG. 65 may not be reliably distinguished after compression, and so might be reserved to represent secondary information, where loss is less important. With JPEG compression, the most-significant bit-plane is best preserved, whereas less-significant bit-planes are increasingly corrupted. Thus, the most important metadata should be conveyed in the most-significant bit planes of the alpha channel, to enhance survivability.
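
The bit-plane placement described above can be illustrated with a short sketch. It assumes an 8-bit alpha channel held as a flat list of integers, and it numbers planes from 0 (least significant) to 7 (most significant); that numbering, the helper names, and the sample flag values are illustrative assumptions. The XOR step is only a crude stand-in for lossy degradation, since real compression error is not confined to the low-order bits.

```python
def set_bit_plane(alpha, plane, bits):
    """Return a copy of 8-bit alpha samples with bit `plane` replaced by `bits`."""
    mask = 1 << plane
    return [(a & ~mask & 0xFF) | (mask if b else 0) for a, b in zip(alpha, bits)]

def get_bit_plane(alpha, plane):
    """Extract bit `plane` (0 = least significant, 7 = most significant)."""
    return [(a >> plane) & 1 for a in alpha]

important = [1, 0, 1, 1, 0, 0, 1, 0]   # e.g., coarse "face present" markings
secondary = [0, 1, 1, 0, 1, 0, 0, 1]   # e.g., fine pattern detail

alpha = [0] * 8
alpha = set_bit_plane(alpha, 7, important)   # most important data in the top plane
alpha = set_bit_plane(alpha, 0, secondary)   # secondary data in the bottom plane

# Crude stand-in for degradation: flip the three low-order bits of each sample.
degraded = [a ^ 0b00000111 for a in alpha]
```

After this simulated degradation the top plane still reads back as `important`, while the bottom plane no longer matches `secondary`.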

If technology of the sort illustrated by FIGS. 62-71 becomes a lingua franca for conveying metadata, image compression might evolve to take its presence into account. For example, JPEG compression may be applied to the red, green and blue image channels, while lossless (or less lossy) compression is applied to the alpha channel. Since the various bit planes of the alpha channel may convey different information, they may be compressed separately, rather than as bytes of 8-bit depth. (If compressed separately, lossy compression may be more acceptable.) With each bit-plane conveying only bitonal information, compression schemes known from facsimile technology can be used, including Modified Huffman, Modified READ, run length encoding, and ITU-T T.6. Hybrid compression techniques are thus well suited for such files.
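
As an illustration of why bitonal planes compress well, the sketch below applies plain run length encoding to one row of a bit plane. It follows the facsimile convention of alternating runs that begin with a (possibly empty) run of zeros. This is ordinary run length encoding only, not Modified Huffman, Modified READ, or T.6, and the function names are illustrative.

```python
def rle_encode(bits):
    """Encode a row of 0/1 values as alternating run lengths, starting with 0s."""
    runs, current, count = [], 0, 0
    for b in bits:
        if b == current:
            count += 1
        else:
            runs.append(count)       # close the run of the previous value
            current, count = b, 1
    runs.append(count)
    return runs

def rle_decode(runs):
    """Invert rle_encode: expand alternating runs back into 0/1 values."""
    bits, value = [], 0
    for n in runs:
        bits.extend([value] * n)
        value ^= 1
    return bits

row = [0, 0, 0, 1, 1, 0, 0, 0, 0, 1]
encoded = rle_encode(row)            # [3, 2, 4, 1]
```

A row that begins with a 1 simply starts with a zero-length run, so the decoder never needs a separate flag for the first value.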

Alpha channel conveyance of metadata can be arranged to progressively transmit and decode in general correspondence with associated imagery features, when using compression arrangements such as JPEG 2000. That is, since the alpha channel presents semantic information in the visual domain (e.g., iconically), it can be represented so that layers of semantic detail decompress at the same rate as the image.

In JPEG 2000, a wavelet transform is used to generate data representing the image. JPEG 2000 packages and processes this transform data in a manner yielding progressive transmission and decoding. For example, when rendering a JPEG 2000 image, the gross details of the image appear first, with successively finer details following. Similarly with transmission.

Consider the truck and man image of FIG. 67. Rendering a JPEG 2000 version of this image would first present the low frequency, bold form of the truck. Thereafter the shape of the man would appear. Next, features such as the GMC lettering on the truck grill and the logo on the man's t-shirt would be distinguished. Finally, the detail of the man's facial features, the grass, the trees, and other high frequency minutiae would complete the rendering of the image. Similarly with transmission.

This progression is shown in the pyramid of FIG. 77A. Initially a relatively small amount of information is presented, giving gross shape details. Progressively the image fills in, ultimately ending with a relatively large amount of small-detail data.

The information in the alpha channel can be arranged similarly (FIG. 77B). Information about the truck can be represented with a large, low frequency (shape-dominated) symbol. Information indicating the presence and location of the man can be encoded with a next-most-dominant representation. Information corresponding to the GMC lettering on the truck grill, and the lettering on the man's shirt, can be represented in the alpha channel with a finer degree of detail. The finest level of salient detail in the image, e.g., the minutiae of the man's face, can be represented with the finest degree of detail in the alpha channel. (As may be noted, the illustrative alpha channel of FIG. 68 does not quite follow this model.)

If the alpha channel conveys its information in the form of machine-readable symbologies (e.g., barcodes, digital watermarks, glyphs, etc.), the order of alpha channel decoding can be deterministically controlled. Symbols with the largest features are decoded first; those with the finest features are decoded last. Thus, the alpha channel can convey barcodes at several different sizes (all in the same bit frame, e.g., located side by side, or distributed among bit frames). Or the alpha channel can convey plural digital watermark signals, e.g., one at a gross resolution (e.g., corresponding to 10 watermark elements, or "waxels," to the inch), and others at successively finer resolutions (e.g., 50, 100, 150 and 300 waxels per inch). Likewise with data glyphs: a range of larger and smaller glyph sizes can be used, and they will decode relatively earlier or later.

(JPEG 2000 is the most common of the compression schemes exhibiting progressive behavior, but there are others. JPEG, with some effort, can behave similarly. The present concepts are applicable whenever such progressivity exists.)

By such arrangements, as image features are decoded for presentation, or transmitted (e.g., by streaming media delivery), the corresponding metadata becomes available.

It will be recognized that results contributed to the alpha channel by the various distributed processing nodes are immediately available to each subsequent recipient of the image. A service provider receiving a processed image, for example, thus quickly understands that FIG. 62 depicts a man and a woman in Las Vegas; that FIG. 63 shows a man and his GMC truck; and that the FIG. 70 image shows a child named Matthew Doe. The edge map, color histogram, and other information conveyed with these images give the service provider a head start in its processing of the imagery, e.g., to augment it, recognize its content, and initiate an appropriate response.

Receiving nodes can also use the conveyed data to enhance stored profile information relating to the user. A node receiving the FIG. 66 metadata can note Las Vegas as a location of potential interest. A system receiving the FIG. 68 metadata can infer that GMC Z71 trucks are relevant to the user, and/or to the person depicted in that photo. Such associations can serve as launch points for tailored user experiences.

The metadata also allows images with certain attributes to be identified quickly in response to user queries. (E.g., find pictures showing GMC Sierra Z71 trucks.) Desirably, web-indexing crawlers can check the alpha channels of images they find on the web, and add information from the alpha channel to the compiled index, to make the image more readily identifiable to searchers.

As noted, an alpha channel-based approach is not essential for use of the technologies detailed in this specification. Another alternative is a data structure indexed by the coordinates of image pixels. The data structure can be conveyed with the image file (e.g., as EXIF header data), or stored at a remote server.

For example, one entry in the data structure corresponding to pixel (637,938) in FIG. 66 may indicate that the pixel forms part of a male's face. A second entry for this pixel may point to a shared sub-data structure at which eigenface values for this face are stored. (The shared sub-data structure may also list all the pixels associated with that face.) A data record corresponding to pixel (622,970) may indicate that the pixel corresponds to the left-side eye of the male's face. A data record indexed by pixel (155,780) may indicate that the pixel forms part of text recognized (by OCR) as the letter "L", and also falls within color histogram bin 49. The provenance of each datum of information may also be recorded.
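
A pixel-indexed structure of this kind might be sketched as follows. The class names, field names, and provenance strings (`"face-detector"`, `"ocr"`, `"histogram"`) are hypothetical; the eigenface list is left empty because the text gives no values for it.

```python
from dataclasses import dataclass, field

@dataclass
class FaceRecord:
    """Shared sub-data structure for one face (illustrative field names)."""
    label: str
    eigenface_values: list = field(default_factory=list)  # filled by a face-analysis stage
    pixels: list = field(default_factory=list)            # all pixels tied to this face

@dataclass
class PixelRecord:
    notes: list = field(default_factory=list)    # (statement, provenance) pairs
    shared: list = field(default_factory=list)   # references to shared records

pixel_index = {}   # keyed by (x, y) pixel coordinates

def annotate(xy, statement, provenance, shared=None):
    """Add one datum, with its provenance, to the record for pixel `xy`."""
    record = pixel_index.setdefault(xy, PixelRecord())
    record.notes.append((statement, provenance))
    if shared is not None:
        record.shared.append(shared)
        if xy not in shared.pixels:
            shared.pixels.append(xy)

male = FaceRecord(label="MaleFace1")
annotate((637, 938), "part of male face", "face-detector", shared=male)
annotate((622, 970), "left-side eye of male face", "face-detector", shared=male)
annotate((155, 780), 'part of OCR text, letter "L"', "ocr")
annotate((155, 780), "color histogram bin 49", "histogram")
```

Because both face pixels hold a reference to the same `FaceRecord` object, the eigenface data is stored once and the record can list every pixel associated with the face.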

(Instead of identifying each pixel by X- and Y-coordinates, each pixel may be assigned a sequential number by which it is referenced.)

Instead of several pointers pointing to a common sub-data structure from the data records of different pixels, the entries may form a linked list, in which each pixel includes a pointer to a next pixel with a common attribute (e.g., associated with the same face). A record for a pixel may include pointers to plural different sub-data structures, or to plural other pixels, to associate the pixel with plural different image features or data.
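
The linked-list alternative can be sketched with a simple mapping from each pixel to the next pixel sharing the attribute. The mapping name `next_same_face` and the use of `None` as the end-of-chain marker are illustrative assumptions; the two coordinates are the face pixels used in the FIG. 66 example.

```python
# Each pixel points to the next pixel with the same attribute; None ends the chain.
next_same_face = {
    (637, 938): (622, 970),
    (622, 970): None,
}

def walk_chain(links, start):
    """Follow next-pixel pointers from `start`, guarding against cycles."""
    visited, chain, current = set(), [], start
    while current is not None and current not in visited:
        visited.add(current)
        chain.append(current)
        current = links.get(current)
    return chain

face_pixels = walk_chain(next_same_face, (637, 938))
```

A pixel belonging to several features would simply appear in several such mappings, one per attribute.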

If the data structure is stored remotely, a pointer to the remote store can be included with the image file, e.g., steganographically encoded in the image data, expressed with EXIF data, etc. If any watermarking arrangement is used, the origin of the watermark (see Digimarc's patent 6,307,949) can be used as a base from which pixel references are specified as offsets (instead of using, e.g., the upper left corner of the image). Such an arrangement allows pixels to be correctly identified despite corruptions such as cropping or rotation.
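
The benefit of referencing pixels from the watermark origin can be seen in a small worked sketch. The origin coordinates and crop amounts below are invented for illustration, and the sketch handles cropping (a pure shift) only; rotation would additionally require rotating each offset by the detected angle.

```python
def to_offset(origin, pixel):
    """Express a pixel location relative to the detected watermark origin."""
    return (pixel[0] - origin[0], pixel[1] - origin[1])

def resolve(origin, offset):
    """Recover a pixel location from the watermark origin and a stored offset."""
    return (origin[0] + offset[0], origin[1] + offset[1])

# Original image: hypothetical watermark origin, and the face pixel of FIG. 66.
origin_before = (100, 50)
stored_offset = to_offset(origin_before, (637, 938))     # (537, 888)

# After cropping 80 pixels from the left and 30 from the top, a detector
# would find the same watermark origin shifted by the crop amounts.
origin_after = (100 - 80, 50 - 30)
face_after_crop = resolve(origin_after, stored_offset)   # (557, 908)
```

The stored offset is unchanged by the crop, so the metadata still identifies the same image content, which an upper-left-corner reference would not.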

As with alpha channel data, metadata written to a remote store is desirably available for search. A web crawler encountering the image can use the pointer in the EXIF data, or in the steganographically encoded watermark, to identify a corresponding repository of metadata, and add metadata from that repository to its index terms for the image (despite their being found at different locations).

By the foregoing arrangements it will be appreciated that existing imagery standards, workflows, and ecosystems, originally designed to support just pixel image data, are here employed in support of metadata as well.

(Of course, the alpha channel and the other approaches detailed in this section are not essential to other aspects of the present technology. For example, information derived or inferred from processes such as those shown in FIGS. 50, 57 and 61 can be sent by other transmission arrangements, e.g., dispatched as packetized data using WiFi or WiMax, transmitted from the device using Bluetooth, sent as an SMS short text or MMS multimedia message, shared with another node in a low power peer-to-peer wireless network, conveyed with wireless cellular transmission or wireless data service, etc.)

Texting, Etc.

U.S. patents 5,602,566 (Hitachi), 6,115,028 (Silicon Graphics), 6,201,554 (Ericsson), 6,466,198 (Innoventions), 6,573,883 (Hewlett-Packard), 6,624,824 (Sun) and 6,956,564 (British Telecom), and published PCT application WO9814863 (Philips), teach that portable computers can be equipped with devices by which tilting can be sensed and used for different purposes (e.g., scrolling through menus).

In accordance with another aspect of the present technology, a tip/tilt interface is used in connection with a typing operation, such as composing text messages sent by the Short Message Service (SMS) protocol from a PDA, a cell phone, or other portable wireless device.

In one embodiment, a user activates a tip/tilt text entry mode using any of various known means (e.g., pushing a button, entering a gesture, etc.). A scrollable user interface appears on the device screen, presenting a series of icons. Each icon has the appearance of a cell phone key, such as a button depicting the numeral "2" and the letters "abc." The user tilts the device left or right to scroll backwards or forwards through the series of icons, to reach a desired button. The user then tips the device towards or away from themselves to navigate among the three letters associated with that icon (e.g., tipping away navigates to "a"; no tipping corresponds to "b"; and tipping towards navigates to "c"). After navigating to the desired letter, the user takes an action to select that letter. This action may be pressing a button on the device (e.g., with the user's thumb), or another action may signal the selection. The user then proceeds as described to select subsequent letters. By this arrangement, the user enters a series of text without the constraints of big fingers on tiny buttons or UI features.
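
The interaction just described amounts to a small state machine, sketched below. The class and method names are hypothetical, only a subset of three-letter keys is modeled, and the sketch assumes that a left tilt scrolls backwards and a right tilt scrolls forwards. Sensor handling is omitted; tilt and tip events arrive as plain strings.

```python
# A subset of phone keys with three letters each (illustrative).
KEYS = [("2", "abc"), ("3", "def"), ("4", "ghi"), ("5", "jkl"), ("6", "mno")]

# Tipping away picks the first letter, level the second, towards the third.
TIP_TO_LETTER = {"away": 0, "level": 1, "toward": 2}

class TipTiltEntry:
    def __init__(self, keys=KEYS):
        self.keys = keys
        self.position = 0     # index of the currently highlighted key icon
        self.text = ""

    def tilt(self, direction):
        """Scroll one icon backwards ('left') or forwards ('right')."""
        step = {"left": -1, "right": 1}[direction]
        self.position = max(0, min(len(self.keys) - 1, self.position + step))

    def highlighted(self, tip):
        """Letter that the current tip state would select on the current key."""
        return self.keys[self.position][1][TIP_TO_LETTER[tip]]

    def select(self, tip):
        """Commit the highlighted letter (e.g., on a button press)."""
        self.text += self.highlighted(tip)
        return self.text

entry = TipTiltEntry()
entry.tilt("right")        # scroll from the "2 abc" key to the "3 def" key
entry.select("toward")     # "f"
entry.tilt("left")         # back to "2 abc"
entry.select("away")       # "a"
entry.select("level")      # "b"
```

After these steps `entry.text` holds "fab".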

Many variations are, of course, possible. The device need not be a phone; it may be a wristwatch, a keyfob, or have another small form factor.

The device may have a touch-screen. After navigating to a desired character, the user may tap the touch screen to effect the selection. When tipping/tilting the device, the corresponding letter can be displayed on the screen in an enlarged fashion (e.g., on the icon representing the button, or overlaid elsewhere) to indicate the user's progress in navigation.

While accelerometers or other physical sensors are employed in certain embodiments, others use a 2D optical sensor (e.g., a camera). The user can point the sensor at the floor, at a knee, or at another subject, and the device can then sense relative physical motion by sensing the movement of features within the image frame (up/down; left/right). In such embodiments the image frame captured by the camera need not be presented on the screen; the symbol selection UI alone may be displayed. (Or the UI can be presented as an overlay on the background image captured by the camera.)

In camera-based embodiments, as in embodiments employing physical sensors, another dimension of motion may also be sensed: up/down. This can provide an additional degree of control (e.g., shifting to capital letters, shifting from characters to numbers, selecting the current symbol, etc.).

In some embodiments, the device has several modes: one for entering text; another for entering numbers; another for entering symbols; etc. The user can switch between these modes using mechanical controls (e.g., buttons), or through controls of a user interface (e.g., touches, gestures or voice commands). For example, while tapping a first region of the screen may select the currently displayed symbol, tapping a second region of the screen may toggle the mode between character entry and numeric entry. Or one tap in this second region can switch to character entry (the default); two taps in this region can switch to numeric entry; and three taps in this region can switch to entry of other symbols.

Instead of selecting among individual symbols, such an interface can also include common words or phrases (e.g., signature blocks) to which the user can tip/tilt navigate, and then select. There may be several lists of words/phrases. For example, a first list may be standardized (pre-programmed by the device vendor), and include statistically common words. A second list may comprise words and/or phrases that are associated with a particular user (or a particular class of users). The user may enter these words into such a list, or the device can compile the list during operation, determining which words are most commonly entered by the user. (The second list may exclude words found on the first list, or not.) Again, the user can switch between these lists as described above.
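
Compiling the second, user-specific list during operation could look like the sketch below. The function name, its parameters, and the sample words are illustrative assumptions; the `exclude_vendor` flag reflects the option of omitting words already on the first list.

```python
from collections import Counter

def compile_user_list(entered_words, vendor_list, size, exclude_vendor=True):
    """Return the user's most frequently entered words, most common first."""
    counts = Counter(word.lower() for word in entered_words)
    ranked = [word for word, _ in counts.most_common()]
    if exclude_vendor:
        ranked = [word for word in ranked if word not in vendor_list]
    return ranked[:size]

vendor_words = {"the", "and", "you"}          # stand-in for the standardized list
history = ["hello", "the", "barcode", "hello", "the",
           "the", "barcode", "hello", "hello"]

user_list = compile_user_list(history, vendor_words, size=5)
```

With this history the user list contains "hello" and then "barcode"; passing `exclude_vendor=False` would also admit "the".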

Desirably, the sensitivity of the tip/tilt interface is adjustable by the user, to accommodate different user preferences and skills.

While the foregoing embodiments contemplated a limited grammar of tilts/tips, more expansive grammars can be devised. For example, while relatively slow tilting of the screen to the left may cause the icons to scroll in a given direction (left or right, depending on the implementation), a sudden tilt of the screen in that direction can effect a different operation, such as inserting a line (or paragraph) break in the text. A sharp tilt in the other direction can cause the device to send the message.

Instead of the speed of tilt, the degree of tilt can correspond to different actions. For example, tilting the device between 5 and 25 degrees can cause the icons to scroll, but tilting the device beyond 30 degrees can insert a line break (if to the left) or cause the message to be sent (if to the right).
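
The angle thresholds in this example map directly onto a small classifier. The sketch assumes a signed angle in degrees, negative for a left tilt and positive for a right tilt, and treats angles below 5 degrees or between 25 and 30 degrees as a dead zone; the sign convention, the dead zone, and the action names are assumptions rather than details from the text.

```python
def classify_tilt(angle_deg):
    """Map a signed tilt angle (negative = left) to an action name."""
    magnitude = abs(angle_deg)
    if 5 <= magnitude <= 25:
        return "scroll_left" if angle_deg < 0 else "scroll_right"
    if magnitude > 30:
        return "line_break" if angle_deg < 0 else "send_message"
    return "none"   # too small to count, or in the gap between the two ranges
```

For instance, `classify_tilt(-10)` gives "scroll_left", `classify_tilt(35)` gives "send_message", and `classify_tilt(27)` gives "none". A user-adjustable sensitivity setting could be modeled by making the thresholds parameters.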

Different tip gestures can likewise trigger different actions.

The arrangements just described are necessarily only a few of the many different possibilities. Artisans adopting such technology are expected to modify and adapt these teachings as suited for particular applications.

Affine Capture Parameters

In accordance with another aspect of the present technology, a portable device captures, and may present, geometric information relating to the device's position (or that of a subject).

Digimarc's published patent application 20080300011 teaches various arrangements by which a cell phone can be made responsive to what it "sees," including overlaying graphical features atop certain imaged objects. The overlay can be warped in accordance with the object's perceived affine distortion.

Steganographic calibration signals by which the affine distortion of an imaged object can be accurately quantified are detailed, e.g., in Digimarc's patents 6,614,914 and 6,580,809, and in patent publications 20040105569, 20040101157, and 20060031684. Digimarc's patent 6,959,098 discloses how distortion can be characterized by such watermark calibration signals in conjunction with visible image features (e.g., edges of a rectangular object). From such affine distortion information, the 6D location of a watermarked object relative to the imager of a cell phone can be determined.

There are various ways in which 6D location can be described. One is by three location parameters (x, y, z) and three angle parameters (tip, tilt, rotation). Another is by rotation and scale parameters, together with a 2D matrix of four elements that defines a linear transformation (e.g., shear mapping, translation, etc.). The matrix transforms the location of any pixel x, y to a resultant location after the linear transformation has occurred. (The reader is referred to references on shear mapping, e.g., Wikipedia, for information on the matrix math, etc.)
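
As a minimal sketch of the matrix description, the following composes rotation, uniform scale and a horizontal shear into a four-element 2D matrix and applies it to a pixel location. The function names and the particular order of composition (rotation applied after shear) are assumptions made for illustration only.

```python
import math

def linear_matrix(rotation_deg, scale, shear=0.0):
    """Return a 2x2 matrix equal to scale * R(rotation) * S(shear).

    R is a standard rotation matrix; S = [[1, shear], [0, 1]] is a
    horizontal shear mapping. The composition order is an assumption.
    """
    t = math.radians(rotation_deg)
    c, s = math.cos(t), math.sin(t)
    return ((scale * c, scale * (c * shear - s)),
            (scale * s, scale * (s * shear + c)))

def transform_pixel(m, x, y):
    """Apply the 2x2 matrix m to the pixel location (x, y)."""
    (a, b), (c, d) = m
    return (a * x + b * y, c * x + d * y)
```

For example, transform_pixel(linear_matrix(0, 1, 0.5), 2, 4) shifts the pixel horizontally in proportion to its y coordinate, which is the defining behavior of a shear mapping.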

FIG. 58 shows how a cell phone can display affine parameters (e.g., derived from imagery or otherwise). The camera can be placed in this mode through a UI control (e.g., tapping a physical button, making a touchscreen gesture, etc.).

In the depicted arrangement, the device's rotation from (apparent) horizontal is presented at the top of the cell phone screen. The cell phone processor can make this determination by analyzing the image data for one or more generally parallel, elongated straight edge features, averaging them to determine a mean, and assuming that this mean is the horizon. If the camera is conventionally aligned with the horizon, this mean line will be horizontal. Divergence of this line from horizontal indicates the camera's rotation. This information can be presented textually (e.g., "12 degrees right"), and/or a graphical representation showing the divergence from horizontal can be presented.
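
The averaging step can be sketched as follows, assuming the long straight edges have already been extracted as line segments by some earlier edge-detection stage. The function name, the length weighting, and the segment representation are illustrative assumptions; how a positive or negative angle maps to "left" or "right" depends on the coordinate convention and is not fixed here.

```python
import math

def estimate_rotation_deg(edges):
    """Length-weighted mean angle, in degrees from the image x axis,
    of roughly horizontal segments given as ((x0, y0), (x1, y1))."""
    weighted_sum = 0.0
    total_length = 0.0
    for (x0, y0), (x1, y1) in edges:
        dx, dy = x1 - x0, y1 - y0
        if dx < 0:
            # Segments are undirected; orient each one left to right.
            dx, dy = -dx, -dy
        length = math.hypot(dx, dy)
        if length == 0:
            continue
        weighted_sum += math.degrees(math.atan2(dy, dx)) * length
        total_length += length
    if total_length == 0:
        raise ValueError("no usable edges")
    return weighted_sum / total_length
```

Weighting by length lets long, well-defined edges dominate short and possibly spurious ones.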

(Other means for sensing angular orientation can be employed. For example, many cell phones include accelerometers or other tilt detectors, which output data from which the cell phone processor can discern the device's angular orientation.)

In the illustrated embodiment, the camera captures a sequence of image frames (e.g., video) when in this mode of operation. A second datum indicates the angle by which features in the image frame have rotated since image capture began. Again, this information can be gleaned by analysis of the image data, and can be presented textually and/or graphically. (The graphic can comprise a circle with a line, or an arrow, through its center, showing real-time angular movement of the camera to the left or right.)

In similar fashion, the device can track changes in the apparent size of edges, objects, and/or other features in the image, to determine the amount by which scale has changed since image capture began. This indicates whether the camera has moved toward or away from the subject, and by how much. Again, the information can be presented textually and graphically. The graphical presentation can comprise two lines: a reference line, and a second, parallel line whose length changes in real time in accordance with the scale change (longer than the reference line for movement of the camera closer to the subject, and shorter for movement away).
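
A minimal sketch of the scale datum, assuming the apparent size of one tracked feature (in pixels) is available for the first frame and for the current frame. The function names and the 2% tolerance are hypothetical.

```python
def scale_change(reference_size_px, current_size_px):
    """Ratio of the feature's current apparent size to its size at capture start."""
    if reference_size_px <= 0:
        raise ValueError("reference size must be positive")
    return current_size_px / reference_size_px

def describe_scale(ratio, tolerance=0.02):
    """Classify camera movement relative to the subject from the scale ratio."""
    if ratio > 1 + tolerance:
        return "closer"
    if ratio < 1 - tolerance:
        return "farther"
    return "unchanged"
```

The same ratio could set the length of the second line in the graphical presentation, with the reference line drawn at ratio 1.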

Although not particularly shown in the exemplary embodiment of FIG. 58, other such geometric data can also be derived and presented, e.g., translation, differential scaling, tip angle (i.e., forward/backward), etc.

The determinations detailed above can be simplified if the camera field of view includes a digital watermark having steganographic calibration/orientation data of the sort detailed in the referenced patent documents. However, the information can also be derived from other features in the imagery.

Of course, in still other embodiments, data from one or more accelerometers or other position-sensing arrangements in the device, either alone or in conjunction with image data, can be used to generate the presented information.

Aside from presenting such geometric information on the device screen, the information can also be used, e.g., in sensing gestures made with the device by a user, in providing context by which remote system responses can be customized, etc.

Camera-Based Environmental and Behavioral State Machine

In accordance with another aspect of the present technology, a cell phone functions as a state machine, e.g., changing aspects of its functioning based on image-related information previously acquired. The image-related information can be focused on the natural behavior of the camera user, the typical environments in which the camera is operated, the innate physical characteristics of the camera itself, the structure and dynamic properties of the scenes being imaged by the camera, and many other such categories of information. The resulting changes in the camera's function can be directed toward improving image analysis programs resident on the camera device or located remotely at some image-analysis server. Image analysis is construed very broadly, covering a range of analysis from digital watermark reading, to object and facial recognition, to 2-D and 3-D barcode reading and optical character recognition, all the way through scene categorization analysis and more.

A few simple examples will illustrate what is expected to become an important aspect of future mobile devices.

Consider the problem of object recognition. Most objects have different appearances, depending on the angle from which they are viewed. If a machine vision object-recognition algorithm is given some information about the perspective from which an object is viewed, it can make a more accurate (or faster) guess as to what the object is.

People are creatures of habit, including in their use of cell phone cameras. This extends to the hand in which they typically hold the phone, and how they incline it while taking a picture. After a user has established a history with a phone, usage patterns may be discerned from the images captured. For example, the user may tend to take photos of subjects not straight-on, but slightly from the right. Such a right-oblique tendency in perspective may be due to the fact that the user routinely holds the camera in the right hand, so exposures are taken from a bit right-of-center.

(Right-obliqueness can be sensed in various ways, e.g., from the lengths of vertical parallel edges within image frames. If edges tend to be longer on the right sides of the images, this tends to indicate that the images were taken from a right-oblique view. Differences in illumination across foreground subjects can also be used: brighter illumination on the right side of subjects suggests that the right side was closer to the lens. Etc.)

Similarly, in order to comfortably operate the shutter button of the phone while holding the device, this particular user may habitually adopt a grip of the phone that inclines the top of the camera five degrees toward the user (i.e., to the left). This results in the captured image subjects generally being skewed with an apparent rotation of five degrees.

Such recurring biases can be discerned by examining a collection of images captured by that user with that cell phone. Once identified, data memorializing these idiosyncrasies can be stored in a memory, and used to optimize image recognition processes performed by the device.

Thus, the device may generate a first output (e.g., a tentative object identification) from a given image frame at one time, but generate a second, different output (e.g., a different object identification) from the same image frame at a later time, due to intervening use of the camera.

A characteristic pattern of the user's hand jitter may also be inferred by examination of plural images. For example, by examining pictures of different exposure periods, it may be found that the user has a jitter with a frequency of 4 Hertz, predominantly in the left-right (horizontal) direction. Sharpening filters tailored to that jitter behavior (and also dependent on the length of the exposure) can then be applied to enhance the resulting imagery.

In similar fashion, through use, the device may notice that the images captured by the user during weekday hours of 9:00 - 5:00 are routinely illuminated with a spectrum characteristic of fluorescent lighting, to which a rather extreme white-balancing operation needs to be applied to try to compensate. With a priori knowledge of this tendency, the device can expose photos captured during those hours differently than with its baseline exposure parameters, anticipating the fluorescent illumination and allowing a better white balance to be achieved.

Over time, the device derives information that models some aspect of the user's customary behavior or environmental variables. The device then adapts some aspect of its operation accordingly.

The device may also adapt to its own peculiarities or degradations. These include non-uniformities in the photodiodes of the image sensor, dust on the image sensor, mars on the lens, etc.

Again, over time, the device may detect a recurring pattern, e.g.: (a) that one pixel gives a 2% lower average output signal than adjacent pixels; (b) that a contiguous group of pixels tends to output signals that are about 3 digital numbers lower than averages would otherwise indicate; (c) that a certain region of the photosensor does not seem to capture high frequency detail, with imagery in that region consistently a bit blurry; etc. From such recurring phenomena, the device can deduce, e.g., that (a) the gain of the amplifier serving this pixel is low; (b) dust or another foreign object is occluding these pixels; and (c) a lens flaw prevents light falling on this region of the photosensor from being properly focused, etc. Appropriate compensations can then be applied to mitigate these shortcomings.
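
Case (a) can be sketched as follows, assuming the device has accumulated a long-term average output for every pixel over many frames. The function name, the 4-neighbor comparison and the 2% default threshold are illustrative assumptions; an actual device would need far more frames and more careful statistics than this sketch suggests.

```python
def gain_corrections(mean_frame, min_deficit=0.02):
    """Find pixels whose long-term mean is at least min_deficit below the
    mean of their 4-connected neighbors, and return {(row, col): gain}
    where gain is the multiplier that restores the neighbor mean."""
    rows, cols = len(mean_frame), len(mean_frame[0])
    corrections = {}
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                mean_frame[rr][cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
            ]
            if not neighbours:
                continue
            neighbour_mean = sum(neighbours) / len(neighbours)
            value = mean_frame[r][c]
            if 0 < value <= neighbour_mean * (1.0 - min_deficit):
                corrections[(r, c)] = neighbour_mean / value
    return corrections
```

Multiplying each subsequent reading of a flagged pixel by its stored gain is one simple form of the compensation referred to above.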

Common aspects of the subject matter, or "scenes being imaged," are another rich source of information for subsequent image analysis routines, or at least for early-stage image processing steps that assist later-stage image analysis routines by optimally filtering and/or transforming the pixel data. For example, it may become clear over days and weeks of camera usage that a given user uses the camera for only three basic interests: digital watermark reading, barcode reading, and visual logging of experimental set-ups in a laboratory. A histogram can be developed over time showing which "end result" operation a given camera usage led toward, followed by an increase in the processing cycles devoted to early detection of both watermark and barcode basic characteristics. Drilling a bit deeper here, a Fourier-transformed set of image data may be preferentially routed to a quick 2-D barcode detection function that might otherwise be de-prioritized. Likewise for digital watermark reading, where Fourier-transformed data may be shipped to a specialized pattern recognition routine. A partially abstract way to view this state-machine change is that only a fixed amount of CPU and image-processing cycles is available to the camera device, and choices must be made about which modes of analysis get what portions of those cycles.
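
The histogram-driven budgeting can be sketched as below. The operation names, the function name, and the simple proportional split are assumptions for illustration; the disclosure does not specify how cycles are apportioned.

```python
from collections import Counter

def allocate_cycles(usage_log, total_cycles):
    """Split a fixed cycle budget across analysis modes in proportion to
    how often each 'end result' operation appears in the usage log."""
    histogram = Counter(usage_log)
    n = sum(histogram.values())
    if n == 0:
        raise ValueError("usage log is empty")
    # Integer division; any remainder is simply left unallocated here.
    return {op: count * total_cycles // n for op, count in histogram.items()}
```

For a log in which watermark reading accounts for six of ten uses, barcode reading three, and lab logging one, a budget of 1000 cycles would be split 600, 300 and 100.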

An over-simplified representation of such embodiments is shown in FIG. 59.

By arrangements such as those just discussed, the operation of an imager-equipped device evolves through its continued use.

Focus Issues, Enhanced Print-to-Web Linking Based on Page Layouts

Cameras provided with most cell phones and other portable PDA-like devices do not generally have adjustable focus. Rather, the optics are arranged in compromise fashion, aiming to obtain a decent image under typical portrait snapshot and landscape circumstances. Imaging at close distances generally yields inferior results, losing high frequency detail. (This is ameliorated by the just-discussed "extended depth of field" image sensors, but widespread deployment of such devices has not yet occurred.)

The human visual system has different sensitivity to imagery at different spectral frequencies. Different image frequencies convey different impressions. Low frequencies give global information about an image, such as its orientation and general shape. High frequencies give fine details and edges. As shown in FIG. 72, the sensitivity of the human visual system peaks at frequencies of about 10 cycles/mm on the retina, and falls away steeply on either side. (Perception also depends on the contrast between the features to be distinguished, shown on the vertical axis.) Image features with spatial frequencies and contrast in the cross-hatched zone are usually not perceived by humans. FIG. 73 shows an image with the low and high frequencies depicted separately (on the left and right).

Digital watermarking of print media, such as newspapers, can be effected by tinting the page (before, during or after printing) with an inoffensive background pattern that steganographically conveys auxiliary payload data. Different columns of text can be encoded with different payload data, e.g., permitting each news story to link to a different electronic resource (see, e.g., Digimarc's patents 6,985,600, 6,947,571 and 6,724,912).

In accordance with another aspect of the present technology, the close-focus shortcoming of portable imaging devices can be overcome by embedding a lower frequency digital watermark (e.g., with a spectral composition centered on the left side of FIG. 72, above the curve). Instead of encoding different watermarks in different columns, the page is marked with a single watermark that spans the page and encodes an identifier for that page.

When a user snaps a picture of a newspaper story of interest (the picture may capture just text/graphics from the desired story/advertisement, or may span other content as well), the watermark of that page is decoded (locally by the device, remotely by a different device, or in distributed fashion).

The decoded watermark serves to index a data structure that returns information to the device, for presentation on its display screen. The display presents a map of the newspaper page layout, with different articles/advertisements shown in different colors.

FIGS. 74 and 75 illustrate one particular embodiment. The original page is shown in FIG. 74. The layout map displayed on the user device screen is shown in FIG. 75.

To link to additional information about any of the stories, the user simply touches the portion of the displayed map that corresponds to the story of interest. (If the device is not equipped with a touch screen, the map of FIG. 75 can be presented with indicia identifying the different map zones, e.g., 1, 2, 3... or A, B, C.... The user can then operate the device's numeric or alphanumeric user interface (e.g., keypad) to identify the article of interest.)

The user's selection is transmitted to a remote server (which may be the same one that served the layout map data to the portable device, or another one), which then consults stored data to identify information responsive to the user's selection. For example, if the user touches the region in the lower right of the page map, the router system may instruct a server at buick-dot-com to transmit a page with more information about the Buick Lucerne for presentation on the user device. Or the remote system can send the user device a link to that page, and the device can then load the page. Or the remote system can cause the user device to present a menu of options; e.g., for a news article, the user may be given options to listen to a related podcast, see earlier stories on the same topic, order reprints, download the article as a Word file, etc. Or the remote system can send the user a link to a web page or menu page by email, so that the user can review it at a later time. (A variety of such different responses to user-expressed selections can be provided, as are known from the art cited herein.)

Instead of the map of FIG. 75, the system may cause the user device to display a screen showing a reduced-scale version of the newspaper page itself, like that shown in FIG. 74. Again, the user can simply touch the article of interest to trigger an associated response.

Or, instead of presenting a graphical layout of the page, the remote system can return the titles of all the content on the page (e.g., "Banks Owe Billions...", "McCain Pins Hopes...", "Buick Lucerne"). These titles are presented in menu form on the device screen, and the user touches the desired item (or enters a corresponding number/letter selection).

The layout map for each printed newspaper and magazine page is typically generated by the publishing company as part of its layout process, e.g., using automated software from vendors such as Quark, Impress and Adobe. Existing software thus knows which articles and advertisements appear in which spaces on each printed page. These same software tools, or others, can be adapted to take this layout map information, associate corresponding links or other data with each story/advertisement, and store the resulting data structure on a web-accessible server from which portable devices can access it.

The layout of newspaper and magazine pages offers orientation information that can be useful in watermark decoding. Columns are vertical. Headlines and lines of text are horizontal. Even at very low spatial image frequencies, such shape orientation can be distinguished. A user capturing an image of a printed page may not capture the content "squarely." However, these strong vertical and horizontal components of the image are readily determined by algorithmic analysis of the captured image data, and allow the rotation of the captured image to be discerned. This knowledge simplifies and speeds the watermark decoding process (since a first step in many watermark decoding operations is to discern the rotation of the image from its originally encoded state).

In another embodiment, delivery of a page map from a remote server to the user device is unnecessary. Again, a region of a page spanning several items of content is encoded with a single watermark payload. Again, the user captures an image including the content of interest. The watermark identifying the page is decoded.

In this embodiment, the captured image is displayed on the device screen, and the user touches the content region of particular interest. The coordinates of the user's selection within the captured image area are recorded.

FIG. 76 is illustrative. The user has used an Apple iPhone, a T-Mobile Android phone, or the like to capture an image of an excerpt from a watermarked newspaper page, and then touches an article of interest (indicated by the oval). The location of the touch within the image frame is known to the touch screen software, e.g., as an offset from the upper left corner, measured in pixels. (The display may have a resolution of 480 x 320 pixels.) The touch may be at pixel position (200, 160).

The watermark spans the page, and is shown in FIG. 76 by the dashed diagonal lines. The watermark (e.g., as described in Digimarc's patent 6,590,996) has an origin, but the origin point is not within the image frame captured by the user. However, from the watermark, the watermark decoder software knows the scale of the image and its rotation. It also knows the offset of the captured image frame from the watermark's origin. Based on this information, and on information about the scale at which the original watermark was encoded (which can be conveyed with the watermark, accessed from a remote repository, hard-coded into the detector, etc.), the software can determine that the upper left corner of the captured image frame corresponds to a point 1.6 inches below, and 2.3 inches to the right of, the top left corner of the originally printed page (assuming the watermark origin is at the top left corner of the page). From the decoded scale information, the software can discern that the 480 pixel width of the captured image corresponds to an area of the originally printed page 12 inches in width.

The software finally determines the position of the user's touch as an offset from the upper left corner of the originally printed page. It knows that the corner of the captured image is offset (1.6", 2.3") from the upper left corner of the printed page, and that the touch is a further 5" to the right (200 pixels x 12"/480 pixels) and a further 4" down (160 pixels x 12"/480 pixels), for a final position within the originally printed page of (6.6", 6.3").
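
The arithmetic of this example can be sketched as follows. The function and parameter names are hypothetical. The sketch assumes the captured frame is not rotated relative to the page, that the same scale applies on both axes, and that the corner offset pair (1.6", 2.3") is read as (horizontal, vertical), which is the reading under which the stated result of (6.6", 6.3") follows.

```python
def touch_to_page(touch_px, corner_offset_in, frame_width_px, frame_width_in):
    """Convert a touch position in frame pixels to page inches.

    touch_px:         (x, y) of the touch within the captured frame, in pixels
    corner_offset_in: (x, y) of the frame's upper left corner on the page, in inches
    frame_width_px:   width of the captured frame in pixels (e.g., 480)
    frame_width_in:   width of page area covered by the frame, in inches (e.g., 12)
    """
    x_in = corner_offset_in[0] + touch_px[0] * frame_width_in / frame_width_px
    y_in = corner_offset_in[1] + touch_px[1] * frame_width_in / frame_width_px
    return (x_in, y_in)

page_xy = touch_to_page((200, 160), (1.6, 2.3), 480, 12)
```

With these inputs the touch lies 5" and 4" beyond the frame corner, giving approximately (6.6, 6.3) inches. A rotated capture would additionally require applying the decoded rotation before adding the corner offset.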

The device then sends these coordinates to the remote server, together with the payload of the watermark (identifying the page). The server looks up the layout map of the identified page (from an appropriate database in which it was stored by the page layout software) and, by reference to the coordinates, determines in which of the articles/advertisements the user's touch fell. The remote system then returns to the user device responsive information related to the indicated article, as noted above.
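
The server-side lookup can be sketched as a point-in-rectangle search. The layout representation (a list of items with rectangular boxes in page inches), the item identifiers and the box coordinates below are all hypothetical; real page layouts may use non-rectangular regions.

```python
def find_item(layout, x_in, y_in):
    """Return the id of the first layout item whose box contains the point,
    or None. Each box is (left, top, right, bottom) in page inches."""
    for item in layout:
        left, top, right, bottom = item["box"]
        if left <= x_in < right and top <= y_in < bottom:
            return item["id"]
    return None

# Hypothetical layout map for one page, keyed elsewhere by the watermark payload.
example_layout = [
    {"id": "story_a", "box": (0.0, 0.0, 12.0, 5.0)},
    {"id": "story_b", "box": (0.0, 5.0, 6.0, 20.0)},
    {"id": "ad_c", "box": (6.0, 5.0, 12.0, 20.0)},
]
```

The returned id would then select the link, menu or other data associated with that story or advertisement.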

Returning to focus, the close-focus handicap of PDA cameras can actually be turned to advantage in decoding watermarks. No watermark information is retrieved from the inked areas of text. The subtle modulations of luminance on which most watermarks are based are lost in regions that are printed full black.

If the page substrate is tinted with a watermark, the useful watermark information is recovered from the unprinted regions of the page, e.g., from "white space" such as between columns, between lines, and at the ends of paragraphs. The inked characters are "noise" that is best ignored. The blurring of the printed portions of the page introduced by the focus deficiencies of PDA cameras can be used to define a mask that identifies heavily inked areas. Those portions can be disregarded when decoding the watermark data.

More particularly, the blurred image data can be thresholded. Any image pixels having a value darker than the threshold value can be ignored. Put another way, only image pixels having a value lighter than the threshold are input to the watermark decoder. The "noise" contributed by the inked characters is thereby filtered out.
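A minimal sketch of this thresholding, assuming 8-bit grayscale values (0 is black, 255 is white) held in nested lists. The function name and the threshold value are illustrative choices, not taken from the source.

```python
def bright_pixels_only(gray_rows, threshold):
    """Keep only pixels lighter than the threshold.

    Returns (row, column, value) triples for the pixels that would be passed
    to the watermark decoder; darker (inked) pixels are dropped.
    """
    return [
        (r, c, value)
        for r, row in enumerate(gray_rows)
        for c, value in enumerate(row)
        if value > threshold
    ]


# Tiny example frame: two light substrate pixels, two dark ink pixels.
frame = [[250, 30],
         [128, 200]]
decoder_input = bright_pixels_only(frame, threshold=128)
```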

In imaging devices that capture sharply focused text, similar advantages may be produced by processing the text with a blurring kernel, and then subtracting out those regions found to be dominated by printed text.
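One way to picture this is a small box blur followed by a threshold. This is a sketch under assumptions: the source does not specify the kernel, so a uniform (box) kernel is swapped in here, and the names `box_blur` and `text_mask`, the radius, and the threshold are all hypothetical.

```python
def box_blur(gray, radius=1):
    """Blur a grayscale image (nested lists) with a uniform box kernel."""
    height, width = len(gray), len(gray[0])
    blurred = []
    for r in range(height):
        row = []
        for c in range(width):
            window = [
                gray[rr][cc]
                for rr in range(max(0, r - radius), min(height, r + radius + 1))
                for cc in range(max(0, c - radius), min(width, c + radius + 1))
            ]
            row.append(sum(window) / len(window))
        blurred.append(row)
    return blurred


def text_mask(gray, threshold, radius=1):
    """True where the blurred image is darker than the threshold (text-dominated)."""
    return [[value < threshold for value in row] for row in box_blur(gray, radius)]


# Sharp "text" on the left (ink, gap, ink), blank substrate on the right.
sharp = [[0, 255, 0, 255, 255, 255, 255]] * 3
mask = text_mask(sharp, threshold=200)
```

After blurring, the thin white gap between the two inked strokes falls below the threshold and is masked along with the ink, while the open margin on the right remains available to the decoder.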

By arrangements such as those detailed above, deficiencies of portable imaging devices are redressed, and enhanced print-to-web linking based on page layout data is enabled.

Image Search, Feature Extraction, Pattern Matching, Etc.

The image search functionality of certain of the foregoing embodiments can be implemented using Pixsimilar image search software and/or the Visual Search Developer's Kit (SDK), both from Idée, Inc. (Toronto, ON). A tool for automatically generating descriptive annotations for imagery is ALIPR (Automatic Linguistic Indexing of Pictures), as detailed in patent 7,394,947 (Penn State).

Content-based image retrieval (CBIR) can also be used in the foregoing embodiments. As is familiar to artisans, CBIR essentially involves (1) abstracting a characterization of an image, usually mathematically, and (2) using such characterizations to assess the similarity between images. Two papers surveying these fields are Smeulders et al, "Content-Based Image Retrieval at the End of the Early Years," IEEE Trans. Pattern Anal. Mach. Intell., Vol. 22, No. 12, pp. 1349-1380, 2000, and Datta et al, "Image Retrieval: Ideas, Influences and Trends of the New Age," ACM Computing Surveys, Vol. 40, No. 2, April 2008.
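The two CBIR steps can be illustrated with a deliberately simple characterization. This sketch assumes a normalized grayscale histogram as the descriptor and an L1 distance as the similarity measure; both are stand-ins chosen for brevity and are not the descriptors used by any system named in this specification.

```python
def gray_histogram(pixels, bins=4):
    """Step 1: characterize an image as a normalized histogram of gray values."""
    counts = [0] * bins
    for value in pixels:
        counts[min(value * bins // 256, bins - 1)] += 1
    return [count / len(pixels) for count in counts]


def l1_distance(a, b):
    """Step 2: compare two characterizations (0 means identical histograms)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_by_similarity(query_pixels, database):
    """Return database image names ordered from most to least similar."""
    query = gray_histogram(query_pixels)
    return sorted(
        database,
        key=lambda name: l1_distance(query, gray_histogram(database[name])),
    )


# Hypothetical two-image database of flattened 8-bit pixel values.
database = {"dark": [10, 20, 30, 40], "bright": [200, 220, 240, 250]}
ranking = rank_by_similarity([15, 25, 35, 60], database)
```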

The task of identifying like-appearing imagery from large image databases is a familiar operation in the issuance of driver's licenses. That is, an image captured from a new applicant is commonly checked against a database of all previous driver's license photos, to determine whether the applicant has already been issued a driver's license (possibly under another name). Methods and systems known from the driver's license field can be employed in the arrangements detailed herein. (Examples include Identix patent 7,369,685, and L-1 Corp. patents 7,283,649 and 7,130,454.)

Image feature extraction algorithms known as CEDD and FCTH are useful in many of the embodiments herein. The former is detailed in Chatzichristofis et al, "CEDD: Color and Edge Directivity Descriptor - A Compact Descriptor for Image Indexing and Retrieval," 6th International Conference in advanced research on Computer Vision Systems ICVS 2008, May 2008; the latter is detailed in Chatzichristofis et al, "FCTH: Fuzzy Color And Texture Histogram - A Low Level Feature for Accurate Image Retrieval," 9th International Workshop on Image Analysis for Multimedia Interactive Services, Proceedings: IEEE Computer Society, May 2008.

Open-source software implementing these techniques is available; see the web page savvash.blogspot-dot-com/2008/05/cedd-and-fcth-are-now-open-dot-html. DLLs implementing their functionality can be downloaded. The classes can be invoked on input image data (e.g., file.jpg) as follows:

using System.Drawing; // provides Bitmap

// Arrays that receive the two descriptors
double[] CEDDTable = new double[144];
double[] FCTHTable = new double[144];

// Load the input image
Bitmap ImageData = new Bitmap("c:/file.jpg");

// Descriptor extractors supplied by the downloaded DLLs
CEDD GetCEDD = new CEDD();
FCTH GetFCTH = new FCTH();

// Compute the descriptors for the image
CEDDTable = GetCEDD.Apply(ImageData);
FCTHTable = GetFCTH.Apply(ImageData, 2);

CEDD and FCTH can be combined to yield improved results, using the Joint Composite Descriptor file available from the just-cited web page.

Chatzichristofis has made available an open source program, "img(Finder)" (see the web page savvash.blogspot-dot-com/2008/07/image-retrieval-in-facebook-dot-html), a content-based image retrieval desktop application that retrieves and indexes images from the Facebook social networking site, using CEDD and FCTH. In use, a user connects to Facebook with their personal account data, and the application downloads information from the user's own images, as well as from the image albums of the user's friends, and indexes these images for retrieval with the CEDD and FCTH features. The index can thereafter be queried by a sample image.

Chatzichristofis has also made available an online search service, "img(Anaktisi)," to which a user uploads a photo, and the service searches one of eleven different image archives for similar images, using image metrics including CEDD and FCTH. See orpheus.ee.duth-dot-gr/anaktisi/. (The image archives include Flickr.) In the associated commentary to the Anaktisi search service, Chatzichristofis explains:

The rapid growth of digital images through the widespread popularization of computers and the Internet makes the development of an efficient image retrieval technique imperative. Content-based image retrieval, known as CBIR, extracts several features that describe the content of the image, mapping the visual content of the images into a new space called the feature space. The feature space values for a given image are stored in a descriptor that can be used for retrieving similar images. The key to a successful retrieval system is to choose the right features that represent the images as accurately and uniquely as possible. The features chosen have to be discriminative and sufficient in describing the objects present in the image. To achieve these goals, CBIR systems use three basic types of features: color features, texture features and shape features. It is very difficult to achieve satisfactory retrieval results using only one of these feature types.

To date, many proposed retrieval techniques adopt methods in which more than one feature type is involved. For instance, color, texture and shape features are used in both IBM's QBIC and MIT's Photobook. QBIC uses color histograms, a moment-based shape feature, and a texture descriptor. Photobook uses appearance features, texture features, and 2D shape features. Other CBIR systems include SIMBA, CIRES, SIMPLIcity, IRMA, FIRE and MIRROR. A cumulative body of research presents extraction methods for these feature types.

In most retrieval systems that combine two or more feature types, such as color and texture, independent vectors are used to describe each kind of information. It is possible to achieve very good retrieval scores by increasing the size of the descriptors of images that have a high dimensional vector, but this technique has several drawbacks. If the descriptor has hundreds or even thousands of bins, it may be of no practical use because the retrieval procedure is significantly delayed. Also, increasing the size of the descriptor increases the storage requirements, which may impose a significant penalty for databases that contain millions of images. Many presented methods limit the length of the descriptor to a small number of bins, leaving the possible factor values in decimal, non-quantized form.

The Moving Picture Experts Group (MPEG) defines a standard for content-based access to multimedia data in its MPEG-7 standard. This standard identifies a set of image descriptors that maintain a balance between the size of the feature and the quality of the retrieval results.

In this web-site a new set of feature descriptors is presented in a retrieval system. These descriptors have been designed with particular attention to their size and storage requirements, keeping them as small as possible without compromising their discriminating ability. These descriptors incorporate color and texture information into one histogram while keeping their sizes between 23 and 74 bytes per image.

High retrieval scores in content-based image retrieval systems can be attained by adopting relevance feedback mechanisms. These mechanisms require the user to grade the quality of the query results by marking the retrieved images as being either relevant or not. Then, the search engine uses this grading information in subsequent queries to better satisfy users' needs. It is noted that while relevance feedback mechanisms were first introduced in the information retrieval field, they currently receive considerable attention in the CBIR field. The vast majority of relevance feedback techniques proposed in the literature are based on modifying the values of the search parameters so that they better represent the concept the user has in mind. Search parameters are computed as a function of the relevance values assigned by the user to all the images retrieved so far. For instance, relevance feedback is frequently formulated in terms of the modification of the query vector and/or in terms of adaptive similarity metrics.

Also, in this web-site an Auto Relevance Feedback (ARF) technique is introduced which is based on the proposed descriptors. The goal of the proposed Automatic Relevance Feedback (ARF) algorithm is to optimally readjust the initial retrieval results based on user preferences. During this procedure the user selects, from the first round of retrieved images, one or more as being relevant to his or her initial retrieval expectations. Information from these selected images is used to alter the initial query image descriptor.
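The "modification of the query vector" mentioned in the quoted commentary can be illustrated with a classic Rocchio-style update. This is a swapped-in textbook technique used only as an example; it is not the ARF algorithm, whose details are not given here. The weights `alpha`, `beta` and `gamma` are arbitrary illustrative values.

```python
def mean_vector(vectors, dim):
    """Component-wise mean of a list of vectors (zero vector if the list is empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def refine_query(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Move the query descriptor toward relevant images and away from the rest.

    Rocchio-style update: alpha*query + beta*mean(relevant) - gamma*mean(non_relevant),
    clipped at zero because histogram-type descriptors cannot be negative.
    """
    dim = len(query)
    rel = mean_vector(relevant, dim)
    non = mean_vector(non_relevant, dim)
    return [
        max(0.0, alpha * q + beta * r - gamma * n)
        for q, r, n in zip(query, rel, non)
    ]


# The user marks one result as relevant and one as not relevant.
new_query = refine_query([1.0, 0.0], relevant=[[0.0, 1.0]], non_relevant=[[1.0, 0.0]])
```

The refined descriptor is then used in place of the original query descriptor for the next retrieval round.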

Another open source content-based image retrieval system is GIFT (GNU Image Finding Tool), produced by researchers at the University of Geneva. One of the tools allows users to index directory trees containing images. The GIFT server and its client (SnakeCharmer) can then be used to search the indexed images based on image similarity. The system is further described at the web page gnu-dot-org/software/gift/gift-dot-html. The latest version of the software can be found at the ftp server ftp.gnu-dot-org/gnu/gift.

Still another open source CBIR system is Fire, written by Tom Deselaers and others at RWTH Aachen University, available for download from the web page -i6.informatik.rwth-aachen-dot-de/~deselaers/fire/. Fire makes use of technology described, e.g., in Deselaers et al, "Features for Image Retrieval: An Experimental Comparison," Information Retrieval, Vol. 11, No. 2, Springer, The Netherlands, pp. 77-107, March 2008.

Embodiments of the present invention are generally concerned with objects depicted in imagery, rather than with full frames of image pixels. Recognition of objects within imagery (sometimes termed computer vision) is a large science with which the reader is presumed to be familiar. Edges and centroids are among the image features that can be used to aid in recognizing objects in images. Shape contexts are another (cf. Belongie et al, "Matching with Shape Contexts," IEEE Workshop on Content Based Access of Image and Video Libraries, 2000). Robustness to affine transformations (e.g., scale invariance, rotation invariance) is an advantageous feature of certain object recognition, pattern matching and computer vision techniques. Methods based on the Hough transform and on the Fourier-Mellin transform exhibit rotation-invariant properties. SIFT (discussed below) is an image recognition technique with this and other advantageous properties.

In addition to object recognition and computer vision, the processing of imagery contemplated in this specification (as opposed to the processing of associated metadata) can use a variety of other techniques, which can go by various names. Included are image analysis, pattern recognition, feature extraction, feature detection, template matching, facial recognition, eigenvectors, etc. (All these terms are generally used interchangeably in this specification.) The interested reader is referred to Wikipedia, which has an article on each of the just-listed topics, including a tutorial and citations to related information.

Image metrics of the sort discussed are sometimes regarded as metadata, namely "content-dependent metadata." This is in contrast to "content-descriptive metadata," which is the more familiar sense in which the term metadata is used.

Interaction with Communicating Devices

Most of the examples detailed above involved imaging objects that have no means to communicate. This section more particularly considers such technologies applied to objects that are equipped to communicate, or that can be so equipped. Simple examples are WiFi-equipped thermostats and parking meters, Ethernet-linked telephones, and hotel bedside clocks equipped with Bluetooth.

Consider a user who drives to her office downtown. Finding a vacant parking space, she points her cell phone at the parking meter. A virtual user interface (UI) appears almost immediately on the screen of the cell phone, allowing her to buy two hours of time from the meter. Inside the office building, the woman finds the conference room chilly and points the cell phone at the thermostat. An instant later a different virtual user interface appears on the cell phone, permitting her to change the thermostat's settings. Ten minutes before the parking meter is about to run out of time, the cell phone chimes and again presents the UI for the parking meter. The user buys another hour of time, from her office.

For industrial users and other applications where the security of the interaction is important, or applications where anonymity is important, various levels of security and access privileges can be incorporated into the interaction session between the user's mobile device and the object being imaged. A first level includes simply encoding contact instructions, overtly or covertly, in a surface feature of the object, such as an IP address; a second level includes presenting public-key information to the device, either explicitly through overt symbology or more subtly through digital watermarking; and a third level may be one in which the unique patterns or digital watermarking can be acquired only by actively taking a photograph of the object.

The interface presented on the user's cell phone may be customized according to user preferences, and/or to facilitate specific task-oriented interactions with the device (e.g., a technician may call up a "debug" interface for a thermostat, while an office worker may call up a temperature setting control).

Anywhere there is a physical object or device that incorporates elements such as displays, buttons, dials, or other such features intended for physical interaction with the object or device, the cost of those elements may be unnecessary. Instead, their functionality may be duplicated by a mobile device that actively and virtually interacts with the object or device.

By incorporating a wireless chip into a device, a manufacturer effectively enables a mobile GUI for that device.

According to one aspect, such technology includes using a mobile phone to obtain identification information corresponding to a device. By reference to the obtained identification information, application software corresponding to the device is then identified and downloaded to the mobile phone. This application software is then used in facilitating user interaction with the device. By such an arrangement, the mobile phone serves as a multi-function controller, adapted to control a particular device through use of application software identified by reference to information corresponding to that device.

According to another aspect, such technology includes using a mobile phone to sense information from a housing of a device. Through use of this sensed information, other information is encrypted using a public key corresponding to the device.

According to still another aspect, such technology includes using a mobile phone to sense analog information from a device. This sensed analog information is converted to digital form, and corresponding data is transmitted from the cell phone. The transmitted data is used to confirm the user's proximity to the device before the user is allowed to interact with the device using the mobile phone.

According to yet another aspect, such technology includes using a user interface on a user's cell phone to receive an instruction relating to control of a device. This user interface is presented on a screen of the cell phone in combination with a cell phone-captured image of the device. Information corresponding to the instruction is signaled to the user in a first fashion while the instruction is pending, and in a second fashion once the instruction has been performed successfully.
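The two signaling fashions can be pictured as a small state holder on the phone. This is a sketch only: the class name, the state names and the "flashing"/"steady" styles are hypothetical, since the source requires only that the pending and completed presentations differ.

```python
from enum import Enum


class CommandState(Enum):
    PENDING = "pending"
    DONE = "done"


# Hypothetical presentation styles for the two states.
SIGNAL_STYLE = {CommandState.PENDING: "flashing", CommandState.DONE: "steady"}


class CommandIndicator:
    """Tracks one instruction shown over the captured image of the device."""

    def __init__(self, text):
        self.text = text
        self.state = CommandState.PENDING  # instruction sent, not yet confirmed

    def acknowledge(self):
        """Call when the device confirms that the instruction was carried out."""
        self.state = CommandState.DONE

    def render(self):
        """Text the UI would overlay, tagged with the current signaling style."""
        return f"{self.text} [{SIGNAL_STYLE[self.state]}]"


indicator = CommandIndicator("Set temperature to 72")
before = indicator.render()
indicator.acknowledge()
after = indicator.render()
```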

According to yet another aspect, the present technology includes initiating a transaction with a device, using a user interface presented on a screen of a user's cell phone, while the user is in proximity to the device. Later, the cell phone is used for a purpose unrelated to the device. Still later, the user interface is recalled and used to engage in a further transaction with the device.

According to yet another aspect, such technology includes a mobile phone that includes a processor, a memory, a sensor and a display. Instructions in the memory configure the processor to enable the following acts: sensing information from a proximate first device; by reference to the sensed information, downloading first user interface software corresponding to the first device; interacting with the first device by user interaction with the downloaded first user interface software; recalling from the memory second user interface software that corresponds to a second device and that was earlier downloaded to the mobile phone; and interacting with the second device by user interaction with the recalled second user interface software, regardless of whether the user is proximate to the second device.
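The download-then-recall behavior in this aspect resembles a simple keyed store on the phone. The sketch below is an assumption-laden illustration: `InterfaceStore`, its method names, the device identifiers and the stand-in `fake_download` function are all hypothetical, and real user interface software would of course be more than a string.

```python
class InterfaceStore:
    """Keeps device-specific UI software on the phone once it has been downloaded."""

    def __init__(self, download):
        self._download = download  # callable: device id -> UI software
        self._saved = {}

    def for_sensed_device(self, device_id):
        """Device is nearby and was just sensed: download its UI and keep a copy."""
        ui = self._download(device_id)
        self._saved[device_id] = ui
        return ui

    def recall(self, device_id):
        """Device need not be nearby: reuse the UI software downloaded earlier."""
        return self._saved[device_id]


downloads = []


def fake_download(device_id):
    """Stand-in for a network download; records each request."""
    downloads.append(device_id)
    return f"ui-for-{device_id}"


store = InterfaceStore(fake_download)
store.for_sensed_device("parking-meter")          # earlier, at the curb
thermostat_ui = store.for_sensed_device("thermostat")  # now, in the conference room
meter_ui = store.recall("parking-meter")          # later, from the office
```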

According to yet another aspect, such technology includes a mobile phone that includes a processor, a memory and a display. Instructions in the memory configure the processor to present a user interface that allows the user to select among several different device-specific user interfaces stored in the memory, for using the mobile phone to interact with a plurality of different external devices.

These arrangements are more particularly detailed with reference to FIGS. 78-87.

FIGS. 78 and 79 show a prior art WiFi-equipped thermostat 512. Included are a temperature sensor 514, a processor 516, and a user interface 518. The user interface includes various buttons 518, an LCD display screen 520, and one or more indicator lights 522. A memory 524 stores programming and data for the thermostat. Finally, a WiFi transceiver 526 and antenna 528 allow communication with remote devices. (The depicted thermostat 512 is available from the Radio Thermostat Company of America as model CT80. The WiFi transceiver comprises the GainSpan GS1010 SoC (System on Chip) device.)

FIG. 80 shows a similar thermostat 530, but one embodying principles according to certain aspects of the present technology. Like thermostat 512, thermostat 530 includes a temperature sensor 514 and a processor 532. The memory 534 may store the same programming and data as memory 524. However, this memory 534 includes a bit more software, to support the functionality described below. (For expository convenience, the software associated with this aspect of the present technology is given a name: ThingPipe software. The thermostat memory thus has ThingPipe code that cooperates with other code on other devices, such as cell phones, to implement the detailed functionality.)

Thermostat 530 can include the same user interface 518 as thermostat 512. However, significant economies may be achieved by omitting many of the associated parts, such as the LCD display and buttons. The depicted thermostat thus includes only indicator lights 522, and even these may be omitted.

Thermostat 530 also includes an arrangement by which its identity can be sensed by a cell phone. The WiFi emissions from the thermostat may be employed (e.g., by the device's MAC identifier). However, other means are preferred, such as indicia that can be sensed by the camera of a cell phone.

A steganographic digital watermark is one such indicia that can be sensed by a cell phone camera. Digital watermark technology is detailed in the assignee's patents, including 6,590,996 and 6,947,571. The watermark data can be encoded in a texture pattern on the exterior of the thermostat, on an adhesive label, on pseudo wood-grain trim on the thermostat, etc. (Since steganographic encoding is hidden, it is not depicted in FIG. 80.)

Another suitable indicia is a 1D or 2D bar code or other overt symbology, such as the bar code 536 shown in FIG. 80. This may be printed on the thermostat housing, applied by an adhesive label, etc.

Still other means for identifying the thermostat may be employed, such as an RFID chip 538. Another is a short range wireless broadcast of an identifier, such as by Bluetooth, or a network service discovery protocol (e.g., Bonjour). Object recognition, by means such as image fingerprinting or scale-invariant feature transforms such as SIFT, can also be used. So can other identifiers, whether manually entered by the user or identified by navigating a directory structure of possible devices. The artisan will recognize many other alternatives.

FIG. 81 shows an exemplary cell phone 540, such as the Apple iPhone device. Included are conventional elements including a processor 542, a camera 544, a microphone, an RF transceiver, a network adapter, a display, and a user interface. The user interface includes physical controls as well as a touch-screen sensor. (Details of the user interface, and associated software, are provided in Apple's patent publication 20080174570.) The memory 546 of the phone includes the usual operating system and application software. In addition, it includes ThingPipe software for performing the functions detailed in this specification.

Turning to operation of the illustrated embodiment, a user captures an image depicting the digitally-watermarked thermostat 530 using the cell phone camera 544. The processor 542 in the cell phone pre-processes the captured image data (e.g., by applying a Wiener filter or other filtering, and/or compressing the image data), and wirelessly transmits the processed data to a remote server 552 (FIG. 82), together with information identifying the cell phone. (This can be part of the functionality of the ThingPipe code in the cell phone.) The wireless communication can be by WiFi to a nearby wireless access point, and then by internet to the server 552. Or the cell phone network can be employed, etc.

Server 552 applies a decoding algorithm to the processed image data received from the cell phone, extracting the steganographically encoded digital watermark data. This decoded data, which may comprise an identifier of the thermostat, is transmitted by internet to a router 554, together with the information identifying the cell phone.

Router 554 receives the identifier and looks it up in a namespace database 555. The namespace database 555 examines the most significant bits of the identifier, and conducts a query to identify a particular server responsible for that group of identifiers. The server 556 identified by this process has data pertaining to that thermostat. (Such an arrangement is akin to the Domain Name Servers employed in internet routing. Patent 6,947,571 has additional disclosure on how watermarked data can be used to identify a server that knows what to do with such data.)
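The prefix-based lookup just described can be sketched as follows. This is only an illustrative sketch: the 32-bit identifier width, the 8-bit group prefix, the table contents, and the server names are all assumptions made for the example, not details given in this specification.

```python
from typing import Optional

# Hypothetical layout: a 32-bit identifier whose top 8 bits name a group.
ID_BITS = 32
PREFIX_BITS = 8

# Hypothetical namespace table: group prefix -> responsible server.
NAMESPACE = {
    0x2A: "server-556.example",
    0x2B: "server-556a.example",
}


def group_prefix(identifier: int) -> int:
    """Return the most significant PREFIX_BITS of the identifier."""
    if not 0 <= identifier < (1 << ID_BITS):
        raise ValueError("identifier out of range")
    return identifier >> (ID_BITS - PREFIX_BITS)


def resolve_server(identifier: int) -> Optional[str]:
    """Look up the server responsible for the identifier's group, if any."""
    return NAMESPACE.get(group_prefix(identifier))
```

As with DNS delegation, only the prefix table needs to be held centrally; everything known about an individual device stays with the server that owns its group.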

Router 554 polls the identified server 556 for information. For example, router 554 may solicit from server 556 current data relating to the thermostat (e.g., the current temperature setpoint and ambient temperature, which server 556 may obtain from the thermostat by a link that includes WiFi). Additionally, server 556 is requested to provide information about a graphical user interface, suitable for display on the Apple iPhone 540, for controlling that particular thermostat. This information may comprise, for example, a JavaScript application that runs on the cell phone 540 and presents a GUI suited for use with the thermostat. This information can be passed back to the cell phone, either directly or through server 552. The returned information may include the IP address of the server 556, so that the cell phone can thereafter exchange data directly with server 556.

The ThingPipe software in the cell phone 540 responds to the received information by presenting the graphical user interface for thermostat 530 on its screen. This GUI can include the ambient and setpoint temperatures for the thermostat, whether received from the server 556 or directly from the thermostat (such as by WiFi). Additionally, the presented GUI includes controls that the user can operate to change the settings. To raise the setpoint temperature, the user touches a displayed control corresponding to this operation (e.g., an "Increase Temperature" button). The setpoint temperature presented in the UI display immediately increments in response to the user's action, perhaps in flashing or other distinctive fashion to indicate that the request is pending.

The user's touch also causes the ThingPipe software to transmit corresponding data from the cell phone 540 to the thermostat (which transmission may involve some or all of the other devices shown in FIG. 82, or may go directly to the thermostat, such as by WiFi). Upon receipt of this data, the thermostat increases its set temperature per the user's instructions. It then issues a confirmatory message that is relayed back from the thermostat to the cell phone. On receipt of the confirmatory message, the flashing of the incremented temperature indicator ceases, and the setpoint temperature is then displayed in static form. (Other arrangements are of course possible. For example, the confirmatory message may be rendered to the user as a visible signal, such as the text "Accepted" presented on the display, or as an audible chime, or as a voice saying "OK.")
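The pending-then-confirmed display behavior described above can be sketched as a small state holder. This is a minimal sketch under stated assumptions: the class name, its fields, and the rule that flashing stops only when the confirmed value matches the displayed value are illustrative choices, not details from this specification.

```python
from dataclasses import dataclass


@dataclass
class SetpointDisplay:
    """Hypothetical model of the setpoint shown on the phone's UI."""
    value: int              # setpoint currently shown to the user
    confirmed: int          # last value the device has confirmed
    flashing: bool = False  # True while a request is pending

    def request_change(self, delta: int) -> int:
        """Update the shown value at once and mark it as pending."""
        self.value += delta
        self.flashing = True
        return self.value  # the value to transmit to the device

    def on_confirmation(self, device_value: int) -> None:
        """Record the confirmed value; stop flashing once it matches."""
        self.confirmed = device_value
        if device_value == self.value:
            self.flashing = False
```

Comparing the confirmed value with the displayed one means that two quick presses keep the indicator flashing until the second confirmation arrives.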

In one particular embodiment, the displayed UI is presented as an overlay on the screen of the cell phone, atop the image earlier captured by the user depicting the thermostat. Features of the UI are presented in registered alignment with any corresponding physical controls (e.g., buttons) shown in the captured image. Thus, if the thermostat has Temperature Up and Temperature Down buttons (e.g., the "+" and "-" buttons in FIG. 79), the graphical overlay may outline these in the displayed image in a distinctive format, such as with scrolling dashed red lines. These are the graphical controls that the user touches to raise or lower the setpoint temperature.

This is shown schematically in FIG. 83, in which the user has captured an image 560 of part of the thermostat. Included in the image is at least a part of the watermark 562 (shown as visible for purposes of illustration). By reference to the watermark, and to data obtained from server 556 about the layout of the thermostat, the cell phone processor overlays scrolling dashed lines atop the image, outlining the "+" and "-" buttons. When the phone's touch-screen user interface senses user touches in these outlined regions, it reports them to the ThingPipe software in the cell phone. The software, in turn, interprets these touches as commands to increment or decrement the thermostat temperature, and sends such instructions to the thermostat (e.g., through server 552 and/or 556). Meanwhile, it also increments the "SET TEMPERATURE" graphic that is overlaid atop the image, and causes it to flash until the confirmatory message is received back from the thermostat.

The registered overlay of a graphical user interface atop captured image data is enabled by the encoded watermark data on the thermostat housing. Calibration data in the watermark permits the scale, translation and rotation of the thermostat's placement within the image to be precisely determined. If the watermark is reliably placed on the thermostat in a known spatial relationship with other device features (e.g., buttons and displays), then the positions of these features within the captured image can be determined by reference to the watermark. (Such technology is further detailed in applicant's published patent application 20080300011.)
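The geometric step can be illustrated with a short sketch. It assumes, purely for illustration, that watermark decoding yields a scale factor, a rotation angle, and a translation, and that feature positions are stored in a device reference frame with the watermark origin at (0, 0). The function names and coordinate conventions are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def to_image(point: Point, scale: float, rotation_deg: float,
             translation: Point) -> Point:
    """Map a point from the device frame into image pixel coordinates
    by rotating, scaling, then translating (a similarity transform)."""
    x, y = point
    theta = math.radians(rotation_deg)
    c, s = math.cos(theta), math.sin(theta)
    return (scale * (c * x - s * y) + translation[0],
            scale * (s * x + c * y) + translation[1])


def outline(corners: List[Point], scale: float, rotation_deg: float,
            translation: Point) -> List[Point]:
    """Transform the corner points of a button outline into the image."""
    return [to_image(p, scale, rotation_deg, translation) for p in corners]
```

With the outline of each button mapped into pixel coordinates, the overlay can be drawn over the captured image and touches can be tested against the same polygon.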

If the cell phone does not have a touch-screen, the registered overlay of a UI may still be used. However, instead of providing a screen target for the user to touch, the outlined buttons presented on the cell phone screen can indicate corresponding buttons on the phone's keypad that the user should press to activate the outlined functionality. For example, an outlined box around the "+" button may periodically flash orange with the number "2", indicating that the user should press the "2" button on the cell phone keypad to increase the thermostat temperature setpoint. (The number "2" is flashed atop the "+" portion of the image, permitting the user to discern the "+" marking in the captured image when the number is not being flashed.) Similarly, an outlined box around the "-" button may periodically flash orange with the number "8", indicating that the user should press the "8" button on the cell phone keypad to decrease the thermostat temperature setpoint. See FIG. 84 at 572, 574.

Although overlay of the graphical user interface onto the captured image of the thermostat, in registered alignment, is believed easiest to implement through use of watermarks, other arrangements are possible. For example, if the size and scale of a barcode, and its position on the thermostat, are known, then the locations of the thermostat features for overlay purposes can be geometrically determined. The same holds for an image fingerprint-based approach (including SIFT). If the nominal appearance of the thermostat is known (e.g., by server 556), then the relative locations of features within the captured image can be discerned by image analysis.

In one particular arrangement, the user captures a frame of imagery depicting the thermostat, and this frame is buffered for static display by the phone. The overlay is then presented in registered alignment with this static image. If the user moves the camera, the static image persists, and the overlaid UI is similarly static. In another arrangement, the user captures a stream of images (e.g., video capture), and the overlay is presented in registered alignment with features in the image even if they move from frame to frame. In this case the overlay may move across the screen, in correspondence with movement of the depicted thermostat within the cell phone screen. Such an arrangement can allow the user to move the camera to capture different aspects of the thermostat, perhaps revealing additional features/controls. Or it permits the user to zoom the camera in so that certain features (and the corresponding graphical overlays) are revealed, or appear at a larger scale on the cell phone's touch-screen display. In such a dynamic overlay embodiment, the user may selectively freeze the captured image at any time, and then continue to work with the (then static) overlaid user interface control, without regard to keeping the thermostat in the camera's field of view.

If the thermostat 530 is of the sort that has no visible controls, then the UI displayed on the cell phone can be of any format. If the cell phone has a touch-screen, thermostat controls may be presented on the display. If there is no touch-screen, the display can simply present a corresponding menu. For example, it can instruct the user to press "2" to increase the temperature setpoint, or "8" to decrease it.

After the user has issued an instruction via the cell phone, the command is relayed to the thermostat as described above, and a confirmatory message is desirably returned, for rendering to the user by the ThingPipe software.

It will be recognized that the displayed user interface is a function of the device with which the phone is interacting (e.g., the thermostat), and may also be a function of the capabilities of the cell phone itself (e.g., whether it has a touch-screen, the dimensions of the screen, etc.). Instructions and data enabling the cell phone's ThingPipe software to create these different UIs may be stored at the server 556 that administers the thermostat, and delivered to the memory 546 of the cell phone with which the thermostat interacts.
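The dependence of the UI on both the device and the phone can be summarized in a small selection routine. This is a hedged sketch: the variant names, the `PhoneCapabilities` type, and the decision order are assumptions for illustration; only the "2"/"8" keypad assignments come from the examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhoneCapabilities:
    """Hypothetical summary of what the phone can do."""
    has_touch_screen: bool
    image_captured: bool


# Keypad assignments used in the examples above: "2" raises, "8" lowers.
KEYPAD_MENU = {"2": +1, "8": -1}


def choose_ui(device_has_visible_controls: bool,
              phone: PhoneCapabilities) -> str:
    """Pick a UI variant from the device's and the phone's properties."""
    if device_has_visible_controls and phone.image_captured:
        # A registered overlay needs controls to align with and an image.
        return "overlay-touch" if phone.has_touch_screen else "overlay-keypad"
    # Otherwise the UI can take any format suited to the phone.
    return "touch-controls" if phone.has_touch_screen else "keypad-menu"
```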

Another example of a device that can be so controlled is a WiFi-enabled parking meter. The user captures an image of the parking meter with the cell phone camera (e.g., by pressing a button, or the image capture may be free-running, such as once every second or every few seconds). Processes occur generally as detailed above. The ThingPipe software processes the image data, and router 554 identifies a server 556a responsible for ThingPipe interactions with that parking meter. The server returns UI instructions, optionally with status information for that meter (e.g., time remaining; maximum allowable time). These data are displayed on the cell phone UI, e.g., overlaid on the image captured by the cell phone, together with controls/instructions for purchasing time.

The user interacts with the cell phone to add two hours of time to the meter. A corresponding payment is debited, e.g., from the user's credit card account, which is stored as encrypted profile information in the cell phone or in a remote server. (Online payment systems, including payment arrangements suitable for use with cell phones, are well known, so are not belabored here.) The user interface on the cell phone confirms that the payment has been satisfactorily made, and indicates the number of minutes purchased from the meter. A display at the streetside meter may also reflect the purchased time.

The user leaves the meter and attends to other business, and may use the cell phone for other purposes. The cell phone may lapse into a low power mode, darkening the screen. However, the downloaded application software tracks the number of minutes remaining on the meter. It can do this by periodically querying the associated server for the data. Or it can track the time countdown independently. At a given point, e.g., with ten minutes remaining, the cell phone sounds an alert.

Looking at the cell phone, the user sees that the cell phone has been returned to its active state, and the meter UI has been restored to the screen. The displayed UI reports the time remaining, and gives the user the opportunity to purchase more time. The user purchases another 30 minutes of time. The completed purchase is confirmed on the cell phone display, showing 40 minutes of time remaining. The display on the streetside meter can be similarly updated.
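The independent countdown option and the arithmetic of this example (ten minutes left, plus thirty purchased, gives forty) can be sketched as follows. The class and method names are hypothetical, and times are plain seconds on an arbitrary clock chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class MeterTimer:
    """Hypothetical phone-side tracker for time left on a meter."""
    expires_at: float                 # expiry time, in seconds
    alert_threshold_s: float = 600.0  # alert with ten minutes remaining

    def remaining_s(self, now: float) -> float:
        """Seconds left on the meter, never negative."""
        return max(0.0, self.expires_at - now)

    def should_alert(self, now: float) -> bool:
        """True once the remaining time is at or below the threshold."""
        return 0.0 < self.remaining_s(now) <= self.alert_threshold_s

    def add_minutes(self, minutes: float, now: float) -> None:
        """Extend the expiry; an expired meter restarts from 'now'."""
        self.expires_at = max(self.expires_at, now) + minutes * 60.0
```

For instance, a timer created with two hours purchased at time zero alerts at the 110-minute mark; adding 30 minutes at that moment leaves 2,400 seconds (40 minutes) remaining.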

Note that the user did not have to physically return to the meter to add the time. A virtual link persisted, or was re-established, between the cell phone and the parking meter, even though the user may have walked a dozen blocks and taken an elevator up multiple floors. The parking meter control is as close as the cell phone.

(Although not particularly detailed, the block diagram of the parking meter is similar to that of the thermostat of FIG. 80, albeit without the temperature sensor.)

Consider a third example: a bedside alarm clock in a hotel. Most travelers know the experience of being confounded by the variety of illogical user interfaces presented by such clocks. It is late; the traveler is groggy from a long flight, and now faces the chore of figuring out which of the black buttons on the black clock must be manipulated, in the dimly lit hotel room, to set the alarm for 5:30 a.m. It would be better if such devices could be controlled by an interface presented on the user's cell phone, preferably a standardized user interface that the traveler knows from repeated use.

FIG. 85 shows an alarm clock 580 employing aspects of the present technology. Like other alarm clocks, it includes a display 582, a physical UI 584 (e.g., buttons), and a controlling processor 586. This clock, however, also includes a Bluetooth wireless interface 588, and a memory 590 in which are stored ThingPipe and Bluetooth software for execution by the processor. The clock also has means for identifying itself, such as a digital watermark or barcode, as described above.

As in the earlier examples, the user captures an image of the clock. An identifier is decoded from the imagery, either by the cell phone processor or by a processor in a remote server 552b. From the identifier, the router identifies a further server 556b that is knowledgeable about such clocks. The router passes the identifier to the further server, together with the address of the cell phone. The server uses the decoded watermark identifier to look up that particular clock, and recalls instructions about its processor, display, and other configuration data. It also provides instructions by which the particular display of cell phone 540 can present a standardized clock interface, through which the clock parameters can be set. The server packages this information in a file, which is transmitted back to the cell phone.

The cell phone receives this information, and presents on its screen the user interface specified by server 556b. It is a familiar interface, appearing each time this cell phone is used to interact with a hotel alarm clock, regardless of the clock's model or manufacturer. (In some cases the phone may simply recall the UI from a UI cache, e.g., in the cell phone, since it is used frequently.)

Included in the UI is a control "LINK TO CLOCK." When it is selected, the cell phone communicates with the clock by Bluetooth. (Parameters sent from server 556b may be required to establish the session.) Once the two are linked by Bluetooth, the time displayed on the clock is presented on the cell phone UI, together with a menu of options.

One of the options presented on the cell phone screen is "SET ALARM." When it is selected, the UI shifts to a further screen 595 (FIG. 86) inviting the user to enter the desired alarm time by pressing the digit keys on the phone's keypad. (Other paradigms can naturally be used, e.g., flicking displayed numbers on a touch-screen interface to have them spin until the desired digits appear.) When the desired time has been entered, the user presses the OK button on the cell phone keypad to set the time.

As before, the entered user data (e.g., the alarm time) flashes while the command is transmitted to the device (in this case by Bluetooth), until the device issues a confirmatory signal, at which point the displayed data stops flashing.

In the clock, the instruction to set the alarm time to 5:30 a.m. is received by Bluetooth. The ThingPipe software in the alarm clock memory understands the formatting by which data is conveyed by the Bluetooth signal, and parses out the desired time and the command to set the alarm. The alarm clock processor then sets the alarm to ring at the designated time.

Note that in this example, the cell phone and clock communicate directly, rather than through one or more intermediary computers. (The other computers were consulted by the cell phone to obtain the programming particulars for the clock but, once these were obtained, were not contacted further.)

Further note that in this example, unlike the thermostat, the user interface does not integrate itself (e.g., in registered alignment) with the image of the clock captured by the user. This refinement is omitted in favor of presenting a consistent user interface experience, independent of the particular clock being programmed.

As in the earlier examples, a watermark is preferred by the present applicants to identify the particular device. However, any other known identification technology can be used, including those noted above.

Nothing has yet been said about the optional location modules 596 in each of the detailed devices. One such module is a GPS receiver. Another emerging technology suitable for such modules relies on radio signals that are commonly exchanged between devices (e.g., WiFi, cellular, etc.). Given several communicating devices, the signals themselves, and the imperfect digital clock signals that control them, form a reference system from which both highly accurate time and position can be abstracted. Such technology is detailed in international patent publication WO08/073347.

Knowing the locations of the devices allows enhanced functionality to be realized. For example, it allows devices to be identified by their position (e.g., unique latitude/longitude/elevation coordinates), rather than by an identifier (e.g., watermarked or otherwise). Moreover, it allows proximity between a cell phone and other ThingPipe devices to be determined.

Consider the example of a user who approaches a thermostat. Rather than capturing an image of the thermostat, the user can simply launch the cell phone's ThingPipe software (or it may already be running in the background). This software communicates the cell phone's current position to server 552, and requests identification of other ThingPipe-enabled devices nearby. ("Nearby" is, of course, dependent on the implementation. It may be, for example, 10 feet, 10 meters, 50 feet, 50 meters, etc. This parameter can be defined by the cell phone user, or a default value can be employed.) Server 552 checks a database identifying the current locations of other ThingPipe-enabled devices, and returns data to the cell phone identifying those that are nearby. A listing 598 (FIG. 87) is presented on the cell phone screen, including each device's distance from the user. (If the cell phone's location module includes a magnetometer, or other means to determine the direction the device is facing, the displayed listing can also include directional clues with the distance, e.g., "4' to your left.")
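The server-side proximity query can be sketched as below. This is only a sketch: it uses the haversine great-circle formula on latitude/longitude and ignores elevation, and the device names, coordinates, and 10-meter default radius are assumptions chosen for the example.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, in meters

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


def distance_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two points (haversine formula)."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2)
         * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def nearby(phone: LatLon, devices: Dict[str, LatLon],
           radius_m: float = 10.0) -> List[Tuple[float, str]]:
    """Return (distance, name) pairs within radius_m, nearest first."""
    hits = [(distance_m(phone, pos), name) for name, pos in devices.items()]
    return sorted(hit for hit in hits if hit[0] <= radius_m)
```

The returned distances are what a listing such as 598 would display next to each device name.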

The user selects THERMOSTAT from the displayed list (e.g., by touching the screen, if it is a touch-screen, or by entering an associated digit on the keypad). The phone then establishes a ThingPipe session with the device so identified, as detailed above. (In this example the thermostat user interface is not overlaid atop an image of the thermostat, since no image was captured.)

In the three examples detailed above, there is the question of who should be authorized to interact with the device, and for how long.

In the case of a hotel alarm clock, authorization is of little concern. Anyone in the room who is able to sense the clock's identifier can be deemed authorized to set the clock parameters (e.g., current time, alarm time, display brightness, alarm by buzz or radio, etc.). This authorization should persist, however, only while the user is in the vicinity of the clock (e.g., within Bluetooth range). A previous guest should not be able to reprogram the alarm while the next night's guest is sleeping.

In the case of the parking meter, authorization may again be given to anyone who approaches the meter and captures a picture (or otherwise senses its identifier from short range).

As noted, in the parking meter case the user can later recall the corresponding UI and engage in further transactions with the device. This is acceptable, up to a point. Perhaps 12 hours from the time of image capture is a suitable interval within which the user can interact with the meter. (No harm is done if the user adds time later in those 12 hours, when someone else is parked in the space.) Alternatively, the user's authorization to interact with the device may be terminated when a new user initiates a session with the meter (e.g., by capturing an image of the device and initiating a transaction of the sort identified above).

A memory storing data that sets the user's authorization period can be located in the meter, or elsewhere, e.g., in server 556a. A corresponding ID for the user would also normally be stored. This can be the user's telephone number, a MAC identifier for the phone device, or some other generally unique identifier.
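A minimal sketch of such an authorization record follows. It combines the two alternatives mentioned for the parking meter (a fixed 12 hour window, and termination when a new user starts a session); the class and method names are hypothetical, and times are plain seconds so the example stays self-contained.

```python
from dataclasses import dataclass

GRANT_SECONDS = 12 * 3600  # the 12 hour interval suggested in the text

@dataclass
class Grant:
    user_id: str   # e.g., a phone number or MAC identifier
    start: float   # time the session began, in seconds

class DevicePermissions:
    """Hypothetical per-device authorization memory (in the meter or on a server)."""

    def __init__(self, grant_seconds=GRANT_SECONDS):
        self.grant_seconds = grant_seconds
        self.current = None

    def start_session(self, user_id, now):
        # A new session replaces (and so terminates) any earlier user's grant.
        self.current = Grant(user_id, now)

    def is_authorized(self, user_id, now):
        g = self.current
        return (
            g is not None
            and g.user_id == user_id
            and 0 <= now - g.start <= self.grant_seconds
        )

meter = DevicePermissions()
meter.start_session("user-A", now=0)
still_valid = meter.is_authorized("user-A", now=3600)        # within 12 hours
expired = meter.is_authorized("user-A", now=13 * 3600)       # past 12 hours
meter.start_session("user-B", now=7200)
displaced = meter.is_authorized("user-A", now=7300)          # replaced by user-B
```

A thermostat with role-based rights would need a richer record than this single-grant model.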

In the case of the thermostat, there may be stricter controls over who is authorized to change the temperature, and for how long. Perhaps only supervisors in an office can set the temperature. Other personnel may be granted lesser rights, e.g., simply to view the current ambient temperature. Again, a memory storing such data can be located in the thermostat, in server 556, or elsewhere.

These three examples are simple, and the devices being controlled are of small consequence. In other applications, higher security would naturally be involved. The art of authentication is well developed, and an artisan can draw from known techniques and technologies to implement an authentication arrangement suited to the particular needs of any given application.

If the technology becomes widespread, a user may need to switch between several ongoing ThingPipe sessions. The ThingPipe application can have a "Recent UI" menu option that, when selected, brings up a list of pending or recent sessions. Selecting any of them recalls the corresponding UI, allowing the user to continue an earlier interaction with a particular device.

Physical user interfaces, such as those for thermostats, are fixed. All users are presented with the same physical display, knobs, dials, etc. All interactions must be force-fit into this same physical vocabulary of controls.

Implementations of aspects of the present technology can be more diverse. Users may have stored profile settings that customize cell phone UIs to their particular preferences, globally and/or on a per-device basis. For example, a color-blind user may be so identified, so that a gray scale interface is always presented instead of colors that the user may find difficult to distinguish. A farsighted person may prefer that information be displayed in the largest possible font, regardless of aesthetics. Another person may opt for text to be read from the display, such as by synthesized speech. One particular thermostat UI may normally present text indicating the current data; a user may prefer that the UI not be cluttered with such information, and may specify, for that UI, that no such information should be shown.

The user interface can also be customized for specific task-oriented interactions with the object. A technician may invoke a "debug" interface for a thermostat in order to tune the associated HVAC system; an office worker may invoke a simpler UI that merely presents the current and set-point temperatures.

Just as different users may be presented with different interfaces, different levels of security and access privileges can also be provided.

A first security level simply involves encoding (overtly or covertly) contact instructions for the object, such as an IP address, in the object's surface features. The session simply begins with the cell phone collecting the contact information from the device. (Indirection may be involved; the information on the device may refer to a remote repository that stores the contact information for the device.)

A second level involves public-key information, which may be presented explicitly on the device through an overt symbol, hidden more subtly through steganographic digital watermarking, accessed indirectly, or conveyed otherwise. For example, machine-readable data on the device may provide the device's public key, with which transmissions from the user should be encrypted. The user's transmissions may also convey the user's public key, by which the device can identify the user, and with which data/instructions returned to the cell phone are encrypted.

Such an arrangement allows a secure session with the device. A thermostat in a mall may use such technology. All passers-by may be able to read the thermostat's public key. However, the thermostat may grant control rights only to certain users, identified by their respective public keys.

A third level involves preventing control of the device unless the user presents unique patterns or digital watermarking that can only be acquired by actively taking a picture of the device. That is, it is not enough simply to send an identifier corresponding to the device. Rather, minutiae evidencing the user's physical proximity to the device must also be captured and transmitted. Only by capturing a picture of the device can the user obtain the necessary data; the image pixels essentially prove that the user is nearby and took the picture.

To guard against spoofing, all patterns previously presented may be cached, either at a remote server or at the device, and checked against new data as it is received. If an identical pattern is presented a second time, it may be disqualified as an apparent replay attack (i.e., each image of the device should have some variation at the pixel level). In some arrangements the appearance of the device changes over time (e.g., by a display that presents a periodically changing pattern of pixels), and the presented data must correspond to the device within an immediately preceding interval of time (e.g., 5 seconds or 5 minutes).
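The replay check can be illustrated with a short sketch. It assumes, hypothetically, that the submitted image arrives as raw bytes and that the time at which the device displayed the captured pattern has already been recovered from the image; the `ReplayGuard` class and the use of SHA-256 digests as the cache key are illustrative choices, not details from the specification.

```python
import hashlib

class ReplayGuard:
    """Caches digests of every presented pattern and rejects exact repeats."""

    def __init__(self, max_age_s=300.0):
        self.max_age_s = max_age_s   # e.g., 5 minutes, one interval named in the text
        self.seen = set()

    def accept(self, pixels: bytes, pattern_time: float, now: float) -> bool:
        digest = hashlib.sha256(pixels).hexdigest()
        if digest in self.seen:
            return False             # identical pixels twice: apparent replay
        self.seen.add(digest)        # cache every presented pattern
        # The pattern must come from the immediately preceding interval.
        return 0 <= now - pattern_time <= self.max_age_s

guard = ReplayGuard()
first = guard.accept(b"frame-0001", pattern_time=100.0, now=110.0)
repeat = guard.accept(b"frame-0001", pattern_time=100.0, now=120.0)
stale = guard.accept(b"frame-0002", pattern_time=100.0, now=1000.0)
```

An exact-match cache only catches bit-identical resubmissions; detecting near-duplicates would need a perceptual comparison, which this sketch does not attempt.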

In a related embodiment, any analog information (appearance, sound, temperature, etc.) can be sensed from the device or its environment and used to establish the user's proximity to the device. (The imperfect representation of the analog information, when converted to digital form, can again be used to detect replay attacks.)

One simple application of this arrangement is a scavenger hunt, in which taking a picture of a device proves the user's presence at the device. A more practical application is in industrial settings, where there is concern about people trying to access devices remotely without being physically present.

A great number of variations and hybrids of such arrangements will be evident to the artisan from the foregoing.

SIFT

Reference is sometimes made to SIFT techniques. SIFT is an acronym for Scale-Invariant Feature Transform, a computer vision technology pioneered by David Lowe and described in various of his papers, including "Distinctive Image Features from Scale-Invariant Keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110; and "Object Recognition from Local Scale-Invariant Features," International Conference on Computer Vision, Corfu, Greece (September 1999), pp. 1150-1157, as well as in U.S. Patent 6,711,293.

SIFT works by identification and description, and subsequent detection, of local image features. The SIFT features are local, are based on the appearance of the object at particular interest points, and are invariant to image scale, rotation and affine transformation. They are also robust to changes in illumination, noise, and some changes in viewpoint. In addition to these properties, they are distinctive, relatively easy to extract, allow for correct object identification with low probability of mismatch, and are straightforward to match against a (large) database of local features. Object description by a set of SIFT features is also robust to partial occlusion; as few as three SIFT features from an object are enough to compute its location and pose.

The technique starts by identifying local image features, termed keypoints, in a reference image. This is done by convolving the image with Gaussian blur filters at different scales (resolutions), and determining differences between successive Gaussian-blurred images. Keypoints are those image features having maxima or minima of the difference of Gaussians occurring at multiple scales. (Each pixel in a difference-of-Gaussians frame is compared to its eight neighbors at the same scale, and to the corresponding pixels in each of the neighboring scales (e.g., nine other scales). If the pixel value is a maximum or minimum among all these pixels, it is selected as a candidate keypoint.)

(The procedure just described is a blob-detection method that detects scale-space extrema of a scale-localized Laplacian transform of the image. The difference-of-Gaussians approach is an approximation of such a Laplacian operation, expressed in a pyramid setting.)
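The extremum test on the difference-of-Gaussians stack can be sketched as below. The sketch assumes, following Lowe's published description, that the comparison set is the eight same-scale neighbors plus the 3 x 3 block of pixels in each of the two adjacent scales (26 neighbors in all). The nested-list layout `dog[scale][row][col]` and the function name are hypothetical, and the difference-of-Gaussians frames are taken as already computed.

```python
def is_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly greater or strictly smaller than all
    26 neighbors in the 3 x 3 x 3 block centered on it.
    Requires an interior position: 1 <= s, y, x <= size - 2 on every axis."""
    center = dog[s][y][x]
    neighbors = [
        dog[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (ds, dy, dx) != (0, 0, 0)
    ]
    return center > max(neighbors) or center < min(neighbors)

# Tiny synthetic stack: three 3 x 3 frames of zeros with a single peak.
dog = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
dog[1][1][1] = 1.0
peak_found = is_extremum(dog, 1, 1, 1)

dog[2][0][0] = 2.0   # a larger value in the adjacent scale defeats the peak
peak_after = is_extremum(dog, 1, 1, 1)
```

Candidates found this way still go through the contrast and edge screening described next.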

The above procedure typically identifies many keypoints that are unsuitable, e.g., due to having low contrast (thus being susceptible to noise), or due to having poorly determined locations along an edge (the difference-of-Gaussians function has a strong response along edges, yielding many candidate keypoints, but many of these are not robust to noise). These unreliable keypoints are screened out by performing a detailed fit of the candidate keypoints to nearby data for accurate location, scale, and ratio of principal curvatures. This rejects keypoints that have low contrast or are poorly located along an edge.

More particularly, this process starts, for each candidate keypoint, by interpolating nearby data to determine the keypoint location more accurately. This is often done by a Taylor expansion with the keypoint as the origin, to determine a refined estimate of the maxima/minima location.

The value of the second-order Taylor expansion can also be used to identify low-contrast keypoints. If the contrast is less than a threshold (e.g., 0.03), the keypoint is discarded.

To eliminate keypoints having strong edge responses but that are poorly localized, a variant of a corner detection procedure is applied. Briefly, this involves computing the principal curvature across the edge and comparing it to the principal curvature along the edge. This is done by solving for eigenvalues of a second-order Hessian matrix.
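As an illustration of the curvature comparison, the following sketch computes the two eigenvalues of a 2 x 2 Hessian directly and rejects a keypoint when their ratio is too large. The function names are hypothetical, and the ratio limit of 10 is an assumption taken from Lowe's paper rather than from this text; the second derivatives `dxx`, `dyy` and `dxy` are assumed to have been estimated already from the difference-of-Gaussians frame.

```python
import math

def principal_curvature_ratio(dxx, dyy, dxy):
    """Ratio of the larger to the smaller eigenvalue magnitude of the
    Hessian [[dxx, dxy], [dxy, dyy]]. Returns infinity when the determinant
    is not positive (curvatures of opposite sign, or a flat direction)."""
    tr = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        return math.inf
    disc = math.sqrt(max(tr * tr / 4.0 - det, 0.0))
    l1, l2 = tr / 2.0 + disc, tr / 2.0 - disc
    big, small = max(abs(l1), abs(l2)), min(abs(l1), abs(l2))
    return big / small

def is_edge_like(dxx, dyy, dxy, max_ratio=10.0):
    """True if the keypoint should be discarded as a poorly localized edge response."""
    return principal_curvature_ratio(dxx, dyy, dxy) > max_ratio

edge = is_edge_like(10.0, 0.1, 0.0)     # strong curvature in one direction only
corner = is_edge_like(2.0, 1.5, 0.1)    # comparable curvature in both directions
saddle = is_edge_like(1.0, -1.0, 0.0)   # opposite-sign curvatures
```

Lowe's paper avoids the explicit eigenvalue computation by comparing the squared trace to the determinant; the direct form is used here only because it mirrors the wording above.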

Once unsuitable keypoints are discarded, those that remain are assessed for orientation by a local image gradient function. Magnitude and direction of the gradient are calculated for every pixel in a neighboring region around a keypoint in the Gaussian-blurred image (at that keypoint's scale). An orientation histogram with 36 bins is then compiled, with each bin covering 10 degrees of orientation. Each pixel in the neighborhood contributes to the histogram, with the contribution weighted by its gradient magnitude and by a Gaussian with σ equal to 1.5 times the scale of the keypoint. The peaks in this histogram define the keypoint's dominant orientation. This orientation data allows SIFT to achieve rotation robustness, since the keypoint descriptor can be represented relative to this orientation.
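The histogram step can be sketched as follows. For brevity the sketch takes precomputed gradient samples, each a hypothetical `(gx, gy, weight)` tuple in which `weight` stands for the Gaussian weighting described above; computing the gradients and weights from the blurred image is omitted.

```python
import math

def orientation_histogram(samples, bins=36):
    """Accumulate gradient samples into an orientation histogram.
    Each sample contributes its gradient magnitude times its weight
    to the bin covering its gradient direction."""
    hist = [0.0] * bins
    width = 360.0 / bins                       # 10 degrees per bin for 36 bins
    for gx, gy, weight in samples:
        magnitude = math.hypot(gx, gy)
        angle = math.degrees(math.atan2(gy, gx)) % 360.0
        hist[int(angle // width) % bins] += magnitude * weight
    return hist

def dominant_orientation(hist):
    """Center angle, in degrees, of the highest histogram bin."""
    width = 360.0 / len(hist)
    peak = max(range(len(hist)), key=hist.__getitem__)
    return (peak + 0.5) * width

# Two samples pointing along +x and one weaker sample at 45 degrees.
samples = [(1.0, 0.0, 1.0), (1.0, 0.0, 1.0), (1.0, 1.0, 0.5)]
hist = orientation_histogram(samples)
angle = dominant_orientation(hist)
```

Reporting the bin center is a simplification; a finer estimate would interpolate around the peak.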

From the foregoing, plural keypoints of different scales are identified, each with a corresponding orientation. This data is invariant to image translation, scale and rotation. 128-element descriptors are then generated for each keypoint, allowing robustness to illumination and 3D viewpoint.

This operation is similar to the orientation assessment procedure just reviewed. The keypoint descriptor is computed as a set of orientation histograms on (4 x 4) pixel neighborhoods. The orientation histograms are relative to the keypoint orientation, and the orientation data comes from the Gaussian image closest in scale to the keypoint's scale. As before, the contribution of each pixel is weighted by the gradient magnitude, and by a Gaussian with σ equal to 1.5 times the scale of the keypoint. The histograms contain 8 bins each, and each descriptor contains a 4 x 4 array of 16 histograms around the keypoint. This leads to a SIFT feature vector with 4 x 4 x 8 = 128 elements. This vector is normalized to enhance invariance to changes in illumination.
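The assembly and normalization of the 128-element vector can be shown with a brief sketch. It assumes the sixteen 8-bin histograms have already been computed, and uses plain unit-length (L2) normalization as one common reading of "normalized"; the function name is hypothetical.

```python
import math

def build_descriptor(cell_histograms, cells=16, bins=8):
    """Flatten a 4 x 4 array of 8-bin histograms (given as 16 lists)
    into one 128-element vector and scale it to unit length."""
    if len(cell_histograms) != cells or any(len(h) != bins for h in cell_histograms):
        raise ValueError("expected 16 histograms of 8 bins each")
    flat = [float(v) for h in cell_histograms for v in h]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat

# Synthetic histograms, then the same histograms under doubled contrast.
cells = [[float(i + j) for j in range(8)] for i in range(16)]
descriptor = build_descriptor(cells)
brighter = build_descriptor([[2.0 * v for v in h] for h in cells])
```

Because a uniform scaling of gradient magnitudes cancels in the normalization, `descriptor` and `brighter` coincide, which is the illumination robustness the paragraph refers to.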

The foregoing procedure is applied to training images to compile a reference database. An unknown image is then processed as above to generate keypoint data, and the closest-matching image in the database is identified by a Euclidean distance-like measure. (A "best-bin-first" algorithm is typically used instead of a pure Euclidean distance calculation, to achieve speed improvements of several orders of magnitude.) To avoid false positives, a "no match" output is produced if the distance score for the best match is close (e.g., within 25%) to the distance score for the next-best match.
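The "no match" rule can be sketched with a brute-force nearest-neighbor search. Two simplifications are assumed: the search is exhaustive Euclidean distance rather than best-bin-first, and "close, e.g., 25%" is read as "the best distance is within 25% of the next-best distance." The function name and the one-vector-per-label database are hypothetical.

```python
import math

def match_descriptor(query, database, margin=0.25):
    """Return the label of the nearest reference vector, or None ("no match")
    when the best distance is within `margin` of the next-best distance."""
    scored = sorted((math.dist(query, vec), label) for label, vec in database.items())
    if not scored:
        return None
    if len(scored) == 1:
        return scored[0][1]
    (best_d, best_label), (second_d, _) = scored[0], scored[1]
    if best_d > (1.0 - margin) * second_d:
        return None          # ambiguous: best and next-best are too similar
    return best_label

# Toy two-dimensional "descriptors" for readability.
database = {"mug": [0.0, 0.0], "shoe": [10.0, 0.0]}
clear = match_descriptor([1.0, 0.0], database)       # distances 1 and 9
ambiguous = match_descriptor([5.2, 0.0], database)   # distances 4.8 and 5.2
```

In a full system the comparison runs per keypoint against a large feature database, which is where an approximate search such as best-bin-first pays off.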

To further improve performance, an image may be matched by clustering. This identifies features that belong to the same reference image, allowing unclustered results to be discarded as spurious. A Hough transform can be used to identify clusters of features that vote for the same object pose.

An article detailing a particular hardware embodiment for performing the SIFT procedure is Bonato et al., "Parallel Hardware Architecture for Scale and Rotation Invariant Feature Detection," IEEE Trans. on Circuits and Systems for Video Tech., Vol. 18, No. 12, 2008. A block diagram of such an arrangement 70 is provided in FIG. 18 (adapted from Bonato).

In addition to the camera 32, which produces the pixel data, there are three hardware modules 72-74. Module 72 receives pixels from the camera as input and performs two types of operations: a Gaussian filter and a difference of Gaussians. The former is sent to module 73; the latter is sent to module 74. Module 73 computes pixel orientation and gradient magnitude. Module 74 detects keypoints and performs stability checks to ensure that the keypoints can be relied on in identifying features.

A software block 75 (executed on an Altera NIOS II field programmable gate array) generates a descriptor for each feature detected by block 74, based on the pixel orientation and gradient magnitude produced by block 73.

In addition to the different modules executing simultaneously, there is parallelism within each hardware block. Bonato's illustrative implementation processes 30 frames per second. A cell phone implementation may run somewhat more slowly, such as 10 fps, at least in its initial generation.

The reader is referred to the Bonato article for further details.

An alternative hardware architecture for SIFT techniques is detailed in Se et al., "Vision Based Modeling and Localization for Planetary Exploration Rovers," Proc. of Int. Astronautical Congress (IAC), October 2004.

Still another arrangement is detailed in Henze et al., "What is That? Object Recognition from Natural Features on a Mobile Phone," Mobile Interaction with the Real World, Bonn, 2009. Henze et al. use techniques by Nister et al. and Schindler et al. to expand the range of objects that can be recognized through use of a tree approach (see, e.g., Nister et al., "Scalable Recognition with a Vocabulary Tree," Proc. of Computer Vision and Pattern Recognition, 2006, and Schindler et al., "City-Scale Location Recognition," Proc. of Computer Vision and Pattern Recognition, 2007).

The foregoing implementations can be employed on cell phone platforms, or the processing can be distributed between a cell phone and one or more remote service providers (or implemented with all image processing performed off the phone).

Published patent application WO07/130688 concerns a cell phone-based implementation of SIFT, in which the local descriptor features are extracted by the cell phone processor and transmitted to a remote database for matching against a reference library.

While SIFT is perhaps the best-known technique for generating robust local descriptors, there are others, which may be more or less suitable depending on the application. These include GLOH (cf. Mikolajczyk et al., "Performance Evaluation of Local Descriptors," IEEE Trans. Pattern Anal. Mach. Intell., Vol. 27, No. 10, pp. 1615-1630, 2005) and SURF (cf. Bay et al., "SURF: Speeded Up Robust Features," Eur. Conf. on Computer Vision (1), pp. 404-417, 2006), as well as Chen et al., "Efficient Extraction of Robust Image Features on Mobile Devices," Proc. of the 6th IEEE and ACM Int. Symp. on Mixed and Augmented Reality, 2007, and Takacs et al., "Outdoors Augmented Reality on Mobile Phone Using Loxel-Based Visual Feature Organization," ACM Int. Conf. on Multimedia Information Retrieval, October 2008. A survey of local descriptor features is provided in Mikolajczyk et al., "A Performance Evaluation of Local Descriptors," IEEE Trans. on Pattern Analysis and Machine Intelligence, 2005.

The Takacs paper teaches that image matching speed is greatly increased by limiting the universe of reference images (from which matches are drawn) to those that are geographically close to the user's present position (e.g., within 30 meters). Applicants believe that the universe can also advantageously be limited, by user selection or otherwise, to specialized domains, such as faces, grocery products, houses, etc.

More on Audio Applications

A voice conversation on a mobile device naturally defines the construct of a session, providing a significant amount of metadata (mostly administrative information, in the form of an identified caller, geographic location, etc.) that can be leveraged in prioritizing audio keyvector processing.

If a call is received without accompanying CallerID information, this can trigger a process of voice pattern matching against past calls that are still in voicemail, or for which keyvector data has been preserved. (Google Voice is a long-term repository of potentially useful voice data for recognition or matching purposes.)

If the originating geography of a call can be identified but the number is not a familiar one (e.g., it is neither in the user's contact list nor a commonly received number), functional blocks for speech recognition can be invoked, taking the originating geography into account. For example, if the call comes from a foreign country, speech recognition in the language of that country can be initiated. If the recipient accepts the call, simultaneous speech-to-text conversion into the user's native language can be initiated and displayed on screen to aid in translation. If the geography is domestic, it may allow recall of regional dialect or accent-specific speech recognition libraries, to better cope with a Southern drawl or a Boston accent.

Once a conversation has been initiated, prompts based on speech recognition can be provided on the cell phone screen (or elsewhere). If the speaker on the far end of the connection begins discussing a particular topic, the resulting text can be leveraged to generate native-language queries to reference sites such as Wikipedia, to consult the local user's calendar to check availability, to transcribe shopping lists, and so on.

Beyond evaluation and processing of speech during the session, other audio can be analyzed as well. If the user on the far end of the conversation cannot, or chooses not to, do local processing and keyvector creation, this can be accomplished on the local user's handset, allowing remote experiences to be shared locally.

It should be clear that all of the foregoing holds true for video calls as well, where both audio and visual information can be analyzed and processed into keyvectors.

Public Imagery for Individual Processing

Much of the foregoing discussion has involved imagery captured by personal devices, such as mobile phones. However, the discussed principles and arrangements are also applicable to other imagery.

Consider the problem of finding your parked car in a crowded parking lot. The proprietor of the parking lot may erect one or more pole-mounted cameras to capture imagery of the parking lot from a high vantage point. Such imagery can be made available, either generally (e.g., by a page or file download from the internet) or locally (e.g., by a page or file download from a local wireless network). Individuals can obtain imagery from such cameras and analyze it for their individual purposes, such as finding where their car is parked. For example, the user's mobile phone may download one or more images and apply image processing (machine vision) techniques, such as those described above, to recognize and thus locate the user's red Honda Civic in the lot (or to identify several candidate locations, if a perfect match is not found or if several matches are found).

In a variant arrangement, the user's mobile phone simply provides a template of data that characterizes the desired car to a web service (e.g., one operated by the proprietor of the parking lot). The web service then analyzes the available imagery to identify candidate cars that match the data template.

In some arrangements, the camera can be mounted on a steerable gimbal and equipped with a zoom lens, providing pan/tilt/zoom controls. (One such camera is the Axis 215 PTZ-E.) These controls can be made accessible to users, allowing them to steer the camera to capture imagery from a particular part of the parking lot for analysis ("I know I parked over by Macy's"), or to better distinguish between candidate matching cars once other imagery has been analyzed. (To prevent abuse, camera control privileges may be granted only to authorized users. One way of establishing authorization is the user's custody of a parking chit, issued when the user's vehicle first entered the parking lot. This chit can be shown to the user's mobile phone camera and analyzed (e.g., by a server associated with the parking lot) to discern printed information (e.g., alphanumeric text, a barcode, a watermark, etc.) indicating that the user parked in the lot within a certain period (e.g., the preceding 12 hours).)
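The authorization check described above can be sketched as follows. This is only an illustration: it assumes the chit has already been decoded into an issue timestamp, and the function name `may_control_camera` and the constant `CHIT_VALIDITY` are hypothetical, not part of the specification. The 12-hour window is the example period given in the text.

```python
from datetime import datetime, timedelta

# Hypothetical validity window; the text gives 12 hours only as an example.
CHIT_VALIDITY = timedelta(hours=12)


def may_control_camera(chit_issued_at: datetime, now: datetime,
                       validity: timedelta = CHIT_VALIDITY) -> bool:
    """Return True if the decoded chit was issued within `validity` before `now`."""
    age = now - chit_issued_at
    # Reject chits dated in the future (bad clock or forgery) as well as expired ones.
    return timedelta(0) <= age <= validity
```

A server associated with the parking lot might call such a check after decoding the barcode or watermark on the chit, and grant pan/tilt/zoom privileges only when it returns True.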

Instead of (or in addition to) a pole-mounted camera provided by the parking lot proprietor, similar "find my car" functionality can be achieved through the use of crowd-sourced imagery. If individual users each have a "find my car" application that processes imagery captured by their mobile phones to find their respective vehicles, the imagery captured in this manner can be shared for the benefit of others. Thus, user "A" may wander the aisles in search of a particular vehicle, user "B" may do likewise elsewhere in the parking lot, and the image feeds from each may be shared. The mobile phone of user B may then locate B's car in imagery captured by the mobile phone of user A.

Such collected imagery may be stored in an archive accessible over a local wireless network, from which imagery is removed after a set period, e.g., two hours. Desirably, geolocation data is associated with each image, so that a car matched in imagery captured by a different user can be physically found in the parking lot. Such an image archive may be maintained by the parking lot proprietor (or by a service with which the proprietor has contracted). Alternatively, the images can be sourced from elsewhere. For example, mobile phones may automatically, or in response to user instruction, post captured imagery for storage in one or more online archives, such as Flickr, Picasa, etc. A user's "find my car" application can query one or more such archives for imagery that is geographically proximate (such as imagery captured, or depicting subject matter, within a certain distance of the user or of a reference location, e.g., within 10 or 100 yards) and that is also temporally proximate (e.g., captured within a certain prior time interval, such as within the past 10 or 60 minutes). Analysis of such third-party imagery can serve to locate the user's car.
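A query for geographically and temporally proximate imagery can be sketched as a simple filter over archive records. This is a minimal illustration under stated assumptions: the `ArchivedImage` record, the `nearby_recent` function and its parameter names are hypothetical, and distance is computed with the standard haversine great-circle formula rather than by any method named in the specification. The default limits (100 yards, 60 minutes) are taken from the examples in the text.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters
YARD_M = 0.9144               # one yard in meters


@dataclass
class ArchivedImage:
    """Hypothetical archive record: an image id plus its geolocation and capture time."""
    image_id: str
    lat: float
    lon: float
    captured_at: float  # seconds since the epoch


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby_recent(images, lat, lon, now, max_yards=100, max_minutes=60):
    """Keep images captured within max_yards of (lat, lon) and within max_minutes before now."""
    max_m = max_yards * YARD_M
    max_age_s = max_minutes * 60
    return [im for im in images
            if 0 <= now - im.captured_at <= max_age_s
            and haversine_m(lat, lon, im.lat, im.lon) <= max_m]
```

The same age test could also drive removal of imagery from a local archive after the set retention period.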

Expanding horizons a bit, consider that all the cameras in the world from which imagery may be made publicly accessible (proliferating highway cameras are just one example) may be regarded simply as ever-changing web pages of "data." (Imagery from private networks of cameras, such as security cameras, can be made available with certain privacy safeguards or other cleansing of sensitive information.) In the old days "data" was dominated by "text," which led to keywords, which led to search engines, which led to Google. With a highly distributed camera network, however, the most ubiquitous form of data becomes pixels. Add location and time data, together with some data structure (e.g., keyvectors in formats that become standardized, which may include time/location) and the other technologies detailed herein, and the stage is set for a new class of search, in which providers compile and/or catalog (or index) large collections of current and past public viewpoints, replete with some forms of keyvector classification and time/location, both being fundamentally new search components. Text search declines in prominence, and new query paradigms, based on visual stimuli and keyvector attributes, emerge.

Other Comments

Having described and illustrated the principles of our inventive work with reference to illustrative examples, it will be recognized that the technology is not so limited.

For example, while reference has been made to cell phones, it will be recognized that this technology finds utility with all manner of devices, both portable and fixed. PDAs, organizers, portable music players, desktop computers, laptop computers, tablet computers, notebooks, ultraportables, wearable computers, servers, etc., can all make use of the principles detailed herein. Particularly contemplated cell phones include the Apple iPhone and cell phones following Google's Android specification (e.g., the G1 phone, manufactured for T-Mobile by HTC Corp.). The terms "cell phone" and "mobile phone" should be construed to encompass all such devices, even those that are not, strictly speaking, cellular or telephones.

(Details of the iPhone, including its touch interface, are provided in Apple's published patent application 20080174570.)

The design of the cell phones and other computers referenced in this disclosure is familiar to the artisan. In general terms, each includes one or more processors, one or more memories (e.g., RAM), storage (e.g., a disk or flash memory), a user interface (which may include, e.g., a keypad, a TFT LCD or OLED display screen, touch or other gesture sensors, a camera or other optical sensor, a compass sensor, a 3D magnetometer, a 3-axis accelerometer, a microphone, etc., together with software instructions for providing a graphical user interface), interconnections between these elements (e.g., buses), and an interface for communicating with other devices (which may be wireless, such as GSM, CDMA, W-CDMA, CDMA2000, TDMA, EV-DO, HSDPA, WiFi, WiMax, or Bluetooth, and/or wired, such as through an Ethernet local area network, a T-1 internet connection, etc.).

The arrangements detailed in this specification can also be employed in portable monitoring devices such as Personal People Meters (PPMs), which are pager-sized devices that sense ambient media for audience survey purposes (see, e.g., Nielsen patent publication 20090070797 and Arbitron patents 6,871,180 and 7,222,071). The same principles can also be applied to different forms of content that may be provided to a user online. In this regard, see Nielsen patent application 20080320508, which details a network-connected media monitoring device.

While this specification earlier noted its relation to the assignee's previous patent filings, the point bears repeating. These disclosures should be read in concert and construed as a whole. Applicants intend that the features in each be combined with the features in the others. Thus, for example, the signal processing disclosed in applications 12/271,772 and 12/490,980 can be implemented using the architectures and cloud arrangements detailed in the present specification, while the crowd-sourced databases, cover flow user interfaces, and other features detailed in the '772 and '980 applications can be incorporated in implementations of the presently disclosed technologies, and so on. Thus, it should be understood that the methods, elements and concepts disclosed in the present application are to be combined with the methods, elements and concepts detailed in those related applications. While some combinations have been particularly detailed in the present specification, most have not, due to the large number of permutations and combinations. However, implementation of all such combinations is straightforward to the artisan from the provided teachings.

Elements and teachings within the different embodiments disclosed in the present specification are also meant to be exchanged and combined. For example, teachings detailed in the context of FIGS. 1-12 can be used in the arrangements of FIGS. 14-20, and vice versa.

The processes and system components detailed in this specification may be implemented as instructions for computing devices, including general purpose processor instructions for a variety of programmable processors, including microprocessors, graphics processing units (GPUs, such as the nVidia Tegra APX 2600), digital signal processors (e.g., the Texas Instruments TMS320 series devices), etc. These instructions may be implemented as software, firmware, etc. These instructions can also be implemented in various forms of processor circuitry, including programmable logic devices, FPGAs (e.g., the noted Xilinx Virtex series devices), FPOAs (e.g., the noted PicoChip devices), and application-specific circuits, including digital, analog and mixed analog/digital circuitry. Execution of the instructions can be distributed among processors and/or made parallel across processors within a device or across a network of devices. Transformation of content signal data may also be distributed among different processor and memory devices. References to "processors" or "modules" (such as a Fourier transform processor or an FFT module) should be understood to refer to functionality, rather than requiring a particular form of implementation.

References to FFTs should be understood to also include inverse FFTs and related transforms (e.g., DFT, DCT, their respective inverses, etc.).

Software instructions for implementing the detailed functionality can be readily authored by artisans from the descriptions provided herein, e.g., written in C, C++, Visual Basic, Java, Python, Tcl, Perl, Scheme, Ruby, etc. Cell phones and other devices according to certain implementations of the present technology can include software modules for performing the different functions and acts. Known artificial intelligence systems and techniques can be employed to make the inferences, conclusions, and other determinations noted above.

Commonly, each device includes operating system software that provides interfaces to hardware resources and general purpose functions, and also includes application software that can be selectively invoked to perform particular tasks desired by a user. Known browser software, communications software, and media processing software can be adapted for many of the uses detailed herein. Software and hardware configuration data/instructions are commonly stored as instructions in one or more data structures conveyed by tangible media, such as magnetic or optical discs, memory cards, ROM, etc., which may be accessed across a network. Some embodiments may be implemented as embedded systems: special purpose computer systems in which the operating system software and the application software are indistinguishable to the user (as is commonly the case in basic cell phones). The functionality detailed in this specification can be implemented in operating system software, application software and/or embedded system software.

Different parts of the functionality can be implemented on different devices. For example, in a system in which a cell phone communicates with a server at a remote service provider, different tasks can be performed exclusively by one device or the other, or execution can be distributed between the devices. Extraction of eigenvalue data from imagery is but one example of such a task. Thus, it should be understood that the description of an operation as being performed by a particular device (e.g., a cell phone) is not limiting but exemplary; performance of the operation by another device (e.g., a remote server), or shared between devices, is also expressly contemplated. (Moreover, more than two devices may commonly be employed. For example, a service provider may refer some tasks, such as image search, object segmentation, and/or image classification, to servers dedicated to such tasks.)

(In like fashion, the description of data as being stored on a particular device is also exemplary; data can be stored anywhere: on a local device, on a remote device, in the cloud, distributed, etc.)

Operations need not be performed exclusively by specifically identifiable hardware. Rather, some operations can be referred out to other services (e.g., cloud computing), which attend to their execution by still further, generally anonymous, systems. Such distributed systems can be large scale (e.g., involving computing resources around the globe) or local (e.g., as when a portable device identifies nearby devices through Bluetooth communication and involves one or more of the nearby devices in a task, such as contributing data from the local geography; see, in this regard, patent 7,254,406 to Beros).

Similarly, while certain functions have been detailed as being performed by certain modules (e.g., control processor module 36, pipe manager 51, the query router and response manager of FIG. 7, etc.), in other implementations such functions can be performed by other modules, or by application software (or dispensed with altogether).

The reader will note that certain discussions contemplate arrangements in which most image processing is performed on the cell phone. In such arrangements, external resources are used more as sources of data (e.g., Google) than for image processing tasks. Such arrangements can naturally also be practiced using the principles discussed in other sections, in which some or all of the hardcore crunching of image-related data is referred out to external processors (service providers).

Likewise, while this disclosure has detailed a particular ordering of acts and particular combinations of elements in the illustrative embodiments, it will be recognized that other contemplated methods may re-order the acts, and other contemplated combinations may omit some elements and add others.

Although disclosed as complete systems, sub-combinations of the detailed arrangements are also separately contemplated.

Reference was made to the internet in the illustrative embodiments. In other embodiments, other networks, including private networks of computers, can be employed as well, or instead.

The reader will note that different names are sometimes used when referring to similar or identical components, processes, etc. This is due, in part, to the development of this patent specification over the course of nearly a year, with terminology that shifted over time. Thus, for example, a "visual query packet" and a "keyvector" can both refer to the same thing. The same holds for other terms.

In some modes, cell phones employing aspects of the present technology may be regarded as observational state machines.

While detailed primarily in the context of systems that perform image capture and processing, corresponding arrangements are equally applicable to systems that capture and process audio, or that capture and process both imagery and audio.

Some processing modules in an audio-based system may naturally be different. For example, audio processing commonly relies on critical band sampling (per the human auditory system). Cepstrum processing (a DCT of a power spectrum) is also frequently used.

An exemplary processing chain may include a band-pass filter that filters audio captured by a microphone in order to remove low and high frequencies, e.g., leaving the band 300-3000 Hz. A decimation stage may follow (e.g., reducing the sample rate from 40K samples/second to 6K samples/second). An FFT can then follow. Power spectrum data can be computed by squaring the output coefficients from the FFT (these may be grouped to effect critical band segmentation). A DCT may then be performed to yield cepstrum data. Outputs from any of these stages may be sent to the cloud for application processing, such as speech recognition, language translation, anonymization (returning the same vocalizations in a different voice), etc. Remote systems can also respond to commands spoken by a user and captured by a microphone, e.g., to control other systems, to supply information for use by another process, etc.
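The later stages of this chain (transform, power spectrum, band grouping, DCT) can be sketched in a few lines. The sketch below is illustrative only and makes several assumptions not found in the specification: it starts from a frame that has already been band-pass filtered and decimated to 6K samples/second; it uses a naive DFT in place of an FFT so that it needs no external library; it groups bins into equal-width bands as a stand-in for true critical bands; and it takes the logarithm of the band energies before the DCT, as is conventional for cepstra. All function names are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) DFT of a real frame (non-negative frequencies only); a real system would use an FFT."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


def power_spectrum(frame):
    """Squared magnitude of each transform coefficient."""
    return [abs(c) ** 2 for c in dft(frame)]


def band_energies(power, sample_rate, n_samples, lo_hz=300.0, hi_hz=3000.0, n_bands=9):
    """Sum power over equal-width bands between lo_hz and hi_hz (a stand-in for critical bands)."""
    lo_bin = round(lo_hz * n_samples / sample_rate)
    hi_bin = round(hi_hz * n_samples / sample_rate)
    width = (hi_bin - lo_bin) / n_bands
    edges = [lo_bin + round(i * width) for i in range(n_bands + 1)]
    return [sum(power[a:b]) for a, b in zip(edges, edges[1:])]


def dct2(values):
    """Unnormalized DCT-II."""
    n = len(values)
    return [sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values))
            for k in range(n)]


def cepstrum(frame, sample_rate):
    """Power spectrum -> band energies -> log -> DCT."""
    bands = band_energies(power_spectrum(frame), sample_rate, len(frame))
    # The small offset avoids log(0) for empty bands.
    return dct2([math.log(e + 1e-12) for e in bands])
```

For a 240-sample frame at 6K samples/second, each DFT bin spans 25 Hz, so a 1000 Hz tone falls in bin 40. The output of any stage (the power spectrum, the band energies, or the cepstrum) could be the data sent to the cloud.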

It will be recognized that the detailed processing of content signals includes the transformation of these signals in various physical forms. Images and video (forms of electromagnetic waves traveling through physical space and depicting physical objects) may be captured from physical objects using cameras or other capture equipment, or generated by a computing device. Similarly, audio pressure waves traveling through a physical medium may be captured using an audio transducer (e.g., a microphone) and converted to an electronic signal (in digital or analog form). While these signals are typically processed in electronic and digital form to implement the components and processes described above, they may also be captured, processed, transferred and stored in other physical forms, including electronic, optic, magnetic and electromagnetic wave forms. The content signals are transformed in various ways and for various purposes during processing, producing various data structure representations of the signals and related information. In turn, the data structure signals in memory are transformed for manipulation during searching, sorting, reading, writing and retrieval. The signals are also transformed for capture, transfer, storage, and output via a display or audio transducer (e.g., speakers).

In some embodiments, an appropriate response to captured imagery may be determined by reference to data stored in the device, without reference to any external resource. (The registry database used in many operating systems is one place where response-related data for certain inputs can be specified.) Alternatively, the information can be sent to a remote system, so that the remote system can determine the response.

The figures not particularly identified above show aspects of the illustrative embodiments or details of the disclosed technology.

The information sent from the device may be raw pixels, or an image in compressed form, or a transformed counterpart to an image, or features/metrics extracted from image data, etc. All may be regarded as image data. The receiving system can recognize the data type itself, or the type can be expressly identified to the receiving system (e.g., bitmap, eigenvectors, Fourier-Mellin transform data, etc.), and that system can use the data type as one of the inputs in deciding how to process the data.
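One way a receiving system might use the data type as an input to its processing decision is a simple dispatch table, sketched below. Everything here is a hypothetical illustration: the type labels, the route descriptions in `ROUTES`, and the functions `sniff_type` and `choose_processing` are assumptions. The only external facts relied on are the well-known leading bytes of BMP files (`BM`) and JPEG files (`FF D8 FF`), used to show a type being recognized rather than declared.

```python
def sniff_type(payload: bytes):
    """Recognize a few container formats from their leading bytes; return None if unknown."""
    if payload.startswith(b"BM"):
        return "bitmap"
    if payload.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    return None


# Hypothetical mapping from data type to the processing the receiver would apply.
ROUTES = {
    "bitmap": "decode pixels, then extract features",
    "jpeg": "decompress, then extract features",
    "eigenvectors": "match against stored eigenvector models",
    "fourier-mellin": "match against transform-domain templates",
}


def choose_processing(payload: bytes, declared_type=None) -> str:
    """Prefer an expressly declared type; otherwise try to recognize the type from the data."""
    data_type = declared_type or sniff_type(payload)
    if data_type not in ROUTES:
        raise ValueError(f"unrecognized image data type: {data_type!r}")
    return ROUTES[data_type]
```

Extracted features such as eigenvectors carry no self-describing header in this sketch, so they must be expressly identified by the sender, which mirrors the two alternatives described in the text.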

If the transmitted data is full image data (raw or in compressed form), then there will be essentially no duplication among the packets received by the processing system, since essentially every picture is somewhat different. However, if the originating device performs processing on the full image to extract features or metrics, etc., then a receiving system may sometimes receive a packet identical (or nearly so) to one it encountered earlier. In this case, the response for that "snap packet" (also termed a "pixel packet" or "keyvector") may be recalled from a cache, rather than being determined anew. (The response information may be modified in accordance with user preference information, if available and applicable.)
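The cache lookup described above can be sketched as follows. This is a minimal illustration, assuming a keyvector is a short list of numeric features; the class name `ResponseCache`, the `compute` callback, and the quantization step are all hypothetical. Quantizing each feature before using it as a cache key is one simple way to treat nearly identical packets as the same packet, and is an assumption of this sketch rather than a method stated in the specification.

```python
class ResponseCache:
    """Recall responses for keyvectors that match, or nearly match, ones seen before."""

    def __init__(self, compute, step=0.05):
        self._compute = compute  # callback that determines a response anew
        self._step = step        # quantization step that defines "nearly identical"
        self._store = {}
        self.misses = 0          # number of times a response had to be computed

    def _key(self, keyvector):
        # Round each feature to the nearest multiple of the step.
        return tuple(round(v / self._step) for v in keyvector)

    def respond(self, keyvector):
        key = self._key(keyvector)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._compute(keyvector)
        return self._store[key]
```

Quantization is crude: two keyvectors that straddle a rounding boundary will miss each other even if they are close, so a production system would more likely use a proper nearest-neighbor search. Any per-user modification of the response would be applied after the cached value is recalled.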

In certain embodiments it may be desirable for a capture device to include some form of biometric authentication, such as a fingerprint reader integrated with the shutter button, to assure that a known user is operating the device.

Some embodiments can capture several images of a subject from different perspectives (e.g., a video clip). Algorithms can then be applied to synthesize a 3D model of the imaged subject matter. From such a model, new views of the subject may be derived: views that may be more suitable as stimuli to the detailed processes (e.g., views that avoid an occluding foreground object).

In embodiments using textual descriptors, it is sometimes desirable to augment the descriptors with synonyms, hyponyms (more specific terms) and/or hypernyms (more general terms). These can be obtained from a variety of sources, including the WordNet database compiled by Princeton University.
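Descriptor augmentation of this kind can be sketched with a small lookup table. The `LEXICON` below is a hand-made, hypothetical stand-in for a lexical database such as WordNet (it does not use WordNet's actual data or API), and the function name `augment` is likewise an assumption.

```python
# Tiny hypothetical lexicon standing in for a real lexical database.
LEXICON = {
    "car": {
        "synonyms": ["automobile"],
        "hyponyms": ["sedan", "hatchback"],
        "hypernyms": ["vehicle"],
    },
    "trail": {
        "synonyms": ["path"],
        "hyponyms": ["footpath"],
        "hypernyms": ["route"],
    },
}


def augment(descriptors, lexicon=LEXICON,
            relations=("synonyms", "hyponyms", "hypernyms")):
    """Return the descriptors followed by related terms, without duplicates, in stable order."""
    out = list(dict.fromkeys(descriptors))  # de-duplicate while keeping order
    seen = set(out)
    for descriptor in descriptors:
        entry = lexicon.get(descriptor, {})
        for relation in relations:
            for term in entry.get(relation, []):
                if term not in seen:
                    seen.add(term)
                    out.append(term)
    return out
```

The `relations` parameter lets a caller widen a query with hypernyms only, or narrow it with hyponyms only, depending on whether recall or precision matters more.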

Although most of the embodiments described above are in the context of a cell phone that submits image data to a service provider, triggering a corresponding response, the technology is more generally applicable, wherever processing of imagery or other content occurs.

The focus of this disclosure has been on imagery. But the techniques are also useful with audio and video. The detailed technology is particularly useful with User Generated Content (UGC) sites, such as YouTube. Videos are often uploaded with little or no metadata. Various techniques are applied to identify them, with differing degrees of uncertainty (e.g., reading watermarks, calculating fingerprints, human reviewers, etc.), and this identification metadata is stored. Further metadata is accumulated based on the profiles of users who view the video. Still further metadata can be harvested from later user comments posted about the video. (UGC-related arrangements in which applicants intend the present technology to be included are detailed in published patent applications 20080208849 and 20080228733 (Digimarc), 20080165960 (TagStory), 20080162228 (Trivid), 20080178302 and 20080059211 (Attributor), 20080109369 (Google), 20080249961 (Nielsen), and 20080209502 (MovieLabs).) By arrangements like those detailed herein, appropriate ad/content pairings can be gleaned, and other enhancements to the users' experience can be offered.

Similarly, the technology can be used with audio captured by user devices, and with recognition of captured speech. Information gleaned from any of the captured information (e.g., OCR'd text, decoded watermark data, recognized speech) can be used as metadata for the purposes detailed herein.

Multi-media applications of this technology are also contemplated. For example, an image may be pattern-matched or GPS-matched to identify a set of similar images in Flickr. Metadata descriptors can be collected from that set of similar images and used to query metadata that includes audio and/or video. Thus, a user who captures and submits an image of a trail marker on the Appalachian Trail (FIG. 38) may trigger the download of an audio track from Aaron Copland's "Appalachian Spring" orchestral suite to the user's cell phone or home entertainment system. (Regarding sending content to different destinations that may be associated with a user, see, e.g., patent publication 20070195987.)

Repeated reference has been made to GPS data. This should be understood as shorthand for any location-related information; it need not be derived from the Global Positioning System constellation of satellites. For example, another technology suitable for generating location data relies on radio signals that are commonly exchanged between devices (e.g., WiFi, cellular, etc.). Given several communicating devices, the signals themselves, and the imperfect digital clock signals that control them, form a reference system from which both highly accurate time and position can be extracted. Such technology is detailed in published international patent publication WO08/073347. The artisan will be familiar with several other location-estimating techniques, including those based on time-of-arrival techniques, and those based on the locations of broadcast radio and television towers (as offered by Rosum) and WiFi nodes (as offered by Skyhook Wireless, and employed in the iPhone), etc.

While geolocation data commonly comprises latitude and longitude data, it may alternatively comprise more, less, or different data. For example, it may include orientation information, such as the compass direction provided by a magnetometer, or inclination information provided by gyroscopic or other sensors. It may also include elevation information, such as that provided by digital altimeter systems.
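By way of illustration only, a location record of the kind just described might be sketched as follows. The class name, field names, and units are assumptions made for this example and do not appear in the specification; the point is simply that latitude and longitude can be accompanied by optional orientation, inclination, and elevation data.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class GeoLocation:
    """Hypothetical location record; all field names are illustrative."""
    latitude: float                      # degrees
    longitude: float                     # degrees
    heading_deg: Optional[float] = None  # compass direction, e.g. from a magnetometer
    tilt_deg: Optional[float] = None     # inclination, e.g. from a gyroscopic sensor
    altitude_m: Optional[float] = None   # elevation, e.g. from a digital altimeter

    def as_metadata(self) -> dict:
        """Return only the fields that were actually measured."""
        return {k: v for k, v in asdict(self).items() if v is not None}


# A fix that carries a compass heading but no tilt or altitude reading.
fix = GeoLocation(latitude=45.52, longitude=-122.68, heading_deg=270.0)
print(fix.as_metadata())
```

Leaving unmeasured fields out of the metadata, rather than sending placeholder values, is one possible design choice; a consumer can then distinguish "not measured" from "measured as zero".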

Reference has been made to Apple's Bonjour software. Bonjour is Apple's implementation of Zeroconf, a service discovery protocol. Bonjour locates devices on a local network, and identifies the services that each offers, using multicast Domain Name System service records. This software is built into the Apple Mac OS X operating system, and is also included in the Apple "Remote" application for the iPhone, where it is used to establish connections to iTunes libraries via WiFi. Bonjour services are implemented largely at the application level, using standard TCP/IP calls, rather than in the operating system. Apple has made the source code of the Bonjour multicast DNS responder, the core component of service discovery, available as a Darwin open source project. The project provides source code to build the responder daemon for a wide range of platforms, including Mac OS X, Linux, *BSD, Solaris, and Windows. In addition, Apple provides a user-installable set of services called Bonjour for Windows, as well as Java libraries. Bonjour can be used in various embodiments of the present technology that involve communications between devices and systems.

(Other software can alternatively, or additionally, be used to exchange data between devices. Examples include Universal Plug and Play (UPnP) and its successor, Devices Profile for Web Services (DPWS). These are other protocols implementing zero-configuration networking services, through which devices can connect, identify themselves, advertise available capabilities to other devices, share content, etc.)

As noted earlier, artificial intelligence techniques can play an important role in embodiments of the present technology. A recent entrant into the field is the Wolfram Alpha product from Wolfram Research. Alpha computes answers and visualizations responsive to structured input, by reference to a knowledge base of curated data. Information gleaned from metadata analysis or semantic search engines, as detailed herein, can be presented to the Wolfram Alpha product to provide responsive information back to the user. In some embodiments, the user is involved in this submission of information, such as by structuring a query from terms and other primitives gleaned by the system, by selecting from a menu of different queries composed by the system, etc. Additionally, or alternatively, responsive information from the Alpha system can be provided as input to other systems, such as Google, to identify further responsive information. Wolfram's patent publications 20080066052 and 20080250347 further detail aspects of the technology.

Another recent technical introduction is Google Voice (based on an earlier venture's GrandCentral product), which offers a number of improvements over traditional telephone systems. Such features can be used in conjunction with certain aspects of the present technology.

For example, the voice-to-text transcription services offered by Google Voice can be employed to capture ambient audio from the speaker's environment, using the microphone in the user's cell phone, and to generate corresponding digital data (e.g., ASCII information). The system can submit such data to services such as Google or Wolfram Alpha to obtain related information, which the system can then provide back to the user, either by a screen display or by voice. Similarly, the speech recognition afforded by Google Voice can be used to provide a conversational user interface to cell phone devices, by which features of the technology detailed herein can be selectively invoked and controlled by spoken words.

In another aspect, when a user captures content (audio or visual) with a cell phone device, and a system employing the presently disclosed technology returns a response, the response information can be converted from text to speech and delivered to the user's voicemail account in Google Voice. The user can access this data repository from any phone, or from any computer. The stored voice mail can be reviewed in its audible form, or the user can elect instead to review a textual counterpart, e.g., presented on a cell phone or computer screen.

(Aspects of the Google Voice technology are detailed in patent application 20080259918.)

More than a century of history has accustomed users to think of phones as communication devices that receive audio at point A and deliver that audio at point B. However, aspects of the present technology can be employed to very different effect. Audio-in, audio-out is becoming a dated paradigm. In accordance with certain aspects of the present technology, phones are also communication devices that receive imagery (or other stimuli) at point A, leading to delivery of text, voice, data, imagery, video, smell, or other sensory experience at point B.

Instead of using the presently detailed technology as a query device, in which a single phone serves as both input and output, a user can direct that content responsive to the query be delivered to one or several destination systems, which may or may not include the originating phone. (The recipient(s) can be selected by known UI techniques, including keypad entry, scrolling through a menu of recipients, voice recognition, etc.)

A simple illustration of this usage model is a person who uses a cell phone to capture a picture of a rose plant in bloom. Responsive to the user's instruction, the picture, augmented by a synthesized smell of that particular variety of rose, is delivered to the user's girlfriend. (Arrangements for equipping computer devices to disperse programmable scents are known, e.g., the iSmell offering by Digiscents, and the technologies detailed in patent documents 20080147515, 20080049960, 20060067859, WO00/15268 and WO00/15269.) A stimulus captured by one user at one location can lead to delivery of a different but related experiential stimulus to a different user at a different location.

As noted, a response to visual stimuli can include one or more graphical overlays (baubles) presented on the cell phone screen, atop image data from the cell phone camera. The overlay can be geometrically registered with features in the image data, and be affine-distorted in correspondence with the affine distortion of an object depicted in the image. Graphical features can be used to draw attention to a bauble, such as sparks emitted from the region, or a flashing/moving visual effect. Such technology is further detailed, e.g., in Digimarc's patent publication 20080300011.
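The affine registration of an overlay can be illustrated with a short sketch. It assumes, purely for the example, that the pose of the depicted object has already been estimated as a 2x3 affine matrix; the function name and the sample matrix are hypothetical. The same matrix is applied to the corners of the overlay so that the bauble is distorted in correspondence with the object.

```python
from typing import List, Tuple

Point = Tuple[float, float]
# 2x3 affine matrix: ((a, b, tx), (c, d, ty))
Affine = Tuple[Tuple[float, float, float], Tuple[float, float, float]]


def apply_affine(m: Affine, points: List[Point]) -> List[Point]:
    """Map each (x, y) to (a*x + b*y + tx, c*x + d*y + ty)."""
    (a, b, tx), (c, d, ty) = m
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


# Unit-square overlay, and an assumed object pose: stretch in x, shear, translate.
overlay_corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
object_pose: Affine = ((2.0, 0.5, 10.0), (0.0, 1.0, 20.0))

print(apply_affine(object_pose, overlay_corners))
# [(10.0, 20.0), (12.0, 20.0), (12.5, 21.0), (10.5, 21.0)]
```

In a real system the pose matrix would come from whatever registration process the embodiment uses; here it is simply supplied by hand.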

Such a graphic overlay can include menu features, with which a user can interact to perform desired functions. In addition, or alternatively, the overlay can include one or more graphical user controls. For example, several different objects may be recognized within the camera's field of view. Overlaid in association with each may be a graphic, which can be touched by the user to obtain information, or trigger a function, related to that respective object. The overlays may be regarded as visual flags: they draw attention to the availability of information that may be accessed through user interaction with such graphic features, e.g., by the user tapping on that location of the screen, or circling that region with a finger or stylus, etc. As the user changes the camera's perspective, different baubles may appear, tracking the movement of different objects in the underlying real-world imagery, and inviting the user to explore associated auxiliary information. Again, the overlays are desirably orthographically correct, with affine-correct projection onto the associated real-world features. (Pose estimation of subjects as imaged in the real world, from which the appropriate spatial registration of the overlays is determined, is desirably performed locally, but may be referred to the cloud depending on the application.)

Objects can be recognized, tracked, and fed back by the operations detailed above. For example, a local processor can perform object parsing and initial object recognition (e.g., inventorying proto-objects). Cloud processes can complete the recognition operations and serve up appropriate interactive portals that are orthographically registered onto the display scene (the registration may be performed by the local processor or by the cloud).

In some aspects, it will be recognized that the present technology can act as a graphical user interface, on a cell phone, to the real world.

In early implementations, general-purpose visual query systems of the sort described will be relatively crude and not very impressive. However, by feeding a trickle (or torrent) of keyvector data back to the cloud for archiving and analysis (together with information about user actions based on such data), those early systems can establish the data foundation from which templates and other training models can be built, enabling subsequent generations of such systems to be highly intuitive and responsive when presented with visual stimuli. (This trickle can be provided by a subroutine on the local device that occasionally grabs small bits of information about how the user is working with the device, what works, what doesn't, what selections the user makes based on which stimuli, the stimuli involved, etc., and feeds them to the cloud.)

Reference has been made to touchscreen interfaces, a form of gesture interface. Another form of gesture interface that can be used in certain embodiments operates by sensing movement of the cell phone, by tracking movement of features within captured imagery. Further information on such gestural interfaces is detailed in Digimarc's patent 6,947,571.
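A minimal sketch of such a movement-sensing interface follows. It is not the method of the cited patent; it merely assumes that feature points have already been matched between two frames, averages their displacement, and maps the result to a gesture label. The function names, the threshold, and the gesture labels are all hypothetical. Image coordinates are assumed to have y increasing downward, so scene features drifting right suggest the phone moved left.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def estimate_shift(prev: List[Point], curr: List[Point]) -> Tuple[float, float]:
    """Mean displacement (dx, dy) of matched feature points between two frames."""
    if not prev or len(prev) != len(curr):
        raise ValueError("need equal-length, non-empty lists of matched points")
    n = len(prev)
    dx = sum(c[0] - p[0] for p, c in zip(prev, curr)) / n
    dy = sum(c[1] - p[1] for p, c in zip(prev, curr)) / n
    return dx, dy


def classify_gesture(dx: float, dy: float, threshold: float = 5.0) -> str:
    """Map image-feature motion to an (assumed) phone-motion gesture label."""
    if abs(dx) < threshold and abs(dy) < threshold:
        return "none"
    if abs(dx) >= abs(dy):
        # Features drifting right imply the camera panned left, and vice versa.
        return "pan_left" if dx > 0 else "pan_right"
    # Features drifting down (y grows downward) imply the camera tilted up.
    return "tilt_up" if dy > 0 else "tilt_down"


prev_pts = [(10.0, 10.0), (20.0, 30.0)]
curr_pts = [(18.0, 11.0), (28.0, 31.0)]
print(classify_gesture(*estimate_shift(prev_pts, curr_pts)))  # pan_left
```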

Watermark decoding can be used in certain embodiments. Technology for encoding/decoding watermarks is detailed, e.g., in Digimarc's patents 6,614,914 and 6,122,403; in Nielsen's patents 6,968,564 and 7,006,555; and in Arbitron's patents 5,450,490, 5,764,763, 6,862,355, and 6,845,360.

Digimarc has various other patent filings relevant to the present subject matter. See, e.g., patent publications 20070156726, 20080049971, and 20070266252, and pending application 12/125,840 by Sharma et al., filed May 22, 2008.

Google's book-scanning patent, 7,508,978, details some principles useful in the present context. For example, the '978 patent discloses that, by projecting a reference pattern onto a non-planar surface, the topology of the surface can be discerned. Imagery captured from this surface can then be processed to renormalize it, so that it appears to originate from a flat page. Such renormalization can also be used in connection with the object recognition arrangements detailed herein. Similarly, Google's patent application 20080271080, detailing its visions for interacting with next-generation television, also details principles useful in connection with the presently detailed technologies.

Examples of audio fingerprinting are detailed in patent publications 20070250716, 20070174059 and 20080300011 (Digimarc), 20080276265, 20070274537 and 20050232411 (Nielsen), 20070124756 (Google), 7,516,074 (Auditude), and 6,990,453 and 7,359,889 (both Shazam). Examples of image/video fingerprinting are detailed in patent publications 7,020,304 (Digimarc), 7,486,827 (Seiko-Epson), 20070253594 (Vobile), 20080317278 (Thomson), and 20020044659 (NEC).

Although certain aspects of the detailed technology involve processing a large number of images to collect information, it will be recognized that related results can be obtained by having a large number of people (and/or automated processes) consider a single image (e.g., crowd-sourcing). Still greater information and utility can be achieved by combining these two general approaches.

The illustrations are meant to be exemplary and not limiting. For example, they sometimes show multiple databases where a single one could be used (and vice versa). Likewise, some links between the depicted blocks are not shown, for clarity's sake.

Contextual data can be used throughout the detailed embodiments to further enhance operation. For example, a process may depend on whether the originating device is a cell phone or a desktop computer; whether the ambient temperature is 30 or 80 degrees; the location of, and other information characterizing, the user; etc.

While the detailed embodiments often present candidate results/actions as a series of cached displays on the cell phone screen, between which the user can rapidly switch, in other embodiments this need not be the case. A more traditional single-screen presentation, giving a menu of results, can be used, and the user can press a keypad digit, or highlight a desired option, to make a selection. Or bandwidth may increase sufficiently that the same user experience can be provided without locally caching or buffering data, with the data instead delivered to the cell phone as needed.

Geographically-based database methods are detailed, e.g., in Digimarc's patent publication 20030110185. Other arrangements for navigating through image collections, and performing searches, are shown in patent publications 20080010276 (Executive Development Corp.) and 20060195475, 20070110338, 20080027985, 20080028341 (Microsoft's Photosynth work).

It is impossible to expressly catalog the myriad variations and combinations of the technology described herein. Applicants recognize and intend that the concepts of this specification can be combined, substituted and interchanged, both among and between themselves, as well as with those known from the cited prior art. Moreover, it will be recognized that the detailed technology can be included with other technologies, current and upcoming, to advantageous effect.

The reader is presumed to be familiar with the documents (including patent documents) referenced herein. To provide a comprehensive disclosure without unduly lengthening this specification, applicants incorporate by reference the documents referenced above. (Such documents are incorporated in their entireties, even if cited above in connection with specific of their teachings.) These references disclose technologies and teachings that can be incorporated into the arrangements detailed herein, and into which the technologies and teachings detailed herein can be incorporated.

As will be recognized, the present specification has detailed myriad novel arrangements. (Due to practical constraints, many such arrangements are not yet claimed in the original filing of this application, yet applicants intend to claim such other subject matter in subsequent applications claiming priority.) An incomplete sampling of some of the inventive arrangements is reviewed in the following paragraphs:

In one arrangement, stimuli captured by a sensor of a user's mobile device are processed, wherein some processing tasks can be performed on processing hardware in the device, and other processing tasks can be performed on a processor, or plural processors, remote from the device, and wherein a decision regarding whether a first task should be performed on the device hardware or on a remote processor is made in automated fashion, based on consideration of two or more different factors drawn from a set that includes at least: mobile device power considerations; response time needed; routing constraints; state of hardware resources within the mobile device; connectivity status; geographical considerations; risk of pipeline stall; information about the remote processor, including its readiness, processing speed, cost, and attributes not critical to the user of the mobile device; and the relation of the task to other processing tasks; wherein in some circumstances the first task is performed on the device hardware, and in other circumstances the first task is performed on the remote processor. Also, in such an arrangement, the decision is based on a score that depends on a combination of parameters related to at least some of the listed considerations.
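The score-based decision of this arrangement can be sketched as follows. The specification does not give a scoring formula, so everything below is an assumption made for illustration: the `Context` fields, the weights, the linear combination, and the rule that a positive score favors the remote processor. Only a few of the listed factors (power, response time, connectivity, and remote cost) are modeled.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Hypothetical snapshot of factors considered for one task."""
    battery_fraction: float  # mobile device power: 0.0 (empty) to 1.0 (full)
    deadline_ms: float       # response time needed
    connected: bool          # connectivity status
    local_time_ms: float     # estimated execution time on device hardware
    remote_time_ms: float    # estimated remote time, including the round trip
    remote_cost: float       # cost quoted for the remote processor


def remote_score(ctx: Context, w_power: float = 1.0,
                 w_latency: float = 1.0, w_cost: float = 1.0) -> float:
    """Combine several parameters into one score; higher favors remote execution."""
    if not ctx.connected:
        return float("-inf")  # no connectivity: remote execution is impossible
    power_term = w_power * (1.0 - ctx.battery_fraction)
    latency_term = w_latency * (ctx.local_time_ms - ctx.remote_time_ms) / ctx.deadline_ms
    cost_term = w_cost * ctx.remote_cost
    return power_term + latency_term - cost_term


def choose_processor(ctx: Context) -> str:
    return "remote" if remote_score(ctx) > 0 else "device"


low_battery = Context(0.1, 500.0, True, 400.0, 200.0, 0.2)
full_battery = Context(1.0, 500.0, True, 100.0, 300.0, 0.1)
print(choose_processor(low_battery), choose_processor(full_battery))  # remote device
```

The two sample contexts show the same task being placed differently in different circumstances, which is the behavior the arrangement describes.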

In one arrangement, stimuli captured by a sensor of a user's mobile device are processed, wherein some processing tasks can be performed on processing hardware in the device, and other processing tasks can be performed on a processor, or plural processors, remote from the device, and wherein an order in which a set of tasks should be performed is determined based on consideration of two or more different factors drawn from a set that includes at least: mobile device power considerations; response time needed; routing constraints; state of hardware resources within the mobile device; connectivity status; geographical considerations; risk of pipeline stall; information about the remote processor, including its readiness, processing speed, cost, and attributes not critical to the user of the mobile device; and the relation of the task to other processing tasks; wherein in some circumstances the set of tasks is performed in a first order, and in other circumstances the set of tasks is performed in a second, different order. Also, in such an arrangement, the decision is based on a score that depends on a combination of parameters related to at least some of the listed considerations.
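A comparable sketch applies to the ordering of tasks. Again the formula is an assumption, not something taken from the specification: each task is given a hypothetical energy cost and urgency, and the weight placed on energy grows as the battery drains, so the same set of tasks is ordered differently in different circumstances.

```python
from typing import List, Tuple

# (task name, energy cost, urgency); all values are illustrative.
Task = Tuple[str, float, float]


def order_tasks(tasks: List[Task], battery_fraction: float) -> List[str]:
    """Sort tasks by a score that trades urgency against energy cost."""
    w_energy = 1.0 - battery_fraction  # energy matters more as the battery drains

    def score(task: Task) -> float:
        _, energy, urgency = task
        return urgency - w_energy * energy

    return [t[0] for t in sorted(tasks, key=score, reverse=True)]


tasks = [("ocr", 5.0, 3.0), ("barcode", 1.0, 2.0), ("face", 8.0, 4.0)]
print(order_tasks(tasks, battery_fraction=1.0))  # ['face', 'ocr', 'barcode']
print(order_tasks(tasks, battery_fraction=0.2))  # ['barcode', 'ocr', 'face']
```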

In one arrangement, stimuli captured by a sensor of a user's mobile device are processed, wherein some processing tasks can be performed on processing hardware in the device, and other processing tasks can be performed on a processor, or plural processors, remote from the device, wherein packets are employed to convey data between processing tasks, and the contents of the packets are determined in automated fashion, based on consideration of two or more different factors drawn from a set that includes at least: mobile device power considerations; response time needed; routing constraints; state of hardware resources within the mobile device; connectivity status; geographical considerations; risk of pipeline stall; information about the remote processor, including its readiness, processing speed, cost, and attributes not critical to the user of the mobile device; and the relation of the task to other processing tasks; wherein in some circumstances the packets may include data in a first form, and in other circumstances the packets may include data in a second form. Also, in such an arrangement, the decision is based on a score that depends on a combination of parameters related to at least some of the listed considerations.

In one arrangement, a venue provides data services to users through a network, and the network is configured to deter use of electronic imaging by users while in the venue. Also, in such an arrangement, the deterrence is effected by restricting transmission of data from user devices to certain data processing providers external to the network.

In one arrangement, a mobile communications device with image capture capability includes a pipelined processing chain for performing a first operation, and a control system that has a mode in which it tests image data by performing a second operation. The second operation is computationally simpler than the first operation, and the control system applies image data to the pipelined processing chain only if the second operation produces an output of a first type.
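By way of example only, the gating described above can be sketched as a cheap screening test that decides whether the costly chain runs at all. The function names and the contrast test are hypothetical stand-ins chosen for brevity, not operations named in the specification.

```python
def has_contrast(pixels, min_range=32):
    """Cheap second operation: is there enough dynamic range to bother?"""
    return max(pixels) - min(pixels) >= min_range


def gated_pipeline(pixels, cheap_test, expensive_chain):
    """Apply the expensive chain only if the cheap test passes.

    Returns the chain's result, or None when the image is screened out.
    """
    if cheap_test(pixels):
        return expensive_chain(pixels)
    return None
```

The design point is that the expensive chain is never invoked for images the simple test rejects, which saves both time and power on the device.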

In one arrangement, a mobile phone is equipped with a GPU to facilitate rendering of graphics for display on the mobile phone screen, for example for gaming, and the GPU is also employed for machine vision purposes. Also in such an arrangement, the machine vision purposes include face detection.

In one arrangement, plural socially-affiliated mobile devices, maintained by different individuals, cooperate in performing a machine vision operation. Also in such an arrangement, a first of the devices performs an operation to extract facial features from an image, and a second of the devices performs template matching on the extracted facial features produced by the first device.

In one arrangement, a voice recognition operation is performed on audio from an incoming video or phone call to identify the caller. Also in such an arrangement, the recognition operation is performed only if the incoming call is not identified by CallerID data. Also in such an arrangement, the voice recognition operation includes reference to data corresponding to one or more previously stored voice messages.

In one arrangement, speech from an incoming video or phone call is recognized, and text data corresponding thereto is generated as the call is processed. Also in such an arrangement, the incoming call may be associated with a particular geographic region, and that region may be taken into account in recognizing the speech. Also in such an arrangement, the text data is used to query a data structure for auxiliary information.

One arrangement is for migrating overlay baubles onto a mobile device screen, the baubles being derived from both local and cloud processing. Also in such an arrangement, the overlay baubles are tuned in accordance with user preference information.

In one arrangement, visual query data is processed in distributed fashion between a user's mobile device and cloud resources to generate a response, and related information is archived in the cloud and processed so that subsequent visual query data can generate a more intuitive response.

In one arrangement, a user may (1) be charged by a vendor for a data processing service, or alternatively (2) be provided the service free of charge, or even receive credit from the vendor, if the user takes certain action in relation thereto.

In one arrangement, a user receives a commercial benefit in exchange for being presented with promotional content, as sensed by a mobile device carried by the user.

In one arrangement, a first user allows a second party to expend credits of the first user, or to incur expenses to be borne by the first user, by virtue of a social networking connection between the first user and the second party. Also in such an arrangement, a social networking web page is configured for the second party to interact with when expending such credits or incurring such expenses.

In one arrangement for charitable fundraising, a user interacts with a physical object associated with a charitable organization to trigger a computer-related process that facilitates a user donation to the charity.

In one arrangement, a portable device receives input from one or more physical sensors, employs processing by one or more local services, and also employs processing by one or more remote services. Software in the device includes one or more abstraction layers through which the sensors, local services, and remote services are interfaced to the device architecture, facilitating operation.

In one arrangement, a portable device receives input from one or more physical sensors, processes the input and packages the result into keyvector form, and transmits the keyvector form from the device. Also in such an arrangement, the device receives back, from the remote resource to which the keyvector was transmitted, another processed copy of the keyvector. Also in such an arrangement, the keyvector form is processed, on the portable device or on a remote device, in accordance with one or more instructions that are implied in accordance with context.

In one distributed processing architecture for responding to physical stimuli sensed by a mobile phone, the architecture employs local processes on the mobile phone and remote processes on remote computers, the two being linked by a packet network and an inter-process communication construct. The architecture also includes a protocol by which different processes can communicate, the protocol including a message passing paradigm with either a message queue or a collision handling arrangement. Also in such an arrangement, driver software for one or more physical sensor components provides sensor data in packet form and places the packet on an output queue, which is either uniquely associated with that sensor or commonly associated with plural components. Local processes operate on the packets and place the resulting packets back on the queue, unless a packet is to be processed remotely, in which case it is directed to a remote process by a router arrangement.
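As an illustrative sketch only, the queue-and-router behaviour described above might look as follows. The packet layout (a dict with `ops`, `data`, and an optional `remote` flag) and all function names are assumptions made for this example; the specification does not define a packet format at this level.

```python
from collections import deque


def run_queue(packets, local_handlers, send_remote):
    """Drain a packet queue, processing locally or routing remotely.

    packets: iterable of dicts with keys "ops" (list of operation names),
             "data", and optionally "remote" (True to route off-device).
    local_handlers: dict mapping operation name to a local function.
    send_remote: callable standing in for the router arrangement.
    Returns the list of packets whose local processing is complete.
    """
    queue = deque(packets)
    finished = []
    while queue:
        pkt = queue.popleft()
        if pkt.get("remote"):
            send_remote(pkt)  # the router directs it to remote processing
        elif pkt["ops"]:
            op = pkt["ops"][0]
            result = local_handlers[op](pkt["data"])
            # The resulting packet is placed back on the queue.
            queue.append({**pkt, "ops": pkt["ops"][1:], "data": result})
        else:
            finished.append(pkt)
    return finished
```

For instance, a packet carrying `ops=["double", "inc"]` and `data=3` passes through the queue twice before finishing, while a packet flagged `remote` is handed to `send_remote` untouched.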

In one arrangement, a network associated with a particular physical venue is adapted to automatically discern, by reference to traffic on the network, whether a set of visitors to the venue have a social connection. Also, such an arrangement includes discerning a demographic characteristic of the group. Also in such an arrangement, the network facilitates ad hoc networking among visitors who are discerned to have a social connection.

In one arrangement, a network including computer resources at a public venue is dynamically reconfigured in accordance with a predictive model of the behavior of users visiting the venue. Also in such an arrangement, the network reconfiguration is based in part on context. Also in such an arrangement, the network reconfiguration includes caching certain content. Also in such an arrangement, the reconfiguration includes rendering synthesized content and storing it in one or more of the computer resources so that it is more rapidly available. Also, such an arrangement includes throttling back time-insensitive network traffic in anticipation of a temporary surge in traffic from users.

In one arrangement, advertising is associated with real-world content, and charges therefor are assessed based on surveys of exposure to that content, as indicated by sensors in users' mobile phones. Also in such an arrangement, the charges are set through use of an automated auction arrangement.

In one arrangement involving two subjects at a public venue, illumination on the subjects is charged differently, based on an attribute of a person in proximity to the subjects.

In one arrangement, content is presented to persons at a public venue, a link exists between the presented content and auxiliary content, and the linked auxiliary content is charged in accordance with a demographic attribute of the person to whom the content is presented.

In one arrangement, a temporary electronic license to certain content is granted to a person in connection with the person's visit to a public venue.

In one arrangement, a mobile phone includes an image sensor coupled to both a human visual system processing portion and a machine vision processing portion, the image sensor being coupled to the machine vision processing portion without passing through the human visual system processing portion. Also in such an arrangement, the human visual system processing portion includes a white balance correction module, a gamma correction module, an edge enhancement module, and/or a JPEG compression module. Also in such an arrangement, the machine vision processing portion includes an FFT module, an edge detection module, a pattern extraction module, a Fourier-Mellin processing module, a texture classifier module, a color histogram module, a motion detection module, and/or a feature recognition module.

In one arrangement, a mobile phone includes an image sensor and plural stages for processing image-related data, and a data-driven packet architecture is employed. Also in such an arrangement, information in the header of a packet determines parameters to be applied by the image sensor when initially capturing image data. Also in such an arrangement, information in the header of a packet determines the processing to be performed by the plural stages on image-related data conveyed in the body of the packet.
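A minimal sketch of such a data-driven packet follows, assuming a hypothetical header with a `capture` entry (sensor parameters) and a `stages` entry (ordered stage names). The stand-in sensor, the stage functions, and the field names are all illustrative assumptions rather than structures taken from the specification.

```python
def edge_strength(row):
    """Absolute difference between neighbouring samples."""
    return [abs(b - a) for a, b in zip(row, row[1:])]


def threshold(row, level=10):
    """Binarise samples against a fixed level."""
    return [1 if v >= level else 0 for v in row]


# Hypothetical registry of processing stages available on the phone.
STAGES = {"edges": edge_strength, "threshold": threshold}


def fake_sensor(capture_params):
    """Stand-in for the image sensor; applies the gain from the header."""
    gain = capture_params.get("gain", 1)
    return [gain * v for v in (0, 0, 5, 20, 20)]


def process(packet):
    """Capture and process as directed by the packet header."""
    header = packet["header"]
    body = fake_sensor(header["capture"])  # header sets capture parameters
    for name in header["stages"]:          # header selects the processing
        body = STAGES[name](body)
    return {"header": header, "body": body}
```

Changing only the header, for example the gain or the list of stages, changes both what the sensor captures and how the stages treat it, without any change to the stages themselves.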

In one arrangement, a mobile phone cooperates with one or more remote processors to perform image-related processing. Also in such an arrangement, the mobile phone packages image-related data into packets, at least some of which comprise less than a single frame of image data. Also in such an arrangement, the mobile phone routes certain image-related data for processing by a processor within the mobile phone, and routes certain image-related data for processing by a remote processor.

In one arrangement, a mobile phone cooperates with a remote routing system, and the remote routing system serves to distribute image-related data from the mobile phone for processing by different remote processors, and to collect processed image-related data from those processors for return to the mobile phone. Also in one arrangement, the mobile phone includes an internal routing system that serves to distribute image-related data to one or more processors internal to the mobile phone for processing, or to the remote routing system for processing by remote processors.

In one arrangement, image-related data from a mobile phone is referred to a remote processor for processing, and the remote processor is selected by an automated assessment involving plural remote processors. Also in such an arrangement, the assessment includes a reverse auction. Also in such an arrangement, output data from the selected remote processor is returned to the mobile phone. Also in such an arrangement, the image-related data is processed by a processing module in the mobile phone before being sent to the selected processor. Also in such an arrangement, other image-related data from the mobile phone is referred to a remote processor different from the selected processor.
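One simple way to picture a reverse auction among remote processors is shown below. The bid fields (`name`, `price`, `latency_ms`), the latency ceiling, and the tie-break rule are assumptions introduced for illustration; the specification does not fix the auction rules.

```python
def reverse_auction(bids, max_latency_ms):
    """Select the lowest-priced bid that meets the latency requirement.

    bids: list of dicts with keys "name", "price", and "latency_ms".
    Ties on price are broken in favour of the lower latency.
    Returns the winning bid, or None if no bid is eligible.
    """
    eligible = [b for b in bids if b["latency_ms"] <= max_latency_ms]
    if not eligible:
        return None
    return min(eligible, key=lambda b: (b["price"], b["latency_ms"]))
```

In a reverse auction the sellers (here, the remote processors) compete by underbidding one another, so the selection takes the minimum price rather than the maximum.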

In one arrangement, image data is stored in at least one plane of a plural-plane data structure, and a graphical representation of metadata related to the image data is stored in another plane of the data structure. Also in such an arrangement, the metadata includes edge map data derived from the image data. Also in such an arrangement, the metadata includes information about faces recognized in the image data.
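The plural-plane idea can be sketched, purely for illustration, as a dict of equally sized 2-D arrays in which one plane holds pixel data and another holds an edge map derived from it. The plane names and the very simple horizontal-difference edge measure are assumptions of this example.

```python
def edge_map(gray):
    """Horizontal edge strength: |pixel to the right - this pixel|.

    The last column has no right-hand neighbour and is left at 0.
    """
    height, width = len(gray), len(gray[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width - 1):
            out[y][x] = abs(gray[y][x + 1] - gray[y][x])
    return out


def build_planes(gray):
    """Bundle the image plane with a metadata plane of the same shape."""
    return {"luminance": gray, "edges": edge_map(gray)}
```

Because every plane shares the same dimensions, metadata at position (x, y) in one plane refers to the pixel at (x, y) in the image plane.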

In one arrangement, a camera-equipped mobile phone displays data indicating at least one of (1) rotation from horizontal; (2) rotation since an initial time; and (3) scale change since an initial time, wherein the displayed data is determined by reference to information from the camera.

In one arrangement, a camera-equipped mobile phone includes first and second parallel processing portions. The first portion processes image data to be rendered in perceptual form for use by human viewers, and includes at least one of a de-mosaic processor, a white balance correction module, a gamma correction module, an edge enhancement module, and a JPEG compression module. The second portion analyzes image data to derive semantic information therefrom.

In one arrangement, two or more dimensions of information related to a subject are presented on a screen of a mobile phone. Operation of a first user interface control presents a sequence of screens providing information related to the subject in a first dimension, and operation of a second user interface control presents a sequence of screens providing information related to the subject in a second dimension. Also in such an arrangement, the subject can be changed by operating a user interface control while a screen presenting that subject is displayed. Also in such an arrangement, the subject is an image, the first dimension is similarity to the image in one of (1) geographic location, (2) appearance, or (3) content-descriptive metadata, and the second dimension is similarity to the image in a different one of (1), (2) or (3).

In one arrangement for composing a text message on a camera-equipped portable device, the device is tilted in a first direction to scroll through a sequence of displayed icons, each representing plural letters of the alphabet, and a selection is then made from among the plural letters. Also, such an arrangement includes tilting the device in a second direction to select from among the plural letters. Also in such an arrangement, the tilting is sensed by reference to image data captured by the camera. Also in such an arrangement, tilts of different characters are ascribed different meanings.

In one arrangement, a camera-equipped mobile phone functions as a state machine, changing an aspect of its functioning based on previously acquired image-related information.

In one arrangement, identification information corresponding to a processor-equipped device is used to identify corresponding application software, which is then used to program operation of a mobile phone device, the programmed mobile phone device serving as a controller for the device. Also in such an arrangement, the device is a thermostat, a parking meter, an alarm clock, or a vehicle. Also in such an arrangement, the mobile phone device captures an image of the device, and the software causes a user interface for the device to be presented as a graphical overlay on the captured image (optionally, the graphical overlay is presented in a position or pose that corresponds to the position or pose of the device in the captured image).

In one arrangement, the screen of a mobile phone presents one or more user interface controls for a separate device, the user interface controls on the screen being presented in combination with a phone-captured image of the separate device. Also in such an arrangement, a user interface control is used to issue an instruction relating to control of the separate device, and the screen signals information corresponding to the instruction in a first fashion while the instruction is pending, and in a second fashion once the instruction has been performed successfully.

In one arrangement, a user interface presented on the screen of a mobile phone is used to initiate a transaction with a separate device while the phone is in physical proximity to the device. The mobile phone is later used for a purpose unrelated to the separate device, and still later the user interface is recalled to the screen of the mobile phone to engage in a further transaction with the device. Also in such an arrangement, the user interface is recalled to engage in the further transaction with the device when the mobile phone is remote from the device. Also in such an arrangement, the device comprises a parking meter, a vehicle, or a thermostat.

In one arrangement, a mobile phone presents a user interface allowing selection between several user interfaces corresponding to different devices, so that the phone can be used in interacting with plural separate devices.

One arrangement includes using a mobile phone to sense information from a housing of a network-connected device and, through use of such information, encrypting information using a key corresponding to the device.

In one arrangement, a mobile phone is used to sense information from a device equipped with a wireless device, and to transmit related information from the mobile phone, wherein the transmitted data serves to confirm user proximity to the device. Also in such an arrangement, such proximity is required before the user is allowed to interact with the device using the mobile phone. Also in such an arrangement, the sensed information is analog information.

In one arrangement employing a portable electronic device and reconfigurable hardware, when the device is initialized to prepare for use, updated configuration instructions for the hardware are wirelessly downloaded from a remote source and used to configure the reconfigurable hardware.

In one arrangement, a hardware processing component of a wireless system base station is employed to process data related to wireless signals exchanged between the base station and plural associated remote wireless devices, and is also employed to process image-related data offloaded by a camera device to the base station for processing. Also in such an arrangement, the hardware processing component comprises one or more field programmable object arrays, and the remote wireless devices comprise mobile phones.

In one arrangement, an optical distortion function is characterized and used to define the geometry of a corresponding virtual correction surface onto which an optically distorted image is projected, the geometry counteracting the distortion of the projected image. Also in one arrangement, an image is projected onto a virtual surface whose topology is shaped to counteract distortion present in the image. Also in such arrangements, the distortion comprises distortion introduced by a lens.

In one arrangement, a wireless station receives a service reservation message from a first mobile device, the message including one or more parameters of a future service that the mobile device requests be made available at a future time rather than immediately. A decision about service provided to a second mobile device is made based at least in part on the service reservation message received from the first mobile device, whereby resource allocation of the wireless station is improved due to advance information about anticipated services to be provided to the first mobile device.

In one arrangement, a thermoelectric cooling device is coupled to an image sensor in a mobile phone, and is selectively activated to reduce noise in captured image data.

In one arrangement, a mobile phone includes first and second wirelessly linked portions. The first portion comprises an optical sensor and lens assembly and is adapted to be carried in a first location relative to the user's body; the second portion comprises a display and a user interface and is adapted to be carried in a second, different location. Also in such an arrangement, the second portion is adapted to be detachably received by the first portion.

In one arrangement, a mobile phone includes first and second wirelessly linked portions. The first portion comprises an LED illumination assembly detachably coupled to the second portion, and the second portion comprises a display, a user interface, an optical sensor, and a lens. The first portion can be detached from the second portion and positioned to illuminate a subject being imaged by the optical sensor of the second portion.

In one arrangement, a camera-equipped mobile phone processes image data through a selection of plural processing stages, wherein the selection of one processing stage depends on an attribute of processed image data output from a previous processing stage.

In one arrangement, conditional branching among different image processing stages is employed in a camera-equipped mobile phone. Also in such an arrangement, the stages are responsive to packet data, and conditional branching instructions are conveyed in the packet data.
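Conditional branching carried in packet data might be sketched as follows, under the assumption that each packet holds a small branch table mapping a stage's boolean outcome to the name of the next stage. The stage names, the table layout, and the tests themselves are hypothetical.

```python
# Hypothetical stages; each returns a boolean outcome for branching.
STAGES = {
    "is_bright": lambda px: sum(px) / len(px) > 128,
    "has_edges": lambda px: any(abs(b - a) > 50 for a, b in zip(px, px[1:])),
}


def run(packet):
    """Follow the branch table carried in the packet.

    packet["start"]: name of the first stage.
    packet["branches"][stage][outcome]: next stage name, or None to stop.
    Returns the list of stages visited, in order.
    """
    visited = []
    step = packet["start"]
    while step is not None:
        visited.append(step)
        outcome = STAGES[step](packet["pixels"])
        step = packet["branches"][step][outcome]
    return visited
```

Here the choice of each stage depends on an attribute of the data as judged by the preceding stage, and the routing logic travels with the packet rather than being fixed in the stages.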

In one arrangement, a GPU in a camera-equipped mobile phone is employed to process image data captured by the camera.

In one arrangement, a camera-equipped mobile device senses temporal variations in illumination and takes such variations into account in its operation. Also in such an arrangement, the camera predicts a future state of the temporally varying illumination and captures an image when the illumination is expected to have a desired state.
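As a hedged illustration, if the illumination varies periodically (for example, many mains-powered lamps flicker at twice the mains frequency), predicting the next brightness peak reduces to simple arithmetic on a measured period and a previously observed peak time. The function name and the assumption of strictly periodic illumination are introduced here and are not taken from the specification.

```python
import math


def next_peak_time(now_s, period_s, last_peak_s):
    """Return the first predicted illumination peak at or after now_s.

    Assumes strictly periodic illumination with a known period and one
    previously observed peak time (both in seconds).
    """
    cycles = math.ceil((now_s - last_peak_s) / period_s)
    return last_peak_s + cycles * period_s
```

A camera could schedule its exposure for the returned time so that the image is captured when the illumination is expected to be in the desired state.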

In one arrangement, a camera-equipped mobile phone is equipped with two or more cameras.

In one arrangement, a mobile phone is equipped with two or more projectors. Also in such an arrangement, the projectors alternately project a pattern on a surface, and the projected patterns are sensed by a camera portion of the mobile phone and used to discern topology information.

In one arrangement, a camera-equipped mobile phone is equipped with a projector that projects a pattern on a surface, which is then captured by the camera, whereby the mobile phone can discern information about the topology of the surface. Also in such an arrangement, the discerned topology information is used to aid in identifying an object. Also in such an arrangement, the discerned topology information is used to normalize image information captured by the camera.

In one arrangement, camera and projector portions of a mobile phone share at least one optical component. Also in such an arrangement, the camera and projector portions share a lens.

In one arrangement, a camera-equipped mobile phone employs a packet architecture to route image-related data between plural processing modules. Also in such an arrangement, the packets additionally convey instructions to which the processing modules respond. Also in such an arrangement, the phone's image sensor is responsive to a packet that conveys image capture instructions thereto.

In one arrangement, the image capture system of a camera-equipped mobile phone outputs first and second sequential data sets of different types, in accordance with automated instructions provided to it. Further, in such an arrangement, the sequential sets differ in size, color or resolution.

In one arrangement, a camera-equipped mobile phone captures a sequence of visual data sets, and a parameter used in capturing one of the sets depends on analysis of a previously captured data set.
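
One way to picture a capture parameter that depends on analysis of an earlier data set is a simple exposure feedback loop. The sketch below is illustrative only: the choice of exposure as the parameter, the target brightness of 128, the clamp limits, and the `simulated_capture` stand-in for a real sensor are all assumptions.

```python
def mean_brightness(frame):
    """Analysis step: average pixel value of a captured frame."""
    return sum(frame) / len(frame)


def next_exposure(prev_frame, prev_exposure, target=128.0, lo=0.25, hi=8.0):
    """Choose the next exposure from the previous frame (assumed policy).

    Scales the exposure so the next frame's mean should approach `target`,
    clamped to the range [lo, hi].
    """
    mean = mean_brightness(prev_frame)
    if mean == 0:
        return hi  # completely dark frame: use the longest allowed exposure
    return max(lo, min(hi, prev_exposure * target / mean))


def simulated_capture(scene, exposure):
    """Hypothetical sensor model: linear response, saturating at 255."""
    return [min(255, int(p * exposure)) for p in scene]


scene = [10, 20, 30, 40]
exposure = 1.0
frames = []
for _ in range(3):
    frame = simulated_capture(scene, exposure)
    frames.append(frame)
    # The parameter for the next capture depends on the frame just analyzed.
    exposure = next_exposure(frame, exposure)
```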

In one arrangement, a camera-equipped mobile phone sends image-related data to one of a plurality of competing cloud-based services for analysis. Further, in such an arrangement, the analysis comprises face recognition, optical character recognition, or an FFT operation. Further, such an arrangement includes selecting the service from the plurality of competing services based on a rule set.

In one arrangement, a camera-equipped mobile phone sends image-related data to a cloud-based service for processing and, in response, receives audio or video data, or JavaScript instructions.

In one arrangement, a camera-equipped mobile phone sends image-related data to a cloud-based service for processing, and the phone pre-warms the service, or a communication channel, in anticipation of sending the image-related data. Further, in such an arrangement, the service or channel to be pre-warmed is identified by a prediction based on circumstances.

In one arrangement, a camera-equipped mobile phone has a plurality of user-selectable modes, one of which comprises a face recognition mode, an optical character recognition mode, a mode associated with purchasing an imaged item, a mode associated with selling an imaged item, or a mode for determining information about an imaged item, scene or person (e.g., from Wikipedia, a manufacturer's web site, or a social network site). Further, in such an arrangement, the user selects the mode before capturing an image.

In one arrangement, a glossary of visual signs is defined; these signs are recognized by a mobile phone and serve to trigger associated functions.

In one arrangement, a camera-equipped mobile phone is used as an aid in name recognition: the camera captures an image that includes a face, and the image is processed by a face recognition process in connection with reference data determined by a remote resource such as Facebook, Picasa or iPhoto.

In one arrangement, an image of an object captured by a camera-equipped mobile phone is used to link to information relating to that object, such as spare parts or an instruction manual, images of objects with a similar appearance, and the like.

In one arrangement, an image is stored in association with a set of data or attributes that serve as implicit or explicit links to actions and/or other content. Further, in such an arrangement, the user navigates from one image to the next, much like navigating between nodes on a network. Further, in such an arrangement, these links are analyzed to discern additional information.

In one arrangement, an image is processed, in accordance with information from a data repository, to discern associated semantic information. Further, in such an arrangement, the discerned semantic information is itself processed, again in accordance with information from a data repository, to discern still further associated semantic information.

One arrangement comprises a plurality of mobile phones in a networked cluster. Further, in such an arrangement, the networked cluster comprises a peer-to-peer network.

In one arrangement, a default rule governs the sharing of content in a network, and the default rule specifies that content in a first age range is not to be shared. Further, in such an arrangement, the default rule specifies that content in a second age range may be shared. Further, in such an arrangement, the default rule specifies that content in the second age range may be shared only where there is a social link.
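
A default sharing rule of this kind might be sketched as follows. The text does not say which age range is the newer one, how long either range is, or what happens outside both ranges; the seven-day and one-year windows, the deny-by-default fallback, and the `may_share` function name are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Assumed boundaries; the specification gives no concrete durations.
FIRST_RANGE = timedelta(days=7)     # first age range: never shared
SECOND_RANGE = timedelta(days=365)  # second age range: shared via social link only


def may_share(captured_at, now, requester, owner_social_links):
    """Apply the default rule to one item of content.

    captured_at / now: datetime objects; requester: an identifier;
    owner_social_links: set of identifiers socially linked to the owner.
    """
    age = now - captured_at
    if age < FIRST_RANGE:
        return False  # content in the first age range is not shared
    if age < SECOND_RANGE:
        # content in the second age range is shared only over a social link
        return requester in owner_social_links
    return False  # outside both ranges the text is silent; deny by default


now = datetime(2024, 1, 31)
friends = {"bob"}
recent = may_share(datetime(2024, 1, 30), now, "bob", friends)      # False
older = may_share(datetime(2023, 12, 1), now, "bob", friends)       # True
stranger = may_share(datetime(2023, 12, 1), now, "eve", friends)    # False
```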

In one arrangement, empirical data associated with a location is made available to users at that location. Further, in such an arrangement, mobile phones at that location form an ad hoc network through which the empirical data is shared.

In one arrangement, the image sensor of a camera-equipped mobile phone is formed on a substrate, and one or more modules for processing image-related data to serve automated visual queries (e.g., object recognition) are also formed on that substrate.

In one arrangement, an image is captured by one party and made available to a plurality of users for analysis, such as for object recognition purposes (e.g., find my car).

In one arrangement, image feeds from a distributed camera network are made available for public search.

Also contemplated are arrangements corresponding to the foregoing that relate to audio captured by a microphone, rather than to visual input captured by an image sensor (e.g., including video recognition for face recognition, etc.).

Also contemplated are methods, systems and subcombinations corresponding to the foregoing, and computer-readable storage media having instructions for configuring a processing system to perform some or all of such methods.

10, 81, 530: cell phone
12: image sensor
16: cloud
32, 544: camera
34: setup module
35: synchronization processor
36: control processor module
38: processing module
51: pipeline manager
52: data pipe
72, 73, 74: hardware module
79, 524, 534, 546, 590: memory
82: lens
84: beam splitter
86: micro-mirror projector system
110: portable device
111, 582: display
112: keypad
114: controller
124: roller wheel
512, 530: thermostat
514: temperature sensor
516, 542: processor
520: LCD display screen
526: WiFi transceiver
528: antenna
532: processor
552b: remote server
554: router
556b: server
584: physical UI
586: control processor

Claims

A mobile phone comprising a microphone and a wireless transceiver,
and further comprising an image sensor coupled to both a human vision system processor and a machine vision processor, wherein the image sensor is coupled to the machine vision processor without passing through the human vision system processor, and wherein the human vision system processor includes a module selected from the group consisting of a white balance correction module, a gamma correction module, an edge enhancement module and a JPEG compression module.

The mobile phone of claim 1,
wherein the machine vision processor includes at least one module selected from the group consisting of an FFT module, an edge detection module, a pattern extraction module, a Fourier-Mellin module, a texture classifier module, a color histogram module, a motion detection module, and a feature recognition module.

The mobile phone of claim 2,
wherein the machine vision processor employs processing circuitry integrated on a common substrate with the image sensor.

A mobile phone comprising a microphone and a wireless transceiver,
and further comprising an image sensor and a plurality of stages for processing image-related data, wherein a data-driven packet architecture is employed.

The mobile phone of claim 4,
wherein information in a header of a packet determines parameters to be applied by the image sensor when initially capturing image data.

The mobile phone of claim 4,
wherein information in a header of a packet determines the processing to be performed by the plurality of stages on image-related data conveyed in a body of the packet.

A system comprising a mobile phone and one or more remote processors for performing image-related processing,
wherein the mobile phone includes a microphone, a wireless transceiver, a memory, and an image sensor operable to capture frames of image information, the mobile phone being operative to package image-related data into packets, wherein at least some of the packets comprise less than a single frame of image data.
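
The idea of packets that each carry less than a full frame can be sketched as below. The row-band split, the header fields `first_row` and `total_rows`, and the function names are hypothetical; the claim does not prescribe how a frame is divided.

```python
def packetize(frame, rows_per_packet):
    """Split a frame (a list of pixel rows) into packets of a few rows each.

    Each packet carries less than the whole frame whenever
    rows_per_packet < len(frame). The header layout is an assumption.
    """
    packets = []
    for start in range(0, len(frame), rows_per_packet):
        packets.append({
            "header": {"first_row": start, "total_rows": len(frame)},
            "body": frame[start:start + rows_per_packet],
        })
    return packets


def reassemble(packets):
    """Rebuild the frame from packets that may arrive in any order."""
    ordered = sorted(packets, key=lambda p: p["header"]["first_row"])
    frame = []
    for packet in ordered:
        frame.extend(packet["body"])
    return frame


frame = [[row * 10 + col for col in range(3)] for row in range(5)]
packets = packetize(frame, rows_per_packet=2)
restored = reassemble(list(reversed(packets)))
```

Splitting by row bands means a local or remote processor can begin work on the first band before the rest of the frame has been packaged.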

A system comprising a mobile phone and one or more remote processors for performing image-related processing,
wherein the mobile phone includes a microphone, a wireless transceiver, a memory, and an image sensor operable to capture frames of image information, the mobile phone being operative to route certain image-related data for processing by a processor in the mobile phone, and to route other image-related data for processing by a remote processor.

An arrangement in which a mobile phone cooperates with a remote routing system,
wherein the remote routing system serves to distribute image-related data from the mobile phone for processing by different remote processors, and to collect processed image-related data from the processors for return to the mobile phone.

A routing system serving a mobile phone, the routing system distributing image-related data either to one or more processors inside the mobile phone for processing, or to a remote routing system for processing by remote processors.

A method of processing image data captured using an image sensor in a mobile phone, comprising:
performing a first processing operation on the captured image data using a processing module of the mobile phone, to produce internally-processed image data;
performing an automated assessment to select one of two or more external service providers;
sending the internally-processed image data to the selected external service provider; and
returning data from the selected external service provider to the mobile phone.

The method of claim 11,
wherein the automated assessment includes a reverse auction.
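
A reverse auction, in which providers compete by bidding to perform the task and the requester picks the most favorable offer, might be sketched as follows. The `Bid` fields, the latency cut-off, and the lowest-price-wins rule are illustrative assumptions; the claims do not specify the bid contents or the selection criteria.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bid:
    provider: str      # name of the external service provider (hypothetical)
    price: float       # quoted cost of processing the image data
    latency_ms: float  # promised turnaround time


def run_reverse_auction(bids: List[Bid], max_latency_ms: float) -> Optional[Bid]:
    """Select a provider: cheapest bid among those meeting the latency rule.

    Ties on price are broken in favor of the faster provider.
    Returns None if no bid qualifies.
    """
    eligible = [b for b in bids if b.latency_ms <= max_latency_ms]
    if not eligible:
        return None
    return min(eligible, key=lambda b: (b.price, b.latency_ms))


bids = [
    Bid("provider_a", price=0.05, latency_ms=900),
    Bid("provider_b", price=0.03, latency_ms=2500),  # cheapest, but too slow
    Bid("provider_c", price=0.04, latency_ms=400),
]
winner = run_reverse_auction(bids, max_latency_ms=1000)
```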

The method of claim 11,
wherein the module of the mobile phone produces a plurality of sets of image-related data,
the method comprising:
routing one of the sets of image-related data to the selected external service provider; and
routing another of the sets of image-related data to a different external service provider.

The method of claim 11,
wherein the assessment comprises a reverse auction.

An image processing method comprising:
receiving an image, and metadata associated therewith;
storing image data in at least one plane of a multi-plane data structure;
generating a graphical representation of the metadata; and
storing the metadata in another plane of the data structure.
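
A multi-plane structure of this sort can be pictured with a small sketch in which one plane holds grey-scale pixel values and a second plane of the same size holds a graphical rendering of the metadata. Here the metadata is assumed to be a list of rectangular regions (for example, detected faces), drawn into the second plane as labeled areas; the function name and the dictionary layout are hypothetical.

```python
def make_planes(image, boxes):
    """Build a two-plane structure from an image and region metadata.

    image: 2D list of grey values.
    boxes: list of (top, left, bottom, right) regions, inclusive (assumed form
           of the metadata).
    Returns a dict with an "image" plane and a same-sized "metadata" plane in
    which each region is painted with its 1-based index; 0 means no metadata.
    """
    height, width = len(image), len(image[0])
    meta_plane = [[0] * width for _ in range(height)]
    for label, (top, left, bottom, right) in enumerate(boxes, start=1):
        for y in range(top, bottom + 1):
            for x in range(left, right + 1):
                meta_plane[y][x] = label
    return {"image": [row[:] for row in image], "metadata": meta_plane}


image = [[50, 60, 70, 80],
         [55, 65, 75, 85],
         [58, 68, 78, 88]]
planes = make_planes(image, boxes=[(0, 1, 1, 2)])
```

Because the metadata plane is pixel-aligned with the image plane, a lookup at any image coordinate directly yields the metadata that applies there.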

The method of claim 15, further comprising:
processing the image to derive edge map data relating to the image; and
storing the edge map data as metadata in the other plane of the data structure.

The method of claim 15, further comprising:
processing the image to recognize one or more faces in the image; and
storing information about the faces as metadata in the other plane of the data structure.

The method of claim 15,
comprising storing the image data and the graphical representation of the metadata in JPEG2000 form, wherein the graphical representation of the metadata is configured so that, as features of the image are progressively decoded for presentation, the metadata corresponding to those features is likewise progressively decoded.

A mobile phone comprising a camera, a memory, a processor, a screen, and software in the memory,
wherein the software causes the processor to present on the screen display data including at least one of (1) rotation from horizontal, (2) rotation since an initial time, and (3) scale change since an initial time, the processor determining the data by reference to information from the camera.

A mobile phone system comprising an image data sensor, a memory, a data processing system, and a screen,
wherein the data processing system includes two parallel processing portions,
the two parallel processing portions comprising:
a first processing portion for processing image data to be rendered in perceptual form for use by human viewers, the first processing portion including at least one of a de-mosaic processor, a white balance correction module, a gamma correction module, an edge enhancement module, and a JPEG compression module; and
a second processing portion for analyzing the image data to derive semantic information therefrom.

The mobile phone system of claim 20,
wherein the second processing portion includes at least one of an edge detection module, a Fourier transform processor, a discrete cosine transform processor, and an eigenface processor.

A method of presenting information on a screen of a mobile phone, comprising:
defining two or more dimensions of information relating to a subject;
displaying a sequence of screens presenting information about the subject in a first dimension when a first user interface control is operated; and
displaying a sequence of screens presenting information about the subject in a second dimension when a second user interface control is operated.

The method of claim 22, further comprising:
while a screen is displayed, in response to operation of a third user interface control, changing the subject to a subject represented on one of the displayed screens.

The method of claim 22, wherein:
the subject is an image;
the first dimension is similarity to the image in one of (1) geolocation, (2) appearance, or (3) content-descriptive metadata; and
the second dimension is similarity to the image in one of said (1), (2) or (3).

A method of composing a text message on a camera-equipped portable device, comprising:
tilting the device in a first direction to scroll through a sequence of displayed icons, at least some of which each represent a plurality of letters of the alphabet, until an intended icon is reached;
tilting the device in a second, different direction to scroll through a sequence of displayed characters comprising the plurality of letters associated with the intended icon, until an intended letter is reached; and
selecting the intended letter for addition to the text message.
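
The two-axis tilt entry scheme can be modeled as a small state machine. In this sketch the icon groups follow a telephone-keypad grouping, which is an assumption; the class and method names are hypothetical, and tilt sensing itself (which the claimed device could derive from its camera) is reduced to method calls.

```python
# Assumed icon set: each icon stands for several letters, keypad style.
ICONS = ["abc", "def", "ghi", "jkl", "mno", "pqrs", "tuv", "wxyz"]


class TiltComposer:
    """Tracks the highlighted icon and letter while a message is composed."""

    def __init__(self):
        self.icon_index = 0
        self.letter_index = 0
        self.message = ""

    def tilt_first_direction(self, steps=1):
        """Scroll through the icons; wraps around at the end of the sequence."""
        self.icon_index = (self.icon_index + steps) % len(ICONS)
        self.letter_index = 0  # a newly reached icon starts at its first letter

    def tilt_second_direction(self, steps=1):
        """Scroll through the letters belonging to the current icon."""
        letters = ICONS[self.icon_index]
        self.letter_index = (self.letter_index + steps) % len(letters)

    def select(self):
        """Append the highlighted letter to the message and return the message."""
        self.message += ICONS[self.icon_index][self.letter_index]
        return self.message


composer = TiltComposer()
composer.tilt_first_direction(2)   # reach the "ghi" icon
composer.tilt_second_direction(1)  # highlight "h"
composer.select()
composer.tilt_second_direction(1)  # highlight "i" within the same icon
text = composer.select()
```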

A method comprising:
obtaining, using a mobile phone, identification information corresponding to a device, the device comprising an electronic processor;
identifying application software corresponding to the device by reference to the identification information;
programming the mobile phone with the identified application software; and
interacting with the device through use of the application software,
whereby the mobile phone acts as a multi-function controller, configured to control a particular device through use of application software identified by reference to information corresponding to that device.

The method of claim 26,
wherein the device is a thermostat, a parking meter, an alarm clock, or a vehicle.

The method of claim 26,
wherein obtaining the identification information includes capturing image data from the device, and processing the image data to obtain an identifier corresponding to the device.

The method of claim 26,
further comprising capturing image data from the device, and presenting a user interface for the device as a graphical overlay on the captured image data.

The method of claim 29,
wherein the graphical overlay is presented in registered alignment with features of the captured image data.

A method comprising:
issuing a command relating to control of a device through a user interface on a user's mobile phone, the user interface being presented on a screen of the phone in combination with a phone-captured image of the device;
signaling information corresponding to the command to the user in a first manner while the command is pending; and
signaling information corresponding to the command to the user in a second, different manner once the command has been successfully executed.

A method comprising:
when a user is in physical proximity to a device, initiating a transaction with the device using a user interface presented on a screen of the user's mobile phone;
later, using the mobile phone for a purpose unrelated to the device; and
still later, recalling the user interface to engage in a further transaction with the device.

The method of claim 32,
wherein the device comprises a parking meter.

The method of claim 32,
wherein the device comprises a thermostat.

A mobile phone comprising a processor, a wireless interface, a memory, a sensor, and a display,
wherein instructions in the memory configure the processor to present a user interface that allows a user to select among several different device-specific user interfaces stored in the memory, for using the mobile phone to interact with a plurality of different external devices.

A method comprising:
using a mobile phone, sensing information from a housing of a device that includes a wireless interface;
through use of the sensed information, encrypting information using a public key corresponding to the device; and
transmitting the encrypted information.

A method comprising:
using a mobile phone, sensing analog information from a device equipped with a wireless interface;
converting the sensed analog information into digital form;
transmitting data corresponding to the digital form from the mobile phone; and
using the transmitted data to confirm the user's proximity to the device before permitting the user to interact with the device using the mobile phone.

An arrangement in which a camera-equipped mobile phone processes image data through a selection of plural processing stages,
wherein the selection of one processing stage depends on an attribute of processed image data output from a different processing stage.

A method of using a portable electronic device that includes a wireless interface and reconfigurable hardware, comprising:
in response to a user action, initializing the device to prepare it for use, the initializing including downloading, via the wireless interface, updated hardware configuration instructions for the hardware from a remote source; and
configuring the hardware of the device in accordance with the downloaded hardware configuration instructions,
wherein the method dynamically reconfigures the hardware each time the device is readied for use, thereby ensuring that the device is not initialized with superseded configuration instructions for the programmable hardware.

A method comprising:
using a hardware processing component of a wireless system base station to process data relating to wireless signals exchanged between the base station and a plurality of associated remote wireless devices; and
also using the hardware processing component of the base station to process image-related data offloaded to the base station for processing by a consumer camera device.

The method of claim 40,
wherein the hardware processing component comprises a field programmable object array, and the remote wireless devices comprise mobile phones.

A method comprising:
characterizing an optical distortion function associated with a lens;
defining a geometry of a correction surface corresponding to the optical distortion function; and
after receiving an image through the lens, or before projecting an image through the lens, texture-mapping the image onto the correction surface using a GPU,
wherein tilts and/or elevations in the correction surface serve to compensate for the optical distortion function associated with the lens.
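
The effect of mapping an image through a correction geometry can be approximated on a CPU with a per-pixel remapping table. This is not the GPU texture-mapping approach recited above; it is a simplified stand-in that assumes a one-coefficient radial distortion model, r_d = r_u(1 + k r_u^2), and nearest-neighbor sampling. The coefficient `k` and all function names are hypothetical.

```python
def distort(xu, yu, k):
    """Assumed forward lens model: where an ideal point lands in the lens image."""
    r2 = xu * xu + yu * yu
    scale = 1.0 + k * r2
    return xu * scale, yu * scale


def build_correction_map(width, height, k):
    """For each corrected output pixel, find its source pixel in the lens image.

    Coordinates are normalized about the image center before the model is
    applied. The table plays the role of the correction geometry.
    """
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    norm = max(cx, cy) or 1.0
    mapping = []
    for y in range(height):
        row = []
        for x in range(width):
            xd, yd = distort((x - cx) / norm, (y - cy) / norm, k)
            row.append((round(xd * norm + cx), round(yd * norm + cy)))
        mapping.append(row)
    return mapping


def apply_map(image, mapping, fill=0):
    """Resample the lens image through the map (nearest neighbor)."""
    h, w = len(image), len(image[0])
    return [
        [image[sy][sx] if 0 <= sx < w and 0 <= sy < h else fill
         for (sx, sy) in row]
        for row in mapping
    ]


image = [[y * 5 + x for x in range(5)] for y in range(5)]
identity = apply_map(image, build_correction_map(5, 5, k=0.0))
corrected = apply_map(image, build_correction_map(5, 5, k=-0.2))
```

With `k=0.0` the map is the identity; a negative `k` pulls the corners of the output from points nearer the center of the lens image, which is the compensation a barrel-distorting lens would need under this assumed model.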

A method of correcting lens distortion in an image, comprising:
projecting the image onto a virtual surface whose topology is shaped to remove the lens distortion.

A method comprising:
receiving, at a wireless station, a service reservation message from a first mobile device, the service reservation message including one or more parameters of a future service that the first mobile device requests be made available at a future time, rather than immediately; and
based on the service reservation message received from the first mobile device, making a decision about a service provided to a second mobile device,
whereby allocation of resources of the wireless station is improved due to advance information about the anticipated service to be provided to the first mobile device.

The method of claim 44,
wherein the one or more parameters include a parameter indicating a future time at which the future service is requested to be made available.

The method of claim 44,
wherein the one or more parameters include a parameter indicating an expected duration of the requested future service.

A mobile phone comprising an image sensor, a processor, and a memory coupled to a thermoelectric cooling device,
wherein instructions in the memory configure the processor to selectively activate the thermoelectric cooling device based on noise detected in captured image data, or on circumstances.
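
Selective activation of the cooler might be decided along the following lines. The sketch assumes that noise is estimated as the standard deviation of pixel values in a dark reference region and that ambient temperature stands in for "circumstances"; both thresholds and the function names are hypothetical.

```python
from statistics import pstdev


def estimate_noise(dark_pixels):
    """Assumed noise measure: spread of pixel values in a dark reference region."""
    return pstdev(dark_pixels)


def cooler_should_run(dark_pixels, ambient_c, noise_limit=4.0, ambient_limit_c=35.0):
    """Decide whether to activate the thermoelectric cooling device.

    Activates when the measured noise exceeds noise_limit, or when the
    ambient temperature (an assumed example of circumstances) is high.
    """
    return estimate_noise(dark_pixels) > noise_limit or ambient_c > ambient_limit_c


quiet = [10, 10, 11, 10]
noisy = [0, 20, 5, 30]
decisions = (
    cooler_should_run(quiet, ambient_c=20.0),  # low noise, cool surroundings
    cooler_should_run(noisy, ambient_c=20.0),  # high noise
    cooler_should_run(quiet, ambient_c=40.0),  # hot surroundings
)
```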

A mobile phone device comprising:
a first portion comprising an optical sensor and a lens assembly, the first portion being configured to be carried by a user at a first location relative to the user's body;
a second portion comprising a display, a user interface, and a wireless transceiver, the second portion being configured to be carried by the user at a second, different location relative to the user's body; and
a wireless communication system linking the first and second portions.

The mobile phone device of claim 48,
wherein the second portion is configured to detachably receive the first portion.

A mobile phone device comprising:
a first portion comprising a lighting assembly;
a second portion comprising a display, a user interface, a wireless transceiver, an optical sensor and a lens; and
a wireless communication system linking the first and second portions,
wherein the first portion is detachable from the second portion and can be positioned to illuminate an object to be imaged by the optical sensor of the second portion.

A method comprising:
receiving an image at a camera-equipped mobile phone device;
recognizing a plurality of objects depicted in the image;
presenting the image on a screen of the mobile phone device with graphical features overlaid in registered alignment with the recognized objects; and
sensing user selection of one of the graphical features, and thereby triggering an action relating to the recognized object associated therewith.

The method of claim 51,
wherein the action includes presenting a menu of links relating to the recognized object.

The method of claim 51,
wherein the action includes launching an application.