KR100985694B1

KR100985694B1 - Selective sound source listening in conjunction with computer interactive processing

Info

Publication number: KR100985694B1
Application number: KR1020077028369A
Authority: KR
Inventors: Richard L. Marks; Xiadong Mao
Original assignee: Sony Computer Entertainment Inc.
Priority date: 2005-05-05
Filing date: 2006-04-28
Publication date: 2010-10-05
Also published as: CN101132839A; EP1877149A1; TWI308080B; JP2008539874A; KR20080009153A; CN101132839B; JP5339900B2; TW200708328A; WO2006121681A1

Abstract

A method and apparatus are provided for capturing images and sound during interaction with a computer program. The apparatus includes an image capture unit configured to capture one or more image frames. A sound capture unit is also provided and is configured to identify one or more sounds. The sound capture unit generates data that can be analyzed to determine a focus zone, so that sound from the focus zone is processed while sound outside the focus zone is substantially excluded. In this way, the sound captured and processed for the focus zone is used for interaction with the computer program.

Selective sound source listening, image-sound capture device, sound source, sound

Description

{SELECTIVE SOUND SOURCE LISTENING IN CONJUNCTION WITH COMPUTER INTERACTIVE PROCESSING}

The present invention relates to selective sound source listening in conjunction with computer interactive processing.

The video game industry has seen many changes over the years. As computing power has expanded, video game developers have created game software that takes advantage of that increase. To this end, video game developers have been coding games that incorporate sophisticated operations and mathematics to produce a very realistic game experience.

Example gaming platforms include the Sony Playstation and the Sony Playstation 2 (PS2), each of which is sold in the form of a game console. As is well known, the game console is designed to connect to a monitor (usually a television) and to enable user interaction through handheld controllers. The game console is designed with specialized processing hardware, including a CPU, a graphics synthesizer for processing intensive graphics operations, a vector unit for performing geometry transformations, and other glue hardware, firmware, and software. The game console is further designed with an optical disc tray for receiving game compact discs for local play through the game console. Online gaming is also possible, where a user can interactively play against or with other users over the Internet.

As game complexity continues to intrigue players, game and hardware manufacturers have continued to innovate to enable additional interactivity. In reality, however, the way in which users interact with a game has not changed dramatically over the years.

In view of the foregoing, there is a need for methods and systems that enable more advanced user interactivity with game play.

Broadly speaking, the present invention fills this need by providing an apparatus and method that facilitate interactivity with a computer program. In one embodiment, the computer program is a game program, but without limitation, the apparatus and method are applicable to any computer environment that can take in sound input to trigger control, provide input, or enable communication. More specifically, if sound is used to trigger control or input, embodiments of the present invention enable filtered input of particular sound sources, where the filtering is configured to omit or de-emphasize sound sources that are not of interest. In a video game environment, depending on the sound source selected, the video game can respond with a specific response after processing the sound source of interest, without the distortion or noise of other sounds that are not of interest. Commonly, a game playing environment is exposed to many background noises, such as music, other people, and the movement of objects. Once the sounds that are not of interest are substantially filtered out, the computer program can better respond to the sound of interest. The response can take any form, such as a command, an initiation of action, a selection, a change in game status or state, the unlocking of features, and the like.

In one embodiment, an apparatus for capturing images and sound during interaction with a computer program is provided. The apparatus includes an image capture unit configured to capture one or more image frames. A sound capture unit is also provided and is configured to identify one or more sounds. The sound capture unit generates data that can be analyzed to determine a focus zone, so that sound from the focus zone is processed while sound outside the focus zone is substantially excluded. In this way, the sound captured and processed for the focus zone is used for interaction with the computer program.

In another embodiment, a method for selective sound source listening during interaction with a computer program is disclosed. The method includes receiving input from one or more sound sources at two or more sound source capture microphones. The method then includes determining a delay path from each of the sound sources and identifying a direction for each received input of each of the one or more sound sources. The method then includes filtering out sound that is not in the identified direction of a focus zone. The focus zone is configured to supply the sound source for interactivity with the computer program.

In yet another embodiment, a game system is provided. The game system includes an image-sound capture device configured to interface with a computing system that enables execution of an interactive computer program. The image-sound capture device includes video capture hardware that can be positioned to capture video from a focus zone. An array of microphones is provided for capturing sound from one or more sound sources. Each sound source is identified and associated with a direction relative to the image-sound capture device. The focus zone associated with the video capture hardware is configured to be used to identify one of the sound sources in a direction proximate to the focus zone.

In general, the interactive sound identification and tracking is applicable to interfacing with any computer program on any computing device. Once a sound source is identified, the content of the sound source can be further processed to trigger, drive, direct, or control features or objects rendered by the computer program.

Other aspects and advantages of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, which illustrate by way of example the principles of the invention.

The invention, together with further advantages thereof, may best be understood by reference to the following description taken in conjunction with the accompanying drawings.

FIG. 1 illustrates a game environment in which a video game program may be executed for interactivity with one or more users, in accordance with one embodiment of the present invention.

FIG. 2 illustrates a three-dimensional diagram of an example image-sound capture device, in accordance with one embodiment of the present invention.

FIGS. 3A and 3B illustrate the processing of sound paths at different microphones designed to receive the input, and the logic for outputting the selected sound source, in accordance with one embodiment of the present invention.

FIG. 4 illustrates an example computing system interfacing with an image-sound capture device for processing input sound sources, in accordance with one embodiment of the present invention.

FIG. 5 illustrates an example in which multiple microphones are used to increase the precision of the direction identification of particular sound sources, in accordance with one embodiment of the present invention.

FIG. 6 illustrates an example in which sound is identified in a particular spatial volume using microphones in different planes, in accordance with one embodiment of the present invention.

FIGS. 7 and 8 illustrate exemplary method operations that may be processed in the identification of sound sources and the exclusion of non-focus sound sources, in accordance with one embodiment of the present invention.

The present invention discloses a method and apparatus for facilitating the identification of specific sound sources and the filtering of unwanted sound sources when sound is used as an interactive tool with a computer program.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced without some or all of these specific details. In other instances, well-known process steps have not been described in detail in order not to obscure the present invention.

FIG. 1 illustrates a game environment 100 in which a video game program may be executed for interactivity with one or more users, in accordance with one embodiment of the present invention. As illustrated, a player 102 is shown in front of a monitor 108 that includes a display 110. The monitor 108 is interconnected with a computing system 104. The computing system can be a standard computer system, a game console, or a portable computer system. In a specific example, but not limited to any brand, the game console can be one manufactured by Sony Computer Entertainment Inc., Microsoft, or any other manufacturer.

The computing system 104 is shown interconnected with an image-sound capture device 106. The image-sound capture device 106 includes a sound capture unit 106a and an image capture unit 106b. The player 102 can interactively communicate with a game figure 112 on the display 110. The video game being executed is one in which input is at least partially provided by the player 102 by way of the image capture unit 106b and the sound capture unit 106a. As illustrated, the player 102 may move his hand to select an interactive icon 114 on the display 110. A translucent image of the player 102 is captured by the image capture unit 106b and projected on the display 110. Thus, the player 102 knows where to move his hand in order to select icons or interface with the game figure 112. Techniques for capturing these movements and interactions can vary, but exemplary techniques are described in United Kingdom applications GB0304024.3 (PCT/GB2004/000693) and GB0304022.7 (PCT/GB2004/000703), each filed on February 21, 2003, and each of which is incorporated herein by reference.

In the example shown, the interactive icon 114 is an icon that allows the player to select "swing" so that the game figure 112 swings the object it is handling. In addition, the player 102 may provide voice commands that can be captured by the sound capture unit 106a and then processed by the computing system 104 to provide interactivity with the video game being executed. As shown, sound source 116a is the voice command "Jump!". The sound source 116a is captured by the sound capture unit 106a and processed by the computing system 104 to cause the game figure 112 to jump. Voice recognition may be used to enable identification of the voice commands. Alternatively, the player 102 may be in communication with remote users who are connected to the Internet or a network and who are also directly or partially involved in the interactivity of the game.

In accordance with one embodiment of the present invention, the sound capture unit 106a is configured to include at least two microphones, which enable the computing system 104 to select sound coming from particular directions. By enabling the computing system 104 to filter out directions that are not central to the game play (or the focus), distracting sounds in the game environment 100 will not interfere with or confuse the game execution when specific commands are being provided by the player 102. For example, the game player 102 may be tapping his foot, producing a tap noise that is a non-language sound 117. Such sound may be captured by the sound capture unit 106a, but it can then be filtered out because sound coming from the foot of the player 102 is not in the focus zone of the video game.

As will be described below, the focus zone is preferably identified by the active image area, which is the focus point of the image capture unit 106b. Alternatively, the focus zone can be manually selected from a choice of zones presented to the user after an initialization stage. Continuing with the example of FIG. 1, a game observer 103 may provide a sound source 116b that could be distracting to the processing by the computing system during the interactive game play. However, the game observer 103 is not in the active image area of the image capture unit 106b, and thus sounds coming from the direction of the game observer 103 are filtered out so that the computing system 104 does not erroneously confuse commands from the sound source 116b with the sound coming from the player 102, such as sound source 116a.

The image-sound capture device 106 includes an image capture unit 106b and a sound capture unit 106a. The image-sound capture device 106 is preferably capable of digitally capturing image frames and then transferring those image frames to the computing system 104 for further processing. An example of the image capture unit 106b is a web camera, which is commonly used when video images are to be captured and then transferred digitally to a computing device for subsequent storage or for communication over a network, such as the Internet. Other types of image capture devices may also work, whether analog or digital, so long as the image data is digitally processed to enable the identification and filtering. In one preferred embodiment, the digital processing that enables the filtering is done in software, after the input data is received. The sound capture unit 106a is shown including a pair of microphones (MIC 1 and MIC 2). The microphones are standard microphones, which can be integrated into the housing that makes up the image-sound capture device 106.

FIG. 3A illustrates the sound capture unit 106a when confronted with sound sources 116 from sound A and sound B. As shown, sound A projects its audible sound, which is detected by MIC 1 and MIC 2 along sound paths 201a and 201b. Sound B is projected toward MIC 1 and MIC 2 over sound paths 202a and 202b. As illustrated, the sound paths for sound A are of different lengths, thus providing a relative delay when compared to sound paths 202a and 202b. The sound coming from each of sound A and sound B is then processed using a standard triangulation algorithm so that direction selection can occur in box 216, shown in FIG. 3B. The sound coming from MIC 1 and MIC 2 is buffered in buffers 1 and 2 (210a, 210b), respectively, and passed through delay lines (212a, 212b). In one embodiment, the buffering and delay process is controlled by software, although hardware can be custom designed to handle the operations as well. Based on the triangulation, direction selection 216 triggers the identification and selection of one of the sound sources 116.
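The description names only "a standard triangulation algorithm", so the following is a minimal sketch of one common way to obtain the relative delay between two microphones and turn it into a direction, not the patent's own implementation. The cross-correlation search, the far-field approximation, the speed-of-sound constant, and all function names are assumptions made for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air at about 20 degrees C


def estimate_delay_samples(mic1, mic2, max_lag):
    """Return the lag (in samples) by which the mic2 signal trails mic1.

    A positive result means the sound reached microphone 1 first. The lag is
    found by brute-force cross-correlation over [-max_lag, max_lag].
    """
    best_lag, best_score = 0, float("-inf")
    n = len(mic1)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += mic1[i] * mic2[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_angle_deg(delay_samples, sample_rate_hz, mic_spacing_m):
    """Convert an inter-microphone delay to a far-field arrival angle.

    0 degrees is straight ahead of the microphone pair; positive angles lean
    toward microphone 1, the side the sound reached first.
    """
    delay_s = delay_samples / sample_rate_hz
    ratio = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp against measurement noise
    return math.degrees(math.asin(ratio))


# Hypothetical test signal: the same short pulse arrives 3 samples later at mic 2.
pulse = [0.0] * 5 + [1.0, 2.0, 1.0] + [0.0] * 12
mic1_signal = pulse
mic2_signal = [0.0] * 3 + pulse[:-3]
lag = estimate_delay_samples(mic1_signal, mic2_signal, max_lag=8)
angle = delay_to_angle_deg(lag, sample_rate_hz=48000, mic_spacing_m=0.1)
```

With only two microphones this yields a single angle, which is consistent with the later passages that add microphones to refine the position estimate.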

The sound coming from each of MIC 1 and MIC 2 is summed in box 214 before being output as the output of the selected source. In this manner, sound coming from directions other than the direction of the active image area is filtered out, so that such sound sources do not distract processing by the computing system 104 or distract communication with other users who may be interactively playing a video game over a network or the Internet.
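The buffer, delay line, and summing stages described for FIG. 3B resemble what is commonly called delay-and-sum processing. The sketch below illustrates that general idea under that assumption; the function names, the averaging of the two channels, and the sample values are hypothetical and are not taken from the patent.

```python
def delay_signal(samples, delay):
    """Shift a signal later in time by `delay` samples, padding with zeros."""
    if delay <= 0:
        return list(samples)
    return ([0.0] * delay + list(samples))[:len(samples)]


def delay_and_sum(mic1, mic2, lag):
    """Align two microphone signals for one direction and average them.

    `lag` is how many samples the mic2 signal trails mic1 for the selected
    direction (negative if mic2 leads). The earlier channel is delayed so that
    sound from the selected direction lines up before the channels are summed.
    """
    if lag >= 0:
        aligned1, aligned2 = delay_signal(mic1, lag), list(mic2)
    else:
        aligned1, aligned2 = list(mic1), delay_signal(mic2, -lag)
    return [(a + b) / 2.0 for a, b in zip(aligned1, aligned2)]


# Hypothetical capture: a focus-zone pulse reaches mic 1 at sample 2 and mic 2
# at sample 4 (lag +2); an off-focus pulse reaches mic 2 at sample 7 and mic 1
# at sample 10 (lag -3).
mic1_signal = [0.0] * 16
mic2_signal = [0.0] * 16
mic1_signal[2] = 1.0
mic2_signal[4] = 1.0
mic1_signal[10] = 1.0
mic2_signal[7] = 1.0

focused = delay_and_sum(mic1_signal, mic2_signal, lag=2)
```

In this toy capture the focus-zone pulse adds coherently and keeps its full amplitude, while the off-focus pulse is split into two half-amplitude copies. Two microphones give only modest suppression, so a practical system would likely combine this with further filtering.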

FIG. 4 illustrates a computing system 250 that may be used in conjunction with the image-sound capture device 106, in accordance with one embodiment of the present invention. The computing system 250 includes a processor 252 and memory 256. A bus 254 interconnects the processor and the memory 256 with the image-sound capture device 106. The memory 256 includes at least part of the interactive program 258 and also includes selective sound source listening logic or code 260 for processing the received sound source data. Based on where the focus zone is identified to be by the image capture unit 106b, sound sources outside the focus zone are selectively filtered by the selective sound source listening logic 260 being executed (for example, executed by the processor and stored at least partially in the memory 256). The computing system is shown in its most simplified form, but it is emphasized that any hardware configuration can be used, so long as the hardware can process the instructions to effect the processing of the incoming sound sources and thus enable the selective listening.

The computing system 250 is also shown interconnected with the display 110 by way of the bus. In this example, the focus zone is identified by the image capture unit being focused toward sound source B. Sound coming from other sound sources, such as sound source A, is substantially filtered out by the selective sound source listening logic 260 when the sound is captured by the sound capture unit 106a and transferred to the computing system 250.

In one specific example, a player can be participating in an Internet or networked video game competition with another user, where each user's primary audible experience is by way of speakers. The speakers may be part of the computing system or may be part of the monitor 108. Assume, therefore, that the local speakers are generating sound source A as shown in FIG. 4. In order not to feed the sound coming out of the local speakers back to the competing user, the selective sound source listening logic 260 filters out the sound of sound source A so that the competing user is not provided with feedback of his or her own sound or voice. By supplying this filtering, it is possible to have interactive communication over a network while interfacing with a video game, while advantageously avoiding detrimental feedback during the process.

FIG. 5 illustrates an example in which the image-sound capture device 106 includes at least four microphones (MIC 1 through MIC 4). The sound capture unit 106a is therefore capable of triangulation with better granularity to identify the location of sound sources 116 (A and B). That is, by providing an additional microphone, it is possible to define the location of the sound sources more accurately and thus to eliminate and filter out sound sources that are not of interest or that can be disruptive to game play or to interactivity with a computing system. As illustrated in FIG. 5, sound source 116 (B) is the sound source of interest as identified by the video capture unit 106b. Continuing with the example of FIG. 5, FIG. 6 illustrates how sound source B is identified within a spatial volume.

The spatial volume in which sound source B is located defines the focus volume 274. By identifying a focus volume, it is possible to eliminate or filter out noise that is not within that specific volume (that is, noise that is not merely outside the selected direction). To facilitate the selection of the focus volume 274, the image-sound capture device 106 preferably includes at least four microphones. At least one of the microphones is placed in a different plane than the other three. By maintaining one of the microphones in plane 271 and the remaining microphones in plane 270 of the image-sound capture device 106, it is possible to define a spatial volume.
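One way to see why a microphone in a second plane helps is a small numeric check. In the sketch below, three microphones share a plane, and a source and its mirror image across that plane produce identical relative arrival delays; adding a fourth microphone off the plane separates the two. The coordinates, the choice of z = 0 as the shared plane, and the speed-of-sound value are all assumptions made for illustration, since the patent gives no microphone geometry.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air


def arrival_delays(source, mics):
    """Arrival delay (s) at each microphone relative to the first microphone."""
    times = [math.dist(source, mic) / SPEED_OF_SOUND_M_S for mic in mics]
    return [t - times[0] for t in times]


def max_delay_difference(source_a, source_b, mics):
    """Largest difference between the delay patterns of two candidate sources."""
    delays_a = arrival_delays(source_a, mics)
    delays_b = arrival_delays(source_b, mics)
    return max(abs(a - b) for a, b in zip(delays_a, delays_b))


# Hypothetical layout in metres: three microphones in the plane z = 0 and one
# microphone 5 cm off that plane.
IN_PLANE_MICS = [(-0.1, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.05, 0.0)]
OFF_PLANE_MIC = (0.0, 0.0, 0.05)

source = (0.5, 0.3, 1.0)    # candidate position on one side of the plane
mirror = (0.5, 0.3, -1.0)   # its mirror image across the plane

planar_gap = max_delay_difference(source, mirror, IN_PLANE_MICS)
full_gap = max_delay_difference(source, mirror, IN_PLANE_MICS + [OFF_PLANE_MIC])
```

Here `planar_gap` is zero because the in-plane microphones cannot distinguish the two positions, while `full_gap` is a measurable fraction of a millisecond. That difference is what lets delay measurements bound a volume and not only a direction.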

Consequently, noise coming from other people in the vicinity (shown as 276a and 276b) is filtered out, because it does not lie within the spatial volume defined by the focus volume 274. Additionally, noise that may be created just outside the spatial volume, as shown by speaker 276c, is also filtered out because it originates outside the spatial volume.

FIG. 7 illustrates a flowchart diagram in accordance with one embodiment of the present invention. The method begins at operation 302, where input is received from one or more sound sources at two or more sound capture microphones. In one example, the two or more sound capture microphones are integrated into the image-sound capture device 106. Alternatively, the two or more sound capture microphones can be part of a second module or housing that interfaces with the image capture unit 106b. Alternatively, the sound capture unit 106a can include any number of sound capture microphones, and the sound capture microphones can be placed in specific locations designed to capture sound from a user who may be interfacing with a computing system.

The method moves to operation 304, where a delay path for each of the sound sources is determined. Example delay paths are defined by sound paths 201 and 202 of FIG. 3A. As is well known, the delay path defines the time it takes for a sound wave to travel from a sound source to the specific microphone positioned to capture the sound. Based on the delay it takes sound to travel from the particular sound source 116, the microphones can determine the approximate delay and the approximate location from which the sound is emanating using a standard triangulation algorithm.
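As a rough illustration of how delays can lead to an approximate location, the sketch below predicts the relative delays for every point on a coarse grid and keeps the point that best matches the measured delays. This brute-force grid search is a stand-in chosen for clarity; the patent does not say which triangulation algorithm is used, and the microphone layout, grid bounds, and function names are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air


def relative_delays(source, mics):
    """Arrival delay (s) at each microphone relative to the first one."""
    times = [math.dist(source, mic) / SPEED_OF_SOUND_M_S for mic in mics]
    return [t - times[0] for t in times[1:]]


def locate_source(measured, mics, x_range, y_range, step):
    """Return the grid point whose predicted delays best match `measured`."""
    x_count = int(round((x_range[1] - x_range[0]) / step)) + 1
    y_count = int(round((y_range[1] - y_range[0]) / step)) + 1
    best_point, best_error = None, float("inf")
    for i in range(x_count):
        for j in range(y_count):
            point = (x_range[0] + i * step, y_range[0] + j * step)
            predicted = relative_delays(point, mics)
            error = sum((p - m) ** 2 for p, m in zip(predicted, measured))
            if error < best_error:
                best_point, best_error = point, error
    return best_point


# Hypothetical 2-D layout in metres; the patent gives no microphone coordinates.
MICS = [(-0.2, 0.0), (0.2, 0.0), (0.0, 0.3), (0.1, -0.15)]

# Simulate the delays a source at (1.0, 2.0) would produce, then recover it.
measured = relative_delays((1.0, 2.0), MICS)
estimate = locate_source(measured, MICS, (-3.0, 3.0), (0.5, 4.0), 0.25)
```

The estimate is only as fine as the grid step, which matches the description's wording of an approximate location. Real measurements would also carry noise that this idealized example omits.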

The method proceeds to operation 306, where a direction for each of the received inputs of the one or more sound sources is identified. That is, the direction from which the sound originates at the sound sources 116 is identified relative to the location of the image-sound capture device, which includes the sound capture unit 106a. Based on the identified directions, sound sources that are not in the identified direction of the focus zone (or volume) are filtered out in operation 308. By filtering out the sound sources that do not originate from directions in the vicinity of the focus zone, it is possible to use the sound source that was not filtered out for interactivity with a computer program, as shown in operation 310.
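Operations 306 through 310 can be pictured as a simple selection over per-source directions. The sketch below assumes each detected sound has already been assigned an angle relative to the capture device and that the focus zone can be described by a center angle and a half-width; the `DetectedSound` type, the angles, and the tolerance are hypothetical values chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class DetectedSound:
    label: str
    angle_deg: float  # direction relative to the capture device, 0 = straight ahead


def filter_by_focus(sounds, focus_center_deg, focus_half_width_deg):
    """Keep only sounds whose direction lies within the focus zone (operation 308)."""
    kept = []
    for sound in sounds:
        if abs(sound.angle_deg - focus_center_deg) <= focus_half_width_deg:
            kept.append(sound)
    return kept


# Hypothetical scene loosely modelled on FIG. 1: a player in front of the
# device, an observer off to one side, and a loudspeaker on the other side.
detected = [
    DetectedSound("player voice command", 2.0),
    DetectedSound("observer speech", 48.0),
    DetectedSound("loudspeaker", -60.0),
]

# Sounds that survive the filter would be passed on for interaction (operation 310).
for_interaction = filter_by_focus(detected, focus_center_deg=0.0, focus_half_width_deg=15.0)
```

Only the player's voice command remains in `for_interaction` here, which mirrors the intended outcome of the flowchart.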

For example, the interactive program may be a video game in which the user can communicate interactively with characters of the video game or with players who may be opposing the first player of the video game. The opposing player may be local or at a remote location, and may communicate with the first user over a network such as the Internet. In addition, the video game may be played among a number of users in a group who interactively challenge each other's skills in a particular contest related to the video game.

FIG. 8 shows a flowchart in which the operations 320 of the image-sound capture device are illustrated separately from the software-executed operations 340 that are performed on the received input. Thus, once input from one or more sound sources is received at two or more sound capture microphones in step 302, the method proceeds to step 304, where the delay path for each of the sound sources is determined in software. As mentioned above, based on the delay paths, a direction is identified in step 306 for each received input from the one or more sound sources.

At this point, the method moves to step 312, where the identified directions that are in the vicinity of the video capture are determined. For example, as shown in FIG. 1, the video capture may be targeted at an active image area. Thus, the vicinity of the video capture would be within this active image area (or volume), and any direction associated with a sound source that is within or near this active area will be determined. Based on this determination, the method proceeds to step 314, where directions that are not in the vicinity of the video capture are filtered out. Accordingly, distractions, noise, and other extraneous input that could interfere with the video game play of the first player will be filtered out in the processing performed by the software executed during game play.
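One way to picture the determination of step 312 is as a test of whether an estimated source position falls inside the volume seen by the camera. The sketch below is illustrative only and rests on assumptions not found in the specification: the camera sits at the origin looking along +z, the active image volume is bounded by horizontal and vertical fields of view and by near and far distances, and `margin_deg` models what counts as "near" the active area. The function name and all parameters are hypothetical.

```python
import math


def in_active_volume(position, h_fov_deg, v_fov_deg, near_m, far_m, margin_deg=0.0):
    """Return True if a source position lies within or near the active image volume.

    position is (x, y, z) in meters, with x to the right, y up, and z along
    the camera's viewing direction (assumed coordinate convention).
    """
    x, y, z = position
    if not (near_m <= z <= far_m):
        return False
    h_angle = math.degrees(math.atan2(x, z))
    v_angle = math.degrees(math.atan2(y, z))
    return (abs(h_angle) <= h_fov_deg / 2.0 + margin_deg
            and abs(v_angle) <= v_fov_deg / 2.0 + margin_deg)
```

Sources for which the test fails would be the ones filtered out in step 314.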

Consequently, the first user can interact with the video game, interact with other users of the video game who are actively using it, or communicate with other users over the network who may be logged into or associated with transactions for the same video game of interest. Such video game communication, interactivity, and control will thus be uninterrupted by extraneous noise and/or by observers who are not intended to be interactively communicating or participating in a particular game or interactive program.

It should be appreciated that the embodiments described above may also be applied to online gaming applications. That is, the embodiments described above may occur at a server that sends a video signal to multiple users over a distributed network, such as the Internet, enabling players at remote noisy locations to communicate with each other. It should further be appreciated that the embodiments described herein are carried out through either a hardware or a software implementation. That is, the functional descriptions discussed above may be synthesized to define a microchip having logic configured to perform the functional tasks for each of the modules associated with the noise cancellation scheme.

Also, the selective filtering of sound sources can have other applications, such as telephones. In a telephone use environment, there is a primary person (the caller) who wishes to have a conversation with a third party (the callee). During that communication, however, there may be other people in the vicinity who are talking or making noise. The telephone, being targeted toward the primary user (for example, by the direction of the receiver), can make the sound coming from the primary user's mouth the focus zone, and thus enable selection of only the primary user for listening. This selective listening therefore enables substantial filtering out of voices or noises that are not associated with the primary person, so that the receiving party can receive a clearer communication from the primary person using the telephone.

Additional technologies may include other electronic equipment that can benefit from taking in sound as an input for control or communication. For instance, a user can control settings in an automobile by voice commands while preventing noise from other passengers from disrupting the commands. Other applications may include computer control of applications, such as browsing applications, document preparation, or communications. By enabling this filtering, it is possible to issue voice or sound commands more effectively without interruption by surrounding sounds. As such, any electronic apparatus may be used.

Further, the embodiments of the present invention have a wide array of applications, and the scope of the claims should be read to include any application that can benefit from such embodiments.

For instance, in a similar application, it may be possible to filter out sound sources using sound analysis. If sound analysis is used, it is possible to use as few as one microphone. The sound captured by the single microphone can be digitally analyzed to determine which voice or sound is of interest. In some environments, such as gaming, it may be possible for the primary user to record his or her voice once to train the system to identify the particular voice. In this manner, exclusion of other voices or sounds will be facilitated. Consequently, it would not be necessary to identify a direction, because the filtering is done based on sound tones and/or frequencies.
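The single-microphone variant can be illustrated with a deliberately simplified sketch. It assumes that a voice can be characterized by one dominant frequency learned from a single enrollment recording, which is far cruder than practical speaker identification and is not stated in the specification. The functions `band_power`, `dominant_frequency`, `matches_enrolled_voice`, and `tone`, and the 20 Hz tolerance, are hypothetical choices made for the example.

```python
import math


def band_power(samples, sample_rate, freq_hz):
    """Power of the signal at one frequency, computed as a single-bin DFT."""
    re = im = 0.0
    for n, x in enumerate(samples):
        angle = 2.0 * math.pi * freq_hz * n / sample_rate
        re += x * math.cos(angle)
        im -= x * math.sin(angle)
    return (re * re + im * im) / len(samples)


def dominant_frequency(samples, sample_rate, candidates_hz):
    """Return the candidate frequency that carries the most power."""
    return max(candidates_hz, key=lambda f: band_power(samples, sample_rate, f))


def matches_enrolled_voice(samples, sample_rate, enrolled_hz, candidates_hz,
                           tolerance_hz=20.0):
    """Accept a sound only if its dominant frequency is close to the enrolled one."""
    measured = dominant_frequency(samples, sample_rate, candidates_hz)
    return abs(measured - enrolled_hz) <= tolerance_hz


def tone(freq_hz, sample_rate=8000, n=800):
    """Synthetic test signal standing in for a recorded voice."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]
```

In use, the enrollment recording would be passed once to `dominant_frequency` to obtain `enrolled_hz`, and later sounds would be accepted or rejected by `matches_enrolled_voice` without any direction estimate.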

All of the advantages mentioned above with respect to sound filtering are equally applicable when direction and volume are taken into account.

With the above embodiments in mind, it should be understood that the invention may employ various computer-implemented operations involving data stored in computer systems. These operations include operations requiring physical manipulation of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. Further, the manipulations performed are often referred to in terms such as producing, identifying, determining, or comparing.

The above-described invention may be practiced with other computer system configurations, including hand-held devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.

The invention can also be embodied as computer-readable code on a computer-readable medium. The computer-readable medium is any data storage device that can store data which can be read by a computer system, including an electromagnetic wave carrier. Examples of the computer-readable medium include hard drives, network attached storage (NAS), ROM, RAM, CD-ROMs, CD-Rs, CD-RWs, magnetic tapes, and other optical and non-optical data storage devices. The computer-readable medium can also be distributed over a network of coupled computer systems so that the computer-readable code is stored and executed in a distributed fashion.

Although the foregoing invention has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.


Claims

An image capture unit configured to capture one or more image frames; and

A sound capture unit configured to identify one or more sound sources, the sound capture unit processing the sound to generate data that can be analyzed to determine the focus zone by substantially excluding sound outside the focus zone,

Sound captured and processed for the focus zone is used for interaction with a computer program.

The method of claim 1,

The device for capturing images and sounds when interacting with a computer program, characterized in that the sound capture unit includes an array of microphones, the array of microphones is set to receive sound from the one or more sound sources, and the path along which the sound travels from the one or more sound sources to each of the microphones defines a sound path for each of the microphones.

The method of claim 2,

The sound paths of different lengths provide a specific delay that enables the calculation of the direction of each of the one or more sound sources with respect to the image and sound capture device.

The method of claim 1,

And a computing system for interfacing with the image and sound capture device,

The computing system includes a processor and

A memory configured to store at least a portion of the computer program and a selective sound source listening code,

The selective sound source listening code is capable of identifying one or more sound sources according to a focus zone.

The method of claim 1,

And said sound capture unit comprises at least four microphones, wherein at least one of said four microphones is not coplanar with the other microphones.

The method of claim 5,

Wherein the four microphones are used to define a spatial volume.

The method of claim 6,

And the spatial volume is a focus volume for listening when interacting with the computer program.

The method of claim 7, wherein

The device for capturing images and sound when interacting with the computer program, characterized in that the computer program is a game program.

The method of claim 1,

10. The method of claim 9,

Wherein said image capture unit is a camera and said sound capture unit comprises an arrangement of two or more microphones.

Receiving input from one or more sound sources at two or more sound source capture microphones;

Determining a delay path from each of the sound sources;

Identifying a direction for each of the received inputs of each of the one or more sound sources;

Filtering a sound source that is not in the identified direction of the focus zone, wherein the focus zone supplies a sound source for interaction with the computer program.

The method of claim 11,

Wherein said computer program is a game, said game receives interactive input from both image data and sound data, said sound data coming from said sound source in said focus zone.

The method of claim 11,

Wherein said at least two sound capture microphones comprise at least four microphones, at least one of said four microphones being in a plane different from said other microphones.

The method of claim 13,

Identifying a direction for each of the received inputs of each of the one or more sound sources includes processing a triangulation algorithm,

And said triangulation algorithm determines a direction relative to a position at which said at least two sound source capture microphones receive input from said at least one sound source.

The method of claim 14,

Buffering input received from one or more sound sources associated with the two or more sound source capture microphones; And

Delaying the received and buffered input;

The filtering may further include selecting one of the sound sources.

And the selected sound source output is a summation of sounds from each of the sound source capture microphones.

An image-sound capture device configured to interface with a computing system that enables the execution of an interactive computer game,

The image-sound capture device includes video capture hardware that can be positioned to capture video from a focus area and

An array of microphones for capturing sound from one or more sound sources,

Each sound source is identified and associated with a direction relative to the image-sound capture device, wherein the focus zone associated with the video capture hardware is configured to be used to identify one of the sound sources in the direction in the focus zone. Game system.

The method of claim 16,

And the video capture hardware receives video data to enable shape and interactivity of the computer game.

The method of claim 16,

The sound source in the focus zone enables interaction with the computer game or voice communication with other game users.

The method of claim 18,

A sound source outside of the focus zone is filtered out of the interaction with the computer game.

A sound capture unit for capturing sound from one or more sound sources;

A processor and memory for processing and receiving the sound,

The processor is configured to execute instructions for identifying one of the sound sources in the focus zone, and the sound from the identified sound source is processed to enable interactive input with the computer program. Sound capture device.

21. The method of claim 20,

And instructions for identifying one of the sound sources use triangulation to identify the direction of each of the sound sources.

21. The method of claim 20,

And instructions for identifying one of the sound sources use sound frequencies to identify each of the sound sources.

21. The method of claim 20,

And wherein said interactive input is one of a communication with a program or a communication with a third party.

21. The method of claim 20,

And the input used for the interactive input interfaces with the shape of a computer game.

21. The method of claim 20,

And the interaction input interfaces with an electronic device.

delete