KR101250610B1

KR101250610B1 - User recognition system for controlling audio using stereo camera

Info

Publication number: KR101250610B1
Application number: KR1020110016915A
Authority: KR
Inventors: 조준동; 박득현; 함헌호
Original assignee: 성균관대학교산학협력단
Priority date: 2011-02-25
Filing date: 2011-02-25
Publication date: 2013-04-03
Also published as: KR20120097607A

Abstract

The user recognition sound control system 100 using a stereo camera according to the present invention includes: a display unit 110 that displays a game content image; a sound output unit 120 including one or more speakers that output sound according to the game content displayed on the display unit 110; an image acquisition unit 130 that acquires a user image with a stereo camera; a face position detection unit 140 that detects the position of the user's face in the image acquired by the image acquisition unit 130; an ear position prediction unit 150 that predicts the position of the user's ears from the face position detected by the face position detection unit 140; and a sound signal adjustment unit 160 that adjusts the position of a speaker or the output angle of a speaker according to the ear position predicted by the ear position prediction unit.

Description

USER RECOGNITION SYSTEM FOR CONTROLLING AUDIO USING STEREO CAMERA

The present invention relates to a user recognition sound control system using a stereo camera and to a method of adjusting sound in that system. In particular, it relates to a sound control system that outputs optimal sound by recognizing the position of the user's ears with a stereo camera, and to a method of adjusting sound in that system.

In conventional home theater or game systems, the speakers are fixed in place, so the user (listener) is confined to a limited area. As a result, when the user moves, the sound heard differs from place to place.

In particular, among game consoles used with a digital TV, those that recognize the user's motion require the user to move, yet the user must stay within a limited area to get the optimal sound effect.

The inconvenience is that the user can experience sound and a sense of presence closest to the original source only by manually repositioning the speakers as the user moves.

The user recognition sound control system using a stereo camera according to the present invention, and the method of adjusting sound in that system, aim to solve the following problems.

First, to output optimal sound effects to a user playing a game.

Second, to detect the position of a dynamically moving user and provide the optimal sound effect for that position.

Third, to maintain consistent sound quality even when a user listening through a home theater or audio system moves about in the ordinary course of activity.

The problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the description below.

The user recognition sound control system using a stereo camera according to the present invention includes: a display unit that displays a game content image; a sound output unit including one or more speakers that output sound according to the game content displayed on the display unit; an image acquisition unit that acquires a user image with a stereo camera; a face position detection unit that detects the position of the user's face in the image acquired by the image acquisition unit; an ear position prediction unit that predicts the position of the user's ears from the face position detected by the face position detection unit; and a sound signal adjustment unit that adjusts the position of a speaker or the output angle of a speaker according to the ear position predicted by the ear position prediction unit.

The image acquisition unit according to the present invention includes: a deviation error correction unit that corrects deviation errors arising in the images captured by the two cameras; a depth map generation unit that finds matching points between the two cameras' images in the corrected images and generates a depth map image; and a position determination unit that determines the user's position from the depth map generated by the depth map generation unit.

The image acquisition unit according to the present invention further includes a noise removal unit that removes noise from the depth map image generated by the depth map generation unit.

The ear position prediction unit according to the present invention identifies the position of the nose, which is the center of the face, in the image of the face position detected by the face position detection unit, and thereby predicts the positions of both ears.

The user recognition sound control system according to the present invention may further include a sound information database unit that stores optimal sound information data, that is, the sound that should reach both of the user's ears, according to the type and intensity of the sound output from the speakers.

The sound signal adjustment unit according to the present invention searches the sound information database unit for sound information matching the type and intensity of the sound output from the sound output unit, and adjusts the position or output angle of the speakers of the sound output unit based on the retrieved sound information.

The method of adjusting sound in a user recognition system using a stereo camera according to the present invention includes: step S1, in which sound is output through one or more speakers according to game content; step S2, in which a user image is acquired using a stereo camera; step S3, in which the position of the user's face is detected in the user image acquired in step S2; step S4, in which the position of the user's ears is predicted from the face position detected in step S3; and step S5, in which the position of a speaker or the output angle of a speaker is adjusted according to the ear position predicted in step S4.

Step S2 according to the present invention includes: step S2-1, in which deviation errors arising in the images captured by the two cameras are corrected; step S2-2, in which matching points between the two cameras' images are found in the images corrected in step S2-1 and a depth map is generated; and step S2-3, in which the user's position is determined from the depth map generated in step S2-2.

Step S2 according to the present invention may further include step S2-4, performed after step S2-2, in which noise is removed from the depth map.

In step S4 according to the present invention, the positions of both ears are predicted based on the position of the nose, which is the center of the face, in the image of the face position detected in step S3.

The method of adjusting sound in the user recognition system according to the present invention may further include step S0, in which optimal sound information data, that is, the sound that should reach both of the user's ears according to the type and intensity of the sound output from the speakers, is stored in advance in a sound information database.

In step S5 according to the present invention, sound information matching the type and intensity of the sound output in step S1 is retrieved from the sound information database unit, and the position of a speaker or the output angle of a speaker is adjusted based on the retrieved sound information.

The user recognition sound control system using a stereo camera according to the present invention, and the method of adjusting sound in that system, have the following effects.

First, optimal sound is maintained for the user even when a user receiving music or sound effects through a home theater or audio system moves around the room.

Second, in a console game connected to a home theater or TV, the optimal sound effect is maintained even when the user's movement serves as the interface or the user moves while using a particular interface device.

Third, the optimal sound effect for the left and right ears is maintained even when the user's face turns and the ear positions change.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 is a block diagram schematically showing the configuration of a user recognition sound control system using a stereo camera according to the present invention.
FIG. 2 is a view showing an example configuration of the sound control system of the present invention.
FIG. 3 is a flowchart showing the sequence of a method of adjusting sound in a user recognition system using a stereo camera according to the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. This is not intended to limit the present invention to particular embodiments, and it should be understood to include all modifications, equivalents, and substitutes falling within the spirit and technical scope of the present invention.

Terms such as first, second, A, and B may be used to describe various components, but the components are not limited by these terms; the terms serve only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be called a second component, and similarly a second component may be called a first component. The term "and/or" includes a combination of a plurality of related listed items or any one of a plurality of related listed items.

As used in this specification, singular expressions should be understood to include plural expressions unless the context clearly indicates otherwise. Terms such as "comprise" mean that the stated features, numbers, steps, operations, components, parts, or combinations thereof are present, and should be understood not to exclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Hereinafter, the user recognition sound control system 100 using a stereo camera and the method of adjusting sound in that system are described in detail with reference to the drawings.

Before describing the drawings in detail, it should be made clear that the components in this specification are distinguished merely by the main function each performs. That is, two or more components described below may be combined into one, or one component may be divided into two or more according to more finely subdivided functions. Each component described below may also perform some or all of the functions of other components in addition to its own main function, and some of a component's main functions may be performed exclusively by another component. Accordingly, the presence or absence of each component described in this specification should be interpreted functionally, and for this reason the configuration of the components of the present invention may differ from FIG. 1 to the extent that the object of the present invention can still be achieved.

FIG. 1 is a block diagram schematically showing the configuration of the user recognition sound control system 100 using a stereo camera according to the present invention.

The user recognition sound control system 100 using a stereo camera according to the present invention includes: a display unit 110 that displays a game content image; a sound output unit 120 including one or more speakers that output sound according to the game content displayed on the display unit 110; an image acquisition unit 130 that acquires a user image with a stereo camera; a face position detection unit 140 that detects the position of the user's face in the image acquired by the image acquisition unit 130; an ear position prediction unit 150 that predicts the position of the user's ears from the face position detected by the face position detection unit 140; and a sound signal adjustment unit 160 that adjusts the position of a speaker or the output angle of a speaker according to the ear position predicted by the ear position prediction unit. This sound control system 100 has the display unit 110 on which the game content image is displayed. It is the sound control system 100 for the case where a user plays a game on a console game machine connected to a display device.

In another aspect of the present invention, the user recognition sound control system 100 using a stereo camera includes: a sound output unit 120 including one or more speakers that output sound; an image acquisition unit 130 that acquires a user image with a stereo camera; a face position detection unit 140 that detects the position of the user's face in the image acquired by the image acquisition unit 130; an ear position prediction unit 150 that predicts the position of the user's ears from the face position detected by the face position detection unit 140; and a sound signal adjustment unit 160 that adjusts the position of a speaker or the output angle of a speaker according to the ear position predicted by the ear position prediction unit. This is the sound control system 100 for the case where sound effects or music are heard through a home theater or audio device.

The sound output unit 120 includes one or more speakers and covers various types of sound systems: every configuration that can be implemented with one or more speakers, from two-channel to surround.

The image acquisition unit includes: a deviation error correction unit 131 that corrects deviation errors arising in the images captured by the two cameras; a depth map generation unit 132 that finds matching points between the two cameras' images in the images corrected by the deviation error correction unit 131 and generates a depth map image; and a position determination unit 133 that determines the user's position from the depth map generated by the depth map generation unit 132.

The depth map generated by the depth map generation unit 132 holds distance information about the photographed object (subject), that is, information about where the object is located relative to the cameras used for capture.
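How matched points turn into distance can be illustrated with the standard relation for rectified stereo cameras, Z = f · B / d, where d is the disparity between matched points. The specification does not state this formula; the sketch below is only one plausible reading, and `disparity_to_depth`, `focal_px`, and `baseline_m` are hypothetical names introduced for illustration.

```python
from typing import List, Optional


def disparity_to_depth(
    disparity_px: List[List[float]],
    focal_px: float,
    baseline_m: float,
) -> List[List[Optional[float]]]:
    """Convert a disparity map (pixels) into a depth map (metres).

    Assumes rectified cameras, so depth Z = focal_px * baseline_m / disparity.
    Pixels without a valid match (disparity <= 0) become None.
    """
    return [
        [focal_px * baseline_m / d if d > 0 else None for d in row]
        for row in disparity_px
    ]


# Illustrative values only: 700 px focal length, 10 cm camera baseline.
depth_map = disparity_to_depth(
    [[35.0, 70.0], [0.0, 14.0]], focal_px=700.0, baseline_m=0.1
)
```

Larger disparities correspond to closer objects, which is why a user standing near the cameras shows up as a high-disparity region.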

The image acquisition unit preferably further includes a noise removal unit 134 that removes noise from the depth map image generated by the depth map generation unit 132. Any of the various noise removal methods known to those of ordinary skill in the art may be used.
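The specification leaves the noise removal method open. As one common choice, and purely as an assumption, a 3x3 median filter suppresses isolated outliers in a depth map; the function name `median_filter_3x3` is hypothetical.

```python
from statistics import median
from typing import List


def median_filter_3x3(depth: List[List[float]]) -> List[List[float]]:
    """Replace each pixel with the median of its 3x3 neighbourhood.

    Border pixels use only the neighbours that exist inside the image.
    """
    height = len(depth)
    width = len(depth[0]) if height else 0
    filtered = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                depth[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(median(window))
        filtered.append(row)
    return filtered


# A single spurious 9 m reading surrounded by 2 m readings is removed.
noisy = [
    [2.0, 2.0, 2.0],
    [2.0, 9.0, 2.0],
    [2.0, 2.0, 2.0],
]
cleaned = median_filter_3x3(noisy)
```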

The ear position prediction unit 150 identifies the position of the nose, which is the center of the face, in the image of the face position detected by the face position detection unit 140, and thereby predicts the positions of both ears. Of course, the ear position itself could also be identified from the depth map image.

However, to adjust the position or output angle of the speakers while detecting the user's position in real time as the user moves dynamically, the user's position must be detected quickly.

Accordingly, the face region may be detected in the depth map image, the ears may be predicted to lie at the left and right ends of the central part of the detected face region, and the sound adjusted on that basis. In addition, once the position of the nose at the center of the face region is identified, the positions of the ears, which lie roughly on a horizontal line with the nose, can be detected relatively quickly and accurately.
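The two fast heuristics described here can be sketched as follows. This is a minimal illustration, assuming the face region is an axis-aligned bounding box in image coordinates; `FaceBox`, `ears_from_face_box`, and `ears_from_nose` are hypothetical names, and "left" and "right" refer to the image, not to the user's own left and right.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in image pixels


@dataclass
class FaceBox:
    x: float       # left edge of the detected face region
    y: float       # top edge of the detected face region
    width: float
    height: float


def ears_from_face_box(face: FaceBox) -> Tuple[Point, Point]:
    """Assume the ears sit at the left and right ends of the face's vertical centre."""
    centre_y = face.y + face.height / 2.0
    return (face.x, centre_y), (face.x + face.width, centre_y)


def ears_from_nose(face: FaceBox, nose: Point) -> Tuple[Point, Point]:
    """Assume the ears lie on the horizontal line through the nose, at the face edges."""
    _, nose_y = nose
    return (face.x, nose_y), (face.x + face.width, nose_y)


face = FaceBox(x=100.0, y=50.0, width=80.0, height=100.0)
box_ears = ears_from_face_box(face)
nose_ears = ears_from_nose(face, nose=(140.0, 110.0))
```

Both functions avoid searching for the ears directly, which is the speed advantage the text points to.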

The position of the ears within the face region, or the position of the ears relative to the nose, differs from person to person. Therefore, for accurate sound output, it is desirable to store ear position information in advance and use it for ear position prediction. That is, before the game begins, it is desirable to scan the user's face region or photograph it with a camera and to store in the system, in advance, information such as the exact ear position on the face contour, the position of the ears relative to the nose, or the position of the ears relative to the eyes.
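One way to use such pre-stored information is to keep each user's ear offsets relative to the nose and apply them at run time. The sketch below is an assumption about how this could be represented: offsets are stored as fractions of the face width so they scale with distance, and `EarCalibration` and `predict_ears` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in image pixels


@dataclass(frozen=True)
class EarCalibration:
    """Per-user ear offsets from the nose, as fractions of the face width."""
    left_dx: float
    left_dy: float
    right_dx: float
    right_dy: float


def predict_ears(
    nose: Point, face_width: float, calib: EarCalibration
) -> Tuple[Point, Point]:
    """Apply the stored offsets to the detected nose position."""
    nose_x, nose_y = nose
    left = (nose_x + calib.left_dx * face_width, nose_y + calib.left_dy * face_width)
    right = (nose_x + calib.right_dx * face_width, nose_y + calib.right_dy * face_width)
    return left, right


# Calibration captured once before play; the values here are placeholders.
calib = EarCalibration(left_dx=-0.5, left_dy=0.0, right_dx=0.5, right_dy=0.0)
ears = predict_ears(nose=(140.0, 110.0), face_width=80.0, calib=calib)
```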

Face contour detection, nose detection, and eye detection can be performed using various image processing methods available to those of ordinary skill in the art.

The user recognition sound control system 100 preferably further includes a sound information database unit that stores optimal sound information data, that is, the sound that should reach both of the user's ears, according to the type and intensity of the sound output from the speakers.

For example, when listening to music on an audio system, it may be desirable for the speaker arrangement and/or the horizontal or vertical angle of the speakers to vary with the type or volume of the music. When a game is played on a console game machine connected to a display device, it is desirable for the sound effects, which depend on the type or content of the game, to vary relative to the position of the participating user. Ultimately, the sound information database unit stores the specific speaker position, the speaker output angle, whether the speaker moves (outputting sound while moving), and similar settings needed to produce the optimal sound effect.

Speaker movement is intended to reflect cases where a particular object in a game, or in video viewed through a home theater, moves away from or toward the user. A similar effect can be obtained merely by changing the output of the sound effect, but moving the speaker itself may be preferable for a more dynamic sound effect.

The sound signal adjustment unit 160 searches the sound information database unit for sound information matching the type and intensity of the sound output from the sound output unit 120, and adjusts the position or output angle of the speakers of the sound output unit 120 based on the retrieved sound information.
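The lookup described here can be sketched as a keyed table. The specification does not define the database schema, so everything below is an assumption: the key is (sound type, intensity band), the stored fields and the 70 dB threshold are placeholders, and `SoundInfo`, `SOUND_DB`, and `lookup_sound_info` are hypothetical names.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class SoundInfo(NamedTuple):
    speaker_offset_m: float  # how far to move the speaker (placeholder field)
    pan_deg: float           # left-right output angle
    tilt_deg: float          # up-down output angle


SoundKey = Tuple[str, str]  # (sound type, intensity band)

# Placeholder entries for illustration only; real data would be stored in advance.
SOUND_DB: Dict[SoundKey, SoundInfo] = {
    ("music", "low"): SoundInfo(0.0, 0.0, 0.0),
    ("music", "high"): SoundInfo(0.0, 5.0, 0.0),
    ("effect", "high"): SoundInfo(0.3, 10.0, -5.0),
}


def intensity_band(level_db: float, threshold_db: float = 70.0) -> str:
    """Bucket a sound level into a coarse band used as part of the key."""
    return "high" if level_db >= threshold_db else "low"


def lookup_sound_info(sound_type: str, level_db: float) -> Optional[SoundInfo]:
    """Return the stored settings matching the sound type and intensity, if any."""
    return SOUND_DB.get((sound_type, intensity_band(level_db)))


found = lookup_sound_info("effect", 80.0)
missing = lookup_sound_info("speech", 50.0)
```

Returning `None` for an unmatched key leaves the caller free to keep the current speaker settings.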

Furthermore, in the case of a game, sound information related to speaker adjustment may be stored in advance in the game content data, held on a memory card or DVD. In this case, once the user's position is determined, the sound effect need only be output according to the stored sound information.

Because the sound signal adjustment unit 160 must move the speakers or change their output direction (angle), it must include a mechanism for doing so. To move a speaker, it is desirable to have a rail device, or a drive device such as wheels on the speaker itself. To rotate a speaker, the speaker itself or the case housing it must be equipped with a rotation device. Alternatively, instead of physically rotating the speaker, an acoustic control device that produces an effect similar to rotating the speaker may be used.
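The output angle that such a mechanism would be driven to can be computed from the speaker position and the predicted ear position. The specification gives no formula, so the following is a geometric sketch under assumed conventions: coordinates are (x right, y up, z forward) in metres, and `aim_angles` and `midpoint` are hypothetical helper names.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]  # (x right, y up, z forward), metres


def midpoint(a: Vec3, b: Vec3) -> Vec3:
    """Point halfway between the two ears, used as the aiming target."""
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0, (a[2] + b[2]) / 2.0)


def aim_angles(speaker: Vec3, target: Vec3) -> Tuple[float, float]:
    """Return (pan, tilt) in degrees that point the speaker at the target.

    Pan is the left-right angle measured from the forward (z) axis;
    tilt is the up-down angle measured from the horizontal plane.
    """
    dx = target[0] - speaker[0]
    dy = target[1] - speaker[1]
    dz = target[2] - speaker[2]
    pan = math.degrees(math.atan2(dx, dz))
    tilt = math.degrees(math.atan2(dy, math.hypot(dx, dz)))
    return pan, tilt


head = midpoint((0.9, 0.0, 1.0), (1.1, 0.0, 1.0))
pan_deg, tilt_deg = aim_angles((0.0, 0.0, 0.0), head)
```

The same angles could feed either a physical rotation device or a signal-processing stage that emulates rotation.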

FIG. 2 is a view showing an example configuration of the sound control system of the present invention. It depicts a user playing a game on a console game machine connected to a digital TV (display unit 110). The console game machine outputs sound through the sound output unit 120, which includes a plurality of speakers and to which it is connected either through the display unit 110 or directly.

FIG. 2 is described assuming a game played by capturing motion with a camera, but the system also applies to the traditional style of play in which the user plays through an interface device such as a remote controller. When the user moves during the game, or when the user's movement itself acts as the interface, the system detects the user's movement and outputs optimal sound.

In particular, the system recognizes the position of the user's face and predicts the ear position in order to adjust the position and output angle of the speakers. The speakers at the bottom of FIG. 2 illustrate the moving case, and the main speakers beside the display unit 110 are shown rotating. Speaker rotation is possible not only left and right but also at various other angles, such as up and down.

The method of adjusting sound in a user recognition system using a stereo camera is described below. Content that overlaps with the user recognition sound control system 100 described above is covered only briefly.

FIG. 3 is a flowchart showing the sequence of a method of adjusting sound in a user recognition system using a stereo camera according to the present invention.

Step S2 includes: step S2-1, in which deviation errors arising in the images captured by the two cameras are corrected; step S2-2, in which matching points between the two cameras' images are found in the images corrected in step S2-1 and a depth map is generated; and step S2-3, in which the user's position is determined from the depth map generated in step S2-2.

Furthermore, step S2 preferably further includes step S2-4, in which noise is removed from the depth map. In terms of the image processing pipeline, step S2-4 is preferably performed after step S2-2.
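The ordering of the sub-steps of step S2 can be sketched as a small pipeline. The text states only that step S2-4 follows step S2-2; running it before step S2-3, so that the position is determined from the cleaned depth map, is an assumption made here. The function `run_step_s2` and its callable parameters are hypothetical.

```python
from typing import Any, Callable, Tuple


def run_step_s2(
    left_image: Any,
    right_image: Any,
    correct_deviation: Callable[[Any, Any], Tuple[Any, Any]],
    build_depth_map: Callable[[Any, Any], Any],
    remove_noise: Callable[[Any], Any],
    locate_user: Callable[[Any], Any],
) -> Any:
    """Run the sub-steps of S2 in order and return the user's position."""
    left_fixed, right_fixed = correct_deviation(left_image, right_image)  # S2-1
    depth = build_depth_map(left_fixed, right_fixed)                      # S2-2
    depth = remove_noise(depth)                                           # S2-4
    return locate_user(depth)                                             # S2-3


# Stub stages that only record the order in which they were called.
order = []
position = run_step_s2(
    "L",
    "R",
    correct_deviation=lambda l, r: (order.append("S2-1") or l, r),
    build_depth_map=lambda l, r: order.append("S2-2") or "depth",
    remove_noise=lambda d: order.append("S2-4") or d,
    locate_user=lambda d: order.append("S2-3") or (0.0, 0.0, 2.0),
)
```

Passing the stages in as callables keeps the ordering logic independent of whichever rectification, matching, and filtering methods are actually chosen.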

In step S4, the positions of both ears may be predicted based on the position of the eyes, or on the position of the nose at the center of the face, in the image of the face position detected in step S3. The ear position may also be predicted by various other methods.

The method of adjusting sound in the user recognition system preferably further includes step S0, in which optimal sound information data, that is, the sound that should reach both of the user's ears according to the type and intensity of the sound output from the speakers, is stored in advance in a sound information database. Alternatively, as described above, sound information optimal for a given movie or game may be stored in advance on a DVD title viewed through a home theater, or on a game pack (memory) or DVD title run on a console game machine, and then read and used by the system.

When the sound information database unit is used, in step S5 sound information matching the type and intensity of the sound output in step S1 is retrieved from the sound information database unit, and the position of a speaker or the output angle of a speaker is adjusted based on the retrieved sound information.

The present embodiment and the accompanying drawings merely illustrate part of the technical idea included in the present invention. It is evident that all modifications and specific embodiments that those skilled in the art can readily infer within the scope of the technical idea contained in the specification and drawings fall within the scope of the present invention.

100: user recognition sound control system using stereo camera
110: display unit
120: sound output unit
130: image acquisition unit
131: deviation error correction unit
132: depth map generator
133: position determiner
134: noise removal unit
140: face position detector
150: ear position predictor
160: sound signal adjusting unit

Claims

A display unit for displaying a game content image;
A sound output unit including one or more speakers that output sound according to the game content displayed on the display unit;
An image acquisition unit for acquiring a user image with a stereo camera;
A face position detector for detecting a face position of the user from the image acquired by the image acquisition unit;
An ear position predictor for predicting the user's ear position from the face position detected by the face position detector; and
A sound signal adjusting unit for adjusting the position of the speaker or the output angle of the speaker according to the user's ear position predicted by the ear position predictor,
wherein the ear position predictor predicts the positions of both ears, in the image of the face position detected by the face position detector, with reference to the eye positions or to the nose position, which is the center of the face,
A user recognition sound control system using a stereo camera, characterized in that the ears are predicted to be located at the center of the detected face region or at the left and right ends of the nose position.

The system of claim 1,
wherein the image acquisition unit comprises:
A deviation error correction unit for correcting a deviation error occurring between the images captured by the two cameras;
A depth map generator for generating a depth map image by finding matching points between the two cameras' images in the images corrected by the deviation error correction unit; and
A position determiner for determining the user's position from the depth map generated by the depth map generator.

The system of claim 2,
wherein the image acquisition unit further comprises
a noise removal unit for removing noise from the depth map image generated by the depth map generator.

delete

The system of claim 1,
wherein the user recognition sound control system further comprises
a sound information database unit for storing optimal sound information data that should reach both ears of the user according to the type and intensity of the sound output from the speaker.

The system of claim 5,
wherein the sound signal adjusting unit
searches the sound information database unit for sound information matching the type and intensity of the sound output from the sound output unit, and adjusts the position or output angle of the speaker of the sound output unit based on the retrieved sound information.

Step S1 of outputting sound through at least one speaker according to game content;
Step S2 of acquiring a user image using a stereo camera;
Step S3 of detecting a face position of the user from the user image acquired in step S2;
Step S4 of predicting the position of the user's ears from the face position detected in step S3; and
Step S5 of adjusting the position of the speaker or the output angle of the speaker according to the ear position predicted in step S4,
wherein, in step S4, the positions of both ears are predicted, in the image of the face position detected in step S3, with reference to the eye positions or to the nose position, which is the center of the face,
A method of adjusting sound in a user recognition system using a stereo camera, characterized in that the ears are predicted to be located at the center of the detected face region or at the left and right ends of the nose position.

The method of claim 7,
wherein step S2 comprises:
Step S2-1 of correcting the deviation error occurring between the images captured by the two cameras;
Step S2-2 of generating a depth map by finding matching points between the two cameras' images in the images corrected in step S2-1; and
Step S2-3 of determining the user's position from the depth map generated in step S2-2.

The method of claim 8,
wherein step S2 further comprises
step S2-4 of removing noise from the depth map after step S2-2.

delete

The method of claim 7,
wherein the method of adjusting sound in the user recognition system further comprises
step S0 of storing in advance, in a sound information database, optimal sound information data that should reach both ears of the user according to the type and intensity of the sound output from the speaker.

The method of claim 11,
wherein, in step S5,
sound information matching the type and intensity of the sound output in step S1 is retrieved from the sound information database unit, and the position of the speaker or the output angle of the speaker is adjusted based on the retrieved sound information.