KR100733964B1

KR100733964B1 - A game apparatus and method using motion capture and voice recognition

Info

Publication number: KR100733964B1
Application number: KR1020050113655A
Authority: KR
Inventors: 유정재; 박창준; 이인호
Original assignee: 한국전자통신연구원
Priority date: 2005-11-25
Filing date: 2005-11-25
Publication date: 2007-06-29
Also published as: KR20070055210A

Abstract

1. Technical field of the invention described in the claims

The present invention relates to a game apparatus and method using motion recognition and voice recognition.

2. Technical problem to be solved by the invention

An object of the present invention is to provide a game apparatus and method using motion recognition and voice recognition that recognize a command by combining two results and then control the motion of a character and the corresponding sound. The first is a motion recognition result: an object is extracted from binocular (stereo) images having different camera information, depth information is calculated using binocular disparity, the position of the object in three-dimensional space is obtained using the calculated depth information, and full motion data is generated through an inverse kinematics algorithm and recognized. The second is a voice recognition result: the phrases of a sentence are separated using feature points extracted from the voice (sentence) and the motion recognition result received from the motion recognition means, and each phrase is then recognized.

3. Summary of the solution

The present invention provides a game apparatus using motion recognition and voice recognition, comprising: storage means for storing per-user command data; motion recognition means for recognizing motion by extracting an object from binocular images having different camera information, calculating depth information using binocular disparity, obtaining the position of the object in three-dimensional space using the calculated depth information, and generating full motion data through an inverse kinematics algorithm; isolated word recognition means for separating the phrases of a voice input consisting of at least one phrase, using feature points extracted from the voice and the motion recognition result received from the motion recognition means, and then recognizing the phrases; command recognition means for recognizing a command by combining the recognition result of the isolated word recognition means with the recognition result of the motion recognition means; and central processing means for controlling the motion of a character and the corresponding sound according to the recognition result (command data) of the command recognition means, and for storing the recognition result in the storage means.

4. Important uses of the invention

The present invention is used in game apparatuses and the like.

Motion recognition, voice recognition, combination, virtual space, inverse kinematics algorithm, isolated word recognition, game apparatus

Description

Game apparatus and method using motion recognition and voice recognition {A game apparatus and method using motion capture and voice recognition}

FIG. 1 is a block diagram of an embodiment of a game apparatus using motion recognition and voice recognition according to the present invention;

FIG. 2 illustrates an embodiment of the calibration-object method, one of the camera calibration methods used in the present invention;

FIG. 3 illustrates an embodiment of the self-calibration method, one of the camera calibration methods used in the present invention;

FIG. 4 shows an example implementation of a game apparatus using motion recognition and voice recognition according to the present invention;

FIG. 5 illustrates an embodiment of the process of determining phrase boundaries using motion information according to the present invention;

FIG. 6 illustrates an embodiment of a game method using a network according to the present invention;

FIG. 7 illustrates an embodiment of the inverse kinematics algorithm used in the present invention; and

FIG. 8 is a flowchart of an embodiment of a game method using motion recognition and voice recognition according to the present invention.

* Explanation of reference numerals for the main parts of the drawings

10: storage unit                 20: input unit

30: motion recognition unit      40: voice recognition unit

50: command recognition unit     60: output unit

70: Internet connection unit     80: central processing unit

The present invention relates to a game apparatus and method using motion recognition and voice recognition. More particularly, it relates to a game apparatus and method that combine the strengths and weaknesses of image-based motion recognition and voice recognition so that the type, direction, and intensity of various actions, as well as mode changes and the like, can be used freely, providing a more dynamic and engaging game than conventional games based on motion recognition or voice recognition alone.

The basic principle of human and animal movement is the rotation of joints. The movement of each joint is generally limited; if these limits are defined for the joints set up in a model, then when a lower part of the model is moved, the upper joints move to reflect it. Inverse kinematics is an algorithm in which, rather than the movement of an upper joint affecting the lower joints, the movement of the upper joints is calculated automatically, within the limited range, from the movement of a lower joint.

When a person raises an arm, it is natural to move with the intention of bringing the hand to a certain position; a person rarely intends to grasp an object by rotating the shoulder blade by a certain number of degrees. For this reason, forward kinematics is an unintuitive way of working from the animator's point of view. The inverse kinematics algorithm is the approach that compensates for this drawback.
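As an illustration of the idea that joint angles are computed from a desired hand position, the following is a minimal sketch of analytic inverse kinematics for a planar two-link arm. It is not the algorithm of the present invention, which is not specified at this level of detail; the function names `two_link_ik` and `forward`, the planar simplification, and the link lengths are all assumptions made for the example.

```python
import math

def two_link_ik(x, y, l1, l2):
    """Return (shoulder, elbow) angles in radians that place the hand at (x, y).

    Hypothetical planar arm: upper arm of length l1, forearm of length l2,
    shoulder at the origin. Only one of the two mirror solutions is returned.
    """
    d2 = x * x + y * y
    d = math.sqrt(d2)
    if d > l1 + l2 or d < abs(l1 - l2):
        raise ValueError("target out of reach")
    # Law of cosines gives the elbow bend from the shoulder-to-hand distance.
    cos_elbow = (d2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    cos_elbow = max(-1.0, min(1.0, cos_elbow))  # guard against rounding
    elbow = math.acos(cos_elbow)
    shoulder = math.atan2(y, x) - math.atan2(l2 * math.sin(elbow),
                                             l1 + l2 * math.cos(elbow))
    return shoulder, elbow

def forward(shoulder, elbow, l1, l2):
    """Forward kinematics: hand position reached by the given joint angles."""
    hx = l1 * math.cos(shoulder) + l2 * math.cos(shoulder + elbow)
    hy = l1 * math.sin(shoulder) + l2 * math.sin(shoulder + elbow)
    return hx, hy
```

Feeding the angles returned by `two_link_ik` back into `forward` should reproduce the requested hand position, which is a convenient way to check such a solver.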

The HMM (Hidden Markov Model) is a method of modeling the basic unit (phoneme) for speech recognition. That is, words and sentences are formed by matching the phonemes entering the speech recognition engine against the phonemes held in the engine's database. Most speech recognition engine vendors in Korea use this method.

The HMM is a doubly stochastic process in which an unobservable process is estimated through another, observable process, and it is currently widely used in speech recognition. Using the HMM approach in speech recognition therefore means modeling the smallest unit of speech recognition (the phoneme) and building a speech recognition system from these models. The advantage of the HMM is a higher recognition rate than other methods. However, when the HMM training samples are insufficient, accurate model estimation is difficult, and knowledge of phoneme context is required.

Meanwhile, motion recognition and voice recognition have each been studied for a very long time in their respective fields.

First, motion recognition is classified, according to the equipment used, into the ultrasonic method, which uses sensors that emit ultrasonic waves and receivers; the prosthetic method, which uses potentiometers and sliders to measure the user's joint movement; the magnetic method, in which magnetic field sensors are attached to each joint of the user and movement is measured by converting changes in the magnetic field into spatial displacement; and the optical method, which uses images obtained from CCD (Charge Coupled Device) cameras.

Among these, the optical method has the advantage that relatively accurate results can be obtained with simple equipment. Until now, most research has focused on attaching markers to the body and inferring joint movement from the observed movement of those markers. Because the markers must be attached to the body every time the system is used, this approach is cumbersome and unsuitable for practical game content.

As an approach to optical motion recognition without markers, there is the method studied at the University of Oxford, which infers posture from extracted silhouette information. However, this method performs poorly for motions in which body parts overlap.

In general, camera calibration is the process of finding the extrinsic parameters, such as the position and angle from which the camera views an object, and the intrinsic parameters, such as the camera's skew, aspect ratio, and focal length.

Camera calibration can be broadly divided into the method that uses a cuboid calibration object and the self-calibration method, which tracks the positions of natural features detected automatically in the images.

The calibration-object method uses a calibration object such as that shown in FIG. 2 and has been widely used to date. It uses either a three-dimensional cuboid calibration object or a planar calibration object for which the coordinates of points on the plane are known. That is, the calibration object is photographed, and the camera projection matrix is calculated from the known geometry of the object.

The self-calibration method follows the mathematical result that, if the correspondences between feature points observed in images taken from several angles are known, the positions of those feature points in 3D space can be calculated up to scale (in metric space), and at the same time auto-calibration of the camera can be performed to obtain the projection matrix P = KR[I | -C] with which the camera views the object. Richard Hartley and Marc Pollefeys have each systematized this in their own way.
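To make the projection matrix concrete, the sketch below builds P = KR[I | -C] and projects a 3D point to pixel coordinates. It assumes the usual reading of the symbols, which the text does not spell out: K is the 3x3 intrinsic matrix, R the 3x3 rotation, and C the camera center in world coordinates. The helper names and the sample intrinsics are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def projection_matrix(K, R, C):
    """Build the 3x4 matrix P = K R [I | -C]."""
    i_minus_c = [[1.0, 0.0, 0.0, -C[0]],
                 [0.0, 1.0, 0.0, -C[1]],
                 [0.0, 0.0, 1.0, -C[2]]]
    return matmul(matmul(K, R), i_minus_c)

def project(P, X):
    """Project world point X = (x, y, z) to pixel coordinates (u, v)."""
    xh = [[X[0]], [X[1]], [X[2]], [1.0]]  # homogeneous column vector
    u, v, w = (row[0] for row in matmul(P, xh))
    return u / w, v / w

# Assumed example camera: 800 px focal length, principal point (320, 240).
K = [[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
P = projection_matrix(K, R, (0.0, 0.0, 0.0))
```

A point on the optical axis, such as (0, 0, 2), projects to the principal point under this matrix.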

Despite its broad potential, the self-calibration method is still very sensitive to noise in the feature point positions, and real-time implementation is not possible with existing methods.

FIG. 3 shows the positions of feature points extracted by boujou 3.0, which is known to give the best performance among self-calibration approaches, together with the camera path and angles obtained after calibration. Once the camera has been calibrated in this way, the 3D coordinates of the feature points can be calculated by triangulation methods such as the mid-point algorithm.
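The mid-point algorithm mentioned above can be sketched as follows: each calibrated camera defines a ray through its center and the observed feature point, and the 3D point is taken as the midpoint of the shortest segment joining the two rays. The function name and the ray representation (origin plus direction) are assumptions made for this example.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def midpoint_triangulate(c1, d1, c2, d2):
    """Midpoint of the closest points of rays c1 + s*d1 and c2 + t*d2."""
    w = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; depth cannot be triangulated")
    s = (b * e - c * d) / denom  # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom  # parameter of the closest point on ray 2
    p1 = [o + s * k for o, k in zip(c1, d1)]
    p2 = [o + t * k for o, k in zip(c2, d2)]
    return [(u + v) / 2.0 for u, v in zip(p1, p2)]
```

With noise-free rays the two closest points coincide and the midpoint is the exact intersection; with noisy feature positions the rays are skew and the midpoint is a simple compromise between them.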

Next, in voice recognition, when the user's input first arrives, it is preprocessed and feature points are extracted. These feature points are treated as an HMM and compared with a previously built database; the likelihood is determined by Bayes' theorem, which defines conditional probability, and the final result is obtained.

Depending on the HMM modeling unit, recognizers are classified into word-unit isolated word recognizers, phoneme-unit isolated word recognizers, continuous speech recognizers, and so on. The word-unit isolated word recognizer is suited to isolated digit recognition or to small-scale recognizers whose recognition vocabulary is fixed.

Because the phoneme-unit isolated word recognizer uses phonemes as its HMM units, any word can be composed by combining phoneme models, which makes it possible to implement a variable-vocabulary recognizer whose recognition vocabulary changes with the situation. In addition, models that share the same phonetic context can share parameters, so a small amount of speech data is sufficient.

The difficulty of continuous speech recognition is that the most probable word sequence must be found given the observation sequence, acoustic model, pronunciation model, and language model, but because no word boundary information is given, any word may begin at every frame, and the search space becomes large.

In addition, reverberation, in which the speaker's voice is reflected by the surroundings, makes finding the boundaries even harder. Although much research is in progress, no system has yet been devised that performs well enough for practical use.

In other words, with continuous speech recognition, when the user pronounces several words or uses a spell made up of a sentence while performing continuous motions, it is difficult to find the boundaries between words, and reverberation in the voice acts as noise.

Accordingly, the present invention proposes combining existing methods whose performance is relatively well proven in motion recognition and voice recognition, so that motion information and voice information compensate for each other's weaknesses and a dynamic game is provided.

The present invention has been proposed to solve the above problems. Its object is to provide a game apparatus and method using motion recognition and voice recognition that recognize a command by combining two results and then control the motion of a character and the corresponding sound. The first is a motion recognition result: an object is extracted from binocular images having different camera information, depth information is calculated using binocular disparity, the position of the object in three-dimensional space is obtained using the calculated depth information, and full motion data is generated through an inverse kinematics algorithm and recognized. The second is a voice recognition result: the phrases of a sentence are separated using feature points extracted from a voice input consisting of at least one phrase and the motion recognition result received from the motion recognition means, and each phrase is then recognized.

Other objects and advantages of the present invention can be understood from the following description and will become more apparent from the embodiments of the present invention. It will also be readily appreciated that the objects and advantages of the present invention can be realized by the means indicated in the claims and combinations thereof.

To achieve the above object, the apparatus of the present invention is a game apparatus using motion recognition and voice recognition, comprising: storage means for storing per-user command data; motion recognition means for recognizing motion by extracting an object from binocular images having different camera information, calculating depth information using binocular disparity, obtaining the position of the object in three-dimensional space using the calculated depth information, and generating full motion data through an inverse kinematics algorithm; isolated word recognition means for separating the phrases of a voice input consisting of at least one phrase, using feature points extracted from the voice and the motion recognition result received from the motion recognition means, and then recognizing the phrases; command recognition means for recognizing a command by combining the recognition result of the isolated word recognition means with the recognition result of the motion recognition means; and central processing means for controlling the motion of a character and the corresponding sound according to the recognition result (command data) of the command recognition means, and for storing the recognition result in the storage means.

Meanwhile, the method of the present invention is a game method using motion recognition and voice recognition, comprising the steps of: extracting, by motion recognition means, an object from received binocular images having different camera information and calculating depth information using binocular disparity; obtaining, by the motion recognition means, the position of the object in three-dimensional space using the calculated depth information, generating full motion data through an inverse kinematics algorithm, and recognizing the motion; extracting, by isolated word recognition means, feature points from a received voice input consisting of at least one phrase, separating the phrases of the voice using the feature points and the motion recognition result, and then recognizing the phrases; recognizing, by command recognition means, a command by combining the phrase recognition result with the motion recognition result; and controlling, by central processing means, the motion of a character and the corresponding sound according to the recognition result (command data).

In addition, the present invention combines the strengths and weaknesses of image-based motion recognition and voice recognition to provide an effective model and a concrete implementation for producing games in which the user, while performing various actions, can interact dynamically with the game content and with many other connected users.

In addition, the present invention addresses the greatest weakness of the inverse kinematics algorithm, namely the ambiguity of motion that arises because the whole motion is inferred from a limited number of body parts, by combining it with voice data. For example, when the user wants to perform a specific action or cast magic in the game, the user calls out the name of the fighting technique or martial art, or the spell, so that the command is recognized accurately.

In addition, the present invention simplifies a number of functions that are cumbersome to handle with motion recognition alone. For example, when the user is walking in the game and wants to switch to running or flying, the switch can be made immediately by speaking a command, without pressing a button with the mouse as in conventional games or clicking an on-screen button with a motion. Special actions such as selecting a particular item or exchanging weapons and equipment in the game can also be issued simply by voice.

In addition, because the present invention stores the recognized commands (motion data and the corresponding voice data) in a database, the longer the user plays, the easier it becomes to convey intentions to the game apparatus, and this can be reflected as an increase in "experience points" in the game.

The above objects, features, and advantages will become more apparent from the following detailed description taken in conjunction with the accompanying drawings, so that those of ordinary skill in the art to which the present invention pertains can readily practice the technical idea of the present invention. In describing the present invention, detailed descriptions of related known technology are omitted where they would unnecessarily obscure the gist of the invention. Hereinafter, a preferred embodiment of the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of an embodiment of a game apparatus using motion recognition and voice recognition according to the present invention.

As shown in FIG. 1, the game apparatus using motion recognition and voice recognition according to the present invention includes: a storage unit 10 for storing per-user command data (motion data and the corresponding voice data); an input unit 20 for receiving binocular images having different camera information and a voice (sentence); a motion recognition unit 30 for extracting an object from the binocular images received by the input unit 20, calculating depth information using binocular disparity, obtaining the position of the object in three-dimensional space using the calculated depth information, generating full motion data through an inverse kinematics algorithm, recognizing the motion, and passing the recognition result to the isolated word recognition unit 40; an isolated word recognition unit 40 for separating the phrases of the sentence using feature points extracted from the voice (sentence) received by the input unit 20 and the motion recognition result received from the motion recognition unit 30, and then recognizing the phrases; a command recognition unit 50 for recognizing a command by combining the recognition result of the isolated word recognition unit 40 with the recognition result of the motion recognition unit 30; an output unit 60 for outputting the motion of the character and the corresponding sound under the control of the central processing unit 80; an Internet connection unit 70 for connecting to the Internet; and a central processing unit 80 for controlling the motion of the character and the corresponding sound according to the recognition result (command data) of the command recognition unit 50, and for storing the recognition result in the storage unit 10.

Here, the Internet connection unit 70 is an optional element of the present invention.

The camera information includes the three-dimensional position of the camera (x, y, z coordinates), the camera direction, and the focal length.

Meanwhile, the input unit 20 includes an image sensor 201 for receiving binocular images having different camera information, and a voice sensor 202 for receiving a voice (sentence).

Meanwhile, the motion recognition unit 30 includes: a motion database 300 storing the motion data used in the game by age, gender, and body type; an object extractor 301 for extracting objects (for example, the head, hands, and feet) from the binocular images received through the image sensor 201; a position calculator 302 for calculating depth information (distance in the direction perpendicular to the camera lens) using the binocular disparity of the objects extracted by the object extractor 301, and obtaining the positions of the objects in three-dimensional space using the calculated depth information; an inverse kinematics processor 303 for generating full motion data by using an inverse kinematics algorithm to calculate, from the 3D object positions obtained by the position calculator 302, the positions and states of other body parts such as the elbows, knees, and pelvis; and a motion recognizer 304 for recognizing the motion by comparing the motion data generated by the inverse kinematics processor 303 with the corresponding motion data in the motion database 300, and passing the recognition result to the phrase boundary separator 402.
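The depth calculation performed by the position calculator 302 is not given as a formula in the text. A common way to obtain depth from binocular disparity, assuming two rectified, parallel cameras with the same focal length, is Z = f * B / d, where f is the focal length in pixels, B the baseline between the cameras, and d the horizontal disparity. The sketch below applies that relation and back-projects the pixel to a 3D position; the function name and all parameters are assumptions for illustration.

```python
def stereo_to_3d(x_left, x_right, y, focal_px, baseline_m, cx, cy):
    """Return (X, Y, Z) in meters for a point seen in a rectified stereo pair.

    x_left, x_right: horizontal pixel coordinates in the left/right image.
    y: vertical pixel coordinate (equal in both images after rectification).
    cx, cy: principal point in pixels. All names are illustrative.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = focal_px * baseline_m / disparity   # depth along the optical axis
    x = (x_left - cx) * z / focal_px        # back-project to 3D
    y3 = (y - cy) * z / focal_px
    return x, y3, z
```

Under these assumptions, a larger disparity means the object (for example, a hand) is closer to the cameras, so depth resolution is best at short range.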

Here, as shown in FIG. 7, the inverse kinematics processor 303 takes the positions (b) of specific body parts as input and produces the full motion data (c) as output.

Because the motion input by a user is not always identical, the motion database 300 stores a plurality of similar motion data for each exact motion data.
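One simple way to use a database that holds several similar templates per motion is nearest-template matching: the input pose is compared with every stored variant, and the label of the closest one is returned if it is close enough. The text does not state how the motion recognizer 304 performs its comparison, so the Euclidean distance, the threshold, and the data layout below are assumptions.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize_motion(pose, database, max_distance=0.5):
    """Return the label of the closest stored template, or None if too far.

    database maps a motion label to a list of template vectors, i.e. one
    exact motion plus its stored similar variants (hypothetical layout).
    """
    best_label, best_dist = None, float("inf")
    for label, templates in database.items():
        for template in templates:
            d = distance(pose, template)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label if best_dist <= max_distance else None
```

Storing more variants per label makes the match tolerant of the natural variation between repetitions of the same motion.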

Meanwhile, the voice recognition unit 40 includes: a voice database 400 storing the voice data used in the game; a feature point extractor 401 for extracting feature points from the voice (sentence) received through the voice sensor 202; a phrase boundary separator 402 for separating the phrases of the sentence using the feature points extracted by the feature point extractor 401 and the motion recognition result from the motion recognizer 304; and an isolated word recognizer 403 for recognizing the phrases separated by the phrase boundary separator 402.

Here, because speech input by the user is never exactly the same each time, the voice database 400 stores multiple similar voice data entries for each exact voice data entry.

In addition, as shown in FIG. 5, the phrase boundary delimiter 402 divides speech composed of at least one phrase into its individual phrases. The motion information used for this division allows the start and end of each action to be identified more clearly than the voice information does.
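
As a rough illustration of motion-assisted segmentation, the sketch below cuts an audio buffer at the start and end times of recognized motions, on the assumption that each phrase is spoken while one motion is performed. The patent does not describe the delimiter at this level of detail; the function name `split_by_motion` and the interval format are hypothetical, and a real delimiter would also use the extracted speech feature points.

```python
def split_by_motion(samples, sample_rate, motion_intervals):
    """Split audio `samples` into phrases using motion timing.

    motion_intervals: list of (start_seconds, end_seconds), one per
    recognized motion. Returns one list of samples per interval that
    overlaps the audio buffer.
    """
    phrases = []
    for start_s, end_s in motion_intervals:
        lo = max(0, int(round(start_s * sample_rate)))
        hi = min(len(samples), int(round(end_s * sample_rate)))
        if hi > lo:
            phrases.append(samples[lo:hi])
    return phrases


if __name__ == "__main__":
    audio = list(range(10))  # stand-in for 1 second of audio at 10 Hz
    print(split_by_motion(audio, 10, [(0.0, 0.3), (0.5, 1.2)]))
```

Each returned slice can then be handed to a word-level recognizer as if it were an isolated utterance.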

In addition, after the feature point extraction step, the isolated word recognizer 403 compares the input against a database in which each word is modeled by one hidden Markov model (HMM), and selects the result with the maximum likelihood.
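
Maximum-likelihood selection over per-word HMMs can be sketched with the forward algorithm, which computes the probability of an observation sequence under each model. The version below assumes discrete observation symbols and tiny hand-written models, which is a simplification: the patent does not describe the feature representation, practical recognizers typically use continuous features and log-probabilities, and the names `forward_likelihood`, `recognize_word`, `fire`, and `ice` are hypothetical.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) for a discrete HMM, via the forward algorithm.

    start[s]: initial probability of state s
    trans[p][s]: probability of moving from state p to state s
    emit[s][o]: probability that state s emits symbol o
    """
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
            for s in range(n)
        ]
    return sum(alpha)


def recognize_word(obs, models):
    """Return the word whose HMM gives `obs` the highest likelihood."""
    return max(models, key=lambda w: forward_likelihood(obs, *models[w]))


if __name__ == "__main__":
    # Each model is (start, trans, emit); symbols are 0 and 1.
    models = {
        "fire": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.9, 0.1], [0.2, 0.8]]),
        "ice": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.1, 0.9], [0.8, 0.2]]),
    }
    print(recognize_word([0, 0, 1, 1], models))
```

Because every candidate word is scored independently, this approach works best on short, already-separated utterances, which is consistent with using motion data to split long input into phrases first.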

Meanwhile, the command recognition unit 50 includes a command recognizer 501 that recognizes a command by combining the recognition result of the isolated word recognizer 403 with the recognition result of the motion recognizer 304, and a user database 500 that stores the voice and motion corresponding to the command recognized by the command recognizer 501.
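
The text does not say how the two recognition results are combined. One simple possibility, shown here purely as an assumption, is a lookup table keyed by the (motion, phrase) pair, with recognized commands appended to a per-user history that stands in for the user database 500. All names in the sketch (`CommandRecognizer`, `swing`, `fireball`, and so on) are hypothetical.

```python
class CommandRecognizer:
    """Maps a (motion, phrase) pair to a command and logs what was recognized."""

    def __init__(self, command_table):
        self.command_table = dict(command_table)
        self.user_history = []  # stand-in for a per-user database

    def recognize(self, motion, phrase):
        """Return the command for this pair, or None if the pair is unknown."""
        command = self.command_table.get((motion, phrase))
        if command is not None:
            self.user_history.append((motion, phrase, command))
        return command


if __name__ == "__main__":
    recognizer = CommandRecognizer({
        ("swing", "fireball"): "CAST_FIREBALL",
        ("raise_arms", "shield"): "RAISE_SHIELD",
    })
    print(recognizer.recognize("swing", "fireball"))
    print(recognizer.recognize("swing", "shield"))
```

Requiring both modalities to agree is one way a combined recognizer could reject inputs that either recognizer alone would misclassify.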

FIG. 4 shows an implementation of an embodiment of a game device using motion recognition and voice recognition according to the present invention.

As shown in FIG. 4, the game device using motion recognition and voice recognition according to the present invention includes a user 41, a display device 42 (for example, a TV) that outputs the game's video, two cam cameras 43 that serve as a stereo camera and whose positions and angles are precisely fixed so that a previously obtained calibration result can be used, a microphone 44 for voice input, a game console body 45 that handles input and output and runs the game, and an Internet connection unit 46.

Here, the Internet connection unit 46 lets the user play in the same electronic space as other users over the network, and it receives the moves and skills usable in the game through online upgrades, so that users do not grow bored and remain interested in the game.

FIG. 8 is a flowchart of an embodiment of a game method using motion recognition and voice recognition according to the present invention.

First, so that inverse kinematics can be applied, when the game is started for the first time the user stands in front of the camera with arms raised in a cross-shaped pose, palms facing forward, so that the user's body shape can be registered.

Next, when a binocular image having different camera information is input, objects are extracted, depth information is calculated using binocular disparity, the position of each object in three-dimensional space is obtained from the calculated depth information, and full motion data is generated through an inverse kinematics algorithm to recognize the motion (801).

Next, when speech (a sentence) is input, feature points are extracted, the sentence is divided into phrases using the feature points and the motion recognition result, and each phrase is recognized (802).

Next, a command is recognized by combining the voice recognition result with the motion recognition result (803).

Next, the character's movement and the corresponding sound are controlled according to the recognition result (command data) (804).

Meanwhile, for camera calibration, the present invention preferably uses a method based on a calibration object, so that the service is robust to noise and uninterrupted.

In addition, when the present invention receives not only short-named attack moves but also long spells used for fantasy elements such as magic in the game, the motion recognition data is used to divide the input into phrases before they are fed to the speech recognizer, so it is preferable to use the word-level isolated word recognizer 403.

In addition, as shown in FIG. 6, the present invention lets multiple users in the same virtual space experience one another's actions in real time through their in-game characters over the network, play together, and compete against one another, which generates greater interest.

In addition, because the present invention periodically receives upgrades to moves, spells, and the like over the Internet connection, users can keep using new attacks, defenses, and magic spells and can enjoy the same game for a long time without growing bored.

In addition, in the present invention, image information is also used to distinguish motions, but it is mainly used to determine a motion's direction, agility, strength, and so on. Voice information is used to distinguish motions precisely, to convey intensity (a battle cry), and to give specific instructions such as item selection or mode selection.
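
As an illustration of how direction and a speed-like measure of agility might be derived from image-based tracking, the sketch below takes two successive 3D positions of a tracked body part and the time between them. This is an assumption about one possible computation, not something the patent specifies, and the function name `motion_direction_and_speed` is hypothetical.

```python
import math


def motion_direction_and_speed(p0, p1, dt):
    """Return (unit direction vector, speed) between two 3D positions.

    p0, p1: (x, y, z) positions in metres; dt: elapsed time in seconds.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    delta = [b - a for a, b in zip(p0, p1)]
    dist = math.sqrt(sum(c * c for c in delta))
    if dist == 0:
        return (0.0, 0.0, 0.0), 0.0  # no movement, so no defined direction
    return tuple(c / dist for c in delta), dist / dt


if __name__ == "__main__":
    print(motion_direction_and_speed((0.0, 0.0, 0.0), (0.3, 0.0, 0.4), 0.5))
```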

In addition, in the present invention, the user stands in front of the display device and performs various actions; the display screen shows the other connected users together with various attack targets and defense targets; and the game content includes fantasy games featuring action and magic, among others. The user therefore interacts actively with the game using both body and voice, performing various attack and defense moves while calling out the names of specific moves or spells and shouting battle cries.

The method of the present invention described above may be implemented as a program and stored in computer-readable form on a recording medium (CD-ROM, RAM, ROM, floppy disk, hard disk, magneto-optical disk, etc.). Because a person of ordinary skill in the art to which the present invention pertains can readily carry this out, it is not described in further detail.

The present invention described above is not limited by the foregoing embodiments and the accompanying drawings, because a person of ordinary skill in the art to which the present invention pertains can make various substitutions, modifications, and changes without departing from the technical idea of the present invention.

As described above, the present invention extracts objects from a binocular image having different camera information, calculates depth information using binocular disparity, obtains the position of each object in three-dimensional space from the calculated depth information, and generates full motion data through an inverse kinematics algorithm to produce a motion recognition result. It also divides a sentence into phrases using feature points extracted from the speech (sentence) together with the motion recognition result received from the motion recognition means, and recognizes the phrases to produce a voice recognition result. By combining the two results to recognize a command and then controlling the character's movement and the corresponding sound, the invention has the effect of providing a game with a low error rate.

In addition, by combining the respective strengths and weaknesses of motion recognition and voice recognition, the present invention makes it feasible to realize dynamic, realistic action and fantasy game content that could not be implemented with existing methods, and it has the effect that multiple users connected online can see one another's movements and actions in real time and enjoy the game realistically.

Claims

A game device using motion recognition and voice recognition, comprising:

storage means for storing user-specific command data;

motion recognition means for extracting an object from a binocular image having different camera information, calculating depth information using binocular disparity, obtaining the position of the object in three-dimensional space using the calculated depth information, and generating full motion data through an inverse kinematics algorithm to recognize a motion;

isolated word recognition means for dividing speech composed of at least one phrase into phrases using feature points extracted from the speech and the motion recognition result received from the motion recognition means, and then recognizing the phrases;

command recognition means for recognizing a command by combining the recognition result of the isolated word recognition means with the recognition result of the motion recognition means; and

central processing means for controlling the movement of a character and the corresponding sound according to the recognition result (command data) of the command recognition means, and for storing the recognition result in the storage means.

The game device of claim 1, further comprising:

Internet connection means for connecting to the Internet.

The game device of claim 1 or 2,

wherein the motion recognition means comprises:

a motion database that stores motion data used in the game by age, gender, and body type;

an object extractor for extracting an object from the binocular image;

a position calculator for calculating depth information (distance along the direction perpendicular to the camera lens) using the binocular disparity of the object extracted by the object extractor, and for obtaining the position of the object in three-dimensional space using the calculated depth information;

an inverse kinematics processor for generating the full motion data by using the inverse kinematics algorithm to calculate, from the position of the object in three-dimensional space calculated by the position calculator, the positions and states of other body parts such as the elbows, knees, and pelvis; and

a motion recognizer for comparing the motion data generated by the inverse kinematics processor with the corresponding motion data in the motion database to recognize the motion, and for transmitting the recognition result to the isolated word recognition means.

The game device of claim 3,

wherein the isolated word recognition means comprises:

a voice database for storing voice data used in the game;

a feature point extractor for extracting feature points from the speech composed of the at least one phrase;

a phrase boundary delimiter for dividing the speech into phrases using the feature points extracted by the feature point extractor and the motion recognition result from the motion recognizer; and

an isolated word recognizer for recognizing the phrases separated by the phrase boundary delimiter.

A game method using motion recognition and voice recognition, comprising:

extracting, by motion recognition means, an object upon receiving a binocular image having different camera information, and then calculating depth information using binocular disparity;

obtaining, by the motion recognition means, the position of the object in three-dimensional space using the calculated depth information, and then generating full motion data through an inverse kinematics algorithm to recognize a motion;

extracting, by isolated word recognition means, feature points upon receiving speech composed of at least one phrase, dividing the speech into phrases using the feature points and the motion recognition result, and then recognizing the phrases;

recognizing, by command recognition means, a command by combining the phrase recognition result and the motion recognition result; and

controlling, by central processing means, the movement of a character and the corresponding sound according to the recognition result (command data).