KR20230076409A

KR20230076409A - Smart Glass and Voice Recognition System having the same

Info

Publication number: KR20230076409A
Application number: KR1020210163308A
Authority: KR
Inventors: 박진홍
Original assignee: 주식회사 딥파인
Priority date: 2021-11-24
Filing date: 2021-11-24
Publication date: 2023-05-31
Also published as: KR102605774B1

Abstract

The present invention relates to smart glasses for voice recognition and information visualization, and to a voice recognition system including the same. One embodiment of the present invention provides smart glasses worn on a user's body, including: an input receiving unit that receives an input including at least one of the user's voice and a user manipulation; a voice recognition unit that recognizes the voice and generates voice recognition information; a display unit that displays the voice recognition information as text; and a control unit that selects at least one piece of the text displayed on the display unit as target text for correction according to a text selection command received through the input receiving unit, and corrects the target text according to a text correction command received through the input receiving unit.

Description

Smart glasses and voice recognition system including the same {Smart Glass and Voice Recognition System having the same}

The present invention relates to smart glasses and a voice recognition system including the same, and more particularly, to smart glasses for voice recognition and information visualization and a voice recognition system including the same.

Voice recognition technology has recently become widespread in smart electronic devices, and wearable devices such as smart watches and Google Glass are also expected to use voice recognition as their main text input method. Although voice recognition technology has been developed for a long time, many errors still occur when speech is converted into text and entered.

When input is performed by voice recognition, the speech is converted into text and shown to the user. If the recognized text differs from what was intended or contains an error, it must be corrected. Conventionally, the error is fixed in one of two ways: the user switches to keyboard input mode and edits part of the entered text with the backspace key or by moving the cursor, as in ordinary keyboard editing; or the user presses a delete button in voice recognition mode to erase, at once, all the text entered in one dictation unit, and then dictates it again.

Among these correction methods, returning to keyboard input involves cumbersome operations, such as precisely moving a cursor on a small touch screen or pressing the backspace key repeatedly to erase already-entered text and retype it. Re-entering text in dictation units is also inefficient, because not only the misrecognized text but also the correctly recognized parts must be entered again.

Korean Registered Patent Publication No. 10-1651909

A technical problem to be solved by the present invention is to provide smart glasses that replace a keyboard and mouse by allowing commands to be input by voice and input errors to be corrected, and a voice recognition system including the same.

Another technical problem to be solved by the present invention is to provide smart glasses that significantly improve voice recognition accuracy and thereby achieve an excellent recognition rate, and a voice recognition system including the same.

Another technical problem to be solved by the present invention is to provide smart glasses that achieve a very high recognition rate even for serial numbers or product numbers that mix English letters and digits, and a voice recognition system including the same.

Another technical problem to be solved by the present invention is to provide smart glasses that let the user easily correct misrecognitions that are unavoidable due to homophones, and that improve voice recognition performance by updating the recognizer with the correction results, and a voice recognition system including the same.

The technical problems to be achieved by the present invention are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood by those skilled in the art from the following description.

To achieve the above technical problem, one embodiment of the present invention provides smart glasses worn on a user's body, including: an input receiving unit that receives an input including at least one of the user's voice and a user manipulation; a voice recognition unit that recognizes the voice and generates voice recognition information; a display unit that displays the voice recognition information as text; and a control unit that selects at least one piece of the text displayed on the display unit as target text for correction according to a text selection command received through the input receiving unit, and corrects the target text according to a text correction command received through the input receiving unit.

In an embodiment of the present invention, the voice recognition unit may convert the voice into text using a Speech-To-Text (STT) application programming interface (API) and output the text.

In an embodiment of the present invention, the input receiving unit may include a voice input unit that receives the user's voice as an input, and a manipulation detection unit that detects the user's manipulation and receives it as an input.

In an embodiment of the present invention, the manipulation detection unit may include a touch pad that detects the user's touch and receives it as an input, and a displacement detection unit that detects displacement of the smart glasses caused by the user and receives it as an input, wherein the displacement detection unit may generate displacement information by detecting vector displacement through body tracking of the user's body movement.

In an embodiment of the present invention, the control unit may move a cursor by a set unit according to the displacement information so that the text to be corrected can be selected through an error correction user interface.

In an embodiment of the present invention, when the cursor has been moved to the target text, the control unit may select the target text upon receiving a text selection command implemented by at least one of voice input, a pad touch, and motion detection.

To achieve the above technical problem, another embodiment of the present invention provides a voice recognition system including: the smart glasses; and an information providing server that is connected to the smart glasses via a communication network, receives text from the smart glasses, queries a database to retrieve detailed data, and visualizes the detailed data.

According to an embodiment of the present invention, commands can be input by voice in place of a keyboard and mouse, and input errors can be corrected.

In addition, according to an embodiment of the present invention, the accuracy of voice recognition is significantly improved, so that an excellent recognition rate can be achieved.

In addition, according to an embodiment of the present invention, a very high recognition rate can be achieved even for serial numbers or product numbers that mix English letters and digits.

In addition, according to an embodiment of the present invention, the user can easily correct misrecognitions that are unavoidable due to homophones. Furthermore, by updating the recognizer with the correction results, recognition performance can be improved so that more accurate speech recognition is possible in the future.

The effects of the present invention are not limited to those described above, and should be understood to include all effects that can be inferred from the configuration of the invention described in the detailed description or the claims.

FIG. 1 is a diagram showing the configuration of a voice recognition system according to an embodiment of the present invention.
FIG. 2 is a diagram showing a detailed configuration of the input receiving unit of FIG. 1.
FIG. 3 is a diagram showing a detailed configuration of the voice input unit of FIG. 2.
FIG. 4 is a diagram showing a detailed configuration of the manipulation detection unit of FIG. 2.
FIG. 5 illustrates an example of receiving correction syllables and selecting and correcting the corresponding text block according to a method of inputting and correcting text through voice recognition according to an embodiment of the present invention.

Hereinafter, the present invention will be described with reference to the accompanying drawings. However, the present invention may be embodied in many different forms and is therefore not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted in order to describe the present invention clearly, and like reference numerals denote like parts throughout the specification.

Throughout the specification, when a part is said to be "connected (coupled, in contact, or combined)" to another part, this includes not only the case where it is "directly connected" but also the case where it is "indirectly connected" with another member interposed between them. In addition, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

The terms used in this specification are only used to describe specific embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "include" or "have" are intended to indicate the presence of a feature, number, step, operation, component, part, or combination thereof described in the specification, and should be understood not to preclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram showing the configuration of a voice recognition system according to an embodiment of the present invention. FIG. 2 is a diagram showing a detailed configuration of the input receiving unit of FIG. 1. FIG. 3 is a diagram showing a detailed configuration of the voice input unit of FIG. 2. FIG. 4 is a diagram showing a detailed configuration of the manipulation detection unit of FIG. 2.

Referring to FIGS. 1 to 4, the voice recognition system according to an embodiment of the present invention may include smart glasses 100 and an information providing server 300.

Specifically, the smart glasses 100 may include: an input receiving unit 110 that receives an input including at least one of the user's voice and a user manipulation; a voice recognition unit 120 that recognizes the voice and generates voice recognition information; a display unit 130 that displays the voice recognition information as text; and a control unit 140 that selects at least one piece of the text displayed on the display unit as target text according to a text selection command received through the input receiving unit, and corrects the target text according to a text correction command received through the input receiving unit.

The input receiving unit 110 may receive an audio signal containing the user's voice and process it to generate a user voice signal, or may detect the user's manipulation and generate a detection signal. The input receiving unit 110 may be provided outside the main body of the smart glasses 100. In that case, the input receiving unit 110 may transmit the user voice signal to the main body of the smart glasses 100 through a wireless interface (for example, Wi-Fi or Bluetooth). To this end, the input receiving unit 110 may include a voice input unit 210 that receives the user's voice as an input and a manipulation detection unit 220 that detects the user's manipulation and receives it as an input.

The voice input unit 210 may include a microphone 211, a converter 213, an energy determination unit 215, a noise removal unit 217, and a voice signal generation unit 219.

The microphone 211 may receive an analog audio signal containing the user's voice.

The converter 213 may convert the multi-channel analog signal input from the microphone 211 into a digital signal.

The energy determination unit 215 may calculate the energy of the converted digital signal and determine whether it is equal to or greater than a preset value. If the energy of the digital signal is equal to or greater than the preset value, the energy determination unit 215 may pass the digital signal to the noise removal unit 217. If the energy is less than the preset value, the energy determination unit 215 does not output the digital signal and instead waits for another input. This prevents the entire audio processing pipeline from being activated by sounds other than speech, and thus avoids unnecessary power consumption.

In the embodiment described above, unnecessary power consumption is prevented by the energy determination unit 215; however, this is only one example, and a button may be used for the same purpose. For example, voice recognition may be performed only on the voice signal input while the button is pressed, which prevents unnecessary power consumption.
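
The gating behavior of the energy determination unit 215, together with the button-based alternative, can be illustrated with a minimal Python sketch. It assumes mean-square frame energy compared against a fixed threshold; the patent specifies neither the energy measure nor the threshold, and the names `frame_energy` and `gate_frames` are hypothetical.

```python
from typing import Iterable, Iterator, Optional, Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Mean-square energy of one frame of digitized samples."""
    if not frame:
        return 0.0
    return sum(s * s for s in frame) / len(frame)


def gate_frames(
    frames: Iterable[Sequence[float]],
    threshold: float,
    button_pressed: Optional[Iterable[bool]] = None,
) -> Iterator[Sequence[float]]:
    """Yield only the frames that should be passed on to noise removal.

    Energy mode (button_pressed is None): a frame passes only if its energy
    is at or above the preset threshold; quieter frames are dropped and the
    gate simply waits for the next frame.
    Push-to-talk mode: a frame passes only while the button is held.
    """
    if button_pressed is None:
        for frame in frames:
            if frame_energy(frame) >= threshold:
                yield frame
    else:
        for frame, pressed in zip(frames, button_pressed):
            if pressed:
                yield frame
```

For example, with frames `[[0.0, 0.01], [0.5, -0.5], [0.02, 0.0]]` and a threshold of 0.1, only `[0.5, -0.5]` (energy 0.25) passes the energy gate.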

When the digital signal is input, the noise removal unit 217 may remove the noise component from the digital signal, which contains both a noise component and a user voice component. The noise component is sudden noise that commonly occurs in the user's surroundings, such as the sound of an air conditioner, a vacuum cleaner, or music. The noise removal unit 217 may then output the noise-removed digital signal to the voice signal generation unit 219.
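
The patent does not say which algorithm the noise removal unit 217 uses. As one illustration, the sketch below substitutes classic magnitude spectral subtraction: estimate the noise spectrum from noise-only frames, subtract it bin by bin, keep the noisy phase, and resynthesize. It works best for fairly stationary noise such as an air conditioner; sudden noise such as music would need an adaptive noise estimate. A naive DFT is used for readability (a practical version would use an FFT with windowed, overlapping frames), and all function names are hypothetical.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[float]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform (clarity over speed)."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def idft(spectrum: Sequence[complex]) -> List[float]:
    """Inverse DFT; returns the real part, since the original signal was real."""
    n_pts = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * n / n_pts)
             for k in range(n_pts)) / n_pts).real
        for n in range(n_pts)
    ]


def estimate_noise_magnitude(noise_frames: Sequence[Sequence[float]]) -> List[float]:
    """Average magnitude spectrum over frames known to contain only noise."""
    spectra = [dft(frame) for frame in noise_frames]
    return [sum(abs(s[k]) for s in spectra) / len(spectra) for k in range(len(spectra[0]))]


def spectral_subtract(frame: Sequence[float], noise_mag: Sequence[float],
                      floor: float = 0.0) -> List[float]:
    """Subtract the noise magnitude per bin, keep the noisy phase, resynthesize."""
    cleaned = []
    for x_k, n_k in zip(dft(frame), noise_mag):
        # Clamp at a spectral floor so a magnitude can never become negative.
        mag = max(abs(x_k) - n_k, floor * abs(x_k))
        cleaned.append(cmath.rect(mag, cmath.phase(x_k)))
    return idft(cleaned)
```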

The voice signal generation unit 219 may use a localization/speaker tracking module to track the position from which the user speaks within a 360° range around the voice input unit 210, and thereby obtain direction information for the user's voice. In addition, through a target spoken sound extraction module, the voice signal generation unit 219 may extract the target sound source located within the 360° range around the voice input unit, using the noise-removed digital signal and the direction information for the user's voice. In particular, when the voice input unit 210 is provided externally, the voice signal generation unit 219 may convert the user's voice into a user voice signal in a form suitable for transmission to the smart glasses 100, and transmit the user voice signal to the main body of the smart glasses 100 through the wireless interface.
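
How the localization/speaker tracking module finds the speaker's direction is not disclosed. A common approach, assumed here, is to estimate the time difference of arrival (TDOA) between two microphones by cross-correlation and convert it to a bearing with θ = asin(c·τ / d) under a far-field assumption. A single pair only resolves bearings between -90° and +90° (front and back are ambiguous); covering the full 360° described above would need three or more non-collinear microphones. The function names and the 343 m/s speed of sound are assumptions.

```python
import math
from typing import Sequence

SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air at about 20 degrees C


def estimate_delay(ref: Sequence[float], other: Sequence[float], max_lag: int) -> int:
    """Lag (in samples) that maximizes the cross-correlation of two mic signals.

    A positive lag means the sound reached `ref` first and `other` later.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, value in enumerate(ref):
            m = n + lag
            if 0 <= m < len(other):
                score += value * other[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def direction_of_arrival(ref: Sequence[float], other: Sequence[float],
                         mic_distance_m: float, sample_rate_hz: float) -> float:
    """Bearing in degrees from broadside of a two-mic pair (far-field assumption)."""
    # The physical delay can never exceed the time sound takes to cross the pair.
    max_lag = int(mic_distance_m / SPEED_OF_SOUND_M_S * sample_rate_hz)
    tau_s = estimate_delay(ref, other, max_lag) / sample_rate_hz
    # Clamp to [-1, 1] so rounding can never push asin out of its domain.
    s = max(-1.0, min(1.0, SPEED_OF_SOUND_M_S * tau_s / mic_distance_m))
    return math.degrees(math.asin(s))
```

With a 10 cm spacing at 48 kHz, a 3-sample delay corresponds to a bearing of roughly 12.4°.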

The manipulation detection unit 220 may include a touch pad 222 that detects the user's touch and receives it as an input, and a displacement detection unit 224 that detects displacement of the smart glasses 100 caused by the user and receives it as an input.

The touch pad 222 may detect at least one of whether the user touches it, the number of touches, and the touch duration (intensity), and provide the detection result to the control unit 140.
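
A minimal sketch of turning one touch gesture into a command is shown below. The single-touch selection and the 2-second long press for word selection follow the command examples given for the control unit 140 in this description; the `TouchEvent` structure, the command names, and the handling of multiple taps are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

LONG_PRESS_S = 2.0  # a pad touch of 2 seconds or longer selects a whole word


@dataclass
class TouchEvent:
    """One contact reported by the touch pad (hypothetical structure)."""
    down_s: float  # time the finger touched the pad, in seconds
    up_s: float    # time the finger left the pad, in seconds

    @property
    def duration_s(self) -> float:
        return self.up_s - self.down_s


def classify_gesture(events: List[TouchEvent]) -> Optional[str]:
    """Turn one gesture (presence, count, duration of touches) into a command name."""
    if not events:
        return None  # no touch detected
    if max(e.duration_s for e in events) >= LONG_PRESS_S:
        return "select_word"
    if len(events) == 1:
        return "select"  # a single tap acts as the text selection command
    return "multi_tap"  # hypothetical: left free for other commands
```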

The displacement detection unit 224 may generate displacement information by detecting vector displacement through body tracking of the user's body movement. To this end, the displacement detection unit 224 may include an inertial sensor (for example, an acceleration sensor, an angular velocity sensor, or a geomagnetic sensor) that detects at least one of acceleration, angular velocity, and magnetic field.
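
Assuming the angular velocity sensor reports yaw and pitch rates in degrees per second at a fixed sample interval, the displacement vector can be sketched as the integral of the bias-corrected rates. Plain rectangular integration drifts over time; a real device would typically fuse the accelerometer and geomagnetic sensor (for example with a complementary or Kalman filter), which is not shown. Names and units here are assumptions.

```python
from typing import Iterable, List, Tuple

Rate = Tuple[float, float]  # (yaw_rate, pitch_rate) in degrees per second


def estimate_bias(rest_samples: List[Rate]) -> Rate:
    """Average gyro output while the glasses are held still (the sensor's offset)."""
    n = len(rest_samples)
    return (sum(s[0] for s in rest_samples) / n, sum(s[1] for s in rest_samples) / n)


def integrate_rates(samples: Iterable[Rate], dt_s: float,
                    bias: Rate = (0.0, 0.0)) -> Tuple[float, float]:
    """Accumulate bias-corrected angular velocity into a (yaw, pitch) displacement in degrees."""
    yaw = pitch = 0.0
    for yaw_rate, pitch_rate in samples:
        yaw += (yaw_rate - bias[0]) * dt_s
        pitch += (pitch_rate - bias[1]) * dt_s
    return yaw, pitch
```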

The voice recognition unit 120 may recognize the user's voice received through the input receiving unit 110. The voice recognition unit 120 may convert the voice into text using a Speech-To-Text (STT) application programming interface (API) and output the text. For example, the voice recognition unit 120 may convert speech uttered by the user into text using the Google STT API.

The display unit 130 may display image data under the control of the control unit 140. The display unit 130 may display, as text, the speech that the voice recognition unit 120 has converted into text. The display unit 130 may also display an error correction user interface (UI) for correcting misrecognized user speech.

The control unit 140 may control the overall operation of the smart glasses 100 according to the user's commands. If the text displayed on the display unit 130 is wrong, the control unit 140 may perform text error correction: it selects the text that the voice recognition unit 120 misrecognized and converted, and corrects the selected text to the correct text. For example, the control unit 140 may perform text error correction so that misrecognitions that are unavoidable due to homophones can be easily corrected, such as the digit '2' and the letter 'E', the digit '5' and the letter 'O' (read alike in Korean as '이' and '오', respectively), or the Korean vowels 'ㅔ' and 'ㅐ'.
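
For alphanumeric serial numbers, these homophone pairs can be expanded into every alternative reading of a recognized string, so that a later step (the user or a model) can choose among them. This sketch covers only the two digit/letter pairs named above; the mapping table and function name are hypothetical, and the number of candidates grows as 2^k for k ambiguous characters.

```python
from itertools import product
from typing import Dict, List

# Digit/letter pairs that sound alike when read aloud in Korean.
CONFUSABLE: Dict[str, str] = {"2": "E", "E": "2", "5": "O", "O": "5"}


def homophone_candidates(text: str) -> List[str]:
    """Every string obtainable by swapping any subset of confusable characters."""
    options = [(ch, CONFUSABLE[ch]) if ch in CONFUSABLE else (ch,) for ch in text]
    return ["".join(combo) for combo in product(*options)]
```

For example, `homophone_candidates("25")` returns `["25", "2O", "E5", "EO"]`.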

The entity that determines whether the text displayed on the display unit 130 contains an error may be at least one of the user and the control unit 140. The control unit 140 may determine text errors using artificial intelligence (including machine learning) that includes at least one of an acoustic model and a language model. The acoustic model may be a statistical model of speech, built by training on utterance data from many speakers to capture how phonemes are pronounced. The language model may be a grammatical model that can search the grammar of a speech signal: it extracts grammar from a text corpus database and defines rules governing how words follow one another, so that learning and search favor grammatical sentences over arbitrary word sequences.
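
As a toy illustration of the language-model side, the sketch below trains an add-one smoothed character bigram model on known serial numbers and uses it to rank candidate transcriptions. This is a stand-in chosen for brevity, not the model used in the patent; the acoustic model is omitted entirely, and the class name and sample corpus are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List


class CharBigramModel:
    """Add-one smoothed character bigram model (a toy stand-in for a language model)."""

    def __init__(self, corpus: Iterable[str]):
        self.bigrams: Counter = Counter()
        self.unigrams: Counter = Counter()
        vocab = set()
        for line in corpus:
            chars = ["<s>"] + list(line) + ["</s>"]  # sentence boundary markers
            vocab.update(chars)
            for prev, cur in zip(chars, chars[1:]):
                self.bigrams[(prev, cur)] += 1
                self.unigrams[prev] += 1
        self.vocab_size = len(vocab)

    def log_prob(self, text: str) -> float:
        """Sum of smoothed log P(cur | prev) over the text."""
        chars = ["<s>"] + list(text) + ["</s>"]
        total = 0.0
        for prev, cur in zip(chars, chars[1:]):
            num = self.bigrams[(prev, cur)] + 1
            den = self.unigrams[prev] + self.vocab_size
            total += math.log(num / den)
        return total

    def best(self, candidates: List[str]) -> str:
        """Most probable candidate under the model."""
        return max(candidates, key=self.log_prob)
```

Trained on `["SN-E100", "SN-E200", "SN-E300"]`, the model prefers `"SN-E100"` over the homophone reading `"SN-2100"`.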

To perform the text error correction, the control unit 140 may control the display unit 130 to display the error correction UI. The control unit 140 may also move the cursor by a set unit according to the displacement information so that the text to be corrected can be selected through the error correction UI. When the cursor has been moved to the target text, the control unit 140 may select the target text upon receiving a text selection command implemented by at least one of voice input, a pad touch, and motion detection. For example, when the spoken command "수정" ("edit") or a touch on the touch pad is input through the input receiving unit 110, the control unit 140 may determine that a text selection command has been received and select the target text.
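
The cursor movement and selection step can be sketched as a small state machine: each head turn of `DEG_PER_STEP` degrees (a hypothetical value for the "set unit") moves the cursor by one character, leftover rotation carries over to the next update, and the spoken command "수정" or a pad touch selects the character under the cursor. The names and the degrees-per-step value are assumptions.

```python
from dataclasses import dataclass, replace
from typing import Optional

SELECT_VOICE_COMMANDS = {"수정"}  # spoken "edit" command named in the description
DEG_PER_STEP = 5.0  # hypothetical "set unit": degrees of head turn per cursor step


@dataclass(frozen=True)
class CursorState:
    position: int = 0               # index of the character under the cursor
    residual_deg: float = 0.0       # head turn not yet large enough for a full step
    selected: Optional[int] = None  # index chosen as the target text, if any


def move_cursor(state: CursorState, yaw_delta_deg: float, text_len: int,
                deg_per_step: float = DEG_PER_STEP) -> CursorState:
    """Advance the cursor one step per `deg_per_step` degrees of head turn."""
    total = state.residual_deg + yaw_delta_deg
    steps = int(total / deg_per_step)  # int() truncates toward zero in both directions
    residual = total - steps * deg_per_step
    position = min(max(state.position + steps, 0), max(text_len - 1, 0))
    return replace(state, position=position, residual_deg=residual)


def handle_selection(state: CursorState, voice: Optional[str] = None,
                     touched: bool = False) -> CursorState:
    """Select the character under the cursor on a voice command or a pad touch."""
    if touched or (voice is not None and voice.strip() in SELECT_VOICE_COMMANDS):
        return replace(state, selected=state.position)
    return state
```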

In particular, while moving the cursor, the control unit 140 may give priority or weight to text determined to belong to a preset group of homophones, so that the cursor moves quickly to it. For example, the control unit 140 may move the cursor preferentially to homophones such as the digit '2' and the letter 'E', the digit '5' and the letter 'O', or the Korean vowels 'ㅔ' and 'ㅐ'.
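
A minimal sketch of the prioritized cursor jump, assuming the preset homophone groups are exactly those listed above: the cursor skips directly to the next character in {2, E, 5, O} or the next Hangul syllable whose vowel is 'ㅐ' or 'ㅔ', detected with the standard Unicode Hangul syllable arithmetic. The function names are hypothetical.

```python
from typing import Optional

HANGUL_BASE, NUM_SYLLABLES = 0xAC00, 11172
NUM_VOWELS, NUM_FINALS = 21, 28
CONFUSABLE_VOWELS = {1, 5}      # Unicode vowel indices of 'ㅐ' and 'ㅔ'
CONFUSABLE_CHARS = set("2E5O")  # digit/letter pairs named in the description


def is_priority(ch: str) -> bool:
    """True if `ch` belongs to a preset homophone-prone group."""
    code = ord(ch) - HANGUL_BASE
    if 0 <= code < NUM_SYLLABLES:
        return (code // NUM_FINALS) % NUM_VOWELS in CONFUSABLE_VOWELS
    return ch in CONFUSABLE_CHARS


def next_priority_position(text: str, cursor: int) -> Optional[int]:
    """Index of the next homophone-prone character after the cursor, wrapping around."""
    n = len(text)
    for offset in range(1, n + 1):
        i = (cursor + offset) % n
        if is_priority(text[i]):
            return i
    return None
```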

The control unit 140 may select one or more pieces of text according to the text selection command. When a syllable or word of a set unit is to be selected from the text displayed on the display unit 130, the control unit 140 may determine whether the text selection command is the voice input, pad touch, or motion that is set to select a syllable or word of that unit, and select the target text accordingly. For example, upon receiving a text selection command implemented as the spoken phrase "단어 수정" ("edit word"), a pad touch lasting 2 seconds or longer, or a square (block) shape formed with the thumb and index finger, the control unit 140 may select the word at the cursor position in the displayed text as the target text.
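
Choosing between a single syllable and a whole word can be sketched as two steps: map the received command to a unit, then expand the cursor position to that unit. The whitespace word-boundary rule and the function names are assumptions; the three word-selection triggers are the ones listed above, with the thumb-and-index-finger block gesture reduced to a boolean flag.

```python
from typing import Optional, Tuple

LONG_PRESS_S = 2.0


def unit_for_command(voice: Optional[str] = None, touch_duration_s: float = 0.0,
                     block_gesture: bool = False) -> str:
    """Map the selection command to a selection unit (word vs. single syllable)."""
    if voice == "단어 수정" or touch_duration_s >= LONG_PRESS_S or block_gesture:
        return "word"
    return "syllable"


def select_unit(text: str, cursor: int, unit: str) -> Tuple[int, int]:
    """Return the half-open span [start, end) to select around the cursor."""
    if not 0 <= cursor < len(text):
        raise IndexError("cursor is outside the text")
    if unit == "syllable":
        return cursor, cursor + 1  # one precomposed Hangul syllable is one character
    if unit == "word":
        start = end = cursor
        while start > 0 and not text[start - 1].isspace():
            start -= 1
        while end < len(text) and not text[end].isspace():
            end += 1
        return start, end  # empty span if the cursor sits on a space
    raise ValueError(f"unknown unit: {unit}")
```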

When correcting the target text, the control unit 140 may replace it with text that the user inputs directly by voice, and display the corrected text on the display unit 130.

The control unit 140 may also update at least one of the voice recognition unit 120, the acoustic model, and the language model according to the result of the change. That is, the control unit 140 may train at least one of the voice recognition unit 120, the acoustic model, and the language model so that, when the same utterance is input again later, the corrected text is output with the highest priority. For example, the control unit 140 may update the STT API so that the voice recognition unit 120 gives higher weight to the corrected text for the corresponding speech. The control unit 140 may also update the acoustic model or the language model so that higher weight is given to the words or grammar corresponding to the corrected text.
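
The patent describes updating the STT API, acoustic model, or language model after a correction. Whether and how a hosted STT service can be adapted depends on that service, so this sketch substitutes a simpler technique: a local correction memory that re-ranks the recognizer's n-best hypotheses, boosting text the user has previously confirmed. The class, the boost value, and the assumption that the recognizer returns scored n-best results are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List, Tuple

Hypothesis = Tuple[str, float]  # (recognized text, recognizer confidence)


class CorrectionMemory:
    """Remembers user corrections and boosts them in later recognition results."""

    def __init__(self, boost: float = 0.5):
        self.boost = boost  # score added per confirmed correction (hypothetical value)
        self.corrections: DefaultDict[str, Counter] = defaultdict(Counter)

    def record(self, recognized: str, corrected: str) -> None:
        self.corrections[recognized][corrected] += 1

    def rerank(self, nbest: List[Hypothesis]) -> List[Hypothesis]:
        """Re-score an n-best list so previously confirmed corrections come first."""
        scores = dict(nbest)
        for text, score in nbest:
            for corrected, count in self.corrections.get(text, Counter()).items():
                # A confirmed correction inherits the score of the hypothesis it
                # replaced, plus a boost for every time the user confirmed it.
                boosted = score + self.boost * count
                scores[corrected] = max(scores.get(corrected, float("-inf")), boosted)
        return sorted(scores.items(), key=lambda h: h[1], reverse=True)
```

After `record("게선", "개선")`, a later result whose top hypothesis is "게선" is re-ranked with "개선" first, even if "개선" was absent from the original list.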

The information providing server 300 is connected to the smart glasses 100 through a communication network and may receive text from the smart glasses 100. The information providing server 300 may query pre-stored information (a database) with the received text to retrieve detailed data, visualize the detailed data, and either provide it to the smart glasses 100 or display it on its own display device.

For example, after receiving from the smart glasses 100 at least one of the serial number (S/N) and the product number (P/N) of a device part, converted into text, the information providing server 300 may look up detailed internal information and perform data visualization in conjunction with its own database.
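
The server-side lookup can be sketched with Python's built-in `sqlite3` module. The table schema, the made-up sample rows, and the function names are hypothetical (the patent does not disclose the database structure), and "visualization" is reduced to label/value lines that the smart glasses 100 could display.

```python
import sqlite3
from typing import Dict, List, Optional

COLUMNS = ("serial_no", "product_no", "name", "status")


def build_demo_db() -> sqlite3.Connection:
    """In-memory parts table with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE parts (serial_no TEXT PRIMARY KEY, product_no TEXT, name TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO parts VALUES (?, ?, ?, ?)",
        [
            ("SN-E100", "PN-5O21", "hydraulic pump", "in service"),
            ("SN-E200", "PN-7731", "control valve", "under inspection"),
        ],
    )
    return conn


def lookup_part(conn: sqlite3.Connection, text: str) -> Optional[Dict[str, str]]:
    """Match recognized text against either the S/N or the P/N column."""
    key = text.strip().upper()
    row = conn.execute(
        "SELECT serial_no, product_no, name, status FROM parts "
        "WHERE serial_no = ? OR product_no = ?",
        (key, key),
    ).fetchone()
    return dict(zip(COLUMNS, row)) if row else None


def render_card(info: Dict[str, str]) -> List[str]:
    """Minimal 'visualization': label/value lines for the glasses display."""
    return [f"{col.replace('_', ' ').title()}: {info[col]}" for col in COLUMNS]
```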

FIG. 5 illustrates an example of receiving correction syllables and selecting and correcting the corresponding text block according to a method of inputting and correcting text through voice recognition according to an embodiment of the present invention.

FIG. 5(a) shows a state in which the text '게선방안' has been input and displayed on the smart glasses 100. In FIG. 5(a), the user's intended input '개선방안' ('improvement plan') was misrecognized as '게선방안' and needs to be corrected. FIG. 5(b) shows a later state under the same method. The user has activated the edit mode by inputting a text-correction activation command such as '수정모드' ('edit mode'). The user then input the voice information '개선' as the correction syllable, and the corresponding text '게선' 310 has been selected as a text block.
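The patent does not specify how the control unit finds the text block that matches the spoken correction syllable. One possible approach is sketched below. It assumes matching is done on Hangul jamo rather than on whole syllables, so that '개선' still lands on '게선' when the two differ only in the vowel. Syllables are decomposed with the standard Unicode arithmetic (base U+AC00, 21 vowels, 28 finals). The scoring rule and the names `select_block` and `syllable_cost` are hypothetical.

```python
from __future__ import annotations

HANGUL_BASE = 0xAC00
HANGUL_COUNT = 11172  # 19 initials x 21 vowels x 28 finals


def decompose(ch: str) -> tuple:
    # Precomposed Hangul syllable -> (initial, vowel, final) indices;
    # any other character is returned as a 1-tuple of itself.
    code = ord(ch) - HANGUL_BASE
    if 0 <= code < HANGUL_COUNT:
        return (code // 588, (code % 588) // 28, code % 28)
    return (ch,)


def syllable_cost(a: str, b: str) -> int:
    # Number of differing jamo components (3 if the kinds do not match).
    da, db = decompose(a), decompose(b)
    if len(da) != len(db):
        return 3
    return sum(x != y for x, y in zip(da, db))


def select_block(displayed: str, spoken: str) -> tuple[int, int]:
    # Slide a window as long as the spoken correction syllables over the
    # displayed text and return the (start, end) span with the lowest cost.
    n = len(spoken)
    if n == 0 or n > len(displayed):
        raise ValueError("correction must be non-empty and no longer than the text")
    start = min(
        range(len(displayed) - n + 1),
        key=lambda i: sum(syllable_cost(displayed[i + k], spoken[k]) for k in range(n)),
    )
    return start, start + n


text = "게선방안"
start, end = select_block(text, "개선")
print(text[start:end])  # 게선
```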

FIG. 5(c) shows an example of correcting the text block using the vowel separation correction method, under the same method of inputting and correcting text through voice recognition. In FIG. 5(c), the text '게선' 310 is selected as a text block. The input confirmation menu interface 320 shows that the user is inputting voice information corresponding to '가이선'. FIG. 5(d) shows the result: the user's voice input in the edit mode of FIG. 5(b) and (c) has corrected the text '게선' 310 to the text '개선' 311.

Specifically, the user inputs the voice information '개선' or '게선' as the correction syllable, which selects the text '게선' 310 as a text block. The user then inputs the voice information '가이선' for the selected text block, and the voice recognition unit receives the voice information '가', '이' and '선'. The vowel separation correction method takes '가' and the vowel 'ㅣ' of the second syllable '이', discarding that syllable's consonant. From these it generates diphthong input information, producing the text '개'. This is combined with the voice information '선', so that the text '개선' 311 is input. Spelling out the vowel in this way lets the user distinguish 'ㅐ' from 'ㅔ', which are pronounced almost identically in contemporary standard Korean. The consonant-free vowel information may be generated based on the interval between pieces of input voice information. Alternatively, the diphthong input information may be received through preset commands such as '이중모음입력' ('diphthong input') and '입력종료' ('end input').
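The vowel separation step can be sketched with Unicode Hangul syllable arithmetic in three steps:

1. Split each syllable into initial, vowel and final indices.
2. Combine the first syllable's vowel with the second syllable's vowel.
3. Recompose the result into a syllable.

The patent shows only ㅏ + ㅣ → ㅐ. The other table entries follow standard Hangul compound-vowel composition and are an assumption. Carrying over the second syllable's final consonant is also an assumption. `merge_vowels` is a hypothetical name.

```python
from __future__ import annotations

HANGUL_BASE = 0xAC00
# Vowel (medial) indices in Unicode order:
# ㅏ0 ㅐ1 ㅑ2 ㅒ3 ㅓ4 ㅔ5 ㅕ6 ㅖ7 ㅗ8 ㅘ9 ㅙ10 ㅚ11 ㅛ12 ㅜ13 ㅝ14 ㅞ15 ㅟ16 ㅠ17 ㅡ18 ㅢ19 ㅣ20
COMBINE = {
    (0, 20): 1,    # ㅏ + ㅣ -> ㅐ (the case shown in FIG. 5)
    (4, 20): 5,    # ㅓ + ㅣ -> ㅔ
    (2, 20): 3,    # ㅑ + ㅣ -> ㅒ
    (6, 20): 7,    # ㅕ + ㅣ -> ㅖ
    (8, 0): 9,     # ㅗ + ㅏ -> ㅘ
    (8, 1): 10,    # ㅗ + ㅐ -> ㅙ
    (8, 20): 11,   # ㅗ + ㅣ -> ㅚ
    (13, 4): 14,   # ㅜ + ㅓ -> ㅝ
    (13, 5): 15,   # ㅜ + ㅔ -> ㅞ
    (13, 20): 16,  # ㅜ + ㅣ -> ㅟ
    (18, 20): 19,  # ㅡ + ㅣ -> ㅢ
}


def split_syllable(ch: str) -> tuple[int, int, int]:
    code = ord(ch) - HANGUL_BASE
    if not 0 <= code < 11172:
        raise ValueError(f"{ch!r} is not a precomposed Hangul syllable")
    return code // 588, (code % 588) // 28, code % 28


def join_syllable(initial: int, vowel: int, final: int) -> str:
    return chr(HANGUL_BASE + (initial * 21 + vowel) * 28 + final)


def merge_vowels(first: str, second: str) -> str:
    # Keep first's initial consonant, discard second's initial consonant,
    # combine the two vowels, and carry over second's final consonant (assumed).
    i1, v1, f1 = split_syllable(first)
    _, v2, f2 = split_syllable(second)
    if f1 != 0 or (v1, v2) not in COMBINE:
        raise ValueError(f"cannot merge {first!r} and {second!r}")
    return join_syllable(i1, COMBINE[(v1, v2)], f2)


print(merge_vowels("가", "이") + "선")  # 개선
```

Under the same table, '거' followed by '이' would yield '게'. The user can therefore state either vowel explicitly.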

For example, consider inputting the voice information '가이선' shown in the input confirmation menu interface 320 of FIG. 5(c). The user may pause for a certain interval after uttering '가이' and before uttering '선', so that the voice information '가이' is recognized as the text '개'. Alternatively, the user may use a function activation command such as '이중모음입력' so that '가이' is recognized as '개'. For '개선' in this example, the text '선' contains no diphthong information. The system may therefore recognize '가이선' as '개선' even when it is uttered continuously without a pause.
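The two triggers described above are a pause between syllables and a preset command. The sketch below shows how each could turn recognized syllables into groups. Each two-syllable group would then go through the vowel separation step described earlier. Several details are illustrative assumptions rather than details given in the patent:

- the per-syllable timestamps;
- the 0.5 s threshold;
- '입력종료' closing a group opened by '이중모음입력'.

```python
from __future__ import annotations

from typing import NamedTuple


class Syllable(NamedTuple):
    text: str
    start: float  # seconds; assumes the recognizer reports per-syllable times
    end: float


def group_by_pause(syllables: list[Syllable], min_gap: float = 0.5) -> list[list[str]]:
    # Start a new group whenever the silence before a syllable is >= min_gap.
    groups: list[list[str]] = []
    prev_end = None
    for syl in syllables:
        if prev_end is None or syl.start - prev_end >= min_gap:
            groups.append([])
        groups[-1].append(syl.text)
        prev_end = syl.end
    return groups


def group_by_command(tokens: list[str], open_cmd: str = "이중모음입력",
                     close_cmd: str = "입력종료") -> list[list[str]]:
    # Tokens between open_cmd and close_cmd form one group; every other
    # token is its own group. An unclosed group is kept at the end.
    groups: list[list[str]] = []
    current: list[str] | None = None
    for tok in tokens:
        if tok == open_cmd:
            current = []
        elif tok == close_cmd:
            if current is not None:
                groups.append(current)
            current = None
        elif current is not None:
            current.append(tok)
        else:
            groups.append([tok])
    if current:
        groups.append(current)
    return groups


timed = [Syllable("가", 0.0, 0.2), Syllable("이", 0.2, 0.4), Syllable("선", 1.0, 1.2)]
print(group_by_pause(timed))  # [['가', '이'], ['선']]
print(group_by_command(["이중모음입력", "가", "이", "입력종료", "선"]))  # [['가', '이'], ['선']]
```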

The foregoing description of the present invention is for illustration. Those of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing the technical spirit or essential features of the present invention. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise, components described as distributed may be implemented in a combined form.

The scope of the present invention is defined by the following claims. All changes or modifications derived from the meaning and scope of the claims and their equivalents should be interpreted as falling within the scope of the present invention.

100: smart glasses
300: information providing server

Claims

Smart glasses worn on a user's body, comprising:
an input receiving unit configured to receive an input including at least one of the user's voice and manipulation;
a voice recognition unit configured to recognize the voice and generate voice recognition information;
a display unit configured to display the voice recognition information as text; and
a control unit configured to select at least one of the texts displayed on the display unit as text to be corrected according to a text selection command received through the input receiving unit, and to correct the text to be corrected according to a text correction command received through the input receiving unit.

The smart glasses according to claim 1, wherein the voice recognition unit converts the voice into text using a speech-to-text (STT) application programming interface (API) and outputs the text.

The smart glasses according to claim 1, wherein the input receiving unit comprises:
a voice input unit configured to receive the user's voice as an input; and
a manipulation detection unit configured to sense the user's manipulation and receive it as an input.

The smart glasses according to claim 3, wherein the manipulation detection unit comprises:
a touch pad configured to detect the user's touch and receive it as an input; and
a displacement sensor configured to sense a displacement of the smart glasses caused by the user and receive it as an input,
wherein the displacement sensor generates displacement information by detecting a vector displacement through body tracking of the user's body movement.

The smart glasses according to claim 1, wherein the control unit moves a cursor by a set unit according to the displacement information to select the text to be corrected through an error correction user interface.

The smart glasses according to claim 5, wherein, when the cursor has been moved to the text to be corrected, the control unit selects the text to be corrected upon receiving a text selection command implemented by at least one of voice input, a pad touch, and motion detection.

A voice recognition system comprising:
the smart glasses according to any one of claims 1 to 6; and
an information providing server connected to the smart glasses through a communication network, the information providing server receiving text from the smart glasses, querying a database to check detailed data, and visualizing the detailed data.