KR20220019551A - Voice service method through artificial intelligence learning - Google Patents

Voice service method through artificial intelligence learning

Info

Publication number
KR20220019551A
KR20220019551A (Application No. KR1020200100109A)
Authority
KR
South Korea
Prior art keywords
voice
learning
service method
algorithm
frequency
Prior art date
Application number
KR1020200100109A
Other languages
Korean (ko)
Inventor
노태경
임서환
Original Assignee
(주)씨아이그룹
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by (주)씨아이그룹 filed Critical (주)씨아이그룹
Priority to KR1020200100109A priority Critical patent/KR20220019551A/en
Publication of KR20220019551A publication Critical patent/KR20220019551A/en

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/26 - Speech to text systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 - Training
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/16 - Speech classification or search using artificial neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to a voice service method through AI learning in which the frequency and number of vibrations, the main components of a voice spoken directly by a person or played back from a recording, are analyzed and learned over a long period through an AI program, after which a service is performed with that voice. The voice service method through AI learning of the present invention, performed on electronic equipment equipped with an AI algorithm, comprises: step (a) of receiving a voice to be learned; step (b) of driving the AI algorithm to analyze the frequency and number of vibrations of the input voice and then extracting only the user's voice; step (c) of performing iterative learning on the input voice through the AI algorithm; and step (d) of, when an arbitrary text is input, converting it into the user's voice learned in step (c) and outputting it.

Description

Voice service method through AI learning {Voice service method through artificial intelligence learning}

The present invention relates to a voice service method through AI learning, and more particularly to a voice service method through AI learning in which the frequency and number of vibrations, the main components of a voice spoken directly by a person or played back from a recording, are analyzed and learned over a long period through an AI program so that a service can then be performed with that voice.

Recently, artificial intelligence systems that implement human-level intelligence have come into use in various fields. Unlike existing rule-based smart systems, an artificial intelligence system is one in which a machine learns, judges, and becomes smarter on its own. The more an AI system is used, the more its recognition rate improves and the more accurately it understands user preferences, so existing rule-based smart systems are gradually being replaced by deep-learning-based AI systems.

Artificial intelligence technology consists of machine learning (for example, deep learning) and element technologies that make use of machine learning algorithms. A machine learning algorithm is a technology that classifies and learns the characteristics of input data on its own, while the element technologies use machine learning algorithms such as deep learning to imitate functions of the human brain such as cognition and judgment; they span technical fields such as linguistic understanding, visual understanding, inference/prediction, knowledge representation, and motion control.

The various fields to which artificial intelligence technology is applied are as follows. Linguistic understanding is a technology for recognizing and applying/processing human language and text, and includes natural language processing, machine translation, dialogue systems, question answering, and speech recognition/synthesis. Visual understanding is a technology for recognizing and processing objects as human vision does, and includes object recognition, object tracking, image search, person recognition, scene understanding, spatial understanding, and image enhancement. Inference/prediction is a technology for judging information and reasoning and predicting logically, and includes knowledge/probability-based reasoning, optimization prediction, preference-based planning, and recommendation.

Knowledge representation is a technology for automatically processing human experience into knowledge data, and includes knowledge construction (data generation/classification) and knowledge management (data utilization). Motion control is a technology for controlling the autonomous driving of vehicles and the movement of robots, and includes movement control (navigation, collision avoidance, driving) and manipulation control (action control).

As described above, although uses for artificial intelligence systems are being explored in various fields, no service has yet been proposed that, based on the results of analyzing the frequency and number of vibrations of a user's own voice through iterative learning by an AI algorithm, outputs written text in that user's voice; this has limited the usability of such systems.

Prior Art 1: Korean Patent Publication No. 10-2019-0057687 (Title of the Invention: Electronic device for chatbot change and control method thereof)

Prior Art 2: Korean Patent Publication No. 10-2019-0095620 (Title of the Invention: Electronic device and control method thereof)

Prior Art 3: Korean Patent Publication No. 10-2019-0123362 (Title of the Invention: Voice conversation analysis method and apparatus using artificial intelligence)

The present invention has been devised to solve the above problems, and its object is to provide a voice service method through AI learning in which the frequency and number of vibrations, the main components of a voice spoken directly by a person or played back from a recording, are analyzed and learned over a long period through an AI program, after which a service can be performed with that voice.

To achieve the above object, the voice service method through AI learning of the present invention is performed on an electronic device equipped with an AI algorithm and comprises: step (a) of receiving a voice to be learned; step (b) of driving the AI algorithm to analyze the frequency and number of vibrations of the input voice and then extracting only the user's voice; step (c) of performing iterative learning on the input voice through the AI algorithm; and step (d) of, when an arbitrary text is input, converting it into the user's voice learned in step (c) and outputting it.

According to the voice service method through AI learning of the present invention, text can be converted and output in the user's own voice as analyzed and learned through the AI algorithm.

Fig. 1 is a flowchart illustrating a voice service method through AI learning according to an embodiment of the present invention.

Hereinafter, a preferred embodiment of the voice service method through AI learning of the present invention will be described in more detail with reference to the accompanying drawings.

Fig. 1 is a flowchart illustrating a voice service method through AI learning according to an embodiment of the present invention. The method may be loaded in the form of a program on at least one of, for example, a smartphone, a tablet PC, an e-book reader, a desktop PC, a laptop PC, a netbook computer, a PDA, a portable multimedia player (PMP), or a wearable device; in the embodiments below, these are collectively referred to as an "electronic device" for convenience.

As shown in Fig. 1, the electronic device receives a voice to be learned (step S10). The input voice may be played back (output) from a voice file previously recorded by the user, input in real time through a microphone, or collected from conversations with other speakers in everyday life or during phone calls.
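As a purely illustrative sketch of how step S10 might ingest a pre-recorded voice file, the snippet below writes a short synthetic mono recording and reads it back. The WAV format and the Python standard library `wave` module are assumptions for illustration; the patent does not specify a file format, and a real system would also capture live microphone input.

```python
import math
import struct
import wave

# Write one second of a synthetic 220 Hz "voice" as a 16-bit mono WAV file,
# standing in for a recording the user prepared in advance.
sr = 8000
with wave.open("sample.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)          # 16-bit samples
    w.setframerate(sr)
    frames = b"".join(
        struct.pack("<h", int(20000 * math.sin(2 * math.pi * 220 * n / sr)))
        for n in range(sr)     # one second of audio
    )
    w.writeframes(frames)

# Step S10 analogue: open the recorded file and inspect what was received.
with wave.open("sample.wav", "rb") as w:
    print(w.getframerate(), w.getnframes())   # prints: 8000 8000
```

The same `readframes` interface would then hand raw samples to the analysis stage.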

Next, the electronic device drives the AI algorithm (step S20) to analyze the frequency and number of vibrations of the input voice (step S30) and then extracts only the user's voice (step S40). Doing so reduces the hassle of the user having to prepare voice data for learning separately.
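The patent does not detail how the frequency analysis of step S30 is carried out. As one hedged illustration of what such an analysis could look like, the dominant frequency of an audio signal can be estimated with a discrete Fourier transform; NumPy is an assumption here, and a practical system would analyze many short frames rather than a single whole signal.

```python
import numpy as np

def dominant_frequency(samples: np.ndarray, sample_rate: int) -> float:
    """Estimate the dominant frequency (Hz) of a mono signal via the FFT."""
    spectrum = np.abs(np.fft.rfft(samples))                  # magnitude spectrum
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    return float(freqs[np.argmax(spectrum)])                 # peak bin in Hz

# Synthetic "voice": one second of a pure 220 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 220 * t)
print(round(dominant_frequency(tone, sr)))   # prints: 220
```

A per-speaker profile of such spectral peaks is one conceivable basis for separating the user's voice from other speakers in step S40.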

Next, the electronic device performs learning through the AI algorithm (step S50). The model learned here is a judgment model trained on the basis of artificial intelligence and may, for example, be a model based on a neural network. The object judgment model may be designed to simulate the structure of the human brain on a computer and may include a plurality of weighted network nodes that simulate the neurons of a human neural network. The plurality of network nodes may each form connection relationships to simulate the synaptic activity of neurons exchanging signals through synapses. The object judgment model may also include, for example, a neural network model or a deep learning model developed from a neural network model. In a deep learning model, the plurality of network nodes may be located at different depths (or layers) and may exchange data according to convolutional connection relationships. Examples of the object judgment model include, but are not limited to, a deep neural network (DNN), a recurrent neural network (RNN), and a bidirectional recurrent deep neural network (BRDNN).
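As a minimal sketch of the weighted, layered node structure described above: the patent names DNN, RNN, and BRDNN only as examples and specifies no architecture, so the tiny feedforward network and gradient-descent step below are assumptions made purely to illustrate iterative learning over weighted connections.

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyDNN:
    """A two-layer feedforward network: weighted nodes arranged in layers."""
    def __init__(self, n_in: int, n_hidden: int, n_out: int):
        self.w1 = rng.normal(scale=0.1, size=(n_in, n_hidden))
        self.w2 = rng.normal(scale=0.1, size=(n_hidden, n_out))

    def train_step(self, x: np.ndarray, y: np.ndarray, lr: float = 0.01) -> float:
        """One gradient-descent step on mean squared error; returns the loss."""
        h = np.tanh(x @ self.w1)                 # hidden layer ("neurons")
        out = h @ self.w2                        # output layer
        err = out - y
        grad_w2 = h.T @ err                      # backpropagate the error
        grad_h = err @ self.w2.T * (1 - h ** 2)  # through the tanh nonlinearity
        grad_w1 = x.T @ grad_h
        self.w2 -= lr * grad_w2
        self.w1 -= lr * grad_w1
        return float((err ** 2).mean())

net = TinyDNN(8, 16, 1)
x = rng.normal(size=(32, 8))
y = x[:, :1] * 0.5                               # simple synthetic target
losses = [net.train_step(x, y) for _ in range(200)]
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Repeating `train_step` is the code-level analogue of the iterative learning of step S50: the loss shrinks as the weighted connections adapt to the data.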

While repeating learning on the input voice through the AI algorithm in this way, the electronic device determines whether any text has been input (step S60), for example a foreign-language translation or a command string for controlling an external voice recognition device. If, as a result of the determination in step S60, no text has been input, steps S10 onward are repeated; if text has been input, the input text is converted into the learned user's voice and output (step S70).
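The S10 through S70 control flow (learn repeatedly, check for text, then synthesize in the learned voice) can be sketched as follows. The stub functions are hypothetical placeholders: the actual learning and synthesis are the subject of the patent and are not specified at code level, so only the branching structure of the flowchart is modeled here.

```python
def voice_service_loop(get_text, learn_once, synthesize, max_iters=5):
    """Steps S10-S70: keep learning until text arrives, then speak it."""
    for _ in range(max_iters):
        model = learn_once()                 # steps S10-S50: ingest voice, learn
        text = get_text()                    # step S60: has any text been input?
        if text is not None:
            return synthesize(model, text)   # step S70: output in user's voice
    return None                              # no text arrived; keep model only

# Illustrative stubs (hypothetical; real learning/synthesis omitted).
inputs = iter([None, None, "hello"])
result = voice_service_loop(
    get_text=lambda: next(inputs),
    learn_once=lambda: "voice-model",
    synthesize=lambda m, t: f"[{m}] says: {t}",
)
print(result)   # prints: [voice-model] says: hello
```

Note how the loop returns to the ingestion step whenever no text is pending, matching the S60 "no" branch of Fig. 1.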

Those skilled in the art will understand that the present invention, as described in detail above, can be embodied in other specific forms without changing its technical spirit or essential features.

Therefore, the embodiments described above are to be understood as illustrative and not restrictive in all respects; the scope of the present invention is indicated by the claims below rather than by the above detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

Claims (1)

A voice service method through AI learning, performed on an electronic device equipped with an AI algorithm, comprising:
step (a) of receiving a voice to be learned;
step (b) of driving the AI algorithm to analyze the frequency and number of vibrations of the input voice and then extracting only the user's voice;
step (c) of performing iterative learning on the input voice through the AI algorithm; and
step (d) of, when an arbitrary text is input, converting it into the user's voice learned in step (c) and outputting it.
KR1020200100109A 2020-08-10 2020-08-10 Voice service method through artificial intelligence learning KR20220019551A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020200100109A KR20220019551A (en) 2020-08-10 2020-08-10 Voice service method through artificial intelligence learning


Publications (1)

Publication Number Publication Date
KR20220019551A true KR20220019551A (en) 2022-02-17

Family

ID=80493390

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020200100109A KR20220019551A (en) 2020-08-10 2020-08-10 Voice service method through artificial intelligence learning

Country Status (1)

Country Link
KR (1) KR20220019551A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190057687A (en) 2017-11-20 2019-05-29 삼성전자주식회사 Electronic device and Method for changing Chatbot
KR20190095620A (en) 2018-01-26 2019-08-16 삼성전자주식회사 Electronic apparatus and controlling method thereof
KR20190123362A (en) 2018-04-06 2019-11-01 삼성전자주식회사 Method and Apparatus for Analyzing Voice Dialogue Using Artificial Intelligence



Legal Events

Date Code Title Description
E601 Decision to refuse application