KR102504445B1

KR102504445B1 - System and method for supporting artificial intelligence speech recognition and conversation services

Info

Publication number: KR102504445B1
Application number: KR1020220086890A
Authority: KR
Inventors: 조한희
Original assignee: (주)인티그리트
Priority date: 2022-07-14
Filing date: 2022-07-14
Publication date: 2023-03-02

Abstract

The present invention relates to a system for supporting artificial intelligence speech recognition and conversation services, comprising: an AI chatbot that extracts, from a pre-built training data set, first response information corresponding to the query content included in a speaker's query speech and provides it; and a cloud response server that uses at least one NLP engine to derive second response information suited to the query content included in the speaker's query speech forwarded by the AI chatbot, and provides the derived second response information to the AI chatbot. The AI chatbot compares the first and second response information and selects and utters the response information whose delay time and response accuracy meet preset conditions. This enables immediate responses without delay.

Description

System and method for supporting artificial intelligence speech recognition and conversation services

The present invention relates to a system and method for supporting artificial intelligence voice recognition and conversation services.

For advanced technology to become part of daily life, an intuitive and unconstrained interface is becoming essential. Next-generation service robots, which aim to deliver advanced intelligent services but must operate at some distance from customers, require a more advanced user interface.

However, newly proposed interfaces, such as gesture and Kinect interfaces and interfaces that recognize gaze or the brain's electrical signals, have not yet moved beyond the demonstration stage in robots. Voice recognition interfaces, which remain confined to call-center monitoring and limited applications, also face technical constraints when applied to robots.

Over the past several years, smart speakers have shown the potential for expanding voice-based smart devices and have emerged as a key platform leading voice conversation services.

However, negative perceptions of smart devices with voice interfaces, namely inaccurate recognition, misinterpretation, limited vocabulary, and slow cloud service speed, are obstacles to adopting voice recognition as the user interface of new smart devices. Technology that moves beyond this trial and error to implement a digital assistant on an indoor service robot remains immature.

Korean Laid-Open Patent Publication No. 10-2017-0093629 (Title of Invention: Voice Recognition Apparatus and Method, Voice Recognition System)

An object of the present invention is to provide a system and method for supporting artificial intelligence voice recognition and conversation services that solve the conventional problems described above.

To achieve the above object, a system supporting an artificial intelligence voice recognition and conversation service according to an embodiment of the present invention includes: an AI chatbot that extracts, from a pre-built training data set, first response information matching the query content included in a speaker's query speech and provides it; and a cloud response server that uses at least one NLP engine to derive second response information suited to the query content included in the speaker's query speech forwarded by the AI chatbot, and provides the derived second response information to the AI chatbot. The AI chatbot compares the first response information with the second response information and selects and utters the response information whose delay time and response accuracy meet preset conditions. To this end, the AI chatbot includes a processor unit composed of: a wake-up unit that, upon detecting query content (a query word) in the speaker's voice information, requests a response to that query content; a local response provider that extracts response information matching the query content from the trained training data set and provides it; and a response standby manager that compares the response information derived by the local response provider and by the cloud response server and selects and utters the response information whose delay time and response accuracy meet the preset conditions. When rejection or negative language toward the uttered response information is detected through the wake-up unit, the response standby manager utters the unselected response information; when the wake-up unit detects no rejection or negative language within a preset time, the unselected response information is deleted.

In one embodiment, the cloud response server selects one of the at least one NLP engine for obtaining response information suited to the speaker's query content (query word) received from the AI chatbot, and provides the response information derived by the selected NLP engine to the AI chatbot.
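The selection step can be illustrated with a minimal Python sketch. The specification does not define the preset conditions, so the thresholds `max_delay_ms` and `min_accuracy`, the `Response` fields, and the tie-breaking rule (higher accuracy first, then lower delay) are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Response:
    source: str       # e.g. "local" or "cloud" (illustrative labels)
    text: str
    delay_ms: float   # time from query to availability of this response
    accuracy: float   # confidence score in [0, 1]


def select_response(
    candidates: List[Response],
    max_delay_ms: float = 800.0,   # hypothetical preset delay condition
    min_accuracy: float = 0.6,     # hypothetical preset accuracy condition
) -> Tuple[Optional[Response], List[Response]]:
    """Return (selected, unselected) responses under the assumed conditions."""
    eligible = [r for r in candidates
                if r.delay_ms <= max_delay_ms and r.accuracy >= min_accuracy]
    if not eligible:
        return None, list(candidates)
    # Prefer higher accuracy; among equal accuracies prefer lower delay.
    best = max(eligible, key=lambda r: (r.accuracy, -r.delay_ms))
    unselected = [r for r in candidates if r is not best]
    return best, unselected
```

With a local response (120 ms, accuracy 0.7) and a cloud response (450 ms, accuracy 0.9), this sketch selects the cloud response and keeps the local one as the unselected alternative; if the cloud response took 1,500 ms it would fail the delay condition and the local response would be selected instead.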


In one embodiment, the cloud response server derives response information from each of the at least one NLP engine in round-robin fashion, then selects the response information with the highest accuracy among the derived responses and provides it to the AI chatbot.
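A minimal sketch of this round-robin querying, assuming each NLP engine is a callable that returns an answer and an accuracy score. Rotating the starting engine on every request is one plausible reading of "round robin fashion"; the engine interface and all names here are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical engine interface: takes query text, returns (answer, accuracy).
Engine = Callable[[str], Tuple[str, float]]


class RoundRobinDispatcher:
    def __init__(self, engines: List[Engine]):
        if not engines:
            raise ValueError("at least one NLP engine is required")
        self.engines = engines
        self._start = 0  # engine queried first on the next request

    def query(self, text: str) -> Tuple[str, float, int]:
        """Query every engine in rotating order; return the most accurate answer."""
        n = len(self.engines)
        order = [(self._start + i) % n for i in range(n)]
        self._start = (self._start + 1) % n  # rotate the starting engine
        results = []
        for idx in order:
            answer, accuracy = self.engines[idx](text)
            results.append((accuracy, idx, answer))
        accuracy, idx, answer = max(results, key=lambda r: r[0])
        return answer, accuracy, idx
```

The returned engine index lets the caller attribute the answer to a specific engine, which the per-engine accuracy tracking in the next embodiment would need.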

In one embodiment, the cloud response server provides response information in at least one of the following forms: an automatic speech recognition (ASR) response, a natural language understanding (NLU) response, or a text-to-speech (TTS) response.

In one embodiment, the cloud response server derives response information from each of the at least one NLP engine, analyzes the accuracy rate of the derived response information, and, if the analyzed accuracy rate fails to meet a preset criterion, triggers self-learning for the corresponding NLP engine.
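The accuracy check and self-learning trigger can be sketched as a per-engine counter. The specification does not say how correctness is measured or what the preset criterion is; here the correctness labels (for example, derived from user rejections) and the `threshold` and `min_samples` values are assumptions, and the sketch only reports which engines should be retrained rather than performing the retraining.

```python
from typing import Dict, List


class AccuracyMonitor:
    """Tracks per-engine accuracy and flags engines for self-learning."""

    def __init__(self, threshold: float = 0.8, min_samples: int = 20):
        self.threshold = threshold      # hypothetical preset criterion
        self.min_samples = min_samples  # avoid judging an engine on too few answers
        self.correct: Dict[str, int] = {}
        self.total: Dict[str, int] = {}

    def record(self, engine: str, was_correct: bool) -> None:
        self.total[engine] = self.total.get(engine, 0) + 1
        self.correct[engine] = self.correct.get(engine, 0) + int(was_correct)

    def accuracy(self, engine: str) -> float:
        total = self.total.get(engine, 0)
        return self.correct.get(engine, 0) / total if total else 0.0

    def engines_needing_training(self) -> List[str]:
        return sorted(
            e for e, n in self.total.items()
            if n >= self.min_samples and self.accuracy(e) < self.threshold
        )
```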

To achieve the above object, a method for supporting an artificial intelligence voice recognition and conversation service according to an embodiment of the present invention includes: detecting, at an AI chatbot, a user's voice information (utterance information); converting, at the AI chatbot, the voice information into text, analyzing the query content in the text to extract response information matching the query content from the trained data set, and at the same time providing the text of the voice information to an external cloud response server; and comparing, at the AI chatbot, the response information provided by the cloud response server with the response information extracted from the training data set built into the AI chatbot, and selecting and uttering the response information whose delay time and response accuracy meet preset conditions. After the uttering step, the method further includes: uttering the unselected response information when the user's rejection or negative language toward the uttered response information is detected; and deleting the unselected response information when no rejection or negative language from the user is detected within a preset time. So that the AI chatbot can compare the first and second response information and select and utter the response information whose delay time and response accuracy meet the preset conditions, the AI chatbot includes a processor unit composed of: a wake-up unit that, upon detecting query content (a query word) in the speaker's voice information, requests a response to that query content; a local response provider that extracts response information matching the query content from the trained training data set and provides it; and a response standby manager that compares the response information derived by the local response provider and by the cloud response server and selects and utters the response information whose delay time and response accuracy meet the preset conditions.
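The concurrent local/cloud flow of this method can be sketched with Python's `asyncio`. This is a sketch under assumptions: `local_answer` and `cloud_answer` are hypothetical coroutines returning `(text, accuracy)`, the cloud delay condition is modeled as a timeout, and a network error is treated like a blackout so that the local answer is still returned.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Tuple

Answer = Tuple[str, float]  # (response text, accuracy score)
Responder = Callable[[str], Awaitable[Answer]]


async def answer_query(
    text: str,
    local_answer: Responder,
    cloud_answer: Responder,
    cloud_timeout: float = 1.0,  # hypothetical delay condition, in seconds
) -> Tuple[Answer, List[Answer]]:
    """Run the local and cloud paths concurrently; return (selected, unselected)."""
    local_task = asyncio.create_task(local_answer(text))
    cloud_task = asyncio.create_task(
        asyncio.wait_for(cloud_answer(text), timeout=cloud_timeout))
    local = await local_task
    cloud: Optional[Answer]
    try:
        cloud = await cloud_task
    except (asyncio.TimeoutError, OSError):
        # Too slow, or the network is down: continue with the local answer only.
        cloud = None
    candidates = [c for c in (local, cloud) if c is not None]
    selected = max(candidates, key=lambda c: c[1])  # higher accuracy wins
    unselected = [c for c in candidates if c is not selected]
    return selected, unselected
```

Because both tasks start before either is awaited, the local answer is computed while the cloud request is in flight, which is the "at the same time" behavior the method describes.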


Therefore, the system and method for supporting an artificial intelligence voice recognition and conversation service according to an embodiment of the present invention have the advantage of enabling immediate responses without delay in response speed.

In addition, when an appropriate response exists among the response information analyzed by the local NLP solution supported by the AI chatbot and by the configured cloud NLP solution, one of them can be selected, immediately converted into voice data, and output to the user.

Furthermore, in addition to enabling immediate conversation service with minimal delay, the system can continue to provide conversation service between the user and the AI chatbot by using the local NLP engine even when the network fails or is blacked out.

FIG. 1 is a network configuration diagram of a system supporting an artificial intelligence voice recognition and conversation service according to an embodiment of the present invention.
FIG. 2 is a detailed configuration diagram of the chatbot shown in FIG. 1.
FIG. 3 is a detailed configuration diagram of the processor unit shown in FIG. 2.
FIG. 4A is an exemplary view illustrating the creation, utterance, and termination of a local conversation session of the AI chatbot shown in FIG. 1.
FIG. 4B is an exemplary view illustrating the operation of the cloud server shown in FIG. 1.
FIG. 5 is an exemplary diagram illustrating the process of selecting the response information uttered by the AI chatbot shown in FIG. 1.
FIG. 6 is a flowchart illustrating a method for supporting an artificial intelligence voice recognition and conversation service according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily practice it. However, the present invention may be embodied in many different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted for clarity, and similar reference numerals denote similar parts throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element interposed between them. In addition, when a part "includes" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise, and it should be understood that this does not preclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

The terms of degree "about", "substantially", and the like, as used throughout the specification, mean at or near the stated value when manufacturing and material tolerances inherent in the stated meaning are presented, and are used to prevent unscrupulous infringers from unfairly exploiting a disclosure in which exact or absolute values are stated to aid understanding of the present invention. The term "step of (doing)" or "step of" as used throughout the specification does not mean "step for".

In this specification, a "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. One unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware. A "~unit" is not limited to software or hardware; it may be configured to reside in an addressable storage medium or configured to execute on one or more processors. Thus, as an example, a "~unit" includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays, and variables. The functionality provided within components and "~units" may be combined into fewer components and "~units" or further separated into additional components and "~units". In addition, components and "~units" may be implemented to execute on one or more CPUs in a device or a secure multimedia card.

In this specification, some of the operations or functions described as being performed by a terminal, apparatus, or device may instead be performed by a server connected to that terminal, apparatus, or device. Likewise, some of the operations or functions described as being performed by a server may be performed by a terminal, apparatus, or device connected to that server.

In this specification, some of the operations or functions described as mapping or matching with a terminal may be interpreted as mapping or matching the terminal's unique number or an individual's identification information, which serves as the terminal's identifying data.

Hereinafter, a system and method for supporting an artificial intelligence voice recognition conversation service according to an embodiment of the present invention will be described in more detail with reference to the accompanying drawings.

FIG. 1 is a network configuration diagram of a system supporting an artificial intelligence voice recognition and conversation service according to an embodiment of the present invention; FIG. 2 is a detailed configuration diagram of the chatbot shown in FIG. 1; FIG. 3 is a detailed configuration diagram of the processor unit shown in FIG. 2; FIG. 4A is an exemplary view illustrating the creation, utterance, and termination of a local conversation session of the AI chatbot shown in FIG. 1; FIG. 4B is an exemplary view illustrating the operation of the cloud server shown in FIG. 1; and FIG. 5 is an exemplary diagram illustrating the process of selecting the response information uttered by the AI chatbot shown in FIG. 1.

First, as shown in FIG. 1, the system 100 supporting an artificial intelligence voice recognition and conversation service according to an embodiment of the present invention consists of an AI chatbot 200 and a cloud server 300, which communicate with each other over a network. The network refers to a connection structure that allows information to be exchanged between nodes such as terminals and servers; examples of such a network include a local area network (LAN), a wide area network (WAN), the World Wide Web (WWW), wired and wireless data communication networks, telephone networks, and wired and wireless television networks.
Examples of wireless data communication networks include 3G, 4G, 5G, 3rd Generation Partnership Project (3GPP), 5th Generation Partnership Project (5GPP), Long Term Evolution (LTE), World Interoperability for Microwave Access (WiMAX), Wi-Fi, the Internet, LAN (Local Area Network), Wireless LAN (Wireless Local Area Network), WAN (Wide Area Network), PAN (Personal Area Network), RF (Radio Frequency), Bluetooth networks, NFC (Near-Field Communication) networks, satellite broadcasting networks, analog broadcasting networks, and DMB (Digital Multimedia Broadcasting) networks, but are not limited thereto.

The AI chatbot 200 may be configured to compare first response information, extracted from the built training data set as matching the response to the query word included in the speaker's voice information, with the response information for that query word received from the external cloud server, and to select and utter the response information whose delay time and response accuracy meet preset conditions.

The AI chatbot 200 may also be configured to identify and classify speakers based on voice information, learn the query forms and query language used by each speaker, and store the resulting training data set in a DB.

The AI chatbot 200 may include a processor unit 210, a memory 220, and a communication module 230.

The processor unit 210 may include one or more application processors (APs), one or more communication processors (CPs), or at least one artificial intelligence (AI) processor. The application processor, communication processor, and AI processor may each be included in separate integrated circuit (IC) packages or together in a single IC package.

The application processor may run an operating system or application programs to control multiple hardware or software components connected to it, and may perform various kinds of data processing and computation, including on multimedia data. For example, the application processor may be implemented as a system on chip (SoC). The processor unit 210 may further include a graphics processing unit (GPU).

The communication processor may manage the data link and convert communication protocols when communicating with the cloud response server 300 over the network. For example, the communication processor may be implemented as an SoC. The communication processor may perform at least part of the multimedia control functions.

The communication processor may also control data transmission and reception by the communication module 230. The communication processor may be implemented as at least part of the application processor.

The application processor or communication processor may load commands or data received from at least one of the non-volatile memory or other components connected to it into volatile memory and process them. The application processor or communication processor may also store, in non-volatile memory, data received from or generated by at least one of the other components.

More specifically, the processor unit 210 may include a wake-up unit 211, a local response provider 212, and a response standby manager 213.

Upon detecting a query word included in the speaker's voice information, the wake-up detection unit 211 may create a voice conversation session for the query word and request a response.
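A minimal sketch of the wake-up step, assuming the query is recognized by a hypothetical wake phrase at the start of an already-transcribed utterance. The wake phrase, the `Session` states (modeled loosely on the creation, utterance, and termination stages of FIG. 4A), and the ID scheme are illustrative, not taken from the specification.

```python
import itertools
from dataclasses import dataclass
from typing import Optional

WAKE_PHRASES = ("hey robot",)   # hypothetical wake phrase(s)
_session_ids = itertools.count(1)


@dataclass
class Session:
    session_id: int
    query: str
    state: str = "created"   # assumed lifecycle: created -> uttered -> closed


def detect_query(transcript: str) -> Optional[Session]:
    """Open a voice conversation session if the transcript carries a query."""
    text = transcript.strip().lower()
    for phrase in WAKE_PHRASES:
        if text.startswith(phrase):
            query = text[len(phrase):].strip(" ,")
            if query:  # a wake phrase alone is not a query
                return Session(next(_session_ids), query)
    return None
```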

The local response provider 212 may extract, from the training data set built into the artificial intelligence robot, response information matching the response to the query word, and provide it.

Here, the local response provider 212 may train a neural network using a program stored in the memory 220. In particular, it may train a neural network for recognizing data related to queries and responses. The neural network may be designed to simulate the structure of the human brain (for example, the neuron structure of a human neural network) on a computer. A neural network may include an input layer, an output layer, and at least one hidden layer. Each layer may include at least one neuron having weights, and the neural network may include synapses connecting neurons. In the neural network, each neuron may output the input signals received through its synapses as the value of an activation function applied using the weights and/or bias.
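The neuron behavior described above, an activation function applied to the weighted sum of the inputs plus a bias, can be sketched in plain Python. The sigmoid default and the `(weight_rows, biases)` layer representation are illustrative choices, not the network actually used by the local response provider.

```python
import math
from typing import Callable, List, Sequence, Tuple

Activation = Callable[[float], float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs: Sequence[float], weights: Sequence[float], bias: float,
           activation: Activation = sigmoid) -> float:
    """Weighted sum of the synaptic inputs plus bias, passed through the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)


def forward(x: Sequence[float],
            layers: List[Tuple[List[List[float]], List[float]]]) -> List[float]:
    """Propagate x through hidden and output layers given as (weight_rows, biases)."""
    out = list(x)
    for weight_rows, biases in layers:
        out = [neuron(out, w, b) for w, b in zip(weight_rows, biases)]
    return out
```

Each inner list of `weight_rows` holds one neuron's synaptic weights, so a layer with two neurons fed by two inputs has a 2x2 weight list and two biases.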

Multiple network nodes may exchange data according to their respective connection relationships so as to simulate the synaptic activity of neurons exchanging signals through synapses. Here, the neural network may include a deep learning model developed from a neural network model. In a deep learning model, multiple network nodes are located in different layers and may exchange data according to convolutional connection relationships. Examples of neural network models include various deep learning techniques such as deep neural networks (DNN), convolutional neural networks (CNN), recurrent neural networks, restricted Boltzmann machines, deep belief networks, and deep Q-networks, which can be applied in fields such as vision recognition, speech recognition, natural language processing, and speech/signal processing.

That is, the local response provider 212 may include a data learning unit that trains a neural network for data classification/recognition. The data learning unit may learn criteria for which training data to use in classifying/recognizing data and for how to classify and recognize data using that training data. The data learning unit may acquire the training data to be used and train a deep learning model by applying the acquired training data to it. The data learning unit may be fabricated and mounted in the form of at least one hardware chip. For example, it may be fabricated as a dedicated hardware chip for artificial intelligence, or fabricated and mounted as part of a general-purpose processor (CPU) or a dedicated graphics processor (GPU). The data learning unit may also be implemented as a software module. When implemented as a software module (or a program module including instructions), the software module may be stored on non-transitory computer-readable media. In this case, at least one software module may be provided by an operating system (OS) or by an application.

Using the acquired training data, the data learning unit may train the neural network model to have criteria for how to classify/recognize given data. The learning methods used by the data learning unit may be classified into supervised learning, unsupervised learning, and reinforcement learning. Supervised learning refers to training an artificial neural network with labels given for the training data, where a label is the correct answer (or result value) that the network should infer when the training data is input to it. Unsupervised learning refers to training an artificial neural network without labels for the training data. Reinforcement learning refers to training an agent defined in a particular environment to select, in each state, the action or sequence of actions that maximizes the cumulative reward.

The data learning unit may also train the neural network model using a learning algorithm that includes error backpropagation or gradient descent. Once trained, the neural network model may be referred to as a learning model. The learning model may be stored in the memory 220 and used to infer results for new input data other than the training data.
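The following is a minimal sketch, not the model used in the patent: a single-layer logistic classifier trained by batch gradient descent in plain Python. In a single-layer network, backpropagation reduces to the one gradient computation shown (the error term `p - y` for sigmoid with cross-entropy loss); deeper networks apply the same chain rule layer by layer. The function names, learning rate, epoch count, and the toy OR-gate data are all hypothetical choices for illustration.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(samples, labels, lr=1.0, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on binary cross-entropy."""
    n_features = len(samples[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            # Forward pass: predicted probability of class 1.
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Backward pass: dLoss/dz is (p - y) for sigmoid + cross-entropy.
            err = p - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        # Gradient descent step on the averaged gradient.
        w = [wi - lr * gw / n for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x) -> int:
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5)

# Hypothetical toy data: a logical OR, which is linearly separable.
X_TOY = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y_TOY = [0, 1, 1, 1]

if __name__ == "__main__":
    w, b = train_logistic(X_TOY, Y_TOY)
    print([predict(w, b, x) for x in X_TOY])
```

Once trained, `w` and `b` play the role of the stored "learning model": they are reused by `predict` on inputs that were not in the training set.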

The response waiting manager unit 213 may be configured to compare the response information transmitted from the local response providing unit 212 and from the cloud response server 300, select the response information whose delay time and response accuracy meet preset conditions, and utter it.
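The patent does not specify the "preset conditions", so the following is only a sketch of one plausible rule. It assumes each candidate carries a measured delay and an accuracy score in [0, 1]; a candidate qualifies if it arrives within `max_delay_s` and scores at least `min_accuracy`, and among qualifying candidates the most accurate (then fastest) wins. The fallback to the fastest candidate when nothing qualifies is an added assumption, as are the names `Candidate` and `select_response` and the threshold values.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Candidate:
    source: str      # "local" (first response) or "cloud" (second response)
    text: str
    delay_s: float   # seconds from end of query until the response was available
    accuracy: float  # accuracy/confidence score in [0, 1] (assumed to be provided)

def select_response(candidates: List[Candidate], max_delay_s: float = 1.5,
                    min_accuracy: float = 0.6) -> Tuple[Candidate, List[Candidate]]:
    """Return (selected, unselected) under hypothetical delay/accuracy thresholds."""
    if not candidates:
        raise ValueError("no candidate responses")
    eligible = [c for c in candidates
                if c.delay_s <= max_delay_s and c.accuracy >= min_accuracy]
    if eligible:
        # Prefer higher accuracy; break ties with lower delay.
        chosen = max(eligible, key=lambda c: (c.accuracy, -c.delay_s))
    else:
        # Assumed fallback: answer quickly rather than stay silent.
        chosen = min(candidates, key=lambda c: c.delay_s)
    unselected = [c for c in candidates if c is not chosen]
    return chosen, unselected
```

The unselected candidates are returned rather than discarded because the following paragraphs keep them for a possible fallback utterance.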

The response waiting manager unit 213 may include a function of uttering the non-selected response information when the wake-up unit detects rejection or negative language in response to the uttered response information.

In addition, the response waiting manager unit 213 may include a function of deleting the non-selected response information when the wake-up detection unit 211 does not detect rejection or negative language within a preset time.
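The two functions above amount to a small state machine: after uttering the selected response, keep the unselected one until either the user rejects the answer (utter the alternative) or a preset window expires (delete it). The sketch below assumes a simple keyword test for "negative language" (the cue list is hypothetical; a real system would more likely use an intent classifier) and an injectable clock so the timeout can be tested. The class and method names are illustrative, not from the patent.

```python
import re
import time
from typing import Callable, List, Optional

# Hypothetical rejection cues, matched as whole words/phrases.
NEGATIVE_CUES = ("no", "wrong", "not that", "that's not it")

def is_negative(utterance: str) -> bool:
    words = re.findall(r"[a-z']+", utterance.lower())
    padded = " " + " ".join(words) + " "
    return any(f" {cue} " in padded for cue in NEGATIVE_CUES)

class ResponseWaitingManager:
    def __init__(self, speak: Callable[[str], None], window_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._speak = speak
        self._window_s = window_s  # the "preset time" (value assumed)
        self._clock = clock
        self._pending: List[str] = []
        self._deadline: Optional[float] = None

    @property
    def pending(self) -> List[str]:
        return list(self._pending)

    def utter(self, selected: str, unselected: List[str]) -> None:
        """Speak the selected response and keep the others for a possible fallback."""
        self._speak(selected)
        self._pending = list(unselected)
        self._deadline = self._clock() + self._window_s

    def on_user_utterance(self, text: str) -> None:
        """Handle the user's next utterance while a fallback may still be pending."""
        self.poll()  # drop stored alternatives first if the window has already closed
        if self._pending and is_negative(text):
            self._speak(self._pending[0])  # utter the non-selected response
            self._clear()

    def poll(self) -> None:
        """Delete the non-selected responses once the window has elapsed."""
        if self._deadline is not None and self._clock() >= self._deadline:
            self._clear()

    def _clear(self) -> None:
        self._pending = []
        self._deadline = None
```

Calling `poll()` periodically (or on each new utterance, as `on_user_utterance` does) is enough to enforce the deletion rule without a background timer.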

Meanwhile, the AI chatbot 200 referred to in the present invention falls within the scope of the chatbot of the present invention even when it is called by another term, such as intelligent virtual assistant, virtual personal assistant, intelligent personal assistant, conversational agent, virtual companion, or virtual assistant.

In addition, the AI chatbot 200 may perform natural language processing (NLP) and natural language generation (NLG) through reinforcement learning algorithms from the field of artificial intelligence. A chatbot can understand human speech and text, analyze the data it holds or has learned, and respond in speech and text that humans can understand. To do so, it may recognize patterns in figures, characters, speech, and the like, process human language, and even perform logical inference. A chatbot may also find the information a user requests in unstructured data and provide intelligent services that capture and use real-world situations as information.

To this end, the chatbot according to an embodiment of the present invention may perform the functions of: pattern recognition, in which a machine identifies figures, characters, speech, and the like; natural language processing, which lets a machine recognize the language people commonly use and includes information retrieval, question answering, automatic translation, and interpretation; the semantic web, which can understand the meaning of information data and even perform logical inference; text mining, which finds new and useful information in unstructured text data; and context-aware computing, which captures real-world situations as information in a virtual space and uses it to provide user-centered intelligent services.

In an embodiment of the present invention, the chatbot may also be given a character, that is, an identity, and trained to have various personalities. Various studies have shown that giving a chatbot an identity increases the likelihood that users perceive it as a person. Giving a robot a personality helps users understand its behavior more easily and enables friendlier interaction, which can increase user preference.

Various tools exist for analyzing personality types, but an embodiment of the present invention may use the DISC model, which classifies types by readily observable behavior rather than by inner factors, such as individual temperament, that are difficult to observe. The DISC model suits the task of defining a personality for an artificial intelligence because its simple classification makes users' overall preferences easy to grasp. The two dimensions on which DISC analysis is based are focus (task/people) and pace (slow/fast).

Personality types may be divided into four: Dominance (D), Influence (I), Conscientiousness (C), and Steadiness (S). For example, the dominant D type is task-oriented and values quick decisions and drive.

The sociable I type is good at influencing others and has an active, fun personality.

The C type is the cautious type: reserved, analytical, and precise.

The S type is the steady type: relaxed and reasonable. These personality types can be matched to the chatbot's role.

For example, it can be assumed that a D-type chatbot suits task-oriented use where fast, accurate work is important, a sociable I-type chatbot suits casual conversation when the user is bored, and a cautious C-type or steady S-type chatbot suits counseling. Even after a personality has been defined as described above, it may be modified to reflect the user's tastes and type-based preferences.
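The DISC scheme above is essentially a lookup over two binary dimensions plus a role-to-persona default that a user preference can override. The sketch below encodes that; the text only places D explicitly (task-oriented, fast), so the quadrant placement of I, S, and C follows the common DISC convention and is an assumption here, as are the role keys and function names.

```python
from enum import Enum
from typing import Optional

class Focus(Enum):
    TASK = "task"
    PEOPLE = "people"

class Pace(Enum):
    FAST = "fast"
    SLOW = "slow"

# Quadrants of the two DISC dimensions (conventional placement, assumed).
DISC_QUADRANTS = {
    (Focus.TASK, Pace.FAST): "D",    # Dominance: quick decisions, drive
    (Focus.PEOPLE, Pace.FAST): "I",  # Influence: active, fun, persuasive
    (Focus.PEOPLE, Pace.SLOW): "S",  # Steadiness: relaxed, reasonable
    (Focus.TASK, Pace.SLOW): "C",    # Conscientiousness: reserved, analytical, precise
}

# Hypothetical default persona per chatbot role, following the text's assumptions.
DEFAULT_PERSONA = {
    "task": "D",
    "companion": "I",
    "counseling": "C",  # "S" is equally plausible per the text
}

def disc_type(focus: Focus, pace: Pace) -> str:
    return DISC_QUADRANTS[(focus, pace)]

def choose_persona(role: str, user_preference: Optional[str] = None) -> str:
    """Return a DISC type for a role; a valid user preference overrides the default."""
    if user_preference is not None:
        if user_preference not in DISC_QUADRANTS.values():
            raise ValueError(f"unknown DISC type: {user_preference}")
        return user_preference
    return DEFAULT_PERSONA[role]
```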

An embodiment of the present invention may further include anthropomorphism, an interaction design that makes the conversational agent feel human. Findings from Computers Are Social Actors (CASA) research may be used to elicit positive social responses and improve the quality of interaction between people and the agent.

The memory 220 may include internal memory or external memory. The internal memory may include at least one of volatile memory (e.g., dynamic RAM (DRAM), static RAM (SRAM), or synchronous dynamic RAM (SDRAM)) and non-volatile memory (e.g., one-time programmable ROM (OTPROM), programmable ROM (PROM), erasable and programmable ROM (EPROM), electrically erasable and programmable ROM (EEPROM), mask ROM, flash ROM, NAND flash memory, or NOR flash memory). In one example, the internal memory may take the form of a solid state drive (SSD). The external memory may include a flash drive, such as CompactFlash (CF), Secure Digital (SD), Micro-SD, Mini-SD, extreme Digital (xD), or a memory stick.

Next, the communication module 230 may be configured to request a response to the query from the external cloud response server 300 and to receive the response information returned by the cloud response server.

The communication module 230 may include a wireless communication module or an RF module. The wireless communication module may include, for example, Wi-Fi, Bluetooth (BT), GPS, or NFC, and may provide wireless communication using radio frequencies. Additionally or alternatively, the wireless communication module may include a network interface or modem for connecting the user device 100 to a network (e.g., the Internet, a LAN, a WAN, a telecommunication network, a cellular network, a satellite network, POTS, or a 5G network).

The RF module may be responsible for transmitting and receiving data, for example RF signals (so-called electronic signals). For example, the RF module may include a transceiver, a power amp module (PAM), a frequency filter, or a low noise amplifier (LNA). The RF module may also include components, such as conductors or wires, for transmitting and receiving electromagnetic waves in free space during wireless communication.

Next, the cloud response server 300 may be configured to select one of at least one NLP engine to provide response information suited to the query contained in the speaker's voice information received from the AI chatbot 200, and then provide the response information derived by the selected NLP engine to the AI chatbot 200.

In addition, the cloud response server 300 may be configured to derive response information from each of the at least one NLP engine in a round-robin fashion, select the response information with the highest accuracy among the derived results, and provide it to the AI chatbot 200.
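One reading of the round-robin mode is sketched below: every engine is queried, the starting engine rotates from one request to the next, and the highest-scoring answer is returned (ties go to the engine queried first). The engine interface, a callable returning `(answer, accuracy)`, and the class name are assumptions; a production server would likely issue the calls concurrently, and they are sequential here only for clarity.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical engine interface: query text -> (answer, accuracy score).
Engine = Callable[[str], Tuple[str, float]]

class RoundRobinResponder:
    def __init__(self, engines: List[Engine]) -> None:
        if not engines:
            raise ValueError("at least one NLP engine is required")
        self._engines = engines
        self._start = 0  # index of the engine queried first on the next request

    def respond(self, query: str) -> Tuple[str, float, int]:
        """Query all engines in rotating order; return (answer, accuracy, engine index)."""
        n = len(self._engines)
        order = [(self._start + k) % n for k in range(n)]
        self._start = (self._start + 1) % n  # rotate for the next request
        best: Optional[Tuple[str, float, int]] = None
        for idx in order:
            answer, score = self._engines[idx](query)
            if best is None or score > best[1]:
                best = (answer, score, idx)
        return best
```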

The cloud response server 300 may provide response information in at least one of the following formats: an automatic speech recognition (ASR) response, a natural language understanding (NLU) response, and a text-to-speech (TTS) response.

An ASR response provides an answer to the query only in a fixed form; an NLU response carries information about the result of natural language understanding; and a TTS response carries information converted into a speech signal using TTS technology.
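On the chatbot side, the three formats can be modeled as a tagged union and dispatched on type. The sketch below is illustrative only: the class and field names are hypothetical, and the TTS duration calculation assumes 16-bit mono PCM audio, which the patent does not specify.

```python
from dataclasses import dataclass
from typing import Dict, Union

@dataclass
class AsrResponse:
    text: str  # fixed-form answer

@dataclass
class NluResponse:
    intent: str
    slots: Dict[str, str]  # e.g. {"city": "Seoul"}

@dataclass
class TtsResponse:
    audio: bytes        # synthesized speech (16-bit mono PCM assumed)
    sample_rate_hz: int

CloudResponse = Union[AsrResponse, NluResponse, TtsResponse]

def describe(resp: CloudResponse) -> str:
    """Summarize a cloud response according to its format."""
    if isinstance(resp, AsrResponse):
        return f"fixed reply: {resp.text}"
    if isinstance(resp, NluResponse):
        return f"intent={resp.intent} slots={sorted(resp.slots.items())}"
    if isinstance(resp, TtsResponse):
        seconds = len(resp.audio) / 2 / resp.sample_rate_hz  # 2 bytes per sample
        return f"{seconds:.2f}s of speech audio"
    raise TypeError(f"unsupported response type: {type(resp).__name__}")
```

A TTS response can be played directly, whereas ASR and NLU responses still need the chatbot's own speech synthesis before the utterance step.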

In addition, the cloud response server 300 may include a function of deriving response information from each of the at least one NLP engine, analyzing the accuracy rate of the derived response information, and, if the accuracy rate does not meet a preset criterion, having the corresponding NLP model undergo self-learning.
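A minimal sketch of the accuracy check that triggers self-learning is shown below. It assumes each answer can later be labeled correct or incorrect (for instance from user rejections, which the patent does not state), keeps a sliding window per engine, and flags engines whose windowed accuracy falls below a threshold. The threshold, window size, minimum sample count, and all names are hypothetical; the retraining itself is left to the caller.

```python
from collections import deque
from typing import Deque, Dict, List

class AccuracyMonitor:
    """Track per-engine accuracy over a sliding window and flag engines for retraining."""

    def __init__(self, threshold: float = 0.8, window: int = 100,
                 min_samples: int = 20) -> None:
        self.threshold = threshold      # the "preset criterion" (value assumed)
        self.window = window
        self.min_samples = min_samples  # avoid judging an engine on too few answers
        self._history: Dict[str, Deque[int]] = {}

    def record(self, engine: str, correct: bool) -> None:
        hist = self._history.setdefault(engine, deque(maxlen=self.window))
        hist.append(1 if correct else 0)

    def accuracy(self, engine: str) -> float:
        hist = self._history.get(engine)
        return sum(hist) / len(hist) if hist else 0.0

    def engines_needing_retraining(self) -> List[str]:
        return sorted(
            name for name, hist in self._history.items()
            if len(hist) >= self.min_samples and sum(hist) / len(hist) < self.threshold
        )
```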

FIG. 6 is a flowchart illustrating a method for supporting an artificial intelligence voice recognition and conversation service according to an embodiment of the present invention.

As shown in FIG. 6, in the method S700 for supporting an artificial intelligence voice recognition and conversation service according to an embodiment of the present invention, when the AI chatbot 200 detects the user's voice information (speech information) (S710), it converts the voice information into text, analyzes the query in the text, and extracts response information matching the query from the learned data set, while simultaneously providing the text of the voice information to the external cloud response server 300 (S720).

The AI chatbot 200 then receives, from the cloud response server 300, response information matching the query in the voice information; compares the first response information extracted from the learning data set built into the AI chatbot with the second response information from the external cloud server; selects the response information whose delay time and response accuracy meet preset conditions; converts it into speech; and utters it (S730).
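Steps S720 and S730 can be sketched with `asyncio`: start the local lookup and the cloud request at the same moment, wait up to a deadline, abandon anything still pending, and pick from what arrived. This is a simplified illustration, not the patent's implementation: it assumes speech has already been transcribed to text, models each responder as a hypothetical coroutine returning `(answer, accuracy)`, and reduces the selection rule to "most accurate, then fastest" among responses that met the deadline.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Tuple

# Hypothetical responder interface: query text -> (answer, accuracy score).
Responder = Callable[[str], Awaitable[Tuple[str, float]]]

async def respond(query: str, local: Responder, cloud: Responder,
                  deadline_s: float = 1.5) -> Optional[Tuple[str, str, float, float]]:
    """Return (source, answer, delay_s, accuracy) of the chosen response, or None."""
    loop = asyncio.get_running_loop()
    start = loop.time()

    async def timed(source: str, responder: Responder):
        text, accuracy = await responder(query)
        return source, text, loop.time() - start, accuracy

    # S720: the local extraction and the cloud request start at the same time.
    tasks = [asyncio.create_task(timed("local", local)),
             asyncio.create_task(timed("cloud", cloud))]
    done, pending = await asyncio.wait(tasks, timeout=deadline_s)
    for task in pending:  # responses that miss the deadline are abandoned
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)

    arrived = [t.result() for t in done if t.exception() is None]
    if not arrived:
        return None
    # S730 (simplified): most accurate response that met the deadline, then fastest.
    return max(arrived, key=lambda c: (c[3], -c[2]))
```

Measuring the delay inside `timed` keeps the latency figure tied to each source, so a stricter delay/accuracy rule could replace the final `max` without changing the concurrency code.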

After the uttering step, the method may further include uttering the non-selected response information when the user's rejection or negative language in response to the uttered response information is detected.

The method may further include deleting the non-selected response information if no rejection or negative language from the user is detected within a preset time after the uttering step.

The term "unit" used in an embodiment of the present invention may refer to a hardware component, a software component, and/or a combination of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications running on the operating system. A processing device may also access, store, manipulate, process, and generate data in response to the execution of software. For ease of understanding, a single processing device is sometimes described, but those skilled in the art will appreciate that a processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, a processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or instruct a processing device independently or collectively. Software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by a processing device or to provide instructions or data to a processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment of the present invention may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Those of ordinary skill in the art to which the present invention pertains may modify and vary the foregoing without departing from the essential characteristics of the present invention. Therefore, the embodiments disclosed herein are intended to explain, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present invention should be construed according to the claims below, and all technical ideas within their equivalent scope should be construed as falling within the scope of the present invention.

100: system
200: AI chatbot
210: processor unit
211: wake-up detection unit
212: local response providing unit
213: response waiting manager unit
220: memory
230: communication unit
300: cloud response server

Claims

A system for supporting an artificial intelligence voice recognition conversation service, comprising:
an AI chatbot that extracts, from a pre-built learning data set, first response information corresponding to the query content included in a speaker's query speech, and provides it; and
a cloud response server that derives, using at least one NLP engine, second response information suited to the query content included in the speaker's query speech forwarded by the AI chatbot, and provides the derived second response information to the AI chatbot,
wherein the AI chatbot compares the first response information and the second response information to select and utter response information whose delay time and response accuracy meet preset conditions, and includes a processor unit composed of:
a wake-up unit that, upon detecting the query content (query word) included in the speaker's voice information, requests a response to the query content (query word);
a local response provider that extracts, from a learned learning data set, response information corresponding to a response to the query content (query word), and provides it; and
a response waiting manager unit that compares the response information derived from the local response provider and from the cloud response server, and selects and utters response information whose delay time and response accuracy meet preset conditions,
wherein the response waiting manager unit
utters the unselected response information when the wake-up unit detects rejection or negative language in response to the uttered response information, and
deletes the unselected response information if the wake-up unit does not detect rejection or negative language within a preset time.

delete

The system for supporting an artificial intelligence voice recognition conversation service according to claim 1, wherein the cloud response server
selects one of the at least one NLP engine to provide response information suited to the speaker's query content (query word) received from the AI chatbot, and then provides the response information derived from the selected NLP engine to the AI chatbot.

The system for supporting an artificial intelligence voice recognition conversation service according to claim 1, wherein the cloud response server
derives response information from each of the at least one NLP engine in a round-robin fashion, selects the response information with the highest accuracy among the derived response information, and provides it to the AI chatbot.

The system for supporting an artificial intelligence voice recognition conversation service according to claim 1, wherein the cloud response server
provides response information in the form of at least one of an automatic speech recognition (ASR) response, a natural language understanding (NLU) response, and a text-to-speech (TTS) response.

The system for supporting an artificial intelligence voice recognition conversation service according to claim 7, wherein the cloud response server
derives response information from each of the at least one NLP engine, analyzes the accuracy rate of the derived response information, and, when the analyzed accuracy rate does not meet a preset criterion, has the corresponding NLP engine perform self-learning.

A method for supporting an artificial intelligence voice recognition conversation service, comprising:
detecting, by an AI chatbot, a user's voice information (speech information);
converting, by the AI chatbot, the voice information (speech information) into text, analyzing the query content in the text to extract response information matching the query content from the learned data set, and simultaneously providing the text information of the voice information (speech information) to an external cloud response server; and
comparing, by the AI chatbot, the response information provided from the cloud response server with the response information extracted from the learning data set built into the AI chatbot, selecting response information whose delay time and response accuracy meet preset conditions, and uttering it,
the method further comprising, after the uttering step:
uttering the non-selected response information when the user's rejection or negative language in response to the uttered response information is detected; and
deleting the non-selected response information if the user's rejection or negative language is not detected within a preset time,
wherein the AI chatbot compares the first response information and the second response information to select and utter response information whose delay time and response accuracy meet preset conditions, and includes a processor unit including:
a wake-up unit that, upon detecting the query content (query word) included in the speaker's voice information, requests a response to the query content (query word);
a local response provider that extracts, from a learned learning data set, response information corresponding to a response to the query content (query word), and provides it; and
a response waiting manager unit that compares the response information derived from the local response provider and from the cloud response server, and selects and utters response information whose delay time and response accuracy meet preset conditions.

delete