KR20220051690A

KR20220051690A - Terminal apparatus, method using the same and interfacing apparatus

Info

Publication number: KR20220051690A
Application number: KR1020200135407A
Authority: KR
Inventors: 장병순; 김상진; 김태성; 안동훈; 장정호
Original assignee: 에스케이텔레콤 주식회사
Priority date: 2020-10-19
Filing date: 2020-10-19
Publication date: 2022-04-26

Abstract

According to one embodiment of the present invention, a terminal apparatus comprises: a communication unit connected to a data network; a user interface unit; and a processor that, when text converted from the voice of a counterpart terminal device and additional information including the counterpart's emotion are received by the communication unit through the data network after a request signal for a text-type call service has been transmitted through the data network, controls the user interface unit to display the received text and additional information.

Description

TERMINAL APPARATUS, METHOD USING THE SAME AND INTERFACING APPARATUS

The present invention relates to a terminal device, a method performed by the terminal device, and an interfacing device.

A smart device such as a smartphone or a smart pad provides voice/video call functions and a data communication function. Through the data communication function, a user can chat with a counterpart by message and can also share content such as maps, music, news, or photos in real time.

Meanwhile, servers that provide artificial intelligence services have recently emerged. Because such an artificial intelligence service providing server has a learning capability, it can provide a higher level of service the more it is used.

A smart device can access the artificial intelligence service providing server through its data communication function and receive various services. For example, through the smart device, a user can ask the server not only simple questions about the weather or a personal schedule but also queries about more complex and advanced information, and can receive responses to them.

Korean Patent Laid-Open Publication No. 2011-0041322 (published on April 21, 2011)

When a user attempts or conducts a voice or video call with a counterpart through a smart device, circumstances may make the call difficult. For example, in a theater where a movie is playing or in a room where a baby is sleeping, it may be difficult for the user to hold a voice or video call with the counterpart.

In this case, the user and the counterpart can end the call and chat by message instead. However, it is the user, not the counterpart, for whom the call is difficult. The counterpart may find a call more convenient than chatting by message, yet may be forced to chat by message because of the difficulty the user is experiencing.

Accordingly, the problem to be solved by the present invention is to provide a technology with which, when a situation arises that makes a call difficult while a user is attempting or conducting a call with a counterpart through a smart device, the user can converse with the counterpart by message-based chat without ending the call, while the counterpart likewise does not end the call and can continue with the call itself rather than chatting by message.

In addition, the problem to be solved by the present invention may include providing this technology through a system implemented in a public network rather than a system implemented only in the smart device.

However, the problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood from the following description by those of ordinary skill in the art to which the present invention pertains.

A terminal device according to an example includes: a communication unit connected to a data network; a user interface unit; and a processor that, when text converted from voice from a counterpart terminal device and additional information including the counterpart's emotion are received by the communication unit through the data network after a request signal for a text-type call service has been transmitted through the data network, controls the user interface unit to display the received text and additional information.

A method according to an example is performed by a terminal device and includes: transmitting a request signal for a text-type call service through a data network; receiving, through the data network, text converted from voice from a counterpart terminal device and additional information including the counterpart's emotion; and displaying the received text and additional information.

An interfacing method according to an example is performed by an interfacing device of an artificial intelligence service and includes: when voice is received from a counterpart terminal device connected to a call network, forwarding the voice to an artificial intelligence service providing server; and, when text converted from the voice and additional information indicating the emotion of the user of the counterpart terminal device, derived on the basis of the voice, are received from the artificial intelligence service providing server, delivering the text and the additional information to the counterpart terminal device.

According to an embodiment, while two terminal devices are attempting or conducting a voice or video call with each other, one terminal device can maintain the voice or video call while the other terminal device communicates with it by message-based chat.

FIG. 1 is a diagram conceptually illustrating a mobile communication network to which an artificial intelligence interfacing device according to an embodiment is applied.
FIG. 2 is a diagram illustrating the configuration of the artificial intelligence service providing server shown in FIG. 1.
FIG. 3 shows an example of a result converted by the TTS module.
FIG. 4 shows an example of a result converted by the STT module.
FIG. 5 is a diagram illustrating the configuration of the call processing network shown in FIG. 1.
FIG. 6 is a diagram illustrating the configuration of a terminal device according to an embodiment.
FIG. 7 is a diagram illustrating the sequence of a method performed by a terminal device according to an embodiment.
FIG. 8 is a diagram illustrating the configuration of the artificial intelligence interfacing device shown in FIG. 1.
FIG. 9 is a diagram illustrating the flow of an artificial intelligence interfacing method according to an embodiment.
FIGS. 10 and 11 are each an example of a screen of a terminal device according to an embodiment.
FIG. 12 is a diagram for explaining an algorithm employed to compensate for delayed text input in a text-type call service.
FIG. 13 is a diagram illustrating the flow of an artificial intelligence interfacing method according to an embodiment.

Advantages and features of the present invention, and methods of achieving them, will become apparent from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that the disclosure of the present invention is complete and so that those of ordinary skill in the art to which the present invention pertains are fully informed of the scope of the invention; the present invention is defined only by the scope of the claims.

In describing the embodiments of the present invention, a detailed description of a well-known function or configuration will be omitted if it is determined that it may unnecessarily obscure the gist of the present invention. The terms used below are defined in consideration of their functions in the embodiments of the present invention and may vary according to the intention or custom of a user or operator. Their definitions should therefore be made on the basis of the content of this specification as a whole.

FIG. 1 is a diagram conceptually illustrating a mobile communication network 10 to which an artificial intelligence interfacing device 100 according to an embodiment is applied. However, FIG. 1 is merely exemplary, and the artificial intelligence interfacing device 100 (hereinafter referred to as the interfacing device) is not to be construed as applicable only to the mobile communication network 10 shown in FIG. 1.

Referring to FIG. 1, the mobile communication network 10 may include the interfacing device 100, an artificial intelligence service providing server 200, a call network 410, and a data network 420. The first terminal device 300/1 and the second terminal device 300/2 are shown as examples of terminals that access the mobile communication network 10 (its call network 410 or data network 420). By accessing the mobile communication network 10, each terminal device 300/1, 300/2 can receive a voice/video call or a data service such as an artificial intelligence service.

First, the artificial intelligence service providing server 200 refers to a server that provides an artificial intelligence service. In addition to the terminal devices 300/1 and 300/2, the artificial intelligence service providing server 200 may be connected to various other terminals or servers and may provide various artificial intelligence services to each of them.

The configuration of the artificial intelligence service providing server 200 is shown in FIG. 2. Referring to FIG. 2, the artificial intelligence service providing server 200 may include an artificial intelligence processing unit (intelligence workflow, IWF) 210 that performs artificial intelligence processing, a natural language processing unit (natural language understanding) 220 that processes natural language, a speech synthesis (text to speech, TTS) module 230 that performs speech synthesis, and a text synthesis (speech to text, STT) module 240 that performs text synthesis. The artificial intelligence service providing server 200 may also be formed as a server group consisting of a plurality of servers.

Of these, the artificial intelligence processing unit 210 is configured to provide artificial intelligence services by imitating the way humans think. For example, the artificial intelligence processing unit 210 may perform functions such as image recognition, image captioning, language recognition, or conversation, and may have been trained in advance for this purpose by a method such as machine learning or deep learning.

Next, the natural language processing unit 220 may employ an algorithm that, on receiving a question, recognizes and analyzes it and then provides a recommended response. Here, the recommended response may be, for example, a response based on how frequently the same user gave a particular answer to the same or a similar question in the past, or a response recommended on the basis of various cases in which other users answered the same or a similar question.
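The frequency-based recommendation described above can be illustrated with a minimal sketch. It is not the algorithm of the natural language processing unit 220, which the specification does not detail; the function name `recommend_response`, the `(question, answer)` history format, and the exact-match comparison on normalized text (instead of a similarity measure for "similar" questions) are all assumptions made for illustration.

```python
from collections import Counter


def recommend_response(question, history):
    """Return the answer most often given to `question` in `history`.

    `history` is a hypothetical list of (past_question, past_answer) pairs.
    Questions are compared after trimming and lower-casing; a real system
    would need a similarity measure to cover "similar" questions as well.
    """
    normalized = question.strip().lower()
    answers = Counter(
        past_answer
        for past_question, past_answer in history
        if past_question.strip().lower() == normalized
    )
    if not answers:
        return None  # no earlier case to base a recommendation on
    return answers.most_common(1)[0][0]


history = [
    ("Where are you?", "On my way."),
    ("where are you?", "On my way."),
    ("Where are you?", "At home."),
]
print(recommend_response("Where are you?", history))
```

The same function could serve the "other users" case by passing in a history pooled across users.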

Next, the TTS module 230 is a module that synthesizes, or converts, text entered by the user into speech and provides it. Any of various known algorithms for converting text into speech may be employed.

Meanwhile, when synthesizing speech, the TTS module 230 may reflect an emotion or mood obtained for the user. Hereinafter, information representing such an emotion or mood is referred to as 'additional information'. For example, for the same utterance 'hello', the TTS module 230 may synthesize the speech with a relatively higher volume or tone when the emotion is joy or gladness, and with a relatively lower volume or tone when the emotion is sadness or depression. An example is shown in FIG. 3. Referring to FIG. 3, when text and an emoticon are input to the TTS module 230, the mood or emotion corresponding to the emoticon may be reflected in the speech output from the TTS module 230.
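As a rough illustration of how additional information could steer synthesis, the sketch below maps an emotion label to prosody settings. The `Prosody` type, the emotion labels, and the numeric values are hypothetical; the specification only states that volume or tone is raised for joy and lowered for sadness, and names no concrete parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prosody:
    volume_gain: float  # multiplier applied to the baseline loudness
    pitch_shift: float  # semitones relative to the baseline pitch


# Hypothetical values: louder and higher for joy, quieter and lower for sadness.
PROSODY_BY_EMOTION = {
    "joy": Prosody(volume_gain=1.2, pitch_shift=2.0),
    "sadness": Prosody(volume_gain=0.8, pitch_shift=-2.0),
}
NEUTRAL = Prosody(volume_gain=1.0, pitch_shift=0.0)


def prosody_for(emotion):
    """Return the prosody settings for an emotion, or neutral if unknown."""
    return PROSODY_BY_EMOTION.get(emotion, NEUTRAL)
```

A synthesizer front end could then pass `prosody_for(emotion)` alongside the text, so that the same 'hello' is rendered differently depending on the additional information.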

To this end, the TTS module 230 must be able to identify the user's emotion or mood, which can be implemented by adopting any one of the following embodiments.

First, the TTS module 230 may further include a module that recognizes or interprets emoticons and the like entered by the user together with the text. For example, the TTS module 230 may be provided with a table in which an emotion is mapped to each emoticon. When a smiling, crying, angry, or embarrassed emoticon is received together with the text, the TTS module 230 may obtain the emotion corresponding to that emoticon from the table.
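The emoticon table of this first embodiment can be sketched as a simple lookup. The table contents, the ASCII emoticons, and the function name `split_text_and_emotion` are assumptions for illustration; the specification does not list the actual emoticons or emotion labels.

```python
# Hypothetical table mapping each emoticon to an emotion label.
EMOTICON_TO_EMOTION = {
    ":)": "joy",
    ":'(": "sadness",
    ">:(": "anger",
    ":S": "embarrassment",
}


def split_text_and_emotion(message):
    """Separate the plain text of `message` from the emotion it carries.

    Returns (text, emotion). `emotion` is None when no known emoticon is
    present. Emoticons are checked longest first, and the first one found
    in that order decides the emotion.
    """
    emotion = None
    text = message
    for emoticon in sorted(EMOTICON_TO_EMOTION, key=len, reverse=True):
        if emoticon in text:
            if emotion is None:
                emotion = EMOTICON_TO_EMOTION[emoticon]
            text = text.replace(emoticon, "")
    # Collapse the whitespace left behind by removed emoticons.
    return " ".join(text.split()), emotion
```

The text part would go to speech synthesis, and the emotion part would be used as the additional information.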

Second, the TTS module 230 may obtain, from the terminal device, separate information indicating a person's emotion or mood (or nuance). To this end, for example, the terminal device 300/2 may be provided with separate buttons for entering the user's own emotion or mood. If the user presses such a button while entering text, information on the user's emotion or mood corresponding to the pressed button may be delivered to the TTS module 230.

Third, the TTS module 230 may include a separate module that recognizes the context or atmosphere expressed in the text. To this end, the TTS module 230 may employ an algorithm that understands and recognizes natural language.

Meanwhile, if a party to the chat has started entering text but the entry is not finished, that is, completed, within a predetermined time, the TTS module 230 may insert, between utterances or in the middle of an utterance, a sound indicating that completion of the text entry is delayed. Such sounds may include words that commonly signal a delay in conversation, such as 'um...', 'so...', or 'wait a moment...'.
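The delay handling described above amounts to a timeout check on an unfinished text entry. The following sketch assumes timestamps in seconds, a 3-second threshold, and a rotating filler list; none of these values, nor the name `filler_if_delayed`, come from the specification.

```python
# Filler words taken from the examples in the description.
FILLERS = ["um...", "so...", "wait a moment..."]


def filler_if_delayed(typing_started_at, now, input_completed,
                      threshold_s=3.0, filler_index=0):
    """Return a filler to synthesize, or None if no filler is needed.

    A filler is produced only when text entry has started, has not been
    completed, and at least `threshold_s` seconds (an assumed value) have
    passed since it started. `filler_index` lets the caller rotate through
    the fillers on repeated delays.
    """
    if input_completed:
        return None
    if now - typing_started_at < threshold_s:
        return None
    return FILLERS[filler_index % len(FILLERS)]
```

For instance, `filler_if_delayed(0.0, 5.0, False)` yields `'um...'`, which the TTS module could synthesize and insert while the counterpart waits.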

Next, the STT module 240 is a module that synthesizes, or converts, speech uttered by the user into text and provides it. Any of various known algorithms for converting speech into text may be employed.

Meanwhile, the STT module 240 according to an embodiment may include a module trained to determine the user's emotion or mood from the speech input by the user. For example, on the basis of the frequency or pitch of the user's voice, the intervals between words, or the context identified from the speech, the STT module 240 may classify the state of the speaking user as joy, sadness, depression, anger, pleasure, embarrassment, excitement, or the like.

In addition, when synthesizing text, the STT module 240 may reflect the emotion or mood determined for the user as described above. For example, for the synthesized text 'hello', an emoticon corresponding to joy or gladness may be added to the beginning, middle, or end of the text when the user is in that state, and an emoticon corresponding to sadness or depression may be added in the same or a similar manner when the user is in that state. An example is shown in FIG. 4. Referring to FIG. 4, when speech is input to the STT module 240, the mood or emotion identified from the speech may be converted into an emoticon and output from the STT module 240 together with the text.
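The step of attaching an emoticon to recognized text can be sketched as follows. The emotion labels, the emoticons, and the function name `decorate` are hypothetical, and the emotion is assumed to have been classified already. Only placement at the start or end is shown; placement in the middle is left out for brevity.

```python
# Hypothetical mapping from a classified emotion to an emoticon.
EMOTION_TO_EMOTICON = {
    "joy": ":)",
    "sadness": ":'(",
    "anger": ">:(",
}


def decorate(text, emotion, position="end"):
    """Attach the emoticon for `emotion` to the recognized `text`.

    Unknown emotions leave the text unchanged. `position` is either
    "start" or "end"; any other value is treated as "end".
    """
    emoticon = EMOTION_TO_EMOTICON.get(emotion)
    if emoticon is None:
        return text
    if position == "start":
        return f"{emoticon} {text}"
    return f"{text} {emoticon}"
```

With these assumptions, `decorate("hello", "joy")` gives `'hello :)'`, which corresponds to the text-plus-emoticon output illustrated in FIG. 4.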

Referring back to FIG. 1, the call network 410 and the data network 420 are communication networks to which the terminal devices 300/1 and 300/2 connect. FIG. 5 is a diagram conceptually illustrating the configuration of the call network 410 and the data network 420.

Referring to FIG. 5, the call network 410 may be referred to as an intelligent network and may be, for example, an IMS (IP multimedia subsystem).

The call network 410 includes a switching network (Call Session Control Function, CSCF) 411, an application service node (telephony application server, TAS) 412, or a media resource server (media resource function, MRF) 413, and may further include an HLR (Home Location Register) 414, an MGCF (Media Gateway Control Function) 415, an MGW (Media Gateway) 416, an SCC AS (Service Centralization and Continuity Application Server) 417, and the like. The functions described below for each component are merely exemplary, so each component may additionally perform functions other than those described below.

Of these, the switching network 411 obtains information on the location of each terminal device 300/1, 300/2, that is, on which base station the terminal device is connected to.

The application service node 412 handles basic telephony functions and supplementary telephony services (call hold, swap, forward). For example, the application service node 412 may process a 'request for a text-type call service' received from the terminal device 300/2. The text-type call service refers to a service in which, while the terminal device 300/1 and the terminal device 300/2 are attempting or conducting a voice or video call with each other, one terminal device 300/1 maintains the voice or video call while the other terminal device 300/2 communicates with the terminal device 300/1 by message-based chat. This is described in more detail later.

In addition, when a call between terminal devices is connected or disconnected, the application service node 412 may notify the interfacing device 100, described below, of this.

The media resource server 413 performs codec conversion. Codec conversion enables packet exchange between terminal devices of different specifications. To this end, the media resource server 413 may include a codec conversion module.

The media resource server 413 forwards (forks) media data. For example, the media resource server 413 may allow voice/video to be delivered between the first terminal device 300/1 and the second terminal device 300/2. The media resource server 413 may also deliver media data received from each terminal device 300/1, 300/2 to the interfacing device 100, and may deliver media data in the opposite direction. To this end, the media resource server 413 may include a communication module for communicating with each terminal device 300/1, 300/2 or with the interfacing device 100.

The media resource server 413 mixes (or muxes) media data. Media data may include, but is not limited to, voice/video calls, data packets, or DTMF (dual-tone multi-frequency) signals provided to each terminal device 300/1, 300/2.

In mixing, the media resource server 413 may mix sound sources received from various entities. For example, the media resource server 413 may mix a voice signal received from the artificial intelligence service providing server 200 into the voice/video to be delivered to each terminal device 300/1, 300/2, and may also mix a predefined sound source or image (hereinafter referred to as a standby sound source or standby image) into the voice/video to be delivered to each terminal device 300/1, 300/2. In addition, the media resource server 413 may adjust the level of the sound sources being mixed, and this adjustment may be performed in response to a command received from the application service node 412. To this end, the media resource server 413 may include a mixing module.
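Mixing with per-source level adjustment can be sketched as a weighted sum of samples. This is not the mixing module of the media resource server 413, whose internals are not described; the sample format (floats in the range -1.0 to 1.0), the clipping step, and the function name `mix` are assumptions.

```python
def mix(sources, gains):
    """Mix several audio sources into one, applying a gain to each.

    `sources` is a list of sample sequences (floats in [-1.0, 1.0]) and
    `gains` holds one multiplier per source, for example as instructed by
    a controlling node. Shorter sources contribute silence once they run
    out. The sum is clipped to [-1.0, 1.0] to avoid overflow.
    """
    if len(sources) != len(gains):
        raise ValueError("one gain is required per source")
    length = max((len(source) for source in sources), default=0)
    mixed = []
    for i in range(length):
        total = sum(
            gain * source[i]
            for source, gain in zip(sources, gains)
            if i < len(source)
        )
        mixed.append(max(-1.0, min(1.0, total)))
    return mixed
```

For example, a call audio stream could be mixed with a synthesized voice signal at half level by calling `mix([call_audio, tts_audio], [1.0, 0.5])`, where both argument names are placeholders.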

Meanwhile, the illustrated HLR (Home Location Register) 414, MGCF (Media Gateway Control Function) 415, MGW (Media Gateway) 416, and SCC AS (Service Centralization and Continuity Application Server) 417 are the same as already known components, so their description is omitted.

Next, the data network 420 is also referred to as a legacy call processing network and may mean, for example, a mobile communication network such as WCDMA. The data network 420 may provide voice/video services to each terminal device 300/1, 300/2. The data network 420 may also provide predetermined app-based services to each terminal device 300/1, 300/2.

The data network 420 includes an MSC (mobile switching center) 421 or a home location register (HLR) 422, and further includes a GGSN (Gateway GPRS Support Node) 423, a NodeB 424, an RNC (Radio Network Controller) 425, an SGSN (Serving GPRS Support Node) 426, a CGS (Cellular Gateway Switch) 427, an MME (mobility management entity) 431, and a PGW (packet data network gateway) 432. It may also include, or be connected to, an eNodeB 433, an SGW (Serving Gateway) 434, a PCRF (Policy and Charging Rules Function) 435, an HSS (Home Subscriber Server) 436, and the like. Because the data network 420 may have the same configuration as a known network, a detailed description of it is omitted.

Referring back to FIG. 1, each terminal device 300/1, 300/2 may be located in one (or two or more) of the plurality of cells constituting the call network 410 or the data network 420 and may receive a voice/video call or a data service such as an artificial intelligence service. The terminal devices 300/1 and 300/2 can be implemented in various forms, such as a smartphone, a smart pad, or a tablet. If one of the terminal devices 300/1 and 300/2 shown in FIG. 1 is the calling terminal, the other may be the receiving terminal.

The terminal devices 300/1 and 300/2 are described in more detail below with reference to FIG. 6.

FIG. 6 is a diagram conceptually illustrating the configuration of a terminal device 300/2 according to an embodiment. However, FIG. 6 is merely exemplary, and the configuration of the terminal device 300/2 is not to be construed as limited to that shown in FIG. 6. The drawing of FIG. 6 and its description apply equally to the terminal device 300/1.

Referring to FIG. 6, the terminal device 300/2 includes a communication unit 310, a user interface unit 320, a speaker unit 330, a memory 340, and a processor 350, and may further include components that are not shown.

The communication unit 310 may include a wireless communication module for 3G, 4G, 5G, Wi-Fi, NFC, or the like. The communication unit 310 is connected to the call network 410 or the data network 420 shown in FIG. 1.

The user interface unit 320 includes an input unit and an output unit.

The input unit receives input such as a touch from the user, and may include a touch interface or the like. Through the input unit, the user can enter desired text, launch a desired application by touching it, and activate or deactivate a specific function in a specific application.

The output unit provides various information to the user, and may be implemented using various types of components such as an LED or an LCD. Through the output unit, the user can check which application is currently running on the terminal device 300/2, which messages are being displayed in connection with that application, and so on.

The input unit and the output unit described above may be implemented as a single touch interface module.

The speaker unit 330 outputs sound and may be implemented with various types of speaker modules. Through the speaker unit 330, the user can hear the counterpart's voice or various sounds output by an application.

The memory 340 stores various types of information. Various types of applications may also be installed in the memory 340.

The processor 350 causes the various functions described below to be executed, and may be implemented by, for example, a microprocessor.

Looking at the processor 350 in more detail, the processor 350 performs overall control of the text-type call service. For example, the processor 350 controls a request signal for the text-type call service to be transmitted through the data network 420. This control can be performed when a request for the text-type call service is input through the user interface unit 320 of the terminal device 300/2 while the terminal device 300/1 and the terminal device 300/2 are conducting or attempting a voice call.

Meanwhile, the request signal described above is delivered to the artificial intelligence service providing server 200 through the interfacing device 100 for the artificial intelligence service. Thereafter, the voice from the terminal device 300/1 and the additional information about it, that is, additional information indicating the mood or emotion of the user of the terminal device 300/1, may be displayed on the user interface unit 320 of the terminal device 300/2 in the form of corresponding text, emoticons, special characters, or the like.

In addition, the processor 350 may control the voice of the user of the terminal device 300/1 to be output from the speaker unit 330 even while the text-type call service is in progress.

Meanwhile, when the text-type call service is provided, the user of the terminal device 300/2 may input text, or his or her mood or emotion, through the user interface unit 320. The text and the additional information including the mood or emotion are then converted into voice, delivered to the terminal device 300/1, and output there. The processor 350 may estimate whether this voice is still being output or has finished being output at the terminal device 300/1, or may recognize this by receiving a signal from the terminal device 300/1 indicating that output has been completed. Based on the estimate or the received information, the processor 350 may then control the user interface unit 320 to display information indicating that the voice is being output or has finished being output at the terminal device 300/1.
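The status decision described above can be illustrated with a minimal Python sketch. It is not taken from the embodiment: the names `SentMessage`, `PlaybackStatus`, and `playback_status`, and the idea that an estimated speech duration is available to the terminal, are assumptions made only for illustration. The sketch lets an explicit completion signal take precedence and otherwise falls back to a time-based estimate.

```python
from dataclasses import dataclass
from enum import Enum


class PlaybackStatus(Enum):
    PLAYING = "playing"   # voice is still being output at terminal 300/1
    DONE = "done"         # voice output at terminal 300/1 has finished


@dataclass
class SentMessage:
    sent_at: float             # seconds; when the text left terminal 300/2
    estimated_duration: float  # seconds of synthesized speech (assumed known)


def playback_status(msg: SentMessage, now: float,
                    done_signal_received: bool = False) -> PlaybackStatus:
    """Decide what the user interface unit should show for one message."""
    # A completion signal from the counterpart terminal wins over any estimate.
    if done_signal_received:
        return PlaybackStatus.DONE
    # Otherwise estimate: output is done once the expected duration has passed.
    if now >= msg.sent_at + msg.estimated_duration:
        return PlaybackStatus.DONE
    return PlaybackStatus.PLAYING
```

In such a sketch the user interface unit 320 would simply render the returned status next to the message that was sent.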

FIG. 7 is a flowchart illustrating the sequence of a method performed by the terminal device 300/2 according to an embodiment. However, FIG. 7 is merely exemplary.

Referring to FIG. 7, a step S100 of transmitting a request signal for a text-type call service through a data network is performed.

In addition, a step S110 is performed in which text converted from the voice of the counterpart's terminal device, and additional information including the counterpart's emotion, are received through the data network.

In addition, a step S120 of displaying the received text and additional information is performed.
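As a rough illustration of steps S100 to S120, the following Python sketch wires the three steps together. The transport and display are passed in as plain callables, and `IncomingMessage`, `run_text_call`, the request string, and the display format are all hypothetical names and choices introduced here, not part of the embodiment.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class IncomingMessage:
    text: str     # text converted from the counterpart's voice
    emotion: str  # additional information; the label set is an assumption


def run_text_call(send_request: Callable[[str], None],
                  receive: Callable[[], IncomingMessage],
                  display: Callable[[str], None]) -> None:
    """Run one pass of the terminal-side method of FIG. 7."""
    # S100: transmit the request signal for the text-type call service.
    send_request("TEXT_CALL_REQUEST")
    # S110: receive the converted text and the additional information.
    msg = receive()
    # S120: display the text together with the additional information.
    display(f"{msg.text} [{msg.emotion}]")
```

Passing the network and screen as callables keeps the sketch independent of any particular communication module or display technology.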

Next, the interfacing device 100 will be described.

FIG. 8 is a diagram illustrating the configuration of the interfacing device 100 shown in FIG. 1. The interfacing device 100 may be implemented in a group of servers that perform the functions described below. The interfacing device 100 may also be referred to as an Augmented Communication System (ACS) or an Augmented Communication Platform (ACP).

The interfacing device 100 may support the chat-style call service using messages described above (hereinafter also referred to as the 'text-type call service'). That is, while the terminal device 300/1 and the terminal device 300/2 are attempting or conducting a voice or video call with each other, the interfacing device 100 may allow one terminal device 300/1 to maintain the voice or video call while the other terminal device 300/2 communicates with the terminal device 300/1 by chatting through messages. To this end, the interfacing device 100 may provide voice from the terminal device 300/1 to the artificial intelligence service providing server 200, receive the corresponding text, and provide that text to the terminal device 300/2. Likewise, the interfacing device 100 may provide text from the terminal device 300/2 to the artificial intelligence service providing server 200, receive the corresponding voice, and provide that voice to the terminal device 300/1.

Referring to FIG. 8, the interfacing device 100 includes an artificial intelligence network interfacing unit 110, a call network interfacing unit 120, a data network interfacing unit 130, and a processor 140. However, FIG. 8 is merely exemplary, and the interfacing device 100 is not to be construed as limited to what is shown in FIG. 8. For example, the interfacing device 100 may further include a memory, or may omit at least one of the components shown in FIG. 8.

The artificial intelligence network interfacing unit 110, the call network interfacing unit 120, and the data network interfacing unit 130 exchange packet data, such as voice or video data or chat messages, with the artificial intelligence service providing server 200, the call network 410, and the data network 420, respectively. To this end, each of the interfacing units 110 to 130 may include a wired or wireless communication module.

The processor 140 may be implemented by a memory that stores instructions programmed to perform the functions described below and a microprocessor that executes those instructions.

More specifically, the processor 140 may configure settings for each of the call network 410 and the data network 420. The processor 140 may also control packets, such as voice received from the call network 410 or text received from the data network 420, to be delivered to the artificial intelligence service providing server 200. In addition, the processor 140 may control packets, such as voice or text received from the artificial intelligence service providing server 200, to be delivered to the call network 410 or the data network 420.
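The forwarding behavior of the processor 140 can be pictured as a small routing rule. The Python sketch below is only one possible reading: the `Endpoint` and `Payload` enums and the `next_hop` function are hypothetical, and the choice that voice returning from the server goes to the call network while text goes to the data network is an assumption consistent with the flow of FIG. 9.

```python
from enum import Enum


class Endpoint(Enum):
    CALL_NETWORK = "call network 410"
    DATA_NETWORK = "data network 420"
    AI_SERVER = "AI service providing server 200"


class Payload(Enum):
    VOICE = "voice"
    TEXT = "text"


def next_hop(source: Endpoint, payload: Payload) -> Endpoint:
    """Return where the processor forwards a packet that arrived from source."""
    # Results coming back from the AI server: voice to the call network,
    # text to the data network (assumed mapping).
    if source is Endpoint.AI_SERVER:
        if payload is Payload.VOICE:
            return Endpoint.CALL_NETWORK
        return Endpoint.DATA_NETWORK
    # Voice from the call network and text from the data network go to the server.
    if (source, payload) in {(Endpoint.CALL_NETWORK, Payload.VOICE),
                             (Endpoint.DATA_NETWORK, Payload.TEXT)}:
        return Endpoint.AI_SERVER
    raise ValueError(f"no route for {payload.value} from {source.value}")
```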

In addition, the processor 140 may estimate whether voice delivered to one of the terminal devices 300/1 and 300/2, for example the terminal device 300/1, has finished being output at that terminal device 300/1, and, when output is estimated to be complete, may control a notification to that effect to be delivered to the other terminal device 300/2. To this end, the processor 140 may employ an algorithm that calculates the time required for the voice delivered to the terminal device 300/1 to finish being output at the terminal device 300/1. In this way, the user of one terminal device can recognize whether voice is still being output at the other terminal device or has finished being output.
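The embodiment does not specify how the output time is calculated. One simple possibility, shown here purely as an assumed sketch, is to divide the number of audio samples by the sample rate and add an allowance for network delay. The function name `estimate_playback_seconds` and its parameters are hypothetical.

```python
def estimate_playback_seconds(num_samples: int, sample_rate_hz: int,
                              network_delay_s: float = 0.0) -> float:
    """Estimate how long delivered voice takes to finish playing.

    Assumes uncompressed mono audio whose length in samples is known, plus a
    fixed allowance for delivery delay; both are illustrative assumptions.
    """
    if num_samples < 0 or sample_rate_hz <= 0 or network_delay_s < 0:
        raise ValueError("invalid audio or delay parameters")
    return num_samples / sample_rate_hz + network_delay_s
```

For example, 16,000 samples at 8,000 Hz with a 0.25 second delay allowance gives an estimate of 2.25 seconds, after which a completion notification could be sent to the terminal device 300/2.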

The processor 140 may also employ the algorithm used in the TTS module 230 described above, that is, an algorithm which, when text input has started but is not completed within a predetermined time, inserts a sound indicating that completion of the text input is delayed between or in the middle of the voice output. Depending on the embodiment, if the TTS module 230 does not employ this algorithm, the processor 140 may employ it; conversely, if the TTS module 230 does employ it, the processor 140 need not.

As described above, according to an embodiment, while the terminal device 300/1 and the terminal device 300/2 are attempting or conducting a voice or video call with each other, one terminal device 300/1 may maintain the voice or video call while the other terminal device 300/2 communicates with the terminal device 300/1 by chatting through messages.

The following describes the operation flow when, while the terminal devices 300/1 and 300/2 are in a voice or video call with each other, the terminal device 300/2 requests the text-type call service described above.

Referring to FIG. 9, a voice or video call is conducted between the terminal devices 300/1 and 300/2 through the call network 410 (①, ②).

During the call, a request for the text-type call service may be received by the call network 410 from the terminal device 300/2 (③). As shown in FIG. 10, this request may be sent from the terminal device 300/2 to the call network 410 when the user touches a predetermined icon 311 on the screen 310 of the terminal device 300/2. Such a request may also be received in other situations, which are described later.

Referring back to FIG. 9, the application service node of the call network 410, that is, the TAS 412, forwards this request to the interfacing device 100 (④).

The interfacing device 100 then controls, through the processor 140, the voice from the terminal device 300/1 to be converted into text and delivered to the terminal device 300/2, and the text from the terminal device 300/2 to be converted into voice and delivered to the terminal device 300/1. This is described with reference to ⑤ to ⑭ shown in FIG. 9.

As described above, assume that the request corresponding to ④ has been received by the interfacing device 100 from the TAS 412.

From then on, when the voice of the user of the terminal device 300/1 is delivered to the interfacing device 100 through the call network 410 (⑤, ⑥), the interfacing device 100 forwards the voice to the artificial intelligence service providing server 200 (⑦).

The artificial intelligence service providing server 200 converts the voice received in ⑦ into text using the STT module 240, and then delivers the text to the interfacing device 100 (⑧). The text may include emoticons or the like indicating the mood or emotion of the user of the terminal device 300/1, as identified by the STT module 240.

The interfacing device 100 then delivers the text received in ⑧ to the terminal device 300/2 through the data network 420 (⑨, ⑩). The user of the terminal device 300/2 can thus receive, as a chat message, the text corresponding to the voice uttered by the user of the terminal device 300/1.

Meanwhile, based on the text received in ⑩, the user of the terminal device 300/2 may enter a response message to that text on the terminal device 300/2. The text of the entered response message is delivered to the interfacing device 100 through the data network 420 (⑪, ⑫).

The interfacing device 100 then forwards the text received in ⑫ to the artificial intelligence service providing server 200 (⑬).

The artificial intelligence service providing server 200 then converts the text received in ⑬ into voice using the TTS module 230, and delivers the voice to the interfacing device 100 (⑭). The converted voice may reflect the mood or emotion of the user of the terminal device 300/2 as obtained by the TTS module 230, and its tone, volume, or the like may be adjusted accordingly.

The interfacing device 100 then delivers the voice received in ⑭ to the terminal device 300/1 through the call network 410 (⑮, ⑯). The user of the terminal device 300/1 can thus receive the voice corresponding to the text entered by the user of the terminal device 300/2.
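The two relay directions of FIG. 9 can be summarized in a short Python sketch. The speech-to-text and text-to-speech conversions are injected as callables because the embodiment leaves them to the artificial intelligence service providing server 200. The function names `relay_voice` and `relay_text` are hypothetical, and the step numbers in the comments follow the circled numerals in the description.

```python
from typing import Callable


def relay_voice(voice: bytes,
                stt: Callable[[bytes], str],
                send_text: Callable[[str], None]) -> None:
    """Voice from terminal 300/1 becomes a chat message for terminal 300/2."""
    text = stt(voice)   # steps 7-8: the server converts voice to text
    send_text(text)     # steps 9-10: deliver the text over the data network


def relay_text(text: str,
               tts: Callable[[str], bytes],
               send_voice: Callable[[bytes], None]) -> None:
    """Text from terminal 300/2 becomes voice for terminal 300/1."""
    voice = tts(text)   # steps 13-14: the server converts text to voice
    send_voice(voice)   # steps 15-16: deliver the voice over the call network
```

In a test setup, `stt` and `tts` can be replaced with trivial stand-ins such as decoding and encoding UTF-8 bytes, which exercises the relay logic without any real speech engine.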

That is, according to an embodiment, while the terminal device 300/1 and the terminal device 300/2 are attempting or conducting a voice or video call with each other, one terminal device 300/1 may maintain the voice or video call while the other terminal device 300/2 communicates with the terminal device 300/1 by chatting through messages.

Meanwhile, as described with reference to FIG. 10, the text-type call service can be started when the user of the terminal device 300/2 directly touches the predetermined icon 311 on the screen 310 of the terminal device 300/2, but it may also be started in other ways. For example, when the user of the terminal device 300/2 has set the terminal device 300/2 to a 'rest mode' (silent mode, no-vibration mode, or the like), the text-type call service may be started automatically, without the user touching the icon 311, after the terminal device 300/2 has conducted a voice or video call for a predetermined time.
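The automatic start condition can be written as a single predicate. The following Python sketch assumes that the elapsed call time and the predetermined threshold are both available in seconds; the function name `should_auto_start_text_call` and its parameters are hypothetical.

```python
def should_auto_start_text_call(rest_mode: bool,
                                call_elapsed_s: float,
                                threshold_s: float) -> bool:
    """Return True when the text-type call service should start by itself.

    The service starts automatically only if the terminal is in rest mode and
    the voice or video call has lasted at least the predetermined time.
    """
    return rest_mode and call_elapsed_s >= threshold_s
```

A manual touch on the icon 311 would bypass this check entirely and start the service immediately.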

In addition to receiving the text in ⑩, the user of the terminal device 300/2 may also receive, together with the text, the voice uttered by the user of the terminal device 300/1, that is, the voice delivered to the interfacing device 100 through ⑤ and ⑥.

In this case, the user of the terminal device 300/2 may wish to receive only the text and not the voice. To this end, a toggle button for receiving only text and not voice, that is, a mute button, may be placed on the screen of the terminal device 300/2; it is shown with reference numeral 312 in FIG. 11. Each time the user presses this toggle button, output of the voice of the user of the terminal device 300/1 is switched on or off.

Furthermore, the natural language processing unit 220 of the artificial intelligence service providing server 200 may recognize and analyze the question received in ⑦ and then provide a recommended response to it. The recommended response is provided to the terminal device 300/2 through the interfacing device 100 and the data network 420. The user of the terminal device 300/2 can then have the desired text delivered to the counterpart, the user of the terminal device 300/1, simply by selecting the recommended response. That is, because it takes time for the user of the terminal device 300/2 to enter text, recommended response text may be provided to that user to shorten this time. This allows the user of the terminal device 300/2 to use the text-type call service more smoothly.
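On the terminal side, selecting a recommended response amounts to choosing between typed text and one of the suggestions. The Python sketch below shows one assumed way to do this; `outgoing_text` and its parameters are hypothetical, and how the suggestions are generated is left entirely to the natural language processing unit 220.

```python
from typing import Optional, Sequence


def outgoing_text(typed: str, suggestions: Sequence[str],
                  selected: Optional[int] = None) -> str:
    """Return the text to send to the counterpart.

    If the user tapped a recommended response, that response is sent as-is;
    otherwise whatever the user typed is sent.
    """
    if selected is not None:
        if not 0 <= selected < len(suggestions):
            raise IndexError("no such recommended response")
        return suggestions[selected]
    return typed
```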

Depending on the embodiment, the screen of the terminal device 300/2 may also be provided with a mode toggle button for switching between the text-type call service and the voice/video call service; it is shown with reference numeral 313 in FIG. 11. Each time the user presses this toggle button, the terminal device 300/2 switches between the text-type call service and the voice/video call service.
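The two toggle buttons shown in FIG. 11 (reference numerals 312 and 313) can be modeled as two independent boolean flags. The Python sketch below is an assumption-laden illustration: the class `CallScreen` and its members are hypothetical, as is the rule that the mute flag only matters while the text-type call service is active.

```python
from dataclasses import dataclass


@dataclass
class CallScreen:
    text_mode: bool = True  # True: text-type call; False: voice/video call
    muted: bool = False     # True: the counterpart's voice is not output

    def press_mute_toggle(self) -> None:
        """Button 312: switch voice output on or off."""
        self.muted = not self.muted

    def press_mode_toggle(self) -> None:
        """Button 313: switch between text-type and voice/video call."""
        self.text_mode = not self.text_mode

    def should_play_voice(self) -> bool:
        """Voice always plays in voice/video mode (assumed); in text mode it
        plays only when not muted."""
        return (not self.text_mode) or (not self.muted)
```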

Meanwhile, when the text-type call service is provided to the terminal device 300/2, the user of the terminal device 300/2 often enters text more slowly than the user of the terminal device 300/1 speaks. Accordingly, when text input has started but is not completed within a predetermined time, the TTS module 230 of the artificial intelligence service providing server 200 may insert a sound indicating that the text input is delayed between or in the middle of the voice output. Referring to FIG. 12, if the end point has not arrived even after a predetermined time has elapsed from the start of text input, a predetermined sound may be output at the counterpart's terminal device from the point when the predetermined time elapses until just before the end point. Such sounds may include words that commonly indicate a pause in conversation, such as 'Um...', 'So...', or 'Wait a minute...'.
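The timing rule of FIG. 12 can be expressed as a small calculation: the filler sound covers the interval from the moment the predetermined time elapses until the end point of the text input. The Python sketch below assumes all times are in seconds on a common clock; `filler_window` and `FILLERS` are hypothetical names, and how a filler is chosen or synthesized is not addressed.

```python
from typing import Optional, Tuple

# Example filler phrases mentioned in the description.
FILLERS = ("Um...", "So...", "Wait a minute...")


def filler_window(start_s: float, end_s: float,
                  threshold_s: float) -> Optional[Tuple[float, float]]:
    """Return the (begin, end) interval during which a filler sound plays.

    start_s: when text input started; end_s: when it finished;
    threshold_s: the predetermined time. Returns None if input finished
    within the predetermined time, so no filler is needed.
    """
    if end_s < start_s or threshold_s < 0:
        raise ValueError("invalid timing parameters")
    if end_s - start_s <= threshold_s:
        return None
    return (start_s + threshold_s, end_s)
```

In a live system the end point is not known in advance, so the filler would start at `start_s + threshold_s` and stop as soon as the input completes; the function above describes the resulting interval after the fact.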

FIG. 13 is a diagram illustrating the flow of an artificial intelligence interfacing method according to an embodiment. However, FIG. 13 is merely exemplary, and the spirit of the present invention is not to be construed as limited to what is shown in FIG. 13.

Referring to FIG. 13, a voice or video call is conducted between the terminal devices 300/1 and 300/2 through the call network 410 (S10).

During the call, a request for the text-type call service may be received from the terminal device 300/2 by the call network interfacing unit 120 or the data network interfacing unit 130 (S11). As shown in FIG. 10, this request may be sent from the terminal device 300/2 when the user touches the predetermined icon 311 on the screen 310 of the terminal device 300/2.

Thereafter, the request of S11 received by the data network interfacing unit 130 is passed to the processor 140 (S12). The processor 140 then sends a re-invite request to each of the terminal devices 300/1 and 300/2 through the call network interfacing unit 120 (S13 to S15). When the re-invite is completed, each of the terminal devices 300/1 and 300/2 returns a response indicating completion (S16, S17).
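The re-invite exchange of S13 to S17 requires the processor 140 to know when both terminal devices have responded. The Python sketch below tracks the outstanding responses with a set. The class `ReinviteTracker` is hypothetical, and the actual signaling messages are not modeled.

```python
from typing import Iterable, Set


class ReinviteTracker:
    """Tracks which terminals still owe a re-invite completion response."""

    def __init__(self, terminals: Iterable[str]) -> None:
        # S13 to S15: a re-invite request has been sent to each terminal.
        self._pending: Set[str] = set(terminals)

    def on_response(self, terminal: str) -> None:
        """S16, S17: record a completion response from one terminal."""
        self._pending.discard(terminal)

    @property
    def complete(self) -> bool:
        """True once every terminal has responded."""
        return not self._pending
```

Once `complete` becomes true, the voice-to-text and text-to-voice relaying that follows the re-invite can begin.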

From then on, the interfacing device 100 controls, through the processor 140, the voice from the terminal device 300/1 to be converted into text and delivered to the terminal device 300/2, and the text from the terminal device 300/2 to be converted into voice and delivered to the terminal device 300/1. This is described with reference to S20 to S43 shown in FIG. 13.

The voice of the user of the terminal device 300/1, for example 'Hello', is delivered from the terminal device 300/1 through the call network interfacing unit 120 and the artificial intelligence network interfacing unit 110 to the STT module 240 of the artificial intelligence service providing server 200 (S20 to S22). The STT module 240 then outputs text corresponding to the voice 'Hello'. The output text is delivered to the artificial intelligence network interfacing unit 110 (S24).

In addition, the natural language processing unit 220 derives recommended response text for the voice received in S22 and delivers it to the artificial intelligence network interfacing unit 110 (S23, S25).

The text received in S24 and S25 is then delivered to the terminal device 300/2 through the data network interfacing unit 130 (S27). As a result, the text corresponding to the voice uttered by the user of the terminal device 300/1 is displayed on the terminal device 300/2 (S28). The recommended response text is also displayed on the terminal device 300/2 at this time. The user of the terminal device 300/2 can deliver the desired response to the counterpart simply by selecting the recommended response text.

Meanwhile, based on the text displayed in S28, the user of the terminal device 300/2 may enter a response message to that text on the terminal device 300/2. The text of the entered response message is delivered to the artificial intelligence network interfacing unit 110 through the data network interfacing unit 130 (S30, S31). This text is then delivered through the artificial intelligence network interfacing unit 110 to the TTS module 230 of the artificial intelligence service providing server 200 (S32).

The TTS module 230 then outputs voice corresponding to this text. The output voice is delivered to the artificial intelligence network interfacing unit 110 (S33).

The voice received in S33 is delivered from the artificial intelligence network interfacing unit 110 through the call network interfacing unit 120 to the terminal device 300/1 (S34 to S35). As a result, the voice corresponding to the text entered by the user of the terminal device 300/2 is output at the terminal device 300/1 (S36).

Thereafter, the interfacing device 100 recognizes that the voice output in S36 has been completed; this recognition may be based on information received from the terminal device 300/1, or the processor 140 may calculate it. The completion of the voice output in S36 is then reported to the terminal device 300/2 through the data network interfacing unit 130 (S42). The terminal device 300/2 then indicates, in a predetermined manner, that the text entered by its user has been fully output to the counterpart as voice (S43).

As described above, according to an embodiment, while the terminal device 300/1 and the terminal device 300/2 are attempting or conducting a voice or video call with each other, one terminal device 300/1 maintains the voice or video call while the other terminal device 300/2 communicates with the terminal device 300/1 by message-based chatting. Because it takes time for the user of the terminal device 300/2 to type a chat message, recommended response text may also be provided to that user to shorten this time. As a result, the user of the terminal device 300/2 can be provided with the text-type call service more smoothly.
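The claims state that the recommended response text may be based on how frequently the terminal device has given each response to a given text. A minimal Python sketch of such frequency-based recommendation follows. The class `ReplyRecommender`, its methods, and the whitespace and case normalization are hypothetical choices made for illustration; the sketch also ignores the additional (emotion) information, which an actual implementation might take into account.

```python
from collections import Counter, defaultdict
from typing import Dict, List


class ReplyRecommender:
    """Hypothetical frequency-based recommender for response text."""

    def __init__(self) -> None:
        # Maps a normalized received text to a count of each reply given to it.
        self._history: Dict[str, Counter] = defaultdict(Counter)

    @staticmethod
    def _key(text: str) -> str:
        # Assumed normalization: lower-case and collapse whitespace.
        return " ".join(text.lower().split())

    def record(self, received_text: str, reply: str) -> None:
        """Remember that `reply` was sent in response to `received_text`."""
        self._history[self._key(received_text)][reply] += 1

    def recommend(self, received_text: str, k: int = 3) -> List[str]:
        """Return up to k past replies to this text, most frequent first."""
        counts = self._history.get(self._key(received_text))
        if not counts:
            return []
        return [reply for reply, _ in counts.most_common(k)]
```

After recording "On my way" twice and "In a meeting" once as replies to "Where are you?", calling `recommend("Where are you?")` lists "On my way" first, and a text with no history yields an empty list.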

Meanwhile, the above-described method according to the spirit of the present invention may be implemented as a computer-readable recording medium storing a computer program programmed to perform each step included in the method, or as a computer program stored in such a computer-readable recording medium.

The above description merely illustrates the technical spirit of the present invention, and those of ordinary skill in the art to which the present invention pertains will be able to make various modifications and variations without departing from the essential characteristics of the present invention. Accordingly, the embodiments disclosed herein are intended to explain, not to limit, the technical spirit of the present invention, and the scope of that technical spirit is not limited by these embodiments. The scope of protection of the present invention should be interpreted according to the claims below, and all technical ideas within a scope equivalent thereto should be interpreted as falling within the scope of the present invention.

According to an embodiment, a user can be provided with a text-type call service whenever the user wants.

100: artificial intelligence interfacing device
200: artificial intelligence service providing server
300/1, 300/2: terminal device

Claims

A terminal device comprising:
a communication unit connected to a data network;
a user interface unit; and
a processor configured to, when text converted from a voice from a counterpart terminal device and additional information including an emotion of the counterpart are received by the communication unit through the data network after a request signal for a text-type call service is transmitted through the data network, control the received text and additional information to be displayed on the user interface unit.

The terminal device of claim 1,
wherein the additional information is derived based on the voice.

The terminal device of claim 1,
wherein the request signal is transmitted when a request is made on the side of the terminal device while the terminal device and the counterpart terminal device are attempting or conducting a voice call through a call network connected to the communication unit.

The terminal device of claim 1,
wherein the user interface unit displays the additional information in the form of an emoticon, a special character, or text corresponding to the additional information.

The terminal device of claim 1,
wherein the communication unit further receives the voice together with the text and the additional information, and
the terminal device further comprises a speaker that outputs the received voice.

The terminal device of claim 1,
wherein the communication unit receives recommended response text with which the terminal device side may respond to the text and the additional information, and
the user interface unit displays the recommended response text.

The terminal device of claim 6,
wherein the recommended response text is based on the frequency of each case in which the terminal device responded to the text.

The terminal device of claim 1,
wherein the user interface unit receives text and emotion information of a user from the user of the terminal device,
the processor controls the text from the user and the emotion information of the user to be transmitted through the data network, and
the user interface unit displays whether output of a voice based on the text and the emotion from the user has been completed on the counterpart terminal device.

A method performed by a terminal device, the method comprising:
transmitting a request signal for a text-type call service through a data network;
receiving, through the data network, text converted from a voice from a counterpart terminal device and additional information including an emotion of the counterpart; and
displaying the received text and additional information.

An artificial intelligence service interfacing method performed by an artificial intelligence service interfacing device, the method comprising:
when a voice is received from a counterpart terminal device connected to a call network, transmitting the voice to an artificial intelligence service providing server; and
when text converted from the voice and additional information representing an emotion of the user of the counterpart terminal device, derived based on the voice, are received from the artificial intelligence service providing server, delivering the text and the additional information to another terminal device.