KR20060028302A

KR20060028302A - Voice communication system and method using distributed speech recognition over IP network

Info

Publication number: KR20060028302A
Application number: KR1020040077402A
Authority: KR
Inventors: 박성준; 정영준
Original assignee: 주식회사 케이티
Priority date: 2004-09-24
Filing date: 2004-09-24
Publication date: 2006-03-29
Also published as: KR101082700B1

Abstract

1. Technical field to which the invention described in the claims belongs

The present invention relates to a voice call system and method using distributed speech processing and speech recognition in an Internet Protocol network, and to a computer-readable recording medium on which a program for implementing the method is recorded.

2. Technical problem to be solved by the invention

An object of the present invention is to provide a voice call system and method that use speech recognition in an Internet Protocol (IP) network to set up calls between IP terminals (connect a call) while also improving speech recognition speed and recognition rate, and a computer-readable recording medium on which a program for implementing the method is recorded.

3. Summary of the solution of the invention

The present invention provides a voice call system in an Internet Protocol (IP) network, comprising: an IP exchange that recognizes, from a call request message transmitted by a calling IP terminal in the Session Initiation Protocol (SIP) message format, that the request is for a 'speech-recognition-based voice call service', determines the IP address of a voice processing server, and sets up a call to that voice processing server; and the voice processing server, which establishes a session with the calling IP terminal upon receiving the call request message from the IP exchange, decodes the 'feature extraction data obtained by extracting and encoding features from the speech uttered by the caller' received from the calling IP terminal over the Real-time Transport Protocol (RTP), performs speech recognition on it to determine the address of the called IP terminal, transmits that address together with the recognition result to the calling IP terminal, and releases the established session once a call is established between the calling and called IP terminals.

4. Important uses of the invention

The present invention is used in Internet Protocol networks and the like.

Speech recognition, Internet Protocol, voice call, IP terminal, SIP, RTP

Description

Voice communication system and method using distributed speech processing and speech recognition in internet protocol network

FIG. 1 is a configuration diagram of an embodiment of an Internet Protocol-based voice call system according to the present invention;

FIG. 2 is a flowchart of a conventional call setup process between two IP terminals that does not use speech recognition;

FIG. 3 is a flowchart of an embodiment of a voice call method using distributed speech processing and speech recognition in an Internet Protocol network according to the present invention; and

FIG. 4 is a flowchart of another embodiment of a voice call method using distributed speech processing and speech recognition in an Internet Protocol network according to the present invention.

* Explanation of reference numerals for the main parts of the drawings

11, 12: wireless IP terminal    13, 14: access point (AP)

15: IP-PBX    16: voice processing server

The present invention relates to a voice call system and method using distributed speech processing and speech recognition in an Internet Protocol network, and to a computer-readable recording medium on which a program for implementing the method is recorded. More specifically, it relates to a voice call system and method that use speech recognition in an Internet Protocol (IP) network to set up calls between IP terminals (connect a call) and that can improve recognition speed and recognition rate, and to a computer-readable recording medium on which a program for implementing the method is recorded.

A conventional voice call in an IP network takes the same form as a voice call in an existing wired telephone network or wireless communication network; the only difference is internal, in that the call is implemented through data transmission over IP. In particular, speech recognition has not been used in the call setup process between two IP terminals.

To aid understanding, the conventional call setup process between two IP terminals, without speech recognition, is described with reference to FIG. 2. For convenience, the procedure by which users A and B communicate without speech recognition is explained using the components shown in FIG. 1.

When user A wants to call user B, user A dials user B's telephone number on IP terminal 11 and presses the call button. User A's IP terminal 11 then sends a voice call request (INVITE) message, formatted according to the Session Initiation Protocol (SIP) and indicating that it wants to connect to user B, to the IP-PBX 15 (201).

The IP-PBX 15 then determines from the INVITE message that user A wants to call user B, obtains user B's IP address using the presence server within the IP-PBX 15, sends an INVITE message to user B's IP terminal 12 (202), and sends a response (100 Trying) message to user A's IP terminal 11 (203).

Next, when the INVITE message arrives at user B's IP terminal 12 (202), IP terminal 12 rings and sends a response (180 Ringing) message along the reverse path (user B's IP terminal 12 → IP-PBX 15 → user A's IP terminal 11) (204, 205).

When user B picks up the handset, IP terminal 12 sends a final response (200 OK) message along the reverse path (user B's IP terminal 12 → IP-PBX 15 → user A's IP terminal 11) (206, 207). User A's IP terminal 11 then sends an ACK message, confirming receipt of the final response (200 OK), directly to user B's IP terminal 12 (208), which completes session establishment.

Once the session is established, data is transferred directly between user A's IP terminal 11 and user B's IP terminal 12 using the Real-time Transport Protocol (RTP), and the call can proceed (209).

When the call is over, one terminal (11 or 12) sends a call termination (BYE) message to the other terminal (12 or 11) (210), and the other terminal responds with a 200 OK message to the first terminal, which ends the call (211).
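The conventional flow of FIG. 2 described above can be summarized as an ordered message list. The following is a minimal sketch, not part of the patent: the names `Step`, `conventional_call_flow`, and `direct_steps`, and the endpoint labels, are hypothetical, and it assumes for illustration that the calling terminal is the one that hangs up in step 210.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    number: int    # reference numeral used in FIG. 2
    sender: str
    receiver: str
    message: str


def conventional_call_flow(caller: str = "terminal_11",
                           callee: str = "terminal_12",
                           pbx: str = "ip_pbx_15") -> List[Step]:
    """Return the message sequence 201-211 of the conventional call setup."""
    return [
        Step(201, caller, pbx, "INVITE"),
        Step(202, pbx, callee, "INVITE"),
        Step(203, pbx, caller, "100 Trying"),
        Step(204, callee, pbx, "180 Ringing"),
        Step(205, pbx, caller, "180 Ringing"),
        Step(206, callee, pbx, "200 OK"),
        Step(207, pbx, caller, "200 OK"),
        Step(208, caller, callee, "ACK"),        # sent directly, not via the IP-PBX
        Step(209, caller, callee, "RTP media"),  # the call itself
        Step(210, caller, callee, "BYE"),        # assumption: the caller hangs up
        Step(211, callee, caller, "200 OK"),
    ]


def direct_steps(flow: List[Step], pbx: str = "ip_pbx_15") -> List[Step]:
    """Steps exchanged terminal to terminal, without the IP-PBX."""
    return [s for s in flow if pbx not in (s.sender, s.receiver)]
```

Listing the steps this way makes it easy to see that the IP-PBX takes part only in signaling steps 201 to 207; the ACK, the media, and the call termination go directly between the two terminals.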

In an IP network, however, sound quality depends heavily on the volume of data traffic, and the quality of a voice call degrades as traffic increases. As a result, particularly when speech recognition is used, data may be lost or distorted while the user's utterance is being delivered to the speech recognizer, lowering the recognition rate. In addition, when the amount of data is large, delivery takes longer, so the total time needed to obtain a recognition result increases.

The present invention has been proposed to solve the above problems. Its object is to provide a voice call system and method that use speech recognition in an Internet Protocol (IP) network to set up calls between IP terminals (connect a call) while also improving speech recognition speed and recognition rate, and a computer-readable recording medium on which a program for implementing the method is recorded.

Other objects and advantages of the present invention can be understood from the following description and will become clearer from the embodiments of the present invention. It will also be readily apparent that the objects and advantages of the present invention can be realized by the means set forth in the claims and combinations thereof.

To achieve the above object, the present invention provides a voice call system in an Internet Protocol (IP) network, comprising: an IP exchange that recognizes, from a call request message transmitted by a calling IP terminal in the Session Initiation Protocol (SIP) message format, that the request is for a 'speech-recognition-based voice call service', determines the IP address of a voice processing server, and sets up a call to that voice processing server; and the voice processing server, which establishes a session with the calling IP terminal upon receiving the call request message from the IP exchange, decodes the 'feature extraction data obtained by extracting and encoding features from the speech uttered by the caller' received from the calling IP terminal over the Real-time Transport Protocol (RTP), performs speech recognition on it to determine the address of the called IP terminal, transmits that address together with the recognition result to the calling IP terminal, and releases the established session once a call is established between the calling and called IP terminals.

The present invention also provides a voice call system in an Internet Protocol (IP) network, comprising: an IP exchange that recognizes, from a call request message transmitted by a calling IP terminal in the Session Initiation Protocol (SIP) message format, that the request is for a 'speech-recognition-based voice call service', determines the IP address of a voice processing server, and sets up a call to that voice processing server; and the voice processing server, which establishes a session with the calling IP terminal upon receiving the call request message from the IP exchange, decodes the 'feature extraction data obtained by extracting and encoding features from the speech uttered by the caller' received from the calling IP terminal over the Real-time Transport Protocol (RTP), performs speech recognition on it, transmits the recognition result to the calling IP terminal, automatically connects the call to the address of the called IP terminal corresponding to the recognition result after a predetermined time has elapsed, and releases the established session once the call is received at the called IP terminal.

To achieve the above object, the present invention provides a voice call method in an Internet Protocol (IP) network, comprising: a call setup step in which an IP exchange recognizes, from a call request message transmitted by a calling IP terminal in the Session Initiation Protocol (SIP) message format, that the request is for a 'speech-recognition-based voice call service', determines the IP address of a voice processing server, and sets up a call to that voice processing server; a session establishment step in which the voice processing server establishes a session with the calling IP terminal upon receiving the call request message from the IP exchange; a speech recognition step in which the voice processing server decodes the 'feature extraction data obtained by extracting and encoding features from the speech uttered by the caller' received from the calling IP terminal over the Real-time Transport Protocol (RTP), performs speech recognition on it to determine the address of the called IP terminal, and transmits that address together with the recognition result to the calling IP terminal; and a session release step of releasing the established session once a call is established between the calling and called IP terminals.

The present invention also provides a voice call method in an Internet Protocol (IP) network, comprising: a call setup step in which an IP exchange recognizes, from a call request message transmitted by a calling IP terminal in the Session Initiation Protocol (SIP) message format, that the request is for a 'speech-recognition-based voice call service', determines the IP address of a voice processing server, and sets up a call to that voice processing server; a session establishment step in which the voice processing server establishes a session with the calling IP terminal upon receiving the call request message from the IP exchange; a speech recognition step in which the voice processing server decodes the 'feature extraction data obtained by extracting and encoding features from the speech uttered by the caller' received from the calling IP terminal over the Real-time Transport Protocol (RTP), performs speech recognition on it, and transmits the recognition result to the calling IP terminal; and a call connection and session release step of automatically connecting the call to the address of the called IP terminal corresponding to the recognition result after a predetermined time has elapsed, and releasing the established session once the call is received at the called IP terminal.

The present invention further provides a voice call method in an Internet Protocol (IP) network, comprising: a step in which, when a speech recognition button is pressed, a calling IP terminal sends a voice call request (INVITE) message directly to a voice processing server, without going through an IP exchange, using a separate protocol with the voice processing server, detects the start and end points of the user's utterance to determine the speech interval, and extracts and encodes features from that interval; a step in which, upon receiving a response to the request from the voice processing server, the calling IP terminal transmits the encoded feature extraction data to the voice processing server; a step in which the calling IP terminal receives a recognition result and a connection address from the voice processing server and transmits a voice call request message containing the connection address to the IP exchange in the Session Initiation Protocol (SIP) message format; a step in which, when the called IP terminal corresponding to the connection address accepts the call via the IP exchange, the calling IP terminal establishes a session with the called IP terminal and transmits voice data using the Real-time Transport Protocol (RTP); and a session release step of releasing the established session when either party ends the call.

Meanwhile, for voice calls using distributed speech processing and speech recognition in an Internet Protocol network, the present invention provides a computer-readable recording medium on which is recorded a program for implementing, in a voice call system equipped with a processor: a call setup function in which an IP exchange recognizes, from a call request message transmitted by a calling IP terminal in the Session Initiation Protocol (SIP) message format, that the request is for a 'speech-recognition-based voice call service', determines the IP address of a voice processing server, and sets up a call to that voice processing server; a session establishment function in which the voice processing server establishes a session with the calling IP terminal upon receiving the call request message from the IP exchange; a speech recognition function in which the voice processing server decodes the 'feature extraction data obtained by extracting and encoding features from the speech uttered by the caller' received from the calling IP terminal over the Real-time Transport Protocol (RTP), performs speech recognition on it to determine the address of the called IP terminal, and transmits that address together with the recognition result to the calling IP terminal; and a session release function of releasing the established session once a call is established between the calling and called IP terminals.

Likewise, for voice calls using distributed speech processing and speech recognition in an Internet Protocol network, the present invention provides a computer-readable recording medium on which is recorded a program for implementing, in a voice call system equipped with a processor: a call setup function in which an IP exchange recognizes, from a call request message transmitted by a calling IP terminal in the Session Initiation Protocol (SIP) message format, that the request is for a 'speech-recognition-based voice call service', determines the IP address of a voice processing server, and sets up a call to that voice processing server; a session establishment function in which the voice processing server establishes a session with the calling IP terminal upon receiving the call request message from the IP exchange; a speech recognition function in which the voice processing server decodes the 'feature extraction data obtained by extracting and encoding features from the speech uttered by the caller' received from the calling IP terminal over the Real-time Transport Protocol (RTP), performs speech recognition on it, and transmits the recognition result to the calling IP terminal; and a call connection and session release function of automatically connecting the call to the address of the called IP terminal corresponding to the recognition result after a predetermined time has elapsed, and releasing the established session once the call is received at the called IP terminal.

The present invention enables calls between IP terminals in an IP network by means of speech recognition. It aims to connect calls between users of IP terminals in an IP network using speech recognition, and to improve recognition speed and recognition rate.

To this end, the present invention places a preprocessing function for speech recognition in the IP terminal and transmits encoded, compressed data to the voice processing server, which reduces data loss and thereby improves recognition speed and recognition rate.

The above objects, features, and advantages will become more apparent from the following detailed description taken in conjunction with the accompanying drawings, so that a person of ordinary skill in the art to which the present invention pertains can readily practice its technical idea. In describing the present invention, detailed descriptions of related known technology are omitted where they would unnecessarily obscure the gist of the invention. A preferred embodiment of the present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 is a configuration diagram of an embodiment of an Internet Protocol-based voice call system according to the present invention. Calling and called wireless terminals on a wireless IP network are assumed here, but the invention is not limited to this case.

A voice call service using distributed speech processing and speech recognition in an Internet Protocol network according to the present invention requires wireless IP terminals (11, 12), access points (AP) (13, 14), an IP-PBX (15), and a voice processing server (16). An existing IP-PBX (15) can be used unchanged, while the wireless IP terminal (11) and the voice processing server (16) are provided with the functions according to the present invention.

As shown in FIG. 1, the voice call system using distributed speech processing and speech recognition in an Internet Protocol network according to the present invention includes an IP-PBX (15) and a voice processing server (16). The IP-PBX (15) recognizes, from a call request (INVITE) message transmitted by the calling IP terminal in the SIP message format, that the request is for a 'speech-recognition-based voice call service', determines the IP address of the voice processing server (16), and sets up a call to that voice processing server (16). The voice processing server (16) establishes a session with the calling wireless IP terminal (11) upon receiving the call request message from the IP-PBX (15), performs speech recognition on the 'feature extraction data obtained by extracting and encoding features from the speech uttered by the caller' received from the calling wireless IP terminal (11) over RTP, determines the address of the called wireless IP terminal (12), transmits that address together with the recognition result to the calling wireless IP terminal (11), and releases the established session once a call is established between the calling and called wireless IP terminals (11, 12).
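The routing decision made by the IP-PBX (15) can be sketched as follows. This is only an illustration: the patent text here does not say how the INVITE marks a speech-recognition-based call, so the `service` field, the `SPEECH_DIAL_SERVICE` value, the function name `route_invite`, and the example addresses are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical marker; the source does not specify how the INVITE
# identifies the speech-recognition-based voice call service.
SPEECH_DIAL_SERVICE = "speech-dial"


def route_invite(invite: Dict[str, str],
                 presence: Dict[str, str],
                 voice_server_addr: str) -> Optional[str]:
    """Return the IP address the IP-PBX should forward the INVITE to.

    invite            -- parsed INVITE fields (illustrative dict, not real SIP)
    presence          -- user name -> IP address, standing in for the presence server
    voice_server_addr -- address of the voice processing server (16)
    """
    if invite.get("service") == SPEECH_DIAL_SERVICE:
        # Speech-recognition-based call: hand it to the voice processing server.
        return voice_server_addr
    # Conventional call: look up the dialed user, None if unknown.
    return presence.get(invite.get("to", ""))
```

For example, with `presence = {"userB": "10.0.0.12"}`, a conventional INVITE to `userB` is routed to `10.0.0.12`, while an INVITE carrying the hypothetical service marker is routed to the voice processing server regardless of its `to` field.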

Alternatively, the voice processing server (16) establishes a session with the calling wireless IP terminal (11) upon receiving the call request (INVITE) message from the IP-PBX (15), performs speech recognition on the 'feature extraction data obtained by extracting and encoding features from the speech uttered by the caller' received from the calling wireless IP terminal (11) over RTP, and transmits the recognition result to the calling wireless IP terminal (11). After a predetermined time has elapsed (if there is no response from the calling wireless IP terminal (11) within that time, the recognition is regarded as correct), it automatically connects the call to the address of the called wireless IP terminal (12) corresponding to the recognition result, and releases the established session once the call is received.

To this end, when the speech recognition button (hot key) is pressed, the calling wireless IP terminal (11) sends a voice call (INVITE) message to the IP-PBX (15), detects the start and end points of the user's utterance to determine the speech interval, and extracts and encodes features from that interval. When the IP-PBX (15) sets up a call to the voice processing server (16), so that call setup between the calling wireless IP terminal (11) and the voice processing server (16) is complete, the terminal establishes a session with the voice processing server (16) and transmits the encoded feature extraction data to it.
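The start and end point detection performed on the terminal can be illustrated with a simple frame-energy threshold. The patent text does not name a detection algorithm, so the energy criterion, the function names, and the default `frame_len` and `threshold` values below are assumptions chosen only to keep the sketch small.

```python
from typing import List, Optional, Tuple


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each complete, non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_endpoints(samples: List[float],
                     frame_len: int = 4,
                     threshold: float = 0.01) -> Optional[Tuple[int, int]]:
    """Return (start, end) sample indices of the speech interval, or None.

    A frame counts as speech when its energy exceeds `threshold`; the interval
    runs from the first such frame to the end of the last one.
    """
    energies = frame_energies(samples, frame_len)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None  # only silence was captured
    return active[0] * frame_len, (active[-1] + 1) * frame_len


# Eight silent samples, eight loud samples, eight silent samples.
signal = [0.0] * 8 + [0.5, -0.5, 0.5, -0.5] * 2 + [0.0] * 8
print(detect_endpoints(signal))  # (8, 16)
```

Only the samples inside the detected interval would then go through feature extraction and encoding, so leading and trailing silence is never sent to the voice processing server.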

Alternatively, using a separate protocol with the voice processing server (16), the calling wireless IP terminal (11) sends a voice call (INVITE) message directly to the voice processing server (16), without going through the IP-PBX (15), when the speech recognition button (hot key) is pressed. It detects the start and end points of the user's utterance to determine the speech interval, extracts and encodes features from that interval, and, when a response (ACK) message arrives from the voice processing server (16), transmits the encoded feature extraction data to the voice processing server (16).

Next, the voice call service using distributed speech processing and speech recognition in an Internet Protocol network according to the present invention is described. In a hospital, for example, it is convenient for nurses and doctors to have a means of communication, like a walkie-talkie, that only they can use to talk to one another. The following looks at how the system operates when one user calls another.

First, user A presses the speech recognition button on the wireless IP terminal (11) and speaks the name of the person to be called. When the button is pressed, user A's wireless IP terminal (11) sets up a call with the IP-PBX (15) using SIP (Session Initiation Protocol), and the IP-PBX (15) sets up a call with the voice processing server (16). As a result, a call is set up between the wireless IP terminal (11) and the voice processing server (16).

For the wireless IP terminal 11 to connect to the IP-PBX 15 and the voice processing server 16, an IP network must be in place. The wireless IP terminal 11 connects to the IP network through an access point (AP) 13.

Once the call is set up, the user's speech is converted into a data stream through feature extraction and encoding and is delivered to the voice processing server 16 over RTP (Real Time Protocol). The voice processing server 16 decodes the received encoded feature extraction data to recover the feature extraction data and uses it to perform voice recognition. When a recognition result is available, the voice processing server 16 renders it as synthesized speech and delivers it over RTP to the wireless IP terminal 11 of user A, who placed the call.

If the recognized result is correct, user A simply waits. Once a predetermined time has passed with no response from the originating wireless IP terminal 11, the recognition is deemed correct and the voice processing server 16 forwards the call to user B, who corresponds to the recognition result. Call forwarding follows the protocols already used in existing IP networks.

As just described, the voice processing server 16 may notify user A's wireless IP terminal 11 of the recognition result and, if no response arrives from the originating wireless IP terminal 11 within the predetermined time, deem the recognition correct and forward the call to user B's wireless IP terminal 12. Alternatively, the voice processing server 16 may send the address of user B's wireless IP terminal 12 together with the recognition result to user A's wireless IP terminal 11, so that user A's wireless IP terminal 11 places a call to that address and establishes a call with user B's wireless IP terminal 12.

If the recognition result from the voice processing server 16 is wrong, user A says the name of the desired party (user B) again. The speech again undergoes feature extraction and encoding and is delivered to the voice processing server 16, and the above process is repeated.
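The confirmation rule described above (silence for a predetermined time counts as acceptance, a new utterance triggers re-recognition) can be sketched as follows. This is only an illustrative sketch: the function name, the return labels, and the 3-second default are assumptions, since the text specifies only "a predetermined time".

```python
from typing import Optional


def confirm_by_silence(seconds_until_user_spoke: Optional[float],
                       timeout: float = 3.0) -> str:
    """Decide what happens after the recognition result is played back.

    seconds_until_user_spoke is None when the user stayed silent.
    The timeout value is a hypothetical placeholder for the
    "predetermined time" mentioned in the description.
    """
    # No response within the timeout: recognition is deemed correct.
    if seconds_until_user_spoke is None or seconds_until_user_spoke >= timeout:
        return "forward_call"
    # The user spoke again in time: treat it as a correction and re-recognize.
    return "re_recognize"
```

A caller would pass `None` for a silent user and the elapsed seconds otherwise, for example `confirm_by_silence(1.2)` for a user who repeats the name after 1.2 seconds.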

Once the call between the two users is connected, it proceeds like an ordinary call over an IP network. The data carried over the IP network at this point is no longer a feature-extracted and encoded data stream but the encoded speech itself.

FIG. 3 is a flowchart of one embodiment of the voice call method using distributed speech processing and voice recognition in an Internet Protocol network according to the present invention. It shows the procedure by which users A and B communicate using the voice recognition function.

First, when user A wants to call user B, user A presses the voice recognition button on his or her wireless IP terminal 11 and says user B's name. When the button is pressed, user A's wireless IP terminal 11 sends the IP-PBX 15 a voice call request (INVITE) message in SIP format indicating that it wants to connect to the voice processing server 16 (301). Meanwhile, user A's wireless IP terminal 11 detects the start and end points of the user's utterance to determine the speech segment, then extracts features from that segment and encodes them.
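The description does not say how the terminal detects the start and end points of the utterance. One common approach is short-time energy thresholding, sketched below purely as an illustration. The function name, the 160-sample frame length, and the threshold value are assumptions and are not taken from the patent.

```python
from typing import List, Optional, Tuple


def detect_speech_segment(samples: List[float],
                          frame_len: int = 160,
                          threshold: float = 0.01) -> Optional[Tuple[int, int]]:
    """Return (start, end) sample indices of the speech segment, or None.

    Assumed technique: a frame counts as speech when its mean energy
    exceeds a fixed threshold. The segment runs from the first such
    frame to the end of the last one.
    """
    active_starts = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy > threshold:
            active_starts.append(i)
    if not active_starts:
        return None
    return active_starts[0], active_starts[-1] + frame_len
```

Only the samples inside the returned range would then be passed to feature extraction, which is what keeps the terminal from encoding silence.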

The IP-PBX 15 then determines from the INVITE message that the terminal wants to call the voice processing server 16, obtains the IP address of the voice processing server using the presence server within the IP-PBX 15, sends the INVITE message to that voice processing server 16 (302), and sends a response (100 Trying) message to user A's wireless IP terminal 11 (303).

When the INVITE message arrives, the voice processing server 16 sends a response (180 Ringing) message and a final response (200 OK) message along the reverse path (voice processing server 16 → IP-PBX → user A's IP terminal 11) (304 to 307). User A's wireless IP terminal 11 then sends an acknowledgment (ACK) message directly to the voice processing server 16 to confirm receipt of the response (200 OK) message (308), which establishes a session between user A's wireless IP terminal 11 and the voice processing server 16.
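Seen from the originating terminal, steps 301 to 308 amount to a small state machine: after sending INVITE it receives 100 Trying, 180 Ringing, and 200 OK, and answers the 200 OK with an ACK. The sketch below illustrates that logic only; the state names and the handling of failure responses are assumptions, and no real SIP stack is involved.

```python
from typing import List, Tuple


def originating_terminal(responses: List[int]) -> Tuple[str, List[str]]:
    """Walk the terminal through the responses to its INVITE (step 301).

    Returns the final state and the list of requests the terminal sent
    in reaction. State names are hypothetical labels.
    """
    state = "calling"          # INVITE has already been sent
    sent: List[str] = []
    for status in responses:
        if status == 100:      # 100 Trying from the IP-PBX (303)
            state = "proceeding"
        elif status == 180:    # 180 Ringing relayed from the server
            state = "ringing"
        elif status == 200:    # final 200 OK: acknowledge directly (308)
            sent.append("ACK")
            state = "established"
            break
        elif status >= 300:    # any failure response ends the attempt
            state = "failed"
            break
    return state, sent
```

For the flow in the text, `originating_terminal([100, 180, 200])` ends in the established state with a single ACK sent.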

Once the session is established, user A's wireless IP terminal 11 delivers the encoded feature extraction data to the voice processing server 16 over RTP (309). The voice processing server 16 receives and decodes the encoded feature extraction data and performs voice recognition on the decoded feature extraction data. When a recognition result is available, the voice processing server uses a synthesizer to inform user A's wireless IP terminal 11 of the result (310). Here, the user corresponding to the recognition result is assumed to be B.
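The encoding applied to the feature extraction data is not specified in the description. As a rough illustration of the encode-on-terminal, decode-on-server round trip, the sketch below packs feature frames into a compact byte payload using 16-bit fixed-point values. The header layout, the scale factor, and the function names are all assumptions made for this example, and the RTP transport itself is left out.

```python
import struct
from typing import List

SCALE = 256  # hypothetical fixed-point scale (8 fractional bits)


def encode_features(frames: List[List[float]]) -> bytes:
    """Terminal side: quantize feature frames to int16 and pack them."""
    dim = len(frames[0]) if frames else 0
    payload = struct.pack("!HH", len(frames), dim)  # frame count, dimension
    for frame in frames:
        quantized = [max(-32768, min(32767, round(v * SCALE))) for v in frame]
        payload += struct.pack(f"!{dim}h", *quantized)
    return payload


def decode_features(payload: bytes) -> List[List[float]]:
    """Server side: unpack the payload back into float feature frames."""
    count, dim = struct.unpack_from("!HH", payload, 0)
    frames = []
    offset = 4
    for _ in range(count):
        values = struct.unpack_from(f"!{dim}h", payload, offset)
        frames.append([v / SCALE for v in values])
        offset += 2 * dim
    return frames
```

Two bytes per coefficient is far less than the raw audio for the same frame, which is the kind of reduction the description relies on when it sends features instead of speech.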

Two approaches are possible at this point. In the first, the voice processing server 16 notifies user A's wireless IP terminal 11 of the recognition result and, if no response arrives from the originating wireless IP terminal 11 within a predetermined time, deems the recognition correct and automatically forwards the call to user B's wireless IP terminal 12, establishing a call with user B's wireless IP terminal 12.

In the second, the voice processing server 16 sends the address of user B's wireless IP terminal 12 together with the recognition result to user A's wireless IP terminal 11, and user A's wireless IP terminal 11 automatically places a call to that address to establish a call with user B's wireless IP terminal 12. The second approach is described in more detail below.

First, the voice processing server 16 sends an INVITE signal to user A to place the call on hold (311), after which user A's wireless IP terminal 11 generates no further packets to transmit. In response to the INVITE signal, user A's wireless IP terminal 11 sends a response (200 OK) message to the voice processing server 16 (312), and the voice processing server 16 sends an acknowledgment (ACK) message to user A's wireless IP terminal 11 (313).

The voice processing server 16 then sends user A's wireless IP terminal 11 a REFER message containing the address of user B's wireless IP terminal 12 (314). In response, user A's wireless IP terminal 11 sends a 202 Accepted message to the voice processing server 16 (315) and then a NOTIFY (100 Trying) message indicating that it is attempting to connect to user B's wireless IP terminal 12 (316). The voice processing server 16 replies to user A's wireless IP terminal 11 with an OK message (317).

Next, user A's wireless IP terminal 11 sends an INVITE message to user B's wireless IP terminal 12 (318). When user B's wireless IP terminal 12 returns an OK message (319), a call between the two terminals becomes possible, so user A's wireless IP terminal 11 sends an ACK message to user B's wireless IP terminal 12 (320) and sends a NOTIFY (200 OK) message to the voice processing server 16 to report that the call transfer succeeded (321).

In response, the voice processing server 16 sends user A's wireless IP terminal 11 an OK message acknowledging the NOTIFY message (322) and then a BYE message to end its session with user A's wireless IP terminal 11 (323). User A's wireless IP terminal 11 sends an OK message to the voice processing server 16 (324) and begins the call with user B's wireless IP terminal 12 (325).
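Steps 311 to 324 can be summarized as a message ladder. The sketch below simply records the sequence given in the text as data and prints it; the short labels "A", "B", and "S" (user A's terminal 11, user B's terminal 12, voice processing server 16) and the helper names are introduced here for illustration only.

```python
from typing import List, Optional, Tuple

# (step, sender, receiver, message) as listed in the description.
TRANSFER_LADDER: List[Tuple[int, str, str, str]] = [
    (311, "S", "A", "INVITE (hold)"),
    (312, "A", "S", "200 OK"),
    (313, "S", "A", "ACK"),
    (314, "S", "A", "REFER (address of B)"),
    (315, "A", "S", "202 Accepted"),
    (316, "A", "S", "NOTIFY (100 Trying)"),
    (317, "S", "A", "OK"),
    (318, "A", "B", "INVITE"),
    (319, "B", "A", "OK"),
    (320, "A", "B", "ACK"),
    (321, "A", "S", "NOTIFY (200 OK)"),
    (322, "S", "A", "OK"),
    (323, "S", "A", "BYE"),
    (324, "A", "S", "OK"),
]


def step_of(message: str, sender: str, receiver: str) -> Optional[int]:
    """Return the first step in which sender sends the message to receiver."""
    for step, src, dst, msg in TRANSFER_LADDER:
        if (src, dst) == (sender, receiver) and msg.startswith(message):
            return step
    return None


def format_ladder() -> str:
    """Render the ladder one message per line."""
    return "\n".join(f"{step}: {src} -> {dst}  {msg}"
                     for step, src, dst, msg in TRANSFER_LADDER)


if __name__ == "__main__":
    print(format_ladder())
```

Laid out this way, the ordering constraint is easy to see: the server releases its session with terminal 11 (BYE, 323) only after terminal 11 has acknowledged the new session with terminal 12 (ACK, 320) and reported success (NOTIFY, 321).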

When the conversation is over, one wireless IP terminal (11 or 12) sends a call termination (BYE) message to the other wireless IP terminal (12 or 11) (326), and the other terminal returns a response (200 OK) message (327), which ends the call.

For the process described so far, the wireless IP terminal 11 needs two functions: automatically sending an INVITE message to the voice processing server 16 when the voice recognition button (hot key) is pressed, and detecting the start and end points of the user's utterance to determine the speech segment, then extracting features from that segment and encoding them.

The voice processing server 16, for its part, must be able to accept feature extraction data once it is connected to user A's wireless IP terminal 11, perform voice recognition, and forward the call according to the result.

The description so far assumes that the wireless IP terminal 11 connects to the voice processing server 16 through the IP-PBX 15 using the SIP protocol. Alternatively, the wireless IP terminal 11 may connect to the voice processing server 16 directly. This is possible by defining a separate protocol between the wireless IP terminal 11 and the voice processing server 16, in which case the voice processing server 16 does not need any SIP-related program components for call setup.

The process in which the wireless IP terminal 11 connects directly to the voice processing server 16 is described below with reference to FIG. 4.

First, when user A wants to call user B, user A presses the voice recognition button (hot key) and says user B's name. User A's wireless IP terminal 11 then sends the feature extraction data encoded from the user's speech to the voice processing server 16, which accepts the data through a port it has opened in advance (401).

The voice processing server 16 then receives and decodes the encoded feature extraction data and performs voice recognition on the decoded feature extraction data. When a recognition result is available, the voice processing server 16 passes the recognition result, rendered by a synthesizer, together with the corresponding connection address to user A's wireless IP terminal 11 (402).
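The separately defined protocol between terminal and server is left open by the description. The sketch below shows one hypothetical shape for the reply in step 402: a small JSON message carrying the recognized name and the connection address looked up in a directory. The field names, the directory contents, and the example address are invented for illustration, and the reply carries the result as text rather than as the synthesized audio described above.

```python
import json
from typing import Dict, Optional, Tuple

# Hypothetical directory mapping recognized names to connection addresses.
DIRECTORY: Dict[str, str] = {"user b": "sip:userb@example.com"}


def build_reply(recognized_name: str,
                directory: Optional[Dict[str, str]] = None) -> bytes:
    """Server side: bundle the recognition result with the address to call."""
    table = DIRECTORY if directory is None else directory
    reply = {
        "result": recognized_name,
        "address": table.get(recognized_name.lower()),  # None if unknown
    }
    return json.dumps(reply).encode("utf-8")


def parse_reply(data: bytes) -> Tuple[str, Optional[str]]:
    """Terminal side: recover the recognition result and the address."""
    reply = json.loads(data.decode("utf-8"))
    return reply["result"], reply["address"]
```

The terminal would present the result to the user and, if it is accepted, use the returned address as the target of its subsequent call request.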

User A's wireless IP terminal 11 then plays the recognition result to the user. If there is no response for a certain time, it judges the recognition to be correct and connects the call to the user corresponding to the recognition result.

Here, the user corresponding to the recognition result is assumed to be B.

If the recognition result is wrong, user A presses the voice recognition button (hot key) again and repeats the name, and the above process is repeated so that a new recognition result is received from the voice processing server 16.

If the recognition result is judged to be correct, then after the predetermined time has elapsed user A's wireless IP terminal 11 sends the IP-PBX 15 an INVITE message in SIP message format that includes the connection address taken from the data received from the voice processing server 16 (403).
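As an illustration of step 403, the sketch below assembles a minimal SIP INVITE request whose target is the connection address returned by the voice processing server. The header set follows the general SIP request layout, but every concrete value (URIs, host, Call-ID, branch, tag) is a placeholder, and a real request would normally also carry a session description body, which is omitted here.

```python
def build_invite(target_uri: str, caller_uri: str, caller_host: str,
                 call_id: str, branch: str, tag: str) -> str:
    """Return a minimal, body-less SIP INVITE as text.

    All arguments are supplied by the caller; nothing here is taken
    from the patent beyond the idea of addressing the INVITE to the
    address received from the voice processing server.
    """
    lines = [
        f"INVITE {target_uri} SIP/2.0",
        f"Via: SIP/2.0/UDP {caller_host};branch={branch}",
        "Max-Forwards: 70",
        f"To: <{target_uri}>",
        f"From: <{caller_uri}>;tag={tag}",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <{caller_uri}>",
        "Content-Length: 0",
    ]
    # Header lines end with CRLF; an empty line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    print(build_invite("sip:userb@example.com", "sip:usera@example.com",
                       "terminal11.example.com", "a84b4c76e66710",
                       "z9hG4bK776asdhds", "1928301774"))
```

The IP-PBX would then resolve the target address and relay the request, as the following steps describe.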

The IP-PBX 15 then determines from the INVITE message that user A wants to call user B, obtains user B's IP address using the presence server within the IP-PBX 15, sends the INVITE message to user B's IP terminal 12 (404), and sends a response (100 Trying) message to user A's IP terminal 11 (405).

Next, when the INVITE message arrives at user B's IP terminal 12 (404), the IP terminal 12 rings and sends a response (180 Ringing) message along the reverse path (user B's IP terminal 12 → IP-PBX → user A's IP terminal 11) (406, 407).

When user B picks up the handset, the IP terminal 12 sends a final response (200 OK) message along the reverse path (user B's IP terminal 12 → IP-PBX → user A's IP terminal 11) (408, 409). User A's IP terminal 11 then sends an ACK message directly to user B's IP terminal 12 to confirm receipt of the final response (200 OK) message (410), which establishes the session.

Once the session is established in this way, data is exchanged directly between user A's IP terminal 11 and user B's IP terminal 12 using RTP (Real Time Protocol), and the two users can talk (411).

When the conversation is over, one wireless IP terminal (11 or 12) sends a call termination (BYE) message to the other wireless IP terminal (12 or 11) (412), and the other terminal returns a response (200 OK) message (413), which ends the call.

The method of the present invention as described above may be implemented as a program and stored in computer-readable form on a recording medium (CD-ROM, RAM, ROM, floppy disk, hard disk, magneto-optical disk, etc.). Because a person of ordinary skill in the art to which the invention pertains can readily carry this out, it is not described in further detail.

The present invention described above is not limited to the foregoing embodiments and the accompanying drawings, because a person of ordinary skill in the art to which the invention pertains can make various substitutions, modifications, and changes without departing from its technical spirit.

As described above, the present invention makes it possible to reach the other party over an IP network by voice recognition, without entering a telephone number. In particular, during voice recognition the terminal only extracts the speech features, encodes them, and sends them to the voice processing server, which reduces the computation required at the terminal. Moreover, because the feature extraction data is encoded and compressed, the data loss rate during delivery to the voice processing server is reduced and the delivery speed is increased, which improves both the voice recognition rate and the recognition speed.

The present invention also improves service convenience by letting the user reach the desired party easily using voice alone.

Furthermore, when the voice processing server does not use SIP, the present invention improves recognition speed by allowing the terminal to connect directly to the voice processing server in the initial stage, when voice recognition is needed.

Claims

A voice call system using distributed speech processing and voice recognition in an Internet Protocol (IP) network, comprising:

an IP exchange that recognizes, from a call request message transmitted by an originating IP terminal in the SIP (Session Initiation Protocol) message format, that the request is for a 'voice recognition based voice call service', determines the IP address of a voice processing server, and sets up a call to that voice processing server; and

the voice processing server, which establishes a session with the originating IP terminal upon receiving the call request message from the IP exchange, decodes 'feature extraction data obtained by extracting and encoding features from speech uttered by the caller' received from the originating IP terminal via RTP (Real Time Protocol), performs voice recognition to determine the address of a destination IP terminal, transmits that address together with the recognition result to the originating IP terminal, and releases the established session when a call is established between the originating and destination IP terminals.

In a voice call system in an Internet Protocol (IP) network,

a voice processing server that establishes a session with the originating IP terminal upon receiving a call request message from the IP exchange, decodes 'feature extraction data obtained by extracting and encoding features from speech uttered by the caller' received from the originating IP terminal via RTP (Real Time Protocol), performs voice recognition, transmits the recognition result to the originating IP terminal, automatically connects the call to the address of the destination IP terminal corresponding to the recognition result after a predetermined time has elapsed, and releases the established session when the call is received at the destination IP terminal.

The system according to claim 1 or 2,

wherein the originating IP terminal,

when the voice recognition button is pressed, sends an INVITE message to the IP exchange, detects the start and end points of the user's utterance to determine the speech segment, extracts features from that segment and encodes them, establishes a session with the voice processing server once the IP exchange has set up the call to the voice processing server so that a call is set up between the originating IP terminal and the voice processing server, and transmits the encoded feature extraction data to the voice processing server.

The system according to claim 1 or 2,

wherein the originating IP terminal,

using a protocol separate from that of the voice processing server, when the voice recognition button is pressed, sends an INVITE message directly to the voice processing server without passing through the IP exchange, detects the start and end points of the user's utterance to determine the speech segment, extracts features from that segment and encodes them, and, when an ACK message is received from the voice processing server, transmits the encoded feature extraction data to the voice processing server.

The system according to claim 1 or 2,

wherein the originating IP terminal,

when the user presses the voice recognition button and speaks, transmits the encoded feature extraction data to the voice processing server and receives from the voice processing server the synthesized recognition result and the corresponding connection address; if the recognition result is correct, establishes a call with the destination IP terminal through the IP exchange; and if the recognition result is incorrect, lets the user press the voice recognition button and speak again, repeating the process of receiving a recognition result.

A voice call method using distributed speech processing and voice recognition in an Internet Protocol (IP) network, comprising:

a call setup step in which an IP exchange recognizes, from a call request message transmitted by an originating IP terminal in the SIP (Session Initiation Protocol) message format, that the request is for a 'voice recognition based voice call service', determines the IP address of a voice processing server, and sets up a call to that voice processing server;

a session establishment step in which the voice processing server establishes a session with the originating IP terminal upon receiving the call request message from the IP exchange;

a voice recognition step in which the voice processing server decodes 'feature extraction data obtained by extracting and encoding features from speech uttered by the caller' received from the originating IP terminal via RTP (Real Time Protocol), performs voice recognition to determine the address of a destination IP terminal, and transmits that address together with the recognition result to the originating IP terminal; and

a session release step of releasing the established session when a call is established between the originating and destination IP terminals.

In the voice call method in the Internet Protocol (IP) network,

a voice recognition step in which the voice processing server decodes 'feature extraction data obtained by extracting and encoding features from speech uttered by the caller' received from the originating IP terminal via RTP (Real Time Protocol), performs voice recognition, and transmits the recognition result to the originating IP terminal; and

a call connection and session release step of automatically connecting the call, after a predetermined time has elapsed, to the address of the destination IP terminal corresponding to the recognition result, and releasing the established session when the call is received at the destination IP terminal.

The method according to claim 6 or 7,

wherein the call setup process at the originating IP terminal comprises:

when the voice recognition button is pressed, sending an INVITE message to the IP exchange, detecting the start and end points of the user's utterance to determine the speech segment, and extracting features from that segment and encoding them;

establishing a session with the voice processing server once the IP exchange has set up the call to the voice processing server so that a call is set up between the originating IP terminal and the voice processing server; and

transmitting the encoded feature extraction data to the voice processing server.

The method according to claim 6 or 7,

wherein, in the call setup process at the voice processing server,

when a session is established with the originating IP terminal, the voice processing server receives the feature extraction data and performs voice recognition, renders the recognition result with a synthesizer and transmits it to the originating IP terminal, and forwards the call to the destination IP terminal corresponding to the recognition result.

In the voice call method in the Internet Protocol (IP) network,

when the voice recognition button is pressed, sending, by the originating IP terminal using a protocol separate from that of the voice processing server, a voice call request (INVITE) message directly to the voice processing server without passing through the IP exchange, detecting the start and end points of the user's utterance to determine the speech segment, and extracting features from that segment and encoding them;

when a response message to the request is received from the voice processing server, transmitting, by the originating IP terminal, the encoded feature extraction data to the voice processing server;

receiving, by the originating IP terminal, a recognition result and a connection address from the voice processing server, and transmitting to the IP exchange a voice call request message in the SIP (Session Initiation Protocol) message format that includes the connection address;

when the call is received, via the IP exchange, at the destination IP terminal corresponding to the connection address, establishing, by the originating IP terminal, a session with the destination IP terminal and transmitting voice data using RTP (Real Time Protocol); and

a session release step of releasing the established session when the call ends.

The method of claim 10,

wherein the voice processing server,

when the feature extraction data is received from the originating IP terminal, performs voice recognition and transmits the recognition result, rendered by a synthesizer, together with the corresponding connection address to the originating IP terminal.

A computer-readable recording medium having recorded thereon a program for realizing, in a voice call system having a processor for distributed speech processing and voice recognition in an Internet Protocol network:

a call setup function in which an IP exchange recognizes, from a call request message transmitted by an originating IP terminal in the SIP (Session Initiation Protocol) message format, that the request is for a 'voice recognition based voice call service', determines the IP address of a voice processing server, and sets up a call to that voice processing server;

a session establishment function in which the voice processing server establishes a session with the originating IP terminal upon receiving the call request message from the IP exchange;

a voice recognition function in which the voice processing server decodes 'feature extraction data obtained by extracting and encoding features from speech uttered by the caller' received from the originating IP terminal via RTP (Real Time Protocol), performs voice recognition to determine the address of a destination IP terminal, and transmits that address together with the recognition result to the originating IP terminal; and

a session release function of releasing the established session when a call is established between the originating and destination IP terminals.

a voice recognition function in which the voice processing server decodes 'feature extraction data obtained by extracting and encoding features from speech uttered by the caller' received from the originating IP terminal via RTP (Real Time Protocol), performs voice recognition, and transmits the recognition result to the originating IP terminal; and

a call connection and session release function of automatically connecting the call, after a predetermined time has elapsed, to the address of the destination IP terminal corresponding to the recognition result, and releasing the established session when the call is received at the destination IP terminal.