KR20110070386A

KR20110070386A - The system and method for automatically making image ars

Info

Publication number: KR20110070386A
Application number: KR1020090127193A
Authority: KR
Inventors: 신영준; 안희중
Original assignee: 주식회사 케이티
Priority date: 2009-12-18
Filing date: 2009-12-18
Publication date: 2011-06-24

Abstract

PURPOSE: A system and method for automatically producing a video ARS are provided to offer visualized menus or information to a customer. CONSTITUTION: A voice recognition unit (110) converts the voice of a voice ARS (Automatic Response System) into text. A keyword extraction unit (120) extracts preset keywords from the text. When the text contains a keyword identical to a preset keyword stored in a keyword DB (125), a matching unit (140) matches the image stored for that keyword with the keyword. An alignment unit (150) arranges the matched images in the time order of the recognized voice. A display unit (160) displays the arranged images.

Description

Image ARS automatic production system and its method {The system and method for automatically making image ARS}

The present invention relates to a system and method for automatically producing a video ARS, and more particularly to a system and method that make it easy to replace an existing voice ARS system with a video ARS system.

Recently, with the rapid development of computer, electronics, and communication technologies, a variety of mobile communication services have been offered over mobile communication networks.

Accordingly, mobile service subscribers can receive news, weather, sports, stock, exchange-rate, traffic, and other information in various forms such as text, voice, still images, and video through wireless Internet services.

With these advances in mobile communication technology, the services provided by mobile communication systems are evolving from voice services into multimedia communication services that also carry circuit data and packet data.

Mobile communication systems have evolved through the first-generation analog AMPS (Advanced Mobile Phone System) and the second-generation cellular and PCS (Personal Communication Service) systems. Recently, IMT-2000 (International Mobile Telecommunication 2000; for example, CDMA 2000 1X, EV-DO, and WCDMA), the third-generation mobile communication system being standardized by ITU-R, has been commercialized.

More specifically, mobile communication systems have evolved into the GSM, GPRS, and WCDMA networks standardized by 3GPP (3rd Generation Partnership Project), and into IS-95A, IS-95B, CDMA2000 1x, and CDMA 1x-EVDO standardized by 3GPP2 (3rd Generation Partnership Project 2).

Among these wireless technologies, within the IMT-2000 network the synchronous IMT-2000 is called CDMA2000 1x-EVDO and the asynchronous IMT-2000 is called WCDMA.

IMT-2000 (CDMA 2000 1x, EV-DO, WCDMA, and so on) is a service that uses the IS-95C network, which evolved from the existing IS-95A and IS-95B networks, to provide wireless Internet at 144 Kbps, far faster than the 14.1 Kbps or 56 Kbps data rates that the IS-95A and IS-95B networks can support.

In particular, the IMT-2000 service not only improves the quality of existing voice and WAP services but also allows various multimedia services to be provided more efficiently.

WCDMA is the asynchronous mobile communication system among the IMT-2000 systems. Its radio access between base station and mobile terminal uses CDMA, but its network technology is based on that of GSM (Global System for Mobile communication).

WCDMA does not need GPS (Global Positioning System) to synchronize all base stations, and it supports international roaming. It widens the frequency bandwidth to 5 MHz and offers a data rate of 2 Mbps, which makes it suitable for high-speed data transmission. It also provides diversity according to reverse-link call quality, which the existing IS-95 and GSM systems could not, so call and data performance remain relatively good even in areas with poor radio conditions.

However, because WCDMA is not backward compatible with the existing CDMA and GSM systems, a new mobile network must be built and terminals that support WCDMA must be used.

For video call service, the WCDMA network provides the service on the basis of the wired circuit network: 3GPP partially modified the ITU-T H.324 protocol specification to suit wireless circuit networks and standardized it as the 3G-324M protocol specification. In the CDMA 2000 1x-EVDO network, by contrast, the ITU-T H.323 protocol specification used in wired packet networks is adapted to the wireless packet network without a separate standard.

Meanwhile, SIP (Session Initiation Protocol)-based systems provide a video telephony service that integrates the synchronous and asynchronous networks. Web phones are attracting attention as a next-generation technology because they provide video telephony over the existing packet data network, which keeps costs low, and because they can be combined with computer technology to create new services.

Since video call services were introduced, technical progress and promotion have pushed the number of video call subscribers past 10 million. Mobile carriers now offer a variety of services and are developing more video-call-related services.

The main reason for this growth is that subscribers can talk while seeing the other party instead of using voice alone. A variety of supplementary services that set video calls apart from voice calls have also played a large role.

One example of such a differentiated supplementary service is the video ARS service, which has many advantages over the voice ARS.

In general, a voice automatic response system (Automatic Response System, ARS) stores information as voice recordings. When a user calls the system, it explains by voice how to search for the needed information and, once the information is found, plays it back as speech.

A telephone information service using the voice ARS (hereinafter, voice ARS service) installs a voice mailbox device in a telephone exchange and loads it with various information; information users call in and listen to the information they want.

The voice ARS service was devised to reduce the number of agents and to handle guidance requests from many customers efficiently.

Voice ARS services are used in a wide variety of fields, including finance, securities, transportation, tourism, sports, performances, health, fortune-telling, weather, customer centers, medical consultation, and legal consultation. Most of them charge an information fee proportional to usage time in addition to the call charge.

With the existing voice ARS service, a user who wants a particular service has to call a paid ARS number and listen to voice guidance; the service is voice-centered.

However, with an existing voice ARS service that delivers mainly audio information, the amount of information a user can take in is inevitably limited compared with a service that provides visual information. From the user's point of view, this means using the service for a long time to obtain the needed information.

A typical ARS service is also configured to send a prepared guidance message to the customer's terminal when the customer requests a call connection, and to present submenus according to the number the customer selects.

However, when there are many submenus, or each menu contains so much information that listening takes a long time, the customer must keep listening to voice messages for menus they do not need in order to pick the one they want. A customer who selects the wrong number must go back to the upper menu or start over from the beginning.

In other words, to deliver the actual service, the existing voice ARS service necessarily includes voice guidance, such as service announcements, that the user does not need. This guidance is an inefficiency that wastes the user's call charges.

Moreover, because a typical ARS service delivers the final information (or content) only by voice, the customer may mishear or miss it, and it is not easy to write the information down while listening.

For these reasons, the existing voice ARS service is extremely limited in terms of service diversification, owing to the inefficiency and the limited amount of information that result from voice-centered delivery.

Accordingly, there has been a growing need for a video ARS system that provides visualized menus or information (or content), so that the customer can see the final information (or content) selected through the menu.

FIG. 1 is a schematic configuration diagram of a video ARS system.

As shown in FIG. 1, the system comprises a user terminal (1) that requests a call connection to use the ARS service and receives the video ARS service, an exchange (2) that performs call connection control in response to the call connection request made by the user terminal (1) through the mobile communication network, and an ARS service server (3) that provides the video ARS service when the user terminal (1) connects through the exchange (2).

The user terminal (1) requests the exchange (2) to connect a call to the ARS service server (3) in order to receive ARS services in fields such as finance, securities, transportation, tourism, sports, performances, health, fortune-telling, weather, customer centers, medical consultation, or legal consultation.

In response to the call connection request made by the user terminal (1) through the mobile communication network, the exchange (2) connects the call to the ARS service server (3) when the user terminal (1) supports the video ARS service.

The ARS service server (3) decides whether to provide the video ARS service on the basis of the terminal type information of the user terminal (1) connected through the exchange (2), and provides the video ARS service when the terminal supports it.
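The server-side decision described above can be pictured with a minimal Python sketch. The terminal-type labels, the `CallRequest` structure, and the fallback to a voice ARS for unsupported terminals are assumptions made for illustration; the source only states that the server decides from terminal type information.

```python
from dataclasses import dataclass

# Hypothetical terminal-type labels; the source does not enumerate them.
VIDEO_CAPABLE_TYPES = {"wcdma_video", "evdo_video"}


@dataclass
class CallRequest:
    caller: str         # subscriber identifier (illustrative)
    terminal_type: str  # terminal type information seen by the ARS server


def select_ars_service(request: CallRequest) -> str:
    """Return "video" when the terminal type supports video ARS.

    Falling back to "voice" for other terminals is an assumption of this
    sketch, not something the source specifies.
    """
    if request.terminal_type in VIDEO_CAPABLE_TYPES:
        return "video"
    return "voice"
```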

As described above, demand is growing for video ARS services that can efficiently deliver visualized menus or information (or content).

However, voice ARS services are already deployed in a very wide range of fields, so to offer video ARS services the existing voice ARS operators must convert their voice ARS systems into video ARS systems.

This conversion requires replacement work, and a technology that makes such system replacement easy is needed.

The present invention was devised to solve the above problems. Its object is to provide a system and method for automatically producing a video ARS system that can provide visualized menus or information (or content), so that a customer can see the menu of the ARS system or the final information (or content) selected through the menu.

Another object of the present invention is to provide a system and method for automatically producing a video ARS that make it easy for a user of an existing voice ARS system to replace it with a video ARS system.

To achieve these objects, the method for automatically producing a video ARS according to the present invention comprises: playing back a voice ARS; recognizing the voice of the voice ARS; extracting a plurality of preset keywords from the recognized voice; matching the extracted keywords with pre-stored images corresponding to those keywords; and arranging the matched images in order.

Preferably, the method further comprises displaying the arranged images after the step of arranging the matched images in order.

Preferably, the method further comprises, after the displaying step, replacing a particular displayed image with another pre-stored image.

Meanwhile, the system for automatically producing a video ARS according to the present invention comprises: a voice recognition unit that recognizes voice from a voice ARS; a keyword extraction unit that extracts preset keywords from the recognized voice; a matching unit that matches pre-stored images to the extracted keywords; and an alignment unit that arranges the matched images in order.

Preferably, the system further comprises an image DB in which images corresponding to the preset keywords are stored.

The image DB may store one image per keyword, or a plurality of images per keyword.

The images in the image DB may be tagged with their corresponding preset keywords.

The keyword extraction unit may further include a keyword DB in which particular keywords are set and stored in advance.

Preferably, the keywords are set in advance to correspond to the menus of the voice ARS.

Furthermore, the system according to the present invention may further include a display unit that displays the arranged images.

According to the system and method of the present invention, a video ARS system that provides visualized menus or information (or content) can be produced automatically, so that a customer can see the menu of the ARS system or the final information (or content) selected through the menu.

The present invention also has the advantage of making it easy to replace an existing voice ARS system with a video ARS system.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. The embodiments described below are provided so that a person of ordinary skill in the art can understand the invention more easily, and the scope of the invention is not limited to them.

The system for automatically producing a video ARS according to the present invention is described in detail below with reference to FIG. 2.

FIG. 2 is a block diagram schematically showing the configuration of the system for automatically producing a video ARS according to the present invention.

Referring to FIG. 2, the system comprises a voice recognition unit (110), a keyword extraction unit (120), an image DB (130), a matching unit (140), and an alignment unit (150).

The voice recognition unit (110) plays back the existing voice ARS, performs voice recognition on its audio, and converts the recognized voice into text.

Here, the "existing voice ARS" refers to any voice ARS already in use in fields such as finance, securities, transportation, tourism, sports, performances, health, fortune-telling, weather, customer centers, medical consultation, or legal consultation.

The keyword extraction unit (120) extracts preset keywords from the recognized content that has been converted into text.

The "preset keywords" are keywords set from the menus that a conventional voice ARS system announces by voice; they are stored and managed in advance in the keyword DB (125).

Voice ARS systems are currently used in finance, securities, transportation, tourism, sports, performances, health, fortune-telling, weather, customer centers, medical consultation, legal consultation, and similar fields to reduce the number of agents and to handle guidance requests from many customers efficiently.

The voice content of a voice ARS therefore differs depending on the user (operator) of the voice ARS system, and so do the menus announced through it.

The keywords are the core words of the various menus previously provided through the voice ARS; each menu may have one core word or several.

The keyword extraction unit (120) compares the words in the ARS content converted into text by the voice recognition unit (110) with the keywords stored in the keyword DB (125), and extracts only those that are stored in the keyword DB (125).
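The comparison against the keyword DB can be sketched in Python as a simple membership filter. This is only an illustration under stated assumptions: the function name `extract_keywords` is hypothetical, the keyword DB is modeled as an in-memory set of lowercase strings, and splitting on word characters is a simplification (Korean text would normally need morphological analysis, which the source does not discuss).

```python
import re


def extract_keywords(text: str, keyword_db: set[str]) -> list[str]:
    """Keep only the words of the recognized text that are in the keyword DB.

    Words are returned in the order they occur, so the result already
    follows the order of the speech. Keywords are assumed to be lowercase.
    """
    words = re.findall(r"\w+", text.lower())
    return [word for word in words if word in keyword_db]
```

For example, with a keyword DB of `{"charge", "balance"}`, the announcement "For charge inquiries press 1. For balance press 2." yields the two keywords `charge` and `balance`, in that order.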

In this embodiment the keyword DB (125) is described as part of the keyword extraction unit (120), but the keyword DB (125) may also be configured as a separate DB.

Various images corresponding to the keywords are produced in advance and stored in the image DB (130).

An image corresponding to a keyword may be a picture (photograph or drawing), a still image, or a video that suits the concept of the keyword or the impression the keyword evokes.

For example, if a voice guidance menu concerns "mobile phone charges" and the keyword "charge" has been preset and stored for it, the image corresponding to "charge" may be a picture (photograph or drawing), still image, or video related to charges.

For instance, a picture in the shape of "money" may serve as the image corresponding to the keyword "charge" and be stored in the image DB (130).

The image DB (130) may store one corresponding image per keyword, or a plurality of images per keyword.

When an image corresponding to a keyword is stored, the keyword may be tagged to that image.

Tagging is used mainly on blogs and web pages: a site administrator assigns keywords to the site's images or text by related topic or category. Put simply, it means attaching a tag, that is, a keyword that represents the content of a piece of web content.

That is, the image DB (130) may store each keyword in combination with the unique ID of its corresponding image, together with various tagging information.
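One way to picture an image DB that stores keywords together with unique image IDs is the following Python sketch. The class and field names (`ImageRecord`, `ImageDB`, `path`, `tags`) are hypothetical, and an in-memory dictionary stands in for whatever storage the actual system would use.

```python
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    image_id: str                                # unique ID of the stored image
    path: str                                    # picture, still image, or video file
    tags: set[str] = field(default_factory=set)  # keywords tagged to this image


class ImageDB:
    """In-memory stand-in for the image DB (130), indexed by keyword tag."""

    def __init__(self) -> None:
        self._records: dict[str, ImageRecord] = {}
        self._ids_by_keyword: dict[str, list[str]] = {}

    def add(self, record: ImageRecord) -> None:
        self._records[record.image_id] = record
        for tag in record.tags:
            self._ids_by_keyword.setdefault(tag, []).append(record.image_id)

    def find(self, keyword: str) -> list[ImageRecord]:
        """Return every image tagged with the keyword (possibly none)."""
        return [self._records[i] for i in self._ids_by_keyword.get(keyword, [])]
```

Indexing by tag lets one keyword map to one image or to several, which covers both storage options mentioned in the text.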

When the recognized content contains a keyword identical to a preset keyword stored in the keyword DB (125), the matching unit (140) matches the image stored for that keyword with the keyword.

If the recognized content contains multiple keywords, the matching process is performed for every keyword.

A single extracted keyword may have multiple images. In that case, the images may be ranked in an order set by the producer of the automatic video ARS production system, or a ranking algorithm based on various statistics may be applied to them.

The keyword can then be matched with the highest-ranked image.
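Selecting the highest-ranked image for a keyword can be sketched as follows. The source does not say how the ranking is computed, so the numeric score here is a placeholder assumption that could hold either a producer-assigned priority or a value derived from usage statistics; `pick_top_image` is a hypothetical name.

```python
from typing import Optional


def pick_top_image(candidates: list[tuple[str, float]]) -> Optional[str]:
    """Return the image ID with the highest score, or None if there are none.

    Each candidate is an (image_id, score) pair; a higher score means a
    higher rank. How scores are produced is outside this sketch.
    """
    if not candidates:
        return None
    best_id, _ = max(candidates, key=lambda candidate: candidate[1])
    return best_id
```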

The alignment unit (150) arranges the matched images in the time order of the voice recognized from the voice ARS.

In other words, the keywords extracted from the recognized voice are arranged in the time order of the voice, and the images are arranged in the same order as the keywords.

The arranged images form a video ARS in which the voice of the voice ARS is replaced by images.
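The time-order arrangement amounts to sorting the matches by the moment their keyword was spoken. The sketch below assumes each match carries a timestamp in seconds from the start of the recording; the `Match` structure and its field names are illustrative, not taken from the source.

```python
from typing import NamedTuple


class Match(NamedTuple):
    start_sec: float  # when the keyword was spoken in the voice ARS recording
    keyword: str
    image_id: str


def align_by_time(matches: list[Match]) -> list[Match]:
    """Order matched images by when their keyword occurs in the audio.

    sorted() is stable, so matches with equal timestamps keep their
    original relative order.
    """
    return sorted(matches, key=lambda match: match.start_sec)
```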

The display unit (160) displays the images arranged by the alignment unit (150) so that the user can view them; it may be the user's mobile phone, a computer monitor, or the like.

The method for automatically producing a video ARS according to the present invention is described in detail below with reference to the accompanying drawings.

FIG. 3 is a flowchart showing the method for automatically producing a video ARS according to the present invention.

First, a user of an existing voice ARS system who wishes to use the automatic video ARS production system according to the present invention applies to the producer of the system for use.

Referring to FIG. 3, the user of the existing voice ARS system gains access to the automatic video ARS production system, for example by logging in after the application for use, and then plays back the voice ARS that has been in use (S210).

Here, the "existing voice ARS" refers to any voice ARS already in use in fields such as finance, securities, transportation, tourism, sports, performances, health, fortune-telling, weather, customer centers, medical consultation, or legal consultation.

Next, the voice recognition unit (110) performs voice recognition on the audio of the voice ARS (S215) and converts the recognized voice into text.

The keyword extraction unit (120) then checks whether the recognized content contains a preset keyword (S220). If it does, the keyword is extracted in real time (S225); if not, voice recognition continues (S215).

상기 키워드 추출부(120)는 상기 음성인식부(110)를 통해 텍스트로 변환된 ARS의 내용에서 사용되는 다수의 단어들을 키워드 DB(125)에 저장된 키워드들과 비교하는 과정을 거쳐 상기 키워드 DB(125)에 기 저장된 키워드들만을 추출하게 된다.The keyword extracting unit 120 compares the plurality of words used in the contents of the ARS converted into text through the speech recognition unit 110 with the keywords stored in the keyword DB 125. Only keywords pre-stored in 125 are extracted.

여기서 "기 설정된 키워드"란 통상의 음성 ARS 시스템에서 음성으로 안내하던 메뉴를 키워드로 설정한 것으로서, 키워드 DB(125)에 미리 저장되어 관리된다.Here, the "preset keyword" is a menu set as a voice guide in a general voice ARS system as a keyword, and is stored in advance in the keyword DB 125 and managed.

Next, it is determined whether an image corresponding to the extracted keyword exists in the image DB 130 (S230). If it does, the matching unit 140 matches the extracted keyword with its corresponding image (S235).

If no corresponding image exists, the voice recognition process continues (S215).

The image corresponding to a keyword may be a picture (a photograph or drawing), a still image, or a video, chosen freely to suit the concept of the keyword or the impression the keyword evokes.

In addition, the image DB 130 may store a single corresponding image for one keyword, or a plurality of images for one keyword.
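The lookup and matching steps (S230, S235) might look like the following sketch. The `IMAGE_DB` mapping, the file names, and the `match_images` function are hypothetical. Choosing the first stored image when several exist is an assumption of the sketch; the document does not say how one of several images is selected.

```python
# Hypothetical stand-in for the image DB (130): keyword -> stored images.
IMAGE_DB = {
    "balance": ["balance_icon.png"],
    "transfer": ["transfer_icon.png", "transfer_clip.mp4"],
}


def match_images(keywords, image_db=IMAGE_DB):
    """Pair each extracted keyword with a stored image (S230, S235).

    Keywords with no stored image are skipped, mirroring the return to
    voice recognition (S215). When several images are stored for one
    keyword, the first is used (an assumption of this sketch).
    """
    matches = []
    for keyword in keywords:
        candidates = image_db.get(keyword, [])
        if candidates:
            matches.append((keyword, candidates[0]))
    return matches


matched = match_images(["balance", "weather", "transfer"])
```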

The matching process (S235) is performed repeatedly until the audio of the voice ARS ends.

That is, when the recognized voice content contains multiple keywords, the matching process is performed for every keyword.

Subsequently, the matched images are arranged in the time order of the audio of the voice ARS (S240).
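The arrangement step (S240) amounts to sorting the matched images by the time at which each keyword was spoken. A minimal sketch follows, assuming each match carries a timestamp; the tuple layout and the `align_by_time` name are hypothetical.

```python
def align_by_time(matches):
    """Order matched images by the time their keyword was spoken (S240).

    matches: list of (time_sec, keyword, image) tuples.
    """
    # sorted() is stable, so matches with equal times keep their input order.
    return sorted(matches, key=lambda m: m[0])


unordered = [(2.4, "transfer", "transfer_icon.png"),
             (1.0, "balance", "balance_icon.png"),
             (4.1, "loan", "loan_icon.png")]
aligned = align_by_time(unordered)
```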

Next, the aligned images are displayed through the display unit 160 so that the user can view them (S245). The display unit 160 may be, for example, the user's mobile phone or a computer monitor.

The user reviews the displayed, automatically produced video ARS. If there is an image the user wants to replace (S250), the user can search the image DB 130, select a desired image, and replace the unwanted image with it (S255).

The replacement may be applied to only part of the automatically produced video ARS or to all of it.

After replacing some of the images, the user may review the video ARS containing the replaced images again, and may repeat the replacement as needed.
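The replacement step (S255) can be illustrated as swapping one entry of the arranged list for another image that is already stored in the image DB 130. The sketch below is hypothetical throughout: `replace_image`, the list layout, and the file names are assumptions, and the check that the new image exists in the DB reflects only the statement that the user selects it by searching the image DB.

```python
def replace_image(aligned, index, new_image, image_db):
    """Return a copy of the arranged list with one image replaced (S255).

    aligned: list of (keyword, image) pairs in display order.
    new_image must already be stored somewhere in the image DB.
    """
    stored = {img for images in image_db.values() for img in images}
    if new_image not in stored:
        raise ValueError(f"{new_image!r} is not in the image DB")
    updated = list(aligned)  # leave the reviewed draft untouched
    keyword, _old_image = updated[index]
    updated[index] = (keyword, new_image)
    return updated


image_db = {
    "balance": ["balance_icon.png", "balance_photo.jpg"],
    "transfer": ["transfer_icon.png"],
}
draft = [("balance", "balance_icon.png"), ("transfer", "transfer_icon.png")]
revised = replace_image(draft, 0, "balance_photo.jpg", image_db)
```

Because the function returns a new list, it can be called repeatedly during review, once per image the user chooses to replace.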

On the other hand, if the user reviews the automatically produced video ARS and finds no image to replace (S250), the automatic production process ends (S260), and the video ARS is complete.

The foregoing description merely illustrates the technical idea of the present invention, and those of ordinary skill in the art to which the invention pertains will be able to make various modifications and variations without departing from its essential characteristics. Accordingly, the embodiments described herein are intended to explain, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present invention should be interpreted according to the claims below, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of rights of the present invention.

FIG. 1 is a schematic configuration diagram of a video ARS system;

FIG. 2 is a block diagram schematically showing the configuration of the video ARS automatic production system according to the present invention;

<Description of reference numerals for the main parts of the drawings>

110: voice recognition unit

120: keyword extraction unit

125: keyword DB

130: image DB

140: matching unit

150: alignment unit

160: display unit

Claims

An automatic video ARS production method comprising:

playing a voice ARS;

recognizing a voice of the voice ARS;

extracting a plurality of preset keywords from the recognized voice;

matching the extracted plurality of keywords with pre-stored images corresponding to the keywords; and

arranging the matched images in order.

The method of claim 1, further comprising, after arranging the matched images in order:

displaying the images arranged in the order.

The method of claim 1, further comprising, after displaying the images arranged in the order:

replacing a specific image among the displayed images with another pre-stored image.

A video ARS automatic production system comprising:

a voice recognition unit that recognizes a voice from a voice ARS;

a keyword extraction unit that extracts preset keywords from the recognized voice;

a matching unit that matches pre-stored images with the extracted plurality of keywords; and

an alignment unit that arranges the matched images in order.

The system of claim 4, further comprising:

an image DB that stores images corresponding to the preset keywords.

The system of claim 5, wherein one image is stored in the image DB for one keyword.

The system of claim 5, wherein a plurality of images are stored in the image DB for one keyword.

The system of claim 5, wherein the images stored in the image DB are tagged with their corresponding preset keywords.

The system of claim 4, wherein the keyword extraction unit further comprises:

a keyword DB that stores the preset keywords in advance.

The system of claim 9, wherein the keywords are set in advance to correspond to the respective menus of the voice ARS.

The system of claim 4, further comprising:

a display unit that displays the images arranged in the order.