KR20010108330A

KR20010108330A - Method with a plurality of speech recognizers

Info

Publication number: KR20010108330A
Application number: KR1020017011408A
Authority: KR
Inventors: 메인하드 울리히; 에릭 텔렌; 스테판 베스링
Original assignee: 요트.게.아. 롤페즈; 코닌클리케 필립스 일렉트로닉스 엔.브이.
Priority date: 1999-03-09
Filing date: 2000-02-10
Publication date: 2001-12-07
Also published as: WO2000054252A2; DE19910234A1; EP1163660A2; JP2002539481A; WO2000054252A3; AU2672100A; CN1350685A

Abstract

The invention relates to a method in which an information unit (3) enabling speech input is stored on a server (1) and can be retrieved by a client (2), the client (2) can be coupled to a plurality of speech recognizers (7-9) through a communication network (6), a speech input by the user is applied to at least one speech recognizer (7-9) to generate at least one recognition result (11-13), and the recognition result (11-13) is interpreted in a plurality of independent processes, producing a plurality of interpretation results (22-24) that are sent to the user. The user thus receives a plurality of high-quality information items in a short time; without the invention, the user would have to query databases several times by speech input.

Description

METHOD WITH A PLURALITY OF SPEECH RECOGNIZERS

When a user can communicate with a computer by speech input instead of a keyboard or mouse, working with the computer becomes less of a burden and input is often faster. Speech recognition can now be used in many fields in which input has so far been made via a keyboard.

EP 0 872 827 describes a speech recognition system and method. A client that runs compressed software for speech recognition is connected to a speech recognition server via a network. The client sends a speech recognition grammar and the speech input data to the speech recognition server. The speech recognition server performs the speech recognition and returns the recognition result to the client.

A user who is interested in information looks for it at a location known to him. Users are often unaware that there is more than one service provider for a particular field. Different service providers respond differently to a given user's inquiry. In most cases, however, the user does not know where additional sources of information exist, and even if he does, he has to make a new inquiry at each of them, which is time-consuming.

The invention relates to a method in which an information unit enabling speech input is stored on a server and can be retrieved by a client.

Fig. 1 is a block diagram of an arrangement for implementing the method according to the invention.

Fig. 2 is a block diagram of a method according to the invention with one speech recognizer.

Fig. 3 is a block diagram of a method according to the invention with parallel speech recognizers.

Fig. 4 is a block diagram of a method according to the invention with parallel speech recognizers that have integrated databases.

It is therefore an object of the invention to provide the user with as much high-quality information as possible within a short time.

This object is achieved in that the client is coupled to a plurality of speech recognizers via a communication network, a speech input by the user is applied to at least one speech recognizer to generate a recognition result, the recognition result is interpreted in a plurality of independent processes, and a plurality of interpretation results are generated and supplied to the user.

A service provider stores an information unit enabling speech input on a server, and the client downloads this information unit from the server. The server is a computer in a communication network, for example the Internet, on which the provider's information is stored so that clients can retrieve it. The client is a computer connected to the server that retrieves information from the Internet, downloads the information unit stored on the server, and displays it by means of software. The information unit is thereby presented on the client so that the user can perceive its content. The user is either prompted by the information unit to make a speech input or is aware of the possibility of speech input because the information unit is called up frequently.

After the user has made a speech input, it is applied to one or more speech recognizers. Each speech recognizer performs speech recognition and generates a recognition result. These recognition results are interpreted individually, each in an independent process that leads to an interpretation result. To interpret a recognition result, it is first analyzed: it is broken down into its component parts, for example to find keywords, and parts that are irrelevant to the subsequent information inquiry are dropped. This analysis is carried out either by the speech recognizer or by the database. Analyzing the recognition result therefore requires information about the content of the speech input, and the possible content of the speech input is determined by the content of the information unit. Based on this analysis, inquiries are made to databases. The inquiries are sent to the individual databases, which then produce a plurality of independently generated interpretation results. The databases used to find answers have a decisive influence on the quality of the response to the user's speech input. The number of independent databases keeps growing, and there are also extensive business databases that can help to find answers. The individual databases are brought into the search for an answer in that a recognition result is assigned to several databases for multiple interpretation.
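The analysis step described above (break the recognition result into keywords, drop irrelevant parts, then query several databases independently) can be sketched roughly as follows. This is only an illustration under assumptions, not the method as claimed: the stop-word list, the database names and their entries are invented for the example, and the recognition result is assumed to arrive as plain text.

```python
# Illustrative sketch: STOP_WORDS, DATABASES and all entries are hypothetical.
STOP_WORDS = {"i", "am", "looking", "for", "a", "an", "the", "please", "with"}

def analyze(recognition_result):
    """Split a recognition result into keywords and drop irrelevant words."""
    cleaned = (word.strip(".,!?") for word in recognition_result.lower().split())
    return [word for word in cleaned if word and word not in STOP_WORDS]

def interpret(keywords, database):
    """Return the entries of one database whose tags contain every keyword."""
    return [name for name, tags in database.items()
            if all(keyword in tags for keyword in keywords)]

DATABASES = {
    "dealer_a": {"Model A1": ["red", "convertible"], "Model A2": ["blue", "van"]},
    "dealer_b": {"Model B1": ["red", "convertible", "diesel"]},
}

def interpret_everywhere(recognition_result):
    """Interpret one recognition result independently against each database."""
    keywords = analyze(recognition_result)
    return {db_name: interpret(keywords, db) for db_name, db in DATABASES.items()}
```

Calling `interpret_everywhere("I am looking for a red convertible, please")` yields one answer list per database, which mirrors the idea of several independently generated interpretation results for a single speech input.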

Speech recognition for generating the recognition result can be carried out at different levels of expenditure. Speech recognizers differ not only in the size and specialization of their vocabulary but also in the algorithms with which they perform recognition. A good database inquiry requires good recognition of the inquiry the user made by speech input.

The interpretation results from the speech recognizer or the database are either sent back to the client automatically, or the server makes them available so that the user can retrieve individual interpretation results when needed. In either case, the client presents the interpretation results in a form the user can perceive.

By coupling the information unit to one or more speech recognizers, the user is offered several answers to an inquiry made by speech input. The user thus receives information for which he would otherwise have had to start several inquiries, with a considerable delay.

Apart from differing recognition results produced during speech recognition, the independent interpretation of the individual recognition results against different databases yields different interpretation results, each of which answers the user's speech input. If the speech input were interpreted only once, either a limited number of the most probable answers would be sent to the client, or the user would receive answers whose content is far removed from the inquiry. With multiple interpretation of one or more recognition results, the user receives at least twice the amount of information in the same time.

When the speech input is assigned to only one speech recognizer, the recognition result is applied to a plurality of interpretation processes, each of which generates an interpretation result. These results are sent back to the client or retrieved by the user and give multiple answers to the user's inquiry.

In a further embodiment of the invention, it has proved advantageous to preprocess the speech input on the client side. For this preprocessing, additional software that extracts features of the speech input is started when the information unit is loaded. This software digitizes and quantizes the speech input, which is available as an electrical signal, and analyzes it to produce components to which feature vectors are assigned. The feature vectors are then transmitted to the connected speech recognizers, which carry out the computation-intensive recognition. Because feature extraction on the client compresses and encodes the speech input, the amount of data to be transmitted is reduced. In addition, the time needed for feature extraction is spent on the client side, so that the speech recognizer only has to recognize the feature vectors applied to it, which can be advantageous for heavily used speech recognizers. When the speech input is assigned to several speech recognizers, a further advantage is that the preprocessing has to be carried out only once; without feature extraction on the client side, each selected speech recognizer would have to perform it.
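The idea of extracting features once on the client and handing the same feature vectors to several speech recognizers can be sketched as follows. The features used here (log energy and zero-crossing count per frame) are a deliberately simple stand-in chosen for illustration; the text does not specify which features are extracted, and the frame length, function names and recognizer callables are assumptions.

```python
import math

def extract_features(samples, frame_len=4):
    """Cut digitized samples into frames and compute one small vector per frame.

    Each vector is (log energy, zero-crossing count); a stand-in for whatever
    front end a real speech recognizer would expect.
    """
    vectors = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame)
        log_energy = math.log(energy + 1e-10)  # small offset avoids log(0)
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        vectors.append((log_energy, crossings))
    return vectors

def send_to_recognizers(vectors, recognizers):
    """Pass the same, already extracted vectors to every recognizer.

    `recognizers` maps a name to a callable standing in for a remote service.
    """
    return {name: recognize(vectors) for name, recognize in recognizers.items()}
```

In this sketch `extract_features` runs once per speech input, however many recognizers are addressed, which is the saving the paragraph above describes.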

In a further embodiment of the invention, the client downloads the information unit from the server in the form of an HTML (HyperText Markup Language) page. The HTML page is displayed on the client by a web browser. The client sets up a connection via a link to the server on which the HTML page of interest to the user is stored. Besides the text to be displayed, the HTML page may contain graphic symbols, audio and/or video data. The HTML page prompts the user by means of an indication to make a speech input. After the user has made a speech input, it is passed from the client to one or more speech recognizers, where speech recognition is performed. The quality of the recognition result depends decisively on how specialized the speech recognizer is. Speech recognizers work with a limited vocabulary, which is why they are usually restricted to a particular field of application. For a usable recognition result it is therefore important that the speech recognizer to which the speech input is passed is specialized accordingly.

The recognition result, or the plurality of recognition results as the case may be, is then interpreted. For this purpose, for example, the recognized speech input is analyzed for a database, and based on this analysis an inquiry is made to the data files of that database. The resulting interpretation results are either sent to the client automatically or retrieved by the client, and are displayed on the client by the web browser. The user can then choose among several interpretation results. This is comparable to looking something up in several dictionaries, but saves time.

In a further embodiment of the invention, the HTML page shows a plurality of objects, for example company advertisements, each of which enables speech input. Each object is assigned a speech recognizer that can be reached via the communication network and to which the user's speech input is sent. The speech recognizers perform speech recognition and pass their individual recognition results on to independent interpretation processes. The interpretation results, which are sent back to the client or retrieved by the user, are presented to the user as a graphical display or as an audio signal.

If the objects, which may be realized as advertising banners, are provided by companies in the same line of business, the user receives several offers from competing companies as a result of a single speech input and its multiple parallel processing.

In the case of advertising banners of non-competing companies displayed on an HTML page, a speech input relating to a particular advertising banner is passed to the speech recognizer assigned to that object because the banner was clicked with the mouse, because the user's line of sight is tracked, or because priorities are assigned among the several speech input options of the individual objects. It is advantageous to store the speech input or the preprocessed speech input in a memory of the client, or to send the recognition result back to the client, so that these intermediate results are available to the user at any time for further interpretation processes. A stored speech input can be passed to a further speech recognizer, and a stored recognition result to a further database, in order to obtain additional interpretation results from additional interpretations.

In a further embodiment, a selection is made from the plurality of speech-enabled objects displayed by the web browser. From all the objects shown, the user selects several, for example by clicking with the mouse. The speech input is then sent only to the speech recognizers of the selected objects.

In a further embodiment of the invention, the server assigns additional information to each object in the form of an HTML tag in order to couple the object to a speech recognizer. When the HTML page is downloaded, each object thus carries the information about which speech recognizer on the Internet the speech input is to be sent to for processing.

This additional information can also assign the database in which the recognition result is to be interpreted. In that case, the provider of the HTML page determines the database to which the recognition result or the inquiry is sent.
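How such additional information might be carried in the page and read by the client can be sketched as follows. The text only says that the information is attached to each object as an HTML tag; the attribute names `data-recognizer` and `data-database`, the example URLs and the element ids below are hypothetical and chosen purely for illustration.

```python
from html.parser import HTMLParser

class BindingParser(HTMLParser):
    """Collect, per object id, the recognizer and optional database it names."""

    def __init__(self):
        super().__init__()
        self.bindings = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "data-recognizer" in attributes:  # hypothetical attribute name
            self.bindings[attributes.get("id")] = {
                "recognizer": attributes["data-recognizer"],
                "database": attributes.get("data-database"),  # may be absent
            }

# Hypothetical page fragment with two speech-enabled advertising banners.
PAGE = """
<img id="banner19" src="a.png" data-recognizer="https://asr.example/7"
     data-database="https://db.example/14">
<img id="banner20" src="b.png" data-recognizer="https://asr.example/8">
"""

def read_bindings(html):
    """Parse a page and return the object-to-recognizer/database bindings."""
    parser = BindingParser()
    parser.feed(html)
    return parser.bindings
```

An object that names no database, such as `banner20` here, corresponds to the case in which the choice of database is left to someone other than the page provider.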

A further advantageous embodiment of the invention leaves to the speech recognizer the decision as to which database the recognition result is sent to. This shifts the decision about which database processes the user's inquiry. If the provider of the HTML page, who assigns a speech recognizer to each object, is not up to date with regard to the databases, whereas the operator of the speech recognizer is, the quality of the response improves when the operator of the speech recognizer assigns the databases.

In the case of an HTML page that provides information about newly published books and carries advertising banners of several different publishers, a page provider who is independent of the publishers can send the recognition result of a user's inquiry about new publications in a given field to all available databases. The user thus quickly receives extensive information about new book publications in that field.

The object of the invention is also achieved by a server on which an information unit that can be retrieved by a client is stored, wherein:

- the client can be coupled to one or more speech recognizers in order to generate a plurality of interpretation results to be sent to the user;

- a speech input is applied to at least one speech recognizer to generate a recognition result, and the recognition result is interpreted in a plurality of independent processes; and

- additional information is assigned to a speech-enabled object in order to determine the coupling of that object to the speech recognizer that generates the recognition result.

These and other aspects of the invention will become apparent from the embodiments described below.

Fig. 1 shows, by way of example, an arrangement for implementing the method according to the invention. An information unit (3) is stored on a server (1). The server (1) can be connected to a client (2) via a communication network (6), referred to below as the Internet (6). Speech recognizers (7-9) can likewise be connected to the client (2) via the Internet (6). A database (5) can also be connected to the client (2), the speech recognizers (7-9) and the server (1) via the Internet (6).

A provider stores the information unit (3) on the server (1) and thereby gives users access to information, for example via the provider. In addition to the content to be displayed and formatting instructions, the information unit (3) contains additional information (4). The user downloads from the server (1) the information unit (3) of interest, referred to below as the HTML page (3). For this purpose, a connection based on the TCP/IP protocol is set up to the server (1). Software running on the client (2), which may be a web browser, for example, displays the HTML page (3) to the user. The client (2) contains a memory (25) in which the user's speech input or the recognition results received from the speech recognizers (7-9) are stored.

Fig. 2 shows an information unit (3) that offers the user interaction in the form of a speech input option. The objects (19, 20 and 21) are advertising banners which, for example, show the user advertisements of car manufacturers. In addition, the HTML page offers a speech input option, indicated to the user by, for example, a flashing text such as "Tell us your favorite car". In this example of embodiment, all three advertising banners (19, 20, 21) expect a similar speech input. The speech input is therefore passed via the Internet (6) to only one speech recognizer (7). To find a car, for example, the user can speak terms or groups of words from his field of interest; these are fed to the client by an input device (10) and passed on to the speech recognizer (7). Additional software (not shown) allows the client (2) to extract features of the speech input, so that the speech recognizer (7) is supplied only with the features of the speech input, combined into feature vectors in compressed form. The speech recognizer (7) performs speech recognition and generates a recognition result (11). The recognition result (11) is analyzed and sent by the speech recognizer (7) as an inquiry to the databases (14, 15, 16). In this case the inquiries sent to the databases (14, 15, 16) are all identical.

The databases may be located on the same server as the speech recognizer (7). However, the databases to which the inquiries are sent may also be located on different servers. It should be noted that the speech recognizer (7) belongs to, or is rented by, the provider of the HTML page (3). Because the provider knows that inquiries on the HTML page (3) are about cars, the client is connected to a dedicated speech recognizer to recognize the speech input. The database (14) contains data from the files of the car company of advertising banner (19). The database (15) contains the data of the car company of advertising banner (20), and the database (16) contains the data of the car company of advertising banner (21). The databases (14, 15, 16) are searched for information matching the inquiry. This operation is also referred to as interpretation. The databases (14, 15, 16) generate the interpretation results (22, 23 and 24), respectively, which are shown on the client (2) after transmission via the Internet (6). With interpretation result (22) the user is presented with an offer from the car company of advertising banner (19), with interpretation result (23) an offer from the car company of advertising banner (20), and with interpretation result (24) an offer from the car company of advertising banner (21).
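The Fig. 2 arrangement, in which one recognition result leads to the same inquiry being sent to three databases that each return their own interpretation result, can be sketched as a simple fan-out. The databases are modelled as local callables standing in for remote servers, and their contents are invented for the example; only the reference numerals 14 to 16 are taken from the text.

```python
from concurrent.futures import ThreadPoolExecutor

def make_database(offers):
    """Build a stand-in database that returns the offers matching an inquiry."""
    def search(inquiry):
        return [offer for offer in offers if inquiry in offer.lower()]
    return search

# Hypothetical contents; keys reuse the reference numerals of the databases.
DATABASES = {
    14: make_database(["Estate car X", "Roadster Y"]),
    15: make_database(["Compact car V"]),
    16: make_database(["Estate car Z"]),
}

def query_all(inquiry, databases):
    """Send the identical inquiry to every database concurrently."""
    with ThreadPoolExecutor(max_workers=max(1, len(databases))) as pool:
        futures = {name: pool.submit(search, inquiry)
                   for name, search in databases.items()}
        return {name: future.result() for name, future in futures.items()}
```

For instance, `query_all("estate", DATABASES)` returns one list per database, including an empty list for a database with nothing to offer; a thread pool is just one possible way to issue the inquiries side by side.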

In this way, information from three different databases (14-16) is made available to the user. For example, the user now receives car offers from the files of the car company of advertising banner (19), from the car company of advertising banner (20) and from the company of advertising banner (21).

The information about which speech recognizer and/or database the speech input and/or the recognition result is passed to is supplied by the provider of the HTML page, who receives it from his advertising banner customers.

The provider of the HTML page can also pass information that is important for analyzing the recognition result to the speech recognizer or the database.

The memory (25) extends the arrangement in that the speech input is stored in it for successive inquiries. Alternatively, a recognition result that has already been generated can be stored in the memory (25). The user can then query several databases in succession without repeating the speech input or the speech recognition each time.
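The role of the memory (25) can be illustrated with a small cache on the client: the speech input is stored once, the recognition result is produced at most once, and successive databases are then asked without new recognition. The class and method names are hypothetical, and the recognizer and databases are plain callables standing in for remote services.

```python
class ClientMemory:
    """Hold one speech input and its recognition result for reuse."""

    def __init__(self, recognize):
        self._recognize = recognize      # stand-in for a remote recognizer
        self.speech_input = None
        self.recognition_result = None
        self.recognizer_calls = 0        # counts how often recognition ran

    def store_input(self, speech_input):
        """Store a new speech input and discard any stale recognition result."""
        self.speech_input = speech_input
        self.recognition_result = None

    def result(self):
        """Return the stored recognition result, recognizing only if needed."""
        if self.recognition_result is None:
            self.recognizer_calls += 1
            self.recognition_result = self._recognize(self.speech_input)
        return self.recognition_result

    def ask(self, database):
        """Query one database with the stored recognition result."""
        return database(self.result())
```

Asking two databases in a row through `ask` triggers the recognizer only for the first call; the second reuses the stored result.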

Fig. 3 shows an arrangement for a method in which the speech input is passed to three different speech recognizers (7, 8 and 9). The user is prompted by the objects (19, 20 and 21) to make a speech input. This speech input is passed to the speech recognizers (7, 8 and 9), which generate the respective recognition results (11, 12 and 13). The speech recognizers (7-9) analyze the recognition results (11, 12 and 13) and prepare the respective inquiries to the databases (14, 15 and 16). The recognition results (11, 12 and 13) differ because they are produced by different speech recognizers (7-9), and the analysis of these differing recognition results in turn produces different inquiries, which are applied to different databases (14, 15 and 16). The interpretation results (22, 23 and 24) returned to the user on the client (2) are therefore answers based on different databases.
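The Fig. 3 arrangement differs from a single-recognizer setup in that each object has its own chain of recognizer and database, so the same speech input can lead to different recognition results and different inquiries. The sketch below models this with vocabulary-restricted recognizers; representing the utterance as text, the vocabularies and the offers are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def make_recognizer(vocabulary):
    """Stand-in recognizer that only 'hears' words from its own vocabulary."""
    def recognize(utterance):
        return [word for word in utterance.lower().split() if word in vocabulary]
    return recognize

def make_database(entries):
    """Stand-in database: return entries sharing at least one keyword."""
    def answer(keywords):
        return sorted(name for name, tags in entries.items() if set(keywords) & tags)
    return answer

def run_chain(utterance, recognizer, database):
    """One independent chain: recognition followed by interpretation."""
    return database(recognizer(utterance))

def run_parallel(utterance, chains):
    """Run every recognizer/database chain on the same utterance in parallel."""
    with ThreadPoolExecutor(max_workers=max(1, len(chains))) as pool:
        futures = {name: pool.submit(run_chain, utterance, rec, db)
                   for name, (rec, db) in chains.items()}
        return {name: future.result() for name, future in futures.items()}

# Hypothetical chains for two of the advertising banners.
CHAINS = {
    "banner_19": (make_recognizer({"convertible", "red"}),
                  make_database({"Offer 19-a": {"convertible"}})),
    "banner_20": (make_recognizer({"diesel", "estate"}),
                  make_database({"Offer 20-a": {"estate"},
                                 "Offer 20-b": {"diesel"}})),
}
```

With the utterance "a red convertible or a diesel", the first chain recognizes "red" and "convertible" while the second recognizes only "diesel", so each database is asked a different question.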

In another embodiment, the analysis of the recognition result is carried out in the database instead of the speech recognizer. The databases (14-16) can then analyze the individual recognition results (11, 12 and 13) using the keywords specific to each database.

In television program listings, individual attributes are labeled differently by different stations. For example, what one station labels a "children's film" may be labeled a "trick film" by another station. If the user says that he wants to watch a trick film, this speech input is recognized by the assigned speech recognizers and interpreted accordingly in each database, so that the user is ultimately offered films labeled either as trick films or as children's films, depending on the station.
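
One way to picture database-side interpretation with station-specific keywords is the sketch below. The phrase-to-label mapping is an assumption about how such a database might resolve wording; the `StationDatabase` class, the two stations and the film titles are all hypothetical.

```python
from typing import Dict, List


class StationDatabase:
    """Hypothetical per-station database that interprets recognition results itself."""

    def __init__(self, label_for_phrase: Dict[str, str], listings: Dict[str, List[str]]) -> None:
        self.label_for_phrase = label_for_phrase  # spoken phrase -> this station's label
        self.listings = listings                  # station label -> film titles

    def interpret(self, recognition_result: str) -> List[str]:
        """Return the titles whose station label matches a phrase in the text."""
        text = recognition_result.lower()
        titles: List[str] = []
        for phrase, label in self.label_for_phrase.items():
            if phrase in text:
                for title in self.listings.get(label, []):
                    if title not in titles:  # avoid duplicates when phrases overlap
                        titles.append(title)
        return titles


# Station A files such films under "children's film", station B under "trick film".
station_a = StationDatabase(
    {"trick film": "children's film", "children's film": "children's film"},
    {"children's film": ["Title A"]},
)
station_b = StationDatabase(
    {"trick film": "trick film", "children's film": "trick film"},
    {"trick film": ["Title B"]},
)
```

With this setup, the single request "I want to watch a trick film" is resolved by each station against its own labels, so both stations can contribute matching films.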

Fig. 4 shows an arrangement in which the databases (14-16) are integrated with the speech recognizers (7-9). When the data files are small, each database (14-16) can be integrated with its respective speech recognizer (7-9). In addition, the figure shows bidirectional links from each advertising banner (19-21) to its associated interpretation result (22-24) and its associated database (14-16). In some cases the response to a query in one of the databases (14-16) is so large that displaying the interpretation results (22-24) on the client is not sensible. In such a case, for example, only the number of responses found for the speech input is sent to the client and displayed. If the user then wants to view, for example, the interpretation result (22) of the company with the advertising banner (19), the user can request that interpretation result and retrieve it from the database (14). These results are then displayed on the client (2).
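
The count-only reply for oversized responses might look like the following sketch. The threshold `MAX_ITEMS`, the reply layout and both function names are assumptions made for illustration; the text does not specify when a response counts as too large.

```python
from typing import Dict, List

MAX_ITEMS = 10  # hypothetical limit on how many hits are worth displaying directly


def summarize(hits: List[str], max_items: int = MAX_ITEMS) -> Dict[str, object]:
    """Build the reply sent to the client: full hits, or only their number."""
    if len(hits) > max_items:
        # Too many responses to display: report only how many were found.
        return {"count": len(hits), "items": None}
    return {"count": len(hits), "items": list(hits)}


def fetch_full(database: Dict[str, List[str]], query: str) -> List[str]:
    """Retrieve the complete response later, when the user explicitly asks for it."""
    return list(database.get(query, []))
```

The client can display the count first and call something like `fetch_full` only when the user follows the link to a particular database.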

As described above, the invention can be used in a method in which an information unit enabling speech input is stored on a server and can be retrieved by a client.

Claims

A method in which an information unit (3) enabling speech input is stored on a server (1) and can be retrieved by a client (2), the client (2) is connected to a plurality of speech recognizers (7-9), a speech input by the user is applied to at least one speech recognizer (7-9) to generate a recognition result (11-13), the recognition result (11-13) is interpreted in a plurality of independent processes, and a plurality of interpretation results (22-24) are supplied to the user.

Method according to claim 1, characterized in that the interpretation results (22-24) are automatically returned to the client (2) or retrieved by the client (2).

Method according to one of the preceding claims, characterized in that the speech input is applied in parallel to a plurality of speech recognizers (7-9) to produce a recognition result (11-13).

Method according to any one of claims 1 to 3, characterized in that additional software for extracting features of the speech input is executed on the client (2), and the extracted features are supplied to the assigned speech recognizer(s) (7-9).

Method according to claim 1, characterized in that the information unit (3) is realized as an HTML page (3), a plurality of objects (19-21) are found on one HTML page (3), and each object (19-21) enables speech input and is coupled to a speech recognizer (7-9).

Method characterized in that additional information (4) for coupling each of the objects (19-21) to one of the speech recognizers (7-9) is assigned to the objects (19-21) by the server (1).

Method characterized in that the speech input or the recognition results (11-13) are buffered in a memory (25), so that a plurality of analysis processes can be carried out in succession on the basis of the buffered data.

A server (1) on which an information unit (3) enabling speech input is stored, wherein

- the information unit (3) can be retrieved by a client (2),

- the client (2) can be connected to one or more speech recognizers (7-9), a speech input being applied to at least one speech recognizer (7-9) to generate a recognition result (11-13), and the recognition result (11-13) being interpreted in a plurality of independent processes to generate a plurality of interpretation results (22-24) to be sent to the user, and

- additional information (4) is assigned to the objects (19-21) in order to determine the association between an object enabling speech input and the speech recognizer (7-9) generating the recognition result (11-13).