KR20020052196A

KR20020052196A - Pattern matching method and apparatus

Info

Publication number: KR20020052196A
Application number: KR1020027005531A
Authority: KR
Inventors: 필립 네일 가너; 제이슨 피터 앤드류 챨스워스; 아사꼬 히구찌
Original assignee: 미다라이 후지오; 캐논 가부시끼가이샤
Priority date: 1999-10-28
Filing date: 2000-10-25
Publication date: 2002-07-02

Abstract

According to the present invention, a system is provided for matching two or more sequences of phonemes, which may be generated from text or from speech. Preferably, a dynamic programming matching technique is used whose constraints depend on whether the two sequences are generated from text or from speech, and in which the scoring of the dynamic programming paths is weighted, where appropriate, by phoneme confusion scores, phoneme insertion scores and phoneme deletion scores.

Description

PATTERN MATCHING METHOD AND APPARATUS

Databases of information are well known, and they suffer from the problem of how to locate and retrieve the desired information from the database quickly and efficiently. Existing database search tools allow the user to search the database using typed keywords. While this is quick and efficient, this type of searching is not suitable for various kinds of databases, such as video or audio databases.

A proposal has recently been made to annotate video and audio databases with a phonetic transcription of the speech content in the audio and video files, with subsequent retrieval being achieved by comparing a phonetic transcription of the user's input query with the phoneme annotation data in the database. The technique proposed for matching the sequences of phonemes first defines a set of features in the query, each feature being taken as an overlapping fixed-size fragment of the phoneme string; it then identifies the frequency of occurrence of the features in both the query and the annotation; and finally it determines a measure of the similarity between the query and the annotation using a cosine measure of these frequencies of occurrence. One advantage of this kind of phoneme comparison technique is that it can cope with situations where the sequence of words of the query does not exactly match the sequence of words of the annotation. However, it is prone to error, especially when the query and the annotation are spoken at different speeds, or when parts of words are deleted from the query but not from the annotation, or vice versa.
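The prior-art comparison described above can be sketched roughly as follows. This is a minimal illustration rather than the cited proposal's actual implementation: the fragment length of 3, the function names and the example phoneme strings are all assumptions made for the example.

```python
from collections import Counter
from math import sqrt


def fragments(phonemes, n=3):
    """Overlapping fixed-size fragments (n-grams) of a phoneme sequence."""
    return [tuple(phonemes[i:i + n]) for i in range(len(phonemes) - n + 1)]


def cosine_similarity(query, annotation, n=3):
    """Cosine measure between the fragment-frequency vectors of two sequences."""
    q_counts = Counter(fragments(query, n))
    a_counts = Counter(fragments(annotation, n))
    # Counter returns 0 for fragments that do not occur in the annotation.
    dot = sum(q_counts[f] * a_counts[f] for f in q_counts)
    q_norm = sqrt(sum(c * c for c in q_counts.values()))
    a_norm = sqrt(sum(c * c for c in a_counts.values()))
    if q_norm == 0 or a_norm == 0:
        return 0.0  # a sequence shorter than n has no fragments
    return dot / (q_norm * a_norm)


# Hypothetical phoneme strings, chosen only for illustration.
annotation = ["p", "ih", "k", "ch", "er", "ah", "v"]
query_same = list(annotation)
query_deleted = ["p", "ih", "ch", "er", "ah", "v"]  # one phoneme deleted
```

With these made-up strings, a single deleted phoneme changes three of the annotation's five fragments, so the similarity drops sharply. That is the kind of sensitivity to deletions that the paragraph above identifies as a weakness.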

The present invention relates to an apparatus and method for matching sequences of phonemes or the like. The invention can be used to search a database of data files having associated phonetic annotations in response to a user's input query. The input query may be a voiced or a typed query.

Figure 1 is a schematic block diagram illustrating a user terminal which allows an annotation of a data file to be generated from a typed or voiced input from a user.

Figure 2 is a schematic diagram of phoneme and word lattice annotation data which is generated from a typed input by the user for annotating the data file.

Figure 3 is a schematic diagram of phoneme and word lattice annotation data which is generated from a voiced input by the user for annotating the data file.

Figure 4 is a schematic block diagram of a user's terminal which allows the user to retrieve information from the database by a typed or voiced query.

Figure 5a is a flow diagram illustrating part of the flow control of the user terminal shown in Figure 4.

Figure 5b is a flow diagram illustrating part of the flow control of the user terminal shown in Figure 4.

Figure 6a is a schematic diagram showing an underlying statistical model which is assumed to have generated both the query and the annotation.

Figure 6b is a schematic diagram showing a first sequence of phonemes representing a typed input and a second sequence of phonemes representing a user's voiced input, and illustrating the possibility of there being phoneme insertions and deletions in the user's voiced input relative to the typed input.

Figure 6c is a schematic diagram showing first and second sequences of phonemes, each representing a voiced input, and a third sequence of phonemes representing the canonical sequence of phonemes corresponding to what was actually said in the corresponding voiced inputs, under the underlying statistical model assumed to have generated both the query and the annotation.

Figure 7 is a schematic diagram of a search space formed by the sequence of annotation phonemes and the sequence of query phonemes, together with a start null node and an end null node.

Figure 8 is a two-dimensional plot, with the horizontal axis provided for the phonemes of the annotation and the vertical axis provided for the phonemes of the query, showing a number of lattice points, each corresponding to a possible match between an annotation phoneme and a query phoneme.

Figure 9a is a schematic diagram illustrating the dynamic programming constraints employed in the dynamic programming matching process when the annotation is a typed input and the query is generated from a voiced input.

Figure 9b is a schematic diagram illustrating the dynamic programming constraints employed in the dynamic programming matching process when the query is a typed input and the annotation is a voiced input.

Figure 10 is a schematic diagram illustrating the deletion and decoding probabilities which are stored for an example phoneme.

Figure 11 is a schematic diagram illustrating the dynamic programming constraints employed in the dynamic programming matching process when both the annotation and the query are voiced inputs.

Figure 12 is a flow diagram illustrating the main processing steps performed in the dynamic programming matching process.

Figure 13 is a flow diagram illustrating the main processing steps employed to begin the dynamic programming process by propagating from a null start node to all possible start points.

Figure 14 is a flow diagram illustrating the main processing steps employed to propagate dynamic programming paths from the start points to all possible end points.

Figure 15 is a flow diagram illustrating the main processing steps employed in propagating the paths from the end points to all possible null end nodes.

Figure 16a is a flow diagram illustrating part of the processing steps performed in propagating a path using the dynamic programming constraints.

Figure 16b is a flow diagram illustrating the remaining processing steps involved in propagating a path using the dynamic programming constraints.

Figure 17 is a flow diagram illustrating the processing steps involved in determining a transition score for propagating a path from a start point to an end point.

Figure 18a is a flow diagram illustrating part of the processing steps employed in calculating scores for deletions and decodings of annotation and query phonemes.

Figure 18b is a flow diagram illustrating the remaining processing steps employed in calculating scores for deletions and decodings of annotation and query phonemes.

Figure 19 is a schematic diagram showing a search space formed by one sequence of annotation phonemes and two sequences of query phonemes, together with a start null node and an end null node.

Figure 20 is a flow diagram illustrating the main processing steps employed to begin the dynamic programming process by propagating from a null start node to all possible start points.

Figure 21 is a flow diagram illustrating the main processing steps employed to propagate dynamic programming paths from the start points to all possible end points.

Figure 22 is a flow diagram illustrating the main processing steps employed in propagating the paths from the end points to a null end node.

Figure 23 is a flow diagram illustrating the processing steps performed in propagating a path using the dynamic programming constraints.

Figure 24 is a flow diagram illustrating the processing steps involved in determining a transition score for propagating a path from a start point to an end point.

Figure 25a is a flow diagram illustrating a first part of the processing steps employed in calculating scores for deletions and decodings of annotation and query phonemes.

Figure 25b is a flow diagram illustrating a second part of the processing steps employed in calculating scores for deletions and decodings of annotation and query phonemes.

Figure 25c is a flow diagram illustrating a third part of the processing steps employed in calculating scores for deletions and decodings of annotation and query phonemes.

Figure 25d is a flow diagram illustrating a fourth part of the processing steps employed in calculating scores for deletions and decodings of annotation and query phonemes.

Figure 25e is a flow diagram illustrating the remaining processing steps employed in calculating scores for deletions and decodings of annotation and query phonemes.

Figure 26a is a schematic diagram illustrating an alternative embodiment which employs a different technique for aligning the query with each annotation.

Figure 26b is a plot illustrating the way in which a dynamic programming score varies with a comparison of a query with an annotation in the embodiment illustrated in Figure 26a.

Figure 27 is a schematic block diagram illustrating the form of an alternative user terminal which is operable to retrieve a data file from a database located within a remote server in response to an input voice query.

Figure 28 is a schematic block diagram illustrating another user terminal which allows a user to retrieve data from a database located within a remote server in response to an input voice query.

It is an aim of the present invention to provide an alternative system for searching a database.

According to one aspect, the present invention provides a feature comparison apparatus comprising: means for receiving first and second sequences of features; means for aligning features of the first sequence with features of the second sequence to form a number of aligned pairs of features; means for comparing the features of each aligned pair of features to generate a comparison score representing the similarity between the aligned pair of features; and means for combining the comparison scores for all the aligned pairs of features to provide a measure of the similarity between the first and second sequences of features; characterised in that the comparing means comprises: first comparing means for comparing, for each aligned pair, the first sequence feature in the aligned pair with each of a plurality of features taken from a set of predetermined features, to provide a corresponding plurality of intermediate comparison scores representing the similarity between the first sequence feature and the respective features from the set; second comparing means for comparing, for each aligned pair, the second sequence feature in the aligned pair with each of the plurality of features from the set, to provide a further corresponding plurality of intermediate comparison scores representing the similarity between the second sequence feature and the respective features from the set; and means for calculating the comparison score for the aligned pair by combining the pluralities of intermediate comparison scores. Such a system has the advantage that it allows for variation in both the first and the second sequences of features caused by the erroneous recognition of features by a recognition system.
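One way to read the first and second comparing means is sketched below. It assumes, purely for illustration, that each intermediate comparison score is a probability of observing a feature when the underlying feature was some member p of the predetermined set, and that the scores are combined by a prior-weighted sum over the set. The three-phoneme set, the `CONFUSION` table and the `PRIOR` values are hypothetical.

```python
# Hypothetical set of predetermined features (phonemes) with made-up numbers.
PHONEME_SET = ["b", "p", "d"]
PRIOR = {"b": 0.4, "p": 0.4, "d": 0.2}  # assumed probability of each phoneme

# CONFUSION[p][x]: assumed probability that underlying phoneme p is observed as x.
CONFUSION = {
    "b": {"b": 0.7, "p": 0.2, "d": 0.1},
    "p": {"b": 0.2, "p": 0.7, "d": 0.1},
    "d": {"b": 0.1, "p": 0.1, "d": 0.8},
}


def pair_score(first, second):
    """Comparison score for one aligned pair of features."""
    total = 0.0
    for p in PHONEME_SET:
        s1 = CONFUSION[p][first]   # intermediate score: first-sequence feature vs. p
        s2 = CONFUSION[p][second]  # intermediate score: second-sequence feature vs. p
        total += s1 * s2 * PRIOR[p]
    return total


def sequence_score(aligned_pairs):
    """Combine the pair scores of an alignment by multiplication (an assumption)."""
    score = 1.0
    for first, second in aligned_pairs:
        score *= pair_score(first, second)
    return score
```

Because both features are compared against every member of the set, a pair such as ("b", "p") still receives a non-trivial score when either one may be a misrecognition of the other. This is consistent with the stated advantage of allowing for recognition errors in both sequences.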

According to another aspect, the present invention provides an apparatus for searching a database of a plurality of information entries, each comprising a sequence of speech features, to identify information to be retrieved, the apparatus comprising: means for receiving an input query comprising a sequence of speech features; means for comparing the query sequence of speech features with each of the database sequences of speech features to provide a set of comparison results; and means for identifying the information to be retrieved from the database using the comparison results; characterised in that the comparing means has a plurality of different comparison modes of operation, and in that the apparatus further comprises: means for determining (i) whether the query sequence of speech features was generated from an audio signal or from text, and (ii) whether a current database sequence of speech features was generated from an audio signal or from text, and for outputting a determination result; and means for selecting, for the current database sequence, the mode of operation of the comparing means in dependence upon the determination result. Preferably, when the determining means determines that both the input query and the annotation are generated from speech, the comparing means operates as the apparatus described above.
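The mode selection can be pictured as a simple dispatch on how each sequence was produced. The sketch below is illustrative only: the `Source` enum and the mode labels are hypothetical names. The three speech-related cases correspond to the situations depicted in Figures 9a, 9b and 11, while the text-versus-text case, which those figures do not cover, is given a placeholder label.

```python
from enum import Enum


class Source(Enum):
    TEXT = "text"
    AUDIO = "audio"


def select_mode(query_source, annotation_source):
    """Pick a comparison mode from the origin of the query and of the annotation."""
    if query_source is Source.AUDIO and annotation_source is Source.AUDIO:
        # Both sequences may contain recognition errors (cf. Figure 11).
        return "both-from-speech"
    if query_source is Source.AUDIO:
        # Typed annotation, spoken query (cf. Figure 9a).
        return "typed-annotation/spoken-query"
    if annotation_source is Source.AUDIO:
        # Typed query, spoken annotation (cf. Figure 9b).
        return "typed-query/spoken-annotation"
    # Placeholder label: this case is not covered by the figures cited above.
    return "both-from-text"
```

Since the determination is made for the current database sequence, a search loop would call such a function once per annotation, so a database that mixes typed and spoken annotations can be handled in a single pass.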

According to another aspect, the present invention provides an apparatus for searching a database comprising a plurality of information entries to identify information to be retrieved, each of the information entries having an associated annotation comprising a sequence of speech annotation features, the apparatus comprising: means for receiving a plurality of audio renditions of an input speech query; means for converting each rendition of the input query into a sequence of speech query features representing the speech within the rendition; means for comparing the speech query features of each rendition with the speech annotation features of each annotation to provide a set of comparison results; means for combining the comparison results obtained by comparing the speech query features of each rendition with the speech annotation features of the same annotation, to provide, for each annotation, a measure of the similarity between the input query and the annotation; and means for identifying the information to be retrieved from the database using the similarity measures provided by the combining means for all the annotations.

According to another aspect, the present invention provides a feature comparison apparatus comprising: means for receiving first and second sequences of query features, each sequence representing a rendition of an input query; means for receiving a sequence of annotation features; means for aligning query features of each rendition with annotation features of the annotation to form a number of aligned groups of features, each aligned group comprising a query feature from each rendition and an annotation feature; means for comparing the features of each aligned group of features to generate a comparison score representing the similarity between the features of the aligned group; and means for combining the comparison scores for all the aligned groups of features to provide a measure of the similarity between the renditions of the input query and the annotation; characterised in that the comparing means comprises: a first feature comparator for comparing, for each aligned group, the first query sequence feature in the aligned group with each of a plurality of features taken from a set of predetermined features, to provide a corresponding plurality of intermediate comparison scores representing the similarity between the first query sequence feature and the respective features from the set; a second feature comparator for comparing, for each aligned group, the second query sequence feature in the aligned group with each of the plurality of features from the set, to provide a further corresponding plurality of intermediate comparison scores representing the similarity between the second query sequence feature and the respective features from the set; a third feature comparator for comparing, for each aligned group, the annotation feature in the aligned group with each of the plurality of features from the set, to provide a further corresponding plurality of intermediate comparison scores representing the similarity between the annotation feature and the respective features from the set; and means for calculating the comparison score for the aligned group by combining the pluralities of intermediate comparison scores.

Exemplary embodiments of the present invention will now be described with reference to Figures 1 to 28.

Embodiments of the present invention can be implemented using dedicated hardware circuits, but the embodiments described below are implemented in computer software or code which is run in conjunction with processing hardware such as a personal computer, a workstation, a photocopier, a facsimile machine, a personal digital assistant (PDA) or the like.

Data File Annotation

Figure 1 illustrates the form of a user terminal 59 which allows a user to input a typed or voiced annotation via the keyboard 3 and the microphone 7 for annotating a data file 91 stored in a database 29. In this embodiment, the data file 91 comprises a two-dimensional image generated by, for example, a camera. The user terminal 59 allows the user 39 to annotate the 2D image with an appropriate annotation which can subsequently be used to retrieve the 2D image from the database 29. In this embodiment, a typed input is converted by the phonetic transcription unit 75 into phoneme (or phoneme-like) and word lattice annotation data, which is passed to the control unit 55. Figure 2 illustrates the form of the phoneme and word lattice annotation data generated for the typed input "picture of the Taj Mahal". As shown in Figure 2, the phoneme and word lattice is an acyclic directed graph with a single entry point and a single exit point. It represents different parses of the user's input. As shown, the phonetic transcription unit 75 identifies, from an internal phonetic dictionary (not shown), a number of different possible phoneme strings which correspond to the typed input.

Similarly, a voiced input is converted by the automatic speech recognition unit 51 into phoneme (or phoneme-like) and word lattice annotation data, which is passed to the control unit 55. The automatic speech recognition unit 51 generates this phoneme and word lattice annotation data by (i) generating a phoneme lattice for the input utterance; (ii) then identifying words within the phoneme lattice; and (iii) finally combining the two. Figure 3 illustrates the form of the phoneme and word lattice annotation data generated for the input utterance "picture of the Taj Mahal". As shown, the automatic speech recognition unit identifies a number of different possible phoneme strings which correspond to this input utterance. As is well known in the art of speech recognition, these different possibilities can have their own weighting, which is generated by the speech recognition unit 51 and is indicative of the confidence of the speech recognition unit's output. In this embodiment, however, this weighting of the phonemes is not performed. As shown in Figure 3, the words which the automatic speech recognition unit 51 identifies within the phoneme lattice are incorporated into the phoneme lattice data structure. As shown for the example phrase given above, the automatic speech recognition unit 51 identifies the words "picture", "of", "off", "the", "other", "ta", "tar", "jam", "ah", "hal", "ha" and "al".

As shown in Figure 3, the phoneme and word lattice generated by the automatic speech recognition unit 51 is an acyclic directed graph with a single entry point and a single exit point. It represents different parses of the user's input annotation utterance. It is not simply a sequence of words with alternatives, since each word does not have to be replaced by a single alternative, one word can be substituted for two or more words or phonemes, and the whole structure can form a substitution for one or more words or phonemes. Therefore, the density of the data within the phoneme and word lattice annotation data essentially remains linear throughout the annotation data, rather than growing exponentially as in the case of a system which generates an N-best word list for the audio annotation input.
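The contrast between a lattice and an N-best list can be illustrated with a toy directed acyclic graph. The sketch below rests on several assumptions: the node numbering, the adjacency-list layout and the handful of word alternatives (borrowed from the words identified for the example phrase) are invented for the illustration and do not reproduce the actual lattice of Figure 3.

```python
from functools import lru_cache

# Toy word lattice: node -> list of (label, next_node). Node 0 is the single
# entry point and node 4 the single exit point. The layout is hypothetical.
LATTICE = {
    0: [("picture", 1)],
    1: [("of", 2), ("off", 2)],
    2: [("the", 3), ("other", 3)],
    3: [("ta", 4), ("tar", 4)],
    4: [],
}
EXIT = 4


@lru_cache(maxsize=None)
def count_paths(node):
    """Number of distinct label sequences from `node` to the exit point."""
    if node == EXIT:
        return 1
    return sum(count_paths(nxt) for _, nxt in LATTICE[node])


def count_links():
    """Storage cost of the lattice, measured in links."""
    return sum(len(links) for links in LATTICE.values())


def all_paths(node=0, prefix=()):
    """Enumerate every parse, i.e. what an exhaustive N-best list would store."""
    if node == EXIT:
        yield prefix
        return
    for label, nxt in LATTICE[node]:
        yield from all_paths(nxt, prefix + (label,))
```

In this toy graph, each additional position with two alternatives adds two links to the lattice but doubles the number of complete word sequences. The stored data therefore grows linearly while an exhaustive list of parses grows exponentially.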

In this embodiment, the annotation data generated by the automatic speech recognition unit 51 or the phonetic transcription unit 75 has the following general form:

Header

- flag indicating whether the annotation data is word, phoneme or mixed

- time index associating the location of blocks of annotation data within memory with a given time point

- word set used (i.e. the dictionary)

- phoneme set used

- the language to which the vocabulary pertains

- phoneme probability data

Block (i), i = 0, 1, 2, ...

Node N_j, j = 0, 1, 2, ...

- time offset of the node from the start of the block

- phoneme links (k), k = 0, 1, 2, ...

- offset relative to node N_j, equal to N_k − N_j (N_k is the node to which link k extends)

- phoneme associated with link (k)

- word links (l), l = 0, 1, 2, ...

- offset relative to node N_j, equal to N_i − N_j (N_i is the node to which link l extends)

- word associated with link (l)

A flag identifying whether the annotation data is word annotation data, phoneme annotation data or mixed is provided because not all the data files within the database will include the combined phoneme and word lattice annotation data discussed above; in that case, a different search strategy is used to search the annotation data.

In this embodiment, the annotation data is divided into blocks of nodes in order to allow the search to jump into the middle of the annotation data for a given search. The header therefore includes a time index which associates the location of the blocks of annotation data within the memory with a given time offset between the start time and the time corresponding to the beginning of the block.

The header also includes data defining the word set used (i.e. the dictionary), the phoneme set used together with its probabilities, and the language to which the vocabulary pertains. The header may also contain details of the automatic speech recognition system used to generate the annotation data and any appropriate settings which were used during the generation of the annotation data.

The blocks of annotation data then follow the header and identify, for each node in the block, the time offset of the node from the start of the block, the phoneme links which connect that node to other nodes by phonemes, and the word links which connect that node to other nodes by words. Each phoneme link and word link identifies the phoneme or word which is associated with the link. They also identify the offset relative to the current node. For example, if node N₅₀ is linked to node N₅₅ by a phoneme link, then the offset relative to node N₅₀ is 5. As those skilled in the art will appreciate, using an offset indication like this allows the continuous annotation data to be divided into separate blocks.
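The layout described above might be modelled as follows. This is only a sketch under stated assumptions: the class, field and function names are hypothetical, the storage encoding is not specified by the text, and offsets are taken to be differences between node indices, as in the N₅₀ to N₅₅ example.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Link:
    offset: int  # index of the target node minus index of the current node
    label: str   # phoneme or word associated with the link


@dataclass
class Node:
    time_offset: float  # measured from the start of the enclosing block
    phoneme_links: List[Link] = field(default_factory=list)
    word_links: List[Link] = field(default_factory=list)


@dataclass
class Header:
    kind: str                     # flag: "word", "phoneme" or "mixed"
    time_index: Dict[float, int]  # block start time -> block location in memory
    word_set: str
    phoneme_set: str
    language: str
    phoneme_probabilities: Dict[str, float] = field(default_factory=dict)


def link_target(node_index, link):
    """Resolve a relative link to the absolute index of the node it extends to."""
    return node_index + link.offset


def block_for_time(header, t):
    """Location of the block that covers time t (assumes a non-empty time index)."""
    starts = sorted(header.time_index)
    i = bisect_right(starts, t) - 1  # last block starting at or before t
    return header.time_index[starts[max(i, 0)]]
```

Because links store relative offsets rather than absolute node indices, a block can be read on its own without renumbering, and the time index lets a search start at the block nearest a requested time instead of scanning from the beginning.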

In an embodiment where the automatic speech recognition unit outputs weightings indicative of the confidence of its output, these weightings or confidence scores are also included in the data structure. In particular, a confidence score is provided for each node, indicating the confidence of arriving at that node, and each phoneme and word link includes a transition score that depends on the weighting given to the corresponding phoneme or word. These weightings are then used to control the search and retrieval of data files by discarding matches that have a low confidence score.
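As a minimal sketch of the discarding step, assuming each candidate match has already been reduced to a single confidence value, low-confidence matches could be filtered as follows. The function name, the threshold value and the sample identifiers are hypothetical; the specification does not state how the threshold is chosen.

```python
from typing import List, Tuple


def discard_low_confidence(matches: List[Tuple[str, float]],
                           min_confidence: float) -> List[Tuple[str, float]]:
    """Keep only the matches whose confidence score reaches the threshold."""
    return [(file_id, score) for file_id, score in matches
            if score >= min_confidence]


# Hypothetical candidates: (data file identifier, confidence score).
candidates = [("image_A", 0.92), ("image_B", 0.35), ("image_C", 0.60)]
kept = discard_low_confidence(candidates, min_confidence=0.5)
```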

In response to the user's input, the control unit 55 retrieves the appropriate 2D file from the database 29 and appends the generated phoneme and word annotation data to the data file 91. The augmented data file is then returned to the database 29. During this annotating step, the control unit 55 is operable to display the 2D image on the display 57 so that the user can ensure that the annotation data is associated with the correct data file 91.

As will be explained in more detail below, the use of such phoneme and word lattice annotation data allows a quick and efficient search of the database 29 to be carried out in order to identify and retrieve a desired 2D image data file stored therein. This can be achieved by first searching the database 29 using the word data and, if this search fails to provide the required data file, by performing a further search using the more robust phoneme data. As those skilled in the art of speech recognition will appreciate, the use of phoneme data is more robust because phonemes are dictionary independent and allow the system to cope with out-of-vocabulary words such as names, places and foreign words. The use of phoneme data also makes the system future-proof, since it allows data files placed in the database 29 to be retrieved even when the original annotation was input by voice and the original automatic speech recognition system did not understand the words of the input annotation.

Data File Retrieval

FIG. 4 is a block diagram illustrating the form of a user terminal 59 which is used in this embodiment to retrieve the annotated 2D images from the database 29. This user terminal 59 may be, for example, a personal computer, a hand-held device or the like. As shown, in this embodiment, the user terminal 59 comprises the database 29 of annotated 2D images, an automatic speech recognition unit 51, a phonetic transcription unit 75, a keyboard 3, a microphone 7, a search engine 53, a control unit 55 and a display 57. In operation, the user inputs either a voice query via the microphone 7 or a typed query via the keyboard 3, and the query is processed either by the automatic speech recognition unit 51 or by the phonetic transcription unit 75 to generate corresponding phoneme and word data. This data may also take the form of a phoneme and word lattice, but this is not essential. The phoneme and word data is then input to the control unit 55, which is operable to initiate an appropriate search of the database 29 using the search engine 53. The results of the search generated by the search engine 53 are then transmitted back to the control unit 55, which analyzes the search results and generates and displays appropriate display data to the user via the display 57.

FIGS. 5a and 5b are flow diagrams which illustrate the way in which the user terminal 59 operates in this embodiment. In step s1, the user terminal 59 is in an idle state and awaits an input query from the user 39. Upon receipt of an input query, the phoneme and word data for the input query is generated in step s3 by the automatic speech recognition unit 51 or the phonetic transcription unit 75. The control unit 55 then instructs the search engine 53, in step s5, to perform a search of the database 29 using the word data generated from the input query. The word search employed in this embodiment is the same as that currently used in the art for typed word searches and will not be described in detail here. If, in step s7, the control unit 55 identifies from the search results that a match for the user's input query has been found, it outputs the search results to the user via the display 57.

In this embodiment, the user terminal 59 then allows the user to consider the search results and awaits the user's confirmation as to whether the results correspond to the information the user requires. If they do, the processing proceeds from step s11 to the end of the processing, and the user terminal 59 returns to its idle state and awaits the next input query. If, however, the user indicates (for example, by inputting an appropriate voice command) that the search results do not correspond to the desired information, the processing proceeds from step s11 to step s13, where the search engine 53 performs a phoneme search of the database 29. In this embodiment, however, the phoneme search performed in step s13 is not of the whole database 29, since this could take several hours depending on the size of the database.

Instead, the phoneme search performed in step s13 uses the results of the word search performed in step s5 to identify one or more portions within the database which may correspond to the user's input query. For example, if the query comprises three words and the word search identifies only one or two of the query words in an annotation, then a phoneme search is performed of the portions of the annotation surrounding the identified word or words. The way in which the phoneme search of step s13 is performed in this embodiment is described in more detail later.
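One way to picture the restriction of the phoneme search is to build index ranges around each word hit and merge the ranges that overlap. This is a sketch under assumptions: the function name, the fixed `margin` parameter and the use of phoneme indices as positions are illustrative choices, and the specification does not say how large the surrounding portion is.

```python
from typing import List, Tuple


def phoneme_search_windows(word_hit_positions: List[int],
                           annotation_length: int,
                           margin: int) -> List[Tuple[int, int]]:
    """Return merged (start, end) index ranges around each word hit.

    `end` is exclusive. Overlapping ranges are merged so that no part of
    the annotation is searched twice.
    """
    windows: List[Tuple[int, int]] = []
    for pos in sorted(word_hit_positions):
        start = max(0, pos - margin)
        end = min(annotation_length, pos + margin + 1)
        if windows and start <= windows[-1][1]:
            # Overlaps the previous window: extend it instead of adding a new one.
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows


# Word hits at phoneme positions 4, 6 and 30 in a 40-phoneme annotation.
windows = phoneme_search_windows([4, 6, 30], annotation_length=40, margin=3)
```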

After the phoneme search has been performed, the control unit 55 identifies, in step s15, whether a match has been found. If a match has been found, the processing proceeds to step s17, where the control unit 55 causes the search results to be displayed to the user on the display 57. Again, the system then awaits the user's confirmation as to whether the search results correspond to the desired information. If the results are correct, the processing passes from step s19 to the end, and the user terminal 59 returns to its idle state and awaits the next input query. If, however, the user indicates that the search results do not correspond to the desired information, the processing proceeds from step s19 to step s21, where the control unit 55 is operable to ask the user, via the display 57, whether a phoneme search should be performed of the whole database 29. If, in response to this question, the user indicates that such a search should be performed, the processing proceeds to step s23, where the search engine performs a phoneme search of the entire database 29.

On completion of this search, the control unit 55 identifies, in step s25, whether a match for the user's input query has been found. If a match is found, the processing proceeds to step s27, where the control unit 55 causes the search results to be displayed to the user on the display 57. If the search results are correct, the processing proceeds from step s29 to the end of the processing, and the user terminal 59 returns to its idle state and awaits the next input query. If, on the other hand, the user indicates that the search results still do not correspond to the desired information, the processing passes to step s31, where the control unit 55 asks the user, via the display 57, whether the user wishes to redefine or amend the search query. If the user does wish to redefine or amend the search query, the processing returns to step s3, where the user's subsequent input query is processed in a similar manner. If the search is not to be redefined or amended, the search results and the user's initial input query are discarded, and the user terminal 59 returns to its idle state and awaits the next input query.
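The staged flow of FIGS. 5a and 5b can be summarized in a short sketch. The function and parameter names below are hypothetical, the searches and user prompts are passed in as callables so that the sketch stays self-contained, and the handling of an empty result at each stage is an assumption about the flow charts rather than something stated in the text.

```python
from typing import Callable, List, Optional


def staged_search(word_search: Callable[[], List[str]],
                  local_phoneme_search: Callable[[List[str]], List[str]],
                  full_phoneme_search: Callable[[], List[str]],
                  user_accepts: Callable[[List[str]], bool],
                  allow_full_search: Callable[[], bool]) -> Optional[List[str]]:
    """Run the word search, then the restricted and full phoneme searches."""
    # Steps s5 to s11: word search first.
    word_hits = word_search()
    if word_hits and user_accepts(word_hits):
        return word_hits
    # Steps s13 to s19: phoneme search restricted by the word-search result.
    local_hits = local_phoneme_search(word_hits)
    if local_hits and user_accepts(local_hits):
        return local_hits
    # Steps s21 to s29: phoneme search of the whole database, if the user agrees.
    if allow_full_search():
        full_hits = full_phoneme_search()
        if full_hits and user_accepts(full_hits):
            return full_hits
    # Step s31 onwards: the caller may ask the user to redefine the query.
    return None


result = staged_search(
    word_search=lambda: [],
    local_phoneme_search=lambda hits: ["image_7"],
    full_phoneme_search=lambda: ["image_9"],
    user_accepts=lambda hits: True,
    allow_full_search=lambda: False,
)
```

Ordering the stages from cheapest to most expensive means the full phoneme search, which the text notes may take hours on a large database, only runs when the user explicitly asks for it.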

The above has given a general description of the way in which a search is carried out in this embodiment by the user terminal 59. A more detailed description will now be given of the way in which the search engine 53 carries out the phoneme search, along with a brief description of the underlying search strategy.

Information Retrieval as a Classification Problem

In the classic classification scenario, a test datum is to be classified into one of K classes. This is done using knowledge about other data for which the class is known. The classification problem assumes that there is a "class" random variable which can take values from 1 to K. The optimal classification is then found by identifying the class to which the test datum most likely belongs. The training data is assumed to have been generated by N generative processes which resulted in n_k data of class k, where Σ_{k=1}^{K} n_k = N. Denoting the vector (n₁, n₂, …, n_K) by n, the training data by D and the test datum by x, the classic classification problem is to determine the value of k which maximizes the following probability, equation (1):

The second term in the numerator is a prior probability for the class, which gives more weight to classes that occur more often. In the context of information retrieval, each class has a single training datum (i.e. the annotation data). Therefore, for information retrieval, the second term on the right-hand side of the above expression can be ignored. Similarly, the denominator can also be ignored, since P(x|D) is the same for each class and therefore merely normalizes the numerator. Consequently, the classes can be ranked simply by ranking the first term in the numerator of the above expression for each class, in other words by determining and ranking P(x|d_k) for all the classes, where d_k is the training data for class k.
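The expression discussed above is not reproduced in this text. A form consistent with the description (two terms in the numerator, the second being the class prior, and P(x|D) in the denominator) is the Bayes' rule expansion below. It is offered as a plausible reconstruction only; in particular, the exact conditioning variables on each term are an assumption.

```latex
P(k \mid x, D, n) \;=\; \frac{P(x \mid k, D, n)\, P(k \mid D, n)}{P(x \mid D)} \tag{1}
```

With one training datum per class, dropping the prior and the denominator leaves the first numerator term, which reduces to P(x|d_k) as stated above.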

In this embodiment, the test datum x represents the input query and the training data for class k (i.e. d_k) represents the k-th annotation, and it is assumed that there is an underlying statistical model (M) which generated both the query and the annotation, as illustrated in FIG. 6a. In the general case, this model has three unknowns: the model structure m, the state sequences through the model for both the query and the annotation, s_q and s_a, and the output distribution c. In this case, the output distribution is known, since it embodies the characteristics of the speech recognition system which generates the phoneme strings from the input speech. As will be described later, it can be obtained by applying a large database of known speech to the speech recognition system, and it is referred to hereinafter as the confusion statistics. Therefore, introducing the state sequences and the model into the above probability (and using the variable q for the input query and the variable a for the annotation) yields equation (2):

This can be expanded using Bayesian methods to give equation (3):

Although the above expression looks complicated, the summations over the sets of state sequences s_q and s_a can be performed using a standard dynamic programming algorithm. Further, the last term in both the numerator and the denominator can be taken to be the same, and the state sequence terms P(s|m,c) can be ignored because each state sequence can be assumed to be equally likely. In addition, by assuming that the underlying model structure is a reference sequence of phonemes having approximately the same length as the query, subject to insertions, the summation over the different models can be removed, although it is replaced with a summation over all possible phonemes because, in the general case, the reference sequence of phonemes of the model is unknown. Therefore, ignoring the state sequence summations, the term calculated within the dynamic programming algorithm becomes, in the numerator, the expression of equation (4):

and, in the denominator (i.e. the normalizing term), the expression of equation (5):

Here, N_p is the total number of phonemes known to the system, and a_i, q_j and p_r are the annotation phoneme, query phoneme and model phoneme respectively which correspond to the current DP lattice point being evaluated. As can be seen from a comparison of equations (4) and (5), the probability terms calculated in the denominator are also calculated in the numerator. Both terms can therefore be accumulated during the same dynamic programming routine. Considering the probabilities in more detail: P(q_j|p_r,c) is the probability of decoding reference phoneme p_r as query phoneme q_j given the confusion statistics; P(a_i|p_r,c) is the probability of decoding reference phoneme p_r as annotation phoneme a_i given the confusion statistics; and P(p_r|c) is the probability of reference phoneme p_r occurring unconditionally given the confusion statistics.
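Equations (4) and (5) are not reproduced in this text. From the three probabilities just defined, and from the remark that the denominator term also appears inside the numerator, the two terms plausibly take the following form. This is a reconstruction from the surrounding definitions, not a copy of the published equations.

```latex
% Numerator term evaluated at lattice point (a_i, q_j):
\sum_{r=1}^{N_p} P(a_i \mid p_r, c)\, P(q_j \mid p_r, c)\, P(p_r \mid c) \tag{4}

% Denominator (normalizing) term:
\sum_{r=1}^{N_p} P(a_i \mid p_r, c)\, P(p_r \mid c) \tag{5}
```

Each product P(a_i|p_r,c) P(p_r|c) in (5) is a factor of the corresponding summand in (4), which is why both sums can be accumulated in a single pass over the N_p reference phonemes.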

In addition to the above terms, at each point in the dynamic programming calculation, further terms must be calculated which deal with insertions and deletions in the query or the annotation relative to the model. As those skilled in the art will appreciate, an insertion or a deletion in the query is independent of an insertion or a deletion in the annotation, and vice versa. These additional terms are therefore dealt with separately. Insertions and deletions in the annotation relative to the model must also be considered for the normalizing term given in equation (5) above.

As those skilled in the art will appreciate from the description of FIGS. 4 and 5, in this embodiment the annotation phoneme data and the query phoneme data may each be derived either from text or from speech. There are therefore four situations to consider:

i) both the annotation and the query are generated from text;

ii) the annotation is generated from text and the query is generated from speech;

iii) the annotation is generated from speech and the query is generated from text; and

iv) both the query and the annotation are generated from speech.

The first situation is the simple case in which there can be no time compression or expansion of the annotation or the query, and the comparison between the annotation and the query is performed by a simple boolean comparison of the respective phoneme sequences.

In the second situation, the annotation is taken to be correct, and the dynamic programming alignment allows the insertion and deletion of phonemes in the query in order to find the best alignment between the two. To illustrate this case, FIG. 6b shows a possible matching between a sequence of annotation phonemes a₀, a₁, a₂, … and a sequence of query phonemes q₀, q₁, q₂, … when the annotation phonemes are generated from text. As illustrated by the dashed arrows, annotation phoneme a₀ is aligned with query phoneme q₀, annotation phoneme a₁ is aligned with query phoneme q₂, annotation phoneme a₂ is aligned with query phoneme q₃, annotation phoneme a₃ is aligned with query phoneme q₃, and annotation phoneme a₄ is aligned with query phoneme q₄. For each of these alignments, the dynamic programming routine calculates the terms given in equations (4) and (5) above. In this case, however, these equations simplify because the reference sequence of model phonemes is known (it is the sequence of annotation phonemes). In particular, the normalizing term is one, because the annotation is the model, and the numerator simplifies to P(q_j|a_i,c). In addition to these decoding terms, the dynamic programming routine also calculates the relevant insertion and deletion probabilities for the phonemes which are inserted in the query relative to the annotation (such as query phoneme q₁) and for the phonemes which are deleted in the query relative to the annotation (represented by query phoneme q₃, which is matched with the two annotation phonemes a₂ and a₃).

The third situation mentioned above is similar to the second situation, except that the sequence of query phonemes is taken to be correct and the dynamic programming alignment allows the insertion and deletion of phonemes in the annotation relative to the query. In this situation, however, equations (1) to (5) cannot be used because the query is known. Therefore, in this situation, equation (1) can be reformulated as equation (6):

As with the corresponding terms in equation (1) above, the second term in the numerator and the denominator can both be ignored. The first term of the numerator in equation (6) can be expanded in a similar way to the first term in the numerator of equation (1). In this situation, however, with the query taken to be the model, the normalizing term calculated during the dynamic programming routine simplifies to one and the numerator simplifies to P(a_i|q_j,c). As in the second situation discussed above, the dynamic programming routine also calculates the relevant insertion and deletion probabilities for the phonemes which are inserted in the annotation relative to the query and for the phonemes which are deleted in the annotation relative to the query.

Finally, in the fourth situation, when both the annotation and the query are generated from speech, both sequences of phoneme data can have insertions and deletions relative to the unknown reference sequence of model phonemes which represents the text actually spoken. This is illustrated in FIG. 6c, which shows a possible matching between a sequence of annotation phonemes a_i, a_{i+1}, a_{i+2}, …, a sequence of query phonemes q_j, q_{j+1}, q_{j+2}, …, and a sequence of phonemes p_n, p_{n+1}, p_{n+2}, … which represents the reference sequence of phonemes actually spoken in both the query and the annotation. As shown in FIG. 6c, in this case the dynamic programming alignment technique must allow for the insertion of phonemes in both the annotation and the query (represented by the inserted phonemes a_{i+3} and q_{j+1}) as well as the deletion of phonemes from both the annotation and the query (represented by phonemes a_{i+3} and q_{j+1}, which are both aligned with two phonemes in the reference sequence of phonemes), relative to the reference sequence of model phonemes.

As those skilled in the art will appreciate, by introducing the model sequence of phonemes into the calculations, the algorithm is more flexible to pronunciation variations in both the query and the annotation.

The above has given a general description of the way in which the present embodiment performs information retrieval by matching the sequence of query phonemes with the sequences of annotation phonemes in the database. To aid understanding of the operation of the present embodiment, a brief description will now be given of a standard dynamic programming algorithm, followed by a more detailed description of the particular algorithm used in this embodiment.

Overview of DP Search

As those skilled in the art know, dynamic programming is a technique which can be used to find the optimum alignment between sequences of features, which in this embodiment are phonemes. It does this by simultaneously propagating a plurality of dynamic programming paths, each of which represents a possible matching between the sequence of annotation phonemes and the sequence of query phonemes. All paths begin at a start null node, which is at the beginning of the annotation and the query, and propagate until they reach an end null node, which is at the end of the annotation and the query. FIGS. 7 and 8 schematically illustrate the matching which is performed and this path propagation. In particular, FIG. 7 shows a rectangular coordinate plot with the horizontal axis provided for the annotation and the vertical axis provided for the query. The start null node ø_s is provided in the top left-hand corner and the end null node ø_e is provided in the bottom right-hand corner. As shown in FIG. 8, the phonemes of the annotation are provided along the horizontal axis and the phonemes of the query are provided along the vertical axis. FIG. 8 also shows a number of lattice points, each of which represents a possible alignment between a phoneme of the annotation and a phoneme of the query. For example, lattice point 21 represents a possible alignment between annotation phoneme a₃ and query phoneme q₁. FIG. 8 also shows three dynamic programming paths, each representing a possible matching between the sequence of phonemes representing the annotation and the sequence of phonemes representing the query, which begin at the start null node ø_s and propagate through the lattice points to the end null node ø_e. Referring back to equations (2) and (3) above, these dynamic programming paths represent the different state sequences s_q and s_a discussed above.
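To make the lattice picture concrete, the following is a minimal sketch of a standard dynamic programming alignment between two phoneme sequences. It is deliberately generic: it uses fixed substitution and insertion/deletion costs (an edit-distance style score) instead of the confusion, insertion and deletion probabilities described in this document, and the function name and phoneme labels are illustrative assumptions.

```python
from typing import List


def dp_align(annotation: List[str], query: List[str],
             sub_cost: float = 1.0, indel_cost: float = 1.0) -> float:
    """Return the lowest total cost of aligning the two phoneme sequences."""
    n, m = len(annotation), len(query)
    # cost[i][j]: best cost of aligning annotation[:i] with query[:j].
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * indel_cost
    for j in range(1, m + 1):
        cost[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = 0.0 if annotation[i - 1] == query[j - 1] else sub_cost
            cost[i][j] = min(cost[i - 1][j - 1] + match,   # align a_i with q_j
                             cost[i - 1][j] + indel_cost,  # a_i has no query counterpart
                             cost[i][j - 1] + indel_cost)  # q_j has no annotation counterpart
    return cost[n][m]


# One substituted phoneme between otherwise identical sequences.
score = dp_align(["t", "aa", "zh", "m", "ax"], ["t", "aa", "s", "m", "ax"])
```

Each cell of `cost` corresponds to one lattice point, and keeping only the best value per cell is what lets many candidate paths be propagated at once without enumerating them individually.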

As represented by the different lengths of the horizontal and vertical axes shown in FIG. 7, the input query does not need to include all the words of the annotation. For example, if the annotation is "picture of the Taj Mahal", then the user can simply search the database 29 for this picture by inputting the query "Taj Mahal". In this situation, the optimum alignment path would pass along the top horizontal axis until the query started to match the annotation. It would then start to pass through the lattice points to the lower horizontal axis and would end at the end node. This is illustrated in FIG. 7 by the path 23. However, as those skilled in the art will appreciate, the words in the query must be in the same order as they appear in the annotation, otherwise the dynamic programming alignment will not work.
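The behaviour of path 23, which skips annotation phonemes before and after the part matched by the query, can be sketched with a dynamic programming variant in which starting and ending anywhere along the annotation axis is free. This is an illustrative sketch with fixed costs and a hypothetical function name and phoneme labels, not the probabilistic scoring of this embodiment.

```python
from typing import List


def best_partial_match_cost(annotation: List[str], query: List[str],
                            sub_cost: float = 1.0,
                            indel_cost: float = 1.0) -> float:
    """Lowest cost of aligning the whole query with any part of the annotation."""
    n = len(annotation)
    # Row for an empty query prefix: skipping leading annotation phonemes is free.
    prev = [0.0] * (n + 1)
    for j in range(1, len(query) + 1):
        curr = [prev[0] + indel_cost] + [0.0] * n
        for i in range(1, n + 1):
            match = 0.0 if annotation[i - 1] == query[j - 1] else sub_cost
            curr[i] = min(prev[i - 1] + match,      # align a_i with q_j
                          prev[i] + indel_cost,     # q_j has no annotation counterpart
                          curr[i - 1] + indel_cost) # a_i skipped inside the match
        prev = curr
    # The match may end anywhere along the annotation.
    return min(prev)


annotation = ["p", "ih", "k", "t", "aa", "zh", "m", "ax", "hh", "aa", "l"]
exact = best_partial_match_cost(annotation, ["t", "aa", "zh"])
near = best_partial_match_cost(annotation, ["t", "aa", "s"])
```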

In order to determine the similarity between the sequence of annotation phonemes and the sequence of query phonemes, the dynamic programming process keeps a score for each of the dynamic programming paths which it propagates, and this score depends on the overall similarity of the phonemes which are aligned along the path. In order to limit the number of deletions and insertions of phonemes in the sequences being matched, the dynamic programming process places certain constraints on the way in which the dynamic programming paths can propagate. As those skilled in the art will appreciate, these dynamic programming constraints will be different for the four situations discussed above.

DP Constraints

Both annotation and query are text.

When both the query phoneme data and the annotation phoneme data are generated from text, the dynamic programming alignment degenerates into a boolean match between the two phoneme sequences.
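A boolean match of this kind needs no scoring at all. The sketch below assumes that the query may match a contiguous part of the annotation, in line with the remark elsewhere in this description that the query need not contain every word of the annotation; that reading, the function name and the phoneme labels are assumptions.

```python
from typing import List


def boolean_match(annotation: List[str], query: List[str]) -> bool:
    """True if the query phonemes occur as a contiguous run in the annotation."""
    n, m = len(annotation), len(query)
    return any(annotation[k:k + m] == query for k in range(n - m + 1))


found = boolean_match(["t", "aa", "zh", "m"], ["aa", "zh"])
reordered = boolean_match(["t", "aa", "zh", "m"], ["zh", "aa"])
```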

Annotation is text and query is speech.

When the annotation phoneme data is generated from text and the query phoneme data is generated from speech, there can be no phoneme deletions or insertions in the annotation, but there can be phoneme deletions and insertions in the query relative to the annotation. FIG. 9a shows the dynamic programming constraints which are used in this embodiment when the annotation is generated from text and the query is generated from speech. As shown, if a dynamic programming path ends at lattice point (i, j), representing an alignment between annotation phoneme a_i and query phoneme q_j, then that dynamic programming path can propagate to the lattice points (i+1, j), (i+1, j+1) and (i+1, j+2). A propagation to point (i+1, j) represents the case where there is a deletion of a phoneme from the spoken query as compared with the typed annotation. A propagation to point (i+1, j+1) represents the case where there is a simple decoding between the next phoneme in the annotation and the next phoneme in the query. A propagation to point (i+1, j+2) represents the case where there is an insertion of phoneme q_{j+1} in the spoken query as compared with the typed annotation, and where there is a decoding between annotation phoneme a_{i+1} and query phoneme q_{j+2}.
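The three permitted moves for a typed annotation and a spoken query can be written down directly. The function name is hypothetical, and bounds checking against the ends of the two sequences is left out of this sketch.

```python
from typing import List, Tuple


def successors_text_annotation(i: int, j: int) -> List[Tuple[int, int]]:
    """Lattice points reachable from (i, j) when only the query is spoken."""
    return [
        (i + 1, j),      # phoneme deleted from the spoken query
        (i + 1, j + 1),  # simple decoding of a_{i+1} as q_{j+1}
        (i + 1, j + 2),  # q_{j+1} inserted; a_{i+1} decoded as q_{j+2}
    ]


moves = successors_text_annotation(2, 3)
```

Every move advances the annotation index by exactly one, which reflects the assumption that the typed annotation contains no insertions or deletions.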

Annotation is speech and query is text

When the annotation is generated from speech and the query is generated from text, there can be no phoneme insertions or deletions in the query, but there can be insertions and deletions in the annotation relative to the query. Fig. 9b shows the dynamic programming constraints used in this embodiment when the annotation is generated from speech and the query is generated from text. As shown, if a dynamic programming path ends at lattice point (i, j), representing an alignment between annotation phoneme a_i and query phoneme q_j, then that path can propagate to lattice points (i, j+1), (i+1, j+1) and (i+2, j+1). Propagation to point (i, j+1) represents the case where a phoneme has been deleted from the spoken annotation relative to the typed query. Propagation to point (i+1, j+1) represents the case where there is a simple decoding between the next phoneme in the annotation and the next phoneme in the query. Propagation to point (i+2, j+1) represents the case where phoneme a_{i+1} has been inserted in the spoken annotation relative to the typed query and there is a decoding between annotation phoneme a_{i+2} and query phoneme q_{j+1}.

Annotation is speech and query is speech

When both the annotation and the query are generated from speech, phonemes can be inserted into and deleted from each of the annotation and the query relative to the other. Fig. 11 shows the dynamic programming constraints used in this embodiment when both the annotation phonemes and the query phonemes are generated from speech. In particular, if a dynamic programming path ends at lattice point (i, j), representing an alignment between annotation phoneme a_i and query phoneme q_j, then that path can propagate to lattice points (i+1, j), (i+2, j), (i+3, j), (i, j+1), (i+1, j+1), (i+2, j+1), (i, j+2), (i+1, j+2) and (i, j+3). These propagations therefore allow for the insertion and deletion of phonemes in both the query and the annotation relative to the model phonemes corresponding to the text actually spoken.
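The three sets of constraints above can be summarized in a short sketch. This is only an illustration under stated assumptions: the function name `successors` and its boolean flags are hypothetical, bounds checking against the ends of the sequences is omitted, and the single diagonal step used for the text/text case is an assumed reading of the "boolean match" described earlier.

```python
def successors(i, j, annotation_is_speech, query_is_speech):
    """Lattice points reachable from (i, j), ignoring sequence bounds."""
    if not annotation_is_speech and not query_is_speech:
        # Text vs. text: assumed here to be a plain one-to-one step.
        return [(i + 1, j + 1)]
    if not annotation_is_speech:
        # Fig. 9a: the annotation advances by one, the query by 0, 1 or 2.
        return [(i + 1, j), (i + 1, j + 1), (i + 1, j + 2)]
    if not query_is_speech:
        # Fig. 9b: the query advances by one, the annotation by 0, 1 or 2.
        return [(i, j + 1), (i + 1, j + 1), (i + 2, j + 1)]
    # Fig. 11: any combined hop of one to three phonemes.
    return [(i + di, j + dj)
            for di in range(4) for dj in range(4)
            if 1 <= di + dj <= 3]
```

For the speech/speech case the comprehension yields exactly the nine points listed above.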

Start and end DP constraints

In this embodiment, the dynamic programming alignment operation allows a dynamic programming path to start and end at any of the annotation phonemes. As a result, the query does not need to include all the words of the annotation, although the query words do need to appear in the same order as they appear in the annotation.

DP score propagation

As mentioned above, the dynamic programming process keeps a score for each dynamic programming path, and this score depends on the similarity of the phonemes aligned along the path. Therefore, when propagating a path ending at point (i, j) to these other points, the dynamic programming process adds the respective "cost" of doing so to the cumulative score for the path ending at point (i, j), which is stored in a store (SCORE(i, j)) associated with that point. As those skilled in the art will appreciate, this cost includes the insertion probabilities, deletion probabilities and decoding probabilities described above. In particular, when there is an insertion, the cumulative score is multiplied by the probability of inserting the given phoneme; when there is a deletion, the cumulative score is multiplied by the probability of deleting the phoneme; and when there is a decoding, the cumulative score is multiplied by the probability of decoding the two phonemes.

In order to be able to calculate these probabilities, the system stores a probability for every possible phoneme combination. In this embodiment, the deletion of a phoneme in the annotation or the query is treated in a similar manner to a decoding. This is done simply by treating a deletion as another phoneme. Therefore, if there are 43 phonemes known to the system, the system stores 1892 (1892 = 43 × 44) decoding/deletion probabilities, one for each possible phoneme decoding and deletion. This is illustrated in Fig. 10, which shows the possible phoneme decodings stored for the phoneme /ax/ and which includes the deletion phoneme (ø) as one of the possibilities. As those skilled in the art will appreciate, all the decoding probabilities for a given phoneme must sum to one, since there are no other possibilities. In addition to these decoding/deletion probabilities, the system stores 43 insertion probabilities, one for each possible phoneme insertion. As will be described later, these probabilities are determined in advance from training data.
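The table sizes quoted above can be checked with a small sketch. The phoneme labels, the dictionary layout and the uniform placeholder values are all assumptions made for illustration; in the described system the values are determined in advance from training data.

```python
import math

NUM_PHONEMES = 43
DELETION = "ø"  # deletion treated as one extra "phoneme"

# Hypothetical phoneme labels p0..p42; the real inventory is not given here.
phonemes = [f"p{k}" for k in range(NUM_PHONEMES)]
outcomes = phonemes + [DELETION]

# One decoding/deletion probability per (phoneme, outcome) pair.
# Uniform values are placeholders only.
decode_or_delete = {
    (src, out): 1.0 / len(outcomes)
    for src in phonemes
    for out in outcomes
}

# One insertion probability per phoneme (placeholder value).
insertion = {p: 0.01 for p in phonemes}

def row_sum(src):
    """Total probability mass over all outcomes for one source phoneme."""
    return sum(decode_or_delete[(src, out)] for out in outcomes)
```

Each row of the decoding/deletion table sums to one because a phoneme is either decoded as one of the 43 phonemes or deleted.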

To illustrate the score propagation, a number of examples will now be considered. In the case where the annotation is text and the query is speech, for the path propagating from point (i, j) to point (i+1, j+2), phoneme q_{j+1} is inserted relative to the annotation and query phoneme q_{j+2} is decoded with annotation phoneme a_{i+1}. Therefore, the score propagated to point (i+1, j+2) is as follows.

where PI(q_{j+1}|C) is the probability of inserting phoneme q_{j+1} in the spoken query and P(q_{j+2}|a_{i+1}, C) is the probability of decoding annotation phoneme a_{i+1} as query phoneme q_{j+2}.
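The equation to which these definitions belong is not reproduced in this text. A reconstruction from the definitions, and from the earlier statement that the cumulative score is multiplied by the insertion and decoding probabilities, might read as follows. The symbol S(i, j) for the cumulative score held in SCORE(i, j) is an assumption introduced here, and the second line is simply the same relation in the log domain.

```latex
S(i+1,\, j+2) = S(i, j) \cdot PI(q_{j+1} \mid C) \cdot P(q_{j+2} \mid a_{i+1}, C)

\log S(i+1,\, j+2) = \log S(i, j) + \log PI(q_{j+1} \mid C) + \log P(q_{j+2} \mid a_{i+1}, C)
```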

In the case where both the annotation and the query are generated from speech, when propagating from point (i, j) to point (i+2, j+1), annotation phoneme a_{i+1} is inserted relative to the query and there is a decoding between annotation phoneme a_{i+2} and query phoneme q_{j+1}. Therefore, the score propagated to point (i+2, j+1) is given by the following Equation 8.

As those skilled in the art will appreciate, during this path propagation several paths will meet at the same lattice point. In this embodiment, the scores associated with the paths which meet are simply added together. Alternatively, the scores could be compared, with the path having the best score being continued and the other paths being discarded. However, this is not essential in this embodiment, since the dynamic programming process is only interested in finding a score which represents the similarity between the phoneme data of the query and the phoneme data of the annotation. It is not specifically interested in knowing what the best alignment between the two is.

If both the query and the annotation are generated from speech, then once all the paths have been propagated to the end node (ø_e) and a total score for the similarity between the query and the current annotation has been determined, the system normalizes this score using the normalizing term which has been accumulated during the DP process. The system then compares the query with the next annotation in a similar manner. Once the query has been matched with all the annotations, the normalized scores for the annotations are ranked and, based on this ranking, the system outputs to the user the annotation most similar to the input query.

Detailed description of DP search

A more detailed description will now be given of the way in which the dynamic programming search is carried out when matching a sequence of query phonemes with a sequence of annotation phonemes. Referring to Fig. 12, in step s101 the system initializes the dynamic programming scores. Then, in step s103, the system propagates paths from the null start node (ø_s) to all possible start points. Then, in step s105, the system propagates the dynamic programming paths from all the start points to all possible end points, using the dynamic programming constraints discussed above. Finally, in step s107, the system propagates the paths ending at the end points to the null end node (ø_e).

Fig. 13 shows in more detail the processing steps involved in step s103 in propagating the dynamic programming paths from the null start node (ø_s) to all the possible start points, which are defined by the dynamic programming constraints. One of the constraints is that a dynamic programming path can begin at any of the annotation phonemes; the other constraint defines the number of hops allowed in the sequence of query phonemes, which depends on whether the query is text or speech. In particular, if the query is generated from text, the start points comprise the first row of lattice points in the search space, i.e. points (i, 0) for i = 0 to Nann-1; if the query is generated from speech, the start points comprise the first four rows of lattice points in the search space, i.e. points (i, 0), (i, 1), (i, 2) and (i, 3) for i = 0 to Nann-1.

The way in which this is achieved will now be described with reference to the steps shown in Fig. 13. As shown, in step s111 the system determines whether or not the input query is a text query. If it is, the processing proceeds to step s113, where the system sets the value of the variable mx to one; mx defines the maximum number of "hops" allowed in the sequence of query phonemes when the query is text. The processing then proceeds to steps s115, s117 and s119, which are operable to start a dynamic programming path at each of the lattice points in the first row of the search space, by adding the transition score for passing from the null start node to lattice point (i, 0) to the score (SCORE(i, 0)) associated with point (i, 0), for i = 0 to Nann-1. When the query is text, this ends the processing in step s103 shown in Fig. 12, and the processing then proceeds to step s105.

If the system determines at step s111 that the query is not text and was therefore generated from a spoken input, the system proceeds to step s121, where mx is set to mxhops, a constant whose value is one more than the maximum number of "hops" allowed by the dynamic programming constraints. As shown in Figs. 9 and 10, in the case where the query is speech, a path can jump at most to a query phoneme which is three phonemes further along the sequence of query phonemes. Therefore, in this embodiment, mxhops has a value of four and the variable mx is set to four, provided there are four or more phonemes in the query; otherwise mx is set equal to the number of phonemes in the query. The processing then proceeds to steps s123, s125, s127, s129 and s131, which are operable to start dynamic programming paths at each of the lattice points in the first four rows of the search space, by adding the corresponding transition probability to the score associated with the corresponding lattice point. When the query is generated from a spoken input, this ends the processing in step s103 shown in Fig. 12, and the processing then proceeds to step s105.
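The choice of start points described for Fig. 13 can be sketched as follows. The function name `start_points` and its arguments are hypothetical; the sketch only enumerates the lattice points at which paths may start and leaves out the transition scores added at each point.

```python
def start_points(n_ann, n_query, query_is_speech, mxhops=4):
    """Lattice points (i, j) at which a DP path may start."""
    if query_is_speech:
        # Spoken query: up to the first four rows, fewer for short queries.
        mx = min(mxhops, n_query)
    else:
        # Text query: only the first row.
        mx = 1
    return [(i, j) for i in range(n_ann) for j in range(mx)]
```

With a text query this yields (i, 0) for i = 0 to Nann-1; with a spoken query of four or more phonemes it yields the first four rows.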

In this embodiment, the system propagates the dynamic programming paths from the start points to the end points in step s105 by processing the lattice points in the search space column by column in a raster-like technique. In step s151, the system compares the annotation phoneme loop pointer i with the number of phonemes in the annotation (Nann). Initially, the annotation phoneme loop pointer i is set to zero, and the processing therefore proceeds to step s153, where a similar comparison is made for the query phoneme loop pointer j relative to the total number of phonemes in the query (Nquery). Initially, the loop pointer j is also set to zero, and the processing therefore proceeds to step s155, where the system propagates the path ending at point (i, j) using the dynamic programming constraints discussed above. The way in which the system propagates the paths in step s155 is described in more detail later. After step s155, the loop pointer j is incremented by one in step s157 and the processing returns to step s153. Once this processing has looped through all the phonemes in the query (thereby processing the current column of lattice points), the processing proceeds to step s159, where the query phoneme loop pointer j is reset to zero and the annotation phoneme loop pointer i is incremented by one. The processing then returns to step s151, where a similar procedure is performed for the next column of lattice points. Once the last column of lattice points has been processed, the processing proceeds to step s161, where the annotation phoneme loop pointer i is reset to zero, and the processing in step s105 shown in Fig. 12 ends.

Fig. 15 shows in more detail the processing steps involved in step s107 shown in Fig. 12, in which the paths at the end points are propagated to the end null node (ø_e). As with the propagation from the start null node (ø_s), the lattice points which are "end points" are defined by the dynamic programming constraints, which depend on whether the query is text or speech. Further, in this embodiment, the dynamic programming constraints allow dynamic programming paths to exit the annotation at any point along the sequence of annotation phonemes. Therefore, if the query is text, the system allows dynamic programming paths ending in the last row of lattice points, i.e. at points (i, Nquery-1) for i = 0 to Nann-1, to propagate to the end null node (ø_e). If, however, the query was generated from speech, the system allows any path propagating in the last four rows of lattice points, i.e. points (i, Nquery-4), (i, Nquery-3), (i, Nquery-2) and (i, Nquery-1) for i = 0 to Nann-1, to propagate to the end null node (ø_e).

As shown in Fig. 15, the process begins at step s171, where the system determines whether or not the query is text. If it is, the processing proceeds to step s173, where the query phoneme loop pointer j is set to Nquery-1. The processing then proceeds to step s175, where the annotation phoneme loop pointer i is compared with the number of phonemes in the annotation (Nann). Initially, the annotation phoneme loop pointer i is set to zero, and the processing therefore proceeds to step s177, where the system calculates the transition score from point (i, Nquery-1) to the null end node (ø_e). This transition score is then combined with the cumulative score for the path ending at point (i, Nquery-1), which is stored in SCORE(i, Nquery-1). As mentioned above, in this embodiment the system uses log probabilities for the transition and cumulative scores, in order to remove the need to perform multiplications and to avoid the use of high floating point precision. Therefore, in step s179, the system adds the cumulative score for the path ending at point (i, Nquery-1) to the transition score calculated in step s177, and the result is copied to a temporary store, TEMPENDSCORE.

As mentioned above, if two or more dynamic programming paths meet at the same point, the cumulative scores for each of the paths are added together. Since log probabilities are being used, the scores associated with the paths which meet are effectively converted back to probabilities, added, and then reconverted to log probabilities. In this embodiment, this operation is referred to as a "log addition". It is a well-known technique and is described, for example, on pages 28 and 29 of the book by Lee, Kai-Fu entitled "Automatic Speech Recognition: The Development of the (Sphinx) System", published by Kluwer Academic Publishers in 1989.
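A minimal sketch of such a log addition is shown below. The function name `log_add` is hypothetical, and the formulation that factors out the larger argument is a common numerically stable variant rather than necessarily the one used in this embodiment.

```python
import math

def log_add(a, b):
    """Return log(exp(a) + exp(b)) for two log probabilities a and b."""
    # A log probability of -inf stands for probability zero.
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi, lo = max(a, b), min(a, b)
    # log(e^hi + e^lo) = hi + log(1 + e^(lo - hi)); exp never overflows here.
    return hi + math.log1p(math.exp(lo - hi))
```

For example, `log_add(math.log(0.2), math.log(0.3))` agrees with `math.log(0.5)` to within floating point rounding.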

Since the path propagating from point (i, Nquery-1) to the null end node will meet with other dynamic programming paths, the system performs a log addition of TEMPENDSCORE with the score stored for the end node (ENDSCORE), and the result is stored in ENDSCORE. The processing then returns to step s175, where a similar process is performed for the next lattice point in the last row of lattice points. Once all the lattice points in the last row have been processed in this way, the processing performed in step s107 shown in Fig. 12 ends.

If at step s171 the query is determined not to be text, the processing proceeds to step s185, where the query phoneme loop pointer j is set to the number of phonemes in the query minus mxhops, i.e. Nquery-4. The processing then proceeds to step s187, where the system checks whether the annotation phoneme loop pointer i is less than the number of phonemes in the annotation (Nann). Initially, i is set to zero, and the processing therefore proceeds to step s189, where the system checks whether the query phoneme loop pointer j is less than the number of phonemes in the query (Nquery). The processing then proceeds to step s191, where the system calculates the transition score from lattice point (i, j) to the null end node (ø_e). Then, in step s193, this transition score is added to the cumulative score for the path ending at point (i, j), and the result is copied to the temporary score TEMPENDSCORE. The system then performs a log addition of TEMPENDSCORE and ENDSCORE and stores the result in ENDSCORE. The processing then proceeds to step s197, where j is incremented by one, and returns to step s189. These processing steps are repeated until j has been incremented to equal Nquery. The processing then proceeds to step s199, where j is reset to Nquery-4 and i is incremented by one, and then returns to step s187. The above processing steps are repeated until all the lattice points in the last four rows of the search space have been processed in this way, after which the processing performed in step s107 shown in Fig. 12 ends.
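The accumulation of ENDSCORE described for Fig. 15 can be sketched as below. The names `end_score` and `exit_log_prob` are hypothetical, a single constant stands in for the per-point transition score to the null end node, and clamping the first row at zero for queries shorter than mxhops is an assumption not stated in the text.

```python
import math

def log_add(a, b):
    """Return log(exp(a) + exp(b)); -inf represents probability zero."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))

def end_score(score, n_ann, n_query, query_is_speech,
              exit_log_prob=0.0, mxhops=4):
    """Combine the log scores of all end points into one ENDSCORE."""
    if query_is_speech:
        first_j = max(n_query - mxhops, 0)   # last four rows (assumed clamp)
    else:
        first_j = n_query - 1                # last row only
    total = -math.inf
    for i in range(n_ann):
        for j in range(first_j, n_query):
            temp_end_score = score[i][j] + exit_log_prob  # add in log domain
            total = log_add(total, temp_end_score)
    return total
```

Here `score[i][j]` plays the role of SCORE(i, j) and holds log probabilities.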

Propagate

In step s155 shown in Fig. 14, the system propagates the path ending at lattice point (i, j) using the dynamic programming constraints discussed above. Fig. 16 is a flowchart showing the processing steps involved in performing this propagation step. As shown, in step s211 the system sets the values of two variables, mxi and mxj, and initializes an annotation phoneme loop pointer i2 and a query phoneme loop pointer j2. The loop pointers i2 and j2 are provided to loop through all the lattice points to which the path ending at point (i, j) can propagate, and the variables mxi and mxj are used to ensure that i2 and j2 take only values allowed by the dynamic programming constraints. In particular, mxi is set to i + mxhops, provided this is less than or equal to the number of phonemes in the annotation; otherwise mxi is set to the number of phonemes in the annotation (Nann). Similarly, mxj is set to j + mxhops, provided this is less than or equal to the number of phonemes in the query; otherwise mxj is set to the number of phonemes in the query (Nquery). Finally, in step s211, the system initializes i2 to the current value of the annotation phoneme loop pointer i and j2 to the current value of the query phoneme loop pointer j.

Since the dynamic programming constraints used by the system depend on whether the annotation is text or speech and whether the query is text or speech, the next step is to determine how the annotation and the query were generated. This is performed by decision blocks s213, s215 and s217. If both the annotation and the query were generated from speech, the dynamic programming path ending at lattice point (i, j) can propagate to the other points shown in Fig. 11, and process steps s219 to s235 operate to propagate the path to those points. In particular, in step s219 the system compares the annotation phoneme loop pointer i2 with the variable mxi. Since i2 was set to i and mxi was set to i + mxhops in step s211, the processing proceeds to step s221, where a similar comparison is made for the query phoneme loop pointer j2. The processing then proceeds to step s223, which ensures that the path does not stay at the same lattice point (i, j); since initially i2 equals i and j2 equals j, the processing proceeds to step s225, where the query phoneme loop pointer j2 is incremented by one.

The processing then returns to step s221, where the incremented value of j2 is compared with mxj. If j2 is less than mxj, the processing returns to step s223 and then proceeds to step s227, which is operable to prevent too large a hop along both the sequence of annotation phonemes and the sequence of query phonemes. It does this by ensuring that the path is propagated only if i2 + j2 is less than i + j + mxhops. This ensures that only the triangular set of points shown in Fig. 11 is processed. Provided this condition is met, the processing proceeds to step s229, where the system calculates the transition score (TRANSCORE) from lattice point (i, j) to lattice point (i2, j2). The processing then proceeds to step s231, where the system adds the transition score determined in step s229 to the cumulative score stored for point (i, j) and copies the result to a temporary store, TEMPSCORE. As mentioned above, in this embodiment, if two or more dynamic programming paths meet at the same lattice point, the cumulative scores associated with each of the paths are added together. Therefore, in step s233, the system performs a log addition of TEMPSCORE with the cumulative score already stored for point (i2, j2), and the result is stored in SCORE(i2, j2). The processing then returns to step s225, where j2 is incremented by one, and then to step s221. Once j2 has reached the value of mxj, the processing proceeds to step s235, where j2 is reset to its initial value j and the annotation phoneme loop pointer i2 is incremented by one. The processing then proceeds to step s219, where the processing begins again for the next column of points shown in Fig. 11. Once the path has been propagated from point (i, j) to all the other points shown in Fig. 11, the processing ends.
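The loop bounds and the triangle test used when both the annotation and the query are speech can be sketched as follows. The function name `propagation_targets` is hypothetical, and the sketch only lists the target lattice points; it does not compute transition scores or perform the log addition.

```python
def propagation_targets(i, j, n_ann, n_query, mxhops=4):
    """Points (i2, j2) reached from (i, j) in the speech/speech case."""
    # Step s211: clamp the loop limits to the ends of both sequences.
    mxi = min(i + mxhops, n_ann)
    mxj = min(j + mxhops, n_query)
    targets = []
    for i2 in range(i, mxi):            # step s219: i2 < mxi
        for j2 in range(j, mxj):        # step s221: j2 < mxj
            if (i2, j2) == (i, j):      # step s223: do not stay in place
                continue
            if i2 + j2 < i + j + mxhops:  # step s227: triangle test
                targets.append((i2, j2))
    return targets
```

Away from the edges of the lattice this yields the nine points of Fig. 11; near the end of either sequence the clamped limits reduce the set.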

If decision blocks s213 and s215 determine that the annotation is text and the query is speech, processing proceeds to steps s241 to s251, which propagate the path ending at point (i, j) to the points shown in Fig. 9a. In particular, at step s241 the system determines whether the annotation phoneme loop pointer i points to the last phoneme in the annotation. If it does, there are no more phonemes in the annotation and the processing ends. If the annotation phoneme loop pointer i is less than Nann-1, processing proceeds to step s243, where the query phoneme loop pointer j2 is compared with mxj. Initially, j2 will be less than mxj, so processing proceeds to step s245, where the system calculates the transition score (TRANSCORE) from point (i, j) to point (i+1, j2). This transition score is then added to the cumulative score associated with the path ending at point (i, j), and the result is copied to the temporary score TEMPSCORE. Next, at step s249, the system performs a log addition of TEMPSCORE with the cumulative score associated with point (i+1, j2) and stores the result in SCORE(i+1, j2), which ensures that the scores of paths meeting at lattice point (i+1, j2) are combined. Processing then proceeds to step s251, where the query phoneme loop pointer j2 is incremented by one, and returns to step s243. Once the path ending at point (i, j) has been propagated to the other points shown in Fig. 9a, j2 will equal mxj and the propagation of the path ending at point (i, j) ends.
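The loop of steps s241 to s251 might be sketched as below. This is only an illustration under stated assumptions: scores are held in a dictionary keyed by lattice point, j2 starts at j and the loop runs while j2 is less than mxj, and `trans_score` is a hypothetical callback standing in for the transition score calculation.

```python
import math

LOG_ZERO = -1.0e30  # hypothetical approximation of log(0)


def log_add(a, b):
    """log(exp(a) + exp(b)) in the log domain."""
    if a < b:
        a, b = b, a
    if b <= LOG_ZERO:
        return a
    return a + math.log1p(math.exp(b - a))


def propagate_text_annotation(i, j, mxj, n_ann, score, trans_score):
    """Propagate the path ending at (i, j) to the points (i+1, j2)."""
    if i >= n_ann - 1:                      # s241: last annotation phoneme
        return
    j2 = j                                  # assumed initial value of j2
    while j2 < mxj:                         # s243
        temp = score[(i, j)] + trans_score(i, j, i + 1, j2)   # s245, TEMPSCORE
        previous = score.get((i + 1, j2), LOG_ZERO)
        score[(i + 1, j2)] = log_add(temp, previous)          # s249
        j2 += 1                             # s251
```

Using `LOG_ZERO` as the default for unseen points means the first path to reach a point simply stores its own score.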

If decision blocks s213 and s217 determine that the annotation is speech and the query is text, processing proceeds to steps s255 to s265 shown in Fig. 16b, which propagate the path ending at point (i, j) to the other points shown in Fig. 9b. This is done by first checking, at step s255, that the query phoneme loop pointer j does not point to the last phoneme in the sequence of phonemes representing the query. If it does not, processing proceeds to step s257, where the annotation phoneme loop pointer i2 is compared with mxi. Initially, i2 has the value i and, provided annotation phoneme i is not at the end of the sequence of phonemes representing the annotation, processing proceeds to step s259, where the transition score for moving from point (i, j) to point (i2, j+1) is calculated. Processing then proceeds to step s261, where this transition score is added to the cumulative score for the path ending at point (i, j) and the result is copied to the temporary score TEMPSCORE. Processing then proceeds to step s263, where a log addition of TEMPSCORE with the cumulative score already stored for point (i2, j+1) is performed and the result is stored in SCORE(i2, j+1). Processing then proceeds to step s265, where the annotation phoneme loop pointer i2 is incremented by one, and returns to step s257. These steps are repeated until the path ending at point (i, j) has been propagated to each of the other points shown in Fig. 9b. At that point, the propagation of the path at point (i, j) is complete and the processing ends.

Finally, if decision blocks s213 and s215 determine that both the annotation and the query are text, processing proceeds to steps s271 to s279 shown in Fig. 16b, which propagate the path ending at point (i, j) to point (i+1, j+1), provided there is a further annotation phoneme and a further query phoneme. In particular, at step s271 the system checks that the annotation phoneme loop pointer i does not point to the last annotation phoneme. If it does not, processing proceeds to step s273, where a similar check is made for the query phoneme loop pointer j with respect to the sequence of query phonemes. If there are no more annotation phonemes or no more query phonemes, the processing ends. If, however, there is a further annotation phoneme and a further query phoneme, processing proceeds to step s275, where the system calculates the transition score from point (i, j) to point (i+1, j+1). At step s277, this transition score is added to the cumulative score stored for point (i, j) and the result is stored in the temporary score TEMPSCORE. Processing then proceeds to step s279, where the system performs a log addition of TEMPSCORE with the cumulative score already stored for point (i+1, j+1) and the result is copied to SCORE(i+1, j+1). Those skilled in the art will appreciate that steps s277 and s279 are needed in this embodiment because the dynamic programming constraints allow a path to start at any phoneme within the sequence of phonemes representing the annotation, so point (i+1, j+1) may already have a score associated with it. After step s279, the propagation from point (i, j) is complete and the processing ends.

Transition Score

In steps s103, s105 and s107 shown in Fig. 12, dynamic programming paths are propagated, and during this propagation the transition score from one point to another is calculated in steps s127, s117, s177, s191, s229, s245, s259 and s275. In these steps, the system calculates the appropriate insertion probabilities, deletion probabilities and decoding probabilities relating to the start point and end point of the transition. The way this is done in this embodiment will now be described with reference to Figs. 17 and 18.

In particular, Fig. 17 is a flow diagram showing the general processing steps involved in calculating the transition score for a path propagating from lattice point (i, j) to lattice point (i2, j2). At step s291, the system calculates, for each annotation phoneme that is inserted between point (i, j) and point (i2, j2), the score for inserting that phoneme (i.e. the log of the probability PI( ) discussed above) and stores it in an appropriate store, INSERTSCORE. Processing then proceeds to step s293, where the system performs a similar calculation for each query phoneme that is inserted between point (i, j) and point (i2, j2) and adds the result to INSERTSCORE. Note, however, that if (i, j) is the start null node ø_s or (i2, j2) is the end null node ø_e, the system calculates insertion probabilities for any inserted query phonemes but not for any inserted annotation phonemes (since there is no penalty for starting or ending a path at any of the annotation phonemes). As mentioned above, the scores are log-based probabilities, so adding scores in INSERTSCORE corresponds to multiplying the corresponding insertion probabilities. Processing then proceeds to step s295, where the system calculates the scores for any deletions and/or any decodings in propagating from point (i, j) to point (i2, j2); these scores are added and stored in an appropriate store, DELSCORE. Processing then proceeds to step s297, where the system adds INSERTSCORE and DELSCORE and copies the result to TRANSCORE.
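The insertion part of the transition score (steps s291, s293 and s297) could be sketched as follows. This is an assumption-laden illustration rather than the implementation of Fig. 17: it takes the inserted phonemes to be those lying strictly between the two lattice points, and `log_pi`, `skip_annotation` and the function names are hypothetical.

```python
def insert_score(ann, qry, i, j, i2, j2, log_pi, skip_annotation=False):
    """Sum the log insertion probabilities for a move from (i, j) to (i2, j2).

    ann, qry        : annotation and query phoneme sequences
    log_pi          : dict mapping a phoneme to its log insertion probability
    skip_annotation : True when (i, j) is the start null node or (i2, j2) is
                      the end null node, where inserted annotation phonemes
                      carry no penalty
    """
    total = 0.0
    if not skip_annotation:
        total += sum(log_pi[ann[m]] for m in range(i + 1, i2))   # s291
    total += sum(log_pi[qry[n]] for n in range(j + 1, j2))       # s293
    return total


def transition_score(insertscore, delscore):
    """s297: TRANSCORE is the sum of INSERTSCORE and DELSCORE."""
    return insertscore + delscore
```

Because the scores are logs, the additions here correspond to multiplying the underlying probabilities.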

The processing involved in step s295 to determine the deletion and/or decoding scores in propagating from point (i, j) to point (i2, j2) will now be described in more detail with reference to Fig. 18. Since the possible deletions and decodings depend on whether the annotation was generated from text and whether the query was generated from text, decision blocks s301, s303 and s305 determine whether the annotation is text or speech and whether the query is text or speech. If these decision blocks determine that both the annotation and the query are text, there are no deletions and the two phonemes are decoded by a Boolean match at step s307. If annotation phoneme a_i2 is the same as query phoneme q_j2, processing proceeds to step s309, where TRANSCORE is set to log[1] (i.e. 0), and the processing ends. If, however, annotation phoneme a_i2 is not the same as query phoneme q_j2, processing proceeds to step s311, where TRANSCORE is set to a very large negative number, which is the system's approximation of log[0], and the processing ends.

If decision blocks s301 and s305 determine that the annotation is speech and the query is text, the transition scores are determined using a simplified form of Equation 4 discussed above. In this case, processing passes from step s303 to step s313, where the system determines whether the annotation loop pointer i2 equals the annotation loop pointer i. If it does, the path has propagated from point (i, j) to point (i, j+1). The query phoneme q_{j+1} has therefore been deleted from the sequence of annotation phonemes relative to the sequence of query phonemes. Accordingly, at step s317 the system copies the log probability of deleting phoneme q_{j+1} (i.e. log P(ø | q_{j+1}, C)) to DELSCORE and the processing ends. If at step s313 the system determines that i2 is not equal to i, the system is considering the propagation of the path ending at point (i, j) to one of the points (i+1, j+1), (i+2, j+1) or (i+3, j+1). In this case there are no deletions, only insertions and a decoding between annotation phoneme a_i2 and query phoneme q_{j+1}. Accordingly, at step s315 the system copies the log probability of decoding query phoneme q_{j+1} as annotation phoneme a_i2 (i.e. log P(a_i2 | q_{j+1}, C)) to DELSCORE and the processing ends.

If decision blocks s301 and s305 determine that the annotation is text and the query is speech, the transition scores are determined using the other simplified form of Equation 4 discussed above. In this case, processing passes from step s305 to step s319, where the system determines whether the query phoneme loop pointer j2 equals the query phoneme loop pointer j. If it does, the system is calculating the transition score from point (i, j) to point (i+1, j). In this case, the annotation phoneme a_{i+1} has been deleted from the sequence of query phonemes relative to the sequence of annotation phonemes. Accordingly, at step s321 the system determines the log probability of deleting annotation phoneme a_{i+1} (i.e. log P(ø | a_{i+1}, C)) and copies it to DELSCORE. If at step s319 the system determines that the query phoneme loop pointer j2 is not equal to the query phoneme loop pointer j, the system is determining the transition score from point (i, j) to one of the points (i+1, j+1), (i+1, j+2) or (i+1, j+3). In this case there are no deletions, only insertions and a decoding between annotation phoneme a_{i+1} and query phoneme q_j2. Accordingly, at step s323 the system determines the log probability of decoding annotation phoneme a_{i+1} as query phoneme q_j2 (i.e. log P(q_j2 | a_{i+1}, C)), copies it to DELSCORE, and the processing ends.

If decision blocks s301 and s303 determine that both the annotation and the query are generated from speech, the transition scores are determined using Equation 4 above. In this case, processing passes from step s303 to step s325, where the system determines whether the annotation loop pointer i2 equals the annotation loop pointer i. If it does, processing proceeds to step s327, where a phoneme loop pointer r is initialized to one. The phoneme loop pointer r is used to loop through each possible phoneme known to the system during the calculation of Equation 4. Processing then proceeds to step s329, where the system compares the phoneme pointer r with the number of phonemes known to the system, Nphonemes (43 in this embodiment). Initially r is set to one at step s327, so processing proceeds to step s331, where the system determines the log probability of phoneme p_r occurring (i.e. log P(p_r | C)) and copies it to a temporary score TEMPDELSCORE. If the annotation phoneme loop pointer i2 equals the annotation phoneme loop pointer i, the system is propagating the path ending at point (i, j) to one of the points (i, j+1), (i, j+2) or (i, j+3). There is therefore a phoneme in the query that is not in the annotation. Accordingly, at step s333 the system adds the log probability of deleting phoneme p_r from the annotation (i.e. log P(ø | p_r, C)) to TEMPDELSCORE. Processing then proceeds to step s335, where the system adds the log probability of decoding phoneme p_r as query phoneme q_j2 (i.e. log P(q_j2 | p_r, C)) to TEMPDELSCORE. Processing then proceeds to step s337, where a log addition of TEMPDELSCORE and DELSCORE is performed and the result is stored in DELSCORE. Processing then proceeds to step s339, where the phoneme loop pointer r is incremented by one, and returns to step s329, where similar processing is performed for the next phoneme known to the system. Once this calculation has been performed for each of the 43 phonemes known to the system, the processing ends.

If at step s325 the system determines that i2 and i are not equal, processing proceeds to step s341, where the system determines whether the query phoneme loop pointer j2 equals the query phoneme loop pointer j. If it does, processing proceeds to step s343, where the phoneme loop pointer r is initialized to one. Processing then proceeds to step s345, where the phoneme loop pointer r is compared with the total number of phonemes known to the system (Nphonemes). Initially r is set to one at step s343, so processing proceeds to step s347, where the log probability of phoneme p_r occurring is determined and copied to the temporary store TEMPDELSCORE. Processing then proceeds to step s349, where the system determines the log probability of decoding phoneme p_r as annotation phoneme a_i2 and adds it to TEMPDELSCORE. If the query phoneme loop pointer j2 equals the query phoneme loop pointer j, the system is propagating the path ending at point (i, j) to one of the points (i+1, j), (i+2, j) or (i+3, j). There is therefore a phoneme in the annotation that is not in the query. Accordingly, at step s351 the system determines the log probability of deleting phoneme p_r from the query and adds it to TEMPDELSCORE. Processing then proceeds to step s353, where the system performs a log addition of TEMPDELSCORE and DELSCORE and stores the result in DELSCORE. The phoneme loop pointer r is then incremented by one at step s355 and processing returns to step s345. Once processing steps s347 to s353 have been performed for all the phonemes known to the system, the processing ends.

If at step s341 the system determines that the query phoneme loop pointer j2 is not equal to the query phoneme loop pointer j, processing proceeds to step s357, where the phoneme loop pointer r is initialized to one. Processing then proceeds to step s359, where the system compares the phoneme counter r with the number of phonemes known to the system (Nphonemes). Initially r is set to one at step s357, so processing proceeds to step s361, where the system determines the log probability of phoneme p_r occurring and copies it to the temporary score TEMPDELSCORE. Since the query phoneme loop pointer j2 is not equal to the query phoneme loop pointer j, the system is propagating the path ending at point (i, j) to one of the points (i+1, j+1), (i+1, j+2) and (i+2, j+1). There are therefore no deletions, only insertions and decodings. Processing therefore proceeds to step s363, where the log probability of decoding phoneme p_r as annotation phoneme a_i2 is added to TEMPDELSCORE. Processing then proceeds to step s365, where the log probability of decoding phoneme p_r as query phoneme q_j2 is determined and added to TEMPDELSCORE. Next, at step s367, the system performs a log addition of TEMPDELSCORE and DELSCORE and stores the result in DELSCORE. The phoneme counter r is then incremented by one at step s369 and processing returns to step s359. Once processing steps s361 to s367 have been performed for all the phonemes known to the system, the processing ends.
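The three speech-against-speech cases described above differ only in which terms are added for each phoneme p_r, so they can be sketched with one loop. This is an illustrative sketch, not the flow chart itself: it assumes DELSCORE starts at an approximation of log(0), and the dictionaries `log_prior`, `log_decode` (keyed by decoded phoneme and model phoneme) and `log_delete` are hypothetical containers for the log probabilities.

```python
import math

LOG_ZERO = -1.0e30  # hypothetical approximation of log(0)


def log_add(a, b):
    """log(exp(a) + exp(b)) in the log domain."""
    if a < b:
        a, b = b, a
    if b <= LOG_ZERO:
        return a
    return a + math.log1p(math.exp(b - a))


def del_score_speech_speech(a_i2, q_j2, same_i, same_j,
                            phonemes, log_prior, log_decode, log_delete):
    """Sum over all model phonemes p_r, accumulated with log addition.

    same_i : True when i2 == i (no annotation phoneme consumed, so p_r is
             deleted from the annotation)
    same_j : True when j2 == j (no query phoneme consumed, so p_r is
             deleted from the query)
    """
    delscore = LOG_ZERO
    for p in phonemes:
        temp = log_prior[p]                                   # log P(p_r | C)
        temp += log_delete[p] if same_i else log_decode[(a_i2, p)]
        temp += log_delete[p] if same_j else log_decode[(q_j2, p)]
        delscore = log_add(temp, delscore)                    # TEMPDELSCORE into DELSCORE
    return delscore
```

With two model phonemes of equal prior 0.5 and decoding probabilities P(a|a) = 0.8 and P(a|b) = 0.2, matching annotation phoneme a against query phoneme a gives 0.5 × 0.8 × 0.8 + 0.5 × 0.2 × 0.2 = 0.34 in the probability domain.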

Normalization

The above description of the dynamic programming process has dealt only with the numerator of Equation 3 above. Therefore, after an input query has been matched with a sequence of annotation phonemes in the database, the score for that match (which is stored in ENDSCORE) must be normalized by the normalizing term defined by the denominator of Equation 3. As mentioned above, the denominator is calculated at the same time as the numerator, i.e. within the dynamic programming routine described above. This is possible because, as a comparison of the numerator and the denominator shows, the terms required for the denominator are all calculated for the numerator. Note, however, that when the annotation or the query is generated from text, no normalization is performed. In this embodiment, normalization is performed so that longer annotations are not given more weight than shorter annotations, and so that annotations containing common phonemes are not given more weight than annotations containing uncommon phonemes. This is done by normalizing the score by a term that depends on how well the annotation matches the underlying model.

Training

In the above embodiment, the system used 1892 decoding/deletion probabilities and 43 insertion probabilities (referred to above as the confusion statistics) to score the dynamic programming paths in the phoneme matching operation. In this embodiment, these probabilities are determined in advance during a training session and stored in a memory (not shown). In particular, during this training session a speech recognition system is used to provide a phoneme decoding of speech in two ways. In the first way, the speech recognition system is provided with both the speech and the actual words that are spoken. The speech recognition unit can therefore use this information to generate the reference phoneme sequence of the spoken words and so obtain an ideal decoding of the speech. The speech recognition system is then used to decode the same speech, but this time without knowledge of the actual words spoken (referred to hereinafter as free decoding). The phoneme sequence generated by the free decoding will differ from the reference phoneme sequence in the following ways.

i) The free decoding may make mistakes, inserting phonemes into the decoding that are not present in the reference sequence or, alternatively, omitting phonemes from the decoding that are present in the reference sequence.

ii) One phoneme may be confused with another.

iii) Even if the speech recognition system decodes the speech perfectly, the free decoding may still differ from the reference decoding because of differences between conversational pronunciation and reference pronunciation. For example, in conversational speech the word "and" (whose reference forms are /ae/ /n/ /d/ and /ax/ /n/ /d/) is frequently reduced to /ax/ /n/ or even /n/.

Therefore, if a variety of utterances are decoded into their reference forms and their free-decoded forms, a dynamic programming method can be used to align the two. This provides counts of what was decoded, d, when the phoneme should have been p. From these training results, the above decoding, deletion and insertion probabilities can be approximated in the following way.

The probability that phoneme d is an insertion is given by:

PI(d | C) = I_d / n_o^d

where I_d is the number of times the speech recognition system inserted phoneme d and n_o^d is the total number of decoded phonemes that are inserted relative to the reference sequence.

The probability of decoding phoneme p as phoneme d is given by:

P(d | p, C) = C_dp / n_p

where C_dp is the number of times the automatic speech recognition system decoded d when it should have been p, and n_p is the number of times the automatic speech recognition system decoded anything (including a deletion) when it should have been p.

The probability of not decoding anything (i.e. a deletion) when phoneme p should have been decoded is given by:

P(ø | p, C) = O_p / n_p

where O_p is the number of times the automatic speech recognition system decoded nothing when it should have decoded p, and n_p is the same as above.
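As a rough sketch of how the counts described above turn into probabilities, the following assumes the alignment has already produced three count tables; the table layout and the function name `estimate_probabilities` are hypothetical, and every count total is assumed to be non-zero.

```python
def estimate_probabilities(insert_counts, decode_counts, delete_counts):
    """Estimate insertion, decoding and deletion probabilities from counts.

    insert_counts : {d: I_d}       times phoneme d was inserted
    decode_counts : {(d, p): C_dp} times d was decoded when it should be p
    delete_counts : {p: O_p}       times nothing was decoded for p
    """
    # Total number of inserted phonemes across all d.
    n_inserted = sum(insert_counts.values())
    p_insert = {d: c / n_inserted for d, c in insert_counts.items()}

    # n_p: times anything (including a deletion) was decoded for p.
    n_p = {}
    for (_, p), c in decode_counts.items():
        n_p[p] = n_p.get(p, 0) + c
    for p, c in delete_counts.items():
        n_p[p] = n_p.get(p, 0) + c

    p_decode = {(d, p): c / n_p[p] for (d, p), c in decode_counts.items()}
    p_delete = {p: c / n_p[p] for p, c in delete_counts.items()}
    return p_insert, p_decode, p_delete
```

For each reference phoneme p, the decoding probabilities over all d plus the deletion probability sum to one, which is a quick sanity check on a trained table.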

Second Embodiment

In the first embodiment, a single input query was compared with a number of stored annotations. In this embodiment, two input speech queries are compared with a stored annotation. This embodiment is suited to applications where the input queries are made in a noisy environment or where high accuracy is required. It is clearly not suited to situations where any of the queries are text, since a text query would make the other queries redundant. The system should therefore be able to deal with the following two situations.

(i) Both input queries are generated from speech and the annotation is generated from speech.

(ii) Both input queries are generated from speech and the annotation is generated from text.

This embodiment uses a dynamic programming algorithm similar to the one used in the first embodiment, except that it is adapted to match the two queries with the annotation simultaneously. Fig. 19 shows a three-dimensional coordinate plot with one dimension provided for each of the two queries and the remaining dimension provided for the annotation. Fig. 19 also shows the three-dimensional lattice of points that is processed by the dynamic programming algorithm of this embodiment. The algorithm uses the same transition scores, dynamic programming constraints and confusion statistics (i.e. the phoneme probabilities) that were used in the first embodiment to propagate and score each of the paths through the three-dimensional network of lattice points in the plot shown in Fig. 19.

This three-dimensional dynamic programming process will now be described in more detail. Those skilled in the art will appreciate, from a comparison of Figs. 20 to 25 with Figs. 13 to 18, that the three-dimensional dynamic programming algorithm is essentially the same as the two-dimensional algorithm used in the first embodiment, except that a few additional control loops are needed to take the additional query into account. The three-dimensional dynamic programming algorithm compares the two queries with the annotation following all of the steps shown in Fig. 12. Fig. 20 shows in more detail the processing steps involved in step s103, in which dynamic programming paths are propagated from the null start node ø_s to all possible start points defined by the dynamic programming constraints. Here the constraints are that a dynamic programming path can start at any one of the annotation phonemes and at any one of the first four phonemes in either query. Therefore, referring to Fig. 20, at step s401 the system sets the values of the variables mxj and mxk to mxhops, which is the same as the constraint used in the first embodiment. Thus, in this embodiment, mxj and mxk are both set to four, provided each input query contains four or more phonemes. Otherwise, mxj and/or mxk are set to the number of phonemes in the corresponding query. Processing then proceeds to steps s403 to s417, which start dynamic programming paths at the points (i, j, k) for i = 0 to Nann-1, j = 0 to 3 and k = 0 to 3. This completes the processing in step s103 shown in Fig. 12, and processing then proceeds to step s105, where these dynamic programming paths are propagated to the end points.
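The set of start points implied by these constraints can be enumerated in a few lines. This is a minimal sketch, assuming zero-based phoneme indices; the function name `start_points` is hypothetical and the scoring of each start transition is left out.

```python
from itertools import product


def start_points(n_ann, n_query1, n_query2, mxhops=4):
    """List the lattice points (i, j, k) at which a path may start.

    A path may start at any annotation phoneme and at any of the first
    mxhops phonemes of each query, or fewer if a query is shorter.
    """
    mxj = min(mxhops, n_query1)   # s401, capped by the first query length
    mxk = min(mxhops, n_query2)   # s401, capped by the second query length
    return list(product(range(n_ann), range(mxj), range(mxk)))


# An annotation of 5 phonemes with queries of 6 and 2 phonemes gives
# 5 * 4 * 2 = 40 start points.
points = start_points(5, 6, 2)
```

Capping mxj and mxk by the query lengths reflects the rule that a short query limits the number of allowed start positions.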

As in the first embodiment, in this embodiment the system propagates the dynamic programming paths from the start points to the end points by processing the points in the search space in a raster-like manner. The control algorithm used to control this raster processing operation is shown in Figure 21. As can be seen by comparing Figure 21 with Figure 14, this control algorithm has the same general form as the control algorithm used in the first embodiment. The only differences are the more complex propagation step s419 and the provision of query block s421, block s423 and block s425, which are needed to process the additional points caused by the second input query. For a fuller understanding of how the control algorithm shown in Figure 21 operates, the reader is referred to the description given above of Figure 14.

Figure 22 shows in more detail the processing steps used in this embodiment in step s107 shown in Figure 12, when propagating the paths at the end points to the end null node ø_e. As can be seen by comparing Figure 22 with Figure 15, the processing steps involved in step s107 in this embodiment are similar to the corresponding steps used in the first embodiment. The differences are the more complex transition score calculation block s443, and the additional blocks (s439, s441 and s449) and variable (k) required to process the additional lattice points due to the second query. Therefore, to understand the processing involved in steps s431 to s449, the reader is referred to the description given above of Figure 15.

Figure 23 shows the processing steps involved in the propagation step s419 shown in Figure 21. Figure 16 shows the corresponding flow chart for the two-dimensional embodiments described above. As can be seen from a comparison of Figure 23 with Figure 16, the main differences between the two embodiments are the additional variables (mxk and k2) and the processing blocks (s451, s453, s455 and s457) required to process the additional lattice points due to the second query. Figure 23 is also slightly simpler: because both queries must be speech, there are only two main branches, one for when the annotation is text and the other for when the annotation is speech. For a better understanding of the processing steps involved in the flow chart shown in Figure 23, the reader is referred to the description of Figure 16.

Figure 24 is a flow chart illustrating the processing steps involved in calculating the transition score when a dynamic programming path is propagated from point (i, j, k) to point (i2, j2, k2) during the processing steps of Figure 23. Figure 17 shows the corresponding flow chart for the two-dimensional embodiments described above. As can be seen by comparing Figure 24 with Figure 17, the main difference between this embodiment and the first embodiment is the additional processing step s461, which calculates the insertion probabilities for phonemes inserted in the second query. Therefore, for a better understanding of the processing steps involved in the flow chart shown in Figure 24, the reader is referred to the description of Figure 17.

The processing steps involved in step s463 of Figure 24, which determines the deletion and/or decoding scores when propagating from point (i, j, k) to point (i2, j2, k2), will now be described in more detail with reference to Figure 25. Since the possible deletions and decodings depend on whether the annotation was generated from text or from speech, decision block s501 determines whether the annotation is text or speech. If the annotation was generated from text, then the phoneme loop pointer i2 must point to annotation phoneme a_{i+1}. The processing then proceeds to steps s503, s505 and s507, which are operable to determine whether there are any phoneme deletions in the first and second queries relative to the annotation. If there are, then j2 and/or k2 will equal j or k, respectively.

- If j2 is not equal to j and k2 is not equal to k, then there are no deletions in the queries relative to the annotation, and the processing proceeds to step s509, where the log probability of decoding annotation phoneme a_{i+1} as first query phoneme q¹_{j2} is copied to DELSCORE. The processing then proceeds to step s511, where the log probability of decoding annotation phoneme a_{i+1} as second query phoneme q²_{k2} is added to DELSCORE.

- If the system determines that j2 is not equal to j and that k2 is equal to k, then the processing proceeds to steps s513 and s515, where the log probability of deleting annotation phoneme a_{i+1} from the second query is determined and copied to DELSCORE and the log probability of decoding annotation phoneme a_{i+1} as first query phoneme q¹_{j2} is added to DELSCORE, respectively.

- If the system determines that j2 is equal to j and that k2 is equal to k, then the processing proceeds to steps s517 and s519, where the system determines the log probability of deleting annotation phoneme a_{i+1} from both the first and second queries and stores the result in DELSCORE.

- If the system determines that j2 is equal to j and that k2 is not equal to k, then the processing proceeds to steps s521 and s523, which are operable, respectively, to copy the log probability of deleting annotation phoneme a_{i+1} from the first query to DELSCORE and to add the log probability of decoding annotation phoneme a_{i+1} as second query phoneme q²_{k2} to DELSCORE.
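The four text-annotation cases listed above can be condensed into a short sketch. The function name, the table names `log_decode` and `log_delete`, and the treatment of a deletion from both queries as the sum of two single-query deletion terms are all assumptions made for illustration; the flow chart of Figure 25 is the authoritative description.

```python
import math


def text_annotation_delscore(a_next, q1, q2, j, j2, k, k2, log_decode, log_delete):
    """Sketch of steps s503 to s523 for an annotation generated from text.

    a_next     : annotation phoneme a_{i+1}
    q1, q2     : phoneme lists of the first and second query
    log_decode : dict, (annotation phoneme, query phoneme) -> log probability
    log_delete : dict, annotation phoneme -> log probability of its deletion
                 from a query (assumed identical for both queries)
    """
    q1_deleted = (j2 == j)   # no advance in query 1 means a deletion there
    q2_deleted = (k2 == k)   # no advance in query 2 means a deletion there
    delscore = 0.0
    delscore += log_delete[a_next] if q1_deleted else log_decode[(a_next, q1[j2])]
    delscore += log_delete[a_next] if q2_deleted else log_decode[(a_next, q2[k2])]
    return delscore


# Toy tables (hypothetical values, not taken from any confusion statistics).
log_decode = {("a", "a"): math.log(0.8), ("a", "b"): math.log(0.1)}
log_delete = {"a": math.log(0.05)}

# No deletions: decode "a" as q1[1] = "a" and as q2[1] = "b".
score = text_annotation_delscore("a", ["x", "a"], ["x", "b"], 0, 1, 0, 1,
                                 log_decode, log_delete)
print(math.exp(score))
```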

If at step s501 the system determines that the annotation was generated from speech, then the system determines (in steps s525 to s537) whether there are any phoneme deletions from the annotation or from the two queries by comparing i2, j2 and k2 with i, j and k, respectively. As shown in Figures 25b to 25e, when the annotation is generated from speech there are eight main branches, which operate to determine the appropriate decoding and deletion probabilities for the eight possible situations. Since the processing performed in each situation is very similar, only one of the situations will be described.

In particular, if at steps s525, s527 and s531 the system determines that there has been a deletion from the annotation (because i2 = i) and that there are no deletions from the two queries (because j2 ≠ j and k2 ≠ k), then the processing proceeds to step s541, where a phoneme loop pointer r is initialized to one. The phoneme loop pointer r is used to loop through each possible phoneme known to the system during the calculation of an equation similar to Equation 4 described above in the first embodiment. The processing then proceeds to step s543, where the phoneme pointer r is compared with the number of phonemes known to the system, Nphonemes (which in this embodiment equals 43). Initially, r is set to one in step s541. Therefore, the processing proceeds to step s545, where the system determines the log probability of phoneme p_r occurring and copies it to a temporary score TEMPDELSCORE. The processing then proceeds to step s547, where the system determines the log probability of deleting phoneme p_r in the annotation and adds it to TEMPDELSCORE. The processing then proceeds to step s549, where the system determines the log probability of decoding phoneme p_r as first query phoneme q¹_{j2} and adds it to TEMPDELSCORE. The processing then proceeds to step s551, where the system determines the log probability of decoding phoneme p_r as second query phoneme q²_{k2} and adds it to TEMPDELSCORE. The processing then proceeds to step s553, where the system performs the log addition of TEMPDELSCORE and DELSCORE and stores the result in DELSCORE. The processing then proceeds to step s555, where the phoneme pointer r is incremented by one. The processing then returns to step s543, and similar processing is performed for the next phoneme known to the system. Once this calculation has been performed for each of the 43 phonemes known to the system, the processing ends.
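The loop over steps s541 to s555 can be sketched as below. The helper `log_add`, the table names and the initial value of DELSCORE (taken here as log 0, i.e. minus infinity) are assumptions for illustration only; the text above specifies the order of the steps but not these details.

```python
import math


def log_add(x, y):
    """Return log(exp(x) + exp(y)) while staying in the log domain."""
    if x == -math.inf:
        return y
    if y == -math.inf:
        return x
    hi, lo = max(x, y), min(x, y)
    return hi + math.log1p(math.exp(lo - hi))


def annotation_deletion_score(q1_phone, q2_phone, phonemes,
                              log_prior, log_delete, log_decode):
    """Branch for i2 == i, j2 != j, k2 != k (annotation generated from speech).

    log_prior[p]       : log probability of phoneme p occurring
    log_delete[p]      : log probability of p being deleted in the annotation
    log_decode[(p, q)] : log probability of decoding p as query phoneme q
    """
    delscore = -math.inf                      # assumed initial DELSCORE, log(0)
    for p in phonemes:                        # s541, s543, s555: loop over r
        temp = log_prior[p]                   # s545
        temp += log_delete[p]                 # s547
        temp += log_decode[(p, q1_phone)]     # s549
        temp += log_decode[(p, q2_phone)]     # s551
        delscore = log_add(temp, delscore)    # s553
    return delscore
```

Accumulating with `log_add` rather than exponentiating each term keeps the sum numerically stable when the individual probabilities are very small.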

As can be seen from a comparison of the processing steps performed in Figure 25 with the steps performed in Figure 18, the term calculated within the dynamic programming algorithm for decodings and deletions is similar to Equation 4 but has an additional probability term for the second query. Specifically, the term is as follows.
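The equation itself is not reproduced in this text. Going by the steps of Figure 25 described above (a phoneme prior, an annotation term and one decoding term per query, summed over all phonemes known to the system), a plausible form is sketched below. The symbols a_i, q¹_j, q²_k, N_p and the conditioning on the confusion statistics C are assumptions about the notation, not the original typesetting.

```latex
\sum_{r=1}^{N_p} P(a_i \mid p_r, C)\; P(q^{1}_{j} \mid p_r, C)\; P(q^{2}_{k} \mid p_r, C)\; P(p_r \mid C)
```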

This is as expected, since the two queries are conditionally independent of each other.

After all the dynamic programming paths have been propagated to the end node ø_e, the total score for the alignment is normalized with the same normalization term (given in Equation 5 above) as was calculated in the first embodiment. This is because the normalization term depends only on the similarity of the annotation to the model. Once the two queries have been matched against all the annotations, the normalized scores for the annotations are ranked, and on the basis of this ranking the system outputs to the user the annotation or annotations most similar to the input queries.

In the second embodiment described above, two input queries were compared with the stored annotations. Those skilled in the art will appreciate that the algorithm can be adapted for any number of input queries. As demonstrated for the two-query case, adding a further query simply involves adding further loops to the algorithm to account for the additional query. However, in an embodiment in which three or more input queries are compared with the stored annotations, it may be necessary to use a dynamic programming routine that employs pruning in order to meet speed or memory constraints. In this case, rather than adding together all the probabilities for all the paths, only the best score for paths that meet is propagated and poorly scoring paths are terminated.
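The difference between summing over meeting paths and keeping only the best one can be illustrated with a minimal sketch. The function names and the `beam_width` threshold are hypothetical; the text above says only that poorly scoring paths are terminated, not how "poorly scoring" is decided.

```python
import math


def combine_meeting_paths(log_scores, prune=False):
    """Combine the log scores of paths that meet at one lattice point.

    prune=False : log of the summed path probabilities (all paths kept)
    prune=True  : only the best path score is propagated
    """
    best = max(log_scores)
    if prune:
        return best
    return best + math.log(sum(math.exp(s - best) for s in log_scores))


def within_beam(log_score, best_log_score, beam_width):
    """Assumed pruning rule: keep a path only if it is close to the best one."""
    return log_score >= best_log_score - beam_width


scores = [math.log(0.2), math.log(0.3)]
print(math.exp(combine_meeting_paths(scores)))              # sum of both paths
print(math.exp(combine_meeting_paths(scores, prune=True)))  # best path only
```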

Other Embodiments

Those skilled in the art will appreciate that the above technique for matching one sequence of phonemes with another sequence of phonemes can be applied to applications other than data retrieval. Those skilled in the art will also appreciate that although the system described above used phonemes in the phoneme and word lattice, other phoneme-like units can be used, such as syllables or katakana (Japanese alphabet).

Those skilled in the art will appreciate that the above description of the dynamic programming matching and alignment of two sequences of phonemes was given by way of example only and that various modifications can be made. For example, although a raster scanning technique was employed for propagating the paths through the lattice points, other techniques could be employed that progressively propagate the paths through the lattice points. Those skilled in the art will also appreciate that dynamic programming constraints other than those discussed above may be used to control the matching process.

In the above embodiment, the annotation was generally longer than the query and the dynamic programming algorithm aligned the query with the entire annotation. In an alternative embodiment, the alignment algorithm may compare the query with the annotation by stepping the query over the annotation from beginning to end and, at each step, comparing the query with a portion of the annotation of approximately the same size as the query. In such an embodiment, at each step, the query is aligned with the corresponding portion of the annotation using a dynamic programming technique similar to the one described above. This technique is illustrated in Figure 26a, and Figure 26b shows the resulting plot of how the dynamic programming score for the alignment between the query and the current portion of the annotation varies as the query is stepped over the annotation. The peaks in the plot shown in Figure 26b represent the parts of the annotation that match best with the query. The annotation most similar to the query can then be determined by comparing the peak DP score obtained during the comparison of the query with each annotation.
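A minimal sketch of this stepping scheme is given below. To keep the example self-contained, the per-window alignment score is a plain negated edit distance, which is a stand-in chosen for illustration and not the probabilistic dynamic programming score described in this document; the function names and the phoneme strings are likewise hypothetical.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance between two phoneme lists."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]


def stepped_scores(query, annotation, step=1):
    """Score the query against each query-sized window of the annotation."""
    width = len(query)
    last = max(len(annotation) - width, 0)
    return [(start, -edit_distance(query, annotation[start:start + width]))
            for start in range(0, last + 1, step)]


annotation = "p ih k ch ax f ao t ow".split()
query = "ch ax f".split()
scores = stepped_scores(query, annotation)
best_start, best_score = max(scores, key=lambda s: s[1])
print(best_start, best_score)  # the peak marks the best-matching window
```

The peak score per annotation (here `best_score`) is the quantity that would be compared across annotations to pick the closest one.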

In the above embodiment, pictures were annotated using the phoneme and word lattice annotation data. Those skilled in the art will appreciate that this phoneme and word lattice data can be used to annotate many different types of data file. For example, this kind of annotation data can be used in medical applications for annotating x-rays of patients and 3D videos of, for example, NMR scans and ultrasound scans.

In the above embodiment, a speech recognition system that generates a sequence of phonemes from the input speech signal was used. Those skilled in the art will appreciate that other types of speech recognition system could be used instead, for example one that generates a sequence of output words or a word lattice, which can then be decomposed into a corresponding string of phonemes in order to simulate a recognizer that produces phoneme strings.

In the above embodiment, the insertion, deletion and decoding probabilities were calculated from the confusion statistics for the speech recognition system using a maximum likelihood estimate of the probabilities. Those skilled in the art will appreciate that other techniques, such as maximum entropy techniques, can be used to estimate these probabilities. Details of a suitable maximum entropy technique can be found at pages 45 to 52 of the book entitled "Maximum Entropy and Bayesian Methods", written by John Skilling and published by Kluwer Academic Publishers, the contents of which are incorporated herein by reference.

In the above embodiment, the database 29 and the automatic speech recognition unit 51 were both located within the user terminal 59. Those skilled in the art will appreciate that this is not essential. Figure 27 illustrates an embodiment in which the database 29 and the search engine 53 are located in a remote server 60 and in which the user terminal 59 accesses the database 29 via the network interface units 67 and 69 and a data network 68 (such as the Internet). In this embodiment, the user terminal 59 can only receive voice queries from the microphone 7. These queries are converted into phoneme and word data by the automatic speech recognition unit 51. This data is then passed to the control unit 55, which controls the transmission of the data over the data network 68 to the search engine 53 located within the remote server 60. The search engine 53 then carries out the search in a manner similar to the way in which the search was performed in the above embodiment. The results of the search are then transmitted back from the search engine 53 to the control unit 55 via the data network 68. The control unit 55 then considers the search results received back from the network and displays appropriate data on the display 57 for viewing by the user 39.

In addition to locating the database 29 and the search engine 53 in the remote server 60, it is also possible to locate the automatic speech recognition unit 51 in the remote server 60. Such an embodiment is shown in Figure 28. As shown, in this embodiment the input voice query from the user is passed via input line 61 to a speech encoding unit 73, which is operable to encode the speech for efficient transfer through the data network 68. The encoded data is then passed to the control unit 55, which transmits the data over the network 68 to the remote server 60, where it is processed by the automatic speech recognition unit 51. The phoneme and word data generated by the speech recognition unit 51 for the input query is then passed to the search engine 53 for use in searching the database 29. The search results generated by the search engine 53 are then passed, via the network interface 69 and the network 68, back to the user terminal 59. The search results received back from the remote server are passed via the network interface unit 67 to the control unit 55, which analyzes the results and displays appropriate data on the display 57 for viewing by the user 39.

In a similar manner, a user terminal 59 may be provided that only allows typed inputs from the user and that has the search engine and the database located in the remote server. In such an embodiment, the phonetic transcription unit 75 may be located in the remote server 60 as well.

In the above embodiments, a dynamic programming algorithm was used to align the sequence of query phonemes with the sequence of annotation phonemes. Those skilled in the art will appreciate that any alignment technique could be used. For example, a naive technique that identifies all possible alignments could be used. However, dynamic programming is preferred because it is easy to implement using standard processing hardware.

A description has been given above of the way in which two or more canonical sequences of phonemes are compared using a dynamic programming technique. However, as shown in Figures 2 and 3, the annotations are preferably stored as lattices. Those skilled in the art will appreciate that, in order for the above comparison techniques to work with these lattices, the phoneme sequences defined by the lattices must be "flattened" into a single sequence of phonemes with no branches. A naive approach would be to identify all the different possible phoneme sequences defined by the lattice and then to compare each of them with each query sequence. However, this is not preferred, since common parts of the lattice would be matched several times with each query sequence. Therefore, the lattice is preferably flattened by sequentially labeling each phoneme within the lattice in accordance with the time stamp information available for each phoneme within the lattice. Then, during the dynamic programming alignment, different dynamic programming constraints are used at each DP lattice point, in order to ensure that the paths propagate in accordance with the lattice structure.

The table below shows the DP constraints used for part of the phoneme lattice shown in Figure 2. In particular, the first column shows the phoneme number (p₁ to p₉) assigned to each phoneme in the lattice. The middle column gives the corresponding actual phoneme in the lattice. The last column shows, for each phoneme, the phonemes to which a path ending at that phoneme is allowed to propagate at the next dynamic programming time point. Although not shown, the middle column will also include details of the node to which the phoneme is connected and the corresponding phoneme link.

Phoneme number   Phoneme   Dynamic programming constraints
p₁               /p/       p₁; p₂; p₃; p₄
p₂               /ih/      p₂; p₃; p₄; p₅
p₃               /k/       p₃; p₄; p₅; p₆; p₇; p₈
p₄               /ch/      p₄; p₅; p₆; p₇; p₈; p₉; p₁₀; p₁₁
p₅               /ax/      p₅; p₆; p₇; p₈; p₉; p₁₀; p₁₁; p₁₂; p₁₃; p₁₄
p₆               /ax/      p₆; p₁₀; p₁₂; p₁₅; p₁₆
p₇               /ao/      p₇; p₉; p₁₂; p₁₅; p₁₆
p₈               /ah/      p₈; p₁₁; p₁₃; p₁₄; p₁₇; p₁₈
p₉               /f/       p₉; p₁₂; p₁₅; p₁₆; p₁₈

For example, if a dynamic programming path ends at time-ordered phoneme p₄, then that path can stay at phoneme p₄ or it can propagate to any of the time-ordered phonemes p₅ to p₁₁. As can be seen from the table, at some points the phonemes to which a path can extend are not consecutively arranged in the time-ordered phoneme sequence. For example, a dynamic programming path ending at time-ordered phoneme p₆ can either stay at that phoneme or progress to phoneme p₁₀, p₁₂, p₁₅ or p₁₆. By consecutively numbering the phonemes in the lattice in this way, and by varying the dynamic programming constraints used in dependence upon the lattice, an efficient dynamic programming match between the input query and the annotation lattice can be achieved. Those skilled in the art will also appreciate that if the input query also generates a lattice, it can be flattened in a similar manner and the dynamic programming constraints adjusted accordingly.
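One simple way to hold such lattice-dependent constraints is a mapping from each time-ordered phoneme number to the numbers it may propagate to. The sketch below uses the constraint sets from the table above; the dictionary layout and the function names are assumptions made for illustration.

```python
# Allowed successors for the time-ordered phonemes p1..p9 (from the table).
DP_CONSTRAINTS = {
    1: [1, 2, 3, 4],
    2: [2, 3, 4, 5],
    3: [3, 4, 5, 6, 7, 8],
    4: [4, 5, 6, 7, 8, 9, 10, 11],
    5: [5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
    6: [6, 10, 12, 15, 16],
    7: [7, 9, 12, 15, 16],
    8: [8, 11, 13, 14, 17, 18],
    9: [9, 12, 15, 16, 18],
}


def successors(phoneme_number):
    """Phonemes that a path ending at the given phoneme may move to."""
    return DP_CONSTRAINTS[phoneme_number]


def can_propagate(src, dst):
    """True if a path ending at phoneme src may propagate to phoneme dst."""
    return dst in DP_CONSTRAINTS.get(src, ())


print(successors(6))        # non-consecutive successors of p6
print(can_propagate(6, 7))  # p7 lies on a different branch of the lattice
```

During the alignment, the propagation step would consult `successors` for the current annotation phoneme instead of a fixed range of hops.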

In the above embodiment, the same phoneme confusion probabilities were used for both the annotations and the queries. Those skilled in the art will appreciate that if different recognition systems are used to generate them, then different phoneme confusion probabilities should be used for the annotations and the queries, since these confusion probabilities depend on the recognition system that was used to generate the phoneme sequences.

In the above embodiment, when either the annotation or the query was generated from text, it was assumed that the canonical sequence of phonemes corresponding to the typed text was correct. This may not be the case, since it assumes that the typed word or words are not mis-spelled or mis-typed. Therefore, in an alternative embodiment, confusion probabilities may also be used for typed queries and/or annotations. In other words, Equations 4 and 12 would be used even where the annotation, the queries or both are text. The confusion probabilities used may account for mis-spellings, for mis-typings or for both. Those skilled in the art will appreciate that the confusion probabilities for mis-typings will depend on the type of keyboard used. In particular, the confusion probabilities of mis-typing a word will depend on the layout of the keyboard. For example, if the letter "d" is typed, then the keys surrounding the "d" key will have high mis-typing probabilities, whereas keys located further away from the "d" key will have lower mis-typing probabilities. As mentioned above, these mis-typing probabilities may be used together with, or replaced by, confusion probabilities for the mis-spelling of words. These mis-spelling probabilities may be determined by analyzing typed documents from a large number of different users and monitoring the types of mis-spelling that commonly occur. Such mis-spelling probabilities may also take into account transcription errors caused by mis-keying. In such an embodiment, the dynamic programming constraints used should allow for insertions and/or deletions in the typed input. For example, the constraints illustrated in Figure 11 could be used.
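As a rough illustration of layout-dependent mis-typing probabilities, the sketch below assigns each wrong key a probability that decays with its distance from the intended key on a QWERTY layout. The exponential decay model, the key coordinates and the parameters `p_correct` and `decay` are all assumptions; in practice such probabilities would be estimated from observed typing errors, as described above.

```python
import math

QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

# Approximate key centres: each lower row is shifted right by half a key.
KEY_POS = {ch: (row, col + 0.5 * row)
           for row, keys in enumerate(QWERTY_ROWS)
           for col, ch in enumerate(keys)}


def mistyping_probs(intended, p_correct=0.9, decay=1.5):
    """Return P(typed key | intended key) under a distance-decay model."""
    r0, c0 = KEY_POS[intended]
    weights = {ch: math.exp(-decay * math.hypot(r - r0, c - c0))
               for ch, (r, c) in KEY_POS.items() if ch != intended}
    total = sum(weights.values())
    probs = {ch: (1.0 - p_correct) * w / total for ch, w in weights.items()}
    probs[intended] = p_correct
    return probs


probs = mistyping_probs("d")
print(probs["s"] > probs["p"])  # a neighbouring key is a likelier mis-typing
```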

A further alternative is where the text is input via a keyboard that assigns more than one character to each key (such as the keypad of a mobile telephone), where the user must repeatedly press each key to cycle through the characters assigned to that key. In such an embodiment, the confusion probabilities would be adjusted so that characters assigned to the same key as the input character have higher mis-typing confusion probabilities than characters associated with other keys. This is because, as anyone who has used a mobile telephone to send a text message will appreciate, mis-typings often occur because the key has not been pressed the correct number of times to input the desired character.

In the above embodiments, the control unit calculated the decoding scores for each transition using equation (4) or (12) above. Instead of summing over all possible phonemes known to the system in accordance with these equations, the control unit may be arranged to identify the unknown phoneme p_r that maximizes the probability term within the summation, and to use this maximum probability as the probability of decoding the corresponding phonemes of the annotation and the query. However, this is not preferred, since it involves additional computation to determine which phoneme p_r maximizes the probability term within the summation.
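The difference between the summed score and the maximum-term alternative can be shown with a small Python sketch. It assumes the summation has the form given in the claims, that is, a sum over set phonemes p_r of P(q_j | p_r) P(a_i | p_r) P(p_r); the three-phoneme tables and all names are toy values invented for illustration.

```python
# Toy model: three "phonemes" with invented prior and confusion tables.
# CONFUSION[p][x] stands for P(x | p), the probability of decoding x
# when the underlying phoneme is p. The same table is used for the query
# and the annotation here purely to keep the example short.
PRIOR = {"a": 0.5, "b": 0.3, "c": 0.2}
CONFUSION = {
    "a": {"a": 0.8, "b": 0.1, "c": 0.1},
    "b": {"a": 0.2, "b": 0.7, "c": 0.1},
    "c": {"a": 0.1, "b": 0.1, "c": 0.8},
}


def terms(q, a, conf_q=CONFUSION, conf_a=CONFUSION, prior=PRIOR):
    """Yield P(q | p) * P(a | p) * P(p) for every set phoneme p."""
    for p in prior:
        yield conf_q[p][q] * conf_a[p][a] * prior[p]


def score_sum(q, a):
    """Score obtained by summing over all set phonemes."""
    return sum(terms(q, a))


def score_max(q, a):
    """Alternative score using only the largest term of the sum."""
    return max(terms(q, a))
```

Because every term is non-negative, `score_max` can never exceed `score_sum`; for the toy tables, aligning "a" with "a" gives roughly 0.334 for the sum and 0.32 for the maximum.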

In the first embodiment described above, equation (4) was calculated for each aligned pair of phonemes during the dynamic programming algorithm. In the calculation of equation (4), the annotation phoneme and the query phoneme were compared with each of the phonemes known to the system. Those skilled in the art will recognize that, for a given annotation phoneme and query phoneme pair, many of the probabilities in equation (4) will be equal to or very close to zero. In an alternative embodiment, therefore, the annotation and query phoneme pair may be compared with only a subset of all the known phonemes, the subset being determined in advance from the confusion statistics. To implement such an embodiment, the annotation phoneme and the query phoneme could be used to address a lookup table that identifies the model phonemes to be compared with the annotation and query phonemes using equation (4).
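A minimal Python sketch of such a lookup table follows. It assumes the per-phoneme term has the form P(q | p) P(a | p) P(p) and keeps only the model phonemes whose term reaches a threshold; the threshold, the toy tables and the function names are assumptions for illustration, and the source does not say how the subset is chosen.

```python
# Toy tables (invented): CONFUSION[p][x] stands for P(x | p).
PRIOR = {"a": 0.5, "b": 0.3, "c": 0.2}
CONFUSION = {
    "a": {"a": 0.8, "b": 0.1, "c": 0.1},
    "b": {"a": 0.2, "b": 0.7, "c": 0.1},
    "c": {"a": 0.1, "b": 0.1, "c": 0.8},
}


def term(p, q, a):
    """P(q | p) * P(a | p) * P(p) for one model phoneme p."""
    return CONFUSION[p][q] * CONFUSION[p][a] * PRIOR[p]


def build_lookup(threshold):
    """Precompute, per (annotation, query) pair, the model phonemes worth summing."""
    symbols = list(PRIOR)
    return {
        (a, q): [p for p in symbols if term(p, q, a) >= threshold]
        for a in symbols
        for q in symbols
    }


def pruned_score(q, a, table):
    """Sum the terms only over the model phonemes listed in the lookup table."""
    return sum(term(p, q, a) for p in table[(a, q)])
```

With a threshold of 0.005, the pair ("a", "a") keeps model phonemes "a" and "b" and drops "c", so the pruned score is about 0.332 against a full sum of about 0.334.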

In the above embodiments, the features of the annotation and the query that are aligned and matched represent units of speech. Those skilled in the art will recognize that the technique described above can be used in other applications in which the features of the query and the annotation may be confused because of inaccuracies in the recognition system that generated the feature sequences. For example, the technique could be used in optical character recognition or handwriting recognition systems, where the recognition system is likely to make errors.

A number of embodiments and modifications have been described above. Those skilled in the art will recognize that many other embodiments and modifications are possible.

Claims

1. A feature comparison apparatus comprising:

means for receiving a first sequence of features and a second sequence of features;

means for aligning features of the first sequence with features of the second sequence to form a plurality of aligned pairs of features;

means for comparing the features of each aligned pair of features to generate a comparison score representing the similarity between the aligned pair of features; and

means for combining the comparison scores for all the aligned pairs of features to provide a measure of the similarity between the first and second sequences of features;

wherein the comparing means comprises:

first comparing means for comparing, for each aligned pair, the first sequence feature in the aligned pair with each of a plurality of features taken from a set of predetermined features, to provide a corresponding plurality of intermediate comparison scores representing the similarity between the first sequence feature and the respective features from the set;

second comparing means for comparing, for each aligned pair, the second sequence feature in the aligned pair with each of the plurality of features from the set, to provide a further corresponding plurality of intermediate comparison scores representing the similarity between the second sequence feature and the respective features from the set; and

means for calculating the comparison score for the aligned pair by combining the pluralities of intermediate comparison scores.

2. An apparatus according to claim 1, wherein the first and second comparing means are operable to compare the first sequence feature and the second sequence feature, respectively, with each of the features in the set of predetermined features.

3. An apparatus according to claim 1 or 2, wherein the comparing means is operable to generate a comparison score for an aligned pair of features which represents a probability of confusing the second sequence feature of the aligned pair as the first sequence feature of the aligned pair.

4. An apparatus according to claim 3, wherein the first and second comparing means are operable to provide intermediate comparison scores which are indicative of a probability of confusing the corresponding feature taken from the set of predetermined features as the feature in the aligned pair.

5. An apparatus according to claim 4, wherein the calculating means is operable (i) to multiply the intermediate scores obtained when comparing the first and second sequence features in the aligned pair with the same feature from the set, to provide a plurality of multiplied intermediate comparison scores; and (ii) to add the resulting multiplied intermediate scores, to calculate the comparison score for the aligned pair.

6. An apparatus according to claim 5, wherein each of the features in the set of predetermined features has a predetermined probability of occurring within a sequence of features, and wherein the calculating means is operable to weight each of the multiplied intermediate comparison scores with the respective probability of occurrence of the feature from the set that was used to generate the multiplied intermediate comparison scores.

7. An apparatus according to claim 6, wherein the calculating means is operable to calculate:

$$\sum_{r} P(q_j \mid p_r)\, P(a_i \mid p_r)\, P(p_r)$$

where q_j and a_i are the first sequence feature and the second sequence feature, respectively, of the aligned pair; P(q_j | p_r) is the probability of confusing set feature p_r as first sequence feature q_j; P(a_i | p_r) is the probability of confusing set feature p_r as second sequence feature a_i; and P(p_r) is the probability of set feature p_r occurring in a sequence of features.

8. An apparatus according to claim 7, wherein the confusion probabilities for the first and second sequence features are determined in advance and depend upon the system used to generate the respective first and second sequences.

9. An apparatus according to any of claims 5 to 8, wherein the intermediate scores represent log probabilities, and wherein the calculating means is operable to perform the multiplication by adding the respective intermediate scores and to perform the addition of the multiplied scores by performing a log addition calculation.
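The log-domain evaluation just described can be sketched in Python as follows: products become sums of log scores, and the sum of products becomes a running log addition. The function names and the numerically stable formulation using `log1p` are choices made for this illustration and are not taken from the source.

```python
import math


def log_add(x, y):
    """Return log(exp(x) + exp(y)) while staying in the log domain."""
    if x == -math.inf:
        return y
    if y == -math.inf:
        return x
    hi, lo = max(x, y), min(x, y)
    return hi + math.log1p(math.exp(lo - hi))


def log_comparison_score(log_pq, log_pa, log_prior):
    """Log of the sum over set features of P(q | p) * P(a | p) * P(p).

    Each argument is a sequence of log probabilities, one entry per set
    feature. The multiplication is done by adding the log scores, and
    the summation by repeated log addition.
    """
    total = -math.inf  # log of zero
    for lq, la, lp in zip(log_pq, log_pa, log_prior):
        total = log_add(total, lq + la + lp)
    return total
```

For example, log scores corresponding to the probabilities (0.8, 0.2, 0.1), (0.8, 0.2, 0.1) and priors (0.5, 0.3, 0.2) give a result whose exponential is about 0.334, matching the direct sum of products.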

10. An apparatus according to claim 9, wherein the combining means is operable to add the comparison scores for all the aligned pairs to determine the measure of similarity.

11. An apparatus according to any preceding claim, wherein the aligning means is operable to identify feature deletions and insertions in the first and second sequences of features, and wherein the comparing means is operable to generate the comparison score for an aligned pair of features in dependence upon feature deletions and insertions identified by the aligning means which occur in the vicinity of the features in the aligned pair.

12. An apparatus according to any preceding claim, wherein the aligning means comprises dynamic programming means for aligning the first and second sequences of features using a dynamic programming technique.

13. An apparatus according to claim 12, wherein the dynamic programming means is operable to progressively determine a plurality of possible alignments between the first and second sequences of features, and wherein the comparing means is operable to determine a comparison score for each of the possible aligned pairs of features determined by the dynamic programming means.

14. An apparatus according to claim 13, wherein the comparing means is operable to generate the comparison score during the progressive determination of the possible alignments.

15. An apparatus according to any of claims 12 to 14, wherein the dynamic programming means is operable to determine an optimum alignment between the first and second sequences of features, and wherein the combining means is operable to provide the measure of similarity by combining the comparison scores only for the optimum aligned pairs of features.

16. An apparatus according to claim 13 or 14, wherein the combining means is operable to provide the measure of similarity by combining all the comparison scores for all the possible aligned pairs of features.
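The two ways of combining scores described in the dynamic programming claims above, over the optimum alignment only or over all possible alignments, can be contrasted with a small log-domain sketch in Python. The step set used here (match, one deletion, one insertion) is a simplifying assumption and is not the set of constraints shown in the patent's figures; the scoring callbacks and all names are likewise hypothetical.

```python
import math


def log_sum(values):
    """Return log(sum(exp(v))) for a non-empty list of log scores."""
    hi = max(values)
    if hi == -math.inf:
        return hi
    return hi + math.log(sum(math.exp(v - hi) for v in values))


def align_score(first, second, pair_score, ins_score, del_score, combine=max):
    """Log-domain dynamic programming over two feature sequences.

    pair_score(x, y): log score for aligning x with y.
    del_score(x):     log score for x having no counterpart in `second`.
    ins_score(y):     log score for y having no counterpart in `first`.
    combine=max keeps only the best path into each cell (optimum
    alignment); combine=log_sum accumulates every path (all alignments).
    """
    rows, cols = len(first) + 1, len(second) + 1
    table = [[-math.inf] * cols for _ in range(rows)]
    table[0][0] = 0.0
    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:
                candidates.append(
                    table[i - 1][j - 1] + pair_score(first[i - 1], second[j - 1])
                )
            if i > 0:
                candidates.append(table[i - 1][j] + del_score(first[i - 1]))
            if j > 0:
                candidates.append(table[i][j - 1] + ins_score(second[j - 1]))
            table[i][j] = combine(candidates)
    return table[-1][-1]
```

Passing `combine=log_sum` instead of the default `max` is the only change needed to switch from the optimum-path score to the all-paths score, which is why the two variants are shown as one function.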

17. An apparatus according to any of claims 1 to 16, wherein each of the features in the first and second sequences of features belongs to the set of predetermined features, and wherein the first and second comparing means are operable to provide the intermediate comparison scores using predetermined data which relate the features in the set to each other.

18. An apparatus according to claim 17, wherein the predetermined data used by the first comparing means depends upon a system used to generate the first sequence of features, and wherein the predetermined data used by the second comparing means differs from the predetermined data used by the first comparing means and depends upon the system used to generate the second sequence of features.

19. An apparatus according to claim 17 or 18, wherein the or each predetermined data comprises, for each feature in the set of features, a probability of confusing that feature with each of the other features in the set of features.

20. An apparatus according to claim 19, wherein the or each predetermined data further comprises, for each feature in the set of features, a probability of inserting the feature in a sequence of features.

21. An apparatus according to claim 19 or 20, wherein the or each predetermined data further comprises, for each feature in the set of features, a probability of deleting the feature from a sequence of features.

22. An apparatus according to any preceding claim, wherein each of the first and second sequences of features represents a time-sequential signal.

23. An apparatus according to any of claims 1 to 22, wherein the first and second sequences of features represent audio signals.

24. An apparatus according to claim 23, wherein the first and second sequences of features represent text and/or speech.

25. An apparatus according to claim 24, wherein each feature is a sub-word unit of text or speech.

26. An apparatus according to claim 25, wherein each feature is a phoneme.

27. An apparatus according to any of claims 1 to 26, wherein the first sequence of features comprises a plurality of sub-word units generated from a typed input, and wherein the first comparing means is operable to provide the intermediate comparison scores using mis-typing probabilities and/or mis-spelling probabilities.

28. An apparatus according to any of claims 1 to 27, wherein the second sequence of features comprises a sequence of sub-word units generated from a spoken input, and wherein the second comparing means is operable to provide the intermediate comparison scores using mis-recognition probabilities.

29. An apparatus according to any of claims 1 to 28, wherein:

the receiving means is operable to receive three or more sequences of features;

the aligning means is operable to align the features of each of the received sequences to form a plurality of aligned groups of features;

the comparing means is operable to compare the features of each aligned group of features to generate a comparison score representing the similarity between the features of the aligned group; and

the combining means is operable to combine the comparison scores for all the aligned groups of features to provide a measure of the similarity between the three or more sequences of features.

30. An apparatus according to claim 29, wherein the aligning means is operable to align the sequences of features with one another simultaneously.

31. An apparatus according to any of claims 1 to 30, wherein:

the receiving means is operable to receive a plurality of second sequences of features;

the aligning means is operable to align the features of the first sequence with the features of each second sequence to form a plurality of aligned pairs of features for each alignment; and

the combining means is operable to combine the comparison scores for each alignment to provide a respective measure of similarity between the first sequence of features and each of the plurality of second sequences of features.

32. An apparatus according to claim 31, further comprising means for comparing the plurality of similarity measures output by the combining means, and means for outputting a signal indicative of the second sequence of features which is most similar to the first sequence of features.

33. An apparatus according to claim 31 or 32, wherein the combining means comprises normalizing means for normalizing each of the similarity measures.

34. An apparatus according to claim 33, wherein the normalizing means is operable to normalize each similarity measure by dividing it by a respective normalization score that varies with the length of the corresponding second sequence of features.

35. An apparatus according to claim 34, wherein each normalization score varies in dependence upon the sequence of features in the corresponding second sequence of features.

36. An apparatus according to claim 34 or 35, wherein each normalization score varies with the corresponding intermediate comparison scores calculated by the second comparing means.

37. An apparatus according to any of claims 33 to 36, wherein the aligning means comprises dynamic programming means for aligning the first and second sequences of features using a dynamic programming technique, and wherein the normalizing means is operable to calculate the respective normalization scores during the progressive calculation of the possible alignments by the dynamic programming means.

38. An apparatus according to claim 37, wherein the normalizing means is operable to calculate the following normalization term for each possible aligned pair of features:

$$\sum_{r} P(a_i \mid p_r)\, P(p_r)$$

where P(a_i | p_r) is the probability of confusing set feature p_r as second sequence feature a_i, and P(p_r) is the probability of set feature p_r occurring in a sequence of features.

39. An apparatus according to claim 38, wherein the normalizing means is operable to calculate the respective normalization scores by multiplying the normalization terms calculated for the respective aligned pairs of features.
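The normalization described in the claims above can be sketched in Python under one explicit assumption: that the per-pair normalization term is the sum over set features p_r of P(a_i | p_r) P(p_r), which is what the symbol definitions suggest. The toy tables and function names are invented for illustration.

```python
# Toy tables (invented): CONFUSION[p][x] stands for P(x | p).
PRIOR = {"a": 0.5, "b": 0.3, "c": 0.2}
CONFUSION = {
    "a": {"a": 0.8, "b": 0.1, "c": 0.1},
    "b": {"a": 0.2, "b": 0.7, "c": 0.1},
    "c": {"a": 0.1, "b": 0.1, "c": 0.8},
}


def normalization_term(a):
    """Assumed per-pair term: sum over set features p of P(a | p) * P(p)."""
    return sum(CONFUSION[p][a] * PRIOR[p] for p in PRIOR)


def normalized_similarity(similarity, aligned_second_features):
    """Divide a similarity measure by the product of the per-pair terms.

    aligned_second_features lists the second-sequence feature of each
    aligned pair, so longer sequences accumulate a smaller normalization
    score and the division compensates for their length.
    """
    norm = 1.0
    for a in aligned_second_features:
        norm *= normalization_term(a)
    return similarity / norm
```

With these tables the term for "a" is 0.48, so a single-pair similarity of 0.334 normalizes to roughly 0.696. In a log-domain implementation the product and the division would become a sum and a subtraction.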

40. An apparatus for searching a database comprising a plurality of information entries to identify information to be retrieved therefrom, each of the information entries having an associated annotation comprising a sequence of annotation features, the apparatus comprising:

means for receiving a plurality of renditions of an input query;

means for converting each rendition of the input query into a sequence of query features representative of the rendition;

means for comparing the query features of each rendition with the annotation features of each annotation to provide a set of comparison results;

means for combining the comparison results obtained by comparing the query features of each rendition with the annotation features of the same annotation, to provide, for each annotation, a measure of the similarity between the input query and the annotation; and

means for identifying the information to be retrieved from the database using the similarity measures provided by the combining means for all the annotations.

41. An apparatus according to claim 40, wherein the comparing means is operable to compare the query features of each rendition simultaneously with the annotation features of a current annotation.

42. An apparatus according to claim 40 or 41, wherein the comparing means comprises:

means for aligning the query features of each rendition with the annotation features of a current annotation to form a plurality of aligned groups of features, each aligned group comprising a query feature from each rendition and an annotation feature; and

a feature comparator for comparing the features of each aligned group of features to generate a comparison score representing the similarity between the features of the aligned group;

and wherein the combining means is operable to combine the comparison scores for all the aligned groups of features for the current annotation to provide a measure of the similarity between the input query and the current annotation.

43. An apparatus according to claim 42, wherein the feature comparator comprises: respective feature comparing means for comparing, for each feature in an aligned group, that feature with each of a plurality of features taken from a set of predetermined features, to provide a corresponding plurality of intermediate comparison scores representing the similarity between the group feature and the respective features from the set; and means for calculating the comparison score for the aligned group by combining the pluralities of intermediate comparison scores generated by the respective feature comparing means.

44. An apparatus according to any of claims 40 to 43, wherein the sequence of speech annotation features for some or all of the annotations is generated from an audio annotation signal.

45. An apparatus according to any of claims 40 to 44, wherein the sequence of speech annotation features for some or all of the annotations is generated from a text annotation.

46. An apparatus according to any of claims 40 to 45, wherein the converting means comprises a speech recognition system.

47. An apparatus according to any of claims 40 to 46, wherein one or more of the information entries is the associated annotation.

48. An apparatus for searching a database comprising a plurality of information entries to identify information to be retrieved therefrom, each of the information entries having an associated annotation comprising a sequence of features, the apparatus comprising:

means for receiving an input query comprising a sequence of features;

an apparatus according to any of claims 1 to 39 for comparing the query sequence of features with the features of each annotation to provide a set of comparison results; and

means for identifying the information to be retrieved from the database using the comparison results.

49. An apparatus for searching a database comprising a plurality of information entries to identify information to be retrieved therefrom, each of the information entries having an associated annotation comprising a sequence of speech features, the apparatus comprising:

means for receiving an input query comprising a sequence of speech features;

means for comparing the query sequence of speech features with the speech features of each annotation to provide a set of comparison results; and

means for identifying the information to be retrieved from the database using the comparison results;

wherein the comparing means has a plurality of different comparison modes of operation, and wherein the apparatus further comprises:

means for determining (i) whether the query sequence of speech features was generated from an audio signal or from text, and (ii) whether the sequence of speech features of a current annotation was generated from an audio signal or from text, and for outputting a determination result; and

means for selecting, for the current annotation, the mode of operation of the comparing means in dependence upon the determination result.

50. The apparatus of claim 49, wherein when said determining means determines that both said input query and said current annotation are generated from speech, And to select an operation mode to operate the database.

51. An apparatus according to any of claims 48 to 50, wherein one or more of the information entries is the associated annotation.

52. A feature comparison apparatus comprising:

means for receiving first and second sequences of query features, each sequence representing a rendition of an input query;

means for receiving a sequence of annotation features;

means for aligning the query features of each rendition with the annotation features of the annotation to form a plurality of aligned groups of features, each aligned group comprising a query feature from each rendition and an annotation feature;

means for comparing the features of each aligned group of features to generate a comparison score representing the similarity between the features of the aligned group; and

means for combining the comparison scores for all the aligned groups of features to provide a measure of the similarity between the renditions of the input query and the annotation;

wherein the comparing means comprises:

a first feature comparator for comparing, for each aligned group, the first query sequence feature in the aligned group with each of a plurality of features taken from a set of predetermined features, to provide a corresponding plurality of intermediate comparison scores representing the similarity between the first query sequence feature and the respective features from the set;

a second feature comparator for comparing, for each aligned group, the second query sequence feature in the aligned group with each of the plurality of features from the set, to provide a further corresponding plurality of intermediate comparison scores representing the similarity between the second query sequence feature and the respective features from the set;

a third feature comparator for comparing, for each aligned group, the annotation feature in the aligned group with each of the plurality of features from the set, to provide a further corresponding plurality of intermediate comparison scores representing the similarity between the annotation feature and the respective features from the set; and

means for calculating the comparison score for the aligned group by combining the pluralities of intermediate comparison scores.
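One plausible way to combine the three sets of intermediate scores for an aligned group is to extend the pairwise sum of products to three factors. The Python sketch below does this for two query renditions and one annotation feature. The combination rule is an assumption made by analogy with the two-sequence case and is not stated in this section, and the toy tables and names are invented.

```python
# Toy tables (invented): CONFUSION[p][x] stands for P(x | p).
PRIOR = {"a": 0.5, "b": 0.3, "c": 0.2}
CONFUSION = {
    "a": {"a": 0.8, "b": 0.1, "c": 0.1},
    "b": {"a": 0.2, "b": 0.7, "c": 0.1},
    "c": {"a": 0.1, "b": 0.1, "c": 0.8},
}


def group_score(q1, q2, a):
    """Assumed three-way score for an aligned group.

    For each set feature p, the intermediate scores from the three
    comparators, P(q1 | p), P(q2 | p) and P(a | p), are multiplied,
    weighted by P(p), and the results are summed over the set.
    """
    return sum(
        CONFUSION[p][q1] * CONFUSION[p][q2] * CONFUSION[p][a] * PRIOR[p]
        for p in PRIOR
    )
```

With these tables, a group in which both renditions and the annotation are "a" scores about 0.2586, while a group where the annotation is "c" scores far lower, which is the behavior one would want from a similarity score.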

53. An apparatus for searching a database comprising a plurality of information entries to identify information to be retrieved therefrom, each of the information entries having an associated annotation comprising a sequence of speech annotation features, the apparatus comprising:

means for receiving a plurality of renditions of a spoken input query;

means for converting each rendition of the input query into a sequence of speech query features representative of the speech within the rendition;

means for comparing the speech query features of each rendition with the speech annotation features of each annotation to provide a measure of the similarity between the input query and each annotation; and

means for identifying the information to be retrieved from the database using the similarity measures provided by the combining means for all the annotations;

the apparatus further comprising:

means for determining whether the sequence of speech features of a current annotation was generated from an audio signal or from text, and for outputting a determination result; and

54. A feature comparison method comprising:

receiving a first sequence of features and a second sequence of features;

aligning features of the first sequence with features of the second sequence to form a plurality of aligned pairs of features;

comparing the features of each aligned pair of features to generate a comparison score representing the similarity between the aligned pair of features; and

combining the comparison scores for all the aligned pairs of features to provide a measure of the similarity between the first and second sequences of features;

wherein the comparing step comprises:

a first comparing step of comparing, for each aligned pair, the first sequence feature in the aligned pair with each of a plurality of features taken from a set of predetermined features, to provide a corresponding plurality of intermediate comparison scores representing the similarity between the first sequence feature and the respective features from the set;

a second comparing step of comparing, for each aligned pair, the second sequence feature in the aligned pair with each of the plurality of features from the set, to provide a further corresponding plurality of intermediate comparison scores representing the similarity between the second sequence feature and the respective features from the set; and

a step of calculating the comparison score for the aligned pair by combining the pluralities of intermediate comparison scores.

55. A method according to claim 54, wherein the first and second comparing steps compare the first sequence feature and the second sequence feature, respectively, with each of the features in the set of predetermined features.

56. A method according to claim 54 or 55, wherein the comparing step generates a comparison score for an aligned pair of features which represents a probability of confusing the second sequence feature of the aligned pair as the first sequence feature of the aligned pair.

57. A method according to claim 56, wherein the first and second comparing steps provide intermediate comparison scores which are indicative of a probability of confusing the corresponding feature taken from the set of predetermined features as the feature in the aligned pair.

58. A method according to claim 57, wherein the calculating step (i) multiplies the intermediate scores obtained when comparing the first and second sequence features in the aligned pair with the same feature from the set, to provide a plurality of multiplied intermediate comparison scores; and (ii) adds the resulting multiplied intermediate scores, to calculate the comparison score for the aligned pair.

59. A method according to claim 58, wherein each of the features in the set of predetermined features has a predetermined probability of occurring within a sequence of features, and wherein the calculating step weights each of the multiplied intermediate comparison scores with the respective probability of occurrence of the feature from the set that was used to generate the multiplied intermediate comparison scores.

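60. A method according to claim 59, wherein the calculating step calculates:

$$\sum_{r} P(q_j \mid p_r)\, P(a_i \mid p_r)\, P(p_r)$$

where q_j and a_i are the first sequence feature and the second sequence feature, respectively, of the aligned pair; P(q_j | p_r) is the probability of confusing set feature p_r as first sequence feature q_j; P(a_i | p_r) is the probability of confusing set feature p_r as second sequence feature a_i; and P(p_r) is the probability of set feature p_r occurring in a sequence of features.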

61. A method according to claim 60, wherein the confusion probabilities for the first and second sequence features are determined in advance and depend upon the system used to generate the respective first and second sequences.

62. A method according to any of claims 58 to 61, wherein the intermediate scores represent log probabilities, and wherein the calculating step performs the multiplication by adding the respective intermediate scores and performs the addition of the multiplied scores by performing a log addition calculation.

63. A method according to claim 62, wherein the combining step adds the comparison scores for all the aligned pairs to determine the measure of similarity.

64. A method according to any of claims 54 to 63, wherein the aligning step identifies feature deletions and insertions in the first and second sequences of features, and wherein the comparing step generates the comparison score for an aligned pair of features in dependence upon feature deletions and insertions identified by the aligning step which occur in the vicinity of the features in the aligned pair.

65. A method according to any of claims 54 to 64, wherein the aligning step uses a dynamic programming technique to align the first and second sequences of features.

66. A method according to claim 65, wherein the aligning step progressively determines a plurality of possible alignments between the first and second sequences of features, and wherein the comparing step determines a comparison score for each of the possible aligned pairs of features determined by the aligning step.

67. A method according to claim 66, wherein the comparing step generates the comparison score during the progressive determination of the possible alignments.

68. A method according to any of claims 65 to 67, wherein the aligning step determines an optimum alignment between the first and second sequences of features, and wherein the combining step provides the measure of similarity by combining the comparison scores only for the optimum aligned pairs of features.

69. A method according to claim 67 or 68, wherein the combining step provides the measure of similarity by combining all the comparison scores for all the possible aligned pairs of features.

70. A method according to any of claims 54 to 69, wherein each of the features in the first and second sequences of features belongs to the set of predetermined features, and wherein the first and second comparing steps provide the intermediate comparison scores using predetermined data which relate the features in the set to each other.

71. A method according to claim 70, wherein the predetermined data used in the first comparing step depends upon a system used to generate the first sequence of features, and wherein the predetermined data used in the second comparing step differs from the predetermined data used in the first comparing step and depends upon the system used to generate the second sequence of features.

72. A method according to claim 70 or 71, wherein the or each predetermined data comprises, for each feature in the set of features, a probability of confusing that feature with each of the other features in the set of features.

73. A method according to claim 72, wherein the or each predetermined data further comprises, for each feature in the set of features, a probability of inserting the feature in a sequence of features.

74. A method according to claim 72 or 73, wherein the or each predetermined data further comprises, for each feature in the set of features, a probability of deleting the feature from a sequence of features.

75. The method of any one of claims 54 to 74, wherein the characteristics of the first sequence and the characteristics of the second sequence represent time sequential signals.

76. The method according to any one of claims 54 to 75, wherein the characteristics of the first sequence and the characteristics of the second sequence represent audio signals.

77. The method of claim 76, wherein the characteristics of the first sequence and the characteristics of the second sequence represent speech.

78. The method of claim 77, wherein each characteristic represents a sub-word unit of speech.

79. The method of claim 78, wherein each characteristic represents a phoneme.

80. The method of any one of claims 54 to 79, wherein the characteristics of the first sequence comprise a plurality of sub-word units, and wherein the first comparing step uses typing-error and/or spelling-error probabilities to provide the intermediate comparison scores.

81. A method according to any one of claims 54 to 80, wherein the characteristics of the second sequence comprise a sequence of sub-word units generated from a spoken input, and wherein the second comparison step uses recognition-error probabilities to provide the intermediate comparison scores.

82. The method according to any one of claims 54 to 81,

Wherein the receiving step receives the characteristics of three or more sequences,

Wherein the aligning step aligns the characteristics of each of the received sequences to form a plurality of aligned property groups,

Wherein the comparing step compares the characteristics of each aligned property group to generate a comparison score representing the similarity between the characteristics of the aligned group, and

Wherein the combining step combines the comparison scores for all aligned property groups to provide a measure of the similarity between the characteristics of the three or more sequences.

83. The method of claim 82, wherein the aligning step simultaneously aligns the sequences of characteristics with one another.

84. The method according to any one of claims 54 to 83,

Wherein the receiving step receives the characteristics of a plurality of second sequences,

Wherein the aligning step aligns the characteristics of the first sequence with the characteristics of each second sequence to form a plurality of aligned feature pairs for each alignment,

Wherein the combining step combines a comparison score for each alignment to provide a respective measure of similarity between the characteristics of the first sequence and the characteristics of the plurality of second sequences.

85. The method of claim 84, further comprising comparing the plurality of similarity measures output by the combining step and outputting a signal indicative of the second sequence whose characteristics are most similar to the characteristics of the first sequence.

86. The method of claim 84 or 85, wherein the combining step includes a normalizing step of normalizing each of the similarity measures.

87. The method of claim 86, wherein the normalizing step normalizes each similarity measure by dividing it by a respective normalization score that varies with the length of the corresponding second sequence of characteristics.

88. The method of claim 87, wherein each normalization score varies according to the sequence of characteristics in the corresponding second sequence.

89. The method according to claim 87 or 88, wherein each normalization score varies with the corresponding intermediate comparison scores calculated by the second comparison step.

90. A method according to any one of claims 86 to 89, wherein the aligning step progressively determines a plurality of possible alignments between the characteristics of the first sequence and the characteristics of the second sequence, and wherein the normalizing step calculates each respective normalization score during the progressive calculation of the possible alignments by the aligning step.

91. The method of claim 90, wherein the normalizing step calculates the following normalization term for each possible aligned property pair:

$$\sum_{r} P(a_i \mid p_r)\, P(p_r)$$

where P(a_i | p_r) is the probability of confusing set property p_r as property a_i of the second sequence, and P(p_r) is the probability of set property p_r occurring in a sequence of properties.

92. The method of claim 91, wherein the normalizing step calculates each normalization score by multiplying together the normalization terms calculated for the respective aligned property pairs.
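The normalization described in these claims can be sketched numerically. The sketch assumes that the per-pair normalization term is the sum, over every property p_r of the set, of P(a_i | p_r) multiplied by P(p_r); that form is inferred from the symbol definitions given in the claim, and the table layout, function names and numbers are hypothetical.

```python
def normalization_term(a, confusion, prior):
    """Assumed per-pair term: sum over set properties p of P(a | p) * P(p).

    confusion[p][a] holds P(a | p); prior[p] holds P(p).
    """
    return sum(confusion[p][a] * prior[p] for p in prior)

def normalization_score(second_sequence, confusion, prior):
    """Multiply the per-pair terms for each aligned property of the second sequence."""
    score = 1.0
    for a in second_sequence:
        score *= normalization_term(a, confusion, prior)
    return score

def normalize(similarity, second_sequence, confusion, prior):
    """Divide a raw similarity measure by its normalization score."""
    return similarity / normalization_score(second_sequence, confusion, prior)

# Toy two-property set with made-up probabilities.
PRIOR = {"a": 0.5, "b": 0.5}
CONFUSION = {"a": {"a": 0.8, "b": 0.2}, "b": {"a": 0.3, "b": 0.7}}

term_a = normalization_term("a", CONFUSION, PRIOR)          # 0.8*0.5 + 0.3*0.5
score_ab = normalization_score(["a", "b"], CONFUSION, PRIOR)
```

Because the score is a product of one term per property, it shrinks as the second sequence grows, which is consistent with a normalization score that varies with the length of that sequence.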

93. A method of searching a database comprising a plurality of information entries to identify information to be retrieved, each information entry having an associated annotation comprising a sequence of annotation properties, the method comprising:

Receiving a plurality of renditions of an input query;

Transforming each rendition of the input query into a sequence of query characteristics representing the rendition;

Comparing the query characteristics of each rendition to the annotation properties of each annotation to provide a comparison result set;

Combining the comparison results obtained by comparing the query properties of each rendition to the annotation properties of the same annotation to provide a measure of similarity between the input query and the annotation for each annotation; And

Identifying the information to be retrieved from the database using similarity measures provided by the combining step for all annotations


94. The method of claim 93, wherein the comparing step simultaneously compares the query characteristics of each rendition with the annotation properties of the current annotation.
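The search over several renditions of a query can be sketched as follows. The claims do not say how the per-rendition comparison results are combined, so the sketch assumes they are simply summed for each annotation; `toy_compare` is a hypothetical stand-in for whatever comparison technique is actually used.

```python
def search(renditions, annotations, compare):
    """Return the best-matching entry id and the combined score per annotation.

    renditions: list of query feature sequences (one per rendition).
    annotations: dict mapping entry id -> annotation feature sequence.
    compare(query, annotation): similarity score, larger meaning more similar.
    """
    combined = {}
    for entry_id, annotation in annotations.items():
        # Combine the results of comparing every rendition with the same annotation.
        combined[entry_id] = sum(compare(r, annotation) for r in renditions)
    best = max(combined, key=combined.get)
    return best, combined

def toy_compare(query, annotation):
    """Hypothetical comparator: penalise mismatches and length differences."""
    mismatches = sum(q != a for q, a in zip(query, annotation))
    return -(mismatches + abs(len(query) - len(annotation)))

renditions = [["k", "ae", "t"], ["k", "ah", "t"]]
annotations = {"doc1": ["k", "ae", "t"], "doc2": ["d", "ao", "g"]}
best_entry, scores = search(renditions, annotations, toy_compare)
```

Using two renditions means that a recognition error in one of them (here "ah" for "ae") lowers the score for the correct entry only slightly, so the correct entry still ranks first.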

95. The method of claim 93 or 94,

Wherein the comparing step comprises:

Aligning the query characteristics of each rendition with the annotation properties of the current annotation to form a plurality of aligned property groups, each including a query characteristic from each rendition and an annotation property; and

Using a property comparator that compares the characteristics of each aligned property group to produce a comparison score that indicates the similarity between the characteristics of the aligned group,

Wherein the combining step combines the comparison scores for all aligned property groups for the current annotation to provide a measure of the similarity between the input query and the current annotation.

96. A method according to any one of claims 93 to 95, wherein the sequences of query characteristics and the sequences of annotation properties represent audio signals.

97. The method of claim 96, wherein the sequences of query characteristics and the sequences of annotation properties represent speech.

98. The method of claim 97, wherein each characteristic represents a sub-word unit of speech.

99. The method of claim 98, wherein each characteristic represents a phoneme.

100. The method of any one of claims 93 to 99, wherein the sequence of voice annotation properties for some or all of the annotations is generated from an audio annotation signal or a text annotation.

101. A characteristic comparison method, comprising:

Aligning the characteristics of a first sequence with the characteristics of a second sequence;

Comparing each aligned property pair to generate a comparison score for the aligned property pair; and


Wherein the comparing step comprises:

A first comparison step of comparing the aligned characteristic of the first sequence with each of a plurality of possible characteristics to provide a corresponding plurality of intermediate comparison scores; and

A second comparison step of comparing the aligned characteristic of the second sequence with each of the plurality of possible characteristics to provide a further corresponding plurality of intermediate comparison scores,

Wherein the comparison score for the aligned pair is provided by combining the pluralities of intermediate comparison scores.
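The two comparison steps can be illustrated with a short sketch. The claim does not state how the intermediate scores are combined; the sketch assumes each intermediate score is a confusion probability against a possible characteristic p, and that the pair score is the sum over p of the two intermediate scores weighted by a prior P(p). All names and numbers are hypothetical.

```python
def pair_comparison_score(first_feat, second_feat, conf_first, conf_second, prior):
    """Score an aligned pair via every possible underlying characteristic.

    conf_first[p][x]: intermediate score P(x | p) for the first sequence.
    conf_second[p][y]: intermediate score P(y | p) for the second sequence.
    prior[p]: assumed probability of characteristic p occurring.
    """
    total = 0.0
    for p in prior:
        first_score = conf_first[p][first_feat]      # first comparison step
        second_score = conf_second[p][second_feat]   # second comparison step
        total += first_score * second_score * prior[p]
    return total

# Toy two-characteristic set; the same made-up table is used for both sequences.
PRIOR = {"a": 0.5, "b": 0.5}
CONFUSION = {"a": {"a": 0.8, "b": 0.2}, "b": {"a": 0.3, "b": 0.7}}

same = pair_comparison_score("a", "a", CONFUSION, CONFUSION, PRIOR)
different = pair_comparison_score("a", "b", CONFUSION, CONFUSION, PRIOR)
```

Passing separate tables for the two sequences allows, for example, typing-error statistics for one sequence and recognition-error statistics for the other.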

102. A method of searching a database comprising a plurality of information entries to identify information to be retrieved, each information entry having an associated annotation comprising a sequence of characteristics, the method comprising:

Receiving an input query comprising a sequence of characteristics;

Comparing the query sequence of characteristics with the characteristics of each annotation, using the method according to any one of claims 54 to 101, to provide a set of comparison results; and

Identifying the information to be retrieved from the database using the comparison results.

103. A method of searching a database comprising a plurality of information entries to identify information to be retrieved, each information entry having an associated annotation comprising a sequence of speech characteristics, the method comprising:

Receiving an input query comprising a sequence of speech characteristics;

Comparing the query sequence of the speech characteristics with the speech characteristics of each annotation to provide a comparison result set; And


Wherein the comparing step has a plurality of different comparison techniques for performing the comparison, and

Wherein the method further comprises:

Determining (i) whether the query sequence of speech characteristics was generated from an audio signal or from text, and (ii) whether the sequence of speech characteristics of the current annotation was generated from an audio signal or from text, and outputting a determination result; and

Selecting the technique to be used to perform the comparing step for the current annotation in accordance with the determination result.

104. The method of claim 103, wherein, when the determining step determines that the input query and the current annotation were both generated from speech, the selecting step selects the method according to any one of claims 54 to 101 for performing the comparing step.
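The determining and selecting steps amount to a small dispatch on how each sequence was produced. In the sketch below only the both-from-speech case follows the claim text; the labels and the handling of the other two cases are assumptions added for illustration.

```python
from enum import Enum

class Source(Enum):
    SPEECH = "speech"
    TEXT = "text"

def select_technique(query_source, annotation_source):
    """Pick a comparison technique from the origins of the query and annotation."""
    speech_count = [query_source, annotation_source].count(Source.SPEECH)
    if speech_count == 2:
        # Both sequences may contain recognition errors, so both are compared
        # against the set of possible characteristics.
        return "both_uncertain"
    if speech_count == 1:
        # Assumption: the text-derived sequence is treated as reliable.
        return "one_uncertain"
    # Assumption: two text-derived sequences are compared directly.
    return "both_text"

technique = select_technique(Source.SPEECH, Source.TEXT)
```

The returned label would then be used to choose which scoring routine the comparing step runs for the current annotation.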

Receiving a plurality of renditions of an input query;

Comparing the annotation properties of each annotation with the query properties of each rendition to provide a comparison result set;

Combining the comparison results obtained by comparing the annotation properties of the same annotation with the query characteristics of each rendition to provide a measure of similarity between the input query and the annotation for each annotation; And

Identifying said information to be retrieved from said database using said similarity measure provided by said combining step for all said annotations


106. The database searching method according to claim 105, wherein the comparing step simultaneously compares the query characteristics of each rendition with the annotation properties of the current annotation.

107. The method of claim 105 or 106,

Wherein the comparing comprises:

Using a property comparator that compares the characteristics of each aligned property group to generate a comparison score that indicates the similarity between the characteristics of the aligned group.

108. The method of claim 107, wherein the property comparator compares each characteristic of the group with each of a plurality of characteristics taken from a predetermined set to provide a corresponding plurality of intermediate comparison scores indicating the similarity between the group characteristic and each characteristic from the set, and calculates the comparison score for the aligned group by combining the pluralities of intermediate comparison scores so generated.

109. The database search method according to any one of claims 105 to 108, wherein the sequences of query characteristics and the sequences of annotation properties represent time sequential signals.

110. The method of any one of claims 105 to 109, wherein the sequences of query characteristics and the sequences of annotation properties represent audio signals.

111. The method of claim 110, wherein the sequences of query characteristics and the sequences of annotation properties represent speech.

112. The method of claim 111, wherein each characteristic represents a sub-word unit of speech.

113. The method of claim 112, wherein each characteristic represents a phoneme.

114. The method of any one of claims 105 to 113, wherein the sequence of voice annotation properties for some or all of the annotations is generated from an audio annotation signal.

115. The method of any one of claims 105 to 113, wherein the sequence of voice annotation properties for some or all of the annotations is generated from a text annotation.

116. The method of any one of claims 105 to 115, wherein the converting step uses a speech recognition system.

117. The method of any one of claims 105 to 116, wherein one or more of the information entries is the associated annotation.

118. A characteristic comparison method, comprising:

Receiving first and second sequences of query characteristics, each representing a rendition of an input query;

Receiving a sequence of annotation properties;

Aligning the query characteristics of each rendition with the annotation properties of the annotation to form a plurality of aligned property groups, each comprising a query characteristic from each rendition and an annotation property;

Comparing the characteristics of each aligned property group to generate a comparison score that indicates the similarity between the characteristics of the aligned group; and

Combining the comparison scores for all aligned groups to provide a measure of similarity between the renditions of the input query and the annotation,

Wherein the comparing step comprises:

For each aligned group, comparing the first query sequence characteristic of the aligned group with each of a plurality of characteristics taken from a set of predetermined characteristics to provide a corresponding plurality of intermediate comparison scores indicating the similarity between the first query sequence characteristic and the respective characteristics from the set;

For each aligned group, comparing the second query sequence characteristic of the aligned group with each of the plurality of characteristics from the set to provide a further corresponding plurality of intermediate comparison scores indicating the similarity between the second query sequence characteristic and the respective characteristics from the set;

For each aligned group, comparing the annotation property of the aligned group with each of the plurality of characteristics from the set to provide a further corresponding plurality of intermediate comparison scores indicating the similarity between the annotation property and the respective characteristics from the set; and

Calculating the comparison score for the aligned group by combining the pluralities of intermediate comparison scores.

119. A method of searching a database comprising a plurality of information entries to identify information to be retrieved, each information entry having an associated annotation comprising a sequence of voice annotation properties, the method comprising:

Receiving a plurality of renditions of a verbal input query;

Converting each rendition of the input query into a sequence of voice query characteristics representing the voice in the rendition;

Comparing the voice annotation characteristics of each annotation with the voice query characteristics of each rendition to provide a measure of similarity between the input query and each annotation; And

Identifying the information to be retrieved from the database using the similarity measure provided by the combining step for all annotations


Wherein the comparing step has a plurality of different comparison modes of operation, and

Wherein the method further comprises:

Determining whether the sequence of speech characteristics of the current annotation was generated from an audio signal or from text, and outputting a determination result; and

Selecting the operation mode of the comparing step for the current annotation in accordance with the determination result.

120. The method of any one of claims 102 to 119, wherein one or more of the information entries is the associated annotation.

121. The method according to any one of claims 54 to 120, wherein the method steps are performed in the claimed order.

122. A storage medium storing processor-executable instructions for controlling a processor to perform the method according to any one of claims 54 to 121.

123. Processor-executable instructions for controlling a processor to perform the method of any one of claims 54 to 121.