KR20130107325A

KR20130107325A - Spell-check for a keyboard system with automatic correction

Info

Publication number: KR20130107325A
Application number: KR1020137014217A
Authority: KR
Inventors: Erland Unruh
Original assignee: Nuance Communications, Inc.
Priority date: 2010-11-04
Filing date: 2011-11-04
Publication date: 2013-10-01
Also published as: WO2012061701A4; EP2636149A1; EP2636149A4; WO2012061701A1; WO2012061701A8; KR101520455B1; CN103299550A; CN103299550B

Abstract

User input is received that specifies a path continuously traced across a keyboard presented on a touch-sensitive display. An input sequence is resolved that includes the keys traced, according to prescribed criteria, and auxiliary keys proximate to the traced keys. For one or more candidate entries of a prescribed vocabulary, a set-edit-distance metric is computed between the input sequence and each candidate entry. Various rules specify when a penalty is, or is not, imposed in computing the set-edit-distance metric. The candidate entries are ranked according to the computed metric and displayed. The features described herein may be implemented as an apparatus, programmed product, method, circuit, or a combination thereof.

Description

Spell-Check for a Keyboard System with Automatic Correction

Cross-Reference to Related Applications

This application claims priority to U.S. Patent Application No. 12/939,918, filed November 4, 2010, which is incorporated herein by reference in its entirety.

The invention relates to data input devices. More particularly, the invention relates to a spell-check mechanism for keyboard systems with automatic correction capability.

Classic spell-check ("edit distance") techniques for substituted, added, or missing letters have a relatively long history. See, for example, the discussion of spelling correction in K. Kukich, Techniques for Automatically Correcting Words, ACM Computing Surveys, Vol. 24, No. 4 (December 1992); J.L. Peterson, Computer Programs for Detecting and Correcting Spelling Errors, Communications of the ACM, Vol. 23, No. 12 (December 1980); and J. Daciuk, Incremental Construction of Finite-State Automata and Transducers, and their Use in the Natural Language Processing (1988).

However, classic spell-check techniques can handle only a limited number of differences between the typed word and the intended correct word. Because the best correction candidate is assumed to be the one requiring the fewest changes, spell-check algorithms are confounded when, for example, the typist's fingers are shifted on the keyboard, or when hurried, imprecise tapping on a touch-screen keyboard causes nearly every letter to be typed incorrectly.

In particular, to limit the amount of computation on low-power mobile devices, implementations of the classic algorithms make assumptions or impose constraints that reduce ambiguity and thus the number of candidate words considered. For example, they may rely on the first letters of the word being corrected, or they may severely limit the size of the vocabulary.

Another form of automatic error correction, useful both for keyboards on touch-sensitive surfaces and for telephone keypads, calculates the distance between each input location and nearby letters and compares the entire input sequence against possible words. The word whose letters lie closest to the input locations, combined with the highest frequency and/or recency of use, is the best correction candidate. This technique easily corrects both shifted fingers and hurried tapping. It can also offer reasonable word completions even when the initial letters are not all entered correctly.
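The distance-based scoring described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the engine described in the cited patents: the key-centre coordinates, the squared-distance spatial cost, and the log-frequency bonus weighted by `freq_weight` are all hypothetical choices, and recency is not modelled.

```python
import math

# Hypothetical QWERTY geometry in key-width units; a real system would read
# key positions from a keyboard database instead of hard-coding them.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
ROW_OFFSETS = [0.0, 0.25, 0.75]  # approximate horizontal stagger of each row
KEY_CENTRES = {
    ch: (col + ROW_OFFSETS[row] + 0.5, row + 0.5)
    for row, letters in enumerate(ROWS)
    for col, ch in enumerate(letters)
}


def word_score(taps, word, frequency, freq_weight=0.5):
    """Return a cost for `word` given tap coordinates (lower is better).

    Whole-word matching: a word whose length differs from the number of taps
    is not a candidate, so None is returned.
    """
    if len(taps) != len(word):
        return None
    # Spatial term: squared distance from each tap to the intended letter's key.
    spatial = sum(math.dist(tap, KEY_CENTRES[ch]) ** 2 for tap, ch in zip(taps, word))
    # Frequency term: more common words get a larger bonus (assumed log weighting).
    return spatial - freq_weight * math.log(frequency)


def rank_candidates(taps, vocabulary):
    """Rank same-length words from a {word: frequency} vocabulary by score."""
    scored = []
    for word, freq in vocabulary.items():
        cost = word_score(taps, word, freq)
        if cost is not None:
            scored.append((cost, word))
    return [word for _, word in sorted(scored)]


taps = [(2.85, 2.5), (0.75, 1.5), (4.5, 0.5)]  # "c" hit slightly toward "x"
vocab = {"cat": 500, "car": 300, "bat": 200, "cart": 400}
print(rank_candidates(taps, vocab))  # ['cat', 'car', 'bat']
```

Note that "cart" is never considered: the whole-word approach assumes one tap per letter, which is exactly the limitation discussed next.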

The following patents describe the use of a "Sloppy Type" engine for disambiguation and automatic correction of ambiguous keys, soft keyboards, and handwriting recognition systems: U.S. Patent No. 6,801,190 (October 5, 2004) to B. Alex Robinson and Michael R. Longe, entitled "Keyboard System With Automatic Correction"; U.S. Patent No. 7,088,345 (August 8, 2006) and U.S. Patent No. 7,277,088 (October 2, 2007); and U.S. Patent No. 7,319,957 (January 15, 2008) and U.S. Patent Application No. 11/043,525 (filed January 25, 2005) to Robinson et al., entitled "Handwriting And Voice Input With Automatic Correction". See also U.S. Patent No. 5,748,512 (May 5, 1998) to Garrett R. Vargas, entitled "Adjusting Keyboard".

In addition, the following applications cover combinations of manual and voice input for text disambiguation: U.S. Patent Application No. 11/143,408, filed June 1, 2006 by Longe et al., entitled "Multimodal Disambiguation of Speech Recognition"; and U.S. Patent Application No. 11/350,234, filed July 2, 2006 by Stephanick et al., entitled "Method and Apparatus Utilizing Voice Input to Resolve Ambiguous Manually Entered Text Input".

The "Sloppy Type" technique described above applies distance-based error correction to whole words. Assuming that the length of the input sequence equals the length of the intended word, and that each input is in its proper order, helps compensate for the increased ambiguity introduced by considering multiple nearby letters for each input. But in addition to small targeting errors, people also transpose keys, double-tap keys, miss a key entirely, or misspell words when they type.

It would be desirable to provide a mechanism that handles all forms of typing error in a way that delivers both accurate correction and acceptable performance.

User input is received that specifies a path continuously traced across a keyboard presented on a touch-sensitive display. An input sequence is resolved that includes the keys traced, according to prescribed criteria, and auxiliary keys proximate to the traced keys. For one or more candidate entries of a prescribed vocabulary, a set-edit-distance metric is computed between the input sequence and each candidate entry. Various rules specify when a penalty is, or is not, imposed in computing the set-edit-distance metric. The candidate entries are ranked according to the computed metric and displayed.

The features described herein may be implemented as an apparatus, programmed product, method, circuit, or a combination thereof.

FIG. 1 is a flow diagram of a spell-check method for a keyboard with automatic correction according to one embodiment of the invention.
FIG. 2 is a hardware block diagram of an input system with spell-check and automatic correction according to one embodiment of the invention.
FIG. 2A is a block diagram of a digital data processing machine according to one embodiment of the invention.
FIG. 2B shows an exemplary storage medium according to one embodiment of the invention.
FIG. 2C is a perspective view of exemplary logic circuitry according to one embodiment of the invention.
FIG. 3 is a table showing a standard edit-distance calculation between an input word and a target word, using a matrix as a tool, according to one embodiment of the invention.
FIG. 4 is a table showing a set-edit-distance calculation for input on a 12-key mobile phone according to one embodiment of the invention.
FIGS. 5A-5C are diagrams describing stem edit-distance and stem set-edit-distance according to one embodiment of the invention.
FIG. 6 is a flow diagram showing steps for performing stem-edit-distance calculations and incremental filtering to identify candidate words according to one embodiment of the invention.
FIG. 7 is a matrix showing an example for the word "misspell" using standard edit distance according to one embodiment of the invention.
FIG. 8 is a matrix showing a method for finding standard edit-distance values based on the cell being calculated, according to one embodiment of the invention.
FIG. 9 is a matrix showing the case where the words being compared match completely, according to one embodiment of the invention.
FIGS. 10A-10B are a series of matrices showing the case where there is a mismatch between the words being compared, according to one embodiment of the invention.
FIG. 11 shows a rotated/transformed matrix space according to one embodiment of the invention.
FIG. 12 shows a method for finding standard edit-distance values for the rotated matrix of FIG. 11 according to one embodiment of the invention.
FIG. 13 is a table showing combinations of adjacent input sets for a linguistic database search screening function according to one embodiment of the invention.
FIG. 14 is a length-independent screening map for an input length of 9 according to one embodiment of the invention.
FIG. 15 is a length-dependent screening map for a target word of length 6 and an input length of 9 according to one embodiment of the invention.
FIG. 16 is a series of screens showing set-edit-distance spelling correction with regional automatic correction according to one embodiment of the invention.
FIG. 17 is a keyboard showing a trace according to one embodiment of the invention.
FIG. 18 shows a layout for a set-edit-distance matrix calculated for traced input according to one embodiment of the invention.
FIGS. 19-23 show a set-edit-distance matrix and various shadow matrices according to one embodiment of the invention.
FIG. 24 is a flowchart showing an exemplary operational sequence for resolving user input entered via a tracing technique according to one embodiment of the invention.
FIGS. 25-26 are screenshots of a keyboard showing various approaches for determining auxiliary keys according to one embodiment of the invention.
FIGS. 27-30 show a set-edit-distance matrix and various shadow matrices according to one embodiment of the invention.

Glossary of Terms

For purposes of the discussion herein, the following terms have the meanings given below.

Edit distance (also "standard" edit distance) - a well-established algorithm that compares two strings and determines the minimum number of changes needed to make one equal to the other.

The following abbreviations may be used herein and in the drawings:

T - Transposed (two consecutive letters are swapped)

I - Inserted (a letter is added that was not in the other string)

D - Deleted (a letter is dropped from one string)

S - Substituted (a letter is replaced by a different letter in the same position)

X - the target cell being calculated

Improved edit distance, or set-edit-distance (or "fuzzy compare") - the subject of this patent; an enhanced edit distance that represents each input with a set of letters (each with an optional probability), rather than a single letter as in standard edit distance, together with other optimizations.

Mode - an operational state; for this example, one of two is specified: "exact" (as with standard edit distance, only the exactly tapped letter/value from each input event is used to match each candidate word) or "regional"/"set-based" (multiple letters/values are used per input). The mode may be user- or system-specified.

Regional input - a method (or event) that includes nearby/surrounding letters (with optional probabilities) in addition to the letter/key actually tapped/pressed.

Set-based - the use of multiple letter values, rather than just one, to represent each input; each set member may have a different relative probability; the set may also include, for example, accented variants of the base letter shown on the key.

"Classic compare", "classic match", Sloppy Type, or "regional correction" - whole-word matching using automatic correction that takes nearby letters into account, as above; in general, the number of inputs equals the number of letters in each candidate word (or in the word stem of a completed word).

Filter or screen - a rule that streamlines the overall comparison or search by identifying and eliminating words that would not ultimately be added to the selection list.

KDB - Keyboard Database; information about the keyboard layout, the level of ambiguity surrounding each letter, and the nearby letters for each letter.

LDB - Linguistic Database, i.e., the main vocabulary for a language.

"Word tap frequency" - the contribution of the physical distance from the pressed keys to the likelihood that a word is the target word.
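To make several of these terms concrete (regional input, exact versus regional mode, KDB, and classic match), here is a small sketch. The `KDB_NEIGHBOURS` table and the helper names are hypothetical, the optional probabilities are omitted, and the last case shows why a classic, equal-length match cannot absorb an extra tap.

```python
# Hypothetical KDB fragment: for each key, the letters treated as "nearby" on a
# QWERTY layout. A real KDB would cover every key and could weight neighbours
# with probabilities; here every set member is treated as equally likely.
KDB_NEIGHBOURS = {
    "r": "etdf", "e": "wrsd", "s": "awedzx", "u": "yihj",
    "k": "jlimo", "l": "kop", "t": "ryfg",
}


def input_sets(tapped_keys, mode="regional"):
    """Convert tapped keys into one letter set per input.

    mode="exact": only the tapped letter (as in standard edit distance).
    mode="regional": the tapped letter plus its KDB neighbours.
    """
    sets = []
    for key in tapped_keys:
        letters = {key}
        if mode == "regional":
            letters |= set(KDB_NEIGHBOURS.get(key, ""))
        sets.append(letters)
    return sets


def classic_match(sets, word):
    """Classic match: one input per letter, each letter within its input's set."""
    return len(sets) == len(word) and all(ch in s for ch, s in zip(word, sets))


print(classic_match(input_sets("resukt"), "result"))                # True
print(classic_match(input_sets("resukt", mode="exact"), "result"))  # False
print(classic_match(input_sets("ressult"), "result"))               # False: extra tap
```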

Discussion

One embodiment of the invention provides standard edit-distance spell-check algorithms that operate together with probability-based automatic correction algorithms and data structures for ambiguous keypads and other predictive text input systems. Embodiments of the invention also provide strategies for optimizing and ordering the different types of results.

FIG. 1 is a flow diagram of a spell-check method for a keypad with automatic correction. FIG. 1 shows user input comprising an input sequence entered by the user via a data entry device 105; the user's input may be ambiguous. At least one dictionary 115 is also provided as a source of target meanings for the user's input. On each user input event 100, the user input sequence is provided to the system of the invention. Each source 110, such as the dictionary 115 discussed above, is queried. Every word 120 in each dictionary is then supplied in turn as input to the system, potentially on each user input event.

Upon receiving these inputs, the system performs incremental filtering together with edit-distance and regional/probability calculations (140), discarding any word whose dissimilarity to the inputs exceeds a maximum threshold. The system then compares the result for the input sequence and dictionary entry with the other best matches in the word selection list, and discards the word if it would rank too low on the list (140). If the list is full, the lowest-ranked word in the list is dropped, and the new word is inserted into the list according to its rank (150). The list is then presented to the user.
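The FIG. 1 loop (threshold filtering, discarding words that would rank too low, and inserting into a bounded selection list) might be sketched as below. The function names, the frequency tie-break, and the toy `mismatches` distance are assumptions made for illustration; in the described system the distance would be the combined edit-distance and regional/probability calculation.

```python
import bisect


def build_selection_list(user_input, dictionary, distance, max_distance=2, list_size=5):
    """Sketch of the FIG. 1 loop over dictionary words.

    dictionary: iterable of (word, frequency) pairs, supplied one at a time.
    distance:   callable(user_input, word) -> number; a stand-in for the
                combined edit-distance and regional/probability calculation.
    Returns the selection list, best candidate first.
    """
    if list_size <= 0:
        return []
    selection = []  # kept sorted as (distance, -frequency, word); best first
    for word, freq in dictionary:
        d = distance(user_input, word)
        if d > max_distance:               # too dissimilar: discard
            continue
        entry = (d, -freq, word)
        if len(selection) == list_size:
            if entry >= selection[-1]:     # would rank too low: discard
                continue
            selection.pop()                # list full: drop lowest-ranked word
        bisect.insort(selection, entry)    # insert according to rank
    return [word for _, _, word in selection]


def mismatches(a, b):
    """Toy distance for the demo: differing positions plus length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


words = [("cat", 50), ("cut", 80), ("cot", 10), ("dog", 100), ("cart", 30), ("kat", 90)]
print(build_selection_list("cat", words, mismatches, max_distance=1, list_size=3))
# ['cat', 'kat', 'cut']
```

Keeping the list sorted means the "too low to rank" check is a single comparison against the last entry, which mirrors the early discard at step 140.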

FIG. 2 is a hardware block diagram of an input system 200 with spell-check and automatic correction. An input device 202 and a display 203 are coupled to a processor 201 through appropriate interfacing circuitry. Optionally, a speaker 204 is also coupled to the processor. The processor 201 receives input from the input device and manages all output to the display and speaker. The processor 201 is coupled to a memory 210. The memory includes a combination of temporary storage media, such as random access memory (RAM), and permanent storage media, such as read-only memory (ROM), floppy disks, hard disks, or CD-ROMs. The memory 210 contains all software routines that govern system operation. Preferably, the memory contains an operating system 211, correction software 212 for computing edit distance and, in particular, performing spell-check, and associated vocabulary modules 213, which are discussed in detail below. Optionally, the memory may contain one or more application programs 214, 215, 216. Examples of application programs include word processors, software dictionaries, and foreign language translators. Speech synthesis software may also be provided as an application program, allowing the input system, with its full correction capability, to function as a communication aid.

Exemplary Digital Data Processing Apparatus

Data processing entities such as the central processing unit (CPU) 201 may be implemented in various forms. Some examples include a general purpose processor, digital signal processor (DSP), application specific integrated circuit (ASIC), field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, for example, a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.

As a more specific example, FIG. 2A shows a digital data processing apparatus 220. The apparatus 220 includes a processor 222, such as a microprocessor, personal computer, workstation, controller, microcontroller, or state machine, coupled to digital data storage 224. In the present example, the storage 224 includes fast-access storage 226 as well as nonvolatile storage 228. The fast-access storage 226 may be used, for example, to store the programming instructions executed by the processor 222. The storage 226 and 228 may be implemented by various devices, such as those discussed in greater detail in conjunction with FIGS. 2B and 2C. Many alternatives are possible. For example, one of the components 226, 228 may be eliminated; furthermore, the storage 224, 226, and/or 228 may be provided on board the processor 222, or even provided externally to the apparatus 220.

The apparatus 220 also includes an input/output 221, such as a connector, line, bus, buffer, electromagnetic link, network, modem, transducer, IR port, antenna, or other means for the processor 222 to exchange data with other hardware external to the apparatus 220.

Storage Media

Various instances of digital data storage may be used, for example, to provide storage such as the memory 210, to embody the storage 224 and 228 (FIG. 2A), and so on. Depending on its application, this digital data storage may be used for various functions, such as storing data or storing machine-readable instructions. These instructions may themselves aid in carrying out various processing functions, or they may serve to install a software program upon a computer, where such software program is then executable to perform other functions related to the invention.

In any case, the storage media may be implemented by nearly any mechanism for digitally storing machine-readable signals. One example is optical storage such as CD-ROM, WORM, DVD, digital optical tape, disk storage 230 (FIG. 2B), or other optical storage. Another example is direct access storage, such as a conventional "hard drive", a redundant array of inexpensive disks (RAID), or another direct access storage device (DASD). Still other examples of digital data storage include electronic memory such as ROM, EPROM, flash ROM, EEPROM, memory registers, battery backed-up RAM, and the like.

An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. In another example, the processor and the storage medium may reside in an ASIC or other integrated circuit.

Logic Circuitry

In contrast to the storage medium containing machine-executable instructions described above, a different embodiment uses logic circuitry to implement the processing features described herein. Depending on the particular requirements of the application in areas such as speed, expense, and tooling costs, this logic may be implemented by constructing an application-specific integrated circuit (ASIC) having thousands of tiny integrated transistors. Such an ASIC may be implemented with CMOS, TTL, VLSI, or another suitable construction. Other alternatives include a digital signal processing chip (DSP), discrete circuitry (such as resistors, capacitors, diodes, inductors, and transistors), a field programmable gate array (FPGA), a programmable logic array (PLA), a programmable logic device (PLD), and the like. FIG. 2C shows one example of logic circuitry in the form of an integrated circuit 240.

Edit Distance Combined with Regional Correction

Edit distance is the number of operations required to turn one string into another. In essence, it is the number of edits one would have to make by hand, with a pen, to fix a misspelled word. For example, to fix the input word "ressumt" so that it becomes the target word "result", two edits must be made: an "s" must be deleted, and the "m" must be changed to an "l". Therefore, "result" is at edit distance 2 from "ressumt".

A common technique for determining the edit distance between an input word and a target word uses a matrix as a tool (see FIG. 3). The approach compares the letters in the input word with the letters in the target word, and yields the overall edit distance between the words in the bottom, rightmost element of the matrix. The details of the calculation are complex, but in general the edit distance (represented by the numbers in the diagonal elements) increases wherever the words start to look different, and a smaller value means the words are more similar. Working through the matrix from the upper left to the lower right, if the letter in the target word is the same as the letter in the input word, the edit distance does not increase. If the letters differ, the edit distance increases according to the standard rules. The final result, the overall edit distance, is the bottom, rightmost element (bold outline).
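A minimal sketch of the matrix technique, assuming the usual Levenshtein recurrence for the "standard rules" (unit cost for insertion, deletion, and substitution). The optional `allow_transpositions` flag is an assumption added to show how the T operation from the glossary can be counted as a single edit (the "optimal string alignment" variant); the function names are illustrative.

```python
def edit_distance_matrix(source, target, allow_transpositions=False):
    """Fill the (len(source)+1) x (len(target)+1) edit-distance matrix.

    Cell [i][j] holds the distance between source[:i] and target[:j]; the
    overall answer is the bottom-right cell. With allow_transpositions=True a
    swap of two adjacent letters (T) also counts as one edit; otherwise this
    is plain Levenshtein distance.
    """
    rows, cols = len(source) + 1, len(target) + 1
    m = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        m[i][0] = i  # delete all of source[:i]
    for j in range(cols):
        m[0][j] = j  # insert all of target[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if source[i - 1] == target[j - 1] else 1  # match or S
            m[i][j] = min(
                m[i - 1][j] + 1,         # D: drop source[i-1]
                m[i][j - 1] + 1,         # I: insert target[j-1]
                m[i - 1][j - 1] + cost,  # match / S
            )
            if (allow_transpositions and i > 1 and j > 1
                    and source[i - 1] == target[j - 2]
                    and source[i - 2] == target[j - 1]):
                m[i][j] = min(m[i][j], m[i - 2][j - 2] + 1)  # T
    return m


def edit_distance(source, target, allow_transpositions=False):
    """Overall edit distance: the bottom-right cell of the matrix."""
    return edit_distance_matrix(source, target, allow_transpositions)[-1][-1]


print(edit_distance("ressumt", "result"))                      # 2
print(edit_distance("teh", "the"))                             # 2
print(edit_distance("teh", "the", allow_transpositions=True))  # 1
```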

This concept now extends to ambiguous input, where each input corresponds to a set of letters rather than a single letter. One example is a text input system on a mobile phone that lets the user press the keys corresponding to the letters he or she wants to enter, and that resolves the ambiguity inherent in the fact that each key has multiple letters associated with it. The new term "set-edit-distance" refers to the extension of the edit-distance concept to ambiguous input. To illustrate set-edit-distance, assume that a user of the mobile phone's text input system presses the keys 7, 3, 7, 7, 8, 6, 8 while attempting to enter the word "result". Spelling correction on such an ambiguous system finds the words with the smallest set-edit-distance from the input key sequence. The technique is similar to that of edit distance, but instead of comparing each letter of the target word with a single letter of the input sequence, each letter of the target word is compared against the set of letters represented by the corresponding input key. If the target letter is in the input set, the set-edit-distance does not increase. If the target letter is not in the input set, the set-edit-distance increases according to the standard rules. The matrix corresponding to the set-edit-distance is shown in FIG. 4, with the result in the bottom-right element (bold outline).
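A minimal sketch of set-edit-distance for the keypad example follows, assuming the standard ITU-T E.161 letter layout and unit costs; `KEYPAD` and `set_edit_distance` are illustrative names. The matrix is the same as for ordinary edit distance, except that the match test asks whether the target letter belongs to the pressed key's letter set. The value 2 is what these assumptions produce, not a number read from FIG. 4.

```python
# Letter groups of a standard telephone keypad (ITU-T E.161 layout assumed).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}


def set_edit_distance(input_sets: list[set[str]], target: str) -> int:
    """Edit distance where each input position is a set of acceptable letters."""
    rows, cols = len(input_sets) + 1, len(target) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            # No penalty if the target letter is anywhere in the input set.
            cost = 0 if target[j - 1] in input_sets[i - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[-1][-1]


keys = "7377868"  # the key presses made while trying to type "result"
sets = [set(KEYPAD[k]) for k in keys]
print(set_edit_distance(sets, "result"))   # 2
print(set_edit_distance(sets, "ressumt"))  # 0
```

The second call shows that every letter of "ressumt" lies on the key actually pressed, so only real spelling differences are penalized.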

Although the embodiment of FIG. 4 uses key input on a mobile phone to illustrate the concept of set-edit-distance, the concept also applies to other ambiguous systems, such as the set of letters surrounding a key pressed on a QWERTY keyboard, or the set of letters returned by a character recognition engine. Further, while the above embodiment assumes that all letters in a set are equally likely, the system can be extended to incorporate letter probabilities into the final set-edit-distance score.

In such an extended system, the input sequence may be represented as an array of one or more letter + probability pairs. The probability reflects the likelihood that the letter identified by the system is the letter the user intended, as described in U.S. Patent 7,319,957 (January 15, 2008) to Robinson et al., entitled "Handwriting And Voice Input With Automatic Correction", and in U.S. Patent Application No. 11/043,525, filed January 25, 2005, to Robinson et al., entitled "Handwriting And Voice Input With Automatic Correction", each of which is incorporated herein by reference. The probability may be based on one or more of the following:

The Cartesian distance from the stylus or finger tap position to each adjacent letter on the keyboard displayed on the touch screen, the frequency of adjacent letters, and/or the distribution of taps around each letter;

The radial distance between the joystick tilt direction and the designated pie slices of adjacent letters of the alphabet;

The degree of similarity between a handwritten letter and the possible letter shapes/templates; for example, an "ink trail" may look most similar to the letter 'c' (60% probability) but could also be another letter, such as 'o' (20%), 'e' (10%), or 'a' (10%); and

The probability that a letter/grapheme is represented in the phoneme or whole-word utterance processed by a speech recognition front end.

Thus, set-edit-distance is a standard edit distance applied to ambiguous sets, in which a penalty is assigned to each difference between the input word and the target vocabulary word. Instead of asking "Is this letter different?", the question becomes "Is this letter one of the possible candidates in the probability set?"

Thus, one embodiment applies the following algorithm:

If there are two probability transformations that produce a match, select the one with the lower edit distance.

If a letter is in the input's probability set, also calculate a region-correction probability score for that letter.

Accumulate the region-correction probability scores of all letters in the word to calculate the spelling-correction tap frequency.

For zero-set-edit-distance words, i.e., vocabulary words of the same length as the input in which every letter is in the corresponding input probability set, only the tap frequency is used.

Several values are calculated or accumulated for the matching and word-list placement steps:

1. set-edit-distance;

2. tap frequency, for comparison;

3. stem edit-distance;

4. word frequency; and

5. source, e.g., dictionary.
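One way these five values might be combined when placing words in the selection list is a lexicographic sort. The sketch below is purely hypothetical: the `Candidate` record, the sort order (lowest set-edit-distance first, then lowest stem edit-distance, then highest tap frequency and word frequency), and the sample numbers are assumptions for illustration, not an ordering the text prescribes. The source is carried along but not used as a sort key.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    word: str
    set_edit_distance: int
    tap_frequency: float
    stem_edit_distance: int
    word_frequency: float
    source: str  # e.g. "dictionary"; kept for display, not used for sorting here


def rank(candidates: list[Candidate]) -> list[Candidate]:
    # Hypothetical ordering: fewer edits first, then more likely taps and words.
    return sorted(
        candidates,
        key=lambda c: (c.set_edit_distance, c.stem_edit_distance,
                       -c.tap_frequency, -c.word_frequency),
    )


pool = [  # made-up sample values
    Candidate("resume", 2, 0.10, 2, 900.0, "dictionary"),
    Candidate("results", 3, 0.25, 2, 500.0, "dictionary"),
    Candidate("result", 2, 0.20, 2, 800.0, "dictionary"),
]
print([c.word for c in rank(pool)])  # ['result', 'resume', 'results']
```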

The tap frequency (TF) of a word or stem can be calculated as follows:

$$\mathrm{TF} = P(\text{letter}_1) \times P(\text{letter}_2) \times \cdots \qquad (1)$$

This is similar to the standard probability-set auto-correction calculation, except that the edit distance algorithm generates alternatives, and the highest calculated frequency among those alternatives is selected.
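Equation (1) and the "keep the best alternative" rule can be sketched as follows. This assumes that the alternatives produced by the edit-distance step are letter strings aligned position by position with the inputs (how they are generated is not shown); the probability values are made-up sample data, and the function names are hypothetical.

```python
from math import prod


def tap_frequency(letters: str, prob_sets: list[dict[str, float]]) -> float:
    """Equation (1): product of each letter's probability in its input set.

    A letter absent from its set contributes probability 0.
    """
    return prod(ps.get(ch, 0.0) for ch, ps in zip(letters, prob_sets))


def best_tap_frequency(alternatives: list[str], prob_sets: list[dict[str, float]]) -> float:
    """Keep the highest tap frequency among alternative letter strings."""
    return max(tap_frequency(alt, prob_sets) for alt in alternatives)


# Hypothetical probability sets for three inputs (e.g. from a handwriting engine).
inputs = [
    {"c": 0.6, "o": 0.2, "e": 0.1, "a": 0.1},
    {"a": 0.7, "s": 0.3},
    {"t": 0.8, "r": 0.2},
]
print(round(tap_frequency("cat", inputs), 3))                       # 0.336
print(round(best_tap_frequency(["car", "oat", "cat"], inputs), 3))  # 0.336
```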

The embodiment of FIG. 4 uses a comparison between the set-based input sequence and the entire target word. The concept can also be applied to compare the input sets against the beginning (stem) of the target word. This allows the system to begin predicting spelling corrections before the user has entered the entire input sequence. This is called stem-edit-distance. FIGS. 5A-5B show partial input sequences. In these figures, the letters 'a' and 'c' may be members of the same set based on physical proximity on a touch-screen QWERTY keyboard, while 's' and 'g' are not. Because the letter 's' in the third position of the target word is in the set for the third input in FIG. 5A, the stem set-edit-distance between the input and the target word is zero. Because the third letter 's' is not in the set for the third input in FIG. 5B, the stem set-edit-distance between the input and the target word is one.
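A minimal sketch of stem set-edit-distance, assuming it is obtained by taking the minimum over the last row of the set-edit-distance matrix, so that the inputs may match any prefix of the target and the rest of the target counts as a free completion. The input sets below are hypothetical stand-ins for the proximity sets of FIGS. 5A-5B, not their actual contents.

```python
def stem_set_edit_distance(input_sets: list[set[str]], target: str) -> int:
    """Smallest set-edit-distance between the inputs and any prefix (stem) of target."""
    rows, cols = len(input_sets) + 1, len(target) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if target[j - 1] in input_sets[i - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    # The last row compares all inputs with target[:j]; the unmatched tail
    # of the target is a word completion and costs nothing.
    return min(d[-1])


# Hypothetical proximity sets for three taps on a touch-screen keyboard.
partial_a = [{"c", "x", "v"}, {"a", "s", "q"}, {"s", "a", "d"}]
partial_b = [{"c", "x", "v"}, {"a", "s", "q"}, {"g", "f", "h"}]
print(stem_set_edit_distance(partial_a, "cast"))  # 0
print(stem_set_edit_distance(partial_b, "cast"))  # 1
```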

Stem edit-distance is the edit distance computed for the explicitly entered or most probable letters, typically the exact-tap value from each input probability set, compared with the corresponding letters of a longer target word. In this case, for a touch-screen QWERTY keyboard, the most probable letter from each input is the exact-tap letter. Because the letter 's' in the third position of the target word is not the same as the exact-tap value for the third input in FIG. 5A, the stem edit-distance between the input and the target word is one. Similarly, the stem edit-distance between the input and the target word in FIG. 5B is also one.

The sets used for stem set-edit-distance may also be language-specific. For example, in French the accented variants of a letter may be members of the same set. FIG. 5C shows an embodiment in which the variants of 'e' map to the same key, resulting in a stem set-edit-distance of zero between the input and the target word.
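One possible way to build language-specific sets such as the French 'e' variants is to compare letters after stripping accents with Unicode canonical decomposition. This is an assumption for illustration, since the text does not say how the sets are built; `base_letter` and `matches_folded` are hypothetical helpers using Python's standard `unicodedata` module.

```python
import unicodedata


def base_letter(ch: str) -> str:
    """Strip combining accents, e.g. 'é' -> 'e' and 'ç' -> 'c'."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def matches_folded(target_letter: str, input_set: set[str]) -> bool:
    """True if the target letter, ignoring accents, is in the input set."""
    folded = {base_letter(c) for c in input_set}
    return base_letter(target_letter) in folded


# A tap on the 'e' key with hypothetical neighbors 'w' and 'r'.
tap = {"e", "w", "r"}
print([matches_folded(v, tap) for v in ["e", "é", "è", "ê", "ë"]])  # [True, True, True, True, True]
print(matches_folded("ç", tap))  # False
```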

One embodiment of the invention also provides a number of innovative strategies for adjusting the placement of words in the selection list to reflect the user's intent or input style. For example, results can be biased in one of two ways:

Whole-word priority - for example, for a poor keyboard that is crowded or has low tactile feedback, and/or for a fast or sloppy typist, the results emphasize in-region corrections (i.e., near-misses) of the whole input, with few word completions; and

Promotion of completions - for a good/accurate keyboard, and/or a slow, careful typist who wants completions that finish the word, the results emphasize word completions based, to a degree, on the exact taps, the input sequence, etc.

One embodiment of the invention provides spell-checking features that allow such systems, in particular those working in concert with the "sloppy type" technology described above, to be more useful to all typists on non-desktop devices, particularly with respect to typing errors. The "sloppy type" system provides an enhanced text input system that uses word-level disambiguation to automatically correct inaccuracies in user keystroke entries. Specifically, the "sloppy type" system comprises: (a) a user input device comprising a touch-sensitive surface that includes an auto-correcting keyboard region containing a plurality of letters of an alphabet, wherein each of the letters corresponds to a location with known coordinates in the auto-correcting keyboard region, and wherein each time the user contacts the user input device within the auto-correcting keyboard region, a location associated with the contact is determined and added to the current input sequence of contact locations; (b) a memory containing a plurality of objects, wherein each object is a string of one or more letters forming a word or part of a word, and wherein each object is further associated with a frequency of use; (c) an output device with a text display area; and (d) a processor coupled to the user input device, the memory, and the output device. The processor comprises: (i) a distance value calculation component that, for each determined contact location in the input sequence of contacts, calculates a set of distance values between the contact location and the known coordinate locations corresponding to one or more letters in the auto-correcting keyboard region; (ii) a word evaluation component that, for each generated input sequence, identifies one or more candidate objects in memory, evaluates each identified candidate object by calculating a matching metric based on the calculated distance values and the frequency of use associated with the object, and ranks the evaluated candidate objects based on the calculated matching metric values; and (iii) a selection component that (a) identifies one or more candidate objects according to their evaluated ranking, and (b) presents the identified objects to the user, enabling the user to select one of the presented objects for output to the text display area on the output device.
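The matching metric of the word evaluation component can be sketched as follows. The text only says that the metric is based on the distance values and the frequency of use; the specific formula here (sum of squared Euclidean distances minus a weighted log frequency), the key coordinates, and the frequencies are all hypothetical choices for illustration.

```python
import math

# Hypothetical key-center coordinates (in key widths) for part of a QWERTY layout.
KEY_CENTERS = {
    "q": (0.0, 0.0), "w": (1.0, 0.0), "e": (2.0, 0.0), "r": (3.0, 0.0), "t": (4.0, 0.0),
    "a": (0.5, 1.0), "s": (1.5, 1.0), "d": (2.5, 1.0),
}


def matching_metric(contacts, word, frequency, freq_weight=1.0):
    """Lower is better: squared contact-to-key distances minus a log-frequency bonus."""
    if len(word) != len(contacts):
        return math.inf  # this simple sketch only scores same-length words
    distance_cost = sum(
        math.dist(contact, KEY_CENTERS[letter]) ** 2
        for contact, letter in zip(contacts, word)
    )
    return distance_cost - freq_weight * math.log(frequency)


contacts = [(0.6, 1.0), (1.4, 0.9)]             # two sloppy taps near "a" then "s"
vocabulary = {"as": 500, "we": 2000, "ad": 50}  # hypothetical frequencies of use
ranked = sorted(vocabulary, key=lambda w: matching_metric(contacts, w, vocabulary[w]))
print(ranked)  # ['as', 'we', 'ad']
```

The frequent word "we" outranks the closer but rare "ad", which is the trade-off between distance values and frequency of use that the metric is meant to capture.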

Optimizations

In theory, given a sufficiently large edit distance score, every word in the vocabulary could be considered a correction. However, the database processing must occur in real time as the user types, and, especially on mobile devices, available processing power and working memory are limited. It is therefore important to optimize every part of the combined edit distance algorithms and to eliminate processing steps wherever possible. For example, the top-level criterion for discarding a possible word match is to allow only one edit/correction for every three actual inputs, up to a maximum of three edits for any word being compared.

Other performance improvements may include, for example and without limitation:

Strategies for minimizing edit distance calculations, e.g., a first pass that calculates only the cells that could allow the comparison to be rejected outright;

The system may start from the results of a previous pass, such as when the user enters another letter, or may temporarily cull previous words, e.g., presenting a shortened, partial, or fuzzy selection list until the user pauses entry;
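Starting from the results of a previous pass fits naturally with the matrix layout used later in this section, where the input sequence runs vertically: each new keystroke adds one row, and that row depends only on the previous one. A minimal sketch, assuming unit costs, no transposition, keypad sets in the E.161 layout, and a hypothetical per-candidate cache of the last computed row:

```python
def first_row(target: str) -> list[int]:
    """Row 0 of the matrix: no input yet, so reaching target[:j] costs j insertions."""
    return list(range(len(target) + 1))


def next_row(prev_row: list[int], input_set: set[str], target: str) -> list[int]:
    """Compute the row for one new input from the cached previous row."""
    row = [prev_row[0] + 1]
    for j in range(1, len(target) + 1):
        cost = 0 if target[j - 1] in input_set else 1
        row.append(min(prev_row[j] + 1, row[j - 1] + 1, prev_row[j - 1] + cost))
    return row


# Keypad sets for the presses 7, 3, 7, 7, 8, 6, 8.
presses = [set("pqrs"), set("def"), set("pqrs"), set("pqrs"),
           set("tuv"), set("mno"), set("tuv")]
cache = {"result": first_row("result")}  # hypothetical per-candidate cache
for i, keyset in enumerate(presses, start=1):
    cache["result"] = next_row(cache["result"], keyset, "result")
    if i == 3:
        print(min(cache["result"]))  # 0: "res" is a perfect stem so far
print(cache["result"][-1])  # 2: full set-edit-distance after all seven presses
```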

Levels of filtering, e.g., from most to least strict, applied before, during, or after the edit distance matrix calculations are completed, for example:

Exact first letter; otherwise the target word is withdrawn from consideration;

First letter a near-miss, i.e., in the probability set (in-region);

The first letter of the vocabulary word must match one of the first two inputs, e.g., allowing one addition, one drop, or one transposed pair;

The first letter of the vocabulary word must be in the set of one of the first two inputs;

Other filtering concepts and variations may be applied; and

No filtering.

Word frequency can be approximated based on Zipf's law, which states that, given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. Thus, the most frequent word occurs approximately twice as often as the second most frequent word, which in turn occurs twice as often as the fourth most frequent word. In one embodiment, this approximation is used rather than a value stored for each word in the vocabulary database:

$$F_n = \frac{F_1}{n} \qquad (2)$$

that is, the frequency of the n-th word is the frequency of the first word divided by the word's rank n.
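A minimal sketch of equation (2), assuming the vocabulary list is kept sorted by descending frequency so that a word's position gives its rank; the word list and the value of F1 are hypothetical.

```python
def zipf_frequency(rank: int, top_frequency: float) -> float:
    """Approximate frequency of the word at 1-based `rank` via equation (2)."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    return top_frequency / rank


# With the vocabulary stored in descending frequency order, a word's rank is
# just its position, so no per-word frequency value needs to be stored.
words = ["the", "of", "and", "to"]  # hypothetical frequency-ordered list
f1 = 1000.0                         # hypothetical frequency of the top word
approx = {w: zipf_frequency(i + 1, f1) for i, w in enumerate(words)}
print(approx["of"], approx["to"])  # 500.0 250.0
```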

Other adjustable configuration parameters may include:

Number of word completions per near-miss section;

Number of spelling corrections; and

Spelling correction mode: standard edit distance or set-edit-distance (with or without letter probabilities).

Spelling Correction Performance

Spelling correction on a large word list is CPU-intensive, and even more so when memory is limited. To achieve acceptable performance, the entire system must therefore be optimized around the selected spelling correction features, and as a result the system can be fairly rigid from a feature point of view. Without specific optimizations, performance can be one or two orders of magnitude worse.

Spelling correction performance depends mostly on:

Spelling correction properties, such as allowed edits, modes, and filters

The "fuzzy compare" function (which determines whether a word matches the input)

Low-level language database search functions

Language database format (structure and behavior)

The number of words in the language database and their length distribution

How ambiguous the keyboard database is relative to the language database

Each of these factors is described in more detail in the following sections.

Spelling Correction Properties

Allowed edits

The number of allowed edits is a very important performance factor. The more edits allowed, the more ambiguity there is in the comparison, and therefore the more words match and enter the selection list for prioritization. If the comparison is too lenient, too many unwanted words get into the list.

In a preferred embodiment, the number of allowed edits is tied to the input length: one edit is allowed for every third input, up to a maximum of three. This parameter of one edit per three inputs is assumed throughout the following embodiments.
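The edit budget can be written as a one-line function. This sketch assumes the count is rounded down, so that inputs shorter than three letters get no edits, which the text does not state explicitly; the parameter names are illustrative.

```python
def allowed_edits(input_length: int, per_inputs: int = 3, maximum: int = 3) -> int:
    """One edit for every `per_inputs` inputs, capped at `maximum`."""
    return min(maximum, input_length // per_inputs)


for n in (2, 3, 5, 6, 9, 12):
    print(n, allowed_edits(n))
# 2 0 / 3 1 / 5 1 / 6 2 / 9 3 / 12 3
```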

Modes and filters

Modes and filters are used to control performance as well as the result set. Two examples of modes are exact input and in-region input. For example, on a touch-screen soft keyboard, each tap both lands on one exact letter and indicates an approximate region of nearby letters. In exact input mode, only the exact-tap letter from each user input is considered. In in-region mode, some or all of the nearby letters indicated by each user input are considered.
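The two modes differ only in which letters form each input set. A minimal sketch, where `NEIGHBORS` is a hypothetical proximity table (a real system would presumably derive it from the key geometry) and the mode names are illustrative:

```python
# Hypothetical neighbor map for a few keys of a touch-screen QWERTY layout.
NEIGHBORS = {"t": "rfgy", "o": "iklp", "m": "njk"}


def input_set(exact_tap: str, mode: str) -> set[str]:
    """Letters considered for one tap under the given mode."""
    if mode == "exact":
        return {exact_tap}  # only the letter actually hit
    if mode == "in_region":
        return {exact_tap, *NEIGHBORS.get(exact_tap, "")}  # plus nearby letters
    raise ValueError(f"unknown mode: {mode}")


print(sorted(input_set("t", "exact")))      # ['t']
print(sorted(input_set("t", "in_region")))  # ['f', 'g', 'r', 't', 'y']
```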

Spelling correction on exact input reduces ambiguity and makes the candidates look more like what was typed (even if what was typed is inaccurate). This works well for keyboard databases that feature exact-tap values, such as a touch-screen soft keyboard. 12-key systems (for standard telephone keypads) cannot have any useful exact-tap value: each key press could just as well be represented by the key's digit as by one of its letters, and there is no way to tell that one letter on a key is more likely than the others to be the intended letter.

Unfortunately, for 12-key systems the keyboard databases behave like a typical in-region mode layout, i.e., each input produces at least three letters per set, and more when accented letters are included, but there is no exact-tap value that could be used for exact input mode and filtering.

A filter is a screening function that ends further consideration of a candidate word if it does not meet some minimum criterion. For example, the ONE/TWO filters, which are mostly for performance improvement, require the first letter of a word to correspond closely to the first or second input and reject any candidate word that does not match.

The "fuzzy compare" function

The fuzzy compare function allows a certain difference, the edit distance, between the input and the word being compared. The concept is to calculate the edit distance and to pass or reject the word based on that value.

Calculating the exact edit distance is expensive in terms of performance. The solution is to place a screening mechanism before the actual calculation. Within reasonable limits it is acceptable for the screening to under-reject (let through words that will ultimately fail), but over-rejection (rejecting words that would pass) should be avoided if possible. Words that pass screening because of under-rejection are removed later, after the actual distance calculation.

Fast screening is important for maintaining acceptable performance on each key press. Potentially, a large number of words can come in for screening, and normally only a few pass. Thus, for good performance, all of the screening must be very efficient. What is done after screening is less performance-critical, but a significant amount of data still comes through, especially for particular input combinations in which thousands of words reach the selection-list insertion function.

In one or more embodiments, spelling correction works alongside the probability-set comparison logic of in-region auto-correction. Some words are accepted by the set comparison that would not be accepted on the basis of the spelling correction calculation. This is the case for in-region input when spelling correction is set to exact input mode or when exact filters are used. Word completion is also simple for the classic comparison, but it costs edits in spelling correction.

In a preferred embodiment, the fuzzy compare steps are as follows:

1. Screen for words that are too short

2. Screen for set-based match

3. Stem edit-distance calculation

4. Screen for ONE/TWO

5. Screen for set-edit-distance

6. Screen for position-locked letters

7. Set-edit-distance and frequency calculation

8. Stem edit-distance calculation

These steps are shown in the flow diagram of FIG. 6, which represents one implementation of the calculations 130 of FIG. 1.

Screening for the classic comparison, handling of word completions, and so on are placed in step 2, before the later spelling correction calculations. This keeps all of the "classic" complexity out of the subsequent code. It also means that if spelling correction is turned off, all of the remaining calculations can be skipped.

The algorithm is described as comparing two words against each other. In most embodiments this is generalized so that one of the words corresponds to the input symbols. In the sample matrices in the figures referenced below, the input sequence is shown vertically. Thus, rather than each input word position being a single letter, as in standard edit distance, it is actually the set of letters corresponding to an ambiguous or in-region input. The comparison produces a match if any letter in the set matches.

1. Screen for words that are too short

If the word is too short for spelling correction, i.e., shorter than the input length minus the available edit distance, it can be rejected immediately.

2. Screen for set-based match

This is a loop over the input sequence that verifies a match at the corresponding position in the word being compared, i.e., each letter of the candidate word must be present in the corresponding input set.
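A minimal sketch of the set-based match screen, assuming (consistent with the word-completion discussion) that a candidate longer than the input passes when its first letters match; the function name and the input sets are hypothetical.

```python
def set_based_match(input_sets: list[set[str]], word: str) -> bool:
    """Every input set must contain the word's letter at that position.

    Letters beyond the input length are a word completion and are not checked.
    """
    if len(word) < len(input_sets):
        return False
    return all(letter in s for letter, s in zip(word, input_sets))


taps = [{"t", "r", "y"}, {"o", "i", "p"}, {"m", "n"}]  # hypothetical in-region sets
print(set_based_match(taps, "tom"))       # True
print(set_based_match(taps, "tomorrow"))  # True (completion)
print(set_based_match(taps, "tam"))       # False ('a' is not in the second set)
print(set_based_match(taps, "to"))        # False (shorter than the input)
```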

If there is a non-match and the word is too long for spelling correction, i.e., longer than the input length plus the available edit distance, it can be rejected immediately.
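The length checks of steps 1 and 2 reduce to two comparisons, sketched below with a hypothetical `length_screen` helper. A long word is rejected only when it has also failed the set-based match, because a long word that does match is a candidate word completion.

```python
def length_screen(word: str, input_length: int, edits: int, set_match: bool) -> bool:
    """Return False if the word can be rejected on length alone (steps 1 and 2)."""
    if len(word) < input_length - edits:
        return False  # step 1: too short, rejected immediately
    if not set_match and len(word) > input_length + edits:
        return False  # step 2: no set match and too long to fix with edits
    return True  # keep the word for the later steps


# Six inputs give a budget of two edits under the one-per-three rule.
print(length_screen("resu", 6, 2, set_match=False))       # True
print(length_screen("res", 6, 2, set_match=False))        # False
print(length_screen("resultant", 6, 2, set_match=False))  # False
print(length_screen("resultant", 6, 2, set_match=True))   # True
```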

3. Stem edit-distance calculation

This is a loop over all symbols in the input sequence, executed only when there is a set-based match. Every difference from the exact-tap value increases the stem distance; for example, the candidate word "tomorrow" would have a stem distance of 0 for the exact-tap input "tom" and a stem distance of 1 for "tpm". The word's tap frequency is also calculated during the loop.
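A minimal sketch of step 3 for the "tomorrow" example, assuming each input is a dictionary of letter probabilities whose most probable letter is the exact tap (as described for touch-screen QWERTY keyboards); the function name and the probability values are hypothetical.

```python
def stem_distance_and_tap_frequency(prob_sets: list[dict[str, float]], word: str):
    """Count differences from the exact taps and multiply letter probabilities.

    Assumes a set-based match already succeeded, so word[i] is in prob_sets[i].
    """
    stem_distance = 0
    tap_frequency = 1.0
    for ps, letter in zip(prob_sets, word):
        exact_tap = max(ps, key=ps.get)  # most probable letter of this input
        if letter != exact_tap:
            stem_distance += 1
        tap_frequency *= ps[letter]
    return stem_distance, tap_frequency


tom = [{"t": 0.7, "r": 0.3}, {"o": 0.8, "i": 0.2}, {"m": 0.9, "n": 0.1}]
tpm = [{"t": 0.7, "r": 0.3}, {"p": 0.6, "o": 0.4}, {"m": 0.9, "n": 0.1}]
print(stem_distance_and_tap_frequency(tom, "tomorrow")[0])  # 0
print(stem_distance_and_tap_frequency(tpm, "tomorrow")[0])  # 1
```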

If it is a valid classic match, the fuzzy compare of the candidate word is complete at this point, and the candidate word is inserted into the selection list.

4. Screen for ONE/TWO

This is a quick check of whether the first letter of the word matches the first ONE or TWO input symbols. If not, the word is rejected.
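A minimal sketch of the ONE/TWO screen, assuming the sets are the in-region letter sets of the first inputs; `one_two_filter` and its `count` parameter are hypothetical names. Checking the second input as well lets a word survive a dropped, added, or transposed first letter.

```python
def one_two_filter(word: str, input_sets: list[set[str]], count: int = 2) -> bool:
    """Pass only if word[0] is in one of the first `count` input sets (ONE: 1, TWO: 2)."""
    if not word:
        return False
    return any(word[0] in s for s in input_sets[:count])


sets = [{"h", "g", "j"}, {"e", "w", "r"}, {"l", "k"}]  # hypothetical in-region sets
print(one_two_filter("hello", sets, count=1))   # True
print(one_two_filter("ehllo", sets, count=1))   # False
print(one_two_filter("ehllo", sets, count=2))   # True: tolerates a swapped first pair
print(one_two_filter("yellow", sets, count=2))  # False
```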

5. Screen for set-edit-distance

Conceptually this is a very simple task, because the modified edit distance follows the conventional definition using insertion, deletion, substitution, and transposition (the latter included mainly for text-entry correction). Implementing it efficiently, however, is very difficult.

The conventional method of calculating edit distance uses a matrix; one embodiment is shown in FIG. 7. All edges (gray numbers) are predefined and always the same. The remaining cells are calculated column by column, from left to right, and within each column from top to bottom. Each individual cell is calculated by taking the minimum of the values corresponding to insertion, deletion, substitution, and transposition. The substitution and transposition values are conditional on whether there is a match at those positions. The resulting edit distance appears in the bottom-right corner, in this case "2".
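The FIG. 7 style computation can be sketched as follows, assuming unit costs and the restricted ("optimal string alignment") variant of transposition, which is one common way of adding transposition to the matrix; the function name is hypothetical. The predefined edges are set first, and the remaining cells are filled column by column as described above.

```python
def set_osa_distance(input_sets: list[set[str]], target: str) -> int:
    """Set-edit-distance with insertion, deletion, substitution and transposition.

    This is the restricted (optimal string alignment) form of Damerau distance.
    """
    m, n = len(input_sets), len(target)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # predefined edge values
    for j in range(n + 1):
        d[0][j] = j
    for j in range(1, n + 1):      # column by column, left to right
        for i in range(1, m + 1):  # top to bottom within a column
            match = target[j - 1] in input_sets[i - 1]
            d[i][j] = min(
                d[i - 1][j] + 1,                        # deletion
                d[i][j - 1] + 1,                        # insertion
                d[i - 1][j - 1] + (0 if match else 1),  # match or substitution
            )
            # Transposition: current and previous letters match crosswise.
            if (i > 1 and j > 1
                    and target[j - 1] in input_sets[i - 2]
                    and target[j - 2] in input_sets[i - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]


typed = [{"h"}, {"t"}, {"e"}]  # exact input "hte"
print(set_osa_distance(typed, "the"))  # 1 (one transposition)
```

Without the transposition branch the same input would cost 2 (two substitutions), which is why the text includes transposition for text-entry correction.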

To find the value of the cell being calculated, i.e., the cell marked "X" in FIG. 8: the cost of taking the diagonal substitution ('D') cell is 0 or 1, depending on whether there is a match. The transposition ('T') cell can be taken only when both letters match crosswise, i.e., the current and previous letters are swapped, and its cost is 1. Insertion ('I') and deletion ('D') also cost 1 each. Thus, the value of a cell is the already-calculated value of the chosen neighboring cell plus the additional cost just described.

This method is computationally very expensive, especially for long words. In one embodiment, a maximum allowable edit distance is set so that only about 1% of words or fewer pass that limit. If the allowed distance is too high, the entire word list may end up in the selection list and the whole point of spelling correction is lost. Initially, therefore, the exact distance is not important; what matters is whether the result falls below or above the rejection limit. More effort can then be spent calculating the exact distance, frequency, and so on for the few words that pass this test.

The purpose of the screening step is to establish, as quickly as possible, whether the resulting distance is above the rejection limit.
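One generic way to reject early, sketched below as an assumption rather than as the embodiment's own screening code, is to stop as soon as two consecutive matrix columns contain only values above the limit: no later cell can fall back below it. The function `passes_screening` is hypothetical; it also applies the length-difference lower bound first.

```python
def passes_screening(inp: str, word: str, limit: int) -> bool:
    """Return False as soon as the edit distance between `inp` and `word`
    is proven to exceed `limit` (unit costs, adjacent transpositions)."""
    # The length difference alone is a lower bound on the distance.
    if abs(len(inp) - len(word)) > limit:
        return False
    rows = len(inp) + 1
    prev2 = None                       # column j - 2, needed for transpositions
    prev = list(range(rows))           # column 0: predefined edge values
    for j in range(1, len(word) + 1):
        cur = [j] + [0] * (rows - 1)
        for i in range(1, rows):
            cost = 0 if inp[i - 1] == word[j - 1] else 1
            value = min(prev[i] + 1, cur[i - 1] + 1, prev[i - 1] + cost)
            if (prev2 is not None and i > 1
                    and inp[i - 1] == word[j - 2] and inp[i - 2] == word[j - 1]):
                value = min(value, prev2[i - 2] + 1)
            cur[i] = value
        # Every later cell is derived from these two columns (the
        # transposition route adds 1), so if both minima exceed the limit,
        # the final distance must exceed it too.
        if min(cur) > limit and min(prev) > limit:
            return False
        prev2, prev = prev, cur
    return prev[-1] <= limit
```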

As shown in FIG. 9, consider the case where the compared words match except for their length. None of the cells can have a lower value than shown. Comparing words of length 6 and 9 yields an edit distance of 3, as expected.
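The matrix of FIG. 9 has a closed form: cell (i, j) equals |i - j|, which is also a lower bound for the corresponding cell of any comparison. A minimal sketch (the function name is hypothetical):

```python
def initial_matrix(len_a: int, len_b: int) -> list[list[int]]:
    """Cell (i, j) holds |i - j|: the values obtained when two words match
    except for their length, and a lower bound for any pair of words."""
    return [[abs(i - j) for j in range(len_b + 1)] for i in range(len_a + 1)]


m = initial_matrix(6, 9)
print(m[-1][-1])  # lower-right corner: 3
```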

This initial matrix can be used as the starting point when comparing any two words. In practice, only the values in the cells actually selected for comparison need to be updated along the way. The goal is to push the lower-right cell above its rejection limit. To do so, it must be shown that the cells from which this value is obtained actually have higher values, which in turn is proven recursively.

In this example, where the length difference is 3 and the first letter does not match (the first 'x' is changed to 'y' in FIG. 10A), rejection can be proven by calculating four cells; the remaining related cell updates are included. The iterations in FIG. 10B show the recalculated cells (bold outlines) and, at each iteration, the effect on other dependent cells.

As a result, increased values propagate from the center diagonal toward the diagonal that holds the result value. This happens whenever the last cell providing the lowest value to another cell is increased as the result of a completed comparison mismatch.

The matrices shown only illustrate what happens when there is a word-length difference. If the length difference is zero, the center diagonal becomes the main diagonal, and support, i.e., values large enough to affect the calculation, must come from both sides of the result diagonal to prove rejection.

Diagonals in the calculation make data access patterns (access to the actual memory corresponding to matrix positions) harder to optimize. Operating in a rotated/transformed matrix space is therefore another optimization; see FIG. 11. Cells on the center diagonal (bold outline) become a single column. New "9"s (shown in gray) are added as default values for the edge cells; if referenced, they provide a value large enough to immediately exceed the maximum possible edit distance. In this transformed space, the cell calculation relationships change as shown in FIG. 12.
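A minimal sketch of the rotated storage idea, assuming a band of k diagonals on each side of the center diagonal and plain Levenshtein costs (no transpositions, for brevity). Matrix cell (i, j) is stored at `band[j][i - j + k]`, so the center diagonal occupies a single column of the array, and any reference outside the band reads a sentinel `k + 1`, playing the role of the gray "9"s. The function name and layout are illustrative assumptions, not the exact layout of FIG. 11.

```python
def banded_distance(inp: str, word: str, k: int) -> int:
    """Edit distance restricted to diagonals |i - j| <= k. Any result of
    k + 1 means "above the limit" (the value is clamped there)."""
    BIG = k + 1                       # sentinel: exceeds the maximum distance
    n, m = len(inp), len(word)
    width = 2 * k + 1

    def get(band, j, o):
        # o = i - j is the diagonal offset; out-of-band reads give BIG.
        i = j + o
        if not (-k <= o <= k) or i < 0 or i > n:
            return BIG
        return band[j][o + k]

    band = [[BIG] * width for _ in range(m + 1)]
    for o in range(0, k + 1):         # column j = 0 holds d[i][0] = i
        if o <= n:
            band[0][o + k] = o
    for j in range(1, m + 1):
        for o in range(-k, k + 1):    # increasing o = top to bottom
            i = j + o
            if i < 0 or i > n:
                continue
            if i == 0:                # top edge: d[0][j] = j
                band[j][o + k] = min(j, BIG)
                continue
            cost = 0 if inp[i - 1] == word[j - 1] else 1
            band[j][o + k] = min(
                get(band, j - 1, o + 1) + 1,   # cell (i, j - 1)
                get(band, j, o - 1) + 1,       # cell (i - 1, j)
                get(band, j - 1, o) + cost,    # cell (i - 1, j - 1)
                BIG,
            )
    return get(band, m, n - m)        # lower-right cell (n, m)
```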

Since locked positions are not subject to the spelling-correction calculation, input symbols with locked positions, i.e., positions that do not allow the value to be moved or changed, must be verified separately. This is simply a loop over the input symbols with locked positions, checking that each one matches. If any does not, the word is rejected.
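The locked-position check can be sketched as a single loop. This assumes each input is represented as a set of acceptable letters and locked positions as 0-based indices; `locked_positions_match` is a hypothetical name.

```python
def locked_positions_match(inputs: list[set[str]], locked: set[int], word: str) -> bool:
    """Reject the word unless every locked input position is matched at
    exactly the same position in the word."""
    for pos in sorted(locked):
        if pos >= len(word) or word[pos] not in inputs[pos]:
            return False
    return True
```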

The screening algorithm for edit distance can be modified to calculate other quantities, such as the actual edit distance and the word frequency. However, those calculations should not be merged into the screening code, which must be kept separate and optimized for pure screening. A different, more thorough version is applied to the words that pass screening, because it must evaluate different cells and choose the best option for low distance and high frequency. It must also handle things such as possible locked symbol values (values, not positions).

If the set-edit-distance value exceeds a certain limit, the candidate is rejected.
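A minimal sketch of a set-edit-distance, assuming that each input position is a set of acceptable letters (for example, the tapped key plus regionally adjacent keys) and that a letter matches when it belongs to the set; the transposition test is applied to the sets crosswise. Names are hypothetical.

```python
def set_edit_distance(inputs: list[set[str]], word: str) -> int:
    n, m = len(inputs), len(word)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # A letter matches if it is in the set for this input.
            cost = 0 if word[j - 1] in inputs[i - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if (i > 1 and j > 1 and word[j - 2] in inputs[i - 1]
                    and word[j - 1] in inputs[i - 2]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[n][m]


def within_limit(inputs: list[set[str]], word: str, limit: int) -> bool:
    # The candidate is rejected when the distance exceeds the limit.
    return set_edit_distance(inputs, word) <= limit
```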

8. Stem Edit-Distance Calculation

This is also a modified copy of the screening algorithm, for two reasons:

First, the stem distance can be very different, because it is always based on an exact match. The value may therefore be higher than the intended maximum distance. Distance values above the maximum are not completely accurate because of the algorithmic optimizations, but they are still good enough.

Second, the stem distance also differs in that it may not take the full length of the candidate word into account. To be compatible with non-spelling-correction words, the stem distance calculation stops at the length of the input. Some additional checking around the end cell is needed to obtain the minimum value with respect to insertions and deletions.
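The stem-distance variant can be sketched as follows, assuming plain Levenshtein costs: the calculation stops at the input length plus the allowed edits along the candidate word, and the minimum is taken over the final row near the end cell, which accounts for trailing insertions and deletions. The function name and the window of `k` columns on each side of the end cell are assumptions.

```python
def stem_distance(inp: str, word: str, k: int) -> int:
    """Distance between the input and the best-matching stem (prefix) of
    `word` whose length lies within k of the input length."""
    n, m = len(inp), len(word)
    last = min(m, n + k)               # never look further into the word
    prev = list(range(last + 1))       # row 0: d[0][j] = j
    for i in range(1, n + 1):
        cur = [i] + [0] * last
        for j in range(1, last + 1):
            cost = 0 if inp[i - 1] == word[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        prev = cur
    # Check the cells around the end cell (n, n) for the minimum.
    lo = max(0, min(n - k, last))
    return min(prev[lo:last + 1])
```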

Low-level language database search function

The fuzzy comparison function can be made very efficient in screening and calculation, but on its own it is not sufficient for good performance, particularly on embedded platforms. Depending on the input, almost every word in the vocabulary can be a potential spelling-correction candidate. In most languages this typically happens when the ninth or tenth input is entered, given that one edit is allowed per three inputs.

At input length 9, all words of length 6 to 12 are potential spelling-correction candidates, and all words longer than 12 are potential completion candidates. For example, at input length 9, more than 70% of the Finnish vocabulary may be considered for comparison based on spelling correction, and another 20% based on word completion. This creates a significant efficiency problem, because spelling correction requires the most computational effort. The following strategies aim to increase the efficiency of the database search process by integrating one or more of the screening functions described previously.
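The numbers above follow from a simple budget. A sketch, assuming the edit allowance is the floor of the input length divided by three (the function name is hypothetical):

```python
def correction_window(input_len: int, inputs_per_edit: int = 3) -> tuple[int, int, int]:
    """Return (allowed_edits, min_len, max_len) for spelling-correction
    candidates; longer words are treated as completion candidates."""
    edits = input_len // inputs_per_edit
    return edits, max(1, input_len - edits), input_len + edits
```

For `correction_window(9)` this gives `(3, 6, 12)`: three edits, spelling-correction candidates of length 6 to 12, and completion candidates beyond 12.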

Search strategy without spelling correction

A preferred embodiment of the vocabulary database, as described in U.S. Patent Application No. 11/379,354, "Efficient Storage and Search of Word Lists and Other Text," filed April 19, 2006 by Erland Unruh and David Jon Kay and incorporated herein by reference, is designed and optimized without spelling correction. The full input length maps directly onto interval streams, and the sparsest streams are visited first to help with fast jumping through the word list. Once there is a match, completion letters can be selected from the streams not mapped to the input.

With this strategy, words that are too short are skipped automatically, because they have no letters matching the corresponding inputs.

Search strategy for spelling correction

With spelling correction, the words in the language database fall into three categories relative to the input length:

Too-short words

Long words that can be completions

Words eligible for spelling correction (within a certain length difference from the input length)

Each of these categories is described in the following sections.

Too-short words

These can easily be skipped by checking the interval stream corresponding to the last letter of the shortest allowed word. For example, if the minimum length is 6, the sixth interval stream must not be empty (that is, must not hold the terminating zero); if it is, the search can jump directly to the end of the interval.

Long words

Just as a particular interval stream can be used to detect words that are too short, another stream can be used to detect long words. For example, if the maximum length is 12, the thirteenth stream determines whether a word is long.

Long words can be handled in exactly the same way as when spelling correction is turned off: the streams mapped to the input can be used for jumping, and the completion part is obtained from the remaining streams.
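The too-short and long tests can be modelled per word by padding it with terminating zeros, so that "stream k" is simply the k-th padded character. This is a hypothetical sketch of the test only; it does not model the interval structure or the jump itself.

```python
TERMINATOR = "\0"


def length_category(word: str, min_len: int, max_len: int) -> str:
    # Pad so that every stream up to index max_len exists; stream k
    # (1-based) is padded[k - 1], and a terminator means the word has ended.
    padded = word + TERMINATOR * max(0, max_len + 1 - len(word))
    if padded[min_len - 1] == TERMINATOR:
        return "too_short"          # e.g. the 6th stream is empty
    if padded[max_len] != TERMINATOR:
        return "long"               # e.g. the 13th stream is occupied
    return "spell_correction"
```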

Spelling-correction words

Unlike the previous two categories, which can be searched efficiently, every word in this category would by default have to be sent for edit-distance calculation. That is not feasible performance-wise, so screening is required at the language-database search level. This screening can be considerably simplified, as long as it still provides a performance gain.

A complicating factor is that the spelling-correction modes and filters may operate in exact mode while the input is still set-based, so non-spelling-correction candidates can be set-based matches even though spelling correction cannot use the set information. As a result, any screening process must also adhere to the set-based comparison logic.

An aspect of the language-database search screening function for a preferred embodiment is shown in FIG. 13. Under set-based comparison logic, the target word does not match the input sequence, because the 4GHI key does not include 'd' in its set. Set-edit-distance comparison logic, however, allows any input to be inserted, deleted, or substituted. The set represented by each input therefore expands into a combination of the sets of adjacent inputs. The number of adjacent inputs included depends on constraint parameters such as the number of allowed edits.

As described in the following paragraphs, many screening functions from the fuzzy comparison function can be applied and integrated into the database search process.

Filter ONE/TWO

Filter ONE/TWO can be used for jumping. If interval stream zero (the first letter of the word) does not match the corresponding input (the first or the second input, depending on the filter), a jump can be made.

If the filter setting does not agree with the set-based comparison logic, the jump must be backed by a failing set-based stream. The resulting jump is limited to the shorter of the two (the nearest interval end in either stream). This filter applies only to spelling-correction candidates.

Input-based screening

Even though the available edits allow matches with words that look quite different from the input, there are still limits on what can match. The limited number of available edits means that only a limited number of insertions and deletions can be applied, so there is a limit on how far a letter in the word can be from its input-related stream and still count as a match.

Screening can be applied regardless of the filter, but the filters can efficiently be made part of the screening. Screening must be very fast, so its complexity must be kept low.

To reject a word, one more miss than the number of available edits is needed. For example, for edit distance 3, four misses must be found. If there are nine inputs and the word being compared has length 6, the comparison still runs to length 9, because positions 7, 8, and 9 hold 0 as the termination code, which can never match any input combination. If the word is longer than the input, the comparison runs to the length of the word.
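The miss-counting rule can be sketched as follows, assuming one set of acceptable letters per input position and a terminating zero for positions past the end of the word. The function name is hypothetical.

```python
TERMINATOR = "\0"


def screen_by_misses(allowed: list[set[str]], word: str, edits: int) -> bool:
    """Return False once edits + 1 misses are found, True otherwise.
    allowed[p] is the set of letters accepted at 0-based position p."""
    misses = 0
    # Compare up to the longer of the input and the word.
    for p in range(max(len(allowed), len(word))):
        letter = word[p] if p < len(word) else TERMINATOR
        accepted = allowed[p] if p < len(allowed) else set()
        if letter not in accepted:      # the terminator never matches an input
            misses += 1
            if misses > edits:
                return False
    return True
```

With the nine exact inputs of "keyboards", the six-letter word "keyboa" produces three misses at positions 7 to 9, so it survives with three edits but is rejected with two.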

Length-independent screening

One solution for screening when the word length is not known in advance is to set up a second, constructed input that can be used for screening matches. It is constructed so that every position is a combination of the surrounding original positions.

For input length 9, the combination map is similar to the one shown in FIG. 14. Each 'lxx' column is a position in the input, and each row is a position in the word being compared. For example, the fourth letter of a word can match any of the first seven inputs without being counted as a used edit. The twelfth letter can match only the ninth input, which is much more restrictive.

If a letter in the word does not match its combination, it counts as a miss and thus calls for a potential edit. A word with enough misses can be discarded by this screening.

If the word is shorter than the input, the difference can be subtracted from the available edits, and the comparison only needs to check the available positions. Thus, if the length difference equals the number of available edits, a single failing position is enough to reject the word.
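A sketch of length-independent screening, assuming the combination window for word position p covers inputs p - edits through p + edits (which reproduces the two examples given for FIG. 14: the fourth letter maps to inputs 1 to 7, the twelfth to input 9 alone), and applying the length-difference subtraction described above. All names are hypothetical.

```python
def combination_map(inputs: list[set[str]], edits: int, max_word_len: int) -> list[set[str]]:
    """Position p of the constructed input merges the sets of original
    inputs p - edits .. p + edits, clipped to the input."""
    n = len(inputs)
    combos = []
    for p in range(max_word_len):
        merged: set[str] = set()
        for q in range(max(0, p - edits), min(n - 1, p + edits) + 1):
            merged |= inputs[q]
        combos.append(merged)
    return combos


def survives_length_independent(inputs: list[set[str]], word: str, edits: int) -> bool:
    combos = combination_map(inputs, edits, len(inputs) + edits)
    # A shorter word needs at least (input length - word length) deletions.
    budget = edits - max(0, len(inputs) - len(word))
    if budget < 0:
        return False
    misses = 0
    for p, letter in enumerate(word):
        if p >= len(combos) or letter not in combos[p]:
            misses += 1
            if misses > budget:
                return False
    return True
```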

The same restrictions apply here as for the filter. If exact/in-region significance is in effect, rejection must be backed by a failure of the set-based interval stream.

The longest possible jump is to the nearest end of a failing interval stream, whether combination-based or set-based.

Since a jump requires a failing set-based stream to exist, there is no need to limit the jump with respect to changes in word-length category.

Length-dependent screening

In a preferred embodiment of length-dependent screening, calculating the length of the word being compared allows the combinations to be narrowed to those applicable to that length. For example, for word length 6 and input length 9, the combination map is similar to FIG. 15.

This yields tighter combinations, but adds the cost of finding the word length in order to select them. It also limits the possible jump length to within a chunk of words of the same length, because as soon as the length changes, the combinations change too. This in turn makes it necessary to minimize the number of word-length changes across the language database.
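For the length-dependent map, one way to derive the narrower window, stated here as an assumption rather than as the contents of FIG. 15, is to keep only those offsets o (input index minus word index) for which |o| + |d - o| does not exceed the edit budget, where d is input length minus word length: reaching offset o costs at least |o| edits, and getting from there to the final offset d costs at least |d - o| more. The function name is hypothetical.

```python
def length_dependent_map(inputs: list[set[str]], word_len: int, edits: int) -> list[set[str]]:
    n = len(inputs)
    d = n - word_len                      # final offset at the lower-right cell
    offsets = [o for o in range(-edits, edits + 1)
               if abs(o) + abs(d - o) <= edits]
    combos = []
    for p in range(word_len):
        merged: set[str] = set()
        for o in offsets:
            q = p + o                     # input position compared with letter p
            if 0 <= q < n:
                merged |= inputs[q]
        combos.append(merged)
    return combos
```

For input length 9, word length 6, and three edits, only offsets 0 to 3 survive, so each word letter is compared with its own input position and the three following ones.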

Apart from the length-dependent patterns, the description of length-independent screening applies here as well.

Selection list ordering strategies and algorithms

The result of the combined algorithms is an ordered list of word choices that most likely includes either the word the user has typed, if the input sequence is complete, or the word the user has started to type, if the input sequence represents the stem of a word or phrase.

The word-list sort order may be based on in-region probability, edit distance, word recency/frequency (as stored in each database), word length, and/or stem edit distance. The word-list ordering may also depend on which of two or more different list profiles or strategies is in use. For example:

Whole-word priority

1. Whole words always come before word completions;

2. Source dictionary, e.g., main vocabulary, contextual, user-defined, recency-ordered, plug-in, macro substitution;

3. Edit distance, e.g., smaller values before larger ones;

4. Stem edit distance, e.g., smallest first, applied only when the edit distance is greater than 0 and equal for the two word choices;

5. Frequency, e.g., largest first: tap frequency × word frequency.

The criteria are evaluated in the order listed; e.g., criterion 3 is considered only when criterion 2 is the same for the items being compared. For this reason, spelling corrections of custom user words may, for example, appear before in-region corrections of standard vocabulary words.
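The whole-word priority ordering can be expressed as a lexicographic sort key. This sketch assumes a hypothetical `Candidate` record and numeric source ranks; tuples compare element by element, which reproduces the rule that a later criterion is consulted only when the earlier ones are equal.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    word: str
    is_completion: bool   # False for whole words
    source_rank: int      # hypothetical: lower value = higher-priority dictionary
    edit_distance: int
    stem_distance: int
    tap_freq: float
    word_freq: float


def whole_word_priority_key(c: Candidate) -> tuple:
    # Criterion 4 (stem edit distance) only counts when the edit distance
    # is greater than 0; equal edit distances are implied by tuple order.
    stem = c.stem_distance if c.edit_distance > 0 else 0
    return (c.is_completion, c.source_rank, c.edit_distance, stem,
            -(c.tap_freq * c.word_freq))   # criterion 5: largest first
```

As in the text, a user-dictionary word needing two edits can outrank an exact main-vocabulary match when source ranks above edit distance.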

Word completions

1. Stem edit distance;

2. Word completion;

3. Source;

4. Edit distance;

5. Frequency.

Since stem edit distance is the first criterion and completion the second, the list is effectively segmented as follows:

Whole word(s) with 0 errors: the exact-tap input sequence is identical to the word

Completion(s) with 0-error stem(s)

Whole word(s) with 1 near-miss

Completion(s) with a 1-near-miss stem

...
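The word-completion profile differs only in the order of the criteria. A sketch with the same kind of assumptions (hypothetical `Candidate` fields and numeric source ranks):

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    word: str
    stem_distance: int
    is_completion: bool   # False for whole words
    source_rank: int      # hypothetical: lower value = higher-priority dictionary
    edit_distance: int
    frequency: float


def completion_priority_key(c: Candidate) -> tuple:
    # Stem edit distance first, then whole word before completion, then
    # source, edit distance, and frequency (largest first).
    return (c.stem_distance, c.is_completion, c.source_rank,
            c.edit_distance, -c.frequency)
```

Sorting with this key yields exactly the segments listed: 0-error whole words, 0-error-stem completions, 1-near-miss whole words, and so on.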

The system may allow a default strategy to be specified. It can also adapt the ordering automatically based on recognized patterns of word selection, in addition to the frequency/recency information recorded in the source database. For example, the system may detect that most of the time the user selects a word completion whose first letters exactly match the input, and accordingly shift the word-list ordering toward a "promote completions" profile.

FIG. 16 shows a sample user interface during operation of the invention, in this case demonstrating set-edit-distance spelling correction combined with in-region automatic correction. In this embodiment on a mobile device, the candidate words appear across the bottom of the screen after each user input. The italicized string on the left is the exact-tap letter sequence, i.e., each key pressed on the device's QWERTY thumb keyboard. The arrow indicates the default (highest-ranked) word choice. The second screen shows the three word completions offered after the keys "b" and "o" are pressed. The third screen shows "bowl" as a candidate that closely matches the input sequence "bok" if the letter "w" is inserted in the middle (a standard edit distance of 1) and "l" is accepted for the adjacent "k" on the keyboard (in-region automatic correction). The fifth screen shows "going" as the default word choice, because "g" and "i" are adjacent to the inputs "b" and "k", respectively; the second word choice is "being", with "e" in place of "o" (an edit distance of 1). The correction parameters penalize in-region automatic-correction differences less than edit-distance differences.

Other Features and Applications

Auto-substitution, e.g., macros: although word completion may be applied to the expanded text, both in-region and spelling correction may be applied to the shortcut. Thus, if the input sequence approximately matches the shortcut and the stem of the expanded text, the ranking of the macro may be increased. Macros may be predefined or user-definable.

Keyword flagging for advertising purposes benefits from auto-substitution and/or spelling correction. For example, if a word in a mobile message is slang or misspelled, embodiments of the invention may still find a relevant sponsor keyword.

An embodiment of the invention can be applied to an entire message buffer, i.e., in batch mode, whether its text was originally entered ambiguously or unambiguously (for example, via multi-tap) or was received as a message or file from another device.

A spell-corrected word choice can become the basis for further input, word completion, and so on, if the input method allows automatic extension of a word choice, including rules built around punctuation. In one embodiment, a cascading menu presents a list of word completions for the selected word or stem.

Embodiments of the invention can also be applied to ambiguous entry for search and discovery. For example, if the user's input sequence does not closely match content on the mobile device or results from server-based search engines, one or more spell-corrected interpretations that do produce a match may be offered.

Although the embodiments above illustrate use of the invention with Latin-based languages, other embodiments may address the specific needs of other alphabets or scripts.

Application to Traced Input

Introduction

As an enhancement of the above, hardware, software, firmware, circuitry, and other features are configured here to use a "trace" technique. With tracing, the user traces a single, continuous path through (or nearly through) the desired letters of the word being entered, and the predictive technology of system 200 determines which word was entered and displays it on screen 203. If multiple words can be predicted from the traced path, system 200 provides a list of choices.

Trace technology is also described in publications such as the following patents filed by Nuance Corporation or its subsidiaries: (1) U.S. Patent No. 7,175,438, "FAST TYPING SYSTEM AND METHOD," by Levi, issued February 13, 2007; (2) U.S. Patent No. 7,251,367, "SYSTEM AND METHOD FOR RECOGNIZING WORD PATTERNS BASED ON A VIRTUAL KEYBOARD LAYOUT," by Zhai, issued July 31, 2007; (3) U.S. Patent No. 7,487,461, "SYSTEM AND METHOD FOR ISSUING COMMANDS BASED ON PEN MOTIONS ON A GRAPHICAL KEYBOARD," by Zhai et al., issued February 3, 2009; (4) U.S. Patent No. 7,706,616, "SYSTEM AND METHOD FOR RECOGNIZING WORD PATTERNS IN A VERY LARGE VOCABULARY BASED ON A VIRTUAL KEYBOARD LAYOUT," by Kristensson et al., issued March 27, 2010; (5) U.S. Patent Application Publication No. 2008/0270896, "SYSTEM AND METHOD FOR PREVIEW AND SELECTION OF WORDS," by Kristensson, published October 30, 2008; and (6) U.S. Patent Application Publication No. 2007/0094024, "SYSTEM AND METHOD FOR IMPROVING TEXT INPUT ON A SHORTHAND-ON-KEYBOARD INTERFACE," by Kristensson et al., published March 26, 2007. Each of the above is incorporated herein by reference in its entirety.

FIG. 17 is a partial screenshot of a soft keyboard 1700. The traced pattern 1706 represents the user's entry of the word "pretty", tracing through the letters P-R-E-T-T-Y in order. The traced pattern may also be referred to as the traced path, without any intended limitation. In operation, the keyboard 1700 displays some or all of the trace 1706 as feedback. The trace 1706 follows the user's finger or stylus. In this embodiment, the trace disappears or fades when the user lifts the finger or stylus. In one embodiment, the length of the displayed portion of the trace 1706 may vary with the speed of the user's finger or stylus. In another embodiment, the keyboard 1700 does not display the trace 1706. In the upper area 1702, the keyboard 1700 displays several potential words that match the user's trace 1706, including PRETTY, PERRY, PETTY, and PREY.

To use tracing together with spelling-correction techniques such as those described above, various modifications and additions may be made, according to the embodiments summarized below.

Operation sequence

FIG. 24 illustrates a sequence of operations for resolving a user's trace input. With continued reference to the embodiment of FIG. 2, the display 203 used in this embodiment includes a touch-sensitive display, so the input device 202 includes components that may be regarded as integrated with the display 203. The device 200 may of course include additional input devices, but the input device relevant to sequence 2400 is the combined touch-sensitive display. For purposes of this embodiment, the vocabulary module 213 includes at least a vocabulary database having multiple entries. The vocabulary module 213 may also be referred to as a dictionary or a vocabulary.

With continued reference to the embodiment of FIG. 2, the operations 2400 are performed in this embodiment by the central processing unit (CPU) 201. In step 2402, the CPU 201 displays a soft keyboard on the touch-sensitive display 203. The soft keyboard includes keys representing one or more letters. For purposes of explanation, a QWERTY keyboard is discussed, as illustrated by keyboard 1700.

In step 2404, the device 200 receives the user's trace via the touch-sensitive display surface. The trace is a single continuous trace that contacts multiple displayed keys. The contacted keys include a start key, where the trace begins, and an end key, where the trace stops. For ease of explanation, this embodiment uses the trace of "pretty" shown in FIG. 17. In one embodiment, the CPU 201 stores the coordinates or another machine-readable representation of the traced path.

Based on the trace received in step 2404, step 2406 determines an input sequence. The input sequence includes the contacted keys described above. It further includes various other "auxiliary" keys that the trace did not actually contact but that lie close to the trace.

Step 2406 may use one or more different criteria to identify auxiliary keys. For example, as shown in FIG. 25, when the trace contacts a key, every key within a defined radius of the contacted key becomes part of the input sequence. Under a different approach, shown in FIG. 26, when the trace contacts a given key, the keys falling within a defined rectangular "touch area" centered on that key are treated as auxiliary keys and thus as part of the input sequence. Depending on the size of the touch area, this criterion can define auxiliary keys more restrictively. In one embodiment, the operative touch area is twice the size of a displayed soft key. In the embodiment of FIG. 26, the auxiliary keys of the contacted "G" key include its neighbors T, Y, F, H, C, and B.

Another criterion for identifying auxiliary keys is an ellipse oriented along the trajectory of the trace. This gives extra allowance for horizontal undershoot and overshoot errors while minimizing vertical errors in the trace. The approach therefore favors keys that lie along the direction of trace movement.

Optionally, step 2406 may expand the input sequence to include variants of its keys, for example accented letters, umlauts, typographical variants, and variants corresponding to foreign languages and alphabets. FIG. 23 shows a list of the input sequence corresponding to trace 1706 and the probability of each key in the input sequence.

Step 2406 defines a set of "primary" keys consisting of (1) the start key and its auxiliary keys, (2) the end key and its auxiliary keys, and (3) every key at which a predetermined minimum change of direction occurs, together with that key's auxiliary keys. The contacted keys between these primary keys (and their auxiliary keys) are called "intervening" keys. Keys in the input sequence that are not primary keys are referred to as "secondary" or "optional" keys. Thus, the secondary keys comprise the intervening keys and their auxiliary keys.

In the example of FIG. 17, the trace contacted P, O, I, U, Y, T, R, E, R, T, and Y. Here, step 2406 defines the following keys: (1) primary keys consisting of the start key P and its auxiliary key O; (2) secondary keys consisting of the intervening keys O, I, U, Y, T, and R, which have no auxiliary keys because of the speed or direction of the trace; (3) primary keys consisting of the direction-change key E and its auxiliary keys W and R; (4) secondary keys consisting of the intervening keys R and T, which have no auxiliary keys because of the speed or direction of the trace; and (5) primary keys consisting of the end key Y and its auxiliary keys T and U.

After step 2406, the input sequence has been defined. In step 2408, the CPU 201 compares the input sequence from step 2406 with some or all entries of the vocabulary 213, one at a time. The vocabulary entry under consideration at any given time is referred to as the "current" entry.

Comparing the input sequence with every entry in the vocabulary 213 may consume too much time or too many processing resources, so various techniques may be used to limit the comparison to particular vocabulary entries. For example, step 2408 may restrict the comparison to the entries in vocabulary 213 with the highest frequency of use. Optionally, the decision to restrict the comparison may be made in real time, for example when the CPU 201 is under a given workload, or when the processing of step 2408 reaches a defined level, continues for a defined period of time, or performs a given number of matrix operations.

In the illustrated embodiment, step 2408 compares the input sequence with the current vocabulary entry by calculating the set-edit-distance as described above. The result is a metric indicating the degree of similarity between the input sequence and the current vocabulary entry. The set-edit-distance is calculated in a manner similar to FIG. 4, in which each matrix column accounts for the multiple letters the user may have intended by a given action. In the present embodiment, which applies to trace input, however, the matrix columns represent different groups of keys: one column for the start key and its auxiliary keys, one column for each key (with its auxiliary keys) at which the trace changes direction significantly, one column for each group of intervening keys and their auxiliary keys, and one column for the end key and its auxiliary keys.

In sequence 2400, the set-edit-distance calculation also differs from the non-trace embodiments through the application of a number of rules 2420. Rule 2421 defines several groups of primary keys, each of which appears in a different column of the matrix 1800. At least one key from each group must be accounted for in the set-edit-distance; otherwise a penalty applies. In other words, a penalty is assessed for each group not represented in the current vocabulary entry. These groups comprise (1) the start key and all of its auxiliary keys, as shown by column 1802 of matrix 1800; (2) the end key and all of its auxiliary keys, as shown by column 1808; and (3) every key at which a predetermined minimum change of direction occurs, together with that key's auxiliary keys, as illustrated by column 1805. Thus, if the current vocabulary entry omits the start key and all of its auxiliary keys, a penalty applies, and if it omits the end key and all of its auxiliary keys, another penalty applies.

Rule 2422 concerns the secondary keys. It specifies that no penalty applies if the current vocabulary entry omits any or all of these keys. In the set-edit-distance calculation, this rule allows a free deletion for each secondary key. The secondary keys are shown by columns 1804 and 1806 of matrix 1800.

Rule 2423 concerns repeated keys. For example, rule 2423 specifies that no penalty applies if the current vocabulary entry uses any key of the input sequence two or more times in succession. This accommodates double letters, because it is difficult for users to indicate a repeated letter with a trace. In the set-edit-distance calculation, therefore, this rule allows a free addition. Beyond trace input, this rule may be applied to resolve various non-trace user inputs, such as those from keyboards, 12-key keypads, and the like. For example, on a 12-key keypad, a single press of the "3" key could yield the word "FED", because the "3" key represents the characters "3", "D", "E", and "F".

Rule 2425 concerns punctuation, numbers, and symbols. It specifies that the set-edit-distance calculation of step 2408 imposes no penalty if the current vocabulary entry includes characters from a defined group, such as punctuation, numbers, symbols, or characters outside the alphabet in use (such as Cyrillic characters). These are examples of characters that are harder to enter. As a variation of this embodiment, rule 2425 specifies that no penalty applies for using characters of a defined group unless that group is displayed separately at the time the trace is made. For example, if the soft keyboard does not display a palette of punctuation characters when the trace is made, no penalty applies if the current vocabulary entry contains one or more of those punctuation characters even though the input sequence does not. In the set-edit-distance calculation, therefore, this rule allows a free addition.

Like rule 2423, rule 2425 has application beyond trace input. For example, these rules may be used to broaden the interpretation of inherently ambiguous user input submitted through a keypad whose keys each represent several letters. Thus, under rule 2425, the device may offer a telephone keypad user words that contain numbers, punctuation, or special symbols the user did not enter. Under rule 2423, the device may automatically consider words containing repeated keys (such as "OO" instead of "O") or different letters of the same key in succession (such as "FED" instead of "F"). One patent relating to the resolution of ambiguous 12-key input is U.S. Patent No. 5,818,437, entitled "REDUCED KEYBOARD DISAMBIGUATING COMPUTER", issued October 6, 1998, to Grover et al., the entire contents of which are incorporated herein by reference.

The matrix of FIG. 20 shows the calculation of the set-edit-distance for the candidate word "pretty". In this embodiment, the calculated set-edit-distance is zero.

One possible refinement of the set-edit-distance calculation in step 2408 is to take into account probabilities based on keyboard geometry. Specifically, the set-edit-distance metric computed for a given candidate word may be further adjusted according to the probability that the user intended the letters of that candidate word. For example, if the approach of FIG. 25 is used to identify auxiliary keys, each auxiliary key may be associated with a probability calculated according to Equation 1:

Probability = 1 - distance / radius [Equation 1]

where distance is the distance between the key at the center of the circle and the auxiliary key, and radius is the radius of the circle.

In the alternative embodiment of FIG. 26, each auxiliary key is associated with a probability calculated according to Equation 2:

Probability = overlap / touch-area [Equation 2]

where overlap is the area of the auxiliary key that intersects the touch area, and touch-area is the total area of the touch area.

As a refinement or alternative, the key-strike probability for a given key may be based at least in part on the speed of the traced path as it passes over that key. In other words, the key-strike probability may be inversely proportional to the trace speed: the faster the user's finger or stylus moves over a given key, the lower that key's strike probability.

Regardless of which approach is used to determine these probabilities, the probabilities of all keys in the candidate word are taken into account in one of various ways. In one simplified embodiment, the probabilities of all keys in the candidate word are multiplied together, and the set-edit-distance is divided by this product. Thus, candidate words that rely on more auxiliary keys, which carry lower probabilities, receive an inflated set-edit-distance.

In another embodiment, the probabilities are entered into a probability shadow matrix, as illustrated by FIG. 22. This serves as a secondary consideration, for example to break a tie between two candidate words having the same set-edit-distance. In this example, the probability calculated by the shadow matrix of FIG. 22 is 12,642,870. The probabilities entered into the matrix of FIG. 22 come from the key-strike probabilities implemented as described above, such as in the embodiments of FIGS. 25-26. For example, this probability calculation may be performed using tap frequency, as discussed in detail above.

In this regard, step 2408 may calculate various other shadow matrices for use in breaking ties between candidate words having the same set-edit-distance. In one embodiment, these shadow matrices include a matrix that counts the number of free additions needed to reach the candidate word. FIG. 19 shows an example of such a shadow matrix; the matrix of FIG. 19 indicates three free additions. The shadow matrices also include a matrix that calculates the stem-edit-distance for the candidate word. FIG. 21 is an example of such a shadow matrix. In this example, the stem-edit-distance from FIG. 21 is zero.

In one embodiment, step 2408 may be streamlined by calculating the set-edit-distance, as illustrated by FIG. 20, and omitting one or all of the shadow matrices if the set-edit-distance metric exceeds a defined threshold. This helps step 2408 complete more quickly.

After step 2408, step 2410 takes the vocabulary entries for which the set-edit-distance has been calculated and ranks them according to their set-edit-distance metric. Step 2410 then provides a visual output of the highest-ranked candidate entries, for example on display 203, according to a given criterion. For example, the criterion may specify the top 10 entries, the top 20 entries, as many entries as fit on the display screen, or some other limit.

Although the routine 2400 describes various "free" additions and deletions, one embodiment of the sequence still keeps track of how these free operations are used. This data may serve various purposes, for example providing an improved confidence measure, chiefly for breaking ties between words that use no free additions or deletions and words that do.

Specific embodiment

To further illustrate the sequence 2400, the following example is given. In step 2402, the system 200 displays the soft keyboard 1700 shown in FIG. 17. In step 2404, the system 200 receives the user's trace, shown by reference numeral 1706 in FIG. 17. In step 2406, the system defines the user's input sequence. Here, the trace directly contacts the keys P, O, I, U, Y, T, R, E, R, T, and Y. The complete input sequence, including these keys and their auxiliary keys, comprises:

(1) PO: primary keys comprising the contacted start key P together with the auxiliary key O, which qualifies because it satisfies the rectangular probability approach and the preference for keys along the direction of trace movement.

(2) OIUYTR: secondary, intervening keys O, I, U, Y, T, and R, which have no auxiliary keys because of the speed or direction of the trace.

(3) EWR: the contacted direction-change key E, together with W and R, which are its auxiliary keys.

(4) RT: secondary, intervening keys R and T, which have no auxiliary keys because of the speed or direction of the trace.

(5) YTU: the contacted end key Y, together with T and U, which are its auxiliary keys.

Next, step 2408 calculates the set-edit-distance for the input sequence defined in step 2406. In one embodiment, step 2408 is performed (at least in part) using the matrix 1800 of FIG. 18. This matrix uses concepts similar to those of the matrix of FIG. 4, but with some new twists arising from the use of tracing. These have been described above and are explained further below.

The words 1810 are a representative subset of the vocabulary entries compared with the input sequence from step 2406. As described, the comparison may be made with some or all of the vocabulary entries. Column 1802 corresponds to PO, column 1804 to OIUYTR, column 1805 to EWR, column 1806 to RT, and column 1808 to YTU. The asterisks in columns 1804 and 1806 indicate that the keys in these columns are secondary keys.

In this example, the rules 2420 are applied as follows. Rule 2421 requires that the start key P or its auxiliary key O be accounted for in column 1802, that the end key Y or its auxiliary key T or U be accounted for in column 1808, and that the direction-change key E or its auxiliary key W or R be accounted for in column 1805. Column 1804 (OIUYTR) and column 1806 (RT) hold secondary keys, so under rule 2422 they can be ignored at no cost to the set-edit-distance. Rule 2423 specifies that any key in columns 1802, 1804, 1805, 1806, and 1808 may be repeated in succession any number of times without penalty. Rule 2425 does not apply to this particular example because none of the candidate entries 1810 contains punctuation or special symbols.

FIG. 28 shows a matrix for calculating the set-edit-distance for the vocabulary entry "potter" of FIG. 18. In this example, the candidate word "pretty" beats "potter", because the set-edit-distance for "pretty" is zero.

If there had been a tie, step 2408 could consider one or more shadow matrices for the word "potter" to break it. In this regard, the matrix of FIG. 27 is a shadow matrix that counts the number of free additions needed to reach the candidate entry "potter", which in this example is three. FIG. 29 shows the shadow matrix for the stem-edit-distance, and FIG. 30 shows the shadow matrix for the key-strike probability.

Other embodiments

While the foregoing disclosure shows a number of illustrative embodiments, it will be apparent to those skilled in the art that various changes and modifications may be made without departing from the scope of the invention as defined by the appended claims. Accordingly, the disclosed embodiments are representative of the subject matter broadly contemplated by the present invention, the scope of the present invention encompasses other embodiments that may become obvious to those skilled in the art, and the scope of the present invention is accordingly limited only by the appended claims.

105: data input device
110: source
115: dictionary
200: input system
201: processor
202: input device
203: display
204: speaker
210: memory
211: operating system
212: correction software
213: vocabulary module
214, 215, 216: application programs
220: digital data processing device
221: input/output
222: processor
224: digital data storage
226, 228, 230: storage
1700: soft keyboard
1706: trace

Claims

1. A text input device, comprising:
a display comprising a touch-sensitive display surface;
digital data storage including a vocabulary database comprising multiple vocabulary entries; and
a processor, coupled to the display and the storage, programmed to perform operations comprising:
directing the display to present an arrangement of keys, each representing one or more letters;
receiving, via the display surface, user input comprising a single continuous trace that contacts multiple keys in sequence, including a start key, an end key, and any intermediate keys between the start key and the end key;
defining an input sequence comprising the contacted keys and any auxiliary keys, the auxiliary keys being keys within a defined proximity of the contacted keys;
comparing the input sequence with candidate entries from the vocabulary, including, for each candidate entry, calculating a set-edit-distance metric as a measure of the match between the input sequence and that candidate entry, wherein:
the set-edit-distance calculation imposes a penalty for excluding from the candidate word all keys of a first group comprising the start key and its auxiliary keys;
the set-edit-distance calculation imposes a penalty for excluding from the candidate word all keys of a second group comprising the end key and its auxiliary keys;
the set-edit-distance calculation imposes a penalty for excluding from the candidate word all keys of at least one third group, each third group comprising a contacted key at which a defined minimum change in the direction of the trace occurs, together with that key's auxiliary keys;
the set-edit-distance calculation imposes no penalty for excluding from the candidate word any key of the input sequence that lies outside all of the first, second, and third groups; and
the set-edit-distance calculation imposes no penalty for using any key of the input sequence two or more times in succession in the candidate word; and
ranking the candidate words according to criteria comprising at least the calculated set-edit-distance metric, and providing an output of at least some of the ranked candidate words.

2. The text input device of claim 1, wherein the set-edit-distance calculation imposes no penalty for candidate words that include characters from a defined group comprising any of punctuation, numbers, or symbols.

3. The text input device of claim 1, wherein the operations further comprise: calculating a keyboard-geometry-based probability in conjunction with each calculated set-edit-distance metric;
and using the calculated probability to break ties between similarly ranked candidate words.

4. The text input device of claim 3, wherein, for each of the intermediate keys, the keyboard-geometry-based probability decreases with the speed at which the trace passes over that intermediate key.

5. The text input device of claim 3, wherein the keyboard-geometry-based probability of a given key varies with the distance of the given key from a point on the trace.

6. The text input device of claim 3, wherein the keyboard-geometry-based probability of a given key varies according to the area of the given key that intersects a defined rectangular touch area centered on the contacted key closest to the given key.
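The rectangular-touch-area rule can be sketched with axis-aligned key rectangles. This assumes that "closest" means nearest by center-to-center distance and that the probability is the covered fraction of the key's area; `Rect`, `touch_area_probability`, `touch_w`, and `touch_h` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Rect:
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    def center(self) -> Tuple[float, float]:
        return (self.x + self.w / 2, self.y + self.h / 2)


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two axis-aligned rectangles (0 if disjoint)."""
    dx = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    dy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return max(dx, 0.0) * max(dy, 0.0)


def touch_area_probability(target: Rect, contacted: List[Rect], touch_w: float, touch_h: float) -> float:
    """Fraction of `target` covered by a touch rectangle centered on the nearest contacted key."""
    tc = target.center()
    nearest = min(contacted, key=lambda r: math.dist(r.center(), tc))
    cx, cy = nearest.center()
    touch = Rect(cx - touch_w / 2, cy - touch_h / 2, touch_w, touch_h)
    return overlap_area(target, touch) / (target.w * target.h)


contacted = [Rect(0, 0, 10, 10)]  # a single contacted key
print(touch_area_probability(Rect(10, 0, 10, 10), contacted, 20, 10))  # 0.5
```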

7. The text input device of claim 1, wherein the ranking operation ranks the candidate words according to criteria comprising the set-edit-distance metric and a language-model-based probability.
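Ranking on both criteria can be sketched as a single cost that adds a weighted negative log language-model probability to the set-edit distance. The linear combination, `weight`, `floor`, and the toy unigram probabilities are assumptions for illustration; the claim does not specify how the two criteria are combined.

```python
import math
from typing import Dict, List, Tuple


def rank_with_language_model(
    candidates: List[Tuple[str, float]],
    lm_prob: Dict[str, float],
    weight: float = 0.5,
    floor: float = 1e-9,
) -> List[str]:
    """Order candidates by set-edit distance plus a weighted language-model cost.

    cost = distance + weight * (-log P_lm(word)); lower cost ranks first.
    Words missing from `lm_prob` receive the probability `floor`.
    """
    def cost(item: Tuple[str, float]) -> float:
        word, distance = item
        return distance + weight * -math.log(lm_prob.get(word, floor))

    return [word for word, _distance in sorted(candidates, key=cost)]


lm = {"hello": 0.01, "help": 0.02, "hell": 0.001}
print(rank_with_language_model([("hello", 0), ("help", 0), ("hell", 1)], lm))
# ['help', 'hello', 'hell']
```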

8. The text input device of claim 1, wherein the operations further comprise tracking free additions and deletions and using the tracked free additions and deletions to break ties between similarly ranked candidate words.
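Reading "free additions and deletions" as the edits the set-edit-distance does not penalize (an interpretation, not stated in the claim text), they can be counted alongside the penalized cost and compared second. In this hypothetical sketch, a free deletion is a skipped non-required group and a free addition is a repeated letter absorbed by one key; the function name and toy groups are illustrative.

```python
from typing import FrozenSet, List, Tuple

Group = Tuple[FrozenSet[str], bool]  # (letters of key + auxiliary keys, required?)


def set_edit_distance_with_free_edits(groups: List[Group], word: str, penalty: int = 1) -> Tuple[float, int]:
    """Return (penalized cost, number of free edits), minimized in that order."""
    collapsed = [c for i, c in enumerate(word) if i == 0 or word[i - 1] != c]
    free_additions = len(word) - len(collapsed)  # repeated letters absorbed by one key
    n, m = len(groups), len(collapsed)
    inf = (float("inf"), 0)
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = (0, 0)
    for i in range(n + 1):
        for j in range(m + 1):
            cost, free = dp[i][j]
            if cost == float("inf"):
                continue
            if i < n:
                letters, required = groups[i]
                # Skipping a group is penalized if required, else counted as a free deletion.
                skip = (cost + penalty, free) if required else (cost, free + 1)
                dp[i + 1][j] = min(dp[i + 1][j], skip)
                if j < m and collapsed[j] in letters:
                    dp[i + 1][j + 1] = min(dp[i + 1][j + 1], (cost, free))
            if j < m:
                # A word letter absent from the trace is penalized.
                dp[i][j + 1] = min(dp[i][j + 1], (cost + penalty, free))
    cost, free = dp[n][m]
    return cost, free + free_additions


groups = [
    (frozenset("hgj"), True), (frozenset("tgy"), False), (frozenset("ewr"), True),
    (frozenset("uik"), False), (frozenset("lk"), True), (frozenset("opi"), True),
]
ranked = sorted(["hello", "help"], key=lambda w: set_edit_distance_with_free_edits(groups, w))
print(ranked)  # ['help', 'hello']
```

Both words have zero penalized cost here, so the tie is broken by the free-edit count (2 for "help", 3 for "hello").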

9. The text input device of claim 1, wherein the operations further comprise calculating a stem-edit-distance and using the calculated stem-edit-distance to break ties between similarly ranked candidate words.

10. A method comprising the operations of directing a display to present an arrangement of keys, each representing one or more letters;
Receiving, via the display surface, user input comprising a single continuous trace that contacts multiple keys in sequence, including a start key, an end key, and any intermediate keys between the start key and the end key;
Defining an input sequence comprising the contacted keys and any auxiliary keys, the auxiliary keys being keys within a defined proximity of the contacted keys;
Comparing candidate entries from a vocabulary with the input sequence, including, for each candidate entry, calculating a set-edit-distance metric as a matching metric between the input sequence and the candidate entry, wherein:
The set-edit-distance calculation imposes a penalty if the candidate word excludes all keys of a first group comprising the start key and the auxiliary keys of the start key;
The set-edit-distance calculation imposes a penalty if the candidate word excludes all keys of a second group comprising the end key and the auxiliary keys of the end key;
The set-edit-distance calculation imposes a penalty if the candidate word excludes all keys of a third group comprising each contacted key at which the trace changes direction by at least a defined minimum, together with the auxiliary keys of each such key;
The set-edit-distance calculation imposes no penalty for excluding from the candidate word any key of the input sequence that lies outside the first, second, and third groups;
The set-edit-distance calculation adds no penalty when the candidate word uses any key of the input sequence two or more times in succession; and
Ranking the candidate words according to criteria comprising at least the calculated set-edit-distance metric, and providing an output of at least some of the ranked candidate words;
Wherein at least one of the operations is executed by a processor.
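The "defining an input sequence" step can be sketched by pairing each contacted key with every key whose center lies within a proximity radius. This is a minimal sketch assuming keys are represented by center points and proximity is a Euclidean distance threshold; `build_input_sequence`, `proximity`, and `turn_indices` (positions of direction-change keys, computed elsewhere) are hypothetical names.

```python
import math
from typing import Dict, FrozenSet, List, Tuple

Group = Tuple[FrozenSet[str], bool]  # (letters of key + auxiliary keys, required?)


def build_input_sequence(
    contacted: List[str],
    key_centers: Dict[str, Tuple[float, float]],
    proximity: float,
    turn_indices: FrozenSet[int] = frozenset(),
) -> List[Group]:
    """Turn the contacted keys of a trace into (letters, required) groups.

    Each group holds the contacted key plus every key whose center lies within
    `proximity` of it (its auxiliary keys). The first and last contacted keys,
    and any position listed in `turn_indices`, are marked as required.
    """
    groups: List[Group] = []
    last = len(contacted) - 1
    for idx, key in enumerate(contacted):
        cx, cy = key_centers[key]
        auxiliary = {
            other
            for other, (x, y) in key_centers.items()
            if other != key and math.hypot(x - cx, y - cy) <= proximity
        }
        required = idx == 0 or idx == last or idx in turn_indices
        groups.append((frozenset({key} | auxiliary), required))
    return groups


centers = {"q": (0, 0), "w": (10, 0), "e": (20, 0), "r": (30, 0)}
seq = build_input_sequence(["q", "w", "e"], centers, proximity=10)
print(seq)
```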

11. The method of claim 10, wherein the set-edit-distance calculation adds no penalty for candidate words that include any character of a defined group comprising punctuation, numbers, or symbols.

12. The method of claim 10, wherein the operations further comprise: calculating a keyboard-geometry-based probability in conjunction with each calculated set-edit-distance metric; and
using the calculated probability to break ties between similarly ranked candidate words.

13. The method of claim 12, wherein, for each of the intermediate keys, the keyboard-geometry-based probability decreases as the speed at which the trace passes over that intermediate key increases.

14. The method of claim 12, wherein the keyboard-geometry-based probability of a given key varies with the radial distance of the key from a point on the trace.

15. The method of claim 12, wherein the keyboard-geometry-based probability of a given key varies according to the area of the given key that intersects a defined rectangular touch area centered on the contacted key closest to the given key.

16. The method of claim 10, wherein the ranking operation ranks the candidate words according to criteria comprising the set-edit-distance metric and a language-model-based probability.

17. The method of claim 10, wherein the operations further comprise tracking free additions and deletions and using the tracked free additions and deletions to break ties between similarly ranked candidate words.

18. The method of claim 10, wherein the operations further comprise calculating a stem-edit-distance and using the calculated stem-edit-distance to break ties between similarly ranked candidate words.

19. An article of manufacture comprising at least one digital data storage medium that non-transitorily stores a program executable by a processor to carry out the operations of claim 10.

20. A method comprising the operations of receiving user input specifying a path traced continuously across a keyboard presented on a touch-sensitive display;
Resolving an input sequence comprising the keys traversed by the path, as determined by a defined criterion, and auxiliary keys proximate those keys;
For each of one or more candidate entries of a defined vocabulary, calculating a set-edit-distance metric between the input sequence and the candidate entry, wherein:
The set-edit-distance calculation imposes a penalty if the candidate entry excludes all keys of a first group comprising the start key of the path and the auxiliary keys of the start key;
The set-edit-distance calculation imposes a penalty if the candidate entry excludes all keys of a second group comprising the end key of the path and the auxiliary keys of the end key;
The set-edit-distance calculation imposes a penalty if the candidate entry excludes all keys of a third group comprising each contacted key at which the path changes direction by at least a specified minimum, together with the auxiliary keys of each such key;
The set-edit-distance calculation imposes no penalty for excluding from the candidate entry any key of the input sequence that lies outside the first, second, and third groups;
The set-edit-distance calculation adds no penalty when the candidate entry uses any key of the input sequence two or more times in succession; and
Ranking the candidate entries according to the calculated set-edit-distance metrics, and providing an output of some or all of the ranked candidate entries;
Wherein at least one of the operations is executed by a processor.
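The "minimum change in direction" test that selects the third group's keys can be sketched by measuring the turning angle of the path at each interior point. This assumes the path is approximated by a polyline (for example, the centers of the contacted keys in order) and that the minimum change is an angle threshold; `direction_change_indices` and `min_turn_degrees` are hypothetical names.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def direction_change_indices(points: List[Point], min_turn_degrees: float = 45.0) -> List[int]:
    """Indices of interior points where the path turns by at least `min_turn_degrees`.

    atan2 gives a zero-length segment a heading of 0, so duplicate consecutive
    points should be removed before calling this function.
    """
    turns = []
    for k in range(1, len(points) - 1):
        (x0, y0), (x1, y1), (x2, y2) = points[k - 1], points[k], points[k + 1]
        heading_in = math.atan2(y1 - y0, x1 - x0)
        heading_out = math.atan2(y2 - y1, x2 - x1)
        diff = abs(math.degrees(heading_out - heading_in)) % 360.0
        turn = min(diff, 360.0 - diff)  # smallest angle between the two headings
        if turn >= min_turn_degrees:
            turns.append(k)
    return turns


path = [(0, 0), (10, 0), (20, 0), (20, 10), (20, 20)]
print(direction_change_indices(path))  # [2]
```

The contacted keys at the returned indices, with their auxiliary keys, would then be the ones whose exclusion from a candidate entry is penalized.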

21. An apparatus comprising a touch-sensitive display coupled to a processor programmed to perform the operations of claim 20.

22. An article of manufacture comprising at least one digital data storage medium that non-transitorily stores a program executable by a processor to carry out the operations of claim 20.

23. An apparatus comprising: a display;
A user input device comprising a keypad of multiple keys or a touch-sensitive display surface;
Digital data storage including a vocabulary database comprising multiple vocabulary entries; and
A processor, coupled to the display, the user input device, and the storage, programmed to perform operations comprising:
Receiving, via the user input device, user input comprising a sequence of keys, the keys being either keys contacted by, and adjacent to, a continuous trace entered through the touch-sensitive display surface, or pressed keypad keys, one or more of which represent multiple letters;
Identifying different candidate words, each representing a combination of letters potentially represented by the sequence of keys; and
Comparing the candidate words with entries of the vocabulary to score the compared vocabulary entries according to how well they reproduce the user input;
Wherein the comparing operation prevents any vocabulary entry from being penalized for including any of: punctuation, symbols, or numbers not included in the input sequence; repetition of any letter represented on one key of the user input; or use of multiple letters represented together on one key of the user input.
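For the keypad variant, the three penalty exemptions can be sketched by mapping a vocabulary entry onto keys before comparing it with the pressed keys. This minimal sketch assumes the common phone-keypad letter layout, skips every character no key represents (a simplification of "not included in the input sequence"), merges consecutive letters that fall on the same key, and uses a plain Levenshtein distance as the score; all function names are hypothetical.

```python
from typing import Dict, List

# Common phone-keypad letter layout (assumed for illustration).
KEYPAD: Dict[str, str] = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {letter: key for key, letters in KEYPAD.items() for letter in letters}


def collapse(keys: List[str]) -> List[str]:
    """Merge consecutive occurrences of the same key."""
    return [k for i, k in enumerate(keys) if i == 0 or keys[i - 1] != k]


def entry_keys(entry: str) -> List[str]:
    """Keys for a vocabulary entry. Punctuation, digits, and symbols are skipped, and
    repeated letters or several letters on one key collapse to a single key."""
    return collapse([LETTER_TO_KEY[c] for c in entry.lower() if c in LETTER_TO_KEY])


def edit_distance(a: List[str], b: List[str]) -> int:
    """Plain Levenshtein distance between two key sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def score_entry(entry: str, pressed: str) -> int:
    """Lower is better; 0 means the entry reproduces the pressed keys."""
    return edit_distance(entry_keys(entry), collapse(list(pressed)))


print(score_entry("hello", "43556"), score_entry("he'll", "43556"))  # 0 1
```

Collapsing the pressed keys as well keeps the comparison symmetric, at the cost of ignoring deliberate repeated presses of one key.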