GB2136612A - Word checking system - Google Patents

Word checking system

Info

Publication number
GB2136612A
Authority
GB
United Kingdom
Prior art keywords
word
code
store
stored
responsive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB08306665A
Other versions
GB8306665D0 (en)
GB2136612B (en)
Inventor
John Rigby Waterworth
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ferranti International PLC
Original Assignee
Ferranti PLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ferranti PLC
Priority to GB08306665A (GB2136612B)
Publication of GB8306665D0
Priority to DE19843407831 (DE3407831A1)
Priority to NL8400712A (NL8400712A)
Priority to AU25418/84A (AU559290B2)
Priority to BR8401177A (BR8401177A)
Publication of GB2136612A
Application granted
Publication of GB2136612B
Expired

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9017 Indexing; Data structures therefor; Storage structures using directory or table look-up
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/232 Orthographic correction, e.g. spell checking or vowelisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Document Processing Apparatus (AREA)
  • Machine Translation (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A word checking system includes first and second encoders EN1, EN2 responsive to a word to encode it in accordance with two different algorithms to produce first and second codes. A dictionary store DS is included in which the first code of any word may be stored at a location determined by the second code. Checking means CM are provided to check for the presence in the store DS of a first code relating to a word applied to the encoders at a location defined by the second code. If the applied word is present then another word may be checked. The absence of an applied word causes the checking means to give an indication so that an operator may decide on the action to be taken.

Description

SPECIFICATION
Word checking system
This invention relates to a word checking system, that is, to a system which will carry out a simple check on spelling when text is written or encoded.
Systems which check spelling are known, usually forming part of word processing or computer systems. One of the main disadvantages of such systems is the need to store a very large vocabulary so that any word used may be checked. The presence of a large vocabulary also leads to slow and complex search routines. It is, of course, possible to use smaller vocabularies and to define rules for their use. For example, only words having more than a certain number of characters may be checked, or only technical terms, and so on. These rules are, however, complex to define and again result in complex search routines. In addition, words vary considerably in length, and their distribution throughout a conventional dictionary is very uneven. For example, there are many more words starting with the letter "e" than with the letter "q". This results in very uneconomic use of storage, since it is difficult to decide in advance how much space should be provided for words beginning with any particular letter.
It is an object of the invention to provide a word checking system having a vocabulary of any desirable size which operates with simple and clearly defined rules.
According to the present invention there is provided a word checking system which includes first encoding means responsive to the application of a word to encode the word in accordance with a first algorithm so as to produce a first code representative of that word, a store in which said code may be stored, second encoding means responsive to the application of said word to encode the word in accordance with a second algorithm different from the first algorithm so as to produce a second code defining a location in said store in which a number of said first codes may be stored, and checking means operable on receipt of a first code and a second code both corresponding to a word to compare the first code with any first code already stored in the store location defined by the second code, the checking means being responsive to the presence of said first code in said location to cause the first and second encoding means to accept another word, and responsive to the absence of said code to give an indication of such absence.
The invention will now be described with reference to the accompanying drawings, in which: Figure 1 is a schematic block diagram of one embodiment of the invention; Figure 2 is a flow diagram illustrating the operation of the invention; and Figure 3 is a flow diagram illustrating the operation of one encoder.
Referring now to Figure 1, this shows a schematic block diagram of one embodiment of the invention.
The invention is applied to a word processor or to a computer having word-processing facilities, and including a text store TS and a display DP on which selected parts of the text may be displayed. The word processor is assumed to have the ability to apply successive words of the text displayed on the display to some other device. This is a conventional facility, used for example when printing a screen of displayed information. In the invention each successive word of the displayed text is applied in parallel to two encoders, shown in Figure 1 as EN1 and EN2.
Each of the two encoders produces a code in respect of the word applied to it as determined by a "hash" or algorithm, the two algorithms being different so that the two codes together identify the word in question. The code output from encoder EN2 defines an address in the dictionary store DS in which all words forming the dictionary are stored in a form determined by the output of encoder EN1.
The output from encoder EN2 may be used to cause each word in the defined location to be read out to a comparator CM, in which it is compared with the output from encoder EN1. The same encoder output may also be applied to the store.
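The arrangement of two encoders, dictionary store and comparator described above can be sketched in Python as follows. This is an illustrative sketch only: the two hash functions `first_code` and `second_code`, the class name `DictionaryStore` and the choice of 256 store locations are assumptions made for the example, since the specification leaves both algorithms open.

```python
from collections import defaultdict


def first_code(word: str) -> int:
    """Hypothetical first algorithm (role of EN1): 16-bit polynomial hash, base 31."""
    code = 0
    for ch in word.lower():
        code = (code * 31 + ord(ch)) & 0xFFFF
    return code


def second_code(word: str, n_locations: int = 256) -> int:
    """Hypothetical second algorithm (role of EN2): a different hash, reduced to a store address."""
    code = 0
    for ch in word.lower():
        code = (code * 131 + ord(ch)) & 0xFFFFFFFF
    return code % n_locations


class DictionaryStore:
    """Role of DS: each location holds a number of first codes."""

    def __init__(self) -> None:
        self.locations = defaultdict(set)

    def add(self, word: str) -> None:
        # Store the first code at the location selected by the second code.
        self.locations[second_code(word)].add(first_code(word))

    def contains(self, word: str) -> bool:
        # Role of the comparator CM: compare the word's first code with
        # every first code already held at the addressed location.
        return first_code(word) in self.locations.get(second_code(word), ())


store = DictionaryStore()
store.add("word")
print(store.contains("word"))  # the stored word is found
print(store.contains("wrod"))  # a transposed spelling is not found
```

Because only codes are stored, two different words that happened to share both codes could not be told apart by the comparator; how likely that is depends on the algorithms and code lengths chosen.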
"Hashing" techniques are well known, and a survey of such techniques may be found, for example, in chapter 4 of "Compiling Techniques" by F.R.A. Hopgood, published by Macdonald in 1969.
Referring now to Figure 2, this shows a flow chart for the operation of the invention. The blocks having a double outline are those denoting a decision made or an operation carried out by the system operator.
It is assumed initially that a page or pages of text have been typed and that the text is stored in the text store TS of Figure 1. The first operation of the system, as shown by block 10 is therefore to withdraw the first word of text from the text store.
The word is then subjected to the two hash operations simultaneously to provide the two code outputs required, as shown at block 11. As defined by block 12, the dictionary store DS is then searched at an address defined by the second code to check whether the word defined by the first code is present or not. The first decision is made at block 13, depending upon whether or not the required word is present in the store at the address searched. If the word is present, then the next word is selected from the text store and the operation is repeated. If, however, the word is not present, block 14 shows the next decision which has to be made. This determines whether or not the spelling of the word in question is correct, and requires action by an operator. If the spelling is not correct, then a correction has to be made, as shown by block 15. In this case the corrected word is passed back for the two hash operations to be repeated, followed by the search of the dictionary store and decisions 13 and 14.
If the spelling of the word was in fact correct, and the word is not already in the dictionary store, then a further decision, shown by block 16, has to be made, again by the operator. This questions whether the word should be entered into the dictionary store as a new word. In practice, words which are names or rarely-used words may not need to be stored, whereas words which are likely to be used again may be stored. If the word is not to be stored, then the next word in the text store is selected, and the procedure set out above is repeated.
If the word is to be stored, then this is done as indicated by block 17. Finally, the last decision block 18 asks if there are any more words in the text store. If so, then the next word is selected. If there are no further words, then the operation is stopped, with some appropriate indication being given.
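The flow of Figure 2 as described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the operator's decisions are modelled as three hypothetical callback functions (`spelling_ok`, `correct` and `keep`), the dictionary store is a plain `dict` mapping a location to a set of first codes, and the two hash functions are stand-ins chosen for the example rather than the algorithms of the specification.

```python
def first_code(word):
    """Hypothetical first algorithm: 16-bit polynomial hash, base 31."""
    code = 0
    for ch in word.lower():
        code = (code * 31 + ord(ch)) & 0xFFFF
    return code


def second_code(word, n_locations=256):
    """Hypothetical second algorithm: different hash giving a store location."""
    code = 0
    for ch in word.lower():
        code = (code * 131 + ord(ch)) & 0xFFFFFFFF
    return code % n_locations


def add_word(store, word):
    """Store the first code at the location given by the second code."""
    store.setdefault(second_code(word), set()).add(first_code(word))


def check_text(words, store, spelling_ok, correct, keep):
    """Check each word in turn, following blocks 10 to 18 of Figure 2."""
    checked = []
    for word in words:                        # blocks 10 and 18: next word
        while True:
            c1 = first_code(word)             # block 11: both hash operations
            c2 = second_code(word)
            if c1 in store.get(c2, set()):    # blocks 12 and 13: search store
                break                         # present: go to the next word
            if spelling_ok(word):             # block 14: operator decision
                if keep(word):                # block 16: operator decision
                    add_word(store, word)     # block 17: enter the new word
                break
            word = correct(word)              # block 15: operator correction
        checked.append(word)
    return checked


store = {}
for known in ("the", "cat"):
    add_word(store, known)

result = check_text(
    ["the", "teh", "cat", "ferranti"],
    store,
    spelling_ok=lambda w: w != "teh",   # operator rejects only "teh"
    correct=lambda w: "the",            # operator retypes it as "the"
    keep=lambda w: True,                # operator stores every new word
)
print(result)
```

In this sketch the inner loop repeats for as long as the operator keeps supplying a word that is both absent from the store and judged misspelt, which mirrors the return path from block 15 to the hash operations.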
The manner in which operator decisions are requested or the end of the operation indicated will depend upon the particular word processing or computer system to which the spelling check system is attached. By way of example only, the need for an operator decision may be indicated by highlighting or flashing the word in question, and the end of the operation may be indicated by the cursor flashing at the bottom of a page of displayed text. There are, of course, other ways of indicating these functions of the system.
Hash operations are well-known, and may take a wide variety of forms. As already stated, the system described above uses two different hash operations which together define any particular word. By way of example only, Figure 3 is a flow chart for one particular hash operation which may be used. This operation requires that any character may be defined by a five-bit number, the final hash being a sixteen-bit number.
The first operation, indicated by block 20, is to clear a 16-bit register of the results of any previous hash operation. Block 21 requires the first letter of the word to be converted into a 5-bit number. The simplest way of doing this is to use the position of the letter in the alphabet, so that for example the letter 'w' becomes 10111. Other conversions may be used. The 5-bit number is then put through an 'exclusive-OR' operation with any number already occupying the five least significant bits of the register, and the result is stored in that same position in the register, as indicated by blocks 22 and 23. Decision block 24 asks whether there is another letter in the word, and if not the entire 16-bit contents of the register are read out as the required 'hash' or code, as at block 25.
If there is another letter in the word, then the contents of the register are shifted cyclically 5 bits to the left. Hence the previous five most significant bits will become the five least significant bits, and all other bits will increase their significance by five (see block 26). The cycle then repeats, with the next letter being converted into a five-bit number, the exclusive-OR operation, and so on. When the complete word has been processed, then the 16-bit number in the register represents the final encoded output.
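The hash operation of Figure 3 can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that words contain only the letters a to z and that each letter is converted using its position in the alphabet, which the description gives as the simplest conversion.

```python
def letter_to_5bit(ch: str) -> int:
    """Block 21: position of the letter in the alphabet ('a' -> 1, 'w' -> 23 = 0b10111)."""
    ch = ch.lower()
    if not ("a" <= ch <= "z"):
        raise ValueError(f"unsupported character: {ch!r}")
    return ord(ch) - ord("a") + 1


def rotate_left_16(value: int, count: int) -> int:
    """Cyclic left shift within a 16-bit register."""
    return ((value << count) | (value >> (16 - count))) & 0xFFFF


def figure3_hash(word: str) -> int:
    register = 0                                     # block 20: clear register
    for index, ch in enumerate(word):
        if index > 0:
            register = rotate_left_16(register, 5)   # block 26: cyclic shift
        register ^= letter_to_5bit(ch)               # blocks 21 to 23: XOR into low 5 bits
    return register                                  # block 25: read out the code


print(format(figure3_hash("w"), "016b"))
print(format(figure3_hash("word"), "016b"))
```

Since each 5-bit number is less than 32, the exclusive-OR only ever changes the five least significant bits of the register, as the description requires.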
Many other hash techniques are known, possibly involving purely mathematical operations.
The technique described above overcomes one of the main problems of known systems in that all codes to be stored in the dictionary store are of the same length. This results in considerable improvement in the use of the storage available. In addition the stored codes will tend to be shorter than the average word.
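The storage point can be illustrated with a short Python sketch. It is only an illustration under assumptions: a 16-bit polynomial hash (`code16`, a hypothetical name) stands in for the first code, and the codes are packed into consecutive two-byte records so that every entry occupies the same space whatever the length of the word.

```python
import struct


def code16(word: str) -> int:
    """Hypothetical 16-bit code for a word (polynomial hash, base 31)."""
    code = 0
    for ch in word.lower():
        code = (code * 31 + ord(ch)) & 0xFFFF
    return code


words = ["a", "word", "checking", "system"]

# Each code is packed as one unsigned 16-bit value, so every record is 2 bytes.
records = b"".join(struct.pack("<H", code16(w)) for w in words)


def code_at(index: int) -> int:
    """Read the code in record number `index` directly from its fixed offset."""
    return struct.unpack_from("<H", records, 2 * index)[0]


text_bytes = sum(len(w) for w in words)
print(len(records), "bytes of codes versus", text_bytes, "bytes of characters")
```

With fixed-length records the position of any entry follows directly from its index, so no space need be reserved in advance for long words.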
All of the operations described above are capable of being performed by any general-purpose computer.

Claims (5)

1. A word checking system which includes first encoding means responsive to the application of a word to encode the word in accordance with a first algorithm so as to produce a first code representative of that word, a store in which said code may be stored, second encoding means responsive to the application of said word to encode the word in accordance with a second algorithm different from the first algorithm so as to produce a second code defining a location in said store in which a number of said first codes may be stored, and checking means operable on receipt of a first code and a second code both corresponding to a word to compare the first code with any first code already stored in the store location defined by the second code, the checking means being responsive to the presence of said first code in said location to cause the first and second encoding means to accept another word, and responsive to the absence of said code to give an indication of such absence.
2. A system as claimed in Claim 1 which includes means operable to store the first code relating to a word in the store at a location determined by the second code relating to that word.
3. A system as claimed in either of Claims 1 or 2 in which the first and second codes uniquely identify any word.
4. A system as claimed in any of Claims 1 to 3 in which the first and second codes are generated simultaneously.
5. A word checking system substantially as herein described with reference to the accompanying drawings.
GB08306665A 1983-03-10 1983-03-10 Word checking system Expired GB2136612B (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
GB08306665A GB2136612B (en) 1983-03-10 1983-03-10 Word checking system
DE19843407831 DE3407831A1 (en) 1983-03-10 1984-03-02 WORD CHECK ARRANGEMENT
NL8400712A NL8400712A (en) 1983-03-10 1984-03-05 WORD CONTROL SYSTEM.
AU25418/84A AU559290B2 (en) 1983-03-10 1984-03-08 Word checking system
BR8401177A BR8401177A (en) 1983-03-10 1984-03-12 WORD VERIFICATION SYSTEM

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB08306665A GB2136612B (en) 1983-03-10 1983-03-10 Word checking system

Publications (3)

Publication Number Publication Date
GB8306665D0 GB8306665D0 (en) 1983-04-13
GB2136612A true GB2136612A (en) 1984-09-19
GB2136612B GB2136612B (en) 1986-04-09

Family

ID=10539349

Family Applications (1)

Application Number Title Priority Date Filing Date
GB08306665A Expired GB2136612B (en) 1983-03-10 1983-03-10 Word checking system

Country Status (5)

Country Link
AU (1) AU559290B2 (en)
BR (1) BR8401177A (en)
DE (1) DE3407831A1 (en)
GB (1) GB2136612B (en)
NL (1) NL8400712A (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS57172471A (en) * 1981-04-17 1982-10-23 Casio Comput Co Ltd Searching system for electronic dictionary having extended memory
US4588985A (en) * 1983-12-30 1986-05-13 International Business Machines Corporation Polynomial hashing

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0287713A1 (en) * 1987-04-23 1988-10-26 Océ-Nederland B.V. A text processing system and methods for checking in a text processing system the correct and consistent use of units or chemical formulae
US5159552A (en) * 1987-04-23 1992-10-27 Oce-Nederland B.V. Method for checking the correct and consistent use of units or chemical formulae in a text processing system
WO1998039715A1 (en) * 1997-03-07 1998-09-11 Apple Computer, Inc. System and method for rapidly identifying the existence and location of an item in a file
US5897637A (en) * 1997-03-07 1999-04-27 Apple Computer, Inc. System and method for rapidly identifying the existence and location of an item in a file

Also Published As

Publication number Publication date
GB8306665D0 (en) 1983-04-13
BR8401177A (en) 1984-10-23
AU559290B2 (en) 1987-03-05
DE3407831C2 (en) 1988-10-13
NL8400712A (en) 1984-10-01
AU2541884A (en) 1984-09-13
DE3407831A1 (en) 1984-09-13
GB2136612B (en) 1986-04-09

Similar Documents

Publication Publication Date Title
US5224038A (en) Token editor architecture
US4689768A (en) Spelling verification system with immediate operator alerts to non-matches between inputted words and words stored in plural dictionary memories
US3995254A (en) Digital reference matrix for word verification
EP0054667A1 (en) Method of generating a list of expressions semantically related to an input linguistic expression
US4092729A (en) Apparatus for automatically forming hyphenated words
KR950012251A (en) Hanja conversion correction processing method
GB2136612A (en) Word checking system
JPH056398A (en) Document register and document retrieving device
JPS621062A (en) Documentation supporting device
JPH0575143B2 (en)
JPH0731315Y2 (en) Electronics
EP0391706B1 (en) A method encoding text
JPH0685169B2 (en) Document processing method
JP2889431B2 (en) Character processor
JP2761606B2 (en) Document data processing device
JPS6315360A (en) Kana-kanji converting system
JPS5925268B2 (en) A device that generates a vector representation of an input word
JPH05289845A (en) Code converter
JPS62271051A (en) Producing device for document in japanese language
Rolfe Generations of permutations with non-unique elements
JPS5852719A (en) Character data inputting method
JPH0395668A (en) Character data processor
JPH0556553B2 (en)
JPH05128105A (en) Kanji/kana converter
JPS62117064A (en) Kanji-to-kana converting device

Legal Events

Date Code Title Description
PCNP Patent ceased through non-payment of renewal fee