GB2050019A

GB2050019A - Method of Producing Typographical Data

Info

Publication number: GB2050019A
Application number: GB7918335A
Authority: GB
Original assignee: SHIU - CHANG LOH
Current assignee: SHIU - CHANG LOH
Priority date: 1979-05-25
Filing date: 1979-05-25
Publication date: 1980-12-31
Also published as: GB2050019B; JPS5696382A

Abstract

A method of producing typographical data defining the characters used in an ideographic language uses a store containing sets of coded signals, each defining the strokes of one of a set of character components or radicals from which the characters of the language can be formed. The store also contains data defining a set of basic patterns for the arrangement of the components within a character area. A character selector, conveniently a keyboard and associated encoding circuitry, is operated to generate an identifying signal which is unique to the selected character and which identifies the appropriate pattern for that character. The identifying signal is processed to obtain from the store typographical data corresponding to the various coded signals defining the strokes of all the components forming the selected character, and their positions within the character area according to the associated pattern.

Description

SPECIFICATION

Method of Producing Typographical Data

This invention relates to character composition, and more particularly to the storage and retrieval of typographical information defining the two dimensional geometrical structure of stylised characters. The invention is particularly, but not exclusively, applicable in the composition of characters in an ideographic language, such as Chinese and other oriental languages which use a large number of symbols for written communication.

The size of the vocabulary employed in such languages presents many general and specific problems in this field. The first problem is that of character selection, this being the first function to be performed in the process of character composition and generally representing the link between the human and mechanical operations.

The thought processes of the operator are translated, usually by the selective operation of keys on a keyboard, into "signals", which may be electrical impulses or mechanical movements, for further processing. Character selection by keyboard operation is not a problem in the composition of characters of European languages employing a limited character alphabet, since the keyboard necessary to provide a comprehensive selection facility is manageable in proportions and number of keys; the time employed in visually locating each key to be operated in sequence is, to an experienced operator, insignificant compared with the time taken to perform the physical process of key actuation. However, where, as in Chinese, ideographic characters (equivalent to "words" in a European language) are themselves the basic linguistic units, and are not normally further subdivided, the system of assigning to each key a unique character produces an overwhelming problem of keyboard size. Various attempts have been made to reduce this problem by, for example, using a plurality of interchangeable keyboards and/or assigning a plurality of characters to each key. Such measures have enjoyed only limited success. A keyboard designed to enable rapid character selection is the subject of copending patent applications Nos. 27636/78 and 43081/78.

A further problem lies in the subsequent process and means for character generation. After character selection, it is generally necessary to print, display or transmit in coded form the selected character. For this purpose, it is necessary to store the data concerning the typographical construction of the characters.

Again, the very large character vocabulary used in, for example, the Chinese and Japanese languages leads to the requirement for large data storage facilities where, as is usual, the typographical data of each complete character of the language is stored. In a known arrangement, which consisted of a typewriter, the data was stored in the form of individual typeface elements. To accommodate the number of elements necessary to provide a reasonable vocabulary, a rather cumbersome mechanical arrangement was necessary. More recently, the data has been stored on magnetic media, such as tape or disc, in the form of sets of permanent coded signals, or "programmes", each defining the structure of a complete character in such a way as to permit reconstruction of the character as a pattern of dots within a matrix of, say, 14x16, 20x20, 20x24 or 24x24 dots, dependent upon the definition of the visual character form required.

The programmes can be read out from a data store to operate, for example, a typewriter, a visual display unit (VDU) having a cathode ray display tube, or a photocomposing machine, either locally to or remotely from the data store.

However, since the Chinese language, for example, uses some 50,000 characters, the storage requirements are extensive, requiring the use of a minicomputer for programme handling.
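The scale of the whole-character approach can be estimated with a short calculation. The sketch below is illustrative only: it assumes one bit per dot and no compression, neither of which is stated in the text, and the function name `bitmap_bytes` is hypothetical.

```python
# Back-of-envelope estimate of whole-character dot matrix storage.
# Assumptions (not from the specification): 1 bit per dot, no compression.

CHARACTERS = 50_000  # vocabulary size quoted in the text


def bitmap_bytes(width: int, height: int, characters: int = CHARACTERS) -> int:
    """Bytes needed to hold `characters` matrices of width x height dots."""
    bits = width * height * characters
    return (bits + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    # Matrix sizes mentioned in the description.
    for width, height in [(14, 16), (20, 20), (20, 24), (24, 24)]:
        print(f"{width}x{height}: {bitmap_bytes(width, height):,} bytes")
```

Under these assumptions the totals run from about 1.4 million bytes for the 14x16 matrix to about 3.6 million bytes for the 24x24 matrix.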

These requirements have hitherto precluded the possibility of constructing a reasonably compact, self-contained typewriter for use in the Chinese and other ideographic languages.

With the advent of micro chip technology, it has become possible to provide facilities for storage of binary coded digital data in very compact semiconductor devices. It would be advantageous to provide a method of data storage and retrieval applicable to the composition of ideographic characters, which could utilize the advantages and abilities of such micro chip devices.

All of the more common oriental ideographic languages, e.g. Chinese, Japanese and Korean, possess a common feature, viz. most of the characters of any such language can be constructed from a respective set of basic character components, or so-called modified radicals. As used herein, a character component is a basic character element which may or may not have both linguistic and orthographic identity (by this it is meant that it may or may not have a meaning of its own in the linguistic sense and be visually represented alone in the orthographic sense) but which can form part of one or more composite characters comprising different geometrical arrangements of such character components. This construction of characters from relatively simple components is a well-known characteristic of ideographic languages and accordingly will not be discussed in greater detail herein.

The present invention provides a method of producing typographical data defining the characters used in an ideographic language, said characters each comprising one or more of a set of character components, as herein defined, comprising providing a store containing, in respect of each component, a set of coded signals defining the strokes of said component, and also containing data defining a plurality of patterns for the arrangement of the components within a character area, operating a character selector to generate an identifying signal which is unique to a selected character and which identifies the pattern associated with that character, and obtaining, in response to said identifying signal, from said store, typographical data corresponding to the various coded signals defining the strokes of all the components forming the selected character, and their positions within said character area according to the associated pattern.

The invention therefore seeks to reduce the data storage requirements by storing the component data and not the entire character data.

Preferably, each set of coded signals defines, in coordinate form, the relative positions of an array of points and indicates the manner in which the associated component may be visually represented by forming the strokes as straight lines between the points.

The invention will now be described more fully with reference to the accompanying drawings, in which: Figure 1 illustrates the steps involved in producing a unique code for a desired ideographic character, and the content and meaning of the various portions of the code produced; Figure 2 illustrates four out of a set of basic character patterns; Figure 3 illustrates, upon a rectangular, two dimensional dot matrix, the character to be constructed, and the method by which it is defined on the matrix; and Figures 4 to 7 illustrate four more characters as defined on the dot matrix.

As mentioned earlier, the character selection input device for the typewriter, or other composing apparatus operating in accordance with this invention, will be a keyboard depicting a set of components from which the characters of the language concerned can be constructed.

Although the keyboard will also include function keys necessary to select the usual typing functions such as space, tabulate, backspace, etc., it is the component keys which are of interest here, and the processes indicated by their operation. The operation of each key causes the production of a key signal unique to that key.

Between successive operations of a delimiter key provided on the keyboard, the component keys associated with the components which together make up a required character are operated. The keyboard thereby produces a set of key signals, each, for example, in the form of an 8-bit word.

Appropriate signal processing circuitry responds to such a set to produce an identifying signal, unique to the selected character, which can then be used to derive from a store the programmes associated with the components. It should be noted that it is not necessary to store the typographical data concerning the structures of the complete characters. However, there are certain variations which must be accounted for.
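The grouping of key signals between delimiter operations can be sketched as follows. This is only an illustration: the delimiter code `0xFF` and the function name `group_key_signals` are hypothetical, and the mapping from a set of key signals to the identifying signal is deliberately left out because the text does not specify it.

```python
from typing import Iterable, List

DELIMITER = 0xFF  # hypothetical 8-bit code for the delimiter key


def group_key_signals(signals: Iterable[int],
                      delimiter: int = DELIMITER) -> List[List[int]]:
    """Collect the component key signals entered between delimiter presses.

    Each returned inner list is the set of key signals for one character.
    Keys typed after the last delimiter are not yet a complete character
    and are therefore not returned.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    for code in signals:
        if not 0 <= code <= 0xFF:
            raise ValueError(f"key signal {code!r} is not an 8-bit word")
        if code == delimiter:
            if current:  # ignore repeated delimiters with nothing between
                groups.append(current)
                current = []
        else:
            current.append(code)
    return groups


if __name__ == "__main__":
    stream = [0xFF, 0x12, 0x34, 0xFF, 0x56, 0xFF, 0x78]
    print(group_key_signals(stream))
```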

Firstly, a given component will not always appear in the same part of the character area for different characters, and the relative location of components varies for characters comprising the same number of components. A thorough investigation of the various different component arrangements has shown that the component arrangements of the complete vocabulary of Chinese characters can be reduced to a set of 54 different basic "patterns". A smaller set is applicable where a reduced character set, or vocabulary, is to be used. Four of these patterns are given in Figure 2.

Each of these patterns consists of an overall character area comprising one or more component areas; the data defining the basic patterns is stored in the data store. The first pattern illustrated is the simplest and is applicable where the selected character consists of a single component; no subdivision of the character area is required, and this area is therefore equivalent to the single component area. The second pattern consists of a single vertical subdivision of the character area into two component areas. This pattern is, as shown in Figure 1, applicable to the character illustrated in Figure 3, which consists of two components, each to be represented in one of the two component areas. It will be appreciated that the use of a different pair of components to construct a different character from that shown in Figure 3, but with the same basic mutual disposition, may require a sideways shift of the notional boundary line to vary the relative sizes of the component areas. This is applicable to most of the basic patterns. Thus, as shown in Figure 1, a first part A of the identifying signal defines the particular pattern stored in the data store into which the components will fit to define the associated character, and also defines the exact positions of the boundary line or lines.

A second variation of the components is that of size; i.e. a given component may be larger in one character than in another. Investigation on this point has shown that, including the size variants, the complete range of Chinese characters can be constructed using about 600 component forms.

When defined in programmes in the manner to be described later herein, the structure of all of these component forms can readily be stored in a microchip storage device.

A second part B of the identifying signal represents the number of components in the selected character, and the subsequent portions C, D etc. identify the particular component forms.

When applied to the data store, the character code signal initiates the output of the data programmes defining the component form structure.
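The layout of the identifying signal (part A for the pattern and boundary positions, part B for the number of components, parts C, D etc. for the component forms) might be modelled as below. This is a sketch under stated assumptions: the class name, field names, pattern numbering and component form numbers are all hypothetical, and the text gives no field widths or encodings.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class IdentifyingSignal:
    """Illustrative model of the character code signal (names are hypothetical)."""
    pattern: int                      # part A: which stored basic pattern applies
    boundaries: Tuple[int, ...]       # part A: exact boundary line positions
    component_count: int              # part B: number of components
    component_forms: Tuple[int, ...]  # parts C, D, ...: component form numbers

    def __post_init__(self) -> None:
        # Part B must agree with the number of component form parts present.
        if self.component_count != len(self.component_forms):
            raise ValueError("part B does not match the number of component forms")

    def parts(self) -> List[object]:
        """Return the signal as the ordered parts A, B, C, D, ..."""
        return [(self.pattern, self.boundaries),
                self.component_count,
                *self.component_forms]


if __name__ == "__main__":
    # Two-component character on a vertically split pattern with the boundary
    # at point position 6. Pattern number 2 and form numbers 17 and 42 are
    # made-up values for illustration.
    signal = IdentifyingSignal(pattern=2, boundaries=(6,),
                               component_count=2, component_forms=(17, 42))
    print(signal.parts())
```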

Figure 3 illustrates a character depicted as a dot pattern on a 14x16 matrix. The pattern consists of two components, each formed by strokes comprising straight lines, and it is therefore possible to store the constructional data necessary to define these components as sets of coordinates defining the end points of these straight lines. The sets, or programmes, defining the Figure 3 components provide the following digital coordinate data:

1. 105-501 405-801 603-1603

2. 402-406-806-802 601-608 1002-1006 1202-1206 104-1604

The first set above defines three lines. The first begins at the point which is the fifth matrix point in the first line and ends (denoted by the minus sign) at the first point in the fifth line. Accordingly, each point is defined by the row number followed by the point number in that row, counting from left to right. Following this scheme it will be seen how the left hand component is defined by the first set, or programme, above.
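The coordinate scheme just described can be illustrated with a small parser. It is a sketch only: it assumes that the point number always occupies the last two digits of each coordinate and that whitespace separates independent strokes, and it takes the first programme as the three lines 105-501, 405-801 and 603-1603. The function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (row number, point number in that row), both 1-based


def parse_point(token: str) -> Point:
    """Split a coordinate such as '105' into (1, 5) or '1603' into (16, 3)."""
    if not token.isdigit() or len(token) not in (3, 4):
        raise ValueError(f"malformed coordinate: {token!r}")
    return int(token[:-2]), int(token[-2:])


def parse_programme(text: str) -> List[List[Point]]:
    """Parse a programme into chains of points.

    Points joined by '-' form a chain of straight lines; chains separated
    by whitespace are independent strokes.
    """
    return [[parse_point(tok) for tok in chain.split("-")]
            for chain in text.split()]


if __name__ == "__main__":
    first_programme = "105-501 405-801 603-1603"
    for chain in parse_programme(first_programme):
        print(chain)
```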

The second programme above defines the right hand component. It will be noted that the coordinates define the component as if it were to be depicted in the left hand component area. This is to allow for the shifting of the component towards the right by an amount determined by the previously mentioned vertical boundary line position. In the Figure 3 character, this boundary line extends from point 106 of the character area to point 1606. Thus, when the second programme is read out, it can readily be modified to place the component in the right hand component area by adding 6 to each point position number. In this way, any component form can be shifted about the character area by a common modification of the programme to produce a common shift of all the point coordinates defined therein. This modification is performed at the time that the programme is read out, in accordance with the data contained in portion A of the character code signal concerning the exact positions of the boundary lines forming the patterns stored in the data store.
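The common shift described above amounts to adding a constant to every point position number. The following sketch applies a shift of 6 to the second programme, read here as the strokes 402-406-806-802, 601-608, 1002-1006, 1202-1206 and 104-1604. The function names are hypothetical, and only a horizontal shift is shown.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (row number, point number in that row)
Chain = List[Point]


def parse_programme(text: str) -> List[Chain]:
    """Parse e.g. '402-406 601-608' into chains of (row, point) pairs."""
    return [[(int(tok[:-2]), int(tok[-2:])) for tok in chain.split("-")]
            for chain in text.split()]


def format_programme(chains: List[Chain]) -> str:
    """Write chains back out in the row-then-two-digit-point notation."""
    return " ".join("-".join(f"{row}{col:02d}" for row, col in chain)
                    for chain in chains)


def shift_right(chains: List[Chain], offset: int) -> List[Chain]:
    """Add `offset` to every point position number, leaving rows unchanged."""
    return [[(row, col + offset) for row, col in chain] for chain in chains]


if __name__ == "__main__":
    stored = "402-406-806-802 601-608 1002-1006 1202-1206 104-1604"
    shifted = shift_right(parse_programme(stored), 6)
    print(format_programme(shifted))
    # prints: 408-412-812-808 607-614 1008-1012 1208-1212 110-1610
```

After the shift, the largest point position number is 14, so the component stays within a matrix that is 14 points wide.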

An appropriate interface decoder translates the programmes of coordinate points into sets of point coordinates defining all of the matrix points forming the component strokes, and feeds appropriate signals to operate a dot matrix printer or other device as required.
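One way such a decoder could expand stroke end points into the full set of matrix points is Bresenham's line algorithm. The text does not say how the decoder works, so the choice of algorithm here is an assumption, as are the function names and the reading of the matrix as 16 rows of 14 points. The strokes used are those of the left hand component of Figure 3.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]  # (row number, point number in that row), 1-based


def line_points(start: Point, end: Point) -> List[Point]:
    """All matrix points on the straight line from start to end (Bresenham)."""
    (r0, c0), (r1, c1) = start, end
    dr, dc = abs(r1 - r0), abs(c1 - c0)
    step_r = 1 if r1 >= r0 else -1
    step_c = 1 if c1 >= c0 else -1
    err = dc - dr
    points: List[Point] = []
    r, c = r0, c0
    while True:
        points.append((r, c))
        if (r, c) == (r1, c1):
            return points
        e2 = 2 * err
        if e2 > -dr:   # step along the row (change the point number)
            err -= dr
            c += step_c
        if e2 < dc:    # step to the next row
            err += dc
            r += step_r


def rasterize(chains: List[List[Point]]) -> Set[Point]:
    """Expand every chain of end points into the dots that form its strokes."""
    dots: Set[Point] = set()
    for chain in chains:
        for a, b in zip(chain, chain[1:]):
            dots.update(line_points(a, b))
    return dots


def render(dots: Set[Point], rows: int = 16, cols: int = 14) -> str:
    """Draw the dots as text, '#' for a dot and '.' for an empty position."""
    return "\n".join(
        "".join("#" if (r, c) in dots else "." for c in range(1, cols + 1))
        for r in range(1, rows + 1))


if __name__ == "__main__":
    left_component = [[(1, 5), (5, 1)], [(4, 5), (8, 1)], [(6, 3), (16, 3)]]
    print(render(rasterize(left_component)))
```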

Figures 4 to 7 show four more characters as depicted by dot matrix patterns. The component programmes as stored in the data store to define these characters are as follows:

Figure 4
1. 102-405 601-904 1601-1205
2. 101-303 203-208-1608
3. 101-106-606-1206-1201-601-101 601-606

Figure 5
1. 103-406 111-408 503-511 803-811 1102-1112 507-1607

Figure 6
1. 102-405 601-904 1601-1205
2. 102-108-308-302 102-1502-1601
3. 102-1202-1206-1006 901-406

Figure 7
1. 501-505 103-1603-1502 1101-804
2. 101-108-1608-1601-101 1101
3. 302-306

The notional dividing lines between component areas are shown in dashed lines in these figures.

The characters of Figures 4, 6 and 7 each comprise three components, while that of Figure 5 comprises a single component.

The data store therefore stores, in digital form, the vector structure of the component forms, and the general schemes of patterns of subdivision of the character area. The effect of the application of the identifying signal is to determine which of the patterns is applicable and the exact positions of the subdividing boundary lines, and to derive from the store the programmes, adjusted in accordance with the pattern so selected, defining the component structures.

This method obviates the necessity to store the constructional data of each individual character.

Claims

1. A method of producing typographical data defining the characters used in an ideographic language, said characters each comprising one or more of a set of character components, as herein defined, comprising providing a store containing, in respect of each component, a set of coded signals defining the strokes of said component, and also containing data defining a plurality of patterns for the arrangement of the components within a character area, operating a character selector to generate an identifying signal which is unique to a selected character and which identifies the pattern associated with that character, and obtaining, in response to said identifying signal, from said store, typographical data corresponding to the various coded signals defining the strokes of all the components forming the selected character, and their positions within said character area according to the associated pattern.

2. A method according to claim 1 wherein each set of coded signals defines the relative positions of an array of points and indicates the manner in which the associated component may be visually represented by forming the strokes as straight lines between the points.

3. A method according to claim 2 wherein the positions of said points are defined as coordinates in a two dimensional frame of reference.

4. A method according to any preceding claim, wherein said data defining said patterns determines the positions of rectilinear boundary lines for subdividing the character area into a plurality of component areas to include the constituent components constituting the characters.

5. A method according to claim 4, when dependent on claim 2 or claim 3, wherein the typographical data obtained from the store is derived from the sets of coded signals by reference to the pattern identified by the identifying signal so as to place the components accurately within the character area with reference to said boundary lines.

6. A method of producing typographical data defining the characters used in an ideographic language, substantially as hereinbefore described, with reference to the accompanying drawings.