WO2022180725A1

WO2022180725A1 - Character recognition device, program, and method

Info

Publication number: WO2022180725A1
Application number: PCT/JP2021/007012
Authority: WO
Inventors: 直樹渡辺
Original assignee: 株式会社ワコム
Priority date: 2021-02-25
Filing date: 2021-02-25
Publication date: 2022-09-01
Also published as: CN116075868A; JPWO2022180725A1

Abstract

Provided are a character recognition device, a program, and a method, with which it is possible to improve the precision of recognition of characters in relation to vector-format stroke data. The character recognition device 16 acquires stroke data 48 indicating an aggregation of indication positions and pen pressure values detected in sequence via the handwriting operations of a user, uses the acquired stroke data 48 to generate a feature amount set relating to a stroke group 52 handwritten with the handwriting operations, and uses the generated feature amount set to identify individual characters or character strings indicated by the stroke group 52. The feature amount set includes terminal feature amounts relating to the terminal shapes of each stroke.

Description

Character recognition device, program and method

The present invention relates to a character recognition device, program and method.

Conventionally, there has been known a character recognition technology that recognizes characters by processing data representing single characters or character strings. When the data to be processed is in raster format, that is, in the form of an image, a method of recognizing characters using a convolutional neural network is often used (see

Patent Documents

1 and 2, for example).

Japanese Patent Application Laid-Open No. 2020-027598 JP 2020-119559 A

However, if the data to be processed is in vector format, such as stroke data (or digital ink), using a convolutional neural network is not necessarily effective. In other words, technical ingenuity is required to improve character recognition accuracy.

The present invention has been made in view of such problems, and its object is to provide a character recognition device, program, and method capable of improving the character recognition accuracy for vector format stroke data.

A character recognition apparatus according to a first aspect of the present invention comprises a data acquisition unit for acquiring stroke data indicating a set of designated positions and writing pressure values sequentially detected through a user's handwriting operation; a feature amount generation unit that uses data to generate a feature amount set related to a group of strokes written by the writing operation; and a unit represented by the stroke group using the feature amount set generated by the feature amount generation unit. and a character identification unit for identifying a character or character string, wherein the feature amount set includes an end feature amount related to the end shape of each stroke.

A character recognition program according to a second aspect of the present invention comprises an obtaining step of obtaining stroke data indicating a set of indicated positions and writing pressure values sequentially detected through a user's writing operation; a generation step of generating a feature amount set related to a group of strokes written by a writing operation; and an identification step of identifying a single character or character string indicated by the stroke group using the generated feature amount set. , and the feature set includes the end feature for the end shape of each stroke.

In the character recognition method according to the third aspect of the present invention, an acquiring step of acquiring stroke data indicating a set of indicated positions and writing pressure values sequentially detected through a user's writing operation; a generation step of generating a feature amount set related to a group of strokes written by a writing operation; and an identification step of identifying a single character or character string indicated by the stroke group using the generated feature amount set. is executed, and the feature set includes the end feature for the end shape for each stroke.

According to the present invention, it is possible to improve the character recognition accuracy for vector format stroke data.

1 is an overall configuration diagram of an input system incorporating a character recognition device according to an embodiment of the present invention; FIG. 2 is a flow chart showing the operation of the character recognition device of FIG. 1; 2 is a diagram showing an example of a data structure of ink data in FIG. 1; FIG. FIG. 4 is a diagram showing an example of content visualized by the stroke data of FIG. 3; FIG. FIG. 10 is a diagram illustrating an example of a method of generating feature quantity sets for each bounding box; FIG. 7 is a diagram showing an example of changes in writing pressure when writing strokes with different end shapes. It is a figure which shows an example of the network structure which a discriminator has. FIG. 10 is a diagram showing an example of a character string recognition result; FIG. 11 is a diagram showing an example of a network structure of discriminators in another example; It is a figure which shows an example of the data structure which character information has.

Embodiments of the present invention will be described below with reference to the accompanying drawings. In order to facilitate understanding of the description, the same components and steps in each drawing are denoted by the same reference numerals as much as possible, and redundant description may be omitted.

[Overall Configuration of Input System 10]
FIG. 1 is an overall configuration diagram of an input system 10 incorporating a character recognition device 16 according to one embodiment of the present invention. This input system 10 is configured to be able to provide a "digital ink service" that converts characters written by a user into text data and handles them. This input system 10 specifically includes one or more user devices 12 , one or more electronic pens 14 , and a character recognition device 16 .

The user device 12 is a computer owned by a user who uses the digital ink service, and has a function of detecting the position indicated by the electronic pen 14 . The user device 12 is composed of, for example, a tablet, a smart phone, a personal computer, or the like.

The electronic pen 14 is a pen-type pointing device, and is configured to be capable of unidirectionally or bidirectionally communicating with the user device 12 . The electronic pen 14 is, for example, an active electrostatic coupling (AES) or electromagnetic induction (EMR) stylus. The user can write pictures and characters on the user device 12 by holding the electronic pen 14 and moving the pen tip while pressing it against the touch surface of the user device 12 .

The character recognition device 16 is a computer that performs general control regarding character recognition, and may be either a cloud type or an on-premise type. Here, although the character recognition device 16 is illustrated as a single computer, the character recognition device 16 may alternatively be a group of computers constructing a distributed system. The character recognition device 16 specifically includes a communication section 20 , a control section 22 and a storage section 24 .

The communication unit 20 is an interface that transmits and receives electrical signals to and from an external device. Thereby, the character recognition device 16 acquires the ink data 42 from the user device 12 and supplies the text data 44 generated by itself to the user device 12 .

The control unit 22 is composed of processors including a CPU (Central Processing Unit) and a GPU (Graphics Processing Unit). The control unit 22 reads and executes programs and data stored in the storage unit 24, thereby functioning as a data acquisition unit 26, a recognition processing unit 28, and an output processing unit 30, all of which will be described later.

The storage unit 24 stores programs and data necessary for the control unit 22 to control each component. The storage unit 24 is composed of a non-transitory computer-readable storage medium. Here, computer-readable storage media include magneto-optical disks, ROMs, CD-ROMs, portable media such as flash memory, hard disks built into computer systems (HDD: Hard Disk Drive), solid state drives (SSD: Solid State Drive) and other storage devices.

In the storage unit 24, a database related to digital ink (hereinafter referred to as a digital ink DB 40) is constructed, and ink data 42, text data 44, and parameter sets 46 are stored.

[Operation of Character Recognition Device 16]
The input system 10 in this embodiment is configured as described above. Next, the operation of the character recognition device 16 forming part of the input system 10 will be described with reference to the flow chart of FIG. 2 and FIGS. 3 to 8. FIG.

<Description of flow chart>
In step SP10 of FIG. 2, the data acquisition unit 26 acquires data (hereinafter referred to as "stroke data 48") indicating a set of indicated positions and writing pressure values sequentially detected through the user's writing operation. Specifically, the data acquisition unit 26 extracts and acquires the stroke data 48 from the ink data 42 supplied from the user device 12 .

FIG. 3 is a diagram showing an example of the data structure of the ink data 42 of FIG. In this figure, an example of InkML (Ink Markup Language) written in XML (eXtensible Markup Language) format will be described. The data format of the ink data 42, the so-called "ink description language", is not limited to InkML, and may be WILL (Wacom Ink Layer Language) or ISF (Ink Serialized Format), for example.

In the example of this figure, the ink data 42 includes stroke data 48 describing the writing state of at least one stroke. In the example of this figure, the writing state of a total of 20 strokes from the top line to the bottom line is described in the writing order of the strokes. Also, one stroke is described by a plurality of point data sequentially arranged within the <trace> tag. For convenience of illustration, only the point data indicating the start and end points of the stroke are shown, and the point data indicating a plurality of waypoints are omitted.

Each point data consists of a combination of the indicated position (X coordinate value, Y coordinate value) of the electronic pen 14 and the pen pressure value (eg, 8-bit level value), and is delimited by a delimiter such as a comma. This point data may include the writing order of strokes, the tilt angle of the electronic pen 14, and the like, in addition to the indicated positions and pen pressure values described above.

FIG. 4 is a diagram showing an example of content visualized by the stroke data 48 of FIG. On the display panel of the user device 12, a two-dimensional coordinate system (hereinafter referred to as sensor coordinate system: XY) is defined for specifying a touch position. A stroke group 52 consisting of 20 strokes is arranged in the entire area 50 of the sensor coordinate system. A stroke group 52 indicates a character string composed of kanji and hiragana.

The leading kanji characters on the left side of the drawing include an end 54 indicating the trace of "stop", an end 56 indicating the trace of "push", and an end 58 indicating the trace of "bounce". “Stop” is an operation of releasing the pen tip from the touch surface after suddenly stopping the pen tip during movement. "Pay" is an operation of sliding the tip of the pen while it is moving and releasing it from the touch surface. "Bounce" is the operation of suddenly stopping the moving pen tip and then performing the above-described "swipe" while changing the direction of the pen tip.

At step SP12 in FIG. 2, the recognition processing unit 28 designates a bounding box 60 to be recognized from among the entire area 50 defined by the stroke data 48 acquired at step SP10. The shape of the bounding box 60 is determined by four variables: the X- and Y-coordinate values of the reference position (for example, the vertex closest to the origin O), the length of the side in the X direction, and the length of the side in the Y direction. Uniquely defined.

At step SP14, the recognition processing unit 28 (more specifically, the feature amount generation unit 32) generates a feature amount set within the bounding box 60 specified at step SP12. Here, the feature amount set includes [1] a feature amount related to the trajectory of each stroke (hereinafter referred to as a trajectory feature amount) and [2] a feature amount related to the end shape of each stroke (hereinafter referred to as an end feature amount). be

FIG. 5 is a diagram showing an example of a method of generating feature quantity sets for each bounding box. The feature amount generation unit 32 defines a two-dimensional coordinate system (hereinafter referred to as normalized coordinate system; X'-Y') corresponding to the touch surface of the user device 12 . The origin O' of this normalized coordinate system corresponds to the vertex of the bounding box 60 that is closest to the origin O of the sensor coordinate system. The X'-axis of the normalized coordinate system is parallel to the X-axis of the sensor coordinate system and the Y'-axis of the normalized coordinate system is parallel to the Y-axis of the sensor coordinate system. The scales of the X' and Y' axes are set so that the coordinates of the four vertices forming the bounding box 60 are (0, 0), (1, 0), (0, 1), (1, 1). is normalized to

The feature quantity generating unit 32 refers to the stroke data 48 (FIG. 3) and acquires the coordinate values (X, Y) of the stroke start point Ps and the coordinate values (X, Y) of the end point Pe of the stroke in the sensor coordinate system. Then, the feature amount generation unit 32 performs linear transformation of the coordinate system to obtain the coordinate values (X′, Y′) of the stroke start point Ps and the coordinate values (X′, Y′) of the end point Pe of the stroke in the normalized coordinate system. ) are derived respectively. Note that when the start point Ps or the end point Pe of the stroke is outside the bounding box 60, the feature amount generating unit 32 generates the trajectory feature amount by regarding the via points on the boundary line as the virtual start point Ps or the end point Pe. Just do it.

FIG. 6 is a diagram showing an example of changes in writing pressure when writing strokes with different end shapes. The horizontal axis of the graph indicates time (t), and the vertical axis of the graph indicates writing pressure. The time t=Ts corresponds to the execution time of the pen-down operation of pressing the electronic pen 14 against the touch surface. On the other hand, the time t=Te corresponds to the execution time point of the pen-up operation of releasing the electronic pen 14 from the touch surface.

When the end shape is "stop", the pen pressure tends to drop sharply just before t=Te while remaining substantially constant. When the end shape is "sweep", the writing pressure tends to gradually decrease before t=Te while remaining substantially constant. When the end shape is "bounce", the pen pressure temporarily increases before t=Te while keeping approximately constant, and then at an intermediate slope between "stop" and "bounce". tend to decline.

The feature quantity generation unit 32 estimates the end shape for each stroke using various methods including pattern matching. If an estimation result is obtained that does not correspond to any of stop, sweep, and bounce, an attribute of "unknown" or "not applicable" may be assigned. Note that the end feature quantity is not limited to identification values such as stop, sweep, bounce, etc., but various values that indicate shape features (for example, the slope at which the pen pressure value decreases, the position at which the pen pressure value begins to decrease, the time series of the pen pressure value, etc.) etc.).

At step SP16 in FIG. 2, the recognition processing unit 28 (more specifically, the character identification unit 34) performs character identification processing using the feature quantity set generated at step SP14. Prior to this identification, the character identification unit 34 selects a classifier 70 suitable for the type of character from among a plurality of machine-learned classifiers 70 . Types of characters include, for example, kanji, hiragana, katakana, Arabic numerals, alphabets, and symbols.

FIG. 7 is a diagram showing an example of the network structure of the discriminator 70. FIG. The discriminator 70 is composed of, for example, a hierarchical neural network consisting of an input layer 72 , an intermediate layer 74 and an output layer 76 . The calculation rule of the discriminator 70 is determined by the values of the parameter set 46 (FIG. 1), which is a collection of learning parameters. The parameter set 46 may include, for example, coefficients describing activation functions of arithmetic units, weighting coefficients corresponding to the strength of synaptic connections, the number of arithmetic units that make up each layer, the number of intermediate layers 74, and the like. The parameter set 46 is stored in the storage unit 24 (FIG. 1) in a state in which each value has been determined upon completion of machine learning, and is read out as needed.

The input layer 72 is a layer for inputting the trajectory feature amount and the terminal feature amount, and is composed of 5n arithmetic units in the example of this figure. Here, the natural number n corresponds to the number of strokes. This feature set includes: [1] X' coordinate of start point Ps, [2] Y' coordinate of start point Ps, [3] X' coordinate of end point Pe, [4] Y' coordinate of end point Pe, [5] end It is an input vector consisting of 5n components in which shape identification values are arranged in order of stroke ID.

The intermediate layer 74 is composed of one or more layers and has a dimensionality reduction function that reduces the number of dimensions of the feature quantity set. In other words, it is desirable that the number of arithmetic units constituting the intermediate layer 74 is sufficiently less than 5n.

The output layer 76 is a layer that outputs a group of character labels, and in the example of this figure, it is composed of operation units for the number of hiragana characters (for example, 46 excluding dakuten and handakuten). If the activation function of the arithmetic unit is the softmax function, this label set is a 46-component output vector representing the probabilities for each character.

Also, if the character type is "Kanji", the classifier 70 suitable for identifying Kanji may be selected. Alternatively, if the type of character cannot be identified, the identifier 70 suitable for identifying multiple types of characters (for example, hiragana and katakana) may be selected.

Then, the character identification unit 34 receives the feature quantity set corresponding to the bounding box 60 from the input layer 72 and outputs the label group corresponding to the bounding box 60 to the output layer 76 through the calculation by the intermediate layer 74 . For example, if the label values indicate probabilities, the character with the highest probability is identified as the "character candidate."

At step SP18 in FIG. 2, the recognition processing unit 28 confirms whether or not all bounding boxes 60 have been specified at step SP12. If an undesignated bounding box 60 remains (step SP18: NO), the process returns to step SP12.

In the second and subsequent recognition processes, the recognition processing unit 28 sequentially performs feature set generation (SP14) and character recognition (SP16) while variously changing the combination of the four variables that specify the bounding box 60. Then, when all bounding boxes 60 have been specified for the entire area 50 (step SP18: YES), the process proceeds to the next step SP20.

At step SP20, the recognition processing unit 28 aggregates the identification results at step SP16 and generates text data 44 indicated by the stroke group 52. Specifically, the recognition processing unit 28 extracts four characters by selecting the combination of the bounding box 60 and the label that maximizes the probability.

As shown in FIG. 8, bounding boxes B1, B2, B3, B4 are defined that respectively enclose the four characters. Text data 44 meaning "to write a character" in Japanese is obtained by successively concatenating the four characters from the left side to the right side. Ink data 42 and text data 44 are digital ink DB

At step SP22 in FIG. 2, the output processing unit 30 outputs the text data 44 generated at step SP20 in response to a request from the user device 12. Thus, a series of operations (flowchart in FIG. 2) by the character recognition device 16 are completed.

<Another example of identification>
FIG. 9 is a diagram showing an example of the network structure of discriminator 80 in another example. The discriminator 80 is composed of an input layer 82 having a different number of arithmetic units from the input layer 72 in FIG.

The input layer 82 is a layer for inputting only the trajectory feature amount, and in the example of this figure, it is composed of 4n (n is the number of strokes) arithmetic units. This feature set includes [1] the X' coordinate of the start point Ps, [2] the Y' coordinate of the start point Ps, [3] the X' coordinate of the end point Pe, and [4] the Y' coordinate of the end point Pe. It is an input vector consisting of 4n elements arranged in order.

FIG. 10 is a diagram showing an example of the data structure of the character information 84. FIG. The character information 84 is data in the form of a table showing correspondence between [1] character identification information, [2] stroke order of strokes, [3] types of end shapes, and [4] importance of end shapes. is.

　Hereinafter, the method of recognizing characters with the configurations in FIGS. 9 and 10 will be described according to the flowchart in FIG. Steps other than steps SP16 and SP20 are the same as those described above, and therefore will not be described again.

At step SP16 in FIG. 2, the recognition processing unit 28 (more specifically, the character identification unit 34) performs character identification processing using the feature quantity set generated at step SP14. Prior to this identification, the character identification unit 34 selects one or more classifiers 80 from a plurality of machine-learned classifiers 80 (FIG. 9).

Then, the character identification unit 34 inputs a feature amount set (here, only the trajectory feature amount) corresponding to the bounding box 60 from the input layer 92, and through the calculation by the intermediate layer 74, the label group corresponding to the bounding box 60. Output to output layer 76 . For example, if the label values indicate probabilities, the characters with the highest probabilities (for example, the highest or the top three) are identified as "character candidates."

At step SP20 in FIG. 2, the recognition processing unit 28 aggregates the identification results at step SP16 and generates text data 44 indicated by the stroke group 52. Specifically, the recognition processing unit 28 extracts four characters by selecting the label with the highest degree of matching of the terminal shape from among the plurality of selected character candidates. Specifically, the degree of matching between the end shape estimated in step SP14 and the end shape of the character information 84 is evaluated for each stroke. The degree of matching is evaluated, for example, by summing scores weighted according to the degree of importance of the character information 84 .

In this way, the recognition processing unit 28 selects a plurality of character candidates using the discriminator 80 whose input is a feature amount set that does not include the terminal feature amount, and specifies a character using the terminal feature amount. good.

[Effect of Character Recognition Device 16]
As described above, the character recognition device 16 includes the data acquisition unit 26 for acquiring the stroke data 48 indicating a set of pointing positions and writing pressure values sequentially detected through the user's handwriting operation, and A feature amount generation unit 32 that generates a feature amount set related to a stroke group 52 written by a writing operation using the stroke data 48 obtained by writing, and a stroke group 52 using the feature amount set generated by the feature amount generation unit 32 and a character identification unit 34 for identifying a single character or character string indicated by . The feature amount set includes the end feature amount related to the end shape for each stroke.

In this character recognition method and program, the character recognition device 16 as a computer acquires stroke data 48 indicating a set of indicated positions and writing pressure values sequentially detected through a user's writing operation (SP10). , a generation step (SP14) of generating a feature amount set related to the stroke group 52 written by the writing operation using the acquired stroke data 48; and an identification step (SP16) for identifying the character or character string of . The feature amount set includes the end feature amount related to the end shape for each stroke.

Thus, since the feature amount set includes the end feature amount related to the end shape of each stroke, it is possible to accurately capture the unique shape features of handwritten characters, and the character can improve the recognition accuracy of

In addition, the character identification unit 34 may be configured to include a classifier 70 that receives as input a feature amount set including the terminal feature amount and outputs a probability for each character. Alternatively, the character identification unit 34 is configured to include a discriminator 80 that receives a feature amount set that does not include the terminal feature amount as input and outputs a probability for each character, and when a plurality of character candidates are obtained by the discriminator 80 , may be used to identify characters.

It should be noted that the end feature amount may be a value that identifies stopping, sweeping, or bouncing. Also, the types of characters may include at least one of kanji, hiragana, katakana, and Arabic numerals.

[Modification]
It goes without saying that the present invention is not limited to the above-described embodiments, and can be freely modified without departing from the gist of the present invention. Alternatively, each configuration may be arbitrarily combined as long as there is no technical contradiction.

In the above embodiment, the classifiers 70 and 80 (see FIGS. 7 and 9) were constructed using hierarchical neural networks, but the machine learning method is not limited to this. For example, various techniques may be employed including support vector machines, decision trees (eg, random forests), boosting methods (eg, gradient boosting methods).

In the above-described embodiment, the case where the feature amount generation unit 32 (FIG. 1) generates the trajectory feature amount using only the start point and end point of the stroke has been described, but the method for generating the trajectory feature amount is not limited to this. For example, the feature amount generation unit 32 may generate the trajectory feature amount using at least one waypoint (for example, the middle point between the start point and the end point) in addition to the start point and the end point.

[Description of symbols]
DESCRIPTION OF SYMBOLS 10... Input system, 12... User device, 14... Electronic pen, 16... Character recognition device, 26... Data acquisition part, 28... Recognition processing part, 30... Output processing part, 32... Feature amount generation part, 34... Character recognition Part 48... Stroke data 52...

Stroke group

70, 80... Discriminator

Claims

a data acquisition unit that acquires stroke data indicating a set of designated positions and writing pressure values that are sequentially detected through a user's writing operation;
a feature amount generation unit that generates a feature amount set related to a group of strokes written by the writing operation using the stroke data acquired by the data acquisition unit;
a character identification unit that identifies a single character or character string indicated by the stroke group using the feature amount set generated by the feature amount generation unit;
with
The character recognition device, wherein the feature amount set includes an end feature amount related to an end shape of each stroke.
The character identification unit includes a classifier that receives a feature set including the terminal feature as input and outputs a probability for each character,
2. The character recognition device according to claim 1.
The character identification unit
A discriminator that receives as input a feature set that does not include the terminal feature and outputs a probability for each character,
When a plurality of character candidates are obtained by the classifier, identifying a character using the terminal feature amount;
2. The character recognition device according to claim 1.
The terminal feature amount is a value that identifies stopping, sweeping, or bouncing,
A character recognition device according to any one of claims 1 to 3.
The character identification unit identifies at least one type of character among kanji, hiragana, katakana, and Arabic numerals.
A character recognition device according to any one of claims 1 to 3.
an acquiring step of acquiring stroke data indicating a set of pointing positions and writing pressure values sequentially detected through a user's writing operation;
a generation step of generating a feature amount set related to a group of strokes written by the handwriting operation, using the acquired stroke data;
an identification step of identifying a single character or character string indicated by the stroke group using the generated feature amount set;
on the computer, and
The character recognition program, wherein the feature amount set includes an end feature amount related to an end shape of each stroke.
an acquiring step of acquiring stroke data indicating a set of pointing positions and writing pressure values sequentially detected through a user's writing operation;
a generation step of generating a feature amount set related to a group of strokes written by the writing operation using the acquired stroke data;
an identification step of identifying a single character or character string indicated by the stroke group using the generated feature set;
is executed by the computer and
The character recognition method, wherein the feature amount set includes an end feature amount related to an end shape of each stroke.