WO2022180725A1 - Character recognition device, program, and method - Google Patents

Character recognition device, program, and method

Info

Publication number
WO2022180725A1
Authority
WO
WIPO (PCT)
Prior art keywords
character
feature amount
stroke
character recognition
recognition device
Prior art date
Application number
PCT/JP2021/007012
Other languages
French (fr)
Japanese (ja)
Inventor
Naoki Watanabe (直樹 渡辺)
Original Assignee
Wacom Co., Ltd. (株式会社ワコム)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wacom Co., Ltd.
Priority to JP2023501748A (JPWO2022180725A1)
Priority to CN202180056714.2A (CN116075868A)
Priority to PCT/JP2021/007012 (WO2022180725A1)
Publication of WO2022180725A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing

Definitions

  • The present invention relates to a character recognition device, a program, and a method.
  • The present invention has been made in view of such problems, and its object is to provide a character recognition device, a program, and a method capable of improving character recognition accuracy for vector-format stroke data.
  • A character recognition apparatus comprises: a data acquisition unit that acquires stroke data indicating a set of indicated positions and pen pressure values sequentially detected through a user's writing operation; a feature amount generation unit that uses the acquired stroke data to generate a feature amount set related to the group of strokes written by the writing operation; and a character identification unit that uses the feature amount set generated by the feature amount generation unit to identify the single character or character string represented by the stroke group. The feature amount set includes an end feature amount related to the end shape of each stroke.
  • A character recognition program causes a computer to execute: an acquisition step of acquiring stroke data indicating a set of indicated positions and pen pressure values sequentially detected through a user's writing operation; a generation step of generating, using the acquired stroke data, a feature amount set related to the group of strokes written by the writing operation; and an identification step of identifying, using the generated feature amount set, the single character or character string indicated by the stroke group. The feature amount set includes an end feature amount related to the end shape of each stroke.
  • A character recognition method causes a computer to execute the same acquisition, generation, and identification steps, with the feature amount set likewise including an end feature amount related to the end shape of each stroke.
  • FIG. 1 is an overall configuration diagram of an input system incorporating a character recognition device according to an embodiment of the present invention.
  • FIG. 2 is a flow chart showing the operation of the character recognition device of FIG. 1.
  • FIG. 3 is a diagram showing an example of the data structure of the ink data of FIG. 1.
  • FIG. 4 is a diagram showing an example of content visualized by the stroke data of FIG. 3.
  • FIG. 5 is a diagram showing an example of a method of generating feature amount sets for each bounding box.
  • FIG. 6 is a diagram showing an example of changes in pen pressure when writing strokes with different end shapes.
  • FIG. 7 is a diagram showing an example of the network structure of a discriminator.
  • FIG. 8 is a diagram showing an example of a character string recognition result.
  • FIG. 9 is a diagram showing an example of the network structure of a discriminator in another example.
  • FIG. 10 is a diagram showing an example of the data structure of character information.
  • FIG. 1 is an overall configuration diagram of an input system 10 incorporating a character recognition device 16 according to one embodiment of the present invention.
  • This input system 10 is configured to provide a "digital ink service" that converts characters handwritten by a user into text data and handles them.
  • Specifically, this input system 10 includes one or more user devices 12, one or more electronic pens 14, and a character recognition device 16.
  • The user device 12 is a computer owned by a user of the digital ink service and has a function of detecting the position indicated by the electronic pen 14.
  • The user device 12 is, for example, a tablet, a smartphone, or a personal computer.
  • The electronic pen 14 is a pen-type pointing device configured to communicate unidirectionally or bidirectionally with the user device 12.
  • The electronic pen 14 is, for example, an active electrostatic coupling (AES) or electromagnetic induction (EMR) stylus.
  • The character recognition device 16 is a computer that performs overall control of character recognition, and may be either cloud-based or on-premise.
  • Although the character recognition device 16 is illustrated as a single computer, it may instead be a group of computers forming a distributed system.
  • Specifically, the character recognition device 16 includes a communication unit 20, a control unit 22, and a storage unit 24.
  • The communication unit 20 is an interface that transmits and receives electrical signals to and from external devices. Through it, the character recognition device 16 acquires the ink data 42 from the user device 12 and supplies the text data 44 it generates to the user device 12.
  • The control unit 22 is composed of processors including a CPU (Central Processing Unit) and a GPU (Graphics Processing Unit).
  • The control unit 22 reads and executes the programs and data stored in the storage unit 24, thereby functioning as a data acquisition unit 26, a recognition processing unit 28, and an output processing unit 30, all described later.
  • The storage unit 24 stores the programs and data necessary for the control unit 22 to control each component.
  • The storage unit 24 is composed of a non-transitory computer-readable storage medium.
  • Computer-readable storage media include portable media such as magneto-optical disks, ROMs, CD-ROMs, and flash memory, as well as storage devices such as hard disk drives (HDD) and solid state drives (SSD) built into computer systems.
  • In the storage unit 24, a database related to digital ink (hereinafter, the digital ink DB 40) is constructed, and ink data 42, text data 44, and a parameter set 46 are stored.
  • The input system 10 of this embodiment is configured as described above. Next, the operation of the character recognition device 16, which forms part of the input system 10, is described with reference to the flowchart of FIG. 2 and FIGS. 3 to 8.
  • In step SP10 of FIG. 2, the data acquisition unit 26 acquires data indicating a set of indicated positions and pen pressure values sequentially detected through the user's writing operation (hereinafter, "stroke data 48"). Specifically, the data acquisition unit 26 extracts the stroke data 48 from the ink data 42 supplied from the user device 12.
  • FIG. 3 is a diagram showing an example of the data structure of the ink data 42 of FIG. 1. The figure shows an example of InkML (Ink Markup Language), which is written in XML (eXtensible Markup Language) format.
  • The data format of the ink data 42 is not limited to InkML, and may be, for example, WILL (Wacom Ink Layer Language) or ISF (Ink Serialized Format).
  • The ink data 42 includes stroke data 48 describing the writing state of at least one stroke.
  • In the example of the figure, the writing states of a total of 20 strokes are described from the top line to the bottom line, in the order in which the strokes were written.
  • One stroke is described by a plurality of point data arranged sequentially within the <trace> tag. For convenience of illustration, only the point data indicating the start and end points of each stroke are shown, and the point data indicating the waypoints in between are omitted.
  • Each point data consists of a combination of the indicated position of the electronic pen 14 (X coordinate value, Y coordinate value) and the pen pressure value (e.g., an 8-bit level value), and is delimited by a delimiter such as a comma.
  • In addition to the indicated position and pen pressure value described above, the point data may include the writing order of the strokes, the tilt angle of the electronic pen 14, and the like.
  • FIG. 4 is a diagram showing an example of content visualized by the stroke data 48 of FIG. 3.
  • On the display panel of the user device 12, a two-dimensional coordinate system for specifying the touch position (hereinafter, sensor coordinate system: X-Y) is defined.
  • A stroke group 52 consisting of 20 strokes is arranged within the entire area 50 of the sensor coordinate system.
  • The stroke group 52 represents a character string composed of kanji and hiragana.
  • The leading kanji character on the left side of the drawing includes an end 54 showing the trace of a "stop", an end 56 showing the trace of a "sweep", and an end 58 showing the trace of a "bounce".
  • A "stop" is an operation of abruptly stopping the moving pen tip and then releasing it from the touch surface.
  • A "sweep" is an operation of releasing the moving pen tip from the touch surface while letting it slide.
  • A "bounce" is an operation of abruptly stopping the moving pen tip and then performing the above "sweep" while changing its direction.
  • In step SP12 of FIG. 2, the recognition processing unit 28 designates a bounding box 60 to be recognized within the entire area 50 defined by the stroke data 48 acquired in step SP10.
  • The shape of the bounding box 60 is uniquely defined by four variables: the X and Y coordinate values of a reference position (for example, the vertex closest to the origin O), the length of the side in the X direction, and the length of the side in the Y direction.
  • In step SP14, the recognition processing unit 28 (more specifically, the feature amount generation unit 32) generates a feature amount set for the bounding box 60 designated in step SP12.
  • Here, the feature amount set includes [1] a feature amount related to the trajectory of each stroke (hereinafter, trajectory feature amount) and [2] a feature amount related to the end shape of each stroke (hereinafter, end feature amount).
  • FIG. 5 is a diagram showing an example of a method of generating a feature amount set for each bounding box.
  • The feature amount generation unit 32 defines a two-dimensional coordinate system (hereinafter, normalized coordinate system: X'-Y') corresponding to the touch surface of the user device 12.
  • The origin O' of this normalized coordinate system corresponds to the vertex of the bounding box 60 closest to the origin O of the sensor coordinate system.
  • The X' axis of the normalized coordinate system is parallel to the X axis of the sensor coordinate system, and the Y' axis is parallel to the Y axis.
  • The scales of the X' and Y' axes are normalized so that the coordinates of the four vertices of the bounding box 60 are (0, 0), (1, 0), (0, 1), and (1, 1).
  • The feature amount generation unit 32 refers to the stroke data 48 (FIG. 3) and acquires the coordinate values (X, Y) of the start point Ps and the end point Pe of each stroke in the sensor coordinate system. The feature amount generation unit 32 then applies a linear transformation between the coordinate systems to derive the coordinate values (X', Y') of the start point Ps and of the end point Pe in the normalized coordinate system. When the start point Ps or the end point Pe of a stroke lies outside the bounding box 60, the feature amount generation unit 32 may generate the trajectory feature amount by treating a waypoint on the boundary line as a virtual start point Ps or end point Pe.
  • FIG. 6 is a diagram showing an example of changes in pen pressure when writing strokes with different end shapes.
  • The horizontal axis of the graph indicates time (t), and the vertical axis indicates pen pressure.
  • The feature amount generation unit 32 estimates the end shape of each stroke using various methods, including pattern matching. If the estimation does not correspond to any of stop, sweep, or bounce, an attribute of "unknown" or "not applicable" may be assigned. The end feature amount is not limited to an identification value such as stop, sweep, or bounce; it may be any of various values indicating shape features (for example, the slope at which the pen pressure decreases, the position at which the decrease begins, or the time series of pen pressure values).
  • In step SP16 of FIG. 2, the recognition processing unit 28 (more specifically, the character identification unit 34) performs character identification using the feature amount set generated in step SP14. Prior to this identification, the character identification unit 34 selects, from among a plurality of machine-learned classifiers 70, the classifier 70 suited to the type of character.
  • Types of characters include, for example, kanji, hiragana, katakana, Arabic numerals, alphabetic characters, and symbols.
  • FIG. 7 is a diagram showing an example of the network structure of the discriminator 70.
  • The discriminator 70 is composed of, for example, a hierarchical neural network consisting of an input layer 72, an intermediate layer 74, and an output layer 76.
  • The calculation rule of the discriminator 70 is determined by the values of the parameter set 46 (FIG. 1), which is a collection of learning parameters.
  • The parameter set 46 may include, for example, coefficients describing the activation functions of the arithmetic units, weighting coefficients corresponding to the strength of synaptic connections, the number of arithmetic units in each layer, and the number of intermediate layers 74.
  • The parameter set 46 is stored in the storage unit 24 (FIG. 1) with each value fixed upon completion of machine learning, and is read out as needed.
  • The input layer 72 is a layer that receives the trajectory feature amounts and the end feature amounts; in the example of the figure it is composed of 5n arithmetic units.
  • The natural number n corresponds to the number of strokes.
  • The feature amount set is an input vector of 5n components, arranging [1] the X' coordinate of the start point Ps, [2] the Y' coordinate of the start point Ps, [3] the X' coordinate of the end point Pe, [4] the Y' coordinate of the end point Pe, and [5] the identification value of the end shape, in order of stroke ID.
  • The intermediate layer 74 is composed of one or more layers and has a dimensionality-reduction function that reduces the number of dimensions of the feature amount set. Accordingly, the number of arithmetic units in the intermediate layer 74 is desirably well below 5n.
  • The output layer 76 is a layer that outputs a group of character labels; in the example of the figure it is composed of as many arithmetic units as there are hiragana characters (for example, 46, excluding voiced and semi-voiced forms). If the activation function of the arithmetic units is the softmax function, this label group is an output vector of 46 components representing the probability of each character.
  • When the character type is kanji, a classifier 70 suited to identifying kanji may be selected.
  • When the character type cannot be determined, a classifier 70 suited to identifying multiple types of characters (for example, hiragana and katakana) may be selected.
  • The character identification unit 34 inputs the feature amount set corresponding to the bounding box 60 to the input layer 72 and, through computation in the intermediate layer 74, outputs the label group corresponding to the bounding box 60 from the output layer 76. For example, if the label values indicate probabilities, the character with the highest probability is identified as the "character candidate".
  • In step SP18 of FIG. 2, the recognition processing unit 28 checks whether all bounding boxes 60 have been designated in step SP12. If an undesignated bounding box 60 remains (step SP18: NO), the process returns to step SP12.
  • In the second and subsequent passes, the recognition processing unit 28 sequentially performs feature amount set generation (SP14) and character identification (SP16) while variously changing the combination of the four variables that specify the bounding box 60. When all bounding boxes 60 have been designated for the entire area 50 (step SP18: YES), the process proceeds to the next step SP20.
  • In step SP20, the recognition processing unit 28 aggregates the identification results of step SP16 and generates the text data 44 indicated by the stroke group 52. Specifically, the recognition processing unit 28 extracts the four characters by selecting the combinations of bounding box 60 and label that maximize the probability.
  • As shown in FIG. 8, bounding boxes B1, B2, B3, and B4 are defined, each enclosing one of the four characters.
  • By concatenating the four characters in order from left to right, text data 44 meaning "to write a character" in Japanese is obtained.
  • The ink data 42 and the text data 44 are stored in the digital ink DB 40.
  • In step SP22 of FIG. 2, the output processing unit 30 outputs the text data 44 generated in step SP20 in response to a request from the user device 12.
  • With this, the series of operations by the character recognition device 16 (the flowchart of FIG. 2) is completed.
  • FIG. 9 is a diagram showing an example of the network structure of a discriminator 80 in another example.
  • The discriminator 80 is composed of a hierarchical neural network consisting of an input layer 82, which has a different number of arithmetic units from the input layer 72 of FIG. 7, together with the intermediate layer 74 and the output layer 76.
  • The input layer 82 is a layer that receives only the trajectory feature amounts; in the example of the figure it is composed of 4n arithmetic units (where n is the number of strokes).
  • The feature amount set is an input vector of 4n components, arranging [1] the X' coordinate of the start point Ps, [2] the Y' coordinate of the start point Ps, [3] the X' coordinate of the end point Pe, and [4] the Y' coordinate of the end point Pe, in order of stroke ID.
  • FIG. 10 is a diagram showing an example of the data structure of the character information 84.
  • The character information 84 is table-format data showing the correspondence among [1] character identification information, [2] the stroke order, [3] the type of end shape, and [4] the importance of the end shape.
  • In step SP16 of FIG. 2, the recognition processing unit 28 (more specifically, the character identification unit 34) performs character identification using the feature amount set generated in step SP14. Prior to this identification, the character identification unit 34 selects one or more discriminators 80 from among a plurality of machine-learned discriminators 80 (FIG. 9).
  • The character identification unit 34 inputs the feature amount set corresponding to the bounding box 60 (here, only the trajectory feature amounts) to the input layer 82 and, through computation in the intermediate layer 74, outputs the label group corresponding to the bounding box 60 from the output layer 76. For example, if the label values indicate probabilities, the characters with the highest probabilities (for example, the top one, or the top three) are identified as "character candidates".
  • In step SP20, the recognition processing unit 28 aggregates the identification results of step SP16 and generates the text data 44 indicated by the stroke group 52. Specifically, the recognition processing unit 28 extracts the four characters by selecting, from among the selected character candidates, the label whose end shapes match best. The degree of matching between the end shape estimated in step SP14 and the end shape recorded in the character information 84 is evaluated for each stroke, for example by summing scores weighted according to the importance recorded in the character information 84.
  • In this way, the recognition processing unit 28 may select a plurality of character candidates using the discriminator 80, whose input is a feature amount set that does not include the end feature amounts, and then specify the character using the end feature amounts.
  • As described above, the character recognition device 16 includes the data acquisition unit 26 that acquires the stroke data 48 indicating a set of indicated positions and pen pressure values sequentially detected through the user's writing operation, the feature amount generation unit 32 that uses the acquired stroke data 48 to generate a feature amount set related to the stroke group 52 written by the writing operation, and the character identification unit 34 that uses the feature amount set generated by the feature amount generation unit 32 to identify the single character or character string indicated by the stroke group 52.
  • The feature amount set includes the end feature amount related to the end shape of each stroke.
  • In the character recognition method and program, the character recognition device 16, as a computer, executes an acquisition step (SP10) of acquiring the stroke data 48 indicating a set of indicated positions and pen pressure values sequentially detected through the user's writing operation, a generation step (SP14) of generating, using the acquired stroke data 48, a feature amount set related to the stroke group 52 written by the writing operation, and an identification step (SP16) of identifying, using the generated feature amount set, the single character or character string indicated by the stroke group 52.
  • The feature amount set includes the end feature amount related to the end shape of each stroke.
  • Because the feature amount set includes the end feature amount related to the end shape of each stroke, the unique shape features of handwritten characters can be captured accurately, and the character recognition accuracy for the vector-format stroke data 48 can be improved.
  • The character identification unit 34 may include a classifier 70 that receives a feature amount set including the end feature amounts as input and outputs a probability for each character.
  • Alternatively, the character identification unit 34 may include a discriminator 80 that receives a feature amount set not including the end feature amounts as input and outputs a probability for each character; when the discriminator 80 yields a plurality of character candidates, the end feature amounts may be used to specify the character.
  • The end feature amount may be a value that identifies a stop, a sweep, or a bounce.
  • The types of characters may include at least one of kanji, hiragana, katakana, and Arabic numerals.
  • In the above examples, the classifiers 70 and 80 were constructed as hierarchical neural networks, but the machine learning method is not limited to this.
  • Various techniques may be employed, including support vector machines, decision trees (e.g., random forests), and boosting methods (e.g., gradient boosting).
  • The case where the feature amount generation unit 32 (FIG. 1) generates the trajectory feature amounts using only the start point and end point of each stroke has been described, but the method of generating the trajectory feature amounts is not limited to this.
  • The feature amount generation unit 32 may generate the trajectory feature amounts using at least one waypoint (for example, the midpoint between the start point and the end point) in addition to the start point and the end point.

Abstract

Provided are a character recognition device, a program, and a method capable of improving character recognition accuracy for vector-format stroke data. The character recognition device 16 acquires stroke data 48 indicating a set of indicated positions and pen pressure values detected sequentially through a user's handwriting operations, uses the acquired stroke data 48 to generate a feature amount set relating to a stroke group 52 written by the handwriting operations, and uses the generated feature amount set to identify the single characters or character strings indicated by the stroke group 52. The feature amount set includes end feature amounts relating to the end shape of each stroke.

Description

Character recognition device, program, and method
 The present invention relates to a character recognition device, a program, and a method.
 Conventionally, character recognition technologies that recognize characters by processing data representing a single character or a character string have been known. When the data to be processed is in raster format, that is, an image, methods that recognize characters using a convolutional neural network are often used (see, for example, Patent Documents 1 and 2).
Patent Document 1: Japanese Patent Application Laid-Open No. 2020-027598; Patent Document 2: Japanese Patent Application Laid-Open No. 2020-119559
 However, when the data to be processed is in vector format, for example stroke data (or digital ink), using a convolutional neural network is not necessarily effective. Technical ingenuity is therefore required to improve character recognition accuracy.
 The present invention has been made in view of such problems, and its object is to provide a character recognition device, a program, and a method capable of improving character recognition accuracy for vector-format stroke data.
 A character recognition apparatus according to a first aspect of the present invention comprises: a data acquisition unit that acquires stroke data indicating a set of indicated positions and pen pressure values sequentially detected through a user's writing operation; a feature amount generation unit that uses the stroke data acquired by the data acquisition unit to generate a feature amount set related to the group of strokes written by the writing operation; and a character identification unit that uses the feature amount set generated by the feature amount generation unit to identify the single character or character string represented by the stroke group. The feature amount set includes an end feature amount related to the end shape of each stroke.
 A character recognition program according to a second aspect of the present invention causes a computer to execute: an acquisition step of acquiring stroke data indicating a set of indicated positions and pen pressure values sequentially detected through a user's writing operation; a generation step of generating, using the acquired stroke data, a feature amount set related to the group of strokes written by the writing operation; and an identification step of identifying, using the generated feature amount set, the single character or character string indicated by the stroke group. The feature amount set includes an end feature amount related to the end shape of each stroke.
 In a character recognition method according to a third aspect of the present invention, a computer executes: an acquisition step of acquiring stroke data indicating a set of indicated positions and pen pressure values sequentially detected through a user's writing operation; a generation step of generating, using the acquired stroke data, a feature amount set related to the group of strokes written by the writing operation; and an identification step of identifying, using the generated feature amount set, the single character or character string indicated by the stroke group. The feature amount set includes an end feature amount related to the end shape of each stroke.
 According to the present invention, character recognition accuracy for vector-format stroke data can be improved.
 FIG. 1 is an overall configuration diagram of an input system incorporating a character recognition device according to an embodiment of the present invention. FIG. 2 is a flow chart showing the operation of the character recognition device of FIG. 1. FIG. 3 is a diagram showing an example of the data structure of the ink data of FIG. 1. FIG. 4 is a diagram showing an example of content visualized by the stroke data of FIG. 3. FIG. 5 is a diagram showing an example of a method of generating feature amount sets for each bounding box. FIG. 6 is a diagram showing an example of changes in pen pressure when writing strokes with different end shapes. FIG. 7 is a diagram showing an example of the network structure of a discriminator. FIG. 8 is a diagram showing an example of a character string recognition result. FIG. 9 is a diagram showing an example of the network structure of a discriminator in another example. FIG. 10 is a diagram showing an example of the data structure of character information.
 Embodiments of the present invention will be described below with reference to the accompanying drawings. To facilitate understanding, identical components and steps are given the same reference numerals wherever possible in the drawings, and redundant descriptions may be omitted.
[Overall Configuration of Input System 10]
 FIG. 1 is an overall configuration diagram of an input system 10 incorporating a character recognition device 16 according to one embodiment of the present invention. This input system 10 is configured to provide a "digital ink service" that converts characters handwritten by a user into text data and handles them. Specifically, the input system 10 includes one or more user devices 12, one or more electronic pens 14, and a character recognition device 16.
 The user device 12 is a computer owned by a user of the digital ink service and has a function of detecting the position indicated by the electronic pen 14. The user device 12 is, for example, a tablet, a smartphone, or a personal computer.
 The electronic pen 14 is a pen-type pointing device configured to communicate unidirectionally or bidirectionally with the user device 12. The electronic pen 14 is, for example, an active electrostatic coupling (AES) or electromagnetic induction (EMR) stylus. By holding the electronic pen 14 and moving it while pressing the pen tip against the touch surface of the user device 12, the user can write pictures and characters on the user device 12.
 The character recognition device 16 is a computer that performs overall control of character recognition, and may be either cloud-based or on-premise. Although the character recognition device 16 is illustrated here as a single computer, it may instead be a group of computers forming a distributed system. Specifically, the character recognition device 16 includes a communication unit 20, a control unit 22, and a storage unit 24.
 The communication unit 20 is an interface that transmits and receives electrical signals to and from external devices. Through it, the character recognition device 16 acquires the ink data 42 from the user device 12 and supplies the text data 44 it generates to the user device 12.
 The control unit 22 is composed of processors including a CPU (Central Processing Unit) and a GPU (Graphics Processing Unit). The control unit 22 reads and executes the programs and data stored in the storage unit 24, thereby functioning as a data acquisition unit 26, a recognition processing unit 28, and an output processing unit 30, all described later.
 The storage unit 24 stores the programs and data necessary for the control unit 22 to control each component. The storage unit 24 is composed of a non-transitory computer-readable storage medium. Computer-readable storage media include portable media such as magneto-optical disks, ROMs, CD-ROMs, and flash memory, as well as storage devices such as hard disk drives (HDD) and solid state drives (SSD) built into computer systems.
 In the storage unit 24, a database related to digital ink (hereinafter, the digital ink DB 40) is constructed, and ink data 42, text data 44, and a parameter set 46 are stored.
[Operation of the Character Recognition Device 16]
 The input system 10 of this embodiment is configured as described above. Next, the operation of the character recognition device 16, which forms part of the input system 10, will be described with reference to the flowchart of FIG. 2 and FIGS. 3 to 8.
<Description of the flowchart>
 In step SP10 of FIG. 2, the data acquisition unit 26 acquires data indicating a set of indicated positions and pen pressure values sequentially detected through the user's writing operation (hereinafter, "stroke data 48"). Specifically, the data acquisition unit 26 extracts the stroke data 48 from the ink data 42 supplied from the user device 12.
 FIG. 3 is a diagram showing an example of the data structure of the ink data 42 of FIG. 1. The figure shows an example of InkML (Ink Markup Language), which is written in XML (eXtensible Markup Language) format. The data format of the ink data 42, the so-called "ink description language", is not limited to InkML, and may be, for example, WILL (Wacom Ink Layer Language) or ISF (Ink Serialized Format).
 In the example of the figure, the ink data 42 includes stroke data 48 describing the writing state of at least one stroke: the writing states of a total of 20 strokes are described from the top line to the bottom line, in the order in which the strokes were written. One stroke is described by a plurality of point data arranged sequentially within the <trace> tag. For convenience of illustration, only the point data indicating the start and end points of each stroke are shown, and the point data indicating the waypoints in between are omitted.
 Each point data consists of a combination of the indicated position of the electronic pen 14 (X coordinate value, Y coordinate value) and the pen pressure value (e.g., an 8-bit level value), and is delimited by a delimiter such as a comma. In addition to the indicated position and pen pressure value described above, the point data may include the writing order of the strokes, the tilt angle of the electronic pen 14, and the like.
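 As a concrete illustration of this layout, the following minimal Python sketch parses point data from a simplified InkML subset. The flat <trace> elements, the "X Y pressure" field order, and the Point record are assumptions made for illustration; real InkML documents carry namespaces and richer channel definitions that are omitted here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Point:
    x: float       # indicated X position in the sensor coordinate system
    y: float       # indicated Y position in the sensor coordinate system
    pressure: int  # pen pressure value (e.g., an 8-bit level value)

def parse_strokes(ink_xml: str) -> list[list[Point]]:
    """Extract stroke data from a simplified InkML-like document.

    Each <trace> element holds comma-delimited point records of the
    assumed form "X Y pressure", one record per sampled point.
    """
    root = ET.fromstring(ink_xml)
    strokes = []
    for trace in root.iter("trace"):
        points = []
        for record in (trace.text or "").split(","):
            fields = record.split()
            if len(fields) >= 3:
                points.append(Point(float(fields[0]), float(fields[1]), int(fields[2])))
        strokes.append(points)
    return strokes

# Example: two short strokes, each with a start and an end point.
ink = """<ink><trace>10 20 110, 15 40 95</trace>
<trace>30 22 120, 33 48 10</trace></ink>"""
for i, stroke in enumerate(parse_strokes(ink)):
    print(f"stroke {i}: {len(stroke)} points, ends at pressure {stroke[-1].pressure}")
```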
 FIG. 4 is a diagram showing an example of content visualized by the stroke data 48 of FIG. 3. On the display panel of the user device 12, a two-dimensional coordinate system for specifying the touch position (hereinafter, sensor coordinate system: X-Y) is defined. A stroke group 52 consisting of 20 strokes is arranged within the entire area 50 of the sensor coordinate system. The stroke group 52 represents a character string composed of kanji and hiragana.
 The leading kanji character on the left side of the drawing includes an end 54 showing the trace of a "stop", an end 56 showing the trace of a "sweep", and an end 58 showing the trace of a "bounce". A "stop" is an operation of abruptly stopping the moving pen tip and then releasing it from the touch surface. A "sweep" is an operation of releasing the moving pen tip from the touch surface while letting it slide. A "bounce" is an operation of abruptly stopping the moving pen tip and then performing the above "sweep" while changing its direction.
 In step SP12 of FIG. 2, the recognition processing unit 28 designates a bounding box 60 to be recognized within the entire area 50 defined by the stroke data 48 acquired in step SP10. The shape of the bounding box 60 is uniquely defined by four variables: the X and Y coordinate values of a reference position (for example, the vertex closest to the origin O), the length of the side in the X direction, and the length of the side in the Y direction.
 In step SP14, the recognition processing unit 28 (more specifically, the feature amount generation unit 32) generates a feature amount set for the bounding box 60 designated in step SP12. Here, the feature amount set includes [1] a feature amount related to the trajectory of each stroke (hereinafter, trajectory feature amount) and [2] a feature amount related to the end shape of each stroke (hereinafter, end feature amount).
 FIG. 5 is a diagram showing an example of a method of generating a feature amount set for each bounding box. The feature amount generation unit 32 defines a two-dimensional coordinate system (hereinafter, normalized coordinate system: X'-Y') corresponding to the touch surface of the user device 12. The origin O' of this normalized coordinate system corresponds to the vertex of the bounding box 60 closest to the origin O of the sensor coordinate system. The X' axis of the normalized coordinate system is parallel to the X axis of the sensor coordinate system, and the Y' axis is parallel to the Y axis. The scales of the X' and Y' axes are normalized so that the coordinates of the four vertices of the bounding box 60 are (0, 0), (1, 0), (0, 1), and (1, 1).
 The feature amount generation unit 32 refers to the stroke data 48 (FIG. 3) and acquires the coordinate values (X, Y) of the start point Ps and the end point Pe of each stroke in the sensor coordinate system. The feature amount generation unit 32 then applies a linear transformation between the coordinate systems to derive the coordinate values (X', Y') of the start point Ps and of the end point Pe in the normalized coordinate system. When the start point Ps or the end point Pe of a stroke lies outside the bounding box 60, the feature amount generation unit 32 may generate the trajectory feature amount by treating a waypoint on the boundary line as a virtual start point Ps or end point Pe.
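 A minimal sketch of this normalization is shown below, assuming a bounding box given by its reference vertex and side lengths. The clamping used for out-of-box points is a simple stand-in for the boundary-waypoint rule described above, and the function names are invented for illustration.

```python
def normalize_point(x, y, box):
    """Map a sensor-coordinate point into the normalized coordinate system
    of a bounding box, so that the box's four vertices map to (0, 0),
    (1, 0), (0, 1) and (1, 1).

    box = (x0, y0, width, height): reference vertex plus side lengths,
    the four variables that uniquely define the bounding box 60.
    """
    x0, y0, w, h = box
    return (x - x0) / w, (y - y0) / h

def trajectory_features(points, box):
    """Per-stroke trajectory feature amounts: the normalized start point Ps
    and end point Pe. `points` is the stroke's (X, Y) samples in sensor
    coordinates. When Ps or Pe falls outside the box, the text treats a
    waypoint on the boundary as a virtual start/end point; clamping to
    [0, 1] stands in for that rule here.
    """
    clamp = lambda v: min(max(v, 0.0), 1.0)
    xs, ys = normalize_point(*points[0], box)
    xe, ye = normalize_point(*points[-1], box)
    return [clamp(xs), clamp(ys), clamp(xe), clamp(ye)]

# Example: one stroke inside a 100x50 box anchored at (200, 300).
print(trajectory_features([(210, 310), (250, 330), (290, 340)], (200, 300, 100, 50)))
# -> [0.1, 0.2, 0.9, 0.8]
```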
 FIG. 6 is a diagram showing an example of changes in pen pressure when writing strokes with different end shapes. The horizontal axis of the graph indicates time (t), and the vertical axis indicates pen pressure. Time t = Ts corresponds to the pen-down operation of pressing the electronic pen 14 against the touch surface, while time t = Te corresponds to the pen-up operation of releasing the electronic pen 14 from the touch surface.
 When the end shape is a "stop", the pen pressure tends to remain roughly constant and then drop sharply just before t = Te. When the end shape is a "sweep", the pen pressure tends to remain roughly constant and then decline gradually from some time before t = Te. When the end shape is a "bounce", the pen pressure tends to remain roughly constant, increase temporarily before t = Te, and then fall with a slope intermediate between those of the "stop" and the "sweep".
 The feature amount generation unit 32 estimates the end shape of each stroke using various methods, including pattern matching. If the estimation does not correspond to any of stop, sweep, or bounce, an attribute of "unknown" or "not applicable" may be assigned. The end feature amount is not limited to an identification value such as stop, sweep, or bounce; it may be any of various values indicating shape features (for example, the slope at which the pen pressure decreases, the position at which the decrease begins, or the time series of pen pressure values).
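 As an illustration of how such an estimate might be computed, the sketch below classifies a pen-pressure time series by the tendencies just described. It is a heuristic stand-in for the pattern matching the text mentions; the thresholds are invented for illustration and are not values from the patent.

```python
def estimate_end_shape(pressures):
    """Classify a stroke's end shape from its pen-pressure time series.

    Heuristic following the tendencies of FIG. 6:
      - "bounce": pressure rises temporarily just before pen-up;
      - "stop":   pressure stays roughly constant, then drops sharply
                  at the very end;
      - "sweep":  pressure declines gradually well before pen-up.
    All thresholds below are illustrative assumptions.
    """
    n = len(pressures)
    if n < 4:
        return "unknown"                         # too few samples to judge
    base = sum(pressures[: n // 2]) / (n // 2)   # nominal writing pressure
    tail = pressures[-max(3, n // 4):]           # samples just before pen-up
    if max(tail) > 1.1 * base:
        return "bounce"                          # temporary increase near the end
    low = sum(1 for p in tail if p < 0.5 * base)
    return "stop" if low <= 1 else "sweep"       # abrupt vs. gradual decrease

print(estimate_end_shape([100, 101, 99, 100, 98, 20]))   # -> stop
print(estimate_end_shape([100, 99, 100, 80, 45, 15]))    # -> sweep
print(estimate_end_shape([100, 100, 99, 121, 60, 10]))   # -> bounce
```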
 In step SP16 of FIG. 2, the recognition processing unit 28 (more specifically, the character identification unit 34) performs character identification using the feature amount set generated in step SP14. Prior to this identification, the character identification unit 34 selects, from among a plurality of machine-learned classifiers 70, the classifier 70 suited to the type of character. Types of characters include, for example, kanji, hiragana, katakana, Arabic numerals, alphabetic characters, and symbols.
 FIG. 7 is a diagram showing an example of the network structure of the discriminator 70. The discriminator 70 is composed of, for example, a hierarchical neural network consisting of an input layer 72, an intermediate layer 74, and an output layer 76. The calculation rule of the discriminator 70 is determined by the values of the parameter set 46 (FIG. 1), which is a collection of learning parameters. The parameter set 46 may include, for example, coefficients describing the activation functions of the arithmetic units, weighting coefficients corresponding to the strength of synaptic connections, the number of arithmetic units in each layer, and the number of intermediate layers 74. The parameter set 46 is stored in the storage unit 24 (FIG. 1) with each value fixed upon completion of machine learning, and is read out as needed.
 The input layer 72 is a layer that receives the trajectory feature amounts and the end feature amounts; in the example of the figure it is composed of 5n arithmetic units, where the natural number n corresponds to the number of strokes. The feature amount set is an input vector of 5n components, arranging [1] the X' coordinate of the start point Ps, [2] the Y' coordinate of the start point Ps, [3] the X' coordinate of the end point Pe, [4] the Y' coordinate of the end point Pe, and [5] the identification value of the end shape, in order of stroke ID.
 The intermediate layer 74 is composed of one or more layers and has a dimensionality-reduction function that reduces the number of dimensions of the feature amount set. Accordingly, the number of arithmetic units in the intermediate layer 74 is desirably well below 5n.
 The output layer 76 is a layer that outputs a group of character labels; in the example of the figure it is composed of as many arithmetic units as there are hiragana characters (for example, 46, excluding voiced and semi-voiced forms). If the activation function of the arithmetic units is the softmax function, this label group is an output vector of 46 components representing the probability of each character.
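 The following NumPy sketch shows a forward pass through a network of this shape. The hidden width, the tanh activation, and the random weights are illustrative assumptions; in the device the weights would come from the trained parameter set 46.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def make_classifier(n_strokes, hidden=32, n_labels=46):
    """Hierarchical network as in FIG. 7: an input layer of 5n units, a
    dimension-reducing intermediate layer, and a 46-way softmax output
    (one unit per hiragana label). Random weights stand in for the
    trained parameter set 46; the sizes are illustrative."""
    d_in = 5 * n_strokes
    w1, b1 = rng.normal(size=(hidden, d_in)) * 0.1, np.zeros(hidden)
    w2, b2 = rng.normal(size=(n_labels, hidden)) * 0.1, np.zeros(n_labels)

    def forward(features):
        h = np.tanh(w1 @ features + b1)   # intermediate layer 74 (5n -> hidden)
        return softmax(w2 @ h + b2)       # output layer 76: per-character probabilities
    return forward

# Input vector: (Ps.X', Ps.Y', Pe.X', Pe.Y', end-shape id) per stroke, in stroke order.
n = 3
features = rng.uniform(size=5 * n)
probs = make_classifier(n)(features)
print("candidate label:", int(probs.argmax()), "p =", round(float(probs.max()), 3))
```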
 When the character type is kanji, a classifier 70 suited to identifying kanji may be selected. Alternatively, when the character type cannot be determined, a classifier 70 suited to identifying multiple types of characters (for example, hiragana and katakana) may be selected.
 The character identification unit 34 then inputs the feature amount set corresponding to the bounding box 60 to the input layer 72 and, through computation in the intermediate layer 74, outputs the label group corresponding to the bounding box 60 from the output layer 76. For example, if the label values indicate probabilities, the character with the highest probability is identified as the "character candidate".
 In step SP18 of FIG. 2, the recognition processing unit 28 checks whether all bounding boxes 60 have been designated in step SP12. If an undesignated bounding box 60 remains (step SP18: NO), the process returns to step SP12.
 In the second and subsequent recognition passes, the recognition processing unit 28 sequentially performs feature amount set generation (SP14) and character identification (SP16) while variously changing the combination of the four variables that specify the bounding box 60. When all bounding boxes 60 have been designated for the entire area 50 (step SP18: YES), the process proceeds to the next step SP20.
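 A sketch of this loop, simplified to recognizing a single character, might look as follows. The coarse grid over the four box variables and the two helper callables are assumptions for illustration; the patent does not specify how the variable combinations are enumerated.

```python
def recognize_region(strokes, area_w, area_h, features_for_box, classify, step=0.25):
    """Simplified SP12-SP18 loop: enumerate candidate bounding boxes over
    the four defining variables (reference X, reference Y, width, height)
    on a coarse grid, generate a feature amount set for each box (SP14),
    run the classifier (SP16), and keep the box/label pair with the
    highest probability.

    `features_for_box(strokes, box)` and `classify(features)` are caller-
    supplied stand-ins for the feature amount generation unit 32 and the
    discriminator 70.
    """
    ticks = [i * step for i in range(int(1 / step) + 1)]
    best = None  # (probability, box, label)
    for rx in ticks:
        for ry in ticks:
            for w in ticks[1:]:        # widths and heights must be non-zero
                for h in ticks[1:]:
                    box = (rx * area_w, ry * area_h, w * area_w, h * area_h)
                    probs = classify(features_for_box(strokes, box))
                    label, p = int(probs.argmax()), float(probs.max())
                    if best is None or p > best[0]:
                        best = (p, box, label)
    return best
```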
 In step SP20, the recognition processing unit 28 aggregates the identification results of step SP16 and generates the text data 44 indicated by the stroke group 52. Specifically, the recognition processing unit 28 extracts the four characters by selecting the combinations of bounding box 60 and label that maximize the probability.
 As shown in FIG. 8, bounding boxes B1, B2, B3, and B4 are defined, each enclosing one of the four characters. By concatenating the four characters in order from left to right, text data 44 meaning "to write a character" in Japanese is obtained. The ink data 42 and the text data 44 are stored in the digital ink DB 40.
 In step SP22 of FIG. 2, the output processing unit 30 outputs the text data 44 generated in step SP20 in response to a request from the user device 12. In this way, the series of operations by the character recognition device 16 (the flowchart of FIG. 2) is completed.
<Another identification example>
 FIG. 9 is a diagram showing an example of the network structure of a discriminator 80 in another example. The discriminator 80 is composed of a hierarchical neural network consisting of an input layer 82, which has a different number of arithmetic units from the input layer 72 of FIG. 7, together with the intermediate layer 74 and the output layer 76.
 The input layer 82 is a layer that receives only the trajectory feature amounts; in the example of the figure it is composed of 4n arithmetic units (where n is the number of strokes). The feature amount set is an input vector of 4n components, arranging [1] the X' coordinate of the start point Ps, [2] the Y' coordinate of the start point Ps, [3] the X' coordinate of the end point Pe, and [4] the Y' coordinate of the end point Pe, in order of stroke ID.
 FIG. 10 is a diagram showing an example of the data structure of the character information 84. The character information 84 is table-format data showing the correspondence among [1] character identification information, [2] the stroke order, [3] the type of end shape, and [4] the importance of the end shape.
 Hereinafter, the method of recognizing characters with the configurations of FIGS. 9 and 10 will be described according to the flowchart of FIG. 2. Steps other than steps SP16 and SP20 are as described above, and their description is not repeated.
 In step SP16 of FIG. 2, the recognition processing unit 28 (more specifically, the character identification unit 34) performs character identification using the feature amount set generated in step SP14. Prior to this identification, the character identification unit 34 selects one or more discriminators 80 from among a plurality of machine-learned discriminators 80 (FIG. 9).
 The character identification unit 34 then inputs the feature amount set corresponding to the bounding box 60 (here, only the trajectory feature amounts) to the input layer 82 and, through computation in the intermediate layer 74, outputs the label group corresponding to the bounding box 60 from the output layer 76. For example, if the label values indicate probabilities, the characters with the highest probabilities (for example, the top one, or the top three) are identified as "character candidates".
 In step SP20 of FIG. 2, the recognition processing unit 28 aggregates the identification results of step SP16 and generates the text data 44 indicated by the stroke group 52. Specifically, the recognition processing unit 28 extracts the four characters by selecting, from among the selected character candidates, the label whose end shapes match best. The degree of matching between the end shape estimated in step SP14 and the end shape recorded in the character information 84 is evaluated for each stroke, for example by summing scores weighted according to the importance recorded in the character information 84.
In this way, the recognition processing unit 28 may select a plurality of character candidates using the discriminator 80, whose input is a feature amount set that does not include the end features, and then specify the character using the end features.
[Effects of the Character Recognition Device 16]
As described above, the character recognition device 16 includes: the data acquisition unit 26, which acquires the stroke data 48 indicating a set of indicated positions and writing pressure values sequentially detected through a user's writing operation; the feature amount generation unit 32, which uses the stroke data 48 acquired by the data acquisition unit 26 to generate a feature amount set relating to the stroke group 52 written by the writing operation; and the character identification unit 34, which uses the feature amount set generated by the feature amount generation unit 32 to identify the single character or character string indicated by the stroke group 52. The feature amount set includes an end feature amount relating to the end shape of each stroke.
In the corresponding character recognition method and program, the character recognition device 16, operating as a computer, executes: an acquisition step (SP10) of acquiring the stroke data 48 indicating a set of indicated positions and writing pressure values sequentially detected through a user's writing operation; a generation step (SP14) of generating, from the acquired stroke data 48, a feature amount set relating to the stroke group 52 written by the writing operation; and an identification step (SP16) of identifying, using the generated feature amount set, the single character or character string indicated by the stroke group 52. Here too, the feature amount set includes an end feature amount relating to the end shape of each stroke.
Because the feature amount set includes an end feature amount relating to the end shape of each stroke, the distinctive shape features of handwritten characters can be captured accurately, and the character recognition accuracy for the vector-format stroke data 48 can be improved.
The character identification unit 34 may include the discriminator 70, which takes as input a feature amount set including the end features and outputs a probability for each character. Alternatively, the character identification unit 34 may include the discriminator 80, which takes as input a feature amount set not including the end features and outputs a probability for each character; when a plurality of character candidates are obtained by the discriminator 80, the character may be specified using the end features.
The end feature amount may be a value that distinguishes a stop (tome), a sweep (harai), or a hook (hane). The character types may include at least one of kanji, hiragana, katakana, and Arabic numerals.
[Modifications]
The present invention is not limited to the embodiments described above and may of course be freely modified without departing from the gist of the invention. The individual configurations may also be combined arbitrarily as long as no technical contradiction arises.
In the embodiments described above, the discriminators 70 and 80 (see FIGS. 7 and 9) are constructed using hierarchical neural networks, but the machine learning technique is not limited to this. Various other techniques may be employed, including support vector machines, decision trees (for example, random forests), and boosting methods (for example, gradient boosting).
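As one concrete illustration of such a substitution (a sketch only; the toy data stands in for real feature vectors from step SP14), scikit-learn's gradient boosting classifier could be trained on the same fixed-length trajectory feature vectors:

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier

    # Toy stand-in data: 100 stroke groups, each a 4n-component vector
    # padded to n = 8 strokes, with one of three character labels.
    rng = np.random.default_rng(0)
    X = rng.random((100, 4 * 8))
    y = rng.integers(0, 3, size=100)

    clf = GradientBoostingClassifier(n_estimators=50, learning_rate=0.1)
    clf.fit(X, y)

    # Per-character probabilities, analogous to the label values of the output layer.
    probs = clf.predict_proba(X[:1])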
In the embodiments described above, the feature amount generation unit 32 (FIG. 1) generates the trajectory features using only the start point and end point of each stroke, but the method of generating the trajectory features is not limited to this. For example, the feature amount generation unit 32 may use, in addition to the start point and end point, at least one waypoint (for example, the midpoint between the start point and the end point).
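Extending the earlier vector-building sketch with one waypoint per stroke (here the arithmetic midpoint of the start and end points, as one possible choice) would yield a 6n-component vector:

    def build_input_vector_with_midpoint(strokes):
        """[Ps.X', Ps.Y', Pm.X', Pm.Y', Pe.X', Pe.Y'] per stroke, in stroke-ID order."""
        vec = []
        for s in strokes:  # Stroke as defined in the earlier sketch
            mid_x = (s.start_x + s.end_x) / 2.0  # midpoint between start and end
            mid_y = (s.start_y + s.end_y) / 2.0
            vec.extend([s.start_x, s.start_y, mid_x, mid_y, s.end_x, s.end_y])
        return vec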
[Description of Reference Symbols]
10: input system; 12: user device; 14: electronic pen; 16: character recognition device; 26: data acquisition unit; 28: recognition processing unit; 30: output processing unit; 32: feature amount generation unit; 34: character identification unit; 48: stroke data; 52: stroke group; 70, 80: discriminators

Claims (7)

  1.  A character recognition device comprising:
      a data acquisition unit that acquires stroke data indicating a set of indicated positions and writing pressure values sequentially detected through a user's writing operation;
      a feature amount generation unit that generates, using the stroke data acquired by the data acquisition unit, a feature amount set relating to a group of strokes written by the writing operation; and
      a character identification unit that identifies, using the feature amount set generated by the feature amount generation unit, a single character or character string indicated by the stroke group,
      wherein the feature amount set includes an end feature amount relating to an end shape of each stroke.
  2.  The character recognition device according to claim 1, wherein the character identification unit includes a discriminator that takes as input a feature amount set including the end feature amount and outputs a probability for each character.
  3.  The character recognition device according to claim 1, wherein the character identification unit includes a discriminator that takes as input a feature amount set not including the end feature amount and outputs a probability for each character, and, when a plurality of character candidates are obtained by the discriminator, identifies a character using the end feature amount.
  4.  The character recognition device according to any one of claims 1 to 3, wherein the end feature amount is a value that distinguishes a stop, a sweep, or a hook.
  5.  The character recognition device according to any one of claims 1 to 3, wherein the character identification unit identifies at least one type of character among kanji, hiragana, katakana, and Arabic numerals.
  6.  A character recognition program causing a computer to execute:
      an acquisition step of acquiring stroke data indicating a set of indicated positions and writing pressure values sequentially detected through a user's writing operation;
      a generation step of generating, using the acquired stroke data, a feature amount set relating to a group of strokes written by the writing operation; and
      an identification step of identifying, using the generated feature amount set, a single character or character string indicated by the stroke group,
      wherein the feature amount set includes an end feature amount relating to an end shape of each stroke.
  7.  A character recognition method in which a computer executes:
      an acquisition step of acquiring stroke data indicating a set of indicated positions and writing pressure values sequentially detected through a user's writing operation;
      a generation step of generating, using the acquired stroke data, a feature amount set relating to a group of strokes written by the writing operation; and
      an identification step of identifying, using the generated feature amount set, a single character or character string indicated by the stroke group,
      wherein the feature amount set includes an end feature amount relating to an end shape of each stroke.
PCT/JP2021/007012 2021-02-25 2021-02-25 Character recognition device, program, and method WO2022180725A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2023501748A JPWO2022180725A1 (en) 2021-02-25 2021-02-25
CN202180056714.2A CN116075868A (en) 2021-02-25 2021-02-25 Character recognition device, program, and method
PCT/JP2021/007012 WO2022180725A1 (en) 2021-02-25 2021-02-25 Character recognition device, program, and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/007012 WO2022180725A1 (en) 2021-02-25 2021-02-25 Character recognition device, program, and method

Publications (1)

Publication Number Publication Date
WO2022180725A1 true WO2022180725A1 (en) 2022-09-01

Family

ID=83047870

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/007012 WO2022180725A1 (en) 2021-02-25 2021-02-25 Character recognition device, program, and method

Country Status (3)

Country Link
JP (1) JPWO2022180725A1 (en)
CN (1) CN116075868A (en)
WO (1) WO2022180725A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10171926A (en) * 1996-12-11 1998-06-26 Kiyadeitsukusu:Kk Method and device for collating handwritten character
JP2013171441A (en) * 2012-02-21 2013-09-02 Nippon Telegr & Teleph Corp <Ntt> Apparatus and method for learning character recognition discriminator, and character recognition device, method, and program
JP2016071777A (en) * 2014-10-01 2016-05-09 株式会社東芝 Electronic apparatus, method and program
JP2017059120A (en) * 2015-09-18 2017-03-23 ヤフー株式会社 Generation apparatus, generation method, and generation program
WO2017098869A1 (en) * 2015-12-09 2017-06-15 コニカミノルタ株式会社 Personal dictionary creation program, personal dictionary creation method, and personal dictionary creation device

Also Published As

Publication number Publication date
CN116075868A (en) 2023-05-05
JPWO2022180725A1 (en) 2022-09-01

Similar Documents

Publication Publication Date Title
Zeng et al. Hand gesture recognition using leap motion via deterministic learning
US7860313B2 (en) Methods and apparatuses for extending dynamic handwriting recognition to recognize static handwritten and machine generated text
US20050100214A1 (en) Stroke segmentation for template-based cursive handwriting recognition
JP5717691B2 (en) Handwritten character search device, method and program
Bhattacharya et al. A sigma-lognormal model-based approach to generating large synthetic online handwriting sample databases
CN108701215B (en) System and method for identifying multi-object structures
JP2019508770A (en) System and method for beautifying digital ink
JP6055065B1 (en) Character recognition program and character recognition device
US11837001B2 (en) Stroke attribute matrices
Mohammadi et al. Air-writing recognition system for Persian numbers with a novel classifier
Mohammadi et al. Real-time Kinect-based air-writing system with a novel analytical classifier
Chiang et al. Recognizing arbitrarily connected and superimposed handwritten numerals in intangible writing interfaces
US9250802B2 (en) Shaping device
CN106250035B (en) System and method for dynamically generating personalized handwritten fonts
CN111738167A (en) Method for recognizing unconstrained handwritten text image
WO2022180725A1 (en) Character recognition device, program, and method
Xu et al. On-line sample generation for in-air written chinese character recognition based on leap motion controller
Jian et al. Mobile terminal trajectory recognition based on improved LSTM model
TW201248456A (en) Identifying contacts and contact attributes in touch sensor data using spatial and temporal features
CN111459395A (en) Gesture recognition method and system, storage medium and man-machine interaction device
Shankar et al. Sketching in three dimensions: A beautification scheme
JP6030172B2 (en) Handwritten character search device, method and program
JP7320157B1 (en) CONTENT EVALUATION DEVICE, PROGRAM, METHOD, AND SYSTEM
K Jabde et al. A Comprehensive Literature Review on Air-written Online Handwritten Recognition
Kumar et al. Real Time Air-Written Mathematical Expression Recognition for Children’s Enhanced Learning

Legal Events

Date Code Title Description
121: Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21927827; Country of ref document: EP; Kind code of ref document: A1)
ENP: Entry into the national phase (Ref document number: 2023501748; Country of ref document: JP; Kind code of ref document: A)
NENP: Non-entry into the national phase (Ref country code: DE)
122: Ep: pct application non-entry in european phase (Ref document number: 21927827; Country of ref document: EP; Kind code of ref document: A1)