JP2022533065A

JP2022533065A - Character recognition methods and devices, electronic devices and storage media

Info

Publication number: JP2022533065A
Application number: JP2021567034A
Authority: JP
Inventors: シアオユーユエ; ジャンフイクアン; チェンハオリン; ホンビンスン; ウェイジャン
Original assignee: Shenzhen Sensetime Technology Co Ltd
Current assignee: Shenzhen Sensetime Technology Co Ltd
Priority date: 2020-04-16
Filing date: 2021-03-19
Publication date: 2022-07-21
Also published as: CN111539410A; WO2021208666A1; KR20220011783A; CN111539410B; TW202141352A

Abstract

The present invention relates to a character recognition method and apparatus, an electronic device, and a storage medium. The character recognition method comprises: acquiring a target image to be recognized; acquiring character features of the target image based on a determined position vector and a first image feature of the target image, wherein the position vector is determined based on position features of characters in a preset information sequence; and recognizing characters in the target image based on the character features to obtain a character recognition result for the target image. Embodiments of the present invention can improve the accuracy of character recognition. [Selected drawing] FIG. 1

Description

[Cross-reference to related applications]
This application is filed on the basis of, and claims priority to, Chinese patent application No. 202010301340.3, filed with the Chinese Patent Office on April 16, 2020, the entire contents of which are incorporated herein by reference.
[Technical field]
The present invention relates to the field of electronic technology, and in particular to a character recognition method and apparatus, an electronic device, and a storage medium.

With the development of electronic technology, more and more tasks can be performed by electronic devices or with their assistance, which brings convenience to people. For example, recognizing characters automatically with a computer is more efficient than processing them manually.

At present, character recognition can handle regular characters, for example when parsing documents. It can also handle irregular characters, for example characters in natural scenes such as traffic signs and shop signboards. However, factors such as changes in viewing angle and illumination make it difficult to recognize irregular characters accurately.

The present invention proposes a technical solution for character recognition.

According to one aspect of the present invention, a character recognition method is provided. The method comprises: acquiring a target image to be recognized; acquiring character features of the target image based on a determined position vector and a first image feature of the target image, wherein the position vector is determined based on position features of characters in a preset information sequence; and recognizing characters in the target image based on the character features to obtain a character recognition result for the target image.

In one possible embodiment, acquiring the character features of the target image based on the determined position vector and the first image feature of the target image comprises: encoding the first image feature of the target image to obtain an encoding result of the first image feature; determining a second image feature of the target image according to the encoding result of the first image feature; and acquiring the character features of the target image based on the determined position vector, the first image feature, and the second image feature. Because the second image feature carries stronger position features, the character features obtained from it also carry stronger position features. The character recognition result obtained from these character features is therefore more accurate, and the influence of semantics on the recognition result is further reduced.

In one possible embodiment, encoding the first image feature of the target image to obtain the encoding result of the first image feature comprises: sequentially performing at least one level of first encoding processing on a plurality of first-dimension feature vectors of the first image feature to obtain the encoding result of the first image feature. Sequentially performing one or more levels of first encoding processing on the first-dimension feature vectors emphasizes the position features contained in the first image feature, so that the resulting encoding result carries more distinct position features between characters.

In one possible embodiment, sequentially performing at least one level of first encoding processing on the plurality of first-dimension feature vectors of the first image feature to obtain the encoding result of the first image feature comprises: for one level of first encoding processing among the at least one level, sequentially encoding the input information of the first encoding nodes using N first encoding nodes (N being a positive integer) to obtain output results of the N first encoding nodes, wherein, when 1 < i ≤ N, the input information of the i-th first encoding node (i being a positive integer) includes the output result of the (i-1)-th first encoding node; and obtaining the encoding result of the first image feature according to the output results of the N first encoding nodes. In this way, the input information of the first of the first encoding nodes can be propagated to the last of them, so the input information is retained over a long span and the obtained output results are more accurate.
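
The chained encoding described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the passage does not specify what a first encoding node computes, so the tanh update, the weights `w_in` and `w_prev`, and the function names `encode_level` and `encode` are all hypothetical.

```python
import math

def encode_level(inputs, w_in=0.5, w_prev=0.8):
    """Run one level of sequential encoding over N chained nodes.

    Node i receives inputs[i] and, for i > 1, the output of node i-1.
    The tanh update and both weights are illustrative assumptions.
    """
    outputs = []
    prev = [0.0] * len(inputs[0])  # the first node has no predecessor output
    for x in inputs:
        out = [math.tanh(w_in * xi + w_prev * pi) for xi, pi in zip(x, prev)]
        outputs.append(out)
        prev = out
    return outputs

def encode(inputs, levels=1):
    """Apply at least one level; each level consumes the previous level's outputs."""
    result = inputs
    for _ in range(levels):
        result = encode_level(result)
    return result

# Three feature vectors; only the first one carries a non-zero value.
features = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
encoded = encode(features, levels=1)
```

Here `encoded[2][0]` is positive even though the third input vector is all zeros, which shows how information entering the first node reaches the last node through the chain.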

In one possible embodiment, the input information of the first encoding node further includes a first-dimension feature vector of the first image feature or the output result of the previous level of first encoding processing. In this way, within one level of first encoding processing, the first-dimension feature vector of the first image feature or the output result of the previous level can be propagated through the first encoding nodes to the last first encoding node, which makes the output result of that level more accurate.

In one possible embodiment, acquiring the character features of the target image based on the determined position vector, the first image feature, and the second image feature comprises: determining attention weights according to the position vector and the second image feature; and performing feature weighting on the first image feature using the attention weights to obtain the character features of the target image. The attention weights emphasize the parts of the first image feature that deserve attention, so the character features obtained by weighting the first image feature reflect its more important parts more accurately.
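
A minimal sketch of this weighting step follows. The passage does not say how the attention weights are computed from the position vector and the second image feature, so the dot-product score and the softmax normalization are assumptions, and the names `softmax` and `character_feature` are hypothetical.

```python
import math

def softmax(scores):
    """Normalize scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def character_feature(position_vector, first_features, second_features):
    """Weight the first image feature with attention weights.

    The weights come from the position vector and the second image feature
    (assumed here: one dot-product score per image location, then softmax).
    """
    scores = [sum(p * k for p, k in zip(position_vector, key))
              for key in second_features]
    weights = softmax(scores)
    dim = len(first_features[0])
    feature = [sum(w * f[d] for w, f in zip(weights, first_features))
               for d in range(dim)]
    return weights, feature

# Three image locations; the position vector matches location 1 best.
position_vector = [0.0, 4.0]
second_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
first_features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weights, feature = character_feature(position_vector, first_features, second_features)
```

With these toy inputs most of the weight falls on location 1, so the resulting character feature is dominated by the first image feature at that location.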

In one possible embodiment, the character recognition method further comprises: acquiring a preset information sequence that includes at least one piece of first preset information; and sequentially performing at least one level of second encoding processing on the at least one piece of first preset information to obtain the position vector. When a neural network performs the second encoding processing, the pieces of first preset information are encoded in sequence, so the generated position vector is related to their order and can therefore represent position features between characters.

In one possible embodiment, sequentially performing at least one level of second encoding processing on the at least one piece of first preset information to obtain the position vector comprises: for one level of second encoding processing among the at least one level, sequentially encoding the input information of the second encoding nodes using M second encoding nodes (M being a positive integer) to obtain output results of the M second encoding nodes, wherein, when 1 < j ≤ M, the input information of the j-th second encoding node (j being a positive integer) includes the output result of the (j-1)-th second encoding node; and obtaining the position vector according to the output results of the M second encoding nodes. In this way, the input information of the first of the second encoding nodes can be propagated to the last of them, so the input information is retained over a long span and the obtained position vector is more accurate.
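
The following sketch illustrates why sequential encoding makes the position vector depend on order. It assumes, purely for illustration, that every piece of first preset information is the same constant and that each second encoding node applies a tanh update; the function name `position_vectors` and the weights are hypothetical.

```python
import math

def position_vectors(preset_sequence, weights=((0.5, 0.8), (1.0, -0.5))):
    """Encode a preset sequence with M chained nodes, one vector per position.

    Each output dimension uses its own (w_in, w_prev) pair. The update rule
    and the weights are illustrative assumptions.
    """
    prev = [0.0] * len(weights)  # the first node has no predecessor output
    vectors = []
    for x in preset_sequence:
        out = [math.tanh(w_in * x + w_prev * p)
               for (w_in, w_prev), p in zip(weights, prev)]
        vectors.append(out)
        prev = out
    return vectors

# Identical preset values at every position (an assumption for this sketch).
vectors = position_vectors([1.0, 1.0, 1.0])
```

Although the three inputs are identical, the three output vectors differ, because each node also receives the output of its predecessor. The output therefore encodes where an entry sits in the sequence rather than what the image contains.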

In one possible embodiment, the input information of the second encoding node further includes the first preset information or the output result of the previous level of second encoding processing. In this way, within one level of second encoding processing, the first preset information or the output result of the previous level can be propagated through the second encoding nodes to the last second encoding node, which makes the output result of that level more accurate.

In one possible embodiment, recognizing the characters in the target image based on the character features to obtain the character recognition result for the target image comprises: extracting semantic features of the target image; and obtaining the character recognition result for the target image based on the semantic features and the character features of the target image. Combining semantic features with character features when obtaining the character recognition result improves the accuracy of the result.

In one possible embodiment, extracting the semantic features of the target image comprises: sequentially determining the semantic features of the target image at at least one time step based on acquired second preset information. Obtaining the character recognition result for the target image based on the semantic features and the character features comprises: obtaining the character recognition result of the target image at the at least one time step based on the semantic features and the character features of the target image at that time step. When the target image contains multiple characters, the recognition results can thus be obtained one after another according to the position of each character (character features) and its semantics (semantic features), which improves the accuracy of the recognition result.

In one possible embodiment, sequentially determining the semantic features of the target image at the at least one time step based on the acquired second preset information comprises: performing at least one level of third encoding processing on the second preset information to obtain the semantic feature at the first of the at least one time step; and performing at least one level of third encoding processing on the character recognition result of the target image at the (k-1)-th time step (k being an integer greater than 1) to obtain the semantic feature of the target image at the k-th time step. In this way, the input information of an earlier third encoding node can be propagated to later third encoding nodes, so the input information is retained over a long span and the obtained semantic features are more accurate.
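
The time-step loop described above can be sketched as follows. This is a toy illustration, not the disclosed network: the one-hot `third_encoding`, the additive combination in `recognize_step`, the `"<start>"` symbol standing in for the second preset information, and the four-symbol `ALPHABET` are all assumptions.

```python
ALPHABET = ["<start>", "A", "B", "C"]

def third_encoding(symbol):
    """Toy stand-in for the third encoding: a one-hot vector for the symbol."""
    return [1.0 if s == symbol else 0.0 for s in ALPHABET]

def recognize_step(semantic, char_feature, semantic_weight=0.1):
    """Combine semantic and character features and pick the best symbol.

    The weighted sum is an assumption; index 0 is skipped so that the
    start symbol is never emitted as a recognition result.
    """
    combined = [semantic_weight * s + c for s, c in zip(semantic, char_feature)]
    best = max(range(1, len(ALPHABET)), key=lambda i: combined[i])
    return ALPHABET[best]

def decode(char_features, start="<start>"):
    """Recognize one character per time step."""
    results = []
    previous = start  # second preset information (assumed start symbol)
    for char_feature in char_features:
        # Step 1 encodes the preset symbol; step k encodes the result of step k-1.
        semantic = third_encoding(previous)
        previous = recognize_step(semantic, char_feature)
        results.append(previous)
    return results

# One character feature per time step, already aligned with ALPHABET.
char_features = [[0.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
recognized = decode(char_features)
```

In this sketch the semantic feature at each step depends only on the previous recognition result, while the character feature supplies the position-specific evidence; the small `semantic_weight` is chosen so that the character feature dominates.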

According to one aspect of the present invention, a character recognition apparatus is provided. The apparatus comprises:
an acquisition unit configured to acquire a target image to be recognized;
a determination unit configured to acquire character features of the target image based on a determined position vector and a first image feature of the target image, wherein the position vector is determined based on position features of characters in a preset information sequence; and
a recognition unit configured to recognize characters in the target image based on the character features to obtain a character recognition result for the target image.

In one possible embodiment, the determination unit is further configured to: encode the first image feature of the target image to obtain an encoding result of the first image feature; determine a second image feature of the target image according to the encoding result of the first image feature; and acquire the character features of the target image based on the determined position vector, the first image feature, and the second image feature.

In one possible embodiment, the determination unit is further configured to sequentially perform at least one level of first encoding processing on a plurality of first-dimension feature vectors of the first image feature to obtain the encoding result of the first image feature.

In one possible embodiment, the determination unit is further configured to: for one level of first encoding processing among the at least one level, sequentially encode the input information of the first encoding nodes using N first encoding nodes (N being a positive integer) to obtain output results of the N first encoding nodes, wherein, when 1 < i ≤ N, the input information of the i-th first encoding node (i being a positive integer) includes the output result of the (i-1)-th first encoding node; and obtain the encoding result of the first image feature according to the output results of the N first encoding nodes.

In one possible embodiment, the input information of the first encoding node further includes a first-dimension feature vector of the first image feature or the output result of the previous level of first encoding processing.

In one possible embodiment, the determination unit is further configured to: determine attention weights according to the position vector and the second image feature; and perform feature weighting on the first image feature using the attention weights to obtain the character features of the target image.

In one possible embodiment, the character recognition apparatus further comprises an encoding unit configured to: acquire a preset information sequence that includes at least one piece of first preset information; and sequentially perform at least one level of second encoding processing on the at least one piece of first preset information to obtain the position vector.

In one possible embodiment, the encoding unit is further configured to: for one level of second encoding processing among the at least one level, sequentially encode the input information of the second encoding nodes using M second encoding nodes (M being a positive integer) to obtain output results of the M second encoding nodes, wherein, when 1 < j ≤ M, the input information of the j-th second encoding node (j being a positive integer) includes the output result of the (j-1)-th second encoding node; and obtain the position vector according to the output results of the M second encoding nodes.

In one possible embodiment, the input information of the second encoding node further includes the first preset information or the output result of the previous level of second encoding processing.

In one possible embodiment, the recognition unit is further configured to: extract semantic features of the target image; and obtain the character recognition result for the target image based on the semantic features and the character features of the target image.

In one possible embodiment, the recognition unit is further configured to: sequentially determine the semantic features of the target image at at least one time step based on acquired second preset information; and obtain the character recognition result of the target image at the at least one time step based on the semantic features and the character features of the target image at that time step.

In one possible embodiment, the recognition unit is further configured to: perform at least one level of third encoding processing on the second preset information to obtain the semantic feature at the first of the at least one time step; and perform at least one level of third encoding processing on the character recognition result of the target image at the (k-1)-th time step (k being an integer greater than 1) to obtain the semantic feature of the target image at the k-th time step.

According to one aspect of the present invention, an electronic device is provided. The electronic device comprises:
a processor; and
a memory configured to store processor-executable instructions,
wherein the processor is configured to perform the above character recognition method by calling and executing the instructions stored in the memory.

According to one aspect of the present invention, a computer-readable storage medium storing computer program instructions is provided. When the computer program instructions are executed by a processor, the above character recognition method is implemented.

According to one aspect of the present invention, a computer program comprising computer-readable code is provided. When the computer-readable code runs on an electronic device, it causes a processor of the electronic device to perform the above character recognition method.

In the embodiments of the present invention, a target image to be recognized is acquired; character features of the target image are then acquired based on a determined position vector and a first image feature of the target image; and characters in the target image are then recognized based on the character features to obtain a character recognition result for the target image. Because the position vector is determined based on the position features of characters in a preset information sequence, it can represent position features between characters. This increases the influence of inter-character position features on the recognition result and improves the accuracy of character recognition. For example, better recognition can be achieved for irregular characters and for non-semantic characters.

It should be understood that the foregoing general description and the following detailed description are merely exemplary and explanatory and do not limit the present disclosure.

Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments with reference to the drawings.

The drawings herein are incorporated into and constitute a part of this specification. They illustrate embodiments consistent with the present invention and, together with the specification, serve to explain the technical solutions of the present invention.
A flowchart of a character recognition method according to an embodiment of the present invention.
A block diagram showing an example of determining a second image feature of a target image according to an embodiment of the present invention.
A block diagram showing an example of obtaining a character recognition result using a neural network according to an embodiment of the present invention.
A block diagram of an example of a character recognition apparatus according to an embodiment of the present invention.
A block diagram of an example of a character recognition apparatus according to an embodiment of the present invention.
A block diagram of an example of an electronic device according to an embodiment of the present invention.

Various exemplary embodiments, features, and aspects of the present invention are described in detail below with reference to the drawings. The same reference numerals in the drawings denote elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless otherwise specified.

The word "exemplary" as used herein means "serving as an example, embodiment, or illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.

The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" covers three cases: A alone, both A and B, and B alone. In addition, the term "at least one" herein denotes any one of a plurality of items or any combination of at least two of them. For example, "including at least one of A, B, and C" means including any one or more elements selected from the set consisting of A, B, and C.

Furthermore, numerous specific details are given in the following embodiments to better explain the present disclosure. Those skilled in the art will understand that the present disclosure can be practiced without some of these details. In some examples, methods, means, elements, and circuits well known to those skilled in the art are not described in detail, so as to highlight the gist of the present disclosure.

本発明の実施例に係る文字認識の解決策によれば、認識対象となる目標画像を取得した後、決定された位置ベクトル及び目標画像の第１画像特徴に基づいて、目標画像の文字特徴を取得し、次に、文字特徴に基づいて目標画像内の文字を認識して、目標画像の文字認識結果を取得することができる。ここで、位置ベクトルは、プリセット情報シーケンスにおける文字の位置特徴に基づいて決定されたものであり、文字の位置特徴を表すことができるため、文字認識プロセスで、文字間の位置特徴を強調することができ、これにより、取得された文字認識結果をより正確にすることができる。 According to the character recognition solution according to the embodiment of the present invention, after acquiring the target image to be recognized, the character feature of the target image is determined based on the determined position vector and the first image feature of the target image. Then, the characters in the target image can be recognized based on the character features, and the character recognition result of the target image can be acquired. Here, since the position vector is determined based on the position feature of the character in the preset information sequence and can represent the position feature of the character, the position feature between the characters is emphasized in the character recognition process. This makes it possible to make the acquired character recognition result more accurate.

関連技術では、通常、文字間のセマンティック特徴を用いて文字シーケンスを認識するが、一部の文字シーケンス内の文字間の意味的関連性が低い。例えば、ライセンスプレート番号や部屋番号などの文字シーケンスの文字間の意味的関連性が低いため、セマンティック特徴による文字シーケンスの認識効果が悪い。本発明の実施例に係る文字認識の解決策は、文字認識への文字の位置特徴の影響を高め、文字認識プロセスのセマンティック特徴への依存性を低減し、意味的関連性が低い文字の認識や不規則な文字の認識に対して、より優れた認識効果を得ることができる。 In related techniques, character sequences are usually recognized using semantic features between characters, but the semantic relevance between characters in some character sequences is low. For example, since the semantic relevance between characters of a character sequence such as a license plate number or a room number is low, the effect of recognizing a character sequence by a semantic feature is poor. The character recognition solution according to the embodiment of the present invention enhances the influence of character position features on character recognition, reduces the dependence of the character recognition process on semantic features, and recognizes characters with low semantic relevance. It is possible to obtain a better recognition effect on the recognition of irregular characters.

The technical solution of the embodiments of the present invention can be applied to application scenarios such as recognizing characters in images and converting images to text, and to extensions of such scenarios; the embodiments of the present invention are not limited in this respect. For example, character recognition can be performed on the irregular characters of a traffic sign to determine the traffic instruction that the sign represents, which is convenient for the user.

FIG. 1 shows a flowchart of a character recognition method according to an embodiment of the present invention. The character recognition method may be executed by a terminal device, a server, or another type of electronic device, where the terminal device may be user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, or the like. In some possible embodiments, the character recognition method may be implemented by a processor calling and executing computer-readable instructions stored in a memory. In the following, the character recognition method of the embodiments of the present invention is described taking an electronic device as the executing entity by way of example.

In step S11, a target image to be recognized is acquired.

In the embodiments of the present invention, the electronic device may have an image capture function and may capture the target image to be recognized. Alternatively, the electronic device may acquire the target image to be recognized from another device, for example from a photographing device or a monitoring device. The target image to be recognized may be an image awaiting character recognition. The target image may contain characters, which may be a single character or a character string. The characters in the target image may be regular characters (for example, text written in a standard font). Regular characters may have characteristics such as neat arrangement, uniform size, no deformation, and no occlusion. In some embodiments, the characters in the target image may be irregular characters (for example, artistic text on a shop sign or an advertisement cover). Irregular characters may have characteristics such as untidy arrangement, non-uniform size, deformation, and occlusion.

In step S12, character features of the target image are obtained based on a determined position vector and a first image feature of the target image, where the position vector is determined based on position features of characters in a preset information sequence.

In the embodiments of the present invention, a position vector representing the position features of characters may be determined based on the position features of the characters in a preset information sequence. For example, after a preset information sequence of a specific length is obtained, the position features of the characters in the preset information sequence may be extracted. The position vector is associated with the position at which a character is located. For example, if the character to be recognized is at the third character position in a character sequence, the position vector can represent the position of that character in the sequence, that is, the third character position. To reduce the correlation between the position vector and character semantics, the characters in the preset information sequence may all be the same. In some embodiments, each character in the preset information sequence may also be set to information without semantics, which further reduces the correlation between the position vector and character semantics. Because the position vector has a low correlation with character semantics, the position vectors used for different target images may be the same or different.

The first image feature of the target image may be obtained by performing image feature extraction on the target image. For example, a neural network may be used to perform at least one convolution operation on the target image to obtain the first image feature. The character features of the target image may be determined from the determined position vector and the first image feature, for example by fusing the determined position vector with the first image feature. Because the character features are obtained based on the position vector and the first image feature, they are only weakly influenced by character semantics.

In step S13, the characters in the target image are recognized based on the character features to obtain a character recognition result for the target image.

In the embodiments of the present invention, a neural network may be used to process the character features. For example, an activation operation may be performed on the character features, or the character features may be fed to a fully connected layer of the neural network for a fully connected operation, to obtain the character recognition result for the target image. The character recognition result may be the result of recognizing the characters in the target image. If the target image contains a single character, the character recognition result may be a single character. If the target image contains a character sequence, the character recognition result may be a character sequence in which the order of the characters is the same as the order of the corresponding characters in the target image.
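As a non-limiting illustration, the fully connected operation followed by an activation may be sketched as below. The sketch assumes that one character feature is available per character position, that the activation is a softmax, and that the most probable class is taken as the recognized character; the names `recognize`, `charset`, `weight`, and `bias`, and the randomly initialized parameters, are hypothetical and not taken from the source.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def recognize(char_features, weight, bias, charset):
    """Map character features of shape (T, D) to a string of T characters.

    weight has shape (D, K) and bias has shape (K,), where K = len(charset).
    """
    logits = char_features @ weight + bias   # fully connected operation
    probs = softmax(logits)                  # activation (assumed softmax)
    indices = probs.argmax(axis=-1)          # most probable class per position
    return "".join(charset[k] for k in indices), probs

# Toy usage with random, untrained parameters (illustrative only).
rng = np.random.default_rng(0)
charset = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
T, D = 5, 16
char_features = rng.normal(size=(T, D))
weight = rng.normal(size=(D, len(charset)))
bias = np.zeros(len(charset))
text, probs = recognize(char_features, weight, bias, charset)
```

The output string keeps the order of the character features, which matches the requirement that the recognized characters appear in the same order as in the target image.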

Because the character recognition result obtained from the character features is less affected by character semantics, good recognition results can be obtained even for character sequences whose characters have low semantic relevance to one another. For example, character recognition can be performed on a character sequence without semantic relevance, such as a license plate number.

In step S12 above, the character features of the target image can be obtained based on the determined position vector and the first image feature of the target image, which reduces the influence of semantics on the character features. Embodiments for obtaining the character features of the target image are provided below.

In one possible embodiment, the first image feature of the target image is encoded to obtain an encoding result of the first image feature, a second image feature of the target image is then determined from the encoding result of the first image feature, and the character features of the target image are obtained based on the determined position vector, the first image feature, and the second image feature.

In this embodiment, a neural network may be used to encode the first image feature of the target image. For example, the first image feature may be encoded row by row or column by column, which emphasizes the position features contained in the first image feature. The second image feature of the target image may then be obtained from the encoding result. For example, the first image feature and the encoding result may be fused to obtain the second image feature, which has stronger position features than the first image feature. The character features of the target image may then be obtained based on the determined position vector, the first image feature, and the second image feature, for example by fusing the three. Because the second image feature has stronger position features, the resulting character features also have stronger position features, so the character recognition result obtained from them is more accurate and the influence of semantics on the result is further reduced.

In the above embodiment, encoding the first image feature of the target image emphasizes the position features contained in the first image feature. The process of obtaining the encoding result of the first image feature is described below by way of an example.

In one example, at least one level of first encoding processing may be performed sequentially on a plurality of first-dimension feature vectors of the first image feature to obtain the encoding result of the first image feature.

In this example, the first image feature may include a plurality of first-dimension feature vectors. The first image feature may include features in multiple dimensions, for example length, width, and depth. The number of features may differ between dimensions. A first-dimension feature vector may be the feature of the first image feature along one dimension, for example along the length dimension or the width dimension. The first encoding processing may be encoding applied to the first image feature. Correspondingly, the neural network may include at least one first encoding layer, and the encoding performed by a first encoding layer may be the first encoding processing. Here, the neural network may perform one or more levels of first encoding processing on the plurality of first-dimension feature vectors to obtain their processing results, with one first-dimension feature vector corresponding to one processing result. The processing results of the plurality of first-dimension feature vectors may then be combined to form the encoding result of the first image feature. Performing one or more levels of first encoding processing sequentially on the plurality of first-dimension feature vectors emphasizes the position features contained in the first image feature, so that the resulting encoding result has more distinct position features between characters.

In this example, for any one level of the at least one level of first encoding processing, N first encoding nodes (N being a positive integer) may be used to sequentially encode the input information of the first encoding nodes to obtain the output results of the N first encoding nodes. When 1 < i ≤ N, the input information of the i-th first encoding node (i being a positive integer) includes the output result of the (i-1)-th first encoding node. The encoding result of the first image feature is obtained from the output results of the N first encoding nodes.

In this example, the encoding result of the first image feature may be obtained by using a neural network to perform at least one level of first encoding processing on the first image feature. The neural network may include at least one level of first encoding layer, each first encoding layer may perform the first encoding processing, and each level of first encoding processing is implemented by a plurality of encoding nodes. When the first encoding processing has multiple levels, the operations performed at each level may be the same. For any one level of first encoding processing, N first encoding nodes may be used to encode the input information of that level. One first encoding node may correspond to one piece of input information, and different first encoding nodes may have different input information. Correspondingly, one first encoding node produces one output result. The input information of a first encoding node in the first level of first encoding processing may be a first-dimension feature vector of the first image feature. The output result of a first encoding node in the first level may be used as the input information of the first encoding node with the same index in the second level, and so on up to the last level. The output result of a first encoding node in the last level of first encoding processing may be the processing result of the corresponding first-dimension feature vector. One level of first encoding processing may include N first encoding nodes. When 1 < i ≤ N, that is, when the first encoding node is not the first node in the current level, its input information may further include the output result of the preceding first encoding node in that level. The input information of the first node can thereby be propagated to the last node, so that the input information of the first encoding nodes is retained over a long range and the output results are more accurate.

FIG. 2 is a block diagram showing an example of determining the second image feature of the target image according to an embodiment of the present invention. In this example, a neural network (for example, a long short-term memory (LSTM) network) may be used to encode the first image feature $F$ of the target image. The neural network may include two first encoding layers, and each first encoding layer may include a plurality of first encoding nodes (corresponding to the encoding nodes in FIG. 2). The first image feature $F$ of the target image is input to the first of the two first encoding layers, whose first encoding nodes respectively encode the plurality of first-dimension feature vectors (feature vectors along the width dimension) of $F$ to produce an output result for each node. The input information of the first node is the first first-dimension feature vector; the input information of the second node is the output result of the first node together with the second first-dimension feature vector; and so on, until the output result of the last first encoding node is obtained. The output results of the plurality of first encoding nodes are input to the second first encoding layer, whose processing is similar to that of the first and is not described again here. Finally, the encoding result $F^2$ of the first image feature is obtained. Feature fusion (which may be feature addition, merging, or the like) is then performed on the first image feature $F$ and the encoding result $F^2$ to obtain the second image feature $\hat{F}$ of the target image.

Taking the encoding of the first image feature $F$ of the target image with a two-layer LSTM as an example, the second image feature $\hat{F}$ can be obtained from the first image feature $F$ by the following equations:

$$F^1_{i,j} = \mathrm{LSTM}\left(F_{i,j},\ F^1_{i,j-1}\right) \tag{1}$$

$$F^2_{i,j} = \mathrm{LSTM}\left(F^1_{i,j},\ F^2_{i,j-1}\right) \tag{2}$$

$$\hat{F} = F \oplus F^2 \tag{3}$$

Here, $F_{i,j}$ may be the feature vector (first-dimension feature vector) of the first image feature $F$ at position $(i, j)$; $F^1_{i,j}$ may represent the feature vector of the output result $F^1$ of the first of the two first encoding layers at position $(i, j)$; $F^1_{i,j-1}$ may represent the feature vector of the output result $F^1$ at position $(i, j-1)$; $F^2_{i,j}$ may represent the feature vector of the encoding result $F^2$ at position $(i, j)$; $F^2_{i,j-1}$ may represent the feature vector of the encoding result $F^2$ at position $(i, j-1)$; $\hat{F}$ may represent the obtained second image feature; and $\oplus$ may represent vector addition. Here, $i$ and $j$ are natural numbers.
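As a non-limiting illustration, the row-wise two-layer LSTM encoding and the additive fusion described above may be sketched as follows. The sketch assumes a standard LSTM cell with an internal cell state (which the equations do not show), a hidden size equal to the number of channels so that element-wise addition is possible, and randomly initialized, untrained weights. The names `LSTMCell`, `encode_rows`, `F2`, and `F_hat` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Minimal LSTM cell; one call to step() plays the role of one encoding node."""

    def __init__(self, input_size, hidden_size, rng):
        scale = 1.0 / np.sqrt(hidden_size)
        self.W = rng.uniform(-scale, scale,
                             size=(input_size + hidden_size, 4 * hidden_size))
        self.b = np.zeros(4 * hidden_size)

    def step(self, x, h, c):
        # Input, forget, candidate and output gates from the current input
        # and the previous node's output.
        z = np.concatenate([x, h]) @ self.W + self.b
        i, f, g, o = np.split(z, 4)
        c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h_new = sigmoid(o) * np.tanh(c_new)
        return h_new, c_new

def encode_rows(F, layer1, layer2):
    """Encode F of shape (height, width, channels) along the width dimension."""
    height, width, channels = F.shape
    F2 = np.zeros_like(F)
    for i in range(height):
        h1, c1 = np.zeros(channels), np.zeros(channels)
        h2, c2 = np.zeros(channels), np.zeros(channels)
        for j in range(width):
            h1, c1 = layer1.step(F[i, j], h1, c1)  # first encoding layer
            h2, c2 = layer2.step(h1, h2, c2)       # second encoding layer
            F2[i, j] = h2
    return F2

rng = np.random.default_rng(0)
F = rng.normal(size=(4, 8, 16))          # toy first image feature
layer1 = LSTMCell(16, 16, rng)
layer2 = LSTMCell(16, 16, rng)
F2 = encode_rows(F, layer1, layer2)      # encoding result
F_hat = F + F2                           # fusion by element-wise addition
```

Each row is processed independently, and the output at column j depends on all columns up to j in that row, which is how the order of the columns enters the encoding result.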

In the above embodiment, the character features of the target image can be obtained based on the determined position vector, the first image feature, and the second image feature. The process of obtaining the character features of the target image is described below by way of an example.

In one example, attention weights are determined from the determined position vector and the second image feature, and the attention weights are then used to weight the first image feature to obtain the character features of the target image.

In this example, because both the position vector and the second image feature contain prominent position features, the attention weights can be determined from them. For example, the correlation between the position vector and the second image feature may be determined, and the attention weights may be determined from that correlation. The correlation may be obtained as the inner product of the position vector and the second image feature. The determined attention weights may be used to weight the first image feature, for example by multiplying the attention weights by the first image feature and summing the products, to obtain the character features of the target image. Because the attention weights emphasize the features in the first image feature that deserve attention, weighting the first image feature with them yields character features that more accurately reflect the more important parts of the first image feature.

In this example, the attention weights can be determined by the following Equation (4):

$$\alpha_{t,i,j} = S\left(h_t^{\top}\, \hat{F}_{i,j}\right) \tag{4}$$

Here, $\alpha_{t,i,j}$ represents the attention weight, $S(\cdot)$ represents an activation function, $h_t^{\top}$ represents the transpose of the position vector $h_t$, and $\hat{F}_{i,j}$ represents the feature vector of the second image feature $\hat{F}$ at feature position $(i, j)$. Using Equation (4), the attention weights can be determined from the position vector and the second image feature.
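As a non-limiting illustration, the inner-product correlation followed by an activation may be sketched as below. The source only states that an activation function is applied; the sketch assumes it is a softmax taken over all feature positions, so that the weights are positive and sum to one. The names `attention_weights`, `h_t`, and `F_hat` are hypothetical.

```python
import numpy as np

def attention_weights(h_t, F_hat):
    """Attention weights of shape (height, width) for one position vector.

    h_t has shape (channels,) and F_hat has shape (height, width, channels).
    """
    scores = F_hat @ h_t              # inner product at every position (i, j)
    scores = scores - scores.max()    # shift for numerical stability
    e = np.exp(scores)
    return e / e.sum()                # softmax over all positions (assumed)

rng = np.random.default_rng(0)
F_hat = rng.normal(size=(3, 5, 8))    # toy second image feature
h_t = rng.normal(size=8)              # toy position vector
alpha = attention_weights(h_t, F_hat)
```

The position whose feature vector correlates most strongly with the position vector receives the largest weight.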

In this example, the character features can be determined by the following Equation (5):

$$g_t = \sum_{i,j} \alpha_{t,i,j}\, F_{i,j} \tag{5}$$

Here, $g_t$ represents the character feature, $\alpha_{t,i,j}$ represents the attention weight, and $F_{i,j}$ represents the feature vector of the first image feature $F$ at feature position $(i, j)$. Using Equation (5), the character features can be obtained from the attention weights and the first image feature.
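As a non-limiting illustration, the weighting of the first image feature by the attention weights, followed by summation over all feature positions, may be sketched as below. The names `character_feature`, `alpha`, and `g_t` are hypothetical, and the one-hot weights are chosen only to make the behaviour easy to check.

```python
import numpy as np

def character_feature(alpha, F):
    """Weighted sum of feature vectors.

    alpha has shape (height, width); F has shape (height, width, channels).
    Returns a vector of shape (channels,).
    """
    return np.einsum("ij,ijc->c", alpha, F)

rng = np.random.default_rng(0)
F = rng.normal(size=(3, 5, 8))        # toy first image feature

# With all attention on position (1, 2), the character feature is F[1, 2].
alpha = np.zeros((3, 5))
alpha[1, 2] = 1.0
g_t = character_feature(alpha, F)
```

With uniform weights the same function returns the mean feature vector, so the attention weights are what make the character feature specific to one character position.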

In the above embodiment, the attention weights can be determined from the determined position vector and the second image feature. The position vector can represent the position features of characters, that is, the relative positions between characters. The process of determining the position vector is described below by way of an embodiment.

In one possible embodiment, a preset information sequence containing at least one piece of first preset information is obtained, and at least one level of second encoding processing is then performed sequentially on the at least one piece of first preset information to obtain the position vector.

In this embodiment, the preset information sequence may include one or more pieces of first preset information. The first preset information may be set according to actual conditions and need not have any particular meaning. For example, the first preset information may be a count instruction. A neural network may be used to perform one or more levels of second encoding processing sequentially on the at least one piece of first preset information to obtain the position vector. Because the pieces of first preset information are identical and carry no particular meaning, the semantic relevance between them is low, and the position vector obtained by performing one or more levels of second encoding processing on them therefore has low semantic relevance. In addition, because the neural network encodes the pieces of first preset information sequentially during the second encoding processing, the generated position vector is related to their order (which can be understood as being related to the positions of the pieces of first preset information relative to one another), so the position vector can represent the position features between characters.

In one example of this embodiment, for any one level of the at least one level of second encoding processing, M second encoding nodes may be used to sequentially encode the input information of the second encoding nodes to obtain the output results of the M second encoding nodes. When 1 < j ≤ M (M and j being positive integers), the input information of the j-th second encoding node includes the output result of the (j-1)-th second encoding node. The position vector is obtained from the output results of the M second encoding nodes.

In this example, a neural network may be used to perform one or more levels of second encoding processing sequentially on the at least one piece of first preset information to obtain the position vector. When the second encoding processing has multiple levels, the operations performed at each level may be the same. For any one level of second encoding processing, M second encoding nodes may be used to encode the input information of that level. One second encoding node may correspond to one piece of input information, and different second encoding nodes may have different input information. Correspondingly, one second encoding node produces one output result. The input information of one second encoding node in the first level of second encoding processing may be one piece of first preset information. The output result of a second encoding node in the first level may be used as the input information of the second encoding node with the same index in the second level, and so on up to the last level. The output result of the last second encoding node in the last level may be used as the position vector, or further processing such as convolution or pooling may be applied to that output result to obtain the position vector. One level of second encoding processing may include M second encoding nodes. When 1 < j ≤ M, that is, when the second encoding node is not the first node in the current level, its input information may further include the output result of the preceding second encoding node in that level. The input information of the first node can thereby be propagated to the last node, so that the input information of the second encoding nodes is retained over a long range and the obtained position vector is more accurate.

Taking as an example the case in which the first preset information is the constant "<next>" and the second coding process is a two-layer LSTM, the position vector can be determined by the following equations (6) and (7).

[Equation (6): image not reproduced]

[Equation (7): image not reproduced]

Here, the four symbols used in equations (6) and (7), which appear only as images in the original, denote in order: the output result of the t-th second coding node in the first level of the second coding process; the output result of the (t-1)-th second coding node in the first level; the output result of the t-th second coding node in the second level (that is, the position vector); and the output result of the (t-1)-th second coding node in the second level. Here, t is a natural number.
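Because equations (6) and (7) are not reproduced in this text, the following is only a sketch of a two-layer LSTM recurrence that is consistent with the four definitions above. The symbols h_t^{(1)} and h_t^{(2)} (node outputs of the first and second level) and e_{<next>} (an embedding of the constant "<next>") are assumed notation, not the patent's own.

```latex
\begin{align*}
h_t^{(1)} &= \mathrm{LSTM}\bigl(e_{\langle\mathrm{next}\rangle},\; h_{t-1}^{(1)}\bigr), \\
h_t^{(2)} &= \mathrm{LSTM}\bigl(h_t^{(1)},\; h_{t-1}^{(2)}\bigr).
\end{align*}
```

Under this reading, h_t^{(2)} plays the role of the position vector at step t.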

Note that the process of obtaining the position vector from the at least one piece of first preset information can be implemented by the neural network shown in FIG. 2. In that case the position vector is not assembled from the output results of multiple second coding nodes; it may be the output result of the last second coding node in the second level of the second coding process.

In step S13 above, the characters in the target image can be recognized based on the character features to obtain the character recognition result of the target image. To improve the accuracy of the recognition result, the semantic features of the characters in the target image may also be taken into account during recognition. The process of obtaining the character recognition result of the target image is described below by way of one embodiment.

In one possible embodiment, the semantic features of the target image are extracted, and the character recognition result of the target image is then obtained based on the semantic features and the character features of the target image.

In this embodiment, the semantic features of the target image can be extracted, for example by using a semantic extraction model used in some scenarios, and the semantic features and the character features of the target image can then be fused to obtain a fusion result. For example, the semantic features and the character features may be concatenated, or they may be concatenated and then feature-weighted, to obtain the fusion result. The weights used for feature weighting may be preset, or may be computed from the semantic features and the character features. The character recognition result of the target image can then be obtained from the fusion result, for example by applying at least one convolution operation, fully connected operation, or the like to it. Combining the semantic features with the character features in this way improves the accuracy of the character recognition result.
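As an illustration of "concatenate, then weight with weights computed from the features", the sketch below gates the concatenated vector with a sigmoid of a linear map. The gate form, the names fuse, W1 and B1, and the toy values are all assumptions made for illustration; the text does not fix how the weights are computed.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fuse(semantic, character, w_map, bias):
    """Concatenate both features, predict one weight per entry, then weight."""
    joined = semantic + character                        # concatenation
    gate = [
        sigmoid(sum(w * x for w, x in zip(row, joined)) + b)
        for row, b in zip(w_map, bias)
    ]                                                    # weights from the features
    fused = [g * x for g, x in zip(gate, joined)]        # feature weighting
    return fused, gate


semantic = [1.0, -1.0]                    # toy semantic feature
character = [0.5, 2.0]                    # toy character feature
W1 = [[0.0] * 4 for _ in range(4)]        # hypothetical mapping matrix (all zeros)
B1 = [0.0] * 4                            # hypothetical bias term
fused, gate = fuse(semantic, character, W1, B1)
print(fused)  # [0.5, -0.5, 0.25, 1.0]
```

With an all-zero mapping every weight is 0.5, so each entry is simply halved; a trained mapping would instead let the weights favor the semantic or the character part depending on the input.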

For example, the semantic feature and the character feature can each be denoted by a symbol (shown only as an image in the original), and the fusion result of the semantic feature and the character feature can be obtained by the following equations (8) and (9).

[Equation (8): image not reproduced]

[Equation (9): image not reproduced]

Here, in equations (8) and (9), one symbol (image not reproduced) denotes the fusion result; w_t denotes the weight used to feature-weight the semantic feature and the character feature; a further symbol denotes the first mapping matrix, which maps the semantic feature and the character feature into a two-dimensional vector space; and a final symbol denotes the first bias term.

After the fusion result is obtained, the character recognition result of the target image can be obtained by the following equation (10).

[Equation (10): image not reproduced]

Here, in equation (10), a symbol (image not reproduced) denotes the character recognition result; W denotes the second mapping matrix, which applies a linear transformation to the fusion result; and b may be the second bias term.
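The linear step with a mapping matrix W and bias b can be sketched as below. Choosing the highest-scoring entry as the recognized character is an assumption added for illustration, as are the names recognize, W2, B2 and the three-character set; the exact form of equation (10) is not reproduced in this text.

```python
def recognize(fused, w2, b2, charset):
    """Linear map of the fusion result, then pick the highest-scoring character."""
    logits = [
        sum(w * f for w, f in zip(row, fused)) + b
        for row, b in zip(w2, b2)
    ]
    best = max(range(len(logits)), key=lambda i: logits[i])
    return charset[best], logits


W2 = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]   # hypothetical second mapping matrix
B2 = [0.0, 0.0, 0.5]                           # hypothetical second bias term
char, logits = recognize([1.0, 0.0], W2, B2, "abc")
print(char, logits)  # a [1.0, 0.0, -0.5]
```

Each row of W2 scores one candidate character, so the matrix has one row per entry of the character set and one column per entry of the fusion result.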

In one example of this embodiment, the semantic features of the target image at at least one time step are determined sequentially based on acquired second preset information, and the character recognition result of the target image at the at least one time step is then obtained based on the semantic features and the character features of the target image at the at least one time step.

In this example, the second preset information can be chosen according to the actual situation and need not carry any particular meaning; for example, it may be a start instruction. The step size of the time steps can be set as required. One semantic feature can be determined at each time step, and the semantic features obtained at different time steps may differ. A neural network can be used to encode the second preset information and obtain the semantic features of the at least one time step in sequence; the character recognition result of the target image at each of those time steps is then obtained from the semantic feature and the character feature of the target image at that time step. The semantic feature of a time step, together with the character feature of the same time step, corresponds to the character recognition result of that time step. In other words, when the target image contains multiple characters, the recognition results can be obtained one after another according to each character's position (character feature) and semantics (semantic feature), which improves the accuracy of the character recognition result.

In this example, at least one level of third coding processing can be performed on the second preset information to obtain the semantic feature of the first of the at least one time step; then at least one level of third coding processing can be performed on the character recognition result of the target image at the (k-1)-th time step to obtain the semantic feature of the target image at the k-th time step, where k is an integer greater than 1.

In this example, the second preset information can be used as input information for the at least one level of third coding processing of the neural network. Each level of the third coding process may include multiple third coding nodes, and each third coding node can correspond to the input information of one time step. Different third coding nodes may receive different input information. Correspondingly, each third coding node yields one output result. The input information of the first third coding node in the first level may be the second preset information. The output result of a third coding node in the first level may be used as the input information of the third coding node with the same index in the second level, and so on up to the last level of the third coding process. In this way, at least one level of third coding processing is performed on the second preset information, and the output result of the first third coding node in the last level is obtained; this output result may be the semantic feature of the first of the at least one time step. The character recognition result of the first time step can then be obtained from the semantic feature of the first time step and the character feature of the same time step. The input information of the second third coding node in the first level of the third coding process may be the character recognition result of the first time step. At least one level of third coding processing can then be performed on the character recognition result of the first time step to obtain the semantic feature of the second time step, and the character recognition result of the second time step can be obtained from the semantic feature of the second time step and the character feature of the same time step. The same applies through the last level of the third coding process. In the last level of the third coding process, the output result of the last third coding node may be the semantic feature of the last time step. That is, at least one level of third coding processing can be performed on the character recognition result of the target image at the (k-1)-th time step to obtain the semantic feature of the target image at the k-th time step. When k is an integer greater than 1, that is, when a third coding node is not the first third coding node of the current level, its input information may further include the output result of the preceding third coding node in the same level. The input information of earlier third coding nodes can thereby be passed on to later ones, so that the input information is retained over a long span and the resulting semantic features are more accurate.
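The feedback loop described above, in which the recognition result of step k-1 becomes the input that produces the semantic feature of step k, can be sketched as follows. The functions toy_semantic_step and toy_recognize are hypothetical stand-ins for the trained semantic module and recognizer, and the string features are toy data.

```python
def decode(char_features, semantic_step, recognize, start_token="<start>"):
    """Produce one recognition result per time step, feeding each result back."""
    prev_result, state, results = start_token, None, []
    for char_feature in char_features:        # one character feature per time step
        semantic, state = semantic_step(prev_result, state)
        prev_result = recognize(semantic, char_feature)
        results.append(prev_result)
    return results


seen_inputs = []                               # records what the semantic module saw


def toy_semantic_step(prev_result, state):
    """Hypothetical semantic module: remembers its inputs, echoes the latest one."""
    seen_inputs.append(prev_result)
    history = (state or []) + [prev_result]    # state carried from node to node
    return prev_result, history


def toy_recognize(semantic, char_feature):
    """Hypothetical recognizer: here the character feature alone decides."""
    return char_feature.upper()


results = decode(["c", "a", "t"], toy_semantic_step, toy_recognize)
print(results)      # ['C', 'A', 'T']
print(seen_inputs)  # ['<start>', 'C', 'A']
```

The second printed list shows the order of inputs to the semantic module: the start token first, then each earlier recognition result.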

Note that the process of determining the semantic features from the second preset information can be implemented by the neural network shown in FIG. 2, in which case the semantic feature of the k-th time step may be the output result of the k-th third coding node in the second level of the third coding process.

In the embodiments of the present invention, a neural network can be used to obtain the character recognition result of the target image. The process of doing so is described below by way of an example.

FIG. 3 is a block diagram showing an example of obtaining a character recognition result using a neural network according to an embodiment of the present invention. In this example, the neural network may include an encoder and a decoder. First, the target image is input to the encoder of the neural network, and the encoder extracts image features of the target image to obtain the first image feature F of the target image. Here, the network architecture of a 31-layer residual neural network (ResNet: Residual Neural Network) can be used to perform image feature extraction on the target image. The encoder may include a position information enhancement module, which enhances the position information of the first image feature to obtain the second image feature of the target image; the network architecture of the position information enhancement module may be as shown in FIG. 2. The second image feature is then input to the attention module of the decoder. The attention module performs matrix multiplication and an activation operation on the second image feature and the position vector to obtain attention weights, and then uses the attention weights to feature-weight the first image feature F (that is, it matrix-multiplies the attention weights with the first image feature) to obtain the character feature of the target image. The decoder further includes a dynamic fusion module, which fuses the character feature with the semantic feature; the fusion result is then input to a fully connected layer to obtain the character recognition result.
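The attention step can be sketched as below: scores come from multiplying the position vector with the second image feature, an activation turns them into weights, and the weights are applied to the first image feature. Using softmax as the activation is an assumption (the text only says "activation operation"), and the function names and toy feature values are illustrative.

```python
import math


def softmax(scores):
    peak = max(scores)                               # subtract max for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(position_vec, second_feats, first_feats):
    """Weights from position vector x second feature; weighted sum of first feature."""
    scores = [
        sum(p * f for p, f in zip(position_vec, feat)) for feat in second_feats
    ]                                                # matrix multiplication
    weights = softmax(scores)                        # activation (assumed softmax)
    dim = len(first_feats[0])
    char_feature = [
        sum(w * feat[d] for w, feat in zip(weights, first_feats))
        for d in range(dim)
    ]                                                # feature weighting of F
    return char_feature, weights


second = [[0.0, 1.0], [5.0, 0.0], [0.0, -1.0]]       # toy second image feature
first = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]         # toy first image feature F
char_feature, weights = attend([1.0, 0.0], second, first)
print(weights.index(max(weights)))                   # 1
```

In this toy case the position vector matches the middle location of the second image feature most strongly, so the character feature is dominated by the middle entry of the first image feature.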

The decoder further includes a position coding module, to which multiple constants "<next>" (first preset information) can be input in sequence, that is, one constant "<next>" at each time step. The position coding module may include two coding layers (corresponding to the first coding process) and encodes the input "<next>" to obtain the position vector of the t-th time step. The decoder further includes a semantic module. One special token "<start>" (second preset information) is input to the semantic module as the input information of the first time step, and the semantic feature of the first time step output by the semantic module is obtained. The character recognition result of the first time step can then be used as the semantic module's input information for the second time step, giving the semantic feature of the second time step output by the semantic module, and so on, until the semantic feature output by the semantic module at the t-th time step is obtained. The semantic module may include two coding layers. The network architectures of the position coding module and the semantic module may be similar to the network architecture of FIG. 2 and are not described again here.

By way of example, the encoder includes a position information enhancement module, and the decoder includes a position coding module, an attention module, a semantic module, and a dynamic fusion module. The position information enhancement module includes a two-layer LSTM (see FIG. 2). The two-layer LSTM encodes the first image feature of the target image from left to right to obtain the coding result of the first image feature; the coding result is added to the first image feature to determine the second image feature of the target image, and the second image feature is the output of the position information enhancement module. The position coding module includes a two-layer LSTM, and each input to the position coding module is one particular fixed input, so the position coding module is in essence a character length counter. The position coding module performs two levels of second coding processing on at least one piece of preset information to obtain the position vector. The position vector and the second image feature are input to the attention module, which performs matrix multiplication and an activation operation on the second image feature and the position vector to obtain attention weights. A weighted average of the first image feature is then taken according to the attention weights to obtain the character feature of the target image. The second preset information is input to the semantic module to obtain the semantic feature of the target image. The dynamic fusion module predicts weights for the semantic feature and the character feature and outputs their weighted average as the fusion result. The fusion result is input to a prediction module, which performs character classification to obtain the character recognition result.
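The position information enhancement step (encode the first image feature from left to right, then add the coding result back to it) can be sketched as follows. A single tanh recurrent pass stands in for the two-layer LSTM, and the weights and toy feature columns are assumptions made for illustration.

```python
import math


def enhance_positions(first_feats, w_in=0.5, w_rec=0.5):
    """Encode columns left to right, then add the encoding back to each column."""
    h = [0.0] * len(first_feats[0])
    second_feats = []
    for column in first_feats:                 # left-to-right over feature columns
        h = [math.tanh(w_in * x + w_rec * hp) for x, hp in zip(column, h)]
        second_feats.append([x + e for x, e in zip(column, h)])  # residual add
    return second_feats


first = [[1.0, 0.0]] * 3                       # three identical feature columns
second = enhance_positions(first)
print(second[0][0] < second[1][0] < second[2][0])   # True
```

The three input columns are identical, yet the enhanced columns differ because the recurrent state carries how far along the sequence each column sits. This is the sense in which the second image feature carries stronger position information than the first.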

The character coding solution according to the embodiments of the present invention emphasizes the position information between characters and reduces the dependence of the character recognition result on semantics, thereby improving the accuracy of character recognition. The solution can be applied to more complex character recognition scenarios, such as recognition of irregular text or of non-semantic text, and also to image recognition scenarios such as image review and image analysis.

It should be understood that the method embodiments described above can be combined with one another to form combined embodiments, provided that principle and logic are not violated; for reasons of space, these combinations are not described further in the present invention.

In addition, the present invention also provides a character recognition device, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any of the character recognition methods provided by the present invention. For the corresponding technical solutions and descriptions, refer to the corresponding passages in the method embodiments; they are not repeated here.

As those skilled in the art will understand, in the methods of the specific embodiments above, the order in which the steps are written does not imply a strict execution order and does not limit the implementation process; the actual execution order of the steps should be determined by their functions and possible internal logic.

FIG. 4 is a block diagram of a character recognition device according to an embodiment of the present disclosure. As shown in FIG. 4, the character recognition device includes:
an acquisition unit 41 configured to acquire a target image to be recognized;
a determination unit 42 configured to obtain character features of the target image based on a determined position vector and a first image feature of the target image, wherein the position vector is determined based on position features of characters in a preset information sequence; and
a recognition unit 43 configured to recognize characters in the target image based on the character features to obtain a character recognition result of the target image.
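The division into three units can be pictured as a small composition of callables. The class name CharacterRecognitionDevice and the toy stand-ins below are hypothetical and only show how the output of one unit feeds the next; they do not perform real image processing.

```python
class CharacterRecognitionDevice:
    """Three cooperating units, mirroring units 41, 42 and 43."""

    def __init__(self, acquire, determine, recognize):
        self.acquire = acquire        # acquisition unit 41
        self.determine = determine    # determination unit 42
        self.recognize = recognize    # recognition unit 43

    def run(self, source):
        image = self.acquire(source)               # target image to be recognized
        char_features = self.determine(image)      # character features
        return self.recognize(char_features)       # character recognition result


# Toy stand-ins: the "image" is a string and each character is its own feature.
device = CharacterRecognitionDevice(
    acquire=lambda source: source.strip(),
    determine=lambda image: list(image),
    recognize=lambda feats: "".join(f.upper() for f in feats),
)
result = device.run("  abc ")
print(result)  # ABC
```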

In one possible embodiment, the determination unit 42 is further configured to: encode the first image feature of the target image to obtain a coding result of the first image feature; determine a second image feature of the target image according to the coding result of the first image feature; and obtain the character features of the target image based on the determined position vector, the first image feature, and the second image feature.

In one possible embodiment, the determination unit 42 is further configured to sequentially perform at least one level of first coding processing on a plurality of first-dimension feature vectors of the first image feature to obtain the coding result of the first image feature.

In one possible embodiment, the determination unit 42 is further configured to: for one level of first coding processing in the at least one level of first coding processing, sequentially encode the input information of the first coding nodes using N first coding nodes (N being a positive integer) to obtain the output results of the N first coding nodes, wherein, when 1 < i ≤ N, the input information of the i-th first coding node (i being a positive integer) includes the output result of the (i-1)-th first coding node; and obtain the coding result of the first image feature according to the output results of the N first coding nodes.

In one possible embodiment, the determination unit 42 is further configured to determine attention weights according to the position vector and the second image feature, and to feature-weight the first image feature using the attention weights to obtain the character features of the target image.

In one possible embodiment, the character recognition device further includes a coding unit configured to acquire a preset information sequence including at least one piece of first preset information, and to sequentially perform at least one level of second coding processing on the at least one piece of first preset information to obtain the position vector.

In one possible embodiment, the coding unit is further configured to: for one level of second coding processing in the at least one level of second coding processing, sequentially encode the input information of the second coding nodes using M second coding nodes (M being a positive integer) to obtain the output results of the M second coding nodes, wherein, when 1 < j ≤ M, the input information of the j-th second coding node (j being a positive integer) includes the output result of the (j-1)-th second coding node; and obtain the position vector according to the output results of the M second coding nodes.

In one possible embodiment, the recognition unit 43 is further configured to extract semantic features of the target image, and to obtain the character recognition result of the target image based on the semantic features of the target image and the character features.

In one possible embodiment, the recognition unit 43 is further configured to sequentially determine the semantic features of the target image at at least one time step based on acquired second preset information, and to obtain the character recognition result of the target image at the at least one time step based on the semantic features of the target image at the at least one time step and the character features.

In one possible embodiment, the recognition unit 43 is further configured to perform at least one level of third coding processing on the second preset information to obtain the semantic feature of the first of the at least one time step, and to perform at least one level of third coding processing on the character recognition result of the target image at the (k-1)-th time step (k being an integer greater than 1) to obtain the semantic feature of the target image at the k-th time step.

It can be understood that, in the embodiments of the present invention and in other embodiments, a "part" (unit) may be a part of a circuit, a part of a processor, a part of a program or software, or the like; it may of course also be a unit, and may be modular or non-modular.

In some embodiments, the functions or modules of the device provided in the embodiments of the present invention can be configured to perform the methods described in the method embodiments above. For their specific implementation, refer to the description of those method embodiments; for brevity, it is not repeated here.

FIG. 5 is a block diagram of a character recognition device 800 according to an exemplary embodiment. For example, the device 800 may be a mobile phone, a computer, a digital broadcasting terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, a personal digital assistant, or the like.

Referring to FIG. 5, the device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.

The processing component 802 generally controls the overall operation of the device 800, such as operations related to display, telephone calls, data communication, camera operation and recording. The processing component 802 may include one or more processors 820 that execute instructions to complete all or some of the steps of the above method. In addition, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module that facilitates interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support operation of the device 800. Examples of such data include instructions for any application or method running on the device 800, contact data, phone book data, messages, photos, videos, and so on. The memory 804 can be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.

The power component 806 supplies power to the various components of the device 800. The power component 806 may include a power management system, one or more power sources, and other components related to generating, managing and distributing power for the device 800.

The multimedia component 808 includes a screen that provides an output interface between the device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen can be implemented as a touch screen that receives input signals from the user. The touch panel includes one or more touch sensors for detecting touches, swipes and gestures on the touch panel. The touch sensors can detect not only the boundary of a touch or swipe operation but also the duration and pressure associated with it. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the device 800 is in an operating mode such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or may have focal length and optical zoom capability.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC), which is configured to receive external audio signals when the device 800 is in an operating mode such as a call mode, a recording mode or a voice recognition mode. The received audio signal may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, the audio component 810 further includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. The buttons may include, but are not limited to, a home button, a volume button, a start button and a lock button.

The sensor component 814 includes one or more sensors that provide status assessments of various aspects of the device 800. For example, the sensor component 814 can detect the on/off state of the device 800 and the relative positioning of components such as the display and keypad of the device 800. The sensor component 814 can also detect a change in position of the device 800 or of one of its components, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in the temperature of the device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without physical contact. The sensor component 814 may further include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.

The communication component 816 is configured to facilitate wired or wireless communication between the device 800 and other devices. The device 800 can access a wireless network based on a communication standard such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module can be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.

In an exemplary embodiment, the device 800 can be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic elements configured to perform the above method.

In an exemplary embodiment, a computer-readable storage medium is further provided, such as the memory 804 containing computer program instructions; the computer program instructions can be executed by the processor 820 of the device 800 to carry out the above method.

An embodiment of the present invention further proposes an electronic device that includes a processor and a memory configured to store processor-executable instructions, wherein the processor is configured to perform the above method by calling and executing the instructions stored in the memory.

The electronic device may be provided as a terminal, a server or a device in another form.

FIG. 6 is a block diagram of an electronic device 1900 according to an exemplary embodiment. For example, the electronic device 1900 can be provided as a server. Referring to FIG. 6, the electronic device 1900 includes a processing component 1922, which includes one or more processors, and memory resources, represented by a memory 1932, configured to store instructions (such as applications) executable by the processing component 1922. An application stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. The processing component 1922 is configured to perform the above method by executing the instructions.

The electronic device 1900 may further include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 can operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or the like.

In an exemplary embodiment, a computer-readable storage medium is further provided, such as the memory 1932 containing computer program instructions; the computer program instructions can be executed by the processing component 1922 of the electronic device 1900 to complete the above method.

The present invention may be a system, a method and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions that cause a processor to implement various aspects of the embodiments of the present invention.

The computer-readable storage medium may be a tangible device that can hold and store instructions used by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanical encoding device such as a punch card or a raised structure in a groove on which instructions are stored, and any suitable combination of the foregoing. A computer-readable storage medium as used here is not to be interpreted as a transitory signal itself, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.

The computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to the respective computing/processing devices, or downloaded to an external computer or external storage device via a network such as the Internet, a local area network, a wide area network and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.

The computer program instructions for performing the operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++ and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field programmable gate array (FPGA) or a programmable logic array (PLA), is customized by using state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions, thereby implementing aspects of the present invention.

Aspects of the present invention are described here with reference to flowcharts and/or block diagrams of the method, apparatus (system) and computer program product according to the embodiments of the present invention. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer or another programmable data processing apparatus, so that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/operations specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium and cause a computer, a programmable data processing apparatus and/or other devices to operate in a particular manner. The computer-readable medium storing the instructions thus includes an article of manufacture containing instructions that implement aspects of the functions/operations specified in one or more blocks of the flowcharts and/or block diagrams.

The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus or another device, causing a series of operational steps to be performed on the computer, other programmable data processing apparatus or other device to produce a computer-implemented process, so that the instructions executed on the computer, other programmable data processing apparatus or other device implement the functions/operations specified in one or more blocks of the flowcharts and/or block diagrams.

The flowcharts and block diagrams in the accompanying drawings show possible implementation architectures, functions and operations of systems, methods and computer program products according to several embodiments of the present invention. In this regard, each block in a flowchart or block diagram can represent a module, a program segment or a portion of instructions that contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in an order different from that shown in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. Note also that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The embodiments of the present invention have been described above. The description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application or their technical improvement over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

In the embodiments of the present invention, a target image to be recognized is acquired; character features of the target image are then acquired based on a determined position vector and a first image feature of the target image; and characters in the target image are then recognized based on the character features to obtain a character recognition result of the target image. Here, the position vector is determined based on the position features of characters in a preset information sequence and can represent the position features between characters. In the character recognition process, this increases the influence of inter-character position features on the character recognition result, reduces the dependence of the recognition process on semantic features, and improves the accuracy of character recognition.
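As a minimal sketch of how a position vector and image features might be combined into character features, the following Python code weights the first image feature with attention weights computed from the position vector and the second image feature, in the manner the claims describe. The dot-product scoring, the softmax normalization, and every function and parameter name are assumptions made for illustration; the embodiments do not prescribe them.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def character_features(
    position_vectors: Sequence[Vector],
    second_image_feature: Sequence[Vector],
    first_image_feature: Sequence[Vector],
) -> List[Vector]:
    """Compute one character feature per position vector.

    Each position vector is scored against every location of the second
    image feature (dot product, an assumption), the scores are normalized
    into attention weights, and the weights are used to take a weighted
    sum over the locations of the first image feature.
    """
    dim = len(first_image_feature[0])
    features: List[Vector] = []
    for position_vector in position_vectors:
        weights = softmax(
            [dot(position_vector, key) for key in second_image_feature]
        )
        features.append(
            [
                sum(w * value[d] for w, value in zip(weights, first_image_feature))
                for d in range(dim)
            ]
        )
    return features
```

Because the attention weights here depend only on the position vectors and the image features, not on previously recognized characters, the sketch reflects the stated aim of reducing dependence on semantic features.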

Claims

A character recognition method, comprising:
acquiring a target image to be recognized;
acquiring character features of the target image based on a determined position vector and a first image feature of the target image, wherein the position vector is determined based on position features of characters in a preset information sequence; and
recognizing characters in the target image based on the character features to obtain a character recognition result of the target image.

The character recognition method according to claim 1, wherein acquiring the character features of the target image based on the determined position vector and the first image feature of the target image comprises:
encoding the first image feature of the target image to obtain an encoding result of the first image feature;
determining a second image feature of the target image according to the encoding result of the first image feature; and
acquiring the character features of the target image based on the determined position vector, the first image feature and the second image feature.

The character recognition method according to claim 2, wherein encoding the first image feature of the target image to obtain the encoding result of the first image feature comprises:
sequentially performing at least one level of first encoding processing on a plurality of first-dimension feature vectors of the first image feature to obtain the encoding result of the first image feature.

The character recognition method according to claim 3, wherein sequentially performing at least one level of first encoding processing on the plurality of first-dimension feature vectors of the first image feature to obtain the encoding result of the first image feature comprises:
for one level of first encoding processing among the at least one level of first encoding processing, sequentially encoding input information of N first encoding nodes by using the N first encoding nodes to obtain output results of the N first encoding nodes, wherein N is a positive integer and, when 1 < i ≤ N, the input information of the i-th first encoding node includes the output result of the (i-1)-th first encoding node, i being a positive integer; and
obtaining the encoding result of the first image feature according to the output results of the N first encoding nodes.

The character recognition method according to claim 4, wherein the input information of the first encoding node further includes a first-dimension feature vector of the first image feature or an output result of the first encoding processing of the previous level.

The character recognition method according to any one of claims 2 to 5, wherein acquiring the character features of the target image based on the determined position vector, the first image feature and the second image feature comprises:
determining attention weights according to the position vector and the second image feature; and
weighting the first image feature with the attention weights to obtain the character features of the target image.

The character recognition method according to any one of claims 1 to 6, further comprising:
acquiring a preset information sequence that includes at least one piece of first preset information; and
sequentially performing at least one level of second encoding processing on the at least one piece of first preset information to obtain the position vector.

The character recognition method according to claim 7, wherein sequentially performing at least one level of second encoding processing on the at least one piece of first preset information to obtain the position vector comprises:
for one level of second encoding processing among the at least one level of second encoding processing, sequentially encoding input information of M second encoding nodes by using the M second encoding nodes to obtain output results of the M second encoding nodes, wherein M is a positive integer and, when 1 < j ≤ M, the input information of the j-th second encoding node includes the output result of the (j-1)-th second encoding node, j being a positive integer; and
obtaining the position vector according to the output results of the M second encoding nodes.

The character recognition method according to claim 8, wherein the input information of the second encoding node further includes the first preset information or an output result of the second encoding processing of the previous level.

The character recognition method according to any one of claims 1 to 9, wherein recognizing characters in the target image based on the character features to obtain the character recognition result of the target image comprises:
extracting semantic features of the target image; and
obtaining the character recognition result of the target image based on the semantic features of the target image and the character features.

The character recognition method according to claim 10, wherein extracting the semantic features of the target image comprises:
sequentially determining the semantic feature of the target image at at least one time step based on acquired second preset information; and
wherein obtaining the character recognition result of the target image based on the semantic features of the target image and the character features comprises:
obtaining the character recognition result of the target image at the at least one time step based on the semantic feature of the target image at the at least one time step and the character features.

The character recognition method according to claim 11, wherein sequentially determining the semantic feature of the target image at at least one time step based on the acquired second preset information comprises:
performing at least one level of third encoding processing on the second preset information to obtain the semantic feature of the first time step among the at least one time step; and
performing at least one level of third encoding processing on the character recognition result of the target image at the (k-1)-th time step, where k is an integer greater than 1, to obtain the semantic feature of the target image at the k-th time step.

A character recognition device, comprising:
an acquisition unit configured to acquire a target image to be recognized;
a determination unit configured to acquire character features of the target image based on a determined position vector and a first image feature of the target image, wherein the position vector is determined based on position features of characters in a preset information sequence; and
a recognition unit configured to recognize characters in the target image based on the character features to obtain a character recognition result of the target image.

An electronic device, comprising:
a processor; and
a memory configured to store processor-executable instructions,
wherein the processor is configured to perform the method according to any one of claims 1 to 12 by calling and executing the instructions stored in the memory.

A computer-readable storage medium storing computer program instructions,
wherein the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 12.

A computer program comprising computer-readable code,
wherein, when the computer-readable code is executed in an electronic device, a processor of the electronic device performs the method according to any one of claims 1 to 12.