JP7429307B2

JP7429307B2 - Character string recognition method, device, equipment and medium based on computer vision

Info

Publication number: JP7429307B2
Application number: JP2022564797A
Authority: JP
Inventors: 志成楊; 睿宇李
Original assignee: Shenzhen Smartmore Technology Co Ltd
Current assignee: Shenzhen Smartmore Technology Co Ltd
Priority date: 2020-07-03
Filing date: 2021-07-02
Publication date: 2024-02-07
Anticipated expiration: 2041-07-02
Also published as: WO2022002262A1; CN111832561B; JP2023523745A; CN111832561A

Description

The present application relates to a computer vision-based character string recognition method, apparatus, computer device, and storage medium.

This application claims priority to Chinese patent application No. 202010630553.0, filed on July 3, 2020 and entitled "Character string recognition method, apparatus, device, and medium based on computer vision", the disclosure of which is incorporated herein by reference in its entirety.

With the development of computer vision technology, character string recognition has become one of its practical everyday applications; in industrial scenes, for example, it is used to recognize character strings such as serial numbers, manufacturing dates, stencils, and inscriptions. A character string is generally recognized in one of three ways. In the first, the position of the character string is detected, the string at the detected position is cropped, and angle determination and recognition are performed on the cropped string image to obtain the corresponding text content. In the second, the character string is treated as a special target, detected by a classifier, and aggregated into a single word based on a model of the image structure. In the third, a neural network algorithm builds a matching relationship between image features and string positions on the one hand and the corresponding content on the other.

According to a plurality of embodiments, a first aspect of the present application provides a computer vision-based character string recognition method, including:
acquiring an image bearing a character string to be recognized;
acquiring, based on a pre-built position detection model, a target area image in which the character string to be recognized is located within the image;
performing horizontal correction on the target area image to obtain a horizontal target area image;
acquiring, based on a pre-built angle determination model, the standing state of the character string in the horizontal target area image; and
when the standing state of the character string is upright, inputting the horizontal target area image into a pre-built content recognition model to obtain the character string content corresponding to the character string to be recognized.

According to a plurality of embodiments, a second aspect of the present application provides a computer vision-based character string recognition apparatus, including:
an image acquisition module that acquires an image bearing a character string to be recognized;
a position detection module that acquires, based on a pre-built position detection model, a target area image in which the character string to be recognized is located within the image;
a horizontal correction module that performs horizontal correction on the target area image to obtain a horizontal target area image;
an angle determination module that acquires, based on a pre-built angle determination model, the standing state of the character string in the horizontal target area image; and
a content recognition module that, when the standing state of the character string is upright, inputs the horizontal target area image into a pre-built content recognition model to obtain the character string content corresponding to the character string to be recognized.

According to a plurality of embodiments, a third aspect of the present application provides a computer device including a memory in which a computer program is stored and a processor that implements the steps of the above method when executing the computer program.

According to a plurality of embodiments, a fourth aspect of the present application provides a computer-readable storage medium storing a computer program that implements the steps of the above method when executed by a processor.

The details of one or more embodiments of the present application are set forth in the drawings and description below. Other features and advantages of the present application will be apparent from the specification, drawings, and claims.

To explain the embodiments of the present application or the technical solutions of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application, and a person skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic flow diagram of a computer vision-based character string recognition method in one embodiment.
FIG. 2 is a schematic flow diagram of acquiring the standing state of the character string in the horizontal target area image based on the pre-built angle determination model in one embodiment.
FIG. 3 is a schematic flow diagram of acquiring the target area image in which the character string to be recognized is located within the image based on the pre-built position detection model in one embodiment.
FIG. 4 is a schematic flow diagram of inputting the horizontal target area image into the pre-built content recognition model and acquiring the character string content corresponding to the character string to be recognized in one embodiment.
FIG. 5 is a schematic flow diagram of a computer vision-based character string recognition method in another embodiment.
FIG. 6 is a schematic flow diagram of algorithm training and prediction in one application example.
FIG. 7 is a structural schematic diagram of an image feature pyramid in one application example.
FIG. 8 is a schematic flow diagram of a character string angle determination algorithm in one application example.
FIG. 9 is a schematic flow diagram of a character string content recognition algorithm in one application example.
FIG. 10 is a structural block diagram of a computer vision-based character string recognition apparatus in one embodiment.
FIG. 11 is an internal structural diagram of a computer device in one embodiment.

Current character string recognition methods are all based on low-dimensional hand-crafted features and cannot adapt to changes in the image capture angle in industrial scenes, so their recognition accuracy for character strings is low.

To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the drawings and embodiments. The specific embodiments described here serve only to explain the present application and are not intended to limit it.

In one embodiment, as shown in FIG. 1, a computer vision-based character string recognition method is provided. This embodiment is described using the example of the method being applied to a terminal, but it is understood that the method may also be applied to a server, or to a system comprising a terminal and a server and implemented through interaction between the terminal and the server. In this embodiment, the method includes steps S101 to S105.

In step S101, the terminal acquires an image bearing a character string to be recognized.

Here, the character string to be recognized is the character string that the user needs to obtain from the image, and the image may be one captured in an industrial scene. Specifically, the user may use a mobile phone camera, a video capture device, or the like to record images bearing the character string to be recognized in various scenes and store them on the terminal, so that the terminal obtains the image bearing the character string to be recognized.

In step S102, the terminal acquires, based on a pre-built position detection model, the target area image in which the character string to be recognized is located within the image.

Here, the position detection model mainly serves to detect the region of the image occupied by the characters to be recognized, and the target area image is the image of the region that the character string to be recognized occupies in the image. Specifically, the terminal may use the pre-built position detection model to perform string position detection on the image bearing the character string to be recognized, and thereby determine the target area image in which the string is located.

In step S103, the terminal performs horizontal correction on the target area image to obtain a horizontal target area image.

Users generally photograph the character string to be recognized from a variety of shooting angles, so in the image obtained by the terminal the string is often not laid out horizontally but lies at some angle to the horizontal. To improve recognition accuracy, after obtaining the target area image in step S102 the terminal therefore needs to correct it horizontally to obtain a horizontal target area image, in which the character string to be recognized is laid out horizontally. Specifically, the terminal may perform the horizontal correction by applying an affine transformation to the target area image.
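The horizontal correction can be illustrated with a small sketch. The source only states that an affine transformation is applied; the choice of a pure rotation about the midpoint of the text baseline, and the names `rotation_about`, `apply_affine`, and `horizontal_correction_matrix`, are assumptions made for illustration.

```python
import math

def rotation_about(center, angle_deg):
    """Return a 2x3 affine matrix rotating points by angle_deg about center."""
    cx, cy = center
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    # p' = R (p - c) + c, written out as a 2x3 matrix acting on (x, y, 1).
    return [
        [cos_a, -sin_a, cx - cos_a * cx + sin_a * cy],
        [sin_a, cos_a, cy - sin_a * cx - cos_a * cy],
    ]

def apply_affine(matrix, point):
    """Apply a 2x3 affine matrix to a single (x, y) point."""
    x, y = point
    return (
        matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
        matrix[1][0] * x + matrix[1][1] * y + matrix[1][2],
    )

def horizontal_correction_matrix(p_start, p_end):
    """Affine matrix that makes the baseline p_start -> p_end horizontal.

    Hypothetical helper: the baseline endpoints are assumed to come from
    the detected text region (for example, the long side of its rectangle).
    """
    dx, dy = p_end[0] - p_start[0], p_end[1] - p_start[1]
    tilt = math.degrees(math.atan2(dy, dx))
    center = ((p_start[0] + p_end[0]) / 2.0, (p_start[1] + p_end[1]) / 2.0)
    return rotation_about(center, -tilt)

# A baseline tilted by 45 degrees ends up with both endpoints on one row.
m = horizontal_correction_matrix((0.0, 0.0), (10.0, 10.0))
start, end = apply_affine(m, (0.0, 0.0)), apply_affine(m, (10.0, 10.0))
```

A correction of this kind removes the tilt but cannot tell whether the string ends up upright or upside down, which is why a separate standing-state check follows.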

In step S104, the terminal acquires, based on a pre-built angle determination model, the standing state of the character string in the horizontal target area image.

After the terminal completes the horizontal correction of the target area image in step S103, the character string in the resulting horizontal target area image may be either upright or inverted, depending on the angle at which the user originally captured the image. If it is inverted, the deviation in standing state affects the final recognition result. After obtaining the horizontal target area image, the terminal therefore needs to determine the standing state of the character string in it. Specifically, the terminal may do so by inputting the horizontal target area image into the pre-built angle determination model.

In step S105, when the standing state of the character string is upright, the terminal inputs the horizontal target area image into a pre-built content recognition model and acquires the character string content corresponding to the character string to be recognized.

If the terminal determines that the standing state of the character string is upright, it may input the horizontal target area image directly into the pre-built content recognition model. The content recognition model mainly serves to recognize the content of the character string in the target area image, so the terminal can use it to obtain the character string content corresponding to the character string to be recognized.

In the computer vision-based character string recognition method above, the terminal acquires an image bearing a character string to be recognized; acquires, based on a pre-built position detection model, the target area image in which the string is located; corrects the target area image horizontally to obtain a horizontal target area image; acquires, based on a pre-built angle determination model, the standing state of the character string in the horizontal target area image; and, when the standing state is upright, inputs the horizontal target area image into a pre-built content recognition model to obtain the corresponding character string content. Because the terminal corrects the target area image horizontally, the method adapts to changes in the image capture angle in industrial scenes, which improves the recognition accuracy for character strings.

In one embodiment, as shown in FIG. 2, step S104 includes steps S201 and S202.

In step S201, the terminal acquires the standing angle of the horizontal target area image based on the angle determination model.

Here, the angle determination model mainly serves to determine the angle of the horizontal target area image. Because the standing state of the character string depends chiefly on the angle at which the user originally captured the image, the terminal may use this model to determine the standing angle of the horizontal target area image and then use the standing angle to determine the standing state of the character string.

In step S202, the terminal determines the standing state of the character string from the standing angle interval to which the standing angle belongs.

To tolerate slight deviations between the standing angle of the horizontal target area image obtained by the terminal and the standard horizontal angle, after determining the standing angle with the angle determination model in step S201, the terminal may select, from a preset table of standing angle intervals, the interval that matches the standing angle as the interval to which it belongs, and use that interval to determine the standing state of the character string.

Further, the standing angle intervals may include a first angle interval and a second angle interval, and the standing state of the character string may be either upright or inverted. Step S202 may further include: when the standing angle interval is the first angle interval, the terminal determines that the standing state of the character string is upright; and when the standing angle interval is the second angle interval, the terminal determines that the standing state of the character string is inverted.

Here, the first angle interval and the second angle interval are two distinct angle intervals that represent the two standing states of the character string. Specifically, when the standing angle of the horizontal target area image obtained by the terminal falls in the first angle interval, the terminal may determine that the horizontal target area image is upright; when it falls in the second angle interval, the terminal may determine that the horizontal target area image is inverted.
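The mapping from a standing angle to a standing state can be sketched as follows. The source does not give the bounds of the two intervals; the choice of [-90°, 90°) as the first interval and the remainder of the circle as the second is an assumption, as are the names `normalize_angle` and `standing_state`.

```python
UPRIGHT = "upright"
INVERTED = "inverted"

def normalize_angle(angle_deg):
    """Map any angle in degrees to the range [-180, 180)."""
    return (angle_deg + 180.0) % 360.0 - 180.0

def standing_state(angle_deg):
    """Classify a standing angle by the interval it falls into.

    Assumed interval table (not specified in the source):
      first interval  [-90, 90)          -> upright
      second interval everything else    -> inverted
    """
    angle = normalize_angle(angle_deg)
    return UPRIGHT if -90.0 <= angle < 90.0 else INVERTED

# Small deviations from 0 or 180 degrees still land in the expected interval.
states = [standing_state(a) for a in (3.5, -12.0, 178.0, -179.0)]
```

Using intervals rather than exact angles means that an estimate such as 3.5° is still treated as upright, which matches the stated goal of tolerating slight deviations.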

When the standing state of the character string is inverted, the horizontal target area image is rotated to the upright state and then input into the content recognition model to obtain the character string content.

If the terminal input an inverted horizontal target area image into the content recognition model as is, the character string content produced by the model could differ from the actual character content. Before the horizontal target area image is input into the content recognition model, it therefore needs to be rotated into the upright state. For example, the terminal may rotate the horizontal target area image 180° about its center to make it upright, input the rotated image into the content recognition model, and obtain the character string content of the character string to be recognized.
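For an image stored as a list of pixel rows, a 180° rotation about the center amounts to reversing the row order and reversing each row. The sketch below assumes that representation; `rotate_180` and `to_upright` are illustrative names, not functions from the source.

```python
def rotate_180(image):
    """Rotate an image (list of pixel rows) by 180 degrees about its center."""
    return [row[::-1] for row in image[::-1]]

def to_upright(image, is_inverted):
    """Return the image in the upright state, rotating only when needed."""
    return rotate_180(image) if is_inverted else image

inverted = [
    [6, 5, 4],
    [3, 2, 1],
]
upright = to_upright(inverted, is_inverted=True)
# upright == [[1, 2, 3], [4, 5, 6]]
```

Rotating twice returns the original image, so applying the correction to an image that was misjudged as inverted is reversible.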

In the above embodiment, the terminal may use the angle determination model to obtain the standing angle of the horizontal target area image and determine the standing state of the character string. When the standing state is inverted, the terminal may rotate the horizontal target area image into the upright state and input the upright image into the content recognition model to obtain the character string content, which helps further improve the accuracy of the result.

In one embodiment, as shown in FIG. 3, step S102 includes steps S301 to S303.

In step S301, the terminal uses the position detection model to extract character area image features from the image.

Here, character area image features are the image features used to determine the position of the character string. Specifically, the terminal may use the position detection model to extract these features from the acquired image bearing the character string to be recognized.

In step S302, the terminal acquires a prediction mask of the target area image according to the character area image features.

Here, a mask is a selected image, figure, or object used to occlude the image being processed, either globally or locally, and thereby control the region or process of image processing. Specifically, the terminal may use the character area image features to obtain the corresponding prediction mask.

In step S303, the terminal computes the connected components and the minimum bounding rectangle of the prediction mask to obtain the target area image.

After obtaining the prediction mask of the target area image in step S302, the terminal may compute the connected components and the minimum bounding rectangle of the mask to obtain the target area image.
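The post-processing of the prediction mask can be sketched in plain Python. This is a simplified illustration under stated assumptions: the mask is a binary grid, components are 4-connected, and an axis-aligned bounding box stands in for the minimum bounding rectangle, which in general may be rotated. The names `connected_components` and `bounding_box` are hypothetical.

```python
from collections import deque

def connected_components(mask):
    """Return the 4-connected foreground regions of a binary mask.

    mask is a list of rows containing 0 or 1; each region is returned
    as a list of (row, col) pixel coordinates.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components

def bounding_box(pixels):
    """Axis-aligned box (x_min, y_min, x_max, y_max) around a pixel region."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return (min(xs), min(ys), max(xs), max(ys))

mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
boxes = [bounding_box(region) for region in connected_components(mask)]
# boxes == [(1, 0, 2, 1), (4, 2, 4, 3)]
```

Each box delimits one candidate text region that can be cropped from the original image. For tilted text, a rotated minimum-area rectangle would keep less background than the axis-aligned box used here.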

Further, the image bearing the character string to be recognized may lack sharpness or may have been captured under very low illumination, either of which can make recognition accuracy too low. To avoid this problem, in one embodiment step S301 may further include: the terminal preprocesses the image and extracts high-dimensional image features from the preprocessed image; and the terminal uses an image feature pyramid to perform first feature enhancement processing on the high-dimensional image features and takes the result as the character area image features.

Here, preprocessing may consist of the terminal filtering out small or hard-to-see string region images from the image bearing the character string to be recognized, so that high-dimensional image features can be extracted from the image. The terminal may then use the image feature pyramid to perform the first feature enhancement processing on the extracted high-dimensional image features. This helps improve the representational power of the character area image features, so that an accurate prediction mask of the target area image can be generated even when features are indistinct.
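The source does not describe the internal structure of the image feature pyramid in this passage. As a rough illustration, the sketch below assumes the common top-down design, in which each coarser feature map is upsampled and added to the next finer one, and it omits the learned convolutions a real network would use. The names `upsample2x`, `add_maps`, and `top_down_fusion` are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbor 2x upsampling of a 2D feature map (list of rows)."""
    out = []
    for row in fmap:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def add_maps(a, b):
    """Element-wise sum of two feature maps of equal size."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def top_down_fusion(levels):
    """Fuse feature maps ordered fine to coarse, each half the previous size.

    The coarsest map is kept as is; every finer map receives the upsampled
    fused map from the level above it. Returns fused maps, fine to coarse.
    """
    fused = [levels[-1]]
    for fmap in reversed(levels[:-1]):
        fused.append(add_maps(fmap, upsample2x(fused[-1])))
    fused.reverse()
    return fused

levels = [
    [[1, 1, 1, 1] for _ in range(4)],  # fine level, 4x4
    [[1, 2], [3, 4]],                  # middle level, 2x2
    [[10]],                            # coarse level, 1x1
]
fused = top_down_fusion(levels)
# fused[1] == [[11, 12], [13, 14]]
```

The point of the fusion is that the fine map keeps its spatial detail while gaining context from the coarse levels, which is one way such a pyramid can strengthen weak text features.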

In the above embodiment, the terminal may extract character area image features from the image, generate the corresponding prediction mask, and compute the connected components and minimum bounding rectangle of the mask to obtain an accurate target area image. In addition, to avoid missed or incorrect recognition of character strings caused by indistinct features, the terminal may use the image feature pyramid to perform the first feature enhancement processing on the extracted image features. This improves the representational power of the character area image features and thereby further improves the accuracy of character string recognition.

In one embodiment, as shown in FIG. 4, step S105 includes steps S401 to S403.

In step S401, the terminal uses the content recognition model to perform global image feature extraction on the horizontal target area image and obtains the character string image features corresponding to the horizontal target area image.

Here, the content recognition model mainly serves to recognize the character content of the character string to be recognized contained in the horizontal target area image. Specifically, the terminal may use the content recognition model to perform global image feature extraction on the horizontal target area image and obtain the corresponding character string image features.

In step S402, the terminal uses a row-vector convolution kernel to perform second feature enhancement processing on the character string image features along the horizontal direction.

Here, the second feature enhancement processing is feature enhancement applied to the character string image features. Specifically, after obtaining the character string image features in step S401, the terminal may use a row-vector convolution kernel to perform the second feature enhancement processing on them along the horizontal direction, that is, along the direction of the character string.
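A row-vector kernel has shape 1 x k, so it mixes each feature only with its left and right neighbors on the same row. The sketch below illustrates this on a 2D feature map with zero padding; the kernel values, the padding scheme, and the name `conv_rows` are assumptions, since the source does not specify them.

```python
def conv_rows(fmap, kernel):
    """Slide a 1 x k kernel along every row of a 2D feature map.

    Zero padding keeps the output the same width as the input. As in
    most deep learning libraries, the kernel is applied without flipping
    (cross-correlation). Rows never mix, so only horizontal context is
    aggregated.
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd for symmetric padding")
    pad = k // 2
    out = []
    for row in fmap:
        padded = [0.0] * pad + list(row) + [0.0] * pad
        out.append([
            sum(kernel[j] * padded[i + j] for j in range(k))
            for i in range(len(row))
        ])
    return out

features = [
    [1, 2, 3, 4],
    [0, 0, 5, 0],
]
enhanced = conv_rows(features, [1, 1, 1])
# enhanced == [[3.0, 6.0, 9.0, 7.0], [0.0, 5.0, 5.0, 5.0]]
```

Because a horizontally corrected string runs along the row direction, a kernel of this shape gathers context from neighboring characters without blending in rows above or below the text line.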

In step S403, the terminal predicts the character string to be recognized in parallel, based on the character string image features obtained from the second feature enhancement processing, and obtains the character string content.

To further improve the efficiency of character string recognition, the terminal may recognize the character string content from the character string image features obtained from the second feature enhancement processing. The recognition process uses parallel prediction, which can predict multiple character strings at once and therefore predicts character string content efficiently.
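One way to picture parallel prediction is a decoder in which every character slot is predicted independently of the others, rather than one after another. The sketch below assumes a fixed number of slots, a per-slot score vector over a character set, and a padding class for unused slots; none of these details come from the source, and `CHARSET`, `PAD`, and `decode_parallel` are hypothetical names.

```python
CHARSET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-"
PAD = len(CHARSET)  # extra class index meaning "no character in this slot"

def argmax(values):
    """Index of the largest value in a sequence."""
    return max(range(len(values)), key=values.__getitem__)

def decode_parallel(scores):
    """Decode per-slot class scores into a string.

    scores holds one score vector of length len(CHARSET) + 1 per slot.
    No slot depends on the decision made for another slot, so all slots
    could be evaluated at the same time.
    """
    indices = [argmax(slot) for slot in scores]
    return "".join(CHARSET[i] for i in indices if i != PAD)

def one_hot(index):
    """Build a score vector that strongly prefers one class (demo helper)."""
    return [1.0 if i == index else 0.0 for i in range(len(CHARSET) + 1)]

scores = [one_hot(CHARSET.index(ch)) for ch in "SN-42"] + [one_hot(PAD)] * 3
text = decode_parallel(scores)
# text == "SN-42"
```

In contrast to an autoregressive decoder, which must wait for character n before predicting character n + 1, this arrangement has no sequential dependency between slots.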

In this embodiment, the terminal can recognize the content of the character string accurately with the content recognition model, and the second feature enhancement processing applied to the character string image features improves their representational power, which improves the accuracy of character string content recognition. Predicting all character strings with the parallel prediction method further improves the efficiency of character string content recognition.

In one embodiment, as shown in FIG. 5, a computer vision-based character string recognition method is provided. This embodiment is described using the example of the method being applied to a terminal. In this embodiment, the method includes steps S501 to S510.

ステップＳ５０１において、端末は、認識対象文字列が付いた画像を取得する。 In step S501, the terminal acquires an image with a recognition target character string attached.

ステップＳ５０２において、端末は、画像を前処理し、前処理後の画像から高次元画像特徴を抽出し、画像特徴ピラミッドを利用して、高次元画像特徴に対して第１特徴強調処理を行い、文字領域画像特徴とする。 In step S502, the terminal preprocesses the image, extracts high-dimensional image features from the pre-processed image, and performs a first feature enhancement process on the high-dimensional image features using an image feature pyramid; Let the character area be an image feature.

ステップＳ５０３において、端末は、文字領域画像特徴に従って、目標領域画像の予測マスクを取得し、予測マスクについて連通ドメイン及び最小外接矩形を求め、目標領域画像を得る。 In step S503, the terminal obtains a prediction mask of the target area image according to the character area image characteristics, determines a continuous domain and a minimum circumscribed rectangle for the prediction mask, and obtains a target area image.

ステップＳ５０４において、端末は、目標領域画像を横方向補正して、横方向の目標領域画像を得る。 In step S504, the terminal corrects the target area image in the horizontal direction to obtain a horizontal target area image.
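The horizontal correction of step S504 can be pictured as a rotation about the centre of the detected box. The sketch below works on the four corner coordinates only, whereas the method itself applies an affine transformation to the image; the function name `rotate_to_horizontal` and the convention that the first two corners span the text direction are assumptions made for illustration.

```python
import math


def rotate_to_horizontal(corners):
    """Rotate four ordered box corners about their centroid so the first edge becomes horizontal."""
    (x0, y0), (x1, y1) = corners[0], corners[1]
    angle = math.atan2(y1 - y0, x1 - x0)  # tilt of the first edge, in radians
    cx = sum(x for x, _ in corners) / 4.0
    cy = sum(y for _, y in corners) / 4.0
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    rotated = []
    for x, y in corners:
        dx, dy = x - cx, y - cy
        # Standard 2-D rotation by -angle around the centroid.
        rotated.append((cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a))
    return math.degrees(angle), rotated


# A square tilted by 45 degrees.
tilt, flat = rotate_to_horizontal([(1, 0), (2, 1), (1, 2), (0, 1)])
```

After the rotation the first edge has equal y coordinates at both ends, which is the property the horizontal target area image relies on.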

ステップＳ５０５において、端末は、角度判定モデルに基づいて、横方向の目標領域画像の起立角度を取得する。 In step S505, the terminal acquires the standing angle of the horizontal target area image based on the angle determination model.

ステップＳ５０６において、起立角度区間が前記第１角度区間である場合、端末は、文字列の起立状態を正立状態として決定し、起立角度区間が第２角度区間である場合、端末は、文字列の起立状態を倒立状態として決定する。 In step S506, if the standing angle section is the first angle section, the terminal determines the standing state of the character string as the upright state, and if the standing angle section is the second angle section, the terminal determines the standing state of the character string as the inverted state.

ステップＳ５０７において、文字列の起立状態が正立状態である場合、端末は、予め構築されたコンテンツ認識モデルに横方向の目標領域画像を入力し、文字列の起立状態が倒立状態である場合、端末は、横方向の目標領域画像を正立状態に回転させてコンテンツ認識モデルに入力する。 In step S507, if the standing state of the character string is an upright state, the terminal inputs the horizontal target area image to a pre-built content recognition model, and if the standing state of the character string is an inverted state, The terminal rotates the horizontal target area image into an upright state and inputs it to the content recognition model.
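Steps S506 and S507 amount to a two-way decision followed by an optional 180-degree rotation. The sketch below is one possible reading: the bounds of the first angle section are not given in this passage, so the default `(-90.0, 90.0)` is purely an assumption, as are the function names.

```python
def standing_state(angle_deg, first_section=(-90.0, 90.0)):
    """Map a standing angle to 'upright' (first section) or 'inverted' (second section)."""
    lo, hi = first_section  # assumed bounds; the text does not specify them here
    normalized = (angle_deg + 180.0) % 360.0 - 180.0  # fold into [-180, 180)
    return "upright" if lo <= normalized < hi else "inverted"


def rotate_180(image):
    """Rotate a 2-D image (list of rows) by 180 degrees about its centre."""
    return [row[::-1] for row in image[::-1]]


def prepare_for_recognition(image, angle_deg):
    """Return the image unchanged if upright, otherwise rotated to the upright state."""
    if standing_state(angle_deg) == "inverted":
        return rotate_180(image)
    return image


crop = [[1, 2], [3, 4]]
ready = prepare_for_recognition(crop, 170.0)
```

Only the inverted branch touches the pixels, which mirrors the text: an upright crop is passed to the content recognition model as it is.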

ステップＳ５０８において、端末は、コンテンツ認識モデルを利用して、横方向の目標領域画像に対してグローバル画像特徴抽出を行い、横方向の目標領域画像に対応する文字列画像特徴を得る。 In step S508, the terminal performs global image feature extraction on the horizontal target area image using the content recognition model to obtain character string image features corresponding to the horizontal target area image.

ステップＳ５０９において、端末は、行ベクトル畳み込みカーネルを用いて横方向に沿って文字列画像特徴に対して第２特徴強調処理を行う。 In step S509, the terminal performs second feature enhancement processing on the character string image feature along the horizontal direction using the row vector convolution kernel.

ステップＳ５１０において、端末は、第２特徴強調処理により得られた文字列画像特徴に基づいて、認識対象文字列を並列予測して、文字列コンテンツを得る。 In step S510, the terminal performs parallel prediction of the recognition target character string based on the character string image features obtained by the second feature enhancement process to obtain character string content.
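To make steps S509 and S510 concrete, the following sketch applies a 1 x k row-vector kernel along the horizontal direction of a 2-D feature map and then decodes every horizontal position independently. It is a toy under stated assumptions: zero padding, an odd kernel length, cross-correlation as used in most deep learning frameworks, and an argmax per position standing in for the parallel prediction, whose actual form is not detailed in the text. `row_conv` and `decode_parallel` are hypothetical names.

```python
def row_conv(feature, kernel):
    """Slide a 1 x k kernel along each row of a 2-D feature map (zero padding, same width)."""
    k = len(kernel)
    pad = k // 2  # assumes an odd kernel length
    out = []
    for row in feature:
        padded = [0.0] * pad + list(row) + [0.0] * pad
        out.append([
            sum(kernel[j] * padded[i + j] for j in range(k))
            for i in range(len(row))
        ])
    return out


def decode_parallel(scores, alphabet):
    """scores[t][c] is the score of class c at position t; positions are decoded independently."""
    return "".join(
        alphabet[max(range(len(row)), key=row.__getitem__)] for row in scores
    )


enhanced = row_conv([[1.0, 2.0, 3.0]], [1.0, 1.0, 1.0])
text = decode_parallel(
    [[0.1, 0.9, 0.0], [0.8, 0.1, 0.1], [0.0, 0.2, 0.8]],
    "AB7",
)
```

Because no position waits on the output of another, all characters can be predicted in one pass, which is the efficiency argument made for parallel prediction.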

上記実施例では、端末が目標領域画像を横方向補正することにより、工業シーンにおける画像撮影角度の変化への適応処理が図られ、文字列に対する認識の正確率が向上する。また、端末は、角度判定モデルにより、横方向の目標領域画像の起立角度を得て、文字列の起立状態を決定してもよく、文字列の起立状態が倒立状態である場合、端末は、回転によって横方向の目標領域画像を正立状態に変換し、これは、得られた文字列コンテンツの正確性のさらなる向上に有利である。また、端末は、画像特徴ピラミッドを利用して、抽出した高次元画像特徴に対して第１特徴強調処理を行い、文字列画像特徴に対して第２特徴強調処理を行ってもよく、これにより、特徴の表現能力を向上させ、文字列コンテンツ認識の正確性をさらに向上させることができる。しかも、並列予測方法によって全ての文字列について予測を行うことにより、文字列コンテンツ認識の効率をさらに向上させる。 In the embodiment described above, by the terminal correcting the target area image in the horizontal direction, adaptive processing to changes in the image capturing angle in an industrial scene is achieved, and the recognition accuracy rate for character strings is improved. Further, the terminal may determine the standing state of the character string by obtaining the standing angle of the horizontal target area image using an angle determination model, and if the standing state of the character string is an inverted state, the terminal: The rotation transforms the lateral target area image into an upright state, which is advantageous for further improving the accuracy of the obtained string content. Further, the terminal may use the image feature pyramid to perform first feature enhancement processing on the extracted high-dimensional image features, and may perform second feature enhancement processing on the character string image features. , it is possible to improve the feature representation ability and further improve the accuracy of character string content recognition. Moreover, by predicting all character strings using the parallel prediction method, the efficiency of character string content recognition is further improved.

一適用例では、現在の工業シーンにおいて、文字認識アルゴリズムのぼやけ、光照射や角度変化などの場合での認識漏れ、誤認識等の問題を効果的に解決し、認識正確率をより高くする目的で、工業シーンにおける任意の角度の文字列認識アルゴリズムをさらに提供する。本願は、カメラの画像形成環境が悪い工業環境に配置されてもよく、また、認識アルゴリズムの効率性や正確性を確保し、多角度、さらに倒立文字の認識をサポートする。ここで、アルゴリズムのトレーニング及び予測処理の流れを図６に示す。流れは主としてアルゴリズムのトレーニングと予測の２つのプロセスに分けられる。トレーニングプロセスでは、それぞれ文字列位置の検出、文字列角度の判定及び文字列コンテンツの認識のための３つの異なるモデルをトレーニングする必要がある。予測プロセスでは、トレーニング済みのモデルはテスト画像に入力されて、位置検出、角度判定及び認識コンテンツの順に処理を行い、最後に、文字列、位置及び対応するコンテンツが得られる。 In one application example, an arbitrary-angle character string recognition algorithm for industrial scenes is further provided, in order to effectively resolve the missed recognition and misrecognition that character recognition algorithms currently suffer in industrial scenes under blur, lighting changes, and angle changes, and to raise the recognition accuracy rate. The present application can be deployed in industrial environments where camera imaging conditions are poor, ensures the efficiency and accuracy of the recognition algorithm, and supports recognition of characters at multiple angles, including inverted characters. The flow of algorithm training and prediction is shown in FIG. 6. The flow is divided into two main processes, training and prediction. The training process requires training three different models, for character string position detection, character string angle determination, and character string content recognition respectively. In the prediction process, a test image is input to the trained models, which perform position detection, angle determination, and content recognition in that order, finally yielding the character string position and the corresponding content.

各モジュールによる処理の流れは、具体的には、以下のとおりである。 Specifically, the flow of processing by each module is as follows.

（一）トレーニングプロセス
１．１文字列位置検出アルゴリズム
トレーニングサンプルは、文字列を含む全体のサンプル画像であり、対応する注釈は、文字列位置の座標情報、例えば文字列の開始点の左上隅及び終了点の右下隅の情報を含む画像内の文字列の位置ボックスである。異なるトレーニングサンプルの間にスケール、色分布の違いが存在することから、サンプルに対して正規化処理を行うとともに、小さい又は視認しにくい文字列位置ボックスをフィルタリングする必要がある。画像前処理を受けたデータは、文字列位置検出アルゴリズム部分の入力とし、この部分はディープニューラルネットワークを介して、画像特徴ピラミッド構造と合わせて特徴強調を行う。図７に示すように、ｃｏｎｖは畳み込み層を表し、ｓｔｒｉｄｅはステップサイズを表し、抽出した各スケールの特徴についてアップサンプリングを行い、以前にネットワークを介して得られた特徴を加算することにより、最終的な画像特徴が得られる。この場合、該特徴は、空間情報に加えて、セマンティクス情報を保持している。位置検出アルゴリズムによって得られた画像特徴は、最終的な画像文字列領域に対するマスクを予測することに用いられる。該マスクについて連通ドメイン及び最小外接矩形を求めることにより、文字列位置ボックスが得られる。
１．２文字列角度判定アルゴリズム
図８に示すように、文字列の角度が０度よりも大きく１８０度未満の場合、アフィン変換によって横方向に補正された文字列画像が得られる。横方向に補正された後、最初の撮影角度により、補正後の文字列について正立か倒立が確保されにくく、このため、補正後の文字列が倒立であるか否かを判定するための角度判定アルゴリズムが追加され、倒立の場合、中心に対して文字列を１８０度回転させ、正立の場合、処理せずに直接出力する。このようにして、最終的に得られた文字列画像が正立のものとして確保され、次の文字列コンテンツの出力とされる。
１．３文字列コンテンツ認識アルゴリズム
図９に示すように、文字列画像コンテンツの認識には、ディープニューラルネットワークを用いて文字列特徴について学習を行い、列全体の特徴を取得するために、最後に、抽出した画像特徴に対して、行ベクトルを畳み込みカーネルとして、文字列方向に沿って特徴強調を行い、これにより、文字列コンテンツを並列して効率的に予測する。
(1) Training process
1.1 Character string position detection algorithm
A training sample is a whole sample image containing a character string, and the corresponding annotation is the position box of the character string in the image, which carries the coordinate information of the character string position, for example the upper left corner of its start point and the lower right corner of its end point. Because scale and color distribution differ between training samples, the samples must be normalized, and small or hard-to-see character string position boxes must be filtered out. The preprocessed image data is the input to the character string position detection algorithm, which performs feature enhancement through a deep neural network combined with an image feature pyramid structure. As shown in FIG. 7, conv denotes a convolutional layer and stride denotes the step size; the features extracted at each scale are upsampled and added to the features obtained earlier in the network to give the final image features. These features therefore retain semantic information in addition to spatial information. The image features obtained by the position detection algorithm are used to predict the mask of the final character string region in the image. The character string position box is obtained by computing the connected domains (connected components) and the minimum circumscribed rectangle of the mask.
1.2 Character string angle determination algorithm
As shown in FIG. 8, when the angle of a character string is greater than 0 degrees and less than 180 degrees, a horizontally corrected character string image is obtained by affine transformation. Depending on the original shooting angle, horizontal correction alone cannot guarantee whether the corrected character string is upright or inverted. An angle determination algorithm is therefore added to decide whether the corrected character string is inverted: if it is inverted, the character string is rotated 180 degrees about its center; if it is upright, it is output directly without further processing. This ensures that the final character string image is upright before it is passed on to character string content recognition.
1.3 Character string content recognition algorithm
As shown in FIG. 9, character string image content is recognized by using a deep neural network to learn character string features. To capture the features of the whole string, the extracted image features are finally enhanced along the character string direction using a row vector as the convolution kernel, so that the character string content can be predicted efficiently in parallel.
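The upsample-and-add step of the image feature pyramid in section 1.1 can be sketched as follows. This is a simplified single-channel illustration: nearest-neighbour 2x upsampling is an assumption (the text only says the features are upsampled), and `upsample2x` and `merge_top_down` are hypothetical names.

```python
def upsample2x(feat):
    """Nearest-neighbour 2x upsampling of a 2-D feature map."""
    out = []
    for row in feat:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat each row vertically
    return out


def merge_top_down(coarse, fine):
    """Upsample the coarse (deeper) map and add it element-wise to the finer (earlier) map."""
    up = upsample2x(coarse)
    return [[u + f for u, f in zip(up_row, fine_row)] for up_row, fine_row in zip(up, fine)]


coarse = [[1, 2]]                              # deeper layer: more semantic, lower resolution
fine = [[10, 10, 10, 10], [20, 20, 20, 20]]    # earlier layer: more spatial detail
merged = merge_top_down(coarse, fine)
```

The merged map keeps the resolution of the finer layer while carrying the values of the coarser one, which is the sense in which the final features hold both spatial and semantic information.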

（二）予測プロセス
テスト画像を入力し、まず、文字列位置検出アルゴリズムで該テスト画像の文字列位置を検出する。次に、検出した画像領域についてトリミング及びアフィン変換を行い、変換後のトリミング領域を文字列角度判定アルゴリズムに供給し、トリミング領域画像が倒立であると判定した場合、中心に対して１８０度回転させ、正立であると判定した場合、処理しない。文字列位置検出アルゴリズム及び文字列角度判定アルゴリズムで処理された画像領域を、文字列コンテンツ認識ネットワークの入力とし、最後に、コンテンツ認識ネットワークにより、画像内の文字列の位置及び対応する文本コンテンツを得る。 (2) Prediction process A test image is input, and first, the character string position of the test image is detected using a character string position detection algorithm. Next, trimming and affine transformation are performed on the detected image area, and the converted trimming area is supplied to the character string angle determination algorithm. If the cropped area image is determined to be inverted, it is rotated 180 degrees about the center. , if it is determined that the image is erect, no processing is performed. The image area processed by the character string position detection algorithm and the character string angle determination algorithm is input to the character string content recognition network, and finally, the position of the character string in the image and the corresponding text content are obtained by the content recognition network. .
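The prediction process described above chains the three trained models. A minimal sketch of that control flow follows, with each model passed in as a plain callable; the parameter names `detect`, `rectify`, `is_inverted`, and `read`, and the toy stand-ins used in the example, are hypothetical and exist only to make the sketch runnable.

```python
def recognize(image, detect, rectify, is_inverted, read):
    """Run detection, horizontal correction, angle determination, and content recognition in order."""
    results = []
    for box in detect(image):                 # character string position detection
        crop = rectify(image, box)            # cropping plus horizontal (affine) correction
        if is_inverted(crop):                 # character string angle determination
            crop = [row[::-1] for row in crop[::-1]]  # rotate 180 degrees about the center
        results.append((box, read(crop)))     # character string content recognition
    return results


# Toy stand-ins for the three trained models (illustrative only).
def toy_detect(image):
    return [(0, 0, len(image) - 1, len(image[0]) - 1)]


def toy_rectify(image, box):
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def toy_is_inverted(crop):
    return crop[0][0] > crop[-1][-1]


def toy_read(crop):
    return "".join(str(v) for row in crop for v in row)


output = recognize([[3, 2, 1]], toy_detect, toy_rectify, toy_is_inverted, toy_read)
```

Each result pairs a position box with its recognized content, matching the stated output of the prediction process.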

上記適用例では、カスケード文字列位置検出アルゴリズム、文字列角度判定アルゴリズム及び文字列コンテンツ認識アルゴリズムという合計３段階のアルゴリズムにより、形成画像の鮮明さが変化したり、角度が変化したり、光照射が変化したりする一般的な工業シーンにおいても文字列を安定的かつ効率よく認識するアルゴリズムが得られ、工業シーンにおける文字列認識の適用のための基礎を築いた。 In the above application example, cascading a total of three algorithm stages (character string position detection, character string angle determination, and character string content recognition) yields an algorithm that recognizes character strings stably and efficiently even in typical industrial scenes where image sharpness, angle, and lighting vary, laying a foundation for applying character string recognition in industrial scenes.

なお、本願の流れ図における各ステップは、矢印のような順で示されているものの、これらのステップは必ずしも矢印のような順番に従って実行されるわけではない。明確な記載がない限り、これらのステップの実行の順番には厳格な制限がなく、これらのステップは他の順番で実行されてもよい。そして、図における少なくとも一部のステップは複数のステップ又は複数の段階を含んでもよく、これらのステップ又は段階は必ずしも同じタイミングで完了するわけではなく、異なるタイミングで実行されても構わず、これらのステップ又は段階は必ずしも順次実行されるとは限らず、他のステップ、他のステップにおけるステップ又は段階の少なくとも一部と順番に又は交互に実行されてもよい。 Note that although the steps in the flowchart of the present application are shown in the order indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated, there is no strict restriction on the order of performance of these steps, and these steps may be performed in other orders. Furthermore, at least some of the steps in the diagram may include multiple steps or multiple stages, and these steps or stages are not necessarily completed at the same timing, and may be performed at different timings, and these The steps or stages are not necessarily performed sequentially, but may be performed sequentially or alternately with other steps or at least some of the steps or stages in other steps.

一実施例では、図１０に示すように、画像取得モジュール１００１と、位置検出モジュール１００２と、横方向補正モジュール１００３と、角度判定モジュール１００４と、コンテンツ認識モジュール１００５とを含むコンピュータビジョンに基づく文字列認識装置を提供し、
画像取得モジュール１００１は、認識対象文字列が付いた画像を取得するために用いられる。
位置検出モジュール１００２は、予め構築された位置検出モデルに基づいて、画像のうちの認識対象文字列が位置する目標領域画像を取得するために用いられる。
横方向補正モジュール１００３は、目標領域画像を横方向補正して、横方向の目標領域画像を得るために用いられる。
角度判定モジュール１００４は、予め構築された角度判定モデルに基づいて、横方向の目標領域画像の文字列の起立状態を取得するために用いられる。
コンテンツ認識モジュール１００５は、文字列の起立状態が正立状態である場合、予め構築されたコンテンツ認識モデルに横方向の目標領域画像を入力し、認識対象文字列に対応する文字列コンテンツを取得するために用いられる。 In one embodiment, as shown in FIG. 10, a computer vision-based character string recognition device is provided that includes an image acquisition module 1001, a position detection module 1002, a horizontal correction module 1003, an angle determination module 1004, and a content recognition module 1005.
The image acquisition module 1001 is used to acquire an image with a recognition target character string.
The position detection module 1002 is used to acquire, based on a pre-built position detection model, the target area image in which the recognition target character string is located in the image.
The horizontal correction module 1003 is used to horizontally correct the target area image to obtain a horizontal target area image.
The angle determination module 1004 is used to acquire the standing state of the character string in the horizontal target area image based on a pre-built angle determination model.
The content recognition module 1005 is used to input the horizontal target area image into a pre-built content recognition model when the standing state of the character string is the upright state, and to obtain the character string content corresponding to the recognition target character string.

一実施例では、角度判定モジュール１００４は、さらに、角度判定モデルに基づいて、横方向の目標領域画像の起立角度を取得し、起立角度が属する起立角度区間から、文字列の起立状態を決定するために用いられる。 In one embodiment, the angle determination module 1004 is further used to acquire the standing angle of the horizontal target area image based on the angle determination model, and to determine the standing state of the character string from the standing angle section to which the standing angle belongs.

一実施例では、起立角度区間は第１角度区間と第２角度区間を含み、文字列の起立状態は正立状態と倒立状態を含み、角度判定モジュール１００４は、さらに、起立角度区間が前記第１角度区間である場合、文字列の起立状態を正立状態として決定し、起立角度区間が前記第２角度区間である場合、文字列の起立状態を倒立状態として決定するために用いられる。 In one embodiment, the standing angle section includes a first angle section and a second angle section, and the standing state of the character string includes an upright state and an inverted state. The angle determination module 1004 is further used to determine the standing state of the character string as the upright state when the standing angle section is the first angle section, and as the inverted state when the standing angle section is the second angle section.

一実施例では、コンテンツ認識モジュール１００５は、さらに、文字列の起立状態が倒立状態である場合、横方向の目標領域画像を正立状態に回転させる。 In one embodiment, the content recognition module 1005 further rotates the horizontal target area image to an upright state if the character string is in an inverted state.

一実施例では、位置検出モジュール１００２は、さらに、位置検出モデルを利用して、認識対象文字列が付いた画像から文字領域画像特徴を抽出し、文字領域画像特徴に従って、目標領域画像の予測マスクを取得し、予測マスクについて連通ドメイン及び最小外接矩形を求め、目標領域画像を得るために用いられる。 In one embodiment, the position detection module 1002 is further used to extract character region image features from the image with the recognition target character string using the position detection model, obtain a prediction mask of the target area image from the character region image features, and compute the connected domains (connected components) and the minimum circumscribed rectangle of the prediction mask to obtain the target area image.

一実施例では、位置検出モジュール１００２は、さらに、認識対象文字列が付いた画像を前処理し、前処理後の画像から高次元画像特徴を抽出し、画像特徴ピラミッドを利用して、高次元画像特徴に対して第１特徴強調処理を行い、文字領域画像特徴とするために用いられる。 In one embodiment, the position detection module 1002 further preprocesses the image with the character string to be recognized , extracts high-dimensional image features from the preprocessed image, and utilizes an image feature pyramid to perform high-dimensional It is used to perform a first feature enhancement process on the image feature and make it a character area image feature.

一実施例では、コンテンツ認識モジュール１００５は、さらに、コンテンツ認識モデルを利用して、横方向の目標領域画像に対してグローバル画像特徴抽出を行い、横方向の目標領域画像に対応する文字列画像特徴を得て、行ベクトル畳み込みカーネルを用いて横方向に沿って文字列画像特徴に対して第２特徴強調処理を行い、第２特徴強調処理により得られた文字列画像特徴に基づいて、認識対象文字列を並列予測して、文字列コンテンツを得るために用いられる。 In one embodiment, the content recognition module 1005 is further used to perform global image feature extraction on the horizontal target area image using the content recognition model to obtain the character string image features corresponding to the horizontal target area image, perform a second feature enhancement process on the character string image features along the horizontal direction using a row vector convolution kernel, and predict the recognition target character string in parallel based on the character string image features obtained by the second feature enhancement process to obtain the character string content.

コンピュータビジョンに基づく文字列認識装置についての具体的な限定は、上記でコンピュータビジョンに基づく文字列認識方法に対する限定を参照してもよいため、ここでは詳しく説明しない。上記コンピュータビジョンに基づく文字列認識装置における各モジュールの全部又は一部は、ソフトウェア、ハードウェアとソフトウェアとの組み合わせによって実装されてもよい。上記各モジュールは、ハードウェアの形態でコンピュータ機器のプロセッサに組み込まれたり、コンピュータ機器のプロセッサから独立してもよく、また、ソフトウェアの形態でコンピュータ機器のメモリに記憶されて、プロセッサによって呼び出されて以上の各モジュールに対応する操作を実行してもよい。 Specific limitations on the computer vision-based string recognition device may refer to the limitations on the computer vision-based string recognition method above, so they will not be described in detail here. All or part of each module in the character string recognition device based on computer vision may be implemented by software or a combination of hardware and software. Each of the above modules may be incorporated in the processor of the computer device in the form of hardware, or may be independent from the processor of the computer device, or may be stored in the memory of the computer device in the form of software and called up by the processor. Operations corresponding to each of the above modules may be executed.

一実施例では、コンピュータ機器を提供し、該コンピュータ機器は端末であってもよく、その内部構造図は図１１に示されるものであってもよい。該コンピュータ機器は、システムのバスを介して接続されたプロセッサ、メモリ、通信インターフェース、表示画面、及び入力装置を含む。該コンピュータ機器のプロセッサは計算及び制御の能力を提供するものである。該コンピュータ機器のメモリは、不揮発性記憶媒体、内部メモリを含む。該不揮発性記憶媒体にはオペレーティングシステム及びコンピュータプログラムが記憶されている。該内部メモリは不揮発性記憶媒体におけるオペレーティングシステム及びコンピュータプログラムが実行するための環境を提供する。該コンピュータ機器の通信インターフェースは外部の端末と有線又は無線通信を行うことに用いられ、無線方式は、ＷＩＦＩ、事業者のネットワーク、ＮＦＣ（近距離無線通信）や他の技術で実現されてもよい。該コンピュータプログラムは、プロセッサによって実行されると、コンピュータビジョンに基づく文字列認識方法を実現する。該コンピュータ機器の表示画面は液晶表示画面又は電子インク表示画面であってもよく、該コンピュータ機器の入力装置は表示画面上に覆われたタッチ層であってもよいし、コンピュータ機器のケースに設けられたボタン、トラックボール又はタッチパネルであってもよいし、外付けのキーボード、タッチパネルやマウスなどであってもよい。 In one embodiment, a computer device is provided. The computer device may be a terminal, and its internal structure may be as shown in FIG. 11. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal, and the wireless communication may be realized by WIFI, an operator's network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements the computer vision-based character string recognition method. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen. The input device of the computer device may be a touch layer overlaid on the display screen, a button, trackball, or touch panel provided on the case of the computer device, or an external keyboard, touch panel, or mouse.

当業者にとって明らかなように、図１１に示す構造は、本願の解決手段に関連する部分の構造のブロック図に過ぎず、本願の解決手段が適用されるコンピュータ機器を限定するものではない。具体的には、コンピュータ機器は、図に示したものよりも少ない又は多い部材を含んだり、一部の部材を組み合わせたり、異なる部材の配置を有したりしてもよい。 As is obvious to those skilled in the art, the structure shown in FIG. 11 is only a block diagram of the structure of a part related to the solution of the present application, and does not limit the computer equipment to which the solution of the present application is applied. In particular, the computing equipment may include fewer or more components, may have combinations of components, or have different arrangements of components than are shown in the figures.

一実施例では、コンピュータプログラムが記憶されているメモリと、コンピュータプログラムを実行すると上記各方法実施例におけるステップを実現するプロセッサと、を含むコンピュータ機器をさらに提供する。 In one embodiment, a computer device is further provided that includes a memory storing a computer program and a processor that implements the steps of each of the above method embodiments when executing the computer program.

一実施例では、プロセッサによって実行されると上記各方法実施例のステップを実現するコンピュータプログラムが記憶されているコンピュータ読み取り可能な記憶媒体を提供する。 One embodiment provides a computer readable storage medium having stored thereon a computer program that, when executed by a processor, implements the steps of each of the method embodiments described above.

当業者にとって明らかなように、上記実施例方法の全部又は一部の流れは、コンピュータプログラムが関連するハードウェアに命令することで実施されてもよく、前記コンピュータプログラムは不揮発性コンピュータ読み取り可能な取記憶媒体に記憶されてもよく、該コンピュータプログラムは、実行されると、上記各方法の実施例の流れを含んでもよい。本願に係る各実施例で使用されるメモリ、記憶、データベース又は他の媒体の全ての引用は、不揮発性メモリ及び揮発性メモリの少なくとも１種を含んでもよい。不揮発性メモリは、読み取り専用メモリ（ＲＯＭ：Ｒｅａｄ－ＯｎｌｙＭｅｍｏｒｙ）、磁気テープ、フロッピーディスク、フラッシュメモリ又は光メモリなどを含んでもよい。揮発性メモリは、ランダムアクセスメモリ（ＲＡＭ：ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）又は外部キャッシュメモリを含んでもよい。非限定的な説明であるが、ＲＡＭは、スタティックランダムアクセスメモリ（ＳＲＡＭ：ＳｔａｔｉｃＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）又はダイナミックランダムアクセスメモリ（ＤＲＡＭ：ＤｙｎａｍｉｃＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）などのさまざまな形態であってもよい。 As will be apparent to those skilled in the art, all or part of the steps of the example method described above may be implemented by a computer program instructing relevant hardware, the computer program being a non-volatile computer readable device. The computer program may be stored on a storage medium and, when executed, may include the flow of each of the method embodiments described above. All references to memory, storage, database, or other media used in embodiments of this application may include at least one of non-volatile memory and volatile memory. Nonvolatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, and the like. Volatile memory may include random access memory (RAM) or external cache memory. By way of non-limiting illustration, RAM may be in various forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).

以上の実施例の各技術的特徴は任意に組み合わせられてもよく、説明の便宜上、上記実施例の各技術的特徴の全ての可能な組み合わせは記載されていないが、これらの技術的特徴の組み合わせは、矛盾がない限り、本明細書に記載の範囲にあるとみなすべきである。 The technical features of the above embodiments may be combined arbitrarily, and for convenience of explanation, all possible combinations of the technical features of the above embodiments are not described. should be considered within the scope of this specification unless there is a contradiction.

以上に記載の実施例は本願のいくつかの実施形態に過ぎず、その説明は具体的かつ詳細であるが、本発明の特許範囲を制限するものとして理解すべきではない。なお、当業者であれば、本願の趣旨を逸脱せずに、いくつかの変形や改良を行うことができ、これらは全て本願の特許範囲に含まれるものとする。このため、本願の特許範囲は添付の特許請求の範囲に準じるべきである。 The examples described above are only some embodiments of the present application, and although the description thereof is specific and detailed, they should not be understood as limiting the patent scope of the present invention. Note that those skilled in the art can make several modifications and improvements without departing from the spirit of the present application, and all of these are included within the patent scope of the present application. Therefore, the patent scope of the present application should conform to the appended claims.

Claims

1. A character string recognition method based on computer vision, comprising:
obtaining an image with a recognition target character string;
acquiring, based on a pre-built position detection model, a target area image in which the recognition target character string is located in the image with the recognition target character string;
horizontally correcting the target area image to obtain a horizontal target area image;
acquiring the standing state of the character string in the horizontal target area image based on a pre-built angle determination model; and
if the standing state of the character string is an upright state, inputting the horizontal target area image into a pre-built content recognition model to obtain character string content corresponding to the recognition target character string, wherein this step comprises:
performing global image feature extraction on the horizontal target area image using the content recognition model to obtain character string image features corresponding to the horizontal target area image;
performing a second feature enhancement process on the character string image features along the horizontal direction using a row vector convolution kernel; and
performing parallel prediction on the recognition target character string based on the character string image features obtained by the second feature enhancement process to obtain the character string content.

2. The method according to claim 1, wherein the step of acquiring the standing state of the character string in the horizontal target area image based on the pre-built angle determination model comprises:
obtaining a standing angle of the horizontal target area image based on the angle determination model; and
determining the standing state of the character string from the standing angle section to which the standing angle belongs.

3. The method according to claim 2, wherein the standing angle section includes a first angle section and a second angle section, and the standing state of the character string includes an upright state and an inverted state, and
the step of determining the standing state of the character string from the standing angle section to which the standing angle belongs comprises:
when the standing angle section is the first angle section, determining the standing state of the character string as the upright state; and
when the standing angle section is the second angle section, determining the standing state of the character string as the inverted state.

4. The method of claim 3, further comprising, when the character string is in the inverted state, rotating the horizontal target area image to the upright state.

5. The method according to claim 1, wherein the step of acquiring, based on the pre-built position detection model, the target area image in which the recognition target character string is located in the image with the recognition target character string comprises:
extracting character area image features from the image with the recognition target character string using the position detection model;
obtaining a prediction mask of the target area image according to the character area image features; and
determining a minimum circumscribed rectangle for the prediction mask to obtain the target area image.

6. The method according to claim 5, wherein the step of extracting character area image features from the image with the recognition target character string comprises:
preprocessing the image with the recognition target character string and extracting high-dimensional image features from the preprocessed image; and
performing a first feature enhancement process on the high-dimensional image features using an image feature pyramid to obtain the character area image features.

7. The method according to claim 6, wherein the preprocessing comprises filtering out small or hard-to-see character string region images from the image with the recognition target character string, and extracting the high-dimensional image features from the image with the recognition target character string.

8. A character string recognition device based on computer vision, comprising:
an image acquisition module that acquires an image with a recognition target character string;
a position detection module that acquires, based on a pre-built position detection model, a target area image in which the recognition target character string is located in the image with the recognition target character string;
a horizontal correction module that horizontally corrects the target area image to obtain a horizontal target area image;
an angle determination module that acquires the standing state of the character string in the horizontal target area image based on a pre-built angle determination model; and
a content recognition module that, when the standing state of the character string is an upright state, inputs the horizontal target area image into a pre-built content recognition model and obtains character string content corresponding to the recognition target character string,
wherein the content recognition module performs global image feature extraction on the horizontal target area image using the content recognition model to obtain character string image features corresponding to the horizontal target area image,
performs a second feature enhancement process on the character string image features along the horizontal direction using a row vector convolution kernel, and
performs parallel prediction on the recognition target character string based on the character string image features obtained by the second feature enhancement process to obtain the character string content.

9. A computer device, comprising:
a memory in which a computer program is stored; and
a processor that, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.

10. A computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.