JP7405528B2

JP7405528B2 - Media discrimination device, medium discrimination system, and medium discrimination method

Info

Publication number: JP7405528B2
Application number: JP2019139725A
Authority: JP
Inventors: 英嗣長谷部; 滋子文野; 政憲横田
Original assignee: Glory Ltd
Current assignee: Glory Ltd
Priority date: 2019-07-30
Filing date: 2019-07-30
Publication date: 2023-12-26
Anticipated expiration: 2039-07-30
Also published as: JP2021022285A

Description

The present invention relates to a medium discrimination device, a medium discrimination system, and a medium discrimination method.

Techniques are conventionally known for determining, from an image of a medium, the type of a medium on which characters are written, such as a form.

For example, Patent Document 1 discloses a method in which vertical and horizontal ruled-line features are extracted from a form image, the extracted ruled-line features are collated to select a plurality of promising candidate forms, and a form type determination result is output based on the degree of match between the characters printed at predetermined positions on the selected candidate forms and the characters recognized in the corresponding areas of the form image. Patent Document 1 also describes the following: correcting the skew of the captured form image so that its vertical and horizontal ruled lines are each parallel to a reference direction (see, for example, paragraphs [0021] and [0032]); when a positional shift exists between a promising candidate form and the form image, correcting the area in which the printed characters of the candidate form are printed in consideration of that shift (see, for example, paragraphs [0030] and [0067] to [0070]); and normalizing a histogram based on ruled-line edges by compressing it so that it can be compared more easily with a standard histogram (see, for example, paragraphs [0038], [0046], and [0066]).

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2014-16921 (JP2014-16921A)

However, with the method described in Patent Document 1, if the rotation angle of the form image is too large, it is difficult to determine the type of the form accurately even when the skew correction and positional shift correction described above are performed. In addition, when the form image is captured with a camera rather than a scanner, the type of the form may not be determinable, even with the normalization described above, if the edges of the form cannot be detected, for example because an edge extends beyond the image. Furthermore, when the form image is captured with a camera rather than a scanner, the type of the form may not be determinable because the size (resolution) of the form image varies with the shooting conditions. In the first place, the method described in Patent Document 1 cannot determine the type of a medium that has no ruled lines.

The present invention has been made in view of the above circumstances, and its object is to provide a medium discrimination device, a medium discrimination system, and a medium discrimination method capable of determining the types of various media with high accuracy regardless of the orientation and size at which they are imaged.

To solve the above problems and achieve the object, the present invention is a medium discrimination device comprising: a feature detection unit that detects a plurality of features of a medium based on an image of the medium; and a type discrimination unit that determines the type of the medium based on the plurality of features.

Further, in the above invention, the medium discrimination device further comprises: a recognition area specifying unit that specifies a character recognition area to be subjected to character recognition, based on the relative positions between the plurality of features and on the type of the medium; and a character recognition unit that recognizes characters in the character recognition area.

Further, in the above invention, the medium discrimination device further comprises a character recognition orientation correction unit that corrects the orientation of the character recognition area based on the relative positions between the plurality of features, and the character recognition unit recognizes characters in the character recognition area whose orientation has been corrected.

Further, in the above invention, the medium discrimination device further comprises: a display unit that displays the image of the medium; and a display orientation correction unit that corrects the orientation of the image displayed on the display unit based on the relative positions between the plurality of features.

Further, in the above invention, the feature detection unit detects the plurality of features using a machine-learned inference model.

Further, in the above invention, the medium discrimination device further comprises a determination unit that, based on the plurality of features and the type of the medium, determines the size of the medium and determines whether or not the entire medium is captured in the image.

The present invention is also a medium discrimination system comprising the medium discrimination device and an imaging device that captures an image of a medium.

The present invention is also a medium discrimination method comprising: a feature detection step of detecting a plurality of features of a medium based on an image of the medium; a type discrimination step of determining the type of the medium based on the plurality of features; a recognition area specifying step of specifying a character recognition area to be subjected to character recognition, based on the relative positions between the plurality of features and on the type of the medium; and a character recognition step of recognizing characters in the character recognition area.

According to the medium discrimination device, medium discrimination system, and medium discrimination method of the present invention, the types of various media can be determined with high accuracy regardless of the orientation and size at which they are imaged.

FIG. 1 is a schematic diagram for explaining an overview of the medium discrimination method in Embodiment 1, showing the rectangular bounding boxes set for a plurality of features of a medium in the machine learning stage and the character recognition area to be subjected to character recognition.
FIG. 2 is a schematic diagram for explaining an overview of the medium discrimination method in Embodiment 1, showing the plurality of features detected in the medium discrimination stage and the character recognition area specified based on the detected features.
FIG. 3 is a diagram illustrating the overall configuration of the medium discrimination system according to Embodiment 1.
FIG. 4 is a block diagram illustrating the configuration of the medium discrimination device according to Embodiment 1.
FIG. 5 is a diagram for explaining the method of specifying the character recognition area in Embodiment 1, in which (a) shows the reference positions of a plurality of features and of the character recognition area, and (b) shows the positions of the features detected by the feature detection unit and the position of the character recognition area specified by the recognition area specifying unit.
FIG. 6 is a flowchart showing an example of the procedure of the medium discrimination process performed by the medium discrimination system according to Embodiment 1.
FIG. 7 is a diagram illustrating the overall configuration of a medium discrimination system according to a modification.

Preferred embodiments of the medium discrimination device, medium discrimination system, and medium discrimination method according to the present invention are described below with reference to the drawings. In the present invention, the specific types of media to be discriminated are not particularly limited. Examples include paper media such as tickets (boarding tickets and commuter passes issued by transportation operators), receipts (issued at retail stores and the like), prescriptions, admission tickets to theme parks and the like, gift certificates, and merchandise vouchers.

<Overview of the medium discrimination method>
First, an overview of the medium discrimination method in Embodiment 1 is given. In this embodiment, a plurality of features needed to determine the type of a medium are registered and machine-learned in advance, so that the type of a medium is determined from video of that medium, and the characters in the portion of the medium to be read are then recognized. At this time, the orientation of the medium is judged from the positional relationship of the recognized features, and the video of the medium is rotated into an orientation in which the characters are readable. This is described in more detail below with reference to FIGS. 1 and 2.

First, in the machine learning stage, as shown in FIG. 1, various media M are photographed in advance, and rectangular bounding boxes are set, for each medium, on a plurality of features Ma used for discrimination, such as keywords and logos. Teacher data is created from these settings, and each feature Ma is learned by a machine learning algorithm called Single Shot MultiBox Detector (SSD). In addition, as shown in FIG. 1, the relative positional relationship between each feature Ma and the character recognition area Mb to be subjected to character recognition is specified for each medium and registered in the reference information of that medium.

Next, in the medium discrimination stage, as shown in FIG. 2, an inference program (trained model) that has learned the teacher data first detects a plurality of features Ma in the captured video of a medium M, and the medium M is discriminated based on the combination of the detected features Ma. Next, the character recognition area Mb is specified based on the relative positional relationship between the detected features Ma and on the reference information. Finally, the characters in the specified character recognition area Mb are recognized. The number of character recognition areas Mb that are specified and recognized may be one or more.

According to this embodiment, because the features are detected using an inference program (trained model), they can be detected regardless of conditions such as the orientation and size (image size) of the medium. Furthermore, because the medium is discriminated from the features alone, the result is less affected by portions other than the features that vary from one medium to another, such as handwritten portions and stamped portions.

<Overall configuration of the medium discrimination system>
Next, the overall configuration of the medium discrimination system of this embodiment is described with reference to FIG. 3. As shown in FIG. 3, the medium discrimination system 1 of this embodiment includes a camera 2, serving as an imaging device that images a medium M, and a medium discrimination device 3. The camera 2 is communicably connected to the medium discrimination device 3. In this embodiment, as shown in FIG. 3, the camera 2 images the medium M placed on the flat upper surface of a reading table 6.

The camera 2 is fixed above the reading table 6 and acquires an image, here a moving image (video), of a predetermined area that includes the upper surface of the reading table 6. The captured video is output to the medium discrimination device 3. The camera 2 may acquire RGB color video or monochrome video.

A monitor (display device) 4, serving as a display unit that displays the captured video and the like, and an input device 5 (for example, a keyboard or a mouse), with which an operator performs various input operations, are communicably connected to the medium discrimination device 3. The monitor 4 and the input device 5 may be implemented as a single display device with an input function, such as a touch panel display.

The medium discrimination device 3 is configured so that the operator can view the video captured by the camera 2 on the monitor 4 in real time.

<Configuration of the medium discrimination device>
Next, the configuration of the medium discrimination device 3 is described further with reference to FIG. 4. The medium discrimination device 3 is an information processing device with functions equivalent to those of a general personal computer and, as shown in FIG. 4, includes a control unit 10 and a storage unit 20.

The control unit 10 provides the functions of a video input unit 11, a feature detection unit 12, a type discrimination unit 13, a recognition area specifying unit 14, a character recognition orientation correction unit 15, a character recognition unit 16, a display orientation correction unit 17, and a determination unit 18. The control unit 10 is composed of, for example, software programs for implementing the various processes, a CPU (Central Processing Unit) that executes the software programs, and various hardware controlled by the CPU. To speed up processing during machine learning and during execution of the trained model, the control unit 10 may include hardware such as a GPU (Graphics Processing Unit) in addition to the CPU. The software programs and data needed for the operation of the control unit 10 are stored in the storage unit 20.

The storage unit 20 is composed of a storage device such as a hard disk drive or a non-volatile memory, and stores an inference model 21 and reference information 22.

The video input unit 11 acquires video from the camera 2 and outputs the video to the feature detection unit 12.

The feature detection unit 12 detects a plurality of features of the medium based on the video (image) of the medium output from the video input unit 11. That is, in the video of the medium, it detects at least two portions that characterize the medium, such as specific character strings. The feature detection unit 12 is implemented using an object detection technique, which detects the position and category (class) of specific objects in an image. In other words, the feature detection unit 12 does not determine whether a feature is present at a predetermined position on the medium. Instead, it searches the entire medium, finds pre-registered features at arbitrary locations on the medium, and recognizes their types. The feature detection unit 12 also finds the pre-registered features on the medium and recognizes their types regardless of the rotation angle and size (resolution) of the features, that is, regardless of the rotation angle and size (resolution) of the medium.

In this embodiment, the feature detection unit 12 detects (estimates) the plurality of features using the inference model 21 stored in the storage unit 20.

The inference model 21 is now described. The inference model 21 is created by supervised machine learning on a data set (teacher data) to which label information (ground-truth data) is attached. More specifically, the inference model 21 is created by deep learning with a training program that uses a convolutional neural network (CNN), taking images of media (two-dimensional still images) as input data and taking information such as the position and class (type) assigned to each feature of the medium as labels. In this embodiment, SSD is used as the convolutional neural network.

The inference model 21 created by supervised machine learning functions as an inference program (trained model) in which learned parameters are incorporated. The learned parameters are the parameters (coefficients) obtained as a result of training on the data set. The inference program defines a series of computational steps that apply the learned parameters to the video of a medium given as input (to each still image making up the video) and output a result for that video (specifically, the position, class, and so on of each feature, as described above).

The data set used for machine learning can be generated, for example, as follows. First, images (two-dimensional still images) of various media are acquired. At this time, images of media in a variety of states are acquired by varying the rotation angle, the shape (wrinkles and creases), the presence or absence of handwriting and stamps, and so on. Then, in each acquired image, a rectangular bounding box is set on each feature of each medium, so that the upper-left and lower-right coordinates of the bounding box are designated as the position of that feature. A class (type) is also set for each feature. The positions and classes of at least two features are registered for each medium. In each acquired image, an invalid region may also be designated as a region that is not used for machine learning. Portions that vary between media of the same type, such as handwritten portions and stamped portions, can thereby be designated as invalid regions, so the features can be learned more effectively. As a result, a data set is generated in which the position and class of each feature are attached to each image as label information. The image of each feature in the various media is thus associated with the position and class of that feature.
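The label information described above can be pictured as a small record per image. The following is a minimal sketch only, not the format used in the patent: the names `FeatureLabel`, `TrainingSample`, the file name, and the class names are all hypothetical. It stores each bounding box as upper-left and lower-right coordinates, attaches a class to each feature, allows optional invalid regions, and checks the stated requirement that at least two features be registered per medium.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# (x_min, y_min, x_max, y_max): upper-left and lower-right corners in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class FeatureLabel:
    """One labeled feature: its bounding box and its class (type)."""
    box: Box
    class_name: str


@dataclass
class TrainingSample:
    """Label information for one still image of a medium (hypothetical layout)."""
    image_path: str
    features: List[FeatureLabel]
    # Regions excluded from learning, e.g. handwriting or stamps.
    invalid_regions: List[Box] = field(default_factory=list)

    def validate(self) -> None:
        # The description requires at least two features per medium.
        if len(self.features) < 2:
            raise ValueError("at least two features must be registered per medium")
        for label in self.features:
            x0, y0, x1, y1 = label.box
            if not (x0 < x1 and y0 < y1):
                raise ValueError(f"degenerate bounding box: {label.box}")


# Hypothetical example: a receipt with a logo and a keyword as features.
sample = TrainingSample(
    image_path="receipt_0001.png",
    features=[
        FeatureLabel((40, 20, 220, 60), "store_logo"),
        FeatureLabel((30, 400, 150, 430), "total_keyword"),
    ],
    invalid_regions=[(160, 400, 300, 430)],
)
sample.validate()
```

Records of this kind would still have to be converted into whatever input format the chosen SSD training program expects.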

The inference model 21, once created, may subsequently undergo additional training. That is, new learned parameters may be generated by applying a different data set to the inference model 21 and training it further, and an inference program incorporating the new learned parameters may then be used as the inference model 21.

The machine learning may be executed by the processor of the medium discrimination device 3, but it is preferably executed by a computer with high processing capability, such as a dedicated server or a cloud server.

The type discrimination unit 13 determines the type of the medium based on the plurality of features detected by the feature detection unit 12.

More specifically, the reference information 22 includes, for each type of medium, combination information indicating the combination of features that the medium has, and the type discrimination unit 13 determines the type of the medium by comparing the combination of features detected by the feature detection unit 12 with the combination information.

The combinations of feature classes are set so that no two types of media share the same combination, but individual feature classes may be shared between different types of media. For example, there may be four feature classes a to d, with medium A having the combination (a, b), medium B having (c, d), and medium C having (a, c). Of course, the feature classes need not overlap at all between different types of media. For example, there may be six feature classes a to f, with medium A having the combination (a, b), medium B having (c, d), and medium C having (e, f).

The number of features in a combination may also differ between types of media. For example, the type of one medium may be determined from a combination of two feature classes, while the type of another medium is determined from a combination of three or more feature classes.
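The comparison of detected feature classes with the combination information can be sketched as a set lookup. This is an illustrative sketch under stated assumptions, not the patented procedure: the table `COMBINATIONS` reuses the (a, b), (c, d), (a, c) example from the text, and the matching rule (every required class must be detected, the largest matching combination wins, ties are rejected) is an assumption, since the description does not spell out how extra or missing detections are handled.

```python
from typing import Dict, FrozenSet, Iterable, Optional

# Combination information: medium type -> feature classes that identify it.
# Values follow the example in the text; the table itself is hypothetical.
COMBINATIONS: Dict[str, FrozenSet[str]] = {
    "A": frozenset({"a", "b"}),
    "B": frozenset({"c", "d"}),
    "C": frozenset({"a", "c"}),
}


def determine_type(
    detected: Iterable[str],
    combinations: Dict[str, FrozenSet[str]] = COMBINATIONS,
) -> Optional[str]:
    """Return the medium type whose feature combination is fully detected.

    Assumed rule: a type matches when all of its classes were detected;
    among matches the largest combination wins; ambiguous results give None.
    """
    found = frozenset(detected)
    candidates = [
        (len(required), name)
        for name, required in combinations.items()
        if required <= found  # subset test: every required class was detected
    ]
    if not candidates:
        return None
    best_size = max(size for size, _ in candidates)
    winners = [name for size, name in candidates if size == best_size]
    return winners[0] if len(winners) == 1 else None


print(determine_type(["a", "b"]))  # A
print(determine_type(["c", "a"]))  # C
print(determine_type(["a"]))       # None
```

Because the rule compares sets, it also covers the case where different media use combinations of different sizes.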

Thus, a feature of a medium may be anything that allows the type of the medium to be identified through a combination of several such features. Examples include specific character strings such as keywords and titles, designs, and logos. Each feature may be decided manually in advance or decided automatically in advance. In the latter case, for example, image processing may be performed to extract portions with little variation from images of multiple media of the same type, and those portions may be adopted as features. In this way, when media of the same type contain portions that vary from one medium to another, such as handwritten portions and stamped portions, features that are more effective for detection can be decided easily.

The recognition area specifying unit 14 specifies the character recognition area to be subjected to character recognition, based on the relative positions between the plurality of features detected by the feature detection unit 12 and on the type of the medium determined by the type discrimination unit 13.

More specifically, the reference information 22 includes, for each type of medium, position information indicating the reference positions of each feature and of the character recognition area, and the recognition area specifying unit 14 calculates the position of the character recognition area based on the relative positional relationship between the plurality of features and on the position information corresponding to the type of the medium.

This is explained more concretely with reference to FIG. 5. For example, as shown in FIG. 5(a), the position information of the reference information 22 includes the reference positions (coordinates) of two features A and B and the reference position (coordinates) of the character recognition area C. The recognition area specifying unit 14 translates the coordinates of the features A' and B' detected by the feature detection unit 12 (see FIG. 5(b)), while maintaining their relative positional relationship, so that the position of one feature A' coincides with the reference position of the corresponding feature A (see FIG. 5(a)). Then, as shown in FIG. 5(a), it calculates the rotation angle θ of the direction from the coincident feature A' (A) to feature B' relative to the direction from feature A' (A) to feature B. This rotation angle θ corresponds to the rotation angle of the imaged medium relative to the reference direction. It also calculates the ratio R of the length L' from feature A' to feature B' to the length L from feature A to feature B (L' = R × L). This length ratio R corresponds to the magnification of the imaged medium relative to the reference image. After that, as shown in FIG. 5(b), the recognition area specifying unit 14 calculates the position of a provisional character recognition area C'' relative to the detected feature A', using the length ratio R and the position of the character recognition area C relative to feature A (the direction from feature A to the character recognition area C) obtained from the position information of the reference information 22. It then rotates the position of C'' about feature A' by the rotation angle θ to obtain the position of the target character recognition area C'.
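The calculation of θ, R, and C' can be written out in a few lines. The following is a minimal sketch, assuming each feature and the recognition area is represented by a single reference point (for example, a corner or the center of its bounding box); the function name `locate_recognition_area` is hypothetical. In effect it evaluates C' = A' + R · Rot(θ) · (C − A).

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def locate_recognition_area(
    a: Point, b: Point, c: Point, a_det: Point, b_det: Point
) -> Tuple[Point, float, float]:
    """Map reference area C into the captured image.

    a, b, c      : reference positions of features A, B and area C
    a_det, b_det : detected positions of features A' and B'
    Returns (position of C', rotation angle theta in radians, ratio R).
    """
    ref_dx, ref_dy = b[0] - a[0], b[1] - a[1]
    det_dx, det_dy = b_det[0] - a_det[0], b_det[1] - a_det[1]
    ref_len = math.hypot(ref_dx, ref_dy)
    if ref_len == 0:
        raise ValueError("reference features A and B must not coincide")

    # theta: rotation of direction A'->B' relative to direction A->B.
    theta = math.atan2(det_dy, det_dx) - math.atan2(ref_dy, ref_dx)
    # R: magnification of the captured medium, from L' = R * L.
    ratio = math.hypot(det_dx, det_dy) / ref_len

    # Provisional area C'': offset A->C scaled by R, still unrotated.
    off_x, off_y = ratio * (c[0] - a[0]), ratio * (c[1] - a[1])

    # Rotate the offset about A' by theta to obtain C'.
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    c_det = (
        a_det[0] + cos_t * off_x - sin_t * off_y,
        a_det[1] + sin_t * off_x + cos_t * off_y,
    )
    return c_det, theta, ratio


# Reference layout: A at the origin, B 10 units along x, C 5 units along y.
# Detected layout: rotated by 90 degrees and enlarged by a factor of 2.
c_det, theta, ratio = locate_recognition_area(
    (0, 0), (10, 0), (0, 5), (100, 100), (100, 120)
)
```

Translating A' onto A, as the text describes, is equivalent to working with offsets from A and A', which is what the sketch does. With more than two detected features, the angle and ratio could be averaged over several feature pairs, although the description only walks through the two-feature case.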

Note that the medium discrimination device 3 may be configured so that the outputs of the feature detection unit 12 and the recognition area specifying unit 14 can be displayed on the monitor 4. For example, as shown in FIG. 2, bounding boxes surrounding each feature part Ma and each character recognition area Mb may be overlaid on the video of the medium M on the monitor.

The character recognition orientation correction unit 15 corrects the orientation of the character recognition area specified by the recognition area specifying unit 14, based on the relative positions of the plurality of feature parts detected by the feature detection unit 12. This reduces the rate at which the character recognition unit 16 misrecognizes characters in the character recognition area. For example, if the character string to be recognized is "069" and the medium is placed upside down, the string would be misrecognized as "690"; correcting the orientation of the character recognition area with the character recognition orientation correction unit 15, as described above, prevents such misrecognition.

More specifically, the character recognition orientation correction unit 15 applies an affine transformation to the character recognition area so that it is rotated by the angle (−θ), that is, in the direction opposite to the rotation angle θ that the recognition area specifying unit 14 calculated from the relative positions of the plurality of feature parts as described above.
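As a rough illustration of this correction, the following Python sketch builds the 2×3 affine matrix for a rotation by −θ about a chosen center and applies it to a point. The function names, the choice of rotation center, and the point-wise application are assumptions for the example; an actual implementation would resample the pixels of the area with an image library.

```python
import math

def correction_matrix(theta, center):
    """2x3 affine matrix that rotates by -theta about `center` (hypothetical helper)."""
    phi = -theta                      # rotate opposite to the measured angle
    c, s = math.cos(phi), math.sin(phi)
    cx, cy = center
    # Rotation about `center`: translate to origin, rotate, translate back.
    return ((c, -s, cx - c * cx + s * cy),
            (s,  c, cy - s * cx - c * cy))

def apply_affine(matrix, point):
    """Apply a 2x3 affine matrix to an (x, y) point."""
    (a, b, tx), (d, e, ty) = matrix
    x, y = point
    return (a * x + b * y + tx, d * x + e * y + ty)
```

The same matrix could be used for the whole video frame instead of a single area; only the set of pixels being transformed differs.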

The character recognition unit 16 recognizes the characters (optical character recognition: OCR) in the character recognition area whose orientation has been corrected by the character recognition orientation correction unit 15.

More specifically, the storage unit 20 stores, as character images, every kind of character that may be used in the character recognition area of the media to be discriminated. The character recognition unit 16 identifies each constituent character of the character string in the orientation-corrected character recognition area by comparing it with the character images, and finally recognizes the character string by concatenating the identified characters.
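The comparison against stored character images can be illustrated with a very small template-matching sketch. The description does not state which similarity measure is used, so the pixel-wise Hamming distance below, the binary-grid image representation, and the names `hamming_distance` and `recognize_string` are all assumptions for illustration; segmentation of the string into individual glyphs is taken as given.

```python
def hamming_distance(glyph, template):
    """Count differing pixels between two equally sized binary images."""
    if len(glyph) != len(template) or any(
            len(g_row) != len(t_row) for g_row, t_row in zip(glyph, template)):
        raise ValueError("glyph and template must have the same size")
    return sum(g != t
               for g_row, t_row in zip(glyph, template)
               for g, t in zip(g_row, t_row))

def recognize_string(glyphs, templates):
    """Identify each glyph by its closest stored character image, then join.

    glyphs: list of binary images, one per constituent character.
    templates: dict mapping a character to its stored binary image.
    """
    result = []
    for glyph in glyphs:
        # Pick the stored character whose image differs in the fewest pixels.
        best = min(templates, key=lambda ch: hamming_distance(glyph, templates[ch]))
        result.append(best)
    return "".join(result)
```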

The display orientation correction unit 17 corrects the orientation of the video of the medium displayed on the monitor 4, based on the relative positions of the plurality of feature parts detected by the feature detection unit 12. This allows the video of the medium to be displayed on the monitor 4 in an orientation in which the operator can easily read the characters in the character recognition area.

More specifically, the display orientation correction unit 17 applies an affine transformation to the entire video of the medium so that it is rotated by the angle (−θ), that is, in the direction opposite to the rotation angle θ that the recognition area specifying unit 14 calculated from the relative positions of the plurality of feature parts as described above.

The determination unit 18 determines the size of the medium, and determines whether the entire medium is captured in the video, based on the plurality of feature parts detected by the feature detection unit 12 and the type of the medium determined by the type determination unit 13. Thus, when part of the medium, for example the character recognition area, has not been captured, an error message to that effect can be presented to the operator.

More specifically, the reference information 22 includes, for each type of medium, first medium area information indicating the medium area that defines the outline of the medium, and second medium area information indicating the position of the medium area relative to each feature part associated with that medium area. The determination unit 18 determines the size of the medium by identifying its medium area based on the first medium area information corresponding to the type of the medium determined by the type determination unit 13. The determination unit 18 also identifies the position of the medium area within the video based on the position of each feature part detected by the feature detection unit 12 and the second medium area information corresponding to the determined type of the medium, and then determines whether the entire medium is captured in the video.

The determination unit 18 may also determine whether the entire character recognition area is captured in the video, based on the plurality of feature parts detected by the feature detection unit 12, the type of the medium determined by the type determination unit 13, and the character recognition area specified by the recognition area specifying unit 14. Thus, when at least part of the character recognition area has not been captured, an error message to that effect can be presented to the operator.

In this case, the determination unit 18 determines whether the entire character recognition area is captured in the video based on the position of the character recognition area specified by the recognition area specifying unit 14.
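One way to picture the "entirely captured" check is to map the corners of a region (the medium area or a character recognition area) from reference coordinates into the video frame and test whether every corner lies inside the frame. The sketch below assumes that a rotation angle and magnification of the kind described for FIG. 5 are available; the function names, the corner-list representation, and the half-open frame bounds are assumptions for illustration.

```python
import math

def to_frame(p_ref, a_ref, a_det, theta, scale):
    """Map a reference point into frame coordinates via feature part A -> A'."""
    dx = (p_ref[0] - a_ref[0]) * scale
    dy = (p_ref[1] - a_ref[1]) * scale
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return (a_det[0] + dx * cos_t - dy * sin_t,
            a_det[1] + dx * sin_t + dy * cos_t)

def region_fully_in_frame(corners_ref, a_ref, a_det, theta, scale,
                          frame_width, frame_height):
    """True if every transformed corner lies inside the frame.

    Because both the frame and the transformed region are convex, checking
    the corners is sufficient for the whole region.
    """
    for corner in corners_ref:
        x, y = to_frame(corner, a_ref, a_det, theta, scale)
        if not (0 <= x < frame_width and 0 <= y < frame_height):
            return False
    return True
```

When the function returns `False`, a caller could present the error message mentioned above to the operator.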

<Procedure of medium discrimination processing>
Next, the procedure of the medium discrimination processing performed by the medium discrimination system 1 will be described with reference to FIG. 6.

As shown in FIG. 6, first, video of the medium is input from the camera 2 to the video input unit 11 (video input step S11).

Next, the feature detection unit 12 uses the inference model 21 to detect a plurality of feature parts of the medium from the video input in the video input step S11 (feature detection step S12). If only one feature part is detected, or none at all, the medium is processed as an unidentifiable medium; for example, the operator is notified that the medium cannot be identified.

Next, the type determination unit 13 determines the type of the medium based on the plurality of feature parts detected in the feature detection step S12 (type determination step S13). If the combination of all detected feature parts matches the combination information of one of the registered medium types, the medium is determined to be of that type. If the combination of all detected feature parts matches none of the registered combination information, the medium is processed as an unidentifiable medium; for example, the operator is notified that the medium cannot be identified.
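The decision logic of steps S12 and S13 can be sketched as a lookup of the detected combination of feature classes against registered combinations. The data layout (a dictionary from medium type to a set of feature classes), the class names, and the function name `determine_medium_type` are hypothetical; the sketch returns `None` for an unidentifiable medium.

```python
def determine_medium_type(detected_classes, registered_combinations):
    """Return the medium type whose registered combination matches, else None.

    detected_classes: classes of all feature parts detected in the image.
    registered_combinations: dict mapping medium type -> set of feature classes.
    """
    # Step S12: fewer than two detected feature parts -> unidentifiable.
    if len(detected_classes) < 2:
        return None

    detected = frozenset(detected_classes)
    # Step S13: the combination of all detected parts must match a registered one.
    for medium_type, combination in registered_combinations.items():
        if detected == frozenset(combination):
            return medium_type
    return None
```

A caller would treat a `None` result as the trigger for notifying the operator that the medium cannot be identified.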

Next, the recognition area specifying unit 14 specifies the character recognition area to be subjected to character recognition, based on the relative positions of the plurality of feature parts detected in the feature detection step S12 and the type of the medium determined in the type determination step S13 (recognition area specifying step S14).

Next, the determination unit 18 determines the size of the medium, and determines whether the entire medium is captured in the video, based on the plurality of feature parts detected in the feature detection step S12 and the type of the medium determined in the type determination step S13 (determination step S15).

In the determination step S15, the determination unit 18 may also determine whether the entire character recognition area is captured in the video, based on the plurality of feature parts detected in the feature detection step S12, the type of the medium determined in the type determination step S13, and the character recognition area specified in the recognition area specifying step S14.

Next, the character recognition orientation correction unit 15 corrects the orientation of the character recognition area specified in the recognition area specifying step S14, based on the relative positions of the plurality of feature parts detected in the feature detection step S12 (character recognition orientation correction step S16).

Next, the character recognition unit 16 recognizes the characters in the character recognition area whose orientation was corrected in the character recognition orientation correction step S16 (character recognition step S17).

Next, the display orientation correction unit 17 corrects the orientation of the video of the medium to be displayed on the monitor 4, based on the relative positions of the plurality of feature parts detected in the feature detection step S12 (display orientation correction step S18).

The monitor 4 then displays the video of the medium whose orientation was corrected in the display orientation correction step S18 (display step S19), and the medium discrimination processing ends.

Note that the timing at which the display orientation correction step S18 and the display step S19 are executed is not particularly limited, as long as it is after the feature detection step S12, and can be changed as appropriate.

As described above, in this embodiment the feature detection unit 12 detects a plurality of feature parts of the medium based on the video (image) of the medium, and the type determination unit 13 determines the type of the medium based on the plurality of feature parts detected by the feature detection unit 12. The feature detection unit 12 can therefore find a plurality of pre-registered feature parts anywhere on the medium regardless of their rotation angle and recognize their classes, and the type determination unit 13 can determine the type of the medium from the combination of the detected feature parts. Consequently, the types of various media can be determined regardless of the orientation in which they were imaged. Moreover, because the type of the medium is determined based on a plurality of feature parts, it can be determined with higher accuracy than when it is determined based on a single feature part.

Furthermore, in this embodiment the recognition area specifying unit 14 specifies the character recognition area to be subjected to character recognition, based on the relative positions of the plurality of feature parts detected by the feature detection unit 12 and the type of the medium determined by the type determination unit 13, and the character recognition unit 16 recognizes the characters in the character recognition area specified by the recognition area specifying unit 14. The area in which characters should be recognized can therefore be specified accurately, and character recognition can be performed within that area. Consequently, the characters in the target area can be recognized accurately regardless of conditions such as the orientation of the medium and the image size.

Note that although the above embodiment describes determining the type of a medium from a moving image (video) of the medium, the image used may be a still image.

In the above embodiment, the inference model 21 is constructed by deep learning using a convolutional neural network; however, the inference model 21 is not particularly limited as long as it is created by machine learning, and may be created by machine learning other than deep learning. Nevertheless, because the detection of feature parts of media can then be applied to a wide variety of targets, the model is preferably constructed by deep learning using a convolutional neural network, such as the above-mentioned SSD, You Only Look Once (YOLO), or Regions with Convolutional Neural Networks (R-CNN). SSD is particularly preferable.

In the above embodiment, the feature detection unit 12 detects the plurality of feature parts using the inference model 21 created by machine learning; however, the detection method used by the feature detection unit 12 may be any object detection method capable of detecting objects regardless of the rotation angle and size (resolution) of the image, and is not limited to methods based on machine learning. For example, an object detection method using local features, such as Scale-Invariant Feature Transform (SIFT) or Speeded-Up Robust Features (SURF), may be used.

In the above embodiment, the medium discrimination device 3 is configured as a single device; however, it may instead be realized as a distributed processing system in which the functions of the medium discrimination device 3 are distributed among a plurality of devices as appropriate.

Specifically, as shown in FIG. 7 for example, a medium discrimination system may be composed of a camera 102, a cloud server 103 communicably connected to the camera 102, and a personal computer 104 that has a monitor and is communicably connected to the cloud server 103. The camera 102 may be given the function of the feature detection unit 12 described above, and the cloud server 103 may be given the functions of the type determination unit 13, the recognition area specifying unit 14, the character recognition orientation correction unit 15, the character recognition unit 16, the display orientation correction unit 17, and the determination unit 18 described above. The monitor of the personal computer 104 may be used as the display unit to display the video of the medium and the outputs of the feature detection unit 12 and the recognition area specifying unit 14.

Although embodiments of the present invention have been described above with reference to the drawings, the present invention is not limited to those embodiments. The configurations of the embodiments may also be combined or modified as appropriate without departing from the gist of the present invention.

As described above, the present invention is a technique useful for determining the types of various media.

1: Medium discrimination system
2, 102: Camera
3: Medium discrimination device
4: Monitor
5: Input device
6: Reading table
10: Control unit
11: Video input unit
12: Feature detection unit
13: Type determination unit
14: Recognition area specifying unit
15: Character recognition orientation correction unit
16: Character recognition unit
17: Display orientation correction unit
18: Determination unit
20: Storage unit
21: Inference model
22: Reference information
103: Cloud server
104: Personal computer
M: Medium
Ma: Feature part
Mb: Character recognition area

Claims

1. A medium discrimination device comprising:
a feature detection unit that detects a plurality of feature parts of a medium based on an image of the medium;
a type determination unit that determines the type of the medium based on the plurality of feature parts;
a recognition area specifying unit that specifies a character recognition area to be subjected to character recognition, based on the relative positions of the plurality of feature parts and the type of the medium; and
a character recognition unit that recognizes characters within the character recognition area,
wherein each of the plurality of feature parts includes at least one of a specific character string and a logo, and
the recognition area specifying unit calculates the position of the character recognition area based on the position information corresponding to the type of the medium determined by the type determination unit, out of position information that is set for each type of medium and indicates the positions of each feature part and the character recognition area, and on the relative positional relationship between the plurality of feature parts detected by the feature detection unit.

2. The medium discrimination device according to claim 1, further comprising a character recognition orientation correction unit that corrects the orientation of the character recognition area based on the relative positions of the plurality of feature parts,
wherein the character recognition unit recognizes characters within the character recognition area whose orientation has been corrected.

3. The medium discrimination device according to claim 1 or 2, further comprising:
a display unit that displays the image of the medium; and
a display orientation correction unit that corrects the orientation of the image displayed on the display unit based on the relative positions of the plurality of feature parts.

4. The medium discrimination device according to claim 1, wherein the feature detection unit detects the plurality of feature parts using a machine-learned inference model.

5. The medium discrimination device according to any one of claims 1 to 4, wherein the feature detection unit detects the plurality of feature parts using an inference model machine-learned with a data set in which the positions and classes of at least two feature parts are attached to the medium as label information.

6. The medium discrimination device according to any one of claims 1 to 5, further comprising a determination unit that determines the size of the medium based on the plurality of feature parts and the type of the medium, and determines whether the entire medium is captured in the image.

7. A medium discrimination system comprising:
the medium discrimination device according to any one of claims 1 to 6; and
an imaging device that captures an image of a medium.

8. A medium discrimination method comprising:
a feature detection step of detecting a plurality of feature parts of a medium based on an image of the medium;
a type determination step of determining the type of the medium based on the plurality of feature parts;
a recognition area specifying step of specifying a character recognition area to be subjected to character recognition, based on the relative positions of the plurality of feature parts and the type of the medium; and
a character recognition step of recognizing characters within the character recognition area,
wherein each of the plurality of feature parts includes at least one of a specific character string and a logo, and
the recognition area specifying step calculates the position of the character recognition area based on the position information corresponding to the type of the medium determined in the type determination step, out of position information that is set for each type of medium and indicates the positions of each feature part and the character recognition area, and on the relative positional relationship between the plurality of feature parts detected in the feature detection step.