JP7438690B2

JP7438690B2 - Information processing device, image recognition method, and learning model generation method

Info

Publication number: JP7438690B2
Application number: JP2019147085A
Authority: JP
Inventors: 崇文久野; 誠佐藤; 大樹加藤; 秀樹横山
Original assignee: Nippon Television Network Corp
Current assignee: Nippon Television Network Corp
Priority date: 2019-08-09
Filing date: 2019-08-09
Publication date: 2024-02-27
Anticipated expiration: 2039-08-09
Also published as: JP2021026744A

Description

The present invention relates to an information processing device, an image recognition method, and a learning model generation method, and in particular to an information processing device, an image recognition method, and a learning model generation method for recognizing the face of a person to be recognized.

Techniques for recognizing people's faces in video are under development. In recent years in particular, techniques that generate a learning model by machine learning, such as deep learning or reinforcement learning, and use that model for image recognition of people have become widespread. Generating a learning model that recognizes with high accuracy requires a large amount of training data.

Broadcasters and similar organizations hold large amounts of footage of actors, athletes, politicians, and others. For actors who appear frequently, they can hold not only images of the ordinary face but also face images with a variety of expressions, such as smiling or angry faces. By contrast, apart from a few exceptions, athletes and politicians do not appear on television or get broadcast on a daily basis, and there are few occasions, such as tournaments and elections, on which footage of them is preserved. In other words, face images with a variety of expressions cannot be held for such people, and when they are to be recognized by image recognition, the absolute amount of training data is insufficient.

Accordingly, an object of the present invention is to provide an information processing device, an image recognition method, and a learning model generation method that can prevent a drop in recognition accuracy caused by a shortage in the absolute amount of training data when a person's face is recognized with a learning model obtained by machine learning.

One aspect of the present invention is a video processing device comprising: an expression-change face image generation unit that generates, from a basic face image of the face of a recognition target person, an expression-change face image showing the face with an expression different from that of the basic face image; a learning model that takes a face image of a person as input and outputs a value related to the recognition target person, the learning model being generated using training data that includes identification information of the recognition target person and the basic face image and expression-change face image of the recognition target person; a face image extraction unit that extracts a face image of a person from video and inputs it to the learning model; and a recognition target person recognition unit that recognizes the recognition target person in the video from the output value of the learning model.

One aspect of the present invention is an image recognition method comprising: generating, from a basic face image of the face of a recognition target person, an expression-change face image showing the face with an expression different from that of the basic face image; generating, using training data that includes identification information of the recognition target person and the basic face image and expression-change face image of the recognition target person, a learning model that takes a face image of a person as input and outputs a value related to the recognition target person; extracting a face image of a person from video and inputting it to the learning model; and recognizing the recognition target person in the video from the output value of the learning model.

One aspect of the present invention is a learning model generation method comprising: generating, from a basic face image of the face of a recognition target person, an expression-change face image showing the face with an expression different from that of the basic face image; and generating, using training data that includes identification information of the recognition target person and the basic face image and expression-change face image of the recognition target person, a learning model that takes a face image of a person as input and outputs a value related to the recognition target person.

One aspect of the present invention is a learning model generation method comprising: generating, from a basic face image of the face of a recognition target person, a time-varying face image corresponding to elapsed time information for the face of the recognition target person; and generating, using training data that includes identification information of the recognition target person, the elapsed time information, and the basic face image and time-varying face image of the recognition target person, a learning model that takes a face image of a person and the elapsed time information as input and outputs a value related to the recognition target person.

According to the present invention, when a person's face is recognized with a learning model obtained by machine learning, a drop in recognition accuracy caused by a shortage in the absolute amount of training data can be prevented.

FIG. 1 is a block diagram showing the overall configuration of the first embodiment.
FIG. 2 is a block diagram of the video processing device 2 according to the first embodiment.
FIG. 3 is a diagram for explaining generation of expression-change face images.
FIG. 4 is a diagram for explaining a specific operation of the information processing device 2.
FIG. 5 is a diagram for explaining a specific operation of the information processing device 2.
FIG. 6 is a diagram for explaining a specific operation of the information processing device 2.
FIG. 7 is a diagram for explaining a specific operation of the information processing device 2.
FIG. 8 is a block diagram of the video processing device 2 according to a modification of the first embodiment.
FIG. 9 is a diagram for explaining generation of an article-wearing face image.
FIG. 10 is a diagram for explaining the face image extraction unit 24 of Modification 2 of the first embodiment.
FIG. 11 is a diagram for explaining the face image extraction unit 24 of Modification 3 of the first embodiment.
FIG. 12 is a diagram for explaining the face image extraction unit 24 of Modification 3 of the first embodiment.
FIG. 13 is a block diagram of the video processing device 2 according to the second embodiment.
FIG. 14 is a diagram for explaining a specific operation of the second embodiment.
FIG. 15 is a diagram for explaining a specific operation of the second embodiment.
FIG. 16 is a block diagram of the video processing device 2 according to the third embodiment.
FIG. 17 is a block diagram of the video processing device 2 according to the fourth embodiment.
FIG. 18 is a diagram for explaining a specific operation of the information processing device 2 according to the fourth embodiment.
FIG. 19 is a diagram for explaining a specific operation of the information processing device 2 according to the fourth embodiment.
FIG. 20 is a diagram for explaining a specific operation of the information processing device 2 according to the fourth embodiment.
FIG. 21 is a block diagram of the video processing device 2 configured as a computer system.

<First embodiment>
A first embodiment will be described.

FIG. 1 is a block diagram showing the overall configuration of the first embodiment. In FIG. 1, 1 is a camera, 2 is a video processing device, and 3 is a display device.

The camera 1 is a camera that captures video. For recognizing people and the like, the camera 1 is preferably a 4K or 8K camera capable of capturing high-quality video, but is not limited to these.

The video processing device 2 receives the video captured by the camera 1, recognizes a specific person (hereinafter referred to as a recognition target person) among the people in the video, and outputs the result to the display device 3. In this embodiment, the video processed by the video processing device 2 is the video output from the camera 1, but it may also be video that was captured by the camera 1 and stored in a storage device beforehand (that is, not real-time video).

The display device 3 is a display that shows the captured video and the recognition result output from the video processing device 2. The display device 3 need not be limited to a display function; it may be a display with a touch panel function, such as a tablet terminal.

Next, the video processing device 2 will be described. FIG. 2 is a block diagram of the video processing device 2.

The video processing device 2 includes a basic face image storage unit 21, an expression-change face image generation unit 22, a learning model 23, a face image extraction unit 24, and a recognition target person recognition unit 25.

The basic face image storage unit 21 is a storage unit that stores basic face images of a plurality of recognition target persons, that is, the persons to be recognized in the video. A basic face image is a face image showing the recognition target person's face with a baseline expression.

The expression-change face image generation unit 22 uses the basic face image to generate an image of the recognition target person's face with an expression different from that of the basic face image (hereinafter referred to as an expression-change face image). Examples are face images such as the smiling, crying, angry, frightened, and tired faces shown in FIG. 3. Known techniques can be used to generate expression-change face images. For example, by extracting feature points of the face in the basic face image and moving those feature points according to a given rule, an expression-change face image with an expression different from that of the basic face image can be generated.
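The feature-point approach mentioned above can be sketched as follows. This is a minimal illustration only: the feature point names, the rule table `EXPRESSION_RULES`, and the displacement values are hypothetical and do not come from the patent, and the step that warps the image pixels to follow the moved points is omitted.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates, y grows downward

# Hypothetical rule table: for each expression, a displacement (dx, dy)
# applied to selected named feature points. Values are illustrative.
EXPRESSION_RULES: Dict[str, Dict[str, Point]] = {
    "smile": {"mouth_left": (-2.0, -3.0), "mouth_right": (2.0, -3.0)},
    "angry": {"brow_left_inner": (1.0, 2.0), "brow_right_inner": (-1.0, 2.0)},
}


def apply_expression_rule(landmarks: Dict[str, Point], expression: str) -> Dict[str, Point]:
    """Return a new landmark set with the rule for `expression` applied.

    Feature points not mentioned in the rule stay where they are.
    """
    rule = EXPRESSION_RULES[expression]
    moved: Dict[str, Point] = {}
    for name, (x, y) in landmarks.items():
        dx, dy = rule.get(name, (0.0, 0.0))
        moved[name] = (x + dx, y + dy)
    return moved


# Feature points of a basic face image (made-up coordinates).
basic = {"mouth_left": (40.0, 80.0), "mouth_right": (60.0, 80.0), "nose_tip": (50.0, 60.0)}
smiling = apply_expression_rule(basic, "smile")
```

Keeping the rules in a table means one basic face image can yield several expression variants simply by iterating over the table keys.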

The learning model 23 is a learning model that takes a face image of a person as input and outputs a value related to the recognition target person. The learning model is trained by deep learning, reinforcement learning, a combination of these, or the like, using training data that includes the identification information (for example, the name) of each recognition target person, the basic face image of each recognition target person from the basic face image storage unit 21, and the expression-change face images of each recognition target person generated by the expression-change face image generation unit 22. The value related to the recognition target person output by the learning model 23 is, for example, the probability that the person in the input face image is the recognition target person.

The face image extraction unit 24 detects people's faces in the input video, extracts a face image of each face, and outputs the face image, together with identification information that identifies that face image, to the learning model 23. Known techniques can be used to extract people from video.

The recognition target person recognition unit 25 receives the values from the learning model 23, recognizes the recognition target person, and outputs the recognition result to the display device 3. For example, among the output values of the learning model, it recognizes the face in any face image whose probability is at or above a predetermined threshold as the recognition target person, and outputs to the display device 3 a marker indicating the position of the recognition target person in the video (for example, a rectangle surrounding the person's face) together with the identification information (for example, the name) of that person.

Next, a specific operation of the information processing device 2 will be described.

First, a basic face image of each recognition target person is prepared and input to the expression-change face image generation unit 22, which generates a plurality of expression-change face images for each recognition target person.

FIG. 4 is a diagram for explaining the generation of a plurality of expression-change face images for recognition target person X. A basic face image of recognition target person X is prepared and input to the expression-change face image generation unit 22. The expression-change face image generation unit 22, for example, extracts feature points of the face in the basic face image and moves those feature points according to a given rule, thereby generating a plurality of expression-change face images of recognition target person X with expressions different from that of the basic face image (for example, a smiling face, a crying face, an angry face). This resolves the shortage of training data for image recognition of recognition target person X. In the same way, basic face images are prepared for the other persons to be recognized, such as recognition target person Y, recognition target person Z, and so on, and the expression-change face image generation unit 22 generates a plurality of expression-change face images with expressions different from that of each basic face image.

Next, the learning model 23, which recognizes a recognition target person from a face image, is generated using as training data the identification information (for example, the name) of at least one recognition target person, the basic face image of that at least one recognition target person, and the plurality of expression-change face images generated from that basic face image by the expression-change face image generation unit 22. The output of the learning model 23 is the probability (likelihood) that the face in the input face image is the face of the recognition target person.
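As a rough sketch of how the training data described above might be assembled, the following pairs each person's identification information with the basic face image and each generated expression-change face image. The `TrainingSample` type, the `generate` callback, and the use of file-name strings in place of image data are all assumptions made for illustration; the patent does not specify a data layout.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class TrainingSample:
    person_id: str  # identification information, e.g. a name
    image: str      # stand-in for image data (a string here, for illustration)
    kind: str       # "basic" or the name of the generated expression


def build_training_data(
    basic_images: Dict[str, str],
    expressions: List[str],
    generate: Callable[[str, str], str],
) -> List[TrainingSample]:
    """Collect the basic image plus one generated image per expression for each person.

    `generate(basic_image, expression)` is a hypothetical stand-in for the
    expression-change face image generation unit 22.
    """
    samples: List[TrainingSample] = []
    for person_id, basic in basic_images.items():
        samples.append(TrainingSample(person_id, basic, "basic"))
        for expression in expressions:
            samples.append(TrainingSample(person_id, generate(basic, expression), expression))
    return samples


# One basic image per person becomes 1 + len(expressions) labeled samples.
data = build_training_data(
    {"X": "x_basic.png", "Y": "y_basic.png"},
    ["smile", "cry", "angry"],
    lambda image, expression: f"{image}:{expression}",
)
```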

After training of the learning model 23 is complete, image recognition of the recognition target persons begins. In the following description, the camera 1 is filming an election speech; the video is the one shown in FIG. 5, and it is displayed on the display device 3.

The video from the camera 1 is input to the face image extraction unit 24. The face image extraction unit 24 detects the faces of the people in the video from the camera 1. As shown in FIG. 6, the faces detected in the video are face A, face B, face C, face D, face E, and face F. In the video of FIG. 6, the detected faces are marked with dotted rectangles, but these only illustrate the detection conceptually; the dotted rectangles are not displayed on the display device 3.

The face image extraction unit 24 extracts from the video of the camera 1 the images of the rectangular areas surrounding face A, face B, face C, face D, face E, and face F, and outputs them to the learning model 23 as the face image of face A, the face image of face B, the face image of face C, the face image of face D, the face image of face E, and the face image of face F.
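The cropping step described above can be illustrated with a small sketch. It assumes that face detection has already produced a rectangle per face, that a frame is a list of pixel rows, and that a rectangle is given as (left, top, width, height); none of these representations are specified in the patent.

```python
from typing import Dict, List, Tuple

Frame = List[List[int]]          # rows of pixel values (assumed representation)
Box = Tuple[int, int, int, int]  # (left, top, width, height), an assumed convention


def crop(frame: Frame, box: Box) -> Frame:
    """Return the pixels of `frame` inside the rectangle `box`."""
    left, top, width, height = box
    return [row[left:left + width] for row in frame[top:top + height]]


def extract_face_images(frame: Frame, boxes: Dict[str, Box]) -> Dict[str, Frame]:
    """Crop one face image per detected rectangle, keyed by the face identifier."""
    return {face_id: crop(frame, box) for face_id, box in boxes.items()}


# A tiny 4x6 "frame" whose pixel value encodes its position (row * 10 + column).
frame = [[row * 10 + col for col in range(6)] for row in range(4)]
faces = extract_face_images(frame, {"A": (1, 1, 2, 2), "B": (4, 0, 2, 1)})
```

Keying each crop by its face identifier corresponds to outputting the face image together with identification information that identifies it.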

For each of the face images of face A, face B, face C, face D, face E, and face F, the learning model 23 outputs the probability (likelihood) that it is the face of each recognition target person. Here, suppose the output of the learning model 23 is as follows. For recognition target person X: face A = 0.1, face B = 0.85, face C = 0.05, face D = 0.9, face E = 0.3, face F = 0.1. For recognition target person Y: face A = 0.1, face B = 0.5, face C = 0.01, face D = 0.6, face E = 0.9, face F = 0.1. For recognition target person Z: face A = 0.2, face B = 0.1, face C = 0.1, face D = 0.2, face E = 0.1, face F = 0.9. The same applies to any further recognition target persons.

The recognition target person recognition unit 25 receives the output values of the learning model 23. It recognizes the face in any face image whose probability is at or above a predetermined threshold as a recognition target person. With a threshold of 0.8, it recognizes face B and face D as the face of recognition target person X, face E as the face of recognition target person Y, and face F as the face of recognition target person Z. It then outputs to the display device 3 the rectangles surrounding face B and face D together with "recognition target person X", the rectangle surrounding face E together with "recognition target person Y", and the rectangle surrounding face F together with "recognition target person Z".
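The thresholding in this example can be reproduced with a short sketch. The probabilities and the threshold of 0.8 are the ones given in the text; the dictionary layout and the function name `recognize` are assumptions for illustration.

```python
from typing import Dict, List


def recognize(scores: Dict[str, Dict[str, float]], threshold: float) -> Dict[str, List[str]]:
    """Map each recognition target person to the faces whose probability is >= threshold."""
    recognized: Dict[str, List[str]] = {}
    for person, per_face in scores.items():
        hits = [face for face, probability in per_face.items() if probability >= threshold]
        if hits:
            recognized[person] = hits
    return recognized


# Output values of the learning model 23 from the example above.
scores = {
    "X": {"A": 0.1, "B": 0.85, "C": 0.05, "D": 0.9, "E": 0.3, "F": 0.1},
    "Y": {"A": 0.1, "B": 0.5, "C": 0.01, "D": 0.6, "E": 0.9, "F": 0.1},
    "Z": {"A": 0.2, "B": 0.1, "C": 0.1, "D": 0.2, "E": 0.1, "F": 0.9},
}

result = recognize(scores, threshold=0.8)
# result == {"X": ["B", "D"], "Y": ["E"], "Z": ["F"]}
```

Note that the same person can match more than one face (B and D both exceed the threshold for X); the text outputs both rectangles rather than choosing the higher score.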

FIG. 7 is an example of the recognition result displayed on the display device 3. Face B and face D are each surrounded by a rectangle, and the identification information "recognition target person X" is displayed below the rectangle. Likewise, face E is surrounded by a rectangle with the identification information "recognition target person Y" displayed below it, and face F is surrounded by a rectangle with the identification information "recognition target person Z" displayed below it.

According to the first embodiment, a drop in recognition accuracy caused by a shortage in the absolute amount of training data can be prevented.

<Modification 1 of the first embodiment>
A modification of the first embodiment will be described.

FIG. 8 is a block diagram of the video processing device 2 according to a modification of the first embodiment.

In this modification of the first embodiment, an article-wearing face image generation unit 26 is added to the information processing device 2 of the first embodiment.

Like the expression-change face image generation unit 22, the article-wearing face image generation unit 26 uses the basic face image, in this case to generate an image of the recognition target person's face with an article worn on the face of the basic face image (hereinafter referred to as an article-wearing face image). An article-wearing face image is, for example, a face image in which glasses have been added to the basic face image of the recognition target person, as shown in FIG. 9. The article may be anything worn on the face or head, such as glasses, sunglasses, a hat, a helmet, or accessories.

This modification of the first embodiment generates face images of the recognition target person wearing glasses or the like and uses them as training data for the learning model 23, which has the effect of improving the recognition accuracy of the learning model 23.

<Modification 2 of the first embodiment>
A second modification of the first embodiment will be described.

The face image extraction unit 24 of the first embodiment detects every face in the video that appears to belong to a person. In video used for broadcasting and the like, however, a person located near the center of the frame is often an important subject. That is, the recognition target person is often located near the center of the frame. The face image extraction unit 24 is therefore configured to restrict the area in which faces are recognized and to detect only the faces of people within that area. In the example of FIG. 10, an identification target area is set near the center of the frame and faces are detected only for people inside that area, so the detected faces are face B, face C, and face D. Compared with the first embodiment, fewer faces are detected, and the number of recognition operations can be reduced.
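One way to picture the area restriction is the sketch below, which keeps only faces whose rectangle center falls inside an identification target area. Testing the center of the rectangle, the (left, top, width, height) convention, and all coordinates are assumptions for illustration; the patent does not say how membership in the area is decided.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height), an assumed convention


def box_center(box: Box) -> Tuple[float, float]:
    left, top, width, height = box
    return (left + width / 2, top + height / 2)


def faces_in_region(boxes: Dict[str, Box], region: Box) -> List[str]:
    """Return identifiers of faces whose rectangle center lies inside `region`."""
    r_left, r_top, r_width, r_height = region
    selected: List[str] = []
    for face_id, box in boxes.items():
        cx, cy = box_center(box)
        if r_left <= cx <= r_left + r_width and r_top <= cy <= r_top + r_height:
            selected.append(face_id)
    return selected


# Hypothetical 1920x1080 frame with the identification target area in the middle.
region = (480, 270, 960, 540)
boxes = {
    "A": (100, 300, 100, 100),
    "B": (700, 400, 120, 120),
    "C": (900, 450, 100, 100),
    "D": (1200, 400, 120, 120),
    "E": (1600, 300, 100, 100),
    "F": (1750, 500, 100, 100),
}
kept = faces_in_region(boxes, region)  # ["B", "C", "D"] with these made-up coordinates
```

In a real pipeline the detector itself would be run only on the cropped area, which saves detection time as well as recognition time; the filter above only illustrates the selection logic.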

Modification 2 of the first embodiment restricts the area in which faces are detected, which reduces the number of faces detected by the face image extraction unit 24 and thereby speeds up the image recognition process as a whole.

<Modification 3 of the first embodiment>
A third modification of the first embodiment will be described.

In video used for broadcasting and the like, a person near the center of the frame is often an important subject. In a group shot, people positioned at about the same depth as the person near the center are often equally important. Modification 3 of the first embodiment uses these characteristics to reduce the number of people subject to recognition and shorten the processing time.

Specifically, the face image extraction unit 24 detects every face in the video that it can detect and obtains the size of each face (the size of the rectangle surrounding the face). In the example of FIG. 11, the detectable faces are face A through face F, and the sizes of face A through face F are obtained.

Next, the face of the person closest to the center of the frame is taken as a recognition target. Detecting the person located near the center of the frame, however, would require separate processing such as skeleton detection. Instead, the position where the face of a person near the center of the frame would be is assumed, that position is used as a reference point, and the face closest to the reference point is taken as the recognition target. Specifically, as shown in FIG. 11 for example, the reference point is the intersection of the line that divides the frame vertically at 30 percent to 70 percent from the top and the line that divides it horizontally at 50 percent to 50 percent. Face B, which is closest to the reference point, is taken as the recognition target, and the size of face B (the size of the rectangle surrounding the face) is detected and used as the reference size.

Next, faces that are smaller than the reference size of the recognition target face by a certain margin or more (for example, 70% of it or less) or larger by a certain margin or more (for example, 140% of it or more) are excluded from recognition. That is, the face images of those faces are not output to the learning model 23. In the example of FIG. 11, the faces other than face B that satisfy the size condition and remain recognition targets are face A and face C, while face D, face E, and face F are excluded. Accordingly, the face images output to the learning model 23 are the face image of face A, the face image of face B, and the face image of face C, as shown in FIG. 12.
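The selection procedure of Modification 3 can be sketched as follows. The 30 percent and 50 percent reference point and the 70% and 140% limits come from the text; measuring "size" as rectangle area, measuring distance from the rectangle center, and all coordinates are assumptions made for this illustration.

```python
import math
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height), an assumed convention


def box_center(box: Box) -> Tuple[float, float]:
    left, top, width, height = box
    return (left + width / 2, top + height / 2)


def box_size(box: Box) -> int:
    # Assumption: "size of the rectangle" is taken to be its area.
    return box[2] * box[3]


def select_faces(
    boxes: Dict[str, Box], frame_w: int, frame_h: int, low: float = 0.7, high: float = 1.4
) -> Tuple[str, List[str]]:
    """Return (reference face id, ids of faces kept as recognition targets)."""
    # Reference point: 50% across the frame, 30% down from the top.
    ref_x, ref_y = frame_w * 0.5, frame_h * 0.3

    def distance(face_id: str) -> float:
        cx, cy = box_center(boxes[face_id])
        return math.hypot(cx - ref_x, cy - ref_y)

    reference_id = min(boxes, key=distance)
    reference_size = box_size(boxes[reference_id])
    # Exclude faces at or below `low` times, or at or above `high` times, the reference size.
    kept = [
        face_id
        for face_id, box in boxes.items()
        if low < box_size(box) / reference_size < high
    ]
    return reference_id, kept


# Hypothetical 1000x1000 frame; coordinates are made up to mirror the FIG. 11 outcome.
boxes = {
    "A": (100, 250, 90, 90),
    "B": (450, 260, 100, 100),
    "C": (700, 250, 110, 110),
    "D": (300, 500, 150, 150),
    "E": (600, 100, 50, 50),
    "F": (850, 600, 60, 60),
}
reference_id, kept = select_faces(boxes, 1000, 1000)  # "B", ["A", "B", "C"]
```

Face size acts here as a cheap proxy for depth: faces of similar size are assumed to stand at a similar distance from the camera as the central subject.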

This processing reduces the number of face images processed by the learning model 23 and, as a result, speeds up the image recognition process as a whole.

<Second embodiment>
A second embodiment will be described.

The second embodiment performs image recognition that takes into account how the facial expression of the recognition target person changes with elapsed time.

FIG. 13 is a block diagram of the video processing device 2 according to the second embodiment.

The video processing device 2 of the second embodiment differs from that of the first embodiment in that the learning model receives time information (for example, the elapsed time of a match, or a time code) as input, and the learning model 23 recognizes the recognition target person while taking into account how the person's facial expression changes with elapsed time. Here, a change in facial expression with elapsed time means a change in the facial expression that accompanies the passage of time, for example a change caused by fatigue as time passes or a change caused by aging over the years.

To give a concrete example, in a sporting event athletes tire as time passes from the start, and the fatigue shows on their faces. This is especially pronounced in events such as the marathon.

The expression-change face image generation unit 22 therefore generates, from the basic face image, changed face images that reflect the degree of fatigue corresponding to the time elapsed since the start of the event. For example, as shown in FIG. 14, it generates a changed face image a that reflects the fatigue one hour after the start of the event and a changed face image b that reflects the fatigue two hours after the start.

When the learning model 23 is trained, the teacher data consists of the basic face image of each recognition target person from the basic face image storage unit 21 and the facial expression change face images generated for each recognition target person by the facial expression change face image generation unit 22, together with the time information of each facial expression change face image. In the above example, the time information "one hour after the start of the competition" for changed face image a and "two hours after the start of the competition" for changed face image b is also used as teacher data.

At recognition time, on the other hand, time information (the elapsed time since the start of the competition) is input to the learning model 23 as input data in addition to the face image.
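As a minimal sketch of how the elapsed time might accompany each face image, the following Python pairs every training image with its elapsed time and builds the same kind of input at recognition time. The names `Sample`, `build_teacher_data`, and `make_input`, the use of hours as the unit, and the convention that the basic face image has elapsed time 0 are assumptions made for illustration; the description does not specify how the time information is encoded.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Sample:
    """One teacher-data record (hypothetical layout, not from the source)."""
    person_id: str        # label: which recognition target person this is
    image_id: str         # placeholder standing in for the face image itself
    elapsed_hours: float  # time since the start of the competition


def build_teacher_data(person_id: str,
                       basic_image: str,
                       changed_images: List[Tuple[str, float]]) -> List[Sample]:
    """Combine the basic face image with changed face images and their times."""
    # Assumption: the basic face image is treated as elapsed time 0.
    samples = [Sample(person_id, basic_image, 0.0)]
    for image_id, hours in changed_images:
        samples.append(Sample(person_id, image_id, hours))
    return samples


def make_input(image_id: str, elapsed_hours: float) -> Tuple[str, float]:
    """At recognition time the model input is the face image plus elapsed time."""
    if elapsed_hours < 0:
        raise ValueError("elapsed time must be non-negative")
    return (image_id, elapsed_hours)


# Changed face images a and b from the example: 1 hour and 2 hours after the start.
teacher_data = build_teacher_data(
    "X", "basic.png", [("changed_a.png", 1.0), ("changed_b.png", 2.0)]
)
```

Keeping the elapsed time as an explicit field of every record means the same value can be supplied at recognition time, so training and recognition inputs have the same shape.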

FIG. 15 illustrates a marathon as an example. Just after the start of the marathon, the athlete is not yet fatigued and his or her face is close to the basic face image. After one hour, the athlete's face shows fatigue and becomes close to changed face image b. After two hours, the athlete's face shows further fatigue and becomes close to changed face image c.

The learning model 23 exploits this characteristic: at recognition time, the approximate elapsed time since the start of the competition is input together with the video and used as one of the recognition parameters. This allows image recognition that takes elapsed time into account and therefore achieves higher accuracy.

<Third embodiment>
A third embodiment will be described.

FIG. 16 is a block diagram of the video processing device 2 according to the third embodiment.

The video processing device 2 of the third embodiment differs from that of the first embodiment in that the learning model 23 recognizes the recognition target person using video-related information about the video.

A video such as a news segment is accompanied by video-related information such as the news script. The script often contains the names of the people who appear in the video. Therefore, identification information of a recognition target person (for example, a name) extracted from text data such as the script is input to the learning model 23 at the time the learning model 23 is processing the corresponding part of the video, using the correspondence between the text data and the video (for example, time codes).
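The lookup from a position in the video to the names in the script can be sketched as follows. This is an illustration under assumptions, not the method of the description: the `ScriptSegment` layout, time codes expressed as seconds, half-open intervals, and the function name `names_at` are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence, Set


@dataclass(frozen=True)
class ScriptSegment:
    """A span of the script tied to the video by time code (hypothetical layout)."""
    start_sec: float       # time code where the segment begins
    end_sec: float         # time code where the segment ends (exclusive)
    names: FrozenSet[str]  # person names extracted from the script text


def names_at(segments: Sequence[ScriptSegment], t_sec: float) -> Set[str]:
    """Return the names mentioned in every segment that covers time t_sec."""
    found: Set[str] = set()
    for seg in segments:
        if seg.start_sec <= t_sec < seg.end_sec:
            found |= seg.names
    return found


segments = [
    ScriptSegment(0.0, 30.0, frozenset({"X"})),
    ScriptSegment(30.0, 60.0, frozenset({"Y", "Z"})),
]
```

For a frame at 10 seconds this returns only the names from the first segment; for a time outside every segment it returns an empty set, in which case no identification information would be passed to the model.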

The learning model 23 applies a weight to the likelihood of the recognition target person corresponding to the input identification information. This improves the recognition accuracy of the learning model 23.
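One possible form of this weighting is sketched below. The description does not say how the weight is applied, so the multiplicative `boost` factor, the renormalization step, and the function name `reweight` are assumptions chosen only to make the idea concrete.

```python
from typing import Dict, Iterable


def reweight(probabilities: Dict[str, float],
             mentioned: Iterable[str],
             boost: float = 2.0) -> Dict[str, float]:
    """Raise the likelihood of persons named in the script, then renormalize.

    Assumption: a simple multiplicative boost; the source only states that
    the likelihood of the corresponding person is weighted.
    """
    if boost <= 0:
        raise ValueError("boost must be positive")
    mentioned_set = set(mentioned)
    weighted = {
        name: p * (boost if name in mentioned_set else 1.0)
        for name, p in probabilities.items()
    }
    total = sum(weighted.values())
    if total == 0:
        # Nothing to normalize; return the scores unchanged.
        return dict(probabilities)
    return {name: w / total for name, w in weighted.items()}


# Person X is named in the script, so X's likelihood is raised relative to Y.
result = reweight({"X": 0.4, "Y": 0.6}, ["X"], boost=2.0)
```

Renormalizing keeps the outputs summing to one, so a downstream threshold on the probability keeps its meaning after weighting.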

<Fourth embodiment>
FIG. 17 is a block diagram of the video processing device 2 according to the fourth embodiment.

The video processing device 2 of the fourth embodiment differs from that of the first embodiment in that it further includes a recognition target exclusion unit 27 that excludes, from the faces of recognition target persons recognized by the recognition target person recognition unit 25, those for which no change over time is detected.

The learning model 23 can identify the face of a candidate recognition target person, but it cannot distinguish a recognition target person who is actually present at the location from one who merely appears on a poster, in a painting, as a figurine, or the like. For example, as shown in FIG. 7, the first embodiment does not distinguish the face of a recognition target person who is actually present from the face of a recognition target person on a poster; both are detected as faces of recognition target persons.

However, there are cases where it is desirable to distinguish a recognition target person who is actually present at the location from one who appears on a poster, in a painting, as a figurine, or the like, and to identify only the person who is actually present.

A recognition target person who is actually present moves, laughs, and so on as time passes, so his or her movement and facial expression change. Exploiting this property, the recognition target exclusion unit 27 detects changes over time in the faces of the recognition target persons and excludes faces for which no change over time is detected. Faces appearing on posters, in paintings, on figurines, and the like are thereby excluded, and only the faces of recognition target persons who are actually present are recognized.

Here, a change over time means that the face image of a recognition target person changes with time, for example a change in the positions of feature points extracted from the face image. The recognition target exclusion unit 27 identifies, as a recognition target person, the person corresponding to a face image whose feature points are changing.

Next, the specific operation of the fourth embodiment will be described.

As shown in FIG. 18, the recognition target person recognition unit 25 detects face B as "recognition target person X", face D as "recognition target person X", face E as "recognition target person Y", and face F as "recognition target person F".

The recognition target exclusion unit 27 acquires face images of each recognition target person's face over a predetermined number of frames and detects changes in the feature points of each face. Here, face B is the face of a recognition target person who is actually present at the location, while faces D, E, and F are photographs of recognition target persons on a poster. Therefore, as shown in FIG. 19, a change in feature points is detected for face B but not for faces D, E, and F.

The recognition target exclusion unit 27 excludes faces D, E, and F, for which no change in feature points was detected, from the faces of recognition target persons, and outputs only face B, for which a change in feature points was detected, to the display device 3 as a recognition target person. FIG. 20 is a display example of the display device 3 in the fourth embodiment. In contrast to FIG. 7, in the example of FIG. 20 only face B of recognition target person X, who is actually present at the location, is enclosed in a rectangle, with "recognition target person X" displayed below the rectangle.
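The exclusion step can be sketched roughly as follows. This is only an illustration under assumptions: feature points are given as (x, y) pairs already normalized to each face's bounding box (so that camera motion alone does not count as a change), the change measure is the largest displacement from the first frame, and the names `max_landmark_shift`, `keep_changing_faces`, and the `threshold` value are hypothetical.

```python
from math import hypot
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Landmarks = Sequence[Point]


def max_landmark_shift(frames: Sequence[Landmarks]) -> float:
    """Largest displacement of any feature point relative to the first frame."""
    if not frames:
        return 0.0
    first = frames[0]
    largest = 0.0
    for landmarks in frames[1:]:
        for (x0, y0), (x1, y1) in zip(first, landmarks):
            largest = max(largest, hypot(x1 - x0, y1 - y0))
    return largest


def keep_changing_faces(tracks: Dict[str, Sequence[Landmarks]],
                        threshold: float = 1.0) -> List[str]:
    """Keep only faces whose feature points moved more than the threshold."""
    return [
        face_id
        for face_id, frames in tracks.items()
        if max_landmark_shift(frames) > threshold
    ]


# Face B (a person who is present) changes; face D (a poster) does not.
tracks = {
    "B": [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 0.0), (10.0, 3.0)]],
    "D": [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 0.0), (10.0, 0.0)]],
}
kept = keep_changing_faces(tracks)
```

In practice the threshold would need to be set above the noise of the feature point detector, otherwise small detection jitter on a static poster could be mistaken for a change.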

The fourth embodiment thus does not recognize the faces of recognition target persons that appear on posters, in paintings, on figurines, and the like, and recognizes only the faces of recognition target persons who are actually present at the location.

Specifically, the video processing device 2 described above can also be realized by a computer system having a processor that performs various kinds of arithmetic processing.

FIG. 21 is a block diagram of the video processing device 2 configured by a computer system.

As shown in FIG. 21, the video processing device 2 can be configured by a computer 100 having a processor 101, a memory (ROM, RAM) 102, a storage device (hard disk, semiconductor disk, etc.) 103, an input device (keyboard, mouse, touch panel, etc.) 104, and a communication device 105.

In the video processing device 2, a program stored in the storage device 103 is loaded into the memory 102 and executed by the processor 101, thereby realizing the functions of the facial expression change face image generation unit 22, the learning model 23, the face image extraction unit 24, the recognition target person recognition unit 25, the article-wearing face image generation unit 26, and the recognition target exclusion unit 27. The basic face image storage unit 21 corresponds to the storage device 103. Note that the basic face image storage unit 21 may instead be provided physically outside the computer 100 and connected to the computer 100 via a network such as a LAN.

The present invention has been described above with reference to preferred embodiments. It is not necessary to include the configurations of all the embodiments, and the embodiments can be combined as appropriate. Moreover, the present invention is not necessarily limited to the above embodiments and can be modified and implemented in various ways within the scope of its technical idea.

1 Camera
2 Video processing device
3 Display device
21 Basic face image storage unit
22 Facial expression change face image generation unit
23 Learning model
24 Face image extraction unit
25 Recognition target person recognition unit
26 Article-wearing face image generation unit
27 Recognition target exclusion unit

Claims

A video processing device comprising:
a face image extraction unit that extracts a face image of a person from a video;
a learning model that outputs a probability that the extracted face image is the face of a recognition target person;
a recognition target person recognition unit that recognizes the recognition target person from the video using the probability output by the learning model; and
a changed face image generation unit that generates, from a basic face image of the face of the recognition target person, a time-varying face image in which the face of the recognition target person has changed with the passage of time,
wherein the changed face image generation unit generates the time-varying face image for each predetermined elapsed time,
the learning model is trained using teacher data including the basic face image of the recognition target person, the time-varying face image, and the elapsed time of the time-varying face image, and
the extracted face image of the person and the elapsed time are input to the learning model.

The video processing device according to claim 1,
wherein the elapsed time is the time elapsed since the start of a predetermined competition.

The video processing device according to claim 2,
wherein the changed face image generation unit generates the time-varying face image so as to reflect fatigue of the recognition target person corresponding to the time elapsed since the start of the predetermined competition.

The video processing device according to any one of claims 1 to 3,
wherein the changed face image generation unit generates, from the basic face image, a facial expression change face image whose facial expression differs from that of the basic face image, and
the teacher data of the learning model further includes the facial expression change face image.

The video processing device according to any one of claims 1 to 4,
wherein the changed face image generation unit generates, from the basic face image, an article-wearing face image in which an article is worn on the face of the recognition target person, and
the teacher data of the learning model further includes the article-wearing face image.

An image recognition method in which an information processing device:
generates, for each predetermined elapsed time, from a basic face image of the face of a recognition target person, a time-varying face image in which the face of the recognition target person has changed with the passage of time;
generates a learning model using teacher data including the basic face image of the recognition target person, the time-varying face image, and the elapsed time of the time-varying face image;
extracts a face image of a person from a video and inputs the extracted face image and the elapsed time to the learning model; and
recognizes the recognition target person from the video using the probability, output by the learning model, that the extracted face image is the face of the recognition target person.

The image recognition method according to claim 6,
wherein the elapsed time is the time elapsed since the start of a predetermined competition.

The image recognition method according to claim 7,
wherein the information processing device generates the time-varying face image so as to reflect fatigue of the recognition target person corresponding to the time elapsed since the start of the predetermined competition.