JP2020038605A

JP2020038605A - Information processing method and information processing system

Info

Publication number: JP2020038605A
Application number: JP2019075031A
Authority: JP
Inventors: 育規石井; Yasunori Ishii; 弘章浦部; Hiroaki Urabe
Original assignee: Panasonic Intellectual Property Corp of America
Current assignee: Panasonic Intellectual Property Corp of America
Priority date: 2018-08-29
Filing date: 2019-04-10
Publication date: 2020-03-12
Anticipated expiration: 2039-04-10
Also published as: JP7257227B2

Abstract

To provide an information processing method capable of improving the training efficiency of each individual model trained by machine learning. SOLUTION: An information processing method includes: acquiring sensing data (S10); determining a synthesis portion of the sensing data in which recognition target data is to be synthesized (S20); generating synthesized data by synthesizing, in the synthesis portion, recognition target data having features identical or similar to features of the sensing data that are perceived by human sensory organs (S30); inputting the synthesized data to a model trained by machine learning to recognize a recognition target, and acquiring recognition result data (S40); making a second determination, which determines whether to make a first determination of determining training data for the model on the basis of the synthesized data, using at least the recognition result data and correct answer data that includes the synthesis portion (S50); and making the first determination when the second determination determines that the first determination is to be made (S60). SELECTED DRAWING: Figure 2

Description

The present disclosure relates to an information processing method and an information processing system.

In constructing a data set for machine learning (hereinafter also referred to as a training data set or a learning data set), generating synthesized images has been proposed as a way to prepare a sufficient amount of data (hereinafter also referred to as training data or learning data). For example, Patent Literature 1 discloses a simulation system and the like that increase the number of learning samples by using a plurality of different types of sensors together and generating CG (computer graphics) images based on information obtained from these sensors. As another example, Patent Literature 2 discloses an image processing apparatus and the like that use, as learning data, a difference image between a background image and a captured image of an object having a portion similar to the background image in at least one of color and brightness.

Patent Literature 1: International Publication No. WO 2018/066351
Patent Literature 2: International Publication No. WO 2017/154630

In the above conventional techniques, the generated training data is not necessarily useful for each individual model trained by machine learning (hereinafter also referred to as a training model or a learning model). It is therefore difficult for these techniques to improve the training efficiency of individual models.

In view of this, the present disclosure provides an information processing method and an information processing system that can improve the training efficiency of each individual model trained by machine learning.

An information processing method according to a non-limiting, exemplary aspect of the present disclosure includes: acquiring sensing data; determining a synthesis portion of the sensing data in which recognition target data is to be synthesized; generating synthesized data by synthesizing, in the synthesis portion, recognition target data having features identical or similar to features of the sensing data that are perceived by human sensory organs; inputting the synthesized data to a model trained by machine learning to recognize a recognition target, and acquiring recognition result data; making a second determination, which determines whether to make a first determination of determining training data for the model based on the synthesized data, using at least the recognition result data and correct answer data that includes the synthesis portion; and making the first determination when the second determination determines that the first determination is to be made.

Note that these general or specific aspects may be implemented as a system, an apparatus, a method, an integrated circuit, a computer program, or a recording medium such as a computer-readable recording disk, or as any combination of systems, apparatuses, methods, integrated circuits, computer programs, and recording media. Computer-readable recording media include, for example, non-volatile recording media such as a CD-ROM (Compact Disc-Read Only Memory). Additional benefits and advantages of an aspect of the present disclosure will become apparent from the specification and drawings. These benefits and/or advantages may be provided individually by the various aspects and features disclosed in the specification and drawings, and not all of them are required to obtain one or more of them.

According to the information processing method and the like of the present disclosure, the training efficiency of each individual model trained by machine learning can be improved.

FIG. 1 is a block diagram illustrating an example of the configuration of an information processing system according to an embodiment.
FIG. 2 is a flowchart illustrating an example of the flow of an information processing method according to the embodiment.
FIG. 3 is a diagram illustrating a captured image acquired by the image acquisition unit.
FIG. 4 is a diagram illustrating a captured image for which the synthesis position determination unit has determined an object synthesis position on the image.
FIG. 5 is a diagram illustrating a synthesized image generated by the synthesized image generation unit by synthesizing an object at the object synthesis position.
FIG. 6 is a block diagram illustrating an example of the configuration of an information processing system according to Modification 1.
FIG. 7 is a flowchart illustrating an example of the flow of an information processing method according to Modification 1.
FIG. 8 is a block diagram illustrating an example of the configuration of an information processing system according to Modification 2.
FIG. 9 is a flowchart illustrating an example of the flow of an information processing method according to Modification 2.
FIG. 10 is a block diagram illustrating an example of the configuration of an information processing system according to Modification 3.
FIG. 11 is a flowchart illustrating an example of the flow of an information processing method according to Modification 3.

An outline of one aspect of the present disclosure is as follows.

An information processing method according to one aspect of the present disclosure includes: acquiring sensing data; determining a synthesis portion of the sensing data in which recognition target data is to be synthesized; generating synthesized data by synthesizing, in the synthesis portion, recognition target data having features identical or similar to features of the sensing data that are perceived by human sensory organs; inputting the synthesized data to a model trained by machine learning to recognize a recognition target, and acquiring recognition result data; making a second determination, which determines whether to make a first determination of determining training data for the model based on the synthesized data, using at least the recognition result data and correct answer data that includes the synthesis portion; and making the first determination when the second determination determines that the first determination is to be made. Here, the features of the sensing data may be statistical features of elements of the sensing data. Alternatively, the features of the sensing data may be qualitative features of the sensing data.

According to this aspect, when the sensing data is an image, for example, recognition target data (for example, object data) is synthesized in a desired synthesis portion of the image, so there is no need to newly attach to the synthesized data (here, a synthesized image) an annotation indicating the coordinates of the synthesis portion (for example, the object synthesis position), the type of object, and so on. This shortens the time required for the series of information processing that creates correct answer data. In addition, because the object data is synthesized at a desired position and size on the image, information such as the coordinates of the object synthesis position can be used as the correct answer data for the case where the synthesized image is input to the learning model (hereinafter also called the recognition model). The output data obtained by inputting the synthesized image, in which the object is synthesized at the object synthesis position, to the learning model can therefore be compared with the correct answer data to identify synthesized images that the learning model recognizes with low accuracy. Based on such a synthesized image, that image or an image similar to it can be used as training data for the learning model, which improves the training efficiency of each individual learning model. In other words, machine learning converges less readily when data that is not useful for machine learning is mixed in, but according to this aspect, data useful for machine learning is identified and used as training data, so machine learning converges more readily. For example, the recognition accuracy of the learning model can be improved in a shorter time than when all generated synthesized images are used as training data for the learning model.

Also according to this aspect, an object having features (that is, visual features) identical or similar to the features of the image perceived by human sensory organs (for example, visual features) is synthesized at the object synthesis position. When the image is one actually captured by a camera or the like (hereinafter, a captured image), a natural synthesized image close to a captured image can therefore be obtained. A learning model trained by machine learning with such synthesized images as training data can achieve recognition accuracy closer to that obtained when captured images are used as training data.

Note that an object having visual features identical or similar to those of the image may be, for example, an object that is identical or similar to the image in statistical features of image elements (for example, image parameters), such as color tone or edges, or an object that is identical or similar to the image in qualitative features, such as weather conditions like rain or snow, the state of the road surface resulting from the weather conditions, and occlusion. An object with such features blends easily into the image, so the synthesized image generated by synthesizing the object at the object synthesis position is a natural image.
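The comparison between the model's output and the correct answer data described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method claimed in the disclosure: the `Box` and `Detection` types, the function names, the use of intersection over union (IoU) as the matching criterion, and the 0.5 threshold are all hypothetical choices introduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle: top-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float


@dataclass(frozen=True)
class Detection:
    """An object label together with its position on the image."""
    label: str
    box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes (0.0 when they do not overlap)."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def second_determination(
    correct: Detection,
    results: List[Detection],
    iou_threshold: float = 0.5,  # assumed value, not taken from the source
) -> bool:
    """Return True when the first determination should be made.

    `correct` is the correct answer data known from the synthesis step
    (object type and object synthesis position). The model is treated as
    having recognized the object if any result has the same label and
    overlaps the synthesis position by at least `iou_threshold`.
    """
    for result in results:
        if result.label == correct.label and iou(correct.box, result.box) >= iou_threshold:
            return False  # recognized well enough; no new training data needed
    return True  # missed or misrecognized; proceed to the first determination
```

Because the correct answer data comes directly from the synthesis step, no manual annotation is needed to run this check on each synthesized image.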

For example, in the information processing method according to one aspect of the present disclosure, the synthesized data may be determined as training data for the model in the first determination.

According to this aspect, a synthesized image that the learning model is determined to recognize with low accuracy can be used as training data. This suppresses the accumulation, as training data, of data that the learning model already recognizes with high accuracy, that is, data unnecessary as training data, and thus reduces the cost of accumulating data. In other words, images of scenes that the learning model recognizes with low accuracy can be accumulated preferentially as training data, which enables efficient learning for such scenes and further improves the recognition accuracy of the learning model.

For example, in the information processing method according to one aspect of the present disclosure, corresponding data having features identical or similar to the features of the synthesized data may be determined as training data for the model in the first determination. Here, the features of the synthesized data may be statistical features of elements of the synthesized data. Alternatively, the features of the synthesized data may be qualitative features of the synthesized data.

According to this aspect, a corresponding image having visual features identical or similar to those of the synthesized image is determined as training data, so images of scenes that the learning model recognizes with low accuracy, and images of scenes similar to them, can be used as training data. The number and variety of training data for scenes with low recognition accuracy can therefore be increased efficiently. When the corresponding image is a captured image, the learning effect is greater than when the synthesized image is used as training data. When the visual features are statistical features of elements of the synthesized image (for example, image parameters), the number and variety of training data can be increased efficiently from a statistical standpoint. When the visual features are qualitative features of the synthesized image, the number and variety of training data having features that are difficult to quantify can be increased efficiently.
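As one way to picture the selection of corresponding images by statistical features, the sketch below compares coarse color histograms. It is only an illustration under stated assumptions: the source does not specify a similarity measure, so the histogram intersection, the representation of an image as a flat sequence of RGB pixels, the function names, and the 0.8 threshold are all hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def color_histogram(pixels: Sequence[Pixel], bins: int = 4) -> List[float]:
    """Per-channel histogram, normalized so each channel sums to 1."""
    hist = [0.0] * (bins * 3)
    step = 256 // bins
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            index = min(value // step, bins - 1)
            hist[channel * bins + index] += 1.0
    total = float(len(pixels)) or 1.0  # avoid division by zero for empty input
    return [count / total for count in hist]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1]; 1.0 means identical histograms (3 channels)."""
    return sum(min(a, b) for a, b in zip(h1, h2)) / 3.0


def select_corresponding(
    synthesized: Sequence[Pixel],
    candidates: Sequence[Sequence[Pixel]],
    threshold: float = 0.8,  # assumed value, not taken from the source
) -> List[int]:
    """Indices of candidate images whose color statistics resemble the
    synthesized image that the model recognized poorly."""
    target = color_histogram(synthesized)
    return [
        index
        for index, candidate in enumerate(candidates)
        if histogram_intersection(target, color_histogram(candidate)) >= threshold
    ]
```

Qualitative features such as weather or occlusion would need a different criterion, for example scene labels attached to the captured images; that case is not covered by this sketch.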

For example, in the information processing method according to one aspect of the present disclosure, the sensing data may be an image; the recognition target may be an object; the synthesis portion may be an object synthesis position at which object data is synthesized on the image; the synthesized data may be a synthesized image generated by synthesizing, at the object synthesis position, object data having visual features identical or similar to those of the image; the recognition result data may be object recognition result data acquired by inputting the synthesized image to the model; the first determination may be determining training data for the model based on the synthesized image; and the second determination may be made using at least the object recognition result data and correct answer data that includes the object synthesis position. For example, the first determination may be determining, as training data for the model, a corresponding image having visual features identical or similar to those of the synthesized image, where the visual features of the synthesized image are a mode of the object in the synthesized image and the visual features of the corresponding image are a mode of a corresponding object having attributes identical or similar to those of the object. In this case, the mode may be the position of the object on the synthesized image. Alternatively, the mode may be the posture of the object.

According to this aspect, the visual features of the synthesized image are the mode of the object in the synthesized image, so training data is determined based on synthesized images in which the learning model is determined to recognize the object with low accuracy because of differences in the mode of the object, such as its position on the synthesized image or its posture. Images of scenes that the learning model recognizes with low accuracy, and images of scenes similar to them, can thus be used as training data, so the number and variety of training data for scenes with low recognition accuracy can be increased efficiently. A recognition model built with such training data recognizes objects in images with improved accuracy.

For example, in the information processing method according to one aspect of the present disclosure, the synthesis portion may further include the size of the object data to be synthesized on the image.

According to this aspect, synthesized data that looks less unnatural for the image can be obtained.

For example, in the information processing method according to one aspect of the present disclosure, data having features identical or similar to the features of the synthesized data may be selected or generated, as the corresponding data, from sensing data different from the synthesized data.

According to this aspect, captured images can be used as training data, which yields a greater learning effect than using synthesized images as training data. Note that selecting a captured image may mean determining, each time an image is acquired, whether to record it based on a predetermined condition; sampling images from the acquired images based on a predetermined condition; or searching captured images stored in a memory, a database, or the like and extracting those that satisfy a predetermined condition. Also according to this aspect, the corresponding image can be generated from a captured image. Specifically, an image of a scene that the recognition model recognizes with low accuracy, and images similar to that scene, can be generated from captured images. A corresponding image can thus be generated even when a captured image cannot be used as the corresponding image as is, so the number and variety of training data can easily be increased.

For example, in the information processing method according to one aspect of the present disclosure, the recognition target data may be synthesized in the synthesis portion using a GAN (Generative Adversarial Network) model.

According to this aspect, a more natural synthesized image close to a captured image can be obtained while a desired object is synthesized at a desired position. Using such synthesized images as training data can increase the accuracy with which the learning model recognizes objects.

For example, the information processing method according to one aspect of the present disclosure may further include notifying a user of the learning model when the second determination determines that the first determination is to be made. Here, the notification may be a notification regarding a request to train the model using the determined training data. Alternatively, the information processing method according to one aspect of the present disclosure may further include training the model using the determined training data, and the notification may be a notification regarding completion of the training.

According to this aspect, the user of the learning model is notified when training data for the learning model is determined based on the synthesized image, so the user can know that there are scenes in which the learning model has difficulty recognizing objects. When the notification concerns a request to train the learning model, the user can decide when to train the learning model. When the notification concerns completion of training, the user can know that the learning model has been updated by the training.

An information processing system according to one aspect of the present disclosure includes: a first acquisition unit that acquires sensing data; a first determination unit that determines a synthesis portion of the sensing data in which recognition target data is to be synthesized; a generation unit that generates synthesized data by synthesizing, in the synthesis portion, recognition target data having features identical or similar to features of the sensing data that are perceived by human sensory organs; a second acquisition unit that inputs the synthesized data to a model trained by machine learning to recognize a recognition target, and acquires recognition result data; and a second determination unit that makes a second determination, which determines whether to make a first determination of determining training data for the model based on the synthesized data, using at least the recognition result data and correct answer data that includes the synthesis portion, and that makes the first determination when the second determination determines that the first determination is to be made.

According to this aspect, when the sensing data is an image, for example, recognition target data (for example, object data) is synthesized in a desired synthesis portion of the image, so there is no need to newly attach to the synthesized data (here, a synthesized image) an annotation indicating the coordinates of the synthesis portion (for example, the object synthesis position), the type of object, and so on. This shortens the time required for the series of information processing that creates correct answer data. In addition, because the object data is synthesized at a desired position and size on the image, information such as the coordinates of the object synthesis position can be used as the correct answer data for the case where the synthesized image is input to the learning model (hereinafter also called the recognition model). The output data obtained by inputting the synthesized image, in which the object is synthesized at the object synthesis position, to the learning model can therefore be compared with the correct answer data to identify synthesized images that the learning model recognizes with low accuracy. Based on such a synthesized image, that image or an image similar to it can be used as training data for the learning model, which improves the training efficiency of each individual learning model. In other words, machine learning converges less readily when data that is not useful for machine learning is mixed in, but according to the present embodiment, data useful for machine learning is identified and used as training data, so machine learning converges more readily. For example, the recognition accuracy of the learning model can be improved more efficiently and in a shorter time than when all generated synthesized images are used as training data for the learning model. Also according to this aspect, an object having visual features identical or similar to those of the image is synthesized at the object synthesis position, so when the image is a captured image actually taken by a camera or the like, a natural synthesized image close to a captured image can be obtained. A learning model trained by machine learning with such synthesized images as training data can achieve recognition accuracy closer to that obtained when captured images are used as training data.

Note that these general or specific aspects may be implemented as a system, an apparatus, a method, an integrated circuit, a computer program, or a recording medium such as a computer-readable recording disk, or as any combination of systems, apparatuses, methods, integrated circuits, computer programs, and recording media. Computer-readable recording media include, for example, non-volatile recording media such as a CD-ROM.

Hereinafter, an information processing method and an information processing system according to an embodiment of the present disclosure will be described in detail with reference to the drawings. Each embodiment described below shows a general or specific example. The numerical values, shapes, components, arrangement and connection of components, steps (processes), order of steps, and the like shown in the following embodiments are examples and are not intended to limit the present disclosure. Among the components in the following embodiments, those not recited in the independent claims representing the broadest concept are described as optional components. In the following description, expressions using "substantially", such as substantially parallel and substantially orthogonal, may be used. For example, substantially parallel means not only exactly parallel but also essentially parallel, that is, including a difference of about several percent, for example. The same applies to other expressions using "substantially". Also in the following description, two elements being similar means, for example, that half or more of their parts or their main parts are the same, or that the two elements share a common property. Each drawing is a schematic diagram and is not necessarily a precise illustration. In the drawings, substantially identical components are given the same reference signs, and redundant description may be omitted or simplified.

(Embodiment)
[Configuration and Operation of Information Processing System According to Embodiment]
The configuration and operation of the information processing system according to the embodiment will be described with reference to FIGS. 1 and 2. FIG. 1 is a block diagram illustrating an example of the configuration of an information processing system 100 according to the present embodiment. FIG. 2 is a flowchart illustrating the flow of the information processing method according to the present embodiment.

As shown in FIG. 1, the information processing system 100 includes an image acquisition unit 10, a synthesis position determination unit 20 that determines an object synthesis position on an image, a composite image generation unit 30, an output data acquisition unit 40 that acquires output data obtained by inputting a composite image to a learning model (hereinafter, recognition model), and a determination unit 50 that determines training data for the recognition model.

The image acquisition unit 10 is an example of a first acquisition unit that acquires sensing data. For example, the sensing data is an image. The synthesis position determination unit 20 is an example of a first determination unit that determines a synthesis portion (here, an object synthesis position) on the sensing data at which recognition target data is to be synthesized. For example, the recognition target is an object. The composite image generation unit 30 is an example of a generation unit that generates synthetic data by synthesizing, in the synthesis portion, recognition target data having features identical or similar to the features of the sensing data that are perceived by human sensory organs. For example, the synthesis portion is an object synthesis position on the image at which object data is synthesized, and the synthetic data is a composite image. The output data acquisition unit 40 is an example of a second acquisition unit that acquires recognition result data by inputting the synthetic data to a model trained using machine learning to recognize the recognition target (hereinafter also referred to as a recognition model or a learning model). For example, the recognition result data is object recognition result data obtained by inputting the composite image to the recognition model.

As shown in FIGS. 1 and 2, the image acquisition unit 10 acquires a captured image (S10 in FIG. 2). Acquiring may mean, for example, obtaining an image captured by an imaging device (hereinafter also referred to as a captured image), or obtaining a captured image by capturing it. In the former case, the image acquisition unit 10 is, for example, a receiving unit that receives, through communication, an image captured by the imaging device. In the latter case, the image acquisition unit 10 is, for example, an imaging unit such as a camera, and captures the image.

The synthesis position determination unit 20 determines an object synthesis position, which is the position on the image acquired by the image acquisition unit 10 at which object data is to be synthesized (S20 in FIG. 2). Objects synthesized into the image include, for example, movable objects such as people, animals, and vehicles, and immovable objects such as plants and road accessories. The position on the image at which the object data is synthesized is determined arbitrarily. The object synthesis position may further include the size of the object data to be synthesized on the image.
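The arbitrary choice of a position and size in S20 can be sketched as follows. This is only an illustration: the function name `choose_synthesis_box`, the size limits, and the use of a uniform random draw are assumptions, since the description only states that the position is determined arbitrarily.

```python
import random


def choose_synthesis_box(image_width, image_height, rng, min_size=16, max_size=128):
    """Pick an arbitrary box (x, y, width, height) that lies fully inside the image."""
    # Clamp the size range so the box can never exceed the image (assumed policy).
    max_w = min(max_size, image_width)
    max_h = min(max_size, image_height)
    box_w = rng.randint(min(min_size, max_w), max_w)
    box_h = rng.randint(min(min_size, max_h), max_h)
    # Choose the top-left corner so that the whole box stays in bounds.
    x = rng.randint(0, image_width - box_w)
    y = rng.randint(0, image_height - box_h)
    return (x, y, box_w, box_h)


rng = random.Random(0)  # fixed seed only to make the sketch reproducible
box_a = choose_synthesis_box(1280, 720, rng)
box_b = choose_synthesis_box(1280, 720, rng)
```

Because the box is chosen by the system itself, its coordinates are known without any manual labeling, which is what later allows them to serve as correct answer data.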

The composite image generation unit 30 generates a composite image by synthesizing, at the object synthesis position, object data having visual features identical or similar to the visual features of the captured image (S30 in FIG. 2). The visual features of the captured image are statistical features of elements of the captured image (hereinafter also referred to as image parameters). Examples of the statistical features of the image parameters include the color tone, brightness, and edges of the image. The visual features of the captured image are also qualitative features of the captured image. The qualitative features of an image are features that are difficult to quantify, such as weather conditions like rain or snow, the state of the road surface resulting from the weather conditions (for example, a road surface wet with rain), and occlusion.
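As one illustration of a statistical feature, the brightness of an object patch can be compared with that of the background. The sketch below is an assumption made for explanation only: the description does not specify how brightness is measured or how close two values must be, so the Rec. 601 luma weights, the helper names, and the tolerance of 20 are all hypothetical choices.

```python
def mean_luma(pixels):
    """Average Rec. 601 luma of an iterable of (r, g, b) tuples in the range 0..255."""
    total = 0.0
    count = 0
    for r, g, b in pixels:
        total += 0.299 * r + 0.587 * g + 0.114 * b
        count += 1
    if count == 0:
        raise ValueError("no pixels")
    return total / count


def brightness_is_similar(object_pixels, background_pixels, tolerance=20.0):
    """True when the mean brightness values differ by at most `tolerance` (assumed)."""
    return abs(mean_luma(object_pixels) - mean_luma(background_pixels)) <= tolerance
```

Qualitative features such as rain or occlusion are, as stated above, hard to quantify, so a check of this kind would cover only the statistical side.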

The composite image generation unit 30 synthesizes an object at the object synthesis position using, for example, a GAN (Generative Adversarial Network) model. In the resulting composite image, the color tone and brightness of the synthesized object and of the background, that is, the statistical features of the image parameters of the object synthesized on the captured image and of the captured image itself, are identical or similar. Therefore, a more natural composite image close to a captured image can be obtained while a desired object is synthesized at a desired position. A method of generating a composite image using a GAN will be described later. The GAN is only an example, and the method of generating the composite image is not particularly limited; any method that yields a natural composite image close to a captured image may be used.

The output data acquisition unit 40 acquires object recognition result data (that is, output data of the recognition model) obtained by inputting the composite image generated by the composite image generation unit 30 to the recognition model (S40 in FIG. 2).

The determination unit 50 uses correct answer data and the output data to make a second determination, which is a determination of whether to make a first determination. More specifically, the first determination is a determination of training data for the recognition model based on the composite image, and the determination unit 50 makes the second determination using the output data and correct answer data that includes at least the object synthesis position (S50 in FIG. 2). The correct answer data includes, for example, coordinates indicating the region of the object synthesis position, the type of the object, and the posture of the object. Based on the magnitude of the difference between the correct answer data and the output data, the determination unit 50 evaluates the accuracy with which the object synthesized at the object synthesis position is recognized (hereinafter, object recognition accuracy). When it is determined in the second determination that the first determination is to be made, the determination unit 50 makes the first determination (S60 in FIG. 2). More specifically, when the object recognition accuracy of the recognition model is lower than a predetermined threshold, the determination unit 50 determines, as training data for the recognition model, the composite image input to the recognition model and images identical or similar to that composite image.
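The second determination can be sketched as a comparison between the correct answer data and the model output. The description does not say how the difference is measured, so this sketch assumes intersection over union (IoU) of bounding boxes plus a label match as the accuracy score, and a threshold of 0.5; the function names and the dictionary layout are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recognition_accuracy(truth, output):
    """Score in [0, 1]; 0 when the object is missed or given the wrong type."""
    if output is None or output["label"] != truth["label"]:
        return 0.0
    return iou(truth["box"], output["box"])


def second_determination(truth, output, threshold=0.5):
    """Return True when the first determination (choosing training data) should run."""
    return recognition_accuracy(truth, output) < threshold


truth = {"box": (100, 100, 200, 300), "label": "pedestrian"}
detected = {"box": (105, 100, 200, 300), "label": "pedestrian"}
print(second_determination(truth, detected))  # well recognized, so not selected
print(second_determination(truth, None))      # missed, so selected for training
```

With this scoring, a composite image whose synthesized object is missed or badly localized is the kind of image that would be selected as training data.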

As described above, the information processing system 100 trains the recognition model using the training data, thereby constructing a recognition model with improved object recognition accuracy. In the present embodiment, the recognition model is a machine learning model using a neural network, such as deep learning, but may be another learning model. For example, the other learning model may be a machine learning model using a Support Vector Machine, Boosting, Random Forest, Genetic Programming, or the like.

Hereinafter, an example of the procedure from acquisition of a captured image to generation of a composite image in the information processing system 100 according to the present embodiment will be described with reference to FIGS. 3 to 5. FIG. 3 shows a captured image acquired by the image acquisition unit 10. FIG. 4 shows the captured image on which object synthesis positions have been determined by the synthesis position determination unit 20. FIG. 5 shows a composite image generated by the composite image generation unit 30 by synthesizing objects at the object synthesis positions. Here, an example in which objects are synthesized at the object synthesis positions using a GAN model will be described.

The image acquisition unit 10 acquires the captured image shown in FIG. 3. This captured image is, for example, an image captured by a vehicle-mounted camera.

Next, as shown in FIG. 4, the synthesis position determination unit 20 determines an object synthesis position A and an object synthesis position B at which objects are to be synthesized on the captured image. The object synthesis positions A and B are each arbitrarily determined positions. In the GAN model, noise is generated at the object synthesis position A and the object synthesis position B on the captured image.

Next, as shown in FIG. 5, the composite image generation unit 30 synthesizes images having different generators at the object synthesis position A and the object synthesis position B, at which the noise has been generated. Examples of images having different generators include images of persons who differ in gender, age, clothing, posture, and the like, images of a person riding a vehicle such as a bicycle, and images of a person standing beside a passenger car or the like. Based on these generators, a predetermined object is synthesized at a predetermined object synthesis position. For example, as shown in FIG. 5, a pedestrian A1 is synthesized at the object synthesis position A, and a person B1 riding a bicycle is synthesized at the object synthesis position B.

In the GAN model, it is determined whether the object synthesized at the object synthesis position can be recognized as a person, and whether that object blends into the background. For example, even if the synthesized object is determined to be recognizable as a person, when it is determined that the object does not blend into the background, the generator is adjusted and the object is synthesized at the object synthesis position again. As a result, the composite image as a whole is a natural image close to the original captured image.
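The check-and-retry behavior described above can be sketched as plain control flow. This is not a GAN implementation: the generator, the two checks, and the adjustment step are passed in as hypothetical callables, and the attempt limit is an assumption added so that the loop always terminates.

```python
def synthesize_until_accepted(generate, is_person, blends_in, adjust, params, max_attempts=5):
    """Regenerate the object until both checks pass or the attempt budget runs out."""
    for attempt in range(1, max_attempts + 1):
        candidate = generate(params)
        if is_person(candidate) and blends_in(candidate):
            return candidate, attempt
        # At least one check failed: adjust the generator settings and retry.
        params = adjust(params)
    return None, max_attempts


# Toy stand-ins: the "object" is just a brightness offset from the background.
result = synthesize_until_accepted(
    generate=lambda offset: offset,
    is_person=lambda candidate: True,
    blends_in=lambda candidate: abs(candidate) <= 1,
    adjust=lambda offset: offset - 1,
    params=3,
)
print(result)  # (1, 3): accepted on the third attempt
```

In an actual GAN these two judgments would typically be made by discriminator networks during training; the loop only mirrors the order of decisions given in the text.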

As described above, in the present embodiment, an object is synthesized at a desired object synthesis position on the captured image, so there is no need to add to the composite image an annotation indicating the coordinates of the object synthesis position, the type of the object, and the like. Therefore, the time required for the series of information processing for creating the correct answer data can be shortened. Also, because the object is synthesized at a desired object synthesis position on the captured image, information such as the coordinates of the object synthesis position can be used as correct answer data when the composite image is input to the learning model (also referred to as the recognition model). Therefore, by comparing the correct answer data with the output data obtained by inputting the composite image, in which the object is synthesized at the object synthesis position, to the learning model, composite images for which the recognition accuracy of the learning model is low can be identified. Such a composite image, or an image similar to it, can then be used as training data for the learning model. Therefore, the training efficiency of each learning model can be improved compared with the case where all generated composite images are used as training data for the learning model. In other words, when data that is not useful for machine learning is mixed in, the machine learning process is less likely to converge; according to the present embodiment, data that is useful for machine learning is identified and used as training data, so the machine learning process converges more easily. The training efficiency of each learning model thereby improves. For example, the recognition accuracy of the learning model can be improved in a shorter time.

Further, in the present embodiment, an object having features (that is, visual features) identical or similar to the features of the image that are perceived by human sensory organs (here, visual features) is synthesized at the object synthesis position. Therefore, when the image is a captured image actually taken with a camera or the like, a natural composite image close to a captured image can be obtained. Consequently, a learning model trained using the composite image as training data can achieve recognition accuracy closer to that obtained when captured images are used as training data.

As described above, the information processing system 100 includes: the image acquisition unit 10 that acquires an image; the synthesis position determination unit 20 that determines an object synthesis position on the image; the composite image generation unit 30 that generates a composite image by synthesizing, at the object synthesis position, an object having visual features identical or similar to the visual features of the image; the output data acquisition unit 40 that acquires output data of a learning model obtained by inputting the composite image to the learning model; and the determination unit 50 that makes a second determination, which is a determination of whether to make a first determination of determining training data for the learning model based on the composite image, using the output data and correct answer data including at least the object synthesis position, and that makes the first determination when it is determined in the second determination that the first determination is to be made. Here, the visual features of the image are statistical features of image parameters of the image. The visual features of the image are also qualitative features of the image.

The information processing method according to the present embodiment includes: acquiring an image (S10); determining an object synthesis position on the image (S20); generating a composite image by synthesizing, at the object synthesis position, an object having visual features identical or similar to the visual features of the image (S30); acquiring output data of a learning model obtained by inputting the composite image to the learning model (S40); making a second determination, which is a determination of whether to make a first determination of determining training data for the learning model based on the composite image, using the output data and correct answer data including at least the object synthesis position (S50); and making the first determination when it is determined in the second determination that the first determination is to be made (S60).
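The sequence S10 to S60 can be sketched as a single function that wires the steps together. Every step is supplied as a hypothetical callable, and the function name `run_pipeline` and the dictionary used for the correct answer data are assumptions for illustration. For brevity, S60 here selects only the composite image itself, whereas the description also allows images similar to it.

```python
def run_pipeline(acquire, choose_position, synthesize, recognize, needs_training):
    """Run steps S10 to S60 once and return the images chosen as training data."""
    image = acquire()                          # S10: acquire an image
    position = choose_position(image)          # S20: determine the object synthesis position
    composite = synthesize(image, position)    # S30: generate the composite image
    output = recognize(composite)              # S40: acquire the model output
    truth = {"position": position}             # correct answer data, known from S20
    if needs_training(truth, output):          # S50: second determination
        return [composite]                     # S60: first determination
    return []


# Toy stand-ins in which the recognizer misses the synthesized object.
selected = run_pipeline(
    acquire=lambda: "frame",
    choose_position=lambda image: (10, 20),
    synthesize=lambda image, position: (image, position),
    recognize=lambda composite: {"position": None},
    needs_training=lambda truth, output: truth["position"] != output["position"],
)
print(selected)  # [('frame', (10, 20))]
```

The correct answer data is built directly from the position chosen in S20, which is the reason no separate annotation step appears in the flow.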

Accordingly, because the object is synthesized at a desired object synthesis position on the image, information such as the coordinates of the object synthesis position can be used as correct answer data for the learning model. Therefore, by comparing the correct answer data with the output data obtained by inputting the composite image, in which the object is synthesized at the object synthesis position, to the learning model, composite images for which the recognition accuracy of the learning model is low can be identified. Such a composite image, or an image similar to it, can then be used as training data for the learning model. Therefore, the training efficiency of each learning model can be improved. In other words, when data that is not useful for machine learning is mixed in, the machine learning process is less likely to converge; according to the present embodiment, data that is useful for machine learning is identified and used as training data, so the machine learning process converges more easily. The training efficiency of each learning model thereby improves. For example, the recognition accuracy of the learning model can be improved more efficiently and in a shorter time than when all composite images are used as training data for the learning model. Further, in the present embodiment, an object having visual features identical or similar to the visual features of the image is synthesized at the object synthesis position. Therefore, when the image is an image actually captured by a camera or the like, a natural composite image close to a captured image can be obtained. Consequently, a learning model trained using the composite image as training data can achieve recognition accuracy closer to that obtained when captured images are used as training data.

Note that an object having visual features identical or similar to those of the image may be, for example, an object whose color tone, edges, or the like, which are statistical features of the image parameters of the image, are identical or similar to those of the image, or an object for which qualitative features of the image, such as weather conditions like rain or snow, the state of the road surface due to the weather conditions, and occlusion, are identical or similar. Because an object having such features blends easily into the image, the composite image generated by synthesizing the object at the object synthesis position is a natural image.

(Modification 1)
[Configuration of Information Processing System According to Modification 1]
An information processing system according to Modification 1 of the embodiment will be described with reference to FIG. 6. FIG. 6 is a block diagram illustrating an example of the configuration of the information processing system 100 according to Modification 1.

In the information processing system 100 according to the embodiment, the image acquisition unit 10 was described as being either a receiving unit that acquires an image or an imaging unit that captures an image. In Modification 1, an example in which the image acquisition unit 10 is a receiving unit that receives a captured image will be described.

The information processing system 100 according to Modification 1 includes a recognition processing unit 200, which includes an imaging unit 210 and a recognition unit 220, and a recognition model updating unit 300.

The information processing system 100 according to Modification 1 generates a composite image by synthesizing, onto an image captured by the imaging unit 210 (hereinafter also referred to as a captured image), an object having visual features identical or similar to the visual features of that image, adds an annotation to the object synthesized in the composite image, and determines training data for constructing a recognition model. Further, the information processing system 100 uses the training data determined based on the composite image to construct a recognition model for detecting objects from images. A learning model described later is applied to the construction of the recognition model. The training data is data that the recognition model uses for learning (in other words, data used to train the recognition model by machine learning). The training data includes the composite image and information such as the type and motion of each object in the composite image and the position and region of the object.

In the recognition processing unit 200, the recognition model receiving unit 3 of the recognition unit 220 receives the recognition model that the recognition model updating unit 300 has constructed through training. The recognition model received by the recognition model receiving unit 3 is input to the recognition model updating unit 4, where it is updated. When the recognition model has been updated, the update information presenting unit 5 presents a notification that the recognition model has been updated. The notification may be presented as sound or displayed as an image on a screen. The recognition unit 220 also recognizes objects included in the image captured by the imaging unit 210. The recognition processing unit 200 may output the object recognition result as sound or as an image to inform the user.

In the following description, it is assumed that the recognition processing unit 200 is mounted on a moving body, specifically an automobile or the like, and that the recognition model updating unit 300 is mounted on a server located away from the automobile. The recognition processing unit 200 of the automobile and the server are connected via wireless communication and communicate with each other wirelessly via a communication network such as the Internet. The recognition processing unit 200 and the recognition model updating unit 300 transmit and receive information via wireless communication. A wireless LAN (Local Area Network) such as Wi-Fi (registered trademark) (Wireless Fidelity) may be used for the wireless communication, or other wireless communication may be used. The server may be an information processing device such as a computer. The server may include one or more servers and may constitute a cloud system.

The information processing system 100 may include a wireless communication device such as a communication circuit, or may use a wireless communication device included in the server. The recognition processing unit 200 may include a wireless communication device such as a communication circuit, or may use a wireless communication device included in the automobile. The recognition processing unit 200 and the recognition model updating unit 300 may be connected via wired communication instead of wireless communication, or may exchange information with each other via a recording medium such as a nonvolatile memory.

If the computer mounted on the automobile is capable of the processing, the recognition model updating unit 300 may also be mounted on the automobile. In this case, the recognition model updating unit 300 and the recognition processing unit 200 may be integrated. The recognition model updating unit 300 may then exchange information with the outside of the automobile via wireless communication, wired communication, or a recording medium.

Next, the detailed configurations of the recognition processing unit 200 and the recognition model updating unit 300 of the information processing system 100 according to Modification 1 will be described with reference to FIG. 6. In the following, an example in which the recognition model updating unit 300 uses a GAN model to generate the composite image will be described.

In the information processing system 100 according to Modification 1, the recognition processing unit 200 includes the imaging unit 210 and the recognition unit 220.

The imaging unit 210 is, for example, a camera, and includes an image capturing unit 1 and an image transmitting unit 2. An image captured by the imaging unit 210 is transmitted via the image transmitting unit 2 to the image acquisition unit 110 of the recognition model updating unit 300.

The recognition unit 220 recognizes, for example, an object such as a person included in an image captured by the imaging unit 210. The recognition unit 220 includes a recognition model receiving unit 3, a recognition model updating unit 4, and an update information presenting unit 5. The recognition model receiving unit 3 receives the recognition model updated by the recognition model updating unit 300 and outputs it to the recognition model updating unit 4. The recognition model updating unit 4 updates the recognition model by storing the recognition model output from the recognition model receiving unit 3. The update information presenting unit 5 may include a display and/or a speaker, and notifies the user of the recognition model when it is determined in the second determination that the first determination is to be made. For example, when a predetermined amount of training data has been stored in the training data holding unit 160, the update information presenting unit 5 issues a notification regarding a request to train the recognition model using the determined training data. Also, for example, when the training unit 170 has trained the recognition model using the determined training data, the update information presenting unit 5 issues a notification regarding completion of the training. The update information presenting unit 5 may also inform the user that the recognition model held in the recognition unit 220 has been updated to the trained recognition model. Furthermore, the update information presenting unit 5 may present to the user update information such as the differences between the updated recognition model and the recognition model before the update, and the effects obtained by the update. The display may be a display panel such as a liquid crystal panel or an organic or inorganic EL (electroluminescence) panel.

The recognition model update unit 300 includes an image acquisition unit 110, a sampling unit 112, a synthesis position setting unit 120, an image synthesis unit 130, a detection processing unit 140, a data use determination unit 150, a training unit 170, a recognition model transmission unit 180, a training data holding unit 160, and a recognition model holding unit 142.

The image acquisition unit 110 acquires the image transmitted from the imaging unit 210. The image acquisition unit 110 outputs the acquired image to the sampling unit 112.

The sampling unit 112 receives the images output from the image acquisition unit 110, samples images from the received images, for example periodically, and outputs the sampled images to the synthesis position setting unit 120.
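As an illustration of the periodic sampling described above, the following is a minimal sketch. It assumes the images arrive as an iterable of frames and that "periodically" means every N-th frame; the function name `sample_periodically` and the parameter `period` are hypothetical and do not appear in the specification.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def sample_periodically(frames: Iterable[T], period: int) -> Iterator[T]:
    """Yield every `period`-th frame, starting with the first frame.

    Assumption: a fixed frame interval stands in for the periodic
    sampling; a time-based interval would work equally well.
    """
    if period < 1:
        raise ValueError("period must be a positive integer")
    for index, frame in enumerate(frames):
        if index % period == 0:
            yield frame
```

Each frame yielded here would then be handed to the synthesis position setting unit 120.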

The synthesis position setting unit 120 is an example of the synthesis position determination unit 20 (see FIG. 1) in the embodiment, and arbitrarily sets an object synthesis position on the image sampled by the sampling unit 112.

The image synthesis unit 130 is an example of the composite image generation unit 30 (see FIG. 1) in the embodiment, and synthesizes an object at the object synthesis position set by the synthesis position setting unit 120. A GAN model is used as the method of synthesizing the object. Since the GAN model has been described in the embodiment, its description is omitted here.

The detection processing unit 140 is an example of the output data acquisition unit 40 (see FIG. 1) in the embodiment, and acquires the output data of the recognition model that is obtained by outputting the composite image generated by the image synthesis unit 130 to the recognition model holding unit 142. More specifically, the detection processing unit 140 acquires the output data of the recognition model obtained by inputting the composite image to the recognition model held in the recognition model holding unit 142. The detection processing unit 140 outputs the acquired output data to the data use determination unit 150.

The data use determination unit 150 is an example of the determination unit 50 (see FIG. 1) in the embodiment. Using the output data and correct answer data that includes at least the object synthesis position, it makes a second determination of whether to make a first determination, where the first determination is to determine training data for the recognition model based on the composite image. When the recognition accuracy of the recognition model, as judged from the difference between the correct answer data and the output data, is higher than a predetermined threshold, the data use determination unit 150 decides in the second determination not to make the first determination. More specifically, in this case, the data use determination unit 150 judges that the composite image input to the recognition model is an image that the recognition model recognizes, and decides in the second determination not to determine training data for the recognition model based on that composite image. In accordance with the second determination, the data use determination unit 150 does not determine training data for the recognition model based on that composite image.

On the other hand, when the recognition accuracy of the recognition model is lower than the predetermined threshold, the data use determination unit 150 decides in the second determination to make the first determination. More specifically, the data use determination unit 150 judges that the composite image input to the recognition model is an image in which the object is difficult for the recognition model to recognize, and decides in the second determination to determine training data for the recognition model based on that composite image. In accordance with the second determination, the data use determination unit 150 determines the composite image as training data. The data use determination unit 150 also determines, as training data, a corresponding image having visual features identical or similar to the visual features of the composite image. The corresponding image may be selected from the images stored in the training data holding unit 160, or may be generated. The data use determination unit 150 stores the images determined as training data in the training data holding unit 160 as new training data.
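The second determination described above can be sketched as follows. This is only an illustration under stated assumptions: the specification says that recognition accuracy is judged from the difference between the correct answer data and the output data but does not fix a metric, so intersection over union (IoU) between the object synthesis position and the detected bounding box is used here as a stand-in. The names `iou` and `should_make_first_determination` and the default threshold of 0.5 are hypothetical.

```python
from typing import Optional, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def should_make_first_determination(
    synthesis_position: Box,
    detected_box: Optional[Box],
    threshold: float = 0.5,
) -> bool:
    """Second determination: True means the composite image is used
    to determine training data (the first determination is made).

    A missed detection (None) is treated as accuracy 0. The source does
    not say what happens when accuracy equals the threshold; this sketch
    treats equality as "recognized".
    """
    accuracy = 0.0 if detected_box is None else iou(synthesis_position, detected_box)
    return accuracy < threshold
```

With this sketch, a composite image whose synthesized object is missed or poorly localized is kept, while a well-recognized one is discarded.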

Here, the visual features of the composite image are statistical features of the image parameters of the composite image. The visual features of the composite image are also qualitative features of the composite image. The statistical features of the image parameters and the qualitative features are the same as those described in the embodiment, so their description is omitted here.

The visual feature of the composite image is also the aspect of the object in the composite image, and the visual feature of the corresponding image is the aspect of a corresponding object having attributes identical or similar to the attributes of the object. For example, the aspect is the position of the object on the composite image. More specifically, the position of the object on the composite image is given by the coordinates of the region occupied by the object. Also, for example, the aspect is the posture of the object.

Note that an attribute of an object is a property of the object, such as its type, shape, color, or material. More specifically, when the type of the object is a person, the attributes of the object may also include gender, physique, age, skin color, clothing, belongings, posture, facial expression, and the like. When the type of the object is an automobile, the attributes of the object may also include the vehicle model, shape, body color, window glass color, and the like.

The training data holding unit 160 stores new training data, a pre-held DB (database) containing various images held in advance as training data, and the like. In addition to the above data, the training data holding unit 160 can store background information, object information, environmental information such as weather, and the like, and the stored information can be retrieved. The training data holding unit 160 is realized by a storage device such as a RAM (Random Access Memory), a semiconductor memory such as a flash memory, a hard disk drive, or an SSD (Solid State Drive).

For a composite image that the data use determination unit 150 has determined as training data for the learning model, the training data holding unit 160 stores information on the image synthesis position on the image in association with the composite image. For a corresponding image that the data use determination unit 150 has determined as training data based on the composite image, the training data holding unit 160 stores information on the position, on the corresponding image, of the corresponding object that corresponds to the object synthesized on the composite image, in association with the corresponding image.
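The association between an image and its position information can be pictured with a small in-memory sketch. The record layout, the class names `TrainingSample` and `TrainingDataStore`, and the use of string image identifiers are all assumptions made for illustration; the specification only requires that each stored image be linked to the position of the synthesized object or of the corresponding object.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


@dataclass(frozen=True)
class TrainingSample:
    image_id: str       # hypothetical identifier of the stored image
    object_box: Box     # synthesis position, or position of the corresponding object
    is_composite: bool  # True for a composite image, False for a corresponding image


@dataclass
class TrainingDataStore:
    """Holds new training data together with its position information."""

    new_samples: List[TrainingSample] = field(default_factory=list)

    def add(self, sample: TrainingSample) -> None:
        self.new_samples.append(sample)

    def positions_for(self, image_id: str) -> List[Box]:
        """Return every stored object position linked to the given image."""
        return [s.object_box for s in self.new_samples if s.image_id == image_id]
```

Keeping the position next to the image means the stored record can later serve directly as labeled training data.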

In response to receiving from the data use determination unit 150 a command requesting an image identical or similar to the composite image that the data use determination unit 150 has determined as training data, the training data holding unit 160 outputs the desired image to the data use determination unit 150 from the new training data and the pre-held DB stored in the training data holding unit 160. Likewise, in response to receiving from the training unit 170 a command requesting training data, the training data holding unit 160 outputs the intended images to the training unit 170 from the new training data and the pre-held DB stored in the training data holding unit 160.

The recognition model holding unit 142 stores a recognition model identical to the recognition model of the recognition unit 220. The recognition model holding unit 142 outputs to the detection processing unit 140 the output data obtained by inputting the composite image generated by the image synthesis unit 130 to the recognition model. The recognition model holding unit 142 acquires the recognition model trained by the training unit 170 and updates its recognition model by storing the acquired model.

The training unit 170 trains the recognition model using the training data determined by the data use determination unit 150. For example, when a predetermined amount of new training data has been stored in the training data holding unit 160, the training unit 170 reads the training data from the training data holding unit 160, inputs that training data to the recognition model stored in the training unit 170, and trains the recognition model. The training unit 170 outputs the recognition model trained using machine learning to the recognition model holding unit 142 and the recognition model transmission unit 180.

The recognition model transmission unit 180 transmits the recognition model trained by the training unit 170 to the recognition unit 220 of the recognition processing unit 200. Upon receiving the trained recognition model, the recognition model receiving unit 3 of the recognition unit 220 outputs the recognition model to the recognition model updating unit 4.

[Operation of Information Processing System According to Modification 1]
The operation of the information processing system 100 according to Modification 1 will be described with reference to FIG. 7. FIG. 7 is a flowchart illustrating the flow of the information processing method according to Modification 1.

As shown in FIG. 7, in the information processing system 100 according to Modification 1, in step S10, the image acquisition unit 110 acquires an image captured by the imaging unit 210. The image acquisition unit 110 outputs the acquired image to the sampling unit 112.

Next, in step S101, the sampling unit 112 receives the images output from the image acquisition unit 110 and samples images from the received images, for example periodically. The sampling unit 112 outputs the sampled images to the synthesis position setting unit 120.

Next, in step S20, the synthesis position setting unit 120 receives the image output from the sampling unit 112 and arbitrarily determines an object synthesis position on the received image. The synthesis position setting unit 120 outputs the image for which the object synthesis position has been determined to the image synthesis unit 130.

Next, in step S30, the image synthesis unit 130 synthesizes an object at the object synthesis position to generate a composite image. The image synthesis unit 130 synthesizes the object at the object synthesis position on the image using, for example, a GAN model. The image synthesis unit 130 outputs the generated composite image to the detection processing unit 140.

Next, in step S40, the detection processing unit 140 receives the composite image output from the image synthesis unit 130 and acquires output data by inputting the composite image to the recognition model stored in the recognition model holding unit 142. The detection processing unit 140 outputs the acquired output data to the data use determination unit 150.

Next, in step S50, the data use determination unit 150 makes the second determination, that is, it decides whether to make the first determination, using the correct answer data of the composite image and the output data obtained by inputting the composite image to the recognition model. The first determination is to determine training data based on the composite image. If it is decided in the second determination not to make the first determination (NO in step S501), the flow of the information processing method for determining training data based on that composite image ends. On the other hand, if it is decided in the second determination to make the first determination (YES in step S501), the data use determination unit 150 makes the first determination in step S60. At this time, the data use determination unit 150 determines training data based on the composite image. The data use determination unit 150 determines the composite image as training data for the recognition model. The data use determination unit 150 also determines, as training data for the recognition model, a corresponding image having visual features identical or similar to the visual features of the composite image. Next, in step S601, the data use determination unit 150 stores the images determined as training data in the training data holding unit 160 as training data.
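Steps S20 through S601 for a single sampled image can be summarized in a short sketch. The four processing stages are passed in as callables because the specification does not define their interfaces; the function name `process_sampled_image`, the parameter names, and the use of a plain list as the training data store are hypothetical. The selection of corresponding images in step S60 is omitted for brevity.

```python
from typing import Any, Callable, List, Tuple


def process_sampled_image(
    image: Any,
    choose_position: Callable[[Any], Any],           # S20: synthesis position setting
    synthesize: Callable[[Any, Any], Any],           # S30: image synthesis (e.g. a GAN)
    recognize: Callable[[Any], Any],                 # S40: recognition model inference
    needs_training_data: Callable[[Any, Any], bool], # S50: second determination
    store: List[Tuple[Any, Any]],                    # stands in for holding unit 160
) -> bool:
    """Run S20 to S601 for one image; return True if training data was stored."""
    position = choose_position(image)
    composite = synthesize(image, position)
    output = recognize(composite)
    # S501: stop here when the composite image is recognized well enough.
    if not needs_training_data(position, output):
        return False
    # S60 and S601: adopt the composite image and keep its position with it.
    store.append((composite, position))
    return True
```

Passing the stages as parameters keeps the control flow of FIG. 7 visible without committing to any particular model or storage backend.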

When a predetermined amount of training data has been stored in the training data holding unit 160, in step S70, the training unit 170 trains the recognition model using the determined training data.

Next, in step S80, the training unit 170 outputs the recognition model trained using machine learning to the recognition model holding unit 142 and the recognition model transmission unit 180. The recognition model holding unit 142 updates its recognition model by storing the trained recognition model output from the training unit 170. The recognition model transmission unit 180 transmits the trained recognition model output from the training unit 170 to the recognition unit 220 of the recognition processing unit 200.

Note that the trained recognition model transmitted from the recognition model transmission unit 180 is received by the recognition model receiving unit 3 of the recognition unit 220 and output to the recognition model updating unit 4. The recognition model updating unit 4 updates the recognition model by storing the trained recognition model received from the recognition model receiving unit 3. When the recognition model receiving unit 3 receives the trained recognition model, the update information presenting unit 5 notifies the user of the completion of the training.

[Effects of Modification 1]
The information processing system 100 and the information processing method according to Modification 1 described above provide the following effects in addition to the effects described in the embodiment.

In the information processing method according to Modification 1, the first determination determines the composite image as training data for the recognition model.

As a result, a composite image for which the recognition accuracy of the learning model is judged to be low can be used as training data. This suppresses the accumulation, as training data, of data for which the recognition accuracy of the learning model is high, that is, data that is unnecessary as training data. The cost of accumulating data is therefore reduced. In other words, images of scenes for which the recognition accuracy of the learning model is low can be accumulated preferentially as training data, which enables efficient learning for scenes with low recognition accuracy. The recognition accuracy of the learning model is thus further improved.

In the information processing method according to Modification 1, the first determination also determines, as training data for the recognition model, a corresponding image having visual features identical or similar to the visual features of the composite image. Here, the visual features of the composite image are statistical features of the image parameters of the composite image. The visual features of the composite image are also qualitative features of the composite image.

Because a corresponding image having visual features identical or similar to those of the composite image is determined as training data in this way, images of scenes for which the recognition accuracy of the learning model is low, and images of scenes similar to them, can be used as training data. The number and variety of training data for scenes with low recognition accuracy can therefore be increased efficiently. When the corresponding image is a captured image, the learning effect can be improved compared with the case where the composite image is used as training data. When the visual features are statistical features of the image parameters of the composite image, the number and variety of training data can be increased efficiently from a statistical point of view. When the visual features are qualitative features of the composite image, the number and variety of training data having features that are difficult to quantify can be increased efficiently.

In the information processing method according to Modification 1, the visual feature of the composite image is the aspect of the object in the composite image, and the visual feature of the corresponding image is the aspect of a corresponding object having attributes identical or similar to the attributes of the object. In this case, the aspect is the position of the object on the composite image. The aspect is also the posture of the object.

As a result, when, for example, the object recognition accuracy of the learning model is judged to be low because of a difference in the aspect of the object, such as the position or the posture of the object on the composite image, training data is determined based on the composite image. Images of scenes for which the recognition accuracy of the learning model is low, and images of scenes similar to them, can therefore be used as training data. This makes it possible to efficiently increase the number and variety of training data for scenes with low recognition accuracy. A recognition model built using such training data recognizes objects in images with improved accuracy.

The information processing method according to Modification 1 further notifies the user of the recognition model when it is decided in the second determination that the first determination is to be made. The notification is, for example, a notification regarding a request to train the recognition model using the determined training data. The information processing method according to Modification 1 also further trains the recognition model using the determined training data, in which case the notification is a notification regarding the completion of the training.

As a result, when training data for the learning model is determined based on a composite image, the user of the learning model is notified, so the user can understand that there are scenes in which the learning model has difficulty recognizing objects. When the notification concerns a request to train the learning model, the user can decide when to train the learning model. When the notification concerns the completion of training, the user can know that the learning model has been updated by the training.

(Modification 2)
[Configuration of Information Processing System According to Modification 2]
An information processing system according to Modification 2 of the embodiment will be described with reference to FIG. 8. FIG. 8 is a block diagram illustrating an example of the configuration of the information processing system 100 according to Modification 2.

For the information processing system 100 according to Modification 1 of the embodiment, an example was described in which the sampling unit 112 periodically samples images from the images acquired by the image acquisition unit 110 and outputs them to the synthesis position setting unit 120. Modification 2 describes an example in which the sampling unit 112 additionally samples, from the images acquired by the image acquisition unit 110, images that satisfy a predetermined condition and stores them in the training data holding unit 160 as training data. The information processing system 100 according to Modification 2 is described below with a focus on the differences from the information processing system 100 according to Modification 1.

In the information processing system 100 according to Modification 2, an image having visual features identical or similar to the visual features of the composite image may be selected as the corresponding image from captured images that are different from the composite image. The selection of a captured image may consist of sampling images from the images acquired by the image acquisition unit 110 based on a predetermined condition, or of searching for and extracting a desired captured image from the captured images stored in the training data holding unit 160. The sampling unit 112 periodically samples images from the images acquired by the image acquisition unit 110 and outputs them to the synthesis position setting unit 120. In addition, the sampling unit 112 samples, from the images acquired by the image acquisition unit 110, images that satisfy the predetermined condition and stores them in the training data holding unit 160 as training data. Here, the predetermined condition is a condition based on scenes in which the recognition model recognizes objects with low accuracy, and is composed of, for example, the type of object, the position of the object, the aspect of the object, the lighting state, weather conditions, climate, the arrangement of buildings, road conditions, and the like. For example, when the data use determination unit 150 decides in the second determination to determine training data based on the composite image, that is, to make the first determination, the predetermined condition is configured to include image feature values identical or similar to the visual features of that composite image. The sampling unit 112 thereby updates the predetermined condition so that images identical or similar to images in which the recognition model recognizes objects with low accuracy are sampled as training data.

When the data use determination unit 150 decides in the second determination to make the first determination, it sets the predetermined condition so as to include image feature values identical or similar to the visual features of the composite image, and outputs the condition to the sampling unit 112.
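The way the predetermined condition is extended and then applied by the sampling unit 112 can be sketched as follows. This is an illustration only: it assumes that the visual features of an image are available as a dictionary of named attributes (for example weather or object type), and that an image satisfies the condition when it agrees with every attribute of at least one registered pattern. The class `SamplingCondition` and its methods are hypothetical.

```python
from typing import Dict, List

# Assumed feature representation, e.g. {"weather": "rain", "object_type": "person"}.
Features = Dict[str, str]


class SamplingCondition:
    """Predetermined condition built from poorly recognized composite images."""

    def __init__(self) -> None:
        self._patterns: List[Features] = []

    def add_pattern(self, features: Features) -> None:
        """Register the features of a composite image selected as training data.

        Empty feature sets are ignored because they would match every image.
        """
        if features and features not in self._patterns:
            self._patterns.append(dict(features))

    def matches(self, features: Features) -> bool:
        """True if the image agrees with all attributes of some pattern."""
        return any(
            all(features.get(key) == value for key, value in pattern.items())
            for pattern in self._patterns
        )
```

In this sketch the data use determination unit 150 would call `add_pattern`, and the sampling unit 112 would call `matches` on each acquired image to decide whether to store it as training data.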

The information processing system 100 according to Modification 2 differs from the information processing system 100 according to the embodiment and Modification 1 in that it includes a similar scene search unit 190. For example, when the data use determination unit 150 makes the second determination that the first determination is to be made, the similar scene search unit 190 selects, from the captured images stored in the training data holding unit 160, an image having visual features identical or similar to those of the composite image. Such visual features include, for example, the object synthesis position on the image, the background of the image, the aspect of the synthesized object such as the posture of a person, statistical features of image parameters such as the color tone and edges of the image, and qualitative features such as weather conditions, wetness of the road surface, and occlusion. The captured image may be one selected by the sampling unit 112 and stored as new training data, or one included in a pre-held DB. The pre-held DB contains images of various scenes that the information processing system holds by default. The images of various scenes are, for example, images captured in regions with different climates, images that differ in weather, road surface condition, or scenery, and images that differ in the aspect of an object, such as the position of the object on the image, the type of the object, or its posture. Furthermore, if a storage unit is provided for temporarily holding the images acquired by the image acquisition unit 110, the captured image may be selected from the images temporarily held in that storage unit.
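The selection performed by the similar scene search unit 190 can be illustrated as a similarity search over feature vectors. The sketch below is only an illustration under stated assumptions: each image is assumed to be already summarized as a numeric feature vector, cosine similarity is assumed as the similarity measure, and the names `cosine_similarity`, `search_similar_scenes`, and the threshold `0.9` are hypothetical. The disclosure does not prescribe a particular measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_similar_scenes(composite_features, captured, threshold=0.9):
    """Return ids of captured images similar to the composite image.

    `captured` maps an image id to its feature vector. Ids are returned
    with the most similar image first.
    """
    scored = []
    for image_id, features in captured.items():
        score = cosine_similarity(composite_features, features)
        if score >= threshold:
            scored.append((score, image_id))
    scored.sort(reverse=True)
    return [image_id for _, image_id in scored]
```

In this sketch the dictionary passed as `captured` could stand for either the new training data stored by the sampling unit 112 or the pre-held DB.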

[Operation of Information Processing System According to Modification 2]
The operation of the information processing system 100 according to Modification 2 will be described with reference to FIG. 9. FIG. 9 is a flowchart illustrating an example of the flow of the information processing method according to Modification 2.

In the information processing system 100 according to Modification 1 of the embodiment, the sampling unit 112 periodically samples images from the images acquired by the image acquisition unit 110 and outputs the sampled images to the synthesis position setting unit 120. In the information processing system 100 according to Modification 2 of the embodiment, in addition to this operation, the sampling unit 112 samples, from the images acquired by the image acquisition unit 110, images that meet a predetermined condition, and stores them as training data in the training data holding unit 160. The similar scene search unit 190 selects images having visual features identical or similar to those of the composite image from the captured images stored as new training data in the training data holding unit 160 and from the captured images stored in the pre-held DB. The information processing system 100 according to Modification 2 builds the recognition model using a group of images including these captured images as training data. Modification 2 is described below, focusing on the differences from the embodiment and Modification 1.

Specifically, in step S101, in addition to the same operation as in step S101 of Modification 1, the sampling unit 112 samples images that meet the predetermined condition from the images acquired in step S10 and stores them in the training data holding unit 160. Next, the information processing system 100 performs the operations of steps S20 to S60 in the same manner as in the embodiment and Modification 1. Although not illustrated, in Modification 2, when the data use determination unit 150 makes the second determination that the first determination is to be made, it sets the predetermined condition so as to include image feature quantities identical or similar to the visual features of the composite image, and outputs the condition to the sampling unit 112. On receiving the condition, the sampling unit 112 updates the predetermined condition by storing the received condition in the sampling unit 112.
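The two roles of the sampling unit 112 in Modification 2, periodic sampling and condition-based storage with an updatable condition, can be sketched as follows. This is a simplified illustration, not the implementation of this disclosure. The class name `SamplingUnit`, its methods, the representation of an image as a dictionary, and the use of a Python callable as the "predetermined condition" are all assumptions made for the example.

```python
class SamplingUnit:
    """Hypothetical sketch of the sampling unit 112 in Modification 2."""

    def __init__(self, period):
        self.period = period      # forward every `period`-th image
        self.condition = None     # no condition until one is received
        self.count = 0

    def update_condition(self, condition):
        """Store the condition received from the data use determination unit."""
        self.condition = condition

    def process(self, image, training_store):
        """Handle one acquired image.

        Stores the image as training data when it meets the current
        condition, and returns True when the image is a periodic sample
        that should be forwarded for synthesis.
        """
        if self.condition is not None and self.condition(image):
            training_store.append(image)
        is_periodic_sample = self.count % self.period == 0
        self.count += 1
        return is_periodic_sample
```

For example, after `unit.update_condition(lambda img: img["weather"] == "rain")`, every subsequently acquired image whose `weather` entry is `"rain"` is appended to the training store, while periodic forwarding continues unchanged.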

Next, in step S602, the similar scene search unit 190 searches the captured images stored in the training data holding unit 160 for an image identical or similar to the composite image for which the second determination that the first determination is to be made was made, and selects the desired captured image as a corresponding image to be used as training data. Next, the information processing system 100 performs the operations of steps S70 and S80 in the same manner as in Modification 1.

[Effects of Modification 2]
The information processing system 100 and the information processing method according to Modification 2 described above provide the following effects in addition to the effects described for the embodiment and Modification 1.

In the information processing method according to Modification 2, an image having visual features identical or similar to those of the composite image is selected, as a corresponding image, from captured images different from the composite image. The selection of the captured image may consist of sampling images, based on a predetermined condition, from the images acquired by the image acquisition unit 110, or of searching for and extracting a desired captured image from the captured images stored in a memory, a database, or the like, such as the training data holding unit 160.

This allows captured images to be used as training data. A higher learning effect is therefore obtained than when composite images are used as training data. Note that the selection of the captured image may consist of determining, each time an image is acquired, whether to record it based on a predetermined condition; of sampling images from the acquired images based on a predetermined condition; or of searching for and extracting captured images that satisfy a predetermined condition from the captured images stored in a memory, a database, or the like.

(Modification 3)
[Configuration of Information Processing System According to Modification 3]
An information processing system according to Modification 3 of the embodiment will be described with reference to FIG. 10. FIG. 10 is a block diagram illustrating an example of the configuration of the information processing system 100 according to Modification 3.

The information processing system 100 according to Modification 2 of the embodiment includes the similar scene search unit 190. When the data use determination unit 150 makes the second determination that the first determination is to be made, the similar scene search unit 190 searches the captured images stored in the training data holding unit 160 for an image identical or similar to the composite image and selects the desired captured image as training data. The information processing system 100 according to Modification 3 does not include the similar scene search unit 190 and instead includes a similar scene processing unit 192. The information processing system according to Modification 3 is described below, focusing on the differences from the information processing system 100 according to Modification 2.

In the information processing system 100 according to Modification 3, an image having visual features identical or similar to those of the composite image may be generated, as a corresponding image, from a captured image different from the composite image.

For example, when the data use determination unit 150 makes the second determination that the first determination is to be made, the similar scene processing unit 192 generates, from a captured image stored in the training data holding unit 160, an image having visual features identical or similar to those of the composite image.

[Operation of Information Processing System According to Modification 3]
The operation of the information processing system 100 according to Modification 3 will be described with reference to FIG. 11. FIG. 11 is a flowchart illustrating an example of the flow of the information processing method according to Modification 3.

In the information processing system according to Modification 2 of the embodiment, when the data use determination unit 150 makes the second determination that the first determination is to be made, the similar scene search unit 190 selects, from the captured images stored in the training data holding unit 160, an image having visual features identical or similar to those of the composite image, that is, an image similar to the composite image. In the information processing system according to Modification 3 of the embodiment, when the data use determination unit 150 makes the second determination that the first determination is to be made, the similar scene processing unit 192 generates such an image, that is, an image similar to the composite image, from a captured image stored in the training data holding unit 160. Modification 3 is described below, focusing on the differences from the embodiment and Modifications 1 and 2.

Specifically, in step S603, a captured image of a similar scene, that is, an image of a scene similar to the composite image for which the second determination that the first determination is to be made was made, is generated as a corresponding image by processing a captured image that is different from the composite image and is stored in the training data holding unit 160.
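The disclosure does not specify how the captured image is processed in step S603. As one hedged illustration, the sketch below adjusts a statistical feature of a captured image (its mean brightness) so that it matches that of the composite image. The grayscale representation as nested lists of 0 to 255 values and the function names `mean_brightness` and `match_brightness` are assumptions made for this example only.

```python
def mean_brightness(image):
    """Mean pixel value of a grayscale image given as a list of rows."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


def match_brightness(captured, composite):
    """Shift the captured image so its mean brightness approaches the
    composite image's mean brightness, clamping each pixel to 0..255."""
    offset = mean_brightness(composite) - mean_brightness(captured)
    return [
        [min(255, max(0, round(p + offset))) for p in row]
        for row in captured
    ]
```

A real similar scene processing unit would likely also handle other features named in this description, such as weather or occlusion; this sketch covers a single statistical feature.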

[Effects of Modification 3]
The information processing system 100 and the information processing method according to Modification 3 described above provide the following effects in addition to the effects described for the embodiment and Modification 1.

In the information processing method according to Modification 3, an image having visual features identical or similar to those of the composite image is generated, as a corresponding image, from a captured image different from the composite image.

This allows the corresponding image to be generated from a captured image. Specifically, an image of a scene in which the recognition model recognizes objects with low accuracy, and images similar to that scene, can be generated from captured images. A corresponding image can thus be generated even when no captured image can be used as a corresponding image as it is, so the number and variety of training data can easily be increased.

[Other Modifications]
The embodiment and the modifications have been described above as examples of the technology disclosed in the present application. However, the technology of the present disclosure is not limited to these, and is also applicable to modifications of the embodiment or to other embodiments in which changes, replacements, additions, omissions, and the like are made as appropriate. The components described in the embodiment and the modifications may also be combined to form a new embodiment or modification.

The information processing system 100 according to the embodiment and the modifications has been described as being applied to an automobile. However, the information processing system may be applied to any system that recognizes a recognition target from sensing data. For example, the information processing system may be applied to a system that observes the behavior or state of people in a building such as a residence or an office. In this case, the recognition processing unit 200 is mounted on a sensor module such as a camera, and the recognition model update unit 300 may be mounted on the sensor module or on a device separate from the sensor module, such as a server.

In the above embodiment, an example in which the processing target is an image has been described, but the processing target may be sensing data other than images. For example, any sensing data for which correct answer data can be obtained may be processed, such as audio data output from a microphone, point cloud data output from a radar such as LiDAR, pressure data output from a pressure sensor, temperature data or humidity data output from a temperature sensor or a humidity sensor, and scent data output from a scent sensor. For example, when the sensing data is audio data, the elements of the audio data are frequency, amplitude, and the like; the statistical features of the elements of the audio data are frequency band, sound pressure, and the like; and the qualitative features of the audio data are noise, background sound, and the like.

General or specific aspects of the present disclosure may be realized as a system, a device, a method, an integrated circuit, a computer program, or a recording medium such as a computer-readable recording disk, or as any combination of a system, a device, a method, an integrated circuit, a computer program, and a recording medium. The computer-readable recording medium includes, for example, a non-volatile recording medium such as a CD-ROM.

For example, each component included in the information processing system 100 according to the embodiment and the modifications is typically realized as an LSI (Large Scale Integration), which is an integrated circuit. The components may be individually formed into single chips, or a single chip may include some or all of them. Circuit integration is not limited to LSI and may be realized by a dedicated circuit or a general-purpose processor. An FPGA (Field Programmable Gate Array) that can be programmed after the LSI is manufactured, or a reconfigurable processor in which the connections and settings of circuit cells inside the LSI can be reconfigured, may also be used.

In the embodiment and the modifications, each component may be configured with dedicated hardware or may be realized by executing a software program suited to that component. Each component may be realized by a program execution unit such as a CPU or a processor reading and executing a software program recorded on a recording medium such as a hard disk or a semiconductor memory.

Some or all of the above components may be configured as a removable IC (Integrated Circuit) card or a stand-alone module. The IC card or module is a computer system including a microprocessor, a ROM, a RAM, and the like. The IC card or module may include the above LSI or a system LSI. The IC card or module achieves its functions through the microprocessor operating in accordance with a computer program. The IC card and module may be tamper-resistant.

The above method may be realized by an MPU, a CPU, a processor, a circuit such as an LSI, an IC card, a stand-alone module, or the like.

The technology of the present disclosure may be realized by a software program or by a digital signal consisting of the software program, or may be a non-transitory computer-readable recording medium on which the program is recorded.

The program and the digital signal consisting of the program may be recorded on a computer-readable recording medium such as a flexible disk, a hard disk, an SSD, a CD-ROM, an MO, a DVD, a DVD-ROM, a DVD-RAM, a BD (Blu-ray (registered trademark) Disc), or a semiconductor memory. The program and the digital signal consisting of the program may also be transmitted via a telecommunication line, a wireless or wired communication line, a network typified by the Internet, data broadcasting, or the like. Furthermore, the program and the digital signal consisting of the program may be executed by another independent computer system by being recorded on a recording medium and transferred, or by being transferred via a network or the like.

The numbers used above, such as ordinal numbers and quantities, are all given as examples to describe the technology of the present disclosure specifically, and the present disclosure is not limited to these numbers. Likewise, the connection relationships between components are given as examples to describe the technology of the present disclosure specifically, and the connection relationships that realize the functions of the present disclosure are not limited to them.

The division into functional blocks in the block diagrams is an example. A plurality of functional blocks may be realized as a single functional block, a single functional block may be divided into a plurality of blocks, and some functions may be moved to another functional block. The functions of a plurality of functional blocks having similar functions may also be processed by a single piece of hardware or software in parallel or by time division.

Because the present disclosure can improve the training efficiency of individual learning models, it is applicable to technologies such as automated driving systems, traffic management systems, crime prevention systems, and manufacturing management systems.

DESCRIPTION OF SYMBOLS
1 Image capturing unit
2 Image transmission unit
3 Recognition model reception unit
4 Recognition model update unit
5 Update information presentation unit
10 Image acquisition unit
20 Synthesis position determination unit
30 Composite image generation unit
40 Output data acquisition unit
50 Determination unit
100 Information processing system
110 Image acquisition unit
112 Sampling unit
120 Synthesis position setting unit
130 Image synthesis unit
140 Detection processing unit
142 Recognition model holding unit
150 Data use determination unit
160 Training data holding unit
170 Training unit
180 Recognition model transmission unit
190 Similar scene search unit
192 Similar scene processing unit
200 Recognition processing unit
210 Imaging unit
220 Recognition unit
300 Recognition model update unit

Claims

An information processing method comprising, using a computer:
Acquiring sensing data,
Determining a combining portion in which recognition target data is to be combined on the sensing data,
Generating synthesized data by synthesizing, in the combining portion, recognition target data having features identical or similar to features of the sensing data that are perceived by a human sensory organ,
Inputting the synthesized data to a model trained using machine learning to recognize a recognition target, and acquiring recognition result data,
Making a second determination as to whether to make a first determination that determines training data of the model based on the synthesized data, using correct answer data including at least the combining portion, and the recognition result data, and
Making the first determination when the second determination determines that the first determination is to be made.

The feature that the sensing data has is a statistical feature of an element of the sensing data,
The information processing method according to claim 1.

The feature of the sensing data is a qualitative feature of the sensing data,
The information processing method according to claim 1.

In the first determination, the synthesized data is determined as training data for the model.
The information processing method according to claim 1.

In the first determination, corresponding data having the same or similar feature as the feature of the composite data is determined as training data of the model,
The information processing method according to claim 1.

The feature of the combined data is a statistical feature of an element of the combined data,
The information processing method according to claim 5.

The feature of the combined data is a qualitative feature of the combined data,
The information processing method according to claim 5.

The sensing data is an image,
The recognition target is an object,
The combining portion is an object synthesis position at which the object data is combined on the image,
The synthesized data is a synthesized image generated by synthesizing the object data having the same or similar visual feature as the visual feature of the image at the object synthesis position,
The recognition result data is object recognition result data obtained by inputting the composite image to the model,
The first determination is to determine training data of the model based on the composite image,
The second determination is performed using correct data including at least the object synthesis position and the object recognition result data,
The information processing method according to claim 1.

The first determination is to determine a corresponding image having the same or similar visual feature as the visual feature of the composite image as training data of the model,
The visual feature of the composite image is an aspect of the object in the composite image,
The visual feature of the corresponding image is an aspect of the corresponding object having the same or similar attribute as the attribute of the object,
The information processing method according to claim 8.

The aspect is a position of the object on the composite image,
The information processing method according to claim 9.

The aspect is a posture of the object,
The information processing method according to claim 9.

The combining portion further includes a size of object data to be combined on the image,
The information processing method according to claim 8.

Data having the same or similar feature as the feature of the combined data is selected or generated as the corresponding data from sensing data different from the combined data,
The information processing method according to claim 5.

The recognition target data is synthesized in the combining portion using a GAN (Generative Adversarial Network) model,
The information processing method according to claim 1.

Further, when the second determination determines that the first determination is to be made, a user of the model is notified,
The information processing method according to claim 1.

Further, performing the training of the model using the determined training data,
The notification is a notification regarding completion of the training,
The information processing method according to claim 15.

The notification is a notification regarding a request for training the model using the determined training data,
The information processing method according to claim 15.

An information processing system comprising:
A first acquisition unit that acquires sensing data;
A first determination unit that determines a combining portion in which recognition target data is to be combined on the sensing data;
A generation unit that generates synthesized data by synthesizing, in the combining portion, recognition target data having features identical or similar to features of the sensing data that are perceived by a human sensory organ;
A second acquisition unit that inputs the synthesized data to a model trained using machine learning to recognize a recognition target, and acquires recognition result data; and
A second determination unit that makes a second determination as to whether to make a first determination that determines training data of the model based on the synthesized data, using correct answer data including at least the combining portion, and the recognition result data, and that makes the first determination when the second determination determines that the first determination is to be made.