WO2019176235A1

WO2019176235A1 - Image generation method, image generation device, and image generation system

Info

Publication number: WO2019176235A1
Application number: PCT/JP2018/048149
Authority: WO
Inventors: クリンキグト，マルティン; 小味　弘典; 俊明垂井; 村上　智一
Original assignee: 株式会社日立産業制御ソリューションズ
Priority date: 2018-03-12
Filing date: 2018-12-27
Publication date: 2019-09-19
Also published as: JP2019159630A; CN111742342A; JP6719497B2

Abstract

The purpose of the present invention is to generate, by means of a neural network, an image for machine-learning training from data such as a vector model or a 3D model, and improve the efficiency of the machine-learning training or the accuracy of object detection in the image by using the generated image in machine learning. To this end, when it is difficult to obtain the image for machine-learning training, the present invention can generate images for machine-learning training in large quantities by combining an image to be detected and a background image that represents a desired background and by applying a hostile generation network to the combined image.

Description

Image generation method, image generation apparatus, and image generation system

The present invention relates to an image generation method, an image generation apparatus, and an image generation system.

In recent years, detection accuracy for recognizing a specific object from a captured image has been improved by using a machine learning method (a neural network such as deep learning) in image processing technology. One method for optimizing these machine learning techniques is to input a large number of images as training samples into the system to train machine learning.

For example, as a system for performing high-accuracy image recognition by automatically collecting a recognition target and an image pattern that does not include the recognition target and using the collected image pattern for machine learning, Japanese Patent Laid-Open No. 2012-2012 There is a technique described in Japanese Patent No. 088787 (Patent Document 1). This publication states that “the object tracking unit extracts a region in which a recognition target is shown from the image of each frame constituting the moving image. The image conversion unit performs geometric conversion on the image in this region. The recognition target sample is generated based on the image, and the region cutout unit sets a region for the image of the frame constituting the moving image, and the image composition unit 35 selects a plurality of images in the set region. There is a description that a non-recognition target sample image is generated based on an image obtained by synthesizing these regions.The learning unit learns a recognition target using the recognition target sample and the non-recognition target sample.

JP 2012-088787 A

If training sample images are readily available, machine learning performance is likely to improve.If training sample images are difficult to obtain or cannot be obtained, improve the accuracy of image object detection by machine learning. Is difficult. For this reason, a user will acquire the sample image which trains machine learning at cost. However, in Patent Document 1, how to deal with a case where it is difficult to obtain a sample image for training is not considered in detail, and the cost burden for the user to obtain the sample image for training is not discussed. Still not resolved.

Accordingly, in the present invention, an image for machine learning training is generated from data such as a vector model and a 3D model using a neural network, and the generated image is used for machine learning training, thereby improving the efficiency of machine learning training. The purpose is to improve the accuracy of image detection.

In order to solve the above problems, one of the representative image generation methods of the present invention includes a background image acquisition step of acquiring a background image by an image selection unit, and detection including metadata by the image selection unit. A detection target image specifying step of specifying a target image from a source image, a model generation step of generating a detection target image model corresponding to the detection target image by a model generation unit, and the background image and the This is an image generation method including a detection target image establishing step of establishing a final image by combining with a detection target image model.

An image for machine learning training is generated from data such as a vector model or a 3D model by a neural network, and the generated image is used for machine learning to improve the efficiency of machine learning training and the accuracy of image detection. Can do.

It is a figure which shows the whole system configuration | structure of the hardware which concerns on embodiment of this invention. It is a flowchart of the image generation method which concerns on 1st Embodiment of this invention. It is a figure for demonstrating the calculation method of the camera parameter which concerns on 1st Embodiment of this invention. It is a flowchart of the modification of the image generation method which concerns on 1st Embodiment of this invention. It is a figure which shows an example of the vector model and image which concern on 1st Embodiment of this invention. It is a figure which shows an example of matching of the vector model and image which concerns on 1st Embodiment of this invention. It is a figure which shows an example of the process for establishing the detection target image which concerns on 1st Embodiment of this invention. It is a flowchart of the image generation method which concerns on embodiment of this invention. It is a figure which shows an example of the process for establishing the detection target image which concerns on 2nd Embodiment of this invention. It is a figure which shows an example of the process for establishing the detection target image which concerns on 3rd Embodiment of this invention. It is a figure which shows an example of the process of improving the capability to produce the machine learning image which concerns on 4th Embodiment of this invention. It is a figure which shows an example of the method to produce | generate the image which concerns on 5th Embodiment of this invention. It is a figure which shows an example of the method to produce | generate the image which concerns on 5th Embodiment of this invention. It is a figure which shows an example of the process of improving the detection precision of the machine learning which concerns on 6th Embodiment of this invention. It is a figure which shows an example of the system architecture which concerns on embodiment of this invention. It is a conceptual diagram explaining an example of embodiment of this invention.

Hereinafter, a conventional example and a first embodiment of the present invention will be described with reference to the drawings. In addition, this invention is not limited by this embodiment. Moreover, in description of drawing, the same code | symbol is attached | subjected and shown to the same part.

First, an example of the concept of the embodiment of the present invention will be described with reference to FIG.

In machine learning for detecting an object, in order to train a machine learning system, a large amount of images of the object to be detected are required. For example, when trying to detect a person who has a white cane (hereinafter also referred to as a “white cane”) using machine learning, it has not been possible to learn unless there are a large number of images of the white cane. Therefore, in the present invention, even when there are few images of the detection target (for example, a white cane), an image of a general pedestrian (available in large quantities) and a video of a white cane (small amount) are used. Thus, a large amount of white cane footage is generated and machine learning is effectively enhanced.

[Configuration of image generation system]
FIG. 1 is a diagram showing an overall system configuration of hardware according to an embodiment of the present invention. As shown in FIG. 1, this system includes a central server 100, a client terminal 130, a client terminal 140, and a network (Internet LAN or the like) 150. The central server 100, the client terminal 130, and the client terminal 140 may be communicably connected to each other via the network 150.

The central server 100 is a device that performs image generation requested from the client terminals 130 and 140 via the network 150. Specifically, the central server 100 can include functional units that perform functions such as image selection, model creation, image processing, and machine learning in the image generation process. In addition, the central server 100 may include a unit (for example, a storage unit 120) that stores image data such as a background image and a detection target image, which will be described later, and model data such as a vector model and a 3D model.

The client terminal 130 and the client terminal 140 are devices for transmitting an image generation request to the central server 100 via the network 150. Specifically, the user can input image generation conditions to the client terminal 130 and the client terminal 140. For example, the user may designate a detection target to be described later and a background image used for image generation using the client terminal 130 or the client terminal 140. Instructions such as conditions input at the client terminal 130 and the client terminal 140 are transmitted to the central server 100 via the network 150.

[Configuration of Central Server 100]
As described above, the central server 100 is a device that performs image generation requested from the client terminals 130 and 140 via the network 150. As shown in FIG. 1, the central server 100 includes a processing unit 110 that performs each function of image generation, and a storage unit 120 that stores information used for the image generation.

The processing unit 110 includes a functional unit for performing each function according to the embodiment of the present invention. Specifically, the processing unit 110 acquires a background image, generates an image selection unit 112 that identifies a detection target image including metadata from a source image, and generates a detection target image model corresponding to the detection target image. A model creation unit 114 that establishes a final image by combining the background image and the detection target image model, an image processing unit 116 that performs image processing on the model, a machine learning detection accuracy improvement process, and a machine learning image It comprises a machine learning unit 118 that performs each step of the improvement process of the creation ability.

The processing unit 110 functions as each functional unit described above when an arithmetic processing unit such as a CPU (Central Processing Unit) in the apparatus executes a control program stored in the memory.

The storage unit 120 includes an image database 122 and an image / model database 124. The image database 122 is a database (device or logical storage area) that stores background images used for image generation and data of detection target images described later. The image database 122 may store, for example, image data indicating the state of the station platform as shown in FIG. 7 and metadata included in the image. In one embodiment, the storage unit 120 receives an image (source image, background image, desired detection target image) specified by the user from the client terminal 130 or the client terminal 140, and the received image data is used for image generation. It may be stored in the image database 122 in a format that can be obtained. The image / model database 124 is a database (device or logical storage area) that stores a specific image and a model associated with the image in a form associated with each other. For example, as shown in FIGS. 5 to 6, which will be described later, a model (vector model, point cloud, etc.) used for image generation and a realistic image associated with the model are stored in the image / model database 124. It may be saved. The image database 122 and the image / model database 124 of the storage unit 120 are realized by, for example, a semiconductor memory device such as a RAM (Random Access Memory), a flash memory (Flash Memory), or a storage unit such as a hard disk or an optical disk. May be.

[Configuration of Client Terminal 130]
As described above, the client terminal 130 is a device for transmitting an image generation request to the central server 100 via the network 150. The client terminal 130 includes a processing unit 132 that executes commands sent from other functional units in the terminal, an instruction receiving unit 134 that receives instructions from the user (such as image generation conditions), and images (source images, background images, An image selection unit 136 for selecting a detection target image), a communication unit 138 for managing exchanges with the central server 100 and other network terminals (for example, the client terminal 140), and information (image data, commands from the user, etc.). And a storage unit 139 for storing. As described above, in an embodiment, the user can use the client terminal 130 to input image generation conditions and to specify a source image, a background image, or a detection target image used for image generation. . The client terminal 130 may transmit conditions and instructions input from the user to the central server 100.

[Configuration of Client Terminal 140]
Similar to the client terminal 130, the client terminal 140 is a device for transmitting an image generation request to the central server 100 via the network 150. Similarly to the client terminal 130, the client terminal 140 includes a processing unit 142 that executes commands sent from other functional units in the terminal, and an instruction receiving unit 144 that receives instructions from the user (such as image generation conditions). A communication unit 146 that manages exchanges with the central server 100 and other network terminals (for example, the client terminal 130), and a storage unit 148 that stores information (image data, commands from the user, and the like). The client terminal 140 differs from the client terminal 130 in that it does not have an image selection unit such as the image selection unit 136. Therefore, when an image generation request is sent from a terminal such as the client terminal 140 that does not have an image selection unit, an image used for image generation is selected using the image selection unit 112 of the central server 100 according to a user instruction. Alternatively, the image selection unit 112 of the central server 100 may select automatically (for example, randomly).

Next, background image acquisition, detection target identification, detection target image model creation, and detection target image establishment in the first embodiment will be described with reference to FIG. FIG. 2 is a flowchart showing the flow of the image generation method according to the first embodiment of the present invention.

First, in step S200, a background image is acquired. In this specification, the expression to acquire includes obtaining, receiving, securing, procuring, selecting, and specifying. The background image is an image that becomes a final image by arranging a detection target image described later. For example, the background image may be an image showing various environments such as a station platform, an airport boarding gate, a concert or sports game venue, or a shopping mall. This background image may be designated by the user (for example, the image selection unit 136 of the client terminal 130), and is stored in the image database 122 of the storage unit 120 according to an instruction (for example, via the client terminal 140) input by the user. The selected image may be selected by the image selection unit 112 of the central server 100 from among the images being displayed.

Next, in step S220, a detection target image is specified. In this specification, the expression specifying includes selecting, selecting, setting, specifying, identifying, or detecting. The detection target image is an image in which an object that the user wants to place in the background image is captured. For example, the detection target image may be an image showing an object to be trained by the machine learning unit so that it can be detected from within the image. The detection target image may include, for example, a person who has a white cane, a person who wears specific clothes, a luggage exceeding a predetermined size, a certain kind of animal, and the like. Further, the detection target image may include metadata. The metadata here indicates information indicating the position of the detection target image such as two-dimensional coordinates, and the shape (rectangle, round) and size (length / height as viewed in pixels) of the detection target image. Information that represents the information and the nature of the detection target image (labels of people, animals, luggage, cars, etc.) may be included. This metadata may be used for machine learning training described later.

The detection target image may be specified by the image selection unit 112 of the central server 100 from the source image input by the user, or may be specified directly by a user instruction. As an example, when the background image is designated as an image of a construction site, the instruction receiving unit 134 of the client terminal 130 or the instruction receiving unit 144 of the client terminal 140 is displayed as a “target image” in the source image. It is assumed that the user is designated as “unweared person” and the request is received from the user. In this case, the instruction receiving unit 134 that has received this request transmits a user instruction to the central server 100, and the image selection unit 112 of the central server 100 selects a person who has not worn a helmet from the source image in accordance with the user request. You may specify as a detection target image.

Next, in step S240, a detection target image model is created. As used herein, the expression creating includes creating, creating, forming, preparing, and creating. The detection target image model is a model that embodies the shape and structure of the object shown in the detection target image. As the detection target image model, for example, a vector model, a point cloud, or a 3D model may be used. The detection target image model may be automatically performed by, for example, a known model creation tool. As will be described later, the detection target image model created here is processed in the image processing unit 116 of the central server 100 and a hostile generation network described later, so that it can be finished as an image closer to a real image. Good.

Next, in step S260, a final image is established. As used herein, the phrase establishing includes establishing, setting, establishing, creating, building, providing, and creating. The final image is an image generated by combining the background image acquired in step S200 and the detection target image model created in step S240. Specifically, the final image may be generated by the model creation unit 114 of the central server 100 combining the background image and the detection target image model. Details of the process of combining the background image and the detection target image model to generate the final image will be described with reference to FIG.

In this way, a background image is acquired, a detection target image including metadata is identified from the source image, a detection target image model corresponding to the detection target image is generated, and the background image and the detection target image model are combined. Then, by establishing the final image, it is possible to generate an image for machine learning even when there are few actual detection target images and it is difficult to obtain.

Next, the camera parameter calculation method according to the first embodiment of the present invention will be described with reference to FIG. As shown in FIG. 3, the background image 320, the ticket gate 321, the horizontal line 323, and the detection target image model 327 are used for the calculation of the camera parameters.

In the step of establishing the detection target image described above, in order to place the detection target image model 327 on the background image 320 with an appropriate size or the like, it is necessary to calculate the camera parameters of the background image 320. By using the camera parameters calculated here, the model creation unit 114 can place the detection target image model 327 in the background image 320 at an appropriate position, size, and orientation.

In order to calculate camera parameters, a reference object is first identified from the background image 320. The reference object here is an object that serves as a guide for the size of the detection target image model arranged in the background image 320. For example, the reference object may be one whose size is generally known or easily estimated and estimated. As an example, here, the ticket gate 321 may be identified as a reference object. Next, camera parameters are calculated based on the dimensional elements (height, length, etc.) of the identified reference object. Specifically, a reference such as a horizontal line 323 that matches the position of the ticket gate 321 identified as the reference object in the background image 320 is set, and the length and height of the ticket gate 321 as measured by the number of pixels are measured. The Then, by using the ratio obtained by dividing the actual length / height of the ticket gate 321 by the length / height as viewed from the number of pixels in the background image 320, the size of the image model to be detected should be Can be easily calculated. Therefore, the final image can be generated by combining the detection target image model 327 with the background image 320 based on the camera parameters calculated here.

As described above, the camera parameter is calculated based on the dimension element of the reference object, and the detection target image model is appropriately combined with the background image based on the calculated camera parameter. It can be arranged in size, position and orientation.

Next, a flow of a modification of the image generation method according to the first embodiment of the present invention will be described with reference to FIG.

First, in step S400, a background image is acquired. The background image acquisition here is substantially the same as the background image acquisition in S200 in FIG. 3, and thus the description thereof is omitted here.

Next, in step S420, a vector model representing the detection target image is arranged on the background image acquired in step S400. A vector model is a model that expresses the shape and structure of an object shown in a detection target image by a space vector. For example, an example of a vector model is shown in FIG. FIG. 5 is a diagram illustrating an example of a vector model and an image according to the first embodiment of the present invention. A vector model 531 shown in FIG. 5 is a vector model of the human body. For example, the vector model 531 may be arranged in the background image at an appropriate position, size, and orientation based on the camera parameters described with reference to FIG.

Next, in step S440, the vector model is adjusted. The adjustment of the vector model may be performed by the image processing unit 116 of the central server 100. Here, the vector model adjustment is an image closer to an actual image (hereinafter also referred to as “realistic image”) obtained by using the generally known image processing technique to arrange the vector model in step S420. To convert to. Specifically, the adjustment of the vector model here may be performed by, for example, a hostile generation network (sometimes referred to as a generative Adversary Network or GAN). As an example, the image processing unit 116 compares the vector model arranged in the background image with a model stored in the image / model database of the storage unit 120 of the central server 100, and the similarity between the vector model and the model is one of them. The highest image may be selected. For example, as shown in FIG. 5, the vector model 531 may be converted into a realistic image 532 such as a person holding a white cane. The final image may be generated by superimposing the selected image on the vector model.

Next, in step S460, machine learning training is performed using the final image generated in step S440. The machine learning here may be performed by the machine learning unit 118 of the central server 100. As described above, the final image obtained by combining the background image and the detection target image may be performed using a technique for training a neural network such as a hostile generation network. Since the details of the machine learning training process will be described later, a description thereof is omitted here.

Next, in step S480, the machine learning system trained in S460 is actually applied. For example, a machine learning system trained with images generated by the method of the present embodiment is used for actual training such as fully automatic driving car accident detection, structural crack detection, natural disaster simulation, etc. This is useful when it is difficult to obtain image data.

In this way, a good quality image for machine learning can be obtained by arranging and adjusting the vector model.

Next, an example of vector model / image association according to the first embodiment of the present invention will be described with reference to FIG.

As described above, the image generation according to the present invention may use a vector model and a realistic image corresponding to the vector model. Here, the association between a realistic image and a vector model will be described. First, a detection target that is a source of the detection target image 641 can be obtained by being photographed in a laboratory environment. Next, a generally known edge / orientation detection algorithm such as Openpose is applied to the detection target image 641 to generate a vector model superimposed on the detection target image 641. . Next, the vector model 643 and the detection target image 641 are stored in the image / model database 124 of the storage unit 120 as a model / image pair 645 associated with each other, whereby each of the processing units 110 of the central server 100 is stored. The functional part can be made accessible.

As described above, there is an advantage that the hostile generation network can be easily trained by storing the vector model and the realistic image in association with each other.

Next, an example of a process for establishing a detection target image according to the first embodiment of the present invention will be described with reference to FIG.

As shown in FIG. 7, an image showing the state of the station platform is acquired as the background image 301. As described with reference to FIG. 3, the camera parameters of the background image 301 are calculated based on reference objects such as the track 302 and the ticket gate 303. Next, the model creation unit 114 places a vector model 304 corresponding to the detection target image requested by the user in the background image 301 in accordance with the position, size, and orientation specified by the calculated camera parameter. Next, the image processing unit 116 generates a realistic image 305 corresponding to the vector model 304 and inserts it into the background image 301 at the same position, size, and orientation as the vector model 304. Note that at this stage, image processing such as light adjustment and edge harmony may be performed in order to merge this realistic image into the background image 301. In this manner, the final image 309 is obtained by combining the realistic image 305 corresponding to the detection target image and the background image 301. Further, as described above, this final image 309 may be used to train a machine learning method such as a neural network or a support vector machine. As described above, even when the number of detection targets is small and it is difficult to obtain, the machine learning image can be generated by using the invention of this embodiment.

In this embodiment, an example in which one detection target image is combined with a background image has been described. However, the present invention is not limited to this, and a plurality of detection target images are arranged in one background image by repeating the above steps. It is also possible to do.

Next, the flow of the image generation method according to the embodiment of the present invention will be described with reference to FIG.

First, in step S400, a background image is acquired. In step S420, the vector model is placed on the background image. In S440, the vector model is adjusted. Since these steps are substantially the same as the image generation method described with reference to FIG. 4, description thereof is omitted here.

In the above, an example in which the generated final image is used for machine learning training has been described. However, the present invention is not limited to this, and the final image may be used for another purpose. Accordingly, in step S800, the final image generated is provided after adjusting the vector model in step S440. This final image may be provided, for example, to the party who requested the image generation, or may be transmitted to a third party. In addition to machine learning training, the final image may be applied to advertising, face recognition, object detection, image processing, and the like. As described above, the image generation method according to the present invention may be applied not only to train machine learning but also to various fields.

Next, an example of processing for establishing a detection target image according to the second embodiment of the present invention will be described with reference to FIG.

Depending on the conditions for image generation request, it may be difficult or unnecessary to generate a model (such as a vector model) representing the detection target image. For example, as an example, when it is not necessary to describe detailed elements (for example, color, shape, structure, etc.) of the detection target image, in order to reduce the file size of the final image, a vector model or a realistic image is used. There are cases where a poor image may be substituted. Therefore, the process for establishing the detection target image according to the second embodiment of the present invention is not a vector model or a realistic image, but simply inserting a partial image into the background image as a detection target image, Solve the above problems.

In FIG. 9, an image showing the state of the station platform is acquired as the background image 301. In FIG. 9, as in FIG. 7, the background image 301 includes reference objects such as the track 302 and the ticket gate 303. Next, in the second embodiment, the position / size specified in the camera parameter (or the image generation condition input by the user) calculated based on the reference object such as the track 302 and the ticket gate 303 is set. Accordingly, the partial image 314 is defined as the detection target image. This partial image is, for example, an image that is made in an arbitrary size and shape and becomes a certain area in the background image. Although the partial image 314 shown in FIG. 9 is shown as a rectangular area, the shape of the area of the partial image 314 according to the present invention is not limited to a rectangle, and may be any shape. . As described above, by setting the background image 301 into which the partial image 314 is inserted as the final image 315, a final image having a smaller file size than the final image obtained by the image generation method described in the first embodiment can be obtained. Also, as shown in FIG. 9, an image that matches the size of the partial image 314 (for example, an image stored in the image database 122 or an image selected by the user) is inserted into the area of the partial image 314. Also good.

Thus, according to the second embodiment, an effect of suppressing the file size of the image is obtained.

Next, an example of the detection target image establishment process according to the third embodiment of the present invention will be described with reference to FIG.

Depending on the selected background image, it may be difficult to generate a model (such as a vector model) representing the detection target image because an object to be detected is unclear or incomplete. For example, as an example, if a part of the object to be detected is blurred, cut off, or appears in multiple images, it is difficult to create an accurate vector model, which is used for machine learning training. The generated image cannot be generated. Therefore, the process for establishing the detection target image according to the third embodiment of the present invention solves the above problem by inserting or replacing a clear partial image as a detection target image in the background image.

As shown in FIG. 10, an image indicating the state of the platform at the station where a candidate object (for example, a person) 324 as a detection target image is captured is acquired as the background image 301. However, since this candidate object 324 is, for example, unclear or partially missing, it may be difficult to generate a model that accurately represents this candidate object 324. In such a case, in the process for establishing the detection target image of the present embodiment, the partial image 325 is drawn so as to surround the whole or part of the candidate object 324. The partial image 325 may be designated by a user via a GUI or the like, or may be automatically generated by the machine learning unit 118. Next, the image processing unit 116 of the central server 100 performs image processing on the area where the partial image 325 is designated, thereby finishing the candidate object 324 as the detection target image 326, so that the detection target image 326 is displayed. By using the reflected background image 301 as the final image 329, a final image that can be used for machine learning is obtained. The partial image 325 here is substantially the same as the partial image 314 described in the third embodiment.

As described above, according to the third embodiment, even when the detection target image is unclear or incomplete, it is possible to generate a high-quality image and obtain an effect of pushing the file size of the image.

Next, an example of a machine learning image creation capability improving process according to the fourth embodiment of the present invention will be described with reference to FIG.

As described above, an aspect of the present invention relates to the use of an image generated by the above-described image generation method for machine learning training. Hereinafter, an example of improving the image generation capability of machine learning will be described with respect to a hostile generation network, but the present invention is not limited thereto, and may be applied to any machine learning method such as a support vector machine.

The hostile generation network is a network composed of two networks, a generation network (generator) and an identification network (discriminator), and learns by competing two data sets. Specifically, when a pair of a basic basic image and a target image desired to be generated in a network is input, the generation side generates and outputs a created image as a result. It is better that the created image is similar to the target image. The identification side compares the created image with the target image to determine the accuracy of the created image. Thus, the generator learns to deceive the discriminator, and the discriminator learns to identify more accurately

In the present embodiment, first, in step 1110, a detection target that is a source of the detection target image is photographed in a laboratory environment to obtain a detection target image. For example, as shown in FIG. 11, by photographing a person who has a white color, a person having a white color can be obtained as a detection target image. Next, in step 1120, a vector corresponding to the detection target image is applied by applying a generally known edge / orientation detection algorithm such as Openpose to the detection target image obtained in step 1110. A model is generated. For example, when the detection target image is a person, as shown in FIG. 11, a vector model representing a person's head, shoulders, arms, torso, legs, and the like may be generated. In addition, when the detection target has an edge such as a suitcase or a car, an edge extraction technique may be applied. These edges may be expressed by splines or the like.

In step 1130, the detection target image captured in step 1110 and the vector model generated in step 1120 are stored in the image / model database 124 of the storage unit 120 as a model / image pair associated with each other. It may be saved. Next, in step 1140, the background image in which the vector model is arranged is input as a basic image to a hostile generation network (sometimes called a second neural network). In step 1150, the generation network of the hostile generation network converts the vector model into a realistic image based on the model / image pair associated in step 1130, thereby realizing a realistic corresponding to the vector model. An image in which a simple image is reflected in the background image is created as a creation image.

Next, the hostile generation network compares the detection target (target image) photographed in step 1110 with the created image created in step 1150. Specifically, the identification network of the hostile generation network compares the metadata of the target image and the created image (information that defines the position, shape, size, property, etc. of the object in the image). Also good. Further, the identification network may compare the target image and the created image using a predetermined similarity criterion. This similarity criterion may be, for example, a threshold of the degree to which two or more images are similar to each other. If the target image and the generated image meet a predetermined similarity criterion (that is, if it is determined that the target image and the generated image are sufficiently similar to each other), the parameters of the hostile generation network are adjusted. The This parameter adjustment includes, for example, setting the conditions used for creating the created image so as to be applied to other image generation.

In this way, the basic image is input to the hostile generation network, a created image is created based on the basic image, the created image and the target image are compared, and the created image and the target image achieve a predetermined similarity criterion. In this case, a hostile generation network that can generate a high-quality final image can be obtained by adjusting the parameters of the hostile generation network.

Next, an example of an image generation method according to the fifth embodiment of the present invention will be described with reference to FIGS.

According to the image generation method of the present invention, it is possible not only to generate one model for one detection target, but also to generate a plurality of detection target models for one detection target. As illustrated in FIG. 12, one detection target 1203 may be captured by a plurality of

cameras

1207, 1208, and 1209. As described above, the detection target images captured by the

respective cameras

1207, 1208, and 1209 and the background images captured by the

respective cameras

1207, 1208, and 1209 are used for the image generation method described above, thereby performing the same detection. A final image showing the object 1203 from different perspectives can be generated.

Also, when the detection target moves, it is necessary to represent the verification target model as an image series in order to express the movement of the detection target. For example, assume that the detection target 1213 advances in the direction indicated by the arrow 1215 as shown in FIG. The movement of the detection target 1213 is imaged by the camera 1217. Therefore, by using the image captured by the camera 1217 in the image generation method described above, a detection target model (such as a vector model) can be generated for each frame of movement of the detection target 1213. By performing image processing using a neural network for each of these detection target models, an image series that smoothly represents the movement of the detection target 1213 can be obtained. Note that not only when the detection target moves, but also an image showing the same detection target in different illumination environments (for example, morning and night, or natural light and artificial light) can be generated.

In addition, although the example which produces | generates the detection target image which looked at the single detection target from the different viewpoint was demonstrated here, this invention is not limited to it, The detection target image model showing a several different object is the same background image It is also possible to combine them. Specifically, the model creation unit 114 may generate a first detection target model corresponding to the first detection target image and a second detection target image model corresponding to the second detection target image. Next, as described above, the model creation unit 114 acquires the first background image. Finally, the model creation unit 114 may insert the first detection target image model and the second detection target image model into the first background image.

Thus, by generating a plurality of detection target models for one detection target or generating an image series as a verification target model, an image with a high training effect can be obtained.

Next, an example of a machine learning detection accuracy improving process according to the sixth embodiment of the present invention will be described with reference to FIG.

As described above, an aspect of the present invention relates to using an image generated by the above-described image generation method for machine learning training. Hereinafter, an example of improving the object detection accuracy of machine learning will be described with respect to a neural network such as Faster-RCNN and SVM. However, the present invention is not limited to this and may be applied to any object detection algorithm or machine learning method. .

First, in order to optimize the first neural network (also referred to as an object detection neural network), metadata associated with the detection target image model is provided to the object detection neural network. As described above, the metadata may be information defining characteristics such as the position, shape, size, and property of the detection target model 402 in the image 401. The image 401 may be a final image generated by any of the image generation methods described above (for example, the final image 309 in FIG. 3, the final image 315 in FIG. 9, the final image 329 in FIG. 10, etc.). Alternatively, not only the metadata of the detection target model 402 but also the image 401 including the detection target model may be provided to the object detection neural network.

Next, the target image 404 is provided to the object detection network. The target image 404 is, for example, an image including a target object 405 that is the same as or similar to the detection target model 402 shown in the image 401, and is an image that is a target for object detection. Next, the object detection network performs object detection optimization 403 on the target image 404 and attempts to identify the target object 405 from the target image 404 based on the metadata of the detection target model 402. Specifically, the object detection network compares the metadata of the detection target model 402 with the object shown in the target image 404, and specifies the object having the highest matching with the metadata. As shown in FIG. 10, the object detection network may be indicated by a square area 406 or the like surrounding the specified target object 405.

Next, the specific accuracy of the object detection network is calculated based on the result of the object detection. This identification accuracy evaluates factors such as how much the object identified by the object detection network matches the detection target model 402, whether all target objects have been identified, or objects other than the target object have been incorrectly identified. The result is a process of expressing the result in a quantitative form. For example, the degree of specification may be expressed as a percentage such as 75% or 91%. As an example, when 9 out of 10 target objects are correctly identified, the calculated identification accuracy may be 90%. Next, the calculated specific accuracy may be compared with a predetermined specific accuracy criterion (a predetermined accuracy threshold). If the calculated specific accuracy does not achieve a predetermined specific accuracy criterion, it may be determined that the object detection optimization described above is repeated (that is, better specific accuracy by repeating object detection). Seeking).

In this way, the metadata associated with the detection target image model is provided to the object detection network, and based on the metadata, the detection target image is specified from the target image to the object detection network. Thus, by calculating the specific accuracy and optimizing the object detection, an effect of improving the detection accuracy of the object detection network can be obtained.

Next, an example of the system architecture according to the embodiment of the present invention will be described with reference to FIG.

As described above, the present invention may be configured as a client / server architecture. Specifically, as illustrated in FIG. 15, the user 1401 may specify a desired background image and a desired detection target via a terminal 1402 such as a computer, a tablet PC, or a smartphone. Next, the server on the cloud 1403 may generate a final image using the detection target 1409 specified by the user 1401 and the data stored in the background image 1408 and / or the storage unit 1404.

As another system architecture, a configuration not including the terminal 1402 is also possible. In this case, the camera 1405 may be directly connected to the cloud 1403, and an image or video captured by the camera 1405 may be transmitted to the image generation service provider without going through the user terminal. In this case, the user may contact a desired detection target using another means such as an e-mail, a telephone, or a smartphone.

100 central server, 110 processing unit, 112 image selection unit, 114 model creation unit, 116 image processing unit, 118 machine learning unit, 120 storage unit, 122 image database, 124 image / model database, 130 client terminal, 140 client terminal

Claims

An image generation method comprising:
A background image acquisition step of acquiring a background image by the image selection unit;
A detection target image specifying step of specifying, from the source image, a detection target image including metadata by the image selection unit;
A model generation step of generating a detection target image model corresponding to the detection target image by the model creation unit;
A detection target image establishing step of establishing a final image by combining the background image and the detection target image model by the model creating unit.
The image generation method according to claim 1, wherein the metadata includes information indicating a position of the detection target image, information indicating a shape and a size of the detection target image, and information indicating a property of the detection target image. .
The image generation method according to claim 1, wherein the detection target image model is selected from a vector model, a 3D model, and a point cloud model.
The model generation step includes
A vector model generating step for generating the vector model corresponding to the detection target image by the model generating unit;
The image generation method according to claim 3, wherein a machine learning unit performs image processing on the vector model to generate the detection target image model.
The detection target image establishment step includes:
An object identification step for identifying a reference object from the background image;
A camera parameter calculation step for calculating a camera parameter based on a dimension element of the reference object;
The image generation method according to claim 1, wherein a final image is established by a combining step of combining the detection target image model with the background image based on the calculated camera parameter.
An image generation method including a machine learning detection accuracy improving step,
The machine learning detection accuracy improving step is:
In order to optimize the first neural network, metadata associated with the detection target image model is provided to the first neural network, and the detection target image is selected from among target images based on the metadata. Detecting target image specifying training step for causing the first neural network to specify
According to the result of the detection target image identification training step, a specific accuracy calculation step for calculating a specific accuracy,
A specific accuracy determination step that determines that the detection target image specific training step is repeated when the specific accuracy does not achieve the predetermined specific accuracy criterion by comparing the specific accuracy with a predetermined specific accuracy criterion. The image generation method according to claim 1.
An image creation method including a machine learning image creation capability improvement step,
The machine learning image creation capability improvement step inputs a basic image to a second neural network, and creates a creation image creation step of creating a creation image based on the basic image;
A comparison step of comparing the created image with a target image;
When the created image and the target image achieve a predetermined similarity criterion, a parameter adjustment step of adjusting a parameter of the second neural network;
The image generation method according to claim 1, comprising:
The image generation method according to claim 7, wherein the second neural network is a hostile generation network.
The detection target image establishment step includes:
A first target model generation step of generating a first detection target image model corresponding to the first detection target image by the model creating unit;
A second target model generation step of generating a second detection target image model corresponding to the second detection target image by the model creation unit;
A first background image acquisition step of acquiring a first background image by the model creating unit;
Inserting the first detection target image and the second detection target image with respect to the first background image;
The image generation method according to claim 1, comprising:
The image generation method according to claim 1, wherein the detection target image model includes an image series corresponding to the detection target image.
The detection target image establishment step includes:
By the model creation unit,
If there are blurry parts in the source image,
The image generation method according to claim 1, wherein the final image is generated by replacing or inserting the portion with another clear image.
An image generation device,
An image selection unit that acquires a background image and identifies a detection target image including metadata from a source image;
A model generation unit for generating a detection target image model corresponding to the detection target image, and establishing a final image by combining the background image and the detection target image model;
An image generation apparatus having
The image generation apparatus according to claim 12, wherein the detection target image model is selected from a vector model, a 3D model, and a point cloud model.
The model creation unit
Generating the vector model corresponding to the detection target image;
The image generation device includes:
The image generation apparatus according to claim 13, further comprising a machine learning unit that performs image processing on the vector model to generate the detection target image model.
An image generation system in which a central server and a client terminal are connected via a network,
The client terminal has an image selection unit,
The central server has a model creation unit,
The image selection unit
A background image is obtained by user input,
Identify the detection target image with metadata from the source image,
Transmitting the background image and the detection target image to a central server;
The central server is
Receiving the background image and the detection target image from a client terminal;
The model creation unit
Generating a model for generating a detection target image model corresponding to the detection target image;
Establishing a final image by combining the background image and the detection target image model;
An image generation system characterized by that.