WO2021039192A1 - Image processing apparatus, image processing method, and program - Google Patents

Image processing apparatus, image processing method, and program Download PDF

Info

Publication number
WO2021039192A1
WO2021039192A1 (PCT/JP2020/027931)
Authority
WO
WIPO (PCT)
Prior art keywords
image
unit
information
area
removal
Prior art date
Application number
PCT/JP2020/027931
Other languages
French (fr)
Japanese (ja)
Inventor
高橋 修一
Original Assignee
ソニー株式会社 (Sony Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ソニー株式会社 (Sony Corporation)
Publication of WO2021039192A1 publication Critical patent/WO2021039192A1/en

Links

Images

Classifications

    • G06T5/77
    • G06T5/60
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 1/00 Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N 1/387 Composing, repositioning or otherwise geometrically modifying originals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N 23/60 Control of cameras or camera modules
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Definitions

  • The present technology relates to an image processing apparatus, an image processing method, and a program, and more particularly to an image processing apparatus for fitting an interpolated image into an additional information removal area of an input image.
  • In an image, not only the image information itself but also additional information such as logos, telops, and graphics is superimposed and displayed to provide the viewer with explanations and supplementary information. Examples include the name of a person superimposed on the image, the location of a landscape, and the logo mark of a program or a producer.
  • In addition, information requiring immediacy, such as news and weather information, may be superimposed and displayed regardless of the image.
  • Techniques for removing such additional information superimposed and displayed in an image have been provided conventionally. For example, Patent Document 1 discloses a processing device that detects a telop by detecting the pixels constituting its characters, removes those pixels, and then interpolates them either with the information of other pixels in the same frame (spatial interpolation) or with the information of the same pixels in another frame (temporal interpolation).
  • Further, Patent Document 2 discloses a telop erasing method in which, based on the fineness or roughness of the texture in the vicinity of the telop, the pixels in the telop region are interpolated by propagating pixel values from outside the region into it or by repeatedly copying several nearby pixels. Further, Patent Document 3 discloses a video/audio signal recording device that erases and interpolates a telop display inserted into a video signal by using an interpolation signal containing the original video signal information without the telop.
  • In the invention of Patent Document 1, when the pixel group of a telop or graphics that is always displayed in the image is to be removed, that pixel group does not change over time, so temporal interpolation cannot be performed and spatial interpolation must be applied. However, Patent Document 1 does not specifically describe what kind of spatial interpolation is performed.
  • The purpose of the present technology is to fit an interpolated image well into the additional information removal area of an input image.
  • The present technology is an image processing apparatus including a processing unit that generates, using a generative neural network, an interpolated image to be fitted into a removal region of an input image, and fits the interpolated image into the removal region of the input image to obtain an output image.
  • In the present technology, the processing unit generates an interpolated image to be fitted into the removal area of the input image using a generative neural network, and fits this interpolated image into the removal area of the input image to obtain an output image.
  • For example, the processing unit may include: a segmentation processing unit that performs segmentation processing on the input image to obtain area information of each subject included in the image; a training data area designation unit that designates, on the input image, a data area usable as the training data required for training the generative neural network, based on the information of the removal area and the area information of each subject; a training data generation unit that generates training data from the input image based on this data area; a generative network learning unit that trains the generative neural network using this training data; an interpolated image generation unit that generates, using the generative neural network, the interpolated image to be fitted into the removal area; and an image integration unit that fits the interpolated image into the removal area of the input image to obtain the output image.
  • In this case, for example, the generative network learning unit may perform further training with the training data, starting from a generative neural network that has been pre-trained to generate images of the same type as the subject corresponding to the removal region. This makes it possible to efficiently train, with a small amount of training data, the generative neural network that generates the interpolated image to be fitted into the removal region.
  • As described above, in the present technology an interpolated image to be fitted into the removal region of the input image is generated using a generative neural network. Therefore, an interpolated image with reduced visual discomfort can be fitted into the removal region of the input image without using a special signal for interpolation.
  • a detection unit for obtaining information on the removal region included in the input image may be further provided.
  • the detection unit may obtain information on the removal region based on information from the outside. This makes it possible to improve the accuracy of the information in the removed area.
  • a processing update unit that updates the function of the processing unit based on the evaluation information of the output image may be further provided. This makes it possible to improve the interpolated image generated by the processing unit to a more appropriate one.
  • the processing update unit may update the function of the processing unit based on the external setting information. This makes it possible to make improvements efficiently.
  • FIG. 1 shows a configuration example of the television receiving device 10 as an embodiment.
  • the television receiving device 10 includes a receiving antenna 101, a digital broadcast receiving unit 102, a display unit 103, a recording / reproducing unit 104, an image processing unit 105, a CPU 106, and a user operation unit 107.
  • the CPU 106 controls the operation of each part of the television receiving device 10.
  • the user can perform various operation inputs by the user operation unit 107.
  • the user operation unit 107 includes a remote control unit, a touch panel unit that performs operation input by proximity / touch, a mouse, a keyboard, a gesture input unit that detects operation input with a camera, a voice input unit that is operated by voice, and the like.
  • the digital broadcast receiving unit 102 processes the television broadcast signal input from the receiving antenna 101 to obtain an image signal related to the broadcast content.
  • the recording / reproducing unit 104 records the image signal obtained by the digital broadcasting receiving unit 102 and reproduces it at an appropriate timing.
  • the display unit 103 displays an image based on the image signal obtained by the digital broadcast receiving unit 102 or the image signal reproduced by the recording / reproducing unit 104.
  • the image processing unit 105 performs image processing on the image signal reproduced by the recording / reproducing unit 104, returns the processed image signal to the recording / reproducing unit 104, and causes the image signal to be recorded as the processed image signal.
  • The image processing unit 105 detects an area of additional information such as a logo, telop, or graphics in the input image as a removal area, generates an interpolated image to be fitted into the removal area using a generative neural network, and fits this interpolated image into the removal area of the input image to obtain an output image.
  • The processing of the image processing unit 105 is performed along with the reproduction of the pre-processing image signal and the recording of the post-processing image signal in the recording/reproducing unit 104. At this time, the image based on the reproduced image signal may or may not be displayed on the display unit 103. When it is displayed, the user can also selectively specify which additional information should be removed. It is also conceivable that the image processing unit 105 performs image processing in real time on the received image signal obtained by the digital broadcast receiving unit 102. The overall detect, generate, and fit flow is sketched below.
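As a rough orientation only, the following is a minimal sketch of this detect, generate, and fit flow, assuming NumPy image arrays; detect_removal_mask and generate_interpolated_patch are hypothetical stand-ins for the detection unit 200 and the processing unit 400 described below, not functions defined in this document.

```python
import numpy as np

def process_frame(frame: np.ndarray,
                  detect_removal_mask,
                  generate_interpolated_patch) -> np.ndarray:
    """Detect the additional-information area, generate an interpolated
    patch with a generative model, and composite it into the frame.

    detect_removal_mask and generate_interpolated_patch are hypothetical
    callables standing in for the detection unit 200 and the processing
    unit 400 described in the text.
    """
    mask = detect_removal_mask(frame)                   # boolean HxW mask of the removal area
    patch = generate_interpolated_patch(frame, mask)    # full-frame image from the generator
    out = frame.copy()
    out[mask] = patch[mask]                             # fit the interpolated pixels into the removal area
    return out
```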
  • FIG. 2 shows a configuration example of the image processing unit 105.
  • the image processing unit 105 has a detection unit 200, a removal unit 300, and a processing unit 400.
  • the input image signal is supplied to the detection unit 200 and the removal unit 300.
  • the detection unit 200 obtains information on a removal area (for example, an area of additional information such as a logo, telop, graphics, etc.) included in the input image.
  • the information of this removal area is supplied to the removal unit 300 and the processing unit 400.
  • the removal unit 300 refers to the information of the removal area and obtains an image obtained by removing the image of the removal area from the input image.
  • the image signal output from the removal unit 300 is supplied to the processing unit 400.
  • The processing unit 400 performs segmentation processing on the input image to obtain information on the area and type of each subject included in the image, refers to this information and the information on the removal area, generates an interpolated image to be fitted into the removal area of the input image using a generative neural network, and fits the interpolated image into the removal area of the input image to obtain an output image.
  • the output image signal is output from the processing unit 400.
  • FIG. 3 shows an example of the image state in each part of the image processing unit 105 shown in FIG.
  • FIG. 3A shows an input image of the detection unit 200.
  • the detection unit 200 detects the logo of the program existing in the upper left of the input image, and the area of the logo is set as the removal area.
  • the removal unit 300 obtains an image obtained by removing the image of the removal region from the input image.
  • the processing unit 400 performs segmentation processing on the input image to obtain information on the area and type of each subject included in the image.
  • the input image is divided into each subject area, and a label indicating the type is given to each area.
  • it is divided into bridge, sky, mountain, and water areas.
  • In the processing unit 400, the information on the area and type of each subject and the information on the removal area are referred to, and, as shown in FIG. 3(e), the interpolated image to be fitted into the removal area is generated using the generative neural network. In the illustrated example, the processing unit 400 determines that it is desirable to interpolate bridge information in the removal region, and a generative neural network that has learned bridge information in advance is used to generate a plausible bridge texture as the interpolated image.
  • FIG. 3F shows an output image obtained by fitting the interpolated image obtained by the processing unit 400 into the removal region of the input image.
  • Since the pattern is generated in consideration of the meaning within the image, rather than by directly reusing the pixel information of neighboring pixels or of preceding and following frames, unnaturalness and visual discomfort can be significantly suppressed.
  • Examples of policies for detecting target areas such as telops and graphics are shown below. - A telop may be determined based on the frame difference between preceding and following frames. - In the case of a broadcast program, the area in which a telop or graphics will be displayed can be predicted by analyzing past broadcasts, so that analysis result may be used. - The viewer's setting information, such as "remove only the program logo without erasing the telop" or "remove the telop and the program logo, but do not remove the graphics", may also be referred to.
  • An example of the processing policy of the processing unit 400 is shown below. - Fit a new texture or the like into the area from which the telop or the like has been removed. - Generate and fit a plausible pattern using a generative neural network.
  • The most probable texture for the area to be interpolated may be estimated, and the generative network, or the training data for the network, may be selected according to the result.
  • Alternatively, the segmentation information of the scene may be used, and the information of regions having the same segmentation result as the region to be interpolated may be used as training data. Since training can then be performed on information that is more highly correlated with the interpolation area, a more plausible pattern is likely to be generated.
  • Note that the removal unit 300 is not always necessary, because when the processing unit 400 fits the interpolated image into the removal region of the input image, the image in the removal region of the input image is substantially removed.
  • The processing of each part of the image processing unit 105 can also be performed by software processing on a computer.
  • FIG. 4 shows a configuration example of the detection unit 200.
  • the detection unit 200 includes a change area determination unit 201, a chapter information recording unit 202, a change information recording unit 203, a removal area position identification unit 204, a removal information type determination unit 205, and a removal information recording unit 206.
  • the removal area position specifying unit 204 and the removal information type determination unit 205 constitute a removal area determination unit.
  • The change area determination unit 201 refers to a group of input images and determines which areas of the image change and which do not. For this determination, information about one or more chapters assigned according to the content of the input image is referred to: a region that shows no change even across chapter boundaries is judged to be a steadily unchanging region, while a region in which changes are observed over time even within a chapter is judged to be a steadily changing region.
  • For example, when detecting a program logo, the following measures can be taken (a sketch of the chapter-based approach is given after this list). - The logo is often displayed throughout the program, and its area does not change even when the chapter changes; therefore, an area that does not change across chapters is judged to be the logo (information that is constantly displayed). - For a specific program, the area where the logo is displayed can be identified almost uniquely, so the position of the logo can be determined by acquiring information about the logo from the outside.
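The chapter-based idea above can be illustrated as follows. This is a sketch under the assumption that frames are grayscale NumPy arrays and chapter boundaries are known frame indices; the threshold and the one-frame-per-chapter sampling are illustrative choices, not taken from the document.

```python
import numpy as np

def static_region_mask(frames: np.ndarray, chapter_starts: list[int],
                       diff_threshold: float = 2.0) -> np.ndarray:
    """Return a boolean mask of pixels that stay (nearly) unchanged across
    chapter boundaries, i.e. candidates for a constantly displayed logo.

    frames: array of shape (T, H, W) with grayscale frames.
    chapter_starts: frame indices at which each chapter begins.
    """
    # Take one representative frame per chapter.
    samples = frames[chapter_starts].astype(np.float32)   # (C, H, W)
    # A pixel counts as "unchanged" if its maximum deviation from the
    # first sampled chapter frame stays below the threshold.
    deviation = np.abs(samples - samples[0:1]).max(axis=0)
    return deviation < diff_threshold
```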
  • In the case of telops, whether a telop is displayed and what it contains change over time depending on the content being shown.
  • For example, for a telop displayed at the lower right, it can be judged that information with high urgency is not being displayed there, and therefore that non-stationary information without urgency is being displayed.
  • In addition, the area where a telop is displayed can be identified almost uniquely, so the position of the explanatory information can be determined by acquiring information about the telop from the outside.
  • the chapter information recording unit 202 records information about the chapter.
  • the change information recording unit 203 records information regarding the presence or absence of a change in each area with respect to the input image, which is determined by the change area determination unit 201.
  • The removal area position specifying unit 204 identifies the position of the area to be removed from the input image based on the information regarding the presence or absence of change for each area recorded in the change information recording unit 203, and outputs it as the position information of the removal area.
  • The removal information type determination unit 205 determines, based on the information regarding the presence or absence of change for each area recorded in the change information recording unit 203, what kind of information each area contains, for example whether it is a program logo or a telop (explanatory information) and whether it is urgent, and outputs the result as the type information of the removal information.
  • the removal information recording unit 206 records information on the removal area of the input image. That is, the removal information recording unit 206 integrates and records the position information of the removal area and the type information of the removal information.
  • the detection unit 200 outputs the information of the removal area recorded in the removal information recording unit 206 as an output signal.
  • FIG. 5 shows a configuration example of the processing unit 400.
  • The processing unit 400 has a segmentation processing unit 401, a segmentation information recording unit 402, a learning data area designation unit 403, a learning data generation unit 404, a learning data recording unit 405, a generative network learning unit 406, a setting parameter recording unit 407, a generative model recording unit 408, an interpolated image generation unit 409, and an image integration unit 410.
  • The learning data generation unit 404 and the generative network learning unit 406 constitute a learning unit, and the interpolated image generation unit 409 and the image integration unit 410 constitute an interpolation processing unit.
  • the segmentation processing unit 401 performs segmentation processing on the input image.
  • the segmentation processing unit 401 divides an area for each subject, and then assigns a label indicating what the subject is, that is, the type of the subject, for each area.
  • the segmentation information recording unit 402 records the area information of the subject obtained by the segmentation process and the label information in association with each other.
  • The learning data area designation unit 403 refers to the information of the removal area supplied from the detection unit 200, and further to the area information and label information of each subject obtained by the segmentation process described above, and designates, on the input image, the area of data that can be used as training data, in order to construct the training data necessary for training the network that interpolates the removal area.
  • In the illustrated example, the segmentation result containing the removal area is a bridge, so the entire area determined to be a bridge on the input image is designated as the data to be used for constructing the training data.
  • In this case, the very image containing the removal region can be used for the training data, so texture generation with extremely high reproducibility can be performed. (A sketch of gathering such same-label training patches is shown below.)
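The following sketch illustrates one way to gather such same-label training patches; the patch size, stride, and majority-label rule are illustrative assumptions, not details given in the document.

```python
import numpy as np

def same_label_patches(image: np.ndarray, seg_labels: np.ndarray,
                       removal_mask: np.ndarray, patch: int = 64,
                       stride: int = 32) -> list[np.ndarray]:
    """Collect training patches from regions sharing the segmentation
    label of the removal area (e.g. 'bridge'), excluding the removal
    area itself.

    image: (H, W, 3) input frame.
    seg_labels: (H, W) integer label map from the segmentation step.
    removal_mask: (H, W) boolean mask of the area to be removed.
    """
    # The majority label inside the removal area decides which subject
    # the interpolated texture should imitate.
    target_label = np.bincount(seg_labels[removal_mask]).argmax()
    usable = (seg_labels == target_label) & ~removal_mask

    patches = []
    h, w = seg_labels.shape
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            # Keep a patch only if it lies entirely in the usable region.
            if usable[y:y + patch, x:x + patch].all():
                patches.append(image[y:y + patch, x:x + patch])
    return patches
```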
  • the learning data generation unit 404 configures the learning data by referring to the area of the input image that can be used for the learning data specified by the learning data area designation unit 403.
  • Any of the following measures may be taken: composing the training data only from this input image, reusing data widely acquired based on a predetermined label (for example, "bridge"), or combining both.
  • the learning data recording unit 405 records the learning data generated by the learning data generation unit 404.
  • The generative network learning unit 406 reads the training data from the learning data recording unit 405 and trains the generative neural network used for the interpolation processing. In this case, for example, further training may be performed using the training data, starting from a generative neural network that has been pre-trained to generate images of the same type as the subject corresponding to the removal region. This makes it possible to efficiently train, with a small amount of training data, the generative neural network that generates the interpolated image to be fitted into the removal region.
  • As the generative neural network, for example, a generative adversarial network (GAN) can be used.
  • A GAN has a structure in which a generator and a discriminator within the network compete with each other in an adversarial manner. Repeatedly pitting the generator against the discriminator makes it possible to improve the performance of the generator. In this technique, both the generator and the discriminator need to be trained.
  • For example, the generator may be trained in advance so that many patterns can be generated, and additional training may then be performed to specialize it to the input image. (A minimal fine-tuning sketch follows.)
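As an illustration of such fine-tuning, the following PyTorch-style sketch adversarially updates a pre-trained generator and discriminator on patches taken from the input image (see the earlier sketch). The network interfaces, batch size, and hyper-parameters are assumptions, not specified in the document.

```python
import torch
import torch.nn as nn

def finetune_gan(generator: nn.Module, discriminator: nn.Module,
                 patches: torch.Tensor, steps: int = 500,
                 z_dim: int = 128, lr: float = 2e-4) -> None:
    """Adversarially fine-tune a pre-trained generator/discriminator pair
    on patches cut from the input image.

    patches: tensor of shape (N, 3, P, P), values scaled to [-1, 1].
    The generator is assumed to map a latent vector to a (3, P, P) patch,
    and the discriminator to output one logit per patch.
    """
    bce = nn.BCEWithLogitsLoss()
    opt_g = torch.optim.Adam(generator.parameters(), lr=lr, betas=(0.5, 0.999))
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=lr, betas=(0.5, 0.999))

    for _ in range(steps):
        real = patches[torch.randint(len(patches), (16,))]
        z = torch.randn(16, z_dim)
        fake = generator(z)

        # Discriminator: real patches -> 1, generated patches -> 0.
        d_loss = (bce(discriminator(real), torch.ones(16, 1))
                  + bce(discriminator(fake.detach()), torch.zeros(16, 1)))
        opt_d.zero_grad()
        d_loss.backward()
        opt_d.step()

        # Generator: try to make the discriminator predict 1 for fakes.
        g_loss = bce(discriminator(fake), torch.ones(16, 1))
        opt_g.zero_grad()
        g_loss.backward()
        opt_g.step()
```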
  • The setting parameter recording unit 407 records parameters related to the type and specifications of the training data used by the learning data generation unit 404, and parameters related to training, such as the number of training iterations and the learning rate, in the generative network learning unit 406.
  • the generation model recording unit 408 records the network model generated by the generation network learning unit 406. Here, a plurality of network models may be recorded.
  • the interpolated image generation unit 409 selects an appropriate network model from the generation model recording unit 408, and generates a texture most suitable for the removal area as an interpolated image.
  • The image integration unit 410 fits the interpolated image generated by the interpolated image generation unit 409 into the removal area of the input image, integrates the result as a single image, and outputs it as the output image (a compositing sketch follows).
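A minimal compositing sketch is shown below; the feathered mask edge is an illustrative choice for reducing visible seams and is not required by the document.

```python
import numpy as np

def integrate(input_image: np.ndarray, interpolated: np.ndarray,
              removal_mask: np.ndarray, feather: int = 2) -> np.ndarray:
    """Composite the generated texture into the removal area.

    A soft (feathered) mask edge is used purely to reduce visible seams;
    the document only requires that the interpolated pixels replace the
    removal area.
    """
    mask = removal_mask.astype(np.float32)
    # Simple box blur of the mask edge (illustrative feathering).
    for _ in range(feather):
        mask = (mask
                + np.roll(mask, 1, 0) + np.roll(mask, -1, 0)
                + np.roll(mask, 1, 1) + np.roll(mask, -1, 1)) / 5.0
    mask = np.clip(mask, 0.0, 1.0)[..., None]            # (H, W, 1)
    out = (1.0 - mask) * input_image + mask * interpolated
    return out.astype(input_image.dtype)
```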
  • In the above description, the detection unit 200 is arranged within the image processing unit 105 together with the processing unit 400; however, the same effect can be obtained by arranging the entire detection unit 200 outside the image processing unit 105 and inputting the input image, with its removal area already specified, to the processing unit 400 arranged inside the image processing unit 105.
  • the flowchart of FIG. 6 shows an example of the processing procedure of the image processing unit 105 shown in FIG. In this example, the process of removing the image in the removal region from the input image in the removal unit 300 is omitted.
  • the processing procedure of the image processing unit 105 is roughly divided into two steps, a detection step and a processing step.
  • the detection step which is the process of the detection unit 200, will be described.
  • the detection step is composed of a change area determination step ST1, a removal area position identification step ST2, and a removal information type determination step ST3.
  • In the change area determination step ST1, the areas with and without change in the image are determined with reference to a group of input images. Information about one or more chapters assigned according to the content of the input image is referred to: a region that shows no change even across chapter boundaries is determined to be a steadily unchanging region, while a region in which changes are observed over time even within a chapter is determined to be a steadily changing region.
  • In the removal area position identification step ST2, the position of the area to be removed is identified from the input image and output as the removal position information.
  • In the removal information type determination step ST3, it is determined what kind of information the area to be removed contains (for example, whether it is a program logo or a telop, and whether it is urgent), and the result is output as the type information.
  • the processing step is composed of a segmentation processing step ST4, a learning data area designation step ST5, a learning data generation step ST6, a generation network learning step ST7, an interpolation image generation step ST8, and an image integration step ST9.
  • In the segmentation processing step ST4, segmentation processing is performed on the input image: after the image is divided into regions for each subject, each region is given a label indicating what the subject is, that is, its type.
  • In the learning data area designation step ST5, the removal information obtained in the removal area position identification step ST2 and the removal information type determination step ST3, and the area information and label information of each subject obtained in the segmentation processing step ST4, are referred to, and the area of data usable as training data is designated on the input image.
  • In the learning data generation step ST6, the training data is constructed by referring to the area of the input image, designated in the learning data area designation step ST5, that can be used for the training data. As described above, any of the following may be done: composing the training data only from this input image, reusing data widely acquired based on a predetermined label, or combining both.
  • In the generative network learning step ST7, the training data generated in the learning data generation step ST6 is read out, and the generative neural network used for the interpolation processing is trained.
  • In the interpolated image generation step ST8, an appropriate generative neural network trained in the generative network learning step ST7 is selected, and the texture most suitable for the removal region is generated as an interpolated image.
  • In the image integration step ST9, the interpolated image generated in the interpolated image generation step ST8 is fitted into the removal area of the input image, integrated as one image, and output as the output image. (An illustrative chain of these steps is sketched below.)
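To make the ordering of steps ST4 to ST9 concrete, the following sketch chains the illustrative helpers from the earlier sketches (same_label_patches, finetune_gan, integrate); segment and synthesize are hypothetical callables (a segmentation model and a step that turns the fine-tuned generator into a full-frame texture), and none of this is meant as the document's definitive implementation.

```python
import numpy as np
import torch

def processing_steps(frame: np.ndarray, removal_mask: np.ndarray,
                     segment, generator, discriminator, synthesize) -> np.ndarray:
    """Illustrative chain of steps ST4 to ST9 using the helpers sketched above.

    segment:    callable returning an integer label map for the frame (ST4).
    synthesize: callable turning the fine-tuned generator into a
                full-frame texture image for the removal area (ST8).
    """
    labels = segment(frame)                                     # ST4: segmentation
    patches = same_label_patches(frame, labels, removal_mask)   # ST5/ST6: training data
    patch_tensor = torch.from_numpy(np.ascontiguousarray(
        np.stack(patches).transpose(0, 3, 1, 2))).float() / 127.5 - 1.0
    finetune_gan(generator, discriminator, patch_tensor)        # ST7: train the GAN
    interpolated = synthesize(generator, frame.shape)           # ST8: generate texture
    return integrate(frame, interpolated, removal_mask)         # ST9: integrate
```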
  • As described above, the processing unit 400 uses a generative neural network to generate the interpolated image to be fitted into the removal region of the input image. Therefore, an interpolated image with reduced visual discomfort can be fitted into the removal region of the input image without using a special signal for interpolation.
  • Further, the generative network learning unit 406 of the processing unit 400 performs training starting from a generative neural network that has been pre-trained to generate images of the same type as the subject corresponding to the removal region. Therefore, the generative neural network that generates the interpolated image to be fitted into the removal region can be trained efficiently with a small amount of training data.
  • FIG. 7 shows another configuration example of the detection unit 200.
  • the detection unit 200A will be described.
  • the parts corresponding to those in FIG. 4 are designated by the same reference numerals, and detailed description thereof will be omitted as appropriate.
  • Like the detection unit 200 shown in FIG. 4, the detection unit 200A has the change area determination unit 201, the chapter information recording unit 202, the change information recording unit 203, the removal area position identification unit 204, the removal information type determination unit 205, and the removal information recording unit 206.
  • the removal area position identification unit 204 and the removal information type determination unit 205 can refer to external information.
  • The removal area position specifying unit 204 identifies the position of the area to be removed from the input image based on the information regarding the presence or absence of change for each area recorded in the change information recording unit 203, and outputs it as the position information.
  • In addition, (1) having the viewer specify the area to be removed, (2) referring to information about the program being watched, and (3) referring to information about various events that occurred on that day also help to locate the area to be removed from the input image.
  • For example, regarding (1), when the viewer specifies that there is an area to be removed in the upper-left area of the image, it is possible to determine whether or not there is removable information in that area. Further, regarding (2), by acquiring, from information about the program being watched, the appearance and display position of the program logo or the insertion position and size of telops, it is possible to determine in which area of the screen there is information that can be a removal target.
  • Further, regarding (3), from information on events such as incidents, accidents, disasters, and elections that occurred on or around the day the program was broadcast, it is possible to predict whether breaking news, weather information, and the like will be displayed and at which position in the image such information is likely to appear. This makes it possible to accurately estimate the position of the removable region.
  • The removal information type determination unit 205 determines, based on the information regarding the presence or absence of change for each area recorded in the change information recording unit 203, what kind of information each area contains, and outputs the result as the type information.
  • By also referring to the above-mentioned information (1) to (3), the type of information displayed in the removable area can be estimated with high accuracy.
  • FIG. 8 shows another configuration example of the processing unit 400.
  • the processing unit 400A will be described.
  • the parts corresponding to those in FIG. 5 are designated by the same reference numerals, and detailed description thereof will be omitted as appropriate.
  • Like the processing unit 400 shown in FIG. 5, this processing unit 400A has a segmentation processing unit 401, a segmentation information recording unit 402, a learning data area designation unit 403, a learning data generation unit 404, a learning data recording unit 405, a generative network learning unit 406, a setting parameter recording unit 407, a generative model recording unit 408, an interpolated image generation unit 409, and an image integration unit 410.
  • In addition, the processing unit 400A has an evaluation unit 411 that evaluates the output image, an evaluation information recording unit 412 that records the evaluation information, and a processing update unit 413 that determines, based on the evaluation information, whether or not to update the functions of the processing unit 400A.
  • the image integration unit 410 fits the interpolated image generated by the interpolation image generation unit 409 into the removal area of the input image, integrates it as a single image, and outputs it as an output image.
  • the evaluation unit 411 evaluates whether or not the output image is appropriate, and outputs the evaluation result as evaluation information.
  • Possible methods of evaluation include having the viewer directly and explicitly input the evaluation using a remote controller or a terminal, obtaining it by measuring the viewer's line of sight, emotion, or biological information, and inferring it from the viewer's voice information.
  • Based on the evaluation information, the processing update unit 413 determines how the functions of the segmentation processing unit 401, the learning data area designation unit 403, the learning data generation unit 404, the generative network learning unit 406, the setting parameter recording unit 407, and the like in the processing unit 400A should be updated, and then updates those functions if an update is determined to be necessary.
  • For the segmentation processing unit 401, it is conceivable to update the segmentation method and the number of classes. If a region that could not be classified by the previous segmentation method can now be recognized and labeled accurately, a texture with higher reproduction accuracy can be generated and fitted when interpolating that region.
  • For the learning data area designation unit 403, improving the accuracy of the area designated as training data makes it possible to avoid mixing mistakenly identified training data into training data composed of a specific label, and the learning performance in the generative network learning unit is improved.
  • For the learning data generation unit 404, it is conceivable to change among the means of composing the training data only from the input image, reusing data widely acquired based on a predetermined label, or combining both, or to correct the labels used. For the generative network learning unit 406, it is conceivable to update the network structure.
  • For the setting parameter recording unit 407, it is conceivable to update the parameters related to the type and specifications of the training data used in the learning data generation unit 404, and the parameters related to training, such as the number of training iterations and the learning rate, in the generative network learning unit 406.
  • FIG. 9 shows yet another configuration example of the processing unit 400.
  • Hereinafter, this configuration will be described as the processing unit 400B.
  • the parts corresponding to those in FIG. 8 are designated by the same reference numerals, and detailed description thereof will be omitted as appropriate.
  • the basic configuration of the processing unit 400B is the same as that of the processing unit 400A shown in FIG.
  • However, when the processing update unit 413 determines whether or not to update the functions in the processing unit 400B, not only the evaluation information but also external setting information is used.
  • As described above, the processing update unit 413 determines, based on the evaluation information, how each function in the processing unit 400A (400B) should be updated, and then updates the function if an update is determined to be necessary.
  • In the processing unit 400B, the processing update unit 413 further uses setting information that can be obtained from the outside, so that the functions of the segmentation processing unit 401, the learning data area designation unit 403, the learning data generation unit 404, the generative network learning unit 406, and the setting parameter recording unit 407 in the processing unit 400B can be updated in the same manner. For example, it is conceivable to use the setting information of another viewer's processing unit having the same functions as the processing unit 400B, together with that viewer's evaluation information. This makes it possible to make improvements efficiently.
  • the present technology can have the following configurations.
  • (1) An image processing apparatus including a processing unit that generates, using a generative neural network, an interpolated image to be fitted into a removal region of an input image, and fits the interpolated image into the removal region of the input image to obtain an output image.
  • (2) The image processing apparatus according to (1), wherein the processing unit includes: a segmentation processing unit that performs segmentation processing on the input image to obtain area information of each subject included in the image; a learning data area designation unit that designates, on the input image, a data area usable as the training data required for training by the generative neural network, based on the information of the removal area and the area information of each subject; a learning data generation unit that generates training data from the input image based on the data area; a generative network learning unit that trains the generative neural network using the training data; an interpolated image generation unit that generates, using the generative neural network, the interpolated image to be fitted into the removal region; and an image integration unit that fits the interpolated image into the removal region of the input image to obtain the output image.
  • (3) The image processing apparatus according to (2), wherein the generative network learning unit performs further training with the training data, starting from a generative neural network that has been pre-trained to generate an image of the same type as the subject corresponding to the removal region.
  • (4) The image processing apparatus according to any one of (1) to (3) above, further comprising a detection unit for obtaining information on the removal region included in the input image.
  • (5) The image processing apparatus according to (4), wherein the detection unit obtains the information on the removal region based on information from the outside.
  • (6) The image processing apparatus according to any one of (1) to (5) above, further comprising a processing update unit that updates the function of the processing unit based on evaluation information of the output image.
  • (7) The image processing apparatus according to (6), wherein the processing update unit further updates the function of the processing unit based on external setting information.
  • An image processing method comprising a procedure of generating an interpolated image to be fitted in a removal region of an input image using a generation neural network, and fitting the interpolated image into the removal region of the input image to obtain an output image.
  • A program that causes a computer to function as processing means that generates, using a generative neural network, an interpolated image to be fitted into the removal area of an input image, and fits the interpolated image into the removal area of the input image to obtain an output image.
  • A change area determination unit that determines, with reference to the input image, an area with change and an area without change in the image; a chapter information recording unit in which information about chapters of the input image is recorded; a change information recording unit that records information regarding the presence or absence of change in each region of the input image; a removal area position identification unit that identifies the position of the area to be removed from the input image based on the information regarding the presence or absence of change for each area recorded in the change information recording unit, and outputs it as removal position information; and a removal information type determination unit that determines, based on the information regarding the presence or absence of change for each area recorded in the change information recording unit, what kind of information the area has, and outputs it as type information.
  • a segmentation processing unit that performs segmentation processing on the input image
  • a segmentation information recording unit that records the area information of the subject obtained by the segmentation process and the label information in association with each other.
  • A learning data area designation unit that designates, on the input image, the area of data that can be used as training data, in order to construct the training data necessary for training by the network that interpolates the removal area; and a learning data generation unit that generates training data by referring to the input image area, designated by the learning data area designation unit, that can be used for the training data.
  • A learning data recording unit that records the training data generated by the learning data generation unit; and a generative network learning unit that reads the training data from the learning data recording unit and trains a generative neural network used for interpolation processing.
  • A setting parameter recording unit that records parameters related to the type and specifications of the training data used in the learning data generation unit, and parameters related to training, such as the number of training iterations and the learning rate, in the generative network learning unit.
  • A generative model recording unit that records the network model generated by the generative network learning unit.
  • An interpolated image generator that selects an appropriate network model from the generated model recording unit and generates the texture that best fits the removal area as an interpolated image.
  • An image processing apparatus including an image integrating unit that fits the interpolated image generated by the interpolated image generation unit into a corresponding area of an input image, integrates the interpolated image as a single image, and outputs the output image.
  • The image processing apparatus according to (10) or (11), further comprising: an evaluation unit that evaluates the quality of the image output from the image integration unit; and a processing update unit that determines whether or not to update the functions based on the evaluation information obtained by the evaluation unit.
  • An image processing method comprising: a change area determination step of determining, with reference to the input image, an area with change and an area without change in the image; a removal area position identification step of identifying the position of the area to be removed from the input image based on the information regarding the presence or absence of change for each area recorded in the change information recording unit, and outputting it as removal position information; a removal information type determination step of determining, based on the information regarding the presence or absence of change for each area recorded in the change information recording unit, what kind of information the area has, and outputting it as type information; a segmentation processing step of performing segmentation processing on the input image; a learning data area designation step of designating, on the input image, the area of data that can be used as training data, by referring to the removal information recorded in the removal information recording unit, in order to construct the training data necessary for training by the network that interpolates the removal area; and an image integration step of fitting an interpolated image generated by the interpolated image generation unit into the corresponding region of the input image, integrating it as a single image, and outputting it as an output image.

Abstract

An interpolated image is fitted well into an additional information removal area of an input image. The interpolated image to be fitted into the removal area of the input image is generated using a generative neural network, and the interpolated image is fitted into the removal area of the input image, thereby obtaining an output image. Since the interpolated image to be fitted into the removal area of the input image is generated using the generative neural network, an interpolated image with reduced visual discomfort can be fitted into the removal area of the input image without using a special signal for the interpolation.

Description

Image processing apparatus, image processing method, and program
The present technology relates to an image processing apparatus, an image processing method, and a program, and more particularly to an image processing apparatus for fitting an interpolated image into an additional information removal area of an input image.
In an image, not only the image information itself but also additional information such as logos, telops, and graphics is superimposed and displayed to provide the viewer with explanations and supplementary information. Examples include the name of a person superimposed on the image, the location of a landscape, and the logo mark of a program or a producer. In addition, information requiring immediacy, such as news and weather information, may be superimposed and displayed regardless of the image.
Techniques for removing such additional information superimposed and displayed in an image have been provided conventionally. For example, Patent Document 1 discloses a processing device that detects a telop by detecting the pixels constituting its characters, removes those pixels, and then interpolates them either with the information of other pixels in the same frame (spatial interpolation) or with the information of the same pixels in another frame (temporal interpolation).
Further, Patent Document 2 discloses a telop erasing method in which, based on the fineness or roughness of the texture in the vicinity of the telop, the pixels in the telop region are interpolated by propagating pixel values from outside the region into it or by repeatedly copying several nearby pixels. Further, Patent Document 3 discloses a video/audio signal recording device that erases and interpolates a telop display inserted into a video signal by using an interpolation signal containing the original video signal information without the telop.
Japanese Unexamined Patent Publication No. 2014-212434; Japanese Unexamined Patent Publication No. 2006-148263; Japanese Unexamined Patent Publication No. 9-200684
In the invention shown in Patent Document 1, when the pixel group of a telop or graphics that is always displayed in the image is to be removed, that pixel group does not change over time, so temporal interpolation cannot be performed and spatial interpolation must be applied. However, Patent Document 1 does not specifically describe what kind of spatial interpolation is performed.
In the invention shown in Patent Document 2, interpolation uses only the pixel values in the vicinity of the region determined to be a telop, so what was originally displayed in that region is not considered. Only information with similar pixel values is interpolated, and the interpolated content itself becomes unrelated to what was originally displayed. Moreover, depending on the pattern in the vicinity of the region to be interpolated, an unnatural pattern may appear due to the propagation of pixel values or repeated copying.
In the invention shown in Patent Document 3, a special signal called an interpolation signal is required to remove the telop, but such an interpolation signal usually does not exist, so this approach is not realistic.
The purpose of the present technology is to fit an interpolated image well into the additional information removal area of an input image.
The concept of the present technology lies in an image processing apparatus including a processing unit that generates, using a generative neural network, an interpolated image to be fitted into a removal region of an input image, and fits the interpolated image into the removal region of the input image to obtain an output image.
In the present technology, the processing unit generates an interpolated image to be fitted into the removal area of the input image using a generative neural network, and fits this interpolated image into the removal area of the input image to obtain an output image.
For example, the processing unit may include: a segmentation processing unit that performs segmentation processing on the input image to obtain area information of each subject included in the image; a learning data area designation unit that designates, on the input image, a data area usable as the training data required for training the generative neural network, based on the information of the removal area and the area information of each subject; a learning data generation unit that generates training data from the input image based on this data area; a generative network learning unit that trains the generative neural network using this training data; an interpolated image generation unit that generates, using the generative neural network, the interpolated image to be fitted into the removal area; and an image integration unit that fits the interpolated image into the removal area of the input image to obtain the output image.
In this case, for example, the generative network learning unit may perform further training with the training data, starting from a generative neural network that has been pre-trained to generate images of the same type as the subject corresponding to the removal region. This makes it possible to efficiently train, with a small amount of training data, the generative neural network that generates the interpolated image to be fitted into the removal region.
As described above, in the present technology an interpolated image to be fitted into the removal region of the input image is generated using a generative neural network. Therefore, an interpolated image with reduced visual discomfort can be fitted into the removal region of the input image without using a special signal for interpolation.
In the present technology, for example, a detection unit that obtains information on the removal region included in the input image may further be provided. In this case, for example, the detection unit may obtain the information on the removal region based on information from the outside. This makes it possible to improve the accuracy of the removal area information.
Further, in the present technology, for example, a processing update unit that updates the functions of the processing unit based on evaluation information of the output image may further be provided. This makes it possible to improve the interpolated image generated by the processing unit so that it becomes more appropriate. In this case, for example, the processing update unit may further update the functions of the processing unit based on external setting information. This makes it possible to make improvements efficiently.
実施の形態としてのテレビ受信装置の構成例を示すブロック図である。It is a block diagram which shows the structural example of the television receiving apparatus as an embodiment. 画像処理部の構成例を示すブロック図である。It is a block diagram which shows the structural example of the image processing part. 画像処理部の各部における画像状態の一例を示す図である。It is a figure which shows an example of the image state in each part of an image processing part. 検出部の構成例を示すブロック図である。It is a block diagram which shows the structural example of the detection part. 処理部の構成例を示すブロック図である。It is a block diagram which shows the structural example of a processing part. 画像処理部の処理手順の一例を示すフローチャートである。It is a flowchart which shows an example of the processing procedure of an image processing unit. 検出部の他の構成例を示すブロック図である。It is a block diagram which shows the other structural example of a detection part. 処理部の他の構成例を示すブロック図である。It is a block diagram which shows the other structural example of a processing part. 処理部の他の構成例を示すブロック図である。It is a block diagram which shows the other structural example of a processing part.
 以下、発明を実施するための形態(以下、「実施の形態」とする)について説明する。なお、説明は以下の順序で行う。
 1.実施の形態
 2.変形例
Hereinafter, embodiments for carrying out the invention (hereinafter referred to as “embodiments”) will be described. The explanation will be given in the following order.
1. 1. Embodiment 2. Modification example
 <1.実施の形態>
 [テレビ受信装置の構成]
 図1は、実施の形態としてのテレビ受信装置10の構成例を示している。このテレビ受信装置10は、受信アンテナ101と、デジタル放送受信部102と、表示部103と、記録/再生部104と、画像処理部105と、CPU106と、ユーザ操作部107を有している。
<1. Embodiment>
[TV receiver configuration]
FIG. 1 shows a configuration example of the television receiving device 10 as an embodiment. The television receiving device 10 includes a receiving antenna 101, a digital broadcast receiving unit 102, a display unit 103, a recording / reproducing unit 104, an image processing unit 105, a CPU 106, and a user operation unit 107.
 CPU106は、テレビ受信装置10の各部の動作を制御する。ユーザは、ユーザ操作部107により種々の操作入力を行うことができる。このユーザ操作部107は、リモートコントロール部、近接/タッチにより操作入力を行うタッチパネル部、マウス、キーボード、カメラで操作入力を検出するジェスチャ入力部、音声で操作する音声入力部などである。 The CPU 106 controls the operation of each part of the television receiving device 10. The user can perform various operation inputs by the user operation unit 107. The user operation unit 107 includes a remote control unit, a touch panel unit that performs operation input by proximity / touch, a mouse, a keyboard, a gesture input unit that detects operation input with a camera, a voice input unit that is operated by voice, and the like.
 デジタル放送受信部102は、受信アンテナ101から入力されたテレビ放送信号を処理して、放送コンテンツに係る画像信号を得る。記録/再生部104は、デジタル放送受信部102でえられた画像信号を記録し、適宜なタイミングで再生する。表示部103は、デジタル放送受信部102で得られた画像信号、あるいは記録/再生部104で再生された画像信号に基づいて、画像を表示する。 The digital broadcast receiving unit 102 processes the television broadcast signal input from the receiving antenna 101 to obtain an image signal related to the broadcast content. The recording / reproducing unit 104 records the image signal obtained by the digital broadcasting receiving unit 102 and reproduces it at an appropriate timing. The display unit 103 displays an image based on the image signal obtained by the digital broadcast receiving unit 102 or the image signal reproduced by the recording / reproducing unit 104.
 画像処理部105は、記録/再生部104で再生された画像信号に対して画像処理を行い、処理後の画像信号を記録/再生部104に戻して処理後の画像信号として記録させる。画像処理部105は、入力画像から例えばロゴ、テロップ、グラフィクスなどの付加的情報の領域を除去領域として検出し、生成系ニューラルネットワークを用いてその除去領域に嵌め込む補間画像を生成し、この補間画像を入力画像の除去領域に嵌め込んで出力画像を得る、という処理をする。 The image processing unit 105 performs image processing on the image signal reproduced by the recording / reproducing unit 104, returns the processed image signal to the recording / reproducing unit 104, and causes the image signal to be recorded as the processed image signal. The image processing unit 105 detects an area of additional information such as a logo, telop, and graphics from the input image as a removal area, generates an interpolation image to be fitted in the removal area using a generation neural network, and performs this interpolation. The process of fitting the image into the removal area of the input image to obtain the output image is performed.
The processing of the image processing unit 105 is performed together with the reproduction of the unprocessed image signal and the recording of the processed image signal in the recording/reproducing unit 104. At this time, the image based on the reproduced image signal may or may not be displayed on the display unit 103. When it is displayed, the user can selectively designate, by operation, the additional information to be removed. It is also conceivable that the image processing unit 105 processes the received image signal obtained by the digital broadcast receiving unit 102 in real time.
FIG. 2 shows a configuration example of the image processing unit 105. The image processing unit 105 has a detection unit 200, a removal unit 300, and a processing unit 400. The input image signal is supplied to the detection unit 200 and the removal unit 300. The detection unit 200 obtains information on the removal area included in the input image (for example, an area of additional information such as a logo, telop, or graphics). This removal area information is supplied to the removal unit 300 and the processing unit 400.
The removal unit 300 refers to the removal area information and obtains an image in which the image of the removal area has been removed from the input image. The image signal output from the removal unit 300 is supplied to the processing unit 400. The processing unit 400 performs segmentation processing on the input image to obtain area and type information for each subject contained in the image, refers to this information and to the removal area information, generates an interpolated image to be fitted into the removal area using a generative neural network, and fits that interpolated image into the removal area of the input image to obtain an output image. The output image signal is output from the processing unit 400.
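As a rough illustration of this fitting operation, the following is a minimal sketch assuming the input image, the generated patch, and the removal area are held as NumPy arrays; the function name and data layout are illustrative assumptions, not part of the embodiment.

```python
import numpy as np

def composite_interpolation(input_image, interpolated_patch, removal_mask):
    """Fit an interpolated patch into the removal area of the input image.

    input_image:        H x W x C source frame
    interpolated_patch: H x W x C image produced by the generative network
    removal_mask:       H x W boolean array, True inside the removal area
    """
    output = input_image.copy()
    # Pixels inside the removal area come from the generated patch;
    # all other pixels are passed through unchanged.
    output[removal_mask] = interpolated_patch[removal_mask]
    return output
```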
FIG. 3 shows an example of the image state in each part of the image processing unit 105 shown in FIG. 2. FIG. 3(a) shows the input image of the detection unit 200. As shown in FIG. 3(b), the detection unit 200 detects the program logo present in the upper left of the input image, and the area of the logo is set as the removal area. Then, as shown in FIG. 3(c), the removal unit 300 obtains an image in which the image of the removal area has been removed from the input image.
In the processing unit 400, as shown in FIG. 3(d), segmentation processing is performed on the input image to obtain area and type information for each subject contained in the image. In this case, the input image is divided into an area for each subject, and each area is given a label indicating its type. In the illustrated example, it is divided into bridge, sky, mountain, and water areas.
The processing unit 400 also refers to the area and type information of each subject and to the removal area information and, as shown in FIG. 3(e), generates the interpolated image to be fitted into the removal area using the generative neural network.
In this case, since the removal area lies within the bridge area, the processing unit 400 judges that it is desirable to interpolate bridge information into the removal area, and generates a plausible bridge texture as the interpolated image using a generative neural network that has learned bridge information in advance. FIG. 3(f) shows the output image obtained by fitting the interpolated image obtained by the processing unit 400 into the removal area of the input image.
Since it is impossible to know what the original image looked like, a perfectly correct image cannot be generated and fitted into the removal area. However, instead of directly copying information from surrounding pixels or from preceding and following frames as in conventional methods, a pattern is generated in consideration of the meaning within the image, so unnaturalness and visual discomfort can be greatly suppressed.
Here, an example of the processing policy of the detection unit 200 is shown below.
- Detect target areas such as telops and graphics.
- Detect telops and graphics that are displayed constantly in a specific area (the upper left, upper right, or lower part of the image) or displayed non-constantly over the entire screen.
- Compare images across multiple chapters; an on-screen area that does not change even across chapters is judged to be, for example, a program logo that is constantly displayed in that area.
- The presence or absence of a telop may be determined from the difference between preceding and following frames (a sketch of such a test follows this list).
- For a broadcast program, analyzing past broadcasts makes it possible to predict in which areas telops and graphics will appear, so that analysis result may also be used.
- The viewer's setting information may also be referenced, such as "remove only the program logo without erasing telops" or "remove telops and the program logo, but not graphics".
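As one possible realization of the frame-difference test mentioned above, the following is a minimal sketch; the grayscale NumPy frames, the region format, and the threshold value are assumptions.

```python
import numpy as np

def region_changed(prev_frame, curr_frame, region, threshold=8.0):
    """Return True if the region (y0, y1, x0, x1) differs noticeably between
    two consecutive grayscale frames, suggesting that a telop appeared,
    disappeared, or was updated there."""
    y0, y1, x0, x1 = region
    diff = np.abs(curr_frame[y0:y1, x0:x1].astype(np.float32)
                  - prev_frame[y0:y1, x0:x1].astype(np.float32))
    return float(diff.mean()) > threshold
```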
An example of the processing policy of the processing unit 400 is shown below.
- Fit a new texture or similar content into the area from which the telop or other information has been removed.
- Generate a plausible pattern using a generative neural network and fit it in.
- When interpolating, do not directly use temporal information (information from preceding and following frames) or spatial information (information from neighboring or other areas of the same frame) in the area to be filled.
- For training the generative neural network, use other areas of the image being processed that are not the interpolation target as training data, because areas within the same frame are likely to contain spatially similar information.
- The segmentation information of the scene may be used to estimate which texture the area to be interpolated is most likely to contain, and the generative network or its training data may be selected according to the result (a sketch of such a selection follows this list).
- The segmentation information of the scene may also be used so that areas having the same segmentation result as the area to be interpolated are used as training data. Since learning is then based on information more strongly correlated with the interpolation area, a more plausible pattern is likely to be generated.
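The label-driven selection described in the last two items could look roughly like the following sketch; the registry dictionaries, the "generic" fallback key, and the label names are illustrative assumptions.

```python
def select_generator_and_data(removal_label, model_registry, data_registry):
    """Pick the generative model and the training data collection that
    correspond to the segmentation label of the area to be interpolated
    (e.g. 'bridge'). Falls back to a generic entry for unknown labels."""
    model = model_registry.get(removal_label, model_registry["generic"])
    data = data_registry.get(removal_label, data_registry["generic"])
    return model, data
```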
Note that the removal unit 300 is not strictly necessary in the image processing unit 105 shown in FIG. 2, because the image in the removal area of the input image is substantially removed when the processing unit 400 fits the interpolated image into that area.
Part or all of the processing of each part of the image processing unit 105 can also be performed by software processing on a computer.
FIG. 4 shows a configuration example of the detection unit 200. The detection unit 200 has a change area determination unit 201, a chapter information recording unit 202, a change information recording unit 203, a removal area position identification unit 204, a removal information type determination unit 205, and a removal information recording unit 206. Here, the removal area position identification unit 204 and the removal information type determination unit 205 constitute a removal area determination unit.
The change area determination unit 201 refers to a group of input images and determines which areas in the image change and which do not. For this determination, it refers to information on one or more chapters assigned according to the content of the input images: an area that does not change even across chapters is judged to be a constantly unchanging area, while an area that changes over time even within a chapter is judged to be a constantly changing area.
For example, when detecting a program logo, the following measures can be taken.
- A logo is usually displayed throughout the program, and its area does not change even when the chapter changes. An area that does not change across chapters is therefore judged to be a logo (constantly displayed information).
- For a specific program, the area in which the logo is displayed can be identified almost uniquely, so the position of the logo can be determined by obtaining information about the logo from outside.
Also, for example, when detecting a telop (explanatory information), the following measures can be taken.
- For a telop, whether it is displayed and what it shows change over time depending on the displayed content. However, considering its position, for example the lower right of the screen, it can be judged that highly urgent information is not being displayed there, and therefore that non-urgent, non-stationary information is being displayed.
- For a specific program, the area in which telops are displayed can be identified almost uniquely, so the position of the explanatory information can be determined by obtaining information about the telops from outside.
The chapter information recording unit 202 records information about chapters. The change information recording unit 203 records the information, determined by the change area determination unit 201, on the presence or absence of change in each area of the input image.
The removal area position identification unit 204 identifies the position of the area to be removed from the input image based on the per-area change information recorded in the change information recording unit 203, and outputs it as removal area position information. The removal information type determination unit 205 determines, based on the same per-area change information, what kind of information the area contains (for example, whether it is a program logo or a telop (explanatory information), and whether or not it is urgent), and outputs the result as type information of the removal information.
The removal information recording unit 206 records the removal area information of the input image; that is, it integrates and records the removal area position information and the removal information type information. The detection unit 200 outputs the removal area information recorded in the removal information recording unit 206 as its output signal.
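As a minimal sketch of the chapter-based change determination performed by the change area determination unit 201, the following compares one representative grayscale frame per chapter; the per-pixel deviation test and the threshold are assumptions.

```python
import numpy as np

def stationary_region_mask(chapter_frames, threshold=2.0):
    """Mark pixels whose value barely changes across chapters.

    chapter_frames: list of H x W grayscale frames, one representative
                    frame per chapter.
    Returns an H x W boolean array where True marks a stationary pixel,
    i.e. a candidate for constantly displayed information such as a logo.
    """
    stack = np.stack([f.astype(np.float32) for f in chapter_frames], axis=0)
    per_pixel_std = stack.std(axis=0)
    return per_pixel_std < threshold
```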
FIG. 5 shows a configuration example of the processing unit 400. The processing unit 400 has a segmentation processing unit 401, a segmentation information recording unit 402, a learning data area designation unit 403, a learning data generation unit 404, a learning data recording unit 405, a generative network learning unit 406, a setting parameter recording unit 407, a generative model recording unit 408, an interpolated image generation unit 409, and an image integration unit 410. Here, the learning data generation unit 404 and the generative network learning unit 406 constitute a learning unit, and the interpolated image generation unit 409 and the image integration unit 410 constitute an interpolation processing unit.
The segmentation processing unit 401 performs segmentation processing on the input image. It divides the image into an area for each subject and then assigns to each area a label indicating what the subject is, that is, the type of the subject. The segmentation information recording unit 402 records the subject area information obtained by the segmentation processing in association with the label information.
The learning data area designation unit 403 refers to the removal area information supplied from (or referenced in) the detection unit 200, as well as to the subject area information and label information obtained by the segmentation processing described above, and designates on the input image the data areas that can be used as training data, in order to construct the training data needed for training the network that interpolates the removal area.
Specifically, in the example shown in FIG. 3, it is identified that the segment containing the removal area is a bridge, and the entire area judged to be a bridge on the input image is designated as data to be used for constructing the training data. Since this allows the very image that contains the removal area to be used as training data, textures can be generated with extremely high fidelity. The data need not come from a still image: by using information from consecutive frames of a moving image, the amount of training data increases and textures can be generated with even higher fidelity.
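A minimal sketch of collecting training patches from the areas of the frame that share the segmentation label of the removal area is shown below; the patch size, patch count, and attempt limit are assumptions.

```python
import numpy as np

def sample_training_patches(frame, label_map, target_label, removal_mask,
                            patch_size=64, num_patches=200, rng=None):
    """Collect square patches that lie entirely inside the target
    segmentation label (e.g. the 'bridge' label) and outside the removal
    area, for use as training data. Assumes the frame is larger than the
    patch size."""
    rng = rng or np.random.default_rng()
    h, w = label_map.shape
    patches, attempts = [], 0
    while len(patches) < num_patches and attempts < num_patches * 50:
        attempts += 1
        y = int(rng.integers(0, h - patch_size))
        x = int(rng.integers(0, w - patch_size))
        lbl = label_map[y:y + patch_size, x:x + patch_size]
        msk = removal_mask[y:y + patch_size, x:x + patch_size]
        # Accept only patches fully covered by the target label and not
        # overlapping the removal area.
        if (lbl == target_label).all() and not msk.any():
            patches.append(frame[y:y + patch_size, x:x + patch_size])
    return patches
```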
The learning data generation unit 404 constructs the training data by referring to the areas of the input image designated as usable for training data by the learning data area designation unit 403. The training data may be constructed only from this input image, may reuse data collected widely on the basis of a given label (for example, "bridge"), or may combine both.
The learning data recording unit 405 records the training data generated by the learning data generation unit 404. The generative network learning unit 406 reads the training data from the learning data recording unit 405 and trains the generative neural network used for the interpolation processing. In this case, for example, a generative neural network trained in advance to generate images of the same type as the subject corresponding to the removal area may be used as a base and further trained with this training data. This makes it possible to train the generative neural network that produces the interpolated image to be fitted into the removal area efficiently, with a small amount of training data.
A generative adversarial network (GAN) is a well-known representative example of such a generative neural network. Although a detailed description is omitted, a GAN has a structure in which a generator and a discriminator within the network compete with each other adversarially, and the performance of the generator can be improved by making the generator and the discriminator compete repeatedly. In the present technology, both the generator and the discriminator need to be trained. However, it is sufficient to train the generator in advance so that it can generate many patterns, and to carry out additional training only to specialize it to the input image.
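A highly simplified sketch of such additional adversarial training is shown below, written with PyTorch under the assumption that pre-trained generator and discriminator modules (the generator exposing a latent_dim attribute) and a data loader of real patches already exist; it is not the actual training procedure of the embodiment.

```python
import torch
import torch.nn.functional as F

def finetune_gan(generator, discriminator, patch_loader,
                 steps=500, lr=1e-4, device="cpu"):
    """Adversarially fine-tune a pre-trained generator on patches taken
    from the current input image so that it specializes in the textures
    around the removal area."""
    g_opt = torch.optim.Adam(generator.parameters(), lr=lr)
    d_opt = torch.optim.Adam(discriminator.parameters(), lr=lr)
    data_iter = iter(patch_loader)
    for _ in range(steps):
        try:
            real = next(data_iter).to(device)
        except StopIteration:
            data_iter = iter(patch_loader)
            real = next(data_iter).to(device)

        noise = torch.randn(real.size(0), generator.latent_dim, device=device)
        fake = generator(noise)

        # Discriminator step: distinguish real patches from generated ones.
        d_real = discriminator(real)
        d_fake = discriminator(fake.detach())
        d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
                  + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
        d_opt.zero_grad()
        d_loss.backward()
        d_opt.step()

        # Generator step: try to make generated patches look real.
        g_out = discriminator(fake)
        g_loss = F.binary_cross_entropy_with_logits(g_out, torch.ones_like(g_out))
        g_opt.zero_grad()
        g_loss.backward()
        g_opt.step()
    return generator
```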
The setting parameter recording unit 407 records parameters relating to the type and specifications of the training data used by the learning data generation unit 404, and learning-related parameters such as the number of training iterations and the learning rate used by the generative network learning unit 406.
The generative model recording unit 408 records the network models generated by the generative network learning unit 406; a plurality of network models may be recorded. The interpolated image generation unit 409 selects an appropriate network model from the generative model recording unit 408 and generates, as the interpolated image, the texture that best fits the removal area. The image integration unit 410 fits the interpolated image generated by the interpolated image generation unit 409 into the removal area of the input image, integrates them into a single image, and outputs it as the output image.
In the above description, the detection unit 200 is arranged inside the image processing unit 105 together with the processing unit 400, but the same effect can be obtained even if the entire detection unit 200 is arranged outside the image processing unit 105 and the input image in which the removal area has been identified is supplied to the processing unit 400 arranged inside the image processing unit 105.
The flowchart of FIG. 6 shows an example of the processing procedure of the image processing unit 105 shown in FIG. 2. In this example, the processing in the removal unit 300 for removing the image of the removal area from the input image is omitted. The processing procedure of the image processing unit 105 is roughly divided into two steps: a detection step and a processing step.
First, the detection step, which is the processing of the detection unit 200, will be described. The detection step consists of a change area determination step ST1, a removal area position identification step ST2, and a removal information type determination step ST3.
In the change area determination step ST1, a group of input images is referenced to determine which areas in the image change and which do not. For this determination, information on one or more chapters assigned according to the content of the input images is referenced: an area that does not change even across chapters is judged to be a constantly unchanging area, while an area that changes over time even within a chapter is judged to be a constantly changing area.
In the removal area position identification step ST2, the position of the area to be removed is identified from the input image and output as removal position information. In the removal information type determination step ST3, it is determined what kind of information the area to be removed contains (for example, whether it is a program logo or a telop, and whether or not it is urgent), and the result is output as type information.
Next, the processing step, which is the processing of the processing unit 400, will be described. The processing step consists of a segmentation processing step ST4, a learning data area designation step ST5, a learning data generation step ST6, a generative network learning step ST7, an interpolated image generation step ST8, and an image integration step ST9.
In the segmentation processing step ST4, segmentation processing is performed on the input image. The image is divided into an area for each subject, and each area is given a label indicating what the subject is, that is, its type.
In the learning data area designation step ST5, the removal information obtained in the removal area position identification step ST2 and the removal information type determination step ST3, and the area information and label information of each subject obtained in the segmentation processing step ST4, are referenced, and the data areas that can be used as training data are designated on the input image in order to construct the training data needed for training the network that interpolates the removal area.
In the learning data generation step ST6, the training data is constructed by referring to the areas of the input image designated as usable for training data in the learning data area designation step ST5. The training data may be constructed only from this input image, may reuse data collected widely on the basis of a given label, or may combine both.
In the generative network learning step ST7, the training data generated in the learning data generation step ST6 is read out, and the generative neural network used for the interpolation processing is trained. In the interpolated image generation step ST8, an appropriate generative neural network trained in the generative network learning step ST7 is selected, and the texture that best fits the removal area is generated as the interpolated image. In the image integration step ST9, the interpolated image generated in the interpolated image generation step ST8 is fitted into the removal area of the input image, integrated into a single image, and output as the output image.
As described above, in the image processing unit 105 shown in FIG. 2, the processing unit 400 generates the interpolated image to be fitted into the removal area of the input image using a generative neural network. It is therefore possible to fit into the removal area of the input image an interpolated image with reduced visual discomfort, without using any special signal for interpolation.
Also, in the image processing unit 105 shown in FIG. 2, the generative network learning unit 406 of the processing unit 400 further trains a generative neural network that has been trained in advance to generate images of the same type as the subject corresponding to the removal area. The generative neural network that produces the interpolated image to be fitted into the removal area can therefore be trained efficiently, with a small amount of training data.
<2. Modification examples>
[Another configuration example of the detection unit]
FIG. 7 shows another configuration example of the detection unit 200, described here as a detection unit 200A. In FIG. 7, parts corresponding to those in FIG. 4 are given the same reference numerals, and detailed descriptions are omitted as appropriate. Like the detection unit 200 shown in FIG. 4, the detection unit 200A has a change area determination unit 201, a chapter information recording unit 202, a change information recording unit 203, a removal area position identification unit 204, a removal information type determination unit 205, and a removal information recording unit 206.
In the detection unit 200A, the removal area position identification unit 204 and the removal information type determination unit 205 can refer to external information. As described above, the removal area position identification unit 204 identifies the position of the area to be removed from the input image based on the per-area change information recorded in the change information recording unit 203, and outputs it as removal position information.
Besides referencing the input image, the following are also useful for identifying the position of the area to be removed from the input image: (1) having the viewer designate the area to be removed, (2) referencing information about the program being watched, and (3) referencing information about events that occurred on that day.
For example, regarding (1), if the viewer specifies that there is an area to be removed in the upper-left part of the image, it becomes possible to judge whether information that could be a removal target exists in that area. Regarding (2), by obtaining, from information about the program being watched, the appearance and display position of the program logo or the insertion position and size of telops, it becomes possible to judge in which areas of the screen information that could be a removal target exists.
Regarding (3), from information about incidents, accidents, disasters, elections, and other events that occurred on or around the day the program was broadcast, it can be predicted that breaking news, weather information, and the like will be displayed, and the positions on the image where such information is likely to appear can be predicted. This makes it possible to estimate the position of a potential removal area with high accuracy.
As described above, the removal information type determination unit 205 determines, based on the per-area change information recorded in the change information recording unit 203, what kind of information each area contains, and outputs the result as type information. By using the information of (1) to (3) above, the type of information displayed in a potential removal area can also be estimated with high accuracy.
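A small sketch of how such external hints might be combined with the detected candidate areas is shown below; the box format (y0, y1, x0, x1) and the merging rules are assumptions.

```python
def refine_removal_candidates(detected_regions, viewer_region=None,
                              program_logo_region=None):
    """Combine detected candidate regions (a list of (y0, y1, x0, x1) boxes)
    with external hints: a known program-logo box is always added as a
    candidate, and a viewer-designated box restricts the candidates to
    those overlapping it."""
    def overlaps(a, b):
        return not (a[1] <= b[0] or b[1] <= a[0] or a[3] <= b[2] or b[3] <= a[2])

    candidates = list(detected_regions)
    if program_logo_region is not None and program_logo_region not in candidates:
        candidates.append(program_logo_region)
    if viewer_region is not None:
        candidates = [r for r in candidates if overlaps(r, viewer_region)]
    return candidates
```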
[Another configuration example of the processing unit]
FIG. 8 shows another configuration example of the processing unit 400, described here as a processing unit 400A. In FIG. 8, parts corresponding to those in FIG. 5 are given the same reference numerals, and detailed descriptions are omitted as appropriate. Like the processing unit 400 shown in FIG. 5, the processing unit 400A has a segmentation processing unit 401, a segmentation information recording unit 402, a learning data area designation unit 403, a learning data generation unit 404, a learning data recording unit 405, a generative network learning unit 406, a setting parameter recording unit 407, a generative model recording unit 408, an interpolated image generation unit 409, and an image integration unit 410.
The processing unit 400A further has an evaluation unit 411 that evaluates the output image, an evaluation information recording unit 412 that records the evaluation information, and a processing update unit 413 that judges, based on the evaluation information, whether or not to update the functions of the processing unit 400A.
As described above, the image integration unit 410 fits the interpolated image generated by the interpolated image generation unit 409 into the removal area of the input image, integrates them into a single image, and outputs it as the output image. The evaluation unit 411 evaluates whether or not this output image is appropriate, and outputs the evaluation result as evaluation information.
The evaluation may, for example, be input directly and explicitly by the viewer through a remote control or a terminal, be obtained by measuring the viewer's gaze, emotion, or biological information, or be inferred from the viewer's voice.
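For example, heterogeneous feedback of this kind might be fused into a single normalized score along the following lines; the input signals, the weighting, and the neutral default are assumptions.

```python
def normalize_evaluation(remote_rating=None, gaze_on_region_ratio=None,
                         negative_utterance=False):
    """Fuse heterogeneous viewer feedback into a single score in [0, 1].

    remote_rating:         e.g. a 1-5 rating entered on a remote control
    gaze_on_region_ratio:  fraction of viewing time spent staring at the
                           interpolated area (staring suggests it stands out)
    negative_utterance:    True if a complaint was detected in the voice
    """
    scores = []
    if remote_rating is not None:
        scores.append((remote_rating - 1) / 4.0)
    if gaze_on_region_ratio is not None:
        scores.append(1.0 - min(max(gaze_on_region_ratio, 0.0), 1.0))
    if negative_utterance:
        scores.append(0.0)
    return sum(scores) / len(scores) if scores else 0.5  # 0.5 = neutral default
```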
Based on the obtained evaluation information, the processing update unit 413 judges how the functions of the segmentation processing unit 401, the learning data area designation unit 403, the learning data generation unit 404, the generative network learning unit 406, the setting parameter recording unit 407, and so on in the processing unit 400A should be updated, and updates them when it judges that an update is necessary.
For example, for the segmentation processing unit 401, the segmentation method or the number of classes may be updated. If areas that could not be classified by the conventional segmentation method can be recognized and labeled accurately, it becomes possible to generate and fit textures with even higher fidelity when interpolating those areas.
For the learning data area designation unit 403, updating the segmentation method or the number of classes improves the accuracy of the areas designated as training data, prevents misidentified samples from being mixed into training data composed of a specific label, and improves the learning performance of the generative network learning unit.
For the learning data generation unit 404, the choice between constructing the training data only from the input image, reusing data collected widely on the basis of a given label, or combining both may be revised. For the generative network learning unit 406, the structure of the network may be updated.
For the setting parameter recording unit 407, the parameters relating to the type and specifications of the training data used by the learning data generation unit 404 may be updated, as may the learning-related parameters such as the number of training iterations and the learning rate in the generative network learning unit 406.
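A minimal sketch of such an evaluation-driven parameter update is shown below; the score scale, the threshold, and the adjustment rules are assumptions.

```python
def update_learning_parameters(params, evaluation_score, threshold=0.6):
    """Adjust the recorded learning parameters when the viewer's evaluation
    of the output image falls below a threshold. evaluation_score is assumed
    to be normalized to [0, 1]."""
    updated = dict(params)
    if evaluation_score < threshold:
        # Train longer and more cautiously on the next attempt, and allow
        # externally collected data of the same label to be mixed in.
        updated["num_steps"] = int(params.get("num_steps", 500) * 1.5)
        updated["learning_rate"] = params.get("learning_rate", 1e-4) * 0.5
        updated["use_external_data"] = True
    return updated
```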
By having the viewer evaluate the processed image interpolated in this way, the validity of the generative network used for interpolation and of its training data can be evaluated and continuously improved.
[Yet another configuration example of the processing unit]
FIG. 9 shows yet another configuration example of the processing unit 400, described here as a processing unit 400B. In FIG. 9, parts corresponding to those in FIG. 8 are given the same reference numerals, and detailed descriptions are omitted as appropriate. The basic configuration of the processing unit 400B is the same as that of the processing unit 400A shown in FIG. 8. In the processing unit 400B, when the processing update unit 413 judges whether or not to update the functions of the processing unit 400B, it uses not only the evaluation information but also external setting information.
As described for FIG. 8 above, the processing update unit 413 judges, based on the evaluation information, how each function of the processing unit 400A (400B) should be updated, and updates the functions when it judges that an update is necessary.
In the processing unit 400B, the processing update unit 413 can likewise update the functions of the segmentation processing unit 401, the learning data area designation unit 403, the learning data generation unit 404, the generative network learning unit 406, and the setting parameter recording unit 407 by additionally using setting information available from outside. For example, the setting information of the processing units of other viewers having the same functions as the processing unit 400B, or the evaluation information of other viewers, may be used. This makes it possible to carry out improvements efficiently.
Although preferred embodiments of the present disclosure have been described in detail with reference to the accompanying drawings, the technical scope of the present disclosure is not limited to these examples. It is clear that a person having ordinary knowledge in the technical field of the present disclosure can conceive of various changes and modifications within the scope of the technical ideas described in the claims, and it is understood that these also naturally belong to the technical scope of the present disclosure.
The effects described in this specification are merely explanatory or illustrative, not limiting. That is, the technology according to the present disclosure may exhibit other effects that are apparent to those skilled in the art from the description of this specification, in addition to or instead of the above effects.
The present technology can also have the following configurations.
(1) An image processing apparatus including a processing unit that generates an interpolated image to be fitted into a removal area of an input image using a generative neural network, and fits the interpolated image into the removal area of the input image to obtain an output image.
(2) The image processing apparatus according to (1), in which the processing unit includes:
a segmentation processing unit that performs segmentation processing on the input image to obtain area information of each subject included in the image;
a learning data area designation unit that designates, on the input image, a data area that can be used as training data needed for training the generative neural network, based on the information of the removal area and the area information of each subject;
a learning data generation unit that generates training data from the input image based on the data area;
a generative network learning unit that trains the generative neural network using the training data;
an interpolated image generation unit that generates the interpolated image to be fitted into the removal area using the generative neural network; and
an image integration unit that fits the interpolated image into the removal area of the input image to obtain the output image.
(3) The image processing apparatus according to (2), in which the generative network learning unit performs further training with the training data, based on the generative neural network trained in advance to generate images of the same type as the subject corresponding to the removal area.
(4) The image processing apparatus according to any one of (1) to (3), further including a detection unit that obtains information of the removal area included in the input image.
(5) The image processing apparatus according to (4), in which the detection unit obtains the information of the removal area based on information from outside.
(6) The image processing apparatus according to any one of (1) to (5), further including a processing update unit that updates a function of the processing unit based on evaluation information of the output image.
(7) The image processing apparatus according to (6), in which the processing update unit further updates the function of the processing unit based on external setting information.
(8) An image processing method including a procedure of generating an interpolated image to be fitted into a removal area of an input image using a generative neural network, and fitting the interpolated image into the removal area of the input image to obtain an output image.
(9) A program that causes a computer to function as processing means for generating an interpolated image to be fitted into a removal area of an input image using a generative neural network, and fitting the interpolated image into the removal area of the input image to obtain an output image.
(10) An image processing apparatus including:
a change area determination unit that refers to input images and determines areas in the image that change and areas that do not;
a chapter information recording unit in which information about chapters of the input images is recorded;
a change information recording unit that records information on the presence or absence of change in each area of the input image;
a removal area position identification unit that identifies the position of an area to be removed from the input image based on the per-area change information recorded in the change information recording unit, and outputs it as removal position information;
a removal information type determination unit that determines, based on the per-area change information recorded in the change information recording unit, what kind of information the area contains, and outputs the result as type information;
a removal information recording unit that integrates and records the removal position information and the type information;
a segmentation processing unit that performs segmentation processing on the input image;
a segmentation information recording unit that records the subject area information obtained by the segmentation processing in association with label information;
a learning data area designation unit that refers to the removal information recorded in the removal information recording unit and designates, on the input image, data areas that can be used as training data, in order to construct the training data needed for training the network that interpolates the removal area;
a learning data generation unit that generates training data by referring to the areas of the input image designated as usable for training data by the learning data area designation unit;
a learning data recording unit that records the training data generated by the learning data generation unit;
a generative network learning unit that reads the training data from the learning data recording unit and trains a generative neural network used for interpolation processing;
a setting parameter recording unit in which parameters relating to the type and specifications of the training data used by the learning data generation unit, and learning-related parameters such as the number of training iterations and the learning rate in the generative network learning unit, are recorded;
a generative model recording unit in which the network models generated by the generative network learning unit are recorded;
an interpolated image generation unit that selects an appropriate network model from the generative model recording unit and generates, as an interpolated image, the texture that best fits the removal area; and
an image integration unit that fits the interpolated image generated by the interpolated image generation unit into the corresponding area of the input image, integrates them into a single image, and outputs it as an output image.
(11) The image processing apparatus according to (10), in which the removal area position identification unit and the removal information type determination unit refer to information from outside.
(12) The image processing apparatus according to (10) or (11), further including an evaluation unit that evaluates the quality of the image output from the image integration unit, and a processing update unit that judges whether or not to update a function based on the evaluation information obtained by the evaluation unit.
(13) The image processing apparatus according to (12), in which not only the evaluation information but also external setting information is used when the processing update unit judges whether or not to update the function.
(14) An image processing method including:
a change area determination step of referring to input images and determining areas in the image that change and areas that do not;
a removal area position identification step of identifying the position of an area to be removed from the input image based on the per-area change information recorded in the change information recording unit, and outputting it as removal position information;
a removal information type determination step of determining, based on the per-area change information recorded in the change information recording unit, what kind of information the area contains, and outputting the result as type information;
a segmentation processing step of performing segmentation processing on the input image;
a learning data area designation step of referring to the removal information recorded in the removal information recording unit and designating, on the input image, data areas that can be used as training data, in order to construct the training data needed for training the network that interpolates the removal area;
a learning data generation step of generating training data by referring to the areas of the input image designated as usable for training data in the learning data area designation step;
a generative network learning step of reading the training data from the learning data recording unit and training a generative neural network used for interpolation processing;
an interpolated image generation step of selecting an appropriate network model from the generative model recording unit and generating, as an interpolated image, the texture that best fits the removal area; and
an image integration step of fitting the interpolated image generated in the interpolated image generation step into the corresponding area of the input image, integrating them into a single image, and outputting it as an output image.
10 ... Television receiving device
101 ... Receiving antenna
102 ... Digital broadcast receiving unit
103 ... Display unit
104 ... Recording/reproducing unit
105 ... Image processing unit
106 ... CPU
107 ... User operation unit
200, 200A ... Detection unit
201 ... Change area determination unit
202 ... Chapter information recording unit
203 ... Change information recording unit
204 ... Removal area position identification unit
205 ... Removal information type determination unit
206 ... Removal information recording unit
300 ... Removal unit
400, 400A, 400B ... Processing unit
401 ... Segmentation processing unit
402 ... Segmentation information recording unit
403 ... Learning data area designation unit
404 ... Learning data generation unit
405 ... Learning data recording unit
406 ... Generative network learning unit
407 ... Setting parameter recording unit
408 ... Generative model recording unit
409 ... Interpolated image generation unit
410 ... Image integration unit
411 ... Evaluation unit
412 ... Evaluation information recording unit
413 ... Processing update unit

Claims (9)

  1.  生成系ニューラルネットワークを用いて入力画像の除去領域に嵌め込む補間画像を生成し、該補間画像を前記入力画像の除去領域に嵌め込んで出力画像を得る処理部を備える
     画像処理装置。
    An image processing apparatus including a processing unit that generates an interpolated image to be fitted into a removal region of an input image using a generation neural network, and fits the interpolated image into the removal region of the input image to obtain an output image.
  2.  前記処理部は、
     前記入力画像に対するセグメンテーション処理を行って画像内に含まれる各被写体の領域情報を得るセグメンテーション処理部と、
     前記除去領域の情報と前記各被写体の領域情報に基づいて、前記入力画像上で、前記生成系ニューラルネットワークが学習するために必要な学習用データに使用できるデータ領域を指定する学習用データ領域指定部と、
     前記データ領域に基づいて、前記入力画像から学習用データを生成する学習用データ生成部と、
     前記学習用データを用いて前記生成系ニューラルネットワークを学習する生成系ネットワーク学習部と、
     前記生成系ニューラルネットワークを用いて前記除去領域に嵌め込む補間画像を生成する補間画像生成部と、
     前記補間画像を前記入力画像の除去領域に嵌め込んで前記出力画像を得る画像統合部を有する
     請求項1に記載の画像処理装置。
    The processing unit
    A segmentation processing unit that performs segmentation processing on the input image to obtain area information of each subject included in the image.
    Designation of a learning data area for designating a data area that can be used for learning data required for learning by the generation neural network on the input image based on the information of the removal area and the area information of each subject. Department and
    A learning data generation unit that generates learning data from the input image based on the data area,
    A generating network learning unit that learns the generating neural network using the learning data,
    An interpolated image generator that generates an interpolated image to be fitted in the removal region using the generating neural network,
    The image processing apparatus according to claim 1, further comprising an image integration unit for fitting the interpolated image into a removal region of the input image to obtain the output image.
  3. The image processing apparatus according to claim 2, wherein the generative network learning unit performs further learning with the learning data, using as a base the generative neural network that has been trained in advance to generate images of the same type as the subject corresponding to the removal region.
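A hedged sketch of the further learning described in claim 3, assuming a PyTorch generator with the two-argument interface of the earlier sketch that is already pre-trained for the subject type of the removal region and is fine-tuned on the per-image learning data; the L1 reconstruction loss and the optimizer are illustrative choices, not taken from the publication.

```python
import torch

def finetune_generator(generator, patches, masks, epochs=5, lr=1e-4):
    """Further train a pre-trained inpainting generator on the learning data.
    patches: (N, 3, h, w) float tensor in [0, 1]; masks: (N, 1, h, w), 1 inside
    synthetic holes cut into the patches for training."""
    optimizer = torch.optim.Adam(generator.parameters(), lr=lr)
    l1 = torch.nn.L1Loss()
    generator.train()
    for _ in range(epochs):
        masked = patches * (1.0 - masks)
        prediction = generator(masked, masks)
        loss = l1(prediction * masks, patches * masks)  # reconstruct only the holes
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return generator
```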
  4. The image processing apparatus according to claim 1, further comprising a detection unit that obtains information on the removal region included in the input image.
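One conceivable way a detection unit could obtain removal-region information is the temporal-variance heuristic sketched below: a superimposed logo stays fixed while the programme content changes, so pixels that barely vary across frames are flagged. This heuristic is an assumption for illustration and is not stated to be the detection method of the publication.

```python
import numpy as np

def detect_static_overlay(frames: np.ndarray, variance_threshold: float = 4.0) -> np.ndarray:
    """frames: (T, H, W, 3) uint8 clip. Returns an (H, W) boolean mask of pixels
    whose value barely changes over time, i.e. a candidate removal region."""
    luma = frames.astype(np.float32).mean(axis=3)   # rough per-frame luminance
    return luma.var(axis=0) < variance_threshold
```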
  5. The image processing apparatus according to claim 4, wherein the detection unit obtains the information on the removal region on the basis of information provided from outside.
  6. The image processing apparatus according to claim 1, further comprising a processing update unit that updates the function of the processing unit on the basis of evaluation information of the output image.
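A minimal sketch of the update logic of claim 6, assuming that each output image receives an evaluation score in [0, 1] and that an update consists of re-running a fine-tuning routine; the threshold policy and the callable interface are assumptions of this example.

```python
# Sketch of evaluation-driven updating of the processing unit.
def update_if_needed(generator, scores, retrain, threshold=0.7):
    """scores: recent per-frame evaluation values in [0, 1] (higher is better).
    retrain: a callable returning an updated generator, e.g. a fine-tuning run."""
    if scores and sum(scores) / len(scores) < threshold:
        return retrain(generator)   # update the model used by the processing unit
    return generator

# Hypothetical usage with the fine-tuning sketch above:
# generator = update_if_needed(generator, recent_scores,
#                              retrain=lambda g: finetune_generator(g, patches, masks))
```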
  7. The image processing apparatus according to claim 6, wherein the processing update unit further updates the function of the processing unit on the basis of external setting information.
  8. An image processing method comprising a procedure of generating, using a generative neural network, an interpolated image to be fitted into a removal region of an input image, and fitting the interpolated image into the removal region of the input image to obtain an output image.
  9. A program for causing a computer to function as processing means that generates, using a generative neural network, an interpolated image to be fitted into a removal region of an input image, and fits the interpolated image into the removal region of the input image to obtain an output image.
PCT/JP2020/027931 2019-08-30 2020-07-17 Image processing apparatus, image processing method, and program WO2021039192A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019-157826 2019-08-30
JP2019157826 2019-08-30

Publications (1)

Publication Number Publication Date
WO2021039192A1 true WO2021039192A1 (en) 2021-03-04

Family

ID=74684141

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/027931 WO2021039192A1 (en) 2019-08-30 2020-07-17 Image processing apparatus, image processing method, and program

Country Status (1)

Country Link
WO (1) WO2021039192A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09200684A (en) * 1996-01-12 1997-07-31 Sony Corp Video/audio signal recording device
JP2014212434A (en) * 2013-04-18 2014-11-13 三菱電機株式会社 Device and method for video signal processing, program, and recording medium
JP2019079114A (en) * 2017-10-20 2019-05-23 キヤノン株式会社 Image processing device, image processing method, and program
WO2019159424A1 (en) * 2018-02-16 2019-08-22 新東工業株式会社 Evaluation system, evaluation device, evaluation method, evaluation program, and recording medium

Similar Documents

Publication Publication Date Title
KR101318459B1 (en) Method of viewing audiovisual documents on a receiver, and receiver for viewing such documents
US20100313214A1 (en) Display system, system for measuring display effect, display method, method for measuring display effect, and recording medium
US10957024B2 (en) Real time tone mapping of high dynamic range image data at time of playback on a lower dynamic range display
CN101377917B (en) Display apparatus
US20120229489A1 (en) Pillarboxing Correction
US8446432B2 (en) Context aware user interface system
CN109379628A (en) Method for processing video frequency, device, electronic equipment and computer-readable medium
US9773523B2 (en) Apparatus, method and computer program
US20160381290A1 (en) Apparatus, method and computer program
KR20110074107A (en) Method for detecting object using camera
JP5116513B2 (en) Image display apparatus and control method thereof
CN101611629A (en) Image processing equipment, moving-image reproducing apparatus and processing method thereof and program
JP2008244811A (en) Frame rate converter and video picture display unit
WO2021039192A1 (en) Image processing apparatus, image processing method, and program
KR20050026965A (en) Method of and system for controlling the operation of a video system
JP2013179563A (en) Video processing device, video display device, video recording device, video processing method, and video processing program
TW201802664A (en) Image output device, image output method, and program
WO2020234939A1 (en) Information processing device, information processing method, and program
CN107995538B (en) Video annotation method and system
JP6080667B2 (en) Video signal processing apparatus and method, program, and recording medium
US20110235997A1 (en) Method and device for creating a modified video from an input video
KR20120098622A (en) Method for adding voice content to video content and device for implementing said method
CN107743710A (en) Display device and its control method
JP6930880B2 (en) Video display device and video production support method
CN102111630B (en) Image processing device and image processing method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20858523; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20858523; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: JP)