CN113988294A - Method for training prediction network, image processing method and device

Method for training prediction network, image processing method and device

Info

Publication number: CN113988294A
Authority: CN (China)
Prior art keywords: histogram, image, foreground, prediction, target
Legal status: Pending (the status listed is an assumption and is not a legal conclusion)
Application number: CN202111279847.4A
Other languages: Chinese (zh)
Inventors: 林天威, 李甫
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date / Filing date: 2021-10-29
Publication date: 2022-01-28

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00: Image enhancement or restoration
    • G06T 5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T 7/00: Image analysis
    • G06T 7/10: Segmentation; Edge detection
    • G06T 7/194: Segmentation; Edge detection involving foreground-background segmentation
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20212: Image combination
    • G06T 2207/20221: Image fusion; Image merging


Abstract

The disclosure provides a method for training a prediction network, an image processing method, an image processing apparatus, a device, and a storage medium, relating to the field of artificial intelligence and in particular to computer vision and deep learning technology. One implementation scheme is as follows: for each sample set of a plurality of sample sets, the sample set is input into a prediction network to obtain a prediction result, where each sample set of the plurality of sample sets comprises a background histogram sample and a corresponding foreground histogram sample; a loss value is determined according to the prediction result and a standard histogram corresponding to the sample set; and, when the loss value is greater than a loss threshold, parameters of the prediction network are adjusted according to the loss value.

Description

Method for training prediction network, image processing method and device
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular to computer vision and deep learning techniques.
Background
With the rise of remote work in recent years, video conferencing software is used more and more widely. One commonly used function in video conferencing software is background replacement: the portrait in the foreground of an image is extracted, and the background of the image is then replaced, which protects privacy and produces a better meeting appearance, among other benefits.
Disclosure of Invention
The disclosure provides a method for training a prediction network, an image processing method, an apparatus, a device and a storage medium.
According to an aspect of the present disclosure, there is provided a method of training a prediction network, including: for each sample set of a plurality of sample sets, inputting the sample set into a prediction network, wherein each sample set of the plurality of sample sets comprises a background histogram sample and a corresponding foreground histogram sample; determining a loss value according to a standard histogram corresponding to the sample set and the prediction result; and adjusting a parameter of the prediction network according to the loss value when the loss value is greater than a loss threshold value.
According to another aspect of the present disclosure, there is provided an image processing method including: determining a background histogram of a background image and a first foreground histogram of a first foreground image; inputting the background histogram and the first foreground histogram into a prediction network to obtain a prediction histogram; determining a first target foreground image according to the prediction histogram and the first foreground histogram; and synthesizing the background image and the first target foreground image to obtain a first target image, wherein the prediction network is trained according to the method for training the prediction network disclosed by the embodiment of the disclosure.
According to another aspect of the present disclosure, there is provided an apparatus for training a prediction network, including: an input module, configured to input a plurality of sample sets into a prediction network, wherein each sample set of the plurality of sample sets comprises a background histogram sample and a corresponding foreground histogram sample; a loss determining module, configured to determine a loss value according to the standard histogram corresponding to the sample set and the prediction result; and an adjusting module, configured to adjust a parameter of the prediction network according to the loss value when the loss value is greater than a loss threshold.
According to another aspect of the present disclosure, there is provided an image processing apparatus including: the first determining module is used for determining a background histogram of a background image and a first foreground histogram of a first foreground image; the input module is used for inputting the background histogram and the first foreground histogram into a prediction network to obtain a prediction histogram; a second determining module, configured to determine a first target foreground image according to the prediction histogram and the first foreground histogram; and a synthesis module, configured to synthesize the background image and the first target foreground image to obtain a first target image, where the prediction network is trained according to the method for training a prediction network according to the embodiment of the present disclosure.
Another aspect of the present disclosure provides an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the embodiments of the present disclosure.
According to another aspect of the embodiments of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method shown in the embodiments of the present disclosure.
According to another aspect of the embodiments of the present disclosure, there is provided a computer program product comprising computer programs/instructions, characterized in that the computer programs/instructions, when executed by a processor, implement the steps of the method shown in the embodiments of the present disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1A schematically illustrates a background image schematic according to an embodiment of the disclosure;
FIG. 1B schematically shows a video image schematic according to an embodiment of the disclosure;
FIG. 1C schematically illustrates a composite image schematic according to an embodiment of the disclosure;
FIG. 2 schematically illustrates a flow diagram of a method of training a predictive network according to an embodiment of the disclosure;
FIG. 3 schematically shows a predictive network schematic according to an embodiment of the disclosure;
FIG. 4 schematically illustrates a method of training a predictive network according to an embodiment of the disclosure;
FIG. 5 schematically shows a flow chart of an image processing method according to an embodiment of the present disclosure;
FIG. 6 schematically shows a flow chart of an image processing method according to another embodiment of the present disclosure;
FIG. 7 schematically illustrates a block diagram of an apparatus for training a predictive network, in accordance with an embodiment of the present disclosure;
fig. 8 schematically shows a block diagram of an image processing apparatus according to an embodiment of the present disclosure; and
FIG. 9 schematically shows a block diagram of an example electronic device that may be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
An application scenario of the method and apparatus provided by the present disclosure will be described below with reference to fig. 1A to 1C.
It should be noted that the following is only an example of an application scenario in which the embodiments of the present disclosure may be applied, given to help those skilled in the art understand the technical content of the present disclosure; it does not imply that the embodiments of the present disclosure cannot be applied to other devices, systems, environments, or scenarios.
The image processing method according to the embodiment of the present disclosure may be used, for example, to replace a background portion in an image.
Based on this, fig. 1A schematically illustrates a background image schematic according to an embodiment of the present disclosure. As shown in fig. 1A, the background image 110 illustratively demonstrates an office environment.
Fig. 1B schematically shows a video image schematic according to an embodiment of the disclosure. As shown in fig. 1B, the video image diagram 120 includes a human body image 121.
According to the embodiment of the present disclosure, the background of the human body image 121 in the video image 120 may be replaced, either to protect privacy or to achieve a better video effect. To this end, the human body image 121 can be extracted from the video image 120. Then, the human body image 121 is used as a foreground image and synthesized with the background image 110 to obtain a synthesized image 130.
Fig. 1C schematically illustrates a composite image schematic according to an embodiment of the disclosure. As shown in fig. 1C, the background portion of the original video image 120 is replaced by the background image 110, so that the human body image 121 and the background image 110 are combined into the composite image 130, creating the effect of the person appearing in the office environment.
However, when the background image 110 and the video image 120 were shot in different environments, the human body image 121 and the background image 110 in the composite image 130 may differ greatly in color, illumination, and the like, giving the composite image 130 a strong sense of incongruity.
Based on this, according to embodiments of the present disclosure, a prediction network may be trained in advance. The inputs of the prediction network are the histogram of a background picture and the histogram of a foreground picture, and the output is another histogram, which carries the color and illumination information appropriate for compositing the foreground picture onto the background picture. On this basis, a background histogram of the background image 110 and a foreground histogram of the human body image 121, used as the foreground image, can be determined. The background histogram and the foreground histogram are then input into the pre-trained prediction network to obtain a prediction histogram. Next, a target foreground image is determined based on the prediction histogram and the foreground histogram. The background image 110 and the target foreground image are synthesized to obtain the composite image 130, in which the color and illumination of the foreground better match the background image 110, reducing the sense of incongruity.
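To make the pipeline concrete, the following is a minimal Python sketch of this inference flow, assuming uint8 RGB numpy images and a binary foreground mask. Here `prediction_net` and `match_to_histogram` are stand-ins for the trained prediction network and the migration step (both sketched later in this document), and the alpha-compositing step is an assumption about how the synthesis is performed.

```python
# Minimal sketch of the inference pipeline, assuming uint8 RGB numpy images.
# `prediction_net` and `match_to_histogram` are stand-ins for the trained
# prediction network and the migration step, both sketched further below.
import numpy as np

def rgb_histogram(image: np.ndarray) -> np.ndarray:
    """Per-channel 256-bin histogram, shape (3, 256), as described above."""
    return np.stack([
        np.bincount(image[..., c].ravel(), minlength=256).astype(np.float32)
        for c in range(3)
    ])

def composite(background: np.ndarray, foreground: np.ndarray,
              mask: np.ndarray) -> np.ndarray:
    """Alpha-composite the foreground onto the background with a 0..1 mask
    (an assumed synthesis step; the text only says 'synthesize')."""
    m = mask[..., None].astype(np.float32)
    return (m * foreground + (1.0 - m) * background).astype(np.uint8)

def relight_and_composite(background, foreground, mask,
                          prediction_net, match_to_histogram):
    bg_hist = rgb_histogram(background)           # background histogram
    fg_hist = rgb_histogram(foreground)           # foreground histogram
    pred_hist = prediction_net(bg_hist, fg_hist)  # prediction histogram
    target_fg = match_to_histogram(foreground, pred_hist)  # relit foreground
    return composite(background, target_fg, mask)
```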
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of the personal information involved all comply with the relevant laws and regulations and do not violate public order and good morals.
Fig. 2 schematically shows a flow diagram of a method of training a predictive network according to an embodiment of the disclosure.
As shown in fig. 2, in operation S210 of the method 200 of training a prediction network, for each sample set of a plurality of sample sets, the sample set is input into the prediction network to obtain a prediction result.
According to an embodiment of the present disclosure, each sample set of the plurality of sample sets comprises a background histogram sample and a corresponding foreground histogram sample.
Then, in operation S220, a loss value is determined according to the standard histogram corresponding to the sample set and the prediction result.
In operation S230, in case the loss value is greater than the loss threshold value, a parameter of the prediction network is adjusted according to the loss value.
According to embodiments of the present disclosure, the background histogram samples and the corresponding foreground histogram samples in a sample set may be generated from a standard image, where the standard image comprises a background portion and a foreground portion. The foreground image and the background image in the standard image can be extracted. The foreground image may then be adjusted so that it no longer matches the background image, for example in color or illumination. The adjustment may be performed at least once, yielding at least one adjusted foreground image, i.e. at least one target foreground image. Next, a histogram of each target foreground image may be determined as a foreground histogram sample, a histogram of the background image may be determined as the background histogram sample, and a histogram of the standard image may be determined as the standard histogram.
In this embodiment, for example, a foreground histogram sample and a background histogram sample corresponding to the same standard image may be taken as one sample set. It will be appreciated that in training the prediction network, the sample set may have a standard histogram of the standard image as the expected output.
For example, in the present embodiment, images captured in real scenes may be collected as standard images.
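As a sketch of how one group of samples might be generated from a single standard image, assuming a foreground segmentation mask is available and reading "adjusting" as random color/illumination perturbation (the text does not fix a particular perturbation), and reusing `rgb_histogram` from the sketch above:

```python
# Hedged sketch of sample-set generation from one standard image. The gamma
# and gain jitter below is an assumed way to make the foreground no longer
# match the background. Masked-out pixels land in histogram bin 0 here; a
# real implementation might histogram only the pixels inside the mask.
import numpy as np

def perturb_foreground(foreground: np.ndarray,
                       rng: np.random.Generator) -> np.ndarray:
    gamma = rng.uniform(0.5, 2.0)   # illumination-like perturbation
    gain = rng.uniform(0.6, 1.4)    # brightness/color perturbation
    out = 255.0 * (foreground.astype(np.float32) / 255.0) ** gamma * gain
    return np.clip(out, 0, 255).astype(np.uint8)

def make_sample_sets(standard_image, mask, rng, num_adjustments=4):
    foreground = (standard_image * mask[..., None]).astype(np.uint8)
    background = (standard_image * (1 - mask)[..., None]).astype(np.uint8)
    bg_hist = rgb_histogram(background)       # background histogram sample
    std_hist = rgb_histogram(standard_image)  # standard histogram (expected output)
    samples = []
    for _ in range(num_adjustments):          # adjust at least once
        target_fg = perturb_foreground(foreground, rng)
        fg_hist = rgb_histogram(target_fg)    # foreground histogram sample
        samples.append((bg_hist, fg_hist, std_hist))
    return samples
```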
The predictive network shown above is further described with reference to fig. 3 in conjunction with specific embodiments.
Fig. 3 schematically illustrates a predictive network schematic according to an embodiment of the disclosure.
As shown in fig. 3, prediction network 310 may include a plurality of one-dimensional convolutional layers. Illustratively, in the present embodiment, the prediction network 310 may include, for example, 8 one-dimensional convolutional layers.
According to an embodiment of the present disclosure, the inputs to the prediction network 310 may be a background histogram 301 and a foreground histogram 302. The background histogram 301 and the foreground histogram 302 may each include histograms of 3 channels, the 3 channels being the red, green, and blue channels. The size of the background histogram 301 and of the foreground histogram 302 may be 3 × 256, where 3 represents the 3 channels and 256 represents the 256 luminance values in the pixel range [0, 255].
According to embodiments of the present disclosure, the output of the prediction network 310 may be a histogram 303. The histogram 303 also includes histograms of 3 channels, which are the same as the background histogram 301 and the foreground histogram 302, and the size is also 3 × 256.
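A PyTorch sketch of such a network follows. The text fixes the depth (8 one-dimensional convolutional layers) and the 3 × 256 input and output sizes, but not how the two input histograms are combined nor the hidden widths, kernel sizes, or activations; concatenating the histograms into a 6 × 256 tensor and the other hyperparameters here are assumptions.

```python
# Sketch of the prediction network: eight Conv1d layers mapping two (3, 256)
# histograms to one (3, 256) histogram. Channel concatenation, hidden width,
# kernel size, and ReLU activations are assumptions, not the patent's spec.
import torch
import torch.nn as nn

class HistogramPredictionNet(nn.Module):
    def __init__(self, hidden: int = 64):
        super().__init__()
        layers, in_ch = [], 6  # 3 background + 3 foreground channels
        for _ in range(7):     # first seven conv layers
            layers += [nn.Conv1d(in_ch, hidden, kernel_size=3, padding=1),
                       nn.ReLU()]
            in_ch = hidden
        # eighth conv layer projects back to 3 channels of 256 bins
        layers.append(nn.Conv1d(hidden, 3, kernel_size=3, padding=1))
        self.net = nn.Sequential(*layers)

    def forward(self, bg_hist: torch.Tensor, fg_hist: torch.Tensor):
        # bg_hist, fg_hist: (batch, 3, 256) -> prediction: (batch, 3, 256)
        return self.net(torch.cat([bg_hist, fg_hist], dim=1))
```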
The method for training the prediction network described above is further described with reference to fig. 4 in conjunction with specific embodiments.
Fig. 4 schematically illustrates a method of training a predictive network according to an embodiment of the disclosure.
As shown in fig. 4, a sample set may be obtained from a plurality of sample sets, where the sample set includes a background histogram sample Fa and a foreground histogram sample Fb, and the expected output of the sample set is a standard histogram Fc.
According to the embodiment of the disclosure, the background histogram sample Fa and the corresponding foreground histogram sample Fb can be input into the prediction network to obtain the prediction result Fd. Then, a loss value between the prediction result Fd and the standard histogram Fc is calculated using a loss function. The loss value indicates the magnitude of the difference between the prediction result Fd and the expected output; in the present embodiment, the expected output is the standard histogram Fc corresponding to the input sample set.
Next, it is determined whether the loss value is greater than a loss threshold. If the loss value is greater than the loss threshold, the parameters of the prediction network are adjusted, and another sample set is then selected to continue the training process. If the loss value is less than or equal to the loss threshold, training ends. According to the embodiments of the present disclosure, the loss threshold may be determined according to actual needs, and its specific value is not limited by the present disclosure.
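The loop of fig. 4 might look as follows; the optimizer, learning rate, and threshold value are assumptions, and `loss_fn` stands in for the loss function described next.

```python
# Sketch of the fig. 4 training loop: predict Fd, compare with Fc, and stop
# once the loss no longer exceeds the threshold. Adam and the numeric values
# are assumptions; the text leaves the threshold to actual needs.
import torch

def train(net, sample_sets, loss_fn, loss_threshold=1e-3, lr=1e-4):
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    for fa, fb, fc in sample_sets:  # background, foreground, standard histograms
        fd = net(fa, fb)            # prediction result Fd
        loss = loss_fn(fd, fc)      # loss between Fd and standard histogram Fc
        if loss.item() <= loss_threshold:
            break                   # loss <= threshold: end training
        optimizer.zero_grad()
        loss.backward()             # adjust the prediction network parameters
        optimizer.step()
```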
Exemplarily, in the present embodiment, the loss value may be calculated according to the following formula, for example:
[Formula image: the loss L computed from the pixel counts x_i and y_i over the n luminance values]
where L is the loss value, n is the total number of luminance values in the prediction result, y_i is the number of pixels corresponding to the i-th luminance value in the standard histogram, and x_i is the number of pixels corresponding to the i-th luminance value in the prediction result.
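Since the exact expression is reproduced only as an image in the source, the sketch below assumes a mean squared error over the per-value pixel counts; this is one plausible reading of the variable definitions above, not necessarily the patented formula.

```python
# Assumed loss: mean squared difference between the pixel counts y_i of the
# standard histogram and x_i of the prediction, over the n luminance values.
import torch

def histogram_loss(prediction: torch.Tensor, standard: torch.Tensor):
    # prediction (x_i) and standard (y_i): tensors of shape (batch, 3, 256)
    return torch.mean((standard - prediction) ** 2)
```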
According to the embodiment of the disclosure, the histogram output by the prediction network can be used to perform a relighting process on the foreground image, so that the color and illumination of the foreground image better match the background.
Based on this, fig. 5 schematically shows a flow chart of an image processing method according to an embodiment of the present disclosure.
As shown in fig. 5, the image processing method 500 includes determining a background histogram of a background image and a foreground histogram of a foreground image in operation S510.
In operation S520, the background histogram and the foreground histogram are input to a prediction network, resulting in a prediction histogram.
According to an embodiment of the present disclosure, a prediction network is trained according to a method of training a prediction network of an embodiment of the present disclosure.
In operation S530, a target foreground image is determined according to the prediction histogram and the foreground histogram.
According to an embodiment of the present disclosure, operation S530 may include, for example, performing histogram equalization on the prediction histogram and the foreground histogram, and then performing migration processing on the foreground histogram, with the equalized prediction histogram as the target, to obtain the target foreground image.
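Reading this equalization-plus-migration step as classical histogram matching (an interpretation of the text, not a quote of the patented procedure), a per-channel sketch, reusing `rgb_histogram` from the earlier sketch:

```python
# Sketch of operation S530 as classical histogram matching: each foreground
# pixel value is remapped to the value whose cumulative mass under the
# prediction histogram equals its cumulative mass under the foreground
# histogram. This reading of "migration processing" is an assumption.
import numpy as np

def match_channel(channel: np.ndarray, fg_hist: np.ndarray,
                  pred_hist: np.ndarray) -> np.ndarray:
    fg_cdf = np.cumsum(fg_hist) / max(fg_hist.sum(), 1.0)        # equalized source
    pred_cdf = np.cumsum(pred_hist) / max(pred_hist.sum(), 1.0)  # equalized target
    # lookup table: source value -> target value with the same cumulative mass
    lut = np.searchsorted(pred_cdf, fg_cdf).clip(0, 255)
    return lut[channel].astype(np.uint8)

def match_to_histogram(foreground: np.ndarray,
                       pred_hist: np.ndarray) -> np.ndarray:
    fg_hist = rgb_histogram(foreground)
    return np.stack([match_channel(foreground[..., c], fg_hist[c], pred_hist[c])
                     for c in range(3)], axis=-1)
```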
In operation S540, the background image and the target foreground image are synthesized to obtain a target image.
According to the embodiment of the disclosure, the histogram output by the prediction network is used to perform relighting on the foreground image, so that the color and illumination of the foreground image better match the background. Therefore, after the foreground image and the background image are synthesized, the result is more realistic and natural.
The image processing method according to the embodiment of the present disclosure can also be used for processing frame images in a video stream. Based on this, fig. 6 schematically shows a flow chart of an image processing method according to another embodiment of the present disclosure.
As shown in fig. 6, the image processing method 600 includes acquiring an initial frame image in a video stream in operation S610.
According to an embodiment of the present disclosure, the initial frame image may be, for example, a frame image corresponding to a first frame in a video stream.
In operation S620, an image including a target object in an initial frame image is extracted as a first foreground image.
According to an embodiment of the present disclosure, the target object may include, for example, a human body. According to other embodiments of the present disclosure, the target object may also include other objects besides a human body that need to be placed in the foreground.
In operation S630, a background histogram of the background image and a first foreground histogram of the first foreground image are determined.
In operation S640, the background histogram and the first foreground histogram are input to a prediction network, resulting in a prediction histogram.
In operation S650, a first target foreground image is determined according to the prediction histogram and the first foreground histogram.
According to an embodiment of the present disclosure, for example, histogram equalization may be performed on the prediction histogram and the first foreground histogram. Then, with the equalized prediction histogram as the target, migration processing is performed on the first foreground histogram to obtain the first target foreground image.
In operation S660, the background image and the first target foreground image are synthesized to obtain a first target image.
In operation S670, other frame images in the video stream except the initial frame image are acquired. Then, operations S680 to S6110 may be performed for each of the other frame images.
In operation S680, an image including the target object in the frame image is extracted as a second foreground image.
In operation S690, a second foreground histogram of the second foreground image is determined.
In operation S6100, a second target foreground image is determined according to the prediction histogram and the second foreground histogram.
According to the embodiment of the present disclosure, this may be done, for example, in the same way as the first target foreground image was determined above, which is not repeated here.
In operation S6110, the background image and the second target foreground image are synthesized to obtain a second target image.
According to the embodiments of the present disclosure, only the first frame needs to be processed by the prediction network; each subsequent frame can directly perform migration processing using the previously obtained prediction histogram, so processing is fast. In addition, in the related art, directly applying a single-image relighting method frame by frame to a video stream often causes problems such as jitter. When the image processing method according to the embodiments of the present disclosure processes the frame images of a video stream, all frame images are migrated toward the same prediction histogram, so the timing is stable and no such jitter occurs.
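A sketch of this video pipeline, reusing the helpers from the earlier sketches; `segment_target` stands in for an assumed person-segmentation step, and the numpy/torch conversion around the network call is glossed over for brevity.

```python
# Sketch of the fig. 6 flow: the prediction network runs only on the initial
# frame, and every later frame reuses the same prediction histogram, which is
# what gives the temporal stability described above. `segment_target` is an
# assumed segmentation step; tensor conversion for `net` is omitted.
def process_stream(frames, background, net, segment_target):
    pred_hist = None
    for frame in frames:
        mask = segment_target(frame)              # extract the target object
        foreground = (frame * mask[..., None]).astype("uint8")
        if pred_hist is None:                     # initial frame only
            bg_hist = rgb_histogram(background)   # background histogram
            fg_hist = rgb_histogram(foreground)   # first foreground histogram
            pred_hist = net(bg_hist, fg_hist)     # prediction histogram, reused
        target_fg = match_to_histogram(foreground, pred_hist)
        yield composite(background, target_fg, mask)
```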
The following further describes an apparatus for training a prediction network according to an embodiment of the present disclosure.
Fig. 7 schematically illustrates a block diagram of an apparatus for training a predictive network according to an embodiment of the present disclosure.
As shown in fig. 7, the apparatus 700 for training a prediction network includes an input module 710, a loss determination module 720, and an adjustment module 730.
An input module 710, configured to, for each sample set of a plurality of sample sets, input the sample set into a prediction network, where each sample set of the plurality of sample sets comprises a background histogram sample and a corresponding foreground histogram sample.
And a loss determining module 720, configured to determine a loss value according to the prediction result and the standard histogram corresponding to the sample set.
And an adjusting module 730, configured to adjust a parameter of the prediction network according to the loss value when the loss value is greater than the loss threshold.
According to an embodiment of the present disclosure, the loss determination module may include a calculation sub-module for calculating the loss value according to the following formula:
[Formula image: the loss L computed from the pixel counts x_i and y_i over the n luminance values]
where L is the loss value, n is the total number of luminance values in the prediction result, y_i is the number of pixels corresponding to the i-th luminance value in the standard histogram, and x_i is the number of pixels corresponding to the i-th luminance value in the prediction result.
According to an embodiment of the present disclosure, the apparatus may further include an image extraction module, an adjustment module, a first histogram determination module, a second histogram determination module, and a third histogram determination module. The image extraction module is configured to, for each standard image of at least one standard image, extract a foreground image and a background image from the standard image. The adjustment module is configured to adjust the foreground image to obtain at least one target foreground image. The first histogram determination module is configured to determine a histogram of the at least one target foreground image as the foreground histogram sample. The second histogram determination module is configured to determine a histogram of the background image as the background histogram sample. The third histogram determination module is configured to determine a histogram of the standard image as the standard histogram.
According to embodiments of the present disclosure, a prediction network may include, for example, a plurality of one-dimensional convolutional layers.
The image processing apparatus shown in the embodiments of the present disclosure is further described below.
Fig. 8 schematically shows a block diagram of an image processing apparatus according to an embodiment of the present disclosure.
As shown in fig. 8, the image processing apparatus 800 includes a first determining module 810, an input module 820, a second determining module 830, and a synthesizing module 840.
A first determining module 810 for determining a background histogram of a background image and a first foreground histogram of a first foreground image.
And an input module 820, configured to input the background histogram and the first foreground histogram into the prediction network to obtain a prediction histogram.
A second determining module 830, configured to determine the first target foreground image according to the prediction histogram and the first foreground histogram.
The synthesizing module 840 is configured to synthesize the background image and the first target foreground image to obtain a first target image. Wherein the prediction network is trained according to the method for training the prediction network of the embodiment of the disclosure.
According to an embodiment of the present disclosure, the second determining module may include an equalization processing sub-module and a migration processing sub-module. The equalization processing sub-module is configured to perform histogram equalization on the prediction histogram and the first foreground histogram. The migration processing sub-module is configured to take the equalized prediction histogram as a target and perform migration processing on the first foreground histogram to obtain the first target foreground image.
According to an embodiment of the present disclosure, the apparatus may further include a first obtaining module and a first extraction module. The first obtaining module is configured to obtain an initial frame image in a video stream. The first extraction module is configured to extract an image containing the target object in the initial frame image as the first foreground image.
According to an embodiment of the present disclosure, the apparatus may further include a second obtaining module, a second extraction module, a third determining module, a fourth determining module, and a second synthesis module. The second obtaining module is configured to obtain the frame images in the video stream other than the initial frame image. The second extraction module is configured to extract, for each of the other frame images, an image containing the target object in the frame image as a second foreground image. The third determining module is configured to determine a second foreground histogram of the second foreground image. The fourth determining module is configured to determine a second target foreground image according to the prediction histogram and the second foreground histogram. The second synthesis module is configured to synthesize the background image and the second target foreground image to obtain a second target image.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 9 schematically shows a block diagram of an example electronic device 900 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the device 900 includes a computing unit 901, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 902 or a computer program loaded from a storage unit 908 into a random access memory (RAM) 903. The RAM 903 can also store various programs and data required for the operation of the device 900. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
A number of components in the device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, and the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, optical disk, or the like; and a communication unit 909 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 901 performs the respective methods and processes described above, such as the method of training the prediction network and the image processing method. For example, in some embodiments, the method of training a predictive network and the image processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 900 via ROM 902 and/or communications unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the method of training a prediction network and the image processing method described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the method of training the prediction network and the image processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (19)

1. A method of training a predictive network, comprising:
for each sample set of a plurality of sample sets, inputting the sample set into a prediction network to obtain a prediction result, wherein each sample set of the plurality of sample sets comprises a background histogram sample and a corresponding foreground histogram sample;
determining a loss value according to a standard histogram corresponding to the sample set and the prediction result; and
adjusting parameters of the prediction network according to the loss value when the loss value is greater than a loss threshold.
2. The method of claim 1, wherein the determining a loss value from the standard histogram corresponding to the set of samples and the prediction comprises:
calculating the loss value according to the following formula:
[Formula image: the loss L computed from the pixel counts x_i and y_i over the n luminance values]
wherein L is the loss value, n is the total number of luminance values in the prediction result, y_i is the number of pixels corresponding to the i-th luminance value in the standard histogram, and x_i is the number of pixels corresponding to the i-th luminance value in the prediction result.
3. The method of claim 1, further comprising: for each of the at least one standard image,
extracting a foreground image and a background image in the standard image;
adjusting the foreground image to obtain at least one target foreground image;
determining a histogram of the at least one target foreground image as the foreground histogram sample;
determining a histogram of the background image as a background histogram sample; and
determining a histogram of the standard image as the standard histogram.
4. The method of any of claims 1-3, wherein the prediction network comprises a plurality of one-dimensional convolutional layers.
5. An image processing method comprising:
determining a background histogram of a background image and a first foreground histogram of a first foreground image;
inputting the background histogram and the first foreground histogram into a prediction network to obtain a prediction histogram;
determining a first target foreground image according to the prediction histogram and the first foreground histogram; and
synthesizing the background image and the first target foreground image to obtain a first target image,
wherein the predictive network is trained according to the method of any one of claims 1-4.
6. The method of claim 5, wherein said determining a first target foreground image from the prediction histogram and the first foreground histogram comprises:
performing histogram equalization processing on the prediction histogram and the first foreground histogram; and
taking the equalized prediction histogram as a target, and performing migration processing on the first foreground histogram to obtain the first target foreground image.
7. The method of claim 5 or 6, further comprising:
acquiring an initial frame image in a video stream; and
extracting an image containing a target object in the initial frame image as the first foreground image.
8. The method of claim 7, further comprising:
acquiring other frame images except the initial frame image in the video stream;
for each of the other frame images,
extracting an image containing a target object in the frame image as a second foreground image;
determining a second foreground histogram of a second foreground image;
determining a second target foreground image according to the prediction histogram and the second foreground histogram; and
synthesizing the background image and the second target foreground image to obtain a second target image.
9. An apparatus for training a predictive network, comprising:
an input module to input a plurality of sample sets into a prediction network, wherein each sample set of the plurality of sample sets comprises a background histogram sample and a corresponding foreground histogram sample;
a loss determining module, configured to determine a loss value according to the standard histogram corresponding to the sample set and the prediction result; and
an adjusting module, configured to adjust the parameters of the prediction network according to the loss value when the loss value is greater than the loss threshold.
10. The apparatus of claim 9, wherein the loss determination module comprises:
a calculation submodule for calculating the loss value according to the following formula:
[Formula image: the loss L computed from the pixel counts x_i and y_i over the n luminance values]
wherein L is the loss value, n is the total number of luminance values in the prediction result, y_i is the number of pixels corresponding to the i-th luminance value in the standard histogram, and x_i is the number of pixels corresponding to the i-th luminance value in the prediction result.
11. The apparatus of claim 9, further comprising:
an image extraction module, configured to, for each standard image of at least one standard image, extract a foreground image and a background image from the standard image;
an adjustment module, configured to adjust the foreground image to obtain at least one target foreground image;
a first histogram determination module, configured to determine a histogram of the at least one target foreground image as the foreground histogram sample;
a second histogram determining module, configured to determine a histogram of the background image as the background histogram sample; and
a third histogram determination module, configured to determine a histogram of the standard image as the standard histogram.
12. The apparatus of any of claims 9-11, wherein the prediction network comprises a plurality of one-dimensional convolutional layers.
13. An image processing apparatus comprising:
the first determining module is used for determining a background histogram of a background image and a first foreground histogram of a first foreground image;
the input module is used for inputting the background histogram and the first foreground histogram into a prediction network to obtain a prediction histogram;
a second determining module, configured to determine a first target foreground image according to the prediction histogram and the first foreground histogram; and
a first synthesis module for synthesizing the background image and the first target foreground image to obtain a first target image,
wherein the predictive network is trained according to the method of any one of claims 1-4.
14. The apparatus of claim 13, wherein the second determining module comprises:
an equalization processing sub-module, configured to perform histogram equalization processing on the prediction histogram and the first foreground histogram; and
a migration processing sub-module, configured to take the equalized prediction histogram as a target and perform migration processing on the first foreground histogram to obtain a first target foreground image.
15. The apparatus of claim 13 or 14, further comprising:
the first acquisition module is used for acquiring an initial frame image in a video stream; and
and the first extraction module is used for extracting an image containing a target object in the initial frame image as the first foreground image.
16. The apparatus of claim 15, further comprising:
the second acquisition module is used for acquiring other frame images except the initial frame image in the video stream;
a second extraction module, configured to extract, for each frame image in the other frame images, an image that includes a target object in the frame image as a second foreground image;
a third determining module for determining a second foreground histogram of the second foreground image;
a fourth determining module, configured to determine a second target foreground image according to the prediction histogram and the second foreground histogram; and
a second synthesis module, configured to synthesize the background image and the second target foreground image to obtain a second target image.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-8.
19. A computer program product comprising computer program/instructions, characterized in that the computer program/instructions, when executed by a processor, implement the steps of the method according to any of claims 1-8.
CN202111279847.4A 2021-10-29 2021-10-29 Method for training prediction network, image processing method and device Pending CN113988294A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111279847.4A CN113988294A (en) 2021-10-29 2021-10-29 Method for training prediction network, image processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111279847.4A CN113988294A (en) 2021-10-29 2021-10-29 Method for training prediction network, image processing method and device

Publications (1)

Publication Number Publication Date
CN113988294A 2022-01-28

Family

ID=79745156

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111279847.4A Pending CN113988294A (en) 2021-10-29 2021-10-29 Method for training prediction network, image processing method and device

Country Status (1)

Country Link
CN (1) CN113988294A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115293960A (en) * 2022-07-28 2022-11-04 珠海视熙科技有限公司 Illumination adjusting method, device, equipment and medium for fused image
CN115293960B (en) * 2022-07-28 2023-09-29 珠海视熙科技有限公司 Illumination adjustment method, device, equipment and medium for fused image

Similar Documents

Publication Publication Date Title
CN112633384B (en) Object recognition method and device based on image recognition model and electronic equipment
CN112492388B (en) Video processing method, device, equipment and storage medium
CN113163260B (en) Video frame output control method and device and electronic equipment
US20180314916A1 (en) Object detection with adaptive channel features
KR20220126264A (en) Video jitter detection method and device, electronic equipment and storage medium
CN111768356A (en) Face image fusion method and device, electronic equipment and storage medium
CN113365146B (en) Method, apparatus, device, medium and article of manufacture for processing video
CN113177451A (en) Training method and device of image processing model, electronic equipment and storage medium
CN115345968B (en) Virtual object driving method, deep learning network training method and device
US20230047748A1 (en) Method of fusing image, and method of training image fusion model
CN114449343A (en) Video processing method, device, equipment and storage medium
CN113014936A (en) Video frame insertion method, device, equipment and storage medium
CN112732553A (en) Image testing method and device, electronic equipment and storage medium
CN111784757A (en) Training method of depth estimation model, depth estimation method, device and equipment
CN113379877A (en) Face video generation method and device, electronic equipment and storage medium
CN113988294A (en) Method for training prediction network, image processing method and device
CN113873323B (en) Video playing method, device, electronic equipment and medium
CN114173158B (en) Face recognition method, cloud device, client device, electronic device and medium
CN116668843A (en) Shooting state switching method and device, electronic equipment and storage medium
CN113887435A (en) Face image processing method, device, equipment, storage medium and program product
CN113409199A (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN113691866B (en) Video processing method, device, electronic equipment and medium
CN116071422B (en) Method and device for adjusting brightness of virtual equipment facing meta-universe scene
CN113283305B (en) Face recognition method, device, electronic equipment and computer readable storage medium
CN115631103B (en) Training method and device for image generation model, and image generation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination