CN113988294A - Method for training prediction network, image processing method and device

Method for training prediction network, image processing method and device

Info

Publication number: CN113988294A
Authority: CN (China)
Prior art keywords: histogram, image, foreground, prediction, target
Legal status: Pending (the status listed is an assumption and is not a legal conclusion)
Application number: CN202111279847.4A
Other languages: Chinese (zh)
Inventors: 林天威, 李甫
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date / Filing date: 2021-10-29
Publication date: 2022-01-28

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00: Image enhancement or restoration
    • G06T 5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T 7/00: Image analysis
    • G06T 7/10: Segmentation; Edge detection
    • G06T 7/194: Segmentation; Edge detection involving foreground-background segmentation
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20212: Image combination
    • G06T 2207/20221: Image fusion; Image merging


Abstract

The disclosure provides a method for training a prediction network, an image processing method, an image processing apparatus, a device, and a storage medium, relating to the field of artificial intelligence and in particular to computer vision and deep learning technology. One implementation scheme is as follows: for each sample set of a plurality of sample sets, the sample set is input into a prediction network to obtain a prediction result, where each sample set of the plurality of sample sets comprises a background histogram sample and a corresponding foreground histogram sample; a loss value is determined according to the prediction result and a standard histogram corresponding to the sample set; and, when the loss value is greater than a loss threshold, parameters of the prediction network are adjusted according to the loss value.

Description

Method for training prediction network, image processing method and device
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular to computer vision and deep learning techniques.
Background
With the rise of remote work in recent years, video conferencing software is used more and more widely. One commonly used function in video conferencing software is background replacement: the portrait in the foreground of an image is extracted, and the background of the image is then replaced, which protects privacy and produces a better meeting appearance, among other benefits.
Disclosure of Invention
The disclosure provides a method for training a prediction network, an image processing method, an apparatus, a device and a storage medium.
According to an aspect of the present disclosure, there is provided a method of training a prediction network, including: for each sample set of a plurality of sample sets, inputting the sample set into a prediction network, wherein each sample set of the plurality of sample sets comprises a background histogram sample and a corresponding foreground histogram sample; determining a loss value according to a standard histogram corresponding to the sample set and the prediction result; and adjusting a parameter of the prediction network according to the loss value when the loss value is greater than a loss threshold value.
According to another aspect of the present disclosure, there is provided an image processing method including: determining a background histogram of a background image and a first foreground histogram of a first foreground image; inputting the background histogram and the first foreground histogram into a prediction network to obtain a prediction histogram; determining a first target foreground image according to the prediction histogram and the first foreground histogram; and synthesizing the background image and the first target foreground image to obtain a first target image, wherein the prediction network is trained according to the method for training the prediction network disclosed by the embodiment of the disclosure.
According to another aspect of the present disclosure, there is provided an apparatus for training a prediction network, including: an input module, configured to input a plurality of sample sets into a prediction network, wherein each sample set of the plurality of sample sets comprises a background histogram sample and a corresponding foreground histogram sample; a loss determining module, configured to determine a loss value according to the standard histogram corresponding to the sample set and the prediction result; and an adjusting module, configured to adjust a parameter of the prediction network according to the loss value when the loss value is greater than a loss threshold.
According to another aspect of the present disclosure, there is provided an image processing apparatus including: the first determining module is used for determining a background histogram of a background image and a first foreground histogram of a first foreground image; the input module is used for inputting the background histogram and the first foreground histogram into a prediction network to obtain a prediction histogram; a second determining module, configured to determine a first target foreground image according to the prediction histogram and the first foreground histogram; and a synthesis module, configured to synthesize the background image and the first target foreground image to obtain a first target image, where the prediction network is trained according to the method for training a prediction network according to the embodiment of the present disclosure.
Another aspect of the present disclosure provides an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the embodiments of the present disclosure.
According to another aspect of the embodiments of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method shown in the embodiments of the present disclosure.
According to another aspect of the embodiments of the present disclosure, there is provided a computer program product comprising computer programs/instructions, characterized in that the computer programs/instructions, when executed by a processor, implement the steps of the method shown in the embodiments of the present disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1A schematically illustrates a background image schematic according to an embodiment of the disclosure;
FIG. 1B schematically shows a video image schematic according to an embodiment of the disclosure;
FIG. 1C schematically illustrates a composite image schematic according to an embodiment of the disclosure;
FIG. 2 schematically illustrates a flow diagram of a method of training a predictive network according to an embodiment of the disclosure;
FIG. 3 schematically shows a predictive network schematic according to an embodiment of the disclosure;
FIG. 4 schematically illustrates a method of training a predictive network according to an embodiment of the disclosure;
FIG. 5 schematically shows a flow chart of an image processing method according to an embodiment of the present disclosure;
FIG. 6 schematically shows a flow chart of an image processing method according to another embodiment of the present disclosure;
FIG. 7 schematically illustrates a block diagram of an apparatus for training a predictive network, in accordance with an embodiment of the present disclosure;
fig. 8 schematically shows a block diagram of an image processing apparatus according to an embodiment of the present disclosure; and
FIG. 9 schematically shows a block diagram of an example electronic device that may be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
An application scenario of the method and apparatus provided by the present disclosure will be described below with reference to fig. 1A to 1C.
It should be noted that the following is only an example of an application scenario in which the embodiments of the present disclosure may be applied, given to help those skilled in the art understand the technical content of the present disclosure; it does not imply that the embodiments of the present disclosure cannot be applied to other devices, systems, environments, or scenarios.
The image processing method according to the embodiment of the present disclosure may be used, for example, to replace a background portion in an image.
Based on this, fig. 1A schematically illustrates a background image schematic according to an embodiment of the present disclosure. As shown in fig. 1A, the background image 110 illustratively demonstrates an office environment.
Fig. 1B schematically shows a video image schematic according to an embodiment of the disclosure. As shown in fig. 1B, the video image diagram 120 includes a human body image 121.
According to the embodiment of the present disclosure, the background of the human body image 121 in the video image 120 may be replaced, either to protect privacy or to achieve a better video effect. To this end, the human body image 121 can be extracted from the video image 120. Then, the human body image 121 is used as a foreground image and synthesized with the background image 110 to obtain a synthesized image 130.
Fig. 1C schematically illustrates a composite image schematic according to an embodiment of the disclosure. As shown in fig. 1C, the background portion of the original video image 120 is replaced by the background image 110, so that the human body image 121 and the background image 110 are combined into the composite image 130, creating the effect of the person appearing in the office environment.
However, when the background image 110 and the video image 120 were shot in different environments, the human body image 121 and the background image 110 in the composite image 130 may differ greatly in color, illumination, and the like, giving the composite image 130 a strong sense of incongruity.
Based on this, according to embodiments of the present disclosure, a prediction network may be trained in advance. The inputs of the prediction network are the histogram of a background picture and the histogram of a foreground picture, and the output is another histogram, which carries the color and illumination information appropriate for compositing the foreground picture onto the background picture. On this basis, a background histogram of the background image 110 and a foreground histogram of the human body image 121, used as the foreground image, can be determined. The background histogram and the foreground histogram are then input into the pre-trained prediction network to obtain a prediction histogram. Next, a target foreground image is determined based on the prediction histogram and the foreground histogram. The background image 110 and the target foreground image are synthesized to obtain the composite image 130, in which the color and illumination of the foreground better match the background image 110, reducing the sense of incongruity.
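To make the pipeline concrete, the following is a minimal Python sketch of this inference flow, assuming uint8 RGB numpy images and a binary foreground mask. Here `prediction_net` and `match_to_histogram` are stand-ins for the trained prediction network and the migration step (both sketched later in this document), and the alpha-compositing step is an assumption about how the synthesis is performed.

```python
# Minimal sketch of the inference pipeline, assuming uint8 RGB numpy images.
# `prediction_net` and `match_to_histogram` are stand-ins for the trained
# prediction network and the migration step, both sketched further below.
import numpy as np

def rgb_histogram(image: np.ndarray) -> np.ndarray:
    """Per-channel 256-bin histogram, shape (3, 256), as described above."""
    return np.stack([
        np.bincount(image[..., c].ravel(), minlength=256).astype(np.float32)
        for c in range(3)
    ])

def composite(background: np.ndarray, foreground: np.ndarray,
              mask: np.ndarray) -> np.ndarray:
    """Alpha-composite the foreground onto the background with a 0..1 mask
    (an assumed synthesis step; the text only says 'synthesize')."""
    m = mask[..., None].astype(np.float32)
    return (m * foreground + (1.0 - m) * background).astype(np.uint8)

def relight_and_composite(background, foreground, mask,
                          prediction_net, match_to_histogram):
    bg_hist = rgb_histogram(background)           # background histogram
    fg_hist = rgb_histogram(foreground)           # foreground histogram
    pred_hist = prediction_net(bg_hist, fg_hist)  # prediction histogram
    target_fg = match_to_histogram(foreground, pred_hist)  # relit foreground
    return composite(background, target_fg, mask)
```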
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of the personal information involved all comply with the relevant laws and regulations and do not violate public order and good morals.
Fig. 2 schematically shows a flow diagram of a method of training a predictive network according to an embodiment of the disclosure.
As shown in fig. 2, in operation S210 of the method 200 of training a prediction network, for each sample set of a plurality of sample sets, the sample set is input into the prediction network to obtain a prediction result.
According to an embodiment of the present disclosure, each sample set of the plurality of sample sets comprises a background histogram sample and a corresponding foreground histogram sample.
Then, in operation S220, a loss value is determined according to the standard histogram corresponding to the sample set and the prediction result.
In operation S230, in case the loss value is greater than the loss threshold value, a parameter of the prediction network is adjusted according to the loss value.
According to embodiments of the present disclosure, the background histogram samples and the corresponding foreground histogram samples in a sample set may be generated from a standard image, where the standard image comprises a background portion and a foreground portion. The foreground image and the background image in the standard image can be extracted. The foreground image may then be adjusted so that it no longer matches the background image, for example in color or illumination. The adjustment may be performed at least once, yielding at least one adjusted foreground image, i.e. at least one target foreground image. Next, a histogram of each target foreground image may be determined as a foreground histogram sample, a histogram of the background image may be determined as the background histogram sample, and a histogram of the standard image may be determined as the standard histogram.
In this embodiment, for example, a foreground histogram sample and a background histogram sample corresponding to the same standard image may be taken as one sample set. It will be appreciated that in training the prediction network, the sample set may have a standard histogram of the standard image as the expected output.
For example, in the present embodiment, images captured in real scenes may be collected as standard images.
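As a sketch of how one group of samples might be generated from a single standard image, assuming a foreground segmentation mask is available and reading "adjusting" as random color/illumination perturbation (the text does not fix a particular perturbation), and reusing `rgb_histogram` from the sketch above:

```python
# Hedged sketch of sample-set generation from one standard image. The gamma
# and gain jitter below is an assumed way to make the foreground no longer
# match the background. Masked-out pixels land in histogram bin 0 here; a
# real implementation might histogram only the pixels inside the mask.
import numpy as np

def perturb_foreground(foreground: np.ndarray,
                       rng: np.random.Generator) -> np.ndarray:
    gamma = rng.uniform(0.5, 2.0)   # illumination-like perturbation
    gain = rng.uniform(0.6, 1.4)    # brightness/color perturbation
    out = 255.0 * (foreground.astype(np.float32) / 255.0) ** gamma * gain
    return np.clip(out, 0, 255).astype(np.uint8)

def make_sample_sets(standard_image, mask, rng, num_adjustments=4):
    foreground = (standard_image * mask[..., None]).astype(np.uint8)
    background = (standard_image * (1 - mask)[..., None]).astype(np.uint8)
    bg_hist = rgb_histogram(background)       # background histogram sample
    std_hist = rgb_histogram(standard_image)  # standard histogram (expected output)
    samples = []
    for _ in range(num_adjustments):          # adjust at least once
        target_fg = perturb_foreground(foreground, rng)
        fg_hist = rgb_histogram(target_fg)    # foreground histogram sample
        samples.append((bg_hist, fg_hist, std_hist))
    return samples
```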
The predictive network shown above is further described with reference to fig. 3 in conjunction with specific embodiments.
Fig. 3 schematically illustrates a predictive network schematic according to an embodiment of the disclosure.
As shown in fig. 3, prediction network 310 may include a plurality of one-dimensional convolutional layers. Illustratively, in the present embodiment, the prediction network 310 may include, for example, 8 one-dimensional convolutional layers.
According to an embodiment of the present disclosure, the inputs to the prediction network 310 may be a background histogram 301 and a foreground histogram 302. The background histogram 301 and the foreground histogram 302 may each include histograms of 3 channels, the 3 channels being the red, green, and blue channels. The size of the background histogram 301 and of the foreground histogram 302 may be 3 × 256, where 3 represents the 3 channels and 256 represents the 256 luminance values in the pixel range [0, 255].
According to embodiments of the present disclosure, the output of the prediction network 310 may be a histogram 303. The histogram 303 also includes histograms of 3 channels, which are the same as the background histogram 301 and the foreground histogram 302, and the size is also 3 × 256.
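A PyTorch sketch of such a network follows. The text fixes the depth (8 one-dimensional convolutional layers) and the 3 × 256 input and output sizes, but not how the two input histograms are combined nor the hidden widths, kernel sizes, or activations; concatenating the histograms into a 6 × 256 tensor and the other hyperparameters here are assumptions.

```python
# Sketch of the prediction network: eight Conv1d layers mapping two (3, 256)
# histograms to one (3, 256) histogram. Channel concatenation, hidden width,
# kernel size, and ReLU activations are assumptions, not the patent's spec.
import torch
import torch.nn as nn

class HistogramPredictionNet(nn.Module):
    def __init__(self, hidden: int = 64):
        super().__init__()
        layers, in_ch = [], 6  # 3 background + 3 foreground channels
        for _ in range(7):     # first seven conv layers
            layers += [nn.Conv1d(in_ch, hidden, kernel_size=3, padding=1),
                       nn.ReLU()]
            in_ch = hidden
        # eighth conv layer projects back to 3 channels of 256 bins
        layers.append(nn.Conv1d(hidden, 3, kernel_size=3, padding=1))
        self.net = nn.Sequential(*layers)

    def forward(self, bg_hist: torch.Tensor, fg_hist: torch.Tensor):
        # bg_hist, fg_hist: (batch, 3, 256) -> prediction: (batch, 3, 256)
        return self.net(torch.cat([bg_hist, fg_hist], dim=1))
```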
The method for training the prediction network described above is further described with reference to fig. 4 in conjunction with specific embodiments.
Fig. 4 schematically illustrates a method of training a predictive network according to an embodiment of the disclosure.
As shown in fig. 4, a sample set may be obtained from a plurality of sample sets, where the sample set includes a background histogram sample Fa and a foreground histogram sample Fb, and the expected output of the sample set is a standard histogram Fc.
According to the embodiment of the disclosure, the background histogram sample Fa and the corresponding foreground histogram sample Fb can be input into the prediction network to obtain the prediction result Fd. Then, a loss value between the prediction result Fd and the standard histogram Fc is calculated using a loss function. The loss value indicates the magnitude of the difference between the prediction result Fd and the expected output; in the present embodiment, the expected output is the standard histogram Fc corresponding to the input sample set.
Next, it is determined whether the loss value is greater than a loss threshold. If the loss value is greater than the loss threshold, the parameters of the prediction network are adjusted, and another sample set is then selected to continue the training process. If the loss value is less than or equal to the loss threshold, training ends. According to the embodiments of the present disclosure, the loss threshold may be determined according to actual needs, and its specific value is not limited by the present disclosure.
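The loop of fig. 4 might look as follows; the optimizer, learning rate, and threshold value are assumptions, and `loss_fn` stands in for the loss function described next.

```python
# Sketch of the fig. 4 training loop: predict Fd, compare with Fc, and stop
# once the loss no longer exceeds the threshold. Adam and the numeric values
# are assumptions; the text leaves the threshold to actual needs.
import torch

def train(net, sample_sets, loss_fn, loss_threshold=1e-3, lr=1e-4):
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    for fa, fb, fc in sample_sets:  # background, foreground, standard histograms
        fd = net(fa, fb)            # prediction result Fd
        loss = loss_fn(fd, fc)      # loss between Fd and standard histogram Fc
        if loss.item() <= loss_threshold:
            break                   # loss <= threshold: end training
        optimizer.zero_grad()
        loss.backward()             # adjust the prediction network parameters
        optimizer.step()
```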
Exemplarily, in the present embodiment, the loss value may be calculated according to the following formula, for example:
[Formula image: the loss L computed from the pixel counts x_i and y_i over the n luminance values]
where L is the loss value, n is the total number of luminance values in the prediction result, y_i is the number of pixels corresponding to the i-th luminance value in the standard histogram, and x_i is the number of pixels corresponding to the i-th luminance value in the prediction result.
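Since the exact expression is reproduced only as an image in the source, the sketch below assumes a mean squared error over the per-value pixel counts; this is one plausible reading of the variable definitions above, not necessarily the patented formula.

```python
# Assumed loss: mean squared difference between the pixel counts y_i of the
# standard histogram and x_i of the prediction, over the n luminance values.
import torch

def histogram_loss(prediction: torch.Tensor, standard: torch.Tensor):
    # prediction (x_i) and standard (y_i): tensors of shape (batch, 3, 256)
    return torch.mean((standard - prediction) ** 2)
```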
According to the embodiment of the disclosure, the histogram output by the prediction network can be used to perform a relighting process on the foreground image, so that the color and illumination of the foreground image better match the background.
Based on this, fig. 5 schematically shows a flow chart of an image processing method according to an embodiment of the present disclosure.
As shown in fig. 5, the image processing method 500 includes determining a background histogram of a background image and a foreground histogram of a foreground image in operation S510.
In operation S520, the background histogram and the foreground histogram are input to a prediction network, resulting in a prediction histogram.
According to an embodiment of the present disclosure, a prediction network is trained according to a method of training a prediction network of an embodiment of the present disclosure.
In operation S530, a target foreground image is determined according to the prediction histogram and the foreground histogram.
According to an embodiment of the present disclosure, operation S530 may include, for example, performing histogram equalization on the prediction histogram and the foreground histogram, and then performing migration processing on the foreground histogram, with the equalized prediction histogram as the target, to obtain the target foreground image.
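Reading this equalization-plus-migration step as classical histogram matching (an interpretation of the text, not a quote of the patented procedure), a per-channel sketch, reusing `rgb_histogram` from the earlier sketch:

```python
# Sketch of operation S530 as classical histogram matching: each foreground
# pixel value is remapped to the value whose cumulative mass under the
# prediction histogram equals its cumulative mass under the foreground
# histogram. This reading of "migration processing" is an assumption.
import numpy as np

def match_channel(channel: np.ndarray, fg_hist: np.ndarray,
                  pred_hist: np.ndarray) -> np.ndarray:
    fg_cdf = np.cumsum(fg_hist) / max(fg_hist.sum(), 1.0)        # equalized source
    pred_cdf = np.cumsum(pred_hist) / max(pred_hist.sum(), 1.0)  # equalized target
    # lookup table: source value -> target value with the same cumulative mass
    lut = np.searchsorted(pred_cdf, fg_cdf).clip(0, 255)
    return lut[channel].astype(np.uint8)

def match_to_histogram(foreground: np.ndarray,
                       pred_hist: np.ndarray) -> np.ndarray:
    fg_hist = rgb_histogram(foreground)
    return np.stack([match_channel(foreground[..., c], fg_hist[c], pred_hist[c])
                     for c in range(3)], axis=-1)
```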
In operation S540, the background image and the target foreground image are synthesized to obtain a target image.
According to the embodiment of the disclosure, the histogram output by the prediction network is used to perform relighting on the foreground image, so that the color and illumination of the foreground image better match the background. Therefore, after the foreground image and the background image are synthesized, the result is more realistic and natural.
The image processing method according to the embodiment of the present disclosure can also be used for processing frame images in a video stream. Based on this, fig. 6 schematically shows a flow chart of an image processing method according to another embodiment of the present disclosure.
As shown in fig. 6, the image processing method 600 includes acquiring an initial frame image in a video stream in operation S610.
According to an embodiment of the present disclosure, the initial frame image may be, for example, a frame image corresponding to a first frame in a video stream.
In operation S620, an image including a target object in an initial frame image is extracted as a first foreground image.
According to an embodiment of the present disclosure, the target object may include, for example, a human body. According to other embodiments of the present disclosure, the target object may also include other objects besides a human body that need to be placed in the foreground.
In operation S630, a background histogram of the background image and a first foreground histogram of the first foreground image are determined.
In operation S640, the background histogram and the first foreground histogram are input to a prediction network, resulting in a prediction histogram.
In operation S650, a first target foreground image is determined according to the prediction histogram and the first foreground histogram.
According to an embodiment of the present disclosure, for example, histogram equalization may be performed on the prediction histogram and the first foreground histogram. Then, with the equalized prediction histogram as the target, migration processing is performed on the first foreground histogram to obtain the first target foreground image.
In operation S660, the background image and the first target foreground image are synthesized to obtain a first target image.
In operation S670, other frame images in the video stream except the initial frame image are acquired. Then, operations S680 to S6110 may be performed for each of the other frame images.
In operation S680, an image including the target object in the frame image is extracted as a second foreground image.
In operation S690, a second foreground histogram of the second foreground image is determined.
In operation S6100, a second target foreground image is determined according to the prediction histogram and the second foreground histogram.
According to the embodiment of the present disclosure, this may be done, for example, in the same way as the first target foreground image was determined above, which is not repeated here.
In operation S6110, the background image and the second target foreground image are synthesized to obtain a second target image.
According to the embodiments of the present disclosure, only the first frame needs to be processed by the prediction network; each subsequent frame can directly perform migration processing using the previously obtained prediction histogram, so processing is fast. In addition, in the related art, directly applying a single-image relighting method frame by frame to a video stream often causes problems such as jitter. When the image processing method according to the embodiments of the present disclosure processes the frame images of a video stream, all frame images are migrated toward the same prediction histogram, so the timing is stable and no such jitter occurs.
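A sketch of this video pipeline, reusing the helpers from the earlier sketches; `segment_target` stands in for an assumed person-segmentation step, and the numpy/torch conversion around the network call is glossed over for brevity.

```python
# Sketch of the fig. 6 flow: the prediction network runs only on the initial
# frame, and every later frame reuses the same prediction histogram, which is
# what gives the temporal stability described above. `segment_target` is an
# assumed segmentation step; tensor conversion for `net` is omitted.
def process_stream(frames, background, net, segment_target):
    pred_hist = None
    for frame in frames:
        mask = segment_target(frame)              # extract the target object
        foreground = (frame * mask[..., None]).astype("uint8")
        if pred_hist is None:                     # initial frame only
            bg_hist = rgb_histogram(background)   # background histogram
            fg_hist = rgb_histogram(foreground)   # first foreground histogram
            pred_hist = net(bg_hist, fg_hist)     # prediction histogram, reused
        target_fg = match_to_histogram(foreground, pred_hist)
        yield composite(background, target_fg, mask)
```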
The following further describes an apparatus for training a prediction network according to an embodiment of the present disclosure.
Fig. 7 schematically illustrates a block diagram of an apparatus for training a predictive network according to an embodiment of the present disclosure.
As shown in fig. 7, the apparatus 700 for training a prediction network includes an input module 710, a loss determination module 720, and an adjustment module 730.
An input module 710, configured to, for each sample set of a plurality of sample sets, input the sample set into a prediction network, where each sample set of the plurality of sample sets comprises a background histogram sample and a corresponding foreground histogram sample.
And a loss determining module 720, configured to determine a loss value according to the prediction result and the standard histogram corresponding to the sample set.
And an adjusting module 730, configured to adjust a parameter of the prediction network according to the loss value when the loss value is greater than the loss threshold.
According to an embodiment of the present disclosure, the loss determination module may include a calculation sub-module for calculating the loss value according to the following formula:
[Formula image: the loss L computed from the pixel counts x_i and y_i over the n luminance values]
where L is the loss value, n is the total number of luminance values in the prediction result, y_i is the number of pixels corresponding to the i-th luminance value in the standard histogram, and x_i is the number of pixels corresponding to the i-th luminance value in the prediction result.
According to an embodiment of the present disclosure, the apparatus may further include an image extraction module, an adjustment module, a first histogram determination module, a second histogram determination module, and a third histogram determination module. The image extraction module is configured to, for each standard image of at least one standard image, extract a foreground image and a background image from the standard image. The adjustment module is configured to adjust the foreground image to obtain at least one target foreground image. The first histogram determination module is configured to determine a histogram of the at least one target foreground image as the foreground histogram sample. The second histogram determination module is configured to determine a histogram of the background image as the background histogram sample. The third histogram determination module is configured to determine a histogram of the standard image as the standard histogram.
According to embodiments of the present disclosure, a prediction network may include, for example, a plurality of one-dimensional convolutional layers.
The image processing apparatus shown in the embodiments of the present disclosure is further described below.
Fig. 8 schematically shows a block diagram of an image processing apparatus according to an embodiment of the present disclosure.
As shown in fig. 8, the image processing apparatus 800 includes a first determining module 810, an input module 820, a second determining module 830, and a synthesizing module 840.
A first determining module 810 for determining a background histogram of a background image and a first foreground histogram of a first foreground image.
And an input module 820, configured to input the background histogram and the first foreground histogram into the prediction network to obtain a prediction histogram.
A second determining module 830, configured to determine the first target foreground image according to the prediction histogram and the first foreground histogram.
The synthesizing module 840 is configured to synthesize the background image and the first target foreground image to obtain a first target image. Wherein the prediction network is trained according to the method for training the prediction network of the embodiment of the disclosure.
According to an embodiment of the present disclosure, the second determining module may include an equalization processing sub-module and a migration processing sub-module. The equalization processing sub-module is configured to perform histogram equalization on the prediction histogram and the first foreground histogram. The migration processing sub-module is configured to take the equalized prediction histogram as a target and perform migration processing on the first foreground histogram to obtain the first target foreground image.
According to an embodiment of the present disclosure, the apparatus may further include a first obtaining module and a first extraction module. The first obtaining module is configured to obtain an initial frame image in a video stream. The first extraction module is configured to extract an image containing the target object in the initial frame image as the first foreground image.
According to an embodiment of the present disclosure, the apparatus may further include a second obtaining module, a second extraction module, a third determining module, a fourth determining module, and a second synthesis module. The second obtaining module is configured to obtain the frame images in the video stream other than the initial frame image. The second extraction module is configured to extract, for each of the other frame images, an image containing the target object in the frame image as a second foreground image. The third determining module is configured to determine a second foreground histogram of the second foreground image. The fourth determining module is configured to determine a second target foreground image according to the prediction histogram and the second foreground histogram. The second synthesis module is configured to synthesize the background image and the second target foreground image to obtain a second target image.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 9 schematically shows a block diagram of an example electronic device 900 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the device 900 includes a computing unit 901, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 902 or a computer program loaded from a storage unit 908 into a random access memory (RAM) 903. The RAM 903 can also store various programs and data required for the operation of the device 900. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
A number of components in the device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, and the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, optical disk, or the like; and a communication unit 909 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 901 performs the respective methods and processes described above, such as the method of training the prediction network and the image processing method. For example, in some embodiments, the method of training a predictive network and the image processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 900 via ROM 902 and/or communications unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the method of training a prediction network and the image processing method described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the method of training the prediction network and the image processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (19)

1. A method of training a predictive network, comprising:
for each sample set of a plurality of sample sets, inputting the sample set into a prediction network to obtain a prediction result, wherein each sample set of the plurality of sample sets comprises a background histogram sample and a corresponding foreground histogram sample;
determining a loss value according to a standard histogram corresponding to the sample set and the prediction result; and
adjusting parameters of the prediction network according to the loss value when the loss value is greater than a loss threshold.
2. The method of claim 1, wherein the determining a loss value from the standard histogram corresponding to the set of samples and the prediction comprises:
calculating the loss value according to the following formula:
[Formula image: the loss L computed from the pixel counts x_i and y_i over the n luminance values]
wherein L is the loss value, n is the total number of luminance values in the prediction result, y_i is the number of pixels corresponding to the i-th luminance value in the standard histogram, and x_i is the number of pixels corresponding to the i-th luminance value in the prediction result.
3. The method of claim 1, further comprising: for each of the at least one standard image,
extracting a foreground image and a background image in the standard image;
adjusting the foreground image to obtain at least one target foreground image;
determining a histogram of the at least one target foreground image as the foreground histogram sample;
determining a histogram of the background image as a background histogram sample; and
determining a histogram of the standard image as the standard histogram.
4. The method of any of claims 1-3, wherein the prediction network comprises a plurality of one-dimensional convolutional layers.
5. An image processing method comprising:
determining a background histogram of a background image and a first foreground histogram of a first foreground image;
inputting the background histogram and the first foreground histogram into a prediction network to obtain a prediction histogram;
determining a first target foreground image according to the prediction histogram and the first foreground histogram; and
synthesizing the background image and the first target foreground image to obtain a first target image,
wherein the predictive network is trained according to the method of any one of claims 1-4.
6. The method of claim 5, wherein said determining a first target foreground image from the prediction histogram and the first foreground histogram comprises:
performing histogram equalization processing on the prediction histogram and the first foreground histogram; and
taking the equalized prediction histogram as a target, and performing migration processing on the first foreground histogram to obtain the first target foreground image.
7. The method of claim 5 or 6, further comprising:
acquiring an initial frame image in a video stream; and
extracting an image containing a target object in the initial frame image as the first foreground image.
8. The method of claim 7, further comprising:
acquiring other frame images except the initial frame image in the video stream;
for each of the other frame images,
extracting an image containing a target object in the frame image as a second foreground image;
determining a second foreground histogram of a second foreground image;
determining a second target foreground image according to the prediction histogram and the second foreground histogram; and
synthesizing the background image and the second target foreground image to obtain a second target image.
9. An apparatus for training a predictive network, comprising:
an input module to input a plurality of sample sets into a prediction network, wherein each sample set of the plurality of sample sets comprises a background histogram sample and a corresponding foreground histogram sample;
a loss determining module, configured to determine a loss value according to the standard histogram corresponding to the sample set and the prediction result; and
an adjusting module, configured to adjust the parameters of the prediction network according to the loss value when the loss value is greater than the loss threshold.
10. The apparatus of claim 9, wherein the loss determination module comprises:
a calculation submodule for calculating the loss value according to the following formula:
[Formula image: the loss L computed from the pixel counts x_i and y_i over the n luminance values]
wherein L is the loss value, n is the total number of luminance values in the prediction result, y_i is the number of pixels corresponding to the i-th luminance value in the standard histogram, and x_i is the number of pixels corresponding to the i-th luminance value in the prediction result.
11. The apparatus of claim 9, further comprising:
an image extraction module, configured to, for each standard image of at least one standard image, extract a foreground image and a background image from the standard image;
an adjustment module, configured to adjust the foreground image to obtain at least one target foreground image;
a first histogram determination module, configured to determine a histogram of the at least one target foreground image as the foreground histogram sample;
a second histogram determining module, configured to determine a histogram of the background image as the background histogram sample; and
a third histogram determination module, configured to determine a histogram of the standard image as the standard histogram.
12. The apparatus of any of claims 9-11, wherein the prediction network comprises a plurality of one-dimensional convolutional layers.
13. An image processing apparatus comprising:
the first determining module is used for determining a background histogram of a background image and a first foreground histogram of a first foreground image;
the input module is used for inputting the background histogram and the first foreground histogram into a prediction network to obtain a prediction histogram;
a second determining module, configured to determine a first target foreground image according to the prediction histogram and the first foreground histogram; and
a first synthesis module for synthesizing the background image and the first target foreground image to obtain a first target image,
wherein the predictive network is trained according to the method of any one of claims 1-4.
14. The apparatus of claim 13, wherein the second determining module comprises:
an equalization processing sub-module, configured to perform histogram equalization processing on the prediction histogram and the first foreground histogram; and
a migration processing sub-module, configured to take the equalized prediction histogram as a target and perform migration processing on the first foreground histogram to obtain a first target foreground image.
15. The apparatus of claim 13 or 14, further comprising:
the first acquisition module is used for acquiring an initial frame image in a video stream; and
and the first extraction module is used for extracting an image containing a target object in the initial frame image as the first foreground image.
16. The apparatus of claim 15, further comprising:
the second acquisition module is used for acquiring other frame images except the initial frame image in the video stream;
a second extraction module, configured to extract, for each frame image in the other frame images, an image that includes a target object in the frame image as a second foreground image;
a third determining module for determining a second foreground histogram of the second foreground image;
a fourth determining module, configured to determine a second target foreground image according to the prediction histogram and the second foreground histogram; and
a second synthesis module, configured to synthesize the background image and the second target foreground image to obtain a second target image.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-8.
19. A computer program product comprising computer program/instructions, characterized in that the computer program/instructions, when executed by a processor, implement the steps of the method according to any of claims 1-8.
CN202111279847.4A 2021-10-29 2021-10-29 Method for training prediction network, image processing method and device Pending CN113988294A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111279847.4A CN113988294A (en) 2021-10-29 2021-10-29 Method for training prediction network, image processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111279847.4A CN113988294A (en) 2021-10-29 2021-10-29 Method for training prediction network, image processing method and device

Publications (1)

Publication Number Publication Date
CN113988294A 2022-01-28

Family

ID=79745156

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111279847.4A Pending CN113988294A (en) 2021-10-29 2021-10-29 Method for training prediction network, image processing method and device

Country Status (1)

Country Link
CN (1) CN113988294A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115293960A (en) * 2022-07-28 2022-11-04 珠海视熙科技有限公司 Illumination adjusting method, device, equipment and medium for fused image
CN115293960B (en) * 2022-07-28 2023-09-29 珠海视熙科技有限公司 Illumination adjustment method, device, equipment and medium for fused image

Similar Documents

Publication Publication Date Title
CN112633384B (en) Object recognition method and device based on image recognition model and electronic equipment
CN112492388B (en) Video processing method, device, equipment and storage medium
CN113163260B (en) Video frame output control method and device and electronic equipment
US20180314916A1 (en) Object detection with adaptive channel features
KR20220126264A (en) Video jitter detection method and device, electronic equipment and storage medium
CN111768356A (en) Face image fusion method and device, electronic equipment and storage medium
CN113365146B (en) Method, apparatus, device, medium and article of manufacture for processing video
CN113177451A (en) Training method and device of image processing model, electronic equipment and storage medium
CN115345968B (en) Virtual object driving method, deep learning network training method and device
US20230047748A1 (en) Method of fusing image, and method of training image fusion model
CN114449343A (en) Video processing method, device, equipment and storage medium
CN113014936A (en) Video frame insertion method, device, equipment and storage medium
CN112732553A (en) Image testing method and device, electronic equipment and storage medium
CN111784757A (en) Training method of depth estimation model, depth estimation method, device and equipment
CN113379877A (en) Face video generation method and device, electronic equipment and storage medium
CN113988294A (en) Method for training prediction network, image processing method and device
CN113873323B (en) Video playing method, device, electronic equipment and medium
CN114173158B (en) Face recognition method, cloud device, client device, electronic device and medium
CN116668843A (en) Shooting state switching method and device, electronic equipment and storage medium
CN113887435A (en) Face image processing method, device, equipment, storage medium and program product
CN113409199A (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN113691866B (en) Video processing method, device, electronic equipment and medium
CN116071422B (en) Method and device for adjusting brightness of virtual equipment facing meta-universe scene
CN113283305B (en) Face recognition method, device, electronic equipment and computer readable storage medium
CN115631103B (en) Training method and device for image generation model, and image generation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination