CN111833372B - Foreground target extraction method and device

Foreground target extraction method and device

Info

Publication number
CN111833372B
Authority
CN
China
Prior art keywords
target
target image
foreground
image
neural network
Prior art date
Legal status
Active
Application number
CN202010728063.4A
Other languages
Chinese (zh)
Other versions
CN111833372A (en)
Inventor
张迪
潘华东
殷俊
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd
Priority to CN202010728063.4A
Publication of CN111833372A
Application granted
Publication of CN111833372B
Status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a foreground target extraction method and device. The method comprises the following steps: acquiring a target image captured by a camera; inputting the target image into a pre-trained target neural network model to obtain feature information of the target image output by a backbone network of the model; inputting the feature information into a full-connection layer of the model to obtain a classification result of the target image; and inputting, according to the classification result, the feature information of the target image into a corresponding foreground target segmentation structure to obtain a foreground target image corresponding to the target image. This solves the problem in the related art that human body segmentation based on video images extracts foreground human bodies from static images poorly. Since the parameters in the target neural network model are obtained directly through training, no manual intervention is needed, the method is highly practical, and foreground human bodies are extracted effectively from static images.

Description

Foreground target extraction method and device
Technical Field
The invention relates to the field of image processing, in particular to a foreground object extraction method and device.
Background
Object segmentation technology mainly studies object segmentation based on panoramic pictures, segmenting general objects in images. In the field of gait recognition, the usual scheme is to first apply a human body segmentation algorithm and then use a gait recognition model to extract features.
Traditional human body segmentation algorithms have obvious limitations. They can be roughly divided into two types: semantic segmentation and instance segmentation. Semantic segmentation algorithms typically use multi-scale segmentation networks, such as DeepLab and PSPNet, which are designed to solve the panoramic object semantic segmentation problem; in the segmentation process, the size and scale of the target, the color gamut and so on greatly influence the segmentation result. Instance segmentation algorithms generally perform instance segmentation on all types of targets in the panoramic picture and assign different human bodies to different classes. Both kinds of algorithm segment the panoramic image but cannot distinguish foreground targets from background targets, so they cannot be used in algorithms that analyze foreground targets.
A human body segmentation method based on a convolutional neural network has been proposed in the related art, including the following steps. Step S3: the network parameters of the human body segmentation model are randomly initialized, and the data set is iterated repeatedly to update the network parameters. Step S4: the human body position in the video image is predicted to indicate a region of interest in the video image. Step S5: human body segmentation is performed on the region of interest of step S4 to obtain the human body in the video image. This convolutional-neural-network-based method can identify and segment the human body in real time during human motion, particularly high-speed motion; it better meets the real-time requirements of video and offers higher identification accuracy and stability than traditional techniques. It also places low quality requirements on the video or image: human body identification and segmentation can still be completed when the definition of the video image is poor. However, because it performs human body segmentation based on video images, its extraction of foreground human bodies from static images is poor.
No solution has yet been proposed for the problem in the related art that human body segmentation based on video images extracts foreground human bodies from static images poorly.
Disclosure of Invention
The embodiments of the invention provide a foreground target extraction method and device, to at least solve the problem in the related art that human body segmentation based on video images extracts foreground human bodies from static images poorly.
According to an embodiment of the present invention, there is provided a foreground object extraction method including:
acquiring a target image acquired by a camera;
Inputting the target image into a pre-trained target neural network model to obtain characteristic information of the target image output by a backbone network of the target neural network model;
Inputting the characteristic information into a full-connection layer of the target neural network model to obtain a classification result of the target image;
and inputting the characteristic information of the target image into a corresponding foreground target segmentation structure according to the classification result to obtain a foreground target image corresponding to the target image.
Optionally, inputting the feature information into two full-connection layers of the target neural network model to obtain the classification result of the target image includes:
inputting the feature information of the target image into the two full-connection layers of the target neural network model under the condition that the target image contains a foreground target and the foreground target proportion is greater than or equal to a preset ratio, and obtaining a first classification result as the classification result of the target image;
and inputting the feature information of the target image into the two full-connection layers of the target neural network model under the condition that the target image contains a foreground target and the foreground target proportion is smaller than the preset ratio, and obtaining a second classification result as the classification result of the target image.
Optionally, inputting the feature information of the target image into a corresponding foreground target segmentation structure according to the classification result, and obtaining the foreground target image corresponding to the target image includes:
Inputting the feature information of the target image into a corresponding first foreground target segmentation structure under the condition that the classification result is the first classification result, to obtain a foreground target image corresponding to the target image; or
Inputting the characteristic information of the target image into a corresponding first foreground target segmentation structure under the condition that the classification result is the first classification result to obtain a foreground target image corresponding to the target image; and under the condition that the classification result is the second classification result, inputting the characteristic information of the target image into a corresponding second foreground target segmentation structure to obtain a foreground target image corresponding to the target image.
Optionally, before inputting the target image into a pre-trained target neural network model, and obtaining feature information of the target image output by a backbone network of the target neural network model, the method further includes:
Constructing a data set, wherein the data set comprises a predetermined number of images and foreground targets corresponding to the images, and the images comprise a first type of image and a second type of image, wherein a first-type image is an image that contains a foreground target and whose foreground target proportion is greater than or equal to a preset ratio, and a second-type image is an image that contains a foreground target and whose foreground target proportion is smaller than the preset ratio;
Training an original neural network model by using the preset number of images and foreground targets corresponding to the images to obtain the target neural network model, wherein the preset number of images are input into the original neural network model, and the foreground target images corresponding to the target images output by the trained target neural network model and the foreground targets actually corresponding to the target images meet a target loss function.
Optionally, training an original neural network model using the predetermined number of images and the foreground object corresponding to the images, to obtain the target neural network model includes:
inputting the preset number of images into the original neural network model to obtain the preset number of characteristic information output by a backbone network of the original neural network model;
Training two full-connection layers of the original neural network model according to the characteristic information of the preset number of images, wherein the characteristic information of the preset number of images is input into the two full-connection layers of the original neural network model, and the classification result corresponding to the target image output by the two full-connection layers of the trained original neural network model and the classification result actually corresponding to the target image meet a first loss function;
Training a first foreground target segmentation structure of the original neural network model using the feature information of the first type of image, wherein the feature information of the first type of image is the input of the first foreground target segmentation structure of the original neural network model, and the foreground target corresponding to the target image output by the trained first foreground target segmentation structure of the original neural network model and the foreground target actually corresponding to the target image satisfy a second loss function; and/or
Training the second foreground object segmentation structure of the original neural network model by using the characteristic information of the second class image, wherein the characteristic information of the second class image is input into the second foreground object segmentation structure of the original neural network model, and a third loss function is satisfied by the foreground object corresponding to the target image and the foreground object actually corresponding to the target image output by the trained second foreground object segmentation structure of the original neural network model, wherein the target loss function is determined according to the first loss function and the second loss function, or the target loss function is determined according to the first loss function, the second loss function and the third loss function.
Optionally, constructing the dataset includes:
for each image in the predetermined number of images in the dataset, acquiring gray values of all pixels of a foreground object corresponding to each image;
setting mask image labels for all pixels in each image, wherein if the pixels belong to a foreground object, the mask image labels corresponding to the pixels are 1, and if the pixels belong to a background, the mask image labels corresponding to the pixels are 0;
and marking whether each image is an image that contains a foreground target and whose foreground target proportion is greater than or equal to a preset ratio.
According to another embodiment of the present invention, there is also provided a foreground object extraction apparatus including:
the acquisition module is used for acquiring a target image acquired by the camera;
the first input module is used for inputting the target image into a pre-trained target neural network model to obtain the characteristic information of the target image output by a backbone network of the target neural network model;
the second input module is used for inputting the characteristic information into the full-connection layer of the target neural network model to obtain a classification result of the target image;
And the third input module is used for inputting the characteristic information of the target image into the corresponding foreground target segmentation structure according to the classification result to obtain the foreground target image corresponding to the target image.
Optionally, the second input module includes:
The first input sub-module is used for inputting the feature information of the target image into the two full-connection layers of the target neural network model under the condition that the target image contains a foreground target and the foreground target proportion is greater than or equal to a preset ratio, so that the classification result of the target image is a first classification result;
and the second input sub-module is used for inputting the feature information of the target image into the two full-connection layers of the target neural network model under the condition that the target image contains a foreground target and the foreground target proportion is smaller than the preset ratio, so that the classification result of the target image is a second classification result.
Optionally, the third input module includes:
The third input sub-module is used for inputting the feature information of the target image into a corresponding first foreground target segmentation structure to obtain a foreground target image corresponding to the target image, under the condition that the classification result is the first classification result; or
The fourth input sub-module is used for inputting the characteristic information of the target image into a corresponding first foreground target segmentation structure to obtain a foreground target image corresponding to the target image under the condition that the classification result is the first classification result; and under the condition that the classification result is the second classification result, inputting the characteristic information of the target image into a corresponding second foreground target segmentation structure to obtain a foreground target image corresponding to the target image.
Optionally, the apparatus further comprises:
A construction module, configured to construct a data set, wherein the data set comprises a predetermined number of images and foreground targets corresponding to the images, and the images comprise a first type of image and a second type of image, wherein a first-type image is an image that contains a foreground target and whose foreground target proportion is greater than or equal to a preset ratio, and a second-type image is an image that contains a foreground target and whose foreground target proportion is smaller than the preset ratio;
The training module is used for training an original neural network model by using the images with the preset number and the foreground targets corresponding to the images to obtain the target neural network model, wherein the images with the preset number are input of the original neural network model, and the foreground target images corresponding to the target images output by the trained target neural network model and the foreground targets actually corresponding to the target images meet a target loss function.
Optionally, the training module includes:
A fifth input sub-module, configured to input the predetermined number of images into the original neural network model, to obtain the predetermined number of feature information output by a backbone network of the original neural network model;
The first training submodule is used for training the two full-connection layers of the original neural network model according to the characteristic information of the images with the preset number, wherein the characteristic information of the images with the preset number is input into the two full-connection layers of the original neural network model, and the classification results corresponding to the target images output by the two full-connection layers of the trained original neural network model and the classification results actually corresponding to the target images meet a first loss function;
the second training sub-module is used for training the first foreground object segmentation structure of the original neural network model by the characteristic information of the first type of image, wherein the characteristic information of the first type of image is input into the first foreground object segmentation structure of the original neural network model, and a second loss function is met by a foreground object corresponding to the target image output by the trained first foreground object segmentation structure of the original neural network model and a foreground object actually corresponding to the target image; and/or
And the third training sub-module is used for training the second foreground object segmentation structure of the original neural network model by the characteristic information of the second class image, wherein the characteristic information of the second class image is the input of the second foreground object segmentation structure of the original neural network model, the trained foreground object corresponding to the target image output by the second foreground object segmentation structure of the original neural network model and the foreground object actually corresponding to the target image meet a third loss function, and the target loss function is determined according to the first loss function and the second loss function or the target loss function is determined according to the first loss function, the second loss function and the third loss function.
Optionally, the building module includes:
An acquisition sub-module, configured to acquire, for each image in the predetermined number of images in the dataset, gray values of all pixels of a foreground object corresponding to the each image;
The setting submodule is used for setting mask image labels for all pixels in each image, wherein if the pixels belong to a foreground object, the mask image labels corresponding to the pixels are 1, and if the pixels belong to a background, the mask image labels corresponding to the pixels are 0;
and the marking sub-module is used for marking whether each image is an image that contains a foreground target and whose foreground target proportion is greater than or equal to a preset ratio.
According to a further embodiment of the invention, there is also provided a computer-readable storage medium having stored therein a computer program, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
According to a further embodiment of the invention, there is also provided an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
According to the invention, a target image acquired by a camera is obtained; the target image is input into a pre-trained target neural network model to obtain feature information of the target image output by a backbone network of the model; the feature information is input into a full-connection layer of the model to obtain a classification result of the target image; and, according to the classification result, the feature information of the target image is input into a corresponding foreground target segmentation structure to obtain a foreground target image corresponding to the target image. This solves the problem in the related art that human body segmentation based on video images extracts foreground human bodies from static images poorly: the pre-trained target neural network model can directly segment the foreground target in the target image and judge the foreground target, the parameters in the target neural network model are obtained directly through training, no manual intervention is needed, the practicability is high, and foreground human bodies are extracted effectively from static images.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
fig. 1 is a block diagram of a hardware structure of a mobile terminal of a foreground object extraction method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a foreground object extraction method according to an embodiment of the invention;
FIG. 3 is a block diagram of a foreground object extraction apparatus according to an embodiment of the invention;
Fig. 4 is a block diagram of a foreground object extraction apparatus according to a preferred embodiment of the present invention.
Detailed Description
The application will be described in detail hereinafter with reference to the drawings in conjunction with embodiments. It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
Example 1
The method according to the first embodiment of the present application may be implemented in a mobile terminal, a computer terminal or a similar computing device. Taking the example of running on a mobile terminal, fig. 1 is a block diagram of a hardware structure of the mobile terminal according to the foreground object extraction method of the embodiment of the present application, as shown in fig. 1, the mobile terminal may include one or more (only one is shown in fig. 1) processors 102 (the processors 102 may include, but are not limited to, a microprocessor MCU or a processing device such as a programmable logic device FPGA) and a memory 104 for storing data, and optionally, the mobile terminal may further include a transmission device 106 for communication functions and an input/output device 108. It will be appreciated by those skilled in the art that the structure shown in fig. 1 is merely illustrative and not limiting of the structure of the mobile terminal described above. For example, the mobile terminal may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.
The memory 104 may be used to store a computer program, for example, a software program of application software and a module, such as a computer program corresponding to the foreground object extraction method in the embodiment of the present invention, and the processor 102 executes the computer program stored in the memory 104 to perform various functional applications and data processing, that is, implement the above-mentioned method. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory remotely located relative to the processor 102, which may be connected to the mobile terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission means 106 is arranged to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the mobile terminal. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, simply referred to as a NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used to communicate with the internet wirelessly.
In this embodiment, a foreground object extraction method running on the mobile terminal or the network architecture is provided, and fig. 2 is a flowchart of the foreground object extraction method according to an embodiment of the present invention, as shown in fig. 2, where the flowchart includes the following steps:
step S202, acquiring a target image acquired by a camera;
step S204, inputting the target image into a pre-trained target neural network model to obtain the characteristic information of the target image output by a backbone network of the target neural network model;
step S206, inputting the characteristic information into the full connection layer of the target neural network model to obtain a classification result of the target image;
The target neural network model in the embodiment of the invention at least comprises a backbone network and at least two full-connection layers, wherein the embodiment of the invention uses two full-connection layers as an example for illustration, and does not limit the number of the full-connection layers.
Further, the step S206 may specifically include:
Inputting the feature information of the target image into the two full-connection layers of the target neural network model under the condition that the target image contains a foreground target and the foreground target proportion is greater than or equal to a preset ratio, and obtaining a first classification result as the classification result of the target image;
and inputting the feature information of the target image into the two full-connection layers of the target neural network model under the condition that the target image contains a foreground target and the foreground target proportion is smaller than the preset ratio, and obtaining a second classification result as the classification result of the target image.
And step S208, inputting the characteristic information of the target image into a corresponding foreground target segmentation structure according to the classification result to obtain a foreground target image corresponding to the target image.
Further, the step S208 may specifically include:
Inputting the characteristic information of the target image into a corresponding first foreground target segmentation structure under the condition that the classification result is the first classification result to obtain a foreground target image corresponding to the target image; or under the condition that the classification result is the first classification result, inputting the characteristic information of the target image into a corresponding first foreground target segmentation structure to obtain a foreground target image corresponding to the target image; and under the condition that the classification result is the second classification result, inputting the characteristic information of the target image into a corresponding second foreground target segmentation structure to obtain a foreground target image corresponding to the target image.
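As a rough illustration of this routing, consider the following Python sketch. It is an assumption for illustration only, not code from the patent: backbone, cls_head, seg_head_1 and seg_head_2 are hypothetical placeholders for the backbone network, the full-connection layers, and the first and second foreground target segmentation structures, and thresholding the classification logit at zero is one plausible way to realize the two classification results.

```python
# Illustrative routing of feature information by classification result
# (steps S204-S208). All names are hypothetical placeholders.
import torch

@torch.no_grad()
def extract_foreground(backbone, cls_head, seg_head_1, seg_head_2,
                       image: torch.Tensor) -> torch.Tensor:
    feats = backbone(image.unsqueeze(0))   # feature information from the backbone
    cls_logit = cls_head(feats)            # classification result from the FC layers
    if cls_logit.item() >= 0:              # first classification result
        seg_logits = seg_head_1(feats)     # first foreground target segmentation structure
    else:                                  # second classification result
        seg_logits = seg_head_2(feats)     # second foreground target segmentation structure
    return seg_logits.argmax(dim=1)        # pixel-wise foreground target mask
```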
Through steps S202 to S208, the problem in the related art that human body segmentation based on video images extracts foreground human bodies from static images poorly can be solved: the pre-trained target neural network model can directly segment the foreground target in the target image and judge the foreground target, the parameters in the target neural network model are obtained directly through training, no manual intervention is needed, the practicability is high, and foreground human bodies are extracted effectively from static images.
In this embodiment, before the target image is input into the pre-trained target neural network model to obtain the feature information of the target image output by its backbone network, a data set is constructed. Specifically, for each image among the predetermined number of images in the data set, the gray values of all pixels of the foreground target corresponding to that image are acquired; mask image labels are set for all pixels in the image, where a pixel belonging to the foreground target is labeled 1 and a pixel belonging to the background is labeled 0; and each image is marked according to whether it contains a foreground target whose proportion is greater than or equal to the preset ratio. The data set comprises the predetermined number of images and the foreground targets corresponding to the images, and the images include a first type and a second type: a first-type image contains a foreground target whose proportion is greater than or equal to the preset ratio and the picture is marked 1; a second-type image contains a foreground target whose proportion is smaller than the preset ratio and the picture is marked -1.
Training an original neural network model using the predetermined number of images and the foreground targets corresponding to them yields the target neural network model; the predetermined number of images are the input of the original neural network model, and the foreground target image corresponding to a target image output by the trained target neural network model and the foreground target actually corresponding to that target image satisfy a target loss function. Training may specifically proceed through the following steps. First, the predetermined number of images are input into the original neural network model to obtain the corresponding feature information output by its backbone network. The two full-connection layers of the original neural network model are then trained on the feature information of the predetermined number of images: this feature information is the input of the two full-connection layers, and the classification result corresponding to the target image output by the trained layers and the classification result actually corresponding to the target image satisfy a first loss function. The first foreground target segmentation structure of the original neural network model is trained on the feature information of the first-type images: this feature information is its input, and the foreground target corresponding to the target image output by the trained structure and the foreground target actually corresponding to the target image satisfy a second loss function. And/or the second foreground target segmentation structure of the original neural network model is trained on the feature information of the second-type images: this feature information is its input, and the foreground target corresponding to the target image output by the trained structure and the foreground target actually corresponding to the target image satisfy a third loss function. The target loss function is determined according to the first loss function and the second loss function, or according to the first, second and third loss functions.
On the basis of the idea of a general segmentation algorithm, the embodiment of the invention adds an auxiliary branch that judges whether the target human body is located at the target center; by referring to a classification loss on whether a foreground target is present, it guides the network to attend to the foreground target and suppress background human bodies. The method specifically includes the following steps:
Step S1, constructing a data set for training, wherein the data set comprises two parts: one part consists of pictures of a human body against a complex background, in which the foreground target is located at the center of the picture and occupies 80% or more of it; in the other part, the area of the foreground target is smaller than 80% of the picture;
Step S2, constructing the data set labels. The input is a three-channel picture X_m(i, j), which denotes the gray value of the m-th picture at position (i, j). The mask picture label M(i, j) ∈ {1, 0} is 1 if position (i, j) belongs to the foreground target block and 0 otherwise. The label y_m ∈ {1, -1} indicates whether the m-th input picture meets the foreground target condition: 1 means the input image meets it, -1 means it does not;
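To make this label construction concrete, a minimal sketch follows. It is an assumption for illustration: the helper name make_labels and its threshold parameter are hypothetical, with 0.8 taken from the 80% figure used in step S1.

```python
# Hypothetical label construction for step S2.
import numpy as np

def make_labels(foreground_mask: np.ndarray, threshold: float = 0.8):
    """foreground_mask: HxW boolean array, True where a pixel (i, j)
    belongs to the foreground target block."""
    M = foreground_mask.astype(np.uint8)   # mask picture label M(i, j) in {1, 0}
    ratio = M.sum() / M.size               # proportion occupied by the foreground target
    y = 1 if ratio >= threshold else -1    # image-level label y_m in {1, -1}
    return M, y
```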
Step S3, constructing a human body segmentation model based on a convolutional neural network. The model comprises two branches that share the parameters of a ResNet backbone (the backbone network for feature extraction). The first branch forms the human body target segmentation structure; it comprises two 1×1 convolutional layers and outputs a pixel-wise class (the class of each pixel point). The other branch is composed of two full-connection layers (FC) and judges whether the input picture contains a foreground target occupying at least 80% of the picture;
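A minimal PyTorch sketch of this two-branch structure is given below as an assumption for illustration: torchvision's resnet18 stands in for the ResNet backbone, and the class name ForegroundSegNet, the layer widths and the single-logit classifier head are hypothetical choices rather than details published in the patent.

```python
# Minimal sketch of the two-branch model of step S3 (illustrative only).
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ForegroundSegNet(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        backbone = resnet18(weights=None)
        # Shared ResNet backbone for feature extraction; drop avgpool/fc.
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])
        # Branch 1: human body target segmentation, two 1x1 convolutional
        # layers, outputting a pixel-wise class map.
        self.seg_head = nn.Sequential(
            nn.Conv2d(512, 256, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, num_classes, kernel_size=1),
        )
        # Branch 2: two full-connection layers judging whether the picture
        # contains a foreground target occupying at least 80% of it.
        self.cls_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(512, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, 1),  # logit of the foreground-target condition
        )

    def forward(self, x: torch.Tensor):
        feats = self.backbone(x)           # shared feature information
        seg_logits = self.seg_head(feats)  # (B, num_classes, H/32, W/32)
        cls_logit = self.cls_head(feats)   # (B, 1)
        return seg_logits, cls_logit
```

Sharing the backbone between the two heads is what lets the classification branch act as the auxiliary supervision described above.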
Step S4, pre-training the segmentation network parameters on COCO data, wherein the two branch structures behind the backbone are generated by random initialization and are updated by iterating repeatedly over the data set;
after each iteration finishes, judging whether the updated network parameters meet the preset segmentation accuracy index;
Step S5, simultaneously training the second branch of the human body segmentation model, using the network parameters of the human body segmentation model determined in step S3, to give the network the ability to judge foreground pictures (after iterative optimization, the network outputs the probability that a picture contains a human body target that is located at the center and occupies more than 80% of the picture);
Step S6, in each iteration, calculating the loss of the segmentation branch network and the foreground target judgment loss;
Further, the segmentation loss function L_segment(X, M) from step S4 is combined with the classification loss from step S5 into the total loss function below, in which the first part is the segmentation loss and the second part is the classification loss of the target foreground judgment:

L_loss = L_segment(X, M) + L(X, Y)

where L_loss is the total loss function of the target neural network model (corresponding to the target loss function described above), L_segment(X, M) is the loss function of the segmentation branch network (corresponding to the second loss function described above), and L(X, Y) is the loss function of the foreground target judgment (corresponding to the first loss function described above).
Here, L(X, Y) is the cross entropy loss between the network's predicted value for the foreground target judgment of the input image and the true target label value of the input image.
L_segment(X, M) is the cross entropy loss between the network's predicted segmentation map of the input image and the binary map annotated for the input image.
The cross entropy is calculated as:

CE(p, p') = -[p·log(p') + (1 - p)·log(1 - p')]

where p is the true label value of the input image and p' is the predicted value of the network.
Step S61, calculating the total loss function of the full connection layer and the full convolution layer through forward propagation;
Step S62, updating the above network parameters by back propagation.
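Putting steps S6 to S62 together, one training iteration might look like the sketch below. It reuses the hypothetical ForegroundSegNet from step S3 and assumes an unweighted sum L_loss = L_segment + L, since the patent does not state branch weights; the {1, -1} labels are mapped to {1, 0} for binary cross entropy.

```python
# Sketch of one training iteration (steps S6, S61, S62); illustrative only.
import torch
import torch.nn.functional as F

model = ForegroundSegNet(num_classes=2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

def train_step(images, masks, labels):
    """images: (B, 3, H, W); masks: (B, h, w) long tensor of mask labels
    in {1, 0}; labels: (B,) tensor in {1, -1} for the foreground condition."""
    seg_logits, cls_logit = model(images)  # forward propagation (step S61)
    # L_segment(X, M): cross entropy between the predicted segmentation
    # map and the annotated binary map.
    seg_logits = F.interpolate(seg_logits, size=masks.shape[-2:],
                               mode="bilinear", align_corners=False)
    l_segment = F.cross_entropy(seg_logits, masks)
    # L(X, Y): cross entropy of the foreground target judgment.
    targets = (labels > 0).float().unsqueeze(1)
    l_cls = F.binary_cross_entropy_with_logits(cls_logit, targets)
    loss = l_segment + l_cls               # total loss L_loss
    optimizer.zero_grad()
    loss.backward()                        # back propagation (step S62)
    optimizer.step()                       # update the network parameters
    return loss.item()
```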
The neural network used in this embodiment can directly segment the person in the image and judge the foreground target, so the foreground target image does not need to be judged manually; the parameters of the algorithm are obtained directly through training, without manual intervention, which makes the method more practical and robust than traditional segmentation algorithms.
Example 2
According to another embodiment of the present invention, there is also provided a foreground object extraction apparatus, fig. 3 is a block diagram of the foreground object extraction apparatus according to an embodiment of the present invention, as shown in fig. 3, including:
an acquisition module 32, configured to acquire a target image acquired by the camera;
The first input module 34 is configured to input the target image into a pre-trained target neural network model, so as to obtain feature information of the target image output by a backbone network of the target neural network model;
A second input module 36, configured to input the feature information into a full connection layer of the target neural network model, to obtain a classification result of the target image;
And a third input module 38, configured to input feature information of the target image into a corresponding foreground target segmentation structure according to the classification result, so as to obtain a foreground target image corresponding to the target image.
Optionally, the second input module 36 includes:
The first input sub-module is used for inputting the feature information of the target image into the two full-connection layers of the target neural network model under the condition that the target image contains a foreground target and the foreground target proportion is greater than or equal to a preset ratio, so that the classification result of the target image is a first classification result;
and the second input sub-module is used for inputting the feature information of the target image into the two full-connection layers of the target neural network model under the condition that the target image contains a foreground target and the foreground target proportion is smaller than the preset ratio, so that the classification result of the target image is a second classification result.
Optionally, the third input module 38 includes:
The third input sub-module is used for inputting the feature information of the target image into a corresponding first foreground target segmentation structure to obtain a foreground target image corresponding to the target image, under the condition that the classification result is the first classification result; or
The fourth input sub-module is used for inputting the characteristic information of the target image into a corresponding first foreground target segmentation structure to obtain a foreground target image corresponding to the target image under the condition that the classification result is the first classification result; and under the condition that the classification result is the second classification result, inputting the characteristic information of the target image into a corresponding second foreground target segmentation structure to obtain a foreground target image corresponding to the target image.
Fig. 4 is a block diagram of a foreground object extraction apparatus according to a preferred embodiment of the present invention, as shown in fig. 4, the apparatus further comprising:
A construction module 42, configured to construct a data set, wherein the data set comprises a predetermined number of images and foreground targets corresponding to the images, and the images comprise a first type of image and a second type of image, wherein a first-type image is an image that contains a foreground target and whose foreground target proportion is greater than or equal to a preset ratio, and a second-type image is an image that contains a foreground target and whose foreground target proportion is smaller than the preset ratio;
The training module 44 is configured to train the original neural network model by using the predetermined number of images and the foreground target corresponding to the images, so as to obtain the target neural network model, where the predetermined number of images are input to the original neural network model, and the foreground target image corresponding to the target image output by the trained target neural network model and the foreground target actually corresponding to the target image satisfy a target loss function.
Optionally, the training module 44 includes:
A fifth input sub-module, configured to input the predetermined number of images into the original neural network model, to obtain the predetermined number of feature information output by a backbone network of the original neural network model;
The first training submodule is used for training the two full-connection layers of the original neural network model according to the characteristic information of the images with the preset number, wherein the characteristic information of the images with the preset number is input into the two full-connection layers of the original neural network model, and the classification results corresponding to the target images output by the two full-connection layers of the trained original neural network model and the classification results actually corresponding to the target images meet a first loss function;
the second training sub-module is used for training the first foreground object segmentation structure of the original neural network model by the characteristic information of the first type of image, wherein the characteristic information of the first type of image is input into the first foreground object segmentation structure of the original neural network model, and a second loss function is met by a foreground object corresponding to the target image output by the trained first foreground object segmentation structure of the original neural network model and a foreground object actually corresponding to the target image; and/or
And the third training sub-module is used for training the second foreground object segmentation structure of the original neural network model by the characteristic information of the second class image, wherein the characteristic information of the second class image is the input of the second foreground object segmentation structure of the original neural network model, the trained foreground object corresponding to the target image output by the second foreground object segmentation structure of the original neural network model and the foreground object actually corresponding to the target image meet a third loss function, and the target loss function is determined according to the first loss function and the second loss function or the target loss function is determined according to the first loss function, the second loss function and the third loss function.
Optionally, the building module 42 includes:
An acquisition sub-module, configured to acquire, for each image in the predetermined number of images in the dataset, gray values of all pixels of a foreground object corresponding to the each image;
The setting submodule is used for setting mask image labels for all pixels in each image, wherein if the pixels belong to a foreground object, the mask image labels corresponding to the pixels are 1, and if the pixels belong to a background, the mask image labels corresponding to the pixels are 0;
and the marking sub-module is used for marking whether each image is an image that contains a foreground target and whose foreground target proportion is greater than or equal to a preset ratio.
It should be noted that each of the above modules may be implemented by software or hardware, and for the latter, it may be implemented by, but not limited to: the modules are all located in the same processor; or the above modules may be located in different processors in any combination.
Example 3
Embodiments of the present invention also provide a computer readable storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
Alternatively, in the present embodiment, the above-described storage medium may be configured to store a computer program for performing the steps of:
s1, acquiring a target image acquired by a camera;
S2, inputting the target image into a pre-trained target neural network model to obtain characteristic information of the target image output by a backbone network of the target neural network model;
s3, inputting the characteristic information into a full-connection layer of the target neural network model to obtain a classification result of the target image;
and S4, inputting the characteristic information of the target image into a corresponding foreground target segmentation structure according to the classification result to obtain a foreground target image corresponding to the target image.
Alternatively, in the present embodiment, the storage medium may include, but is not limited to: a USB flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or various other media capable of storing a computer program.
Example 4
An embodiment of the invention also provides an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, where the transmission device is connected to the processor, and the input/output device is connected to the processor.
Alternatively, in the present embodiment, the above-described processor may be configured to execute the following steps by a computer program:
s1, acquiring a target image acquired by a camera;
S2, inputting the target image into a pre-trained target neural network model to obtain characteristic information of the target image output by a backbone network of the target neural network model;
s3, inputting the characteristic information into a full-connection layer of the target neural network model to obtain a classification result of the target image;
and S4, inputting the characteristic information of the target image into a corresponding foreground target segmentation structure according to the classification result to obtain a foreground target image corresponding to the target image.
Alternatively, specific examples in this embodiment may refer to examples described in the foregoing embodiments and optional implementations, and this embodiment is not described herein.
It will be appreciated by those skilled in the art that the modules or steps of the invention described above may be implemented by a general-purpose computing device; they may be concentrated on a single computing device or distributed across a network of computing devices. Alternatively, they may be implemented in program code executable by computing devices, so that they may be stored in a storage device and executed by computing devices; in some cases, the steps shown or described may be performed in a different order than shown or described here, or they may be separately fabricated into individual integrated circuit modules, or multiple modules or steps among them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the principle of the present invention should be included in the protection scope of the present invention.

Claims (7)

1. A foreground object extraction method, characterized by comprising:
acquiring a target image acquired by a camera;
Inputting the target image into a pre-trained target neural network model to obtain characteristic information of the target image output by a backbone network of the target neural network model;
Inputting the characteristic information into a full-connection layer of the target neural network model to obtain a classification result of the target image;
Inputting the characteristic information of the target image into a corresponding foreground target segmentation structure according to the classification result to obtain a foreground target image corresponding to the target image;
the step of inputting the characteristic information into the full connection layer of the target neural network model to obtain the classification result of the target image comprises the following steps:
inputting the feature information of the target image into two full-connection layers of the target neural network model under the condition that the target image contains a foreground target and the foreground target proportion is greater than or equal to a preset ratio, to obtain a first classification result as the classification result of the target image;
and inputting the feature information of the target image into the two full-connection layers of the target neural network model under the condition that the target image contains a foreground target and the foreground target proportion is smaller than the preset ratio, to obtain a second classification result as the classification result of the target image;
Inputting the feature information of the target image into a corresponding foreground target segmentation structure according to the classification result, and obtaining a foreground target image corresponding to the target image comprises the following steps:
inputting the feature information of the target image into a corresponding first foreground target segmentation structure under the condition that the classification result is the first classification result, to obtain a foreground target image corresponding to the target image; or
Inputting the characteristic information of the target image into a corresponding first foreground target segmentation structure under the condition that the classification result is the first classification result to obtain a foreground target image corresponding to the target image; and under the condition that the classification result is the second classification result, inputting the characteristic information of the target image into a corresponding second foreground target segmentation structure to obtain a foreground target image corresponding to the target image.
2. The method of claim 1, wherein prior to inputting the target image into a pre-trained target neural network model, obtaining feature information of the target image output by a backbone network of the target neural network model, the method further comprises:
constructing a data set, wherein the data set comprises a predetermined number of images and foreground targets corresponding to the images, and the images comprise a first type of image and a second type of image, wherein a first-type image is an image that contains a foreground target and whose foreground target proportion is greater than or equal to a preset ratio, and a second-type image is an image that contains a foreground target and whose foreground target proportion is smaller than the preset ratio;
Training an original neural network model by using the preset number of images and foreground targets corresponding to the images to obtain the target neural network model, wherein the preset number of images are input into the original neural network model, and the foreground target images corresponding to the target images output by the trained target neural network model and the foreground targets actually corresponding to the target images meet a target loss function.
3. The method of claim 2, wherein training the original neural network model using the preset number of images and the foreground targets corresponding to the images to obtain the target neural network model comprises:
inputting the preset number of images into the original neural network model to obtain the feature information of the preset number of images output by a backbone network of the original neural network model;
training two fully connected layers of the original neural network model using the feature information of the preset number of images, wherein the feature information of the preset number of images is input into the two fully connected layers of the original neural network model, and the classification result corresponding to the target image output by the trained two fully connected layers of the original neural network model and the classification result actually corresponding to the target image satisfy a first loss function;
training a first foreground target segmentation structure of the original neural network model using the feature information of the first type of image, wherein the feature information of the first type of image is input into the first foreground target segmentation structure of the original neural network model, and the foreground target corresponding to the target image output by the trained first foreground target segmentation structure of the original neural network model and the foreground target actually corresponding to the target image satisfy a second loss function; and/or
training a second foreground target segmentation structure of the original neural network model using the feature information of the second type of image, wherein the feature information of the second type of image is input into the second foreground target segmentation structure of the original neural network model, and the foreground target corresponding to the target image output by the trained second foreground target segmentation structure of the original neural network model and the foreground target actually corresponding to the target image satisfy a third loss function; wherein the target loss function is determined according to the first loss function and the second loss function, or according to the first loss function, the second loss function, and the third loss function.
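The composition of the target loss function in claim 3 can be sketched as follows, assuming cross-entropy for the classification branch (the first loss function) and binary cross-entropy for each segmentation structure (the second and third loss functions); the equal unit weights are a hypothetical choice, since the claim only states that the target loss function is determined from the branch losses.

```python
# Hedged sketch of the target loss composition; the choice of loss terms and
# their equal weighting are assumptions, not specified by the claims.
import torch
import torch.nn.functional as F

def target_loss(cls_logits: torch.Tensor, cls_labels: torch.Tensor,
                pred_mask_first=None, gt_mask_first=None,
                pred_mask_second=None, gt_mask_second=None) -> torch.Tensor:
    # First loss function: classification by the two fully connected layers.
    loss = F.cross_entropy(cls_logits, cls_labels)
    # Second loss function: first segmentation structure (first type of image).
    # Predicted masks are assumed to be sigmoid probabilities in [0, 1].
    if pred_mask_first is not None:
        loss = loss + F.binary_cross_entropy(pred_mask_first, gt_mask_first)
    # Third loss function (the "and/or" branch of claim 3): second
    # segmentation structure, trained on the second type of image.
    if pred_mask_second is not None:
        loss = loss + F.binary_cross_entropy(pred_mask_second, gt_mask_second)
    return loss
```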
4. The method of claim 2, wherein constructing the data set comprises:
for each image in the preset number of images in the data set, acquiring gray values of all pixels of the foreground target corresponding to the image;
setting mask image labels for all pixels in each image, wherein if a pixel belongs to a foreground target, the mask image label corresponding to the pixel is 1, and if the pixel belongs to the background, the mask image label corresponding to the pixel is 0; and
annotating, for each image, whether the image contains a foreground target whose proportion is greater than or equal to the preset ratio.
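A minimal NumPy sketch of the labelling in claim 4, assuming the foreground pixels are already known as a boolean array; the default preset ratio of 0.5 is a hypothetical placeholder, since the claims leave the ratio unspecified.

```python
# Hypothetical labelling sketch; the preset ratio value is an assumption.
import numpy as np

def build_labels(image_gray: np.ndarray, fg_pixels: np.ndarray,
                 preset_ratio: float = 0.5):
    """image_gray: 2-D gray image; fg_pixels: boolean array of the same
    shape, True where a pixel belongs to the foreground target."""
    # Gray values of all pixels of the corresponding foreground target.
    fg_gray = image_gray[fg_pixels]
    # Mask image labels: 1 for foreground pixels, 0 for background pixels.
    mask = fg_pixels.astype(np.uint8)
    # Annotate whether the foreground proportion reaches the preset ratio
    # (first type of image) or falls below it (second type of image).
    proportion = float(mask.sum()) / mask.size
    is_first_type = proportion >= preset_ratio
    return fg_gray, mask, is_first_type
```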
5. A foreground target extraction apparatus, comprising:
an acquisition module, configured to acquire a target image captured by a camera;
a first input module, configured to input the target image into a pre-trained target neural network model to obtain feature information of the target image output by a backbone network of the target neural network model;
a second input module, configured to input the feature information into a fully connected layer of the target neural network model to obtain a classification result of the target image; and
a third input module, configured to input the feature information of the target image into a corresponding foreground target segmentation structure according to the classification result, to obtain a foreground target image corresponding to the target image;
wherein the second input module comprises:
a first input sub-module, configured to, in a case where the target image contains a foreground target and the foreground target proportion is greater than or equal to a preset ratio, input the feature information of the target image into two fully connected layers of the target neural network model to obtain the classification result of the target image as a first classification result;
a second input sub-module, configured to, in a case where the target image contains a foreground target and the foreground target proportion is smaller than the preset ratio, input the feature information of the target image into the two fully connected layers of the target neural network model to obtain the classification result of the target image as a second classification result;
wherein the third input module comprises:
a third input sub-module, configured to, in a case where the classification result is the first classification result, input the feature information of the target image into a corresponding first foreground target segmentation structure to obtain the foreground target image corresponding to the target image; or
a fourth input sub-module, configured to, in a case where the classification result is the first classification result, input the feature information of the target image into the corresponding first foreground target segmentation structure to obtain the foreground target image corresponding to the target image, and, in a case where the classification result is the second classification result, input the feature information of the target image into a corresponding second foreground target segmentation structure to obtain the foreground target image corresponding to the target image.
6. A computer-readable storage medium, wherein the storage medium stores a computer program, and the computer program is arranged to execute the method of any one of claims 1 to 4 when run.
7. An electronic device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor is arranged to run the computer program to perform the method of any one of claims 1 to 4.
CN202010728063.4A 2020-07-23 2020-07-23 Foreground target extraction method and device Active CN111833372B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010728063.4A CN111833372B (en) 2020-07-23 2020-07-23 Foreground target extraction method and device

Publications (2)

Publication Number Publication Date
CN111833372A CN111833372A (en) 2020-10-27
CN111833372B (en) 2024-07-02

Family

ID=72925818

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010728063.4A Active CN111833372B (en) 2020-07-23 2020-07-23 Foreground target extraction method and device

Country Status (1)

Country Link
CN (1) CN111833372B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112365465B (en) * 2020-11-09 2024-02-06 浙江大华技术股份有限公司 Synthetic image category determining method and device, storage medium and electronic device
CN112614144A (en) * 2020-12-30 2021-04-06 深圳市联影高端医疗装备创新研究院 Image segmentation method, device, equipment and storage medium
CN113034514A (en) * 2021-03-19 2021-06-25 影石创新科技股份有限公司 Sky region segmentation method and device, computer equipment and storage medium
CN113963437A (en) * 2021-10-15 2022-01-21 武汉众智数字技术有限公司 Gait recognition sequence acquisition method and system based on deep learning
CN114445632A (en) * 2022-02-08 2022-05-06 支付宝(杭州)信息技术有限公司 Picture processing method and device
CN114782460B (en) * 2022-06-21 2022-10-18 阿里巴巴达摩院(杭州)科技有限公司 Image segmentation model generation method, image segmentation method and computer equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019233297A1 (en) * 2018-06-08 2019-12-12 Oppo广东移动通信有限公司 Data set construction method, mobile terminal and readable storage medium
CN111382808A (en) * 2020-05-29 2020-07-07 浙江大华技术股份有限公司 Vehicle detection processing method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018033156A1 (en) * 2016-08-19 2018-02-22 北京市商汤科技开发有限公司 Video image processing method, device, and electronic apparatus
CN109522967A (en) * 2018-11-28 2019-03-26 广州逗号智能零售有限公司 A kind of commodity attribute recognition methods, device, equipment and storage medium
DE102018010197A1 (en) * 2018-12-18 2020-06-18 GRID INVENT gGmbH Electronic element and electrically controlled display element
CN110415233A (en) * 2019-07-26 2019-11-05 东南大学 Pavement crack rapid extracting method based on two step convolutional neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant