CN111833372B - Foreground target extraction method and device

Foreground target extraction method and device

Info

Publication number
CN111833372B
Authority
CN
China
Prior art keywords
target
target image
foreground
image
neural network
Prior art date
Legal status
Active
Application number
CN202010728063.4A
Other languages
Chinese (zh)
Other versions
CN111833372A (en)
Inventor
张迪
潘华东
殷俊
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd
Priority to CN202010728063.4A
Publication of CN111833372A
Application granted
Publication of CN111833372B
Status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a foreground target extraction method and device. The method comprises the following steps: acquiring a target image captured by a camera; inputting the target image into a pre-trained target neural network model to obtain feature information of the target image output by a backbone network of the model; inputting the feature information into a full-connection layer of the model to obtain a classification result of the target image; and inputting, according to the classification result, the feature information of the target image into a corresponding foreground target segmentation structure to obtain a foreground target image corresponding to the target image. This solves the problem in the related art that human body segmentation based on video images extracts foreground human bodies from static images poorly. Since the parameters in the target neural network model are obtained directly through training, no manual intervention is needed, the method is highly practical, and foreground human bodies are extracted effectively from static images.

Description

Foreground target extraction method and device
Technical Field
The invention relates to the field of image processing, in particular to a foreground object extraction method and device.
Background
Object segmentation technology mainly studies object segmentation based on panoramic pictures, segmenting general objects in images. In the field of gait recognition, the usual scheme is to first apply a human body segmentation algorithm and then use a gait recognition model to extract features.
Traditional human body segmentation algorithms have obvious limitations. They can be roughly divided into two types: semantic segmentation and instance segmentation. Semantic segmentation algorithms typically use multi-scale segmentation networks, such as DeepLab and PSPNet, which are designed to solve the panoramic object semantic segmentation problem; in the segmentation process, the size and scale of the target, the color gamut and so on greatly influence the segmentation result. Instance segmentation algorithms generally perform instance segmentation on all types of targets in the panoramic picture and assign different human bodies to different classes. Both kinds of algorithm segment the panoramic image but cannot distinguish foreground targets from background targets, so they cannot be used in algorithms that analyze foreground targets.
A human body segmentation method based on a convolutional neural network has been proposed in the related art, including the following steps. Step S3: the network parameters of the human body segmentation model are randomly initialized, and the data set is iterated repeatedly to update the network parameters. Step S4: the human body position in the video image is predicted to indicate a region of interest in the video image. Step S5: human body segmentation is performed on the region of interest of step S4 to obtain the human body in the video image. This convolutional-neural-network-based method can identify and segment the human body in real time during human motion, particularly high-speed motion; it better meets the real-time requirements of video and offers higher identification accuracy and stability than traditional techniques. It also places low quality requirements on the video or image: human body identification and segmentation can still be completed when the definition of the video image is poor. However, because it performs human body segmentation based on video images, its extraction of foreground human bodies from static images is poor.
No solution has yet been proposed for the problem in the related art that human body segmentation based on video images extracts foreground human bodies from static images poorly.
Disclosure of Invention
The embodiments of the invention provide a foreground target extraction method and device, to at least solve the problem in the related art that human body segmentation based on video images extracts foreground human bodies from static images poorly.
According to an embodiment of the present invention, there is provided a foreground object extraction method including:
acquiring a target image acquired by a camera;
Inputting the target image into a pre-trained target neural network model to obtain characteristic information of the target image output by a backbone network of the target neural network model;
Inputting the characteristic information into a full-connection layer of the target neural network model to obtain a classification result of the target image;
and inputting the characteristic information of the target image into a corresponding foreground target segmentation structure according to the classification result to obtain a foreground target image corresponding to the target image.
Optionally, inputting the feature information into two full-connection layers of the target neural network model to obtain the classification result of the target image includes:
inputting the feature information of the target image into the two full-connection layers of the target neural network model under the condition that the target image contains a foreground target and the foreground target proportion is greater than or equal to a preset ratio, and obtaining a first classification result as the classification result of the target image;
and inputting the feature information of the target image into the two full-connection layers of the target neural network model under the condition that the target image contains a foreground target and the foreground target proportion is smaller than the preset ratio, and obtaining a second classification result as the classification result of the target image.
Optionally, inputting the feature information of the target image into a corresponding foreground target segmentation structure according to the classification result, and obtaining the foreground target image corresponding to the target image includes:
Inputting the feature information of the target image into a corresponding first foreground target segmentation structure under the condition that the classification result is the first classification result, to obtain a foreground target image corresponding to the target image; or
Inputting the characteristic information of the target image into a corresponding first foreground target segmentation structure under the condition that the classification result is the first classification result to obtain a foreground target image corresponding to the target image; and under the condition that the classification result is the second classification result, inputting the characteristic information of the target image into a corresponding second foreground target segmentation structure to obtain a foreground target image corresponding to the target image.
Optionally, before inputting the target image into a pre-trained target neural network model, and obtaining feature information of the target image output by a backbone network of the target neural network model, the method further includes:
Constructing a data set, wherein the data set comprises a predetermined number of images and foreground targets corresponding to the images, and the images comprise a first type of image and a second type of image, wherein a first-type image is an image that contains a foreground target and whose foreground target proportion is greater than or equal to a preset ratio, and a second-type image is an image that contains a foreground target and whose foreground target proportion is smaller than the preset ratio;
Training an original neural network model by using the preset number of images and foreground targets corresponding to the images to obtain the target neural network model, wherein the preset number of images are input into the original neural network model, and the foreground target images corresponding to the target images output by the trained target neural network model and the foreground targets actually corresponding to the target images meet a target loss function.
Optionally, training an original neural network model using the predetermined number of images and the foreground object corresponding to the images, to obtain the target neural network model includes:
inputting the preset number of images into the original neural network model to obtain the preset number of characteristic information output by a backbone network of the original neural network model;
Training two full-connection layers of the original neural network model according to the characteristic information of the preset number of images, wherein the characteristic information of the preset number of images is input into the two full-connection layers of the original neural network model, and the classification result corresponding to the target image output by the two full-connection layers of the trained original neural network model and the classification result actually corresponding to the target image meet a first loss function;
Training a first foreground target segmentation structure of the original neural network model using the feature information of the first type of image, wherein the feature information of the first type of image is the input of the first foreground target segmentation structure of the original neural network model, and the foreground target corresponding to the target image output by the trained first foreground target segmentation structure of the original neural network model and the foreground target actually corresponding to the target image satisfy a second loss function; and/or
Training the second foreground object segmentation structure of the original neural network model by using the characteristic information of the second class image, wherein the characteristic information of the second class image is input into the second foreground object segmentation structure of the original neural network model, and a third loss function is satisfied by the foreground object corresponding to the target image and the foreground object actually corresponding to the target image output by the trained second foreground object segmentation structure of the original neural network model, wherein the target loss function is determined according to the first loss function and the second loss function, or the target loss function is determined according to the first loss function, the second loss function and the third loss function.
Optionally, constructing the dataset includes:
for each image in the predetermined number of images in the dataset, acquiring gray values of all pixels of a foreground object corresponding to each image;
setting mask image labels for all pixels in each image, wherein if the pixels belong to a foreground object, the mask image labels corresponding to the pixels are 1, and if the pixels belong to a background, the mask image labels corresponding to the pixels are 0;
and marking whether each image is an image that contains a foreground target and whose foreground target proportion is greater than or equal to a preset ratio.
According to another embodiment of the present invention, there is also provided a foreground object extraction apparatus including:
the acquisition module is used for acquiring a target image acquired by the camera;
the first input module is used for inputting the target image into a pre-trained target neural network model to obtain the characteristic information of the target image output by a backbone network of the target neural network model;
the second input module is used for inputting the characteristic information into the full-connection layer of the target neural network model to obtain a classification result of the target image;
And the third input module is used for inputting the characteristic information of the target image into the corresponding foreground target segmentation structure according to the classification result to obtain the foreground target image corresponding to the target image.
Optionally, the second input module includes:
The first input sub-module is used for inputting the feature information of the target image into the two full-connection layers of the target neural network model under the condition that the target image contains a foreground target and the foreground target proportion is greater than or equal to a preset ratio, so that the classification result of the target image is a first classification result;
and the second input sub-module is used for inputting the feature information of the target image into the two full-connection layers of the target neural network model under the condition that the target image contains a foreground target and the foreground target proportion is smaller than the preset ratio, so that the classification result of the target image is a second classification result.
Optionally, the third input module includes:
The third input sub-module is used for inputting the feature information of the target image into a corresponding first foreground target segmentation structure to obtain a foreground target image corresponding to the target image, under the condition that the classification result is the first classification result; or
The fourth input sub-module is used for inputting the characteristic information of the target image into a corresponding first foreground target segmentation structure to obtain a foreground target image corresponding to the target image under the condition that the classification result is the first classification result; and under the condition that the classification result is the second classification result, inputting the characteristic information of the target image into a corresponding second foreground target segmentation structure to obtain a foreground target image corresponding to the target image.
Optionally, the apparatus further comprises:
A construction module, configured to construct a data set, wherein the data set comprises a predetermined number of images and foreground targets corresponding to the images, and the images comprise a first type of image and a second type of image, wherein a first-type image is an image that contains a foreground target and whose foreground target proportion is greater than or equal to a preset ratio, and a second-type image is an image that contains a foreground target and whose foreground target proportion is smaller than the preset ratio;
The training module is used for training an original neural network model by using the images with the preset number and the foreground targets corresponding to the images to obtain the target neural network model, wherein the images with the preset number are input of the original neural network model, and the foreground target images corresponding to the target images output by the trained target neural network model and the foreground targets actually corresponding to the target images meet a target loss function.
Optionally, the training module includes:
A fifth input sub-module, configured to input the predetermined number of images into the original neural network model, to obtain the predetermined number of feature information output by a backbone network of the original neural network model;
The first training submodule is used for training the two full-connection layers of the original neural network model according to the characteristic information of the images with the preset number, wherein the characteristic information of the images with the preset number is input into the two full-connection layers of the original neural network model, and the classification results corresponding to the target images output by the two full-connection layers of the trained original neural network model and the classification results actually corresponding to the target images meet a first loss function;
the second training sub-module is used for training the first foreground object segmentation structure of the original neural network model by the characteristic information of the first type of image, wherein the characteristic information of the first type of image is input into the first foreground object segmentation structure of the original neural network model, and a second loss function is met by a foreground object corresponding to the target image output by the trained first foreground object segmentation structure of the original neural network model and a foreground object actually corresponding to the target image; and/or
And the third training sub-module is used for training the second foreground object segmentation structure of the original neural network model by the characteristic information of the second class image, wherein the characteristic information of the second class image is the input of the second foreground object segmentation structure of the original neural network model, the trained foreground object corresponding to the target image output by the second foreground object segmentation structure of the original neural network model and the foreground object actually corresponding to the target image meet a third loss function, and the target loss function is determined according to the first loss function and the second loss function or the target loss function is determined according to the first loss function, the second loss function and the third loss function.
Optionally, the building module includes:
An acquisition sub-module, configured to acquire, for each image in the predetermined number of images in the dataset, gray values of all pixels of a foreground object corresponding to the each image;
The setting submodule is used for setting mask image labels for all pixels in each image, wherein if the pixels belong to a foreground object, the mask image labels corresponding to the pixels are 1, and if the pixels belong to a background, the mask image labels corresponding to the pixels are 0;
and the marking sub-module is used for marking whether each image is an image that contains a foreground target and whose foreground target proportion is greater than or equal to a preset ratio.
According to a further embodiment of the invention, there is also provided a computer-readable storage medium having stored therein a computer program, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
According to a further embodiment of the invention, there is also provided an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
According to the invention, a target image acquired by a camera is obtained; the target image is input into a pre-trained target neural network model to obtain feature information of the target image output by a backbone network of the model; the feature information is input into a full-connection layer of the model to obtain a classification result of the target image; and, according to the classification result, the feature information of the target image is input into a corresponding foreground target segmentation structure to obtain a foreground target image corresponding to the target image. This solves the problem in the related art that human body segmentation based on video images extracts foreground human bodies from static images poorly: the pre-trained target neural network model can directly segment the foreground target in the target image and judge the foreground target, the parameters in the target neural network model are obtained directly through training, no manual intervention is needed, the practicability is high, and foreground human bodies are extracted effectively from static images.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
fig. 1 is a block diagram of a hardware structure of a mobile terminal of a foreground object extraction method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a foreground object extraction method according to an embodiment of the invention;
FIG. 3 is a block diagram of a foreground object extraction apparatus according to an embodiment of the invention;
Fig. 4 is a block diagram of a foreground object extraction apparatus according to a preferred embodiment of the present invention.
Detailed Description
The application will be described in detail hereinafter with reference to the drawings in conjunction with embodiments. It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
Example 1
The method according to the first embodiment of the present application may be implemented in a mobile terminal, a computer terminal or a similar computing device. Taking the example of running on a mobile terminal, fig. 1 is a block diagram of a hardware structure of the mobile terminal according to the foreground object extraction method of the embodiment of the present application, as shown in fig. 1, the mobile terminal may include one or more (only one is shown in fig. 1) processors 102 (the processors 102 may include, but are not limited to, a microprocessor MCU or a processing device such as a programmable logic device FPGA) and a memory 104 for storing data, and optionally, the mobile terminal may further include a transmission device 106 for communication functions and an input/output device 108. It will be appreciated by those skilled in the art that the structure shown in fig. 1 is merely illustrative and not limiting of the structure of the mobile terminal described above. For example, the mobile terminal may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.
The memory 104 may be used to store a computer program, for example, a software program of application software and a module, such as a computer program corresponding to the foreground object extraction method in the embodiment of the present invention, and the processor 102 executes the computer program stored in the memory 104 to perform various functional applications and data processing, that is, implement the above-mentioned method. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory remotely located relative to the processor 102, which may be connected to the mobile terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission means 106 is arranged to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the mobile terminal. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, simply referred to as a NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used to communicate with the internet wirelessly.
In this embodiment, a foreground object extraction method running on the mobile terminal or the network architecture is provided, and fig. 2 is a flowchart of the foreground object extraction method according to an embodiment of the present invention, as shown in fig. 2, where the flowchart includes the following steps:
step S202, acquiring a target image acquired by a camera;
step S204, inputting the target image into a pre-trained target neural network model to obtain the characteristic information of the target image output by a backbone network of the target neural network model;
step S206, inputting the characteristic information into the full connection layer of the target neural network model to obtain a classification result of the target image;
The target neural network model in the embodiment of the invention at least comprises a backbone network and at least two full-connection layers, wherein the embodiment of the invention uses two full-connection layers as an example for illustration, and does not limit the number of the full-connection layers.
Further, the step S206 may specifically include:
Inputting the feature information of the target image into the two full-connection layers of the target neural network model under the condition that the target image contains a foreground target and the foreground target proportion is greater than or equal to a preset ratio, and obtaining a first classification result as the classification result of the target image;
and inputting the feature information of the target image into the two full-connection layers of the target neural network model under the condition that the target image contains a foreground target and the foreground target proportion is smaller than the preset ratio, and obtaining a second classification result as the classification result of the target image.
And step S208, inputting the characteristic information of the target image into a corresponding foreground target segmentation structure according to the classification result to obtain a foreground target image corresponding to the target image.
Further, the step S208 may specifically include:
Inputting the characteristic information of the target image into a corresponding first foreground target segmentation structure under the condition that the classification result is the first classification result to obtain a foreground target image corresponding to the target image; or under the condition that the classification result is the first classification result, inputting the characteristic information of the target image into a corresponding first foreground target segmentation structure to obtain a foreground target image corresponding to the target image; and under the condition that the classification result is the second classification result, inputting the characteristic information of the target image into a corresponding second foreground target segmentation structure to obtain a foreground target image corresponding to the target image.
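As a rough illustration of this routing, consider the following Python sketch. It is an assumption for illustration only, not code from the patent: backbone, cls_head, seg_head_1 and seg_head_2 are hypothetical placeholders for the backbone network, the full-connection layers, and the first and second foreground target segmentation structures, and thresholding the classification logit at zero is one plausible way to realize the two classification results.

```python
# Illustrative routing of feature information by classification result
# (steps S204-S208). All names are hypothetical placeholders.
import torch

@torch.no_grad()
def extract_foreground(backbone, cls_head, seg_head_1, seg_head_2,
                       image: torch.Tensor) -> torch.Tensor:
    feats = backbone(image.unsqueeze(0))   # feature information from the backbone
    cls_logit = cls_head(feats)            # classification result from the FC layers
    if cls_logit.item() >= 0:              # first classification result
        seg_logits = seg_head_1(feats)     # first foreground target segmentation structure
    else:                                  # second classification result
        seg_logits = seg_head_2(feats)     # second foreground target segmentation structure
    return seg_logits.argmax(dim=1)        # pixel-wise foreground target mask
```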
Through steps S202 to S208, the problem in the related art that human body segmentation based on video images extracts foreground human bodies from static images poorly can be solved: the pre-trained target neural network model can directly segment the foreground target in the target image and judge the foreground target, the parameters in the target neural network model are obtained directly through training, no manual intervention is needed, the practicability is high, and foreground human bodies are extracted effectively from static images.
In this embodiment, before the target image is input into the pre-trained target neural network model to obtain the feature information of the target image output by its backbone network, a data set is constructed. Specifically, for each image among the predetermined number of images in the data set, the gray values of all pixels of the foreground target corresponding to that image are acquired; mask image labels are set for all pixels in the image, where a pixel belonging to the foreground target is labeled 1 and a pixel belonging to the background is labeled 0; and each image is marked according to whether it contains a foreground target whose proportion is greater than or equal to the preset ratio. The data set comprises the predetermined number of images and the foreground targets corresponding to the images, and the images include a first type and a second type: a first-type image contains a foreground target whose proportion is greater than or equal to the preset ratio and the picture is marked 1; a second-type image contains a foreground target whose proportion is smaller than the preset ratio and the picture is marked -1.
Training an original neural network model using the predetermined number of images and the foreground targets corresponding to them yields the target neural network model; the predetermined number of images are the input of the original neural network model, and the foreground target image corresponding to a target image output by the trained target neural network model and the foreground target actually corresponding to that target image satisfy a target loss function. Training may specifically proceed through the following steps. First, the predetermined number of images are input into the original neural network model to obtain the corresponding feature information output by its backbone network. The two full-connection layers of the original neural network model are then trained on the feature information of the predetermined number of images: this feature information is the input of the two full-connection layers, and the classification result corresponding to the target image output by the trained layers and the classification result actually corresponding to the target image satisfy a first loss function. The first foreground target segmentation structure of the original neural network model is trained on the feature information of the first-type images: this feature information is its input, and the foreground target corresponding to the target image output by the trained structure and the foreground target actually corresponding to the target image satisfy a second loss function. And/or the second foreground target segmentation structure of the original neural network model is trained on the feature information of the second-type images: this feature information is its input, and the foreground target corresponding to the target image output by the trained structure and the foreground target actually corresponding to the target image satisfy a third loss function. The target loss function is determined according to the first loss function and the second loss function, or according to the first, second and third loss functions.
On the basis of the idea of a general segmentation algorithm, the embodiment of the invention adds an auxiliary branch that judges whether the target human body is located at the target center; by referring to a classification loss on whether a foreground target is present, it guides the network to attend to the foreground target and suppress background human bodies. The method specifically includes the following steps:
Step S1, constructing a data set for training, wherein the data set comprises two parts: one part consists of pictures of a human body against a complex background, in which the foreground target is located at the center of the picture and occupies 80% or more of it; in the other part, the area of the foreground target is smaller than 80% of the picture;
Step S2, constructing the data set labels. The input is a three-channel picture X_m(i, j), which denotes the gray value of the m-th picture at position (i, j). The mask picture label M(i, j) ∈ {1, 0} is 1 if position (i, j) belongs to the foreground target block and 0 otherwise. The label y_m ∈ {1, -1} indicates whether the m-th input picture meets the foreground target condition: 1 means the input image meets it, -1 means it does not;
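To make this label construction concrete, a minimal sketch follows. It is an assumption for illustration: the helper name make_labels and its threshold parameter are hypothetical, with 0.8 taken from the 80% figure used in step S1.

```python
# Hypothetical label construction for step S2.
import numpy as np

def make_labels(foreground_mask: np.ndarray, threshold: float = 0.8):
    """foreground_mask: HxW boolean array, True where a pixel (i, j)
    belongs to the foreground target block."""
    M = foreground_mask.astype(np.uint8)   # mask picture label M(i, j) in {1, 0}
    ratio = M.sum() / M.size               # proportion occupied by the foreground target
    y = 1 if ratio >= threshold else -1    # image-level label y_m in {1, -1}
    return M, y
```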
Step S3, constructing a human body segmentation model based on a convolutional neural network. The model comprises two branches that share the parameters of a ResNet backbone (the backbone network for feature extraction). The first branch forms the human body target segmentation structure; it comprises two 1×1 convolutional layers and outputs a pixel-wise class (the class of each pixel point). The other branch is composed of two full-connection layers (FC) and judges whether the input picture contains a foreground target occupying at least 80% of the picture;
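A minimal PyTorch sketch of this two-branch structure is given below as an assumption for illustration: torchvision's resnet18 stands in for the ResNet backbone, and the class name ForegroundSegNet, the layer widths and the single-logit classifier head are hypothetical choices rather than details published in the patent.

```python
# Minimal sketch of the two-branch model of step S3 (illustrative only).
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ForegroundSegNet(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        backbone = resnet18(weights=None)
        # Shared ResNet backbone for feature extraction; drop avgpool/fc.
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])
        # Branch 1: human body target segmentation, two 1x1 convolutional
        # layers, outputting a pixel-wise class map.
        self.seg_head = nn.Sequential(
            nn.Conv2d(512, 256, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, num_classes, kernel_size=1),
        )
        # Branch 2: two full-connection layers judging whether the picture
        # contains a foreground target occupying at least 80% of it.
        self.cls_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(512, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, 1),  # logit of the foreground-target condition
        )

    def forward(self, x: torch.Tensor):
        feats = self.backbone(x)           # shared feature information
        seg_logits = self.seg_head(feats)  # (B, num_classes, H/32, W/32)
        cls_logit = self.cls_head(feats)   # (B, 1)
        return seg_logits, cls_logit
```

Sharing the backbone between the two heads is what lets the classification branch act as the auxiliary supervision described above.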
Step S4, pre-training the segmentation network parameters on COCO data, wherein the two branch structures behind the backbone are generated by random initialization and are updated by iterating repeatedly over the data set;
after each iteration finishes, judging whether the updated network parameters meet the preset segmentation accuracy index;
Step S5, simultaneously training the second branch of the human body segmentation model, using the network parameters of the human body segmentation model determined in step S3, to give the network the ability to judge foreground pictures (after iterative optimization, the network outputs the probability that a picture contains a human body target that is located at the center and occupies more than 80% of the picture);
Step S6, in each iteration, calculating the loss of the segmentation branch network and the foreground target judgment loss;
Further, the segmentation loss function L_segment(X, M) from step S4 is combined with the classification loss from step S5 into the total loss function below, in which the first part is the segmentation loss and the second part is the classification loss of the target foreground judgment:

L_loss = L_segment(X, M) + L(X, Y)

where L_loss is the total loss function of the target neural network model (corresponding to the target loss function described above), L_segment(X, M) is the loss function of the segmentation branch network (corresponding to the second loss function described above), and L(X, Y) is the loss function of the foreground target judgment (corresponding to the first loss function described above).
Here, L(X, Y) is the cross entropy loss between the network's predicted value for the foreground target judgment of the input image and the true target label value of the input image.
L_segment(X, M) is the cross entropy loss between the network's predicted segmentation map of the input image and the binary map annotated for the input image.
The cross entropy is calculated as:

CE(p, p') = -[p·log(p') + (1 - p)·log(1 - p')]

where p is the true label value of the input image and p' is the predicted value of the network.
Step S61, calculating the total loss function of the full connection layer and the full convolution layer through forward propagation;
Step S62, updating the above network parameters by back propagation.
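Putting steps S6 to S62 together, one training iteration might look like the sketch below. It reuses the hypothetical ForegroundSegNet from step S3 and assumes an unweighted sum L_loss = L_segment + L, since the patent does not state branch weights; the {1, -1} labels are mapped to {1, 0} for binary cross entropy.

```python
# Sketch of one training iteration (steps S6, S61, S62); illustrative only.
import torch
import torch.nn.functional as F

model = ForegroundSegNet(num_classes=2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

def train_step(images, masks, labels):
    """images: (B, 3, H, W); masks: (B, h, w) long tensor of mask labels
    in {1, 0}; labels: (B,) tensor in {1, -1} for the foreground condition."""
    seg_logits, cls_logit = model(images)  # forward propagation (step S61)
    # L_segment(X, M): cross entropy between the predicted segmentation
    # map and the annotated binary map.
    seg_logits = F.interpolate(seg_logits, size=masks.shape[-2:],
                               mode="bilinear", align_corners=False)
    l_segment = F.cross_entropy(seg_logits, masks)
    # L(X, Y): cross entropy of the foreground target judgment.
    targets = (labels > 0).float().unsqueeze(1)
    l_cls = F.binary_cross_entropy_with_logits(cls_logit, targets)
    loss = l_segment + l_cls               # total loss L_loss
    optimizer.zero_grad()
    loss.backward()                        # back propagation (step S62)
    optimizer.step()                       # update the network parameters
    return loss.item()
```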
The neural network used in this embodiment can directly segment the person in the image and judge the foreground target, so the foreground target image does not need to be judged manually; the parameters of the algorithm are obtained directly through training, without manual intervention, which makes the method more practical and robust than traditional segmentation algorithms.
Example 2
According to another embodiment of the present invention, there is also provided a foreground object extraction apparatus, fig. 3 is a block diagram of the foreground object extraction apparatus according to an embodiment of the present invention, as shown in fig. 3, including:
an acquisition module 32, configured to acquire a target image acquired by the camera;
The first input module 34 is configured to input the target image into a pre-trained target neural network model, so as to obtain feature information of the target image output by a backbone network of the target neural network model;
A second input module 36, configured to input the feature information into a full connection layer of the target neural network model, to obtain a classification result of the target image;
And a third input module 38, configured to input feature information of the target image into a corresponding foreground target segmentation structure according to the classification result, so as to obtain a foreground target image corresponding to the target image.
Optionally, the second input module 36 includes:
The first input sub-module is used for inputting the feature information of the target image into the two full-connection layers of the target neural network model under the condition that the target image contains a foreground target and the foreground target proportion is greater than or equal to a preset ratio, so that the classification result of the target image is a first classification result;
and the second input sub-module is used for inputting the feature information of the target image into the two full-connection layers of the target neural network model under the condition that the target image contains a foreground target and the foreground target proportion is smaller than the preset ratio, so that the classification result of the target image is a second classification result.
Optionally, the third input module 38 includes:
The third input sub-module is used for inputting the feature information of the target image into a corresponding first foreground target segmentation structure to obtain a foreground target image corresponding to the target image, under the condition that the classification result is the first classification result; or
The fourth input sub-module is used for inputting the characteristic information of the target image into a corresponding first foreground target segmentation structure to obtain a foreground target image corresponding to the target image under the condition that the classification result is the first classification result; and under the condition that the classification result is the second classification result, inputting the characteristic information of the target image into a corresponding second foreground target segmentation structure to obtain a foreground target image corresponding to the target image.
Fig. 4 is a block diagram of a foreground object extraction apparatus according to a preferred embodiment of the present invention, as shown in fig. 4, the apparatus further comprising:
A construction module 42, configured to construct a data set, wherein the data set comprises a predetermined number of images and foreground targets corresponding to the images, and the images comprise a first type of image and a second type of image, wherein a first-type image is an image that contains a foreground target and whose foreground target proportion is greater than or equal to a preset ratio, and a second-type image is an image that contains a foreground target and whose foreground target proportion is smaller than the preset ratio;
The training module 44 is configured to train the original neural network model by using the predetermined number of images and the foreground target corresponding to the images, so as to obtain the target neural network model, where the predetermined number of images are input to the original neural network model, and the foreground target image corresponding to the target image output by the trained target neural network model and the foreground target actually corresponding to the target image satisfy a target loss function.
Optionally, the training module 44 includes:
A fifth input sub-module, configured to input the predetermined number of images into the original neural network model, to obtain the predetermined number of feature information output by a backbone network of the original neural network model;
The first training submodule is used for training the two full-connection layers of the original neural network model according to the characteristic information of the images with the preset number, wherein the characteristic information of the images with the preset number is input into the two full-connection layers of the original neural network model, and the classification results corresponding to the target images output by the two full-connection layers of the trained original neural network model and the classification results actually corresponding to the target images meet a first loss function;
the second training sub-module is used for training the first foreground object segmentation structure of the original neural network model by the characteristic information of the first type of image, wherein the characteristic information of the first type of image is input into the first foreground object segmentation structure of the original neural network model, and a second loss function is met by a foreground object corresponding to the target image output by the trained first foreground object segmentation structure of the original neural network model and a foreground object actually corresponding to the target image; and/or
And the third training sub-module is used for training the second foreground object segmentation structure of the original neural network model by the characteristic information of the second class image, wherein the characteristic information of the second class image is the input of the second foreground object segmentation structure of the original neural network model, the trained foreground object corresponding to the target image output by the second foreground object segmentation structure of the original neural network model and the foreground object actually corresponding to the target image meet a third loss function, and the target loss function is determined according to the first loss function and the second loss function or the target loss function is determined according to the first loss function, the second loss function and the third loss function.
Optionally, the building module 42 includes:
An acquisition sub-module, configured to acquire, for each image in the predetermined number of images in the dataset, gray values of all pixels of a foreground object corresponding to the each image;
The setting submodule is used for setting mask image labels for all pixels in each image, wherein if the pixels belong to a foreground object, the mask image labels corresponding to the pixels are 1, and if the pixels belong to a background, the mask image labels corresponding to the pixels are 0;
and the marking sub-module is used for marking whether each image is an image that contains a foreground target and whose foreground target proportion is greater than or equal to a preset ratio.
It should be noted that each of the above modules may be implemented by software or hardware, and for the latter, it may be implemented by, but not limited to: the modules are all located in the same processor; or the above modules may be located in different processors in any combination.
Example 3
Embodiments of the present invention also provide a computer readable storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
Alternatively, in the present embodiment, the above-described storage medium may be configured to store a computer program for performing the steps of:
s1, acquiring a target image acquired by a camera;
S2, inputting the target image into a pre-trained target neural network model to obtain characteristic information of the target image output by a backbone network of the target neural network model;
s3, inputting the characteristic information into a full-connection layer of the target neural network model to obtain a classification result of the target image;
and S4, inputting the characteristic information of the target image into a corresponding foreground target segmentation structure according to the classification result to obtain a foreground target image corresponding to the target image.
Alternatively, in the present embodiment, the storage medium may include, but is not limited to: a USB flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or various other media capable of storing a computer program.
Example 4
An embodiment of the invention also provides an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, where the transmission device is connected to the processor, and the input/output device is connected to the processor.
Alternatively, in the present embodiment, the above-described processor may be configured to execute the following steps by a computer program:
s1, acquiring a target image acquired by a camera;
S2, inputting the target image into a pre-trained target neural network model to obtain characteristic information of the target image output by a backbone network of the target neural network model;
s3, inputting the characteristic information into a full-connection layer of the target neural network model to obtain a classification result of the target image;
and S4, inputting the characteristic information of the target image into a corresponding foreground target segmentation structure according to the classification result to obtain a foreground target image corresponding to the target image.
Alternatively, specific examples in this embodiment may refer to examples described in the foregoing embodiments and optional implementations, and this embodiment is not described herein.
It will be appreciated by those skilled in the art that the modules or steps of the invention described above may be implemented by a general-purpose computing device; they may be concentrated on a single computing device or distributed across a network of computing devices. Alternatively, they may be implemented in program code executable by computing devices, so that they may be stored in a storage device and executed by computing devices; in some cases, the steps shown or described may be performed in a different order than shown or described here, or they may be separately fabricated into individual integrated circuit modules, or multiple modules or steps among them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the principle of the present invention should be included in the protection scope of the present invention.

Claims (7)

1. A foreground object extraction method, characterized by comprising:
acquiring a target image acquired by a camera;
Inputting the target image into a pre-trained target neural network model to obtain characteristic information of the target image output by a backbone network of the target neural network model;
Inputting the characteristic information into a full-connection layer of the target neural network model to obtain a classification result of the target image;
Inputting the characteristic information of the target image into a corresponding foreground target segmentation structure according to the classification result to obtain a foreground target image corresponding to the target image;
the step of inputting the characteristic information into the full connection layer of the target neural network model to obtain the classification result of the target image comprises the following steps:
inputting the feature information of the target image into two full-connection layers of the target neural network model under the condition that the target image contains a foreground target and the foreground target proportion is greater than or equal to a preset ratio, to obtain a first classification result as the classification result of the target image;
and inputting the feature information of the target image into the two full-connection layers of the target neural network model under the condition that the target image contains a foreground target and the foreground target proportion is smaller than the preset ratio, to obtain a second classification result as the classification result of the target image;
Inputting the feature information of the target image into a corresponding foreground target segmentation structure according to the classification result, and obtaining a foreground target image corresponding to the target image comprises the following steps:
inputting the feature information of the target image into a corresponding first foreground target segmentation structure under the condition that the classification result is the first classification result, to obtain a foreground target image corresponding to the target image; or
Inputting the characteristic information of the target image into a corresponding first foreground target segmentation structure under the condition that the classification result is the first classification result to obtain a foreground target image corresponding to the target image; and under the condition that the classification result is the second classification result, inputting the characteristic information of the target image into a corresponding second foreground target segmentation structure to obtain a foreground target image corresponding to the target image.
2. The method of claim 1, wherein prior to inputting the target image into a pre-trained target neural network model, obtaining feature information of the target image output by a backbone network of the target neural network model, the method further comprises:
constructing a data set, wherein the data set comprises a predetermined number of images and foreground targets corresponding to the images, and the images comprise a first type of image and a second type of image, wherein a first-type image is an image that contains a foreground target and whose foreground target proportion is greater than or equal to a preset ratio, and a second-type image is an image that contains a foreground target and whose foreground target proportion is smaller than the preset ratio;
Training an original neural network model by using the preset number of images and foreground targets corresponding to the images to obtain the target neural network model, wherein the preset number of images are input into the original neural network model, and the foreground target images corresponding to the target images output by the trained target neural network model and the foreground targets actually corresponding to the target images meet a target loss function.
3. The method of claim 2, wherein training the original neural network model using the preset number of images and the foreground targets corresponding to the images to obtain the target neural network model comprises:
inputting the preset number of images into the original neural network model to obtain the feature information of the preset number of images output by a backbone network of the original neural network model;
training two fully connected layers of the original neural network model using the feature information of the preset number of images, wherein the feature information of the preset number of images is input into the two fully connected layers of the original neural network model, and the classification result corresponding to the target image output by the trained two fully connected layers of the original neural network model and the classification result actually corresponding to the target image satisfy a first loss function;
training a first foreground target segmentation structure of the original neural network model using the feature information of the first type of image, wherein the feature information of the first type of image is input into the first foreground target segmentation structure of the original neural network model, and the foreground target corresponding to the target image output by the trained first foreground target segmentation structure of the original neural network model and the foreground target actually corresponding to the target image satisfy a second loss function; and/or
training a second foreground target segmentation structure of the original neural network model using the feature information of the second type of image, wherein the feature information of the second type of image is input into the second foreground target segmentation structure of the original neural network model, and the foreground target corresponding to the target image output by the trained second foreground target segmentation structure of the original neural network model and the foreground target actually corresponding to the target image satisfy a third loss function; wherein the target loss function is determined according to the first loss function and the second loss function, or according to the first loss function, the second loss function, and the third loss function.
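The composition of the target loss function in claim 3 can be sketched as follows, assuming cross-entropy for the classification branch (the first loss function) and binary cross-entropy for each segmentation structure (the second and third loss functions); the equal unit weights are a hypothetical choice, since the claim only states that the target loss function is determined from the branch losses.

```python
# Hedged sketch of the target loss composition; the choice of loss terms and
# their equal weighting are assumptions, not specified by the claims.
import torch
import torch.nn.functional as F

def target_loss(cls_logits: torch.Tensor, cls_labels: torch.Tensor,
                pred_mask_first=None, gt_mask_first=None,
                pred_mask_second=None, gt_mask_second=None) -> torch.Tensor:
    # First loss function: classification by the two fully connected layers.
    loss = F.cross_entropy(cls_logits, cls_labels)
    # Second loss function: first segmentation structure (first type of image).
    # Predicted masks are assumed to be sigmoid probabilities in [0, 1].
    if pred_mask_first is not None:
        loss = loss + F.binary_cross_entropy(pred_mask_first, gt_mask_first)
    # Third loss function (the "and/or" branch of claim 3): second
    # segmentation structure, trained on the second type of image.
    if pred_mask_second is not None:
        loss = loss + F.binary_cross_entropy(pred_mask_second, gt_mask_second)
    return loss
```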
4. The method of claim 2, wherein constructing the data set comprises:
for each image in the preset number of images in the data set, acquiring gray values of all pixels of the foreground target corresponding to the image;
setting mask image labels for all pixels in each image, wherein if a pixel belongs to a foreground target, the mask image label corresponding to the pixel is 1, and if the pixel belongs to the background, the mask image label corresponding to the pixel is 0; and
annotating, for each image, whether the image contains a foreground target whose proportion is greater than or equal to the preset ratio.
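A minimal NumPy sketch of the labelling in claim 4, assuming the foreground pixels are already known as a boolean array; the default preset ratio of 0.5 is a hypothetical placeholder, since the claims leave the ratio unspecified.

```python
# Hypothetical labelling sketch; the preset ratio value is an assumption.
import numpy as np

def build_labels(image_gray: np.ndarray, fg_pixels: np.ndarray,
                 preset_ratio: float = 0.5):
    """image_gray: 2-D gray image; fg_pixels: boolean array of the same
    shape, True where a pixel belongs to the foreground target."""
    # Gray values of all pixels of the corresponding foreground target.
    fg_gray = image_gray[fg_pixels]
    # Mask image labels: 1 for foreground pixels, 0 for background pixels.
    mask = fg_pixels.astype(np.uint8)
    # Annotate whether the foreground proportion reaches the preset ratio
    # (first type of image) or falls below it (second type of image).
    proportion = float(mask.sum()) / mask.size
    is_first_type = proportion >= preset_ratio
    return fg_gray, mask, is_first_type
```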
5. A foreground target extraction apparatus, comprising:
an acquisition module, configured to acquire a target image captured by a camera;
a first input module, configured to input the target image into a pre-trained target neural network model to obtain feature information of the target image output by a backbone network of the target neural network model;
a second input module, configured to input the feature information into a fully connected layer of the target neural network model to obtain a classification result of the target image; and
a third input module, configured to input the feature information of the target image into a corresponding foreground target segmentation structure according to the classification result, to obtain a foreground target image corresponding to the target image;
wherein the second input module comprises:
a first input sub-module, configured to, in a case where the target image contains a foreground target and the foreground target proportion is greater than or equal to a preset ratio, input the feature information of the target image into two fully connected layers of the target neural network model to obtain the classification result of the target image as a first classification result;
a second input sub-module, configured to, in a case where the target image contains a foreground target and the foreground target proportion is smaller than the preset ratio, input the feature information of the target image into the two fully connected layers of the target neural network model to obtain the classification result of the target image as a second classification result;
wherein the third input module comprises:
a third input sub-module, configured to, in a case where the classification result is the first classification result, input the feature information of the target image into a corresponding first foreground target segmentation structure to obtain the foreground target image corresponding to the target image; or
a fourth input sub-module, configured to, in a case where the classification result is the first classification result, input the feature information of the target image into the corresponding first foreground target segmentation structure to obtain the foreground target image corresponding to the target image, and, in a case where the classification result is the second classification result, input the feature information of the target image into a corresponding second foreground target segmentation structure to obtain the foreground target image corresponding to the target image.
6. A computer-readable storage medium, wherein the storage medium stores a computer program, and the computer program is arranged to execute the method of any one of claims 1 to 4 when run.
7. An electronic device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor is arranged to run the computer program to perform the method of any one of claims 1 to 4.
CN202010728063.4A 2020-07-23 2020-07-23 Foreground target extraction method and device Active CN111833372B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010728063.4A CN111833372B (en) 2020-07-23 2020-07-23 Foreground target extraction method and device

Publications (2)

Publication Number Publication Date
CN111833372A CN111833372A (en) 2020-10-27
CN111833372B (en) 2024-07-02

Family

ID=72925818

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010728063.4A Active CN111833372B (en) 2020-07-23 2020-07-23 Foreground target extraction method and device

Country Status (1)

Country Link
CN (1) CN111833372B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112365465B (en) * 2020-11-09 2024-02-06 浙江大华技术股份有限公司 Synthetic image category determining method and device, storage medium and electronic device
CN112614144A (en) * 2020-12-30 2021-04-06 深圳市联影高端医疗装备创新研究院 Image segmentation method, device, equipment and storage medium
CN113034514A (en) * 2021-03-19 2021-06-25 影石创新科技股份有限公司 Sky region segmentation method and device, computer equipment and storage medium
CN113963437A (en) * 2021-10-15 2022-01-21 武汉众智数字技术有限公司 Gait recognition sequence acquisition method and system based on deep learning
CN114445632A (en) * 2022-02-08 2022-05-06 支付宝(杭州)信息技术有限公司 Picture processing method and device
CN114782460B (en) * 2022-06-21 2022-10-18 阿里巴巴达摩院(杭州)科技有限公司 Image segmentation model generation method, image segmentation method and computer equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019233297A1 (en) * 2018-06-08 2019-12-12 Oppo广东移动通信有限公司 Data set construction method, mobile terminal and readable storage medium
CN111382808A (en) * 2020-05-29 2020-07-07 浙江大华技术股份有限公司 Vehicle detection processing method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018033156A1 (en) * 2016-08-19 2018-02-22 北京市商汤科技开发有限公司 Video image processing method, device, and electronic apparatus
CN109522967A (en) * 2018-11-28 2019-03-26 广州逗号智能零售有限公司 A kind of commodity attribute recognition methods, device, equipment and storage medium
DE102018010197A1 (en) * 2018-12-18 2020-06-18 GRID INVENT gGmbH Electronic element and electrically controlled display element
CN110415233A (en) * 2019-07-26 2019-11-05 东南大学 Pavement crack rapid extracting method based on two step convolutional neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant