CN117197178A - Foreground and background segmentation method, electronic device and computer readable medium - Google Patents

Foreground and background segmentation method, electronic device and computer readable medium

Info

Publication number
CN117197178A
CN117197178A
Authority
CN
China
Prior art keywords
foreground
background segmentation
characteristic data
data
weight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210611875.XA
Other languages
Chinese (zh)
Inventor
尹芹
方晖
王金东
霍智勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp filed Critical ZTE Corp
Priority to CN202210611875.XA priority Critical patent/CN117197178A/en
Priority to PCT/CN2023/097502 priority patent/WO2023232086A1/en
Publication of CN117197178A publication Critical patent/CN117197178A/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The application provides a foreground and background segmentation method, electronic equipment and a computer readable medium, wherein the foreground and background segmentation method comprises the following steps: acquiring first characteristic data of image data to be segmented; performing feature compression aggregation on the first feature data by adopting an encoder in the trained foreground and background segmentation network to obtain third feature data; wherein the foreground-background segmentation network comprises: an encoder, a network underlying block and a decoder; calculating the third characteristic data by adopting the encoder through a self-attention mechanism to obtain a channel self-adaptive weight; calculating fourth characteristic data according to the channel self-adaptive weight and the first characteristic data by adopting the encoder; and inputting the fourth characteristic data obtained after the first characteristic data is processed by the encoder into the network bottom layer block, and processing the fourth characteristic data by the network bottom layer block and the decoder to obtain a first foreground and background segmentation result.

Description

Foreground and background segmentation method, electronic device and computer readable medium
Technical Field
The embodiment of the application relates to the technical field of artificial intelligence, in particular to a foreground and background segmentation method, electronic equipment and a computer readable medium.
Background
In the field of image processing, some scenarios require foreground and background segmentation of an image before subsequent image processing and analysis. For example, in sports games, accurately tracking players makes it possible to generate relevant statistics such as ball possession, player speed and distance traveled, which are useful to professionals such as coaches and add to the entertainment value for spectators. Before tracking, each frame of the input video typically needs to be segmented in order to acquire the objects of interest in that frame. Segmentation is regarded as a preprocessing stage on which other higher-order visual tasks depend, so it is critical for segmentation algorithms to ensure faster-than-real-time processing speed and a low error rate.
However, the segmentation quality of current segmentation algorithms is not ideal.
Disclosure of Invention
The embodiment of the application provides a foreground and background segmentation method, electronic equipment and a computer readable medium.
In a first aspect, an embodiment of the present application provides a foreground and background segmentation method, including: acquiring first characteristic data of image data to be segmented; performing feature compression aggregation on the first feature data by adopting an encoder in the trained foreground and background segmentation network to obtain third feature data; wherein the foreground-background segmentation network comprises: an encoder, a network underlying block and a decoder; calculating the third characteristic data by adopting the encoder through a self-attention mechanism to obtain a channel self-adaptive weight; calculating fourth characteristic data according to the channel self-adaptive weight and the first characteristic data by adopting the encoder; and inputting the fourth characteristic data obtained after the first characteristic data is processed by the encoder into the network bottom layer block, and processing the fourth characteristic data by the network bottom layer block and the decoder to obtain a first foreground and background segmentation result.
In a second aspect, an embodiment of the present application provides an electronic device, including: at least one processor; and the memory is stored with at least one program, and when the at least one program is executed by the at least one processor, any one of the foreground and background segmentation methods is realized.
In a third aspect, embodiments of the present application provide a computer readable medium having a computer program stored thereon, which when executed by a processor implements any of the above-described foreground and background segmentation methods.
According to the foreground and background segmentation method provided by the embodiments of the application, after feature compression aggregation is performed on the second feature data, a self-attention mechanism is used to obtain the channel adaptive weight; that is, correlations among channels are established and the responses of the feature channels are adaptively re-optimized, which improves the ability of foreground and background segmentation to capture long-range context information and thereby improves foreground and background segmentation accuracy.
Drawings
FIG. 1 is a flowchart of a foreground and background segmentation method according to an embodiment of the present application;
FIG. 2 is a schematic diagram illustrating the processing of a coding block in a U-network according to an embodiment of the present application;
fig. 3 is a block diagram of a foreground and background segmentation apparatus according to another embodiment of the present application.
Detailed Description
In order to better understand the technical solutions of the present application, the following describes the foreground and background segmentation method, the electronic device and the computer readable medium provided by the present application in detail with reference to the accompanying drawings.
Example embodiments will be described more fully hereinafter with reference to the accompanying drawings, but may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the application to those skilled in the art.
The embodiments of the application and features of the embodiments may be combined with each other without conflict.
As used herein, the term "and/or" includes any and all combinations of at least one of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of at least one other feature, integer, step, operation, element, component, and/or group thereof.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present application and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
Fig. 1 is a flowchart of a foreground and background segmentation method according to an embodiment of the present application.
In a first aspect, referring to fig. 1, an embodiment of the present application provides a foreground and background segmentation method, including:
step 100, obtaining first characteristic data of image data to be segmented.
In some exemplary embodiments, the first feature data may be obtained by performing convolution operation on the image data to be segmented multiple times.
Assuming that the size of the image data to be segmented is C×H×W, where C is the number of channels, H is the height, and W is the width, the size of the first feature data is also C×H×W.
Step 101, performing feature compression aggregation processing on the first feature data by adopting an encoder in a trained foreground and background segmentation network to obtain third feature data; wherein, the foreground and background segmentation network comprises: an encoder, a network underlying block and a decoder; calculating the third characteristic data by adopting the encoder through a self-attention mechanism to obtain a channel self-adaptive weight; and calculating fourth characteristic data according to the channel self-adaptive weight and the first characteristic data by adopting the encoder.
In some exemplary embodiments, the encoder includes at least one encoding block, each encoding block performing the following processing on the input second characteristic data: performing feature compression aggregation on the second feature data to obtain N third feature data; wherein N is an integer greater than or equal to 3; calculating N pieces of third characteristic data through a self-attention mechanism to obtain the channel self-adaptive weight; calculating the fourth characteristic data according to the channel self-adaptive weight and the second characteristic data; wherein, in the case that the code block is the 1 st code block, the second feature data is the first feature data; in the case where the code block is the i-th code block and i is an integer greater than or equal to 2, the second characteristic data is fourth characteristic data outputted from the (i-1) -th code block.
In some exemplary embodiments, the foreground-background split network may be any of the U-shaped networks known to those skilled in the art. For example, a U-network includes: an encoder, a decoder, and a network underlying block, the encoder comprising at least one encoded block, the decoder comprising at least one decoded block. The output of the encoder is input to the network bottom layer block as the input of the network bottom layer block, the output of the network bottom layer block is input to the decoder as the input of the decoder, and the output of the decoder is the foreground and background segmentation result, such as the first foreground and background segmentation result and the second foreground and background segmentation result; the output of the previous encoding block in the encoder is input into the next encoding block as the input of the next encoding block, and the output of the previous decoding block in the decoder is input into the next decoding block as the input of the next decoding block.
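To make this dataflow concrete, the following is a minimal illustrative sketch of such a U-shaped network in PyTorch. It is not the network of this application: the attention-based encoding block of fig. 2 is simplified to a plain convolution, skip connections are omitted, and the module names, channel counts and framework choice are all assumptions.

```python
import torch
import torch.nn as nn

class UNetSkeleton(nn.Module):
    """Hypothetical U-shaped skeleton: encoder blocks -> bottom block -> decoder blocks."""
    def __init__(self, channels=(64, 128, 256)):
        super().__init__()
        # Each encoding block halves the spatial size; its output feeds the next block.
        self.encoders = nn.ModuleList(
            nn.Sequential(nn.Conv2d(ci, co, 3, stride=2, padding=1), nn.ReLU())
            for ci, co in zip(channels[:-1], channels[1:])
        )
        # Network bottom layer block between the encoder and the decoder.
        self.bottom = nn.Sequential(nn.Conv2d(channels[-1], channels[-1], 3, padding=1), nn.ReLU())
        # Each decoding block doubles the spatial size; its output feeds the next block.
        rev = channels[::-1]
        self.decoders = nn.ModuleList(
            nn.Sequential(nn.ConvTranspose2d(ci, co, 2, stride=2), nn.ReLU())
            for ci, co in zip(rev[:-1], rev[1:])
        )
        self.head = nn.Conv2d(channels[0], 1, 1)  # single-channel foreground mask

    def forward(self, x):
        for enc in self.encoders:
            x = enc(x)
        x = self.bottom(x)
        for dec in self.decoders:
            x = dec(x)
        return torch.sigmoid(self.head(x))  # foreground and background segmentation result
```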
In some exemplary embodiments, as shown in fig. 2, performing feature compression aggregation processing on the second feature data to obtain N third feature data includes: carrying out global average pooling treatment on the second characteristic data to obtain global information; performing convolution operation and activation function operation on the global information and a kth convolution kernel to obtain kth third characteristic data; wherein k is an integer greater than or equal to 1 and less than or equal to N. I.e. the second characteristic data is split into N branches along the channel dimension.
In some exemplary embodiments, performing global average pooling on the second feature data to obtain global information includes: calculating according to a formula w=g (x) to obtain global information; wherein w is global information, g (x) is a global average pooling function, and x is second feature data.
In some exemplary embodiments, g(x) may be the average value of the image data corresponding to each channel in the second feature data; the size of the global information obtained through the global average pooling process is then C×1×1.
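For illustration, the shape bookkeeping of this pooling step can be checked with a short PyTorch snippet; the channel and spatial sizes below are assumed values:

```python
import torch

x = torch.randn(1, 64, 32, 32)        # second feature data: C=64, H=W=32 (assumed)
w = x.mean(dim=(2, 3), keepdim=True)  # global average pooling g(x) over each channel
print(w.shape)                        # torch.Size([1, 64, 1, 1]), i.e. C x 1 x 1
```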
In some exemplary embodiments, the convolution kernels corresponding to different channels may be the same or different; that is, for different values of k, the kth convolution kernel may be the same or different.
In some exemplary embodiments, the size of the kth convolution kernel is 1×1.
In some exemplary embodiments, convolving the global information with a kth convolution kernel and performing an activation function operation to obtain the kth third feature data includes: calculating the kth third feature data according to the formula X_k = σ(W_3{w}); wherein X_k is the kth third feature data, W_3{} is the third convolution function (applied with the kth convolution kernel), w is the global information, and σ() is the first activation function.
In some example embodiments, the first activation function may be an activation function sigmoid.
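A minimal sketch of the N-branch computation described above, assuming PyTorch and illustrative values for N and C; each branch owns a 1×1 convolution followed by the sigmoid activation:

```python
import torch
import torch.nn as nn

N, C = 3, 64  # branch count and channel count are assumed values
branches = nn.ModuleList(nn.Conv2d(C, C, kernel_size=1) for _ in range(N))

w = torch.randn(1, C, 1, 1)                            # global information from pooling
third = [torch.sigmoid(conv(w)) for conv in branches]  # kth third feature data X_k
print([t.shape for t in third])                        # each is (1, C, 1, 1)
```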
In some exemplary embodiments, as shown in fig. 2, calculating the channel adaptive weights from the N third feature data by a self-attention mechanism includes: calculating a kth normalized weight of kth third feature data according to the kth third feature data; and determining the channel self-adaptive weight according to the 1 st normalized weight to the N th normalized weight.
In some exemplary embodiments, the kth normalized weight is calculated from the kth third feature data according to the formula X_k = σ(W_2{δ(W_1{x_k})}); wherein X_k is the kth normalized weight, σ() is the first activation function, W_1{} is the first convolution function, W_2{} is the second convolution function, δ() is the second activation function, and x_k is the third feature data corresponding to the kth channel.
In some exemplary embodiments, the convolution kernels corresponding to W_1{} and W_2{} are both 1×1 in size.
In some exemplary embodiments, the second activation function is an activation function ReLU.
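The per-branch weight computation above has a squeeze-and-excitation-like shape: a 1×1 convolution, ReLU, then another 1×1 convolution and sigmoid. The sketch below assumes PyTorch and a hypothetical channel-reduction ratio r; the application itself only specifies that both kernels are 1×1:

```python
import torch
import torch.nn as nn

class NormalizedWeight(nn.Module):
    """Computes X_k = sigmoid(W2{ReLU(W1{x_k})}) for a (B, C, 1, 1) input."""
    def __init__(self, channels: int, r: int = 4):  # reduction ratio r is an assumption
        super().__init__()
        self.w1 = nn.Conv2d(channels, channels // r, kernel_size=1)  # first convolution function
        self.w2 = nn.Conv2d(channels // r, channels, kernel_size=1)  # second convolution function

    def forward(self, x_k: torch.Tensor) -> torch.Tensor:
        # delta = ReLU (second activation), sigma = sigmoid (first activation)
        return torch.sigmoid(self.w2(torch.relu(self.w1(x_k))))
```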
In some exemplary embodiments, as shown in fig. 2, N is 3, and determining the channel adaptive weight according to the 1st normalized weight to the Nth normalized weight includes: performing a dot product operation, scaling and normalization on the 1st normalized weight and the 2nd normalized weight to obtain a channel autocorrelation weight matrix; and performing a dot product operation on the channel autocorrelation weight matrix and the 3rd normalized weight to obtain the channel adaptive weight.
In some exemplary embodiments, as shown in fig. 2, performing a dot product operation, scaling and normalization on the 1st normalized weight and the 2nd normalized weight to obtain the channel autocorrelation weight matrix includes: calculating the channel autocorrelation weight matrix from the 1st normalized weight and the 2nd normalized weight according to the formula X_T = softmax((X_1 · X_2)/√d_2); wherein X_T is the channel autocorrelation weight matrix, softmax() is the third activation function, X_1 is the 1st normalized weight, X_2 is the 2nd normalized weight, and d_2 is the scaling factor.
In this embodiment, the scaling process is performed for the purpose of preventing scale explosion of dot product operations, and a third activation function is used to normalize the weights.
In some exemplary embodiments, as shown in fig. 2, performing a dot product operation on the channel autocorrelation weight matrix and the 3rd normalized weight to obtain the channel adaptive weight includes: calculating the channel adaptive weight according to the formula X_s = X_T · X_3; wherein X_s is the channel adaptive weight, X_T is the channel autocorrelation weight matrix, and X_3 is the 3rd normalized weight.
In some exemplary embodiments, as shown in fig. 2, performing a dot product operation on the channel adaptive weight and the second feature data to obtain the fourth feature data includes: calculating the fourth feature data according to the formula X_c = X_s · x; wherein X_c is the fourth feature data, X_s is the channel adaptive weight, and x is the second feature data.
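Taken together, the channel adaptive weighting resembles scaled dot-product attention applied along the channel dimension. The sketch below illustrates the three formulas above under reconstruction assumptions: the (B, C, 1, 1) weights are flattened to vectors and the scale d_2 is taken to be the channel count C:

```python
import torch

def channel_adaptive_weight(x1, x2, x3, x):
    """X_T = softmax((X_1 · X_2) / sqrt(d_2)); X_s = X_T · X_3; X_c = X_s · x."""
    b, c = x1.shape[:2]
    q = x1.view(b, c, 1)                              # 1st normalized weight
    k = x2.view(b, 1, c)                              # 2nd normalized weight
    attn = torch.softmax(q @ k / c ** 0.5, dim=-1)    # (B, C, C) channel autocorrelation matrix
    x_s = (attn @ x3.view(b, c, 1)).view(b, c, 1, 1)  # channel adaptive weight X_s
    return x_s * x                                    # fourth feature data X_c

x = torch.randn(2, 64, 32, 32)          # second feature data
x1 = x2 = x3 = torch.rand(2, 64, 1, 1)  # normalized weights from the three branches
print(channel_adaptive_weight(x1, x2, x3, x).shape)  # torch.Size([2, 64, 32, 32])
```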
Step 102, inputting the fourth feature data obtained after the first feature data is processed by the encoder into the network bottom layer block, and processing by the network bottom layer block and the decoder to obtain a first foreground and background segmentation result.
In some exemplary embodiments, inputting the fourth feature data obtained after the first feature data is processed by the encoder into the network underlying block and processing the fourth feature data by the network underlying block and the decoder to obtain a first foreground and background segmentation result includes: and inputting fourth characteristic data output by the last coding block into the network bottom layer block, and processing the fourth characteristic data by the network bottom layer block and the decoder to obtain a first foreground and background segmentation result.
In some exemplary embodiments, before acquiring the first feature data of the image data to be segmented, the method further comprises: acquiring a training image dataset; performing enhanced preprocessing on the training image data set; and performing model training on the foreground and background segmentation network according to the training image data set after the enhancement pretreatment.
In some exemplary embodiments, the COCO dataset may be employed as the training image dataset. The COCO dataset is a dataset built for human visual cognition and visual understanding that covers a plurality of scene categories, each category comprising 400 images, which meets the requirement for large-scale training data in deep learning methods.
In some exemplary embodiments, enhancing the training image dataset comprises: performing foreground and background segmentation on each training image data in the training image data set to obtain a second foreground and background segmentation result; and labeling the training image data with a second foreground and background segmentation result.
In some exemplary embodiments, the enhancement preprocessing of the training image dataset further comprises: performing rotation, cropping and other processing on the training image data to obtain more training image data.
In some exemplary embodiments, model training the foreground-background segmentation network from the enhanced preprocessed training image dataset comprises: and respectively acquiring first characteristic data of each piece of training image data after the enhancement pretreatment in the training image data set after the enhancement pretreatment, taking the first characteristic data as the input of a foreground and background segmentation network, and taking a second foreground and background segmentation result as the output of the foreground and background segmentation network to perform model training on the foreground and background segmentation network.
In some exemplary embodiments, training of the foreground-background segmentation network may be achieved using training methods well known to those skilled in the art, and will not be described in detail herein.
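As one illustration only, a conventional supervised training step might look like the following; the binary cross-entropy loss, Adam optimizer and learning rate are assumptions rather than details of the application, and UNetSkeleton refers to the earlier illustrative sketch:

```python
import torch
import torch.nn as nn

model = UNetSkeleton()                   # hypothetical network from the earlier sketch
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.BCELoss()                 # per-pixel foreground/background loss (assumed)

def train_step(first_features: torch.Tensor, target_mask: torch.Tensor) -> float:
    """first_features: (B, 64, H, W) first feature data; target_mask: (B, 1, H, W) labels."""
    optimizer.zero_grad()
    pred = model(first_features)         # predicted (first) segmentation result
    loss = criterion(pred, target_mask)  # compare against the labelled (second) result
    loss.backward()
    optimizer.step()
    return loss.item()
```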
It should be noted that, in the foreground-background segmentation method according to the embodiment of the present application, foreground data and background data in image data are segmented, where the foreground data refers to image data of interest, and the background data refers to image data of no interest. For example, in a sports game, the image data corresponding to a player and a ball are foreground data, and the other image data are background data.
According to the foreground and background segmentation method provided by the embodiments of the application, after feature compression aggregation is performed on the second feature data, a self-attention mechanism is used to obtain the channel adaptive weight; that is, correlations among channels are established and the responses of the feature channels are adaptively re-optimized, which improves the ability of foreground and background segmentation to capture long-range context information and thereby improves foreground and background segmentation accuracy.
In a second aspect, another embodiment of the present application provides an electronic device, including: at least one processor; and the memory is stored with at least one program, and when the at least one program is executed by the at least one processor, any one of the foreground and background segmentation methods is realized.
Wherein the processor is a device having data processing capability, including but not limited to a central processing unit (CPU); the memory is a device having data storage capability, including but not limited to random access memory (RAM, more specifically SDRAM, DDR, etc.), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM) and FLASH memory.
In some embodiments, the processor, the memory, and the other components of the computing device are connected to each other via a bus.
In a third aspect, another embodiment of the present application provides a computer readable medium having a computer program stored thereon, which when executed by a processor implements any one of the above-described foreground and background segmentation methods.
Fig. 3 is a block diagram of a foreground and background segmentation apparatus according to another embodiment of the present application.
In a fourth aspect, referring to fig. 3, another embodiment of the present application provides a foreground and background segmentation apparatus, including: a feature data obtaining module 301, configured to obtain first feature data of image data to be segmented; the foreground and background segmentation module 302 is configured to perform feature compression aggregation processing on the first feature data by using an encoder in a trained foreground and background segmentation network to obtain third feature data; wherein the foreground-background segmentation network comprises: an encoder, a network underlying block and a decoder; calculating the third characteristic data by adopting the encoder through a self-attention mechanism to obtain a channel self-adaptive weight; calculating fourth characteristic data according to the channel self-adaptive weight and the first characteristic data by adopting the encoder; and inputting the fourth characteristic data obtained after the first characteristic data is processed by the encoder into the network bottom layer block, and processing the fourth characteristic data by the network bottom layer block and the decoder to obtain a first foreground and background segmentation result.
In some exemplary embodiments, further comprising: a data set acquisition module 303 for acquiring a training image data set; a preprocessing module 304, configured to perform enhanced preprocessing on the training image dataset; the model training module 305 is configured to perform model training on the foreground and background segmentation network according to the training image dataset after the enhancement preprocessing.
In some exemplary embodiments, the preprocessing module 304 is specifically configured to: performing foreground and background segmentation on each training image data in the training image data set to obtain a second foreground and background segmentation result; and labeling the training image data with a second foreground and background segmentation result.
In some exemplary embodiments, the encoder includes at least one encoding block, each encoding block performing the following processing on the input second characteristic data: performing feature compression aggregation on the second feature data to obtain N third feature data; wherein N is an integer greater than or equal to 3; calculating N pieces of third characteristic data through a self-attention mechanism to obtain the channel self-adaptive weight; calculating the fourth characteristic data according to the channel self-adaptive weight and the second characteristic data; wherein, in the case that the code block is the 1 st code block, the second feature data is the first feature data; in the case where the code block is an i-th code block and i is an integer greater than or equal to 2, the second characteristic data is fourth characteristic data output from the (i-1) -th code block;
the foreground-background segmentation module 302 is specifically configured to implement that the fourth feature data obtained by processing the first feature data with the encoder is input to the network bottom layer block, and is processed with the network bottom layer block and the decoder to obtain a first foreground-background segmentation result in the following manner: and inputting fourth characteristic data output by the last coding block into the network bottom layer block, and processing the fourth characteristic data by the network bottom layer block and the decoder to obtain a first foreground and background segmentation result.
In some exemplary embodiments, the foreground-background segmentation module 302 is specifically configured to perform feature compression aggregation processing on the second feature data to obtain N third feature data in the following manner: carrying out global average pooling treatment on the second characteristic data to obtain global information; performing convolution operation and activation function operation on the global information and a kth convolution kernel to obtain kth third characteristic data; wherein k is an integer greater than or equal to 1 and less than or equal to N.
In some exemplary embodiments, the foreground-background segmentation module 302 is specifically configured to calculate the channel adaptive weights by using a self-attention mechanism to calculate the N pieces of third feature data in the following manner: calculating a kth normalized weight of kth third feature data according to the kth third feature data; and determining the channel self-adaptive weight according to the 1 st normalized weight to the N th normalized weight.
In some exemplary embodiments, the foreground-background segmentation module 302 is specifically configured to calculate the kth normalized weight of the kth third feature data in the following manner: calculating the kth normalized weight from the kth third feature data according to the formula X_k = σ(W_2{δ(W_1{x_k})}); wherein X_k is the kth normalized weight, σ() is the first activation function, W_1{} is the first convolution function, W_2{} is the second convolution function, δ() is the second activation function, and x_k is the kth third feature data.
In some exemplary embodiments, N is 3, and the foreground-background segmentation module 302 is specifically configured to determine the channel adaptive weight according to the 1st normalized weight to the Nth normalized weight in the following manner: performing a dot product operation, scaling and normalization on the 1st normalized weight and the 2nd normalized weight to obtain a channel autocorrelation weight matrix; and performing a dot product operation on the channel autocorrelation weight matrix and the 3rd normalized weight to obtain the channel adaptive weight.
In some exemplary embodiments, the foreground-background segmentation module 302 is specifically configured to perform the dot product operation, scaling and normalization on the 1st normalized weight and the 2nd normalized weight to obtain the channel autocorrelation weight matrix in the following manner: calculating the channel autocorrelation weight matrix from the 1st normalized weight and the 2nd normalized weight according to the formula X_T = softmax((X_1 · X_2)/√d_2); wherein X_T is the channel autocorrelation weight matrix, softmax() is the third activation function, X_1 is the 1st normalized weight, X_2 is the 2nd normalized weight, and d_2 is the scaling factor.
The specific implementation process of the foreground and background segmentation device is the same as that of the foreground and background segmentation method in the foregoing embodiment, and will not be described in detail here.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, functional modules/units in the apparatus, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purpose of limitation. In some instances, it will be apparent to one skilled in the art that features, characteristics, and/or elements described in connection with a particular embodiment may be used alone or in combination with other embodiments unless explicitly stated otherwise. It will therefore be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the scope of the present application as set forth in the following claims.

Claims (11)

1. A foreground-background segmentation method, comprising:
acquiring first characteristic data of image data to be segmented;
performing feature compression aggregation on the first feature data by adopting an encoder in the trained foreground and background segmentation network to obtain third feature data; wherein the foreground-background segmentation network comprises: an encoder, a network underlying block and a decoder;
calculating the third characteristic data by adopting the encoder through a self-attention mechanism to obtain a channel self-adaptive weight;
calculating fourth characteristic data according to the channel self-adaptive weight and the first characteristic data by adopting the encoder;
and inputting the fourth characteristic data obtained after the first characteristic data is processed by the encoder into the network bottom layer block, and processing the fourth characteristic data by the network bottom layer block and the decoder to obtain a first foreground and background segmentation result.
2. The foreground-background segmentation method of claim 1, further comprising, prior to the acquiring the first feature data of the image data to be segmented:
acquiring a training image dataset;
performing enhanced preprocessing on the training image data set;
and carrying out model training on the foreground and background segmentation network according to the training image data set after the enhancement pretreatment.
3. The foreground-background segmentation method of claim 2, wherein the enhancing the training image dataset comprises:
performing foreground and background segmentation on each training image data in the training image data set to obtain a second foreground and background segmentation result;
and labeling the training image data with the second foreground and background segmentation result.
4. A foreground-background segmentation method according to any one of claims 1-3, wherein the encoder comprises at least one encoding block, each encoding block processing the input second feature data by:
performing feature compression aggregation on the second feature data to obtain N third feature data; wherein N is an integer greater than or equal to 3;
calculating N pieces of third characteristic data through a self-attention mechanism to obtain the channel self-adaptive weight;
calculating the fourth characteristic data according to the channel self-adaptive weight and the second characteristic data;
wherein, in the case that the code block is the 1 st code block, the second feature data is the first feature data; in the case where the code block is an i-th code block and i is an integer greater than or equal to 2, the second characteristic data is fourth characteristic data output from the (i-1) -th code block;
the step of inputting the fourth characteristic data obtained after the first characteristic data is processed by the encoder to the network bottom layer block, and processing the fourth characteristic data by the network bottom layer block and the decoder to obtain a first foreground and background segmentation result comprises the following steps:
and inputting fourth characteristic data output by the last coding block into the network bottom layer block, and processing the fourth characteristic data by the network bottom layer block and the decoder to obtain a first foreground and background segmentation result.
5. The foreground and background segmentation method of claim 4, wherein the performing feature compression aggregation on the second feature data to obtain N third feature data comprises:
carrying out global average pooling treatment on the second characteristic data to obtain global information;
performing convolution operation and activation function operation on the global information and a kth convolution kernel to obtain kth third characteristic data; wherein k is an integer greater than or equal to 1 and less than or equal to N.
6. The foreground-background segmentation method of claim 4, wherein the computing the N third feature data by a self-attention mechanism to obtain the channel adaptive weights comprises:
calculating a kth normalized weight of kth third feature data according to the kth third feature data;
and determining the channel self-adaptive weight according to the 1 st normalized weight to the N th normalized weight.
7. The foreground-background segmentation method of claim 6, wherein the kth normalized weight is calculated from the kth third feature data according to the formula X_k = σ(W_2{δ(W_1{x_k})});
wherein X_k is the kth normalized weight, σ() is the first activation function, W_1{} is the first convolution function, W_2{} is the second convolution function, δ() is the second activation function, and x_k is the kth third feature data.
8. The foreground-background segmentation method of claim 6, wherein N is 3, and the determining the channel adaptive weight according to the 1st normalized weight to the Nth normalized weight comprises:
performing a dot product operation, scaling and normalization on the 1st normalized weight and the 2nd normalized weight to obtain a channel autocorrelation weight matrix;
and performing a dot product operation on the channel autocorrelation weight matrix and the 3rd normalized weight to obtain the channel adaptive weight.
9. The foreground-background segmentation method of claim 8, wherein performing dot product operation, scaling and normalization on the 1 st normalized weight and the 2 nd normalized weight to obtain a channel autocorrelation weight matrix comprises:
calculating the channel autocorrelation weight matrix from the 1st normalized weight and the 2nd normalized weight according to the formula X_T = softmax((X_1 · X_2)/√d_2);
wherein X_T is the channel autocorrelation weight matrix, softmax() is the third activation function, X_1 is the 1st normalized weight, X_2 is the 2nd normalized weight, and d_2 is the scaling factor.
10. An electronic device, comprising:
at least one processor;
a memory having at least one program stored thereon, which when executed by the at least one processor, implements the foreground-background segmentation method of any one of claims 1-9.
11. A computer readable medium having stored thereon a computer program which, when executed by a processor, implements the foreground-background segmentation method of any one of claims 1-9.
CN202210611875.XA 2022-05-31 2022-05-31 Foreground and background segmentation method, electronic device and computer readable medium Pending CN117197178A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210611875.XA CN117197178A (en) 2022-05-31 2022-05-31 Foreground and background segmentation method, electronic device and computer readable medium
PCT/CN2023/097502 WO2023232086A1 (en) 2022-05-31 2023-05-31 Foreground and background segmentation method, electronic device and computer-readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210611875.XA CN117197178A (en) 2022-05-31 2022-05-31 Foreground and background segmentation method, electronic device and computer readable medium

Publications (1)

Publication Number Publication Date
CN117197178A (en) 2023-12-08

Family

ID=88994818

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210611875.XA Pending CN117197178A (en) 2022-05-31 2022-05-31 Foreground and background segmentation method, electronic device and computer readable medium

Country Status (2)

Country Link
CN (1) CN117197178A (en)
WO (1) WO2023232086A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020113355A1 (en) * 2018-12-03 2020-06-11 Intel Corporation A content adaptive attention model for neural network-based image and video encoders
US11270447B2 (en) * 2020-02-10 2022-03-08 Hong Kong Applied Science And Technology Institute Company Limited Method for image segmentation using CNN
CN113569881A (en) * 2020-04-28 2021-10-29 上海舜瞳科技有限公司 Self-adaptive semantic segmentation method based on chain residual error and attention mechanism
CN113570508A (en) * 2020-04-29 2021-10-29 上海耕岩智能科技有限公司 Image restoration method and device, storage medium and terminal
CN112967327A (en) * 2021-03-04 2021-06-15 国网河北省电力有限公司检修分公司 Monocular depth method based on combined self-attention mechanism

Also Published As

Publication number Publication date
WO2023232086A1 (en) 2023-12-07

Legal Events

Date Code Title Description
PB01 Publication