CN114463800A - Multi-scale feature fusion face detection and segmentation method based on generalized intersection over union - Google Patents

Multi-scale feature fusion face detection and segmentation method based on generalized intersection over union

Info

Publication number
CN114463800A
Authority
CN
China
Prior art keywords
face
loss
target
mask
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011251701.4A
Other languages
Chinese (zh)
Inventor
吕巨建
林凯瀚
赵慧民
陈荣军
熊建斌
战荫伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Polytechnic Normal University
Original Assignee
Guangdong Polytechnic Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Polytechnic Normal University filed Critical Guangdong Polytechnic Normal University
Priority to CN202011251701.4A priority Critical patent/CN114463800A/en
Publication of CN114463800A publication Critical patent/CN114463800A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06T2207/20092 Interactive image processing based on input by user
    • G06T2207/20104 Interactive definition of region of interest [ROI]
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Abstract

The invention discloses a multi-scale feature fusion face detection and segmentation method based on the generalized intersection over union, which comprises the following steps: S1: preprocess the face image to be detected, input it into a Mask R-CNN model, and extract the corresponding feature map through a pre-trained deep neural network model; S2: generate candidate regions on the feature map through a region proposal network of preset size; S3: align the pixels of the input image with the feature map using candidate region matching (RoIAlign), and obtain a corresponding fixed-size feature map; S4: finally, classify the candidate regions and locate the bounding boxes using fully connected layers, predict pixel points using a fully convolutional network, and generate the corresponding binary mask to segment the face target from the background image. The invention improves recognition precision, brings the localization of image pixel points after multi-target face detection to pixel-level accuracy, and can acquire accurate face information from complex surveillance footage.

Description

Multi-scale feature fusion face detection and segmentation method based on generalized intersection over union
Technical Field
The invention belongs to the technical field of machine vision, and particularly relates to a multi-scale feature fusion face detection and segmentation method based on the generalized intersection over union.
Background
In recent years, with the rapid development and popularization of intelligent hardware devices such as smartphones, high-performance computers and intelligent robots, artificial intelligence technology has been applied to many aspects of daily life, such as autonomous driving, electronic commerce, intelligent robots, network security and smart homes, bringing great convenience to people's work and life. Human beings are the main service objects of artificial intelligence technology, so the acquisition of face information is particularly important, and the importance of face detection as a key link in acquiring face information is self-evident.
Face detection generally comprises two processes, face identification and face localization: the face is detected and located in an image or video through image processing, machine learning and other related technologies, thereby obtaining face information. Face detection is the first step of face-related applications and also their most critical link, and its effectiveness directly influences the performance of subsequent applications. Face detection has therefore become a research hotspot in the field of artificial intelligence, with very broad application prospects. Over the course of face detection research, existing work can be divided into methods based on traditional hand-crafted features and methods based on convolutional neural networks. Face detection methods based on traditional hand-crafted features mostly rely on a sliding-window framework or on matching feature points, and have an obvious speed advantage; face detection methods based on deep learning mainly use convolutional neural networks to extract features, achieve good results in terms of accuracy and multi-target detection, and, compared with traditional machine learning algorithms, can trade a small increase in computation time for a large improvement in accuracy, so deep-learning-based face detection algorithms have become the mainstream research direction for multi-target face detection.
The prior art is as follows. In terms of traditional hand-crafted features, face detection entered a practical stage with Viola-Jones, the first real-time and effective face detection method. The Viola-Jones algorithm uses Haar features for feature expression and then performs face detection through AdaBoost and a cascade structure, achieving real-time face detection in common scenes. However, the algorithm suffers from large feature dimensionality, a low recognition rate in complex scenes, and other shortcomings. In view of these disadvantages, researchers designed more elaborate hand-crafted features, such as Histogram of Oriented Gradients (HOG) features, Scale-Invariant Feature Transform (SIFT), Speeded Up Robust Features (SURF) and Local Binary Pattern (LBP) features, and implemented face detection in combination with classifiers such as the Support Vector Machine (SVM). In addition, the Deformable Part Model (DPM) proposed by Felzenszwalb et al. in 2010 is one of the important advances of the traditional hand-crafted feature approach; it adopts a multi-component strategy and combines improved HOG features with an SVM to improve target detection under different angles and deformations, achieving important breakthroughs in tasks such as face recognition and pedestrian detection. Methods based on traditional hand-crafted features realize real-time face detection well, but have certain defects, such as rather complex manual feature design, poor detection stability, and the untargeted nature of the sliding-window strategy, so there remains considerable room for improvement.
In terms of convolutional neural networks, as early as 1994 Vaillant et al. proposed using neural networks to detect faces; this method trained two convolutional neural networks (CNNs) for face detection, where one CNN classifies whether each pixel is part of a face and the other then outputs the exact face position. Subsequently, Rowley et al. proposed a neural network for face detection that determines, through a sliding window, whether a face is contained. With the use of deep convolutional neural networks by Krizhevsky et al. and their remarkable achievements in the ImageNet competitions, deep convolutional neural networks began to be applied to the field of face detection. The literature proposes a deep neural network for fast multi-scale face detection, which consists of a proposal sub-network and a detection sub-network: in the proposal sub-network, detection is performed at multiple output layers to match objects of different scales, and these detectors of complementary scales are combined into a multi-scale detector, improving the detection of multi-scale objects. The literature also combines the advantages of filtered channel features and deep CNNs to propose a Convolutional Channel Features (CCF) method, which has lower computation and storage costs than general end-to-end CNN methods.
Face detection methods based on deep convolutional neural networks achieve good results in terms of accuracy and multi-target detection, and can trade a small amount of extra computation time for a large improvement in accuracy, so deep-learning-based face detection algorithms have become the mainstream research direction of face detection.
Existing multi-target face detection algorithms mainly realize face detection and the localization of face target boxes. The extracted face target features have large dimensionality and coarse spatial quantization, accurate localization cannot be achieved, and a certain amount of background noise remains, which is not conducive to further image processing and makes it difficult to apply some efficient and practical image processing technologies (such as face image super-resolution reconstruction and face image correction) to surveillance video. Therefore, a multimedia-oriented multi-target face detection and segmentation method is urgently needed.
However, in mainstream prior-art research, face detection mainly achieves the classification of face versus background and the localization of a face bounding box; that is, a face is detected in the image under test and located by a bounding box. Within a detected face bounding box, however, the face usually occupies only part of the box, and the excess background image inside the box introduces redundant information, so the extracted face features suffer from heavy background noise, coarse spatial quantization and large feature dimensionality. This limits the application of several practical face-related technologies (e.g., face recognition, facial expression recognition, face super-resolution reconstruction, face pose correction). Moreover, most face segmentation methods are independent of the detection method and segment the image directly, which easily causes segmentation errors and low efficiency. Therefore, a multi-scale feature fusion face detection and segmentation method based on the generalized intersection over union is proposed to solve the problems mentioned in the background art.
Disclosure of Invention
The invention aims to provide a multi-scale feature fusion face detection and segmentation method based on the generalized intersection over union, so as to solve the problems raised in the background art.
In order to achieve this purpose, the invention provides the following technical scheme: a multi-scale feature fusion face detection and segmentation method based on the generalized intersection over union, comprising the following steps:
S1: preprocess the face image to be detected, input it into a Mask R-CNN model, and extract the corresponding feature map through a pre-trained deep neural network model;
S2: generate candidate regions on the feature map through a region proposal network of preset size;
S3: align the pixels of the input image with the feature map using candidate region matching (RoIAlign), and obtain a corresponding fixed-size feature map;
S4: finally, classify the candidate regions and locate the bounding boxes using fully connected layers, predict pixel points using a fully convolutional network, and generate the corresponding binary mask to segment the face target from the background image.
In the Mask R-CNN loss function of the Mask R-CNN model, the generalized intersection over union function replaces the traditional smooth L1 function in the bounding box regression loss, improving the detection precision for multi-target faces.
A multi-scale feature fusion strategy is adopted in the FPN: a reverse, bottom-up side connection path is added for multi-scale feature fusion, improving small-scale face detection performance.
The Mask R-CNN model completes three tasks within the same network architecture, namely detection and localization of target position information, classification of target versus background, and segmentation of the target from the background, so the loss function of this network architecture comprises three parts: localization loss, classification loss and segmentation loss.
The network global loss function is defined as follows: L = L_{cls} + L_{box} + L_{mask}, where L_{cls} is the classification loss, L_{box} is the localization loss, and L_{mask} is the segmentation loss.
Compared with the prior art, the invention has the following beneficial effects: in the method for face detection and segmentation based on the generalized intersection over union and multi-scale feature fusion, recognition precision is improved through the RoIAlign (candidate region matching) algorithm, and the localization of image pixel points after multi-target face detection reaches pixel-level accuracy, thereby meeting the precision requirement of the instance segmentation technique on pixel points.
The invention can perform instance segmentation on multi-target face images from surveillance video through an FCN (Fully Convolutional Network) algorithm, draw a binary face mask, and segment the face image from the background image, thereby reducing the interference of background noise and acquiring accurate face information from complex surveillance footage.
The invention screens prediction results through an MOB (Mask of Bounding Box) algorithm, improving recognition accuracy.
Drawings
FIG. 1 is a schematic flow chart of the detection algorithm of the present invention;
FIG. 2 is a schematic diagram of a GIoU of the present invention;
FIG. 3 is a schematic structural diagram of a ResNet101+ FPN framework according to the present invention;
FIG. 4 is a schematic structural diagram of the multi-scale feature fusion network of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1: the invention provides a multi-scale feature fusion face detection and segmentation method based on the generalized intersection over union, as shown in FIGS. 1-4, which comprises the following steps:
S1: preprocess the face image to be detected, input it into a Mask R-CNN model, and extract the corresponding feature map through a pre-trained deep neural network model;
S2: generate candidate regions on the feature map through a region proposal network of preset size;
S3: align the pixels of the input image with the feature map using candidate region matching (RoIAlign), and obtain a corresponding fixed-size feature map;
S4: finally, classify the candidate regions and locate the bounding boxes using fully connected layers, predict pixel points using a fully convolutional network, and generate the corresponding binary mask to segment the face target from the background image.
The invention relates to a multi-scale feature fusion face detection and segmentation method based on the generalized intersection over union; the specific embodiment of the overall detection method is as follows:
the network framework is expanded based on a Mask R-CNN model, the whole framework is shown in figure 1, firstly, a human face image to be detected is preprocessed and input into the model, and a corresponding characteristic diagram is extracted through a pre-trained deep neural network model; secondly, generating a candidate Region (Region of Interest, RoI) on the feature map through a Region suggestion Network (RPN) with a preset size; then, matching and corresponding the input image with the pixels of the feature map and acquiring a corresponding fixed-size feature map by using candidate Region matching (Region of Interest Align, RoIAlign); and finally, classifying the candidate area and positioning the boundary box by using a full connection layer, predicting pixel points by using a Full Convolution Network (FCN), and generating a corresponding binary mask to segment the face target from the background image.
Specifically, this document improves on the deficiencies of existing methods in multi-target face detection and small-scale face detection tasks. The Generalized Intersection over Union (GIoU) is used as the bounding box loss function to improve the detection precision of multi-target faces, and a multi-scale feature fusion strategy is adopted in the FPN network to improve small-scale face detection performance.
Mask R-CNN is one of the best current target detection and segmentation models; it completes three tasks within the same network architecture, namely detection and localization of target position information, classification of target versus background, and segmentation of the target from the background. Therefore, the overall network loss function includes three parts, namely localization loss, classification loss and segmentation loss, and is defined as follows:
L = L_{cls} + L_{box} + L_{mask}
where L_{cls} is the classification loss, L_{box} is the localization loss, and L_{mask} is the segmentation loss. Specifically, in the classification task, L_{cls} is:

L_{cls} = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*)

where i corresponds to the i-th anchor point, N_{cls} is the number of classification samples, p_i is the predicted probability that the anchor is a target, and p_i^* is the label value, with p_i^* = 1 for a positive sample and p_i^* = 0 for a negative sample. L_{cls} is the binary cross-entropy loss:

L_{cls}(p_i, p_i^*) = -\left[ p_i^* \log p_i + (1 - p_i^*) \log(1 - p_i) \right]

For the segmentation branch, if a candidate box is detected as a given class, only the cross-entropy of that class is used as the error value, and the loss values of the other classes are not counted, which avoids competition among classes. The formula is:

L_{mask} = -\frac{1}{m^2} \sum_{1 \le i,j \le m} \left[ y_{ij} \log \hat{y}_{ij}^k + (1 - y_{ij}) \log(1 - \hat{y}_{ij}^k) \right]

where y_{ij} is the label value at coordinate point (i, j) of the m × m region and \hat{y}_{ij}^k is the predicted value of the k-th class at that point.
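As a concrete illustration of the two cross-entropy terms above, the following NumPy sketch computes the anchor classification loss and the per-class mask loss; the array shapes and the epsilon clamp are assumptions added for numerical stability, not part of the original formulation.

```python
import numpy as np

EPS = 1e-7  # assumed clamp to avoid log(0); not part of the formulas above

def bce(p, p_star):
    """Binary cross-entropy L(p, p*) for predicted probabilities p and 0/1 labels p*."""
    p = np.clip(p, EPS, 1.0 - EPS)
    return -(p_star * np.log(p) + (1.0 - p_star) * np.log(1.0 - p))

def classification_loss(p, p_star):
    """L_cls = (1 / N_cls) * sum_i L(p_i, p_i*): mean BCE over the anchors."""
    return bce(np.asarray(p), np.asarray(p_star)).mean()

def mask_loss(y_hat, y, k):
    """L_mask: mean BCE over the m x m mask of the detected class k only,
    so the other classes contribute no loss and do not compete."""
    return bce(y_hat[k], np.asarray(y)).mean()  # y_hat: (num_classes, m, m); y: (m, m)
```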
For the localization loss, unlike the original Mask R-CNN model, and in order to better reflect the overlap between the predicted and ground-truth bounding boxes, including the case where the two boxes do not intersect, the Generalized Intersection over Union (GIoU) function is used here as the loss function.
Specifically, as shown in FIG. 2, GIoU is defined as follows. Let the black rectangle and the blue rectangle be the prediction box and the ground-truth box respectively, with coordinates

B^p = (x_1^p, y_1^p, x_2^p, y_2^p) and B^g = (x_1^g, y_1^g, x_2^g, y_2^g)

where x_2 > x_1 and y_2 > y_1. Their areas are respectively:

A^p = (x_2^p - x_1^p)(y_2^p - y_1^p)
A^g = (x_2^g - x_1^g)(y_2^g - y_1^g)

The rectangle B^I at their intersection is given by:

x_1^I = \max(x_1^p, x_1^g), \quad x_2^I = \min(x_2^p, x_2^g)
y_1^I = \max(y_1^p, y_1^g), \quad y_2^I = \min(y_2^p, y_2^g)

and its area is:

I = (x_2^I - x_1^I)(y_2^I - y_1^I) if x_2^I > x_1^I and y_2^I > y_1^I, and I = 0 otherwise.

Similarly, the minimum bounding box B^C of the prediction box and the ground-truth box, indicated by the dotted line, is given by:

x_1^C = \min(x_1^p, x_1^g), \quad x_2^C = \max(x_2^p, x_2^g)
y_1^C = \min(y_1^p, y_1^g), \quad y_2^C = \max(y_2^p, y_2^g)

with area:

A^C = (x_2^C - x_1^C)(y_2^C - y_1^C)

From the definition of the IoU function, the IoU value of the prediction box and the ground-truth box is:

IoU = I / U, where U = A^p + A^g - I.

By considering both the overlap between the two bounding boxes and, through the minimum bounding box, the case in which the two boxes do not intersect, GIoU is defined as:

GIoU = IoU - (A^C - U) / A^C

Accordingly, when GIoU is used here as the loss function, the localization loss is set to:

L_{box} = L_{GIoU} = 1 - GIoU
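A direct NumPy transcription of this derivation is sketched below; boxes are assumed to be given as (x1, y1, x2, y2) with x2 > x1 and y2 > y1, matching the coordinate convention above.

```python
import numpy as np

def giou_loss(box_p, box_g):
    """L_box = 1 - GIoU for boxes given as (x1, y1, x2, y2)."""
    # Areas A^p and A^g of the prediction box and the ground-truth box.
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_g = (box_g[2] - box_g[0]) * (box_g[3] - box_g[1])

    # Intersection rectangle B^I; its area is zero when the boxes are disjoint.
    x1_i, y1_i = max(box_p[0], box_g[0]), max(box_p[1], box_g[1])
    x2_i, y2_i = min(box_p[2], box_g[2]), min(box_p[3], box_g[3])
    inter = max(0.0, x2_i - x1_i) * max(0.0, y2_i - y1_i)

    # Minimum bounding box B^C of the two boxes.
    x1_c, y1_c = min(box_p[0], box_g[0]), min(box_p[1], box_g[1])
    x2_c, y2_c = max(box_p[2], box_g[2]), max(box_p[3], box_g[3])
    area_c = (x2_c - x1_c) * (y2_c - y1_c)

    union = area_p + area_g - inter              # U = A^p + A^g - I
    giou = inter / union - (area_c - union) / area_c
    return 1.0 - giou

# Disjoint boxes: IoU is 0, but GIoU still reflects how far apart they are.
print(giou_loss(np.array([0.0, 0.0, 2.0, 2.0]), np.array([3.0, 3.0, 5.0, 5.0])))  # 1.68
```

On two disjoint boxes the sketch returns a loss greater than 1, whereas a plain IoU loss saturates for every disjoint pair; this is the property that motivates using GIoU for the localization loss.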
in the Mask R-CNN model, ResNet101+ FPN frame is shown in FIG. 3, and assuming that the input image size is 224 × 224, feature maps C1, C2, C3, C4 and C5 can be obtained through Conv1, Conv2_ x, Conv3_ x, Conv4_ x and Conv5_ x of ResNet 101.
Due to the large size of C1, a large amount of computational power is involved. Therefore, the FPN adopts the characteristic diagrams of C2, C3, C4 and C5. First, C5 is reduced in dimensionality by 1 × 1 convolution to obtain P5, then P5 is upsampled to the same size as C4 and added to the results of 1 × 1 convolution with C4 to obtain P4.
Similarly, P4, P3, and P2 are available from top to bottom, and finally a 3 × 3 convolution operation is performed after P2, P3, P4, and P5 to reduce aliasing effects of upsampling.
Furthermore, the maximum pooling with step size 2 at P5 resulted in P6. Through the above operations, 5 feature maps P2, P3, P4, P5 and P6 fused with feature information of different levels are finally obtained, and are input into the RPN network for the next operation.
It can be seen that the FPN network has a top-down structure with lateral connections, so that shallow features also carry deep feature information, which improves the richness of the network's feature information and its multi-target detection accuracy.
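The top-down construction of P2-P6 just described can be sketched in PyTorch as follows. The channel widths (256 outputs; C2-C5 with 256-2048 channels) follow common ResNet-101 + FPN conventions and are assumptions here, not values fixed by this description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownFPN(nn.Module):
    """Sketch of the P2-P6 construction described above; the channel sizes
    assume a standard ResNet-101 backbone (C2..C5 with 256..2048 channels)."""

    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1x1 lateral convolutions reduce C2..C5 to a common channel width.
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        # 3x3 convolutions after fusion reduce the aliasing effects of upsampling.
        self.smooth = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, 3, padding=1) for _ in in_channels)

    def forward(self, c2, c3, c4, c5):
        p5 = self.lateral[3](c5)
        # Top-down: upsample to the next size and add the lateral 1x1 result.
        p4 = self.lateral[2](c4) + F.interpolate(p5, size=c4.shape[-2:])
        p3 = self.lateral[1](c3) + F.interpolate(p4, size=c3.shape[-2:])
        p2 = self.lateral[0](c2) + F.interpolate(p3, size=c2.shape[-2:])
        p2, p3, p4, p5 = (self.smooth[i](p) for i, p in enumerate((p2, p3, p4, p5)))
        p6 = F.max_pool2d(p5, kernel_size=1, stride=2)  # P6: stride-2 pooling of P5
        return p2, p3, p4, p5, p6
```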
However, the FPN network still has certain disadvantages. First, because the lateral connections of the FPN network only have a top-down path, the output feature maps mainly contain feature information of the current layer and deeper layers, and feature information at different scales is not fully utilized.
Second, shallow features carry accurate global position information, but ResNet-101 has 101 layers, so the transmission distance between shallow and deep feature information is long, and valuable information may be lost along the way. Since large-scale targets are mainly located in the shallow feature maps, these disadvantages have little influence on large-scale targets. For small-scale face targets, however, ignoring feature information at different scales directly harms detection accuracy.
To address these shortcomings of the FPN, the invention adopts a multi-scale feature fusion strategy to improve the detection of small-scale face targets. Specifically, a bottom-up feature fusion path is added to the FPN network, which shortens the transmission distance between deep and shallow features, realizes their fusion, and improves the detection precision of small-scale face targets.
The multi-scale feature fusion network is shown in FIG. 4, in which C2-C5 and P2-P5, represented by blue rectangles, denote the same FPN network as before, and the bottom-up feature fusion path is shown as N2-N5. First, N2 is copied directly from P2 and passed through a 3 × 3 convolution with stride 2 to obtain a feature map of the same size as P3; this feature map is added to P3, and the new feature map is passed through a 3 × 3 convolution with stride 1 to obtain N3. Similarly, N4 and N5 are obtained from bottom to top, where all feature maps use 256 channels. Through this operation, the newly generated feature maps N2-N5 fuse shallow and deep feature information and can better handle the small-scale face detection task.
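Under the same assumptions as the previous sketch (256-channel feature maps throughout), the added bottom-up path can be written as:

```python
import torch.nn as nn

class BottomUpFusion(nn.Module):
    """Sketch of the bottom-up N2-N5 path described above; all feature maps
    are assumed to use 256 channels, as stated in the text."""

    def __init__(self, channels=256):
        super().__init__()
        # Stride-2 3x3 convolutions downsample N_i to the size of P_{i+1}.
        self.down = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, stride=2, padding=1) for _ in range(3))
        # Stride-1 3x3 convolutions produce N3-N5 after the element-wise addition.
        self.fuse = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, stride=1, padding=1) for _ in range(3))

    def forward(self, p2, p3, p4, p5):
        n2 = p2  # N2 is copied directly from P2
        n3 = self.fuse[0](self.down[0](n2) + p3)
        n4 = self.fuse[1](self.down[1](n3) + p4)
        n5 = self.fuse[2](self.down[2](n4) + p5)
        return n2, n3, n4, n5
```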
The disadvantages of conventional hand-crafted feature methods are:
1. relatively low detection precision, prone to false detections and inaccurate target box positions;
2. small targets are prone to missed and false detections;
3. only target box localization and classification are performed on multi-target faces; the face image is not segmented from the background image;
4. no fine segmentation is achieved, leaving background noise interference.
The above disadvantages arise for the following reasons:
1. the adopted models are relatively simple, so overfitting easily occurs;
2. no deep extraction of face features is performed, and prediction results are not screened;
3. no instance segmentation technique is applied to segment the face from the background image; only a single detection task is performed.
Disadvantages of existing deep learning methods (R-CNN, Fast R-CNN):
1. slow recognition speed;
2. low detection precision for face images in unusual poses;
3. only target box localization and classification are performed on multi-target faces; the face image is not segmented from the background image;
4. no fine segmentation is achieved, leaving background noise interference.
The above disadvantages arise for the following reasons:
1. with a deep learning network architecture, feature extraction is performed for every generated candidate box, which increases computation time;
2. RoI Pooling (Region of Interest Pooling) performs two rounding operations, which introduces quantization error and lowers the pixel-level localization accuracy of the image (see the sketch after this list);
3. prediction results are not screened, which increases the false detection rate;
4. no image segmentation algorithm is used to segment the face from the background image; only a single detection task is performed.
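The quantization error in point 2 can be illustrated numerically. The following sketch contrasts the coordinate rounding of RoI Pooling with the exact floating-point positions kept by RoIAlign; the numbers are a made-up example, not values from the invention.

```python
# Made-up example: an RoI on a 224x224 image mapped onto a feature map with
# stride 16 and then split into 2 bins along x.
roi_x1, roi_x2 = 17.0, 150.0
stride, bins = 16, 2

# RoI Pooling: two rounding steps accumulate quantization error.
fx1, fx2 = int(roi_x1 / stride), int(roi_x2 / stride)  # 1st rounding: 1, 9
bin_w = (fx2 - fx1) // bins                            # 2nd rounding: bin width 4

# RoIAlign: coordinates stay floating-point; feature values at non-integer
# positions are read off by bilinear interpolation (interpolation not shown).
gx1, gx2 = roi_x1 / stride, roi_x2 / stride            # 1.0625, 9.375
exact_bin_w = (gx2 - gx1) / bins                       # 4.15625, no rounding

print(fx1, fx2, bin_w)        # 1 9 4                (quantized twice)
print(gx1, gx2, exact_bin_w)  # 1.0625 9.375 4.15625 (exact)
```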
The invention introduces instance segmentation on the basis of conventional surveillance-video face detection and uses a fully convolutional network to segment the face image from the background image; the application of instance segmentation to multi-target face detection on surveillance video falls within the protection scope of the invention.
The generalized intersection over union function is adopted to replace the traditional smooth L1 function in the bounding box regression loss, improving the detection precision of multi-target faces; the application of the generalized intersection over union function to multi-target face detection and segmentation with better detection and segmentation effects falls within the protection scope of the invention.
The invention adopts a multi-scale feature fusion strategy in the FPN network, adding a reverse, bottom-up side connection path for multi-scale feature fusion and improving small-scale face detection performance. The application of the MOB algorithm to multi-target face detection on surveillance video falls within the protection scope of the invention.
To sum up, compared with the prior art:
according to the method, the identification precision is improved through a ROIAlign (candidate region matching) algorithm, and the positioning precision of the image pixel points after the multi-target face detection reaches the pixel level, so that the requirement of an example segmentation technology on the precision of the pixel points is met.
The invention can perform instance segmentation on multi-target face images from surveillance video through an FCN (Fully Convolutional Network) algorithm, draw a binary face mask, and segment the face image from the background image, thereby reducing the interference of background noise and acquiring accurate face information from complex surveillance footage.
The invention screens prediction results through an MOB (Mask of Bounding Box) algorithm, improving recognition accuracy.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the embodiments or portions thereof without departing from the spirit and scope of the invention.

Claims (5)

1. A multi-scale feature fusion face detection and segmentation method based on the generalized intersection over union, characterized in that the method comprises the following steps:
S1: preprocess the face image to be detected, input it into a Mask R-CNN model, and extract the corresponding feature map through a pre-trained deep neural network model;
S2: generate candidate regions on the feature map through a region proposal network of preset size;
S3: align the pixels of the input image with the feature map using candidate region matching (RoIAlign), and obtain a corresponding fixed-size feature map;
S4: finally, classify the candidate regions and locate the bounding boxes using fully connected layers, predict pixel points using a fully convolutional network, and generate the corresponding binary mask to segment the face target from the background image.
2. The multi-scale feature fusion face detection and segmentation method based on the generalized intersection over union according to claim 1, characterized in that: in the Mask R-CNN loss function of the Mask R-CNN model, the generalized intersection over union function replaces the traditional smooth L1 function in the bounding box regression loss, improving the detection precision for multi-target faces.
3. The multi-scale feature fusion face detection and segmentation method based on the generalized intersection over union according to claim 1, characterized in that: a multi-scale feature fusion strategy is adopted in the FPN, in which a reverse, bottom-up side connection path is added for multi-scale feature fusion, improving small-scale face detection performance.
4. The multi-scale feature fusion face detection and segmentation method based on the generalized intersection over union according to claim 1, characterized in that: the Mask R-CNN model completes three tasks within the same network architecture, namely detection and localization of target position information, classification of target versus background, and segmentation of the target from the background, so the loss function of this network architecture comprises three parts: localization loss, classification loss and segmentation loss.
5. The multi-scale feature fusion face detection and segmentation method based on the generalized intersection over union according to claim 4, characterized in that: the network global loss function is defined as L = L_{cls} + L_{box} + L_{mask}, where L_{cls} is the classification loss, L_{box} is the localization loss, and L_{mask} is the segmentation loss.
CN202011251701.4A 2020-11-10 2020-11-10 Multi-scale feature fusion face detection and segmentation method based on generalized intersection over union Pending CN114463800A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011251701.4A CN114463800A (en) 2020-11-10 Multi-scale feature fusion face detection and segmentation method based on generalized intersection over union

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011251701.4A CN114463800A (en) 2020-11-10 Multi-scale feature fusion face detection and segmentation method based on generalized intersection over union

Publications (1)

Publication Number Publication Date
CN114463800A 2022-05-10

Family

ID=81403948

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011251701.4A CN114463800A (en) 2020-11-10 Multi-scale feature fusion face detection and segmentation method based on generalized intersection over union

Country Status (1)

Country Link
CN (1) CN114463800A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114973386A (en) * 2022-08-01 2022-08-30 成都市威虎科技有限公司 Construction site scene face target detection method for deeply mining mixed features
CN114973386B (en) * 2022-08-01 2022-11-04 成都市威虎科技有限公司 Construction site scene face target detection method for deeply mining mixed features
CN116311482A (en) * 2023-05-23 2023-06-23 中国科学技术大学 Face fake detection method, system, equipment and storage medium
CN116311482B (en) * 2023-05-23 2023-08-29 中国科学技术大学 Face fake detection method, system, equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination