CN114299138A - Human body target detection method and system in conference polling based on dynamic and static detection combination - Google Patents


Info

Publication number
CN114299138A
Authority
CN
China
Prior art keywords
target
conference
human body
static
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111004697.6A
Other languages
Chinese (zh)
Inventor
孙丽丽
刘鸿雁
陈思颖
张延童
王朔
王雨晨
何子亨
车四四
刘方舟
陈廷森
李宗皓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Shandong University
Information and Telecommunication Branch of State Grid Shandong Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
Shandong University
Information and Telecommunication Branch of State Grid Shandong Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, Shandong University, Information and Telecommunication Branch of State Grid Shandong Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN202111004697.6A priority Critical patent/CN114299138A/en
Publication of CN114299138A publication Critical patent/CN114299138A/en
Pending legal-status Critical Current

Landscapes

  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a human body target detection method and system in conference polling based on dynamic and static detection combination, comprising: acquiring conference polling video data in real time; detecting moving human body targets by an inter-frame difference method based on the current frame image and the previous frame image, and acquiring the coordinate information and confidence of every region containing a moving target; detecting static human body targets from the positional relationships among the conference table cards, the table board and the human bodies, through mean filtering, binarization, erosion and midpoint positioning, to obtain the coordinate information and confidence of every static target region; and screening with a non-maximum suppression algorithm, based on the obtained coordinate information and confidence of the moving and static target regions, to obtain the final human body target positions.

Description

Human body target detection method and system in conference polling based on dynamic and static detection combination
Technical Field
The disclosure belongs to the technical field of computer vision, and particularly relates to a method and a system for detecting a human body target in conference polling based on combination of dynamic and static detection.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Video monitoring uses computer vision and image processing methods to analyze the image sequences captured by a camera automatically, without human intervention. Human body target detection based on video image recognition is now widely applied in computer vision scenarios such as video conferences and student classrooms. Video image recognition improves the recognition capability of monitoring and reduces the workload of monitoring personnel. In recent years, with the rapid development of video image analysis technology, video image analysis has gradually been integrated into people's daily lives in various forms, and increasing attention is paid to analyzing specific people through acquired features.
A common way to detect the position of a human body target in a video image is moving target detection, that is, detecting the regions that change between images in an image sequence and extracting the moving target from the background image. The inventors found that, for moving target detection under a fixed background, the common methods are the inter-frame difference method and the background difference method. The inter-frame difference method adapts well to dynamic environments, but a suitable threshold is difficult to select when extracting the foreground. The background difference method can obtain a complete image of the moving object, but the background must be updated in a timely manner. For example, patent CN 201910548272.8, a moving region foreground image algorithm based on the combination of a background difference method and an inter-frame difference method, extracts the foreground image with both methods. Although this effectively removes interference information from the image, the algorithm has a limitation: in scenes where people move only slightly, such as a conference scene, it cannot detect all targets in the scene.
Disclosure of Invention
The scheme exploits the fact that conference table cards have distinctive features and are easy to detect. For conference scenes in which the human body targets move only slightly and stay at fixed positions, and assuming that no participant is absent, moving human body targets are first detected by the inter-frame difference method, using the difference between consecutive frames of the image. Static human body targets are then detected from the positional relationship between the conference table cards and the human bodies, through binarization, mean filtering, erosion and similar operations. This effectively avoids missed detections caused by small motion amplitude of the human body, improves detection accuracy, and provides technical support for human body target detection in conference scenes where the human body targets are relatively fixed.
According to a first aspect of the embodiments of the present disclosure, there is provided a human body target detection method in conference polling based on dynamic and static detection combination, including:
acquiring conference polling video data in real time;
detecting moving human body targets by an inter-frame difference method based on the current frame image and the previous frame image, and acquiring the coordinate information and confidence of every region containing a moving target; detecting static human body targets from the positional relationships among the conference table cards, the table board and the human bodies, through mean filtering, binarization, erosion and midpoint positioning, to obtain the coordinate information and confidence of every static target region;
and screening by using a non-maximum suppression algorithm based on the obtained coordinate information and confidence of the moving and static target areas to obtain the final human body target position.
Further, detecting the moving human body targets by the inter-frame difference method specifically comprises: subtracting the grayscale image of the previous frame from the grayscale image of the current frame, and binarizing the result based on a first preset threshold to obtain a binary image of the moving target; traversing the whole binary image with pixel blocks of a preset size to obtain a plurality of regions of the preset size, the position of each region being represented by its row and column values; summing and normalizing the pixel values in each region to obtain a confidence value for that region; and comparing the confidence value of each region with a third preset threshold to determine the regions containing a moving target.
Further, obtaining the coordinate information of all static target regions specifically comprises: acquiring a binary image of the current frame image, and determining the conference desktop area based on the characteristic that the conference table board has a large shadow; performing mean filtering and binarization on the conference desktop area to obtain a binary image of the conference desktop area; and performing an erosion operation on the binary image, determining the coordinates of the center point of each table card in the conference desktop area based on the eroded binary image, and obtaining the position coordinates of the region where each human body target is located based on the positional relationship between the conference table card and the human body target.
Further, obtaining the confidence of all static target regions specifically comprises: binarizing the image of the region where the human body target is located, summing the pixel values in the region, and normalizing the sum to obtain the confidence value of the region.
Further, the current frame image is binarized to obtain a binary image of the conference scene, and the edge position of the table board is determined based on the characteristic that the conference table board has a large shadow; the conference desktop area is determined based on the table board edge position; mean filtering is performed on the conference desktop area, and the mean-filtered image of the conference desktop area is binarized to obtain a binary image of the conference desktop area.
According to a second aspect of the embodiments of the present disclosure, there is provided a human body target detection system in conference polling based on dynamic and static detection combination, including:
acquiring conference polling video data in real time;
detecting moving human body targets by an inter-frame difference method based on the current frame image and the previous frame image, and acquiring the coordinate information and confidence of every region containing a moving target; detecting static human body targets from the positional relationships among the conference table cards, the table board and the human bodies, through mean filtering, binarization, erosion and midpoint positioning, to obtain the coordinate information and confidence of every static target region;
and screening by using a non-maximum suppression algorithm based on the obtained coordinate information and confidence of the moving and static target areas to obtain the final human body target position.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic device, including a memory, a processor, and a computer program stored in the memory and run on the processor, where the processor, when executing the program, implements the above method for detecting a human body target in conference polling based on dynamic and static detection combination.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for detecting human targets in conference polling based on combination of dynamic and static detection.
Compared with the prior art, the beneficial effect of this disclosure is:
The scheme combines moving human body target detection by the inter-frame difference method with static human body target detection by binarization and related operations. Human body targets with larger motion amplitude are detected by the inter-frame difference method, and the static human body detection method supplements them, so that missed detections caused by small motion amplitude are avoided and detection accuracy is improved.
Advantages of additional aspects of the disclosure will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the disclosure.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and are not to limit the disclosure.
Fig. 1 is a flowchart of a human body target detection method in conference polling based on dynamic and static detection combination in the first embodiment of the disclosure.
Detailed Description
The present disclosure is further described with reference to the following drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.
The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
The first embodiment is as follows:
This embodiment aims to provide a human body target detection method in conference polling based on dynamic and static detection combination.
A human body target detection method in conference polling based on dynamic and static detection combination comprises the following steps:
acquiring conference polling video data in real time;
detecting moving human body targets by an inter-frame difference method based on the current frame image and the previous frame image, and acquiring the coordinate information and confidence of every region containing a moving target; detecting static human body targets from the positional relationships among the conference table cards, the table board and the human bodies, through mean filtering, binarization, erosion and midpoint positioning, to obtain the coordinate information and confidence of every static target region;
and screening by using a non-maximum suppression algorithm based on the obtained coordinate information and confidence of the moving and static target areas to obtain the final human body target position.
Further, detecting the moving human body targets by the inter-frame difference method specifically comprises: subtracting the grayscale image of the previous frame from the grayscale image of the current frame, and binarizing the result based on a first preset threshold to obtain a binary image of the moving target; traversing the whole binary image with pixel blocks of a preset size to obtain a plurality of regions of the preset size, the position of each region being represented by its row and column values; summing and normalizing the pixel values in each region to obtain a confidence value for that region; and comparing the confidence value of each region with a third preset threshold to determine the regions containing a moving target.
Further, detecting the moving human body targets by the inter-frame difference method further comprises: calculating the sum of the pixel values of all pixels in the binary image; if the sum is greater than a second preset threshold, the conference polling video has switched scenes, and no moving human body target detection is performed between the current frame and the previous frame.
Further, obtaining the coordinate information of all static target regions specifically comprises: acquiring a binary image of the current frame image, and determining the conference desktop area based on the characteristic that the conference table board has a large shadow; performing mean filtering and binarization on the conference desktop area to obtain a binary image of the conference desktop area; and performing an erosion operation on the binary image, determining the coordinates of the center point of each table card in the conference desktop area based on the eroded binary image, and obtaining the position coordinates of the region where each human body target is located based on the positional relationship between the conference table card and the human body target.
Further, obtaining the confidence of all static target regions specifically comprises: binarizing the image of the region where the human body target is located, summing the pixel values in the region, and normalizing the sum to obtain the confidence value of the region.
Further, the current frame image is binarized to obtain a binary image of the conference scene, and the edge position of the table board is determined based on the characteristic that the conference table board has a large shadow; the conference desktop area is determined based on the table board edge position; mean filtering is performed on the conference desktop area, and the mean-filtered image of the conference desktop area is binarized to obtain a binary image of the conference desktop area.
Further, screening by the non-maximum suppression algorithm specifically comprises:
(1) obtaining the coordinate information and confidence of a plurality of human body target regions based on the results of the moving human body target detection and the static human body target detection;
(2) sorting the target regions by confidence in descending order to obtain an ordered matrix list;
(3) calculating the IOU (Intersection over Union) between the target region with the highest confidence and every other target region, and deleting each target region whose IOU is greater than a fourth preset threshold; storing the target region with the highest confidence and deleting it from the ordered matrix list;
(4) repeating step (3) until the ordered matrix list has been fully traversed.
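Steps (1) to (4) above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `iou` and `non_max_suppression`, the region layout `[x, y, h, w, confidence]` with `(x, y)` as the upper-left corner in (row, column) order, and the parameter name `t4` for the fourth preset threshold are all chosen for the example.

```python
from typing import List, Sequence

# Assumed region layout: [x, y, h, w, confidence], (x, y) = upper-left (row, column).
Box = Sequence[float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned regions."""
    ax, ay, ah, aw = a[:4]
    bx, by, bh, bw = b[:4]
    inter_h = max(0.0, min(ax + ah, bx + bh) - max(ax, bx))
    inter_w = max(0.0, min(ay + aw, by + bw) - max(ay, by))
    inter = inter_h * inter_w
    union = ah * aw + bh * bw - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(regions: List[Box], t4: float) -> List[Box]:
    """Keep the most confident region, drop regions overlapping it by more than t4, repeat."""
    # Step (2): sort by confidence, highest first.
    ordered = sorted(regions, key=lambda r: r[4], reverse=True)
    kept = []
    while ordered:  # step (4): repeat until the list is exhausted
        best = ordered.pop(0)  # step (3): store the top region and remove it from the list
        kept.append(best)
        ordered = [r for r in ordered if iou(best, r) <= t4]
    return kept
```

With `t4 = 0.5`, two heavily overlapping regions collapse to the more confident one, while a distant region survives.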
Specifically, for ease of understanding, the embodiments of the present disclosure are described in detail below with reference to the accompanying drawings:
As shown in Fig. 1, the present disclosure provides a human body target detection method in conference polling based on the combination of dynamic and static detection. The method exploits the fact that conference table cards have distinctive features and are easy to detect. For conference scenes in which the human body targets move only slightly and stay at fixed positions, and assuming that no participant is absent, moving human body targets are detected by the inter-frame difference method using the difference between consecutive frames; static human body targets are detected from the positional relationship between the conference table cards and the human bodies through binarization, mean filtering, erosion and midpoint positioning; finally, the human body targets are obtained by screening with a non-maximum suppression algorithm. The method specifically comprises the following steps:
Step (1): acquiring the current frame and the previous frame of the conference polling video;
wherein step (1) specifically comprises: acquiring the grayscale images of the current frame t and the previous frame of the conference polling video to obtain two images I_t and I_{t-1}, each with H rows and W columns (size W × H); the pixel values of corresponding pixels in I_{t-1} and I_t are I_{t-1}(x, y) and I_t(x, y) respectively, where (x, y) denotes the pixel coordinates and (x, y) ∈ {0, 1, ..., H-1} × {0, 1, ..., W-1}, so that x indexes rows and y indexes columns;
Step (2): detecting moving human body targets by the inter-frame difference method;
wherein step (2) specifically comprises the following steps:
a) selecting a suitable threshold T_1 (i.e., the first preset threshold), subtracting the two adjacent grayscale frames I_{t-1} and I_t, and binarizing the result to obtain the binary image D_t of the moving target, whose pixel values D_t(x, y) are expressed as
$$D_t(x, y) = \begin{cases} 1, & |I_t(x, y) - I_{t-1}(x, y)| > T_1 \\ 0, & \text{otherwise} \end{cases}$$
b) Moving target detection by the inter-frame difference method is not needed across different conference scenes. Because different conference scenes differ greatly, a suitable threshold T_2 (i.e., the second preset threshold) is selected; if the sum of the pixel values of all pixels in the binary image D_t of the moving target is greater than T_2, the conference polling video has switched to the next conference scene. The scene change state V is defined as
$$V = \begin{cases} 1, & \sum_{x=0}^{H-1} \sum_{y=0}^{W-1} D_t(x, y) \le T_2 \\ 0, & \text{otherwise} \end{cases}$$
When V = 1, the two frames show the same conference scene, and step c) is executed next;
when V = 0, the two frames show different conference scenes, and steps c) and d) are not executed;
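Steps a) and b) can be illustrated with a small pure-Python sketch. The function names `frame_difference` and `same_scene`, the nested-list image representation, and the choice of a strict comparison against `t1` are assumptions made for the example; the text only states that the difference image is binarized with T_1 and that a pixel sum greater than T_2 indicates a scene switch.

```python
from typing import List

Image = List[List[int]]  # H rows x W columns of grey levels (assumed representation)


def frame_difference(curr: Image, prev: Image, t1: int) -> Image:
    """Binary motion image D_t: 1 where |I_t - I_{t-1}| exceeds t1, else 0."""
    return [
        [1 if abs(c - p) > t1 else 0 for c, p in zip(curr_row, prev_row)]
        for curr_row, prev_row in zip(curr, prev)
    ]


def same_scene(d_t: Image, t2: int) -> int:
    """Scene change state V: 0 when the number of changed pixels exceeds t2, else 1."""
    changed = sum(sum(row) for row in d_t)
    return 0 if changed > t2 else 1
```

When `same_scene` returns 0, the block scan of steps c) and d) would simply be skipped for that frame pair.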
c) defining a pixel block of h rows and w columns, and scanning the binary image D_t of the moving target from left to right and from top to bottom in steps of Δw and Δh pixels respectively, where Δh and Δw are chosen so that H - h is divisible by Δh and W - w is divisible by Δw; defining
$$M = \frac{H - h}{\Delta h} + 1, \qquad N = \frac{W - w}{\Delta w} + 1$$
The scan yields M × N regions of size w × h; d_{mn} denotes the region in the m-th row and n-th column, and d_{mn} is uniquely determined by its upper-left corner coordinates (x_m, y_n), height h and width w, where
$$x_m = m\,\Delta h, \qquad y_n = n\,\Delta w$$
d) summing and normalizing the pixel values of all pixels in region d_{mn} to obtain the confidence λ_{mn}, expressed as
$$\lambda_{mn} = \frac{1}{h\,w} \sum_{x = x_m}^{x_m + h - 1} \; \sum_{y = y_n}^{y_n + w - 1} D_t(x, y)$$
The coordinate information and confidence of all regions form a matrix Ψ of size M × N × 5, where Ψ_{mn} denotes the entry in the m-th row and n-th column of Ψ, Ψ_{mn} = [x_m, y_n, h, w, λ_{mn}], m = 0, 1, ..., M-1, n = 0, 1, ..., N-1. A suitable threshold T_3 (i.e., the third preset threshold) is selected to judge whether a moving target exists in each region. The coordinate information and confidence of all regions containing a moving target form a matrix Γ^{(1)}, in which
$$\Gamma^{(1)}_k$$
denotes the k-th row of Γ^{(1)}; k is initialized to 0, and the following operations are performed:
comparing the confidence λ_{mn} of region d_{mn} with the threshold T_3; if λ_{mn} ≥ T_3, let
$$\Gamma^{(1)}_k = \Psi_{mn}$$
k = k + 1;
the operation is repeated until the confidence of every region has been compared with the threshold; letting K = k, K is the number of human body targets determined by moving human body target detection;
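The block scan of steps c) and d) can be sketched as follows. The names `block_confidences` and `moving_regions` are hypothetical, and normalizing by the block area h·w is an assumption: the text says only that the pixel sum is normalized.

```python
from typing import List


def block_confidences(d_t: List[List[int]], h: int, w: int, dh: int, dw: int) -> List[list]:
    """Return rows [x_m, y_n, h, w, lambda_mn] for every h x w block, scanned row by row."""
    H, W = len(d_t), len(d_t[0])
    if (H - h) % dh or (W - w) % dw:
        raise ValueError("dh must divide H - h and dw must divide W - w")
    psi = []
    for x in range(0, H - h + 1, dh):        # x_m = m * dh
        for y in range(0, W - w + 1, dw):    # y_n = n * dw
            total = sum(d_t[i][j] for i in range(x, x + h) for j in range(y, y + w))
            psi.append([x, y, h, w, total / (h * w)])  # assumed normalization by block area
    return psi


def moving_regions(psi: List[list], t3: float) -> List[list]:
    """Keep the blocks whose confidence reaches the third threshold (lambda >= t3)."""
    return [row for row in psi if row[4] >= t3]
```

The length of the list returned by `moving_regions` plays the role of K in the text.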
Step (3): detecting static human body targets through mean filtering, binarization, erosion and midpoint positioning;
wherein step (3) specifically comprises the following steps:
a) binarizing the current frame image I_t to obtain the binary image of the conference scene
Figure RE-GDA0003523327540000083
whose corresponding pixel values are
Figure RE-GDA0003523327540000084
Using the characteristic that the conference table board has a large shadow,
Figure RE-GDA0003523327540000085
is scanned and summed row by row; if the sums of γ consecutive rows are all 0, the area is a shadow area, which gives the edge position of the table board. The edge position of the table board is defined as
Figure RE-GDA0003523327540000086
which satisfies
Figure RE-GDA0003523327540000087
The conference desktop area Z_t is determined from the edge position of the conference table board. Defining the height of the table board as h_Z, Z_t is uniquely determined by its upper-left corner coordinates
Figure RE-GDA0003523327540000088
height h_Z and width W. The pixel values of the conference desktop area Z_t are Z_t(u, v), (u, v) ∈ Υ, Υ = {0, 1, ..., h_Z-1} × {0, 1, ..., W-1};
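The row-sum test in step a) can be sketched as below. The function name `find_shadow_band` is hypothetical, and because the text does not say whether the table board edge is the first or the last row of the shadow band, this sketch returns both and leaves the choice to the caller.

```python
from typing import List, Optional, Tuple


def find_shadow_band(binary: List[List[int]], gamma: int) -> Optional[Tuple[int, int]]:
    """Return (first_row, last_row) of the first run of at least gamma all-zero rows."""
    run_start = None
    for x, row in enumerate(binary):
        if sum(row) == 0:
            if run_start is None:
                run_start = x  # a new run of zero-sum rows begins here
        else:
            if run_start is not None and x - run_start >= gamma:
                return run_start, x - 1
            run_start = None
    # The run may extend to the bottom of the image.
    if run_start is not None and len(binary) - run_start >= gamma:
        return run_start, len(binary) - 1
    return None
```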
b) Determining the coordinates of the center points of the conference table cards;
1) defining a filter A_1 of size 3 × 3 and performing mean filtering on the conference desktop area; the coefficient of the filter A_1 at any point is A_1(r, s), (r, s) ∈ Θ, Θ = {-1, 0, 1} × {-1, 0, 1}, so that point (0, 0) is the center of the filter;
the filter A_1 is moved point by point over the conference desktop area Z_t from left to right and from top to bottom, in the same way as the scan in step c) of step (2) with scanning steps Δh = 1 and Δw = 1; when the filter center (0, 0) coincides with pixel (u, v) of Z_t, each filter coefficient is multiplied by the corresponding pixel and the products are summed, and the result is taken as the output value G_t(u, v) of pixel (u, v), expressed as:
$$G_t(u, v) = \sum_{(r, s) \in \Theta} A_1(r, s)\, Z_t(u + r, v + s)$$
when the center point (0, 0) of the filter lies on the boundary of the image, part of the filter falls outside the image, and the missing pixels are filled with 0.
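A minimal sketch of the 3 × 3 mean filter with zero filling at the borders is shown below. The function name `mean_filter_3x3` is hypothetical, and the sketch assumes that every coefficient A_1(r, s) equals 1/9, which is the usual definition of a mean filter but is not stated explicitly in the text.

```python
from typing import List


def mean_filter_3x3(z: List[List[float]]) -> List[List[float]]:
    """3 x 3 mean filter; pixels outside the image are treated as 0."""
    rows, cols = len(z), len(z[0])
    out = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0
            for r in (-1, 0, 1):
                for s in (-1, 0, 1):
                    uu, vv = u + r, v + s
                    if 0 <= uu < rows and 0 <= vv < cols:
                        total += z[uu][vv]  # out-of-range neighbours contribute 0
            out[u][v] = total / 9  # assumed uniform coefficients of 1/9
    return out
```

Zero filling darkens the border: on a constant image, corner pixels keep only 4/9 of the original value and edge pixels 6/9.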
2) The mean-filtered image G_t is binarized to obtain the binary image of the conference desktop area
Figure RE-GDA0003523327540000092
An erosion operation is applied to
Figure RE-GDA0003523327540000093
to obtain the eroded image
Figure RE-GDA0003523327540000094
The image erosion operation is as follows:
erosion is a morphological operation. In mathematical morphology, a set represents an object in an image; the morphological description of the binary image
Figure RE-GDA0003523327540000095
is the set of all white pixels in the image. Define the point z = (u, v) and the set B, expressed as
Figure RE-GDA0003523327540000096
Define the complement of B as B^c, expressed as
$$B^c = \{ z \mid z \notin B \}$$
Define an image U of size 3 × 3 whose pixel values are U(r, s), (r, s) ∈ Θ; the morphological description of U is the set of all white pixels in the image. Define the point a = (r, s) and the set A_2, expressed as A_2 = {a = (r, s) | U(r, s) = 1, (r, s) ∈ Θ}. (A_2)_z denotes the translation of A_2 by z, expressed as (A_2)_z = {b | b = a + z, a ∈ A_2}. The erosion algorithm slides A_2, translated by z, over B in the same way as the mean filter; the erosion of B by A_2 is the set of all points z for which (A_2)_z is contained in B, expressed as
$$B \ominus A_2 = \{ z \mid (A_2)_z \subseteq B \}$$
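The set definition of erosion translates directly into code: a pixel z survives only if every offset of the structuring element, translated by z, lands on a white pixel. In this sketch the function name `erode` is hypothetical, and treating positions outside the image as background is an assumption.

```python
from typing import List


def erode(binary: List[List[int]], element: List[List[int]]) -> List[List[int]]:
    """Binary erosion with a 3 x 3 structuring element whose centre is element[1][1]."""
    rows, cols = len(binary), len(binary[0])
    # Offsets (r, s) in {-1, 0, 1} x {-1, 0, 1} where the element is white: the set A_2.
    offsets = [(r, s) for r in (-1, 0, 1) for s in (-1, 0, 1) if element[r + 1][s + 1] == 1]
    out = [[0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            # z = (u, v) is kept only if the translated element fits inside the white set B.
            fits = all(
                0 <= u + r < rows and 0 <= v + s < cols and binary[u + r][v + s] == 1
                for r, s in offsets
            )
            out[u][v] = 1 if fits else 0
    return out
```

Eroding a 3 × 3 white square with a full 3 × 3 element leaves only its centre pixel, which is why erosion removes thin noise while keeping the bulk of each table card.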
3) In the eroded binary image
Figure RE-GDA0003523327540000097
the target object has value 1 and the background has value 0.
Figure RE-GDA0003523327540000098
is scanned from left to right and summed by column; the first column whose sum is non-zero is defined as the left endpoint p_0. Scanning continues to the right from the left endpoint until the first zero column is found, and the last non-zero column found during this scan is defined as the right endpoint q_0. The center point c_0 is calculated from the left and right endpoints, expressed as
Figure RE-GDA0003523327540000099
where c_0 is the ordinate (the column coordinate y) of the center point of the first conference table card. Scanning continues to the right and the above steps are repeated until the scan of
Figure RE-GDA0003523327540000101
is finished. The ordinates c_l of the center points of all conference table cards are saved, l = 0, 1, ..., L-1, where L is the number of conference table cards; the coordinates of the center points of the conference table cards are then obtained as
Figure RE-GDA0003523327540000102
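The left-to-right column scan of step 3) can be sketched as follows. The function name `card_centres` is hypothetical, and taking the integer midpoint (p + q) // 2 is an assumption, since the formula for c_l is not reproduced in the text.

```python
from typing import List


def card_centres(eroded: List[List[int]]) -> List[int]:
    """Column index of the midpoint of every run of non-zero column sums."""
    cols = len(eroded[0])
    col_sums = [sum(row[v] for row in eroded) for v in range(cols)]
    centres = []
    v = 0
    while v < cols:
        if col_sums[v] == 0:
            v += 1
            continue
        p = v  # left endpoint: first non-zero column of this run
        while v < cols and col_sums[v] != 0:
            v += 1
        q = v - 1  # right endpoint: last non-zero column before a zero column
        centres.append((p + q) // 2)  # assumed integer midpoint
    return centres
```

The length of the returned list corresponds to L, the number of table cards found.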
c) From the proportion of the conference scene and the participants in the video picture and the positional relationship between the conference table cards and the participants, the human body target and the conference table card differ only in the vertical direction: the ordinate of the center point of the conference table card equals the ordinate of the center point of the human body target, and the abscissa is reduced by δ pixels according to the picture proportion. The region where the human body target is located is uniquely determined by its upper-left corner coordinates
Figure RE-GDA0003523327540000103
height h and width w;
after the region where the human body target is located is binarized, the pixel values of all pixels are summed and normalized to obtain the confidence ω_l of the region. The human body target regions determined by static target detection form a matrix Γ^{(2)}, in which
Figure RE-GDA0003523327540000104
denotes the l-th row of Γ^{(2)}, expressed as
Figure RE-GDA0003523327540000105
And (4): screening by adopting a non-maximum inhibition algorithm to obtain a human body target position;
different human body regions may be detected around the same target in the human body detection process, and an overlapping phenomenon exists between the regions, so that a non-overlapping region with the optimal confidence degree needs to be found by using a non-maximum suppression algorithm, and a redundant region is eliminated;
the non-maximum value suppression is abbreviated as NMS algorithm, and the idea is to search local maximum values and suppress non-maximum values; according to the confidence coefficient and the coordinate information of the region, finding the region with the optimal confidence coefficient; the method specifically comprises the following steps:
a) the coordinate information and the confidence degrees of a plurality of areas detected by the two detection methods are integrated to obtain a matrix gamma(3)Is shown as gamma(3)=Γ(1)(2)(ii) a At Γ type(3)In the case of non-null, Γ(3)The middle rows are reordered from high to low according to the confidence coefficient to obtain a matrix gamma(4)Is shown as
Figure RE-GDA0003523327540000106
Rho is the current matrix Γ(4)Initializing p K + L, the value of p being dependent on Γ(4)Is reduced by the change in the amount of the,
Figure RE-GDA0003523327540000107
representation matrix r(4)Is denoted as line a of
Figure RE-GDA0003523327540000108
α ═ 0,1,. rho-1; defining a final output matrix omega and the number χ of the human body targets of the final output, and expressing the matrix omega as [ omega ]01,...,Ωχ-1]T,ΩηThe η th row of the matrix Ω is represented, and η is initialized to 0, Ωη=[Ωη0η1,...,Ωη4];
b) The region with the highest confidence corresponds in $\Gamma^{(4)}$ to the element
Figure RE-GDA0003523327540000111
Let
Figure RE-GDA0003523327540000112
and delete the element corresponding to the highest-confidence region from the matrix $\Gamma^{(4)}$, setting $\rho = \rho - 1$;
c) Calculate the IOU between the highest-confidence region and each of the other regions, and delete from the matrix $\Gamma^{(4)}$ the elements corresponding to regions whose IOU is greater than the threshold $T_4$ (i.e., the fourth preset threshold). The information of the other regions in $\Gamma^{(4)}$ is
Figure RE-GDA0003523327540000113
The ratio of the intersection to the union (IOU) of the highest-confidence region and another region is expressed as
Figure RE-GDA0003523327540000114
Assuming that a total of $\varepsilon$ regions have an IOU greater than the threshold in this pass, set $\rho = \rho - \varepsilon$;
d) Let $\eta = \eta + 1$ and $\chi = \eta$; repeat steps b) and c) until $\Gamma^{(4)}$ is empty ($\rho = 0$);
e) Finally, the coordinate information in the matrix $\Omega$ is the coordinate information of the human body targets in the conference scene.
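Steps a) through e) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the patented implementation: the row layout `[x, y, w, h, confidence]` is assumed (the text only establishes that each row of Ω has five entries), the names `iou` and `nms` are hypothetical, and the default value of `t4` is a placeholder for the fourth preset threshold.

```python
def iou(a, b):
    """Intersection over union of two boxes given as [x, y, w, h, ...]."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(gamma1, gamma2, t4=0.5):
    """Merge moving (gamma1) and static (gamma2) detections and suppress overlaps.

    Each row is assumed to be [x, y, w, h, confidence]; t4 is a placeholder
    for the fourth preset threshold.
    """
    # a) combine both result sets and sort by confidence, high to low
    gamma4 = sorted(list(gamma1) + list(gamma2), key=lambda r: r[4], reverse=True)
    omega = []
    while gamma4:                      # d) repeat until gamma4 is empty (rho = 0)
        best = gamma4.pop(0)           # b) take and delete the best region
        omega.append(best)
        # c) delete every remaining region whose IOU with the best one exceeds t4
        gamma4 = [r for r in gamma4 if iou(best, r) <= t4]
    return omega                       # e) rows of omega are the final targets
```

For example, two heavily overlapping boxes around the same person collapse to the higher-confidence one, while a distant box from the other detector is kept.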
Embodiment two:
The purpose of this embodiment is to provide a human body target detection system in conference polling based on the combination of dynamic and static detection.
A human body target detection system in conference polling based on dynamic and static detection combination comprises:
a data acquisition unit configured to acquire conference polling video data in real time;
detecting moving human body targets by an inter-frame difference method based on the current frame image and the previous frame image, and acquiring the coordinate information and confidences of all regions containing a moving target; detecting static human body targets from the positional relationships among the conference table plate, the table board, and the human body, using mean filtering, binarization, erosion, and midpoint positioning, to obtain the coordinate information and confidences of all static target regions;
and screening by using a non-maximum suppression algorithm based on the obtained coordinate information and confidence of the moving and static target areas to obtain the final human body target position.
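The moving-target branch, as detailed in claims 2 and 3, can be sketched as follows. This is a hedged illustration only: the function name `detect_moving_regions`, the use of non-overlapping square blocks, normalization by block area, and the "greater than" comparison against the third threshold are assumptions, and the threshold values must be chosen by the caller.

```python
import numpy as np

def detect_moving_regions(prev_gray, curr_gray, t1, t2, t3, block):
    """Inter-frame difference detection on two grayscale frames.

    t1: binarization threshold (first preset threshold)
    t2: scene-switch threshold on the binary image sum (second preset threshold)
    t3: per-block confidence threshold (third preset threshold)
    Returns a list of (block_row, block_col, confidence) tuples.
    """
    prev = np.asarray(prev_gray, dtype=np.int32)
    curr = np.asarray(curr_gray, dtype=np.int32)
    # Subtract the grayscale frames and binarize the absolute difference.
    binary = (np.abs(curr - prev) > t1).astype(np.float64)
    # A very large total change indicates a scene switch: skip detection.
    if binary.sum() > t2:
        return []
    regions = []
    rows, cols = binary.shape
    # Traverse the binary image block by block (non-overlapping tiles assumed).
    for r in range(0, rows - block + 1, block):
        for c in range(0, cols - block + 1, block):
            conf = float(binary[r:r + block, c:c + block].sum() / (block * block))
            if conf > t3:
                regions.append((r // block, c // block, conf))
    return regions
```

Each returned position is the row and column index of the block, matching the description of a region's position as its row-column value.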
In further embodiments, there is also provided:
an electronic device comprising a memory and a processor, and computer instructions stored on the memory and executed on the processor, the computer instructions when executed by the processor performing the method of embodiment one. For brevity, no further description is provided herein.
It should be understood that in this embodiment the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor or any conventional processor.
The memory may include both read-only memory and random access memory, and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.
A computer readable storage medium storing computer instructions which, when executed by a processor, perform the method of embodiment one.
The method in embodiment one may be implemented directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in a memory; the processor reads the information in the memory and completes the steps of the method in combination with its hardware. To avoid repetition, this is not described in detail here.
Those of ordinary skill in the art will appreciate that the illustrative units and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The human body target detection method and system in conference polling based on the combination of dynamic and static detection provided by the above embodiments can be implemented and have broad application prospects.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (10)

1. A human body target detection method in conference polling based on dynamic and static detection combination is characterized by comprising the following steps:
acquiring conference polling video data in real time;
detecting moving human body targets by an inter-frame difference method based on the current frame image and the previous frame image, and acquiring the coordinate information and confidences of all regions containing a moving target; detecting static human body targets from the positional relationships among the conference table plate, the table board, and the human body, using mean filtering, binarization, erosion, and midpoint positioning, to obtain the coordinate information and confidences of all static target regions;
and screening by using a non-maximum suppression algorithm based on the obtained coordinate information and confidence of the moving and static target areas to obtain the final human body target position.
2. The human target detection method in the conference polling based on the combination of the dynamic detection and the static detection as claimed in claim 1, wherein the detection of the moving human target is realized by adopting an interframe difference method, which specifically comprises the following steps: subtracting the gray level image of the current frame and the gray level image of the previous frame, and performing binary processing based on a first preset threshold value to obtain a binary image of the moving target; traversing the whole binary image by using pixel blocks with preset sizes to obtain a plurality of areas with preset sizes, wherein the position of each area is represented as a row-column value of the area; summing and normalizing the pixel point values in each region respectively to serve as a confidence value of each region; and comparing the confidence value of each area with a third preset threshold value to determine the area with the moving target.
3. The human target detection method in the conference polling based on the combination of the dynamic detection and the static detection as claimed in claim 1, wherein the detection of the moving human target is realized by adopting an inter-frame difference method, further comprising: if the sum of the pixel values of all pixel points in the binary image is greater than a second preset threshold value, this indicates that a scene switch has occurred in the conference polling video, and moving human body target detection is not performed between the current frame and the previous frame.
4. The human body target detection method in the conference polling based on the combination of the dynamic and static detection as claimed in claim 1, wherein the obtaining of the coordinate information of all static target areas specifically comprises: acquiring a binary image of the current frame image, and determining a conference desktop area based on the characteristic that the conference table plate has a large block shadow; carrying out mean filtering and binarization processing on the conference desktop area to obtain a binary image of the conference desktop area; and carrying out an erosion operation on the binary image, determining the coordinates of the central point of the table board in the conference desktop area based on the eroded binary image, and obtaining the position coordinates of the area where the human body target is located based on the positional relationship between the table board and the human body target.
5. The human target detection method in the conference polling based on the combination of the dynamic detection and the static detection as claimed in claim 1, wherein the obtaining the confidence degrees of all the static target areas specifically comprises: and carrying out binarization processing on the image of the region where the human body target is located, summing pixel points in the region, and then normalizing to obtain a confidence value of the region.
6. The human body target detection method in the conference polling based on the combination of the dynamic detection and the static detection as claimed in claim 1, characterized in that the current frame image is subjected to binarization processing to obtain a binary image of the conference scene, and the edge position of the table is determined based on the characteristic that the conference table has a large block shadow; the conference desktop area is determined based on the table edge position; and mean filtering is carried out on the conference desktop area, and the image of the conference desktop area after mean filtering is binarized to obtain a binary image of the conference desktop area.
7. The human target detection method in the conference polling based on the combination of dynamic and static detection as claimed in claim 1, wherein the screening is performed by using a non-maximum suppression algorithm, specifically:
(1) obtaining coordinate information and confidence of a plurality of human body target areas based on the results of the moving human body target detection and the static human body target detection;
(2) sequencing the target regions according to the confidence coefficient from large to small to obtain an ordered matrix list;
(3) calculating the IOU value from the target area with the maximum confidence coefficient to any other target area, and deleting the target area with the IOU value larger than a fourth preset threshold; storing the target area with the maximum confidence coefficient and deleting the target area in the ordered matrix list;
(4) repeating step (3) until the traversal of the ordered matrix list is finished.
8. A human body target detection system in conference polling based on dynamic and static detection combination is characterized by comprising:
a data acquisition unit configured to acquire conference polling video data in real time;
detecting moving human body targets by an inter-frame difference method based on the current frame image and the previous frame image, and acquiring the coordinate information and confidences of all regions containing a moving target; detecting static human body targets from the positional relationships among the conference table plate, the table board, and the human body, using mean filtering, binarization, erosion, and midpoint positioning, to obtain the coordinate information and confidences of all static target regions;
and screening by using a non-maximum suppression algorithm based on the obtained coordinate information and confidence of the moving and static target areas to obtain the final human body target position.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory for execution, wherein the processor implements a method for detecting a human target in a conference poll based on dynamic and static detection as claimed in any one of claims 1 to 7.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements a method for human target detection in a combined dynamic and static detection based conference polling according to any one of claims 1 to 7.
CN202111004697.6A 2021-08-30 2021-08-30 Human body target detection method and system in conference polling based on dynamic and static detection combination Pending CN114299138A (en)

Publications (1)

Publication Number Publication Date
CN114299138A true CN114299138A (en) 2022-04-08

Family

ID=80964577

Country Status (1)

Country Link
CN (1) CN114299138A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination