CN114299138A

CN114299138A - Human body target detection method and system in conference polling based on dynamic and static detection combination

Info

Publication number: CN114299138A
Application number: CN202111004697.6A
Authority: CN
Inventors: 孙丽丽; 刘鸿雁; 陈思颖; 张延童; 王朔; 王雨晨; 何子亨; 车四四; 刘方舟; 陈廷森; 李宗皓
Original assignee: State Grid Corp of China SGCC; Shandong University; Information and Telecommunication Branch of State Grid Shandong Electric Power Co Ltd
Current assignee: State Grid Corp of China SGCC; Shandong University; Information and Telecommunication Branch of State Grid Shandong Electric Power Co Ltd
Priority date: 2021-08-30
Filing date: 2021-08-30
Publication date: 2022-04-08

Abstract

The disclosure provides a human body target detection method and system in conference polling based on a combination of dynamic and static detection, comprising: acquiring conference polling video data in real time; detecting moving human body targets with an inter-frame difference method based on the current frame image and the previous frame image, and obtaining the coordinate information and confidence scores of all regions containing a moving target; detecting static human body targets from the positional relationships among the conference table panel, the table cards, and the human body by mean filtering, binarization, erosion, and midpoint positioning, to obtain the coordinate information and confidence scores of all static target regions; and screening the moving and static target regions with a non-maximum suppression algorithm, based on their coordinate information and confidence scores, to obtain the final human body target positions.

Description

Human body target detection method and system in conference polling based on dynamic and static detection combination

Technical Field

The disclosure belongs to the technical field of computer vision, and particularly relates to a method and a system for detecting a human body target in conference polling based on combination of dynamic and static detection.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

Video monitoring uses computer vision and image processing methods to analyze the image sequence captured by a camera automatically, without human intervention. Human body target detection based on video image recognition is now widely used in computer vision applications such as video conferences and student classrooms. Video image recognition improves the recognition capability of monitoring and reduces the workload of monitoring personnel. In recent years, with the rapid development of video image analysis technology, video image analysis has entered daily life in various forms, and increasing attention is paid to analyzing specific people through acquired features.

A common method for locating human body targets in video images is moving target detection, which detects the regions that change between the images of a sequence and extracts the moving target from the background. The inventors found that, for moving target detection against a fixed background, the common methods are the inter-frame difference method and the background difference method. The inter-frame difference method adapts well to dynamic environments, but the threshold for foreground extraction is difficult to select. The background difference method can obtain a complete image of the moving object, but the background must be updated in a timely manner. For example, patent CN201910548272.8, "Moving region foreground image algorithm based on the combination of a background difference method and an inter-frame difference method", uses both methods to extract the foreground image. Although it effectively removes interference from the image, the algorithm has limitations: in scenes where people move only slightly, such as conference scenes, it cannot detect all targets in the scene.

Disclosure of Invention

The scheme exploits the fact that conference table cards have distinctive features and are easy to detect. For conference scenes in which human body targets move only slightly and occupy fixed positions, and provided that no participant is absent, moving human body targets are first detected with the inter-frame difference method using the difference between consecutive frames; static human body targets are then detected from the positional relationship between the conference table cards and the human body by binarization, mean filtering, erosion, and similar methods. This effectively avoids missed detections caused by small human motion, improves detection accuracy, and provides technical support for human body target detection in conference scenes where the human body targets are relatively fixed.

According to a first aspect of the embodiments of the present disclosure, there is provided a human body target detection method in conference polling based on dynamic and static detection combination, including:

acquiring conference polling video data in real time;

detecting moving human body targets with an inter-frame difference method based on the current frame image and the previous frame image, and obtaining the coordinate information and confidence scores of all regions containing a moving target; detecting static human body targets from the positional relationships among the conference table panel, the table cards, and the human body by mean filtering, binarization, erosion, and midpoint positioning, to obtain the coordinate information and confidence scores of all static target regions;

and screening by using a non-maximum suppression algorithm based on the obtained coordinate information and confidence of the moving and static target areas to obtain the final human body target position.

Further, detecting moving human body targets with the inter-frame difference method specifically comprises: subtracting the grayscale image of the previous frame from that of the current frame and binarizing the difference with a first preset threshold to obtain a binary image of the moving target; traversing the whole binary image with pixel blocks of a preset size to obtain multiple regions of that size, the position of each region being given by its row and column indices; summing and normalizing the pixel values in each region to obtain the confidence value of that region; and comparing the confidence value of each region with a third preset threshold to determine the regions that contain a moving target.

Further, obtaining the coordinate information of all static target regions specifically includes: obtaining a binary image of the current frame image, and determining the conference desktop area based on the characteristic that the conference table panel casts a large shadow; applying mean filtering and binarization to the conference desktop area to obtain a binary image of the conference desktop area; and eroding this binary image, determining the center-point coordinates of the table cards in the conference desktop area from the eroded binary image, and obtaining the position coordinates of the region where each human body target is located from the positional relationship between the conference table card and the human body target.

Further, obtaining the confidence scores of all static target regions specifically includes: binarizing the image of the region where the human body target is located, summing the pixel values in the region, and normalizing the sum to obtain the confidence value of the region.

Further, the current frame image is binarized to obtain a binary image of the conference scene, and the edge position of the table panel is determined based on the characteristic that the conference table panel casts a large shadow; the conference desktop area is determined from the table panel edge position; and mean filtering is applied to the conference desktop area, after which the filtered image of the conference desktop area is binarized to obtain a binary image of the conference desktop area.

According to a second aspect of the embodiments of the present disclosure, there is provided a human body target detection system in conference polling based on the combination of dynamic and static detection, including:

a data acquisition module configured to acquire conference polling video data in real time;

According to a third aspect of the embodiments of the present disclosure, there is provided an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the method for detecting human body targets in conference polling based on the combination of dynamic and static detection.

According to a fourth aspect of the embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for detecting human targets in conference polling based on combination of dynamic and static detection.

Compared with the prior art, the beneficial effect of this disclosure is:

The scheme combines moving human body target detection by the inter-frame difference method with static human body target detection by binarization and related methods: the inter-frame difference method detects human body targets with larger motion amplitude, and static human body detection supplements it, so that missed detections caused by small motion of personnel are avoided and detection accuracy is improved.

Advantages of additional aspects of the disclosure will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the disclosure.

Drawings

The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and are not to limit the disclosure.

Fig. 1 is a flowchart of a human body target detection method in conference polling based on dynamic and static detection combination in the first embodiment of the disclosure.

Detailed Description

The present disclosure is further described with reference to the following drawings and examples.

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.

The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.

The first embodiment is as follows:

This embodiment aims to provide a human body target detection method in conference polling based on the combination of dynamic and static detection.

A human body target detection method in conference polling based on dynamic and static detection combination comprises the following steps:

acquiring conference polling video data in real time;

Further, detecting moving human body targets with the inter-frame difference method further comprises: if the sum of the pixel values of all pixels in the binary image is greater than a second preset threshold, the conference polling video has switched scenes, and no moving human body target detection is performed between the current frame and the previous frame.

Further, screening with the non-maximum suppression algorithm specifically includes:

(1) obtaining coordinate information and confidence of a plurality of human body target areas based on the results of the moving human body target detection and the static human body target detection;

(2) sorting the target regions by confidence in descending order to obtain an ordered matrix list;

(3) calculating the IOU (Intersection over Union) between the target region with the highest confidence and every other target region, and deleting the target regions whose IOU is greater than a fourth preset threshold; saving the target region with the highest confidence and deleting it from the ordered matrix list;

(4) repeating step (3) until the ordered matrix list has been fully traversed.

Specifically, for ease of understanding, the embodiments of the present disclosure are described in detail below with reference to the accompanying drawings:

As shown in Fig. 1, the present disclosure provides a human body target detection method in conference polling based on the combination of dynamic and static detection. The method exploits the fact that conference table cards have distinctive features and are easy to detect. For conference scenes in which human body targets move only slightly and occupy fixed positions, and provided that no participant is absent, moving human body targets are detected with the inter-frame difference method using the difference between consecutive frames; static human body targets are detected by binarization, mean filtering, erosion, and midpoint positioning using the positional relationship between the conference table cards and the human body; finally, a non-maximum suppression algorithm screens the results to obtain the human body targets. The method specifically comprises the following steps:

Step (1): acquiring the current frame and the previous frame of the conference polling video;

Step (1) specifically comprises: acquiring the grayscale images of the current frame t and the previous frame of the conference polling video, giving two images $I_t$ and $I_{t-1}$ of H rows and W columns (size W × H). The pixel values of $I_{t-1}$ and $I_t$ at a given pixel are $I_{t-1}(x,y)$ and $I_t(x,y)$, where (x, y) is the pixel coordinate and $(x,y)\in\{0,1,\dots,H-1\}\times\{0,1,\dots,W-1\}$;
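Step (1) works on grayscale frames but does not say how color frames are converted. The sketch below is a minimal illustration only, assuming frames arrive as nested lists of (R, G, B) tuples and using the common ITU-R BT.601 luma weights; both the input format and the weights are assumptions, not part of the source.

```python
def to_gray(rgb_frame):
    """Convert an H x W frame of (R, G, B) tuples to an H x W grayscale frame.

    Assumption: BT.601 luma weights; the source does not specify the conversion.
    The result is indexed as gray[x][y], x = row (0..H-1), y = column (0..W-1).
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_frame
    ]


frame = [[(255, 255, 255), (0, 0, 0)],
         [(255, 0, 0), (0, 255, 0)]]
gray = to_gray(frame)
```

Applying `to_gray` to the frames at t and t−1 yields the pair $I_t$, $I_{t-1}$ used by the inter-frame difference step.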

Step (2): detecting moving human body targets with the inter-frame difference method;

wherein the step (2) specifically comprises the following steps:

a) Select a suitable threshold $T_1$ (i.e., the first preset threshold). The two adjacent grayscale frames $I_{t-1}$ and $I_t$ are subtracted and the difference is binarized to obtain the binary image $D_t$ of the moving target, whose pixel values are

$$D_t(x,y)=\begin{cases}1, & |I_t(x,y)-I_{t-1}(x,y)|>T_1\\ 0, & \text{otherwise}\end{cases}$$
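A minimal sketch of the binarized frame difference in a), assuming grayscale frames stored as nested lists indexed [x][y] and a strict comparison against $T_1$ (whether the source uses > or ≥ is not recoverable, so that choice is an assumption):

```python
def frame_difference(prev_gray, curr_gray, t1):
    """Binary moving-target image D_t from two grayscale frames.

    D_t[x][y] = 1 where |I_t(x, y) - I_{t-1}(x, y)| > t1, else 0.
    The strict '>' is an assumption.
    """
    return [
        [1 if abs(c - p) > t1 else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_gray, curr_gray)
    ]


d_t = frame_difference([[10, 10], [10, 10]], [[10, 50], [200, 12]], t1=25)
```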

b) Frames from different conference scenes do not need moving target detection by the inter-frame difference method. Exploiting the large difference between different conference scenes, select a suitable threshold $T_2$ (i.e., the second preset threshold): if the sum of the pixel values of all pixels in the binary image $D_t$ is greater than $T_2$, the conference polling video has switched to the next conference scene. Define the scene change state V as

$$V=\begin{cases}0, & \sum_{x,y} D_t(x,y)>T_2\\ 1, & \text{otherwise}\end{cases}$$

When $V=1$, the two frames show the same conference scene, and step c) is executed;

when $V=0$, the two frames show different conference scenes, and steps c) and d) are not executed;
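The scene change test in b) reduces to a single sum over $D_t$. A minimal sketch, assuming $D_t$ is a nested list of 0/1 values; the function name `scene_state` is hypothetical:

```python
def scene_state(d_t, t2):
    """Return V: 1 if both frames show the same scene, 0 on a scene switch."""
    total = sum(sum(row) for row in d_t)   # sum of all pixel values of D_t
    return 0 if total > t2 else 1


# V == 0 means steps c) and d) are skipped for this frame pair.
v_switch = scene_state([[1, 1], [1, 0]], t2=2)
v_same = scene_state([[1, 0], [0, 0]], t2=2)
```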

c) Define a pixel block of h rows and w columns, and scan the binary image $D_t$ of the moving target from left to right and from top to bottom with a horizontal stride of $\Delta w$ pixels and a vertical stride of $\Delta h$ pixels. The strides are chosen so that $H-h$ is divisible by $\Delta h$ and $W-w$ is divisible by $\Delta w$. Define

$$M=\frac{H-h}{\Delta h}+1,\qquad N=\frac{W-w}{\Delta w}+1.$$

The scan yields M × N regions of size h × w. $d_{mn}$ denotes the region in row m and column n; it is uniquely determined by its upper-left corner coordinates $(x_m, y_n)$, height h, and width w, where $x_m=m\,\Delta h$ and $y_n=n\,\Delta w$.

d) Sum the pixel values of all pixels in region $d_{mn}$ and normalize the sum to obtain the confidence $\lambda_{mn}$.

The coordinate information and confidences of all regions form an M × N × 5 matrix Ψ, where $\Psi_{mn}$ denotes the element in row m and column n of Ψ: $\Psi_{mn}=[x_m, y_n, h, w, \lambda_{mn}]$, $m=0,1,\dots,M-1$, $n=0,1,\dots,N-1$. Select a suitable threshold $T_3$ (i.e., the third preset threshold) to judge whether a region contains a moving target; the coordinate information and confidences of all regions containing a moving target form the matrix $\Gamma^{(1)}=[\Gamma^{(1)}_0,\Gamma^{(1)}_1,\dots,\Gamma^{(1)}_{K-1}]^T$.

Here $\Gamma^{(1)}_k$ denotes the k-th row of $\Gamma^{(1)}$. Initialize $k=0$ and perform the following operations:

compare the confidence $\lambda_{mn}$ of region $d_{mn}$ with the threshold $T_3$; if $\lambda_{mn}\ge T_3$, set $\Gamma^{(1)}_k=\Psi_{mn}$ and

$k=k+1$;

Repeat this operation until the confidences of all regions have been compared with the threshold, then let $K=k$; K is the number of human body targets determined by moving human body target detection.
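Steps c) and d) can be sketched as one scan over $D_t$. This is a minimal illustration, assuming $D_t$ is a nested list of 0/1 values indexed [x][y], and assuming each block sum is normalized by the block area h·w (the source only says "normalized"). The function name `moving_regions` is hypothetical.

```python
def moving_regions(d_t, h, w, dh, dw, t3):
    """Scan D_t with an h x w block and return Gamma^(1) rows [x_m, y_n, h, w, lam].

    Assumes (H - h) % dh == 0 and (W - w) % dw == 0, as the text requires,
    and normalizes each block sum by the block area h * w (an assumption).
    """
    H, W = len(d_t), len(d_t[0])
    if (H - h) % dh or (W - w) % dw:
        raise ValueError("strides must divide H - h and W - w")
    M = (H - h) // dh + 1
    N = (W - w) // dw + 1
    gamma1 = []
    for m in range(M):                  # top to bottom
        x_m = m * dh
        for n in range(N):              # left to right
            y_n = n * dw
            block_sum = sum(
                d_t[x_m + i][y_n + j] for i in range(h) for j in range(w)
            )
            lam = block_sum / (h * w)   # confidence lambda_mn
            if lam >= t3:               # region contains a moving target
                gamma1.append([x_m, y_n, h, w, lam])
    return gamma1


d_t = [[1, 1, 0, 0],
       [1, 1, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 1]]
gamma1 = moving_regions(d_t, h=2, w=2, dh=2, dw=2, t3=0.5)
K = len(gamma1)
```

Keeping only rows with $\lambda_{mn}\ge T_3$ builds $\Gamma^{(1)}$ directly, so K is simply its length.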

Step (3): detecting static human body targets by mean filtering, binarization, erosion, and midpoint positioning;

wherein the step (3) specifically comprises the following steps:

a) Binarize the current frame image $I_t$ to obtain a binary image of the conference scene, in which every pixel takes the value 0 or 1.

Using the characteristic that the conference table panel casts a large shadow, scan the binary image of the conference scene row by row and sum each row. If the sums of γ consecutive rows are all 0, that band is a shadow region, which gives the edge position of the table panel; the edge position is defined as the row position satisfying this condition.

The conference desktop area $Z_t$ is determined from the edge position of the table panel. With the height of the desktop area defined as $h_Z$, $Z_t$ is uniquely determined by its upper-left corner coordinates, height $h_Z$, and width W. The pixel values of $Z_t$ are $Z_t(u,v)$, $(u,v)\in\Upsilon$, $\Upsilon=\{0,1,\dots,h_Z-1\}\times\{0,1,\dots,W-1\}$.
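A minimal sketch of the row-sum shadow test in a), assuming the scene binary image is a nested list of 0/1 values indexed [x][y], a top-to-bottom scan, and that the edge position is reported as the first row of the first run of γ all-zero rows. The text does not fix which row of the run is used, so that choice is an assumption, and the function name `find_shadow_edge` is hypothetical.

```python
def find_shadow_edge(binary, gamma):
    """Return the first row index of the first run of `gamma` consecutive
    all-zero rows (a shadow band), or None if no such run exists.

    Top-down scanning and reporting the run's first row are assumptions.
    """
    run_start, run_len = None, 0
    for x, row in enumerate(binary):
        if sum(row) == 0:               # row sum 0: row lies in a shadow
            if run_len == 0:
                run_start = x
            run_len += 1
            if run_len == gamma:
                return run_start
        else:
            run_len = 0                 # run broken by a non-shadow row
    return None


scene = [[1, 1], [0, 0], [1, 0], [0, 0], [0, 0], [0, 0], [1, 1]]
edge = find_shadow_edge(scene, gamma=3)
```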

b) Determine the center-point coordinates of the conference table cards:

1) Define a 3 × 3 filter $A_1$ for mean filtering of the conference desktop area. The coefficient of $A_1$ at any point is $A_1(r,s)$, $(r,s)\in\Theta$, $\Theta=\{-1,0,1\}\times\{-1,0,1\}$, and point (0, 0) is the filter center; for a 3 × 3 mean filter every coefficient equals 1/9.

Move the filter $A_1$ point by point over the conference desktop area $Z_t$ from left to right and from top to bottom, as in the scan of step (2) c) with strides $\Delta h=1$ and $\Delta w=1$. When the filter center (0, 0) coincides with pixel (u, v) of $Z_t$, each filter coefficient is multiplied by the corresponding pixel and the products are summed; the result is the output value $G_t(u,v)$ of pixel (u, v):

$$G_t(u,v)=\sum_{(r,s)\in\Theta}A_1(r,s)\,Z_t(u+r,\,v+s)$$

When the filter center (0, 0) lies on the image boundary, some filter points fall outside the image; these positions are padded with 0.
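The mean filtering in 1), including the zero padding at the boundary, can be sketched in pure Python. This is a minimal illustration assuming $Z_t$ is a nested list indexed [u][v] and uniform coefficients $A_1(r,s)=1/9$; the function name `mean_filter_3x3` is hypothetical.

```python
def mean_filter_3x3(z):
    """G_t(u, v) = sum over (r, s) of A1(r, s) * Z_t(u + r, v + s), A1 = 1/9.

    Pixels outside the image are treated as 0 (zero padding), so boundary
    outputs are still divided by 9.
    """
    rows, cols = len(z), len(z[0])
    g = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            acc = 0
            for r in (-1, 0, 1):
                for s in (-1, 0, 1):
                    uu, vv = u + r, v + s
                    if 0 <= uu < rows and 0 <= vv < cols:
                        acc += z[uu][vv]    # out-of-image points add 0
            g[u][v] = acc / 9
    return g


g = mean_filter_3x3([[9, 9, 9], [9, 9, 9], [9, 9, 9]])
```

Because of the zero padding, a corner pixel of a uniform image comes out at 4/9 of its value and an edge pixel at 6/9.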

2) Binarize the mean-filtered image $G_t$ to obtain a binary image of the conference desktop area, and erode this binary image to obtain the eroded image.

The image erosion operation is as follows:

Erosion is a morphological operation. In mathematical morphology, sets represent the objects in an image: the binary image of the conference desktop area is described by the set of all its white pixels. Define the point $z=(u,v)$ and let B be the set of white pixels of this binary image, $B=\{z=(u,v)\mid \text{pixel }(u,v)\text{ is white},\ (u,v)\in\Upsilon\}$; its complement is $B^c=\{z\mid z\notin B\}$.

Define a 3 × 3 image U with pixel values $U(r,s)$, $(r,s)\in\Theta$. Its morphological description is the set of all its white pixels: with the point $a=(r,s)$, $A_2=\{a=(r,s)\mid U(r,s)=1,\ (r,s)\in\Theta\}$. $(A_2)_z$ denotes the translation of $A_2$ by z, $(A_2)_z=\{b\mid b=a+z,\ a\in A_2\}$. Erosion moves $A_2$ over B by translating it by z, in the same way as the mean filtering scan. The erosion of B by $A_2$ is the set of all points z for which $(A_2)_z$ is contained in B:

$$B\ominus A_2=\{z\mid (A_2)_z\subseteq B\}=\{z\mid (A_2)_z\cap B^c=\varnothing\}$$
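The erosion definition above can be sketched directly: output pixel z is 1 exactly when the translated structuring element $(A_2)_z$ lies inside B. This is a minimal pure-Python sketch assuming 0/1 nested lists and a 3 × 3 structuring element stored as `se[r + 1][s + 1]`; pixels outside the image are treated as background, since B contains only image pixels. The function name `erode` is hypothetical.

```python
def erode(binary, se):
    """Binary erosion of `binary` by the 3x3 structuring element `se`.

    Output is 1 at z = (u, v) iff every white point a of `se` lands on a
    white pixel of `binary` after translation by z; outside pixels count as 0.
    """
    rows, cols = len(binary), len(binary[0])
    # A2: the set of white points of the structuring element, offsets in {-1, 0, 1}
    a2 = [(r, s) for r in (-1, 0, 1) for s in (-1, 0, 1) if se[r + 1][s + 1] == 1]
    out = [[0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            inside = all(
                0 <= u + r < rows and 0 <= v + s < cols and binary[u + r][v + s] == 1
                for r, s in a2
            )
            out[u][v] = 1 if inside else 0
    return out


square = [[0, 0, 0, 0, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 0, 0, 0, 0]]
eroded = erode(square, [[1, 1, 1]] * 3)
```

With a full 3 × 3 element, a 3 × 3 white square shrinks to its single center pixel, which is why erosion separates table cards that touch after binarization.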

3) In the eroded binary image, target objects have the value 1 and the background has the value 0. Sum the eroded image column by column and scan the column sums from left to right; the first non-zero column found is the left endpoint $p_0$. Scan to the right starting from the left endpoint until the first zero column is found; the last non-zero column found during this scan is the right endpoint $q_0$. The center point is computed from the left and right endpoints as

$$c_0=\frac{p_0+q_0}{2}$$

where $c_0$ is the ordinate (y coordinate, i.e., the column index) of the center point of the first conference table card. Continue scanning to the right and repeat these steps until the scan is finished.

Save the ordinates $c_l$, $l=0,1,\dots,L-1$, of the center points of all conference table cards, where L is the number of conference table cards; this gives the center-point coordinates of the conference table cards.
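The midpoint positioning in 3) can be sketched as a scan over column sums, where each maximal run of non-zero columns is one table card. This is a minimal illustration assuming the eroded image is a nested list of 0/1 values and that the midpoint is rounded down with integer division (the rounding is an assumption). The function name `table_card_centers` is hypothetical.

```python
def table_card_centers(eroded):
    """Return the column index c_l of the center of each table card.

    p is the first non-zero column of a run, q the last non-zero column
    before the next zero column; c = (p + q) // 2 (rounding down assumed).
    """
    col_sums = [sum(col) for col in zip(*eroded)]   # column-wise sums
    centers = []
    W = len(col_sums)
    y = 0
    while y < W:
        if col_sums[y] != 0:
            p = y                                   # left endpoint
            while y < W and col_sums[y] != 0:
                y += 1
            q = y - 1                               # right endpoint
            centers.append((p + q) // 2)
        else:
            y += 1
    return centers


cards = [[0, 1, 1, 1, 0, 0, 1, 1, 0],
         [0, 1, 1, 1, 0, 0, 1, 1, 0]]
centers = table_card_centers(cards)
L = len(centers)
```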

c) From the proportions of the conference scene and the participants in the video frame, and from the positional relationship between the conference table cards and the participants, the human body target and its table card differ only in the vertical direction: the center point of the human body target has the same ordinate (y coordinate) as the center point of the table card, while its abscissa (x coordinate) is smaller by δ pixels, with δ set according to the frame scale. The region where the human body target is located is then uniquely determined by its upper-left corner coordinates, height h, and width w;

After the region where the human body target is located is binarized, the pixel values of all its pixels are summed and normalized to obtain the confidence $\omega_l$ of the region. The human body target regions determined by static target detection form the matrix $\Gamma^{(2)}=[\Gamma^{(2)}_0,\Gamma^{(2)}_1,\dots,\Gamma^{(2)}_{L-1}]^T$.

Here $\Gamma^{(2)}_l$ denotes the l-th row of $\Gamma^{(2)}$, $l=0,1,\dots,L-1$.
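The exact formula for the region's upper-left corner did not survive, so the following is only a hypothetical sketch of c): it assumes the table card center is given as (`card_x`, `card_y`), places the person's center at (`card_x` − δ, `card_y`), centers an h × w region on it, binarizes with a hypothetical threshold `t_bin`, and normalizes the white-pixel count by h·w. All of these names and choices are assumptions for illustration.

```python
def static_region(gray, card_x, card_y, delta, h, w, t_bin):
    """Build one hypothetical Gamma^(2) row [x0, y0, h, w, omega].

    Assumed geometry: person center = (card_x - delta, card_y), with the
    h x w region centered on it. omega = white-pixel count / (h * w), after
    binarizing with the hypothetical threshold t_bin; outside pixels count 0.
    """
    x0 = card_x - delta - h // 2        # upper-left row (assumed)
    y0 = card_y - w // 2                # upper-left column (assumed)
    H, W = len(gray), len(gray[0])
    white = 0
    for i in range(h):
        for j in range(w):
            x, y = x0 + i, y0 + j
            if 0 <= x < H and 0 <= y < W and gray[x][y] > t_bin:
                white += 1
    return [x0, y0, h, w, white / (h * w)]


frame = [[200] * 4 for _ in range(6)]
frame[1][1] = 0
row = static_region(frame, card_x=5, card_y=2, delta=3, h=2, w=2, t_bin=128)
```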

Step (4): screening with the non-maximum suppression algorithm to obtain the human body target positions;

During human body detection, several different human body regions may be detected around the same target, and these regions overlap. A non-maximum suppression algorithm is therefore used to find the non-overlapping regions with the best confidence and eliminate redundant regions;

Non-maximum suppression, abbreviated as the NMS algorithm, searches for local maxima and suppresses non-maxima: it uses the confidence and coordinate information of the regions to find the regions with the best confidence. It specifically comprises the following steps:

a) Combine the coordinate information and confidences of the regions detected by the two methods into the matrix $\Gamma^{(3)}$, written $\Gamma^{(3)}=\Gamma^{(1)}+\Gamma^{(2)}$, where "+" stacks the rows of $\Gamma^{(1)}$ and $\Gamma^{(2)}$ (so $\Gamma^{(3)}$ has K + L rows). If $\Gamma^{(3)}$ is not empty, reorder its rows by confidence from high to low to obtain the matrix $\Gamma^{(4)}$.

Let ρ be the current number of rows of $\Gamma^{(4)}$, initialized as $\rho=K+L$; ρ decreases as rows are deleted from $\Gamma^{(4)}$.

$\Gamma^{(4)}_\alpha$ denotes row α of $\Gamma^{(4)}$, $\alpha=0,1,\dots,\rho-1$. Define the final output matrix Ω and the number χ of finally output human body targets, with $\Omega=[\Omega_0,\Omega_1,\dots,\Omega_{\chi-1}]^T$, where $\Omega_\eta$ denotes row η of Ω; initialize $\eta=0$, and $\Omega_\eta=[\Omega_{\eta0},\Omega_{\eta1},\dots,\Omega_{\eta4}]$;

b) The region with the highest confidence corresponds to the first row $\Gamma^{(4)}_0$ of $\Gamma^{(4)}$. Set $\Omega_\eta=\Gamma^{(4)}_0$, delete this row from $\Gamma^{(4)}$, and set $\rho=\rho-1$;

c) Compute the IOU between the highest-confidence region and each of the other regions, and delete from $\Gamma^{(4)}$ the rows of all regions whose IOU is greater than the threshold $T_4$ (i.e., the fourth preset threshold); the other regions correspond to the rows $\Gamma^{(4)}_\alpha$ of $\Gamma^{(4)}$.

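The IOU of the highest-confidence region and another region is the ratio of the area of their intersection to the area of their union:

$$\mathrm{IOU}=\frac{\text{area}(\text{intersection})}{\text{area}(\text{union})}$$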

If ε regions in total have an IOU greater than the threshold in this step, set $\rho=\rho-\varepsilon$;
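The IOU in c) can be computed directly from the row layout [x, y, h, w, confidence], where (x, y) is the upper-left corner with x the row and y the column. A minimal sketch treating regions as axis-aligned rectangles; the function name `iou` is hypothetical.

```python
def iou(a, b):
    """IOU of two regions given as [x, y, h, w, ...]: (x, y) is the
    upper-left corner (x = row, y = column), h the height, w the width."""
    ax, ay, ah, aw = a[:4]
    bx, by, bh, bw = b[:4]
    inter_h = max(0, min(ax + ah, bx + bh) - max(ax, bx))   # overlap in rows
    inter_w = max(0, min(ay + aw, by + bw) - max(ay, by))   # overlap in columns
    inter = inter_h * inter_w
    union = ah * aw + bh * bw - inter
    return inter / union if union > 0 else 0.0


overlap = iou([0, 0, 2, 2, 0.9], [1, 1, 2, 2, 0.8])   # 1 / 7
```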

d) Set $\eta=\eta+1$ and $\chi=\eta$, and repeat steps b) and c) until $\Gamma^{(4)}$ is empty ($\rho=0$);

e) Finally, the coordinate information in the matrix Ω is the coordinate information of the human body targets in the conference scene.
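Steps a) through e) amount to a standard greedy NMS over the stacked detections. The sketch below is a minimal illustration under the same row layout [x, y, h, w, confidence], with `nms` and `iou` as hypothetical names; it keeps regions whose IOU with the current best is at most $T_4$, matching "delete if greater than $T_4$".

```python
def iou(a, b):
    """IOU of regions [x, y, h, w, ...] with upper-left corner (x, y)."""
    inter_h = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    inter_w = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(gamma1, gamma2, t4):
    """Stack Gamma^(1) and Gamma^(2), sort by confidence (Gamma^(4)), then
    repeatedly keep the best row and drop rows whose IOU with it exceeds t4."""
    gamma4 = sorted(gamma1 + gamma2, key=lambda row: row[4], reverse=True)
    omega = []
    while gamma4:                              # until rho == 0
        best = gamma4.pop(0)                   # step b): rho -= 1
        omega.append(best)
        gamma4 = [r for r in gamma4 if iou(best, r) <= t4]   # step c)
    return omega


moving = [[0, 0, 4, 4, 0.9], [10, 10, 4, 4, 0.6]]
static = [[1, 1, 4, 4, 0.8]]
omega = nms(moving, static, t4=0.3)
chi = len(omega)
```

The static region overlapping the strongest moving region (IOU 9/23 ≈ 0.39) is suppressed, while the distant region survives, so χ = 2 here.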

The second embodiment is as follows:

The purpose of this embodiment is to provide a human body target detection system in conference polling based on the combination of dynamic and static detection.

A human body target detection system in conference polling based on dynamic and static detection combination comprises:

a data acquisition module configured to acquire conference polling video data in real time;

In further embodiments, there is also provided:

an electronic device comprising a memory and a processor, and computer instructions stored on the memory and executed on the processor, the computer instructions when executed by the processor performing the method of embodiment one. For brevity, no further description is provided herein.

It should be understood that in this embodiment, the processor may be a central processing unit (CPU), and the processor may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

The memory may include both read-only memory and random access memory, and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.

A computer readable storage medium storing computer instructions which, when executed by a processor, perform the method of embodiment one.

The method in the first embodiment may be implemented directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may reside in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, EPROM, or registers. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the method in combination with its hardware. To avoid repetition, this is not described in detail here.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented as electronic hardware or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

The human body target detection method and system in conference polling based on the combination of dynamic and static detection described above are practicable and have broad application prospects.

The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims

1. A human body target detection method in conference polling based on dynamic and static detection combination is characterized by comprising the following steps:

acquiring conference polling video data in real time;

2. The human target detection method in conference polling based on the combination of dynamic and static detection as claimed in claim 1, wherein detecting the moving human target with the inter-frame difference method specifically comprises: subtracting the grayscale image of the previous frame from that of the current frame and binarizing the difference with a first preset threshold to obtain a binary image of the moving target; traversing the whole binary image with pixel blocks of a preset size to obtain multiple regions of that size, the position of each region being given by its row and column indices; summing and normalizing the pixel values in each region to obtain the confidence value of that region; and comparing the confidence value of each region with a third preset threshold to determine the regions that contain a moving target.

3. The human target detection method in conference polling based on the combination of dynamic and static detection as claimed in claim 1, wherein detecting the moving human target with the inter-frame difference method further comprises: if the sum of the pixel values of all pixels in the binary image is greater than a second preset threshold, determining that the conference polling video has switched scenes, and performing no moving human body target detection between the current frame and the previous frame.

4. The human body target detection method in conference polling based on the combination of dynamic and static detection as claimed in claim 1, wherein obtaining the coordinate information of all static target regions specifically comprises: obtaining a binary image of the current frame image, and determining the conference desktop area based on the characteristic that the conference table panel casts a large shadow; applying mean filtering and binarization to the conference desktop area to obtain a binary image of the conference desktop area; and eroding this binary image, determining the center-point coordinates of the table cards in the conference desktop area from the eroded binary image, and obtaining the position coordinates of the region where each human body target is located from the positional relationship between the conference table card and the human body target.

5. The human target detection method in conference polling based on the combination of dynamic and static detection as claimed in claim 1, wherein obtaining the confidence scores of all static target regions specifically comprises: binarizing the image of the region where the human body target is located, summing the pixel values in the region, and normalizing the sum to obtain the confidence value of the region.

6. The human body target detection method in conference polling based on the combination of dynamic and static detection as claimed in claim 1, characterized in that the current frame image is binarized to obtain a binary image of the conference scene, and the edge position of the table panel is determined based on the characteristic that the conference table panel casts a large shadow; the conference desktop area is determined from the table panel edge position; and mean filtering is applied to the conference desktop area, after which the filtered image of the conference desktop area is binarized to obtain a binary image of the conference desktop area.

7. The human target detection method in conference polling based on the combination of dynamic and static detection as claimed in claim 1, wherein the screening with the non-maximum suppression algorithm specifically comprises:

(3) calculating the IOU between the target region with the highest confidence and every other target region, and deleting the target regions whose IOU is greater than a fourth preset threshold; saving the target region with the highest confidence and deleting it from the ordered matrix list;

8. A human body target detection system in conference polling based on dynamic and static detection combination is characterized by comprising:

a data acquisition module configured to acquire conference polling video data in real time;

9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method for human body target detection in conference polling based on the combination of dynamic and static detection as claimed in any one of claims 1 to 7.

10. A non-transitory computer-readable storage medium having stored thereon a computer program, wherein the program, when executed by a processor, implements the method for human body target detection in conference polling based on the combination of dynamic and static detection as claimed in any one of claims 1 to 7.