CN112887600B - Shooting method and system based on standing behavior detection - Google Patents

Shooting method and system based on standing behavior detection

Info

Publication number
CN112887600B
CN112887600B (application CN202110099321.1A)
Authority
CN
China
Prior art keywords
target
behavior
track image
standing
standing behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110099321.1A
Other languages
Chinese (zh)
Other versions
CN112887600A (en)
Inventor
张明
董健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ruimo Intelligent Technology Shenzhen Co ltd
Original Assignee
Ruimo Intelligent Technology Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ruimo Intelligent Technology Shenzhen Co ltd
Priority to CN202110099321.1A
Publication of CN112887600A
Application granted
Publication of CN112887600B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/61 Control of cameras or camera modules based on recognised objects
    • H04N23/611 Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition

Abstract

The invention discloses a shooting method and system based on standing behavior detection. The shooting method comprises the following steps: quickly screening out person targets with standing behavior in the current frame image according to a track image list and a preset rule, where the track image list consists of the track image sequences of all person targets; determining the person targets with effective standing behavior according to the track image sequences corresponding to the screened person targets, using a visual classifier based on a deep-learning neural network; controlling a camera to lock onto the person target with the effective standing behavior; and performing close-up shooting of the locked person target. The invention accurately and efficiently detects person targets with effective standing behavior and performs focused close-up shooting of the corresponding targets, greatly improving the user experience.

Description

Shooting method and system based on standing behavior detection
Technical Field
The embodiment of the invention relates to the technical field of shooting, in particular to a shooting method and a shooting system based on standing behavior detection.
Background
In recording and broadcasting systems, such as those used for teaching and meetings, certain prominent target persons or behaviors need to be shot in close-up. In particular, when a target person stands up, the shooting equipment should capture that person in close-up, and when the person sits down, the equipment should return to its wide-angle position. This requires monitoring standing behaviors in real time and, once such a behavior is recognized, focusing on the target person for close-up shooting.
The prior art mainly adopts a frame-difference method or an optical-flow method based on the principle of motion detection: after a rough motion range is obtained by one of these two methods, the shooting equipment is controlled to shoot that area in close-up and returns to wide-angle shooting after a certain time. However, motion-detection schemes are very sensitive to changes in the picture. In a real scene the picture contains a large number of changes, most of which are not meaningful, and human motion in a classroom or conference room is frequent and disordered, so it is difficult to extract accurate behaviors from rough optical-flow or frame-difference features.
Disclosure of Invention
The invention provides a shooting method and system based on standing behavior detection, which can accurately identify effective standing behaviors and perform close-up shooting of the person targets corresponding to those behaviors, greatly improving the user experience.
In a first aspect, an embodiment of the present invention provides a shooting method based on standing behavior detection, where the shooting method based on standing behavior detection includes:
A. quickly screening out person targets with standing behavior in the current frame image according to a track image list and a preset rule, where the track image list consists of the track image sequences of all person targets;
B. determining the person targets with effective standing behavior according to the track image sequences corresponding to the screened person targets, using a visual classifier based on a deep-learning neural network;
C. controlling a camera to lock onto the person target with the effective standing behavior;
D. performing close-up shooting of the locked person target.
In a second aspect, an embodiment of the present invention further provides a shooting system based on standing behavior detection, where the shooting system based on standing behavior detection includes:
the target screening module, used for quickly screening out person targets with standing behavior in the current frame image according to the track image list and a preset rule, where the track image list consists of the track image sequences of all person targets;
the behavior screening module, used for determining the person targets with effective standing behavior according to the track image sequences corresponding to the screened person targets, using a visual classifier based on a deep-learning neural network;
the locking module, used for controlling the camera to lock onto the person target with the effective standing behavior;
and the shooting module, used for performing close-up shooting of the locked person target.
According to the invention, person targets with standing behavior in the current frame image are quickly screened out according to the track image list and a preset rule, and the person targets with effective standing behavior are then determined from the screened targets' track image sequences by a visual classifier based on a deep-learning neural network. Person targets with effective standing behavior are thus detected accurately and efficiently, and focused close-up shooting of the corresponding targets is performed, greatly improving the user experience.
Drawings
Fig. 1 is a flowchart of a method of a photographing method based on standing behavior detection according to an embodiment of the present invention.
Fig. 2 is a flowchart of a method of a photographing method based on standing behavior detection according to a second embodiment of the present invention.
Fig. 3 is a flowchart of a sub-method of a photographing method based on standing behavior detection according to a second embodiment of the present invention.
Fig. 4 is a flowchart of another sub-method of a photographing method based on standing behavior detection according to a second embodiment of the present invention.
Fig. 5 is a flowchart of another sub-method of a photographing method based on standing behavior detection according to a second embodiment of the present invention.
Fig. 6 is a flowchart of a method of a photographing method based on standing behavior detection according to a third embodiment of the present invention.
Fig. 7 is a block diagram illustrating a configuration of a photographing system based on standing behavior detection according to a fourth embodiment of the present invention.
Fig. 8 is a block diagram illustrating a configuration of a photographing system based on standing behavior detection according to a fifth embodiment of the present invention.
Fig. 9 is a block diagram illustrating a structure of a subsystem of a photographing system based on standing behavior detection according to a fifth embodiment of the present invention.
Fig. 10 is a block diagram illustrating another sub-system of a photographing system based on standing behavior detection according to a fifth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a shooting method based on standing behavior detection according to an embodiment of the present invention. The embodiment is applicable to recording and broadcasting scenarios such as teaching and conferences, and the method can be executed by an intelligent camera with a pan-tilt head. The method includes the following steps:
Step S110, quickly screening out person targets with standing behavior in the current frame image according to the track image list and a preset rule, where the track image list consists of the track image sequences of all person targets.
Step S110 performs a preliminary screening of a large amount of data; for example, tens or hundreds of people may be present in a classroom, and most useless data can be eliminated by this screening. In this embodiment, person targets that may have a standing behavior are quickly screened out according to the track image list. The track enables accurate locking of the person target and therefore accurate closed-loop control of zoom and pan-tilt; analysis of the track also allows the whole course of a person's action to be monitored, so the start and end of subsequent shooting of the person target can be controlled precisely. The embodiment maintains the track image sequences of all detected person targets, which form the track image list, and updates the list with the information detected in each image frame.
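For concreteness, here is a minimal sketch, in Python, of how such a track image list might be represented; the names TrackEntry, Track, and track_image_list are illustrative assumptions, not terms from the patent.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

BBox = Tuple[float, float, float, float]  # head bounding box as (x1, y1, x2, y2)

@dataclass
class TrackEntry:
    timestamp: float   # capture time of the frame
    head_bbox: BBox    # head bounding box detected in that frame
    image: Any = None  # the image bound to this position, per the patent's description

@dataclass
class Track:
    track_id: int
    entries: List[TrackEntry] = field(default_factory=list)

    def latest_bbox(self) -> BBox:
        """Most recent head bounding box of this track image sequence."""
        return self.entries[-1].head_bbox

# The track image list: one track image sequence per detected person target.
track_image_list: List[Track] = []
```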
Step S120, determining the person targets with effective standing behavior according to the track image sequences corresponding to the screened person targets, using a visual classifier based on a deep-learning neural network.
The person targets screened out in step S110 may still include false detections. Step S120 therefore introduces a visual classifier based on a deep-learning neural network, which further analyzes the visual information contained in the track image sequence of each screened person target and filters out ineffective standing behaviors, so that person targets with effective standing behavior are screened out accurately. In this embodiment, an effective standing behavior is a change from a sitting posture to a standing posture, while ineffective standing behaviors include, for example, a change from sitting to walking, or a target that was already walking or standing. By analyzing the track image sequences of the screened person targets with the visual classifier, person targets with effective standing behavior can be detected accurately.
Step S130, controlling the camera to lock onto the person target corresponding to the effective standing behavior.
Step S140, performing close-up shooting of the locked person target.
In this embodiment, after a person target with effective standing behavior is detected, the camera is controlled to lock onto the person target corresponding to that behavior, the pan-tilt and zoom are controlled synchronously, and the locked person target is shot in close-up; during close-up shooting, the focal length and the pan-tilt no longer move.
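The patent states only that pan-tilt and zoom are controlled synchronously to lock the target; a proportional closed-loop step such as the following sketch is one plausible realization. The camera interface with pan, tilt, and zoom increment methods is an assumption for illustration, as are the gain and framing values.

```python
def lock_on_target(camera, head_bbox, img_w, img_h,
                   gain=0.1, target_frac=0.25):
    # Proportional step that re-centres the locked person target and zooms
    # until the target fills a preset fraction of the frame height.
    x1, y1, x2, y2 = head_bbox
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    camera.pan(gain * (cx - img_w / 2) / img_w)    # horizontal centring error
    camera.tilt(gain * (cy - img_h / 2) / img_h)   # vertical centring error
    frac = (y2 - y1) / img_h                       # current target size in the frame
    camera.zoom(gain * (target_frac - frac))       # close in toward the preset framing
```

Calling such a step once per frame until the errors fall below a tolerance, then freezing focal length and pan-tilt, matches the behavior described above.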
In this shooting method based on standing behavior detection, step S110 needs only a lightweight algorithm to perform a preliminary screening of a large amount of image information, quickly eliminating most useless information and screening out person targets with standing behavior. In step S120 the visual classifier performs high-precision behavior analysis on the track image sequences of the screened targets, which greatly improves detection efficiency and accuracy and reliably determines the person targets with effective standing behavior. By quickly screening out person targets with standing behavior from the current frame image according to the track image list and a preset rule, and then determining the effective standing behaviors among them with the deep-learning visual classifier, this embodiment detects person targets with effective standing behavior accurately and efficiently and performs focused close-up shooting of them, greatly improving the user experience.
Example two
Fig. 2 is a flowchart of a shooting method based on standing behavior detection according to a second embodiment of the present invention. The embodiment is applicable to recording and broadcasting scenarios such as teaching and conferences, and the method can be executed by an intelligent camera with a pan-tilt head. The method includes the following steps:
Step S210, quickly screening out person targets with standing behavior in the current frame image according to the track image list and a preset rule.
Step S210 performs a preliminary screening of a large amount of data; for example, tens or hundreds of people may be present in a classroom, and most useless data can be eliminated by this screening. In this embodiment, person targets that may have a standing behavior are quickly screened out according to the track image list. The track enables accurate locking of the person target and therefore accurate closed-loop control of zoom and pan-tilt; analysis of the track also allows the whole course of a person's action to be monitored, so the start and end of subsequent shooting of the person target can be controlled precisely. The embodiment maintains the track image sequences of all detected person targets, which form the track image list, and updates the list with the information detected in each image frame.
In some embodiments, as shown in fig. 3, the step S210 of quickly screening out person targets with standing behavior in the current frame image according to the track image list and the preset rule specifically includes steps S211 to S213, as follows:
step S211, detecting all the human targets in the current frame image, and acquiring the head bounding boxes of all the human targets.
In this embodiment, using the head as the detection element has the following advantages: 1. the head is visible from all directions and angles and resists occlusion well; 2. the head bounding box does not deform; 3. the head bounding box is visually close to a square, which facilitates track matching and analysis; 4. visual occlusion between targets is rare, which greatly simplifies the analysis of track image sequences.
Step S212, updating the track image list according to the head bounding boxes of the detected person targets.
In this embodiment, a track image sequence is a time-ordered series of head bounding boxes together with the images they were detected in. The track image list is a list of at least one such track image sequence. A track describes the change of a person target's position over time and binds the current image to each position. The embodiment updates the track image list according to the head bounding boxes of the person targets detected in each frame image.
In some embodiments, as shown in fig. 4, step S212 specifically includes:
step S2121, calculating a distance between each trace image sequence and each detected head bounding box to obtain a distance netlist of distances between all trace image sequences and each detected head bounding box.
The distance between a track image sequence and a detected head bounding box is defined as the difference between the coordinates of the four corners of the most recent head bounding box in the track image sequence and the coordinates of the corresponding four corners of the detected head bounding box.
Step S2122, sorting the distance netlist in ascending order of distance and removing entries whose distance exceeds a preset threshold.
Step S2123, performing track matching in that order and updating the track image list according to the matching result.
Specifically, if the head feature information of a detected head bounding box matches the head feature information of a track image sequence and the distance between the bounding boxes is smaller than the preset distance, the detected head bounding box is matched to that track image sequence, and the information of the track image sequence is updated with the position of the detected head bounding box. If a track image sequence has no matching head bounding box in the current frame and the difference between the timestamp of its last match and the current timestamp exceeds a preset time, the track image sequence is deleted. If a detected head bounding box has no matching track image sequence, a new track image sequence is created for it.
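A sketch of this update procedure follows, reusing the Track and TrackEntry structures from the earlier sketch. The corner-distance aggregation (summing absolute corner differences), the threshold values, and the omission of the head-appearance feature check are all assumptions made for illustration.

```python
import numpy as np

def corner_distance(a, b):
    # Distance between two (x1, y1, x2, y2) head bounding boxes: here the sum
    # of absolute differences over the four corner coordinates (assumed aggregation).
    corners = lambda r: np.array([(r[0], r[1]), (r[2], r[1]), (r[0], r[3]), (r[2], r[3])])
    return float(np.abs(corners(a) - corners(b)).sum())

def update_track_list(tracks, detections, now, dist_thresh=80.0, max_age=2.0):
    # 1. Distance netlist: every (track, detection) pair with its distance.
    pairs = [(corner_distance(t.latest_bbox(), d), ti, di)
             for ti, t in enumerate(tracks) for di, d in enumerate(detections)]
    # 2. Ascending sort; drop pairs whose distance exceeds the preset threshold.
    pairs = sorted(p for p in pairs if p[0] <= dist_thresh)
    # 3. Greedy matching in that order (the patent additionally requires the
    #    head appearance features to match, which this sketch omits).
    matched_t, matched_d = set(), set()
    for _, ti, di in pairs:
        if ti not in matched_t and di not in matched_d:
            tracks[ti].entries.append(TrackEntry(now, detections[di]))
            matched_t.add(ti); matched_d.add(di)
    # Delete tracks that have gone unmatched for longer than the preset time.
    tracks[:] = [t for i, t in enumerate(tracks)
                 if i in matched_t or now - t.entries[-1].timestamp <= max_age]
    # Create a new track image sequence for every unmatched detection.
    next_id = max((t.track_id for t in tracks), default=-1) + 1
    for di, d in enumerate(detections):
        if di not in matched_d:
            tracks.append(Track(next_id, [TrackEntry(now, d)]))
            next_id += 1
```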
Step S213, screening out the person targets with standing behavior in the current frame image according to the position feature information in the updated track image list, where the position feature information includes: the offset in the horizontal direction, the offset in the vertical direction, the standard deviation of the horizontal offset within a preset time, the standard deviation of the vertical offset within the preset time, the standard deviation of the area of the head bounding box, and the region range of the track.
In this embodiment, the horizontal offset describes the displacement of the track in the horizontal direction, and the vertical offset describes its displacement in the vertical direction. The standard deviation of the horizontal offset within a preset time, which may be set to 0.5 second, measures the stability of the person target's position in the horizontal direction, and the standard deviation of the vertical offset within the preset time measures its stability in the vertical direction. The standard deviation of the area of the head bounding box measures the stability of the target's size in the picture and can be used to eliminate false detections caused by targets walking in or away. The region range of the track describes the spatial extent of the track's image sequence within the image.
In this embodiment, screening out the person targets with standing behavior in the current frame image according to the position feature information in the updated track image list specifically comprises: if, for a track image sequence, the horizontal offset is smaller than or equal to a preset horizontal offset, the vertical offset is larger than or equal to a preset first vertical offset, the standard deviation of the horizontal offset within the preset time is smaller than or equal to a preset horizontal standard deviation, the standard deviation of the vertical offset within the preset time is smaller than or equal to a preset vertical standard deviation, and the standard deviation of the area of the head bounding box is smaller than or equal to a preset area standard deviation, then the person target corresponding to that track image sequence is considered to have a standing behavior and is screened out. Requiring the horizontal offset to be no larger than the preset horizontal offset eliminates false detections caused by irregular motion; requiring the vertical offset to be at least the preset first vertical offset confirms a clear rising motion of the person target; and requiring the offset standard deviations and the area standard deviation to stay below their preset bounds eliminates false detections from targets in motion and removes the influence of changes in target distance. When the position feature information of a track image sequence satisfies all of these conditions, the corresponding person target is considered to have a standing behavior. This greatly improves the accuracy of standing behavior detection, avoids placing excessive computational pressure on the subsequent visual classifier, and effectively improves the detection rate.
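The rule can be written as a predicate over a track, as in the following sketch. The threshold values are illustrative placeholders, not values from the patent, and the offset standard deviations are interpreted here as standard deviations of frame-to-frame center displacements, which is one plausible reading of the description.

```python
import numpy as np

def position_features(track, window=0.5):
    # Head-centre offsets and stability statistics over the last `window`
    # seconds of the track (image coordinates: y grows downward).
    now = track.entries[-1].timestamp
    recent = [e for e in track.entries if now - e.timestamp <= window]
    if len(recent) < 3:                 # fall back if the window is too sparse
        recent = track.entries[-3:]
    cx = np.array([(e.head_bbox[0] + e.head_bbox[2]) / 2 for e in recent])
    cy = np.array([(e.head_bbox[1] + e.head_bbox[3]) / 2 for e in recent])
    area = np.array([(e.head_bbox[2] - e.head_bbox[0]) *
                     (e.head_bbox[3] - e.head_bbox[1]) for e in recent])
    return {
        "dx": cx[-1] - cx[0],           # horizontal offset
        "dy": cy[-1] - cy[0],           # vertical offset (negative = upward)
        "sx": np.diff(cx).std(),        # stability of horizontal motion
        "sy": np.diff(cy).std(),        # stability of vertical motion
        "sa": area.std(),               # stability of apparent head size
    }

def has_standing_behavior(track, max_dx=10.0, min_rise=30.0,
                          max_sx=5.0, max_sy=15.0, max_sa=50.0):
    # All threshold values are assumed placeholders, not patent values.
    if len(track.entries) < 3:
        return False
    f = position_features(track)
    return (abs(f["dx"]) <= max_dx and   # little horizontal drift
            -f["dy"] >= min_rise and     # clear upward head motion
            f["sx"] <= max_sx and
            f["sy"] <= max_sy and
            f["sa"] <= max_sa)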
Step S220, determining the person targets with effective standing behavior according to the track image sequences corresponding to the screened person targets, using a visual classifier based on a deep-learning neural network.
The person targets screened out in step S210 may still include false detections. Step S220 therefore introduces a visual classifier based on a deep-learning neural network, which further analyzes the visual information contained in the track image sequence of each screened person target and filters out ineffective standing behaviors, so that person targets with effective standing behavior are screened out accurately. In this embodiment, an effective standing behavior is a change from a sitting posture to a standing posture, while ineffective standing behaviors include, for example, a change from sitting to walking, or a target that was already walking or standing. By analyzing the track image sequences of the screened person targets with the visual classifier, person targets with effective standing behavior can be detected accurately.
Specifically, as shown in fig. 5, in some embodiments, step S220 includes steps S221 to S224, which are as follows:
and step S221, confirming the starting point and the ending point of the time axis of the person target according to the longitudinal coordinate of the center point of the head bounding box in the track image sequence corresponding to each screened person target.
The start and end positions on the time axis can be determined from the range covered by the vertical coordinate of the head bounding box's center point in the track image sequence.
Step S222, uniformly sampling images from the track image sequence between the start point and end point of the time axis, either by time or by the vertical coordinate of the center point.
After the time range is determined from the start and end points of the time axis, samples can be taken uniformly within that interval according to the required number of samples. Alternatively, the range of the bounding box's vertical coordinate is determined from the start and end points, that range is divided uniformly, and the image corresponding to each division point is taken as a sample. Either way, the number of image samples is greatly reduced without affecting the detection result, which greatly improves data-processing efficiency.
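A sketch of both sampling modes, again over the Track structure assumed earlier; n_samples is an assumed parameter, as the patent does not fix the number of samples.

```python
import numpy as np

def sample_track_entries(track, n_samples=8, by="time"):
    # Uniformly pick n_samples entries between the start and end point of the
    # action, along the time axis or along the head-centre vertical coordinate.
    entries = track.entries
    if by == "time":
        axis = np.array([e.timestamp for e in entries])
    else:  # by the vertical coordinate of the head-bbox centre point
        axis = np.array([(e.head_bbox[1] + e.head_bbox[3]) / 2 for e in entries])
    targets = np.linspace(axis[0], axis[-1], n_samples)
    idx = sorted({int(np.argmin(np.abs(axis - t))) for t in targets})
    return [entries[i] for i in idx]
```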
Step S223, extending, for each sampled image obtained in step S222 from a screened person target's track image sequence, the track area range corresponding to that image downward by an equal length, widening it to the left and right about the vertical axis into a square bounding box, and cropping the image to that square bounding box.
The track area range corresponding to an image is the area covering the head bounding boxes of all images up to and including that image in the track image sequence. Extending this range downward by an equal length and then widening it symmetrically about the vertical axis into a square bounding box captures the posture of the person target above, below, and to both sides of the head, which helps the visual classifier judge the target's behavior.
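The geometric construction can be sketched as follows; clamping to the frame borders is an assumed detail.

```python
def square_crop_box(track_region, img_w, img_h):
    # track_region: (x1, y1, x2, y2) covering the head bounding boxes up to
    # this image in the track image sequence.
    x1, y1, x2, y2 = track_region
    y2 = min(y2 + (y2 - y1), img_h)   # equal-length downward extension
    side = y2 - y1                    # side of the target square
    cx = (x1 + x2) / 2                # the region's vertical centre axis
    x1 = max(cx - side / 2, 0.0)      # widen symmetrically left and right
    x2 = min(cx + side / 2, float(img_w))
    return int(x1), int(y1), int(x2), int(y2)

# Usage with a NumPy image array: crop = image[box[1]:box[3], box[0]:box[2]]
```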
Step S224, inputting the cropped images obtained in step S223 for each person target into the visual classifier based on a deep-learning neural network, and screening out the person targets with effective standing behavior.
Steps S221 to S223 greatly reduce the number of image samples fed into the visual classifier, which improves the classifier's processing efficiency while preserving accuracy. Once the visual classifier confirms an effective standing behavior in step S224, the corresponding person target is screened out.
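The patent does not disclose the classifier's architecture, so the following is only an assumed placeholder (PyTorch): a few convolutions over the square crops with two output classes, effective standing versus other.

```python
import torch
import torch.nn as nn

class StandingClassifier(nn.Module):
    # A deliberately small stand-in for the patent's deep-learning visual
    # classifier; input is a batch of square crops resized to 96x96 pixels.
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, 2)  # logits: {effective standing, other}

    def forward(self, x):             # x: (N, 3, 96, 96)
        return self.head(self.features(x).flatten(1))
```

One natural way to use it per track, under these assumptions: run the sampled crops through the classifier, average the softmax probability of the standing class, and accept the track when the average exceeds a threshold.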
Step S230, controlling the camera to lock onto the person target corresponding to the effective standing behavior.
Step S240, performing close-up shooting of the locked person target.
In this embodiment, after a person target with effective standing behavior is detected, the camera is controlled to lock onto the person target corresponding to that behavior, and the focal length and the position of the pan-tilt are adjusted synchronously for close-up shooting of the locked target, so that the person target appears in the center of the picture at an appropriate size; during close-up shooting, the focal length and the pan-tilt no longer move.
Step S250, deleting from the track image list all track image sequences other than the one corresponding to the locked person target, and subsequently updating the locked target's track image sequence from new image frames.
In this embodiment, once the system begins close-up shooting of the locked person target, all track image sequences except the locked target's are deleted; that is, only the track image sequence corresponding to the effective standing behavior is retained. Subsequent updates to the track image list only update the locked target's sequence, and no new sequences are added. This does not affect the close-up shooting of the locked target and reduces the system's data-processing load.
Step S260, if the person target with effective standing behavior is detected to sit down or is lost, or the close-up shooting duration exceeds a preset duration, restoring the camera's wide-angle shooting, clearing the track image list, and updating and maintaining the track image list from new image frames.
In this embodiment, if the person target is found to sit down or is lost, or the close-up shooting duration exceeds the preset duration, the system restores the camera's wide-angle shooting: it enters a reset state, resets the focal length and pan-tilt to their default state, clears the track image list, and begins the next round of detection.
Specifically, in one embodiment, detecting that the person target with effective standing behavior sits down means: if the vertical offset of that person target is detected to be larger than or equal to a preset second vertical offset, the person target is determined to have a sitting behavior.
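In terms of the feature sketch given earlier, this condition is a one-line predicate; the threshold is again an assumed placeholder standing in for the preset second vertical offset.

```python
def has_sitting_behavior(track, min_drop=30.0):
    # Positive dy means the head centre moved down (image y grows downward);
    # min_drop plays the role of the preset second vertical offset.
    return position_features(track)["dy"] >= min_drop
```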
This shooting method based on standing behavior detection needs only a lightweight algorithm to perform a preliminary screening of a large amount of image information, quickly eliminating most useless information and screening out person targets with standing behavior. High-precision behavior analysis of the screened targets' track image sequences by the visual classifier satisfies both the real-time requirement of updating the track image list and the high computational complexity of the classifier, greatly improving detection efficiency and accuracy, so that person targets with effective standing behavior are determined reliably. By quickly screening out person targets with standing behavior from the current frame image according to the track image list and a preset rule, and determining the effective standing behaviors among them with the deep-learning visual classifier, this technical scheme detects person targets with effective standing behavior accurately and efficiently and performs focused close-up shooting of them, greatly improving the user experience.
Example three
Fig. 6 is a flowchart of a shooting method based on standing behavior detection according to a third embodiment of the present invention. The embodiment is applicable to recording and broadcasting scenarios such as teaching and conferences, and the method can be executed by an intelligent camera with a pan-tilt head. Steps S310, S320, S340, S350, S360, and S370 of this embodiment are the same as steps S210 to S260 of the second embodiment; for their details, refer to the corresponding content of the second embodiment, which is not repeated here. This embodiment differs from the second embodiment in that step S320 is followed by step S330: if the person target with standing behavior is detected to sit down or is lost, person targets with standing behavior continue to be quickly screened out of the next frame image according to the track image list and the preset rule. The details are as follows:
Step S310, quickly screening out person targets with standing behavior in the current frame image according to the track image list and a preset rule.
Step S320, determining the person targets with effective standing behavior according to the track image sequences corresponding to the screened person targets, using a visual classifier based on a deep-learning neural network.
Step S330, judging whether the person target with effective standing behavior sits down or is lost; if so, executing step S340; if not, executing step S350. Step S340, continuing to quickly screen out person targets with standing behavior in the next frame image according to the track image list and the preset rule; after step S340 is executed, the process returns to step S320.
Step S350, controlling the camera to lock onto the person target corresponding to the effective standing behavior.
Step S360, performing close-up shooting of the locked person target.
In this embodiment, after the visual classifier has determined a person target with effective standing behavior, if that target is detected to sit down or is lost, person targets with standing behavior continue to be quickly screened out of the next frame image according to the track image list and the preset rule; otherwise the camera is controlled to lock onto the person target corresponding to the effective standing behavior and shoot it in close-up. This further improves the system's processing efficiency.
Step S370, deleting from the track image list all track image sequences other than the one corresponding to the locked person target, and subsequently updating the locked target's track image sequence from new image frames.
Step S380, if the person target with effective standing behavior is detected to sit down or is lost, or the close-up shooting duration exceeds the preset duration, restoring the camera's wide-angle shooting, clearing the track image list, and updating and maintaining the track image list from new image frames.
This embodiment needs only a lightweight algorithm to perform a preliminary screening of a large amount of image information, quickly eliminating most useless information and screening out person targets with standing behavior. After the visual classifier determines a person target with effective standing behavior, if that target is detected to sit down or is lost, person targets with standing behavior continue to be quickly screened out of the next frame image according to the track image list and the preset rule; otherwise the camera is controlled to lock onto the corresponding person target and shoot it in close-up. After close-up shooting is completed, the camera's wide-angle shooting is restored, that is, the pan-tilt and camera are returned to their initial positions. The embodiment can thus respond flexibly to the actual situation, further improving the user experience.
The shooting system based on standing behavior detection provided by the embodiments of the present invention can execute the shooting method based on standing behavior detection provided by any embodiment of the present invention, and possesses the corresponding functional modules and beneficial effects of that method.
Example four
Fig. 7 is a block diagram of a shooting system based on standing behavior detection according to a fourth embodiment of the present invention. The embodiment is applicable to recording and broadcasting scenarios such as teaching and conferences, and the shooting system may be implemented in an intelligent camera with a pan-tilt head. As shown in fig. 7, the shooting system based on standing behavior detection includes:
the target screening module 10, configured to quickly screen out person targets with standing behavior in the current frame image according to the track image list and a preset rule, where the track image list consists of the track image sequences of all person targets;
the behavior screening module 20, configured to determine the person targets with effective standing behavior according to the track image sequences corresponding to the screened person targets, using a visual classifier based on a deep-learning neural network;
the locking module 30, configured to control the camera to lock onto the person target with the effective standing behavior;
and the shooting module 40, configured to shoot the locked person target in close-up.
This embodiment quickly screens out person targets with standing behavior from the current frame image according to the track image list and a preset rule, determines the effective standing behaviors among the screened targets with a visual classifier based on a deep-learning neural network, accurately and efficiently detects person targets with effective standing behavior, and performs focused close-up shooting of them, greatly improving the user experience.
Example five
Fig. 8 is a block diagram of a shooting system based on standing behavior detection according to a fifth embodiment of the present invention. The embodiment is applicable to recording and broadcasting scenarios such as teaching and conferences, and the shooting system may be implemented in an intelligent camera with a pan-tilt head. As shown in fig. 8, the shooting system based on standing behavior detection includes:
the target screening module 10, configured to quickly screen out person targets with standing behavior in the current frame image according to the track image list and a preset rule, where the track image list consists of the track image sequences of all person targets.
In some embodiments, as shown in fig. 9, target screening module 10 includes:
the target detection unit 11, configured to detect all person targets in the current frame image and acquire the head bounding boxes of all person targets;
the list updating unit 12, configured to update the track image list according to the head bounding boxes of the detected person targets;
the target screening unit 13, configured to screen out the person targets with standing behavior in the current frame image according to the position feature information in the updated track image list, where the position feature information includes: the offset in the horizontal direction, the offset in the vertical direction, the standard deviation of the horizontal offset within a preset time, the standard deviation of the vertical offset within the preset time, the standard deviation of the area of the head bounding box, and the region range of the track.
In some embodiments, the target screening unit 13 is specifically configured to determine that the person target corresponding to a track image sequence has a standing behavior, and to screen out that person target, if the horizontal offset of the track image sequence is smaller than or equal to a preset horizontal offset, its vertical offset is larger than or equal to a preset first vertical offset, the standard deviation of the horizontal offset within the preset time is smaller than or equal to a preset horizontal standard deviation, the standard deviation of the vertical offset within the preset time is smaller than or equal to a preset vertical standard deviation, and the standard deviation of the area of the head bounding box is smaller than or equal to a preset area standard deviation.
The behavior screening module 20, configured to determine the person targets with effective standing behavior according to the track image sequences corresponding to the screened person targets, using a visual classifier based on a deep-learning neural network.
In some embodiments, as shown in fig. 10, behavior screening module 20 includes:
and a point confirming unit 21, configured to confirm a start point and an end point of a time axis of the person target according to a longitudinal coordinate of a center point of a head bounding box in the corresponding track image sequence of each screened person target.
And the sampling unit 22 is used for uniformly sampling the images in the track image sequence according to time or according to the central point vertical coordinate at the starting point and the end point of the time axis.
And the extending and cutting unit 23 is configured to extend the samples acquired by the sampling unit 22 in the track image sequence of each screened human target in equal length downwards according to the track area range corresponding to each image, extend the samples after extending the samples, take the longitudinal axis as an axis, extend the samples to the left and right to form a square bounding box, and cut the image corresponding to the square bounding box.
And the effective target screening unit 24 is used for respectively inputting the cut image obtained by the extended cutting unit 23 corresponding to each human target into the visual classifier based on the deep learning neural network, and screening out the human target with effective standing behavior.
The locking module 30, configured to control the camera to lock onto the person target with the effective standing behavior.
The shooting module 40, configured to shoot the locked person target in close-up.
The deletion updating module 50, configured to delete, after the shooting module 40 begins close-up shooting of the locked person target, all track image sequences in the track image list other than the one corresponding to the locked person target, and subsequently to update the locked target's track image sequence from new image frames.
The recovery updating module 60, configured to restore the camera's wide-angle shooting, clear the track image list, and update and maintain the track image list from new image frames when the person target with effective standing behavior is detected to sit down or is lost, or when the close-up shooting duration exceeds a preset duration.
In some embodiments, detecting that the person target with effective standing behavior sits down specifically means: if the vertical offset of that person target is detected to be larger than or equal to a preset second vertical offset, the person target is determined to have a sitting behavior.
In some embodiments, the shooting system based on standing behavior detection further includes a determining module 70, configured to determine whether the person target with effective standing behavior sits down or is lost. The target screening module 10 is further configured to, if the determining module 70 determines that the person target with effective standing behavior sits down or is lost, continue to quickly screen out person targets with standing behavior in the next frame image according to the track image list and the preset rule.
The system provided by this embodiment needs only a lightweight algorithm to perform a preliminary screening of a large amount of image information, quickly eliminating most useless information and screening out person targets with standing behavior. After the visual classifier determines a person target with effective standing behavior, if that target is detected to sit down or is lost, person targets with standing behavior are quickly screened out of the next frame image according to the track image list and the preset rule; otherwise the camera is controlled to lock onto the corresponding person target and shoot it in close-up. After close-up shooting is completed, the camera's wide-angle shooting is restored, that is, the pan-tilt and camera are returned to their initial positions; the system can thus respond flexibly to the actual situation, further improving the user experience.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (8)

1. A shooting method based on standing behavior detection is characterized by comprising the following steps:
A. quickly screening out person targets with standing behavior in the current frame image according to a track image list and a preset rule, the track image list consisting of the track image sequences of all person targets;
B. determining the person targets with effective standing behavior according to the track image sequences corresponding to the screened person targets, using a visual classifier based on a deep-learning neural network;
C. controlling a camera to lock onto the person target with the effective standing behavior;
D. performing close-up shooting of the locked person target;
wherein step B comprises:
B1. determining the start point and end point on the time axis of the person target's action according to the vertical coordinate of the center point of the head bounding box in each screened person target's track image sequence;
B2. uniformly sampling images from the track image sequence between the start point and end point of the time axis, either by time or by the vertical coordinate of the center point;
B3. for each sampled image obtained in step B2 from a screened person target's track image sequence, extending the track area range corresponding to that image downward by an equal length, widening it to the left and right about the vertical axis into a square bounding box, and cropping the image to that square bounding box;
B4. inputting the cropped images obtained in step B3 for each person target into the visual classifier based on the deep-learning neural network, and screening out the person targets with effective standing behavior.
2. The shooting method based on standing behavior detection as claimed in claim 1, wherein the step A of quickly screening out person targets with standing behavior in the current frame image according to the track image list and the preset rule comprises:
A1. detecting all person targets in the current frame image and acquiring the head bounding boxes of all person targets;
A2. updating the track image list according to the head bounding boxes of the detected person targets;
A3. screening out the person targets with standing behavior in the current frame image according to the position feature information in the updated track image list, the position feature information comprising: the offset in the horizontal direction, the offset in the vertical direction, the standard deviation of the horizontal offset within a preset time, the standard deviation of the vertical offset within the preset time, the standard deviation of the area of the head bounding box, and the region range of the track.
3. The shooting method based on standing behavior detection according to claim 2, wherein the step A3 of screening out the person targets with standing behavior in the current frame image according to the position feature information in the updated track image list specifically comprises: if, for a track image sequence, the horizontal offset is smaller than or equal to a preset horizontal offset, the vertical offset is larger than or equal to a preset first vertical offset, the standard deviation of the horizontal offset within the preset time is smaller than or equal to a preset horizontal standard deviation, the standard deviation of the vertical offset within the preset time is smaller than or equal to a preset vertical standard deviation, and the standard deviation of the area of the head bounding box is smaller than or equal to a preset area standard deviation, determining that the person target corresponding to that track image sequence has a standing behavior and screening out the corresponding person target.
4. The shooting method based on standing behavior detection according to claim 1, further comprising, after the close-up shooting of the locked person target in step D:
E. deleting from the track image list all track image sequences other than the one corresponding to the locked person target, and subsequently updating the locked target's track image sequence from new image frames;
F. if the person target with effective standing behavior is detected to sit down or is lost, or the close-up shooting duration exceeds a preset duration, restoring the camera's wide-angle shooting, clearing the track image list, and updating and maintaining the track image list from new image frames.
5. The shooting method based on standing behavior detection according to claim 4, wherein detecting that the person target with effective standing behavior sits down specifically comprises: if the vertical offset of the person target with effective standing behavior is detected to be larger than or equal to a preset second vertical offset, determining that the person target has a sitting behavior.
6. The shooting method based on standing behavior detection according to claim 1, further comprising, after determining the person targets with effective standing behavior according to the track image sequences of the screened person targets using the visual classifier based on the deep-learning neural network:
G. if the person target with standing behavior is detected to sit down or is lost, quickly screening out person targets with standing behavior in the next frame image according to the track image list and the preset rule.
7. A shooting system based on standing behavior detection, characterized in that the shooting system based on standing behavior detection comprises:
a target screening module, configured to quickly screen out person targets with standing behavior in the current frame image according to a track image list and a preset rule, the track image list consisting of the track image sequences of all person targets;
a behavior screening module, configured to determine the person targets with effective standing behavior according to the track image sequences corresponding to the screened person targets, using a visual classifier based on a deep-learning neural network;
a locking module, configured to control the camera to lock onto the person target with the effective standing behavior;
a shooting module, configured to shoot the locked person target in close-up;
wherein the behavior screening module comprises:
a point confirming unit, configured to determine the start point and end point on the time axis of the person target's action according to the vertical coordinate of the center point of the head bounding box in each screened person target's track image sequence;
a sampling unit, configured to uniformly sample images from the track image sequence between the start point and end point of the time axis, either by time or by the vertical coordinate of the center point;
an extending and cutting unit, configured to extend, for each sampled image from a screened person target's track image sequence, the track area range corresponding to that image downward by an equal length, widen it to the left and right about the vertical axis into a square bounding box, and crop the image to that square bounding box;
an effective target screening unit, configured to input the cropped images corresponding to each person target into the visual classifier based on the deep-learning neural network, and screen out the person targets with effective standing behavior.
8. The shooting system based on standing behavior detection of claim 7, wherein the target screening module comprises:
a target detection unit, configured to detect all person targets in the current frame image and acquire the head bounding boxes of all person targets;
a list updating unit, configured to update the track image list according to the head bounding boxes of the detected person targets;
a target screening unit, configured to screen out the person targets with standing behavior in the current frame image according to the position feature information in the updated track image list, the position feature information comprising: the offset in the horizontal direction, the offset in the vertical direction, the standard deviation of the horizontal offset within a preset time, the standard deviation of the vertical offset within the preset time, the standard deviation of the area of the head bounding box, and the region range of the track.
CN202110099321.1A 2021-01-25 2021-01-25 Shooting method and system based on standing behavior detection Active CN112887600B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110099321.1A CN112887600B (en) 2021-01-25 2021-01-25 Shooting method and system based on standing behavior detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110099321.1A CN112887600B (en) 2021-01-25 2021-01-25 Shooting method and system based on standing behavior detection

Publications (2)

Publication Number Publication Date
CN112887600A (en) 2021-06-01
CN112887600B (en) 2022-08-05

Family

ID=76051205

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110099321.1A Active CN112887600B (en) 2021-01-25 2021-01-25 Shooting method and system based on standing behavior detection

Country Status (1)

Country Link
CN (1) CN112887600B (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104778726A (en) * 2015-04-29 2015-07-15 深圳市保千里电子有限公司 Motion trail tracing method and system based on human body characteristics
JP6996514B2 * 2016-10-26 2022-01-17 Sony Group Corporation Information processing equipment, information processing systems, information processing methods, and programs

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105938540A (en) * 2015-03-03 2016-09-14 富士通株式会社 Behavior detection method and behavior detection apparatus
CN107396059A (en) * 2017-08-24 2017-11-24 杭州凡龙科技有限公司 A kind of panorama position adaptive student graphical analysis control method
CN108734724A (en) * 2018-05-22 2018-11-02 广州长鹏光电科技有限公司 A kind of classroom behavior parser based on image recognition
CN112019759A (en) * 2020-10-26 2020-12-01 深圳点猫科技有限公司 Method, device and equipment for tracking and shooting students in recorded and broadcast classes

Also Published As

Publication number Publication date
CN112887600A (en) 2021-06-01

Similar Documents

Publication Publication Date Title
CN107818573B (en) Target tracking method and device
US7995843B2 (en) Monitoring device which monitors moving objects
US7982774B2 (en) Image processing apparatus and image processing method
JP3768073B2 (en) Object tracking method and object tracking apparatus
EP3641298A1 (en) Method and device for capturing target object and video monitoring device
CN113011367A (en) Abnormal behavior analysis method based on target track
JP4764172B2 (en) Method for detecting moving object candidate by image processing, moving object detecting method for detecting moving object from moving object candidate, moving object detecting apparatus, and moving object detecting program
CN111767888A (en) Object state detection method, computer device, storage medium, and electronic device
JP4578864B2 (en) Automatic tracking device and automatic tracking method
WO2016031573A1 (en) Image-processing device, image-processing method, program, and recording medium
CN111402293A (en) Vehicle tracking method and device for intelligent traffic
Lira et al. A computer-vision approach to traffic analysis over intersections
JP4699056B2 (en) Automatic tracking device and automatic tracking method
CN113033521B (en) Perimeter dynamic early warning method and system based on target analysis
CN112887600B (en) Shooting method and system based on standing behavior detection
JP6799325B2 (en) Image correction device, image correction method, attention point recognition device, attention point recognition method and abnormality detection system
KR102171384B1 (en) Object recognition system and method using image correction filter
US20130265420A1 (en) Video processing apparatus, video processing method, and recording medium
Shahbaz et al. Probabilistic foreground detector for sterile zone monitoring
Feng et al. Research and system design of intelligent license plate recognition algorithm
JP4528103B2 (en) Image recognition device
Rao et al. Performance Analysis of Different Spatial Domain Methods for Traffic Control Using Image Processing: A LabVIEW Approach
JP4434689B2 (en) Object counting method and object counting apparatus
JP7332047B2 (en) Tracking Devices, Tracking Systems, Tracking Methods, and Programs
KR102147678B1 (en) Image merging stream reasoning surveilance method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant