CN115514887A - Control method and device for video acquisition, computer equipment and storage medium

Info

Publication number
CN115514887A
CN115514887A
Authority
CN
China
Prior art keywords
region
interest
gesture
picture
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211089801.0A
Other languages
Chinese (zh)
Inventor
张伟俊
贾配洋
侯俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Insta360 Innovation Technology Co Ltd
Original Assignee
Insta360 Innovation Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Insta360 Innovation Technology Co Ltd
Priority to CN202211089801.0A
Publication of CN115514887A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Abstract

The application relates to a control method and device, computer equipment, a storage medium and a computer program product for video acquisition. The method comprises: performing gesture recognition on a first video picture collected by shooting equipment to obtain a user gesture; judging, according to the relation between the user gesture and a preset gesture, whether a region of interest in the first video picture is detected; when the region of interest is detected but is not located at the target position of the first video picture, adjusting the shooting parameters of the shooting equipment and collecting a positioning picture according to the adjusted shooting parameters, so that the region of interest is located at the target position of the positioning picture; correcting the region of interest in the positioning picture to obtain a corrected region of interest; and controlling the shooting equipment to collect the video picture within the corrected region of interest to obtain a second video picture. With this method, video pictures can be shot under gesture control, and the shooting efficiency is higher.

Description

Control method and device for video acquisition, computer equipment and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method and an apparatus for controlling video capture, a computer device, a storage medium, and a computer program product.
Background
In application scenarios such as video conferences and live or recorded courses, shooting equipment is used to shoot a target person and a corresponding region of interest to obtain positioning pictures, which are then transmitted to the terminals of all participants for display. In conventional schemes, when a certain region of interest is to be captured, the shooting parameters of the shooting equipment are adjusted manually. However, manual adjustment usually requires multiple attempts to reach the desired framing, resulting in low shooting efficiency.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a control method, an apparatus, a computer device, a computer-readable storage medium and a computer program product for video acquisition that can improve shooting efficiency.
In a first aspect, the present application provides a method for controlling video acquisition. The method comprises:
performing gesture recognition on a first video picture acquired by shooting equipment to obtain a user gesture;
judging, according to the relation between the user gesture and a preset gesture, whether to locate a region of interest in the first video picture;
when the region of interest is located but is not at the target position of the first video picture, adjusting the shooting parameters of the shooting equipment, and collecting a positioning picture according to the adjusted shooting parameters so that the region of interest is located at the target position of the positioning picture;
correcting the region of interest in the positioning picture to obtain a corrected region of interest;
and controlling the shooting equipment to collect the video pictures in the corrected region of interest to obtain a second video picture.
In one embodiment, the correcting the region of interest in the positioning picture to obtain a corrected region of interest includes:
carrying out edge corner identification on the region of interest in the positioning picture;
correcting the region of interest into a preset shape according to the edge corner points;
and taking the region of interest with the preset shape as a corrected region of interest.
In one embodiment, the performing edge corner identification on the region of interest in the positioning picture includes:
detecting a positioning symbol in the positioning picture to obtain the position of the positioning symbol and the direction represented by the positioning symbol;
when the number of the positioning symbols is detected to meet a preset condition, and the positions of the positioning symbols are consistent with the directions of the positioning symbols, identifying edge corner points of the region of interest;
the correcting the region of interest into a preset shape according to the edge corner points comprises:
and projecting the region of interest into a preset shape according to the position of the positioning symbol.
In one embodiment, the performing edge corner identification on the region of interest in the positioning picture includes:
detecting the positioning picture through a target detection model obtained by pre-training to obtain the region position of the region of interest and the offset of each edge corner point of the region of interest;
the correcting the region of interest into a preset shape according to the edge corner points comprises:
and correcting the region of interest based on the position of the region of interest and the offset of the edge corner point, and projecting the region of interest into a preset shape.
In one embodiment, the method further comprises:
performing gesture detection on a gesture detection area of the second video picture; the gesture detection area is at least a partial area in the second video picture;
judging whether a video acquisition mode is kept or not according to the gesture of the second video picture;
if so, acquiring the second video picture;
if not, adjusting the shooting parameters of the shooting equipment so that the adjusted shooting equipment can acquire a third video picture again; the shooting range of the third video picture is larger than that of the second video picture.
In one embodiment, the performing gesture detection on the gesture detection area of the second video picture includes:
determining positions of a gesture detection area and edge corners of the region of interest in the second video picture;
and when the distance between the gesture detection area and the position of the edge corner point is smaller than or equal to a distance threshold value, taking the gesture detection area as an effective detection area to perform gesture detection.
In one embodiment, the performing gesture recognition on the first video picture acquired by the shooting device to obtain the user gesture includes:
acquiring frame images that are consecutive in time sequence in the first video picture;
extracting gesture features and gesture change features from each frame image;
performing feature fusion on the gesture features and the gesture change features to obtain fusion features;
and recognizing the user gesture in the frame image according to the fusion feature.
In a second aspect, the application further provides a control device for video acquisition. The device comprises:
the gesture detection module is used for performing gesture recognition on a first video picture acquired by the shooting equipment to obtain a user gesture;
the region-of-interest determining module is used for judging, according to the relation between the user gesture and a preset gesture, whether to locate the region of interest in the first video picture;
the acquisition adjusting module is used for adjusting the shooting parameters of the shooting equipment when the region of interest is located but is not at the target position of the first video picture, and collecting a positioning picture according to the adjusted shooting parameters so that the region of interest is located at the target position of the positioning picture;
the correction module is used for correcting the region of interest in the positioning picture to obtain a corrected region of interest;
and the video acquisition module is used for controlling the shooting equipment to acquire the corrected video pictures in the region of interest to obtain a second video picture.
In a third aspect, the application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the steps of controlling video capture in any of the embodiments described above when the processor executes the computer program.
In a fourth aspect, the present application further provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of controlling video capture in any of the embodiments described above.
In a fifth aspect, the present application further provides a computer program product. The computer program product comprising a computer program that when executed by a processor performs the steps of controlling video capture in any of the embodiments described above.
With the control method, device, computer equipment, storage medium and computer program product for video acquisition, gesture recognition is performed on a first video picture collected by shooting equipment to obtain a user gesture; according to the relation between the user gesture and a preset gesture, it is judged whether a region of interest in the first video picture is detected, so that the gesture decides whether the shooting parameters of the first video picture are changed into the shooting parameters of the second video picture. When the region of interest is detected but is not located at the target position of the first video picture, the shooting parameters of the shooting equipment are adjusted and a positioning picture is collected according to the adjusted parameters, changing the image acquisition direction so that the region of interest is located at the target position of the positioning picture; the region of interest in the positioning picture is then corrected to obtain a corrected region of interest, and the shooting equipment is controlled to collect the video picture within the corrected region of interest to obtain a second video picture. Since the shooting of the video picture is controlled by gestures, the shooting efficiency is higher.
Drawings
FIG. 1 is a diagram of an application environment of a control method for video acquisition in one embodiment;
FIG. 2 is a schematic flow chart illustrating a method for controlling video capture according to one embodiment;
FIG. 3 is a diagram illustrating a first video picture in the normal mode in one embodiment;
FIG. 4 is a diagram illustrating a first video picture in the tracking mode in one embodiment;
FIG. 5 is a schematic view of different shaped locators in one embodiment;
FIG. 6 is a schematic illustration of different orientation indicators in one embodiment;
FIG. 7 is a schematic illustration of a corrected region of interest in one embodiment;
FIG. 8 is a block diagram showing the construction of a control apparatus for video capture in one embodiment;
FIG. 9 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are intended only to illustrate the present application, not to limit it.
The control method for video acquisition provided by the embodiment of the application can be applied to the application environment shown in fig. 1, in which the terminal 102 communicates with the server 104 via a network. The data storage system may store data that the server 104 needs to process; it may be integrated on the server 104, or located on the cloud or another network server. The method provided in this embodiment may also be applied to the terminal 102 alone: in such an integrated embedded device, one or more of the processing, control and storage operations involved in the present scheme are performed by an AI algorithm.
The terminal 102 may be, but not limited to, various shooting devices, sensors, radars, personal computers, notebook computers, smart phones, tablet computers, internet of things devices, and portable wearable devices, and the internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart car-mounted devices, and the like. The portable wearable device can be a smart watch, a smart bracelet, a head-mounted device, and the like. The server 104 may be implemented as a stand-alone server or as a server cluster comprised of multiple servers.
In one embodiment, as shown in fig. 2, a method for controlling video capture is provided, which is described by taking the method as an example applied to the terminal 102 in fig. 1, and includes the following steps:
step 202, performing gesture recognition on a first video picture acquired by the shooting device to obtain a user gesture.
The first video picture is an image collected by the shooting equipment while its shooting parameters are kept within a certain range. The collected images can be continuous multi-frame images, images selected from continuous images at a certain frequency, or a certain frame image; their common point is that the shooting parameters of the shooting equipment are within a preset range. The shooting parameters include one or more of the shooting angle, the zoom coefficient and the exposure. The shooting equipment is electronic equipment with a shooting function, including pan-tilt (gimbal) cameras, video conferencing devices, action cameras, panoramic cameras, drones and the like, which is not limited here.
When the shooting parameters of the shooting equipment are within the preset range, the acquisition mode represented by the corresponding composition information is the preset acquisition mode of the first video picture. The preset acquisition mode includes a normal mode, shown in fig. 3, and a tracking mode, shown in fig. 4.
In the normal mode, the shooting equipment does not rotate the pan-tilt head, so the lens shoots in its normal state; in the tracking mode, images of one or more objects are collected to obtain the objects' real-time motion, and shooting follows that motion to achieve an intelligent composition effect.
In one embodiment, a tracking mode is set forth. A human body region corresponding to a human head region is detected in the collected first image, and the heights of the human head region and the human body region are estimated; a height difference is calculated from these heights, target composition information is determined according to the height difference, and the acquisition mode to be adopted is determined according to the target composition information; the parameters used when collecting the first image are adaptively adjusted according to the image acquisition offset produced by the target composition information, and an image is collected to obtain a second image. With the tracking mode, the subject is framed at half-body when close to the lens, with the head at a suitable height in the picture, and framed at full-body when far from the lens. This ensures that the subject has a reasonable composition effect in the second image whether close to or far from the lens, meeting users' framing needs at different shooting distances.
The first image and the second image can be images collected in any acquisition mode; when the image acquisition offset does not need adjusting, they can be collected in the same acquisition mode. The human head region is the region of the first image where a head is present; it is delimited along image dimensions set by one or more region elements of the head, where a region element can be at least one of the application scene and the business requirement, and the image dimension can be one or more of the front, side and back views of the head, or at least one of the front and side views of the face. The human body region is the region of the first image where a person's body is present; when a region of the first image may contain both the head and the body, the region is adjusted according to the corresponding rule. It can be understood that after a head region is detected, the body region corresponding to it is detected to determine the object that the combination indicates.
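A minimal sketch of how the height difference could drive the composition choice follows; the normalized threshold and function name are illustrative assumptions, not taken from this application:

```python
def choose_composition(head_h: float, body_h: float, frame_h: float) -> str:
    # Pick a framing mode from the head/body region heights (illustrative).
    ratio = (body_h - head_h) / frame_h  # normalized height difference
    if ratio < 0.25:  # assumed threshold: little body visible, subject is close
        return "half_body"
    return "full_body"
```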
In one embodiment, performing gesture recognition on the first video picture collected by the shooting equipment to obtain the user gesture includes: acquiring frame images that are consecutive in time sequence in the first video picture; extracting gesture features and gesture change features from each frame image; performing feature fusion on the gesture features and the gesture change features to obtain a fusion feature; and recognizing the user gesture in the frame images according to the fusion feature.
Illustratively, performing feature fusion on the gesture features and the gesture change features to obtain the fusion feature includes: taking the average of the sum of the gesture features and the gesture change features as the fusion feature. Correspondingly, recognizing the user gesture in the frame images according to the fusion feature includes: recognizing the user gesture of each frame image in the video from the fusion feature by fuzzy matching, so as to improve recognition accuracy.
Using both the gesture features and the gesture change features to recognize the user gesture can effectively improve recognition accuracy. The fusion feature integrates information of different dimensions from the gesture features and the gesture change features, so the corresponding meanings are combined accurately and effectively, which helps improve the accuracy of user gesture recognition. Illustratively, when the gesture feature represents word A and the gesture change feature also represents word A, the user gesture represented by the fusion feature is word A; when the gesture feature represents word A but the gesture change feature represents word B, prediction is performed again on the fusion feature obtained by fusing the two, so as to recognize the user gesture accurately.
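A hedged sketch of this fusion-and-match step; the cosine-similarity metric and the template dictionary are assumptions standing in for the fuzzy matching described above:

```python
import numpy as np

def fuse_features(gesture_feat: np.ndarray, change_feat: np.ndarray) -> np.ndarray:
    # Fusion feature = average of the sum of the two feature vectors, as above.
    return (gesture_feat + change_feat) / 2.0

def recognize(fused: np.ndarray, templates: dict[str, np.ndarray]) -> str:
    # Fuzzy matching stand-in: return the template with the highest cosine similarity.
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    return max(templates, key=lambda name: cos(fused, templates[name]))
```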
In one embodiment, performing gesture recognition on the first video picture collected by the shooting equipment to obtain the user gesture includes: acquiring multiple frame images with a continuous time sequence in the first video picture; performing target part detection on the multi-frame images to obtain a bounding box of at least one target part, the target part being a part with a larger area than the hand, such as any one of the human head, the human face, and the head and shoulders; performing gesture recognition according to the bounding box and the images of the at least one target part to determine the gesture corresponding to each frame image, the gestures corresponding to the plurality of frame images forming a gesture sequence; and determining the user gesture according to the gesture sequence.
Detecting the target part first and then performing gesture recognition in the local area near it requires less computation than gesture recognition over the full image; moreover, the hand occupies a relatively larger share of that local area, so the hand features are more salient, which helps improve the accuracy of gesture detection in each picture. In addition, deriving the trigger gesture from the gestures in multiple temporally consecutive frame images improves accuracy over determining it from a single frame and avoids false triggering to a certain extent. For example, if a user inadvertently makes a gesture, or a frame is misclassified, and that gesture appears only a few times in the gesture sequence, it will not be judged a valid trigger gesture, so the electronic device will not respond to it, reducing false triggering.
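A minimal sketch of deriving the trigger gesture from a gesture sequence; the dominance ratio is an assumed parameter, not specified in this application:

```python
from collections import Counter

def trigger_gesture(gesture_sequence: list[str], min_ratio: float = 0.6) -> str | None:
    # Accept a trigger only if one gesture dominates the per-frame sequence,
    # suppressing one-frame misclassifications and accidental gestures.
    if not gesture_sequence:
        return None
    gesture, count = Counter(gesture_sequence).most_common(1)[0]
    return gesture if count / len(gesture_sequence) >= min_ratio else None
```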
Step 204, judging whether to locate the region of interest in the first video picture according to the relation between the user gesture and the preset gesture.
In one embodiment, the determining whether to locate the region of interest in the first video frame according to the relationship between the user gesture and the preset gesture includes: determining a relationship between a user gesture and a preset gesture; and judging whether the region of interest in the first video picture is positioned or not according to the information represented by the relationship.
Illustratively, determining the relationship between the user gesture and the preset gesture includes: detecting whether the user gesture is the preset gesture with a preset target detection algorithm. The target detection algorithm can be a common detection method based on manual features, such as a template matching method, a key point matching method or a key feature method; it can also be one or more detection methods based on convolutional neural networks, which may adopt one or more models such as YOLO (You Only Look Once), SSD (Single Shot MultiBox Detector), R-CNN (Region-based Convolutional Neural Network) or Mask R-CNN (Mask Region-based Convolutional Neural Network). Illustratively, the YOLO model may be one or more of the YOLOv1-YOLOv5, YOLOR and YOLOX versions.
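As a hedged sketch of the simplest option above, template matching with OpenCV; the match threshold is an assumption:

```python
import cv2

def matches_preset_gesture(frame_gray, template_gray, threshold: float = 0.8) -> bool:
    # Slide the gesture template over the frame and keep the best match score.
    result = cv2.matchTemplate(frame_gray, template_gray, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, _ = cv2.minMaxLoc(result)
    return max_val >= threshold  # assumed score threshold for "is the preset gesture"
```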
A region of interest is a region containing at least partial information of a certain object. The region may be detected via positioning symbols or via a target detection model. Because the shooting parameters of the shooting equipment differ, the acquisition modes represented by the composition information also differ, so the region of interest in the first video picture does not necessarily correspond to the target position: it may deviate from it. The target position is used to determine the shooting parameters with which the shooting equipment can form the second video picture.
When the region of interest is at the target position of the first video picture, the first video picture includes the positioning picture. The relationship between the first video picture and the positioning picture may be one or more of the following: the first video picture is itself the positioning picture; one or more frame images are selected from the first video picture as the positioning picture; or the middle area of the first video picture is cropped to a certain specification as the positioning picture.
Step 206, when the region of interest is located but is not at the target position of the first video picture, adjusting the shooting parameters of the shooting equipment and collecting the positioning picture according to the adjusted shooting parameters, so that the region of interest is located at the target position of the positioning picture.
In one embodiment, the terminal adjusts the shooting parameters of the shooting equipment as follows: the terminal generates position difference information between the position of the region of interest in the first video picture and the target position; calculates, from the position difference information, the shooting parameter difference of the first video picture relative to the positioning picture; and adjusts the shooting parameters of the shooting equipment according to that difference. The shooting parameters include one or more of the shooting direction and the zoom factor of the shooting equipment.
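A minimal sketch of mapping the position difference to a shooting parameter difference; the degrees-per-pixel calibration constant is an assumption:

```python
def parameter_delta(roi_center, target_center, roi_h, target_h,
                    deg_per_px=(0.05, 0.05)):
    # Map the ROI's positional offset to pan/tilt angles and a zoom factor.
    dx = target_center[0] - roi_center[0]
    dy = target_center[1] - roi_center[1]
    pan = dx * deg_per_px[0]             # horizontal gimbal rotation (degrees)
    tilt = dy * deg_per_px[1]            # vertical gimbal rotation (degrees)
    zoom = target_h / max(roi_h, 1e-6)   # scale ROI height to the desired height
    return pan, tilt, zoom
```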
As for collecting the positioning picture according to the adjusted shooting parameters: after the terminal adjusts the shooting parameters of the shooting equipment, the positioning picture collected by the adjusted shooting equipment places the region of interest at the target position of the positioning picture.
After the positioning picture is collected, if the shooting equipment faces the object indicated by the region of interest within a certain angle range, the collected region of interest differs little from the preset shape; when the shooting equipment is outside that angle range relative to the object, the collected region of interest differs greatly from the preset shape, i.e. the region of interest is distorted, and the distortion is to be corrected.
Step 208, correcting the region of interest in the positioning picture to obtain a corrected region of interest.
The method for correcting the region of interest in the positioning picture may be a scheme based on edge detection or a scheme based on color threshold segmentation; both can correct the region of interest to obtain a corrected region of interest whose difference from the preset shape is within a certain range. Illustratively, if the region of interest in the positioning picture is a trapezoid and the preset shape is a rectangle, the corrected region of interest approximates a rectangle.
In one embodiment, the correcting the region of interest in the positioning frame to obtain a corrected region of interest includes: carrying out edge corner identification on the region of interest in the positioning picture; correcting the region of interest into a preset shape according to the edge corner points; and taking the region of interest with the preset shape as the corrected region of interest.
The edge corner points are edge points of the object to which the region of interest refers, and these edge points are the vertices of the preset shape. When the shooting equipment directly faces the object indicated by the region of interest, the positioning picture has no distortion, and connecting adjacent edge corner points in sequence produces a region of interest with the preset shape; when the position of the shooting equipment deviates from the object to a certain extent, the positioning picture is distorted, and connecting adjacent edge corner points in sequence produces a region of interest that needs correction.
Correcting the positioning picture by identifying edge corner points, rather than by detecting region edges or colors, prevents regions of similar shape or color from being falsely detected as the region of interest, and also avoids missed detections caused by insufficiently clear whiteboard edges, so the recognition accuracy and usability are high. The camera picture will not be switched to a falsely recognized region of interest, which improves the accuracy and the product's interaction experience.
Specifically, performing edge corner identification on the region of interest in the positioning picture includes: detecting the positioning symbols in the positioning picture to obtain the positions of the positioning symbols and the orientations they represent; and when the detected number of positioning symbols meets a preset condition and the positions of the positioning symbols are consistent with their orientations, identifying the edge corner points of the region of interest. The shapes of the positioning symbols are shown in fig. 5 (a)-5 (d), and the different orientations they represent are shown in fig. 6 (a)-6 (d).
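A hedged sketch of the count-and-consistency check, assuming four locators whose encoded orientation should point toward the region's interior; the data layout is an assumption:

```python
def locators_consistent(locators, expected: int = 4) -> bool:
    # locators: list of ((x, y), (ox, oy)) pairs, where (ox, oy) is a unit
    # vector for the orientation that the symbol encodes.
    if len(locators) != expected:
        return False
    cx = sum(pos[0] for pos, _ in locators) / len(locators)
    cy = sum(pos[1] for pos, _ in locators) / len(locators)
    for (x, y), (ox, oy) in locators:
        # A position is consistent with an orientation if the symbol points
        # toward the common center of all detected locators.
        if (cx - x) * ox + (cy - y) * oy <= 0:
            return False
    return True
```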
Correspondingly, the method for correcting the region of interest into a preset shape according to the edge corner points comprises the following steps: and projecting the region of interest into a preset shape according to the position of the positioning symbol.
Projecting the region of interest into the preset shape according to the positions of the positioning symbols is a perspective transformation process. The process includes: generating the corresponding perspective transformation matrix according to the vertex positions of the preset shape; and performing projection correction on the positions of the positioning symbols in the region of interest according to the perspective transformation matrix to obtain a corrected projection image, as shown in fig. 7.
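A minimal sketch of this projection with OpenCV, assuming the four locator positions are ordered clockwise from the top-left; the output size is an assumed example:

```python
import cv2
import numpy as np

def rectify_roi(picture, locator_pts, out_w=1280, out_h=720):
    # locator_pts: top-left, top-right, bottom-right, bottom-left corner points.
    src = np.float32(locator_pts)
    dst = np.float32([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]])
    matrix = cv2.getPerspectiveTransform(src, dst)  # perspective change matrix
    return cv2.warpPerspective(picture, matrix, (out_w, out_h))
```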
Here, a positioning symbol is a symbol with orientation-indicating properties, used to determine the region of interest defined by the preset shape. Since the positioning picture is captured by the shooting equipment, it may be distorted by the shooting angle, so the positions of the positioning symbols in the positioning picture differ from their pre-set positions. Because the positioning symbols themselves express their positions, the region of interest need not be detected from an obvious boundary, which gives higher recognition accuracy and a lower false detection rate for the region of interest.
When the detected number of positioning symbols does not meet the preset condition, or the positions of the positioning symbols are inconsistent with their orientations, the detection area of a re-collected positioning picture is determined according to the orientations, so as to update the number, positions and orientations of the positioning symbols in the positioning picture; when the updated number meets the preset condition and the updated positions are consistent with the updated orientations, the edge corner points of the region of interest are identified.
In the process of determining the detection area of the re-collected positioning picture from the orientations, the terminal adjusts the image acquisition direction according to the orientations represented by the detected positioning symbols, re-collects the image in the adjusted direction, and determines the detection area of the positioning symbols in the re-collected image according to the orientations. Determining the acquisition direction from the positioning symbols' orientations lets a pan-tilt camera find the region of interest automatically even when only part of the whiteboard is in frame, giving a high degree of intelligence and a good interaction experience.
Specifically, the edge corner identification of the region of interest in the positioning picture includes: and detecting the positioning picture through a target detection model obtained by pre-training to obtain the region position of the region of interest and the offset of each edge corner point of the region of interest.
The target detection model is a model for identifying edge corner points. Illustratively, the target detection model comprises a feature extraction network, a feature fusion and enhancement network, and prediction branch networks connected in sequence; the prediction branch networks comprise a classification branch network, a target region prediction branch network and an offset prediction branch network, where the target region prediction branch network and the offset prediction branch network share part of their network parameters. The feature extraction network, also called the backbone network, extracts video frame features of the positioning picture; the feature fusion and enhancement network, also called a feature pyramid, performs feature fusion and enhancement on the video frame features to obtain a feature map; the classification branch network determines, based on the feature map, whether the region type of the identified target region is the target type; the target region prediction branch network obtains the region position of the target region based on the feature map; and the offset prediction branch network obtains the offset of each edge corner point of the target region based on the feature map.
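A hedged sketch of the prediction branches in PyTorch; the channel counts and layer shapes are assumptions, and only the parameter sharing between the region and offset branches mirrors the description above:

```python
import torch
import torch.nn as nn

class PredictionBranches(nn.Module):
    def __init__(self, in_ch: int = 256, num_classes: int = 2):
        super().__init__()
        self.cls = nn.Conv2d(in_ch, num_classes, 1)  # region type classification
        # Convolution stem shared by the region and offset branches.
        self.shared = nn.Sequential(nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU())
        self.region = nn.Conv2d(in_ch, 4, 1)   # region position: cx, cy, w, h
        self.offset = nn.Conv2d(in_ch, 8, 1)   # 4 edge corners x (dx, dy)

    def forward(self, feature_map: torch.Tensor):
        shared = self.shared(feature_map)
        return self.cls(feature_map), self.region(shared), self.offset(shared)
```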
Processing the positioning picture with the target detection model yields the region position of the region of interest and the offset of each edge corner point accurately; correcting the region on the basis of the region position and the corner offsets improves accuracy and avoids the inaccurate image correction that unclear detected edges would cause.
Correspondingly, the method for correcting the region of interest into a preset shape according to the edge corner points comprises the following steps: and correcting the region of interest based on the position of the region of interest and the offset of the edge corner point, and projecting the region of interest into a preset shape.
The offset of an edge corner point is its offset relative to the center point of the preset detection frame. The corner coordinates can be obtained by decoding each edge corner point's offset relative to the center point, that is, by combining the offset, a side length of the preset detection frame, and the position coordinates of the center point.
Exemplarily, decoding the offset of each edge corner point relative to the center point to obtain its corner coordinates includes: multiplying each edge corner point's offset by the width or the height of the preset detection frame, and adding the product to the position coordinates of the center point to obtain the coordinates of that edge corner point.
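This decoding step as a minimal sketch; the per-corner (dx, dy) layout is an assumption:

```python
def decode_corners(center, box_w, box_h, offsets):
    # corner = center + offset * box side length; dx scales by the frame width,
    # dy by the frame height, as described above.
    cx, cy = center
    return [(cx + dx * box_w, cy + dy * box_h) for dx, dy in offsets]
```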
Specifically, projecting the region of interest into the preset shape includes: determining the target-shape vertex coordinates of the region of interest; computing the projection transformation matrix that maps the coordinates of each edge corner point to the target-shape vertex coordinates; and performing a perspective transformation on the region position of the region of interest based on the projection transformation matrix to obtain the image-corrected region of interest.
In this way, image correction is performed according to the coordinates of each edge corner point, and those coordinates as determined in this embodiment are themselves corrected, so the influence of unclear edges is better avoided, interference from the background information contained in the bounding rectangle is also avoided, and the correction accuracy is improved.
Step 210, controlling the shooting equipment to collect the video picture within the corrected region of interest to obtain a second video picture.
The second video picture is an image capturing the video picture within the corrected region of interest, and is typically a real-time frame image. It differs from the first video picture mainly in the range covered by the shooting parameters: the shooting parameters of the second video picture focus on the video picture within the corrected region of interest. The second video picture includes that video picture and may also include a boundary portion of it.
In one embodiment, after displaying the video picture in the corrected region of interest as the second video picture, the method further comprises: performing gesture detection on a gesture detection area of the second video picture; the gesture detection area is at least a partial area in the second video picture; judging whether the acquisition mode is kept or not according to the gesture of the second video picture; if so, acquiring a second video picture; if not, adjusting the shooting parameters of the shooting equipment so that the adjusted shooting equipment can acquire a third video picture again; the shooting range of the third video picture is larger than that of the second video picture.
The gesture detection area is the area in which gesture detection is performed. It may cover the full image of the second video picture or only part of it. When the gesture detection area is the full image, gesture detection over the whole second video picture makes it convenient for the user to change the interaction mode; when only part of the picture yields valid gestures, restricting the trigger area reduces the false trigger rate (other gestures in the whiteboard area being mistakenly detected as valid would otherwise cause false triggering) and also reduces the consumption of computing resources.
The third video picture and the second video picture have different acquisition modes, and the third video picture can share the preset acquisition mode of the first video picture. Specifically, to collect the third video picture, the shooting parameters of the shooting equipment are adjusted so that the region of interest becomes a partial area of the third video picture. It should be understood that the first and third video pictures are not necessarily captures of the same object, nor necessarily captured in the same acquisition mode: when the first video picture is in the normal mode, the third video picture switched to by one gesture in the second video picture may be in the tracking mode, while the third video picture switched to by another gesture may be in the normal mode.
Specifically, the acquisition mode on entering the second video picture is a whiteboard mode, and the acquisition modes of the first video picture and the third video picture are both the normal mode. A user gesture is detected in at least a partial area of the second video picture, and when the user gesture indicates exiting the whiteboard mode, the shooting parameters of the shooting equipment are adjusted back to the normal mode so that the adjusted shooting equipment re-collects the third video picture.
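A minimal sketch of this mode switching; the gesture names are illustrative assumptions:

```python
def next_mode(current: str, gesture: str | None) -> str:
    if current == "normal" and gesture == "enter_whiteboard":
        return "whiteboard"   # locate, rectify and collect the second video picture
    if current == "whiteboard" and gesture == "exit_whiteboard":
        return "normal"       # re-collect the wider third video picture
    return current            # keep the current acquisition mode
```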
Thus, the user can use the shooting equipment both for shooting the first video picture and for shooting the second video picture, and switch freely between the two. Moreover, the user can switch modes with a mid-air gesture, without operating an electronic product or a PC client and without manually correcting the recognition result, which gives a good interaction experience.
Here, performing gesture detection on the gesture detection area of the second video picture includes: determining, in the second video picture, the gesture detection area and the positions of the edge corner points of the region of interest; and when the distance between the gesture detection area and the position of an edge corner point is smaller than or equal to a distance threshold, taking the gesture detection area as a valid detection area for gesture detection. The distance may be a Euclidean distance. Treating only gestures near the edge corner points of the second video picture as valid reduces the false trigger rate and the consumption of computing resources.
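The validity check as a minimal sketch, assuming the area is represented by its center point:

```python
import math

def is_valid_detection_area(area_center, corners, dist_threshold: float) -> bool:
    # Valid only if the area lies within the distance threshold of some
    # edge corner point (Euclidean distance).
    return any(math.dist(area_center, c) <= dist_threshold for c in corners)
```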
The method also comprises a step of displaying the collected video pictures, wherein the step comprises the following steps: when the shooting equipment collects a first video picture, displaying the first video picture; when the shooting equipment collects a second video picture, displaying the second video picture; and when the shooting equipment collects the third video picture, displaying the third video picture.
In this control method for video acquisition, a region of interest in the first video picture is detected, so that the gesture decides whether the shooting parameters of the first video picture are changed into the shooting parameters of the second video picture. When the region of interest is detected but is not located at the target position of the first video picture, the shooting parameters of the shooting equipment are adjusted and a positioning picture is collected according to the adjusted parameters, changing the image acquisition direction so that the region of interest is located at the target position of the positioning picture; the region of interest in the positioning picture is then corrected, and the shooting equipment is controlled to collect the video picture within the corrected region of interest to obtain a second video picture. The shooting efficiency is higher, the picture display effect is better, and the reading experience of the picture content is better.
In one embodiment, the implementation is discussed more concretely through a specific application scenario. The object to which the region of interest refers is a whiteboard, the acquisition mode of the first video picture is the normal mode, and the acquisition mode of the second video picture is the whiteboard mode.
When the user gesture in the first video picture is determined to be the preset gesture, the shooting parameters of the shooting equipment are adjusted and the shooting equipment is controlled to enter the whiteboard mode. While entering the whiteboard mode, the shooting parameters are adjusted, the region of interest is located, and a positioning picture with the region of interest at the target position is determined; in the positioning picture, the quadrilateral picture area is corrected into a rectangle according to the 4 edge corner points of the whiteboard area. Enhanced display can also be applied, so the picture display effect and the reading experience of the picture content are better. In the second video picture, whether the video acquisition mode is kept is judged according to the gesture in the second video picture; if so, the second video picture is displayed; if not, the shooting parameters of the shooting equipment are adjusted so that the adjusted shooting equipment re-collects the third video picture. Thus the user can use the shooting equipment both for regular shooting and for whiteboard picture shooting, and switch freely between the two.
It should be understood that, although the steps in the flowcharts of the embodiments described above are shown in the sequence indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated otherwise, the steps are not strictly ordered and may be performed in other orders. Moreover, at least some of the steps in these flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and their execution order is not necessarily sequential: they may be performed in turn or alternately with other steps, or with at least part of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the present application further provides a control device for video capture, which is used for implementing the above-mentioned control method for video capture. The implementation scheme for solving the problem provided by the apparatus is similar to the implementation scheme described in the above method, so specific limitations in one or more embodiments of the control apparatus for video capture provided below may refer to the limitations in the above control method for video capture, and details are not described here again.
In one embodiment, as shown in fig. 8, there is provided a control apparatus for video capture, including: a gesture detection module 802, a region of interest determination module 804, an acquisition adjustment module 806, a correction module 808, and a video acquisition module 810, wherein:
the gesture detection module 802 is configured to perform gesture recognition on a first video picture acquired by a shooting device to obtain a user gesture;
a region-of-interest determining module 804, configured to judge, according to the relationship between the user gesture and a preset gesture, whether to locate a region of interest in the first video picture;
an acquisition adjusting module 806, configured to adjust the shooting parameters of the shooting equipment when the region of interest is located but is not at the target position of the first video picture, and to collect a positioning picture according to the adjusted shooting parameters so that the region of interest is located at the target position of the positioning picture;
a correction module 808, configured to correct the region of interest in the positioning frame, to obtain a corrected region of interest;
and the video acquisition module 810 is configured to control the shooting device to acquire the video picture in the corrected region of interest to obtain a second video picture.
In one embodiment, the correction module 808 is configured to perform edge corner identification on a region of interest in the positioning frame; correcting the region of interest into a preset shape according to the edge corner points; and taking the region of interest with the preset shape as the corrected region of interest.
In one embodiment, the correction module 808 is configured to detect the positioning symbols in the positioning picture, so as to obtain the positions of the positioning symbols and the orientations they represent; and, when the detected number of positioning symbols meets a preset condition and the positions of the positioning symbols are consistent with their orientations, to identify the edge corner points of the region of interest. Correspondingly, the video capture module 810 is configured to project the region of interest into a preset shape according to the positions of the positioning symbols.
In one embodiment, the correction module 808 is configured to detect the positioning frame through a target detection model obtained through pre-training, so as to obtain an area position of the region of interest and offsets of corner points of each edge of the region of interest; correspondingly, the video capture module 810 is configured to correct the region of interest based on the position of the region of interest and the offset of the edge corner point, and project the region of interest into a preset shape.
In one embodiment, the gesture detection module 802 is further configured to perform gesture detection on a gesture detection area of the second video frame; judging whether a video acquisition mode is kept or not according to the gesture of the second video picture; if so, acquiring the second video picture; if not, adjusting the shooting parameters of the shooting equipment so that the adjusted shooting equipment can acquire a third video picture again; the shooting range of the third video picture is larger than that of the second video picture.
In one embodiment, the gesture detection module 802 is further configured to determine, in the second video frame, a gesture detection area and positions of edge corners of the area of interest; and when the distance between the gesture detection area and the position of the edge corner point is smaller than or equal to a distance threshold value, performing gesture detection by taking the gesture detection area as an effective detection area.
In one embodiment, the gesture detection module 802 is further configured to acquire frame images that are consecutive in time sequence in the first video picture; extract gesture features and gesture change features from each frame image; perform feature fusion on the gesture features and the gesture change features to obtain a fusion feature; and recognize the user gesture in the frame images according to the fusion feature.
All or part of the modules in the control device for video acquisition can be realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure may be as shown in fig. 9. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit and an input device. The processor, the memory and the input/output interface are connected by a system bus, and the communication interface, the display unit and the input device are connected to the system bus through the input/output interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running them. The input/output interface of the computer device is used for exchanging information between the processor and external devices. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless communication can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by the processor to implement a control method for video acquisition. The display unit of the computer device is used to form a visible picture and can be a display screen, a projection device or a virtual reality imaging device; the display screen can be a liquid crystal display screen or an electronic ink display screen. The input device of the computer device can be a touch layer covering the display screen, a key, trackball or touchpad on the housing of the computer device, or an external keyboard, touchpad, mouse or the like.
Those skilled in the art will appreciate that the structure shown in fig. 9 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution of the present application is applied; a particular computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In an embodiment, a computer device is further provided, including a memory and a processor, the memory storing a computer program; when the processor executes the computer program, the steps in the above method embodiments are implemented.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the steps in the above method embodiments are implemented.
In an embodiment, a computer program product is provided, including a computer program; when the computer program is executed by a processor, the steps in the above method embodiments are implemented.
It should be noted that the user information (including but not limited to user equipment information and user personal information) and data (including but not limited to data for analysis, stored data, and displayed data) referred to in the present application are information and data authorized by the user or fully authorized by all parties, and the collection, use, and processing of the related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
It will be understood by those skilled in the art that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase-change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases involved in the embodiments provided herein may include at least one of relational and non-relational databases; non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors involved in the embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processing units, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and the like.
For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction among these combinations, they should all be considered within the scope of the present disclosure.
The above embodiments express only several implementations of the present application, and while their descriptions are specific and detailed, they should not be construed as limiting the scope of the present application. It should be noted that those of ordinary skill in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (10)

1. A method for controlling video capture, the method comprising:
performing gesture recognition on a first video picture acquired by shooting equipment to obtain a user gesture;
judging whether a region of interest in the first video picture is detected or not according to the relation between the user gesture and a preset gesture;
when the region of interest is detected and is not located at the target position of the first video picture, adjusting shooting parameters of the shooting equipment, and acquiring a positioning picture according to the adjusted shooting parameters so that the region of interest is located at the target position of the positioning picture;
correcting the region of interest in the positioning picture to obtain a corrected region of interest;
and controlling the shooting equipment to collect the video pictures in the corrected region of interest to obtain a second video picture.
2. The method according to claim 1, wherein the correcting the region of interest in the positioning picture to obtain a corrected region of interest comprises:
carrying out edge corner identification on the region of interest in the positioning picture;
correcting the region of interest into a preset shape according to the edge corner points;
and taking the region of interest with the preset shape as the corrected region of interest.
3. The method according to claim 2, wherein the edge corner identification of the region of interest in the positioning picture comprises:
detecting a positioning symbol in the positioning picture to obtain the position of the positioning symbol and the direction represented by the positioning symbol;
when the number of detected positioning symbols meets a preset condition and the directions represented by the positioning symbols are consistent with their positions, identifying the edge corner points of the region of interest;
the correcting the region of interest into a preset shape according to the edge corner points comprises:
and projecting the region of interest into a preset shape according to the position of the positioning symbol.
4. The method according to claim 2, wherein the performing edge corner identification on the region of interest in the positioning picture comprises:
detecting the positioning picture through a pre-trained target detection model to obtain the region position of the region of interest and the offset of each edge corner point of the region of interest;
the correcting the region of interest into a preset shape according to the edge corner points comprises:
and correcting the region of interest based on the position of the region of interest and the offset of the edge corner point, and projecting the region of interest into a preset shape.
5. The method of claim 1, further comprising:
performing gesture detection on a gesture detection area of the second video picture; the gesture detection area is at least a partial area in the second video picture;
judging whether a video acquisition mode is kept or not according to the gesture in the second video picture;
if yes, collecting the second video picture;
if not, adjusting the shooting parameters of the shooting equipment so that the adjusted shooting equipment can acquire a third video picture again; the shooting range of the third video picture is larger than that of the second video picture.
6. The method of claim 5, wherein the gesture detection of the gesture detection area of the second video frame comprises:
determining, in the second video picture, a gesture detection area and the positions of the edge corner points of the region of interest;
and when the distance between the gesture detection area and the position of the edge corner point is smaller than or equal to a distance threshold value, taking the gesture detection area as an effective detection area to perform gesture detection.
7. The method of claim 1, wherein the performing gesture recognition on the first video picture acquired by the shooting device to obtain the user gesture comprises:
acquiring frame images temporally consecutive with the first video picture;
extracting gesture features and gesture change features from each frame image;
performing feature fusion on the gesture features and the gesture change features to obtain fused features;
and recognizing the user gesture in the frame images according to the fused features.
8. A control device for video capture, the device comprising:
the gesture detection module is used for carrying out gesture recognition on a first video picture acquired by the shooting equipment to obtain a user gesture;
the region-of-interest determining module is used for judging whether a region of interest in the first video picture is detected according to the relation between the user gesture and a preset gesture;
the acquisition adjusting module is used for adjusting the shooting parameters of the shooting equipment when the region of interest is detected and is not located at the target position of the first video picture, and acquiring a positioning picture according to the adjusted shooting parameters so that the region of interest is located at the target position of the positioning picture;
the correction module is used for correcting the region of interest in the positioning picture to obtain a corrected region of interest;
and the video acquisition module is used for controlling the shooting equipment to acquire the video pictures in the corrected region of interest to obtain a second video picture.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
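By way of a non-limiting illustration of the correction recited in claims 2 to 4, projecting a region of interest onto a preset shape can be expressed as a standard perspective warp, for example with OpenCV; the corner ordering (top-left, top-right, bottom-right, bottom-left) and the output size below are assumptions:

```python
import cv2
import numpy as np

# Warp the four edge corner points of the region of interest onto a preset
# rectangular shape; frame is a BGR image, corners a list of four (x, y) pairs.
def rectify_roi(frame, corners, out_w=640, out_h=480):
    src = np.array(corners, dtype=np.float32)
    dst = np.array([[0, 0], [out_w, 0],
                    [out_w, out_h], [0, out_h]], dtype=np.float32)
    homography = cv2.getPerspectiveTransform(src, dst)   # 3x3 projection matrix
    return cv2.warpPerspective(frame, homography, (out_w, out_h))
```

In the variant of claim 4, the four corners would first be recovered from the detection model's region position plus the per-corner offsets before the same warp is applied.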
CN202211089801.0A 2022-09-07 2022-09-07 Control method and device for video acquisition, computer equipment and storage medium Pending CN115514887A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211089801.0A CN115514887A (en) 2022-09-07 2022-09-07 Control method and device for video acquisition, computer equipment and storage medium


Publications (1)

Publication Number Publication Date
CN115514887A true CN115514887A (en) 2022-12-23

Family

ID=84504491

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211089801.0A Pending CN115514887A (en) 2022-09-07 2022-09-07 Control method and device for video acquisition, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115514887A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160091976A1 (en) * 2014-09-30 2016-03-31 Xerox Corporation Dynamic hand-gesture-based region of interest localization
CN111263072A (en) * 2020-02-26 2020-06-09 Oppo广东移动通信有限公司 Shooting control method and device and computer readable storage medium
CN112866817A (en) * 2021-01-06 2021-05-28 浙江大华技术股份有限公司 Video playback method, device, electronic device and storage medium
CN113473112A (en) * 2021-06-23 2021-10-01 西安闻泰信息技术有限公司 Region-of-interest determining method, photographing control method, device, equipment and medium
CN114531615A (en) * 2020-11-03 2022-05-24 腾讯科技(深圳)有限公司 Video data processing method, video data processing device, computer equipment and storage medium


Similar Documents

Publication Publication Date Title
CN109325933B (en) Method and device for recognizing copied image
CN111586360B (en) Unmanned aerial vehicle projection method, device, equipment and storage medium
CN106250894B (en) Card information identification method and device
CN110300264B (en) Image processing method, image processing device, mobile terminal and storage medium
CN109087261B (en) Face correction method based on unlimited acquisition scene
CN111915483B (en) Image stitching method, device, computer equipment and storage medium
CN108958469B (en) Method for adding hyperlinks in virtual world based on augmented reality
WO2022206680A1 (en) Image processing method and apparatus, computer device, and storage medium
CN108717704B (en) Target tracking method based on fisheye image, computer device and computer readable storage medium
JP2016212784A (en) Image processing apparatus and image processing method
CN114298902A (en) Image alignment method and device, electronic equipment and storage medium
CN114640833A (en) Projection picture adjusting method and device, electronic equipment and storage medium
CN109785439B (en) Face sketch image generation method and related products
WO2022063321A1 (en) Image processing method and apparatus, device and storage medium
CN111290659A (en) Writing board information recording method and system and writing board
CN107578006B (en) Photo processing method and mobile terminal
WO2022206679A1 (en) Image processing method and apparatus, computer device and storage medium
CN116051736A (en) Three-dimensional reconstruction method, device, edge equipment and storage medium
CN115514887A (en) Control method and device for video acquisition, computer equipment and storage medium
CN115086625A (en) Correction method, device and system of projection picture, correction equipment and projection equipment
CN115660969A (en) Image processing method, model training method, device, equipment and storage medium
CN115550558A (en) Automatic exposure method and device for shooting equipment, electronic equipment and storage medium
CN113822899A (en) Image processing method, image processing device, computer equipment and storage medium
KR102628714B1 (en) Photography system for surpporting to picture for mobile terminal and method thereof
CN114237470A (en) Method and device for adjusting size of target image, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination