CN112287877A - Multi-role close-up shot tracking method - Google Patents
- Publication number
- CN112287877A (application CN202011294296.4A)
- Authority
- CN
- China
- Prior art keywords
- video
- close
- path
- video data
- deep learning
- Prior art date: 2020-11-18
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
- G06V40/165—Detection; Localisation; Normalisation using facial parts and geometric relationships
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
Abstract
The invention discloses a multi-role close-up shot tracking method, which comprises the following steps: acquiring multi-channel video data, constructing a CNN-based deep learning model, and performing human body detection and face detection on the multi-channel video data with the deep learning model; matching the identities of the persons appearing in each video channel according to the human body detection results and the face detection results; and selecting the shot with the optimal viewing angle for each identified person, and pushing the optimal-view image and/or video stream corresponding to each person. The invention can accurately identify the relevant person data in a monitoring system, improves the analysis capability for video data, provides high-quality analysis results, and delivers close-up images and/or video streams of detected persons in real time.
Description
Technical Field
The invention relates to the technical field of video image processing, and in particular to a multi-role close-up shot tracking method.
Background
Deep learning technology has continued to develop and advance, and has become one of the most active research directions today. Convolutional Neural Networks (CNNs), an important class of deep learning algorithms, are particularly well suited to image-related problems; they are widely used in the field of computer vision and play an important role in tasks such as face detection and image retrieval.
In the prior art, persons of interest often need to be monitored and identified. Most existing approaches deploy video surveillance equipment to monitor the target persons in real time, build a model on the large-scale data sets obtained from the monitoring, extract features, and output data related to the target persons. In many scenarios, however, monitoring and identification based on raw video data cannot effectively satisfy customized service requirements. For example, because child-abuse incidents at childcare institutions occur from time to time, guardians often wish to view the institution's surveillance video in real time, so as to ensure the safety of their children and keep home care consistent with institutional care. This need is difficult to meet for the following reasons: (1) directly viewing the video through a traditional monitoring system would violate the privacy of other children; (2) displaying the full surveillance picture could reveal the institution's distinctive educational content and thereby weaken its competitiveness; (3) even where an institution allows guardians to view surveillance video in real time through mobile-phone or computer software, the shooting angle of each surveillance camera is fixed while the children move about, so the cameras cannot adjust in real time to the movements of the persons to be detected; guardians therefore cannot always see a close-up of their child and must spend time locating the child in each monitoring picture, and the data viewed by all guardians is identical, with no customized video data being distributed. Based on these practical problems, there is an urgent need for a method that can track, in real time and across multiple surveillance video channels, close-up shots of specific persons appearing in a specific scene, and that realizes a highly customized, automatic video-data generation service to meet the customized data requirements of such scenarios.
Disclosure of Invention
In order to solve the above technical problem, the present invention provides a multi-role close-up shot tracking method.
In order to achieve this purpose, the technical solution of the invention is as follows:
a multi-character close-up shot tracking method, comprising the steps of:
obtaining multi-channel video data, constructing a deep learning model based on a CNN network, and passing the deep learning model
Respectively carrying out human body detection and human face detection on the multi-channel video data;
respectively carrying out identity matching on people appearing in each path of video according to the human body detection result and the human face detection result;
and respectively selecting the lens with the optimal view angle for different identity characters, and pushing the image and/or video stream with the optimal view angle corresponding to the different identity characters.
Preferably, before the step of pushing the optimal-view images and/or video streams corresponding to the different persons, the method further comprises: cropping the corresponding central region around each person in that person's optimal-view shot, and performing high-definition image restoration on the cropped region.
Preferably, the shot with the optimal viewing angle is the shot in which the largest number of face-detection key points is detected among the multiple video channels.
Preferably, the face-detection key points include the left inner and outer eye corners, the right inner and outer eye corners, the nasion point, the left nose wing, the right nose wing, the nasal septum point, the left and right lip corners, the upper lip, the lower lip, and the chin point.
Preferably, performing human body detection and face detection on the multi-channel video data specifically comprises the following steps:
constructing a CNN-based deep learning model, extracting image features from the multi-channel video data, and generating a plurality of position-box predictions and category predictions;
computing the loss of the position-box predictions and the category predictions against the label boxes, respectively, to obtain corresponding loss values;
and updating parameters of the deep learning model according to the loss value.
Preferably, the loss function adopted for position prediction is the smooth L1 loss:

$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5\,x_1^2, & -1 < x_1 < 1 \\ |x_2| - 0.5, & x_2 > 1 \ \text{or}\ x_2 < -1 \end{cases}$$

where $x_1$ and $x_2$ both denote the difference between the position prediction and the ground-truth position, over the value ranges $-1 < x_1 < 1$ and $x_2 > 1$ or $x_2 < -1$, respectively.
Preferably, the loss function for category prediction is the cross-entropy function:

$$L = -\sum_{i} y'_i \log(y_i)$$

where $y'_i$ denotes the data label and $y_i$ denotes the predicted probability value.
Preferably, the specific method for matching the identities of the persons in each video channel according to the human body detection results and the face detection results comprises the following steps:
invoking a pre-trained feature-vector extraction model, and extracting the feature vectors of the persons in each video channel from the video streams;
calculating the Euclidean distance between every two feature vectors;
obtaining similarity results for the persons in each video channel according to the calculated Euclidean distances;
and matching the identities of the persons in each video channel according to the similarity results.
Preferably, the Euclidean distance is calculated by the formula:

$$d(m, n) = \sqrt{\sum_{i} (m_i - n_i)^2}$$

where $m_i$ and $n_i$ are elements of any two feature vectors from different video streams.
Preferably, the multi-channel video data is acquired from different angles by a plurality of surveillance capture devices.
Based on the above technical solution, the invention has the following beneficial effects: with deep learning technology at its core, the invention first solves pedestrian detection and face detection using deep-learning object detection techniques, reuses the feature vectors of the detection network to match person identities across the video channels, automatically selects the best-view shot for each person appearing in the scene, and generates a close-up shot of each person in real time from that best-view shot. Even when the optimal viewing angle changes because a person moves, the method always selects the channel with the optimal viewing angle among all the video streams for processing, continuously tracks each person's close-up shot, provides close-up images and/or video streams of the detected persons in real time, and improves the user experience.
Drawings
The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.
FIG. 1 is a flow chart of the multi-role close-up shot tracking method of the invention;
FIG. 2 is a schematic diagram of video stream input and output in the multi-role close-up shot tracking method of the invention;
FIG. 3 is a flow chart of the algorithm functions in the multi-role close-up shot tracking method of the invention;
FIG. 4 is a schematic diagram of deep learning training in the multi-role close-up shot tracking method of the invention;
FIG. 5 is a schematic diagram of person identity matching in the multi-role close-up shot tracking method of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings of those embodiments.
Embodiment 1
As shown in FIGS. 1 to 5, the multi-role close-up shot tracking method of the invention comprises the following steps:
1. In certain scenarios (e.g., kindergartens, nurseries, nursing homes), multiple surveillance cameras capture video from multiple viewing angles.
2. The multi-channel video data acquired in step 1 is input into an AI inference edge server for integrated processing. The specific processing steps, shown in FIG. 1, are as follows: perform real-time human body detection and face detection on each video channel to obtain the human body detection results and face recognition results of the corresponding channels; match the identities of the persons appearing in each channel, so that each person of a specific identity is associated with the surveillance video from multiple viewing angles; select the optimal viewing angle for each person of a specific identity according to the number of detected face key points, where more detected key points indicate a better viewing angle; crop the central region of each person's optimal-view shot to obtain each person's exclusive "close-up shot"; and finally perform high-definition restoration on each person's close-up shot and push the output video stream.
The algorithm workflow is illustrated using FIG. 3 as an example. Assume two video streams, 1 and 2, are captured by different cameras at two viewing angles of the same scene, in which two persons appear: an adult and a child. First, a deep-learning detection algorithm performs human body detection and face detection on the videos from both viewing angles. Video streams 1 and 2 each yield two detection results, ID1 and ID2. The IDs detected in the two video streams are then matched; the result is that ID2 in video stream 1 and ID1 in video stream 2 correspond to the same person (the adult), while ID1 in video stream 1 and ID2 in video stream 2 correspond to the same person (the child). The adult and the child therefore each have shots from two viewing angles; the adult, for example, appears both as ID2 in video stream 1 and as ID1 in video stream 2. Next, the two shots are compared and the adult's optimal viewing angle is selected according to the number of face-detection key points, which include the left inner and outer eye corners, the right inner and outer eye corners, the nasion point, the left and right nose wings, the nasal septum point, the left and right lip corners, the upper and lower lips, and the chin point. More key points are detectable for the adult in shot 2, so shot 2 offers the better viewing angle for the adult. Likewise, shot 1 offers the better viewing angle for the child. Finally, the central regions of the two persons' optimal shots are cropped and the video streams are pushed.
Implementation details:
Human body detection and face detection are performed on each channel of video data: the object detection network, with deep learning at its core, is not limited to single-stage, two-stage, anchor-free, anchor-based, or other frameworks.
The training process is shown in FIG. 4. A large amount of image data annotated with person-position label boxes is required and is input into the deep learning model. The model uses a CNN (convolutional neural network) as its backbone, progressively extracts image features, and finally outputs predictions of multiple human-body/face position boxes together with predictions of the corresponding categories. Loss calculation is performed between the prediction results and the label boxes to obtain the network's loss values on a batch of data, and the parameters of the deep learning model are updated according to the loss values, so that the model's subsequent position predictions and category predictions move closer to the true values.
Here, the loss function for position prediction is the smooth L1 loss:

$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5\,x_1^2, & -1 < x_1 < 1 \\ |x_2| - 0.5, & x_2 > 1 \ \text{or}\ x_2 < -1 \end{cases}$$

where $x_1$ and $x_2$ both denote the difference between the position prediction and the ground-truth position, over the value ranges $-1 < x_1 < 1$ and $x_2 > 1$ or $x_2 < -1$, respectively.
The loss function for category prediction is the cross-entropy loss:

$$L = -\sum_{i} y'_i \log(y_i)$$

where $y'_i$ denotes the data label and $y_i$ denotes the predicted probability value. It can be seen that human body detection/face detection is trained as a multi-task learning process.
Person identity (ID) matching is performed for the persons appearing in each video channel according to the human body detection and face detection results: in order to match the identities of persons across different video streams (viewing angles), the invention computes the Euclidean distance between the features output by human body detection/face detection. The specific method is shown in FIG. 5: the detection model performs inference on the different videos to obtain the corresponding output features, and Euclidean distances are calculated between the features corresponding to the detection results of the different video streams to match person identities. In essence, pairwise Euclidean distances between multiple feature vectors are calculated, with the Euclidean distance given by:

$$d(m, n) = \sqrt{\sum_{i} (m_i - n_i)^2}$$

where $m_i$ and $n_i$ are elements of any two feature vectors from different video streams.
The above description covers only preferred embodiments of the multi-role close-up shot tracking method disclosed herein and is not intended to limit the scope of the embodiments of the present disclosure. Any modification, equivalent replacement, improvement, and the like made within the spirit and principle of the embodiments of the present disclosure shall fall within their protection scope.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The embodiments in the present specification are all described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Claims (10)
1. A multi-role close-up shot tracking method, comprising the following steps:
acquiring multi-channel video data, constructing a CNN-based deep learning model, and performing human body detection and face detection on the multi-channel video data with the deep learning model;
matching the identities of the persons appearing in each video channel according to the human body detection results and the face detection results;
and selecting the shot with the optimal viewing angle for each identified person, and pushing the optimal-view image and/or video stream corresponding to each person.
2. The multi-role close-up shot tracking method according to claim 1, wherein before the step of pushing the optimal-view images and/or video streams corresponding to the different persons, the method further comprises: cropping the corresponding central region of each person within that person's optimal-view shot, and performing high-definition image restoration on the cropped region.
3. The multi-role close-up shot tracking method according to claim 1, wherein the shot with the optimal viewing angle is the shot in which the largest number of face-detection key points is detected among the multiple video channels.
4. The multi-role close-up shot tracking method according to claim 3, wherein the face-detection key points include the left inner and outer eye corners, the right inner and outer eye corners, the nasion point, the left nose wing, the right nose wing, the nasal septum point, the left and right lip corners, the upper lip, the lower lip, and the chin point.
5. The multi-role close-up shot tracking method according to claim 1, wherein performing human body detection and face detection on the multi-channel video data comprises:
constructing a CNN-based deep learning model, extracting image features from the multi-channel video data, and generating a plurality of position-box predictions and category predictions;
computing the loss of the position-box predictions and the category predictions against the label boxes, respectively, to obtain corresponding loss values; and updating the parameters of the deep learning model according to the loss values.
6. The multi-role close-up shot tracking method according to claim 5, wherein the loss function used for position prediction is the smooth L1 loss:

$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5\,x_1^2, & -1 < x_1 < 1 \\ |x_2| - 0.5, & x_2 > 1 \ \text{or}\ x_2 < -1 \end{cases}$$

where $x_1$ and $x_2$ both denote the difference between the position prediction and the ground-truth position, over the value ranges $-1 < x_1 < 1$ and $x_2 > 1$ or $x_2 < -1$, respectively.
8. The multi-role close-up shot tracking method according to claim 1, wherein the specific method for matching the identities of the persons in each video channel according to the human body detection and face detection results comprises:
invoking a pre-trained feature-vector extraction model, and extracting the feature vectors of the persons in each video channel from the video streams;
calculating the Euclidean distance between every two feature vectors;
obtaining similarity results for the persons in each video channel according to the calculated Euclidean distances;
and matching the identities of the persons in each video channel according to the similarity results.
10. The multi-role close-up shot tracking method according to claim 1, wherein the multi-channel video data is captured from different angles by a plurality of surveillance capture devices.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011294296.4A (CN112287877B) | 2020-11-18 | 2020-11-18 | Multi-role close-up shot tracking method |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN112287877A | 2021-01-29 |
| CN112287877B | 2022-12-02 |
Family
ID=74397916
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202011294296.4A (Active, CN112287877B) | Multi-role close-up shot tracking method | 2020-11-18 | 2020-11-18 |
Country Status (1)

| Country | Link |
|---|---|
| CN (1) | CN112287877B |
Cited By (1)

| Publication number | Priority date | Publication date | Title |
|---|---|---|---|
| CN114542874A | 2022-02-23 | 2022-05-27 | Device for automatically adjusting photographing height and angle and control system thereof |
Patent Citations (13)

| Publication number | Priority date | Publication date | Title |
|---|---|---|---|
| CN108564052A | 2018-04-24 | 2018-09-21 | Multi-camera dynamic face recognition system and method based on MTCNN |
| WO2020001082A1 | 2018-06-30 | 2020-01-02 | Face attribute analysis method based on transfer learning |
| CN109117803A | 2018-08-21 | 2019-01-01 | Face image clustering method, device, server and storage medium |
| CN109543560A | 2018-10-31 | 2019-03-29 | Person segmentation method, device, equipment and computer storage medium for video |
| WO2020155873A1 | 2019-02-02 | 2020-08-06 | Multi-face tracking method based on deep appearance features and adaptive aggregation network |
| CN109829436A | 2019-02-02 | 2019-05-31 | Multi-face tracking method based on deep appearance features and adaptive aggregation network |
| CN109919977A | 2019-02-26 | 2019-06-21 | Video moving-person tracking and identity recognition method based on temporal features |
| CN109919097A | 2019-03-08 | 2019-06-21 | Face and key point joint detection system and method based on multi-task learning |
| CN110414415A | 2019-07-24 | 2019-11-05 | Human behavior recognition method oriented to classroom scenes |
| CN110852219A | 2019-10-30 | 2020-02-28 | Multi-pedestrian cross-camera online tracking system |
| CN111401238A | 2020-03-16 | 2020-07-10 | Method and device for detecting person close-up segments in video |
| CN111815675A | 2020-06-30 | 2020-10-23 | Target object tracking method and device, electronic equipment and storage medium |
| CN111783749A | 2020-08-12 | 2020-10-16 | Face detection method and device, electronic equipment and storage medium |
Non-Patent Citations (1)

Xiao Xuzhang, "Research and Implementation of a Multi-Camera Scheduling System Based on Face Tracking and Recognition", China Excellent Master's and Doctoral Dissertations Full-text Database (Master's), Information Science and Technology Series.
Also Published As

| Publication number | Publication date |
|---|---|
| CN112287877B | 2022-12-02 |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| 2021-07-14 | TA01 | Transfer of patent application right | Applicant after: Suzhou aikor Intelligent Technology Co.,Ltd., room d303, building g-1, Shazhouhu Science and Technology Innovation Park, Huachang Road, Yangshe Town, Zhangjiagang City, Suzhou City, Jiangsu Province, 215000. Applicant before: Shanghai Sike Intelligent Technology Co.,Ltd., building 6, 351 Sizhuan Road, Sijing Town, Songjiang District, Shanghai, 201601. |
| | GR01 | Patent grant | |