WO2021196409A1 - Video figure retrieval method and retrieval system based on deep learning - Google Patents
- Publication number
- WO2021196409A1 WO2021196409A1 PCT/CN2020/096015 CN2020096015W WO2021196409A1 WO 2021196409 A1 WO2021196409 A1 WO 2021196409A1 CN 2020096015 W CN2020096015 W CN 2020096015W WO 2021196409 A1 WO2021196409 A1 WO 2021196409A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video
- face
- frame
- image
- deep learning
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 15
- 238000013135 deep learning Methods 0.000 title claims abstract description 14
- 238000013528 artificial neural network Methods 0.000 claims abstract description 15
- 238000007781 pre-processing Methods 0.000 claims description 24
- 238000001514 detection method Methods 0.000 claims description 7
- 238000000605 extraction Methods 0.000 claims description 6
- 239000000284 extract Substances 0.000 claims description 3
- 230000001815 facial effect Effects 0.000 abstract 3
- 239000012634 fragment Substances 0.000 abstract 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7837—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
- G06F16/784—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content the detected or recognised objects being people
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/60—Analysis of geometric attributes
- G06T7/66—Analysis of geometric attributes of image moments or centre of gravity
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/12—Fingerprints or palmprints
- G06V40/1347—Preprocessing; Feature extraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/12—Fingerprints or palmprints
- G06V40/1365—Matching; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
- G06T2207/30201—Face
Definitions
- the invention relates to the technical field of video face retrieval, in particular to a video person retrieval method and retrieval system based on deep learning.
- the present invention provides a method and system that can enable streaming media service providers and smart set-top box service providers to perform character retrieval in videos.
- step c) input the preprocessed frame into the pre-trained deep neural network; if there is a face in the grayscale image, the deep neural network outputs the positions of all faces in the frame and crops the faces; if there is no face in the frame, return to step a);
- the service provider finds all the videos in the server that contain frames of the specific person; when a user wants to watch videos of that person, the service provider jumps the user to the videos containing the frames of that person.
- the frame step size is set to s, and s is a positive integer greater than or equal to 1, and one frame is selected from every s frames decoded in the digital video file and sent to step b) for processing.
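The frame-step selection described above can be sketched as follows; this is a minimal illustration, with the decoded frames stubbed as a plain list (a real system would pull frames from a video decoder):

```python
# Sketch of step a) with a frame step: decode a video at its frame rate and
# keep one frame out of every s decoded frames, s >= 1.

def sample_frames(frames, s):
    """Select one frame out of every s decoded frames."""
    if s < 1:
        raise ValueError("frame step s must be a positive integer >= 1")
    return frames[::s]

decoded = list(range(10))          # stand-in for 10 decoded frames
print(sample_frames(decoded, 3))   # -> [0, 3, 6, 9]
```

With s = 1 every decoded frame is processed; larger s trades recall for speed by skipping frames.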
- the decoded digital video frame is reduced proportionally (equal length-to-width ratio) to a fixed size and then converted into a grayscale image, which speeds up face detection.
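The preprocessing of step b) can be sketched as below. The patent does not specify the target size or the resize kernel, so mapping the longer edge to `target=320` is an assumption, and grayscale conversion uses the standard BT.601 luma weights:

```python
import numpy as np

# Sketch of step b): shrink a decoded frame to a fixed size while preserving
# the length-to-width ratio, then convert it to grayscale.

def scaled_shape(h, w, target=320):
    """New (h, w) with the longer edge equal to `target`, aspect ratio kept."""
    scale = target / max(h, w)
    return max(1, round(h * scale)), max(1, round(w * scale))

def to_gray(rgb):
    """Convert an (H, W, 3) RGB frame to an (H, W) grayscale image."""
    return rgb @ np.array([0.299, 0.587, 0.114])  # BT.601 luma weights

frame = np.ones((720, 1280, 3))
print(scaled_shape(*frame.shape[:2]))  # -> (180, 320)
print(to_gray(frame).shape)            # -> (720, 1280)
```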
- step d) includes the following steps: d-1) if the cropped face image is square, scale it to an M×M-pixel square image; d-2) if the cropped face image is not square, pad it with black borders into a square image and then scale it to M×M pixels.
- N is 128 in step e).
- M is 160 in step d).
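Steps d-1) and d-2) with M = 160 can be sketched as below. Nearest-neighbour sampling keeps the sketch dependency-free (a real system would use a proper bilinear resize), and placing the crop in the top-left corner of the black square is an assumption, since the patent does not specify the padding layout:

```python
import numpy as np

# Sketch of step d): pad a non-square face crop with black borders into a
# square, then scale it to M x M pixels (M = 160 per the description).

M = 160

def pad_to_square(img):
    """Pad an (H, W[, C]) crop with zeros (black) so that H == W."""
    h, w = img.shape[:2]
    side = max(h, w)
    out = np.zeros((side, side) + img.shape[2:], dtype=img.dtype)
    out[:h, :w] = img          # top-left placement is an assumption
    return out

def resize_nn(img, size):
    """Nearest-neighbour resize of a square image to (size, size)."""
    side = img.shape[0]
    idx = np.arange(size) * side // size
    return img[np.ix_(idx, idx)]

crop = np.ones((120, 90), dtype=np.uint8)   # a non-square face crop
face = resize_nn(pad_to_square(crop), M)
print(face.shape)                           # -> (160, 160)
```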
- a video person retrieval system based on deep learning including: a video decoding unit, a face detection unit, and a face feature extraction unit;
- the video decoding unit includes a decoding unit and a preprocessing unit: the decoding unit decodes the digital video file at its frame rate, and the preprocessing unit preprocesses the decoded digital video frames;
- the face detection unit includes a deep neural network and a preprocessing unit.
- the deep neural network outputs the position coordinates of all faces in the frame and crops the faces, which are then preprocessed by the preprocessing unit;
- the face feature extraction unit is composed of the Facenet network.
- frames or segments containing specific persons can be located in the digital video by decoding the video at its frame rate and preprocessing the frames, using the pre-trained deep neural network to obtain the face information, and then using the Facenet network to convert each face image into a feature vector; the Facenet network is likewise used to extract the feature values of the face images of the specific persons.
- the distance between the feature vector and the person's feature centroid is calculated using a formula, and the distance is compared with the feature sphere radius r to determine whether it is the specific person.
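The matching in steps f) and g) can be sketched as follows. The formula images are not reproduced in this text, so two assumptions are made: the feature centroid cen_i is the confidence-weighted mean of the target embeddings V_target,i, and l_cen is the Euclidean distance between V_unknown and that centroid. N = 128 is taken from the description; the value of the feature sphere radius r is not given, so 0.9 below is purely illustrative:

```python
import numpy as np

# Sketch of steps f)-g): build a confidence-weighted centroid from reference
# face embeddings, then accept an unknown embedding if it lies inside the
# feature sphere of radius r around that centroid.

def centroid(v_targets, rho):
    """Confidence-weighted centroid of target feature vectors (assumption)."""
    rho = np.asarray(rho, dtype=float)[:, None]   # confidence factors, 0 < rho <= 1
    return (rho * v_targets).sum(axis=0) / rho.sum()

def is_specific_person(v_unknown, cen, r):
    """True if the embedding lies strictly inside the sphere of radius r."""
    return bool(np.linalg.norm(v_unknown - cen) < r)

rng = np.random.default_rng(0)
v_targets = rng.normal(size=(5, 128))             # 5 reference 128-d embeddings
cen = centroid(v_targets, rho=[1.0, 0.8, 0.9, 1.0, 0.7])
print(is_specific_person(cen, cen, r=0.9))        # the centroid itself -> True
```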
- Figure 1 is a system structure diagram of the present invention.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Biophysics (AREA)
- Library & Information Science (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Human Computer Interaction (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Geometry (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Databases & Information Systems (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
Description
Claims (7)
- A video person retrieval method based on deep learning, characterized by comprising the following steps: a) decoding the digital video file at its frame rate; b) preprocessing the decoded digital video frames; c) inputting the preprocessed frame into a pre-trained deep neural network; if there is a face in the grayscale image, the deep neural network outputs the positions of all faces in the frame and crops the faces; if there is no face in the frame, returning to step a); d) preprocessing the cropped face images; e) inputting the preprocessed face image into the Facenet network, which converts the face picture into an N-dimensional feature vector V_unknown; f) inputting the face pictures of the i specific persons to be recognized into the Facenet network, which extracts the feature values V_target,i of the specific persons' face pictures, and computing, according to the stated formula, the feature centroid cen_i of each specific person, where ρ_i is the confidence factor of the face picture of the i-th specific person, 0 < ρ_i ≤ 1;
g) computing, according to the stated formula, the distance l_cen between the feature vector V_unknown and the person's feature centroid; if l_cen is smaller than the feature sphere radius r, the person is determined to be the specific person, and if l_cen is greater than or equal to the feature sphere radius r, the person is determined not to be the specific person; h) the service provider finds all the videos in the server that contain frames of the specific person, and when a user wants to watch videos of that person, the service provider jumps the user to the videos containing the frames of that person.
- The video person retrieval method based on deep learning according to claim 1, characterized in that in step a) a frame step size s is set, s being a positive integer greater than or equal to 1, and one frame out of every s frames decoded from the digital video file is selected and sent to step b) for processing.
- The video person retrieval method based on deep learning according to claim 1, characterized in that in step b) the decoded digital video frame is reduced proportionally (equal length-to-width ratio) to a fixed size and then converted into a grayscale image.
- The video person retrieval method based on deep learning according to claim 1, characterized in that the preprocessing operation in step d) comprises the following steps: d-1) if the cropped face image is square, scaling it to an M×M-pixel square image; d-2) if the cropped face image is not square, padding it with black borders into a square image and then scaling it to M×M pixels.
- The video person retrieval method based on deep learning according to claim 1, characterized in that N in step e) is 128.
- The video person retrieval method based on deep learning according to claim 4, characterized in that M in step d) is 160.
- A retrieval system implementing the video person retrieval method based on deep learning of claim 1, characterized by comprising: a video decoding unit, a face detection unit, and a face feature extraction unit; the video decoding unit includes a decoding unit and a preprocessing unit, the decoding unit decodes the digital video file at its frame rate, and the preprocessing unit preprocesses the decoded digital video frames; the face detection unit includes a deep neural network and a preprocessing unit, the deep neural network outputs the position coordinates of all faces in the frame and crops the faces, which are then preprocessed by the preprocessing unit; the face feature extraction unit is composed of the Facenet network.
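The claimed three-unit system (decoding, face detection, feature extraction) can be sketched as a pipeline skeleton. The detector and embedder below are stand-in stubs, not real models; wiring in an actual face detector and the Facenet network is assumed:

```python
# Sketch of the claimed retrieval system: a video decoding unit (decoder +
# preprocessing), a face detection unit, and a face feature extraction unit,
# composed into one embedding pipeline with a frame step s.

class RetrievalSystem:
    def __init__(self, decode, detect_faces, embed, step=1):
        self.decode = decode              # video -> iterator of frames
        self.detect_faces = detect_faces  # frame -> list of face crops
        self.embed = embed                # face crop -> feature vector
        self.step = step                  # frame step s (claim 2)

    def embeddings(self, video):
        """Yield one feature vector per face detected in the sampled frames."""
        for i, frame in enumerate(self.decode(video)):
            if i % self.step:
                continue                  # keep one frame out of every s
            for crop in self.detect_faces(frame):
                yield self.embed(crop)    # steps c)-e)

# Stub wiring: every "frame" contains one face, embedded as its own index.
system = RetrievalSystem(
    decode=lambda v: iter(v),
    detect_faces=lambda f: [f],
    embed=lambda c: [c],
    step=2,
)
print(list(system.embeddings(range(6))))  # -> [[0], [2], [4]]
```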
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010249216.7A CN111460226A (en) | 2020-04-01 | 2020-04-01 | Video character retrieval method and retrieval system based on deep learning |
CN202010249216.7 | 2020-04-01 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021196409A1 true WO2021196409A1 (en) | 2021-10-07 |
Family
ID=71682499
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/096015 WO2021196409A1 (en) | 2020-04-01 | 2020-06-15 | Video figure retrieval method and retrieval system based on deep learning |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111460226A (en) |
WO (1) | WO2021196409A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113705422B (en) * | 2021-08-25 | 2024-04-09 | 山东浪潮超高清视频产业有限公司 | Method for obtaining character video clips through human faces |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106658169A (en) * | 2016-12-18 | 2017-05-10 | 北京工业大学 | Universal method for segmenting video news in multi-layered manner based on deep learning |
CN107911748A (en) * | 2017-11-24 | 2018-04-13 | 南京融升教育科技有限公司 | A kind of video method of cutting out based on recognition of face |
CN108337532A (en) * | 2018-02-13 | 2018-07-27 | 腾讯科技(深圳)有限公司 | Perform mask method, video broadcasting method, the apparatus and system of segment |
CN108647621A (en) * | 2017-11-16 | 2018-10-12 | 福建师范大学福清分校 | A kind of video analysis processing system and method based on recognition of face |
US20190065825A1 (en) * | 2017-08-23 | 2019-02-28 | National Applied Research Laboratories | Method for face searching in images |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10943096B2 (en) * | 2017-12-31 | 2021-03-09 | Altumview Systems Inc. | High-quality training data preparation for high-performance face recognition systems |
CN108764067A (en) * | 2018-05-08 | 2018-11-06 | 北京大米科技有限公司 | Video intercepting method, terminal, equipment and readable medium based on recognition of face |
CN110188602A (en) * | 2019-04-17 | 2019-08-30 | 深圳壹账通智能科技有限公司 | Face identification method and device in video |
CN110543811B (en) * | 2019-07-15 | 2024-03-08 | 华南理工大学 | Deep learning-based non-cooperative examination personnel management method and system |
-
2020
- 2020-04-01 CN CN202010249216.7A patent/CN111460226A/en active Pending
- 2020-06-15 WO PCT/CN2020/096015 patent/WO2021196409A1/en active Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106658169A (en) * | 2016-12-18 | 2017-05-10 | 北京工业大学 | Universal method for segmenting video news in multi-layered manner based on deep learning |
US20190065825A1 (en) * | 2017-08-23 | 2019-02-28 | National Applied Research Laboratories | Method for face searching in images |
CN108647621A (en) * | 2017-11-16 | 2018-10-12 | 福建师范大学福清分校 | A kind of video analysis processing system and method based on recognition of face |
CN107911748A (en) * | 2017-11-24 | 2018-04-13 | 南京融升教育科技有限公司 | A kind of video method of cutting out based on recognition of face |
CN108337532A (en) * | 2018-02-13 | 2018-07-27 | 腾讯科技(深圳)有限公司 | Perform mask method, video broadcasting method, the apparatus and system of segment |
Also Published As
Publication number | Publication date |
---|---|
CN111460226A (en) | 2020-07-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Recasens et al. | Broaden your views for self-supervised video learning | |
CN111935491B (en) | Live broadcast special effect processing method and device and server | |
WO2018006825A1 (en) | Video coding method and apparatus | |
US9860593B2 (en) | Devices, systems, methods, and media for detecting, indexing, and comparing video signals from a video display in a background scene using a camera-enabled device | |
US20090290791A1 (en) | Automatic tracking of people and bodies in video | |
CN111954053B (en) | Method for acquiring mask frame data, computer equipment and readable storage medium | |
US20170060867A1 (en) | Video and image match searching | |
CN107343220B (en) | Data processing method and device and terminal equipment | |
US20170289624A1 (en) | Multimodal and real-time method for filtering sensitive media | |
US20170339417A1 (en) | Fast and robust face detection, region extraction, and tracking for improved video coding | |
US20100177194A1 (en) | Image Processing System and Method for Object Tracking | |
US20150227780A1 (en) | Method and apparatus for determining identity and programing based on image features | |
WO2021164216A1 (en) | Video coding method and apparatus, and device and medium | |
US20220147735A1 (en) | Face-aware person re-identification system | |
EP3769258A1 (en) | Content type detection in videos using multiple classifiers | |
CN114339360B (en) | Video processing method, related device and equipment | |
WO2021196409A1 (en) | Video figure retrieval method and retrieval system based on deep learning | |
Zhao et al. | Laddernet: Knowledge transfer based viewpoint prediction in 360◦ video | |
CN113011254A (en) | Video data processing method, computer equipment and readable storage medium | |
JP7211373B2 (en) | MOVING IMAGE ANALYSIS DEVICE, MOVING IMAGE ANALYSIS SYSTEM, MOVING IMAGE ANALYSIS METHOD, AND PROGRAM | |
CN110570441A (en) | Ultra-high definition low-delay video control method and system | |
Hayashida et al. | Real-time human detection using spherical camera for web browser-based telecommunications | |
GB2608063A (en) | Device and method for providing missing child search service based on face recognition using deep learning | |
CN116391200A (en) | Scaling agnostic watermark extraction | |
CN111954081A (en) | Method for acquiring mask data, computer device and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20929466 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20929466 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 05/04/2023) |
|