WO2018133666A1 - Method and apparatus for tracking video target - Google Patents

Method and apparatus for tracking video target

Info

Publication number
WO2018133666A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
target
tracked
current
facial
Prior art date
Application number
PCT/CN2018/070090
Other languages
French (fr)
Chinese (zh)
Inventor
余三思
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to CN201710032132.6A priority Critical patent/CN106845385A/en
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2018133666A1 publication Critical patent/WO2018133666A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00624Recognising scenes, i.e. recognition of a whole field of perception; recognising scene-specific objects
    • G06K9/00711Recognising video content, e.g. extracting audiovisual features from movies, extracting representative key-frames, discriminating news vs. sport content
    • G06K9/00718Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06K9/00724Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00221Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
    • G06K9/00228Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00221Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
    • G06K9/00268Feature extraction; Face representation
    • G06K9/00275Holistic features and representations, i.e. based on the facial image taken as a whole
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00221Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
    • G06K9/00288Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00624Recognising scenes, i.e. recognition of a whole field of perception; recognising scene-specific objects
    • G06K9/00711Recognising video content, e.g. extracting audiovisual features from movies, extracting representative key-frames, discriminating news vs. sport content
    • G06K9/00744Extracting features from the video content, e.g. video "fingerprints", or characteristics, e.g. by automatic extraction of representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00624Recognising scenes, i.e. recognition of a whole field of perception; recognising scene-specific objects
    • G06K9/00711Recognising video content, e.g. extracting audiovisual features from movies, extracting representative key-frames, discriminating news vs. sport content
    • G06K9/00758Matching video sequences

Abstract

A method and apparatus for tracking a video target. The method can be applied to a terminal or a server, and comprises: obtaining a video stream, and recognizing a face region according to a face detection algorithm so as to obtain a first to-be-tracked target corresponding to a first video frame (S210); performing face feature extraction based on a deep neural network on the first to-be-tracked target so as to obtain a first face feature, and adding the first face feature to a feature library corresponding to the first to-be-tracked target (S220); and recognizing a face region in a current video frame according to the face detection algorithm so as to obtain a current to-be-tracked target corresponding to the current video frame, performing face feature extraction based on the deep neural network on the current to-be-tracked target so as to obtain a second face feature, and performing feature matching between the current to-be-tracked target and the first to-be-tracked target according to the second face feature and the feature library so as to track the first to-be-tracked target starting from the first video frame, the feature library being updated in the tracking process according to extracted updated face features (S230).

Description

Method and Apparatus for Video Target Tracking
This application claims priority to Chinese Patent Application No. 201710032132.6, entitled "Method and Apparatus for Video Target Tracking", filed with the Chinese Patent Office on January 17, 2017, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technologies, and in particular to a video target tracking method and apparatus.
Background
Target tracking has long been a focus of research in computer vision and image processing, and is widely applied in fields such as intelligent surveillance, intelligent transportation, visual navigation, human-computer interaction, and defense reconnaissance.
Conventional target tracking algorithms typically distinguish targets using one or a few simple traditional feature matching techniques, for example matching on the color or shape of the image itself.
Summary
Embodiments of the present application provide a video target tracking method and apparatus that can improve the continuity and robustness of tracking.
An embodiment of the present application provides a video target tracking method, applied to a terminal or a server, the method comprising:
obtaining a video stream, and recognizing a face region according to a face detection algorithm to obtain a first to-be-tracked target corresponding to a first video frame;
performing face feature extraction based on a deep neural network on the first to-be-tracked target to obtain a first face feature, and storing the first face feature into a feature library corresponding to the first to-be-tracked target; and
recognizing a face region in the current video frame according to the face detection algorithm to obtain a current to-be-tracked target corresponding to the current video frame, performing face feature extraction based on the deep neural network on the current to-be-tracked target to obtain a second face feature, and performing feature matching between the current to-be-tracked target and the first to-be-tracked target according to the second face feature and the feature library, so as to track the first to-be-tracked target starting from the first video frame, the feature library being updated during tracking according to extracted updated face features.
An embodiment of the present application further provides a video target tracking apparatus, the apparatus comprising:
a processor and a memory connected to the processor, the memory storing machine-readable instruction modules executable by the processor, the machine-readable instruction modules comprising:
a detection module, configured to obtain a video stream and recognize a face region according to a face detection algorithm to obtain a first to-be-tracked target corresponding to a first video frame;
a face feature extraction module, configured to perform face feature extraction based on a deep neural network on the first to-be-tracked target to obtain a first face feature, and store the first face feature into a feature library corresponding to the first to-be-tracked target;
the detection module being further configured to recognize a face region in the current video frame according to the face detection algorithm to obtain a current to-be-tracked target corresponding to the current video frame;
the face feature extraction module being further configured to perform face feature extraction based on the deep neural network on the current to-be-tracked target to obtain a second face feature;
a tracking module, configured to perform feature matching between the current to-be-tracked target and the first to-be-tracked target according to the second face feature and the feature library, so as to track the first to-be-tracked target starting from the first video frame; and
a learning module, configured to update the feature library during tracking according to extracted updated face features.
An embodiment of the present application further provides a non-volatile computer-readable storage medium storing machine-readable instructions which, when executed by a processor, cause the processor to perform the following operations:
obtaining a video stream, and recognizing a face region according to a face detection algorithm to obtain a first to-be-tracked target corresponding to a first video frame;
performing face feature extraction based on a deep neural network on the first to-be-tracked target to obtain a first face feature, and storing the first face feature into a feature library corresponding to the first to-be-tracked target; and
recognizing a face region in the current video frame according to the face detection algorithm to obtain a current to-be-tracked target corresponding to the current video frame, performing face feature extraction based on the deep neural network on the current to-be-tracked target to obtain a second face feature, and performing feature matching between the current to-be-tracked target and the first to-be-tracked target according to the second face feature and the feature library, so as to track the first to-be-tracked target starting from the first video frame, the feature library being updated during tracking according to extracted updated face features.
Brief Description of the Drawings
FIG. 1 is a diagram of the application environment of a video target tracking method according to an embodiment of the present application;
FIG. 2 is an internal structure diagram of the terminal in FIG. 1 according to an embodiment of the present application;
FIG. 3 is an internal structure diagram of the server in FIG. 1 according to an embodiment of the present application;
FIG. 4 is a flowchart of a video target tracking method according to an embodiment of the present application;
FIG. 5 is a flowchart of obtaining a current to-be-tracked target according to an embodiment of the present application;
FIG. 6 is a flowchart of updating a feature library according to an embodiment of the present application;
FIG. 7 is a schematic diagram comparing the matching of the video target tracking algorithm and a template matching algorithm according to an embodiment of the present application;
FIG. 8 is another flowchart of obtaining a current to-be-tracked target according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a target tracking system corresponding to the video target tracking method according to an embodiment of the present application;
FIG. 10 is a schematic diagram of video tracking results obtained by the video target tracking algorithm according to an embodiment of the present application;
FIG. 11 is a schematic diagram of video tracking results obtained by a TLD tracking algorithm according to an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a video target tracking apparatus according to an embodiment of the present application;
FIG. 13 is another schematic structural diagram of the video target tracking apparatus according to an embodiment of the present application;
FIG. 14 is another schematic structural diagram of the video target tracking apparatus according to an embodiment of the present application;
FIG. 15 is another schematic structural diagram of the video target tracking apparatus according to an embodiment of the present application;
FIG. 16 is another schematic structural diagram of the video target tracking apparatus according to an embodiment of the present application.
Detailed Description
FIG. 1 is a diagram of the application environment in which a video target tracking method runs according to an embodiment of the present application. As shown in FIG. 1, the application environment includes a terminal 110, a server 120, and a video capture device 130, where the terminal 110, the server 120, and the video capture device 130 communicate through a network 140.
In some embodiments of the present application, the terminal 110 may be, but is not limited to, a smartphone, a tablet computer, a notebook computer, or a desktop computer. The video capture device 130 may be a camera arranged at a location such as the entrance of a building. The network 140 may be a wired network or a wireless network. In some embodiments of the present application, the video capture device 130 may send the captured video stream to the terminal 110 or the server 120, and the terminal 110 or the server 120 may perform target tracking on the video stream. In other embodiments of the present application, the video capture device 130 may itself perform target tracking on the video stream and send the tracking result to the terminal 110 for display.
In an embodiment of the present application, the internal structure of the terminal 110 in FIG. 1 is shown in FIG. 2. The terminal 110 includes a processor 1102, a graphics processing unit 1103, a storage medium 1104, a memory 1105, a network interface 1106, a display screen 1107, and an input device 1108 connected through a system bus 1101. The storage medium 1104 of the terminal 110 stores an operating system 11041 and a first video target tracking apparatus 11042, the apparatus 11042 being configured to implement a video target tracking method suitable for the terminal 110. The processor 1102 provides computing and control capabilities that support the operation of the entire terminal 110. The graphics processing unit 1103 provides at least the rendering capability for the display interface. The memory 1105 provides a runtime environment for the first video target tracking apparatus 11042 in the storage medium 1104. The network interface 1106 performs network communication with the video capture device 130, for example receiving the video stream captured by the video capture device 130. The display screen 1107 displays the tracking result and the like, and the input device 1108 receives commands or data input by the user. For a terminal 110 with a touch screen, the display screen 1107 and the input device 1108 may be the touch screen. The structure shown in FIG. 2 is merely a block diagram of the part of the structure related to the solution of the present application and does not limit the terminal 110 to which the solution is applied; a specific terminal 110 may include more or fewer components than shown in FIG. 2, combine certain components, or have a different component arrangement.
In an embodiment of the present application, the internal structure of the server 120 in FIG. 1 is shown in FIG. 3. The server 120 includes a processor 1202, a storage medium 1203, a memory 1204, and a network interface 1205 connected through a system bus 1201. The storage medium 1203 of the server 120 stores an operating system 12031, a database 12032 for storing data, and a second video target tracking apparatus 12033 configured to implement a video target tracking method suitable for the server 120. The processor 1202 of the server 120 provides computing and control capabilities that support the operation of the entire server 120. The memory 1204 of the server 120 provides a runtime environment for the second video target tracking apparatus 12033 in the storage medium 1203. The network interface 1205 of the server 120 communicates with the external video capture device 130 through a network connection, for example receiving the video stream sent by the video capture device 130.
As shown in FIG. 4, an embodiment of the present application provides a video target tracking method, applied to the terminal 110, the server 120, or the video capture device 130 in the above application environment. The method may be performed by the video target tracking apparatus provided by any embodiment of the present application, and includes the following steps:
Step S210: obtain a video stream, and recognize a face region according to a face detection algorithm to obtain a first to-be-tracked target corresponding to a first video frame.
Specifically, the video stream may be captured by a video capture device arranged at the entrance of a building. If the video target tracking method is applied to the video capture device, the video stream may be obtained directly from the memory of the video capture device. If the method is applied to a terminal or a server, the video capture device may send the captured video stream to the terminal or the server in real time.
Face detection refers to searching any given image according to a certain strategy to determine whether it contains a face and, if so, returning the position, size, and pose of the face. In some embodiments of the present application, the face region may be displayed by means of a proposal box (such as the rectangular box shown in FIG. 10) to obtain the first to-be-tracked target corresponding to the first video frame. Face detection is performed continuously on the video stream until a face is detected, and the detected face region is determined as the first to-be-tracked target. Because multiple faces may be detected in one frame, there may be multiple first to-be-tracked targets; if so, different face regions may be marked with different identification information, for example proposal boxes of different colors. The face detection algorithm may be chosen as needed, for example an NPD (Normalized Pixel Difference) face detection algorithm, or the NPD algorithm combined with other algorithms to determine the to-be-tracked target more accurately.
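The detection loop just described, scanning the stream until some frame contains a face and treating each detected region as a first to-be-tracked target, can be sketched as follows. The patent gives no code, so this is a minimal Python illustration in which `detect_faces` is a hypothetical stand-in for the detector (for example NPD-based) returning `(x, y, w, h)` proposal boxes:

```python
def first_tracked_targets(frames, detect_faces):
    """Scan the video stream frame by frame until some frame contains at
    least one face; return (frame_index, face_boxes) for that frame.

    `detect_faces` is a hypothetical detector callback that returns a
    list of (x, y, w, h) proposal boxes, one per detected face.
    """
    for index, frame in enumerate(frames):
        boxes = detect_faces(frame)
        if boxes:
            # Each box becomes a first to-be-tracked target; multiple
            # faces in one frame yield multiple targets.
            return index, boxes
    return None, []  # no face found anywhere in the stream
```

Each returned box would then be assigned its own target identifier and feature library, as the following step describes.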
Step S220: perform face feature extraction based on a deep neural network on the first to-be-tracked target to obtain a first face feature, and store the first face feature into a feature library corresponding to the first to-be-tracked target.
Specifically, a deep neural network is a machine learning model under deep learning. Deep learning is a branch of machine learning that uses multiple processing layers, composed of complex structures or multiple non-linear transformations, to perform high-level abstraction of data. The deep neural network may adopt a VGG (Visual Geometry Group) network structure, which distinguishes targets with higher recall and precision than traditional feature matching algorithms.
A target identifier is allocated to the first to-be-tracked target, a feature library is established, and an association between the target identifier and the feature library is established and saved. When there are multiple first to-be-tracked targets, a target identifier and a feature library may be allocated and established for each of them; an association is established between each first to-be-tracked target and its corresponding first face feature, and the association and the first face feature are stored into the feature library corresponding to that target. Performing feature matching with face features overcomes a weakness of target tracking algorithms that make poor use of face features: frequently tracking the wrong target, drifting, and being unable to correctly recover the tracked target after losing it.
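As a minimal sketch of the per-target feature store described above (the patent specifies no data structures, so the class and method names here are illustrative): each newly detected target receives a fresh target identifier, and face features extracted for that target are kept under its identifier for later matching:

```python
import itertools

class FeatureLibrary:
    """Illustrative per-target feature store: every newly detected
    target gets a fresh target identifier, and extracted face features
    are accumulated under that identifier."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._features = {}  # target identifier -> list of feature vectors

    def register(self, first_feature):
        # Allocate a target identifier and open its feature list with
        # the first extracted face feature.
        target_id = next(self._next_id)
        self._features[target_id] = [first_feature]
        return target_id

    def add_feature(self, target_id, feature):
        # Associate a further face feature with an existing target.
        self._features[target_id].append(feature)

    def features_of(self, target_id):
        return list(self._features[target_id])
```

Keeping the identifier-to-features association in one structure is what later lets a reappearing target be matched against features saved before it disappeared.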
Step S230: recognize a face region in the current video frame according to the face detection algorithm to obtain a current to-be-tracked target corresponding to the current video frame; perform face feature extraction based on the deep neural network on the current to-be-tracked target to obtain a second face feature; perform feature matching between the current to-be-tracked target and the first to-be-tracked target according to the second face feature and the feature library, so as to track the first to-be-tracked target starting from the first video frame; and update the feature library during tracking according to extracted updated face features.
Specifically, the second face feature is matched against each first face feature corresponding to the first to-be-tracked target in the feature library. The specific matching algorithm may be customized; for example, the Euclidean distance between the vectors corresponding to the face features may be computed directly, and whether the match succeeds is judged from that distance. If the second face feature matches the first face feature, the current to-be-tracked target is determined to be the first to-be-tracked target continuing its motion. If there are multiple current to-be-tracked targets, they form a current to-be-tracked target set; the second face feature of each target in the set is matched against the face features of each historical to-be-tracked target in the feature library. If a match succeeds, the target identifier of the historical to-be-tracked target is used as the target identifier of the current to-be-tracked target, and the position of the current target is the position of the historical target after its motion.
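The Euclidean-distance matching just described might be sketched as follows. The patent leaves the matching criterion customizable, so the 0.8 threshold and the plain-list feature vectors are illustrative assumptions, not values from the patent:

```python
import math

def euclidean(a, b):
    # Euclidean distance between two equal-length feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_target(query_feature, library, threshold=0.8):
    """Return the identifier of the historical target whose stored face
    features lie closest to `query_feature`, or None when no stored
    feature is within `threshold` (i.e. the detected face is new).

    `library` maps target identifier -> list of stored feature vectors.
    """
    best_id, best_dist = None, threshold
    for target_id, features in library.items():
        for stored in features:
            distance = euclidean(query_feature, stored)
            if distance < best_dist:
                best_id, best_dist = target_id, distance
    return best_id
```

Because every stored feature of every historical target is considered, a target can be re-identified from features saved many frames earlier.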
In some embodiments of the present application, the feature library may be updated during tracking according to extracted updated face features. For example, under continuously changing illumination or when the face turns sideways, updated face features of the first to-be-tracked target are obtained in other frames; if an updated feature differs from the first face feature, it may be added to the feature library corresponding to the first to-be-tracked target, and an association between the updated feature and the target identifier of the first to-be-tracked target is established and stored in the feature library. Thus, when the first to-be-tracked target appears in other frames at an even larger side-face angle or under an even stronger change of illumination, the second face feature of the current to-be-tracked target can be matched against the updated face features of the first to-be-tracked target, with a smaller difference than when matching directly against the first face feature. This increases the probability of successful feature matching, reduces the sensitivity of the tracking process to changes, tilting, occlusion, and illumination variation of the tracked target, and improves the continuity and robustness of tracking. Moreover, the feature library can store a large number of face features of the first to-be-tracked target across different frames; if the first to-be-tracked target disappears and later reappears, the face features already saved in its feature library before the disappearance can be used for matching, achieving good tracking of intermittently appearing targets. Updating the feature library through tracking and detection maintains a library of positive and negative samples, which amounts to a semi-online tracking algorithm: it achieves a better recall rate than fully offline tracking algorithms and a higher precision than fully online ones.
In the embodiments of the present application, a video stream is obtained; a face region is recognized according to a face detection algorithm to obtain a first to-be-tracked target corresponding to a first video frame; face feature extraction based on a deep neural network is performed on the first to-be-tracked target to obtain a first face feature, which is added to a feature library; a face region is recognized in the current video frame according to the face detection algorithm to obtain a current to-be-tracked target corresponding to the current video frame; face feature extraction based on the deep neural network is performed on the current to-be-tracked target to obtain a second face feature; and feature matching is performed between the current to-be-tracked target and the first to-be-tracked target according to the second face feature and the feature library, so as to track the first to-be-tracked target starting from the first video frame, the feature library being updated during tracking according to extracted updated face features. By performing feature matching with face features based on a deep neural network, the problems of frequently tracking the wrong target, drifting, and being unable to correctly recover a lost target, which arise when face features are poorly used, are resolved, saving the resources of the terminal or server device and increasing the processing speed of the processor of the terminal or server. Meanwhile, the feature library is continuously updated during tracking and can store the different face features of the to-be-tracked target in different states, improving the success rate of face feature matching, reducing the sensitivity of the tracking process to changes, tilting, occlusion, and illumination variation of the tracked target, and improving the continuity and robustness of tracking, thereby further increasing the processing speed of the processor of the terminal or server.
In an embodiment of the present application, the method further includes: identifying, by a face recognition algorithm and according to the face state of each target to be tracked, the face identity information corresponding to each target to be tracked, and obtaining the target feature corresponding to the face identity information by an image feature extraction algorithm.

In some embodiments of the present application, the face state refers to the deflection-angle state of the face. When a frontal face is detected, the corresponding face identity information can be obtained by the face recognition algorithm. The face identity information describes the identity associated with the face. Face recognition here means searching for and matching the feature data extracted from a face image against feature templates stored in a database, such as facial feature templates, and determining the face identity information according to the degree of similarity. For example, when performing face recognition on employees entering an enterprise, a feature template for each employee, such as a facial feature template, is stored in the database in advance; an employee's face identity information is then obtained by comparing the feature data extracted from the current face image with the stored facial feature templates. The specific content of the face identity information can be customized as needed, e.g. employee name, employee number, and department.
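The template search-and-match step described above can be sketched as follows. This is an illustrative sketch only: the in-memory template database `TEMPLATE_DB`, the 3-dimensional features, cosine similarity as the degree-of-similarity measure, and the 0.8 threshold are all assumptions not specified in this application.

```python
import math

# Hypothetical template database: face identity information -> stored facial
# feature template (dimensions and values are illustrative only).
TEMPLATE_DB = {
    "employee_001": [0.9, 0.1, 0.3],
    "employee_002": [0.2, 0.8, 0.5],
}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def recognize_face(face_feature, threshold=0.8):
    """Search the template database for the stored template most similar to
    the extracted face feature; return its face identity information, or
    None if no similarity exceeds the threshold."""
    best_id, best_sim = None, threshold
    for identity, template in TEMPLATE_DB.items():
        sim = cosine_similarity(face_feature, template)
        if sim > best_sim:
            best_id, best_sim = identity, sim
    return best_id
```

A feature identical to a stored template matches that identity, while a feature dissimilar to every template yields no identity, in which case tracking would fall back on the other target features described below.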
The image feature extraction algorithm extracts feature data according to characteristics of the image itself, such as color features, texture features, shape features, and spatial relationship features, to obtain the target feature, where the target feature is the set of all extracted feature data. An association is established between the target feature and the face identity information, covering characteristics such as clothing color, clothing texture, body shape, and height ratio, and the association is stored in the database. In this way, when the face is deflected or occluded, the identity can still be recognized and the face region determined from the other target features. In an embodiment of the present application, as shown in FIG. 5, the step in step S230 of identifying the face region in the current video frame according to the face detection algorithm to obtain the current target to be tracked corresponding to the current video frame includes:
Step S231: determine whether a face region is recognized in the current video frame according to the face detection algorithm; if no face region is recognized, obtain the current image feature corresponding to the current video frame according to the image feature extraction algorithm.

Specifically, if no face region is recognized in the current video frame according to the face detection algorithm, the detection may have failed because the face is turned to the side; in this case the current image feature corresponding to the current video frame needs to be obtained according to the image feature extraction algorithm.

Step S232: compare the current image feature with the target features to obtain the matched target face identity information, and obtain the current target to be tracked corresponding to the current video frame according to the target face identity information.

Specifically, since the target features have already been associated with face identity information, the current image feature can be compared with each target feature to compute a similarity; if the similarity exceeds a threshold, the match succeeds, and the target face identity information corresponding to the matched target feature can be obtained, so that the current target to be tracked corresponding to the current video frame is obtained from the target face identity information. The current target to be tracked is then matched with the first target to be tracked through the face identity information, thereby tracking the first target to be tracked.
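The appearance-based fallback of step S232 can be sketched as follows. Everything here is assumed for illustration: `TARGET_FEATURES` stands for the previously stored target-feature-to-identity associations, the features are normalized clothing-color histograms, histogram intersection is the similarity measure, and 0.7 is the threshold — none of these specifics appear in this application.

```python
# Hypothetical associations established earlier:
# face identity information -> target feature (illustrative color histogram).
TARGET_FEATURES = {
    "employee_001": [0.50, 0.30, 0.20],
    "employee_002": [0.10, 0.20, 0.70],
}

def match_identity(current_feature, threshold=0.7):
    """Compare the current image feature against each stored target feature
    and return the associated face identity information when the best
    similarity exceeds the threshold; otherwise return None."""
    best_id, best_sim = None, threshold
    for identity, target in TARGET_FEATURES.items():
        # Histogram intersection: sum of element-wise minima.
        sim = sum(min(c, t) for c, t in zip(current_feature, target))
        if sim > best_sim:
            best_id, best_sim = identity, sim
    return best_id
```

The returned identity then selects the current target to be tracked even though no face region was detected in the current frame.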
In this embodiment of the present application, face identity information is introduced into target tracking, and face recognition is combined with image features, so that the target can still be tracked when the face detection algorithm cannot recognize a face region, further improving the continuity and robustness of tracking.
In an embodiment of the present application, step S220 may include: obtaining first face identity information corresponding to the first target to be tracked, establishing a first facial feature set corresponding to the first face identity information, adding the first facial feature to the first facial feature set, and storing the first facial feature set into the feature library corresponding to the first target to be tracked.

Specifically, face recognition may be performed on the first target to be tracked to obtain its first face identity information. The first facial feature set stores the first facial features of the first target to be tracked in different states during its motion, where the different states include different angles, different illumination, different degrees of occlusion, and so on. The first facial feature obtained by facial feature extraction is added to the first facial feature set, an association is established between the first facial feature set and the first face identity information, and the association and the first facial feature set are stored into the feature library corresponding to the first target to be tracked.
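One way the feature library described here could be organized is sketched below. The class name, method names, and in-memory dictionary are illustrative assumptions; the application does not prescribe a storage layout.

```python
class FeatureLibrary:
    """Illustrative in-memory feature library: each face identity maps to the
    set (here a list) of facial features observed for that target in its
    different states during motion."""

    def __init__(self):
        self._sets = {}  # face identity information -> list of feature vectors

    def create_set(self, identity, first_feature):
        # Establish the association between the identity and its facial
        # feature set, seeded with the first facial feature.
        self._sets[identity] = [first_feature]

    def get_set(self, identity):
        # Return the facial feature set for an identity (empty if unknown).
        return self._sets.get(identity, [])

lib = FeatureLibrary()
lib.create_set("target_A", [0.1, 0.2, 0.3])
```

Later steps (S233 and S234 below) would look up a set by identity and extend it as new face states are observed.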
In an embodiment of the present application, as shown in FIG. 6, the step in step S230 of updating the feature library during tracking according to the newly extracted facial features may include:

Step S233: obtain the current face identity information corresponding to the current target to be tracked, and obtain the first facial feature set corresponding to the current face identity information from the feature library.

Specifically, in one embodiment, the current face identity information corresponding to the current target to be tracked may be obtained by performing face recognition on it. In another embodiment, the current image feature corresponding to the current target to be tracked may be obtained by applying the image feature extraction algorithm to it, the current image feature is matched against the target features, and the face identity information corresponding to the matched target feature is used as the current face identity information, so that the current face identity information can be obtained even when no face region can be recognized for the current target to be tracked. According to the association between face identity information and facial feature sets, the first facial feature set corresponding to the current face identity information is obtained, which indicates that the current target to be tracked and the first target to be tracked are the same target.

Step S234: compute the difference between the first facial features in the first facial feature set and the second facial feature; if the difference exceeds a preset threshold, add the second facial feature to the first facial feature set.

Specifically, a custom algorithm may compute the difference between the second facial feature and the first facial features in the first facial feature set. If the first facial feature set contains multiple first facial features, the difference between the second facial feature and each of them is computed separately, yielding multiple difference values. The difference measures how far the second facial feature is from the facial features of the same tracked target already stored in the feature library; the larger the difference, the more the face state of the tracked target has changed. If the difference exceeds the preset threshold, the second facial feature is added to the first facial feature set and becomes available for subsequent feature matching. The more facial features stored in a facial feature set, the better it characterizes the same tracked target in different states; as long as any one of those features matches successfully during feature matching, the current target to be tracked is considered to match the first target to be tracked. This increases the probability of a successful match, reduces the sensitivity of the tracking process to changes, deflection, occlusion, and illumination variation of the target, and improves the continuity and robustness of tracking.
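The update rule of step S234 can be sketched as follows. The application leaves the difference measure and threshold to a "custom algorithm"; here Euclidean distance and a threshold of 0.5 are assumed, and the rule is read as "add the new feature only when it differs from every stored feature" (i.e. it represents a face state the library has not yet seen) — one plausible reading when the set holds multiple features.

```python
import math

def update_feature_set(feature_set, new_feature, diff_threshold=0.5):
    """Add the newly extracted facial feature to the set only when its
    Euclidean distance from every stored feature exceeds the threshold."""
    diffs = [math.dist(f, new_feature) for f in feature_set]
    if all(d > diff_threshold for d in diffs):
        feature_set.append(new_feature)
    return feature_set
```

A feature close to one already in the set is discarded as redundant, so the set grows only when the target's face state genuinely changes.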
In an embodiment of the present application, step S220 may include: performing facial feature extraction on the first target to be tracked through the deep neural network to obtain a first feature vector.

Specifically, a facial feature extraction model is obtained by training the deep neural network; feeding it the pixel values corresponding to the first target to be tracked yields the first feature vector, whose dimensionality is determined by the model.

Step S230 includes: performing facial feature extraction on the current target to be tracked through the deep neural network to obtain a second feature vector, and computing the Euclidean distance between the first feature vector and the second feature vector; if the Euclidean distance is less than a preset threshold, determining that the first target to be tracked and the current target to be tracked match successfully.

Specifically, feeding the pixel values corresponding to the current target to be tracked into the facial feature extraction model yields the second feature vector. The Euclidean distance between the first and second feature vectors represents the similarity between the current target to be tracked and the first target to be tracked. If the Euclidean distance is less than the preset threshold, the two are determined to match, indicating that the current target to be tracked and the first target to be tracked are the same target, which achieves the tracking purpose.
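The distance test just described can be sketched in a few lines. The threshold value of 1.0 is an assumption; the application only states that a preset threshold is used.

```python
import math

def features_match(first_vec, second_vec, threshold=1.0):
    """Feature matching: the first target to be tracked and the current
    target to be tracked are judged to be the same target when the Euclidean
    distance between their feature vectors is below the preset threshold."""
    return math.dist(first_vec, second_vec) < threshold
```

In practice the vectors would be the 1024-dimensional outputs of the extraction model; short vectors are used here only to keep the sketch readable.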
In an embodiment of the present application, the deep neural network may have a structure of 11 network layers, comprising a stacked convolutional network followed by fully connected layers, where the stacked convolutional network consists of multiple convolution layers and max-pool layers. The specific network structure is:
conv3-64*2+LRN+max pool

conv3-128+max pool

conv3-256*2+max pool

conv3-512*2+max pool

conv3-512*2+max pool

FC2048

FC1024,
where conv3 denotes a convolution layer with a kernel size of 3, LRN denotes an LRN layer, max pool denotes a max-pool layer, and FC denotes a fully connected layer.

Specifically, this network structure is a simplified VGG deep-neural-network structure, where 64*2 denotes two groups of 64, the LRN layer is a parameter-free layer that assists training, FC2048 denotes a fully connected layer whose output is a 2048-dimensional vector, and the output of the last fully connected layer, FC1024, is the extracted facial feature, a 1024-dimensional vector. On random block matching over the test set, the optimized facial features obtained from this simplified VGG structure perform far better than the matching module of TLD (Tracking-Learning-Detection, long-term single-target tracking), and greatly improve the efficiency of facial feature extraction, achieving the real-time performance required by the tracking algorithm. In an embodiment of the present application, the resolution of the target to be tracked may be constrained to 112*112 pixels to reduce computational complexity. FIG. 7 is a comparison of the facial feature extraction algorithm VGG-S corresponding to this VGG structure against the template matching algorithm match template; the abscissa is recall and the ordinate is precision. As shown in FIG. 7, the facial feature extraction algorithm corresponding to this VGG structure achieves better precision in feature matching, improving the correctness of target tracking.
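As a rough consistency check on the structure above, the spatial resolution through the five pooling stages can be traced for the 112*112 input mentioned in this embodiment. This sketch assumes each max pool is 2x2 with stride 2 and no padding (the application does not state the pooling parameters).

```python
def spatial_sizes(input_size=112, num_pools=5):
    """Trace the feature-map side length through the five max-pool stages of
    the structure above, assuming each max pool halves the resolution
    (floor division for odd sizes)."""
    sizes = [input_size]
    for _ in range(num_pools):
        sizes.append(sizes[-1] // 2)
    return sizes

# 112 -> 56 -> 28 -> 14 -> 7 -> 3; the pooled maps then feed FC2048 and
# FC1024, whose output is the final 1024-dimensional facial feature vector.
```

Under these assumptions the convolutional stack reduces 112*112 to 3*3 before the fully connected layers.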
In an embodiment of the present application, the step in step S230 of identifying the face region in the current video frame according to the face detection algorithm to obtain the current target to be tracked corresponding to the current video frame may include: identifying the face region in the current video frame based on the Normalized Pixel Difference feature and a human upper-body recognition algorithm, to obtain the current target to be tracked corresponding to the current video frame.

Specifically, face detection is performed based on the Normalized Pixel Difference (NPD) feature, and the returned results are used as candidate face-region boxes; for example, a strong classifier may be constructed with AdaBoost on NPD features to recognize and distinguish faces. The upper-body recognition algorithm can be defined as needed to perform upper-body detection; the candidate face-region boxes are then filtered according to the upper-body detections, discarding some incorrectly detected boxes, which greatly improves the recall and precision of face-region detection and the overall performance of target tracking.
In an embodiment of the present application, as shown in FIG. 8, the step in step S230 of identifying the face region in the current video frame according to the face detection algorithm to obtain the current target to be tracked corresponding to the current video frame may include:

Step S235: identify the face region based on the Normalized Pixel Difference feature to obtain a first recommended region in the current video frame.

Step S236: compute, according to an optical flow analysis algorithm, a second recommended region corresponding to the first target to be tracked in the current video frame.

Specifically, the optical flow analysis algorithm assumes that a pixel with intensity I(x, y, t) in one frame moves by a distance (dx, dy) to the next frame over a time dt, and that, since it is the same pixel, its intensity does not change. From the historical motion trajectory of the first target to be tracked, a vector velocity model corresponding to it is computed according to the optical flow analysis principle; feeding the model the current video frame, the previous frame, and the position of the first target to be tracked in the previous frame yields the second recommended region corresponding to the first target to be tracked in the current video frame, i.e. the position where it is likely to appear.

Step S237: obtain the current target to be tracked according to the first recommended region and the second recommended region.

Specifically, the second recommended region derived from the optical flow analysis algorithm is the region into which the first target to be tracked may have moved given its historical motion speed; first recommended regions whose distance from the second recommended region exceeds a preset range can therefore be excluded, yielding the current target to be tracked. Alternatively, all of the first and second recommended regions may be taken as the current target to be tracked. If there are multiple first targets to be tracked, each has its own corresponding second recommended region.
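The exclusion rule of step S237 can be sketched as follows. Boxes are assumed to be `(x, y, w, h)` tuples and the preset range is taken as a center-to-center distance of 100 pixels; both conventions are illustrative assumptions.

```python
def filter_proposals(first_regions, second_region, max_dist=100.0):
    """Keep the NPD face proposals (first recommended regions) whose centers
    lie within max_dist of the center of the optical-flow second recommended
    region; the rest are excluded."""
    def center(box):  # box = (x, y, w, h)
        return (box[0] + box[2] / 2.0, box[1] + box[3] / 2.0)

    cx, cy = center(second_region)
    kept = []
    for box in first_regions:
        bx, by = center(box)
        if ((bx - cx) ** 2 + (by - cy) ** 2) ** 0.5 <= max_dist:
            kept.append(box)
    return kept
```

A distant false detection is dropped while a proposal near the predicted region survives for the subsequent feature matching.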
In this embodiment, the Normalized Pixel Difference feature is combined with the optical flow analysis algorithm to obtain the current target to be tracked; the added prior information improves the accuracy of the subsequent feature matching.
In one embodiment, step S237 may include: performing motion prediction according to inter-frame correlation to obtain an expected motion range, and filtering the first and second recommended regions according to the expected motion range to obtain the current target to be tracked.

Specifically, inter-frame correlation uses historical position information and the motion trajectory to predict the target's position in the next frame or the next few frames, which amounts to using prior information to adjust the confidence of the NPD algorithm. First and second recommended regions outside the expected motion range are filtered out to obtain the current target to be tracked, reducing the number of candidates in subsequent feature matching and improving matching efficiency and accuracy.
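One simple form of the motion prediction just described is constant-velocity extrapolation from the last two known positions. This is a sketch under that assumption; the application does not fix a prediction model, and the 50-pixel margin is illustrative.

```python
def predict_expected_range(history, margin=50.0):
    """Predict the target center in the next frame by linear extrapolation of
    the last two historical centers, and return a square expected-motion
    range (x_min, y_min, x_max, y_max) around the prediction."""
    (x0, y0), (x1, y1) = history[-2], history[-1]
    px, py = x1 + (x1 - x0), y1 + (y1 - y0)  # constant-velocity step
    return (px - margin, py - margin, px + margin, py + margin)
```

Recommended regions whose centers fall outside the returned range would then be discarded before feature matching.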
In an embodiment of the present application, the video target tracking method may be carried out by the three modules shown in FIG. 9: a tracking module 310, a detection module 320, and a learning module 330. Specifically, a video stream is obtained and a face region is identified according to the face detection algorithm to obtain the first target to be tracked corresponding to the first video frame; tracking starts from the video frame containing the first target to be tracked. The tracking module 310 obtains the first facial feature from the first target to be tracked through deep-neural-network-based facial feature extraction and adds it to the feature library; the learning module 330 updates the feature library according to the tracking situation; the detection module 320 continually searches the current video frame for better current targets to be tracked, to guard against tracking the wrong target or losing it; and the tracking module 310 matches the current target to be tracked against the first target to be tracked according to the updated feature library, so as to track the first target to be tracked.

In an embodiment of the present application, the tracking regions obtained with the above video target tracking method are shown in FIG. 10, and those obtained with the TLD tracking algorithm in FIG. 11. Comparison shows that, when the face is turned to the side, the tracking region of the video target tracking method proposed in the embodiments of the present application is more accurate than that of the TLD tracking algorithm, and that the TLD tracking algorithm fails when the face turns away completely, whereas the proposed method still tracks successfully. Both precision and recall improve over the TLD tracking algorithm; the specific data are as follows:
1. Version without head detection: precision improves by about 5 percentage points, the error rate drops by 100%, and the target-tracking loss rate drops by 25%.

2. Version with head detection: precision improves by about 1 percentage point, the error rate drops by 100%, and the target-tracking loss rate drops by 15%.

In terms of performance, at a resolution of 640*480, on a machine with a 3.5 GHz CPU and an Nvidia GeForce GTX 775M, the single-frame processing time is around 40 ms and the frame rate is above 25 FPS.
The above video target tracking method is more accurate than conventional methods, enabling and facilitating downstream needs such as people-flow statistics, identity recognition, and behavior analysis; its good performance also satisfies the requirements of online processing, improving the accuracy, extensibility, and applicability of the monitoring and analysis system, and in turn the processing speed and processing performance of the hardware processor.
In an embodiment of the present application, as shown in FIG. 12, a video target tracking apparatus is provided, and the apparatus may include:

The detection module 410 is configured to obtain a video stream and identify a face region according to the face detection algorithm, to obtain the first target to be tracked corresponding to the first video frame.

The facial feature extraction module 420 is configured to obtain the first facial feature from the first target to be tracked through deep-neural-network-based facial feature extraction, and store the first facial feature into the feature library corresponding to the first target to be tracked.

The detection module 410 is further configured to identify the face region in the current video frame according to the face detection algorithm, to obtain the current target to be tracked corresponding to the current video frame.

The facial feature extraction module 420 is further configured to obtain the second facial feature from the current target to be tracked through deep-neural-network-based facial feature extraction.

The tracking module 430 is configured to feature-match the current target to be tracked against the first target to be tracked according to the second facial feature and the feature library, so as to track the first target to be tracked from the first video frame onward.

The learning module 440 is configured to update the feature library during tracking according to the newly extracted facial features.
In an embodiment of the present application, as shown in FIG. 13, the apparatus further includes:

The feature-identity processing module 450 is configured to identify the corresponding face identity information through the face recognition algorithm according to the face state of the target to be tracked, obtain the target feature corresponding to the face identity information according to the image feature extraction algorithm, and establish an association between the target feature and the face identity information.

The detection module 410 may include:

The image feature extraction unit 411 is configured to determine whether a face region is recognized in the current video frame according to the face detection algorithm and, if no face region is recognized, obtain the current image feature corresponding to the current video frame according to the image feature extraction algorithm.

The identity matching unit 412 is configured to compare, based on the association, the current image feature with the target features to obtain the matched target face identity information.

The first tracking-target determination unit 413 is configured to obtain the current target to be tracked corresponding to the current video frame according to the target face identity information.
In an embodiment of the present application, the facial feature extraction module 420 is further configured to obtain the first face identity information corresponding to the first target to be tracked, establish the first facial feature set corresponding to the first face identity information, add the first facial feature to the first facial feature set, and store the first facial feature set into the feature library.

The learning module 440 is further configured to obtain the current face identity information corresponding to the current target to be tracked, obtain the first facial feature set corresponding to the current face identity information from the feature library, compute the difference between the first facial features in the first facial feature set and the second facial feature, and, if the difference exceeds a preset threshold, add the second facial feature to the first facial feature set.

In an embodiment of the present application, the detection module 410 is further configured to identify the face region in the current video frame based on the Normalized Pixel Difference feature and the human upper-body recognition algorithm, to obtain the current target to be tracked corresponding to the current video frame.
In an embodiment of the present application, as shown in FIG. 14, the detection module 410 may include:

The first recommendation unit 414 is configured to identify the face region based on the Normalized Pixel Difference feature, to obtain the first recommended region in the current video frame.

The second recommendation unit 415 is configured to compute, according to the optical flow analysis algorithm, the second recommended region corresponding to the first target to be tracked in the current video frame.

The second tracking-target determination unit 416 is configured to obtain the current target to be tracked according to the first recommended region and the second recommended region.

In an embodiment of the present application, the second tracking-target determination unit 416 is further configured to perform motion prediction according to inter-frame correlation to obtain the expected motion range, and filter the first and second recommended regions according to the expected motion range to obtain the current target to be tracked.
在本申请一个实施例中，深度神经网络的网络结构为11层网络层，包括堆栈式的卷积神经网络和完全连接层，堆栈式的卷积神经网络由多个卷积层和maxpool层组成，具体网络结构为:In an embodiment of the present application, the deep neural network has an 11-layer structure, comprising a stacked convolutional neural network and fully connected layers; the stacked convolutional neural network consists of multiple convolution layers and max-pool layers. The specific structure is:
conv3-64*2+LRN+max pool
conv3-128+max pool
conv3-256*2+max pool
conv3-512*2+max pool
conv3-512*2+max pool
FC2048
FC1024,
其中conv3表示卷积核大小为3的卷积层，LRN表示LRN层，max pool表示maxpool层，FC表示完全连接层。Where conv3 denotes a convolution layer with a kernel size of 3, LRN denotes a local response normalization (LRN) layer, max pool denotes a max-pooling layer, and FC denotes a fully connected layer.
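As a sanity check of the listed stack, the sketch below traces feature-map sizes through the five conv/max-pool blocks for an assumed 224x224 input. The input resolution, "same" padding for the 3x3 convolutions, and 2x2 stride-2 max pooling are assumptions not stated in the text.

```python
def trace_shapes(input_size=224):
    """Trace (channels, spatial size) through the stacked blocks
    conv3-64*2, conv3-128, conv3-256*2, conv3-512*2, conv3-512*2,
    each followed by a max pool; 3x3 convolutions with 'same'
    padding leave the spatial size unchanged."""
    size, shapes = input_size, []
    for channels in (64, 128, 256, 512, 512):
        size //= 2                      # effect of the block's max pool
        shapes.append((channels, size))
    flat = shapes[-1][0] * size * size  # flattened input fed into FC2048
    return shapes, flat
```

Under these assumptions the final convolutional output is 512 channels at 7x7, i.e. a 25088-dimensional vector entering FC2048 and then FC1024.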
在本申请一个实施例中，人脸特征提取模块420还用于对第一待跟踪目标通过深度神经网络进行人脸特征提取得到第一特征矢量，对当前待跟踪目标通过深度神经网络进行人脸特征提取得到第二特征矢量。In an embodiment of the present application, the facial feature extraction module 420 is further configured to perform facial feature extraction on the first target to be tracked through the deep neural network to obtain a first feature vector, and to perform facial feature extraction on the current target to be tracked through the deep neural network to obtain a second feature vector.
跟踪模块430还用于计算第一特征矢量与第二特征矢量的欧氏距离，如果所述欧氏距离小于预设阈值，则确定所述第一待跟踪目标与当前待跟踪目标特征匹配成功。The tracking module 430 is further configured to calculate the Euclidean distance between the first feature vector and the second feature vector; if the Euclidean distance is less than a preset threshold, it is determined that the first target to be tracked and the current target to be tracked are successfully matched.
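The Euclidean-distance matching performed by modules 420 and 430 extends naturally to matching against the whole feature library. The sketch below is illustrative only: the library layout and the 0.8 threshold are assumptions. It returns the closest stored identity, or None when no stored feature is near enough.

```python
import math

def match_target(second_feature, feature_library, threshold=0.8):
    """Nearest-neighbor match of the current (second) feature vector
    against every stored first feature; the match succeeds only if the
    smallest Euclidean distance is below the preset threshold."""
    best_id, best_dist = None, float("inf")
    for identity, features in feature_library.items():
        for stored in features:
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(stored, second_feature)))
            if d < best_dist:
                best_id, best_dist = identity, d
    return best_id if best_dist < threshold else None
```

Returning None corresponds to the case where the current face belongs to none of the tracked targets, so a new tracking entry would be created.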
图15是本申请实施例提供的视频目标跟踪装置的另一结构示意图。如图15所示，该视频目标跟踪装置包括：处理器510，与所述处理器510相连接的存储器520，以及用于发送和接收数据的端口530。所述存储器520中存储有可由所述处理器510执行的机器可读指令模块，所述机器可读指令模块包括：FIG. 15 is another schematic structural diagram of a video target tracking apparatus according to an embodiment of the present application. As shown in FIG. 15, the video target tracking apparatus includes a processor 510, a memory 520 connected to the processor 510, and a port 530 for transmitting and receiving data. The memory 520 stores machine readable instruction modules executable by the processor 510, the machine readable instruction modules comprising:
检测模块521,用于获取视频流,根据人脸检测算法识别人脸区域,得到第一视频帧对应的第一待跟踪目标。The detecting module 521 is configured to acquire a video stream, and identify a face region according to the face detection algorithm to obtain a first to-be-tracked target corresponding to the first video frame.
人脸特征提取模块522，用于对所述第一待跟踪目标通过基于深度神经网络的人脸特征提取得到第一人脸特征，并将所述第一人脸特征存入所述第一待跟踪目标对应的特征库。The facial feature extraction module 522 is configured to perform deep-neural-network-based facial feature extraction on the first target to be tracked to obtain a first facial feature, and to store the first facial feature into the feature library corresponding to the first target to be tracked.
检测模块521还用于在当前视频帧根据人脸检测算法识别人脸区域,得到当前视频帧对应的当前待跟踪目标。The detecting module 521 is further configured to identify the face area according to the face detection algorithm in the current video frame, and obtain the current target to be tracked corresponding to the current video frame.
人脸特征提取模块522还用于对当前待跟踪目标通过基于深度神经网络的人脸特征提取得到第二人脸特征。The facial feature extraction module 522 is further configured to perform deep-neural-network-based facial feature extraction on the current target to be tracked to obtain a second facial feature.
跟踪模块523,用于根据第二人脸特征和所述特征库将当前待跟踪目标与第一待跟踪目标进行特征匹配,以从第一视频帧开始跟踪第一待跟踪目标。The tracking module 523 is configured to perform feature matching between the current target to be tracked and the first target to be tracked according to the second face feature and the feature library to track the first target to be tracked from the first video frame.
学习模块524,用于在跟踪过程中根据提取的更新的人脸特征更新所述特征库。The learning module 524 is configured to update the feature library according to the extracted updated facial features during the tracking process.
在本申请一个实施例中,如图16所示,所述机器可读指令模块还可包括:In an embodiment of the present application, as shown in FIG. 16, the machine readable instruction module may further include:
特征身份处理模块525，用于根据待跟踪目标的人脸状态通过人脸识别算法识别得到对应的人脸身份信息，根据图像特征提取算法得到人脸身份信息对应的目标特征，并为所述目标特征和人脸身份信息建立关联关系。The feature identity processing module 525 is configured to identify corresponding face identity information through a face recognition algorithm according to the face state of the target to be tracked, obtain a target feature corresponding to the face identity information according to an image feature extraction algorithm, and establish an association between the target feature and the face identity information.
在本申请实施例中,上述检测模块521、人脸特征提取模块522、 跟踪模块523、学习模块524以及特征身份处理模块525的具体功能和实现方式可参照前述的模块410至450的相关描述,在此不再赘述。In the embodiment of the present application, the specific functions and implementation manners of the foregoing detection module 521, the face feature extraction module 522, the tracking module 523, the learning module 524, and the feature identity processing module 525 may refer to the related descriptions of the foregoing modules 410 to 450. I will not repeat them here.
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述程序可存储于一非易失性计算机可读取存储介质中,如本申请实施例中,该程序可存储于计算机系统的存储介质中,并被该计算机系统中的至少一个处理器执行,以实现包括如上述各方法的实施例的流程。其中,所述存储介质可为磁碟、光盘、只读存储记忆体(Read-Only Memory,ROM)或随机存储记忆体(Random Access Memory,RAM)等。One of ordinary skill in the art can understand that all or part of the process of implementing the above embodiments can be completed by a computer program to instruct related hardware, and the program can be stored in a non-volatile computer readable storage medium. As in the embodiment of the present application, the program may be stored in a storage medium of the computer system and executed by at least one processor in the computer system to implement a flow including an embodiment of the methods as described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).
通过以上的实施例的描述，本领域的技术人员可以清楚地了解到本申请实施例可借助软件加必需的通用硬件平台的方式来实现，即通过机器可读指令来指令相关的硬件来实现，当然也可以通过硬件，但很多情况下前者是更佳的实施方式。基于这样的理解，本申请实施例的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质中，包括若干指令用以使得一台终端设备（可以是手机，个人计算机，服务器，或者网络设备等）执行本申请各个实施例所述的方法。Through the description of the above embodiments, those skilled in the art can clearly understand that the embodiments of the present application can be implemented by software plus a necessary general-purpose hardware platform, that is, by machine readable instructions instructing related hardware; of course, they can also be implemented by hardware, but in many cases the former is the better implementation. Based on such understanding, the technical solutions of the embodiments of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, including a number of instructions for causing a terminal device (which may be a mobile phone, a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present application.
以上所述实施例的各技术特征可以进行任意的组合，为使描述简洁，未对上述实施例中的各个技术特征所有可能的组合都进行描述，然而，只要这些技术特征的组合不存在矛盾，都应当认为是本说明书记载的范围。The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction among the combinations of these technical features, they should all be considered within the scope of this specification.
以上所述实施例仅表达了本申请的几种实施方式,其描述较为具体和详细,但并不能因此而理解为对本申请的保护范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请构思的前提下,还可以做出若干变形和改进,这些都属于本申请的保护范围。因此,本申请的保护范围应以所附权利要求为准。The above-mentioned embodiments are merely illustrative of several embodiments of the present application, and the description thereof is not to be construed as limiting the scope of the application. It should be noted that a number of variations and modifications may be made by those skilled in the art without departing from the spirit and scope of the present application. Therefore, the scope of protection of the application should be determined by the appended claims.

Claims (22)

  1. 一种视频目标跟踪方法,应用于终端或服务器,所述方法包括:A video object tracking method is applied to a terminal or a server, and the method includes:
    获取视频流,根据人脸检测算法识别人脸区域,得到第一视频帧对应的第一待跟踪目标;Obtaining a video stream, identifying a face region according to a face detection algorithm, and obtaining a first to-be-tracked target corresponding to the first video frame;
    对所述第一待跟踪目标通过基于深度神经网络的人脸特征提取得到第一人脸特征，并将所述第一人脸特征存入所述第一待跟踪目标对应的特征库；Performing deep-neural-network-based facial feature extraction on the first target to be tracked to obtain a first facial feature, and storing the first facial feature into a feature library corresponding to the first target to be tracked;
    在当前视频帧根据人脸检测算法识别人脸区域，得到当前视频帧对应的当前待跟踪目标，对所述当前待跟踪目标通过基于深度神经网络的人脸特征提取得到第二人脸特征，根据所述第二人脸特征和所述特征库将所述当前待跟踪目标与第一待跟踪目标进行特征匹配，以从所述第一视频帧开始跟踪所述第一待跟踪目标，在跟踪过程中根据提取的更新的人脸特征更新所述特征库。Identifying a face region in the current video frame according to the face detection algorithm to obtain a current target to be tracked corresponding to the current video frame; performing deep-neural-network-based facial feature extraction on the current target to be tracked to obtain a second facial feature; performing feature matching between the current target to be tracked and the first target to be tracked according to the second facial feature and the feature library, so as to track the first target to be tracked from the first video frame; and updating the feature library according to extracted updated facial features during tracking.
  2. 根据权利要求1所述的方法,所述方法还包括:The method of claim 1 further comprising:
    根据待跟踪目标的人脸状态通过人脸识别算法识别得到对应的人脸身份信息，根据图像特征提取算法得到所述人脸身份信息对应的目标特征，并为所述目标特征和人脸身份信息建立关联关系；Identifying corresponding face identity information through a face recognition algorithm according to the face state of the target to be tracked, obtaining a target feature corresponding to the face identity information according to an image feature extraction algorithm, and establishing an association between the target feature and the face identity information;
    所述在当前视频帧根据人脸检测算法识别人脸区域,得到当前视频帧对应的当前待跟踪目标的步骤包括:The step of identifying the face area according to the face detection algorithm in the current video frame, and obtaining the current target to be tracked corresponding to the current video frame includes:
    判断在当前视频帧根据人脸检测算法是否识别到人脸区域,如果没有识别到人脸区域,则根据图像特征提取算法获取当前视频帧对应的当前图像特征;Determining whether the current video frame recognizes the face region according to the face detection algorithm, and if the face region is not recognized, acquiring the current image feature corresponding to the current video frame according to the image feature extraction algorithm;
    基于所述关联关系,将所述当前图像特征与所述目标特征对比得到匹配的目标人脸身份信息;And comparing the current image feature with the target feature to obtain matching target face identity information based on the association relationship;
    根据所述目标人脸身份信息得到当前视频帧对应的当前待跟踪目标。Obtaining the current target to be tracked corresponding to the current video frame according to the target face identity information.
  3. 根据权利要求1所述的方法，所述对所述第一待跟踪目标通过基于深度神经网络的人脸特征提取得到第一人脸特征，并将所述第一人脸特征存入所述第一待跟踪目标对应的特征库的步骤包括：The method according to claim 1, wherein the step of performing deep-neural-network-based facial feature extraction on the first target to be tracked to obtain the first facial feature, and storing the first facial feature into the feature library corresponding to the first target to be tracked comprises:
    获取第一待跟踪目标对应的第一人脸身份信息;Obtaining first face identity information corresponding to the first target to be tracked;
    建立所述第一人脸身份信息对应的第一人脸特征集合，将所述第一人脸特征加入所述第一人脸特征集合并将所述第一人脸特征集合存储至所述特征库；Establishing a first facial feature set corresponding to the first facial identity information, adding the first facial feature to the first facial feature set, and storing the first facial feature set into the feature library;
    所述在跟踪过程中根据提取的更新的人脸特征更新所述特征库的步骤包括:The step of updating the feature library according to the extracted updated facial features during the tracking process includes:
    获取当前待跟踪目标对应的当前人脸身份信息;Obtaining current face identity information corresponding to the current target to be tracked;
    从所述特征库获取所述当前人脸身份信息对应的第一人脸特征集合;Obtaining, from the feature database, a first facial feature set corresponding to the current facial identity information;
    计算所述第一人脸特征集合中的第一人脸特征与所述第二人脸特征的差异量，如果所述差异量超过预设阈值，则在所述第一人脸特征集合中增加所述第二人脸特征。Calculating the amount of difference between the first facial feature in the first facial feature set and the second facial feature, and adding the second facial feature to the first facial feature set if the amount of difference exceeds a preset threshold.
  4. 根据权利要求1所述的方法,所述在当前视频帧根据人脸检测算法识别人脸区域,得到当前视频帧对应的当前待跟踪目标的步骤包括:The method according to claim 1, wherein the step of identifying the face region according to the face detection algorithm in the current video frame, and obtaining the current target to be tracked corresponding to the current video frame comprises:
    基于归一化的像素差异特征和人体半身识别算法在当前视频帧中识别人脸区域，得到当前视频帧对应的当前待跟踪目标。Identifying a face region in the current video frame based on normalized pixel difference features and a human upper-body recognition algorithm, to obtain the current target to be tracked corresponding to the current video frame.
  5. 根据权利要求1所述的方法,所述在当前视频帧根据人脸检测算法识别人脸区域,得到当前视频帧对应的当前待跟踪目标的步骤包括:The method according to claim 1, wherein the step of identifying the face region according to the face detection algorithm in the current video frame, and obtaining the current target to be tracked corresponding to the current video frame comprises:
    基于归一化的像素差异特征识别人脸区域,在当前视频帧得到第一推荐区域;Identifying a face region based on the normalized pixel difference feature, and obtaining a first recommended region in the current video frame;
    根据光流分析算法计算得到所述第一待跟踪目标在当前视频帧对应的第二推荐区域;Calculating, according to the optical flow analysis algorithm, the second recommended area of the first to-be-tracked target corresponding to the current video frame;
    根据所述第一推荐区域和所述第二推荐区域得到所述当前待跟踪目标。Obtaining the current target to be tracked according to the first recommended area and the second recommended area.
  6. 根据权利要求5所述的方法,所述根据所述第一推荐区域和所述第二推荐区域得到所述当前待跟踪目标的步骤包括:The method according to claim 5, wherein the step of obtaining the current target to be tracked according to the first recommended area and the second recommended area comprises:
    根据帧间相关性进行运动预测得到预期运动范围,根据所述预期运动范围筛选所述第一推荐区域和所述第二推荐区域得到所述当前待跟踪目标。The motion prediction is performed according to the inter-frame correlation to obtain an expected motion range, and the first recommended area and the second recommended area are filtered according to the expected motion range to obtain the current to-be-tracked target.
  7. 根据权利要求1至6任一项所述的方法，所述深度神经网络的网络结构为11层网络层，包括堆栈式的卷积神经网络和完全连接层，所述堆栈式的卷积神经网络由多个卷积层和maxpool层组成，具体网络结构为:The method according to any one of claims 1 to 6, wherein the deep neural network has an 11-layer structure, comprising a stacked convolutional neural network and fully connected layers; the stacked convolutional neural network consists of multiple convolution layers and max-pool layers. The specific structure is:
    conv3-64*2+LRN+max pool
    conv3-128+max pool
    conv3-256*2+max pool
    conv3-512*2+max pool
    conv3-512*2+max pool
    FC2048
    FC1024,
    其中conv3表示卷积核大小为3的卷积层，LRN表示LRN层，max pool表示maxpool层，FC表示完全连接层。Where conv3 denotes a convolution layer with a kernel size of 3, LRN denotes a local response normalization (LRN) layer, max pool denotes a max-pooling layer, and FC denotes a fully connected layer.
  8. 根据权利要求1至6任一项所述的方法，所述对所述第一待跟踪目标通过基于深度神经网络的人脸特征提取得到第一人脸特征，并将所述第一人脸特征存入所述第一待跟踪目标对应的特征库的步骤包括：The method according to any one of claims 1 to 6, wherein the step of performing deep-neural-network-based facial feature extraction on the first target to be tracked to obtain the first facial feature, and storing the first facial feature into the feature library corresponding to the first target to be tracked comprises:
    对所述第一待跟踪目标通过深度神经网络进行人脸特征提取得到第一特征矢量；Performing facial feature extraction on the first target to be tracked through the deep neural network to obtain a first feature vector;
    所述对所述当前待跟踪目标通过基于深度神经网络的人脸特征提取得到第二人脸特征，根据所述第二人脸特征和所述特征库将所述当前待跟踪目标与第一待跟踪目标进行特征匹配，以从所述第一视频帧开始跟踪所述第一待跟踪目标的步骤包括：The step of performing deep-neural-network-based facial feature extraction on the current target to be tracked to obtain the second facial feature, and performing feature matching between the current target to be tracked and the first target to be tracked according to the second facial feature and the feature library, so as to track the first target to be tracked from the first video frame comprises:
    对所述当前待跟踪目标通过深度神经网络进行人脸特征提取得到第二特征矢量;Performing facial feature extraction on the current target to be tracked through the depth neural network to obtain a second feature vector;
    计算所述第一特征矢量与第二特征矢量的欧氏距离，如果所述欧氏距离小于预设阈值，则确定所述第一待跟踪目标与当前待跟踪目标特征匹配成功。Calculating the Euclidean distance between the first feature vector and the second feature vector; if the Euclidean distance is less than a preset threshold, determining that the first target to be tracked and the current target to be tracked are successfully matched.
  9. 一种视频目标跟踪装置,所述装置包括:A video object tracking device, the device comprising:
    处理器以及与所述处理器相连接的存储器,所述存储器中存储有可由所述处理器执行的机器可读指令模块;所述机器可读指令模块包括:a processor and a memory coupled to the processor, the memory having stored therein a machine readable instruction module executable by the processor; the machine readable instruction module comprising:
    检测模块,用于获取视频流,根据人脸检测算法识别人脸区域,得到第一视频帧对应的第一待跟踪目标;a detecting module, configured to acquire a video stream, and identify a face region according to a face detection algorithm, to obtain a first to-be-tracked target corresponding to the first video frame;
    人脸特征提取模块，用于对所述第一待跟踪目标通过基于深度神经网络的人脸特征提取得到第一人脸特征，并将所述第一人脸特征存入所述第一待跟踪目标对应的特征库；a facial feature extraction module, configured to perform deep-neural-network-based facial feature extraction on the first target to be tracked to obtain a first facial feature, and to store the first facial feature into a feature library corresponding to the first target to be tracked;
    所述检测模块还用于在当前视频帧根据人脸检测算法识别人脸区域,得到当前视频帧对应的当前待跟踪目标;The detecting module is further configured to: identify a face area according to a face detection algorithm in the current video frame, and obtain a current target to be tracked corresponding to the current video frame;
    所述人脸特征提取模块还用于对所述当前待跟踪目标通过基于深度神经网络的人脸特征提取得到第二人脸特征；the facial feature extraction module is further configured to perform deep-neural-network-based facial feature extraction on the current target to be tracked to obtain a second facial feature;
    跟踪模块，用于根据所述第二人脸特征和所述特征库将所述当前待跟踪目标与第一待跟踪目标进行特征匹配，以从所述第一视频帧开始跟踪所述第一待跟踪目标；a tracking module, configured to perform feature matching between the current target to be tracked and the first target to be tracked according to the second facial feature and the feature library, so as to track the first target to be tracked from the first video frame;
    学习模块,用于在跟踪过程中根据提取的更新的人脸特征更新所述特征库。And a learning module, configured to update the feature library according to the extracted updated facial features during the tracking process.
  10. 根据权利要求9所述的装置,所述装置还包括:The apparatus of claim 9 further comprising:
    特征身份处理模块，用于根据待跟踪目标的人脸状态通过人脸识别算法识别得到对应的人脸身份信息，根据图像特征提取算法得到所述人脸身份信息对应的目标特征，并为所述目标特征和人脸身份信息建立关联关系；a feature identity processing module, configured to identify corresponding face identity information through a face recognition algorithm according to the face state of the target to be tracked, obtain a target feature corresponding to the face identity information according to an image feature extraction algorithm, and establish an association between the target feature and the face identity information;
    所述检测模块包括:The detection module includes:
    图像特征提取单元,用于判断在当前视频帧根据人脸检测算法是否识别到人脸区域,如果没有识别到人脸区域,则根据图像特征提取算法获取当前视频帧对应的当前图像特征;The image feature extraction unit is configured to determine whether the current video frame recognizes the face region according to the face detection algorithm, and if the face region is not recognized, acquire the current image feature corresponding to the current video frame according to the image feature extraction algorithm;
    身份匹配单元,用于基于所述关联关系,将所述当前图像特征与所述目标特征对比得到匹配的目标人脸身份信息;An identity matching unit, configured to compare the current image feature with the target feature to obtain matching target face identity information based on the association relationship;
    第一跟踪目标确定单元,用于根据所述目标人脸身份信息得到当前视频帧对应的当前待跟踪目标。The first tracking target determining unit is configured to obtain, according to the target facial identity information, a current target to be tracked corresponding to the current video frame.
  11. 根据权利要求9所述的装置，所述人脸特征提取模块还用于获取第一待跟踪目标对应的第一人脸身份信息，建立所述第一人脸身份信息对应的第一人脸特征集合，将所述第一人脸特征加入所述第一人脸特征集合并将所述第一人脸特征集合存储至所述特征库；The device according to claim 9, wherein the facial feature extraction module is further configured to acquire first face identity information corresponding to the first target to be tracked, establish a first facial feature set corresponding to the first face identity information, add the first facial feature to the first facial feature set, and store the first facial feature set into the feature library;
    所述学习模块还用于获取当前待跟踪目标对应的当前人脸身份信息，从所述特征库获取所述当前人脸身份信息对应的第一人脸特征集合，计算所述第一人脸特征集合中的第一人脸特征与所述第二人脸特征的差异量，如果所述差异量超过预设阈值，则在所述第一人脸特征集合中增加所述第二人脸特征。The learning module is further configured to acquire current face identity information corresponding to the current target to be tracked, obtain from the feature library the first facial feature set corresponding to the current face identity information, and calculate the amount of difference between the first facial feature in the first facial feature set and the second facial feature; if the amount of difference exceeds a preset threshold, the second facial feature is added to the first facial feature set.
  12. 根据权利要求9所述的装置，所述检测模块还用于基于归一化的像素差异特征和人体半身识别算法在当前视频帧中识别人脸区域，得到当前视频帧对应的当前待跟踪目标。The apparatus according to claim 9, wherein the detecting module is further configured to identify a face region in the current video frame based on normalized pixel difference features and a human upper-body recognition algorithm, to obtain the current target to be tracked corresponding to the current video frame.
  13. 根据权利要求9所述的装置,所述检测模块包括:The apparatus of claim 9, the detecting module comprising:
    第一推荐单元,用于基于归一化的像素差异特征识别人脸区域,在当前视频帧得到第一推荐区域;a first recommending unit, configured to identify a face region based on the normalized pixel difference feature, and obtain a first recommended region in the current video frame;
    第二推荐单元,根据光流分析算法计算得到所述第一待跟踪目标在当前视频帧对应的第二推荐区域;The second recommendation unit calculates, according to the optical flow analysis algorithm, that the first to-be-tracked target is in the second recommended area corresponding to the current video frame;
    第二跟踪目标确定单元,用于根据所述第一推荐区域和所述第二推荐区域得到所述当前待跟踪目标。The second tracking target determining unit is configured to obtain the current target to be tracked according to the first recommended area and the second recommended area.
  14. 根据权利要求13所述的装置，所述第二跟踪目标确定单元还用于根据帧间相关性进行运动预测得到预期运动范围，根据所述预期运动范围筛选所述第一推荐区域和所述第二推荐区域得到所述当前待跟踪目标。The apparatus according to claim 13, wherein the second tracking target determining unit is further configured to perform motion prediction according to inter-frame correlation to obtain an expected motion range, and to filter the first recommended region and the second recommended region according to the expected motion range to obtain the current target to be tracked.
  15. 根据权利要求9至14任一项所述的装置，所述人脸特征提取模块还用于对所述第一待跟踪目标通过深度神经网络进行人脸特征提取得到第一特征矢量，对所述当前待跟踪目标通过深度神经网络进行人脸特征提取得到第二特征矢量；The device according to any one of claims 9 to 14, wherein the facial feature extraction module is further configured to perform facial feature extraction on the first target to be tracked through the deep neural network to obtain a first feature vector, and to perform facial feature extraction on the current target to be tracked through the deep neural network to obtain a second feature vector;
    所述跟踪模块还用于计算所述第一特征矢量与第二特征矢量的欧氏距离，如果所述欧氏距离小于预设阈值，则确定所述第一待跟踪目标与当前待跟踪目标特征匹配成功。The tracking module is further configured to calculate the Euclidean distance between the first feature vector and the second feature vector; if the Euclidean distance is less than a preset threshold, it is determined that the first target to be tracked and the current target to be tracked are successfully matched.
  16. 一种非易失性计算机可读存储介质,所述存储介质中存储有机器可读指令,所述机器可读指令可以由处理器执行以完成以下操作:A non-transitory computer readable storage medium storing machine readable instructions, the machine readable instructions being executable by a processor to perform the following operations:
    获取视频流,根据人脸检测算法识别人脸区域,得到第一视频帧对应的第一待跟踪目标;Obtaining a video stream, identifying a face region according to a face detection algorithm, and obtaining a first to-be-tracked target corresponding to the first video frame;
    对所述第一待跟踪目标通过基于深度神经网络的人脸特征提取得到第一人脸特征，并将所述第一人脸特征存入所述第一待跟踪目标对应的特征库；Performing deep-neural-network-based facial feature extraction on the first target to be tracked to obtain a first facial feature, and storing the first facial feature into a feature library corresponding to the first target to be tracked;
    在当前视频帧根据人脸检测算法识别人脸区域，得到当前视频帧对应的当前待跟踪目标，对所述当前待跟踪目标通过基于深度神经网络的人脸特征提取得到第二人脸特征，根据所述第二人脸特征和所述特征库将所述当前待跟踪目标与第一待跟踪目标进行特征匹配，以从所述第一视频帧开始跟踪所述第一待跟踪目标，在跟踪过程中根据提取的更新的人脸特征更新所述特征库。Identifying a face region in the current video frame according to the face detection algorithm to obtain a current target to be tracked corresponding to the current video frame; performing deep-neural-network-based facial feature extraction on the current target to be tracked to obtain a second facial feature; performing feature matching between the current target to be tracked and the first target to be tracked according to the second facial feature and the feature library, so as to track the first target to be tracked from the first video frame; and updating the feature library according to extracted updated facial features during tracking.
  17. 如权利要求16所述的非易失性计算机可读存储介质,所述机器可读指令可以由所述处理器执行以完成以下操作:The non-transitory computer readable storage medium of claim 16, the machine readable instructions being executable by the processor to:
    根据待跟踪目标的人脸状态通过人脸识别算法识别得到对应的人脸身份信息，根据图像特征提取算法得到所述人脸身份信息对应的目标特征，并为所述目标特征和人脸身份信息建立关联关系；Identifying corresponding face identity information through a face recognition algorithm according to the face state of the target to be tracked, obtaining a target feature corresponding to the face identity information according to an image feature extraction algorithm, and establishing an association between the target feature and the face identity information;
    所述在当前视频帧根据人脸检测算法识别人脸区域,得到当前视频帧对应的当前待跟踪目标的步骤包括:The step of identifying the face area according to the face detection algorithm in the current video frame, and obtaining the current target to be tracked corresponding to the current video frame includes:
    判断在当前视频帧根据人脸检测算法是否识别到人脸区域,如果没有识别到人脸区域,则根据图像特征提取算法获取当前视频帧对应的当前图像特征;Determining whether the current video frame recognizes the face region according to the face detection algorithm, and if the face region is not recognized, acquiring the current image feature corresponding to the current video frame according to the image feature extraction algorithm;
    基于所述关联关系,将所述当前图像特征与所述目标特征对比得到匹配的目标人脸身份信息;And comparing the current image feature with the target feature to obtain matching target face identity information based on the association relationship;
    根据所述目标人脸身份信息得到当前视频帧对应的当前待跟踪目标。Obtaining a current target to be tracked corresponding to the current video frame according to the target facial identity information.
  18. 如权利要求16所述的非易失性计算机可读存储介质，所述对所述第一待跟踪目标通过基于深度神经网络的人脸特征提取得到第一人脸特征，并将所述第一人脸特征存入所述第一待跟踪目标对应的特征库的步骤包括：The non-transitory computer readable storage medium according to claim 16, wherein the step of performing deep-neural-network-based facial feature extraction on the first target to be tracked to obtain the first facial feature, and storing the first facial feature into the feature library corresponding to the first target to be tracked comprises:
    获取第一待跟踪目标对应的第一人脸身份信息;Obtaining first face identity information corresponding to the first target to be tracked;
    建立所述第一人脸身份信息对应的第一人脸特征集合,将所述第一人脸特征加入所述第一人脸特征集合并将所述第一人脸特征集合存储至所述特征库;Establishing a first facial feature set corresponding to the first facial identity information, adding the first facial feature to the first facial feature set, and storing the first facial feature set to the feature Library
    所述在跟踪过程中根据提取的更新的人脸特征更新所述特征库的步骤包括:The step of updating the feature library according to the extracted updated facial features during the tracking process includes:
    获取当前待跟踪目标对应的当前人脸身份信息;Obtaining current face identity information corresponding to the current target to be tracked;
    从所述特征库获取所述当前人脸身份信息对应的第一人脸特征集合;Obtaining, from the feature database, a first facial feature set corresponding to the current facial identity information;
    计算所述第一人脸特征集合中的第一人脸特征与所述第二人脸特征的差异量，如果所述差异量超过预设阈值，则在所述第一人脸特征集合中增加所述第二人脸特征。Calculating the amount of difference between the first facial feature in the first facial feature set and the second facial feature, and adding the second facial feature to the first facial feature set if the amount of difference exceeds a preset threshold.
  19. 如权利要求16所述的非易失性计算机可读存储介质,所述在当前视频帧根据人脸检测算法识别人脸区域,得到当前视频帧对应的当前待跟踪目标的步骤包括:The non-transitory computer readable storage medium of claim 16, wherein the step of identifying the face region according to the face detection algorithm in the current video frame, and obtaining the current target to be tracked corresponding to the current video frame comprises:
    基于归一化的像素差异特征和人体半身识别算法在当前视频帧中识别人脸区域，得到当前视频帧对应的当前待跟踪目标。Identifying a face region in the current video frame based on normalized pixel difference features and a human upper-body recognition algorithm, to obtain the current target to be tracked corresponding to the current video frame.
  20. 如权利要求16所述的非易失性计算机可读存储介质,所述在当前视频帧根据人脸检测算法识别人脸区域,得到当前视频帧对应的当前待跟踪目标的步骤包括:The non-transitory computer readable storage medium of claim 16, wherein the step of identifying the face region according to the face detection algorithm in the current video frame, and obtaining the current target to be tracked corresponding to the current video frame comprises:
    基于归一化的像素差异特征识别人脸区域,在当前视频帧得到第一推荐区域;Identifying a face region based on the normalized pixel difference feature, and obtaining a first recommended region in the current video frame;
    根据光流分析算法计算得到所述第一待跟踪目标在当前视频帧对应的第二推荐区域；Calculating, according to the optical flow analysis algorithm, the second recommended region of the first target to be tracked corresponding to the current video frame;
    根据所述第一推荐区域和所述第二推荐区域得到所述当前待跟踪目标。Obtaining the current target to be tracked according to the first recommended area and the second recommended area.
  21. 如权利要求20所述的非易失性计算机可读存储介质,所述根据所述第一推荐区域和所述第二推荐区域得到所述当前待跟踪目标的步骤包括:The non-transitory computer readable storage medium of claim 20, wherein the step of obtaining the current target to be tracked according to the first recommended area and the second recommended area comprises:
    根据帧间相关性进行运动预测得到预期运动范围,根据所述预期运动范围筛选所述第一推荐区域和所述第二推荐区域得到所述当前待跟踪目标。The motion prediction is performed according to the inter-frame correlation to obtain an expected motion range, and the first recommended area and the second recommended area are filtered according to the expected motion range to obtain the current to-be-tracked target.
  22. 根据权利要求16至21任一项所述的非易失性计算机可读存储介质，所述对所述第一待跟踪目标通过基于深度神经网络的人脸特征提取得到第一人脸特征，并将所述第一人脸特征存入所述第一待跟踪目标对应的特征库的步骤包括：The non-transitory computer readable storage medium according to any one of claims 16 to 21, wherein the step of performing deep-neural-network-based facial feature extraction on the first target to be tracked to obtain the first facial feature, and storing the first facial feature into the feature library corresponding to the first target to be tracked comprises:
    对所述第一待跟踪目标通过深度神经网络进行人脸特征提取得到第一特征矢量;Performing facial feature extraction on the first to-be-tracked target through a deep neural network to obtain a first feature vector;
    所述对所述当前待跟踪目标通过基于深度神经网络的人脸特征提取得到第二人脸特征,根据所述第二人脸特征和所述特征库将所述当前待跟踪目标与第一待跟踪目标进行特征匹配,以从所述第一视频帧开始跟踪所述第一待跟踪目标的步骤包括:Determining, by the depth neural network based facial feature extraction, the second facial feature to the current to-be-tracked target, and according to the second facial feature and the feature database, the current to-be-tracked target and the first to-be-targeted Tracking the target for feature matching to track the first to-be-tracked target from the first video frame includes:
    对所述当前待跟踪目标通过深度神经网络进行人脸特征提取得到第二特征矢量;Performing facial feature extraction on the current target to be tracked through the depth neural network to obtain a second feature vector;
    计算所述第一特征矢量与第二特征矢量的欧氏距离,如果所述欧氏距离小于预设阈值,则确定所述第一待跟踪目标与当前待跟踪目标特征匹配成功。And calculating an Euclidean distance between the first feature vector and the second feature vector. If the Euclidean distance is less than a preset threshold, determining that the first to-be-tracked target matches the current target feature to be tracked successfully.
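The distance test in claim 22 can be sketched as below. The embedding network is left abstract: `embed_fn` is a placeholder for the patent's (unspecified) deep neural network, and the L2 normalization and default threshold are assumptions — the claim only requires comparing the Euclidean distance against a preset threshold.

```python
import numpy as np

def extract_feature(face_crop, embed_fn):
    """Run an embedding network over a face crop and L2-normalize the
    result, so distances between vectors are scale-independent."""
    v = np.asarray(embed_fn(face_crop), dtype=np.float64)
    return v / np.linalg.norm(v)

def features_match(first_vec, current_vec, threshold=1.0):
    """Claim 22's matching test: the match succeeds when the Euclidean
    distance between the two feature vectors is below the preset threshold."""
    dist = np.linalg.norm(np.asarray(first_vec) - np.asarray(current_vec))
    return bool(dist < threshold)
```

For unit-normalized vectors the Euclidean distance lies in [0, 2], so the threshold would typically be tuned on a validation set of same-identity and different-identity face pairs.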
PCT/CN2018/070090 2017-01-17 2018-01-03 Method and apparatus for tracking video target WO2018133666A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710032132.6A CN106845385A (en) 2017-01-17 2017-01-17 The method and apparatus of video frequency object tracking
CN201710032132.6 2017-01-17

Publications (1)

Publication Number Publication Date
WO2018133666A1 (en) 2018-07-26

Family

ID=59124734

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/070090 WO2018133666A1 (en) 2017-01-17 2018-01-03 Method and apparatus for tracking video target

Country Status (3)

Country Link
CN (1) CN106845385A (en)
TW (1) TWI677825B (en)
WO (1) WO2018133666A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI719409B (en) * 2019-02-23 2021-02-21 和碩聯合科技股份有限公司 Tracking system and tracking method thereof

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845385A (en) * 2017-01-17 2017-06-13 腾讯科技(上海)有限公司 The method and apparatus of video frequency object tracking
CN107341457A (en) * 2017-06-21 2017-11-10 北京小米移动软件有限公司 Method for detecting human face and device
CN107424273A (en) * 2017-07-28 2017-12-01 杭州宇泛智能科技有限公司 A kind of management method of unmanned supermarket
US10592786B2 (en) * 2017-08-14 2020-03-17 Huawei Technologies Co., Ltd. Generating labeled data for deep object tracking
CN108875480A (en) * 2017-08-15 2018-11-23 北京旷视科技有限公司 A kind of method for tracing of face characteristic information, apparatus and system
CN109426800A (en) * 2017-08-22 2019-03-05 北京图森未来科技有限公司 A kind of method for detecting lane lines and device
CN109426785A (en) * 2017-08-31 2019-03-05 杭州海康威视数字技术股份有限公司 A kind of human body target personal identification method and device
CN107644204B (en) * 2017-09-12 2020-11-10 南京凌深信息科技有限公司 Human body identification and tracking method for security system
CN107845105A (en) * 2017-10-24 2018-03-27 深圳市圆周率软件科技有限责任公司 A kind of monitoring method, smart machine and storage medium based on the linkage of panorama rifle ball
CN107944381B (en) * 2017-11-20 2020-06-16 深圳云天励飞技术有限公司 Face tracking method, face tracking device, terminal and storage medium
CN109918975A (en) 2017-12-13 2019-06-21 腾讯科技(深圳)有限公司 A kind of processing method of augmented reality, the method for Object identifying and terminal
CN108121931B (en) * 2017-12-18 2021-06-25 阿里巴巴(中国)有限公司 Two-dimensional code data processing method and device and mobile terminal
CN108304001A (en) * 2018-02-09 2018-07-20 成都新舟锐视科技有限公司 A kind of Face datection tracking, ball machine head rotation control method and ball machine
CN110400332A (en) * 2018-04-25 2019-11-01 杭州海康威视数字技术股份有限公司 A kind of target detection tracking method, device and computer equipment
CN108763532A (en) * 2018-05-31 2018-11-06 上海掌门科技有限公司 For pushed information, show the method and apparatus of information
CN109598211A (en) * 2018-11-16 2019-04-09 恒安嘉新(北京)科技股份公司 A kind of real-time dynamic human face recognition methods and system
TWI684907B (en) * 2018-11-28 2020-02-11 財團法人金屬工業研究發展中心 Digital image recognition method, electrical device, and computer program product
CN109816701A (en) * 2019-01-17 2019-05-28 北京市商汤科技开发有限公司 A kind of method for tracking target and device, storage medium
CN110210285A (en) * 2019-04-16 2019-09-06 浙江大华技术股份有限公司 Face tracking method, face tracking device and computer storage medium
CN110363150A (en) * 2019-07-16 2019-10-22 深圳市商汤科技有限公司 Data-updating method and device, electronic equipment and storage medium
CN110633627A (en) * 2019-08-01 2019-12-31 平安科技(深圳)有限公司 Method, device, computer equipment and storage medium for positioning object in video
CN110838133B (en) * 2019-09-27 2020-11-24 深圳云天励飞技术有限公司 Multi-target tracking method and related equipment
CN112084939A (en) * 2020-09-08 2020-12-15 深圳市润腾智慧科技有限公司 Image feature data management method and device, computer equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105787440A (en) * 2015-11-10 2016-07-20 深圳市商汤科技有限公司 Security protection management method and system based on face features and gait features
CN105931276A (en) * 2016-06-15 2016-09-07 广州尚云在线科技有限公司 Long-time face tracking method based on intelligent cloud platform of patrol robot
CN106096535A (en) * 2016-06-07 2016-11-09 广东顺德中山大学卡内基梅隆大学国际联合研究院 A kind of face verification method based on bilinearity associating CNN
CN106156702A (en) * 2015-04-01 2016-11-23 北京市商汤科技开发有限公司 Identity identifying method and equipment
CN106845385A (en) * 2017-01-17 2017-06-13 腾讯科技(上海)有限公司 The method and apparatus of video frequency object tracking

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8036425B2 (en) * 2008-06-26 2011-10-11 Billy Hou Neural network-controlled automatic tracking and recognizing system and method
US9124800B2 (en) * 2012-02-13 2015-09-01 Htc Corporation Auto burst image capture method applied to a mobile device, method for tracking an object applied to a mobile device, and related mobile device

Also Published As

Publication number Publication date
TW201828158A (en) 2018-08-01
TWI677825B (en) 2019-11-21
CN106845385A (en) 2017-06-13

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18742031

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18742031

Country of ref document: EP

Kind code of ref document: A1