WO2020151156A1 - Video stream playing method, system, computer device and readable storage medium - Google Patents

Video stream playing method, system, computer device and readable storage medium

Info

Publication number
WO2020151156A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
frame
feature points
image
video stream
Prior art date
Application number
PCT/CN2019/090027
Other languages
English (en)
French (fr)
Inventor
翟彬彬
赵有志
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司
Publication of WO2020151156A1 publication Critical patent/WO2020151156A1/zh

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70: Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/74: Browsing; Visualisation therefor
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00: Geometric image transformations in the plane of the image
    • G06T 3/40: Scaling of whole images or parts thereof, e.g. expanding or contracting

Definitions

  • This application relates to the technical field of video data processing, and in particular to a method, system, computer device and non-volatile readable storage medium for playing a video stream.
  • Face tracking is a face detection technology that tracks the position and features of a face in the video collected by a camera, based on the person's facial feature information. After reading the video stream collected by the camera, the face and facial feature points of each frame of the video stream must be detected, and the face frame and facial feature points drawn.
  • Traditional face and facial feature point detection takes too much time, so the video display may freeze or even fail to display.
  • The existing improvement approach is to speed up detection by optimizing the face or feature point detection algorithm so as to avoid video display freezes, but improving the algorithm is difficult and the effect is limited.
  • In view of this, the present application provides a video stream playback method, system, computer device, and non-volatile readable storage medium that can avoid video display freezes.
  • An implementation manner of the present application provides a method for playing a video stream. The method includes: receiving the video stream collected by a video collection device; obtaining each image frame in the video stream and reducing each image frame according to a preset ratio; performing face detection on the reduced image frames and filtering out the face image frames; drawing the face feature points and face frame of each face image frame and mapping them onto the original, un-reduced image frames; and
  • playing the video stream with the face feature points and the face frame added according to a preset frame frequency.
  • An embodiment of the present application provides a video stream playback system. The system includes:
  • a receiving module, used to receive the video stream collected by a video collection device;
  • a processing module, configured to obtain each image frame in the video stream and reduce each image frame according to a preset ratio;
  • a detection module, used to perform face detection on the reduced image frames and filter out the face image frames;
  • a mapping module, for drawing the face feature points and face frame of each of the face image frames and mapping the face feature points and the face frame onto the original, un-reduced image frame; and
  • a playing module, used to play the video stream with the face feature points and the face frame added according to a preset frame frequency.
  • An embodiment of the present application provides a computer device.
  • The computer device includes a processor and a memory.
  • The memory stores a number of computer-readable instructions.
  • The processor is used to execute the computer-readable instructions stored in the memory to implement the steps of the video stream playback method described above.
  • An embodiment of the present application provides a non-volatile readable storage medium having computer-readable instructions stored thereon; when the computer-readable instructions are executed by a processor, the steps of the video stream playback method described above are implemented.
  • In the above video stream playback method, system, computer device and non-volatile readable storage medium, the image frames of the video stream are reduced according to a preset ratio; the reduced image data is smaller, which lowers the amount of computation and increases the speed of face detection.
  • The face feature point information is then mapped back onto the original image frame for playback.
  • If drawing is not finished in time, the face and feature point information of the previous frame can be used directly, which further avoids faces and feature points lagging behind during video playback. The method also makes full use of the parallel computing capability of the detection device, using multithreading to implement video stream face detection and further shortening the detection time.
  • Fig. 1 is a flowchart of the steps of a video stream playing method in an embodiment of the present application.
  • Fig. 2 is a functional block diagram of a video stream playing system in an embodiment of the application.
  • Fig. 3 is a schematic diagram of a computer device in an embodiment of the application.
  • the video stream playing method of this application is applied to one or more computer devices.
  • The computer device is a device that can automatically perform numerical calculation and/or information processing in accordance with preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), embedded devices, and so on.
  • the computer device may be a computing device such as a desktop computer, a notebook computer, a tablet computer, and a server.
  • the computer device can interact with the user through a keyboard, a mouse, a remote control, a touch panel, or a voice control device.
  • Fig. 1 is a flowchart of the steps of a preferred embodiment of the video stream playing method of the present application. According to different needs, the order of the steps in the flowchart can be changed, and some steps can be omitted.
  • the video stream playback method specifically includes the following steps.
  • Step S11 Receive the video stream collected by the video collection device.
  • The video capture device may be a camera, a video camera, an LD video player, or the like.
  • the video capture device can be installed at a specific location in an area where video capture is required.
  • The video capture device can communicate with the computer device 1 (shown in FIG. 3) via a network.
  • When started, the video capture device can capture a video stream and send it to the computer device 1 for processing.
  • Step S12 Obtain each image frame in the video stream, and perform reduction processing on each image frame according to a preset ratio.
  • The definition of the video stream collected by the video capture device is generally high (for example, high-definition or ultra-high-definition), so each image frame in the video stream carries a large amount of image data, and
  • more detection time is required.
  • By shrinking each image frame in the video stream according to a preset ratio, the reduced frame data is relatively small and the amount of computation during face detection is reduced, so face detection takes less time.
  • The preset ratio preferably ensures that the face area can still be clearly distinguished in the reduced image, so that detection of the face image is not affected.
  • The image to be detected can be reduced to a size of M*N, where M and N are numbers of pixels whose ratio is consistent with the aspect ratio of the original image frame. For example, if the original image frame is 1600*1200 pixels and the preset ratio is one tenth, the reduced image is 160*120 pixels.
  • An image of 160*120 both keeps the face in the image easy to distinguish and minimizes the amount of calculation when detecting the face image.
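  • The reduction step above can be sketched as follows (a minimal nearest-neighbour downscale; the function names and the use of NumPy are illustrative and not part of the application):

```python
import numpy as np

def reduced_size(width, height, ratio):
    """Target size M*N for the shrunken frame, keeping the original aspect ratio."""
    return int(width * ratio), int(height * ratio)

def shrink_frame(frame, ratio):
    """Nearest-neighbour downscale of an H*W(*C) frame by a preset ratio."""
    h, w = frame.shape[:2]
    new_w, new_h = reduced_size(w, h, ratio)
    ys = (np.arange(new_h) / ratio).astype(int)  # source row for each target row
    xs = (np.arange(new_w) / ratio).astype(int)  # source column for each target column
    return frame[ys][:, xs]

frame = np.zeros((1200, 1600), dtype=np.uint8)   # a 1600*1200 grayscale frame
small = shrink_frame(frame, 0.1)                 # preset ratio of one tenth
print(small.shape)  # (120, 160) -> the 160*120 example from the text
```

In practice an interpolating resize (e.g. bilinear) would likely be used instead of plain subsampling, but the data-reduction effect is the same.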
  • Multiple threads can be used to perform the face detection steps on the video stream in parallel, for example: establishing a first thread and using the first thread to obtain each image frame in the video stream, and establishing a second thread and using the second thread to reduce each image frame according to the preset ratio.
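  • The two-thread arrangement just described can be sketched as a producer-consumer pipeline. The queue-based structure and the toy 1/10 subsample below are illustrative assumptions, not the application's implementation:

```python
import queue
import threading

def run_pipeline(frames):
    """Thread 1 pulls frames from the stream; thread 2 shrinks each one."""
    raw_q, small_q = queue.Queue(), queue.Queue()

    def grab():                       # first thread: obtain each image frame
        for f in frames:
            raw_q.put(f)
        raw_q.put(None)               # sentinel: stream finished

    def shrink():                     # second thread: reduce by the preset ratio
        while (f := raw_q.get()) is not None:
            # toy 1/10 subsample standing in for the real reduction step
            small_q.put([row[::10] for row in f[::10]])
        small_q.put(None)

    t1 = threading.Thread(target=grab)
    t2 = threading.Thread(target=shrink)
    t1.start(); t2.start(); t1.join(); t2.join()

    out = []
    while (f := small_q.get()) is not None:
        out.append(f)
    return out
```

A real pipeline would bound the queues and keep running until the stream closes; the sentinel `None` is just a simple shutdown signal.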
  • Step S13 Perform face detection on the reduced image frames and filter out face image frames.
  • The image frames of the video stream received in the foregoing steps may include frames without human faces; that is, not every image frame in the video stream contains image information that needs to be processed.
  • Therefore, face detection is performed on each image frame in the video stream to filter out the frames that contain human faces.
  • A face recognition model may be established and trained on a preset face image sample library, and the model used to perform face detection on the reduced image frames to filter out the face image frames. Specifically, a face image sample library is first constructed and a convolutional neural network for face detection is established.
  • The face image sample library contains face images of multiple people, and each person's face images can cover multiple angles.
  • Each angle can have multiple pictures. The face images in the sample library are input to the convolutional neural network, and the network is trained starting from its default parameters.
  • Based on the intermediate training results, parameters such as the initial weights, the learning rate and the number of iterations are continuously adjusted until the optimal network parameters are obtained.
  • The convolutional neural network with the optimal network parameters can then be used as the face recognition model.
  • The optimal network parameters of the convolutional neural network are parameters that meet preset requirements, and the preset requirements can be set according to actual use.
  • the second thread may be used to implement face detection on image frames that have undergone reduction processing and to filter out face image frames.
  • Step S14 Draw the face feature points and face frame of each of the face image frames, and map the face feature points and the face frame to the original image frame that has not been reduced.
  • The facial feature points can include the eyes, nose, mouth, chin and so on, and can be obtained from the image frame information through integral projection or a face alignment algorithm. The number of feature points can be determined according to the selected algorithm and actual requirements. Face feature points can be used for face recognition, to distinguish different faces in the image. Since the eyes are the most prominent facial features, they can be located accurately first; the other facial organs, such as the eyebrows, mouth and nose, can then be positioned more accurately based on their potential distribution relationship to the eyes.
  • The face frame is the border of a rectangular area in the face image that includes all of the face feature points.
  • The face frame in the current image frame can be determined according to the positions of the face feature points:
  • a rectangular area containing all of these feature points is calculated from their positions, and the border of this rectangle is the face detection frame for the current image frame.
  • The size of the face detection frame can be chosen according to actual needs, but it preferably includes all of the face feature points.
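  • Computing such a frame is a bounding-box calculation over the feature points. The sketch below (function name, margin parameter and sample points are illustrative assumptions) shows the idea:

```python
def face_frame(points, margin=0):
    """Smallest rectangle (x_min, y_min, x_max, y_max) enclosing all
    facial feature points, optionally padded by a margin."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs) - margin, min(ys) - margin,
            max(xs) + margin, max(ys) + margin)

# hypothetical eye/eye/nose/mouth points in a reduced frame
pts = [(40, 50), (60, 48), (50, 70), (45, 85)]
print(face_frame(pts, margin=5))  # (35, 43, 65, 90)
```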
  • the face alignment algorithm may be ASM algorithm, AAM algorithm, STASM algorithm, etc.
  • With integral projection, the feature points of the face image are located from the peaks or troughs produced by the different projection directions.
  • Integral projection is divided into vertical projection and horizontal projection.
  • Let f(x, y) denote the gray value of the image at (x, y).
  • The horizontal integral projection M_h(y) and the vertical integral projection M_v(x) over the image region [y1, y2] x [x1, x2] are expressed as: M_h(y) = ∫_{x1}^{x2} f(x, y) dx and M_v(x) = ∫_{y1}^{y2} f(x, y) dy.
  • The horizontal integral projection accumulates the gray values of all pixels in a row,
  • and the vertical integral projection accumulates the gray values of all pixels in a column.
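  • On a discrete image the integrals become row and column sums. A minimal sketch (NumPy-based, names illustrative):

```python
import numpy as np

def horizontal_projection(img, y1, y2, x1, x2):
    """M_h(y): sum of gray values along each row of the [y1,y2]x[x1,x2] region."""
    return img[y1:y2, x1:x2].sum(axis=1)

def vertical_projection(img, y1, y2, x1, x2):
    """M_v(x): sum of gray values along each column of the region."""
    return img[y1:y2, x1:x2].sum(axis=0)

img = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])
print(horizontal_projection(img, 0, 3, 0, 3))  # [ 6 15 24]
print(vertical_projection(img, 0, 3, 0, 3))    # [12 15 18]
```

The minima of `horizontal_projection` along y are what the text reads off as the eyebrow, eye, nose and mouth positions; dark horizontal features produce low row sums in a face region.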
  • In the horizontal projection, the first minimum point corresponds to the position of the eyebrows on the vertical axis, denoted y_brow;
  • the second minimum point corresponds to the position of the eyes on the vertical axis, denoted y_eye;
  • the third minimum point corresponds to the position of the nose on the vertical axis, denoted y_nose;
  • and the fourth minimum point corresponds to the position of the mouth on the vertical axis, denoted y_mouth.
  • Because the face feature points and face frame are drawn on the reduced image frame, after drawing is completed they need to be mapped back onto the original-size image frame, so that when a face appears in the played video, the drawn face frame and feature points are displayed at the same time.
  • The face feature points and face frame can be mapped according to the reduction ratio applied to the image.
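  • Mapping by the reduction ratio amounts to scaling each coordinate by its reciprocal. A minimal sketch (names and the tuple box format are illustrative assumptions):

```python
def map_to_original(points, box, ratio):
    """Scale feature points and the face frame from the shrunken frame
    back to the original, un-reduced frame."""
    inv = 1.0 / ratio
    mapped_pts = [(round(x * inv), round(y * inv)) for x, y in points]
    mapped_box = tuple(round(v * inv) for v in box)
    return mapped_pts, mapped_box

# a point and box found in a 160*120 frame, mapped back to 1600*1200
pts, box = map_to_original([(16, 12)], (10, 8, 20, 18), ratio=0.1)
print(pts, box)  # [(160, 120)] (100, 80, 200, 180)
```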
  • For example, when performing face detection on the second frame of the video stream, the second frame is first reduced. If the reduced second frame is detected to contain a human face, the face feature points and face frame are drawn on the reduced second frame and then mapped onto the original second image frame. If the second frame is determined not to contain a face image, the subsequent drawing of face feature points and face frame is not performed.
  • the second thread can also be used to draw the face feature points and face frame of each face image frame, and map the face feature points and the face frame To the original image frame that has not been reduced.
  • The principle of video playback is generally to display several consecutive pictures per second (for example, 25 pictures),
  • so it may happen that an image frame needs to be played before its face feature points and face frame have been drawn, or before drawing has completed.
  • In that case, the face feature points and face frame of the previous frame can be mapped directly onto the image frame, avoiding a situation in which detecting and drawing take too long and the video display freezes or fails to display.
  • By estimating the time taken to draw the face feature points and face frame of one image and comparing it with the number of frames per second, the required detection interval (how many frames between detections) can be obtained. For example, if the drawing time is calculated to be 1/9 second and the playback rate is 25 frames/s, detection can be performed once every 3 frames.
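  • The interval calculation in the example works out as follows (a one-line sketch; the function name is illustrative):

```python
import math

def detection_interval(draw_time_s, fps):
    """Detect once every N frames so drawing can keep up with playback:
    N = ceil(drawing time * frames per second), at least 1."""
    return max(1, math.ceil(draw_time_s * fps))

# 1/9 s drawing time at 25 frames/s -> (1/9)*25 = 2.78 -> detect every 3 frames
print(detection_interval(1 / 9, 25))  # 3
```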
  • Step S15 Play the video stream with the face feature points and the face frame added according to a preset frame frequency.
  • The preset frequency can be set according to actual usage requirements, for example 25 frames per second. A third thread can be established and used to read each image frame of the video stream with the face feature points and the face frame added, and to play each image frame at the preset frame frequency.
  • the above video stream playback method reduces the image frames of the video stream according to a preset ratio.
  • The reduced image data is smaller, which lowers the amount of computation during face detection and increases its speed.
  • When drawing is not finished in time, the face and feature point information of the previous frame can be used directly, which further avoids faces and feature points lagging during video playback. The method also makes full use of the parallel computing capability of the detection device, using multithreading to implement video stream face detection and further shortening the detection time.
  • Fig. 2 is a diagram of functional modules of a preferred embodiment of the video stream playing system of this application.
  • the video stream playing system 10 may include a receiving module 101, a processing module 102, a detecting module 103, a mapping module 104, and a playing module 105.
  • the receiving module 101 is used to receive a video stream collected by a video collection device.
  • the video capture device may be a camera, a video camera, an LD video machine, or the like.
  • the video capture device can be installed at a specific location in an area where video capture is required.
  • the video capture device can communicate with the computer device 1 via a network. When the video capture device is started, the video capture device can capture a video stream and send the video stream to the computer device 1 for processing.
  • the processing module 102 is configured to obtain each image frame in the video stream, and perform reduction processing on each image frame according to a preset ratio.
  • The definition of the video stream collected by the video capture device is generally high (for example, high-definition or ultra-high-definition), so each image frame in the video stream carries a large amount of image data, and
  • more detection time is required.
  • Each image frame in the video stream is therefore reduced according to a preset ratio.
  • The reduced frame data is relatively small, which lowers the amount of computation during face detection, so face detection takes less time.
  • The preset ratio preferably ensures that the face area can still be clearly distinguished in the reduced image, so that detection of the face image is not affected.
  • The image to be detected can be reduced to a size of M*N, where M and N are numbers of pixels whose ratio is consistent with the aspect ratio of the original image frame. For example, if the original image frame is 1600*1200 pixels and the preset ratio is one tenth, the reduced image is 160*120 pixels.
  • An image of 160*120 both keeps the face in the image easy to distinguish and minimizes the amount of calculation when detecting the face image.
  • The video stream playback system 10 may use multiple threads to execute the face detection steps in parallel, for example: establishing a first thread and using the first thread to obtain each image frame in the video stream, and establishing a second thread and using the second thread to reduce each image frame according to the preset ratio.
  • the detection module 103 is used to perform face detection on the image frames that have undergone reduction processing and filter out face image frames.
  • The image frames of the video stream received by the receiving module 101 may include frames without human faces; that is, not every image frame in the video stream contains image information that needs face detection. Face detection is therefore performed on each image frame in the video stream to filter out the frames containing human faces.
  • The detection module 103 may establish and train a face recognition model on a preset face image sample library, and use the model to perform face detection on the reduced image frames to filter out the face image frames. Specifically, a face image sample library is first constructed and a convolutional neural network for face detection is established.
  • The face image sample library contains face images of multiple people, and each person's face images can cover multiple angles.
  • Each angle can have multiple pictures. The face images in the sample library are input to the convolutional neural network, and the network is trained starting from its default parameters.
  • Based on the intermediate training results, parameters such as the initial weights, the learning rate and the number of iterations are continuously adjusted until the optimal network parameters are obtained.
  • The convolutional neural network with the optimal network parameters can then be used as the face recognition model.
  • The optimal network parameters of the convolutional neural network are parameters that meet preset requirements, and the preset requirements can be set according to actual use.
  • the detection module 103 may use the second thread to perform face detection on the reduced image frames and filter out the face image frames.
  • The mapping module 104 is used to draw the face feature points and the face frame of each of the face image frames, and to map the face feature points and the face frame onto the original, un-reduced image frame.
  • The facial feature points can include the eyes, nose, mouth, chin and so on, and can be obtained from the image frame information through integral projection or a face alignment algorithm. The number of feature points can be determined according to the selected algorithm and actual requirements. Face feature points can be used for face recognition, to distinguish different faces in the image. Since the eyes are the most prominent facial features, they can be located accurately first; the other facial organs, such as the eyebrows, mouth and nose, can then be positioned more accurately based on their potential distribution relationship to the eyes.
  • The face frame is the border of a rectangular area in the face image that includes all of the face feature points.
  • The face frame in the current image frame can be determined according to the positions of the face feature points:
  • a rectangular area containing all of these feature points is calculated from their positions, and the border of this rectangle is the face detection frame for the current image frame.
  • The size of the face detection frame can be chosen according to actual needs, but it preferably includes all of the face feature points.
  • the face alignment algorithm may be ASM algorithm, AAM algorithm, STASM algorithm, etc.
  • With integral projection, the feature points of the face image are located from the peaks or troughs produced by the different projection directions.
  • Integral projection is divided into vertical projection and horizontal projection.
  • Let f(x, y) denote the gray value of the image at (x, y).
  • The horizontal integral projection M_h(y) and the vertical integral projection M_v(x) over the image region [y1, y2] x [x1, x2] are expressed as: M_h(y) = ∫_{x1}^{x2} f(x, y) dx and M_v(x) = ∫_{y1}^{y2} f(x, y) dy.
  • The horizontal integral projection accumulates the gray values of all pixels in a row,
  • and the vertical integral projection accumulates the gray values of all pixels in a column.
  • In the horizontal projection, the first minimum point corresponds to the position of the eyebrows on the vertical axis, denoted y_brow;
  • the second minimum point corresponds to the position of the eyes on the vertical axis, denoted y_eye;
  • the third minimum point corresponds to the position of the nose on the vertical axis, denoted y_nose;
  • and the fourth minimum point corresponds to the position of the mouth on the vertical axis, denoted y_mouth.
  • Because the face feature points and face frame are drawn on the reduced image frame, after drawing is completed the mapping module 104 needs to map the face feature points and face frame back onto the original-size image frame, so that when a face appears in the played video, the drawn face frame and feature points are displayed at the same time.
  • The mapping module 104 may map the face feature points and face frame according to the reduction ratio applied to the image.
  • For example, when performing face detection on the second frame of the video stream, the processing module 102 reduces the second frame. If the detection module 103 detects that the reduced second frame includes a human face, the mapping module 104 draws the face feature points and face frame on the reduced second frame and maps them onto the original second image frame. If the detection module 103 determines that the second frame does not contain a face image, the subsequent drawing of face feature points and face frame is not performed.
  • The mapping module 104 can also use the second thread to draw the face feature points and face frame of each face image frame, and to map the face feature points and the face frame onto the original, un-reduced image frame.
  • The principle of video playback is generally to display several consecutive pictures per second (for example, 25 pictures),
  • so it may happen that an image frame needs to be played before its face feature points and face frame have been drawn, or before drawing has completed.
  • In that case, the face feature points and face frame of the previous frame can be mapped directly onto the image frame, avoiding a situation in which detecting and drawing take too long and the video display freezes or fails to display.
  • The mapping module 104 can also avoid video display freezes as follows: judge whether the face feature points and face frame of the face image frame currently to be played have been drawn; if they have been drawn, map the face feature points and face frame onto the original, un-reduced face image frame; if they have not been drawn, map the face feature points and face frame of the previous (or an earlier) face image frame onto the original, un-reduced face image frame.
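  • That fallback decision can be sketched as a small cache of the last finished overlay (the class and data shapes below are illustrative assumptions, not the module's actual implementation):

```python
class OverlayCache:
    """Keeps the last finished (feature_points, face_frame) overlay so an
    undrawn frame can reuse it instead of stalling playback."""

    def __init__(self):
        self.last = None

    def overlay_for(self, drawn, overlay):
        if drawn:                 # this frame's points/frame finished in time
            self.last = overlay
            return overlay
        return self.last          # reuse the previous frame's overlay

cache = OverlayCache()
a = cache.overlay_for(True, ([(40, 50)], (35, 43, 65, 90)))
b = cache.overlay_for(False, None)   # detection not finished for this frame
print(b == a)  # True
```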
  • The mapping module 104 may estimate the time taken to draw the face feature points and face frame of one face image and compare the drawing time with the number of frames per second to obtain the required interval for face feature point detection, i.e. how many frames between detections. For example, if the drawing time is calculated to be 1/9 second and the playback rate is 25 frames/s, detection can be performed once every 3 frames.
  • the playing module 105 is configured to play the video stream added with the face feature points and the face frame according to a preset frame frequency.
  • the preset frequency may be set according to actual usage requirements, for example, the preset frequency is 25 frames per second.
  • The playing module 105 may use the third thread to read each image frame of the video stream with the face feature points and the face frame added, and play each image frame at the preset frame frequency.
  • the above-mentioned video stream playback system reduces the image frame of the video stream according to a preset ratio.
  • The reduced image data is smaller, which lowers the amount of computation during face detection and increases its speed.
  • When drawing is not finished in time, the face and feature point information of the previous frame can be used directly, which further avoids faces and feature points lagging during video playback. The system also makes full use of the parallel computing capability of the detection device, using multithreading to implement video stream face detection and further shortening the detection time.
  • FIG. 3 is a schematic diagram of a preferred embodiment of the computer device of this application.
  • the computer device 1 includes a memory 20, a processor 30, and computer readable instructions 40 stored in the memory 20 and running on the processor 30, such as a video streaming program.
  • When the processor 30 executes the computer-readable instructions 40, the steps in the above embodiment of the video stream playback method are implemented, such as steps S11 to S15 shown in FIG. 1.
  • Alternatively, when the processor 30 executes the computer-readable instructions 40, the functions of the modules in the foregoing embodiment of the video stream playback system are implemented, such as modules 101 to 105 in FIG. 2.
  • The computer-readable instructions 40 may be divided into one or more modules/units, which are stored in the memory 20 and executed by the processor 30 to complete this application.
  • The one or more modules/units may be a series of computer-readable instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer-readable instructions 40 in the computer device 1.
  • the computer-readable instruction 40 may be divided into the receiving module 101, the processing module 102, the detecting module 103, the mapping module 104, and the playing module 105 in FIG. 2. Refer to the second embodiment for the specific functions of each module.
  • the computer device 1 may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • The schematic diagram is only an example of the computer device 1 and does not constitute a limitation on it; the device may include more or fewer components than shown, combine certain components, or use different components. For example, the computer device 1 may also include input and output devices, network access devices, buses, and so on.
  • The processor 30 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • The general-purpose processor may be a microprocessor, or the processor 30 may be any conventional processor.
  • The processor 30 is the control center of the computer device 1 and connects the various parts of the entire computer device 1 through various interfaces and lines.
  • the memory 20 may be used to store the computer-readable instructions 40 and/or modules/units, and the processor 30 can run or execute the computer-readable instructions and/or modules/units stored in the memory 20, and
  • the data stored in the memory 20 is called to realize various functions of the computer device 1.
  • the memory 20 may mainly include a program storage area and a data storage area.
  • the program storage area may store an operating system and the application programs required by at least one function (such as a sound playback function, an image playback function, etc.);
  • the data storage area may store data created according to the use of the computer device 1 (such as audio data, a phone book, etc.).
  • the memory 20 may include a high-speed random access memory, and may also include a non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • if the integrated modules/units of the computer device 1 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the above method embodiments of this application may also be completed by instructing the relevant hardware through computer-readable instructions.
  • the computer-readable instructions may be stored in a non-volatile readable storage medium; when the computer-readable instructions are executed by a processor, the steps of the foregoing method embodiments can be implemented.
  • the computer-readable instructions include computer-readable instruction code, which may be in the form of source code, object code, an executable file, or some intermediate form.
  • the computer-readable medium may include: any entity or device capable of carrying the computer-readable instruction code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium.
  • each unit in each embodiment of the present application may be integrated in the same processing unit, or each unit may exist alone physically, or two or more units may be integrated in the same unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware, or in the form of hardware plus software functional modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Processing (AREA)
  • Processing Or Creating Images (AREA)

Abstract

本申请提供一种视频流播放方法、系统、计算机装置及非易失性可读存储介质。所述视频流播放方法包括:接收视频采集设备所采集的视频流;获取所述视频流中每一图像帧,并对每一所述图像帧按预设比例进行缩小处理;对经过缩小处理的图像帧进行人脸检测并筛选出人脸图像帧;绘制每一所述人脸图像帧的人脸特征点及人脸框,并将所述人脸特征点及所述人脸框映射到未经过缩小处理的原图像帧中;将添加有所述人脸特征点及所述人脸框的视频流按照预设帧频率进行播放。本申请涉及人脸检测技术,通过对视频流的图像帧进行按比例缩小处理,降低人脸及面部特征点检测耗时,可避免出现视频播放卡顿现象。

Description

视频流播放方法、系统、计算机装置及可读存储介质
本申请要求于2019年01月25日提交中国专利局,申请号为201910075210.X,发明名称为“视频流播放方法、系统、计算机装置及可读存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及视频数据处理技术领域,尤其涉及一种视频流播放方法、系统、计算机装置及非易失性可读存储介质。
背景技术
人脸追踪技术是基于人的面部特征信息,对摄像机采集的视频中人脸位置和面部特征进行追踪的一种人脸检测技术。在读取摄像头采集的视频流后,需对视频流的每帧图像进行人脸和面部特征点检测,并绘制人脸框和面部特征点,传统的人脸及面部特征点检测耗时过大,可能会出现视频显示卡顿,甚至无法显示的现象。现有的改进做法是通过优化人脸或特征点检测算法来提高检测速度,避免视频显示卡顿,但算法的提升较为困难,效果有限。
发明内容
鉴于上述,本申请提供一种视频流播放方法、系统、计算机装置及非易失性可读存储介质,其可实现避免视频显示卡顿现象。
本申请一实施方式提供一种视频流播放方法,所述方法包括:
接收视频采集设备所采集的视频流;
获取所述视频流中每一图像帧,并对每一所述图像帧按预设比例进行缩小处理;
对经过缩小处理的图像帧进行人脸检测并筛选出人脸图像帧;
绘制每一所述人脸图像帧的人脸特征点及人脸框,并将所述人脸特征点及所述人脸框映射到未经过缩小处理的原图像帧中;及
将添加有所述人脸特征点及所述人脸框的视频流按照预设帧频率进行播放。
本申请一实施方式提供一种视频流播放系统,所述系统包括:
接收模块,用于接收视频采集设备所采集的视频流;
处理模块,用于获取所述视频流中每一图像帧,并对每一所述图像帧按预设比例进行缩小处理;
检测模块,用于对经过缩小处理的图像帧进行人脸检测并筛选出人脸图像帧;
映射模块,用于绘制每一所述人脸图像帧的人脸特征点及人脸框,并将所述人脸特征点及所述人脸框映射到未经过缩小处理的原图像帧中;及
播放模块,用于将添加有所述人脸特征点及所述人脸框的视频流按照预设帧频率进行播放。
本申请一实施方式提供一种计算机装置,所述计算机装置包括处理器及存储器,所述存储器上存储有若干计算机可读指令,所述处理器用于执行存储器中存储的计算机可读指令时实现如前面所述的视频流播放方法的步骤。
本申请一实施方式提供一种非易失性可读存储介质,其上存储有计算机可读指令,所述计算机可读指令被处理器执行时实现如前面所述的视频流播放方法的步骤。
上述视频流播放方法、系统、计算机装置及非易失性可读存储介质,通过将视频流的图像帧按预设比例进行缩小处理,缩小后的图像数据较小,在人脸检测时能减少数据计算量,提高人脸检测速度,检测完毕后再将人脸特征点信息映射到原始图像帧中进行播放,在当前图像帧检测忙碌时,可直接用上一帧的人脸和特征点信息,可进一步避免视频播放时出现人脸和特征点拖拉现象,且充分发挥检测设备的并行计算能力,采用多线程来实现视频流人脸检测功能,进一步缩短人脸检测时间。
附图说明
为了更清楚地说明本申请实施方式的技术方案,下面将对实施方式描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图是本申请的一些实施方式,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本申请一实施例中视频流播放方法的步骤流程图。
图2为本申请一实施例中视频流播放系统的功能模块图。
图3为本申请一实施例中计算机装置示意图。
具体实施方式
为了能够更清楚地理解本申请的上述目的、特征和优点,下面结合附图和具体实施方式对本申请进行详细描述。需要说明的是,在不冲突的情况下,本申请的实施方式及实施方式中的特征可以相互组合。
在下面的描述中阐述了很多具体细节以便于充分理解本申请,所描述的实施方式仅仅是本申请一部分实施方式,而不是全部的实施方式。基于本申请中的实施方式,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施方式,都属于本申请保护的范围。
除非另有定义,本文所使用的所有的技术和科学术语与属于本申请的技术领域的技术人员通常理解的含义相同。本文中在本申请的说明书中所使用的术语只是为了描述具体的实施方式的目的,不是旨在于限制本申请。
优选地,本申请的视频流播放方法应用在一个或者多个计算机装置中。所述计算机装置是一种能够按照事先设定或存储的指令,自动进行数值计算和/或信息处理的设备,其硬件包括但不限于微处理器、专用集成电路(Application Specific Integrated Circuit,ASIC)、可编程门阵列(Field-Programmable Gate Array,FPGA)、数字处理器(Digital Signal Processor,DSP)、嵌入式设备等。
所述计算机装置可以是桌上型计算机、笔记本电脑、平板电脑、服务器等计算设备。所述计算机装置可以与用户通过键盘、鼠标、遥控器、触摸板或声控设备等方式进行人机交互。
实施例一:
图1是本申请视频流播放方法较佳实施例的步骤流程图。根据不同的需求,所述流程图中步骤的顺序可以改变,某些步骤可以省略。
参阅图1所示,所述视频流播放方法具体包括以下步骤。
步骤S11、接收视频采集设备所采集的视频流。
在一实施方式中,所述视频采集设备可以是摄像头、摄像机、LD视频机等。所述视频采集设备可以安装在需要进行视频采集的区域特定位置。所述视频采集设备可以通过网络与计算机装置1(如图3所示)进行通信,当启动所述视频采集设备时,所述视频采集设备可以采集视频流,并将所述视频流发送至所述计算机装置1进行处理。
步骤S12、获取所述视频流中每一图像帧,并对每一所述图像帧按预设比例进行缩小处理。
在一实施方式中,视频采集设备所采集的视频流清晰度一般较高(比如高清、超清),进而使得视频流中的每一图像帧具有较大的图像数据,在进行人脸检测时将需要更多的检测时间。通过对视频流中每一图像帧按预设比例进行缩小处理,缩小后的图像帧数据比较小,在人脸检测时能减少数据计算量,可以使得人脸检测将花费更少的检测时间。所述预设比例优选需要保证缩小后的图像能清楚分辨出人脸区域,不影响人脸图像的检测。例如可以将待检测图像缩小到M*N大小,所述M、N为像素点数,M与N的值与所述原图像帧长宽比一致,例如原始图片帧的像素比为1600*1200,所述预设比例为十分之一,缩小后的图像像素比为160*120,160*120大小的图像既能保证图像中的人脸比较容易分辨,又能最大限度的减少检测人脸图像时的计算量。
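上述按预设比例缩小图像帧的步骤,可用如下 Python 草图示意(仅为示意性示例,shrink_frame 为本文假设的函数名;这里以 numpy 数组表示图像帧,用步长抽样模拟等比例缩小,实际实现也可改用 OpenCV 等库的插值缩放):

```python
import numpy as np

def shrink_frame(frame, ratio):
    """按预设比例 ratio(如 1/10)对图像帧做等比例缩小(步长抽样草图)。"""
    step = round(1 / ratio)          # 例如 ratio=1/10 时 step=10
    return frame[::step, ::step]     # 行、列各按 step 抽样,保持原长宽比

# 示例:1600*1200(宽*高)的原始帧,按十分之一缩小
frame = np.zeros((1200, 1600, 3), dtype=np.uint8)   # 高 x 宽 x 通道
small = shrink_frame(frame, 1 / 10)
print(small.shape[:2])   # (120, 160),即 160*120 的缩小帧
```

以 1600*1200 的原始帧、十分之一的预设比例为例,缩小后即得到文中所述的 160*120 图像帧。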
在一实施方式中,为了进一步缩短人脸检测时间,可以利用多线程并行执行视频流人脸检测动作,比如:建立第一线程并利用所述第一线程获取所述视频流中每一图像帧,建立第二线程并利用所述第二线程对每一所述图像帧按所述预设比例进行缩小处理。
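文中“第一线程取帧、第二线程缩小”的多线程流水线,可用如下 Python 草图示意(仅为示意性示例,capture_thread、shrink_thread 均为本文假设的名称;以 (宽, 高) 元组模拟图像帧,用队列在线程间传递数据):

```python
import queue
import threading

raw_frames = queue.Queue()       # 第一线程 -> 第二线程
small_frames = queue.Queue()     # 第二线程的输出

def capture_thread(stream):
    """第一线程:逐帧读取视频流放入队列,None 表示流结束。"""
    for frame in stream:
        raw_frames.put(frame)
    raw_frames.put(None)

def shrink_thread(ratio):
    """第二线程:按预设比例缩小每一帧(此处以尺寸换算模拟缩放)。"""
    while True:
        frame = raw_frames.get()
        if frame is None:
            break
        w, h = frame
        small_frames.put((round(w * ratio), round(h * ratio)))

# 示例:用 (宽, 高) 元组模拟三帧 1600*1200 的视频流
stream = [(1600, 1200)] * 3
t1 = threading.Thread(target=capture_thread, args=(stream,))
t2 = threading.Thread(target=shrink_thread, args=(1 / 10,))
t1.start(); t2.start(); t1.join(); t2.join()
print(small_frames.qsize())   # 3,三帧均已缩小
```

取帧与缩小在两个线程中并行进行,队列起到缓冲作用,可避免慢的一方阻塞快的一方。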
步骤S13,对经过缩小处理的图像帧进行人脸检测并筛选出人脸图像帧。
在一实施方式中,前述步骤接收到的视频流的各图像帧中可能包括不存在人脸的图像帧,即所述视频流中的各图像帧所包含的图像信息并不是全部都需要进行人脸检测,需对所述视频流中的每一图像帧进行人脸检测,以从所述视频流中筛选出包含人脸的图像帧。
在一实施方式中,可以根据预设人脸图像样本库建立并训练得到人脸识别模型,并利用所述人脸识别模型对经过缩小处理的图像帧进行人脸检测,以筛选出人脸图像帧。具体地,先构建人脸图像样本库并建立一用于人脸检测的卷积神经网络,所述人脸图像样本库包含多个人的人脸图像,每个人的人脸图像可以包括多种角度,每种角度的人脸图像可以有多张图片;将人脸图像样本库中的人脸图像输入至卷积神经网络,使用卷积神经网络的默认参数进行卷积神经网络训练;根据训练得到的中间结果,对默认参数的初始权值、训练速率、迭代次数等参数进行不断调整,直到得到最优的卷积神经网络参数,该具有最优网络参数的卷积神经网络即可作为所述人脸识别模型。卷积神经网络的最优网络参数是指符合预设参数要求的参数,所述预设参数要求可以根据实际的使用需求进行设定。
在一实施方式中,可以利用所述第二线程来实现对经过缩小处理的图像帧进行人脸检测并筛选出人脸图像帧。
步骤S14、绘制每一所述人脸图像帧的人脸特征点及人脸框,并将所述人脸特征点及所述人脸框映射到未经过缩小处理的原图像帧中。
在一实施方式中,所述人脸特征点可以由眼睛、鼻子、嘴巴、下巴等部分构成,人脸特征点可以通过积分投影方式或者人脸对齐算法从图像帧信息中求出,人脸特征点的数目可以根据所选择的算法和实际需求而定。人脸特征点可以用于进行人脸识别来区分图像中不同的人脸,由于眼睛是人脸当中比较突出的人脸特征,可以先对眼睛进行精确定位,则脸部其他器官,如:眼眉、嘴巴、鼻子等,可以由潜在的分布关系得出比较准确的定位。所述人脸框为人脸图像中可以将人脸特征点全部包含在内的一个矩形区域的边框,可以根据人脸的特征点位置确定出当前图像帧中的人脸框,通过这些特征点的位置,计算求出包含这些特征点的矩形区域,该矩形区域的边框即为当前图像帧中确定出的人脸检测框。人脸检测框的大小可根据实际需求而定,但最好需包含人脸的全部的特征点在内。所述人脸对齐算法可以是ASM算法、AAM算法、STASM算法等。
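根据特征点位置计算包含全部特征点的矩形人脸框,可草拟如下(示意性示例,face_box 为本文假设的函数名,特征点以 (x, y) 坐标表示):

```python
def face_box(points):
    """返回能包含所有人脸特征点的最小矩形框 (left, top, right, bottom)。"""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

# 示例:双眼、鼻、嘴四个特征点
points = [(40, 50), (80, 50), (60, 70), (60, 90)]
print(face_box(points))   # (40, 50, 80, 90)
```

实际应用中还可在该最小矩形外再留出一定边距,以保证人脸区域完整落入检测框内。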
举例而言,人脸图像特征点的绘制通过对应于不同积分投影方式下产生的波峰或波谷进行。其中,积分投影分为垂直投影和水平投影,设 f(x,y) 表示图像 (x,y) 处的灰度值,在图像 [y1,y2] 和 [x1,x2] 区域的水平积分投影 M_h(y) 和垂直积分投影 M_v(x) 分别表示为:

$$M_h(y)=\sum_{x=x_1}^{x_2} f(x,y)$$

$$M_v(x)=\sum_{y=y_1}^{y_2} f(x,y)$$

其中,水平积分投影是将一行所有像素点的灰度值进行累加后再显示,垂直积分投影是将一列所有像素点的灰度值进行累加后再显示。通过定位两个波谷点 x1、x2,从该待识别人脸图像中把横轴 [x1,x2] 区域的图像截取出来,即可实现待识别人脸图像左右边界的定位。对左右边界定位后二值化待识别人脸图像,分别进行水平积分投影和垂直积分投影。进一步地,利用对人脸图像的先验知识可知,眉毛和眼睛是人脸图像中较深的黑色区域,其对应着水平积分投影曲线上的前两个极小值点。第一个极小值点对应的是眉毛在纵轴上的位置,记做 y_brow;第二个极小值点对应的是眼睛在纵轴上的位置,记做 y_eye;第三个极小值点对应的是鼻子在纵轴上的位置,记做 y_nose;第四个极小值点对应的是嘴巴在纵轴上的位置,记做 y_mouth。同样,人脸图像中心对称轴两侧出现两个极小值点,分别对应左右眼在横轴上的位置,记做 x_left-eye、x_right-eye;眉毛在横轴上的位置和眼睛相同,嘴巴和鼻子在横轴上的位置为 (x_left-eye + x_right-eye)/2。
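上述水平、垂直积分投影及由极小值点定位特征的思路,可用 numpy 草拟如下(示意性示例,函数名为本文假设;用一幅合成灰度图演示按行、按列累加灰度值并取极小值位置):

```python
import numpy as np

def horizontal_projection(img):
    """水平积分投影 M_h(y):对每一行的灰度值累加。"""
    return img.sum(axis=1)

def vertical_projection(img):
    """垂直积分投影 M_v(x):对每一列的灰度值累加。"""
    return img.sum(axis=0)

# 合成示例:白底 (255) 图像,第 3 行、第 5 列为黑色 (0) 条带
img = np.full((10, 10), 255, dtype=np.int64)
img[3, :] = 0
img[:, 5] = 0
# 灰度累加和最小的行/列即对应黑色区域(类似眉、眼所在的位置)
print(int(horizontal_projection(img).argmin()))   # 3
print(int(vertical_projection(img).argmin()))     # 5
```

真实人脸图像中,投影曲线上的前几个极小值点即按上述先验知识依次对应眉、眼、鼻、嘴的位置。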
在一实施方式中,由于是对缩小后的图像帧进行人脸特征点及人脸框的绘制,绘制完成后还需要将人脸特征点及人脸框映射到原始大小的图像帧中,进而使得播放的视频中若存在人脸时会同时显示绘制的人脸框及人脸特征点。可以根据先前图像的缩小比例进行人脸特征点及人脸框映射。
举例而言,当对视频流的第二帧图像进行人脸检测时,首先对该第二帧图像进行缩小处理,若检测到该第二帧图像包含有人脸图像,则对缩小后的第二帧图像绘制人脸特征点及人脸框,并将所述人脸特征点及所述人脸框映射到原始的第二图像帧中,若判断该第二帧图像不包含人脸图像,则不进行后续的人脸特征点及人脸框的绘制。
在一实施方式中,同样可以利用所述第二线程来实现绘制每一所述人脸图像帧的人脸特征点及人脸框,并将所述人脸特征点及所述人脸框映射到未经过缩小处理的原图像帧中。
在一实施方式中,由于视频播放原理一般是每秒显示若干张连续图片(例如25张),在对每一人脸图像进行人脸特征点检测与人脸框绘制时,可能会出现某一图像帧需要被播放但是该图像帧的人脸特征点与人脸框还未绘制或者还未绘制完成,此时可以直接用上一帧的人脸特征点与人脸框映射到该图像帧,避免由于人脸特征点、人脸框检测绘制耗时过长,导致出现视频显示卡顿或无法显示的现象。具体地,还可以通过以下方式来实现避免视频显示卡顿的现象:判断当前待播放的人脸图像帧的人脸特征点及人脸框是否绘制完成;若当前待播放的人脸图像帧的人脸特征点及人脸框绘制完成,则将所述人脸特征点及所述人脸框映射到未经过缩小处理的原人脸图像帧中;若当前待播放的人脸图像帧的人脸特征点及人脸框未绘制完成,则将上一帧或者更早的人脸图像帧的人脸特征点及人脸框映射到未经过缩小处理的原人脸图像帧中。
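上述“当前帧未绘制完成则沿用上一帧(或更早一帧)结果”的回退逻辑,可草拟如下(示意性示例,landmarks_for_playback 为本文假设的函数名,用字典记录各帧已完成绘制的特征点与人脸框):

```python
def landmarks_for_playback(frame_idx, drawn):
    """返回当前帧可用的 (特征点, 人脸框):
    当前帧绘制完成则用当前帧,否则回退到最近一个已完成的帧。"""
    for idx in range(frame_idx, -1, -1):
        if idx in drawn:
            return drawn[idx]
    return None   # 尚无任何帧完成绘制

# 示例:第 0、3 帧已绘制完成,第 4 帧播放时其绘制尚未完成
drawn = {0: ("pts0", "box0"), 3: ("pts3", "box3")}
print(landmarks_for_playback(4, drawn))   # ('pts3', 'box3'),沿用上一帧
print(landmarks_for_playback(3, drawn))   # ('pts3', 'box3'),当前帧已完成
```

由于相邻帧中人脸位置变化很小,沿用上一帧结果在视觉上几乎无差异,却避免了等待绘制造成的卡顿。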
在一实施方式中,可以通过估算绘制一人脸图像的人脸特征点及人脸框的耗时,并将该绘制耗时与每秒钟帧数进行比较,来得到人脸特征点检测需要间隔多少帧数检测一次,例如计算得到绘制耗时为1/9秒,而每秒钟帧数为25帧/s,则可以得到每3帧检测一次。
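检测间隔帧数可直接由绘制耗时与帧率估算,如下草图对应文中“绘制耗时 1/9 秒、帧率 25 帧/s,每 3 帧检测一次”的例子(detect_interval 为本文假设的函数名):

```python
import math

def detect_interval(draw_seconds, fps):
    """绘制一帧特征点/人脸框耗时 draw_seconds、播放帧率为 fps 时,
    每隔多少帧检测一次(向上取整,保证检测能跟上播放节奏)。"""
    return math.ceil(draw_seconds * fps)

print(detect_interval(1 / 9, 25))   # 3,即每 3 帧检测一次
```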
步骤S15、将添加有所述人脸特征点及所述人脸框的视频流按照预设帧频率进行播放。
在一实施方式中,所述预设频率可以根据实际使用需求进行设定,例如所述预设频率是每秒25帧。可以建立第三线程并利用所述第三线程读取添加有所述人脸特征点及所述人脸框的视频流的每一图像帧,并将每一所述图像帧按照所述预设帧频率进行播放。
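第三线程按预设帧频率逐帧播放的过程可草拟如下(示意性示例,play_frames、display 均为本文假设的名称,display 回调代表实际的画面显示,此处仅记录播放顺序):

```python
import time

def play_frames(frames, fps, display):
    """按预设帧频率 fps 依次“播放”(调用 display)每一图像帧。"""
    interval = 1.0 / fps
    for frame in frames:
        start = time.monotonic()
        display(frame)
        # 扣除显示耗时后补足帧间隔,保持播放频率稳定
        remaining = interval - (time.monotonic() - start)
        if remaining > 0:
            time.sleep(remaining)

played = []
play_frames(["f1", "f2", "f3"], fps=1000, display=played.append)
print(played)   # ['f1', 'f2', 'f3']
```

实际系统中 fps 取预设值(如 25 帧/s),display 则对应把添加了特征点和人脸框的图像帧送入显示设备。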
上述视频流播放方法,通过将视频流的图像帧按预设比例进行缩小处理,缩小后的图像数据较小,在人脸检测时能减少数据计算量,提高人脸检测速度,检测完毕后再将人脸特征点信息映射到原始图像帧中进行播放,在当前图像帧检测忙碌时,可直接用上一帧的人脸和特征点信息,可进一步避免视频播放时出现人脸和特征点拖拉现象,且充分发挥检测设备的并行计算能力,采用多线程来实现视频流人脸检测功能,进一步缩短人脸检测时间。
实施例二:
图2为本申请视频流播放系统较佳实施例的功能模块图。
参阅图2所示,所述视频流播放系统10可以包括接收模块101、处理模块102、检测模块103、映射模块104及播放模块105。
所述接收模块101用于接收视频采集设备所采集的视频流。
在一实施方式中,所述视频采集设备可以是摄像头、摄像机、LD视频机等。所述视频采集设备可以安装在需要进行视频采集的区域特定位置。所述视频采集设备可以通过网络与计算机装置1进行通信,当启动所述视频采集设备时,所述视频采集设备可以采集视频流,并将所述视频流发送至所述计算机装置1进行处理。
所述处理模块102用于获取所述视频流中每一图像帧,并对每一所述图像帧按预设比例进行缩小处理。
在一实施方式中,视频采集设备所采集的视频流清晰度一般较高(比如高清、超清),进而使得视频流中的每一图像帧具有较大的图像数据,在进行人脸检测时将需要更多的检测时间。通过所述处理模块102对视频流中每一图像帧按预设比例进行缩小处理,缩小后的图像帧数据比较小,在人脸检测时能减少数据计算量,可以使得人脸检测将花费更少的检测时间。所述预设比例优选需要保证缩小后的图像能清楚分辨出人脸区域,不影响人脸图像的检测。例如可以将待检测图像缩小到M*N大小,所述M、N为像素点数,M与N的值与所述原图像帧长宽比一致,例如原始图片帧的像素比为1600*1200,所述预设比例为十分之一,缩小后的图像像素比为160*120,160*120大小的图像既能保证图像中的人脸比较容易分辨,又能最大限度的减少检测人脸图像时的计算量。
在一实施方式中,为了进一步缩短人脸检测时间,所述视频流播放系统10可以利用多线程并行执行视频流人脸检测动作,比如:建立第一线程并利用所述第一线程获取所述视频流中每一图像帧,建立第二线程并利用所述第二线程对每一所述图像帧按所述预设比例进行缩小处理。
所述检测模块103用于对经过缩小处理的图像帧进行人脸检测并筛选出人脸图像帧。
在一实施方式中,所述接收模块101接收到的视频流的各图像帧中可能包括不存在人脸的图像帧,即所述视频流中的各图像帧所包含的图像信息并不是全部都需要进行人脸检测,需对所述视频流中的每一图像帧进行人脸检测,以从所述视频流中筛选出包含人脸的图像帧。
在一实施方式中,所述检测模块103可以根据预设人脸图像样本库建立并训练得到人脸识别模型,并利用所述人脸识别模型对经过缩小处理的图像帧进行人脸检测,以筛选出人脸图像帧。具体地,先构建人脸图像样本库并建立一用于人脸检测的卷积神经网络,所述人脸图像样本库包含多个人的人脸图像,每个人的人脸图像可以包括多种角度,每种角度的人脸图像可以有多张图片;将人脸图像样本库中的人脸图像输入至卷积神经网络,使用卷积神经网络的默认参数进行卷积神经网络训练;根据训练得到的中间结果,对默认参数的初始权值、训练速率、迭代次数等参数进行不断调整,直到得到最优的卷积神经网络参数,该具有最优网络参数的卷积神经网络即可作为所述人脸识别模型。卷积神经网络的最优网络参数是指符合预设参数要求的参数,所述预设参数要求可以根据实际的使用需求进行设定。
在一实施方式中,所述检测模块103可以利用所述第二线程来实现对经过缩小处理的图像帧进行人脸检测并筛选出人脸图像帧。
所述映射模块104用于绘制每一所述人脸图像帧的人脸特征点及人脸框,并将所述人脸特征点及所述人脸框映射到未经过缩小处理的原图像帧中。
在一实施方式中,所述人脸特征点可以由眼睛、鼻子、嘴巴、下巴等部分构成,人脸特征点可以通过积分投影方式或者人脸对齐算法从图像帧信息中求出,人脸特征点的数目可以根据所选择的算法和实际需求而定。人脸特征点可以用于进行人脸识别来区分图像中不同的人脸,由于眼睛是人脸当中比较突出的人脸特征,可以先对眼睛进行精确定位,则脸部其他器官,如:眼眉、嘴巴、鼻子等,可以由潜在的分布关系得出比较准确的定位。所述人脸框为人脸图像中可以将人脸特征点全部包含在内的一个矩形区域的边框,可以根据人脸的特征点位置确定出当前图像帧中的人脸框,通过这些特征点的位置,计算求出包含这些特征点的矩形区域,该矩形区域的边框即为当前图像帧中确定出的人脸检测框。人脸检测框的大小可根据实际需求而定,但最好需包含人脸的全部的特征点在内。所述人脸对齐算法可以是ASM算法、AAM算法、STASM算法等。
举例而言,人脸图像特征点的绘制通过对应于不同积分投影方式下产生的波峰或波谷进行。其中,积分投影分为垂直投影和水平投影,设 f(x,y) 表示图像 (x,y) 处的灰度值,在图像 [y1,y2] 和 [x1,x2] 区域的水平积分投影 M_h(y) 和垂直积分投影 M_v(x) 分别表示为:

$$M_h(y)=\sum_{x=x_1}^{x_2} f(x,y)$$

$$M_v(x)=\sum_{y=y_1}^{y_2} f(x,y)$$

其中,水平积分投影是将一行所有像素点的灰度值进行累加后再显示,垂直积分投影是将一列所有像素点的灰度值进行累加后再显示。通过定位两个波谷点 x1、x2,从该待识别人脸图像中把横轴 [x1,x2] 区域的图像截取出来,即可实现待识别人脸图像左右边界的定位。对左右边界定位后二值化待识别人脸图像,分别进行水平积分投影和垂直积分投影。进一步地,利用对人脸图像的先验知识可知,眉毛和眼睛是人脸图像中较深的黑色区域,其对应着水平积分投影曲线上的前两个极小值点。第一个极小值点对应的是眉毛在纵轴上的位置,记做 y_brow;第二个极小值点对应的是眼睛在纵轴上的位置,记做 y_eye;第三个极小值点对应的是鼻子在纵轴上的位置,记做 y_nose;第四个极小值点对应的是嘴巴在纵轴上的位置,记做 y_mouth。同样,人脸图像中心对称轴两侧出现两个极小值点,分别对应左右眼在横轴上的位置,记做 x_left-eye、x_right-eye;眉毛在横轴上的位置和眼睛相同,嘴巴和鼻子在横轴上的位置为 (x_left-eye + x_right-eye)/2。
在一实施方式中,由于是对缩小后的图像帧进行人脸特征点及人脸框的绘制,所述映射模块104绘制完人脸特征点及人脸框后,还需要将人脸特征点及人脸框映射到原始大小的图像帧中,进而使得播放的视频中若存在人脸时会同时显示绘制的人脸框及人脸特征点。所述映射模块104可以根据先前图像的缩小比例进行人脸特征点及人脸框映射。
举例而言,当对视频流的第二帧图像进行人脸检测时,所述处理模块102对该第二帧图像进行缩小处理,若检测模块103检测到该第二帧图像包含有人脸图像,则映射模块104对缩小后的第二帧图像绘制人脸特征点及人脸框,并将所述人脸特征点及所述人脸框映射到原始的第二图像帧中,若检测模块103判断该第二帧图像不包含人脸图像,则不进行后续的人脸特征点及人脸框的绘制。
在一实施方式中,所述映射模块104同样可以利用所述第二线程来实现绘制每一所述人脸图像帧的人脸特征点及人脸框,并将所述人脸特征点及所述人脸框映射到未经过缩小处理的原图像帧中。
在一实施方式中,由于视频播放原理一般是每秒显示若干张连续图片(例如25张),在对每一人脸图像进行人脸特征点检测与人脸框绘制时,可能会出现某一图像帧需要被播放但是该图像帧的人脸特征点与人脸框还未绘制或者还未绘制完成,此时可以直接用上一帧的人脸特征点与人脸框映射到该图像帧,避免由于人脸特征点、人脸框检测绘制耗时过长,导致出现视频显示卡顿或无法显示的现象。具体地,所述映射模块104还可以通过以下方式来实现避免视频显示卡顿的现象:判断当前待播放的人脸图像帧的人脸特征点及人脸框是否绘制完成;若当前待播放的人脸图像帧的人脸特征点及人脸框绘制完成,则将所述人脸特征点及所述人脸框映射到未经过缩小处理的原人脸图像帧中;若当前待播放的人脸图像帧的人脸特征点及人脸框未绘制完成,则将上一帧或者更早的人脸图像帧的人脸特征点及人脸框映射到未经过缩小处理的原人脸图像帧中。
在一实施方式中,所述映射模块104可以通过估算绘制一人脸图像的人脸特征点及人脸框的耗时,并将该绘制耗时与每秒钟帧数进行比较,来得到人脸特征点检测需要间隔多少帧数检测一次,例如计算得到绘制耗时为1/9秒,而每秒钟帧数为25帧/s,则可以得到每3帧检测一次。
所述播放模块105用于将添加有所述人脸特征点及所述人脸框的视频流按照预设帧频率进行播放。
在一实施方式中,所述预设频率可以根据实际使用需求进行设定,例如所述预设频率是每秒25帧。所述播放模块105可以利用所述第三线程读取添加有所述人脸特征点及所述人脸框的视频流的每一图像帧,并将每一所述图像帧按照所述预设帧频率进行播放。
上述视频流播放系统,通过将视频流的图像帧按预设比例进行缩小处理,缩小后的图像数据较小,在人脸检测时能减少数据计算量,提高人脸检测速度,检测完毕后再将人脸特征点信息映射到原始图像帧中进行播放,在当前图像帧检测忙碌时,可直接用上一帧的人脸和特征点信息,可进一步避免视频播放时出现人脸和特征点拖拉现象,且充分发挥检测设备的并行计算能力,采用多线程来实现视频流人脸检测功能,进一步缩短人脸检测时间。
图3为本申请计算机装置较佳实施例的示意图。
所述计算机装置1包括存储器20、处理器30以及存储在所述存储器20中并可在所述处理器30上运行的计算机可读指令40,例如视频流播放程序。所述处理器30执行所述计算机可读指令40时实现上述视频流播放方法实施例中的步骤,例如图1所示的步骤S11~S15。所述处理器30执行所述计算机可读指令40时实现上述视频流播放系统实施例中各模块的功能,例如图2中的模块101~105。
示例性的,所述计算机可读指令40可以被分割成一个或多个模块/单元,所述一个或者多个模块/单元被存储在所述存储器20中,并由所述处理器30执行,以完成本申请。所述一个或多个模块/单元可以是能够完成特定功能的一系列计算机可读指令指令段,所述指令段用于描述所述计算机可读指令40在所述计算机装置1中的执行过程。例如,所述计算机可读指令40可以被分割成图2中的接收模块101、处理模块102、检测模块103、映射模块104及播放模块105。各模块具体功能参见实施例二。
所述计算机装置1可以是桌上型计算机、笔记本、掌上电脑及云端服务器等计算设备。本领域技术人员可以理解,所述示意图仅仅是计算机装置1的示例,并不构成对计算机装置1的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件,例如所述计算机装置1还可以包括输入输出设备、网络接入设备、总线等。
所称处理器30可以是中央处理单元(Central Processing Unit,CPU),还可以是其他通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现成可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者所述处理器30也可以是任何常规的处理器等,所述处理器30是所述计算机装置1的控制中心,利用各种接口和线路连接整个计算机装置1的各个部分。
所述存储器20可用于存储所述计算机可读指令40和/或模块/单元,所述处理器30通过运行或执行存储在所述存储器20内的计算机可读指令和/或模块/单元,以及调用存储在存储器20内的数据,实现所述计算机装置1的各种功能。所述存储器20可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需的应用程序(比如声音播放功能、图像播放功能等)等;存储数据区可存储根据计算机装置1的使用所创建的数据(比如音频数据、电话本等)等。此外,存储器20可以包括高速随机存取存储器,还可以包括非易失性存储器,例如硬盘、内存、插接式硬盘,智能存储卡(Smart Media Card,SMC),安全数字(Secure Digital,SD)卡,闪存卡(Flash Card)、至少一个磁盘存储器件、闪存器件、或其他非易失性固态存储器件。
所述计算机装置1集成的模块/单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请实现上述实施例方法中的全部或部分流程,也可以通过计算机可读指令来指令相关的硬件来完成,所述的计算机可读指令可存储于一非易失性可读存储介质中,所述计算机可读指令在被处理器执行时,可实现上述各个方法实施例的步骤。其中,所述计算机可读指令包括计算机可读指令代码,所述计算机可读指令代码可以为源代码形式、对象代码形式、可执行文件或某些中间形式等。所述计算机可读介质可以包括:能够携带所述计算机可读指令代码的任何实体或装置、记录介质、U盘、移动硬盘、磁碟、光盘、计算机存储器、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、电载波信号、电信信号以及软件分发介质等。需要说明的是,所述计算机可读介质包含的内容可以根据司法管辖区内立法和专利实践的要求进行适当的增减,例如在某些司法管辖区,根据立法和专利实践,计算机可读介质不包括电载波信号和电信信号。
在本申请所提供的几个实施例中,应该理解到,所揭露的计算机装置和方法,可以通过其它的方式实现。例如,以上所描述的计算机装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。
另外,在本申请各个实施例中的各功能单元可以集成在相同处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在相同单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用硬件加软件功能模块的形式实现。
对于本领域技术人员而言,显然本申请不限于上述示范性实施例的细节,而且在不背离本申请的精神或基本特征的情况下,能够以其他的具体形式实现本申请。因此,无论从哪一点来看,均应将实施例看作是示范性的,而且是非限制性的,本申请的范围由所附权利要求而不是上述说明限定,因此旨在将落在权利要求的等同要件的含义和范围内的所有变化涵括在本申请内。不应将权利要求中的任何附图标记视为限制所涉及的权利要求。此外,显然“包括”一词不排除其他单元或步骤,单数不排除复数。计算机装置权利要求中陈述的多个单元或计算机装置也可以由同一个单元或计算机装置通过软件或者硬件来实现。第一,第二等词语用来表示名称,而并不表示任何特定的顺序。
最后应说明的是,以上实施例仅用以说明本申请的技术方案而非限制,尽管参照较佳实施例对本申请进行了详细说明,本领域的普通技术人员应当理解,可以对本申请的技术方案进行修改或等同替换,而不脱离本申请技术方案的精神和范围。

Claims (20)

  1. 一种视频流播放方法,其特征在于,所述方法包括:
    接收视频采集设备所采集的视频流;
    获取所述视频流中每一图像帧,并对每一所述图像帧按预设比例进行缩小处理;
    对经过缩小处理的图像帧进行人脸检测并筛选出人脸图像帧;
    绘制每一所述人脸图像帧的人脸特征点及人脸框,并将所述人脸特征点及所述人脸框映射到未经过缩小处理的原图像帧中;及
    将添加有所述人脸特征点及所述人脸框的视频流按照预设帧频率进行播放。
  2. 如权利要求1所述的视频流播放方法,其特征在于,所述获取所述视频流中每一图像帧,并对每一所述图像帧按预设比例进行缩小处理的步骤包括:
    建立第一线程并利用所述第一线程获取所述视频流中每一图像帧;及
    建立第二线程并利用所述第二线程对每一所述图像帧按所述预设比例进行缩小处理。
  3. 如权利要求2所述的视频流播放方法,其特征在于,所述对经过缩小处理的图像帧进行人脸检测并筛选出人脸图像帧的步骤包括:
    利用所述第二线程对经过缩小处理的图像帧进行人脸检测并筛选出人脸图像;
    所述绘制每一所述人脸图像帧的人脸特征点及人脸框,并将所述人脸特征点及所述人脸框映射到未经过缩小处理的原图像帧中的步骤包括:
    利用所述第二线程绘制每一所述人脸图像帧的人脸特征点及人脸框,并将所述人脸特征点及所述人脸框映射到未经过缩小处理的原图像帧中。
  4. 如权利要求2所述的视频流播放方法,其特征在于,所述将添加有所述人脸特征点及所述人脸框的视频流按照预设帧频率进行播放的步骤包括:
    建立第三线程并利用所述第三线程读取添加有所述人脸特征点及所述人脸框的视频流的每一图像帧,并将每一所述图像帧按照所述预设帧频率进行播放。
  5. 如权利要求1所述的视频流播放方法,其特征在于,所述对经过缩小处理的图像帧进行人脸检测并筛选出人脸图像帧的步骤包括:
    根据预设人脸图像样本库建立并训练得到人脸识别模型;及
    利用所述人脸识别模型对经过缩小处理的图像帧进行人脸检测,以筛选出人脸图像帧。
  6. 如权利要求5所述的视频流播放方法,其特征在于,所述根据预设人脸图像样本库建立并训练得到人脸识别模型的步骤包括:
    建立卷积神经网络,并将所述预设人脸图像样本库中的人脸图像输入至所述卷积神经网络,其中所述预设人脸图像样本库包含多个人的人脸图像,每个人的人脸图像包括多种角度,且每种角度包括有多张图片;及
    利用所述卷积神经网络的默认参数进行训练;及
    根据训练结果对所述默认参数的初始权值、训练速率、迭代次数进行调整,直至所述卷积神经网络的网络参数被调整至符合预设参数要求。
  7. 如权利要求1所述的视频流播放方法,其特征在于,所述将所述人脸特征点及所述人脸框映射到未经过缩小处理的原图像帧中的步骤包括:
    判断当前待播放的人脸图像帧的人脸特征点及人脸框是否绘制完成;
    若当前待播放的人脸图像帧的人脸特征点及人脸框绘制完成,则将所述人脸特征点及所述人脸框映射到未经过缩小处理的原人脸图像帧中;
    若当前待播放的人脸图像帧的人脸特征点及人脸框未绘制完成,则将上一帧的人脸图像帧的人脸特征点及人脸框映射到未经过缩小处理的原人脸图像帧中。
  8. 一种视频流播放系统,其特征在于,所述系统包括:
    接收模块,用于接收视频采集设备所采集的视频流;
    处理模块,用于获取所述视频流中每一图像帧,并对每一所述图像帧按预设比例进行缩小处理;
    检测模块,用于对经过缩小处理的图像帧进行人脸检测并筛选出人脸图像帧;
    映射模块,用于绘制每一所述人脸图像帧的人脸特征点及人脸框,并将所述人脸特征点及所述人脸框映射到未经过缩小处理的原图像帧中;及
    播放模块,用于将添加有所述人脸特征点及所述人脸框的视频流按照预设帧频率进行播放。
  9. 一种计算机装置,其特征在于,所述计算机装置包括处理器和存储器,所述处理器用于执行所述存储器中存储的计算机可读指令时实现以下步骤:
    接收视频采集设备所采集的视频流;
    获取所述视频流中每一图像帧,并对每一所述图像帧按预设比例进行缩小处理;
    对经过缩小处理的图像帧进行人脸检测并筛选出人脸图像帧;
    绘制每一所述人脸图像帧的人脸特征点及人脸框,并将所述人脸特征点及所述人脸框映射到未经过缩小处理的原图像帧中;及
    将添加有所述人脸特征点及所述人脸框的视频流按照预设帧频率进行播放。
  10. 如权利要求9所述的计算机装置,其特征在于,所述获取所述视频流中每一图像帧,并对每一所述图像帧按预设比例进行缩小处理的步骤包括:
    建立第一线程并利用所述第一线程获取所述视频流中每一图像帧;及
    建立第二线程并利用所述第二线程对每一所述图像帧按所述预设比例进行缩小处理。
  11. 如权利要求10所述的计算机装置,其特征在于,所述对经过缩小处理的图像帧进行人脸检测并筛选出人脸图像帧的步骤包括:
    利用所述第二线程对经过缩小处理的图像帧进行人脸检测并筛选出人脸图像;
    所述绘制每一所述人脸图像帧的人脸特征点及人脸框,并将所述人脸特征点及所述人脸框映射到未经过缩小处理的原图像帧中的步骤包括:
    利用所述第二线程绘制每一所述人脸图像帧的人脸特征点及人脸框,并将所述人脸特征点及所述人脸框映射到未经过缩小处理的原图像帧中。
  12. 如权利要求10所述的计算机装置,其特征在于,所述将添加有所述人脸特征点及所述人脸框的视频流按照预设帧频率进行播放的步骤包括:
    建立第三线程并利用所述第三线程读取添加有所述人脸特征点及所述人脸框的视频流的每一图像帧,并将每一所述图像帧按照所述预设帧频率进行播放。
  13. 如权利要求9所述的计算机装置,其特征在于,所述对经过缩小处理的图像帧进行人脸检测并筛选出人脸图像帧的步骤包括:
    根据预设人脸图像样本库建立并训练得到人脸识别模型;及
    利用所述人脸识别模型对经过缩小处理的图像帧进行人脸检测,以筛选出人脸图像帧。
  14. 如权利要求13所述的计算机装置,其特征在于,所述根据预设人脸图像样本库建立并训练得到人脸识别模型的步骤包括:
    建立卷积神经网络,并将所述预设人脸图像样本库中的人脸图像输入至所述卷积神经网络,其中所述预设人脸图像样本库包含多个人的人脸图像,每个人的人脸图像包括多种角度,且每种角度包括有多张图片;及
    利用所述卷积神经网络的默认参数进行训练;及
    根据训练结果对所述默认参数的初始权值、训练速率、迭代次数进行调整,直至所述卷积神经网络的网络参数被调整至符合预设参数要求。
  15. 如权利要求9所述的计算机装置,其特征在于,所述将所述人脸特征点及所述人脸框映射到未经过缩小处理的原图像帧中的步骤包括:
    判断当前待播放的人脸图像帧的人脸特征点及人脸框是否绘制完成;
    若当前待播放的人脸图像帧的人脸特征点及人脸框绘制完成,则将所述人脸特征点及所述人脸框映射到未经过缩小处理的原人脸图像帧中;
    若当前待播放的人脸图像帧的人脸特征点及人脸框未绘制完成,则将上一帧的人脸图像帧的人脸特征点及人脸框映射到未经过缩小处理的原人脸图像帧中。
  16. 一种非易失性可读存储介质,所述非易失性可读存储介质上存储有计算机可读指令,其特征在于,所述计算机可读指令被处理器执行时实现以下步骤:
    接收视频采集设备所采集的视频流;
    获取所述视频流中每一图像帧,并对每一所述图像帧按预设比例进行缩小处理;
    对经过缩小处理的图像帧进行人脸检测并筛选出人脸图像帧;
    绘制每一所述人脸图像帧的人脸特征点及人脸框,并将所述人脸特征点及所述人脸框映射到未经过缩小处理的原图像帧中;及
    将添加有所述人脸特征点及所述人脸框的视频流按照预设帧频率进行播放。
  17. 如权利要求16所述的非易失性可读存储介质,其特征在于,所述获取所述视频流中每一图像帧,并对每一所述图像帧按预设比例进行缩小处理的步骤包括:
    建立第一线程并利用所述第一线程获取所述视频流中每一图像帧;及
    建立第二线程并利用所述第二线程对每一所述图像帧按所述预设比例进行缩小处理。
  18. 如权利要求17所述的非易失性可读存储介质,其特征在于,所述对经过缩小处理的图像帧进行人脸检测并筛选出人脸图像帧的步骤包括:
    利用所述第二线程对经过缩小处理的图像帧进行人脸检测并筛选出人脸图像;
    所述绘制每一所述人脸图像帧的人脸特征点及人脸框,并将所述人脸特征点及所述人脸框映射到未经过缩小处理的原图像帧中的步骤包括:
    利用所述第二线程绘制每一所述人脸图像帧的人脸特征点及人脸框,并将所述人脸特征点及所述人脸框映射到未经过缩小处理的原图像帧中。
  19. 如权利要求17所述的非易失性可读存储介质,其特征在于,所述将添加有所述人脸特征点及所述人脸框的视频流按照预设帧频率进行播放的步骤包括:
    建立第三线程并利用所述第三线程读取添加有所述人脸特征点及所述人脸框的视频流的每一图像帧,并将每一所述图像帧按照所述预设帧频率进行播放。
  20. 如权利要求16所述的非易失性可读存储介质,其特征在于,所述将所述人脸特征点及所述人脸框映射到未经过缩小处理的原图像帧中的步骤包括:
    判断当前待播放的人脸图像帧的人脸特征点及人脸框是否绘制完成;
    若当前待播放的人脸图像帧的人脸特征点及人脸框绘制完成,则将所述人脸特征点及所述人脸框映射到未经过缩小处理的原人脸图像帧中;
    若当前待播放的人脸图像帧的人脸特征点及人脸框未绘制完成,则将上一帧的人脸图像帧的人脸特征点及人脸框映射到未经过缩小处理的原人脸图像帧中。
PCT/CN2019/090027 2019-01-25 2019-06-04 视频流播放方法、系统、计算机装置及可读存储介质 WO2020151156A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910075210.X 2019-01-25
CN201910075210.XA CN109840491B (zh) 2019-01-25 2019-01-25 视频流播放方法、系统、计算机装置及可读存储介质

Publications (1)

Publication Number Publication Date
WO2020151156A1 true WO2020151156A1 (zh) 2020-07-30

Family

ID=66884230

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/090027 WO2020151156A1 (zh) 2019-01-25 2019-06-04 视频流播放方法、系统、计算机装置及可读存储介质

Country Status (2)

Country Link
CN (1) CN109840491B (zh)
WO (1) WO2020151156A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110443115B (zh) * 2019-06-19 2023-12-22 平安科技(深圳)有限公司 人脸识别方法、装置、计算机设备及可读存储介质
CN110348353B (zh) * 2019-06-28 2023-07-25 照熠信息科技(上海)有限公司 一种图像处理方法及装置
CN111783632B (zh) * 2020-06-29 2022-06-10 北京字节跳动网络技术有限公司 针对视频流的人脸检测方法、装置、电子设备及存储介质
CN112183227B (zh) * 2020-09-08 2023-12-22 瑞芯微电子股份有限公司 一种智能泛人脸区域的编码方法和设备
CN112132797B (zh) * 2020-09-15 2024-02-20 新华智云科技有限公司 一种短视频质量筛选方法
CN113286175A (zh) * 2021-04-27 2021-08-20 金卯新能源集团有限公司 视频流处理方法、装置及存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050036704A1 (en) * 2003-08-13 2005-02-17 Adriana Dumitras Pre-processing method and system for data reduction of video sequences and bit rate reduction of compressed video sequences using spatial filtering
CN106650575A (zh) * 2016-09-19 2017-05-10 北京小米移动软件有限公司 人脸检测方法及装置
CN107909551A (zh) * 2017-10-30 2018-04-13 珠海市魅族科技有限公司 图像处理方法、装置、计算机装置及计算机可读存储介质
CN108198148A (zh) * 2017-12-07 2018-06-22 北京小米移动软件有限公司 图像处理的方法及装置
CN108564008A (zh) * 2018-03-28 2018-09-21 厦门瑞为信息技术有限公司 一种基于zynq的实时行人与人脸检测方法
CN108875480A (zh) * 2017-08-15 2018-11-23 北京旷视科技有限公司 一种人脸特征信息的追踪方法、装置及系统

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018177134A1 (zh) * 2017-03-29 2018-10-04 腾讯科技(深圳)有限公司 用户生成内容处理方法、存储介质和终端
CN109214303B (zh) * 2018-08-14 2021-10-01 北京工商大学 一种基于云端api的多线程动态人脸签到方法
CN109246332A (zh) * 2018-08-31 2019-01-18 北京达佳互联信息技术有限公司 视频流降噪方法和装置、电子设备及存储介质

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050036704A1 (en) * 2003-08-13 2005-02-17 Adriana Dumitras Pre-processing method and system for data reduction of video sequences and bit rate reduction of compressed video sequences using spatial filtering
CN106650575A (zh) * 2016-09-19 2017-05-10 北京小米移动软件有限公司 人脸检测方法及装置
CN108875480A (zh) * 2017-08-15 2018-11-23 北京旷视科技有限公司 一种人脸特征信息的追踪方法、装置及系统
CN107909551A (zh) * 2017-10-30 2018-04-13 珠海市魅族科技有限公司 图像处理方法、装置、计算机装置及计算机可读存储介质
CN108198148A (zh) * 2017-12-07 2018-06-22 北京小米移动软件有限公司 图像处理的方法及装置
CN108564008A (zh) * 2018-03-28 2018-09-21 厦门瑞为信息技术有限公司 一种基于zynq的实时行人与人脸检测方法

Also Published As

Publication number Publication date
CN109840491A (zh) 2019-06-04
CN109840491B (zh) 2024-07-02

Similar Documents

Publication Publication Date Title
WO2020151156A1 (zh) 视频流播放方法、系统、计算机装置及可读存储介质
JP7110502B2 (ja) 深度を利用した映像背景減算法
US20220083763A1 (en) Face image processing methods and apparatuses, and electronic devices
US10832069B2 (en) Living body detection method, electronic device and computer readable medium
WO2020108082A1 (zh) 视频处理方法、装置、电子设备和计算机可读介质
US9667860B2 (en) Photo composition and position guidance in a camera or augmented reality system
WO2020056903A1 (zh) 用于生成信息的方法和装置
WO2021213067A1 (zh) 物品显示方法、装置、设备及存储介质
CN110705583A (zh) 细胞检测模型训练方法、装置、计算机设备及存储介质
WO2012025042A1 (zh) 视频画面显示方法及装置
WO2020244074A1 (zh) 表情交互方法、装置、计算机设备及可读存储介质
WO2023050651A1 (zh) 图像语义分割方法、装置、设备及存储介质
US11270100B2 (en) Face image detection method and terminal device
US20220270266A1 (en) Foreground image acquisition method, foreground image acquisition apparatus, and electronic device
US20230351604A1 (en) Image cutting method and apparatus, computer device, and storage medium
WO2020108010A1 (zh) 视频处理方法、装置、电子设备以及存储介质
CN105430269B (zh) 一种应用于移动终端的拍照方法及装置
CN110781823A (zh) 录屏检测方法、装置、可读介质及电子设备
WO2020034672A1 (zh) 一种确定图像中用户的感兴趣区域的方法及装置
WO2020244160A1 (zh) 终端设备控制方法、装置、计算机设备及可读存储介质
WO2020052062A1 (zh) 检测方法和装置
CN114845158B (zh) 视频封面的生成方法、视频发布方法及相关设备
US20130182943A1 (en) Systems and methods for depth map generation
WO2020103462A1 (zh) 视觉搜索方法、装置、计算机设备及存储介质
US11546577B2 (en) Video jitter detection method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19911046

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19911046

Country of ref document: EP

Kind code of ref document: A1