WO2018098677A1 - Method and terminal for processing a video stream - Google Patents

Method and terminal for processing a video stream

Info

Publication number
WO2018098677A1
Authority: WIPO (PCT)
Prior art keywords: gpu, image data, terminal, video, image
Application number: PCT/CN2016/107993
Other languages: English (en), French (fr)
Inventors: 向晨宇, 刘昂
Original Assignee: 深圳市大疆创新科技有限公司
Application filed by 深圳市大疆创新科技有限公司
Priority to PCT/CN2016/107993 (WO2018098677A1)
Priority to CN201680002250.6A (CN106688016A)
Publication of WO2018098677A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00: 2D [Two Dimensional] image generation
    • G06T11/001: Texturing; Colouring; Generation of texture or colour

Definitions

  • the present disclosure relates to the field of image processing, and more particularly to a method and terminal for processing a video stream using a graphics processing unit (GPU).
  • the terminal provides a rich variety of functions including, for example, recording, processing, and live streaming of images/videos.
  • One important function of a video processing application is a retouching function (often called a "beauty" or "face-slimming" function), by which the user can correct the original image captured by the camera to achieve an image/video effect that satisfies the user.
  • a method of processing a video stream using a graphics processing unit (GPU) at a processor of a terminal includes: acquiring image data of at least one frame image in the video stream; determining at least one feature point in the image data; and instructing the GPU to perform image processing on the image data based on the at least one feature point.
  • the video stream is a video stream having a 4K or higher resolution.
  • the at least one frame image comprises a human face.
  • the at least one feature point comprises at least one of the following: a face, an eye, a nose, a mouth, an ear, a hair, or an eyebrow.
  • the image processing includes at least one of the following: contour correction, color correction, brightness correction, and blurring.
  • the method further comprises instructing the GPU to perform at least one of a video recording operation, a video display operation, an object tracking operation, or a live video operation for the processed image data.
  • instructing the GPU to perform at least one of a video recording operation, a video display operation, an object tracking operation, or a live video operation includes: instructing the GPU to perform, in parallel, at least two of a video recording operation, a video display operation, an object tracking operation, or a live video operation.
  • the image data is coordinate transformed before the GPU performs image processing on the image data based on the at least one feature point to enable image processing by the GPU.
  • after the GPU performs image processing on the image data based on the at least one feature point, a new texture is created if the GPU is performing image processing for the video stream for the first time; otherwise, if the aspect ratio of the image data has not changed, only the existing texture is updated.
  • a new texture is created if the GPU is not performing image processing for the video stream for the first time and the aspect ratio of the image data has changed.
  • the change in the aspect ratio of the image data is caused by changing the camera and/or rotating the terminal.
  • a terminal for processing a video stream using a graphics processing unit includes: image data acquiring means for acquiring image data of at least one frame image in the video stream; feature point determining means for determining at least one feature point in the image data; and image processing indicating means for instructing the GPU to perform image processing on the image data based on the at least one feature point.
  • the video stream is a video stream having a 4K or higher resolution.
  • the at least one frame image comprises a human face.
  • the at least one feature point comprises at least one of the following: a face, an eye, a nose, a mouth, an ear, a hair, or an eyebrow.
  • the image processing includes at least one of the following: contour correction, color correction, brightness correction, and blurring.
  • the terminal further includes: an operation indicating device configured to instruct the GPU to perform at least one of a video recording operation, a video display operation, an object tracking operation, or a live video operation for the processed image data.
  • the operation indication device is further configured to: instruct the GPU to perform at least two of a video recording operation, a video display operation, an object tracking operation, or a live video operation in parallel.
  • the image data is coordinate transformed before the GPU performs image processing on the image data based on the at least one feature point to enable image processing by the GPU.
  • after the GPU performs image processing on the image data based on the at least one feature point, a new texture is created if the GPU is performing image processing for the video stream for the first time; otherwise, if the aspect ratio of the image data has not changed, only the existing texture is updated.
  • the change in the aspect ratio of the image data is caused by changing the camera and/or rotating the terminal.
  • a terminal for processing a video stream includes: a processor; a graphics processing unit (GPU); and a memory storing instructions that, when executed by the processor, cause the processor to: acquire image data of at least one frame image in the video stream; determine at least one feature point in the image data; and instruct the GPU to perform image processing on the image data based on the at least one feature point.
  • the video stream is a video stream having a 4K or higher resolution.
  • the at least one frame image comprises a human face.
  • the at least one feature point comprises at least one of the following: a face, an eye, a nose, a mouth, an ear, a hair, or an eyebrow.
  • the image processing includes at least one of the following: contour correction, color correction, brightness correction, and blurring.
  • the instructions, when executed by the processor, further cause the processor to: instruct the GPU to perform at least one of a video recording operation, a video display operation, an object tracking operation, or a live video operation for the processed image data.
  • the instructions, when executed by the processor, further cause the processor to: instruct the GPU to perform, in parallel, at least two of a video recording operation, a video display operation, an object tracking operation, or a live video operation for the processed image data.
  • the image data is coordinate transformed before the GPU performs image processing on the image data based on the at least one feature point to enable image processing by the GPU.
  • after the GPU performs image processing on the image data based on the at least one feature point, a new texture is created if the GPU is performing image processing for the video stream for the first time; otherwise, if the aspect ratio of the image data has not changed, only the existing texture is updated.
  • a new texture is created if the GPU is not performing image processing for the video stream for the first time and the aspect ratio of the image data has changed.
  • the change in the aspect ratio of the image data is caused by changing the camera and/or rotating the terminal.
  • a computer program, when executed by a processor, causes the processor to perform the method according to the first aspect of the present disclosure.
  • a computer program product comprising a computer program according to the fourth aspect of the present disclosure.
  • FIG. 1 is a schematic diagram showing a hardware structure of a terminal according to an embodiment of the present disclosure.
  • FIG. 2 is a schematic diagram showing data/instruction interaction between components in the terminal shown in FIG. 1.
  • FIG. 3 is a flowchart illustrating a method of processing a video stream using a GPU performed at a main processor of a terminal, according to an embodiment of the present disclosure.
  • FIG. 4 is a block diagram showing an example functional architecture of a terminal for performing the method illustrated in FIG. 3 in accordance with an embodiment of the present disclosure.
  • 1080P is a high-definition digital television format standard developed by the Society of Motion Picture and Television Engineers (SMPTE), achieving an effective resolution of 1920×1080 under progressive scanning.
  • 4K resolution does not refer to one specific value; it refers to approximately 4,000 pixels in the horizontal direction, with slight variations depending on the application area.
  • The mainstream 4K resolution is 4096×2160 pixels, four times the resolution of 2K projectors and HDTV, and is classified as ultra-high definition.
  • The often-mentioned 4K movie has a film resolution of 4096×2160, which is determined by the imaging format of 4K cameras.
  • The 4K of a TV screen refers to a physical resolution of 3840×2160, which doubles 1920×1080 (i.e., 1080P) in both the horizontal and vertical directions while keeping the standard 16:9 aspect ratio.
  • the 4K mentioned includes various known or de facto 4K resolutions, including but not limited to the two aforementioned 4K resolutions, unless otherwise specified.
  • standards such as 2K and 8K are currently defined.
  • Object tracking: when an object appears in the camera's field of view/picture, an object recognition algorithm can determine its presence, and the camera is instructed to follow the determined object until it moves beyond the camera's photographable range. Object tracking is widely used in video surveillance, drone piloting, and other fields.
  • The inventors of the present application have noticed that the related operations of the beauty and face-slimming functions of existing video beauty software are performed on a central processing unit (CPU), which can support a video resolution of at most 1080P. Even if the GPU is used for display, real-time retouching cannot be achieved for video with 4K or higher resolution. In addition, such software is even less able to support concurrent execution of operations such as screenshot, photographing, video recording, live broadcast, display, and/or object tracking. More generally, the inventors have noticed that existing terminals do not fully utilize the powerful parallel image processing capability of the GPU, causing the CPU to bear excessive image processing work and resulting in a failure to fully utilize existing hardware resources.
  • the terminal may be a mobile terminal, such as a mobile phone, a notebook computer, a handheld camera device, or the like; or a desktop computer or the like.
  • embodiments of the present disclosure propose a scheme for implementing high speed processing of image frames in a video stream using a GPU.
  • By performing coordinate rotation, beauty, and face-slimming operations on the image-frame data on the GPU with the aid of the CPU, the processing speed for high-definition (for example, 4K or higher resolution) face images can be significantly improved.
  • By exchanging texture data between the GPU and the CPU and scheduling resources rationally, functions such as display, video recording, photographing, live broadcast, and/or object tracking can be performed concurrently after face-slimming at 4K or higher resolution.
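The CPU/GPU division of labor described above can be sketched as a minimal per-frame pipeline. This is an illustration only, with stub functions standing in for the real camera, CPU detection, and GPU retouching stages; none of the names come from the patent itself.

```python
def acquire_frame(stream):
    """Acquire one frame of image data from the video stream (stub)."""
    return stream.pop(0)

def detect_features(frame):
    """CPU-side face/feature-point detection (stub returning labeled regions
    as (label, bounding-box) pairs)."""
    return [("eye", (10, 5, 4, 2)), ("mouth", (12, 14, 6, 3))]

def gpu_retouch(frame, features):
    """GPU-side retouching driven by the detected feature regions (stub)."""
    return {"frame": frame, "retouched": [label for label, _ in features]}

def process_stream(stream):
    """Per-frame pipeline: acquire -> detect on CPU -> retouch on GPU."""
    out = []
    while stream:
        frame = acquire_frame(stream)
        features = detect_features(frame)
        out.append(gpu_retouch(frame, features))
    return out

print(process_stream(["f0", "f1"]))
```

In a real terminal, `gpu_retouch` would be issued through a graphics API call rather than run in Python, and the output texture would feed the display, recording, and live-broadcast consumers concurrently.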
  • FIG. 1 illustrates a hardware schematic of a terminal 100 for performing video stream processing in accordance with an embodiment of the present disclosure.
  • the terminal 100 may include a main processor (hereinafter sometimes referred to as a central processing unit, CPU, or host processor) 102, a graphics processing unit (hereinafter sometimes referred to as GPU) 104, and a display 106.
  • graphics processing unit 104 can function as a graphics processing core embedded in host processor 102.
  • For example, the Kirin 950 main processor includes eight general-purpose processor cores (four ARM Cortex-A72 and four Cortex-A53 cores) and one GPU core (an ARM Mali-T880).
  • Even when main processor 102 and graphics processing unit 104 are physically one piece of hardware, they can still logically be considered two separate modules.
  • an 8-core CPU consisting of eight general-purpose processor cores still needs to call a graphics programming interface such as OpenGL to direct the GPU to perform graphics calculations.
  • the main processor 102 can be logically distinguished from the graphics processing unit 104, however this does not mean that they are necessarily different physical hardware.
  • the GPU 104 acts as a processor dedicated to graphics processing, which has more computing cores and thus more powerful parallel computing capabilities than the CPU 102, and is also more suitable for image processing.
  • GPU 104 may have a computational speed at least 100 times higher than CPU 102 for the same image processing, as described in detail later.
  • the powerful parallel image processing capability of the GPU 104 is not fully utilized, resulting in existing terminals being unable to process higher-resolution images and/or video (e.g., images and/or video with 4K or higher resolution) in real time.
  • display 106 of terminal 100 can be coupled to GPU 104 to display various images, text, and the like under the control of GPU 104.
  • Display 106 may include, but is not limited to, a CRT (cathode ray tube) display, an LCD (liquid crystal) display, an LED (light emitting diode) display, an OLED (organic light emitting diode) display, and the like.
  • the content displayed on display 106 can be rotated under the control of GPU 104 to, for example, accommodate the current device orientation of terminal 100.
  • the terminal 100 can be changed from a portrait orientation to a landscape orientation to enable the video to be displayed full screen, occupying the entire screen rather than being displayed on a portion of the screen.
  • the display 106 is not an essential component of the terminal 100. In fact, terminal 100 can connect an external display, or stream its display to a remote display, without the built-in display 106.
  • The communication unit 108 may be a module or unit that enables the terminal 100 to communicate with external devices. The communication unit 108 can be a wired communication unit, a wireless communication unit, or a combination of the two. When the communication unit 108 is a wired communication unit, it may include, but is not limited to, a USB module, an IEEE 1394 module, an Ethernet module, a digital subscriber line (DSL, or more generally xDSL) modem, a serial port module, and the like.
  • When the communication unit 108 is a wireless communication unit, it may include, for example but not limited to, various 2G modules (e.g., GSM/GPRS modules), 3G modules (e.g., WCDMA, CDMA2000, TD-SCDMA modules), 4G modules (e.g., TD-LTE modules), and long-distance communication modules such as the various 5G modules that are under development or will appear.
  • the communication unit 108 may also include a short-range wireless communication module such as, but not limited to, a Wi-Fi module, a Bluetooth module, an NFC module, an RFID module, an infrared module, and the like.
  • the communication unit 108 is not limited to any of the above modules, but may be any module that enables the main processor 102 to communicate with the outside, or even a shared-memory module, so that the main processor 102 can share data with external devices/processors by writing and reading that memory.
  • the communication unit 108 is also not a necessary component of the terminal 100. In fact, the terminal 100 can use an external communication unit to communicate with other external devices without the built-in communication unit 108.
  • Terminal 100 may also include one or more cameras 110.
  • the terminal 100 includes a first camera 110-1 and a second camera 110-2 (hereinafter, collectively referred to as a camera 110). These two cameras can have different technical parameters.
  • the first camera 110-1 may be located on the back of the terminal 100 so that an object can be photographed while the shooting screen is viewed.
  • the second camera 110-2 may be located on the front side of the terminal 100 to take a self-portrait of the user himself. Further, in other embodiments, the first camera 110-1 may be rotated to simultaneously achieve the photographing functions of the first camera 110-1 and the second camera 110-2.
  • the first camera 110-1 may even include two lenses to achieve 3D photography, obtain sharper images, and the like.
  • the two cameras 110-1 and 110-2 may have different resolutions, zoom range/fixed focal length, shutter speed, aperture, depth of field, sensitivity (ISO), and the like.
  • the connection manner of the two cameras 110 and the main processor 102 is not limited to the manner shown in FIG. 1, but may be respectively connected to different pins of the main processor 102, or both connected to a certain data/control bus. The present disclosure is not limited thereto.
  • the cameras 110-1 and 110-2 are not essential components of the terminal 100.
  • Although the second camera 110-2 is shown as a dashed box to indicate that it is optional, this does not imply that the first camera 110-1 is mandatory.
  • the terminal 100 can externally access a camera to acquire images and/or video without the built-in cameras 110-1 and 110-2. Further, the terminal 100 can acquire an image/video stream through the communication unit 108 or from the memory 112 without any built-in/external camera.
  • Terminal 100 may also include one or more memories 112.
  • the memory 112 can be a volatile memory or a non-volatile memory.
  • memory 112 may include, but is not limited to, random access memory (RAM), dynamic RAM (DRAM), static RAM (SRAM), synchronous DRAM (SDRAM), cache, registers, and the like.
  • the memory 112 may also include, but is not limited to, one-time programmable read only memory (OTPROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), mask ROM, flash ROM, flash memory, hard drives, solid state drives, and more.
  • memory 112 may also include, for example but not limited to, compact flash (CF), secure digital (SD), micro SD, mini SD, extreme digital (xD), multimedia card (MMC), memory stick, and the like.
  • Memory 112 may store instructions for execution by host processor 102 and/or GPU 104 and/or data to be processed or already processed. In some embodiments, such as when GPU 104 shares memory 112 with CPU 102, memory 112 may transfer data between GPU 104 and CPU 102 by passing pointers, for example for image data of a video stream to be processed and/or already processed, as indicated by the dashed arrows between GPU 104 and memory 112 in FIG. 1.
  • GPU 104 may also access instructions and/or data stored in memory 112 through host processor 102. The present disclosure is not limited to this.
  • Terminal 100 may also include various other functional units collectively referred to as other modules 114, including but not limited to: power modules, sensor modules, input/output modules (e.g., keyboards, buttons, audio modules), vibration modules, encryption modules, and more.
  • video stream data captured by camera 110 may be transmitted to main processor 102, GPU 104, and/or memory 112 in a variety of manners, for example, transmitted directly over a data bus to the various components, or transmitted to the CPU 102 and forwarded by the CPU 102 to the other components. Note that in some embodiments, the video stream data may not be transmitted to memory 112 and/or CPU 102, but may be handled only by GPU 104.
  • In step S202, the GPU 104 performs optional coordinate rotation on each frame of image data of the video stream collected by the camera 110 to convert the image data captured by the camera into a data format that can be correctly processed by the GPU 104 (for example, transforming to the coordinate system of GPU 104).
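The coordinate conversion step can be illustrated with two small mappings: a camera-pixel to normalized-texture-coordinate transform (assuming an OpenGL-style convention with the origin at the bottom-left, which the patent does not specify), and a 90° rotation of pixel coordinates such as might be needed when the terminal is rotated. Both functions are illustrative sketches, not the patent's actual transform.

```python
def camera_to_texture(x, y, width, height):
    """Map a camera pixel coordinate (origin at top-left, y pointing down)
    to a normalized texture coordinate (origin at bottom-left, y pointing up),
    as used by OpenGL-style texture sampling. Convention is an assumption."""
    u = x / width
    v = 1.0 - y / height  # flip the vertical axis
    return u, v

def rotate90_cw(x, y, width, height):
    """Rotate a pixel coordinate of a width x height image 90 degrees
    clockwise; the rotated image has dimensions height x width."""
    return height - 1 - y, x

# A 4K frame (3840x2160): the top-left pixel maps to (0.0, 1.0),
# and the center pixel maps to (0.5, 0.5).
print(camera_to_texture(0, 0, 3840, 2160))        # (0.0, 1.0)
print(camera_to_texture(1920, 1080, 3840, 2160))  # (0.5, 0.5)
print(rotate90_cw(0, 0, 3840, 2160))              # (2159, 0)
```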
  • In step S203, the GPU 104 extracts each frame of raw image data from the video stream data and transmits it to the main processor 102 and/or the memory 112 in step S204.
  • In step S205, the main processor 102 may perform a face recognition and/or facial feature point determination algorithm on the raw data to determine whether a face is included in each frame image and, if so, to determine one or more of the position, contour, color, brightness, etc. of various facial features (e.g., face, eyes, nose, mouth, ears, hair, or eyebrows) in the face.
  • the main processor 102 can then transmit the above feature data to the GPU 104 in step S206.
  • the face recognition/feature point determination algorithm is executed on the CPU 102 .
  • The GPU 104 itself does not have as many control units (e.g., branch prediction, caches, etc.) as the CPU 102, so when executing such an algorithm the GPU 104 is generally less efficient than the CPU 102. Therefore, the algorithm can be executed primarily by the main processor 102 rather than on the GPU 104. However, in other embodiments, GPU 104 may also be allowed to perform these operations.
  • Face recognition algorithms are not the focus of this disclosure, and many such algorithms exist, so they are not described in detail herein; this does not affect the implementation of the embodiments of the present disclosure.
  • Although the term "feature point" is used herein to represent a feature on a person's face, it does not mean a single point or pixel in the image; it may represent a region having a certain shape and area. For example, an eye feature point may cover the entire eye region in the image.
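The notion of a feature point as a labeled region rather than a single pixel can be modeled, for instance, as a bounding box. The class and field names below are illustrative stand-ins; the patent does not prescribe any data layout.

```python
from dataclasses import dataclass

@dataclass
class FeatureRegion:
    """A facial 'feature point' modeled as a labeled region, not a single
    pixel (illustrative sketch, not the patent's representation)."""
    label: str   # e.g. "eye", "nose", "mouth"
    x: int       # top-left corner of the bounding box
    y: int
    width: int
    height: int

    def contains(self, px, py):
        """True if pixel (px, py) lies inside this region."""
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)

left_eye = FeatureRegion("eye", 1200, 800, 180, 90)
print(left_eye.contains(1290, 840))  # True: inside the eye region
print(left_eye.contains(0, 0))       # False
```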
  • the GPU 104 may perform image processing (for example, a retouching operation) according to various feature points found by the CPU 102.
  • the image processing may include at least one of contour correction (e.g., face-slimming, eye enlargement, etc.), color correction (e.g., skin color, eye color, lip color correction, etc.), brightness correction (e.g., facial lighting, etc.), and blurring (e.g., Gaussian blur, commonly known as "skin smoothing").
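The Gaussian blur mentioned above can be sketched as a separable filter: one 1D Gaussian kernel applied first along rows and then along columns. This is a plain-Python illustration of the technique, not the GPU shader the patent envisions (a real GPU implementation would run the same two passes as fragment shaders).

```python
import math

def gaussian_kernel(radius, sigma):
    """1D Gaussian kernel of length 2*radius+1, normalized to sum to 1."""
    vals = [math.exp(-(i * i) / (2 * sigma * sigma))
            for i in range(-radius, radius + 1)]
    total = sum(vals)
    return [v / total for v in vals]

def blur_1d(row, kernel):
    """Convolve one row with the kernel, clamping indices at the edges."""
    r = len(kernel) // 2
    out = []
    for i in range(len(row)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - r, 0), len(row) - 1)
            acc += w * row[j]
        out.append(acc)
    return out

def gaussian_blur(image, radius=2, sigma=1.0):
    """Separable Gaussian blur of a 2D grayscale image (list of rows)."""
    kernel = gaussian_kernel(radius, sigma)
    rows = [blur_1d(r, kernel) for r in image]          # horizontal pass
    cols = [blur_1d(list(c), kernel) for c in zip(*rows)]  # vertical pass
    return [list(r) for r in zip(*cols)]
```

Because the kernel is normalized and edges are clamped, a constant image passes through unchanged, which is a convenient sanity check for any blur implementation.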
  • The overall beauty effect includes, for example, skin smoothing, whitening, etc., and may specifically include, for example, bilateral filtering, edge detection, sharpening, skin tone adjustment, and/or color/brightness adjustment of other parts.
  • Although the face-slimming operation S207 is shown performed first and the beauty operation S208 second, the order may be reversed, or the two may even be performed in parallel. Moreover, in some embodiments, at least a portion of the beauty operation S208 may not require the presence of a feature point, and thus it, or a portion of it, may even be performed before feature point determination (i.e., before step S204).
  • The inventors of the present application found that, on a mobile device, there is no perceived stutter as long as 30 frames of 4K-resolution images are displayed per second.
  • The time for the GPU 104 to perform the beauty processing is about 7.5 ms (while the CPU 102 would take at least 750 ms for the same operation), and the time for the face-slimming processing is about 12 ms (including about 9.5 ms for the CPU 102 to determine the feature points and about 2.5 ms for the GPU 104 to perform the face-slimming operation).
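These reported timings can be checked against the 30 fps frame budget mentioned above: at 30 frames per second, each frame has roughly 33.3 ms of processing time available.

```python
# Frame budget at 30 fps versus the per-frame costs quoted above.
FRAME_BUDGET_MS = 1000 / 30          # ~33.3 ms per frame for stutter-free playback

gpu_beauty_ms = 7.5                  # beauty processing on the GPU
face_slim_ms = 12.0                  # 9.5 ms CPU feature points + 2.5 ms GPU slimming
cpu_beauty_ms = 750.0                # the same beauty processing on the CPU alone

gpu_total = gpu_beauty_ms + face_slim_ms
print(gpu_total)                         # 19.5 ms: fits within the 33.3 ms budget
print(gpu_total < FRAME_BUDGET_MS)       # True
print(cpu_beauty_ms < FRAME_BUDGET_MS)   # False: CPU-only cannot sustain 30 fps at 4K
```

This arithmetic also reflects the roughly 100x GPU/CPU speed ratio cited earlier (750 ms / 7.5 ms = 100).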
  • In step S209, when the GPU 104 processes the image data of a certain video stream for the first time, the GPU 104 can use the processed data or the raw data to create a new texture whose width and height (or resolution) remain constant.
  • When the camera is switched or the terminal is rotated, the resolution or width and height usually change, and at this time a new texture can be recreated. Although this can sometimes cause a pause of a few tenths of a second, the user usually does not notice such a stall.
  • GPU 104 can continue to render the texture onto the GPUBuffer.
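The create-versus-update rule for textures described above (first frame: create; same aspect ratio: cheap update; changed aspect ratio, e.g. after switching cameras or rotating the terminal: recreate) can be captured in a small decision function. Function and value names are illustrative.

```python
def texture_action(first_time, prev_aspect, new_aspect):
    """Decide whether the GPU should create a new texture or update the
    existing one, following the rule described above (illustrative sketch)."""
    if first_time:
        return "create"          # first frame of this video stream
    if prev_aspect != new_aspect:
        return "create"          # e.g. camera switched or terminal rotated
    return "update"              # cheap in-place texture update

print(texture_action(True, None, 16 / 9))     # create
print(texture_action(False, 16 / 9, 16 / 9))  # update
print(texture_action(False, 16 / 9, 9 / 16))  # create
```

In OpenGL terms, "create" would correspond to allocating new texture storage, while "update" would correspond to overwriting the existing storage, which avoids the brief stall that reallocation can cause.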
  • GPU 104 may transmit the display data created or updated in step S209 to display 106 to enable the user to see the retouched video on display 106. Meanwhile, in step S210, the GPU 104 may also transmit the display data to the CPU 102 and/or the memory 112 so that the CPU 102 can perform subsequent processing and enable the memory 112 to store the display data, thereby implementing a video recording function.
  • According to the display data, the CPU 102 may separately generate live-broadcast data for the communication unit 108 (for example, by packaging the display data into various network streaming formats) and/or tracking instructions for the camera 110, to enable live video and/or object tracking, respectively.
  • In this way, the user can turn on the face-slimming function when operating the mobile phone without any perceived stutter at 4K high definition, and can simultaneously perform live broadcast, video recording, photographing, display, and screen rotation.
  • the performance is optimized by a reasonable scheduling of the GPU/CPU.
  • A method of processing a video stream using the GPU 104 of the terminal 100 and a functional configuration of the terminal 100 according to an embodiment of the present disclosure will be described in detail below with reference to FIGS. 3 and 4.
  • FIG. 3 is a flow diagram showing a method 300 of processing a video stream using GPU 104, performed in terminal 100, in accordance with an embodiment of the disclosure.
  • Method 300 can include steps S310, S320, and S330. According to the present disclosure, some steps of method 300 may be performed separately or in combination, and may be performed in parallel or sequentially; execution is not limited to the specific operational sequence shown in FIG. 3. In some embodiments, method 300 can be performed by terminal 100 and/or processor 102 shown in FIG. 1.
  • FIG. 4 is a functional block diagram showing an example terminal 100 in accordance with an embodiment of the present disclosure.
  • the terminal 100 may include an image data acquiring device 150, a feature point determining device 160, and an image processing indicating device 170.
  • the image data obtaining means 150 may be configured to acquire image data of at least one frame of images in the video stream.
  • The image data acquiring device 150 may be a central processing unit (e.g., CPU 102) of the terminal 100, a digital signal processor (DSP), a microprocessor, a microcontroller, etc., which may cooperate with, for example, the camera 110 of the terminal 100 to acquire image data of at least one frame of the video stream. Furthermore, it may cooperate with the communication unit 108 and/or the memory 112 of the terminal 100 to obtain image data of at least one frame image of a video stream transmitted from an external device and/or stored in local memory.
  • the feature point determining means 160 can be used to determine at least one feature point in the image data.
  • The feature point determining device 160 may also be a central processing unit (e.g., CPU 102) of the terminal 100, a digital signal processor (DSP), a microprocessor, a microcontroller, etc., which may determine parameters such as the position and size of feature points in the image based on an image feature point recognition algorithm. For example, it may determine whether there is a face in the image according to face feature recognition parameters, as well as the position, size, color, and the like of each facial organ.
  • the image processing instructing means 170 may be configured to instruct the GPU 104 to perform image processing on the image data based on the determined at least one feature point.
  • The image processing indicating device 170 may also be a central processing unit (CPU 102) of the terminal 100, a digital signal processor (DSP), a microprocessor, a microcontroller, etc., which may, through a graphics programming interface (e.g., OpenGL, Direct3D, and so on), instruct the GPU 104 to perform corresponding image processing based on the detected at least one feature point.
  • For example, the GPU 104 may be instructed to perform blurring (e.g., Gaussian blur) where facial acne is detected in the image, to perform brightness correction, color correction, etc. where skin is detected, and/or to perform contour correction (e.g., face-slimming) where a face contour is detected.
  • The terminal 100 may further include other functional units not shown in FIG. 4, such as an operation instructing device. The operation instructing device may be configured to instruct the GPU 104 to perform, on the image-processed image data, at least one of a video recording operation, a video display operation, an object tracking operation, or a live video operation, and may further be configured to instruct the GPU 104 to perform at least two of these operations in parallel.
  • The terminal 100 may further include other functional units not shown in FIG. 4; however, since they do not affect the understanding of the embodiments of the present disclosure by those skilled in the art, they are omitted from FIG. 4. For example, the terminal 100 may also include one or more of the following functional units: a power source, a memory (e.g., memory 112), a data bus, an antenna, a wireless transceiver (e.g., communication unit 108), and the like.
  • A method 300 of processing a video stream using the GPU 104, performed on the terminal 100 according to an embodiment of the present disclosure, as well as the terminal 100 itself, will be described in detail below with reference to FIGS. 3 and 4.
  • The method 300 begins at step S310, in which image data of at least one frame of the video stream may be acquired by the image data acquiring device 150 of the terminal 100.
  • In step S320, at least one feature point in the image data may be determined by the feature point determining device 160 of the terminal 100.
  • In step S330, the image processing instructing device 170 of the terminal 100 may instruct the GPU 104 to perform image processing on the image data based on the at least one feature point.
  • The video stream may be a video stream having 4K or higher resolution.
  • The at least one frame of image may include a human face, and the at least one feature point may include at least one of the following: face, eyes, nose, mouth, ears, hair, or eyebrows.
  • The image processing may include at least one of the following: contour correction, color correction, brightness correction, and blurring.
  • The method 300 may further include: instructing the GPU 104 to perform, on the image-processed image data, at least one of a video recording operation, a video display operation, an object tracking operation, or a live video operation.
  • Instructing the GPU 104 to perform at least one of these operations may include: instructing the GPU 104 to perform at least two of the video recording, video display, object tracking, or live video operations in parallel.
  • Before the GPU 104 performs image processing on the image data based on the at least one feature point, the image data may be coordinate-transformed to enable image processing by the GPU 104.
  • After the GPU 104 performs image processing, a new texture may be created if the GPU 104 is performing image processing on the video stream for the first time; otherwise, if the aspect ratio of the image data has not changed, only the existing texture may be updated, and if the aspect ratio has changed, a new texture may be created.
  • The change in the aspect ratio of the image data may be caused by switching cameras and/or rotating the terminal.
  • By using the method, terminal, and/or computer program according to embodiments of the present disclosure, the user can enable the beautification and face-slimming functions while operating the phone, experience no stutter at 4K high definition, and simultaneously perform live streaming, video recording, photographing, display, and screen rotation. Performance is thus optimized through reasonable scheduling of the GPU/CPU.
  • Functions described herein as implemented by pure hardware, pure software, and/or firmware may also be implemented by dedicated hardware, by a combination of general-purpose hardware and software, and so on. For example, functions described as implemented by dedicated hardware (e.g., a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC)) may be implemented by general-purpose hardware (e.g., a central processing unit (CPU) or a digital signal processor (DSP)) combined with software, and vice versa. Likewise, functions described as implemented by a Bluetooth module, an NFC chip/coil, or the like may be implemented by a general-purpose processor (e.g., a CPU or DSP) combined with hardware such as an analog-to-digital conversion circuit, an amplifying circuit, and an antenna, together with Bluetooth/NFC-related processing software, and vice versa.
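The parallel execution of recording, display, and live-streaming operations on a processed frame, as described above, can be sketched in a few lines. This is a minimal illustration only: the operation names and callables below are invented stand-ins, not the terminal's actual implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def dispatch_parallel(frame, operations):
    """Run several post-processing operations on one processed frame in parallel.

    `operations` maps an operation name (e.g. "record", "display", "live")
    to a callable taking the frame, mirroring the idea of issuing the
    recording/display/live work concurrently rather than one after another.
    """
    with ThreadPoolExecutor(max_workers=len(operations)) as pool:
        futures = {name: pool.submit(fn, frame) for name, fn in operations.items()}
        # Collect every result; each operation ran on its own worker thread.
        return {name: fut.result() for name, fut in futures.items()}

results = dispatch_parallel(
    "frame-0",
    {
        "record": lambda f: f + " recorded",
        "display": lambda f: f + " displayed",
        "live": lambda f: f + " streamed",
    },
)
```

In a real pipeline each callable would hand the GPU-processed frame to an encoder, the display surface, or the network packetizer; here they are plain string operations so the dispatch pattern itself is visible.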

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a method (300) of processing a video stream using a graphics processing unit (GPU) (104) at a processor (102) of a terminal (100), a corresponding terminal (100), a computer program, and a computer program product. The method (300) includes: acquiring (S310) image data of at least one frame of the video stream; determining (S320) at least one feature point in the image data; and instructing (S330) the GPU (104) to perform image processing on the image data based on the at least one feature point.

Description

Method and Terminal for Processing a Video Stream
Copyright Notice
The disclosure of this patent document contains material that is subject to copyright protection. The copyright is owned by the copyright owner. The copyright owner has no objection to the reproduction by anyone of the patent document or the patent disclosure as it appears in the official records and files of the Patent and Trademark Office.
Technical Field
The present disclosure relates to the field of image processing, and more particularly to a method and terminal for processing a video stream using a graphics processing unit (GPU).
Background
With the growing popularity of terminals, especially mobile terminals (e.g., smartphones, tablets, etc.), they have become an indispensable part of people's work and daily life. Terminals provide a rich variety of functions, including, for example, the recording, processing, and live streaming of images/video. One important function of video processing applications is to provide the user with a retouching function (or, more colloquially, beautification and face slimming). By correcting the original images captured by the camera, the user can achieve image/video effects that are satisfactory to the user.
Summary
According to a first aspect of the present disclosure, a method of processing a video stream using a graphics processing unit (GPU) at a processor of a terminal is provided. The method includes: acquiring image data of at least one frame of the video stream; determining at least one feature point in the image data; and instructing the GPU to perform image processing on the image data based on the at least one feature point.
In some embodiments, the video stream is a video stream having 4K or higher resolution. In some embodiments, the at least one frame of image includes a human face, and the at least one feature point includes at least one of the following: face, eyes, nose, mouth, ears, hair, or eyebrows. In some embodiments, the image processing includes at least one of the following: contour correction, color correction, brightness correction, and blurring. In some embodiments, the method further includes: instructing the GPU to perform, on the image-processed image data, at least one of a video recording operation, a video display operation, an object tracking operation, or a live video operation. In some embodiments, instructing the GPU to perform at least one of these operations includes: instructing the GPU to perform at least two of the video recording, video display, object tracking, or live video operations in parallel. In some embodiments, before the GPU performs image processing on the image data based on the at least one feature point, the image data is coordinate-transformed so that it can be image-processed by the GPU. In some embodiments, after the GPU performs image processing on the image data based on the at least one feature point, a new texture is created if the GPU is performing image processing on the video stream for the first time; otherwise, if the aspect ratio of the image data has not changed, only the existing texture is updated. In some embodiments, if the GPU is not performing image processing on the video stream for the first time and the aspect ratio of the image data has changed, a new texture is created. In some embodiments, the change in the aspect ratio of the image data is caused by switching cameras and/or rotating the terminal.
According to a second aspect of the present disclosure, a terminal for processing a video stream using a graphics processing unit (GPU) is provided. The terminal includes: an image data acquiring device for acquiring image data of at least one frame of the video stream; a feature point determining device for determining at least one feature point in the image data; and an image processing instructing device for instructing the GPU to perform image processing on the image data based on the at least one feature point.
In some embodiments, the video stream is a video stream having 4K or higher resolution. In some embodiments, the at least one frame of image includes a human face, and the at least one feature point includes at least one of the following: face, eyes, nose, mouth, ears, hair, or eyebrows. In some embodiments, the image processing includes at least one of the following: contour correction, color correction, brightness correction, and blurring. In some embodiments, the terminal further includes: an operation instructing device for instructing the GPU to perform, on the image-processed image data, at least one of a video recording operation, a video display operation, an object tracking operation, or a live video operation. In some embodiments, the operation instructing device is further configured to: instruct the GPU to perform at least two of the video recording, video display, object tracking, or live video operations in parallel. In some embodiments, before the GPU performs image processing on the image data based on the at least one feature point, the image data is coordinate-transformed so that it can be image-processed by the GPU. In some embodiments, after the GPU performs image processing on the image data based on the at least one feature point, a new texture is created if the GPU is performing image processing on the video stream for the first time; otherwise, if the aspect ratio of the image data has not changed, only the existing texture is updated. In some embodiments, if the GPU is not performing image processing on the video stream for the first time and the aspect ratio of the image data has changed, a new texture is created. In some embodiments, the change in the aspect ratio of the image data is caused by switching cameras and/or rotating the terminal.
According to a third aspect of the present disclosure, a terminal for processing a video stream is provided. The terminal includes: a processor; a graphics processing unit (GPU); and a memory storing instructions that, when executed by the processor, cause the processor to: acquire image data of at least one frame of the video stream; determine at least one feature point in the image data; and instruct the GPU to perform image processing on the image data based on the at least one feature point.
In some embodiments, the video stream is a video stream having 4K or higher resolution. In some embodiments, the at least one frame of image includes a human face, and the at least one feature point includes at least one of the following: face, eyes, nose, mouth, ears, hair, or eyebrows. In some embodiments, the image processing includes at least one of the following: contour correction, color correction, brightness correction, and blurring. In some embodiments, the instructions, when executed by the processor, further cause the processor to: instruct the GPU to perform, on the image-processed image data, at least one of a video recording operation, a video display operation, an object tracking operation, or a live video operation. In some embodiments, the instructions, when executed by the processor, further cause the processor to: instruct the GPU to perform, on the image-processed image data, at least two of the video recording, video display, object tracking, or live video operations in parallel. In some embodiments, before the GPU performs image processing on the image data based on the at least one feature point, the image data is coordinate-transformed so that it can be image-processed by the GPU. In some embodiments, after the GPU performs image processing on the image data based on the at least one feature point, a new texture is created if the GPU is performing image processing on the video stream for the first time; otherwise, if the aspect ratio of the image data has not changed, only the existing texture is updated. In some embodiments, if the GPU is not performing image processing on the video stream for the first time and the aspect ratio of the image data has changed, a new texture is created. In some embodiments, the change in the aspect ratio of the image data is caused by switching cameras and/or rotating the terminal.
According to a fourth aspect of the present disclosure, a computer program is provided which, when executed by a processor, causes the processor to perform the method according to any one of the embodiments of the first aspect of the present disclosure.
According to a fifth aspect of the present disclosure, a computer program product is provided, including the computer program according to the fourth aspect of the present disclosure.
Brief Description of the Drawings
For a more complete understanding of the embodiments of the present disclosure and their advantages, reference is now made to the following description taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic diagram of the hardware structure of a terminal according to an embodiment of the present disclosure.
FIG. 2 is a schematic diagram of the data/instruction interaction among the components of the terminal shown in FIG. 1.
FIG. 3 is a flowchart of a method, performed at the main processor of a terminal, of processing a video stream using a GPU according to an embodiment of the present disclosure.
FIG. 4 is a block diagram of an example functional architecture of a terminal for performing the method shown in FIG. 3 according to an embodiment of the present disclosure.
Detailed Description
Other aspects, advantages, and salient features of the present disclosure will become apparent to those skilled in the art from the following detailed description of exemplary embodiments of the present disclosure taken in conjunction with the accompanying drawings.
In the present disclosure, the terms "include" and "contain" and their derivatives mean inclusion without limitation; the term "or" is inclusive, meaning and/or.
In this specification, the various embodiments described below for explaining the principles of the present disclosure are illustrative only and should not be construed in any way as limiting the scope of the disclosure. The following description with reference to the accompanying drawings is intended to assist in a comprehensive understanding of the exemplary embodiments of the present disclosure as defined by the claims and their equivalents. The following description includes various specific details to assist understanding, but these details are to be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. In addition, descriptions of well-known functions and structures are omitted for clarity and conciseness. Moreover, throughout the drawings, the same reference numerals are used for the same or similar functions and operations.
Before the embodiments of the present disclosure are formally introduced, the various terms that may be used herein will first be explained in general.
1080P: a format standard for high-definition digital television developed by the Society of Motion Picture and Television Engineers (SMPTE), with an effective resolution of 1920×1080. It is a display format that reaches a resolution of 1920×1080 under progressive scanning.
4K: 4K resolution does not refer to one specific value; it refers to approximately 4000 pixels in the horizontal direction, with slight differences across application fields. The trend of 4K resolution development is the 4096×2160 pixel resolution, which is four times the resolution of 2K projectors and high-definition televisions and belongs to ultra-high-definition resolution. For example, the commonly mentioned 4K movie has a film resolution of 4096×2160, determined by the imaging format of 4K cameras. In the television field under discussion, since 16:9 screens have become mainstream, the 4K of TV screens refers to a physical resolution of 3840×2160, which doubles 1920×1080 (i.e., 1080P) both horizontally and vertically while remaining the standard 16:9 format. In the context of the present disclosure, unless otherwise specified, the 4K mentioned encompasses the various known or de facto 4K resolutions, including but not limited to the two mentioned above. In addition, standards such as 2K and 8K are similarly defined at present.
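The pixel-count relationship between the 3840×2160 television "4K" format and 1080P quoted above can be checked with a short calculation (illustrative arithmetic only):

```python
# UHD "4K" television resolution vs. 1080P: doubling both the horizontal
# and vertical pixel counts quadruples the total number of pixels.
uhd = 3840 * 2160        # pixels per 3840x2160 frame
full_hd = 1920 * 1080    # pixels per 1080P frame
ratio = uhd / full_hd
print(uhd, full_hd, ratio)  # 8294400 2073600 4.0
```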
Object tracking: refers to determining, by means of an object recognition algorithm, the presence of an object when it appears in the field of view/picture of a camera, and instructing the camera to follow the determined object until the object moves beyond the camera's shootable range. Object tracking is widely used in fields such as video surveillance and drone piloting.
The inventors of the present application have noticed that the beautification/face-slimming operations of existing video beautification software are executed on the central processing unit (CPU), which can support a video resolution of at most 1080P. Even if the GPU is employed for the display flow, real-time retouching of video with 4K or higher resolution cannot be achieved. Moreover, such software cannot support the concurrent execution of the following operations: screen rotation, photographing, video recording, live streaming, display, and/or object tracking. More generally, the inventors have noticed that existing terminals do not make full use of the GPU's powerful parallel image processing capability, so that the CPU bears too much of the image processing workload and the existing hardware resources cannot be fully utilized.
In this embodiment, the terminal may be a mobile terminal, such as a mobile phone, a notebook computer, or a handheld photographing device; it may also be a desktop computer or the like.
In general, embodiments of the present disclosure propose a scheme that uses the GPU to achieve high-speed processing of the image frames in a video stream. In this scheme, coordinate rotation and beautification of the data in the image frames are performed by the GPU, and face slimming is performed with the assistance of the CPU, which can significantly speed up the processing of high-definition (e.g., 4K or higher resolution) face images. In addition, by swapping texture data between the GPU and the CPU and scheduling resources reasonably, functions such as display, video recording, photographing, live streaming, and/or object tracking can still be executed concurrently after beautification/face slimming under 4K or higher resolution.
First, the architecture of a terminal for video stream processing according to an embodiment of the present disclosure will be described in detail with reference to FIG. 1. FIG. 1 is a hardware schematic diagram of a terminal 100 for video stream processing according to an embodiment of the present disclosure. As shown in FIG. 1, the terminal 100 may include a main processor (hereinafter sometimes also referred to as a central processing unit, CPU, or host processor) 102, a graphics processing unit (hereinafter sometimes also referred to as a GPU) 104, and a display 106.
First, note that although the main processor 102 and the graphics processing unit 104 are shown in FIG. 1 as two separate modules, the present disclosure is not in fact limited thereto. In some embodiments, the graphics processing unit 104 may be a graphics processing core embedded in the main processor 102. For example, in the Huawei Mate 8 phone, the main processor Kirin 950 includes eight general-purpose processor cores (four Cortex-A72 and four Cortex-A53 cores from ARM) and one GPU core (the Mali T880 from ARM). In such an example, although the main processor 102 and the graphics processing unit 104 are physically one piece of hardware, they can still logically be regarded as two separate logical modules. In fact, for a programmer, the 8-core CPU composed of eight general-purpose cores still needs to call a graphics programming interface such as OpenGL to direct the GPU to perform graphics computation. Therefore, the main processor 102 and the graphics processing unit 104 can be distinguished logically, although this does not mean that the two are necessarily different physical hardware.
As a processor dedicated to graphics processing, the GPU 104 has more computing cores than the CPU 102 and therefore stronger parallel computing capability, making it better suited to image processing. In fact, in terms of image processing, as described in detail below, the GPU 104 can be at least 100 times faster than the CPU 102 for the same image processing. However, as mentioned above, existing image processing does not fully exploit this powerful parallel image processing capability of the GPU 104, so existing terminals cannot process large-resolution images and/or video (e.g., images and/or video with 4K or higher resolution) in real time.
Furthermore, the display 106 of the terminal 100 may be connected to the GPU 104 to display various images, text, and the like under the control of the GPU 104. The display 106 may include (but is not limited to): a CRT (cathode ray tube) display, an LCD (liquid crystal) display, an LED (light-emitting diode) display, an OLED (organic light-emitting diode) display, and so on. In addition, the content shown on the display 106 can be rotated under the control of the GPU 104, for example to adapt to the current device orientation of the terminal 100. For example, when watching a video, the terminal 100 may be changed from portrait orientation to landscape orientation so that the video can be displayed full-screen, occupying the entire screen rather than only a part of it. Note also that the display 106 is not an essential component of the terminal 100. In fact, the terminal 100 may use an external display or otherwise stream the display picture to a remote display, without a built-in display 106.
The communication unit 108 may be a module or unit that enables the terminal 100 to communicate with external devices. The communication unit 108 may be a wired communication unit, a wireless communication unit, or a combination of both. When the communication unit 108 is a wired communication unit, it may include (but is not limited to): a USB module, an IEEE 1394 module, an Ethernet module, a digital subscriber line (DSL, or more generally xDSL) modem, a serial port module, and so on. When the communication unit 108 is a wireless communication unit, it may include, for example (but not limited to): various 2G modules (e.g., GSM/GPRS modules), 3G modules (e.g., WCDMA, CDMA2000, and TD-SCDMA modules), 4G modules (e.g., TD-LTE modules), the various developing and forthcoming 5G modules, and other long-range communication modules. In addition, the communication unit 108 may also include short-range wireless communication modules, for example (but not limited to): Wi-Fi, Bluetooth, NFC, RFID, and infrared modules. In fact, the communication unit 108 is not limited to any of the above modules; it may be any module that enables the main processor 102 to communicate externally, and may even be a module in the form of shared memory that lets the main processor 102 share data with external devices/processors through write/read operations. Likewise, note that the communication unit 108 is not an essential component of the terminal 100; the terminal 100 may use an external communication unit to communicate with other external devices, without a built-in communication unit 108.
The terminal 100 may also include one or more cameras 110. In the example shown in FIG. 1, the terminal 100 includes a first camera 110-1 and a second camera 110-2 (hereinafter collectively referred to as cameras 110). The two cameras may have different technical parameters. For example, the first camera 110-1 may be located on the back of the terminal 100 to photograph subjects while the captured picture can be viewed at the same time. The second camera 110-2 may be located on the front of the terminal 100 for the user's self-portraits. Moreover, in other embodiments, the first camera 110-1 may rotate so as to provide the photographing functions of both the first camera 110-1 and the second camera 110-2. In still other embodiments, the first camera 110-1 may even include two lenses to enable 3D photography, obtain sharper images, and so on. The two cameras 110-1 and 110-2 may have different resolutions, zoom ranges/fixed focal lengths, shutter speeds, apertures, depths of field, sensitivities (ISO), and the like. In addition, the way the two cameras 110 are connected to the main processor 102 is not limited to that shown in FIG. 1; they may be connected to different pins of the main processor 102, or both connected to some data/control bus, and the present disclosure is not limited thereto. Likewise, note that the cameras 110-1 and 110-2 are not essential components of the terminal 100. Although the second camera 110-2 is drawn with a dashed box to indicate that it is optional, this does not imply that the first camera is mandatory. In fact, the terminal 100 may use an external camera to acquire images and/or video, without the built-in cameras 110-1 and 110-2. Moreover, the terminal 100 may even acquire an image/video stream through the communication unit 108 or from the memory 112, without any built-in/external camera.
The terminal 100 may also include one or more memories 112. The memory 112 may be volatile memory or non-volatile memory. For example, as volatile memory, the memory 112 may include (but is not limited to): random access memory (RAM), dynamic RAM (DRAM), static RAM (SRAM), synchronous DRAM (SDRAM), caches, registers, and the like. As non-volatile memory, the memory 112 may also include (but is not limited to): one-time programmable read-only memory (OTPROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), mask ROM, flash ROM, flash memory, hard disk drives, solid state drives, and so on. In addition, the memory 112 may also include, for example (but not limited to), CompactFlash (CF), Secure Digital (SD), micro SD, mini SD, extreme digital (xD), MultiMediaCard (MMC), Memory Stick, and the like. The memory 112 may store instructions to be executed by the main processor 102 and/or the GPU 104 and/or data to be processed or already processed. In some embodiments, for example when the GPU 104 shares the memory 112 with the CPU 102, data such as to-be-processed and/or processed image data of the video stream can be transferred between the GPU 104 and the CPU 102 by passing pointers. For example, as shown by the dashed arrow between the GPU 104 and the memory 112 in FIG. 1, the GPU 104 may optionally be physically connected directly to the memory 112, thereby sharing the memory with the CPU 102 in a time-division manner. In addition, the GPU 104 may also access the instructions and/or data stored in the memory 112 through the main processor 102. The present disclosure is not limited thereto.
The terminal 100 may also include various other functional units collectively referred to as other modules 114, including (but not limited to), for example: a power module, sensor modules, input/output modules (e.g., keyboard, buttons, audio module), a vibration module, an encryption module, and so on. However, their presence or absence does not affect the understanding and implementation of the gist of the embodiments of the present disclosure, so their detailed description is omitted here.
Next, the data/instruction interaction among the components of the terminal 100 shown in FIG. 1 will be described in detail with reference to FIG. 2; this interaction enables the terminal 100 to process a video stream using its GPU 104.
In the embodiment shown in FIG. 2, at step S201, the video stream data captured by the camera 110 can be transmitted to the main processor 102, the GPU 104, and/or the memory 112 in various ways, for example transmitted directly to each component over the data bus, or transmitted to the CPU 102, which then forwards it to the other components. Note that in some embodiments the video stream data may not be transmitted to the memory 112 and/or the CPU 102, but instead be acquired only by the GPU 104.
Next, in step S202, the GPU 104 performs an optional coordinate rotation on the image data of each frame of the video stream data captured by the camera 110, so as to transform the image data captured by the camera into a data format that can be correctly processed by the GPU 104 (e.g., to transform it into the coordinate system of the GPU 104). Then, in step S203, the GPU 104 extracts the raw image data of each frame from the video stream data and, in step S204, sends it to the main processor 102 and/or the memory 112.
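The optional coordinate rotation of step S202 can be illustrated with a minimal sketch. The 90° mapping on normalized coordinates below is an assumption chosen for illustration, since the text does not specify the exact transform between the camera's and the GPU's coordinate systems:

```python
def rotate90(u, v):
    """Rotate a normalized texture coordinate (u, v) in [0, 1] x [0, 1]
    by 90 degrees about the image center, as one example of remapping
    camera output into the coordinate system the GPU expects."""
    return (1.0 - v, u)

# Applying the 90-degree rotation four times returns the original coordinate.
p = (0.25, 0.75)
for _ in range(4):
    p = rotate90(*p)
```

A real implementation would typically apply such a mapping per vertex (or bake it into the texture-coordinate attributes), rather than per pixel on the CPU.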
After receiving the raw image data, the main processor 102 may, in step S205, run face recognition and/or facial feature point determination algorithms on the raw data to determine whether each frame of image contains a face and, if it does, to determine one or more of the position, contour, color, brightness, etc. of each facial feature (e.g., face, eyes, nose, mouth, ears, hair, or eyebrows). The main processor 102 may then send the above feature data to the GPU 104 in step S206. The reason the face recognition/feature point determination algorithm is executed on the CPU 102 is that the GPU 104 does not have as many control units (e.g., branch prediction, caches) as the CPU 102, so when executing such algorithms the GPU 104 is usually not as efficient as the CPU 102. Hence, the algorithm can mainly be executed by the main processor 102 rather than on the GPU 104. In other embodiments, however, the GPU 104 may also be made to perform these operations. Moreover, the algorithm itself is not the focus of this document, and many face recognition algorithms exist, so it is not described in detail here; this does not prevent those skilled in the art from implementing the embodiments of the present disclosure. In addition, although feature points are used herein to represent features on a face, a feature point is not necessarily a single point or pixel in the image; it may represent a region with a certain shape and area. For example, an eye feature point may include a block of pixels with roughly the same shape and area as the eye portion of the image.
Next, in steps S207 and S208, the GPU 104 may perform image processing (e.g., retouching operations) according to the feature points found by the CPU 102. The image processing may include at least one of the following: contour correction (e.g., face slimming, eye enlargement), color correction (e.g., correction of skin tone, eye color, lip color), brightness correction (e.g., face lighting), and blurring (e.g., Gaussian blur, commonly known as "skin smoothing"). Moreover, there is no fixed order for retouching operations such as face slimming and beautification (whose overall effect includes, e.g., skin smoothing and whitening, and which may specifically include one or more of bilateral filtering, edge detection, sharpening, skin tone adjustment, and/or color/brightness adjustment of other parts). Although in the embodiment shown in FIG. 2 the face-slimming operation S207 is performed first and the beautification operation S208 afterward, they may also be performed in the reverse order or even in parallel. Furthermore, in some embodiments, at least part of the beautification operation S208 may not require the presence of feature points, and therefore it, or a part of it, may even be performed before feature point determination (i.e., before step S204).
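As one concrete example of the blurring mentioned above, the weights of a 1-D Gaussian kernel, as used in a separable Gaussian blur pass, can be computed as follows. This is a generic sketch of the standard technique, not code from the terminal itself:

```python
import math

def gaussian_kernel(radius, sigma):
    """Return normalized 1-D Gaussian weights for one pass of a separable blur.

    A separable Gaussian blur applies these weights once horizontally and
    once vertically, which is much cheaper than a full 2-D convolution.
    """
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]  # normalize so the weights sum to 1

kernel = gaussian_kernel(radius=3, sigma=1.5)
# The kernel is symmetric about the center tap, which carries the largest weight.
```

On a GPU this kernel would typically live in a fragment shader's constant array, with one texture fetch per tap.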
In addition, the inventors of the present application have found that on a mobile device, displaying just 30 images with 4K resolution per second is enough for the video to appear perfectly smooth. According to an embodiment of the present disclosure, performing the beautification with the GPU 104 takes about 7.5 ms (whereas performing the same operation with the CPU 102 takes at least 750 ms), and performing the face slimming takes about 12 ms (including about 9.5 ms for the CPU 102 to determine the feature points and about 2.5 ms for the GPU 104 to execute the face-slimming operation). Thus, processing each 4K frame takes only about 19.5 ms, so the video stream shows no stutter.
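The timing figures quoted above can be checked against the 30 fps display budget with simple arithmetic (using only the numbers stated in the text):

```python
frame_budget_ms = 1000.0 / 30     # about 33.3 ms available per frame at 30 fps
beautify_ms = 7.5                 # GPU beautification (vs. at least 750 ms on the CPU)
slim_ms = 9.5 + 2.5               # CPU feature-point detection + GPU face slimming
total_ms = beautify_ms + slim_ms  # about 19.5 ms of processing per 4K frame
assert total_ms < frame_budget_ms  # the pipeline fits comfortably within the budget
```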
Then, in step S209, when the GPU 104 processes the image data of a given video stream for the first time, the GPU 104 may create a new texture from the processed or raw data; as long as the width and height (or resolution) remain unchanged, each subsequent batch of data only requires one update of the texture data, without creating a new texture. This reduces the repeated creation of data resources and can further improve computing performance. In addition, when, for example, switching between the front and rear cameras 110-1 and 110-2, or when rotating the terminal 100 (e.g., from portrait to landscape or vice versa), the resolution (or width/height) usually changes, and a new texture can then be re-created. Although this may occasionally cause a stutter of a few tenths of a second, the user usually does not notice such a stutter. After the texture data processing is complete, the GPU 104 can continue to use the texture to render onto the GPU buffer.
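The texture-reuse decision of step S209 amounts to a small piece of decision logic, sketched below. The helper is hypothetical; in practice "create" and "update" would correspond to graphics-API texture allocation and in-place texture upload calls, respectively:

```python
def texture_action(first_frame, prev_size, new_size):
    """Decide whether to create a new texture or update the existing one.

    A new texture is needed on the first frame of a stream, or whenever the
    width/height changes (e.g. after switching cameras or rotating the
    terminal); otherwise the existing texture is merely updated in place,
    avoiding repeated allocation of GPU resources.
    """
    if first_frame or prev_size != new_size:
        return "create"
    return "update"

assert texture_action(True, None, (3840, 2160)) == "create"            # first frame
assert texture_action(False, (3840, 2160), (3840, 2160)) == "update"   # same size
assert texture_action(False, (3840, 2160), (2160, 3840)) == "create"   # rotated
```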
In step S210, the GPU 104 may send the display data created or updated in step S209 to the display 106, so that the user can see the retouched video on the display 106. Also in step S210, the GPU 104 may send the display data to the CPU 102 and/or the memory 112, so that the CPU 102 can perform subsequent processing and the memory 112 can store the display data, thereby implementing the video recording function.
In steps S211 and/or S212, the CPU 102 may, based on the display data, generate live streaming data for the communication unit 108 (e.g., encapsulating and packaging the display data into various network streaming formats) and/or tracking instructions for the camera 110, so as to implement live video streaming and/or object tracking, respectively.
So far, the overall flow of processing a video stream using the GPU 104 according to embodiments of the present disclosure has been described in detail with reference to FIGS. 1 and 2. By using the method, terminal, and/or computer program according to embodiments of the present disclosure, the user can enable the beautification and face-slimming functions while operating the phone, experience no stutter at 4K high definition, and simultaneously perform operations such as live streaming, video recording, photographing, display, and screen rotation. Performance is thereby optimized through reasonable scheduling of the GPU/CPU.
The method of processing a video stream using the GPU 104 performed by the terminal 100 and the functional structure of the terminal 100 according to embodiments of the present disclosure will be described in detail below with reference to FIGS. 3-4.
FIG. 3 is a flowchart of a method 300, executed in the terminal 100, of processing a video stream using the GPU 104 according to an embodiment of the present disclosure. As shown in FIG. 3, the method 300 may include steps S310, S320, and S330. According to the present disclosure, some steps of the method 300 may be performed separately or in combination, and may be performed in parallel or sequentially, without being limited to the specific operation order shown in FIG. 3. In some embodiments, the method 300 may be performed by the terminal 100 and/or the processor 102 shown in FIG. 1.
FIG. 4 is a functional block diagram of an example terminal 100 according to an embodiment of the present disclosure. As shown in FIG. 4, the terminal 100 may include: an image data acquiring device 150, a feature point determining device 160, and an image processing instructing device 170.
The image data acquiring device 150 may be used to acquire image data of at least one frame of the video stream. The image data acquiring device 150 may be the central processing unit (e.g., CPU 102), a digital signal processor (DSP), a microprocessor, a microcontroller, etc. of the terminal 100, and may cooperate with, for example, the camera 110 of the terminal 100 to acquire the image data of at least one frame of the video stream. It may also cooperate with the communication unit 108 and/or the memory 112 of the terminal 100 to obtain image data of at least one frame of a video stream transmitted from an external device and/or of a video stream stored in the local memory.
The feature point determining device 160 may be used to determine at least one feature point in the image data. The feature point determining device 160 may likewise be the central processing unit (e.g., CPU 102), a digital signal processor (DSP), a microprocessor, a microcontroller, etc. of the terminal 100, and may determine parameters such as the position and size of feature points in the image based on an image feature point recognition algorithm. For example, it may determine whether a face is present in the image according to face feature recognition parameters, as well as the position, size, color, etc. of each facial organ of the face.
The image processing instructing device 170 may be used to instruct the GPU 104 to perform image processing on the image data based on the determined at least one feature point. The image processing instructing device 170 may likewise be the central processing unit (CPU 102), a digital signal processor (DSP), a microprocessor, a microcontroller, etc. of the terminal 100, and may instruct the GPU 104, through, for example, a graphics programming interface (e.g., OpenGL or Direct3D), to perform the corresponding image processing based on the detected at least one feature point. For example, the GPU 104 may be instructed to perform blurring (e.g., Gaussian blur) where facial acne is detected in the image, to perform brightness correction and color correction where skin is detected in the image, and/or to perform contour correction (e.g., face slimming) where a facial contour is detected.
In addition, the terminal 100 may further include other functional units not shown in FIG. 4, such as an operation instructing device. In some embodiments, the operation instructing device may be used to instruct the GPU 104 to perform, on the image-processed image data, at least one of a video recording operation, a video display operation, an object tracking operation, or a live video operation. In addition, the operation instructing device may further be used to: instruct the GPU 104 to perform at least two of the video recording, video display, object tracking, or live video operations in parallel.
In addition, the terminal 100 may further include other functional units not shown in FIG. 4; however, since they do not affect the understanding of the embodiments of the present disclosure by those skilled in the art, they are omitted from FIG. 4. For example, the terminal 100 may also include one or more of the following functional units: a power source, a memory (e.g., memory 112), a data bus, an antenna, a wireless transceiver (e.g., communication unit 108), and the like.
The method 300 of processing a video stream using the GPU 104, executed on the terminal 100 according to embodiments of the present disclosure, and the terminal 100 will be described in detail below with reference to FIGS. 3 and 4.
The method 300 begins at step S310, in which image data of at least one frame of the video stream may be acquired by the image data acquiring device 150 of the terminal 100.
In step S320, at least one feature point in the image data may be determined by the feature point determining device 160 of the terminal 100.
In step S330, the image processing instructing device 170 of the terminal 100 may instruct the GPU 104 to perform image processing on the image data based on the at least one feature point.
In some embodiments, the video stream may be a video stream having 4K or higher resolution. In some embodiments, the at least one frame of image may include a human face, and the at least one feature point may include at least one of the following: face, eyes, nose, mouth, ears, hair, or eyebrows. In some embodiments, the image processing may include at least one of the following: contour correction, color correction, brightness correction, and blurring. In some embodiments, the method 300 may further include: instructing the GPU 104 to perform, on the image-processed image data, at least one of a video recording operation, a video display operation, an object tracking operation, or a live video operation. In some embodiments, instructing the GPU 104 to perform at least one of these operations may include: instructing the GPU 104 to perform at least two of the video recording, video display, object tracking, or live video operations in parallel. In some embodiments, before the GPU 104 performs image processing on the image data based on the at least one feature point, the image data may be coordinate-transformed so that it can be image-processed by the GPU 104. In some embodiments, after the GPU 104 performs image processing on the image data based on the at least one feature point, a new texture may be created if the GPU 104 is performing image processing on the video stream for the first time; otherwise, if the aspect ratio of the image data has not changed, only the existing texture may be updated. In some embodiments, if the GPU 104 is not performing image processing on the video stream for the first time and the aspect ratio of the image data has changed, a new texture may be created. In some embodiments, the change in the aspect ratio of the image data may be caused by switching cameras and/or rotating the terminal.
By using the method, terminal, and/or computer program according to embodiments of the present disclosure, the user can enable the beautification and face-slimming functions while operating the phone, experience no stutter at 4K high definition, and simultaneously perform operations such as live streaming, video recording, photographing, display, and screen rotation. Performance is thereby optimized through reasonable scheduling of the GPU/CPU.
Note that functions described herein as implemented by pure hardware, pure software, and/or firmware may also be implemented by dedicated hardware, by a combination of general-purpose hardware and software, and so on. For example, functions described as implemented by dedicated hardware (e.g., a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC)) may be implemented by general-purpose hardware (e.g., a central processing unit (CPU) or a digital signal processor (DSP)) combined with software, and vice versa. Likewise, functions described as implemented by a Bluetooth module, an NFC chip/coil, or the like may be implemented by a general-purpose processor (e.g., a CPU or DSP) combined with hardware such as an analog-to-digital conversion circuit, an amplifying circuit, and an antenna, together with Bluetooth/NFC-related processing software, and vice versa.
Although the present disclosure has been shown and described with reference to specific exemplary embodiments thereof, those skilled in the art will understand that various changes in form and detail may be made to the present disclosure without departing from the spirit and scope of the present disclosure as defined by the appended claims and their equivalents. Therefore, the scope of the present disclosure should not be limited to the above embodiments, but should be determined not only by the appended claims but also by their equivalents.

Claims (31)

  1. A method of processing a video stream using a graphics processing unit (GPU) at a processor of a terminal, comprising:
    acquiring image data of at least one frame of the video stream;
    determining at least one feature point in the image data; and
    instructing the GPU to perform image processing on the image data based on the at least one feature point.
  2. The method according to claim 1, wherein the video stream is a video stream having 4K or higher resolution.
  3. The method according to claim 1, wherein the at least one frame of image includes a human face, and the at least one feature point includes at least one of the following: face, eyes, nose, mouth, ears, hair, or eyebrows.
  4. The method according to claim 1, wherein the image processing includes at least one of the following: contour correction, color correction, brightness correction, and blurring.
  5. The method according to claim 1, further comprising: instructing the GPU to perform, on the image-processed image data, at least one of a video recording operation, a video display operation, an object tracking operation, or a live video operation.
  6. The method according to claim 5, wherein instructing the GPU to perform at least one of a video recording operation, a video display operation, an object tracking operation, or a live video operation comprises: instructing the GPU to perform at least two of the video recording, video display, object tracking, or live video operations in parallel.
  7. The method according to claim 1, wherein, before the GPU performs image processing on the image data based on the at least one feature point, the image data is coordinate-transformed so that it can be image-processed by the GPU.
  8. The method according to claim 1, wherein, after the GPU performs image processing on the image data based on the at least one feature point, a new texture is created if the GPU is performing image processing on the video stream for the first time; otherwise, if the aspect ratio of the image data has not changed, only the existing texture is updated.
  9. The method according to claim 8, wherein, if the GPU is not performing image processing on the video stream for the first time and the aspect ratio of the image data has changed, a new texture is created.
  10. The method according to claim 8, wherein the change in the aspect ratio of the image data is caused by switching cameras and/or rotating the terminal.
  11. A terminal for processing a video stream using a graphics processing unit (GPU), comprising:
    an image data acquiring device for acquiring image data of at least one frame of the video stream;
    a feature point determining device for determining at least one feature point in the image data; and
    an image processing instructing device for instructing the GPU to perform image processing on the image data based on the at least one feature point.
  12. The terminal according to claim 11, wherein the video stream is a video stream having 4K or higher resolution.
  13. The terminal according to claim 11, wherein the at least one frame of image includes a human face, and the at least one feature point includes at least one of the following: face, eyes, nose, mouth, ears, hair, or eyebrows.
  14. The terminal according to claim 11, wherein the image processing includes at least one of the following: contour correction, color correction, brightness correction, and blurring.
  15. The terminal according to claim 11, further comprising: an operation instructing device for instructing the GPU to perform, on the image-processed image data, at least one of a video recording operation, a video display operation, an object tracking operation, or a live video operation.
  16. The terminal according to claim 15, wherein the operation instructing device is further configured to: instruct the GPU to perform at least two of the video recording, video display, object tracking, or live video operations in parallel.
  17. The terminal according to claim 11, wherein, before the GPU performs image processing on the image data based on the at least one feature point, the image data is coordinate-transformed so that it can be image-processed by the GPU.
  18. The terminal according to claim 11, wherein, after the GPU performs image processing on the image data based on the at least one feature point, a new texture is created if the GPU is performing image processing on the video stream for the first time; otherwise, if the aspect ratio of the image data has not changed, only the existing texture is updated.
  19. The terminal according to claim 18, wherein, if the GPU is not performing image processing on the video stream for the first time and the aspect ratio of the image data has changed, a new texture is created.
  20. The terminal according to claim 18, wherein the change in the aspect ratio of the image data is caused by switching cameras and/or rotating the terminal.
  21. A terminal for processing a video stream, comprising:
    a processor;
    a graphics processing unit (GPU); and
    a memory storing instructions that, when executed by the processor, cause the processor to:
    acquire image data of at least one frame of the video stream;
    determine at least one feature point in the image data; and
    instruct the GPU to perform image processing on the image data based on the at least one feature point.
  22. The terminal according to claim 21, wherein the video stream is a video stream having 4K or higher resolution.
  23. The terminal according to claim 21, wherein the at least one frame of image includes a human face, and the at least one feature point includes at least one of the following: face, eyes, nose, mouth, ears, hair, or eyebrows.
  24. The terminal according to claim 21, wherein the image processing includes at least one of the following: contour correction, color correction, brightness correction, and blurring.
  25. The terminal according to claim 21, wherein the instructions, when executed by the processor, further cause the processor to: instruct the GPU to perform, on the image-processed image data, at least one of a video recording operation, a video display operation, an object tracking operation, or a live video operation.
  26. The terminal according to claim 25, wherein the instructions, when executed by the processor, further cause the processor to: instruct the GPU to perform, on the image-processed image data, at least two of the video recording, video display, object tracking, or live video operations in parallel.
  27. The terminal according to claim 21, wherein, before the GPU performs image processing on the image data based on the at least one feature point, the image data is coordinate-transformed so that it can be image-processed by the GPU.
  28. The terminal according to claim 21, wherein, after the GPU performs image processing on the image data based on the at least one feature point, a new texture is created if the GPU is performing image processing on the video stream for the first time; otherwise, if the aspect ratio of the image data has not changed, only the existing texture is updated.
  29. The terminal according to claim 28, wherein, if the GPU is not performing image processing on the video stream for the first time and the aspect ratio of the image data has changed, a new texture is created.
  30. The terminal according to claim 28, wherein the change in the aspect ratio of the image data is caused by switching cameras and/or rotating the terminal.
  31. The terminal according to any one of claims 21-30, wherein the terminal is a mobile terminal.
PCT/CN2016/107993 2016-11-30 2016-11-30 Method and terminal for processing a video stream WO2018098677A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2016/107993 WO2018098677A1 (zh) 2016-11-30 2016-11-30 Method and terminal for processing a video stream
CN201680002250.6A CN106688016A (zh) 2016-11-30 2016-11-30 Method and terminal for processing a video stream

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/107993 WO2018098677A1 (zh) 2016-11-30 2016-11-30 Method and terminal for processing a video stream

Publications (1)

Publication Number Publication Date
WO2018098677A1 true WO2018098677A1 (zh) 2018-06-07

Family

ID=58849586

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/107993 WO2018098677A1 (zh) 2016-11-30 2016-11-30 Method and terminal for processing a video stream

Country Status (2)

Country Link
CN (1) CN106688016A (zh)
WO (1) WO2018098677A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117331512A (zh) * 2023-12-01 2024-01-02 芯动微电子科技(武汉)有限公司 Data compression and processing method for performing write operations on GPU in-core memory

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111164569B (zh) * 2017-08-02 2020-11-10 深圳传音通讯有限公司 Filter switching method and filter switching system based on a smart terminal
CN107578371A (zh) * 2017-09-29 2018-01-12 北京金山安全软件有限公司 Image processing method and apparatus, electronic device, and medium
CN108495043B (zh) * 2018-04-28 2020-08-07 Oppo广东移动通信有限公司 Image data processing method and related apparatus
CN108600771B (zh) * 2018-05-15 2019-10-25 东北农业大学 Recording and broadcasting workstation system and operation method
CN109089043B (zh) * 2018-08-30 2021-07-30 Oppo广东移动通信有限公司 Captured image preprocessing method and apparatus, storage medium, and mobile terminal
CN110730335A (zh) * 2019-11-14 2020-01-24 深圳市高巨创新科技开发有限公司 Method and system for real-time preview of unmanned aerial vehicle video
CN111182350B (zh) * 2019-12-31 2022-07-26 广州方硅信息技术有限公司 Image processing method and apparatus, terminal device, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110085139A1 (en) * 2009-10-08 2011-04-14 Tobii Technology Ab Eye-tracking using a gpu
CN103685926A (zh) * 2012-09-21 2014-03-26 宏达国际电子股份有限公司 Image processing method for facial region and electronic device using the same
CN104331858A (zh) * 2014-11-24 2015-02-04 厦门美图之家科技有限公司 Acceleration method for image processing using a CPU and a GPU simultaneously
CN104392409A (zh) * 2014-12-01 2015-03-04 厦门美图之家科技有限公司 Acceleration method for image beautification
CN105872447A (zh) * 2016-05-26 2016-08-17 努比亚技术有限公司 Video image processing apparatus and method
CN106127673A (zh) * 2016-07-19 2016-11-16 腾讯科技(深圳)有限公司 Video processing method and apparatus, and computer device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102572391B (zh) * 2011-12-09 2014-08-27 深圳万兴信息科技股份有限公司 Method and apparatus for converting camera video frames into sprites


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117331512A (zh) * 2023-12-01 2024-01-02 芯动微电子科技(武汉)有限公司 Data compression and processing method for performing write operations on GPU in-core memory
CN117331512B (zh) * 2023-12-01 2024-04-12 芯动微电子科技(武汉)有限公司 Data compression and processing method for performing write operations on GPU in-core memory

Also Published As

Publication number Publication date
CN106688016A (zh) 2017-05-17

Similar Documents

Publication Publication Date Title
WO2018098677A1 (zh) Method and terminal for processing a video stream
US11496696B2 (en) Digital photographing apparatus including a plurality of optical systems for acquiring images under different conditions and method of operating the same
US11871105B2 (en) Field of view adjustment
WO2019105305A1 (zh) 图像亮度处理方法、计算机可读存储介质和电子设备
WO2021078001A1 (zh) 一种图像增强方法及装置
WO2020259250A1 (zh) 图像处理方法、图像处理器、拍摄装置和电子设备
WO2019237982A1 (zh) 贴纸设置方法及装置
EP4156082A1 (en) Image transformation method and apparatus
WO2023160234A1 (zh) 转场动效生成方法、电子设备和存储介质
TWI520604B (zh) 攝像裝置及其影像預覽系統及影像預覽方法
CN108898650B (zh) 人形素材创建方法及相关装置
US8971636B2 (en) Image creating device, image creating method and recording medium
US11438491B2 (en) Systems and methods for blocking a target in video monitoring
US9600735B2 (en) Image processing device, image processing method, program recording medium
WO2023060921A1 (zh) 图像处理方法与电子设备
US11443403B2 (en) Image and video processing using multiple pipelines
US11636708B2 (en) Face detection in spherical images
US20210191683A1 (en) Method and system for simultaneously driving dual displays with same camera video data and different graphics
WO2021258249A1 (zh) 图像获取方法、电子设备和可移动设备
WO2021237736A1 (zh) 图像处理方法、装置和系统,计算机可读存储介质
JP6992829B2 (ja) 画像処理システム、画像処理方法およびプログラム
WO2019238001A1 (zh) 拍照控制方法及相关装置
WO2018098931A1 (zh) 一种数据处理方法及装置
TW202009595A (zh) 全景照相裝置及其影像映射結合方法
JP6705477B2 (ja) 画像処理システム、画像処理方法およびプログラム

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16922708

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16922708

Country of ref document: EP

Kind code of ref document: A1