WO2018103244A1 - Live video processing method and apparatus, and electronic device - Google Patents

Live video processing method and apparatus, and electronic device (直播视频处理方法、装置及电子设备)

Info

Publication number
WO2018103244A1
Authority
WO
WIPO (PCT)
Prior art keywords
pixel
area
skin
blurred
live video
Prior art date
Application number
PCT/CN2017/079594
Other languages
English (en)
French (fr)
Inventor
赵连超
Original Assignee
武汉斗鱼网络科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 武汉斗鱼网络科技有限公司
Publication of WO2018103244A1 publication Critical patent/WO2018103244A1/zh

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs

Definitions

  • the present invention relates to the field of image processing technologies, and in particular, to a live video processing method, apparatus, and electronic device.
  • the main process of a live broadcast is that the anchor turns on the camera; the anchor's terminal device, such as a mobile phone or computer, obtains the anchor's video stream in real time through the camera and sends the obtained video stream to the server; and the server forwards the received video stream in real time to the terminal device of each viewer.
  • besides capturing the anchor, the camera may also capture some of the anchor's personal items during the live broadcast, causing the anchor's personal privacy to leak. For the problem that the existing live broadcast mode easily leads to leakage of the anchor's personal privacy, a good solution has not been proposed yet.
  • an object of the present invention is to provide a live video processing method, apparatus, and electronic device to solve the problem that the existing live broadcast mode easily leads to personal privacy leakage of the anchor.
  • the embodiment of the present invention provides a live video processing method, where the method includes: acquiring a sequence of live video frames; determining, in each video frame of the live video frame sequence, a still background and a non-skin area, and determining the region jointly formed by the still background and the non-skin area as a blurred area; and, in each video frame, blurring the blurred area to obtain a plurality of blurred images.
  • the embodiment of the present invention provides a first possible implementation manner of the first aspect, wherein, after the step of acquiring a sequence of live video frames is performed, the method further includes: in each video frame, determining the area where the anchor is located, and performing brightness enhancement and/or contrast enhancement on that area to obtain a plurality of enhanced images; fusing each blurred image with the corresponding enhanced image according to the probability value that each pixel in each video frame corresponds to skin, to obtain a plurality of fused images; and sending the plurality of fused images as a sequence of processed live video frames.
  • the embodiment of the present invention provides the second possible implementation manner of the first aspect, wherein the probability value that each pixel in the video frame corresponds to skin is obtained by: generating a skin map corresponding to the video frame, in which each pixel is marked as a skin point or a non-skin point; and blurring the skin map to obtain a blurred skin image, and determining the probability value that each pixel in the video frame corresponds to skin according to the pixel value of each pixel in the blurred skin image.
  • the embodiment of the present invention provides a third possible implementation manner of the first aspect, wherein each blurred image is fused with the corresponding enhanced image according to the probability value that each pixel in each video frame corresponds to skin, obtaining a plurality of fused images by the following formula:
  • dest=BG_Blur*a+dest3*(1-a)
  • where a represents the probability value that each pixel in the video frame corresponds to skin, BG_Blur represents the pixel value of each pixel of the blurred image, dest3 represents the pixel value of each pixel of the enhanced image, and dest represents the pixel value of each pixel of the fused image.
  • the embodiment of the present invention provides a fourth possible implementation manner of the first aspect, wherein brightness enhancement and/or contrast enhancement is performed on the area where the anchor is located by the following formulas:
  • [Figure PCTCN2017079594-appb-000001: the brightness-enhancement formula for dest1, available only as an image in the original publication]
  • dest2=(src2-128)*gamma+128
  • where dest1 represents the pixel value of each pixel in the area where the anchor is located after brightness enhancement, src1 represents the pixel value of each pixel in that area before brightness enhancement, beta represents the brightness enhancement coefficient, dest2 represents the pixel value of each pixel in that area after contrast enhancement, src2 represents the pixel value of each pixel in that area before contrast enhancement, and gamma represents the contrast enhancement coefficient.
  • the embodiment of the present invention provides the fifth possible implementation manner of the first aspect, wherein the non-skin area is determined in the video frame in the following manner: converting the video frame to a YUV color space, and determining skin pixel points according to the Y, U, and V values of each pixel in the video frame; determining the region composed of all the skin pixel points as a skin area, and determining the region outside the skin area as the non-skin area.
  • the embodiment of the present invention provides a sixth possible implementation manner of the first aspect, wherein the step of determining skin pixel points according to the Y, U, and V values of each pixel in the video frame includes: determining a pixel whose U value is within a first range and whose V value is within a second range as a skin pixel point.
  • the embodiment of the present invention provides a seventh possible implementation manner of the first aspect, wherein, before the step of acquiring a sequence of live video frames is performed, the method further includes: acquiring a setting instruction for enabling privacy protection.
  • an embodiment of the present invention provides an eighth possible implementation of the first aspect, wherein the non-skin area is a non-skin area in a non-stationary background.
  • an embodiment of the present invention provides another live video processing method, where the method includes: acquiring a sequence of live video frames; determining, in each video frame of the live video frame sequence, a blurred area, the blurred area including the region where a still background is located; and, in each video frame, blurring the blurred area to obtain a plurality of blurred images.
  • an embodiment of the present invention provides a first possible implementation manner of the second aspect, wherein the blurred area further includes a non-skin area in a non-stationary background.
  • the embodiment of the present invention provides the second possible implementation manner of the second aspect, where the method further includes:
  • in each video frame, determining the area where the anchor is located, and performing brightness enhancement and/or contrast enhancement on that area to obtain a plurality of enhanced images;
  • fusing each blurred image with the corresponding enhanced image according to the probability value that each pixel in each video frame corresponds to skin, to obtain a plurality of fused images;
  • sending the plurality of fused images as a sequence of processed live video frames.
  • the embodiment of the present invention provides a third possible implementation manner of the second aspect, wherein the probability value that each pixel in the video frame corresponds to skin is obtained by:
  • generating a skin map corresponding to the video frame, in which each pixel is marked as a skin point or a non-skin point;
  • blurring the skin map to obtain a blurred skin image, and determining the probability value that each pixel in the video frame corresponds to skin according to the pixel value of each pixel in the blurred skin image.
  • in the embodiment of the present invention, each pixel in the blurred skin image corresponds one-to-one to a pixel in the video frame, and determining the probability values according to the pixel values of the blurred skin image means determining, from the pixel value of each pixel in the blurred skin image, the probability value that the corresponding pixel in the video frame corresponds to skin.
  • an embodiment of the present invention provides a live video processing device, where the live video processing device includes: a video acquisition module, configured to acquire a sequence of live video frames; an area determination module, configured to determine, in each video frame of the live video frame sequence, a still background and a non-skin area, and to determine the region jointly formed by the still background and the non-skin area as the blurred area; and an image blurring module, configured to blur the blurred area in each video frame to obtain a plurality of blurred images.
  • an embodiment of the present invention provides an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method of the first aspect or the second aspect described above.
  • in this embodiment, the live video frame sequence is obtained first; then, in each video frame of the live video frame sequence, the still background and the non-skin area are determined, and the region jointly formed by the still background and the non-skin area is determined as the blurred area; finally, in each video frame, the blurred area is blurred to obtain a plurality of blurred images. Since the method in this embodiment blurs part of the image in each video frame of the live video frame sequence, and the blurred part is the still background and the non-skin area, that is, neither the moving portion of the picture nor the anchor's face is blurred, the live video processing method, apparatus, and electronic device in this embodiment can protect the personal privacy of the anchor and solve the problem that the existing live broadcast mode easily leads to leakage of the anchor's personal privacy.
  • FIG. 1 is a first schematic flowchart of a live video processing method according to an embodiment of the present invention;
  • FIG. 2 is a second schematic flowchart of a live video processing method according to an embodiment of the present invention;
  • FIG. 3 is a schematic diagram of a first module structure of a live video processing apparatus according to an embodiment of the present invention;
  • FIG. 4 is a schematic diagram of a second module structure of a live video processing apparatus according to an embodiment of the present invention;
  • FIG. 5 is a schematic diagram of the module structure of an electronic device according to an embodiment of the present invention.
  • the present invention provides a method, a device, and an electronic device for processing a live video, which are described in detail below by using an embodiment.
  • FIG. 1 is a schematic diagram of a first process of a live video processing method according to an embodiment of the present invention. As shown in FIG. 1 , the method includes the following steps:
  • Step S102: acquire a sequence of live video frames.
  • the method in this embodiment can be executed by the client of the anchor, can also be executed by the background server of the live website, and can also be executed by the client of the viewer.
  • when the method is executed by the anchor's client, the anchor's client acquires the sequence of live video frames, which is the sequence of video frames to be sent to the background server of the live streaming website.
  • to ensure smooth playback during the live broadcast, after the anchor starts broadcasting, the video frames in the anchor's client enter a buffer queue, and the video frames are buffered through the buffer queue.
  • when the method is executed by the background server of the live streaming website, the background server obtains the sequence of live video frames, which is the sequence uploaded by the anchor's client and to be sent to each viewer's client.
  • when the method is executed by a viewer's client, the viewer's client obtains the sequence of live video frames, which is the to-be-displayed live video frame sequence sent by the background server.
  • for convenience of description, the following content no longer emphasizes whether the executing entity is the background server, the anchor's client, or a viewer's client; it can be understood that the method in this embodiment can be executed by the background server, by the anchor's client, or by a viewer's client.
  • Step S104: in each video frame of the live video frame sequence, a stationary background and a non-skin area are determined, and the region jointly formed by the stationary background and the non-skin area is determined as the blurred area.
  • the live video frame sequence is composed of a plurality of consecutive video frames.
  • the same processing is performed for each video frame, and in each video frame, a stationary background and a non-skin region are determined.
  • the still background refers to a still background image in a video frame.
  • the non-skin area in the embodiment of the present invention refers to a non-skin area in a non-stationary background, such as a non-skin area in a region other than a still background picture in a video frame.
  • the Gaussian background modeling method can be used to detect the stationary background in the video frame.
  • the main process of the Gaussian background modeling method is to establish several Gaussian models for each pixel and to simulate the distribution of each pixel's values with the established models.
  • when the pixel value of a certain pixel changes, it is determined whether the changed value falls within the corresponding Gaussian models; if it does, the pixel is determined to be a background point, and if not, it is determined to be a foreground point.
  • the region jointly formed by all the detected background points is determined as the stationary background.
  • the still background is represented by the symbol BG_S_mask.
  • the non-skin area can be determined in the video frame in the following manner: the video frame is converted to the YUV color space, and skin pixel points are determined according to the Y, U, and V values of each pixel in the video frame;
  • the region composed of all skin pixel points is determined as the skin area, and the region outside the skin area is determined as the non-skin area.
  • in this embodiment, the region jointly formed by the still background and the non-skin area of each video frame is determined as the blurred area of that video frame, denoted BG1_mask, where BG1_mask=BG_S_mask∪BG_D_mask.
  • because factors such as lighting may leave holes or discontinuous regions in the blurred area, morphological erosion or dilation may be used to process the blurred area, so that holes and discontinuous regions in the blurred area are connected and the blurred area becomes more complete; the mask of the morphologically processed blurred area is denoted by the symbol BG_mask.
  • Step S106: in each video frame, the blurred area is blurred to obtain a plurality of blurred images.
  • after the blurred area is determined in each video frame, the blurred area is blurred; there are various ways to do this.
  • the blurred area can be blurred by a Gaussian blur algorithm; the blurred area is denoted BG_mask after morphological processing, and BG1_mask if no morphological processing is applied. Any pixel X(x0,y0) in the blurred area is taken together with its neighborhood of radius R; each neighborhood pixel Xr(x,y) is weighted by the Gaussian weight G(x,y), and the blurred pixel value is Nnew=ΣXr(x,y)·G(x,y).
  • the Gaussian weight G(x,y) is calculated by the following formula (rendered only as an image in the original publication; this is the standard two-dimensional Gaussian kernel): G(x,y)=exp(-((x-x0)^2+(y-y0)^2)/(2*sigma^2))/(2*pi*sigma^2)
  • after the pixel values of the blurred area are updated according to the Gaussian blur algorithm, the blurred image corresponding to each video frame is obtained, that is, a plurality of blurred images is obtained; in order, these blurred images form the blurred live video frame sequence.
  • in this embodiment, the live video frame sequence is obtained first; then, in each video frame of the live video frame sequence, the still background and the non-skin area are determined, and the region jointly formed by them is determined as the blurred area; finally, in each video frame, the blurred area is blurred to obtain a plurality of blurred images. Since the method in this embodiment blurs part of the image in each video frame of the live video frame sequence, and the blurred part is the still background and the non-skin area, that is, neither the moving portion of the picture nor the anchor's face is blurred, the method in this embodiment can protect the personal privacy of the anchor and solve the problem that the existing live broadcast mode easily causes the anchor's personal privacy to be leaked.
  • FIG. 2 is a schematic diagram of a second process of a method for processing a live video according to an embodiment of the present invention. As shown in FIG. 2, after the step S102, the method further includes the following steps:
  • Step S104': in each video frame, the area where the anchor is located is determined, and brightness enhancement and/or contrast enhancement is performed on that area to obtain a plurality of enhanced images.
  • the purpose of step S104' is to obtain a plurality of enhanced images from the video frames, and the purpose of step S104 is to obtain a plurality of blurred images from them; both steps operate directly on the frames of the live video frame sequence, so the two steps can be performed at the same time. The two steps can also be performed sequentially, such as performing step S104 first and then step S104', or vice versa.
  • optionally, the same processing is performed separately for each video frame: in each video frame, the area where the anchor is located is determined. The specific determination process is: a face is detected in each video frame to obtain the anchor's face region; after the face region is obtained, it is expanded according to the size ratio of the face to the torso to obtain the complete image of the anchor, that is, the area where the anchor is located, denoted by the symbol FG (foreground).
  • when detecting the face, a robust face detection algorithm may be used.
  • specifically, a face model is first trained offline on a face data set, for example with the Adaboost training method; the trained face model is then slid over the real-time video frame for comparison, and whether the current sliding window contains a face is judged from the comparison result, so that the face region in the video frame is detected.
  • the face region is expanded according to the size ratio of the face to the torso.
  • specifically, the face is taken to be a rectangle of size a*b, and the rectangular range of (m*a)*(n*b) below the face is taken to be the torso, where m represents the first expansion ratio and n represents the second expansion ratio; the region jointly formed by the two rectangles a*b and (m*a)*(n*b) is determined as the area where the anchor is located.
  • when the method in this embodiment is executed on the anchor's client, brightness enhancement and/or contrast enhancement may be performed on the area where the anchor is located according to an enhancement instruction input by the anchor; when the method is executed on the server or on a viewer's client, the enhancement may be performed according to a default enhancement instruction.
  • the enhancement instruction input by the anchor has the same format as the default enhancement instruction.
  • taking the enhancement instruction input by the anchor as an example: when the enhancement instruction includes a brightness enhancement coefficient, brightness enhancement is performed on the area where the anchor is located; when it includes a contrast enhancement coefficient, contrast enhancement is performed; and when it includes both coefficients at the same time, both brightness enhancement and contrast enhancement are performed on the area where the anchor is located.
  • brightness enhancement and/or contrast enhancement is performed on the area where the anchor is located in the following manner:
  • [Figure PCTCN2017079594-appb-000003: the brightness-enhancement formula for dest1, available only as an image in the original publication]
  • dest2=(src2-128)*gamma+128
  • dest1 represents the pixel value of each pixel in the area where the anchor is located after brightness enhancement, and src1 represents the pixel value of each pixel in that area before brightness enhancement.
  • beta represents the brightness enhancement coefficient, whose value range may be [2,11]; the larger the beta value, the brighter the image.
  • dest2 represents the pixel value of each pixel in the area where the anchor is located after contrast enhancement, and src2 represents the pixel value of each pixel in that area before contrast enhancement.
  • gamma represents the contrast enhancement coefficient, whose value range is [0,1]; the larger the gamma value, the higher the contrast of the image.
  • Step S108: each blurred image is fused with the corresponding enhanced image according to the probability value that each pixel in each video frame corresponds to skin, to obtain a plurality of fused images.
  • through step S104 and step S104', a plurality of blurred images and a plurality of enhanced images can be obtained; because the blurred images correspond one-to-one to the video frames of the live video frame sequence, and the enhanced images also correspond one-to-one to those video frames, the blurred images correspond one-to-one to the enhanced images.
  • each of the blurred images is merged with the corresponding enhanced image to obtain a plurality of fused images.
  • each blurred image is fused with the corresponding enhanced image to obtain a fused image, which is implemented by the following formula:
  • dest=BG_Blur*a+dest3*(1-a)
  • a represents the probability value that each pixel in the video frame corresponds to skin; BG_Blur represents the pixel value of each pixel of the blurred image; dest3 represents the pixel value of each pixel of the enhanced image; dest represents the pixel value of each pixel of the fused image.
  • the optional value range of a is [0,1]; the larger the value of a, the higher the probability that the pixel is a skin pixel.
  • the probability value that each pixel in the video frame corresponds to skin can be obtained by:
  • generating the skin map corresponding to the video frame, in which each pixel is marked as a skin point or a non-skin point;
  • blurring the skin map to obtain a blurred skin image, and determining the probability values from the pixel values of the blurred skin image.
  • optionally, each pixel in the video frame is assigned a value: when the pixel is a skin point, its pixel value is 255, and when it is not, its pixel value is 0; the image thus obtained is the skin map of the video frame. After the skin map is blurred, the pixel value of each pixel is redefined: in the blurred skin image, each pixel value lies between 0 and 255, and dividing each pixel value by 255 normalizes it to [0,1]; the normalized pixel value represents the probability that the corresponding pixel is skin.
  • through this step, the blurred image can be fused with the corresponding enhanced image; because the fusion follows the probability value that each pixel corresponds to skin, the fusion boundary changes linearly and transitions naturally. After the fusion is completed, step S110 is performed.
  • Step S110: the plurality of fused images is sent as the processed live video frame sequence.
  • the fused images obtained by the fusion are the live images the final audience will see; sending the plurality of fused images as the processed live video frame sequence lets the viewer see a live image with a blurred background and an enhanced foreground, which both protects the anchor's privacy and enhances the anchor's image.
  • when the method in this embodiment is executed by the anchor's client, the anchor's client sends the plurality of fused images to the server as the processed live video frame sequence; when the method in this embodiment is executed by the server, the server sends the plurality of fused images as the processed live video frame sequence to each viewer's client.
  • the method further includes the following steps before the step S102:
  • Step S101: acquire a setting instruction for enabling privacy protection.
  • when the method in this embodiment is executed by the anchor's client, the client receives the setting instruction for enabling privacy protection sent by the anchor and performs steps S102 to S110 according to the setting instruction.
  • when the method in this embodiment is executed by the server, the anchor's client receives the setting instruction for enabling privacy protection sent by the anchor and sends it to the server, and the server performs steps S102 to S110 according to the setting instruction.
  • based on the above inventive concept, flexible variations are possible in implementation: the determined blurred area may include only the region where the still background is located, and in each video frame only the stationary background is blurred.
  • the determined blurred area may also include only the non-skin area in a non-stationary background, and in each video frame only the non-skin area in the non-stationary background is blurred.
  • as another example, an option to blur the region where the still background is located may be provided together with an option to blur the non-skin area in the non-stationary background, and the region where the still background is located and/or the non-skin area in the non-stationary background is blurred according to the user's selection.
  • the embodiments of the present invention impose no specific limitation on this.
  • FIG. 3 is a schematic diagram of a first module structure of the live video processing device according to an embodiment of the present invention.
  • the device includes: a video acquisition module 31, configured to acquire a sequence of live video frames; an area determination module 32, configured to determine a still background and a non-skin area in each video frame of the live video frame sequence, and to determine the region jointly formed by the still background and the non-skin area as the blurred area;
  • an image blurring module 33, configured to blur the blurred area in each video frame to obtain a plurality of blurred images.
  • the area determination module 32 includes a first determination sub-module and a second determination sub-module; the first determination sub-module is configured to convert the video frame to the YUV color space and to determine skin pixel points according to the Y, U, and V values of each pixel in the video frame;
  • the second determination sub-module is configured to determine the region composed of all skin pixel points as the skin area, and the region outside the skin area as the non-skin area.
  • the first determination sub-module is specifically configured to determine a pixel whose U value is within the first range and whose V value is within the second range as a skin pixel point.
  • in this embodiment, the live video frame sequence is obtained first; then, in each video frame of the live video frame sequence, the still background and the non-skin area are determined, and the region jointly formed by them is determined as the blurred area; finally, in each video frame, the blurred area is blurred to obtain a plurality of blurred images. Since the device in this embodiment blurs part of the image in each video frame of the live video frame sequence, and the blurred part is the still background and the non-skin area, that is, neither the moving portion of the picture nor the anchor's face is blurred, the device in this embodiment can protect the personal privacy of the anchor and solve the problem that the existing live broadcast mode easily causes the anchor's personal privacy to be leaked.
  • the device in this embodiment further includes: an instruction acquiring module 30, configured to acquire a setting instruction for enabling privacy protection.
  • the image enhancement module 32' is configured to determine, in each video frame, the area where the anchor is located, and to perform brightness enhancement and/or contrast enhancement on that area to obtain a plurality of enhanced images; the image fusion module 34 is configured to fuse each blurred image with the corresponding enhanced image according to the probability value that each pixel in each video frame corresponds to skin, to obtain a plurality of fused images; and the image sending module 35 is configured to send the plurality of fused images as the processed live video frame sequence.
  • the image fusion module 34 is specifically configured to obtain the probability value that each pixel in the video frame corresponds to skin by: generating a skin map corresponding to the video frame, in which each pixel is marked as a skin point or a non-skin point; blurring the skin map to obtain a blurred skin image; and determining the probability value that each pixel in the video frame corresponds to skin according to the pixel value of each pixel in the blurred skin image.
  • the image fusion module 34 is specifically configured to fuse each blurred image with the corresponding enhanced image according to the probability value that each pixel in each video frame corresponds to skin, obtaining the fused image by the following formula:
  • dest=BG_Blur*a+dest3*(1-a)
  • where a represents the probability value that each pixel in the video frame corresponds to skin, BG_Blur represents the pixel value of each pixel of the blurred image, dest3 represents the pixel value of each pixel of the enhanced image, and dest represents the pixel value of each pixel of the fused image.
  • the image enhancement module 32' performs brightness enhancement and/or contrast enhancement on the area where the anchor is located in the following manner:
  • [Figure PCTCN2017079594-appb-000004: the brightness-enhancement formula for dest1, available only as an image in the original publication]
  • dest2=(src2-128)*gamma+128
  • where dest1 represents the pixel value of each pixel in the area where the anchor is located after brightness enhancement, src1 represents the pixel value of each pixel in that area before brightness enhancement, beta represents the brightness enhancement coefficient, dest2 represents the pixel value of each pixel in that area after contrast enhancement, src2 represents the pixel value of each pixel in that area before contrast enhancement, and gamma represents the contrast enhancement coefficient.
  • FIG. 5 is a schematic diagram of the module structure of the electronic device according to an embodiment of the present invention; as shown in FIG. 5, the device includes a memory 1000, a processor 2000, and a computer program stored on the memory 1000 and executable on the processor 2000, and the processor 2000, when executing the computer program, implements the steps of the live video processing method in the above embodiments.
  • the memory 1000 and the processor 2000 can be a general-purpose memory and processor, which are not specifically limited herein; the memory 1000 and the processor 2000 are connected by a communication bus.
  • when the processor 2000 runs the computer program stored in the memory 1000, the personal privacy of the anchor can be protected, solving the problem that the existing live broadcast mode easily causes the anchor's privacy to be leaked.
  • the live video processing device provided by the embodiment of the present invention may be specific hardware on a device, or software or firmware installed on a device.
  • the live video processing device provided by the embodiment of the present invention has the same implementation principle and technical effects as the live video processing method; for points not mentioned in the device embodiment, reference may be made to the corresponding content of the foregoing method embodiment.
  • a person skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the foregoing systems, devices, and units can refer to the corresponding processes in the foregoing method embodiments, and details are not described herein again.
  • the disclosed apparatus and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative; for example, the division of the units is only a division by logical function, and there may be other divisions in actual implementation.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interface, device or unit, and may be electrical, mechanical or otherwise.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in the embodiments provided by the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the functions may be stored in a computer readable storage medium if implemented in the form of a software functional unit and sold or used as a standalone product.
  • based on such an understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present invention.
  • the foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)

Abstract

An object of the present invention is to provide a live video processing method and apparatus, and an electronic device. The method includes: acquiring a live video frame sequence; determining, in each video frame of the live video frame sequence, a stationary background and a non-skin area, and determining the region jointly formed by the stationary background and the non-skin area as a blurred area; and, in each video frame, blurring the blurred area to obtain a plurality of blurred images. The live video processing method and apparatus and the electronic device of the present invention can solve the problem that the existing live broadcast mode easily leads to leakage of the anchor's personal privacy.

Description

Live video processing method and apparatus, and electronic device
This application claims priority to Chinese patent application No. CN201611129655.4, filed with the Chinese Patent Office on December 9, 2016 and entitled "直播视频处理方法、装置及电子设备" (live video processing method and apparatus, and electronic device), the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of image processing technologies, and in particular to a live video processing method and apparatus, and an electronic device.
Background Art
With the rapid development of the live streaming industry, people from all walks of life have begun to enter it, and live streaming scenes have become more and more diverse.
The main process of a live broadcast is: the anchor turns on a camera; the anchor's terminal device, such as a mobile phone or computer, obtains the anchor's video stream in real time through the camera and sends the obtained video stream to a server; and the server forwards the received video stream in real time to the terminal device of each viewer.
During the live broadcast, besides capturing the anchor, the camera is likely to also capture some of the anchor's personal items, causing the anchor's personal privacy to leak. For the problem that the existing live broadcast mode easily leads to leakage of the anchor's personal privacy, no good solution has been proposed yet.
Summary of the Invention
In view of this, an object of the present invention is to provide a live video processing method and apparatus, and an electronic device, so as to solve the problem that the existing live broadcast mode easily leads to leakage of the anchor's personal privacy.
In a first aspect, an embodiment of the present invention provides a live video processing method, the method comprising: acquiring a live video frame sequence; in each video frame of the live video frame sequence, determining a stationary background and a non-skin area, and determining the region jointly formed by the stationary background and the non-skin area as a blurred area; and, in each video frame, blurring the blurred area to obtain a plurality of blurred images.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, wherein, after the step of acquiring a live video frame sequence is performed, the method further comprises: in each video frame, determining an area where the anchor is located, and performing brightness enhancement and/or contrast enhancement on that area to obtain a plurality of enhanced images; fusing each blurred image with the corresponding enhanced image according to a probability value that each pixel in each video frame corresponds to skin, to obtain a plurality of fused images; and sending the plurality of fused images as a processed live video frame sequence.
With reference to the first possible implementation of the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, wherein the probability value that each pixel in the video frame corresponds to skin is obtained by: generating a skin map corresponding to the video frame, each pixel in the skin map being marked as a skin point or a non-skin point; and blurring the skin map to obtain a blurred skin image, and determining the probability value that each pixel in the video frame corresponds to skin according to the pixel value of each pixel in the blurred skin image.
With reference to the first or second possible implementation of the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect, wherein each blurred image is fused with the corresponding enhanced image according to the probability value that each pixel in each video frame corresponds to skin, obtaining a plurality of fused images by:
dest=BG_Blur*a+dest3*(1-a)
where a represents the probability value that each pixel in the video frame corresponds to skin, BG_Blur represents the pixel value of each pixel of the blurred image, dest3 represents the pixel value of each pixel of the enhanced image, and dest represents the pixel value of each pixel of the fused image.
With reference to any one of the first to third possible implementations of the first aspect, an embodiment of the present invention provides a fourth possible implementation of the first aspect, wherein brightness enhancement and/or contrast enhancement is performed on the area where the anchor is located by:
[Figure PCTCN2017079594-appb-000001: the brightness-enhancement formula for dest1, available only as an image in the original publication]
dest2=(src2-128)*gamma+128
where dest1 represents the pixel value of each pixel in the area where the anchor is located after brightness enhancement, src1 represents the pixel value of each pixel in that area before brightness enhancement, beta represents a brightness enhancement coefficient, dest2 represents the pixel value of each pixel in that area after contrast enhancement, src2 represents the pixel value of each pixel in that area before contrast enhancement, and gamma represents a contrast enhancement coefficient.
With reference to the first aspect, or any one of the first to fourth possible implementations of the first aspect, an embodiment of the present invention provides a fifth possible implementation of the first aspect, wherein the non-skin area is determined in the video frame by: converting the video frame to a YUV color space, and determining skin pixel points according to the Y, U, and V values of each pixel in the video frame; and determining the region composed of all the skin pixel points as a skin area, and determining the region outside the skin area as the non-skin area.
With reference to the fifth possible implementation of the first aspect, an embodiment of the present invention provides a sixth possible implementation of the first aspect, wherein the step of determining skin pixel points according to the Y, U, and V values of each pixel in the video frame comprises: determining a pixel whose U value is within a first range and whose V value is within a second range as a skin pixel point.
With reference to the foregoing implementations of the first aspect, an embodiment of the present invention provides a seventh possible implementation of the first aspect, wherein, before the step of acquiring a live video frame sequence is performed, the method further comprises: acquiring a setting instruction for enabling privacy protection.
With reference to the foregoing implementations of the first aspect, an embodiment of the present invention provides an eighth possible implementation of the first aspect, wherein the non-skin area is a non-skin area in a non-stationary background.
In a second aspect, an embodiment of the present invention provides another live video processing method, the method comprising:
acquiring a live video frame sequence;
in each video frame of the live video frame sequence, determining a blurred area, the blurred area comprising the region where a stationary background is located;
in each video frame, blurring the blurred area to obtain a plurality of blurred images.
With reference to the second aspect, an embodiment of the present invention provides a first possible implementation of the second aspect, wherein the blurred area further comprises a non-skin area in a non-stationary background.
With reference to the second aspect, or the first possible implementation of the second aspect, an embodiment of the present invention provides a second possible implementation of the second aspect, wherein the method further comprises:
in each video frame, determining an area where the anchor is located, and performing brightness enhancement and/or contrast enhancement on that area to obtain a plurality of enhanced images;
fusing each blurred image with the corresponding enhanced image according to a probability value that each pixel in each video frame corresponds to skin, to obtain a plurality of fused images;
sending the plurality of fused images as a processed live video frame sequence.
With reference to the second possible implementation of the second aspect, an embodiment of the present invention provides a third possible implementation of the second aspect, wherein the probability value that each pixel in the video frame corresponds to skin is obtained by:
generating a skin map corresponding to the video frame, each pixel in the skin map being marked as a skin point or a non-skin point;
blurring the skin map to obtain a blurred skin image, and determining the probability value that each pixel in the video frame corresponds to skin according to the pixel value of each pixel in the blurred skin image.
In the embodiments of the present invention, the pixels of the blurred skin image correspond one-to-one to the pixels of the video frame; determining the probability value that each pixel in the video frame corresponds to skin according to the pixel value of each pixel in the blurred skin image includes: determining, according to the pixel value of each pixel in the blurred skin image, the probability value that the corresponding pixel in the video frame corresponds to skin.
In a third aspect, an embodiment of the present invention provides a live video processing apparatus, the apparatus comprising: a video acquisition module, configured to acquire a live video frame sequence; an area determination module, configured to determine, in each video frame of the live video frame sequence, a stationary background and a non-skin area, and to determine the region jointly formed by the stationary background and the non-skin area as a blurred area; and an image blurring module, configured to blur the blurred area in each video frame to obtain a plurality of blurred images.
In a fourth aspect, an embodiment of the present invention provides an electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method of the first aspect or the second aspect.
In this embodiment, a live video frame sequence is acquired first; then, in each video frame of the sequence, a stationary background and a non-skin area are determined, and the region jointly formed by them is determined as a blurred area; finally, in each video frame, the blurred area is blurred to obtain a plurality of blurred images. Since the method in this embodiment blurs part of the image in each video frame of the live video frame sequence, and the blurred part is the stationary background and the non-skin area, that is, neither the moving portion of the picture nor the anchor's face is blurred, the live video processing method and apparatus and the electronic device in this embodiment can protect the anchor's personal privacy and solve the problem that the existing live broadcast mode easily leads to its leakage.
To make the above objects, features, and advantages of the present invention more obvious and understandable, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief Description of the Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present invention and therefore should not be regarded as limiting the scope; for those of ordinary skill in the art, other related drawings can be obtained from these drawings without creative effort.
FIG. 1 is a first schematic flowchart of a live video processing method provided by an embodiment of the present invention;
FIG. 2 is a second schematic flowchart of a live video processing method provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of a first module composition of a live video processing apparatus provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of a second module composition of a live video processing apparatus provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of the module composition of an electronic device provided by an embodiment of the present invention.
Detailed Description of the Embodiments
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. The components of the embodiments of the present invention, as generally described and illustrated in the drawings here, can be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the present invention provided in the drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Considering the problem that the existing live broadcast mode easily leads to leakage of the anchor's personal privacy, the present invention provides a live video processing method and apparatus, and an electronic device, which are described in detail below through embodiments.
FIG. 1 is a first schematic flowchart of the live video processing method provided by an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
Step S102: acquire a live video frame sequence.
The method in this embodiment can be executed by the anchor's client, by the background server of the live streaming website, or by a viewer's client.
When the method is executed by the anchor's client, the anchor's client acquires the live video frame sequence, which is the sequence of video frames to be sent to the background server of the live streaming website. To improve the fluency of the live broadcast, after the anchor starts broadcasting, the video frames in the anchor's client enter a buffer queue, and the frames are buffered through this queue, ensuring smooth playback during the live broadcast.
When the method is executed by the background server of the live streaming website, the background server acquires the live video frame sequence, which is uploaded by the anchor's client and is to be sent to each viewer's client.
When the method is executed by a viewer's client, the viewer's client acquires the live video frame sequence, which is the to-be-displayed live video frame sequence sent by the background server.
For convenience of description, the following content no longer emphasizes whether the executing entity is the background server, the anchor's client, or a viewer's client; it can be understood that the method in this embodiment can be executed by the background server, by the anchor's client, or by a viewer's client.
Step S104: in each video frame of the live video frame sequence, determine a stationary background and a non-skin area, and determine the region jointly formed by the stationary background and the non-skin area as the blurred area.
The live video frame sequence consists of multiple consecutive video frames. In this embodiment, the same processing is applied to every video frame: in each video frame, a stationary background and a non-skin area are determined. The stationary background refers to the still background picture in the video frame. The non-skin area in the embodiments of the present invention refers to the non-skin area in the non-stationary background, such as the non-skin portion of the region of the video frame other than the still background picture.
In this step, Gaussian background modeling can be used to detect the stationary background in the video frame. Specifically, the main process of Gaussian background modeling is to establish several Gaussian models for each pixel and to simulate the distribution of each pixel's values with the established models; when the pixel value of a certain pixel changes, it is judged whether the changed value falls within the corresponding Gaussian models. If it does, the pixel is determined to be a background point; if not, it is determined to be a foreground point. In this embodiment, the region jointly formed by all the detected background points is determined as the stationary background, denoted by the symbol BG_S_mask.
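As a concrete illustration only (not part of the original disclosure), the sketch below uses OpenCV's mixture-of-Gaussians background subtractor, which implements the per-pixel multi-Gaussian modeling just described; the history and varThreshold parameters are assumptions, not values from the patent.

```python
import cv2
import numpy as np

# Per-pixel mixture-of-Gaussians background model: a pixel whose new value
# still fits one of its Gaussian models is classified as a background point.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                                detectShadows=False)

def still_background_mask(frame):
    """Return BG_S_mask: 255 where the pixel is a stationary background point."""
    fg_mask = subtractor.apply(frame)   # 255 where foreground, 0 where background
    return cv2.bitwise_not(fg_mask)     # invert so background points become 255
```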
In this step, the non-skin area can be determined in the video frame in the following manner (a sketch follows this list):
(1) convert the video frame to the YUV color space, and determine skin pixel points according to the Y, U, and V values of each pixel in the video frame;
(2) determine the region composed of all skin pixel points as the skin area, and determine the region outside the skin area as the non-skin area.
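The patent leaves the concrete first and second ranges for U and V open; the sketch below (reusing the cv2 and numpy imports above) fills them with commonly cited chroma thresholds for skin, purely as an illustrative assumption.

```python
def non_skin_mask(frame_bgr, u_range=(77, 127), v_range=(133, 173)):
    """Mark skin pixels by their U and V values and return the complementary
    non-skin mask; the threshold ranges are illustrative assumptions."""
    yuv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YUV)
    u, v = yuv[..., 1], yuv[..., 2]
    skin = ((u >= u_range[0]) & (u <= u_range[1]) &
            (v >= v_range[0]) & (v <= v_range[1]))
    skin_mask = skin.astype(np.uint8) * 255   # skin area
    return cv2.bitwise_not(skin_mask)         # non-skin area mask
```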
In this step, after the stationary background and the non-skin area are determined in each video frame, the region jointly formed by the stationary background and the non-skin area of each video frame is determined as the blurred area of that video frame. In this embodiment, the blurred area is denoted by the symbol BG1_mask; since the blurred area is jointly composed of the stationary background and the non-skin area, BG1_mask=BG_S_mask∪BG_D_mask.
Considering the influence of factors such as lighting, the blurred area may contain holes or discontinuous regions. Therefore, in this embodiment, morphological erosion or dilation may additionally be used to process the blurred area, so that holes or discontinuous regions in the blurred area are connected and the blurred area becomes more complete; the mask of the morphologically processed blurred area is denoted by the symbol BG_mask.
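A minimal sketch of the mask combination and morphological clean-up described above; the use of a closing operation and a 9x9 elliptical kernel are assumptions (the text only names erosion or dilation).

```python
def blur_area_mask(bg_s_mask, bg_d_mask, ksize=9):
    """BG1_mask = BG_S_mask ∪ BG_D_mask, then a morphological closing
    (dilation followed by erosion) to connect holes, yielding BG_mask."""
    bg1_mask = cv2.bitwise_or(bg_s_mask, bg_d_mask)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (ksize, ksize))
    return cv2.morphologyEx(bg1_mask, cv2.MORPH_CLOSE, kernel)
```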
Step S106: in each video frame, blur the blurred area to obtain a plurality of blurred images.
After the blurred area is determined in each video frame, the blurred area is blurred. There are various ways to perform the blurring; for example, a Gaussian blur algorithm can be used. The blurred area is denoted BG_mask if it has been morphologically processed, and BG1_mask if it has not.
The steps of the Gaussian blur are: take any pixel X(x0,y0) in the blurred area BG_mask or BG1_mask determined above, and take its neighborhood of radius R; every pixel in the neighborhood is Xr(x,y) and is weighted by the Gaussian weight G(x,y), so the blurred pixel value is Nnew=ΣXr(x,y)·G(x,y).
The Gaussian weight G(x,y) is calculated by the following formula (rendered as image PCTCN2017079594-appb-000002 in the original publication; this is the standard two-dimensional Gaussian kernel):
G(x,y)=exp(-((x-x0)^2+(y-y0)^2)/(2*sigma^2))/(2*pi*sigma^2)
After the pixel values of the blurred area are updated according to the Gaussian blur algorithm, the blurred image corresponding to each video frame is obtained, that is, a plurality of blurred images is obtained; in order, these blurred images form the blurred live video frame sequence.
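The per-pixel weighted neighborhood sum above is exactly what a library Gaussian filter computes, so a sketch can blur the whole frame once and copy the result back only inside the mask; the kernel size and sigma are assumptions.

```python
def blur_blurred_area(frame, bg_mask, ksize=(25, 25), sigma=8):
    """Blur only the blurred area: Gaussian-blur the whole frame, then
    replace the original pixels wherever bg_mask is set."""
    blurred = cv2.GaussianBlur(frame, ksize, sigma)
    out = frame.copy()
    out[bg_mask > 0] = blurred[bg_mask > 0]
    return out
```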
In this embodiment, the live video frame sequence is acquired first; then, in each video frame of the sequence, the stationary background and the non-skin area are determined, and the region jointly formed by them is determined as the blurred area; finally, in each video frame, the blurred area is blurred to obtain a plurality of blurred images. Since the method in this embodiment blurs part of the image in each video frame of the live video frame sequence, and the blurred part is the stationary background and the non-skin area, that is, neither the moving portion of the picture nor the anchor's face is blurred, the method in this embodiment can protect the anchor's personal privacy and solve the problem that the existing live broadcast mode easily leads to its leakage.
FIG. 2 is a second schematic flowchart of the live video processing method provided by an embodiment of the present invention. As shown in FIG. 2, after the above step S102, the method further includes the following steps:
Step S104': in each video frame, determine the area where the anchor is located, and perform brightness enhancement and/or contrast enhancement on that area to obtain a plurality of enhanced images.
The purpose of step S104' is to obtain a plurality of enhanced images from the video frames, and the purpose of step S104 is to obtain a plurality of blurred images from them. Both steps operate directly on the frames of the live video frame sequence, so the two steps can be performed at the same time; of course, they can also be performed one after the other, e.g., step S104 first and then step S104', or vice versa.
Optionally, in this embodiment, the same processing is applied to each video frame separately: in each video frame, the area where the anchor is located is determined. The specific determination process is: detect the face in each video frame to obtain the anchor's face region; after the face region is obtained, expand it according to the size ratio of the face to the torso to obtain the complete image of the anchor, that is, the area where the anchor is located, denoted by the symbol FG (foreground).
In this step, a robust face detection algorithm may be used for face detection. Specifically, a face model is first trained offline on a face data set, for example with the Adaboost training method; the trained face model is then slid over the real-time video frame for comparison, and whether the current sliding window contains a face is judged from the comparison result, thereby detecting the face region in the video frame.
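OpenCV's Haar cascade detector is one widely available Adaboost-trained, sliding-window face detector of the kind described; the sketch below uses it purely as an illustration, and the bundled cascade file stands in for the patent's offline-trained model.

```python
# Adaboost-trained sliding-window face detector (OpenCV's bundled Haar cascade).
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_face(frame):
    """Return the first detected face rectangle (x, y, a, b), or None."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return faces[0] if len(faces) else None
```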
In this step, the face region is expanded according to the size ratio of the face to the torso. Specifically, the face may be taken to be a rectangle of size a*b, and the rectangular range of (m*a)*(n*b) below the face taken to be the torso, where m represents the first expansion ratio and n represents the second expansion ratio; the region jointly formed by the two rectangles a*b and (m*a)*(n*b) is determined as the area where the anchor is located.
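A sketch of the expansion step; the patent does not fix m and n, so the values below are assumptions for illustration.

```python
def anchor_area(face, frame_shape, m=1.5, n=3.0):
    """Expand the a*b face rectangle by the (m*a)*(n*b) torso rectangle
    below it; returns the bounding box (x1, y1, x2, y2) of region FG."""
    x, y, a, b = face
    torso_w, torso_h = int(m * a), int(n * b)
    tx = x - (torso_w - a) // 2            # center the torso under the face
    x1, y1 = min(x, tx), y
    x2, y2 = max(x + a, tx + torso_w), y + b + torso_h
    h, w = frame_shape[:2]                 # clip to the frame bounds
    return max(x1, 0), max(y1, 0), min(x2, w), min(y2, h)
```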
When the method in this embodiment is executed on the anchor's client, brightness enhancement and/or contrast enhancement may be performed on the area where the anchor is located according to an enhancement instruction input by the anchor; when the method is executed on the server or on a viewer's client, the enhancement may be performed according to a default enhancement instruction. The enhancement instruction input by the anchor has the same format as the default enhancement instruction.
Taking the enhancement instruction input by the anchor as an example: when the instruction includes a brightness enhancement coefficient, brightness enhancement is performed on the area where the anchor is located; when it includes a contrast enhancement coefficient, contrast enhancement is performed; and when it includes both coefficients at the same time, both brightness enhancement and contrast enhancement are performed on the area where the anchor is located.
There are several ways to implement brightness enhancement and/or contrast enhancement; optionally, in this embodiment, brightness enhancement and/or contrast enhancement is performed on the area where the anchor is located in the following manner:
[Figure PCTCN2017079594-appb-000003: the brightness-enhancement formula for dest1, available only as an image in the original publication]
dest2=(src2-128)*gamma+128
where dest1 represents the pixel value of each pixel in the area where the anchor is located after brightness enhancement, src1 represents the pixel value of each pixel in that area before brightness enhancement, and beta represents the above brightness enhancement coefficient, whose value range may be [2,11]; the larger the beta value, the brighter the image.
dest2 represents the pixel value of each pixel in the area where the anchor is located after contrast enhancement, src2 represents the pixel value of each pixel in that area before contrast enhancement, and gamma represents the above contrast enhancement coefficient, whose value range is [0,1]; the larger the gamma value, the higher the contrast of the image.
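A sketch of the enhancement step: the contrast line follows the patent's dest2 formula exactly, while the brightness line is a simple linear stand-in (an assumption), since the dest1 formula survives only as an image.

```python
def enhance_anchor_area(region, beta=None, gamma=None):
    """Brightness and/or contrast enhancement of the anchor area FG."""
    out = region.astype(np.float32)
    if beta is not None:
        # Stand-in brightness boost (assumption): larger beta -> brighter,
        # mirroring the stated behavior of the original dest1 formula.
        out = np.clip(out * (1.0 + beta / 10.0), 0, 255)
    if gamma is not None:
        # Patent's contrast enhancement: dest2 = (src2 - 128) * gamma + 128.
        out = np.clip((out - 128.0) * gamma + 128.0, 0, 255)
    return out.astype(np.uint8)
```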
In this step, after brightness enhancement and/or contrast enhancement is performed on the area where the anchor is located in each video frame, a plurality of enhanced images in one-to-one correspondence with the video frames is obtained; in order, these enhanced images form the enhanced live video frame sequence.
Step S108: fuse each blurred image with the corresponding enhanced image according to the probability value that each pixel in each video frame corresponds to skin, to obtain a plurality of fused images.
Through step S104 and step S104', a plurality of blurred images and a plurality of enhanced images can be obtained. Since the blurred images correspond one-to-one to the video frames of the live video frame sequence, and the enhanced images also correspond one-to-one to those video frames, the blurred images correspond one-to-one to the enhanced images. In this step, each blurred image is fused with the corresponding enhanced image to obtain a plurality of fused images.
In this step, each blurred image is fused with the corresponding enhanced image according to the probability value that each pixel in each video frame corresponds to skin, obtaining the fused image by the following formula:
dest=BG_Blur*a+dest3*(1-a)
where a represents the probability value that each pixel in the video frame corresponds to skin, BG_Blur represents the pixel value of each pixel of the blurred image, dest3 represents the pixel value of each pixel of the enhanced image, and dest represents the pixel value of each pixel of the fused image.
Through the above formula, the blurred image and the enhanced image can be fused to obtain the fused image. In the formula, a represents the probability value that each pixel in the video frame corresponds to skin; its optional value range is [0,1], and the larger the value of a, the higher the probability that the pixel is a skin pixel.
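A direct per-pixel sketch of the fusion formula; the probability map a is assumed to be a single-channel float array aligned with the frame.

```python
def fuse_images(bg_blur, dest3, a):
    """dest = BG_Blur*a + dest3*(1-a), with the skin-probability map a
    broadcast over the color channels."""
    a3 = a[..., None]                      # HxW -> HxWx1 for broadcasting
    dest = (bg_blur.astype(np.float32) * a3 +
            dest3.astype(np.float32) * (1.0 - a3))
    return dest.astype(np.uint8)
```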
In this embodiment, the probability value that each pixel in the video frame corresponds to skin can be obtained in the following manner (a sketch follows below):
(1) generate the skin map corresponding to the video frame, in which each pixel is marked as a skin point or a non-skin point;
(2) blur the skin map to obtain a blurred skin image, and determine the probability value that each pixel in the video frame corresponds to skin according to the pixel value of each pixel in the blurred skin image.
Optionally, each pixel in the video frame is assigned a value: when the pixel is a skin point, its pixel value is 255, and when it is not, its pixel value is 0; the image thus obtained is the skin map of the video frame. After the skin map is blurred, the pixel value of each pixel is redefined: in the blurred skin image, each pixel value lies between 0 and 255, and dividing each pixel value by 255 normalizes it to [0,1]; the normalized pixel value represents the probability that the corresponding pixel is skin.
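A sketch of this probability map; the blur kernel size and sigma are assumptions.

```python
def skin_probability(skin_map, ksize=(31, 31), sigma=10):
    """skin_map holds 255 at skin points and 0 at non-skin points; blurring
    it and dividing by 255 yields the skin probability a in [0, 1]."""
    blurred = cv2.GaussianBlur(skin_map, ksize, sigma)
    return blurred.astype(np.float32) / 255.0
```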
Through this step, the blurred image can be fused with the corresponding enhanced image; because the fusion follows the probability value that each pixel in each video frame corresponds to skin, the fusion boundary changes linearly and transitions naturally. After the fusion is completed, step S110 is performed.
Step S110: send the plurality of fused images as the processed live video frame sequence.
The fused images obtained by the fusion are the live images the final audience will see. Sending the plurality of fused images as the processed live video frame sequence lets viewers see a live image with a blurred background and an enhanced foreground, which both protects the anchor's privacy and enhances the anchor's image.
Optionally, when the method in this embodiment is executed by the anchor's client, the anchor's client sends the plurality of fused images to the server as the processed live video frame sequence; when the method in this embodiment is executed by the server, the server sends the plurality of fused images as the processed live video frame sequence to each viewer's client.
As shown in FIG. 2, before the above step S102, the method further includes the following step:
Step S101: acquire a setting instruction for enabling privacy protection.
When the method in this embodiment is executed by the anchor's client, the client receives the setting instruction for enabling privacy protection sent by the anchor and performs steps S102 to S110 according to the setting instruction. When the method in this embodiment is executed by the server, the anchor's client receives the setting instruction for enabling privacy protection sent by the anchor and sends it to the server, and the server performs steps S102 to S110 according to the setting instruction.
Through the method shown in FIG. 2, background blurring and foreground enhancement can be applied to the live video, so that the anchor's image is beautified while the anchor's privacy is protected: the anchor's image is whitened and brightened and its clarity is enhanced, improving the viewing experience of the audience and the broadcasting experience of the anchor.
Based on the above inventive concept, flexible variations can be made as needed in implementation. For example, the determined blurred area may include only the region where the still background is located, and in each video frame only the stationary background is blurred. As another example, the determined blurred area may include only the non-skin area in the non-stationary background, and in each video frame only the non-skin area in the non-stationary background is blurred. As yet another example, an option to blur the region where the still background is located may be provided together with an option to blur the non-skin area in the non-stationary background, and, according to the user's selection, the region where the still background is located and/or the non-skin area in the non-stationary background is blurred. The embodiments of the present invention impose no specific limitation on this.
Corresponding to the above live video processing method, an embodiment of the present invention further provides a live video processing apparatus. FIG. 3 is a schematic diagram of a first module composition of the live video processing apparatus provided by an embodiment of the present invention. As shown in FIG. 3, the apparatus includes: a video acquisition module 31, configured to acquire a live video frame sequence; an area determination module 32, configured to determine, in each video frame of the live video frame sequence, a stationary background and a non-skin area, and to determine the region jointly formed by the stationary background and the non-skin area as the blurred area; and an image blurring module 33, configured to blur the blurred area in each video frame to obtain a plurality of blurred images.
In FIG. 3, the area determination module 32 includes a first determination sub-module and a second determination sub-module. The first determination sub-module is configured to convert the video frame to the YUV color space and to determine skin pixel points according to the Y, U, and V values of each pixel in the video frame; the second determination sub-module is configured to determine the region composed of all skin pixel points as the skin area, and to determine the region outside the skin area as the non-skin area.
The first determination sub-module is specifically configured to determine a pixel whose U value is within the first range and whose V value is within the second range as a skin pixel point.
In this embodiment, the live video frame sequence is acquired first; then, in each video frame of the sequence, the stationary background and the non-skin area are determined, and the region jointly formed by them is determined as the blurred area; finally, in each video frame, the blurred area is blurred to obtain a plurality of blurred images. Since the blurred part is the stationary background and the non-skin area, that is, neither the moving portion of the picture nor the anchor's face is blurred, the apparatus in this embodiment can protect the anchor's personal privacy and solve the problem that the existing live broadcast mode easily leads to its leakage.
FIG. 4 is a schematic diagram of a second module composition of the live video processing apparatus provided by an embodiment of the present invention. As shown in FIG. 4, the apparatus in this embodiment further includes: an instruction acquisition module 30, configured to acquire a setting instruction for enabling privacy protection; an image enhancement module 32', configured to determine, in each video frame, the area where the anchor is located, and to perform brightness enhancement and/or contrast enhancement on that area to obtain a plurality of enhanced images; an image fusion module 34, configured to fuse each blurred image with the corresponding enhanced image according to the probability value that each pixel in each video frame corresponds to skin, to obtain a plurality of fused images; and an image sending module 35, configured to send the plurality of fused images as the processed live video frame sequence.
The image fusion module 34 is specifically configured to obtain the probability value that each pixel in the video frame corresponds to skin in the following manner: generate the skin map corresponding to the video frame, in which each pixel is marked as a skin point or a non-skin point; blur the skin map to obtain a blurred skin image; and determine the probability value that each pixel in the video frame corresponds to skin according to the pixel value of each pixel in the blurred skin image.
The image fusion module 34 is specifically configured to fuse each blurred image with the corresponding enhanced image according to the probability value that each pixel in each video frame corresponds to skin, obtaining the fused image by the following formula:
dest=BG_Blur*a+dest3*(1-a)
where a represents the probability value that each pixel in the video frame corresponds to skin, BG_Blur represents the pixel value of each pixel of the blurred image, dest3 represents the pixel value of each pixel of the enhanced image, and dest represents the pixel value of each pixel of the fused image.
The image enhancement module 32' performs brightness enhancement and/or contrast enhancement on the area where the anchor is located in the following manner:
[Figure PCTCN2017079594-appb-000004: the brightness-enhancement formula for dest1, available only as an image in the original publication]
dest2=(src2-128)*gamma+128
where dest1 represents the pixel value of each pixel in the area where the anchor is located after brightness enhancement, src1 represents the pixel value of each pixel in that area before brightness enhancement, beta represents the brightness enhancement coefficient, dest2 represents the pixel value of each pixel in that area after contrast enhancement, src2 represents the pixel value of each pixel in that area before contrast enhancement, and gamma represents the contrast enhancement coefficient.
Through the apparatus shown in FIG. 4, background blurring and foreground enhancement can be applied to the live video, so that the anchor's image is beautified while the anchor's privacy is protected: the anchor's image is whitened and brightened and its clarity is enhanced, improving the viewing experience of the audience and the broadcasting experience of the anchor.
Corresponding to the above live video processing method, an embodiment of the present invention further provides an electronic device. FIG. 5 is a schematic diagram of the module composition of the electronic device provided by an embodiment of the present invention. As shown in FIG. 5, the device includes a memory 1000, a processor 2000, and a computer program stored on the memory 1000 and executable on the processor 2000; when the processor 2000 executes the computer program, the steps of the live video processing method in the above embodiments are implemented.
Specifically, the memory 1000 and the processor 2000 can be a general-purpose memory and processor, which are not specifically limited here. The memory 1000 and the processor 2000 are connected through a communication bus; when the processor 2000 runs the computer program stored in the memory 1000, the anchor's personal privacy can be protected, solving the problem that the existing live broadcast mode easily leads to leakage of the anchor's personal privacy.
The live video processing apparatus provided by the embodiments of the present invention may be specific hardware on a device, or software or firmware installed on a device. The apparatus provided by the embodiments of the present invention has the same implementation principle and technical effects as the foregoing live video processing method embodiments; for brevity, where the apparatus embodiments do not mention a point, reference may be made to the corresponding content of the foregoing method embodiments. Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may all refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division into units is only a division by logical function, and there may be other divisions in actual implementation. As another example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some communication interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments provided by the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
It should be noted that similar reference numerals and letters denote similar items in the figures; therefore, once an item is defined in one figure, it does not need to be further defined or explained in subsequent figures. In addition, the terms "first", "second", "third", and the like are used only to distinguish the description and cannot be understood as indicating or implying relative importance.
Finally, it should be noted that the embodiments described above are only specific implementations of the present invention, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that any person familiar with this technical field can still, within the technical scope disclosed by the present invention, modify the technical solutions recorded in the foregoing embodiments, readily conceive of changes, or make equivalent substitutions of some of the technical features; such modifications, changes, or substitutions do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and they should all be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (16)

  1. A live video processing method, characterized in that the method comprises:
    acquiring a live video frame sequence;
    in each video frame of the live video frame sequence, determining a stationary background and a non-skin area, and determining the region jointly formed by the stationary background and the non-skin area as a blurred area;
    in each video frame, blurring the blurred area to obtain a plurality of blurred images.
  2. The live video processing method according to claim 1, characterized in that, after the step of acquiring a live video frame sequence is performed, the method further comprises:
    in each video frame, determining an area where an anchor is located, and performing brightness enhancement and/or contrast enhancement on the area where the anchor is located to obtain a plurality of enhanced images;
    according to a probability value that each pixel in each video frame corresponds to skin, fusing each blurred image with the corresponding enhanced image to obtain a plurality of fused images;
    sending the plurality of fused images as a processed live video frame sequence.
  3. The live video processing method according to claim 2, characterized in that the probability value that each pixel in the video frame corresponds to skin is obtained by:
    generating a skin map corresponding to the video frame, each pixel in the skin map being marked as a skin point or a non-skin point;
    blurring the skin map to obtain a blurred skin image, and determining the probability value that each pixel in the video frame corresponds to skin according to the pixel value of each pixel in the blurred skin image.
  4. The live video processing method according to claim 2 or 3, characterized in that each blurred image is fused with the corresponding enhanced image according to the probability value that each pixel in each video frame corresponds to skin, obtaining a plurality of fused images by:
    dest=BG_Blur*a+dest3*(1-a)
    where a represents the probability value that each pixel in the video frame corresponds to skin, BG_Blur represents the pixel value of each pixel of the blurred image, dest3 represents the pixel value of each pixel of the enhanced image, and dest represents the pixel value of each pixel of the fused image.
  5. The live video processing method according to any one of claims 2 to 4, characterized in that brightness enhancement and/or contrast enhancement is performed on the area where the anchor is located by:
    [Figure PCTCN2017079594-appb-100001: the brightness-enhancement formula for dest1, available only as an image in the original publication]
    dest2=(src2-128)*gamma+128
    where dest1 represents the pixel value of each pixel in the area where the anchor is located after brightness enhancement, src1 represents the pixel value of each pixel in the area where the anchor is located before brightness enhancement, beta represents a brightness enhancement coefficient, dest2 represents the pixel value of each pixel in the area where the anchor is located after contrast enhancement, src2 represents the pixel value of each pixel in the area where the anchor is located before contrast enhancement, and gamma represents a contrast enhancement coefficient.
  6. The live video processing method according to any one of claims 1 to 5, characterized in that the non-skin area is determined in the video frame by:
    converting the video frame to a YUV color space, and determining skin pixel points according to the Y, U, and V values of each pixel in the video frame;
    determining the region composed of all the skin pixel points as a skin area, and determining the region outside the skin area as the non-skin area.
  7. The live video processing method according to claim 6, characterized in that the step of determining skin pixel points according to the Y, U, and V values of each pixel in the video frame comprises:
    determining a pixel whose U value is within a first range and whose V value is within a second range as a skin pixel point.
  8. The live video processing method according to any one of claims 1 to 7, characterized in that, before the step of acquiring a live video frame sequence is performed, the method further comprises:
    acquiring a setting instruction for enabling privacy protection.
  9. The live video processing method according to any one of claims 1 to 8, characterized in that the non-skin area is a non-skin area in a non-stationary background.
  10. A live video processing method, characterized in that the method comprises:
    acquiring a live video frame sequence;
    in each video frame of the live video frame sequence, determining a blurred area, the blurred area comprising the region where a stationary background is located;
    in each video frame, blurring the blurred area to obtain a plurality of blurred images.
  11. The live video processing method according to claim 10, characterized in that the blurred area further comprises a non-skin area in a non-stationary background.
  12. The live video processing method according to claim 10 or 11, characterized in that the method further comprises:
    in each video frame, determining an area where an anchor is located, and performing brightness enhancement and/or contrast enhancement on the area where the anchor is located to obtain a plurality of enhanced images;
    according to a probability value that each pixel in each video frame corresponds to skin, fusing each blurred image with the corresponding enhanced image to obtain a plurality of fused images;
    sending the plurality of fused images as a processed live video frame sequence.
  13. The live video processing method according to claim 12, characterized in that the probability value that each pixel in the video frame corresponds to skin is obtained by:
    generating a skin map corresponding to the video frame, each pixel in the skin map being marked as a skin point or a non-skin point;
    blurring the skin map to obtain a blurred skin image, and determining the probability value that each pixel in the video frame corresponds to skin according to the pixel value of each pixel in the blurred skin image.
  14. A live video processing apparatus, characterized in that the live video processing apparatus comprises:
    a video acquisition module, configured to acquire a live video frame sequence;
    an area determination module, configured to determine, in each video frame of the live video frame sequence, a stationary background and a non-skin area, and to determine the region jointly formed by the stationary background and the non-skin area as a blurred area;
    an image blurring module, configured to blur the blurred area in each video frame to obtain a plurality of blurred images.
  15. The live video processing apparatus according to claim 14, characterized in that the non-skin area is a non-skin area in a non-stationary background.
  16. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 13.
PCT/CN2017/079594 2016-12-09 2017-04-06 直播视频处理方法、装置及电子设备 WO2018103244A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611129655.4 2016-12-09
CN201611129655.4A CN106550243A (zh) 2016-12-09 2016-12-09 直播视频处理方法、装置及电子设备

Publications (1)

Publication Number Publication Date
WO2018103244A1 true WO2018103244A1 (zh) 2018-06-14

Family

ID=58397315

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/079594 WO2018103244A1 (zh) 2016-12-09 2017-04-06 直播视频处理方法、装置及电子设备

Country Status (2)

Country Link
CN (1) CN106550243A (zh)
WO (1) WO2018103244A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112085686A (zh) * 2020-08-21 2020-12-15 北京迈格威科技有限公司 图像处理方法、装置、电子设备及计算机可读存储介质
CN112637614A (zh) * 2020-11-27 2021-04-09 深圳市创成微电子有限公司 网络直播音视频处理方法、处理器、装置及可读存储介质
CN113129207A (zh) * 2019-12-30 2021-07-16 武汉Tcl集团工业研究院有限公司 一种图片的背景虚化方法及装置、计算机设备、存储介质
US11094042B1 (en) 2021-03-12 2021-08-17 Flyreel, Inc. Face detection and blurring methods and systems
CN114339306A (zh) * 2021-12-28 2022-04-12 广州虎牙科技有限公司 直播视频图像处理方法、装置及服务器

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106550243A (zh) * 2016-12-09 2017-03-29 武汉斗鱼网络科技有限公司 直播视频处理方法、装置及电子设备
CN108735158B (zh) * 2017-04-25 2022-02-11 昆山国显光电有限公司 一种图像展示方法、装置及电子设备
CN107274373B (zh) * 2017-06-30 2018-08-03 武汉斗鱼网络科技有限公司 直播中打码方法及装置
CN107808404A (zh) * 2017-09-08 2018-03-16 广州视源电子科技股份有限公司 图像处理方法、系统、可读存储介质及移动摄像设备
CN108174140A (zh) * 2017-11-30 2018-06-15 维沃移动通信有限公司 一种视频通信的方法和移动终端
CN108133718B (zh) * 2017-12-13 2021-04-06 北京奇虎科技有限公司 一种对视频进行处理的方法和装置
CN108235054A (zh) * 2017-12-15 2018-06-29 北京奇虎科技有限公司 一种直播视频数据的处理方法和装置
CN110580428A (zh) * 2018-06-08 2019-12-17 Oppo广东移动通信有限公司 图像处理方法、装置、计算机可读存储介质和电子设备
CN108921086A (zh) * 2018-06-29 2018-11-30 Oppo广东移动通信有限公司 图像处理方法和装置、存储介质、电子设备
CN109191414A (zh) * 2018-08-21 2019-01-11 北京旷视科技有限公司 一种图像处理方法、装置、电子设备及存储介质
CN109325926B (zh) * 2018-09-30 2021-07-23 武汉斗鱼网络科技有限公司 自动滤镜实现方法、存储介质、设备及系统
CN109379571A (zh) * 2018-12-13 2019-02-22 移康智能科技(上海)股份有限公司 一种智能猫眼的实现方法及智能猫眼
CN109741280B (zh) * 2019-01-04 2022-04-19 Oppo广东移动通信有限公司 图像处理方法、装置、存储介质及电子设备
CN110312164A (zh) * 2019-07-24 2019-10-08 Oppo(重庆)智能科技有限公司 视频处理方法、装置及计算机存储介质和终端设备
CN111028563A (zh) * 2019-11-26 2020-04-17 罗昊 一种艺术设计用多媒体教学系统及其方法
CN115066881B (zh) * 2020-02-06 2023-11-14 Oppo广东移动通信有限公司 对于图像序列生成稳定化图像合成效果的方法、系统及计算机可读介质
CN112770049A (zh) * 2020-12-30 2021-05-07 维沃移动通信有限公司 拍摄方法、装置及存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101193261A (zh) * 2007-03-28 2008-06-04 腾讯科技(深圳)有限公司 一种视频通信系统及方法
US20080259154A1 (en) * 2007-04-20 2008-10-23 General Instrument Corporation Simulating Short Depth of Field to Maximize Privacy in Videotelephony
CN104243973A (zh) * 2014-08-28 2014-12-24 北京邮电大学 基于感兴趣区域的视频感知质量无参考客观评价方法
CN104378553A (zh) * 2014-12-08 2015-02-25 联想(北京)有限公司 一种图像处理方法及电子设备
CN104580678A (zh) * 2013-10-26 2015-04-29 西安群丰电子信息科技有限公司 一种手机的带背景的通话实现方法
CN105340263A (zh) * 2013-06-10 2016-02-17 思杰系统有限公司 向在线会议提供具有虚拟幕布的用户视频
CN106550243A (zh) * 2016-12-09 2017-03-29 武汉斗鱼网络科技有限公司 直播视频处理方法、装置及电子设备

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5088161B2 (ja) * 2008-02-15 2012-12-05 ソニー株式会社 画像処理装置、カメラ装置、通信システム、画像処理方法、およびプログラム
US8306283B2 (en) * 2009-04-21 2012-11-06 Arcsoft (Hangzhou) Multimedia Technology Co., Ltd. Focus enhancing method for portrait in digital image
CN103593834B (zh) * 2013-12-03 2017-06-13 厦门美图网科技有限公司 一种智能添加景深的图像增强方法
CN104966266B (zh) * 2015-06-04 2019-07-09 福建天晴数码有限公司 自动模糊身体部位的方法及系统
CN105913400A (zh) * 2016-05-03 2016-08-31 成都索贝数码科技股份有限公司 一种获得高质量且实时的美颜的装置

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101193261A (zh) * 2007-03-28 2008-06-04 腾讯科技(深圳)有限公司 一种视频通信系统及方法
US20080259154A1 (en) * 2007-04-20 2008-10-23 General Instrument Corporation Simulating Short Depth of Field to Maximize Privacy in Videotelephony
CN105340263A (zh) * 2013-06-10 2016-02-17 思杰系统有限公司 向在线会议提供具有虚拟幕布的用户视频
CN104580678A (zh) * 2013-10-26 2015-04-29 西安群丰电子信息科技有限公司 一种手机的带背景的通话实现方法
CN104243973A (zh) * 2014-08-28 2014-12-24 北京邮电大学 基于感兴趣区域的视频感知质量无参考客观评价方法
CN104378553A (zh) * 2014-12-08 2015-02-25 联想(北京)有限公司 一种图像处理方法及电子设备
CN106550243A (zh) * 2016-12-09 2017-03-29 武汉斗鱼网络科技有限公司 直播视频处理方法、装置及电子设备

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113129207A (zh) * 2019-12-30 2021-07-16 武汉Tcl集团工业研究院有限公司 一种图片的背景虚化方法及装置、计算机设备、存储介质
CN113129207B (zh) * 2019-12-30 2023-08-01 武汉Tcl集团工业研究院有限公司 一种图片的背景虚化方法及装置、计算机设备、存储介质
CN112085686A (zh) * 2020-08-21 2020-12-15 北京迈格威科技有限公司 图像处理方法、装置、电子设备及计算机可读存储介质
CN112637614A (zh) * 2020-11-27 2021-04-09 深圳市创成微电子有限公司 网络直播音视频处理方法、处理器、装置及可读存储介质
CN112637614B (zh) * 2020-11-27 2023-04-21 深圳市创成微电子有限公司 网络直播音视频处理方法、处理器、装置及可读存储介质
US11094042B1 (en) 2021-03-12 2021-08-17 Flyreel, Inc. Face detection and blurring methods and systems
US11699257B2 (en) 2021-03-12 2023-07-11 Flyreel, Inc. Face detection and blurring methods and systems
CN114339306A (zh) * 2021-12-28 2022-04-12 广州虎牙科技有限公司 直播视频图像处理方法、装置及服务器
CN114339306B (zh) * 2021-12-28 2024-05-28 广州虎牙科技有限公司 直播视频图像处理方法、装置及服务器

Also Published As

Publication number Publication date
CN106550243A (zh) 2017-03-29

Similar Documents

Publication Publication Date Title
WO2018103244A1 (zh) 直播视频处理方法、装置及电子设备
US11037281B2 (en) Image fusion method and device, storage medium and terminal
WO2018176925A1 (zh) Hdr图像的生成方法及装置
US20190098277A1 (en) Image processing apparatus, image processing method, image processing system, and storage medium
WO2022160701A1 (zh) 特效生成方法、装置、设备及存储介质
US9600898B2 (en) Method and apparatus for separating foreground image, and computer-readable recording medium
TWI767985B (zh) 用於處理影像性質圖的方法及裝置
WO2019057041A1 (zh) 用于实现图像增强的方法、装置和电子设备
CN112182299B (zh) 一种视频中精彩片段的获取方法、装置、设备和介质
EP3644599A1 (en) Video processing method and apparatus, electronic device, and storage medium
WO2018233217A1 (zh) 图像处理方法、装置和增强现实设备
JP2017017431A (ja) 画像処理装置、情報処理方法及びプログラム
EP3707895B1 (en) Static video recognition
US10460487B2 (en) Automatic image synthesis method
WO2022160857A1 (zh) 图像处理方法及装置、计算机可读存储介质和电子设备
WO2016158001A1 (ja) 情報処理装置、情報処理方法、プログラム及び記録媒体
JP2020197989A (ja) 画像処理システム、情報処理装置、画像処理方法、およびプログラム
CN107564085B (zh) 图像扭曲处理方法、装置、计算设备及计算机存储介质
US20220398704A1 (en) Intelligent Portrait Photography Enhancement System
US10282633B2 (en) Cross-asset media analysis and processing
US20230131418A1 (en) Two-dimensional (2d) feature database generation
US20180012066A1 (en) Photograph processing method and system
Oui et al. An augmented reality's framework for mobile
JP2021005798A (ja) 撮像装置、撮像装置の制御方法およびプログラム
JP2016071496A (ja) 情報端末装置、方法及びプログラム

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17877564

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17877564

Country of ref document: EP

Kind code of ref document: A1