WO2018103243A1 - Bandwidth saving method, system, live broadcast terminal, and readable storage medium - Google Patents

Bandwidth saving method, system, live broadcast terminal, and readable storage medium

Info

Publication number
WO2018103243A1
Authority
WO
WIPO (PCT)
Prior art keywords
macroblock
interest
region
coordinate
macroblocks
Application number
PCT/CN2017/079588
Other languages
English (en)
French (fr)
Inventor
李亮
Original Assignee
武汉斗鱼网络科技有限公司
Priority date
2016-12-09
Application filed by 武汉斗鱼网络科技有限公司

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/2187 Live feed
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/2343 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234363 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by altering the spatial resolution, e.g. for clients with a lower screen resolution
    • H04N21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/258 Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
    • H04N21/25866 Management of end-user data
    • H04N21/25891 Management of end-user data being end-user preferences
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/4402 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440263 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the spatial resolution, e.g. for displaying on a connected PDA
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456 Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments

Definitions

  • the present invention relates to the field of live broadcast application technologies, and in particular, to a bandwidth saving method, system, live broadcast terminal, and readable storage medium.
  • the present invention provides a bandwidth saving method, system, live broadcast terminal, and readable storage medium, which can reduce the bandwidth cost of live video transmission by encoding different regions of a video image at different encoding rates.
  • a bandwidth saving method, applied to live video, includes:
  • the step of determining whether the preset number of macroblocks are located in the region of interest comprises:
  • the step of determining that the macroblock is located in the region of interest comprises:
  • the macroblock located outside the range of coordinate values of the region of interest is labeled as a second macroblock.
  • the real-time video image is placed in a coordinate system such that its row coordinates increase from top to bottom and its column coordinates increase from left to right;
  • the step of marking a macroblock outside the range of coordinate values of the region of interest as a second macroblock includes:
  • determining that the macroblock is a second macroblock if any one of the following conditions is met, and otherwise determining that it is a first macroblock;
  • the conditions are:
  • the row coordinate of the upper-left corner of the macroblock is greater than the row coordinate of the lower-right corner of the region of interest;
  • the column coordinate of the upper-left corner of the macroblock is greater than the column coordinate of the lower-right corner of the region of interest;
  • the row coordinate of the lower-right corner of the macroblock is smaller than the row coordinate of the upper-left corner of the region of interest;
  • the column coordinate of the lower-right corner of the macroblock is smaller than the column coordinate of the upper-left corner of the region of interest.
  • the step of detecting the region of interest in the real-time video image includes:
  • the face detection algorithm includes any one of the following:
  • Face detection algorithm based on deep learning.
  • the method further includes:
  • decoding the received first encoded data and second encoded data with a corresponding decoding algorithm.
  • Another preferred embodiment of the present invention provides a bandwidth saving system applied to live video. The bandwidth saving system includes:
  • a macroblock dividing module configured to divide the real-time video image into a preset number of macroblocks;
  • a detecting module configured to detect a region of interest in the real-time video image;
  • a determining module configured to determine whether the preset number of macroblocks are located in the region of interest, mark those of the preset number of macroblocks located within the region of interest as first macroblocks, and mark those located outside the region of interest as second macroblocks;
  • an encoding module configured to encode image data of the first macroblocks at a first encoding rate to generate first encoded data, and to encode image data of the second macroblocks at a second encoding rate to generate second encoded data, wherein the first encoding rate is greater than the second encoding rate.
  • the determining module is configured to:
  • the determining module includes:
  • an obtaining sub-module configured to acquire coordinate values of the region of interest and coordinate values of the preset number of macroblocks;
  • a determining sub-module configured to determine whether the coordinate values of the preset number of macroblocks are within the coordinate value range of the region of interest;
  • a marking sub-module configured to mark a macroblock whose coordinate values are within the coordinate value range of the region of interest as a first macroblock, and to mark a macroblock whose coordinate values are outside that range as a second macroblock.
  • the real-time video image is placed in a coordinate system such that its row coordinates increase from top to bottom and its column coordinates increase from left to right;
  • the manner in which the determining sub-module determines whether the coordinate values of the preset number of macroblocks are within the coordinate value range of the region of interest, and the marking sub-module marks macroblocks within that range as first macroblocks and macroblocks outside it as second macroblocks, includes:
  • determining that the macroblock is a second macroblock if any one of the following conditions is met, and otherwise determining that it is a first macroblock;
  • the conditions are:
  • the row coordinate of the upper-left corner of the macroblock is greater than the row coordinate of the lower-right corner of the region of interest;
  • the column coordinate of the upper-left corner of the macroblock is greater than the column coordinate of the lower-right corner of the region of interest;
  • the row coordinate of the lower-right corner of the macroblock is smaller than the row coordinate of the upper-left corner of the region of interest;
  • the column coordinate of the lower-right corner of the macroblock is smaller than the column coordinate of the upper-left corner of the region of interest.
  • the manner in which the detecting module detects the region of interest in the real-time video image includes:
  • the face detection algorithm includes any one of the following:
  • Face detection algorithm based on deep learning.
  • the encoding module includes a first encoding submodule and a second encoding submodule
  • the first encoding submodule is configured to encode the image data of the first macroblock at a first encoding rate to generate first encoded data
  • the second encoding submodule is configured to encode the image data of the second macroblock at a second encoding rate to generate second encoded data.
  • the bandwidth saving system further includes a sending module and a receiving module
  • the sending module is configured to send the first encoded data and the second encoded data
  • the receiving module is configured to decode the received first encoded data and the second encoded data by a corresponding decoding algorithm.
  • Another preferred embodiment of the present invention further provides a live broadcast terminal, including a memory, a processor, and the bandwidth saving system.
  • the bandwidth saving system is installed or stored in the memory, and the processor controls execution of each functional module of the bandwidth saving system.
  • the present invention also provides a readable storage medium stored in a computer, comprising a plurality of instructions configured to implement the bandwidth saving method described above.
  • the bandwidth saving method, system, live broadcast terminal, and readable storage medium provided by the embodiments of the present invention are applied to live video and encode different regions of the video image at different encoding rates, reducing the bandwidth cost of live video transmission without affecting the user's viewing experience.
  • FIG. 1 is a schematic block diagram of a live broadcast terminal to which a bandwidth saving system according to an embodiment of the present invention is applied.
  • FIG. 2 is a connection block diagram of a bandwidth saving system according to an embodiment of the present invention.
  • FIG. 3 is a block diagram showing the sub-module connection of the determination module 130 shown in FIG. 2.
  • FIG. 4 is a block diagram showing the submodule connection of the encoding module 140 shown in FIG. 2.
  • FIG. 5 is a schematic flowchart of a bandwidth saving method according to another embodiment of the present invention.
  • Reference numerals: 10 - live broadcast terminal; 100 - bandwidth saving system; 110 - macroblock dividing module; 120 - detecting module; 130 - determining module; 132 - obtaining sub-module; 134 - determining sub-module; 136 - marking sub-module; 140 - encoding module; 142 - first encoding sub-module; 144 - second encoding sub-module; 200 - memory; 300 - storage controller; 400 - processor.
  • FIG. 1 is a block diagram showing the structure of a live broadcast terminal 10 to which the bandwidth saving system 100 according to an embodiment of the present invention is applied.
  • the live broadcast terminal 10 includes a bandwidth saving system 100, a memory 200, a storage controller 300, and a processor 400.
  • the components of the memory 200, the storage controller 300, and the processor 400 are directly or indirectly electrically connected to each other to implement data transmission or interaction. For example, these components are electrically connected by one or more communication buses or signal lines.
  • the bandwidth saving system 100 includes at least one software function module that can be stored in the memory 200 or in an operating system of the live terminal 10 in the form of software or firmware.
  • the processor 400 accesses the memory 200 under the control of the storage controller 300 to execute executable modules stored in the memory 200, such as the software function modules and computer programs included in the bandwidth saving system 100.
  • FIG. 2 is a connection block diagram of a bandwidth saving system 100 according to an embodiment of the present invention, which is applied to a live video.
  • the bandwidth saving system 100 includes a macroblock dividing module 110, a detecting module 120, a judging module 130, and an encoding module 140.
  • the macroblock dividing module 110 is configured to divide the real-time video image into a preset number of macroblocks.
  • one coded picture is usually composed of several macroblocks, and one macroblock is composed of one luma pixel block and two additional chroma pixel blocks.
  • the luma block is a 16*16 pixel block, and the size of the two chroma blocks depends on the sampling format of the image; for example, for a YUV420-sampled image, each chroma block is an 8*8 pixel block.
  • a video coding algorithm encodes the picture in units of macroblocks and organizes the result into a continuous video stream.
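To make the macroblock layout concrete, the sketch below derives the luma and chroma block sizes of one macroblock from the chroma subsampling factors of common YUV formats. Only the 16*16 luma size and the 8*8 YUV420 chroma size come from the description above; the `SUBSAMPLING` table and the `macroblock_block_sizes` function are illustrative names introduced here.

```python
# Chroma subsampling factors (horizontal, vertical) for common YUV formats.
SUBSAMPLING = {
    "4:4:4": (1, 1),  # no chroma subsampling
    "4:2:2": (2, 1),  # chroma halved horizontally only
    "4:2:0": (2, 2),  # chroma halved horizontally and vertically (YUV420)
}

LUMA_MB_SIZE = 16  # the luma block of a macroblock is 16*16 pixels


def macroblock_block_sizes(fmt: str) -> dict:
    """Return (width, height) of the luma block and of each of the two chroma blocks."""
    sx, sy = SUBSAMPLING[fmt]
    chroma = (LUMA_MB_SIZE // sx, LUMA_MB_SIZE // sy)
    return {"luma": (LUMA_MB_SIZE, LUMA_MB_SIZE), "cb": chroma, "cr": chroma}


if __name__ == "__main__":
    for fmt in SUBSAMPLING:
        print(fmt, macroblock_block_sizes(fmt))
```

For 4:2:0 the function returns 8*8 chroma blocks, matching the YUV420 example; for 4:2:2 each chroma block is 8 pixels wide and 16 tall.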
  • the preset number refers to the number of macroblocks that are preset according to the resolution width and height of the video image before the real-time video image is divided.
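As an illustration of how the preset number might follow from the resolution, the sketch below assumes 16*16 macroblocks and rounds partial blocks at the right and bottom edges up to whole macroblocks (H.264 encoders, for instance, pad the coded picture to a multiple of 16). `macroblock_grid` is a hypothetical helper, not a function named in the original.

```python
import math

MB_SIZE = 16  # assumed macroblock size in pixels (matches the 16*16 luma block)


def macroblock_grid(width: int, height: int, mb_size: int = MB_SIZE) -> tuple:
    """Return (M, N, total) for a frame of the given resolution.

    M is the number of macroblock columns, N the number of macroblock rows,
    and total = M * N is the preset number of macroblocks. Partial blocks at
    the right/bottom edges are rounded up to whole macroblocks.
    """
    m = math.ceil(width / mb_size)
    n = math.ceil(height / mb_size)
    return m, n, m * n


if __name__ == "__main__":
    print(macroblock_grid(1280, 720))   # (80, 45, 3600)
    print(macroblock_grid(1920, 1080))  # (120, 68, 8160)
```

For a 1920*1080 frame, 1080 is not a multiple of 16, so the last macroblock row is padded and the grid has 68 rows rather than 67.5.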
  • the detecting module 120 detects an important area in the video image, i.e., a region of interest, which may be a rectangular region.
  • the detecting module 120 may be a face detecting module, and identify a face location area in the live video image frame according to the face detecting technology.
  • the face detection algorithm used by the detecting module 120 may be a classification-based face detection algorithm (such as the Adaboost algorithm), a face detection algorithm based on a support vector machine (SVM), a face detection algorithm based on a hidden Markov model, or a face detection method based on deep learning, such as one using convolutional neural networks (CNN); any of these algorithms can obtain accurate detection results.
  • based on the region of interest detected by the detecting module 120 in the real-time video image, the determining module 130 determines whether each of the preset number of macroblocks produced by the macroblock dividing module 110 is located within the region of interest, marks the macroblocks located within the region of interest as first macroblocks, and marks the macroblocks located outside the region of interest as second macroblocks.
  • the determining module 130 is configured to detect whether the area in which a macroblock is located overlaps the region of interest and, when it does, to determine that the macroblock is located in the region of interest.
  • the determining module 130 includes an obtaining sub-module 132, a determining sub-module 134, and a marking sub-module 136.
  • the obtaining sub-module 132 is configured to acquire coordinate values of the region of interest and coordinate values of the preset number of macroblocks, where the coordinate values of the preset number of macroblocks refer to a preset number.
  • the determining sub-module 134 is configured to determine whether a coordinate value of the preset number of macroblocks is within a coordinate value range of the region of interest.
  • the marking sub-module 136 is configured to mark a macroblock whose coordinate value is within the coordinate value range of the region of interest as a first macroblock, and a macroblock whose coordinate value is outside that range as a second macroblock.
  • the real-time video image is placed in a coordinate system such that its row coordinates increase from top to bottom and its column coordinates increase from left to right;
  • the manner in which the determining module 130 determines whether the coordinate values of the preset number of macroblocks are within the coordinate value range of the region of interest, marks a macroblock whose coordinate values are within that range as a first macroblock, and marks a macroblock whose coordinate values are outside that range as a second macroblock includes:
  • determining that the macroblock is a second macroblock if any one of the following conditions is met, and otherwise determining that it is a first macroblock;
  • the conditions are:
  • the row coordinate of the upper-left corner of the macroblock is greater than the row coordinate of the lower-right corner of the region of interest;
  • the column coordinate of the upper-left corner of the macroblock is greater than the column coordinate of the lower-right corner of the region of interest;
  • the row coordinate of the lower-right corner of the macroblock is smaller than the row coordinate of the upper-left corner of the region of interest;
  • the column coordinate of the lower-right corner of the macroblock is smaller than the column coordinate of the upper-left corner of the region of interest.
  • the coordinate values of the rectangular face region, namely its upper-left corner A(left, top) and lower-right corner B(right, bottom), and the coordinate values of the M*N rectangular macroblocks are acquired.
  • the value range of i is [0, M-1].
  • the value range of j is [0, N-1].
  • the upper-left corner of the macroblock in row j and column i is C(i*W/M, j*H/N), and its lower-right corner is D((i+1)*W/M, (j+1)*H/N).
  • the judgment therefore applies the four conditions above to the corners C and D of each macroblock and the corners A and B of the face region: if any one of them holds, the macroblock contains no part of the face region and is a second macroblock; otherwise, it is a first macroblock.
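The four conditions listed above (a macroblock is a second macroblock if any one of them holds) can be written as a short predicate. This is a reconstruction from those conditions, not the patent's own pseudo code; `Rect` and `is_second_macroblock` are hypothetical names, and coordinates follow the stated convention (row coordinates grow downward, column coordinates grow rightward).

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle with upper-left (left, top) and lower-right (right, bottom)."""
    left: int
    top: int
    right: int
    bottom: int


def is_second_macroblock(mb: Rect, roi: Rect) -> bool:
    """True if the macroblock lies entirely outside the region of interest."""
    return (
        mb.top > roi.bottom      # macroblock starts below the ROI
        or mb.left > roi.right   # macroblock starts to the right of the ROI
        or mb.bottom < roi.top   # macroblock ends above the ROI
        or mb.right < roi.left   # macroblock ends to the left of the ROI
    )
```

Because the comparisons are strict, a macroblock that only touches the ROI boundary is treated as overlapping it and therefore becomes a first macroblock, which is what the conditions as stated imply.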
  • based on the first macroblocks and second macroblocks determined and marked by the determining module 130, the encoding module 140 selects different encoding rates to encode the images of the region of interest and the region of non-interest.
  • the encoding rate is the code rate (bit rate).
  • the larger the bit rate of a video file, the smaller the compression ratio and the higher the picture quality; that is, the larger the code rate, the higher the sampling rate per unit time, the higher the data accuracy, and the closer the decoded file is to the original file.
  • the encoding module 140 includes a first encoding sub-module 142 and a second encoding sub-module 144.
  • the first encoding sub-module 142 is configured to encode the image data of the first macroblock at a first encoding rate to generate first encoded data.
  • the second encoding sub-module 144 is configured to encode the image data of the second macroblock at a second encoding rate to generate second encoded data, where the first encoding rate is greater than the second encoding rate.
  • when the image data is encoded, the region of non-interest has lower image-quality requirements than the region of interest; therefore, the code rate of the region of interest is set higher than that of the region of non-interest, so that the region of interest keeps a high code rate while the code rate of the region of non-interest is reduced, achieving automatic adaptation of the code rate to different image regions.
  • for example, the code rate of the region of interest is set to 2 Mbps, and the code rate of the region of non-interest is reduced to 1.5 Mbps or 1 Mbps.
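The text states only that the first encoding rate exceeds the second. One common way encoders realize region-dependent quality is a per-macroblock quantizer (QP) offset, where a lower QP spends more bits; the sketch below substitutes that mechanism for explicit bit rates and should be read as an assumption, not the patent's method. It turns a grid of first/second marks into a flat, raster-order list with one offset per macroblock, similar in shape to the per-macroblock offset array some encoders accept (x264's `quant_offsets`, for example). The offset values and all names are hypothetical.

```python
from typing import List

FIRST, SECOND = 1, 2  # hypothetical marks: inside / outside the region of interest

# Assumed offsets: negative = finer quantization (more bits, higher quality).
ROI_QP_OFFSET = -3.0
NON_ROI_QP_OFFSET = 3.0


def qp_offsets(marks: List[List[int]]) -> List[float]:
    """Flatten an N-row by M-column mark grid into raster-order QP offsets."""
    return [
        ROI_QP_OFFSET if mark == FIRST else NON_ROI_QP_OFFSET
        for row in marks   # rows top to bottom
        for mark in row    # columns left to right
    ]
```

Raster order (row by row, left to right) matches the order in which macroblocks are normally coded, so index `j * M + i` addresses the macroblock in row j, column i.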
  • the bandwidth saving system 100 further includes an image acquisition module, a sending module, and a receiving module.
  • the image acquisition module is configured to acquire real-time video images of the live video and transmit them to the macroblock dividing module 110 for macroblock division.
  • the image acquisition module may be a standalone camera or a video input device integrated into an electronic device such as a computer or mobile phone, allowing people to converse and communicate with images and sound over a network through the camera.
  • the sending module is configured to transmit the first encoded data and the second encoded data that are encoded by the encoding module 140.
  • the receiving module is configured to decode the received first encoded data and the second encoded data by a corresponding decoding algorithm to restore a real-time video image in the live video.
  • the decoding algorithm matches the encoding algorithm used by the encoding module 140 to perform image data encoding.
  • FIG. 5 is a schematic flowchart of a bandwidth saving method according to a preferred embodiment of the present invention. The specific process shown in FIG. 5 is described in detail below.
  • Step S201: Acquire a real-time video image.
  • Step S202: Divide the real-time video image into a preset number of macroblocks.
  • step S201 is performed by the image acquisition module, and step S202 is performed by the macroblock dividing module 110.
  • the video image has a resolution width of W and a height of H; the video picture is divided into M*N rectangular macroblocks, so each macroblock has a width of W/M and a height of H/N.
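The corner formulas the description gives for the macroblock in row j and column i, C(i*W/M, j*H/N) and D((i+1)*W/M, (j+1)*H/N), can be enumerated as below. The sketch assumes W and H are exact multiples of M and N (for example, 16-pixel macroblocks) and uses integer division to keep pixel coordinates integral; `macroblock_rects` is a hypothetical name.

```python
from typing import Iterator, Tuple

Corner = Tuple[int, int]  # (column coordinate, row coordinate)


def macroblock_rects(W: int, H: int, M: int, N: int) -> Iterator[Tuple[int, int, Corner, Corner]]:
    """Yield (i, j, C, D) for every macroblock of an M*N grid.

    i is the column index in [0, M-1] and j the row index in [0, N-1];
    C is the upper-left corner and D the lower-right corner.
    """
    for j in range(N):          # rows, top to bottom
        for i in range(M):      # columns, left to right
            c = (i * W // M, j * H // N)
            d = ((i + 1) * W // M, (j + 1) * H // N)
            yield i, j, c, d
```

For a 64*32 picture split into 4*2 macroblocks, the first macroblock spans (0, 0) to (16, 16) and the last spans (48, 16) to (64, 32).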
  • Step S203: Detect a region of interest in the real-time video image.
  • step S203 is performed by the detecting module 120.
  • the detecting module 120 may be a face detection module that uses face detection technology to detect the rectangular region in which a face is located; the same technology also yields the upper-left corner coordinate A(left, top) and the lower-right corner coordinate B(right, bottom) of that rectangle.
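As one concrete instance of the classification-based detectors mentioned earlier, the sketch below uses OpenCV's bundled Viola-Jones Haar cascade (a boosted classifier) to find face boxes and converts them into the A(left, top), B(right, bottom) form used here. The choice of detector, the `scaleFactor`/`minNeighbors` values, and the helper names are assumptions for illustration; the patent does not prescribe a library.

```python
from typing import List, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom)


def faces_to_rois(boxes: Sequence[Sequence[int]]) -> List[Rect]:
    """Convert (x, y, w, h) face boxes into (left, top, right, bottom) form.

    right/bottom are x + w and y + h, i.e. one past the last pixel, the same
    convention as the macroblock corner D((i+1)*W/M, (j+1)*H/N).
    """
    return [(int(x), int(y), int(x + w), int(y + h)) for (x, y, w, h) in boxes]


def detect_face_rois(frame_bgr) -> List[Rect]:
    """Detect faces with OpenCV's bundled frontal-face Haar cascade.

    Requires the opencv-python package; it is imported lazily so that the
    conversion helper above can be used without it.
    """
    import cv2

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return faces_to_rois(boxes)
```

`detectMultiScale` reports boxes as (x, y, width, height), so the conversion step is what produces the corner pair A and B that the determining step compares against each macroblock.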
  • Step S204: Determine whether each of the preset number of macroblocks is located in the region of interest; if a macroblock is located in the region of interest, perform step S2051; otherwise, perform step S2052.
  • step S204 may include:
  • Step S2051: Mark the macroblock as a first macroblock.
  • Step S2052: Mark the macroblock as a second macroblock.
  • the step S204, the step S2051, and the step S2052 are performed by the determining module 130.
  • the real-time video image is placed in a coordinate system such that its row coordinates increase from top to bottom and its column coordinates increase from left to right;
  • the step of marking a macroblock outside the range of coordinate values of the region of interest as a second macroblock includes:
  • determining that the macroblock is a second macroblock if any one of the following conditions is met, and otherwise determining that it is a first macroblock;
  • the conditions are:
  • the row coordinate of the upper-left corner of the macroblock is greater than the row coordinate of the lower-right corner of the region of interest;
  • the column coordinate of the upper-left corner of the macroblock is greater than the column coordinate of the lower-right corner of the region of interest;
  • the row coordinate of the lower-right corner of the macroblock is smaller than the row coordinate of the upper-left corner of the region of interest;
  • the column coordinate of the lower-right corner of the macroblock is smaller than the column coordinate of the upper-left corner of the region of interest.
  • the obtaining sub-module 132 first acquires the coordinate values of the rectangular face region, such as its upper-left corner coordinate A(left, top) and lower-right corner coordinate B(right, bottom), and the coordinate values of the M*N rectangular macroblocks.
  • the upper-left corner of the macroblock in row j and column i is C(i*W/M, j*H/N), and its lower-right corner is D((i+1)*W/M, (j+1)*H/N). The judgment therefore applies the four conditions above to these corners.
  • the marking sub-module 136 then marks the judgment result: a macroblock whose coordinate values are within the coordinate value range of the face region (region of interest) is marked as a first macroblock, and a macroblock whose coordinate values are outside that range is marked as a second macroblock.
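Steps S202 through S2052 can be combined into one pass over the grid. The sketch below assumes a single face rectangle given as (left, top, right, bottom), integer pixel coordinates, and W, H divisible by M, N; `mark_macroblocks` and the `FIRST`/`SECOND` constants are hypothetical names, and the test inside the loop is the four "outside" conditions from the text.

```python
from typing import List, Tuple

FIRST, SECOND = 1, 2  # first macroblock: in the ROI; second macroblock: outside it


def mark_macroblocks(W: int, H: int, M: int, N: int,
                     roi: Tuple[int, int, int, int]) -> List[List[int]]:
    """Return an N-row by M-column grid of FIRST/SECOND marks.

    roi is (left, top, right, bottom), i.e. corners A(left, top) and B(right, bottom).
    """
    left, top, right, bottom = roi
    marks = []
    for j in range(N):                                       # row index
        row = []
        for i in range(M):                                   # column index
            c_x, c_y = i * W // M, j * H // N                # upper-left corner C
            d_x, d_y = (i + 1) * W // M, (j + 1) * H // N    # lower-right corner D
            outside = c_y > bottom or c_x > right or d_y < top or d_x < left
            row.append(SECOND if outside else FIRST)
        marks.append(row)
    return marks
```

For a 64*64 picture with 16*16 macroblocks and a face rectangle from (20, 20) to (40, 40), only the four central macroblocks are marked as first macroblocks.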
  • Step S2061: Generate first encoded data.
  • Step S2062: Generate second encoded data.
  • the step S2061 and the step S2062 are performed by the encoding module 140.
  • the first encoding sub-module 142 encodes the image data of the first macroblock at a first encoding rate to generate first encoded data.
  • the second encoding sub-module 144 encodes the image data of the second macroblock at a second encoding rate to generate second encoded data. Note that in the encoding process the first encoding rate is greater than the second encoding rate, i.e., code rates are allocated dynamically.
  • the total amount of encoded video data outside the face region (region of interest) is reduced, while the total amount of video data in the face region (region of interest) remains relatively large, thereby ensuring the quality of the important region of the video image. It should be understood that, in actual implementation, steps S2061 and S2062 may be performed in any order.
  • because the code rate of the face region is not lowered, its picture quality is not affected; because the code rate of the non-face region is reduced, the total amount of encoded video data decreases, so the traffic pushed to the Content Delivery Network (CDN) server and the bandwidth of network transmission are reduced as well.
  • Step S207: Transmit data.
  • the step S207 is performed by the sending module. That is, the sending module transmits the first encoded data and the second encoded data generated by the encoding module 140.
  • Step S208: Receive and decode.
  • the step S208 is performed by the receiving module.
  • the receiving module is configured to decode the received first encoded data and the second encoded data by a corresponding decoding algorithm to restore a live video image in the live video.
  • the bandwidth saving method, system, live broadcast terminal, and readable storage medium dynamically allocate different coding rates to different regions of a real-time video picture (such as the region of interest and the region of non-interest).
  • the present invention can reduce the bandwidth cost in the data transmission process of the live video without affecting the user's viewing experience.
  • the terms “set”, “linked”, and “connected” should be understood broadly: for example, a connection may be fixed, detachable, or integral; mechanical or electrical; direct, or indirect through an intermediate medium; and it may be an internal connection between two components.
  • those of ordinary skill in the art can understand the specific meanings of the above terms in the present invention according to the specific situation.
  • each block in the flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function.
  • each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.

Abstract

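The present invention provides a bandwidth saving method, system, live broadcast terminal and readable storage medium for live video. The method first divides a real-time video image into a preset number of macroblocks. It then detects a region of interest in the real-time video image and determines whether each macroblock is located in the region of interest. Macroblocks located inside the region of interest are marked as first macroblocks, and macroblocks located outside it are marked as second macroblocks. Finally, the image data of the first macroblocks is encoded at a first coding rate to generate first encoded data, and the image data of the second macroblocks is encoded at a second coding rate to generate second encoded data, where the first coding rate is greater than the second coding rate. The present invention can effectively reduce the bandwidth cost of transmitting live video data.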

Description

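Bandwidth saving method, system, live broadcast terminal and readable storage medium
This application claims priority to Chinese patent application No. CN201611129506.8, entitled "Bandwidth saving method and system", filed with the Chinese Patent Office on December 9, 2016, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the technical field of live streaming applications, and in particular to a bandwidth saving method, system, live broadcast terminal and readable storage medium.
Background
In the current live video streaming industry, bandwidth costs are very high. The inventor has found that the bit rate of video transmission, the live streaming bandwidth and the video resolution are closely related. How to reduce the bandwidth cost of live streaming according to the importance of the different regions of the video picture, while preserving the picture quality of the important regions, has become a technical problem that those skilled in the art urgently need to solve.
Summary
In view of this, the present invention provides a bandwidth saving method, system, live broadcast terminal and readable storage medium that encode different regions of a video image at different coding rates, thereby reducing the bandwidth cost of live video transmission.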
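A preferred embodiment of the present invention provides a bandwidth saving method applied to live video, the method comprising:
dividing a real-time video image into a preset number of macroblocks;
detecting a region of interest in the real-time video image;
determining whether the preset number of macroblocks are located in the region of interest, marking the macroblocks located inside the region of interest as first macroblocks, and marking the macroblocks located outside the region of interest as second macroblocks;
encoding image data of the first macroblocks at a first coding rate to generate first encoded data, and encoding image data of the second macroblocks at a second coding rate to generate second encoded data, wherein the first coding rate is greater than the second coding rate.
Further, the step of determining whether the preset number of macroblocks are located in the region of interest comprises:
detecting whether the area occupied by a macroblock overlaps the region of interest;
when the area occupied by the macroblock overlaps the region of interest, determining that the macroblock is located in the region of interest.
Further, the step of determining, when the area occupied by the macroblock overlaps the region of interest, that the macroblock is located in the region of interest comprises:
obtaining coordinate values of the region of interest and coordinate values of the macroblock;
determining whether the coordinate values of the macroblock fall within the coordinate range of the region of interest, marking the macroblocks whose coordinate values fall within that range as first macroblocks, and marking the macroblocks whose coordinate values fall outside that range as second macroblocks.
Further, the real-time video image is placed in a coordinate system such that its row coordinate increases from top to bottom and its column coordinate increases from left to right;
the step of determining whether the coordinate values of the macroblock fall within the coordinate range of the region of interest, marking the macroblocks whose coordinate values fall within that range as first macroblocks, and marking the macroblocks whose coordinate values fall outside that range as second macroblocks comprises:
if any one of the following conditions is satisfied, determining that the macroblock is a second macroblock; otherwise, determining that the macroblock is a first macroblock;
the conditions comprise:
the row coordinate of the top-left corner of the macroblock is greater than the row coordinate of the bottom-right corner of the region of interest;
the column coordinate of the top-left corner of the macroblock is greater than the column coordinate of the bottom-right corner of the region of interest;
the row coordinate of the bottom-right corner of the macroblock is less than the row coordinate of the top-left corner of the region of interest; and
the column coordinate of the bottom-right corner of the macroblock is less than the column coordinate of the top-left corner of the region of interest.
Further, the step of detecting the region of interest in the real-time video image comprises:
obtaining a face image region of the real-time video image by a face detection algorithm, and using the face image region as the region of interest in the real-time video image.
Further, the face detection algorithm comprises any one of the following:
a classifier-based face recognition algorithm;
a support vector machine-based face recognition algorithm;
a hidden Markov model-based face recognition algorithm; or
a deep learning-based face detection algorithm.
Further, after the step of transmitting the first encoded data at a first bit rate and transmitting the second encoded data at a second bit rate, the method further comprises:
decoding the received first encoded data and second encoded data by a corresponding decoding algorithm.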
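Another preferred embodiment of the present invention provides a bandwidth saving system applied to live video, the bandwidth saving system comprising:
a macroblock division module, configured to divide a real-time video image into a preset number of macroblocks;
a detection module, configured to detect a region of interest in the real-time video image;
a determination module, configured to determine whether the preset number of macroblocks are located in the region of interest, mark the macroblocks located inside the region of interest as first macroblocks, and mark the macroblocks located outside the region of interest as second macroblocks;
an encoding module, configured to encode image data of the first macroblocks at a first coding rate to generate first encoded data, and encode image data of the second macroblocks at a second coding rate to generate second encoded data, wherein the first coding rate is greater than the second coding rate.
Further, the determination module is configured to:
detect whether the area occupied by a macroblock overlaps the region of interest; and
when the area occupied by the macroblock overlaps the region of interest, determine that the macroblock is located in the region of interest.
Further, the determination module comprises:
an acquisition submodule, configured to obtain coordinate values of the region of interest and of the preset number of macroblocks;
a judgment submodule, configured to determine whether the coordinate values of the preset number of macroblocks fall within the coordinate range of the region of interest;
a marking submodule, configured to mark the macroblocks whose coordinate values fall within the coordinate range of the region of interest as first macroblocks, and mark the macroblocks whose coordinate values fall outside that range as second macroblocks.
Further, the real-time video image is placed in a coordinate system such that its row coordinate increases from top to bottom and its column coordinate increases from left to right;
the judgment submodule determines whether the coordinate values of the preset number of macroblocks fall within the coordinate range of the region of interest, marks the macroblocks whose coordinate values fall within that range as first macroblocks, and marks the macroblocks whose coordinate values fall outside that range as second macroblocks, in the following manner:
if any one of the following conditions is satisfied, the macroblock is determined to be a second macroblock; otherwise, it is determined to be a first macroblock;
the conditions comprise:
the row coordinate of the top-left corner of the macroblock is greater than the row coordinate of the bottom-right corner of the region of interest;
the column coordinate of the top-left corner of the macroblock is greater than the column coordinate of the bottom-right corner of the region of interest;
the row coordinate of the bottom-right corner of the macroblock is less than the row coordinate of the top-left corner of the region of interest; and
the column coordinate of the bottom-right corner of the macroblock is less than the column coordinate of the top-left corner of the region of interest.
Further, the detection module detects the region of interest in the real-time video image by:
obtaining a face image region of the real-time video image by a face detection algorithm, and using the face image region as the region of interest in the real-time video image.
Further, the face detection algorithm comprises any one of the following:
a classifier-based face recognition algorithm;
a support vector machine-based face recognition algorithm;
a hidden Markov model-based face recognition algorithm; or
a deep learning-based face detection algorithm.
Further, the encoding module comprises a first encoding submodule and a second encoding submodule;
the first encoding submodule is configured to encode image data of the first macroblocks at the first coding rate to generate first encoded data;
the second encoding submodule is configured to encode image data of the second macroblocks at the second coding rate to generate second encoded data.
Further, the bandwidth saving system further comprises a sending module and a receiving module;
the sending module is configured to send the first encoded data and the second encoded data;
the receiving module is configured to decode the received first encoded data and second encoded data by a corresponding decoding algorithm.
Another preferred embodiment of the present invention further provides a live broadcast terminal comprising a memory, a processor and the bandwidth saving system. The bandwidth saving system is installed or stored in the memory, and the processor controls the execution of each functional module of the bandwidth saving system.
In another aspect, the present invention further provides a computer-readable storage medium comprising a plurality of instructions configured to implement the above bandwidth saving method.
The bandwidth saving method, system, live broadcast terminal and readable storage medium provided by the embodiments of the present invention are applied to live video. They encode different regions of a video image at different coding rates to reduce the bandwidth cost of live video transmission without affecting the user's viewing experience.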
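Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present invention and should therefore not be regarded as limiting its scope. Those of ordinary skill in the art can derive other related drawings from these drawings without creative effort.
FIG. 1 is a schematic block diagram of a live broadcast terminal to which a bandwidth saving system is applied, according to an embodiment of the present invention.
FIG. 2 is a connection block diagram of a bandwidth saving system according to an embodiment of the present invention.
FIG. 3 is a connection block diagram of the submodules of the determination module 130 shown in FIG. 2.
FIG. 4 is a connection block diagram of the submodules of the encoding module 140 shown in FIG. 2.
FIG. 5 is a schematic flowchart of a bandwidth saving method according to another embodiment of the present invention.
Reference numerals: 10 - live broadcast terminal; 100 - bandwidth saving system; 110 - macroblock division module; 120 - detection module; 130 - determination module; 132 - acquisition submodule; 134 - judgment submodule; 136 - marking submodule; 140 - encoding module; 142 - first encoding submodule; 144 - second encoding submodule; 200 - memory; 300 - storage controller; 400 - processor.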
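Detailed Description
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. The components of the embodiments generally described and shown in the drawings herein may be arranged and designed in a variety of different configurations.
Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed invention but merely represents selected embodiments of it. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
It should be noted that similar reference numerals and letters denote similar items in the following drawings. Once an item is defined in one drawing, it therefore need not be further defined or explained in subsequent drawings.
FIG. 1 is a schematic block diagram of a live broadcast terminal 10 to which a bandwidth saving system 100 is applied, according to an embodiment of the present invention. The live broadcast terminal 10 includes the bandwidth saving system 100, a memory 200, a storage controller 300 and a processor 400.
The memory 200, the storage controller 300 and the processor 400 are electrically connected to one another, directly or indirectly, to transmit or exchange data. For example, these elements may be electrically connected through one or more communication buses or signal lines. The bandwidth saving system 100 includes at least one software functional module that can be stored in the memory 200 as software or firmware, or built into the operating system of the live broadcast terminal 10. The processor 400 accesses the memory 200 under the control of the storage controller 300 to execute the executable modules stored in the memory 200, such as the software functional modules and computer programs included in the bandwidth saving system 100.
Further, FIG. 2 is a connection block diagram of a bandwidth saving system 100 applied to live video, according to an embodiment of the present invention. The bandwidth saving system 100 includes a macroblock division module 110, a detection module 120, a determination module 130 and an encoding module 140.
Specifically, the macroblock division module 110 divides the real-time video image into a preset number of macroblocks. In video coding, a coded picture is usually composed of a number of macroblocks, and each macroblock consists of one luma pixel block and two additional chroma pixel blocks. The luma block is generally 16*16 pixels, while the size of the two chroma blocks depends on the sampling format of the image; for a YUV420-sampled image, for example, each chroma block is 8*8 pixels. Within each picture, macroblocks are grouped into slices, and the video coding algorithm encodes the picture one macroblock at a time, organizing the result into a continuous video bitstream. It should be understood that the preset number is the number of macroblocks set in advance, according to the width and height of the video image, before the real-time video image is divided.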
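As a rough illustration of the macroblock geometry described above, the sketch below counts the 16*16 luma macroblocks in a frame and records the chroma block size for common sampling formats. It assumes that frame dimensions are padded up to a multiple of 16, as encoders typically do. The function names are hypothetical, and this is only one way a preset macroblock count could be derived from the resolution.

```python
import math

# Chroma block size (width, height) accompanying one 16x16 luma macroblock,
# keyed by sampling format.
CHROMA_BLOCK = {
    "yuv420": (8, 8),    # chroma subsampled 2x horizontally and vertically
    "yuv422": (8, 16),   # chroma subsampled 2x horizontally only
    "yuv444": (16, 16),  # no chroma subsampling
}


def macroblock_grid(width: int, height: int, mb_size: int = 16) -> tuple[int, int]:
    """Return (columns, rows) of macroblocks, padding partial blocks up."""
    return math.ceil(width / mb_size), math.ceil(height / mb_size)


def macroblock_count(width: int, height: int, mb_size: int = 16) -> int:
    """Total number of macroblocks in one frame."""
    cols, rows = macroblock_grid(width, height, mb_size)
    return cols * rows


if __name__ == "__main__":
    for w, h in [(1280, 720), (1920, 1080)]:
        cols, rows = macroblock_grid(w, h)
        print(f"{w}x{h}: {cols}x{rows} = {cols * rows} macroblocks")
    print("YUV420 chroma block:", CHROMA_BLOCK["yuv420"])
```

A 1280*720 frame therefore holds 80*45 = 3600 macroblocks. For 1920*1080, the 1080 rows are padded to 1088, which gives 120*68 = 8160 macroblocks.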
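Further, after the macroblock division module 110 has divided the video image into the preset number of macroblocks, the detection module 120 detects the important region of the video image, i.e. the region of interest, which may be a rectangular region.
Optionally, the detection module 120 may be a face detection module that uses face detection technology to locate the faces in the live video picture. The face detection algorithm used by the detection module 120 may be a classifier-based face recognition algorithm (such as AdaBoost), a face recognition algorithm based on a support vector machine (SVM), a face detection algorithm based on a hidden Markov model, or a deep-learning-based face detection method (such as a convolutional neural network (CNN)). Any of these algorithms can produce accurate detection results.
Further, based on the region of interest that the detection module 120 detected in the real-time video image, the determination module 130 determines whether each of the preset number of macroblocks produced by the macroblock division module 110 is located in the region of interest. It marks the macroblocks located inside the region of interest as first macroblocks and the macroblocks located outside it as second macroblocks.
Specifically, the determination module 130 detects whether the area occupied by a macroblock overlaps the region of interest; when it does, the macroblock is determined to be located in the region of interest.
As shown in FIG. 3, the determination module 130 includes an acquisition submodule 132, a judgment submodule 134 and a marking submodule 136.
The acquisition submodule 132 obtains the coordinate values of the region of interest and of the preset number of macroblocks, i.e. the coordinate values of each of those macroblocks. The judgment submodule 134 determines whether the coordinate values of the macroblocks fall within the coordinate range of the region of interest. The marking submodule 136 marks the macroblocks whose coordinate values fall within the coordinate range of the region of interest as first macroblocks, and the macroblocks whose coordinate values fall outside that range as second macroblocks.
In this embodiment, the real-time video image is placed in a coordinate system such that its row coordinate increases from top to bottom and its column coordinate increases from left to right;
the determination module 130 determines whether the coordinate values of the preset number of macroblocks fall within the coordinate range of the region of interest, marks the macroblocks whose coordinate values fall within that range as first macroblocks, and marks the macroblocks whose coordinate values fall outside that range as second macroblocks, in the following manner:
if any one of the following conditions is satisfied, the macroblock is determined to be a second macroblock; otherwise, it is determined to be a first macroblock;
the conditions comprise:
the row coordinate of the top-left corner of the macroblock is greater than the row coordinate of the bottom-right corner of the region of interest;
the column coordinate of the top-left corner of the macroblock is greater than the column coordinate of the bottom-right corner of the region of interest;
the row coordinate of the bottom-right corner of the macroblock is less than the row coordinate of the top-left corner of the region of interest; and
the column coordinate of the bottom-right corner of the macroblock is less than the column coordinate of the top-left corner of the region of interest.
The following concrete example illustrates how first and second macroblocks are determined.
First, obtain the coordinate values of the rectangular face region and of the M*N rectangular macroblocks, for example the top-left corner A(left, top) and the bottom-right corner B(right, bottom) of the face rectangle. Let W and H be the width and height of the video image, and let i range over [0, M-1] and j over [0, N-1]. The macroblock in row j and column i then has its top-left corner at C(i*W/M, j*H/N) and its bottom-right corner at D((i+1)*W/M, (j+1)*H/N). The determination therefore follows the pseudocode logic below.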
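if (D.x < A.x || D.y < A.y || C.x > B.x || C.y > B.y)
{
    /* the macroblock contains no face region: mark it as a second macroblock */
}
else
{
    /* the macroblock contains part of the face region: mark it as a first macroblock */
}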
where A.x = left, A.y = top, B.x = right, B.y = bottom, C.x = i*W/M, C.y = j*H/N, D.x = (i+1)*W/M, D.y = (j+1)*H/N.
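The four separating conditions and the pseudocode above can be sketched as a runnable routine that classifies every macroblock of an M*N grid. This is a minimal illustration, not the patent's implementation. It assumes integer pixel coordinates, computes the corners C and D with integer division so that W/M and H/N need not divide exactly, and uses the hypothetical names `Rect` and `classify_macroblocks`. As in the pseudocode, the comparisons are strict, so a macroblock that only touches the region of interest along an edge is still counted as a first macroblock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    left: int    # column coordinate of the top-left corner (x)
    top: int     # row coordinate of the top-left corner (y)
    right: int   # column coordinate of the bottom-right corner (x)
    bottom: int  # row coordinate of the bottom-right corner (y)


def macroblock_rect(i: int, j: int, W: int, H: int, M: int, N: int) -> Rect:
    """Macroblock in column i, row j: C(i*W/M, j*H/N), D((i+1)*W/M, (j+1)*H/N)."""
    return Rect(i * W // M, j * H // N, (i + 1) * W // M, (j + 1) * H // N)


def is_outside(mb: Rect, roi: Rect) -> bool:
    """True if any separating condition holds: D.x<A.x, D.y<A.y, C.x>B.x or C.y>B.y."""
    return (mb.right < roi.left or mb.bottom < roi.top
            or mb.left > roi.right or mb.top > roi.bottom)


def classify_macroblocks(W: int, H: int, M: int, N: int, roi: Rect) -> list[list[int]]:
    """Return an N x M grid of marks: 1 = first macroblock, 2 = second macroblock."""
    return [
        [2 if is_outside(macroblock_rect(i, j, W, H, M, N), roi) else 1
         for i in range(M)]
        for j in range(N)
    ]


if __name__ == "__main__":
    # 1280x720 frame split into 16x9 macroblocks of 80x80 pixels.
    face = Rect(left=600, top=250, right=700, bottom=390)
    for row in classify_macroblocks(1280, 720, 16, 9, face):
        print("".join(str(mark) for mark in row))
```

With this example face box, only the 2*2 block of macroblocks in rows 3-4 and columns 7-8 is marked 1. Every other macroblock is marked 2.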
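Further, according to the first and second macroblocks determined and marked by the determination module 130, the encoding module 140 encodes the images of the region of interest and of the non-interest region at different coding rates. Specifically, in real-time video encoding, the coding rate, i.e. the bit rate, is the amount of encoded data produced per unit time and governs how faithfully the video image is reproduced. At the same resolution, a higher bit rate means a lower compression ratio and higher picture quality. In other words, a higher bit rate allows more data per unit time and higher data precision, so the decoded output is closer to the original.
Optionally, as shown in FIG. 4, the encoding module 140 includes a first encoding submodule 142 and a second encoding submodule 144. The first encoding submodule 142 encodes the image data of the first macroblocks at the first coding rate to generate first encoded data. The second encoding submodule 144 encodes the image data of the second macroblocks at the second coding rate to generate second encoded data, where the first coding rate is greater than the second coding rate.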
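Mainstream encoders typically control a single rate per stream, so one common way to approximate a higher first coding rate and a lower second coding rate within one frame is a per-macroblock quantizer (QP) offset map. A negative offset gives finer quantization and more bits; a positive offset gives fewer bits. The sketch below builds such a map in raster order from the first and second macroblock marks. It is a swapped-in technique, not the mechanism the patent states. The offset values and the name `qp_offset_map` are hypothetical. x264, for instance, accepts per-macroblock quantizer offsets through the `quant_offsets` field of `x264_image_properties_t` when adaptive quantization is enabled.

```python
FIRST = 1  # mark for macroblocks inside the ROI; any other mark is treated as outside


def qp_offset_map(marks: list[list[int]], roi_offset: float = -3.0,
                  background_offset: float = 4.0) -> list[float]:
    """Flatten an N x M grid of marks into raster-order QP offsets.

    A lower QP means finer quantization and more bits. First macroblocks
    therefore get the lower (negative) offset, which approximates a higher
    coding rate, and second macroblocks get the higher offset.
    """
    if roi_offset >= background_offset:
        raise ValueError("ROI offset must be lower than background offset")
    offsets = []
    for row in marks:        # rows from top to bottom
        for mark in row:     # columns from left to right
            offsets.append(roi_offset if mark == FIRST else background_offset)
    return offsets


if __name__ == "__main__":
    marks = [
        [2, 2, 2, 2],
        [2, 1, 1, 2],
        [2, 2, 2, 2],
    ]
    print(qp_offset_map(marks))
```

The map controls relative quality rather than absolute bit rates. The stream's overall rate control still decides how many bits the whole frame receives.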
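Specifically, in this embodiment of the present invention, the non-interest region has lower picture-quality requirements than the region of interest. When the image data is encoded, the bit rate of the region of interest is therefore set higher than that of the non-interest region. The region of interest keeps a high bit rate while the bit rate of the non-interest region is reduced, so the bit rate adapts automatically to the different image regions. For example, for a video picture with a resolution of 1280*720, the bit rate of the region of interest may be set to 2 Mbps while that of the non-interest region is reduced to 1.5 Mbps or 1 Mbps.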
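To estimate the saving in the 1280*720 example, the arithmetic below computes an average stream bit rate under a simplifying assumption that the patent does not state: each region's share of the bits is proportional to its share of the picture area. The 20% face-area fraction is a hypothetical value for a talking-head stream.

```python
def blended_bitrate(roi_fraction: float, roi_mbps: float, background_mbps: float) -> float:
    """Area-weighted average bit rate, assuming bits scale with region area."""
    if not 0.0 <= roi_fraction <= 1.0:
        raise ValueError("roi_fraction must be in [0, 1]")
    return roi_fraction * roi_mbps + (1.0 - roi_fraction) * background_mbps


def saving(roi_fraction: float, roi_mbps: float, background_mbps: float) -> float:
    """Fractional saving relative to encoding the whole frame at roi_mbps."""
    return 1.0 - blended_bitrate(roi_fraction, roi_mbps, background_mbps) / roi_mbps


if __name__ == "__main__":
    for bg in (1.5, 1.0):
        rate = blended_bitrate(0.2, 2.0, bg)
        print(f"background {bg} Mbps: about {rate:.2f} Mbps, "
              f"{saving(0.2, 2.0, bg):.0%} saved")
```

Under these assumptions the stream averages about 1.6 Mbps (20% saved) with a 1.5 Mbps background, and about 1.2 Mbps (40% saved) with a 1 Mbps background. Actual savings depend on the content and on the encoder's rate control.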
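Further, the bandwidth saving system 100 also includes an image acquisition module, a sending module and a receiving module.
The image acquisition module obtains the real-time video image from the live video and passes it to the macroblock division module 110 for macroblock division. Optionally, the image acquisition module may be a standalone camera, or it may be integrated into an electronic device such as a computer or mobile phone as a video input device, so that people can hold video and audio conversations with one another over a network through the camera.
Further, the sending module transmits the first encoded data and the second encoded data produced by the encoding module 140.
Further, the receiving module decodes the received first encoded data and second encoded data by a corresponding decoding algorithm to restore the real-time video image of the live video. The decoding algorithm matches the encoding algorithm that the encoding module 140 used to encode the image data.
Based on the above design and description of the bandwidth saving system 100, the bandwidth saving method based on the bandwidth saving system 100 is further described below. FIG. 5 is a schematic flowchart of a bandwidth saving method according to a preferred embodiment of the present invention, and the steps shown in FIG. 5 are described in detail as follows.
Step S201: Obtain a real-time video image.
Step S202: Divide the real-time video image into a preset number of macroblocks.
Specifically, in this embodiment of the present invention, step S201 is performed by the image acquisition module and step S202 by the macroblock division module 110. Assuming the video image has width W and height H, the video picture is divided into M*N rectangular macroblocks, so that each macroblock is W/M wide and H/N high.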
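When W is not a multiple of M (or H is not a multiple of N), W/M is fractional. One way to keep every pixel in exactly one macroblock is to round each boundary i*W/M down to an integer, as in the sketch below. This rounding is an assumption, not something the patent specifies, and the names `tile_bounds` and `macroblock_rects` are hypothetical.

```python
def tile_bounds(length: int, parts: int) -> list[tuple[int, int]]:
    """Split [0, length) into `parts` contiguous integer intervals.

    Boundary k is floor(k * length / parts), so the interval widths differ by
    at most one pixel and together cover every pixel exactly once.
    """
    if parts <= 0 or parts > length:
        raise ValueError("parts must be in [1, length]")
    edges = [k * length // parts for k in range(parts + 1)]
    return list(zip(edges[:-1], edges[1:]))


def macroblock_rects(W: int, H: int, M: int, N: int) -> list[tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) for all M*N macroblocks in raster order."""
    cols, rows = tile_bounds(W, M), tile_bounds(H, N)
    return [(l, t, r, b) for (t, b) in rows for (l, r) in cols]


if __name__ == "__main__":
    print(tile_bounds(1280, 7))
    print(len(macroblock_rects(1280, 720, 7, 5)))
```

For W = 1280 and M = 7, the first column is 182 pixels wide and the other six are 183 pixels wide, which adds up to exactly 1280. Each macroblock's right and bottom edges coincide with the next macroblock's left and top edges, which matches the patent's D((i+1)*W/M, (j+1)*H/N).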
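Step S203: Detect the region of interest in the real-time video image.
Specifically, in this embodiment of the present invention, step S203 is performed by the detection module 120. For example, the detection module 120 may be a face detection module that uses face detection technology to detect the rectangular region containing a face. The same technology also yields the top-left corner A(left, top) and the bottom-right corner B(right, bottom) of that rectangle.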
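Many face detectors report a box as (x, y, width, height) rather than as two corners. Under that assumption, the sketch below converts such a box into the corners A(left, top) and B(right, bottom) used in step S204, clamps the box to the W*H frame, and merges several detected faces into one enclosing region of interest. Merging multiple faces into a single rectangle is an illustrative choice that the patent does not specify, and the function names are hypothetical.

```python
from typing import Iterable, Optional

Box = tuple[int, int, int, int]  # (left, top, right, bottom) = (A.x, A.y, B.x, B.y)


def to_corners(x: int, y: int, w: int, h: int, W: int, H: int) -> Optional[Box]:
    """Convert an (x, y, w, h) detection into clamped corner form, or None if empty."""
    left, top = max(0, x), max(0, y)
    right, bottom = min(W, x + w), min(H, y + h)
    if left >= right or top >= bottom:
        return None  # the box lies entirely outside the frame
    return (left, top, right, bottom)


def merge_rois(boxes: Iterable[Box]) -> Optional[Box]:
    """Smallest rectangle enclosing all boxes (one ROI for several faces)."""
    box_list = list(boxes)
    if not box_list:
        return None
    return (min(b[0] for b in box_list), min(b[1] for b in box_list),
            max(b[2] for b in box_list), max(b[3] for b in box_list))


if __name__ == "__main__":
    W, H = 1280, 720
    # The second detection overflows the bottom-right corner of the frame.
    detections = [(600, 250, 100, 140), (1200, 650, 200, 120)]
    corners = []
    for d in detections:
        box = to_corners(*d, W, H)
        if box is not None:
            corners.append(box)
    print(corners, merge_rois(corners))
```

A merged box can cover background between widely separated faces. Testing each macroblock against every face box separately would avoid spending bits there, at the cost of more comparisons.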
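Step S204: For each of the preset number of macroblocks, determine whether it is located in the region of interest. If it is, perform step S2051; otherwise, perform step S2052.
Specifically, step S204 may include:
detecting whether the area occupied by the macroblock overlaps the region of interest;
when the area occupied by the macroblock overlaps the region of interest, determining that the macroblock is located in the region of interest.
Step S2051: Mark the macroblock as a first macroblock.
Step S2052: Mark the macroblock as a second macroblock.
Specifically, in this embodiment of the present invention, steps S204, S2051 and S2052 are performed by the determination module 130.
In this embodiment, the real-time video image is placed in a coordinate system such that its row coordinate increases from top to bottom and its column coordinate increases from left to right;
the step of determining whether the coordinate values of the macroblock fall within the coordinate range of the region of interest, marking the macroblocks whose coordinate values fall within that range as first macroblocks, and marking the macroblocks whose coordinate values fall outside that range as second macroblocks comprises:
if any one of the following conditions is satisfied, determining that the macroblock is a second macroblock; otherwise, determining that the macroblock is a first macroblock;
the conditions comprise:
the row coordinate of the top-left corner of the macroblock is greater than the row coordinate of the bottom-right corner of the region of interest;
the column coordinate of the top-left corner of the macroblock is greater than the column coordinate of the bottom-right corner of the region of interest;
the row coordinate of the bottom-right corner of the macroblock is less than the row coordinate of the top-left corner of the region of interest; and
the column coordinate of the bottom-right corner of the macroblock is less than the column coordinate of the top-left corner of the region of interest.
Specifically, the acquisition submodule 132 first obtains the coordinate values of the rectangular face region and of the M*N rectangular macroblocks, for example the top-left corner A(left, top) and the bottom-right corner B(right, bottom) of the face rectangle. Let i range over [0, M-1] and j over [0, N-1]. The macroblock in row j and column i then has its top-left corner at C(i*W/M, j*H/N) and its bottom-right corner at D((i+1)*W/M, (j+1)*H/N). The determination therefore follows the pseudocode logic below.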
[Figure PCTCN2017079588-appb-000001: pseudocode of the macroblock/face-region overlap test]
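where A.x = left, A.y = top, B.x = right, B.y = bottom, C.x = i*W/M, C.y = j*H/N, D.x = (i+1)*W/M, D.y = (j+1)*H/N.
Further, the marking submodule 136 records each result during the determination. Macroblocks whose coordinate values fall within the coordinate range of the face region (region of interest) are marked as first macroblocks, and macroblocks whose coordinate values fall outside that range are marked as second macroblocks.
Step S2061: Generate the first encoded data.
Step S2062: Generate the second encoded data.
Specifically, in this embodiment of the present invention, steps S2061 and S2062 are performed by the encoding module 140. The first encoding submodule 142 encodes the image data of the first macroblocks at the first coding rate to generate the first encoded data. The second encoding submodule 144 encodes the image data of the second macroblocks at the second coding rate to generate the second encoded data. Because the first coding rate is greater than the second coding rate, the bit rate is allocated dynamically across the picture. After encoding, the total amount of video data outside the face region (region of interest) is reduced, while the face region (region of interest) keeps a relatively large amount of data, which preserves the picture quality of the important region of the video image. It should be understood that, in actual implementation, steps S2061 and S2062 may be performed in any order.
Further, because the bit rate of the face region is not reduced, its picture quality is not affected. Because the bit rate of the non-face region is reduced, the total amount of encoded video data decreases, and so do the traffic pushed to the Content Delivery Network (CDN) servers and the network transmission bandwidth.
Step S207: Data transmission.
Specifically, in this embodiment of the present invention, step S207 is performed by the sending module, which transmits the first encoded data and the second encoded data generated by the encoding module 140.
Step S208: Receiving and decoding.
Specifically, in this embodiment of the present invention, step S208 is performed by the receiving module, which decodes the received first encoded data and second encoded data by a corresponding decoding algorithm to restore the real-time video image of the live video.
In summary, the bandwidth saving method, system, live broadcast terminal and readable storage medium provided by the present invention dynamically allocate different coding rates to different regions of the real-time video picture (such as the region of interest and the non-interest region) to encode and transmit live video images. Compared with the prior art, the present invention can reduce the bandwidth cost of live video data transmission without affecting the user's viewing experience.
In the description of the present invention, the terms “set”, “connected to” and “connected” should be understood broadly. For example, a connection may be fixed, detachable or integral; mechanical or electrical; direct, or indirect through an intermediate medium; or an internal communication between two elements. Those of ordinary skill in the art can understand the specific meanings of these terms in the present invention according to the specific circumstances.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may also be implemented in other ways. The apparatus and method embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the drawings show the possible architectures, functions and operations of apparatuses, methods and computer program products according to a plurality of embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a portion of code, which contains one or more executable instructions for implementing the specified logical function.
It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that shown in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
The above are merely preferred embodiments of the present invention and are not intended to limit it; those skilled in the art may make various modifications and changes to the present invention. Any modification, equivalent replacement, improvement or the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.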

Claims (17)

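  1. A bandwidth saving method, characterized in that it is applied to live video and comprises:
    dividing a real-time video image into a preset number of macroblocks;
    detecting a region of interest in the real-time video image;
    determining whether the preset number of macroblocks are located in the region of interest, marking the macroblocks located inside the region of interest as first macroblocks, and marking the macroblocks located outside the region of interest as second macroblocks;
    encoding image data of the first macroblocks at a first coding rate to generate and send first encoded data, and encoding image data of the second macroblocks at a second coding rate to generate and send second encoded data, wherein the first coding rate is greater than the second coding rate.
  2. The bandwidth saving method according to claim 1, characterized in that the step of determining whether the preset number of macroblocks are located in the region of interest comprises:
    detecting whether the area occupied by the macroblock overlaps the region of interest;
    when the area occupied by the macroblock overlaps the region of interest, determining that the macroblock is located in the region of interest.
  3. The bandwidth saving method according to claim 1 or 2, characterized in that the step of determining, when the area occupied by the macroblock overlaps the region of interest, that the macroblock is located in the region of interest comprises:
    obtaining coordinate values of the region of interest and coordinate values of the macroblock;
    determining whether the coordinate values of the macroblock fall within the coordinate range of the region of interest, marking the macroblocks whose coordinate values fall within that range as first macroblocks, and marking the macroblocks whose coordinate values fall outside that range as second macroblocks.
  4. The bandwidth saving method according to claim 3, characterized in that the real-time video image is placed in a coordinate system such that its row coordinate increases from top to bottom and its column coordinate increases from left to right;
    the step of determining whether the coordinate values of the macroblock fall within the coordinate range of the region of interest, marking the macroblocks whose coordinate values fall within that range as first macroblocks, and marking the macroblocks whose coordinate values fall outside that range as second macroblocks comprises:
    if any one of the following conditions is satisfied, determining that the macroblock is a second macroblock; otherwise, determining that the macroblock is a first macroblock;
    the conditions comprise:
    the row coordinate of the top-left corner of the macroblock is greater than the row coordinate of the bottom-right corner of the region of interest;
    the column coordinate of the top-left corner of the macroblock is greater than the column coordinate of the bottom-right corner of the region of interest;
    the row coordinate of the bottom-right corner of the macroblock is less than the row coordinate of the top-left corner of the region of interest; and
    the column coordinate of the bottom-right corner of the macroblock is less than the column coordinate of the top-left corner of the region of interest.
  5. The bandwidth saving method according to any one of claims 1-4, characterized in that the step of detecting the region of interest in the real-time video image comprises:
    obtaining a face image region of the real-time video image by a face detection algorithm, and using the face image region as the region of interest in the real-time video image.
  6. The bandwidth saving method according to claim 5, characterized in that the face detection algorithm comprises any one of the following:
    a classifier-based face recognition algorithm;
    a support vector machine-based face recognition algorithm;
    a hidden Markov model-based face recognition algorithm; or
    a deep learning-based face detection algorithm.
  7. The bandwidth saving method according to any one of claims 1-6, characterized in that, after the step of generating and sending the first encoded data and the second encoded data, the method further comprises:
    decoding the received first encoded data and second encoded data by a corresponding decoding algorithm.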
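  8. A bandwidth saving system applied to live video, characterized in that the bandwidth saving system comprises:
    a macroblock division module, configured to divide a real-time video image into a preset number of macroblocks;
    a detection module, configured to detect a region of interest in the real-time video image;
    a determination module, configured to determine whether the preset number of macroblocks are located in the region of interest, mark the macroblocks located inside the region of interest as first macroblocks, and mark the macroblocks located outside the region of interest as second macroblocks;
    an encoding module, configured to encode image data of the first macroblocks at a first coding rate to generate first encoded data, and encode image data of the second macroblocks at a second coding rate to generate second encoded data, wherein the first coding rate is greater than the second coding rate.
  9. The bandwidth saving system according to claim 8, characterized in that the determination module is configured to:
    detect whether the area occupied by the macroblock overlaps the region of interest; and
    when the area occupied by the macroblock overlaps the region of interest, determine that the macroblock is located in the region of interest.
  10. The bandwidth saving system according to claim 8 or 9, characterized in that the determination module comprises:
    an acquisition submodule, configured to obtain coordinate values of the region of interest and of the preset number of macroblocks;
    a judgment submodule, configured to determine whether the coordinate values of the preset number of macroblocks fall within the coordinate range of the region of interest;
    a marking submodule, configured to mark the macroblocks whose coordinate values fall within the coordinate range of the region of interest as first macroblocks, and mark the macroblocks whose coordinate values fall outside that range as second macroblocks.
  11. The bandwidth saving system according to claim 10, characterized in that the real-time video image is placed in a coordinate system such that its row coordinate increases from top to bottom and its column coordinate increases from left to right;
    the judgment submodule determines whether the coordinate values of the preset number of macroblocks fall within the coordinate range of the region of interest, marks the macroblocks whose coordinate values fall within that range as first macroblocks, and marks the macroblocks whose coordinate values fall outside that range as second macroblocks, in the following manner:
    if any one of the following conditions is satisfied, the macroblock is determined to be a second macroblock; otherwise, it is determined to be a first macroblock;
    the conditions comprise:
    the row coordinate of the top-left corner of the macroblock is greater than the row coordinate of the bottom-right corner of the region of interest;
    the column coordinate of the top-left corner of the macroblock is greater than the column coordinate of the bottom-right corner of the region of interest;
    the row coordinate of the bottom-right corner of the macroblock is less than the row coordinate of the top-left corner of the region of interest; and
    the column coordinate of the bottom-right corner of the macroblock is less than the column coordinate of the top-left corner of the region of interest.
  12. The bandwidth saving system according to any one of claims 8-11, characterized in that the detection module detects the region of interest in the real-time video image by:
    obtaining a face image region of the real-time video image by a face detection algorithm, and using the face image region as the region of interest in the real-time video image.
  13. The bandwidth saving system according to claim 12, characterized in that the face detection algorithm comprises any one of the following:
    a classifier-based face recognition algorithm;
    a support vector machine-based face recognition algorithm;
    a hidden Markov model-based face recognition algorithm; or
    a deep learning-based face detection algorithm.
  14. The bandwidth saving system according to any one of claims 8-13, characterized in that the encoding module comprises a first encoding submodule and a second encoding submodule;
    the first encoding submodule is configured to encode image data of the first macroblocks at the first coding rate to generate first encoded data;
    the second encoding submodule is configured to encode image data of the second macroblocks at the second coding rate to generate second encoded data.
  15. The bandwidth saving system according to any one of claims 8-14, characterized in that the bandwidth saving system further comprises a sending module and a receiving module;
    the sending module is configured to send the first encoded data and the second encoded data;
    the receiving module is configured to decode the received first encoded data and second encoded data by a corresponding decoding algorithm.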
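  16. A live broadcast terminal, characterized by comprising:
    a memory;
    a processor; and
    a bandwidth saving system, the system being installed in the memory and comprising one or more software functional modules executed by the processor, the system comprising:
    a macroblock division module, configured to divide a real-time video image into a preset number of macroblocks;
    a detection module, configured to detect a region of interest in the real-time video image;
    a determination module, configured to determine whether the preset number of macroblocks are located in the region of interest, mark the macroblocks located inside the region of interest as first macroblocks, and mark the macroblocks located outside the region of interest as second macroblocks;
    an encoding module, configured to encode image data of the first macroblocks at a first coding rate to generate first encoded data, and encode image data of the second macroblocks at a second coding rate to generate second encoded data, wherein the first coding rate is greater than the second coding rate.
  17. A computer-readable storage medium, characterized by comprising a plurality of instructions configured to implement the method according to any one of claims 1-7.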
PCT/CN2017/079588 2016-12-09 2017-04-06 Bandwidth saving method, system, live broadcast terminal and readable storage medium WO2018103243A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611129506.8 2016-12-09
CN201611129506.8A CN106550240A (zh) 2016-12-09 2016-12-09 Bandwidth saving method and system

Publications (1)

Publication Number Publication Date
WO2018103243A1 (zh) 2018-06-14

Family

ID=58397230

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/079588 WO2018103243A1 (zh) 2016-12-09 2017-04-06 Bandwidth saving method, system, live broadcast terminal and readable storage medium

Country Status (2)

Country Link
CN (1) CN106550240A (zh)
WO (1) WO2018103243A1 (zh)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
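CN106550240A (zh) * 2016-12-09 2017-03-29 武汉斗鱼网络科技有限公司 Bandwidth saving method and system
CN107040794A (zh) * 2017-04-26 2017-08-11 盯盯拍(深圳)技术股份有限公司 Video playback method, server, virtual reality device and panoramic virtual reality playback system
CN109218836B (zh) * 2017-06-30 2021-02-26 华为技术有限公司 Video processing method and device therefor
CN108600863A (zh) * 2018-03-28 2018-09-28 腾讯科技(深圳)有限公司 Multimedia file processing method and apparatus, storage medium and electronic apparatus
CN109005421A (zh) * 2018-08-17 2018-12-14 青岛海信电器股份有限公司 Image processing method and apparatus, computer-readable storage medium
CN109862019B (zh) * 2019-02-20 2021-10-22 联想(北京)有限公司 Data processing method, apparatus and system
CN110049324B (zh) * 2019-04-12 2022-10-14 深圳壹账通智能科技有限公司 Video encoding method, system, device and computer-readable storage medium
CN112118446B (zh) * 2019-06-20 2022-04-26 杭州海康威视数字技术股份有限公司 Image compression method and apparatus
CN110557633B (zh) * 2019-08-28 2021-06-29 深圳大学 Compressed transmission method and system for image data, and computer-readable storage medium
CN110519607B (zh) * 2019-09-27 2022-05-20 腾讯科技(深圳)有限公司 Video decoding method and apparatus, video encoding method and apparatus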

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101534444A (zh) * 2009-04-20 2009-09-16 杭州华三通信技术有限公司 Image processing method, system and apparatus
US20100215098A1 (en) * 2009-02-23 2010-08-26 Mondo Systems, Inc. Apparatus and method for compressing pictures with roi-dependent compression parameters
CN101867799A (zh) * 2009-04-17 2010-10-20 北京大学 Video frame processing method and video encoder
CN104105006A (zh) * 2014-07-23 2014-10-15 北京永新视博信息技术有限公司 Video image processing method and system
CN104980740A (zh) * 2014-04-08 2015-10-14 富士通株式会社 Image processing method, apparatus and electronic device
CN106131670A (zh) * 2016-07-12 2016-11-16 块互动(北京)科技有限公司 Adaptive video encoding method and terminal
CN106550240A (zh) * 2016-12-09 2017-03-29 武汉斗鱼网络科技有限公司 Bandwidth saving method and system

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
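CN110674778A (zh) * 2019-09-30 2020-01-10 安徽创世科技股份有限公司 Target detection method and apparatus for high-resolution video images
CN113810739A (zh) * 2020-06-17 2021-12-17 国基电子(上海)有限公司 Image transmission method, terminal and computer-readable storage medium
US11812036B2 (en) 2020-06-17 2023-11-07 Ambit Microsystems (Shanghai) Ltd. Method for image transmitting, transmitting device and receiving device
CN113810739B (zh) * 2020-06-17 2024-02-09 富联国基(上海)电子有限公司 Image transmission method, terminal and computer-readable storage medium
CN113301342A (zh) * 2021-05-13 2021-08-24 广州方硅信息技术有限公司 Video encoding method, network live streaming method, apparatus and terminal device
CN113301342B (zh) * 2021-05-13 2022-07-22 广州方硅信息技术有限公司 Video encoding method, network live streaming method, apparatus and terminal device
CN114827684A (zh) * 2022-04-25 2022-07-29 青岛日日顺乐信云科技有限公司 5G-based interactive video service method and system
CN114827684B (zh) * 2022-04-25 2023-06-02 青岛海尔乐信云科技有限公司 5G-based interactive video service method and system
CN116033189A (zh) * 2023-03-31 2023-04-28 卓望数码技术(深圳)有限公司 Cloud-edge collaboration-based intelligent partition control method and system for interactive live video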

Also Published As

Publication number Publication date
CN106550240A (zh) 2017-03-29

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 17879537

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 17879537

Country of ref document: EP

Kind code of ref document: A1