WO2019196573A1 - Streaming media transcoding method and apparatus, computer device, and readable medium - Google Patents

Streaming media transcoding method and apparatus, computer device, and readable medium

Info

Publication number
WO2019196573A1
WO2019196573A1 (PCT/CN2019/076993)
Authority
WO
WIPO (PCT)
Prior art keywords
frame image
streaming media
image
frame
fast transform
Prior art date
Application number
PCT/CN2019/076993
Other languages
English (en)
French (fr)
Inventor
许赫赫
Original Assignee
北京大米科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京大米科技有限公司
Publication of WO2019196573A1

Classifications

    • H — ELECTRICITY · H04 — ELECTRIC COMMUNICATION TECHNIQUE · H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/132 — Adaptive coding of digital video signals: sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N 19/14 — Adaptive coding controlled by incoming video signal characteristics: coding unit complexity, e.g. amount of activity or edge presence estimation
    • H04N 19/142 — Adaptive coding controlled by incoming video signal characteristics: detection of scene cut or scene change
    • H04N 19/172 — Adaptive coding characterised by the coding unit, the unit being an image region that is a picture, frame or field
    • H04N 21/234309 — Server-side processing of video elementary streams: reformatting by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4 or from Quicktime to Realvideo
    • H04N 21/234381 — Server-side processing of video elementary streams: reformatting by altering the temporal resolution, e.g. decreasing the frame rate by frame skipping
    • H04N 21/2662 — Server-side management operations: controlling the complexity of the video stream, e.g. by scaling the resolution or bitrate of the video stream based on the client capabilities
    • H04N 21/440218 — Client-side processing of video elementary streams: reformatting by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4
    • H04N 21/440281 — Client-side processing of video elementary streams: reformatting by altering the temporal resolution, e.g. by frame skipping
    • H04N 21/47217 — End-user interface for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks

Definitions

  • the present invention relates to the field of online education. More specifically, it relates to a streaming media transcoding method, apparatus, computer device, and readable medium.
  • the current course playback page separately stores the courseware used by the teacher and students in class, together with multiple audio and video files.
  • during playback, the page needs to load multiple files at the same time, and the logical relationships between these files are complex.
  • the conventional audio and video files are large, which easily makes loading and dragging slow, causes the sound and picture to fall out of sync, and makes seeking feel laggy; the instantaneous bandwidth requirement during playback is also high.
  • loading the teacher's and students' audio and video files on the playback page depends on decoding and transmission by third-party software such as FLASH and a Content Delivery Network (CDN), and upgrades of such third-party software easily cause incompatibility, so the playback page may fail to play.
  • an object of the present invention is to provide a streaming media transcoding method that simplifies the regions to which users are not sensitive, so as to reduce the size of the streaming media file and increase the loading speed of streaming media files on the playback page.
  • another object of the present invention is to provide a streaming media transcoding apparatus; still another object is to provide a computer device; and yet another object is to provide a computer readable medium.
  • a streaming media transcoding method, including: performing image encoding on a streaming media file to obtain multi-frame images; determining a fast transform region in the multi-frame images; and performing simplification processing on at least a partial region of the multi-frame images outside the fast transform region to obtain a standard streaming media file.
  • performing image encoding on the streaming media file to obtain the multi-frame images specifically includes: performing image encoding on the streaming media file to obtain consecutive multi-frame images, the multi-frame images including I frame images located at the two ends of the multi-frame images and at least one P frame image located between the two I frame images.
  • the method further comprises converting at least one of said at least one P frame image to a B frame image.
  • determining the fast transform region in the multi-frame images specifically includes: determining a group of to-be-processed images in the multi-frame images that contain a complex scene; in response to the image size differences between adjacent images among N or more consecutive to-be-processed images in the group all being above a preset first threshold, determining that the N or more to-be-processed images contain a fast transform region, where N is a positive integer greater than or equal to 2; and performing image recognition processing on the to-be-processed images containing the fast transform region to obtain the position of the fast transform region.
  • determining a set of to-be-processed images including the complex scene in the multi-frame image includes:
  • in response to the image size of one frame image of the multi-frame images being greater than a preset second threshold, determining that the one frame image contains a complex scene.
  • the method further comprises performing interpolation compensation on said fast transform region.
  • each frame image of the multi-frame image includes a plurality of pixel points
  • performing interpolation compensation on the fast transform region specifically includes: acquiring the plurality of pixel points included in the fast transform region, and inserting at least one pixel point between any two adjacent pixel points.
  • a streaming media transcoding device including
  • An image coding module configured to perform image coding on a streaming media file to obtain a multi-frame image
  • An image analysis module configured to determine a fast transform region in the multi-frame image
  • an image processing module configured to perform a simplified process on at least a portion of the multi-frame image that is outside the fast transform region to obtain a standard streaming media file.
  • the image encoding module is configured to perform image encoding on the streaming media file to obtain consecutive multi-frame images, the multi-frame images including I frame images located at the two ends of the multi-frame images and at least one P frame image located between the two I frame images.
  • the image processing module is further configured to convert at least one of the at least one P frame image into a B frame image.
  • the image analysis module is further configured to: determine a group of to-be-processed images in the multi-frame images that contain a complex scene; in response to the image size differences between adjacent images among N or more consecutive to-be-processed images in the group all being above a preset first threshold, determine that the N or more to-be-processed images contain a fast transform region, where N is a positive integer greater than or equal to 2; and perform image recognition processing on the to-be-processed images containing the fast transform region to obtain the position of the fast transform region.
  • the image analysis module is further configured to determine that the one-frame image includes a complex scene in response to the image size of one frame of the multi-frame image being greater than a preset second threshold.
  • the image processing module is further configured to perform interpolation compensation on the fast transform region.
  • Each frame image of the multi-frame image includes a plurality of pixel points
  • the image processing module is further configured to acquire a plurality of pixel points included in the fast transform region, and insert at least one pixel point between any two adjacent pixel points.
  • a computer apparatus includes a memory, a processor, and a computer program stored on the memory and executable on the processor,
  • the method as described above is implemented when the processor executes the program.
  • a computer readable medium has a computer program stored thereon, and the method as described above is implemented when the program is executed by a processor.
  • the invention performs image encoding on the streaming media file to convert it into chronologically arranged, consecutive multi-frame images, and then determines a fast transform region in those images. The fast transform region is usually produced by the movements of the teacher or student during class, and the teacher or student is the area the user focuses on when watching; the non-fast-transform regions outside it usually correspond to changes in the course background during class (such as a grassland) and are not the user's focus.
  • performing simplification processing on at least part of the multi-frame images outside the fast transform region to obtain a standard streaming media file reduces the storage size and bit rate of the streaming media file and increases the loading speed of streaming media files on the playback page, so that the page opens and seeks almost instantly.
  • Figure 1 shows a schematic diagram of a playback page in the prior art
  • FIG. 2 is a flow chart showing a specific embodiment of a streaming media transcoding method according to the present invention
  • FIG. 3 is a flow chart showing a method for determining a fast transition region in the multi-frame image according to a specific embodiment of the streaming media transcoding method of the present invention
  • FIG. 4 is a schematic structural diagram of a specific embodiment of a streaming media transcoding device according to the present invention.
  • FIG. 5 is a schematic diagram of interpolation compensation in a specific embodiment of a streaming media transcoding method and apparatus according to the present invention
  • FIG. 6 shows a schematic structural diagram of a computer device suitable for use in a terminal device or server for implementing an embodiment of the present invention.
  • Figure 1 shows a course playback page that presents the teacher video, student video, and signaling operations to the user, where the signaling operations include the courseware used in the class and related operations for the courseware.
  • during playback, the current course playback page needs to load multiple files at the same time, such as the audio, video, and signaling operations of the teacher and students.
  • the logical relationships between the multiple files are complex, and the conventional audio and video files uploaded by the students and teacher are large, which easily makes the audio and video files on the playback page slow to load and drag, causes the sound and picture to fall out of sync, makes seeking feel laggy, and imposes a high instantaneous bandwidth requirement during playback.
  • moreover, the formats of the audio and video files uploaded by the students and teacher are not uniform, so playback often depends on decoding and transmission by third-party software such as FLASH and a CDN; upgrades of such third-party software easily cause incompatibility, so the audio or video on the playback page cannot be played.
  • the method 10 includes:
  • the streaming media file may be an audio file and a video file uploaded by a teacher or a student about an online course, and the video file of the teacher or student is image-encoded to obtain a chronological continuous multi-frame image.
  • an image encoding mode based on luminance and chrominance (for example, YUV) may be used to perform image encoding on the streaming media file.
  • performing image encoding on the streaming media file yields consecutive multi-frame images, which may include I frame images located at the two ends of the multi-frame images and at least one P frame image located between the two I frame images.
  • the I frame image can display the complete image
  • the P frame image only records the difference from the previous frame image to reduce the file size of the streaming media file.
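  • As a concrete illustration of step S100, the sketch below decodes an uploaded video into a chronological sequence of frames and converts each to a luminance/chrominance (YUV) representation. The use of OpenCV and the file name are assumptions for illustration; the patent only requires an image encoding mode such as YUV.

```python
import cv2  # assumption: OpenCV is used for decoding; the patent does not name a library

def decode_to_yuv_frames(path):
    """Decode a video file into a chronological list of YUV frames (step S100)."""
    capture = cv2.VideoCapture(path)
    frames = []
    while True:
        ok, bgr = capture.read()       # frames are read in decode (chronological) order
        if not ok:
            break
        frames.append(cv2.cvtColor(bgr, cv2.COLOR_BGR2YUV))  # luminance/chrominance representation
    capture.release()
    return frames

# usage (hypothetical file name):
# frames = decode_to_yuv_frames("teacher_class.mp4")
```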
  • S110 may include:
  • S111 Determine a group of to-be-processed images including complex scenes in the multi-frame image.
  • if the image size of one frame image of the multi-frame images is greater than a preset second threshold, it is determined that the one frame image contains a complex scene.
  • for example, when the teacher or student gives an online class outdoors against a grassland background, the video contains both the movement of the grass in the background and the movement of the teacher or student; because of this motion, within a given period the P frame images obtained after encoding differ more from the preceding frame, so the size of the P frame images increases.
  • a second threshold may therefore be preset; when the image size of a P frame image grows to the second threshold, the P frame image is considered to contain a complex scene, which would cause the bit rate of the video file to rise.
  • since the human eye is insensitive to motion scenes, the textures of the complex motion scene can be simplified, which lowers the bit rate of the complex scene without degrading the user experience.
  • N is a positive integer greater than or equal to 2. Among them, N can be selected according to the number of image frames in a fixed time.
  • for example, if there are 25 frames of images per second, N may be selected as any value from 10 to 25; when the image size differences between adjacent images over 10 to 25 consecutive to-be-processed frames are all above the preset first threshold, it is determined that the plurality of to-be-processed images contain a fast transform region.
  • again taking the example of a teacher or student teaching online outdoors against a grassland background, in a group of to-be-processed images containing a complex scene, the complex scene may be caused by the movement of the grass or by the movement of a person.
  • when simplification is performed, the movement of the person is the position the user focuses on while watching and cannot be simplified, whereas the movement of the background, such as the grass, is a position the user does not pay attention to and can be simplified; it is therefore necessary to confirm whether the complex scene in the group of to-be-processed images includes a scene of human motion.
  • by analyzing the motion patterns of the grass and of a person, the motion of the grass is regular (for example, the swaying of grass in the wind), so after conversion into P frame images there is almost no difference in size between two consecutive P frame images, whereas human motion is irregular; consequently, when the image differences between adjacent frames in, say, 10 consecutive frames are all above the first threshold, the consecutive images are considered to contain a fast transform region caused by human motion, which is left unprocessed while the other regions are simplified (a detection sketch follows below).
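  • A minimal sketch of steps S111 and S112, operating on the encoded sizes of successive P frame images: frames larger than the second threshold are treated as containing a complex scene, and a run of N or more consecutive such frames whose adjacent size differences all exceed the first threshold is marked as containing a fast transform region. The threshold values, the run-grouping policy, and the function name are assumptions, not values fixed by the patent.

```python
def find_fast_transform_frames(frame_sizes, second_threshold, first_threshold, n):
    """Sketch of steps S111/S112.

    frame_sizes      -- encoded size (bytes) of each P frame, in chronological order
    second_threshold -- a frame larger than this is treated as containing a complex scene (S111)
    first_threshold  -- minimum size difference between adjacent complex-scene frames (S112)
    n                -- number of consecutive frames required (e.g. 10-25 for 25 fps video)
    Returns the indices of frames judged to contain a fast transform region.
    """
    # S111: frames whose encoded size exceeds the second threshold contain a complex scene
    candidates = [i for i, size in enumerate(frame_sizes) if size > second_threshold]

    fast_transform = set()
    run = [candidates[0]] if candidates else []
    for prev, cur in zip(candidates, candidates[1:]):
        adjacent = (cur == prev + 1)
        big_difference = abs(frame_sizes[cur] - frame_sizes[prev]) > first_threshold
        if adjacent and big_difference:
            run.append(cur)
        else:
            if len(run) >= n:          # S112: N or more consecutive frames, all differences large
                fast_transform.update(run)
            run = [cur]
    if len(run) >= n:
        fast_transform.update(run)
    return sorted(fast_transform)
```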
  • S113 Perform image recognition processing on the image to be processed including the fast transform region to obtain a location of the fast transform region.
  • the fast transform region is an area that does not need to be simplified, and the image to be processed needs to be processed by image recognition to obtain the position of the fast transform region in a set of images to be processed.
  • for example, the position of the fast transform region containing the body of the teacher or student can be obtained by a sharpening convolution process (a rough sketch follows below).
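  • One possible reading of the sharpening-convolution step, sketched below: sharpen two successive frames, difference them, and take the bounding box of the strongly changed pixels as the position of the fast transform region. The kernel, the threshold, and the bounding-box output format are illustrative assumptions; the patent does not specify them.

```python
import cv2
import numpy as np

SHARPEN_KERNEL = np.array([[0, -1, 0],
                           [-1, 5, -1],
                           [0, -1, 0]], dtype=np.float32)  # a common sharpening kernel (assumption)

def locate_fast_transform_region(prev_gray, cur_gray, diff_threshold=25):
    """Sharpen two consecutive grayscale frames, difference them, and return the bounding
    box (x0, y0, x1, y1) of the strongly changed area as the fast transform region."""
    sharp_prev = cv2.filter2D(prev_gray, -1, SHARPEN_KERNEL)
    sharp_cur = cv2.filter2D(cur_gray, -1, SHARPEN_KERNEL)
    diff = cv2.absdiff(sharp_cur, sharp_prev)
    mask = (diff > diff_threshold).astype(np.uint8)
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None                    # no fast transform region found in this frame pair
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```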
  • S120 Perform simplification processing on at least part of the multi-frame image located outside the fast transform area to obtain a standard streaming media file.
  • the area outside the fast transform area is a non-fast transform area, and at least part of the non-fast transform area may be simplified.
  • the simplification may be a linear simplification such as Gaussian blur, or the image data of the non-fast-transform regions in the P frame images may be at least partially deleted, which greatly reduces the bit rate of the streaming media file.
  • the simplified processing of the image to be processed is not limited to the above manner, and any simplified processing manner that can reduce the code rate of the streaming media file is within the protection scope of the present invention.
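  • A sketch of one of the simplification options named above (Gaussian blur applied outside the fast transform region); the kernel size and the rectangular region format are assumptions for illustration.

```python
import cv2

def simplify_outside_region(frame, region, kernel_size=(21, 21)):
    """Blur everything outside the fast transform region (step S120).
    `region` is an (x0, y0, x1, y1) bounding box; the kernel size is illustrative."""
    x0, y0, x1, y1 = region
    kept = frame[y0:y1, x0:x1].copy()          # preserve the region the viewer focuses on
    blurred = cv2.GaussianBlur(frame, kernel_size, 0)
    blurred[y0:y1, x0:x1] = kept               # restore the untouched fast transform region
    return blurred
```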
  • the method further includes the step of performing interpolation compensation on the fast transform region, each frame image in the multi-frame image may include a plurality of pixel points, and the fast transform region may be acquired including Multiple pixels, insert at least one pixel between any two adjacent pixels.
  • the pixel value of at least one pixel may be obtained by a uniform mean square error or the like.
  • each pixel point may be represented as x_{i,j}, and the pixel value of each pixel point x_{i,j} is f_{i,j}.
  • i and j are the row coordinate and column coordinate of the pixel point in each frame image.
  • when a pixel point x is inserted between two adjacent pixel points x_{i,j} and x_{i,j+1} (or x_{i,j} and x_{i+1,j}), the pixel value of x can be obtained as a weighted average of the pixel values of several surrounding pixel points, weighted by their distances relative to x; in the example of FIG. 5, for a point x inserted between x_{i,j} and x_{i,j+1}:
  • f_1 = (d(x, x_{i,j}) / d(x_{i,j+1}, x_{i,j})) · f_{i,j} + (d(x_{i,j+1}, x) / d(x_{i,j+1}, x_{i,j})) · f_{i,j+1}
  • f_2 = (d(x, x_{i-1,j}) / d(x_{i+1,j+1}, x_{i-1,j})) · f_{i-1,j} + (d(x_{i+1,j+1}, x) / d(x_{i+1,j+1}, x_{i-1,j})) · f_{i+1,j+1}
  • f_3 = (d(x, x_{i,j-1}) / d(x_{i,j+2}, x_{i,j-1})) · f_{i,j-1} + (d(x_{i,j+2}, x) / d(x_{i,j+2}, x_{i,j-1})) · f_{i,j+2}
  • f_4 = (d(x, x_{i-2,j-1}) / d(x_{i+2,j+2}, x_{i-2,j-1})) · f_{i-2,j-1} + (d(x_{i+2,j+2}, x) / d(x_{i+2,j+2}, x_{i-2,j-1})) · f_{i+2,j+2}
  • f = (f_1 + f_2 + f_3 + f_4) / 4
  • where f is the pixel value of the inserted pixel point x, and d(·) denotes the distance between two points.
  • the pixel values of pixel points inserted at other positions can be obtained in a similar manner by selecting several pixel points around the insertion position and substituting them into the above formulas.
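  • The formulas above, implemented directly for a pixel inserted midway between x_{i,j} and x_{i,j+1}. The straight-line distance for d(·), the midpoint insertion position, and the assumption that the insertion point is at least two pixels away from the image border are illustrative choices not fixed by the patent.

```python
import math

def interpolate_midpoint(f, i, j):
    """Pixel value for a point x inserted between f[i][j] and f[i][j+1], following the
    four direction pairs and the final averaging described in the patent (FIG. 5).
    `f` is a 2-D array of pixel values; the inserted point x sits at (i, j + 0.5)."""
    x = (i, j + 0.5)

    def pair(a, b):
        """One term f_k: distance-weighted combination of the pixel values at grid points a and b."""
        d_ab = math.dist(a, b)
        return (math.dist(x, a) / d_ab) * f[a[0]][a[1]] + (math.dist(b, x) / d_ab) * f[b[0]][b[1]]

    f1 = pair((i, j),         (i, j + 1))
    f2 = pair((i - 1, j),     (i + 1, j + 1))
    f3 = pair((i, j - 1),     (i, j + 2))
    f4 = pair((i - 2, j - 1), (i + 2, j + 2))
    return (f1 + f2 + f3 + f4) / 4.0
```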
  • the method further includes a step of brightness enhancement of the streaming media file: enhancing the brightness of the displayed picture of the streaming media file makes the video itself clearer.
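  • A minimal sketch of the brightness-enhancement step using OpenCV; the gain and offset values are illustrative assumptions, not parameters given in the patent.

```python
import cv2

def enhance_brightness(frame, gain=1.0, offset=30):
    """Simple brightness enhancement of a decoded frame."""
    # convertScaleAbs computes saturate(gain * pixel + offset) per channel
    return cv2.convertScaleAbs(frame, alpha=gain, beta=offset)
```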
  • the method may further comprise converting at least one of the at least one P-frame image into a B-frame image.
  • the B frame image only records the difference from the previous frame and the next frame.
  • because the courses uploaded by the teacher and students are captured in real time, they contain no B frame images; adding B frame images as replacements during transcoding of the streaming media file can therefore further reduce its bit rate, lowering storage cost and increasing the loading speed of the streaming media file.
  • in an optional embodiment, the audio file is further decoded and then synchronized with the simplified video file, scaled, padded with audio and video frames, and merged, with the frame rate, resolution, and audio channels unified, finally producing one unified streaming media file. For example, after the audio and video files of the teacher's class are obtained, they are processed into a single MP4 file, in which the video may be in a format such as H264 and the audio in a format such as AAC (a sketch follows below).
  • the MP4 file can be played directly on pages such as HTML5 pages, reducing the reliance on third-party software.
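  • How the final merge could look if ffmpeg is used for the muxing and transcoding step (an assumption; the patent does not name a tool). The sketch unifies frame rate, resolution, and audio channels and produces one H264 + AAC MP4 suitable for direct HTML5 playback; the concrete parameter values and file names are illustrative.

```python
import subprocess

def merge_to_mp4(video_path, audio_path, out_path="course.mp4"):
    """Mux the simplified video with the decoded audio into one MP4 (H264 video + AAC audio)."""
    subprocess.run([
        "ffmpeg", "-y",
        "-i", video_path, "-i", audio_path,
        "-c:v", "libx264", "-r", "25", "-vf", "scale=1280:720", "-bf", "2",  # allow B frames
        "-c:a", "aac", "-ac", "2",
        "-shortest", "-movflags", "+faststart",  # so the MP4 starts quickly in HTML5 pages
        out_path,
    ], check=True)
```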
  • the device includes an image encoding module 1, an image analysis module 2, and an image processing module 3, which may be organized as sketched below.
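  • An illustrative skeleton of how the three modules of the apparatus could be organized in code; the class and method names are assumptions, and the method bodies would reuse the sketches given in the method embodiment above.

```python
class ImageEncodingModule:
    """Module 1: encode the streaming media file into chronological multi-frame images."""
    def encode(self, media_path):
        raise NotImplementedError  # e.g. decode_to_yuv_frames() sketched earlier

class ImageAnalysisModule:
    """Module 2: determine the fast transform region in the multi-frame images."""
    def find_fast_transform_region(self, frames):
        raise NotImplementedError

class ImageProcessingModule:
    """Module 3: simplify regions outside the fast transform region and build the standard file."""
    def simplify(self, frames, region):
        raise NotImplementedError

class StreamingMediaTranscoder:
    """Wiring of the three modules (an illustrative sketch, not the patent's implementation)."""
    def __init__(self):
        self.encoder = ImageEncodingModule()
        self.analyzer = ImageAnalysisModule()
        self.processor = ImageProcessingModule()
```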
  • the image encoding module 1 is configured to perform image encoding on the streaming media file to obtain a multi-frame image.
  • the streaming media file may include audio files and video files uploaded by the teacher or student regarding the online course, and the video files of the teacher or student are image encoded to obtain chronological sequential multi-frame images.
  • the audio files of the teacher and the student can be unified in one file with the standard video file after the video file is simplified, which reduces the logic processing complexity of the playback page.
  • the image encoding module 1 may perform image encoding on the streaming media file using a YUV data encoding format; performing image encoding on the streaming media file yields consecutive multi-frame images, which may include I frame images located at the two ends of the multi-frame images and at least one P frame image located between the two I frame images.
  • the I frame image can display the complete image, and the P frame image only records the difference from the previous frame image to reduce the file size of the streaming media file.
  • the image analysis module 2 is configured to determine the fast transform region in the multi-frame images. Specifically, the image analysis module 2 is configured to determine a group of to-be-processed images in the multi-frame images that contain a complex scene; if the image size differences between adjacent images among N or more consecutive to-be-processed images in the group are all above a preset first threshold, determine that the N or more to-be-processed images contain a fast transform region; and perform image recognition processing on the to-be-processed images containing the fast transform region to obtain the position of the fast transform region.
  • N is a positive integer greater than or equal to 2.
  • if the image size of one frame image of the multi-frame images is greater than a preset second threshold, it is determined that the one frame image contains a complex scene.
  • for example, when the teacher or student gives an online class outdoors against a grassland background, the video contains both the movement of the grass in the background and the movement of the teacher or student; because of this motion, within a given period the P frame images obtained after encoding differ more from the preceding frame, so the size of the P frame images increases.
  • a second threshold may therefore be preset; when the image size of a P frame image grows to the second threshold, the P frame image is considered to contain a complex scene, which would cause the bit rate of the video file to rise.
  • since the human eye is insensitive to motion scenes, the textures of the complex motion scene can be simplified, which lowers the bit rate of the complex scene without degrading the user experience.
  • when determining the fast transform region, N may be selected based on the number of image frames in a fixed time. Preferably, if there are 25 frames of images per second, N may be selected as any value from 10 to 25; when the image size differences between adjacent images over 10 to 25 consecutive to-be-processed frames are all above the preset first threshold, it is determined that the plurality of to-be-processed images contain a fast transform region.
  • again taking the example of a teacher or student teaching online outdoors against a grassland background, in a group of to-be-processed images containing a complex scene, the complex scene may be caused by the movement of the grass or by the movement of a person.
  • when simplification is performed, the movement of the person is the position the user focuses on while watching and cannot be simplified, whereas the movement of the background, such as the grass, is a position the user does not pay attention to and can be simplified; it is therefore necessary to confirm whether the complex scene in the group of to-be-processed images includes a scene of human motion.
  • by analyzing the motion patterns of the grass and of a person, the motion of the grass is regular (for example, the swaying of grass in the wind), so after conversion into P frame images there is almost no difference in size between two consecutive P frame images, whereas human motion is irregular; consequently, when the image differences between adjacent frames in several consecutive frames are all above the first threshold, the consecutive images are considered to contain a fast transform region caused by human motion, which is left unprocessed while the other regions are simplified.
  • the fast transform region is an area that does not need to be simplified, and the image to be processed needs to be processed by image recognition to obtain the position of the fast transform region in a set of images to be processed.
  • the position of the fast transition region including the body part of the teacher or student can be obtained by the sharpness convolution process.
  • the image processing module 3 is configured to perform simplified processing on at least part of the multi-frame image located outside the fast transform area to obtain a standard streaming media file.
  • the area outside the fast transform area is a non-fast transform area, and at least part of the non-fast transform area may be simplified.
  • the simplification may be a linear simplification such as Gaussian blur, or the image data of the non-fast-transform regions in the P frame images may be at least partially deleted, which greatly reduces the bit rate of the streaming media file.
  • the simplified processing of the image to be processed is not limited to the above manner, and any simplified processing manner that can reduce the code rate of the streaming media file is within the protection scope of the present invention.
  • the image processing module 3 is further configured to perform interpolation compensation on the fast transform region.
  • each frame image of the multi-frame images may include a plurality of pixel points, and the image processing module 3 is configured to acquire the plurality of pixel points included in the fast transform region and insert at least one pixel point between any two adjacent pixel points.
  • the pixel value of the inserted pixel point may be obtained by a method such as a unified mean-square-deviation calculation.
  • each pixel point may be represented as x_{i,j}, and the pixel value of each pixel point x_{i,j} is f_{i,j}.
  • i and j are the row coordinate and column coordinate of the pixel point in each frame image.
  • when a pixel point x is inserted between two adjacent pixel points x_{i,j} and x_{i,j+1} (or x_{i,j} and x_{i+1,j}), the pixel value of x can be obtained as a weighted average of the pixel values of several surrounding pixel points, weighted by their distances relative to x; in the example of FIG. 5, for a point x inserted between x_{i,j} and x_{i,j+1}:
  • f_1 = (d(x, x_{i,j}) / d(x_{i,j+1}, x_{i,j})) · f_{i,j} + (d(x_{i,j+1}, x) / d(x_{i,j+1}, x_{i,j})) · f_{i,j+1}
  • f_2 = (d(x, x_{i-1,j}) / d(x_{i+1,j+1}, x_{i-1,j})) · f_{i-1,j} + (d(x_{i+1,j+1}, x) / d(x_{i+1,j+1}, x_{i-1,j})) · f_{i+1,j+1}
  • f_3 = (d(x, x_{i,j-1}) / d(x_{i,j+2}, x_{i,j-1})) · f_{i,j-1} + (d(x_{i,j+2}, x) / d(x_{i,j+2}, x_{i,j-1})) · f_{i,j+2}
  • f_4 = (d(x, x_{i-2,j-1}) / d(x_{i+2,j+2}, x_{i-2,j-1})) · f_{i-2,j-1} + (d(x_{i+2,j+2}, x) / d(x_{i+2,j+2}, x_{i-2,j-1})) · f_{i+2,j+2}
  • f = (f_1 + f_2 + f_3 + f_4) / 4
  • where f is the pixel value of the inserted pixel point x, and d(·) denotes the distance between two points.
  • the pixel values of pixel points inserted at other positions can be obtained in a similar manner by selecting several pixel points around the insertion position and substituting them into the above formulas.
  • the image processing module 3 is further configured to perform brightness enhancement on the streaming media file, and the brightness of the streaming media file display screen may be enhanced to make the video itself clearer.
  • the image processing module 3 may further convert at least one of the at least one P frame image into a B frame image.
  • the B frame image only records the difference from the previous frame and the next frame.
  • because the courses uploaded by the teacher and students are captured in real time, they contain no B frame images; adding B frame images as replacements during transcoding of the streaming media file can therefore further reduce its bit rate, lowering storage cost and increasing the loading speed of the streaming media file.
  • in an optional embodiment, the image processing module 3 further decodes the audio file and synchronizes it with the simplified video file, scales it, pads audio and video frames, and merges them, unifying the frame rate, resolution, and audio channels.
  • a single unified streaming media file is finally obtained; for example, after the audio and video files of the teacher's class are obtained, the video file and audio file are processed into an MP4 file.
  • the video file in the MP4 may be in a format such as H264, and the audio file may be in a format such as AAC.
  • the MP4 file can be played directly on pages such as HTML5 pages, reducing the reliance on third-party software.
  • some embodiments of the present invention provide a computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, it implements the method executed by the client as described above, or the method executed by the server as described above.
  • referring to FIG. 6, a block diagram of a computer device 600 suitable for implementing a terminal device or server of an embodiment of the present application is shown.
  • the computer device 600 includes a central processing unit (CPU) 601 that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603.
  • the RAM 603 also stores various programs and data required for the operation of the system 600.
  • the CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also coupled to bus 604.
  • the following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including, for example, a cathode ray tube (CRT) or liquid crystal display (LCD) and a speaker; a storage portion 608 including a hard disk or the like; and a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing via a network such as the Internet.
  • a drive 610 is also connected to the I/O interface 605 as needed.
  • a removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage portion 608 as needed.
  • an embodiment of the invention includes a computer program product comprising a computer program tangibly embodied on a machine readable medium, the computer program comprising program code for executing the method illustrated in the flowchart.
  • the computer program can be downloaded and installed from the network via communication portion 609, and/or installed from removable media 611.
  • each block of the flowcharts or block diagrams can represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function.
  • it should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that shown in the drawings; for example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functionality involved.
  • each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The present invention discloses a streaming media playback server, comprising: a database storing related information and storage addresses of streaming media files; an information forwarding module configured to receive a streaming media playback instruction sent by a client; an application processing module configured to receive the streaming media playback instruction and authenticate its legitimacy; and a plurality of service processing modules configured to receive the legitimate streaming media playback instruction, return a streaming media playback address when related information of the streaming media file exists in the database so that the client builds a playback page, and further retrieve the storage address of the corresponding streaming media file from the database and return it to the client so that the client obtains the streaming media file and loads it into the playback page. The present invention also discloses a client. The present invention reduces the logical complexity of playing back multiple streaming media files, reduces stuttering and errors on the playback page, and improves the response speed and interactive experience of the streaming media playback page.

Description

一种流媒体转码方法、装置、计算机设备及可读介质
本申请要求了2018年4月9日提交的、申请号为201810312958.2、发明名称为“一种流媒体转码方法、装置、计算机设备及可读介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本发明涉及在线教育领域。更具体地,涉及一种流媒体转码方法、装置、计算机设备及可读介质。
背景技术
近年来,随着在线教育行业的兴起,老师和学生的远距离网络教学成为趋势,为了便于家长及在线教育机构对网络教学的课程情况进行监督和评价,通常会保存网络教学的音频、视频和课件等媒体文件并向家长或在线教育机构相关部门提供回放功能,使用户可以随时随地回放观看老师和学生的上课情况。
然而,目前的课程回放页面分别保存老师和学生上课的课件以及多个音、视频文件,在回放时,回放页面需同时加载多个文件,多个文件间的逻辑关系复杂,且常规的音视频文件较大,易导致音视频文件加载和拖动速度慢、音画不同步,拖动卡顿感较强,播放的时候对于瞬间的带宽要求高,且回放页面对于老师和学生的音视频文件的加载需要依赖于FLASH和内容分发网络(Content Delivery Network,CDN)等第三方软件的解码和传输,第三方软件的升级易产生不兼容问题,导致回放页面无法播放。
发明内容
为了解决以上问题的至少之一,本发明的一个目的在于提供一种流媒体转码方法,对用户敏感度不高的区域进行简化处理以减少流媒体文件的大小,提高回放页面流媒体文件的加载速度,本发明的另一个目的在于提供一种流媒体转码装置,本发明的再一个目的在于提供一种计算机设备,本发明的还一个目的在于提供计算机可读介质。
为达到上述目的,本发明采用下述技术方案:
根据本发明的一个方面,提供一种流媒体转码方法,包括
对流媒体文件进行图像编码得到多帧图像;
确定所述多帧图像中的快速变换区域;
对所述多帧图像中位于所述快速变换区域之外的至少部分区域进行简化处理得到标准流媒体文件。
优选地,对流媒体文件进行图像编码得到多帧图像具体包括:
对所述流媒体文件进行图像编码得到连续多帧图像,所述多帧图像包括分别位于多帧图像两端的I帧图像以及位于两帧I帧图像间的至少一帧P帧图像。
优选地,所述方法进一步包括将所述至少一个P帧图像中的至少一个转换为B帧图像。
优选地,确定所述多帧图像中的快速变换区域具体包括:
确定所述多帧图像中包括复杂场景的一组待处理图像;
响应于所述一组待处理图像中连续N个以上待处理图像中相邻两个之间的图像大小差异均在预设第一阈值以上,确定所述N个以上待处理图像中包括快速变换区域,N为大于等于2的正整数;
对包括快速变换区域的待处理图像进行图像识别处理得到所述快速变换区域的位置。
优选地,确定所述多帧图像中包括复杂场景的一组待处理图像包括:
响应于所述多帧图像中的一帧图像的图像大小大于预设第二阈值,确定所述一帧图像中包括复杂场景。
优选地,所述方法进一步包括对所述快速变换区域进行插值补偿。
优选地,所述多帧图像中的每一帧图像包括多个像素点;
对所述快速变换区域进行插值补偿具体包括:
获取所述快速变换区域包括的多个像素点,在任意两个相邻的像素点间插入至少一个像素点。
根据本发明的另一方面,还公开了一种流媒体转码装置,包括
图像编码模块,用于对流媒体文件进行图像编码得到多帧图像;
图像分析模块,用于确定所述多帧图像中的快速变换区域;
图像处理模块,用于对所述多帧图像中位于所述快速变换区域之外的至少部分区域进行简化处理得到标准流媒体文件。
优选地,所述图像编码模块用于
对所述流媒体文件进行图像编码得到连续多帧图像,所述多帧图像包括分别位于多帧图像两端的I帧图像以及位于两帧I帧图像间的至少一帧P帧图像。
优选地,所述图像处理模块进一步用于:
将所述至少一个P帧图像中的至少一个转换为B帧图像。
优选地,所述图像分析模块进一步用于:
确定所述多帧图像中包括复杂场景的一组待处理图像;
若所述一组待处理图像中连续N个以上待处理图像中相邻两个之间的图像大小差异均在预设第一阈值以上,则确定所述N个以上待处理图像中包括快速变换区域,N为大于等于2的正整数;
对包括快速变换区域的待处理图像进行图像识别处理得到所述快速变换区域的位置。
优选地,
所述图像分析模块进一步用于响应于所述多帧图像中的一帧图像的图像大小大于预设第二阈值,确定所述一帧图像中包括复杂场景。
优选地,所述图像处理模块进一步用于对所述快速变换区域进行插值补偿。
优选地,
所述多帧图像中的每一帧图像包括多个像素点;
所述图像处理模块进一步用于获取所述快速变换区域包括的多个像素点,在任意两个相邻的像素点间插入至少一个像素点。
根据本发明的还一个方面,提供一种计算机设备,包括存储器、处理器以及存储在存储器上并可在处理器上运行的计算机程序,
所述处理器执行所述程序时实现如上所述方法。
根据本发明的还一个方面,提供一种计算机可读介质,其上存储有计算机程序,
该程序被处理器执行时实现如上所述方法。
本发明的有益效果如下:
本发明将流媒体文件进行图像编码转换为按时间顺序排列的多帧连续图像,进一步确定多帧图像中的快速变换区域,该快速变换区域通常为老师或学生上课时的动作产生的,老师或学生为用户的重点观看区域,但是,在多帧图像中除了快速变换区域的非快速变换区域,通常为上课时课程背景(如草地等)的变化,这些非快速变换区域不是用户观看时的重点关注区域,对所述多帧图像中位于所述快速变换区域之外的至少部分区域进行简化处理得到标准流媒体文件,可减少流媒体文件的存储大小,降低流媒体文件的码率,提高回放页面的流媒体文件加载速度,达到秒开、秒拖的效果。
附图说明
下面结合附图对本发明的具体实施方式作进一步详细的说明。
图1示出现有技术中一个回放页面的示意图;
图2示出本发明一种流媒体转码方法一个具体实施例的流程图;
图3示出本发明一种流媒体转码方法一个具体实施例确定所述多帧图像中的快速变换区域的流程图;
图4示出本发明一种流媒体转码装置一个具体实施例的结构示意图;
图5示出本发明一种流媒体转码方法及装置具体实施例中插值补偿的示意图;
图6示出适用于用来实现本发明实施例的终端设备或服务器的计算机设备的结构示意图。
具体实施方式
为了更清楚地说明本发明,下面结合优选实施例和附图对本发明做进一步的说明。 附图中相似的部件以相同的附图标记进行表示。本领域技术人员应当理解,下面所具体描述的内容是说明性的而非限制性的,不应以此限制本发明的保护范围。
近年来,随着在线教育行业的兴起,老师和学生的远距离网络教学成为趋势,为了便于家长及在线教育机构对网络教学的课程情况进行监督和评价,通常会保存网络教学的音频、视频和课件等媒体文件并向家长或在线教育机构相关部门提供回放功能,使用户可以随时随地回放观看老师和学生的上课情况。
图1示出了一个课程回放页面,回放页面向用户展示老师视频、学生视频以及信令操作,其中,信令操作包括上课所用的课件以及针对课件的相关操作。但是,目前的课程回放页面在回放时,需同时加载老师和学生的音频、视频、信令操作等多个文件,多个文件间的逻辑关系复杂,且学生和老师上传的常规音频和视频文件较大,易导致回放页面中的音视频文件加载和拖动速度慢、音画不同步,拖动卡顿感较强,播放的时候对于瞬间的带宽要求高,且学生和老师上传的音频、视频文件的格式不统一,往往需要依赖于FLASH和CDN等第三方软件的解码和传输,第三方软件的升级易产生不兼容问题,导致回放页面的音视或视频无法播放。
为了解决以上问题的至少之一,基于本发明的一个方面,如图2所示,公开了一种流媒体转码方法的一个具体实施例,该方法10包括:
S100:对流媒体文件进行图像编码得到多帧图像。在具体实施例中,流媒体文件可为老师或学生上传的关于在线课程的音频文件和视频文件,将老师或学生的视频文件进行图像编码得到按时间顺序的连续多帧图像。
具体的,可采用亮度和色差等图像编码模式(例如YUV)对流媒体文件进行图像编码,对流媒体文件进行图像编码得到连续的多帧图像可包括分别位于多帧图像两端的I帧图像以及位于两帧I帧图像间的至少一帧P帧图像。其中,I帧图像可显示完整图像,而P帧图像只记载了与前一帧图像的不同之处,以减小流媒体文件的文件大小。
S110:确定所述多帧图像中的快速变换区域。
具体的,如图3所示,S110可包括:
S111:确定所述多帧图像中包括复杂场景的一组待处理图像。
在优选地实施方式中,若所述多帧图像中的一帧图像的图像大小大于预设第二阈值,则确定所述一帧图像中包括复杂场景。
例如,在具体实施例中,当老师或学生在以草地为背景的户外进行在线授课时, 老师和学生的视频中包括了背景中草的运动和老师或学生的人的运动,由于存在草或人的运动,一定时间内,图像编码后得到的多帧图像中,P帧图像相对于前一帧图像的区别较多,导致P帧图像的大小增大,可预设一个第二阈值,当P帧图像的图像大小增大到第二阈值时,认为该P帧图像中包括复杂场景,该复杂场景会导致视频文件的码率上升,由于人眼对于运动场景的不敏感性,可对复杂运动场景进行多纹理的简单化处理,降低了这复杂场景的码率,同时又不减少用户的体验。
S112:若所述一组待处理图像中连续N个以上待处理图像中相邻两个之间的图像大小差异均在预设第一阈值以上,则确定所述N个以上待处理图像中包括快速变换区域,N为大于等于2的正整数。其中,N可根据固定时间内的图像帧数选定。
在具体实施例中,若1秒内的图像为25帧时,可选择N为10~25中的任意一个值,当连续10~25帧的待处理图像中相邻两个之间的图像大小差异均在预设第一阈值以上,则确定所述多个待处理图像中包括快速变换区域。
例如,仍以老师或学生在以草地为背景的户外进行在线授课为例,在包括复杂场景的一组待处理图像中,复杂场景可以是草的运动导致的复杂场景,也可以是人的运动导致的复杂场景,在进行简化处理时,人的运动是用户观看的关注位置,不能够进行简化处理,而对于草等背景的运动是用户不关注的位置,可以进行简化处理。因此,需要确认一组待处理图像中的复杂场景是否包括人的运动的场景,通过分析草的运动规律和人的运动规律,草的运动是有规律性的,例如草在风吹时的摇摆是有规律的,因此在转化为P帧图像后,前后两帧P帧图像的大小几乎没有差异,而人的运动是没有规律的,因此,可确定在连续的10帧中,相邻两帧的图像差异均在第一阈值以上时,则认为连续多张图像时包括快速变换区域,这些快速变换区域为人的运动导致的,在实际处理中,对这部分快速变换区域可以不做处理,而对其他区域进行简化处理。
S113:对包括快速变换区域的待处理图像进行图像识别处理得到所述快速变换区域的位置。在确定待处理图像中包括快速变换区域后,快速变换区域为无需简化处理的区域,需要通过图像识别处理待处理图像,得到一组待处理图像中的快速变换区域的位置。例如,可通过锐度卷积处理得到包括老师或学生的身体部分的快速变换区域的位置。
S120:对所述多帧图像中位于所述快速变换区域之外的至少部分区域进行简化处理得到标准流媒体文件。其中,快速变换区域之外的区域为非快速变换区域,可对非快速变换区域的至少部分进行简化处理,简化处理的方式可以是线性简化处理,例如 高斯模糊,也可以将P帧图像中的非快速变换区域的图像数据至少部分删除,从而大大减少了流媒体文件的码率。当然,对于待处理图像的简化处理并不限定以上方式,只要可以降低流媒体文件的码率的简化处理方式均在本发明的保护范围内。
在一个优选实施方式中,所述方法进一步包括对所述快速变换区域进行插值补偿的步骤,所述多帧图像中的每一帧图像可包括多个像素点,可获取所述快速变换区域包括的多个像素点,在任意两个相邻的像素点间插入至少一个像素点。其中,至少一个像素点的像素值可通过统一均方差等方法得到。通过在快速变换区域插入至少一个像素点,提高视频缩放时的显示效果,使视频显示更细腻,缩放不失真,同时,只对快速变换区域进行插值处理,提高了对用户重点关注区域的图像显示效果,同时减少了插值处理的范围。
具体的,若每一帧图像中包括i行、j列像素点,则各像素点可表示为x i,j,各像素点x i,j的像素值为f i,j。其中,i,j分别为每一帧图像中各像素点的行坐标和列坐标。当在两个相邻的像素点x i,j和x i,j+1或x i,j和x i+1,j间插入一个像素点x时,x的像素值可通过x周围多个像素点分别相对于x的距离权重与各像素点的像素值加权平均得到。
如图5所示,在一个具体实施例中,在两个相邻的像素点x i,j和x i,j+1间插入一个像素点x时,x的像素值可通过下述公式求得:
f_1 = (d(x, x_{i,j}) / d(x_{i,j+1}, x_{i,j}))·f_{i,j} + (d(x_{i,j+1}, x) / d(x_{i,j+1}, x_{i,j}))·f_{i,j+1}
f_2 = (d(x, x_{i-1,j}) / d(x_{i+1,j+1}, x_{i-1,j}))·f_{i-1,j} + (d(x_{i+1,j+1}, x) / d(x_{i+1,j+1}, x_{i-1,j}))·f_{i+1,j+1}
f_3 = (d(x, x_{i,j-1}) / d(x_{i,j+2}, x_{i,j-1}))·f_{i,j-1} + (d(x_{i,j+2}, x) / d(x_{i,j+2}, x_{i,j-1}))·f_{i,j+2}
f_4 = (d(x, x_{i-2,j-1}) / d(x_{i+2,j+2}, x_{i-2,j-1}))·f_{i-2,j-1} + (d(x_{i+2,j+2}, x) / d(x_{i+2,j+2}, x_{i-2,j-1}))·f_{i+2,j+2}
f = (f_1 + f_2 + f_3 + f_4) / 4
其中,f为插入的像素点x的像素值,d()为求距离运算。相应地,其他位置插入的像素点的像素值可通过类似的方式选取插入像素点周围的多个像素点并代入上述公式求得。
在一个优选实施方式中,所述方法进一步包括对所述流媒体文件进行亮度增强的步骤,通过对流媒体文件显示画面进行亮度增强,可使视频本身更加清晰。
在一个优选实施例中,所述方法进一步还可包括将所述至少一个P帧图像中的至少一个转换为B帧图像。其中,B帧图像只记载了与前一帧和后一帧的区别之处。在老师和学生上传的课程中,由于课程的实时性没有B帧图像,在流媒体文件的转码过程中替换加入B帧图像,可进一步降低流媒体文件的码率,降低存储成本并提高流媒 体文件的加载速度。
在可选实施例中,进一步还将音频文件解码处理并与简化处理后的视频文件进行同步、缩放、补音频和视频帧、合并、统一帧率、分辨率和声道,最后得到一个统一的流媒体文件,例如当获取了老师的上课的音频文件和视频文件后,对视频文件和音频文件进行处理得到一个MP4文件,MP4中的视频文件可为H264等格式,音频文件可为AAC等格式。该MP4文件可在HTML5页面等页面上直接播放,从而降低对第三方软件的依赖。
根据本发明的另一个方面,如图4所求,公开了一种流媒体转码装置的一个具体实施例,本实施例中,该装置包括图像编码模块1、图像分析模块2和图像处理模块3。
其中,图像编码模块1用于对流媒体文件进行图像编码得到多帧图像。在具体实施例中,流媒体文件可包括老师或学生上传的关于在线课程的音频文件和视频文件,将老师或学生的视频文件进行图像编码得到按时间顺序的连续多帧图像。其中,老师和学生的音频文件可在视频文件简化处理后与标准视频文件统一在一个文件中,减少回放页面的逻辑处理复杂度。
具体的,所述图像编码模块1可采用YUV数据编码形式对流媒体文件进行图像编码,对流媒体文件进行图像编码得到连续的多帧图像可包括分别位于多帧图像两端的I帧图像以及位于两帧I帧图像间的至少一帧P帧图像。其中,I帧图像可显示完整图像,而P帧图像只记载了与前一帧图像的不同之处,以减小流媒体文件的文件大小。
图像分析模块2用于确定所述多帧图像中的快速变换区域。具体的,所述图像分析模块2可用于确定所述多帧图像中包括复杂场景的一组待处理图像,若所述一组待处理图像中连续N个以上待处理图像中相邻两个之间的图像大小差异均在预设第一阈值以上,则确定所述N个以上待处理图像中包括快速变换区域,并对包括快速变换区域的待处理图像进行图像识别处理得到所述快速变换区域的位置。其中,N为大于等于2的正整数。
在优选地实施方式中,若所述多帧图像中的一帧图像的图像大小大于预设第二阈值,则确定所述一帧图像中包括复杂场景。
例如,在具体实施例中,当老师或学生在以草地为背景的户外进行在线授课时,老师和学生的视频中包括了背景中草的运动和老师或学生的人的运动,由于存在草或人的运动,一定时间内,图像编码后得到的多帧图像中,P帧图像相对于前一帧图像 的区别较多,导致P帧图像的大小增大,可预设一个第二阈值,当P帧图像的图像大小增大到第二阈值时,认为该P帧图像中包括复杂场景,该复杂场景会导致视频文件的码率上升,由于人眼对于运动场景的不敏感性,可对复杂运动场景进行多纹理的简单化处理,降低了这复杂场景的码率,同时又不减少用户的体验。
在具体实施例中,确定快速变换区域时,N可根据固定时间内的图像帧数选定。优选地,若1秒内的图像为25帧时,可选择N为10~25中的任意一个值,当连续10~25帧的待处理图像中相邻两个之间的图像大小差异均在预设第一阈值以上,则确定所述多个待处理图像中包括快速变换区域。
例如,仍以老师或学生在以草地为背景的户外进行在线授课为例,在包括复杂场景的一组待处理图像中,复杂场景可以是草的运动导致的复杂场景,也可以是人的运动导致的复杂场景,在进行简化处理时,人的运动是用户观看的关注位置,不能够进行简化处理,而对于草等背景的运动是用户不关注的位置,可以进行简化处理。因此,需要确认一组待处理图像中的复杂场景是否包括人的运动的场景,通过分析草的运动规律和人的运动规律,草的运动是有规律性的,例如草在风吹时的摇摆是有规律的,因此在转化为P帧图像后,前后两帧P帧图像的大小几乎没有差异,而人的运动是没有规律的,因此,可确定在连续的10帧中,相邻两帧的图像差异均在第一阈值以上时,则认为连续多张图像时包括快速变换区域,这些快速变换区域为人的运动导致的,在实际处理中,对这部分快速变换区域可以不做处理,而对其他区域进行简化处理。
在确定待处理图像中包括快速变换区域后,快速变换区域为无需简化处理的区域,需要通过图像识别处理待处理图像,得到一组待处理图像中的快速变换区域的位置。例如,可通过锐度卷积处理得到包括老师或学生的身体部分的快速变换区域的位置。
图像处理模块3用于对所述多帧图像中位于所述快速变换区域之外的至少部分区域进行简化处理得到标准流媒体文件。其中,快速变换区域之外的区域为非快速变换区域,可对非快速变换区域的至少部分进行简化处理,简化处理的方式可以是线性简化处理,例如高斯模糊,也可以将P帧图像中的非快速变换区域的图像数据至少部分删除,从而大大减少了流媒体文件的码率。当然,对于待处理图像的简化处理并不限定以上方式,只要可以降低流媒体文件的码率的简化处理方式均在本发明的保护范围内。
在一个优选实施方式中,所述图像处理模块3进一步用于对所述快速变换区域进行插值补偿。所述多帧图像中的每一帧图像可包括多个像素点,所述图像处理模块3 用于获取所述快速变换区域包括的多个像素点,在任意两个相邻的像素点间插入至少一个像素点。其中,至少一个像素点的像素值可通过统一均方差等方法得到。通过在快速变换区域插入至少一个像素点,提高视频缩放时的显示效果,使视频显示更细腻,缩放不失真,同时,只对快速变换区域进行插值处理,提高了对用户重点关注区域的图像显示效果,同时减少了插值处理的范围。
具体的,若每一帧图像中包括i行、j列像素点,则各像素点可表示为x i,j,各像素点x i,j的像素值为f i,j。其中,i,j分别为每一帧图像中各像素点的行坐标和列坐标。当在两个相邻的像素点x i,j和x i,j+1或x i,j和x i+1,j间插入一个像素点x时,x的像素值可通过x周围多个像素点分别相对于x的距离权重与各像素点的像素值加权平均得到。
如图5所示,在一个具体实施例中,在两个相邻的像素点x i,j和x i,j+1间插入一个像素点x时,x的像素值可通过下述公式求得:
f_1 = (d(x, x_{i,j}) / d(x_{i,j+1}, x_{i,j}))·f_{i,j} + (d(x_{i,j+1}, x) / d(x_{i,j+1}, x_{i,j}))·f_{i,j+1}
f_2 = (d(x, x_{i-1,j}) / d(x_{i+1,j+1}, x_{i-1,j}))·f_{i-1,j} + (d(x_{i+1,j+1}, x) / d(x_{i+1,j+1}, x_{i-1,j}))·f_{i+1,j+1}
f_3 = (d(x, x_{i,j-1}) / d(x_{i,j+2}, x_{i,j-1}))·f_{i,j-1} + (d(x_{i,j+2}, x) / d(x_{i,j+2}, x_{i,j-1}))·f_{i,j+2}
f_4 = (d(x, x_{i-2,j-1}) / d(x_{i+2,j+2}, x_{i-2,j-1}))·f_{i-2,j-1} + (d(x_{i+2,j+2}, x) / d(x_{i+2,j+2}, x_{i-2,j-1}))·f_{i+2,j+2}
f = (f_1 + f_2 + f_3 + f_4) / 4
其中,f为插入的像素点x的像素值,d()为求距离运算。相应地,其他位置插入的像素点的像素值可通过类似的方式选取插入像素点周围的多个像素点并代入上述公式求得。
在一个优选实施方式中,所述图像处理模块3进一步还用于对所述流媒体文件进行亮度增强的步骤,通过对流媒体文件显示画面进行亮度增强,可使视频本身更加清晰。
在一个优选实施例中,所述图像处理模块3进一步还可将所述至少一个P帧图像中的至少一个转换为B帧图像。其中,B帧图像只记载了与前一帧和后一帧的区别之处。在老师和学生上传的课程中,由于课程的实时性没有B帧图像,在流媒体文件的转码过程中替换加入B帧图像,可进一步降低流媒体文件的码率,降低存储成本并提高流媒体文件的加载速度。
在可选实施例中,所述图像处理模块3进一步还将音频文件解码处理并与简化处理后的视频文件进行同步、缩放、补音频和视频帧、合并、统一帧率、分辨率和声道,最后得到一个统一的流媒体文件,例如当获取了老师的上课的音频文件和视频文件后, 对视频文件和音频文件进行处理得到一个MP4文件,MP4中的视频文件可为H264等格式,音频文件可为AAC等格式。该MP4文件可在HTML5页面等页面上直接播放,从而降低对第三方软件的依赖。
进一步的,本发明的一些具体实施例提供一种计算机设备,包括存储器、处理器以及存储在存储器上并可在处理器上运行的计算机程序,所述处理器执行所述程序时实现如上所述的由客户端执行的方法,或者,所述处理器执行所述程序时实现如上所述的由服务器执行的方法。
下面参考图6,其示出了适于用来实现本申请实施例的终端设备或服务器的计算机设备600的结构示意图。
如图6所示,计算机设备600包括中央处理单元(CPU)601,其可以根据存储在只读存储器(ROM)602中的程序或者从存储部分608加载到随机访问存储器(RAM))603中的程序而执行各种适当的工作和处理。在RAM603中,还存储有系统600操作所需的各种程序和数据。CPU601、ROM602、以及RAM603通过总线604彼此相连。输入/输出(I/O)接口605也连接至总线604。
以下部件连接至I/O接口605:包括键盘、鼠标等的输入部分606;包括诸如阴极射线管(CRT)、液晶显示器(LCD)等以及扬声器等的输出部分607;包括硬盘等的存储部分608;以及包括诸如LAN卡,调制解调器等的网络接口卡的通信部分609。通信部分609经由诸如因特网的网络执行通信处理。驱动器610也根据需要连接至I/O接口606。可拆卸介质611,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器610上,以便于从其上读出的计算机程序根据需要被安装如存储部分608。
特别地,根据本发明的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本发明的实施例包括一种计算机程序产品,其包括有形地包含在机器可读介质上的计算机程序,所述计算机程序包括用于执行流程图所示的方法的程序代码。在这样的实施例中,该计算机程序可以通过通信部分609从网络上被下载和安装,和/或从可拆卸介质611被安装。
附图中的流程图和框图,图示了按照本发明各种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分,所述模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替 换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发送。例如两个接连地表示的方框实际上可以基本并行地执行,他们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
显然,本发明的上述实施例仅仅是为清楚地说明本发明所作的举例,而并非是对本发明的实施方式的限定,对于所属领域的普通技术人员来说,在上述说明的基础上还可以做出其它不同形式的变化或变动,这里无法对所有的实施方式予以穷举,凡是属于本发明的技术方案所引伸出的显而易见的变化或变动仍处于本发明的保护范围之列。

Claims (16)

  1. A streaming media transcoding method, characterized by comprising:
    performing image encoding on a streaming media file to obtain multi-frame images;
    determining a fast transform region in the multi-frame images;
    performing simplification processing on at least a partial region of the multi-frame images that lies outside the fast transform region, to obtain a standard streaming media file.
  2. The streaming media transcoding method according to claim 1, characterized in that performing image encoding on the streaming media file to obtain multi-frame images specifically comprises:
    performing image encoding on the streaming media file to obtain consecutive multi-frame images, the multi-frame images comprising I frame images respectively located at the two ends of the multi-frame images and at least one P frame image located between the two I frame images.
  3. The streaming media transcoding method according to claim 2, characterized in that the method further comprises:
    converting at least one of the at least one P frame image into a B frame image.
  4. The streaming media transcoding method according to claim 1, characterized in that determining the fast transform region in the multi-frame images specifically comprises:
    determining a group of to-be-processed images in the multi-frame images that contain a complex scene;
    in response to the image size differences between every two adjacent images among N or more consecutive to-be-processed images in the group of to-be-processed images all being above a preset first threshold, determining that the N or more to-be-processed images contain a fast transform region, N being a positive integer greater than or equal to 2;
    performing image recognition processing on the to-be-processed images containing the fast transform region to obtain the position of the fast transform region.
  5. The streaming media transcoding method according to claim 4, characterized in that determining a group of to-be-processed images in the multi-frame images that contain a complex scene comprises:
    in response to the image size of one frame image of the multi-frame images being greater than a preset second threshold, determining that the one frame image contains a complex scene.
  6. The streaming media transcoding method according to claim 1, characterized in that the method further comprises:
    performing interpolation compensation on the fast transform region.
  7. The streaming media transcoding method according to claim 6, characterized in that
    each frame image of the multi-frame images comprises a plurality of pixel points;
    performing interpolation compensation on the fast transform region specifically comprises:
    acquiring the plurality of pixel points included in the fast transform region, and inserting at least one pixel point between any two adjacent pixel points.
  8. A streaming media transcoding apparatus, characterized by comprising:
    an image encoding module configured to perform image encoding on a streaming media file to obtain multi-frame images;
    an image analysis module configured to determine a fast transform region in the multi-frame images;
    an image processing module configured to perform simplification processing on at least a partial region of the multi-frame images that lies outside the fast transform region, to obtain a standard streaming media file.
  9. The streaming media transcoding apparatus according to claim 8, characterized in that the image encoding module is configured to
    perform image encoding on the streaming media file to obtain consecutive multi-frame images, the multi-frame images comprising I frame images respectively located at the two ends of the multi-frame images and at least one P frame image located between the two I frame images.
  10. The streaming media transcoding apparatus according to claim 9, characterized in that the image processing module is further configured to:
    convert at least one of the at least one P frame image into a B frame image.
  11. The streaming media transcoding apparatus according to claim 8, characterized in that the image analysis module is further configured to:
    determine a group of to-be-processed images in the multi-frame images that contain a complex scene;
    in response to the image size differences between every two adjacent images among N or more consecutive to-be-processed images in the group of to-be-processed images all being above a preset first threshold, determine that the N or more to-be-processed images contain a fast transform region, N being a positive integer greater than or equal to 2;
    perform image recognition processing on the to-be-processed images containing the fast transform region to obtain the position of the fast transform region.
  12. The streaming media transcoding apparatus according to claim 11, characterized in that
    the image analysis module is further configured to, in response to the image size of one frame image of the multi-frame images being greater than a preset second threshold, determine that the one frame image contains a complex scene.
  13. The streaming media transcoding apparatus according to claim 8, characterized in that the image processing module is further configured to perform interpolation compensation on the fast transform region.
  14. The streaming media transcoding apparatus according to claim 13, characterized in that
    each frame image of the multi-frame images comprises a plurality of pixel points;
    the image processing module is further configured to acquire the plurality of pixel points included in the fast transform region and insert at least one pixel point between any two adjacent pixel points.
  15. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that
    the processor implements the method according to any one of claims 1-7 when executing the program.
  16. A computer readable medium on which a computer program is stored, characterized in that
    the program, when executed by a processor, implements the method according to any one of claims 1-7.
PCT/CN2019/076993 2018-04-09 2019-03-05 Streaming media transcoding method and apparatus, computer device, and readable medium WO2019196573A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810312958.2 2018-04-09
CN201810312958.2A CN108419095A (zh) 2018-04-09 2018-04-09 一种流媒体转码方法、装置、计算机设备及可读介质

Publications (1)

Publication Number Publication Date
WO2019196573A1 true WO2019196573A1 (zh) 2019-10-17

Family

ID=63134909

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/076993 WO2019196573A1 (zh) 2018-04-09 2019-03-05 一种流媒体转码方法、装置、计算机设备及可读介质

Country Status (2)

Country Link
CN (1) CN108419095A (zh)
WO (1) WO2019196573A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108419095A (zh) * 2018-04-09 2018-08-17 北京大米科技有限公司 一种流媒体转码方法、装置、计算机设备及可读介质

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10225301B2 (en) * 2015-08-26 2019-03-05 Zhan Ma Method and apparatus for use of input messages in media transport to support interactive communications

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102665077A (zh) * 2012-05-03 2012-09-12 北京大学 一种基于宏块分类的快速高效编转码方法
CN105992018A (zh) * 2015-02-11 2016-10-05 阿里巴巴集团控股有限公司 流媒体转码方法和装置
CN108419095A (zh) * 2018-04-09 2018-08-17 北京大米科技有限公司 一种流媒体转码方法、装置、计算机设备及可读介质

Also Published As

Publication number Publication date
CN108419095A (zh) 2018-08-17

Similar Documents

Publication Publication Date Title
US10368123B2 (en) Information pushing method, terminal and server
CN107534796B (zh) 视频处理系统和数字视频分发系统
WO2021068598A1 (zh) 共享屏幕的编码方法、装置、存储介质及电子设备
CN114025219B (zh) 增强现实特效的渲染方法、装置、介质及设备
CN110868625A (zh) 一种视频播放方法、装置、电子设备及存储介质
CN110827380B (zh) 图像的渲染方法、装置、电子设备及计算机可读介质
WO2019119854A1 (zh) 一种调整视频播放清晰度的方法和系统
WO2021227704A1 (zh) 图像识别方法、视频播放方法、相关设备及介质
CN113452944B (zh) 一种云手机的画面显示方法
CN113784118A (zh) 视频质量评估方法及装置、电子设备和存储介质
WO2017024901A1 (zh) 视频转码方法和装置
CN111524110B (zh) 视频质量的评价模型构建方法、评价方法及装置
CN111343503B (zh) 视频的转码方法、装置、电子设备及存储介质
Zhao et al. Laddernet: Knowledge transfer based viewpoint prediction in 360◦ video
WO2019196573A1 (zh) 一种流媒体转码方法、装置、计算机设备及可读介质
CN113452996A (zh) 一种视频编码、解码方法及装置
US20240040147A1 (en) Data processing method and apparatus, computer device, and storage medium
US20220327663A1 (en) Video Super-Resolution using Deep Neural Networks
CN111818338B (zh) 一种异常显示检测方法、装置、设备及介质
CN115834952A (zh) 基于视觉感知的视频帧率检测方法及装置
US11908340B2 (en) Magnification enhancement of video for visually impaired viewers
US11622118B2 (en) Determination of coding modes for video content using order of potential coding modes and block classification
CN112732381B (zh) 一种在线课堂的桌面数据采集方法及系统
CN112492375B (zh) 视频处理方法、存储介质、电子设备及视频直播系统
US20230071585A1 (en) Video compression and streaming

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19785526

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19785526

Country of ref document: EP

Kind code of ref document: A1