CN111526363A - Encoding method and apparatus, terminal and storage medium - Google Patents

Publication number
CN111526363A
Authority
CN
China
Prior art keywords
frame
video image
image
video
screen coding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010240984.6A
Other languages
Chinese (zh)
Inventor
黎凌宇
王悦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN202010240984.6A
Publication of CN111526363A
Priority to PCT/CN2021/079799 (WO2021196994A1)
Current legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 Incoming video signal characteristics or properties
    • H04N19/137 Motion inside a coding unit, e.g. average field, frame or block difference
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/177 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a group of pictures [GOP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present disclosure provides a method and apparatus for encoding, a terminal, and a storage medium. The coding method comprises the following steps: acquiring an Nth frame of video image in a video; detecting the type of an Nth frame of video image, starting a screen coding tool for at least one frame of video image after the Nth frame of video image when the type of the Nth frame of video image is the key frame, and coding at least one frame of video image based on the screen coding tool; counting the number of image blocks adopting a screen coding mode in each frame of video image in at least one frame of video image to obtain a statistical result; and determining whether to use the screen coding tool for a continuous non-key frame video image after the at least one frame video image before the next key frame comes according to the statistical result. The coding method disclosed by the invention can adaptively determine whether to open the screen coding tool.

Description

Encoding method and apparatus, terminal and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a coding method and apparatus, a terminal, and a storage medium.
Background
Screen video content is captured directly from the image display of a terminal such as a computer or mobile phone, and mainly comprises computer graphics, text documents, mixed images of natural video with graphics and text, computer-generated images, and the like. Screen video has broad application prospects in fields such as desktop sharing, video conferencing, online education, and cloud gaming.
Disclosure of Invention
To solve the existing problems, the present disclosure provides a coding method and apparatus, a terminal, and a storage medium.
The present disclosure adopts the following technical solutions.
In some embodiments, the present disclosure provides a method of encoding, comprising:
acquiring an Nth frame of video image in a video; wherein N is a positive integer;
detecting the type of the Nth frame video image; wherein the types include key frames and non-key frames;
when the type of the Nth frame video image is the key frame, starting a screen coding tool for at least one frame video image after the Nth frame video image, and coding the at least one frame video image based on the screen coding tool;
counting the number of image blocks adopting a screen coding mode in each frame of video image in the at least one frame of video image to obtain a statistical result; and
determining, according to the statistical result, whether to use the screen coding tool for the consecutive non-key frame video images that follow the at least one frame video image and precede the next key frame.
In some embodiments, the present disclosure provides an apparatus for encoding, comprising:
the acquisition module is used for acquiring the Nth frame of video image in the video; wherein N is a positive integer;
the detection module is used for detecting the type of the Nth frame of video image; wherein the types include key frames and non-key frames;
an operation module, configured to, when the type of the Nth frame video image is the key frame, start a screen coding tool for at least one frame video image subsequent to the Nth frame video image, and code the at least one frame video image based on the screen coding tool;
a statistical module, configured to count the number of image blocks adopting a screen coding mode in each frame of video image in the at least one frame of video image to obtain a statistical result; and
a determining module, configured to determine, according to the statistical result, whether to use the screen coding tool for the consecutive non-key frame video images that follow the at least one frame video image and precede the next key frame.
In some embodiments, the present disclosure provides a terminal comprising: at least one memory and at least one processor;
the memory is used for storing program codes, and the processor is used for calling the program codes stored in the memory to execute the method.
In some embodiments, the present disclosure provides a storage medium for storing program code for performing the above-described method.
The coding scheme provided by the present disclosure can effectively distinguish screen video content from non-screen video content; for screen video, the coding efficiency of a screen coding tool can be ensured; for non-screen video content, the screen coding tool is closed in a self-adaptive manner, the coding speed is accelerated, and the computing resources are saved.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.
Fig. 1 is a flow chart of an encoding method of an embodiment of the present disclosure.
Fig. 2 is a schematic diagram of an encoding method of an embodiment of the present disclosure.
Fig. 3 is a schematic structural diagram of an encoding apparatus according to an embodiment of the present disclosure.
Fig. 4 is a schematic structural diagram of an encoding apparatus according to another embodiment of the present disclosure.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in method embodiments of the present disclosure may be performed in a different order and/or in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that references to "a" or "an" in this disclosure are intended to be illustrative rather than limiting, and those skilled in the art will recognize that they should be understood as "one or more" unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
In practical applications, the video that people encounter is compressed video. This is because the amount of uncompressed raw video data is enormous and cannot be used directly for actual transmission or storage. Therefore, a key technology for video applications is video coding, also called video compression, which aims to remove redundant components in video data as much as possible and reduce the amount of data representing the video. High Efficiency Video Coding (HEVC) Screen Content Coding (SCC) is an extension of HEVC/H.265 proposed for screen video content. For typical screen video content, HEVC SCC can improve coding efficiency by about 50%, while also increasing coding complexity. SCC achieves good coding efficiency only for typical desktop video; for natural scene sequences, such as those captured by cameras, it cannot improve compression efficiency and can greatly increase coding complexity. In addition, screen video content contains many scenes with natural sequences, such as playback of movies and television series captured by cameras. User-generated content (UGC) contains many natural scene videos captured by cameras as well as screen content such as PPT slides, document screen recordings, and hard subtitles; if screen coding tools are turned on for all videos, computing power is wasted.
In view of the above situation, the inventor proposes the present application, and the following describes the scheme provided by the embodiments of the present application in detail with reference to the accompanying drawings.
As shown in fig. 1, fig. 1 is a flowchart of an encoding method of an embodiment of the present disclosure, including the following steps.
S100, acquiring an Nth frame of video image in the video, wherein N is a positive integer. Specifically, N is a positive integer that increases from 1.
Today's video is typically digital video, which is essentially a series of digital images with continuous content arranged in chronological order. Because human vision has a persistence mechanism, continuously played images can form a smooth and continuous visual effect; when the playing speed is fast enough, the human eye does not distinguish the individual images but perceives a continuous video. Therefore, an image is the basic unit of a video signal. To distinguish it from a still image, a complete image in a video is generally called a frame, and a video composed of many frames in time sequence is called a video sequence. One frame image may be divided into a plurality of image blocks.
S200, detecting the type of the Nth frame of video image; wherein the types include key frames and non-key frames.
Specifically, a video sequence is composed of several Groups of Pictures (GOPs), each of which is coded independently. Each GOP begins with a key frame (I-frame), and the subsequent pictures are non-key frames, such as P-frames and B-frames. I-frames do not need to reference other frames, whereas P-frames and B-frames need to reference other frames, such as the I-frame. Starting from a key frame, that picture and the pictures following it can be decoded independently, without relying on any previous picture in the code stream. In the encoded stream, a key frame marks both the end of the previous group of pictures and the start of a new one.
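The GOP structure described above can be illustrated with a minimal Python sketch. It is not part of the disclosed method; the function name `split_into_gops` and the single-letter frame-type labels are assumptions made only for illustration.

```python
from typing import List


def split_into_gops(frame_types: List[str]) -> List[List[int]]:
    """Group frame indices into GOPs; each key frame ("I") opens a new group."""
    gops: List[List[int]] = []
    for index, frame_type in enumerate(frame_types):
        # A key frame ends the previous GOP and starts a new one.
        # A stream that does not start with "I" still gets a first group.
        if frame_type == "I" or not gops:
            gops.append([])
        gops[-1].append(index)
    return gops


# Hypothetical frame-type sequence with two GOPs.
frame_types = ["I", "P", "B", "P", "I", "P", "P"]
gops = split_into_gops(frame_types)
```

In this example the first four frames form one GOP and the last three form another, so a per-GOP decision made after the first key frame is revisited when the second key frame arrives.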
S300, when the type of the Nth frame video image is the key frame, starting a screen coding tool for at least one frame of video image after the Nth frame video image, and coding the at least one frame of video image based on the screen coding tool. Here, the at least one frame of video image may consist of M+1 consecutive frames, namely the Nth frame video image and the M non-key frame video images following it, where M is a positive integer. Of course, the at least one frame of video image may also exclude the Nth frame video image, or include only the Nth frame video image.
In particular, upon detection of a key frame (I-frame) video image, embodiments of the present disclosure may encode the key frame video image and the M consecutive non-key frame video images following it. As shown in fig. 2, fig. 2 is a schematic diagram of an encoding method according to an embodiment of the present disclosure. Fig. 2 shows the video image frames in one GOP as an example, where the first frame has sequence number N; the M+1 consecutive frames starting from the Nth frame, i.e., the Nth frame through the (N+M)th frame, are taken and encoded. It can be understood that M is a positive integer smaller than the total number of frames in the GOP, and its specific value can be determined according to the actual situation. More specifically, each frame of video image can be divided into a plurality of image blocks; different image blocks can be encoded in different modes, which can include a screen video coding mode and a normal mode.
S400, counting, for each frame of video image in the at least one frame of video image, the number of image blocks that adopt the screen coding mode, to obtain a statistical result.
Optionally, in this embodiment, when, in each frame of the at least one frame of video image, the number of image blocks adopting the screen coding mode is greater than or equal to a first preset value and/or the ratio of image blocks adopting the screen coding mode is greater than or equal to a second preset value, it is determined to use the screen coding tool for the consecutive non-key frame video images; otherwise, it is determined not to use the screen coding tool for the consecutive non-key frame video images.
The ratio mentioned here is the number of image blocks in the screen coding mode divided by the number of all image blocks included in one frame of image.
Optionally, the first preset value and/or the second preset value may be the same or different for video images of different frames.
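The optional count and ratio criteria above can be sketched as follows. This is only an illustration under stated assumptions: the helper names `frame_passes` and `use_screen_tool`, the `mode` switch for the "and/or" combination, and the example numbers are hypothetical and are not taken from the disclosure. Each sampled frame carries its own pair of preset values, since the text allows them to differ between frames.

```python
from typing import List, Tuple


def frame_passes(screen_blocks: int, total_blocks: int,
                 min_count: int, min_ratio: float, mode: str = "and") -> bool:
    """Check one frame against the first (count) and second (ratio) preset values."""
    count_ok = screen_blocks >= min_count
    ratio_ok = total_blocks > 0 and screen_blocks / total_blocks >= min_ratio
    return (count_ok and ratio_ok) if mode == "and" else (count_ok or ratio_ok)


def use_screen_tool(stats: List[Tuple[int, int]],
                    presets: List[Tuple[int, float]], mode: str = "and") -> bool:
    """stats: (screen_blocks, total_blocks) per sampled frame.
    presets: (first preset value, second preset value) per sampled frame.
    Both lists are assumed to have the same length."""
    return bool(stats) and all(
        frame_passes(screen, total, min_count, min_ratio, mode)
        for (screen, total), (min_count, min_ratio) in zip(stats, presets)
    )


# Hypothetical statistics for two sampled frames of 100 blocks each.
stats = [(80, 100), (60, 100)]
same_presets = use_screen_tool(stats, [(50, 0.5), (50, 0.5)])
stricter_second = use_screen_tool(stats, [(50, 0.5), (70, 0.5)])
stricter_second_or = use_screen_tool(stats, [(50, 0.5), (70, 0.5)], mode="or")
```

With identical presets both frames pass; raising the count preset for the second frame fails the "and" combination but still passes the "or" combination, because that frame's ratio remains above the second preset value.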
Specifically, the number of frames in the at least one frame of video image may be defined as M+1, with the at least one frame of video image including the Nth frame video image. A first image area in the Nth frame video image is encoded based on the screen coding tool to obtain the number of first image blocks adopting the screen coding mode in the first image area. The (i+1)th image area in the (N+i)th frame video image is then encoded in turn based on the screen coding tool to obtain the number of (i+1)th image blocks adopting the screen coding mode in the (i+1)th image area, where i is an integer increasing from 1 to M. From these counts, the first ratio of first image blocks adopting the screen coding mode in the first image area through the (M+1)th ratio of (M+1)th image blocks adopting the screen coding mode in the (M+1)th image area are obtained in turn. The statistical result includes the M+1 ratios from the first ratio to the (M+1)th ratio. The image area mentioned here may be an entire frame image or a region divided from a frame image; that is, one frame image includes at least one image area.
More specifically, still taking fig. 2 as an example, assume that the Nth through (N+M)th frame video images are encoded separately; within each frame, different image blocks may be encoded in different modes. For example, if the Nth frame video image contains three image blocks and two of them are coded in the screen coding mode, the number of image blocks adopting the screen coding mode is 2 and the ratio is 2/3. Similarly, the number of image blocks in the screen coding mode can be obtained for each of the other frame video images. The screen coding mode includes at least one of an Intra Block Copy (IBC) mode, a hash-based motion estimation (Hash ME) mode, a Palette coding (Palette) mode, and an Adaptive Color Transform (ACT) mode. The statistical result in embodiments of the present disclosure may take either a numerical form or a ratio form. If the statistical result is a numerical value, the number of image blocks adopting the screen coding mode in each frame of video image is the statistical result; if the statistical result is a ratio, that number is divided by the total number of image blocks in the frame video image to obtain the ratio of image blocks adopting the screen coding mode in that frame.
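The per-frame statistic can be sketched in Python as below, reproducing the three-block example from the text. The mode labels in `SCREEN_MODES` and the function name `screen_mode_stats` are illustrative assumptions; a real encoder would expose its block modes in its own way.

```python
from fractions import Fraction
from typing import List, Tuple

# Hypothetical labels for the screen coding modes named in the text.
SCREEN_MODES = {"IBC", "HASH_ME", "PALETTE", "ACT"}


def screen_mode_stats(block_modes: List[str]) -> Tuple[int, Fraction]:
    """Return (count, ratio) of blocks coded in a screen coding mode."""
    count = sum(1 for mode in block_modes if mode in SCREEN_MODES)
    # An empty frame is given ratio 0 to avoid dividing by zero.
    ratio = Fraction(count, len(block_modes)) if block_modes else Fraction(0)
    return count, ratio


# Three image blocks, two of them coded in a screen coding mode.
count, ratio = screen_mode_stats(["IBC", "PALETTE", "INTER"])
```

Here `count` is the numerical form of the statistical result and `ratio` is the ratio form; `Fraction` is used only so that 2/3 is represented exactly.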
S500, determining whether to use the screen coding tool for the continuous non-key frame video image after the at least one frame video image before the next key frame comes according to the statistical result.
Specifically, when the statistical result is a ratio, embodiments of the present disclosure may determine to use the screen coding tool on the consecutive non-key frame video images if each of the M+1 ratios, from the first ratio to the (M+1)th ratio, is greater than or equal to a preset ratio value; otherwise, it is determined not to use the screen coding tool on the consecutive non-key frame video images. When the statistical result is a numerical value, it is determined to use the screen coding tool on the consecutive non-key frame video images if the number of image blocks adopting the screen coding mode in each frame of the at least one frame of video image, including the Nth frame video image, is greater than or equal to a preset value; otherwise, it is determined not to use the screen coding tool on the consecutive non-key frame video images. More specifically, the criterion adopted in the embodiments of the present disclosure may be a preset value or a preset ratio. The specific value can be chosen as needed, for example 50% or 80%, and is not limited in the embodiments of the disclosure. Referring to fig. 2 again, the M+1 frames from the Nth frame to the (N+M)th frame amount to a sample of M+1 frames of the GOP, and the sampling result determines whether the screen coding tool is used from the (N+M+1)th frame onward, as shown by the dotted line in fig. 2. Embodiments of the disclosure may compare the number or ratio of image blocks in the screen video coding mode in each sampling frame with a threshold (a numerical value or a ratio): if the number or ratio in every frame is greater than the threshold, the screen coding tool stays on; otherwise, the screen coding tool is turned off and the conventional coding tools are used.
In other words, if the statistic is greater than or equal to the preset value, the video images in the group of pictures are treated as screen video content; if it is smaller than the preset value, they are treated as non-screen video content. It can be understood that, in an embodiment of the present disclosure, the screen coding tool may also be kept on when the ratio of the image blocks adopting the screen coding mode across all sampled video image frames to all image blocks in those frames is greater than a threshold, even if the ratio corresponding to each individual frame is not greater than the threshold.
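The difference between judging every sampled frame separately and judging the sampled frames as a whole can be made concrete with a short sketch. Both function names and the example numbers are hypothetical, and the pooled variant reflects one reading of the preceding paragraph rather than a confirmed detail of the disclosure.

```python
from typing import List, Tuple

FrameStat = Tuple[int, int]  # (screen-coded blocks, total blocks), total assumed > 0


def every_frame_rule(stats: List[FrameStat], thresh: float) -> bool:
    """Keep the tool on only if each sampled frame's ratio exceeds the threshold."""
    return bool(stats) and all(screen / total > thresh for screen, total in stats)


def aggregate_rule(stats: List[FrameStat], thresh: float) -> bool:
    """Keep the tool on if the ratio pooled over all sampled frames exceeds the threshold."""
    total_blocks = sum(total for _, total in stats)
    screen_blocks = sum(screen for screen, _ in stats)
    return total_blocks > 0 and screen_blocks / total_blocks > thresh


# Hypothetical sample: the middle frame is mostly natural content.
stats = [(90, 100), (40, 100), (95, 100)]
per_frame = every_frame_rule(stats, 0.5)
pooled = aggregate_rule(stats, 0.5)
```

In this sample the per-frame rule turns the tool off because one frame has a ratio of 0.4, while the pooled rule keeps it on because 225 of 300 blocks (0.75) use a screen coding mode.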
Furthermore, upon determining that the screen coding tool is not to be used for non-key frame video images, embodiments of the present disclosure may turn off the screen coding tool that has been turned on until a next video image frame of the type of the key frame is detected. That is, the steps of S100 to S500 are repeated in the next image group, and specific reference may be made to the above description, which is not repeated herein.
The first several frames in each I-frame interval are taken as sampling frames, and the coding-mode ratio of the sampling frames is used to judge whether the frames in the current I-frame interval need the screen coding tool. When each I frame arrives, the screen coding tool is turned on first and several frames are encoded consecutively; the ratio of image blocks finally coded in SCC modes (IBC, Hash ME, Palette, ACT, and the like) is then counted. If the ratio is larger than a certain threshold, the frames before the next I frame arrives are judged to be screen video content, and the screen video coding tool is turned on; if the ratio is not larger than the threshold, those frames are judged to be natural content, and the screen coding tool is turned off. That is, the SCC-mode ratio of the sampling frames is the basis for deciding whether the frames in the current I-frame interval need the screen coding tool. The video coding mode refers to the coding mode of the current coding block, which is written into the code stream; when the decoder receives the code stream, it parses the coding mode and thereby restores the original image. For example, an image block may be encoded as intra, inter, skip, and so on. Typical screen video content, such as text and graphics, is often encoded in a screen video coding mode, but may also be encoded in a regular mode.
Specifically, if the current frame is an I frame with frame number N, the screen video coding tool is turned on, and the SCC-mode ratio p[0] is counted after the current sampling frame is encoded. If the current frame is a non-I frame with frame number K, and the first frame of the GOP has frame number N: when K - N is less than M, the screen video coding tool is turned on, and the SCC-mode ratio p[i] (i = 1 … M) is counted after the current sampling frame is encoded; when K - N is equal to M, p[i] (i = 0, 1, … M) are collected, and if p[i] > thresh (a preset threshold) holds for every i, all frames from the current frame to the next I frame are judged to be SCC content and the screen coding tool is turned on, otherwise the screen coding tool is turned off; when K - N is greater than M, whether to turn on the screen coding tool follows that judgment. The above steps are repeated until all frames are encoded.
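The per-frame loop described above might be sketched as follows. This is a simplified illustration, not the disclosed implementation: `run_adaptive_scc`, the `encode_frame` callback, and the stub encoder are hypothetical, and the sketch assumes that frames N through N+M are all encoded with the tool on and sampled, with the decision taking effect from frame N+M+1, as in the description of fig. 2.

```python
from typing import Callable, List


def run_adaptive_scc(frame_types: List[str],
                     encode_frame: Callable[[int, bool], float],
                     m: int, thresh: float) -> List[bool]:
    """Return, for each frame, whether the screen coding tool was on.

    encode_frame(k, tool_on) is a hypothetical callback that encodes frame k
    and returns the ratio of blocks coded in SCC modes.
    """
    tool_used: List[bool] = []
    ratios: List[float] = []  # p[0..M] for the current GOP
    gop_start = 0             # N: index of the most recent I frame
    tool_on = True
    for k, frame_type in enumerate(frame_types):
        if frame_type == "I":
            gop_start, ratios, tool_on = k, [], True  # new GOP: sample again
        sampling = (k - gop_start) <= m
        if sampling:
            tool_on = True  # sampling frames are always coded with the tool on
        ratio = encode_frame(k, tool_on)
        tool_used.append(tool_on)
        if sampling:
            ratios.append(ratio)
            if k - gop_start == m:
                # Decide for the remaining frames of this GOP.
                tool_on = all(p > thresh for p in ratios)
    return tool_used


# Stub encoder with made-up ratios: first GOP is screen content, second is natural.
true_ratio = [0.9, 0.8, 0.85, 0.9, 0.9, 0.1, 0.05, 0.1, 0.1]
frame_types = ["I", "P", "P", "P", "P", "I", "P", "P", "P"]
tool_used = run_adaptive_scc(
    frame_types, lambda k, on: true_ratio[k] if on else 0.0, m=2, thresh=0.5)
```

With M = 2, the first three frames of each GOP are sampled. The tool stays on for the rest of the first GOP and is turned off for the last frame of the second GOP, where all three sampled ratios fall below the threshold.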
The method can judge whether the video is typical screen video content or not by detecting the type of the video; if yes, starting a screen coding tool to improve coding efficiency; if not, the screen coding tool is closed, and the coding complexity is reduced. Compared with the setting that the screen coding tool is opened all the time, the method and the device can effectively reduce the coding time and ensure the coding compression efficiency of the screen video content.
As shown in fig. 3, an encoding apparatus 10 is further provided in the embodiment of the present disclosure, which includes an obtaining module 11, a detecting module 13, an operating module 15, a counting module 17, and a determining module 19. The obtaining module 11 may be configured to obtain an Nth frame video image in a video, wherein N is a positive integer; specifically, N is a positive integer that increases from 1. The detecting module 13 may be configured to detect the type of the Nth frame video image. The operating module 15 may be configured to, when the type of the Nth frame video image is the key frame, start a screen coding tool on at least one frame video image subsequent to the Nth frame video image, and code the at least one frame video image based on the screen coding tool. The counting module 17 may be configured to count the number of image blocks adopting the screen coding mode in each frame of video image in the at least one frame of video image, so as to obtain a statistical result. The determining module 19 may be configured to determine, according to the statistical result, whether to use the screen coding tool for the consecutive non-key frame video images that follow the at least one frame video image and precede the next key frame.
As shown in fig. 4, the embodiment of the present disclosure further provides an encoding apparatus 30. Unlike the apparatus 10 of the above embodiment, the apparatus 30 of the embodiment of the present disclosure further includes a switch module 36, which is configured to turn off the turned-on screen coding tool when determining that the screen coding tool is not used for the consecutive non-key frame video images, until a next video image frame of the key frame type is detected, and then turn on the screen coding tool.
For the embodiments of the apparatus, since they correspond substantially to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described apparatus embodiments are merely illustrative, wherein the modules described as separate modules may or may not be separate. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The encoding method and apparatus of the present disclosure have been described above based on the embodiments and application examples. In addition, the present disclosure also provides a terminal and a storage medium, which are described below.
Referring now to fig. 5, a schematic diagram of an electronic device (e.g., a terminal device or server) 800 suitable for use in implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 5, the electronic device 800 may include a processing means (e.g., central processing unit, graphics processor, etc.) 801 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage means 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data necessary for the operation of the electronic apparatus 800 are also stored. The processing apparatus 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to bus 804.
Generally, the following devices may be connected to the I/O interface 805: input devices 806 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 807 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage 808 including, for example, magnetic tape, hard disk, etc.; and a communication device 809. The communication means 809 may allow the electronic device 800 to communicate wirelessly or by wire with other devices to exchange data. While fig. 5 illustrates an electronic device 800 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication means 809, or installed from the storage means 808, or installed from the ROM 802. The computer program, when executed by the processing apparatus 801, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium, a computer readable storage medium, or any combination of the two. A computer readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, a computer readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with computer readable program code embodied therein. Such a propagated data signal may take many forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the methods of the present disclosure as described above.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, there is provided a method of encoding, including:
acquiring an Nth frame of video image in a video; wherein N is a positive integer, in particular a positive integer that increases from 1;
detecting the type of the Nth frame video image; wherein the types include key frames and non-key frames;
when the type of the Nth frame video image is the key frame, starting a screen coding tool for at least one frame video image after the Nth frame video image, and coding the at least one frame video image based on the screen coding tool;
counting, for each frame of video image in the at least one frame of video image, the number of image blocks that adopt a screen coding mode, to obtain a statistical result; and
determining, according to the statistical result, whether to use the screen coding tool for consecutive non-key frame video images that follow the at least one frame video image and precede the next key frame.
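By way of illustration only, the steps above can be sketched as a small control loop. The sketch assumes a simplified `Frame` record that already carries the detected frame type and block counts, a probe window of (M+1) frames starting at each key frame, and a caller-supplied `decide` rule; the names `Frame`, `plan_screen_tool`, and `decide` are hypothetical and do not come from the disclosure, and no real encoder is invoked.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Frame:
    is_key: bool        # frame type, assumed to be supplied by a detector
    total_blocks: int   # number of image blocks in the frame
    screen_blocks: int  # blocks that adopt a screen coding mode when the tool is on


def plan_screen_tool(frames: List[Frame], m: int,
                     decide: Callable[[List[Frame]], bool]) -> List[bool]:
    """Return, for each frame, whether the screen coding tool is enabled."""
    enabled: List[bool] = []
    probe: List[Frame] = []  # key frame plus up to m following frames
    probing = False
    use_tool = False         # decision for the rest of the key-frame interval
    for frame in frames:
        if frame.is_key:
            # A key frame always (re)starts the tool and a new probe window.
            probing, probe, use_tool = True, [], False
        if probing:
            enabled.append(True)
            probe.append(frame)
            if len(probe) == m + 1:
                # Statistics are complete: decide for the remaining frames.
                use_tool = decide(probe)
                probing = False
        else:
            enabled.append(use_tool)
    return enabled
```

With M = 2 and a rule that requires every probed frame to have at least half of its blocks in a screen coding mode, camera-like content keeps the tool on for three frames and then switches it off until the next key frame.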
According to one or more embodiments of the present disclosure, there is provided a method, wherein the at least one frame of video image includes (M+1) consecutive frames of video images composed of the Nth frame of video image and the subsequent M frames of non-key frame video images, where M is a positive integer.
According to one or more embodiments of the present disclosure, there is provided a method, wherein the counting, for each frame of video image in the at least one frame of video image, the number of image blocks that adopt a screen coding mode, to obtain a statistical result, includes:
encoding the Nth frame of video image based on the screen coding tool to obtain the number of first image blocks adopting the screen coding mode in a first image area of the Nth frame of video image;
sequentially encoding the (i+1)th image area in the (N+i)th frame of video image based on the screen coding tool, and respectively obtaining the number of (i+1)th image blocks adopting the screen coding mode in the (i+1)th image area; wherein i is an integer increasing from 1 to M; and
sequentially and respectively obtaining a first ratio of the first image blocks adopting the screen coding mode in the first image area, through an (M+1)th ratio of the (M+1)th image blocks adopting the screen coding mode in the (M+1)th image area;
wherein the statistical result includes the (M+1) ratios from the first ratio to the (M+1)th ratio.
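As a minimal sketch of the statistics described above, the following assumes that each ratio is the number of blocks adopting a screen coding mode in an image area divided by the total number of blocks in that area; the disclosure does not spell out the denominator, so this is an assumption. The function name `screen_ratios` and the boolean-flag input are hypothetical.

```python
from typing import List, Sequence


def screen_ratios(areas: Sequence[Sequence[bool]]) -> List[float]:
    """areas[k][b] is True if block b of the (k+1)th image area
    adopted a screen coding mode. Returns one ratio per image area."""
    ratios: List[float] = []
    for flags in areas:
        if not flags:
            raise ValueError("an image area must contain at least one block")
        # sum() counts the True flags, i.e. the screen-coded blocks.
        ratios.append(sum(flags) / len(flags))
    return ratios
```

For M = 2 the input holds three image areas (one from the Nth frame and one from each of the two following frames), and the result is the list of three ratios that forms the statistical result.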
According to one or more embodiments of the present disclosure, there is provided a method, wherein the determining whether to use the screen coding tool for a consecutive non-key frame video image following the at least one frame video image before the next key frame arrives according to the statistical result comprises:
determining to use the screen coding tool on the consecutive non-key frame video images if each of the (M+1) ratios, from the first ratio to the (M+1)th ratio, is greater than or equal to a preset ratio value;
otherwise, determining not to use the screen coding tool on the consecutive non-key frame video images.
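The ratio-based decision can be illustrated as follows. This is a sketch under the assumption that the (M+1) ratios are available as a list of floats; the function name `use_screen_tool` and the example threshold of 0.2 are hypothetical, since the disclosure only refers to a preset ratio value.

```python
from typing import Sequence


def use_screen_tool(ratios: Sequence[float], preset_ratio: float) -> bool:
    """Keep the screen coding tool on only if every one of the
    (M+1) ratios reaches the preset ratio value."""
    if not ratios:
        raise ValueError("at least one ratio is required")
    return all(r >= preset_ratio for r in ratios)
```

For example, ratios of 0.5, 0.25, and 1.0 against a preset value of 0.2 keep the tool on, whereas a single ratio of 0.1 among them would switch it off for the remaining non-key frames.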
According to one or more embodiments of the present disclosure, there is provided a method, wherein the determining whether to use the screen coding tool for a consecutive non-key frame video image following the at least one frame video image before the next key frame arrives according to the statistical result, further comprises:
determining to use the screen coding tool for the continuous non-key frame video images when the number of the image blocks adopting the screen coding mode in each frame of the at least one frame of video images is greater than or equal to a first preset value and/or the occupation ratio is greater than or equal to a second preset value;
otherwise, determining not to use the screen coding tool on the consecutive non-key frame video images.
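The "and/or" combination of a count threshold and a ratio threshold can be sketched as below. The tuple layout `(screen_blocks, total_blocks)`, the function name, and the `require_both` switch are assumptions made for illustration; the disclosure only names a first preset value and a second preset value.

```python
from typing import Sequence, Tuple


def decide_by_count_and_ratio(per_frame: Sequence[Tuple[int, int]],
                              min_count: int, min_ratio: float,
                              require_both: bool = True) -> bool:
    """per_frame holds (screen_blocks, total_blocks) for each probed frame.
    require_both=True models "and"; require_both=False models "or"."""
    def frame_ok(screen: int, total: int) -> bool:
        count_ok = screen >= min_count
        ratio_ok = total > 0 and screen / total >= min_ratio
        return (count_ok and ratio_ok) if require_both else (count_ok or ratio_ok)

    # Every probed frame must pass; an empty probe never enables the tool.
    return bool(per_frame) and all(frame_ok(s, t) for s, t in per_frame)
```

With probed frames of (60, 100) and (30, 100), a first preset value of 25, and a second preset value of 0.5, the "and" variant rejects the tool because the second frame's ratio is 0.3, while the "or" variant accepts it because both counts reach 25.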
According to one or more embodiments of the present disclosure, there is provided a method, characterized in that the method further comprises:
turning off the turned-on screen coding tool when it is determined not to use the screen coding tool for the consecutive non-key frame video images, and keeping the screen coding tool off until a next video image of the key frame type is detected, at which point the screen coding tool is turned on again.
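The on/off behaviour described above amounts to a two-state switch, sketched here for illustration. The class name `ScreenToolSwitch` and its method names are hypothetical; the sketch only tracks the state and does not configure any encoder.

```python
class ScreenToolSwitch:
    """Tracks whether the screen coding tool is currently turned on."""

    def __init__(self) -> None:
        self.enabled = False

    def on_key_frame(self) -> None:
        # Detecting a key frame always turns the tool on (again).
        self.enabled = True

    def on_decision(self, keep_using: bool) -> None:
        # A negative decision turns the tool off; it then stays off
        # until on_key_frame() is called for the next key frame.
        if not keep_using:
            self.enabled = False
```

A typical sequence is `on_key_frame()`, then `on_decision(False)` once the statistics are in, after which `enabled` remains `False` until the next `on_key_frame()` call.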
In accordance with one or more embodiments of the present disclosure, there is provided a method, wherein the screen coding mode includes at least one of:
an intra block copy mode, a hash-based arithmetic search mode, a palette coding mode, and an adaptive color space conversion mode.
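To illustrate how blocks adopting a screen coding mode might be counted, the sketch below labels each block with the mode chosen for it and counts the labels that belong to the listed screen coding modes. The enum `BlockMode`, the two conventional members, and the function `count_screen_blocks` are hypothetical names introduced for this example only.

```python
from enum import Enum, auto
from typing import Iterable


class BlockMode(Enum):
    INTRA_BLOCK_COPY = auto()
    HASH_BASED_SEARCH = auto()
    PALETTE = auto()
    ADAPTIVE_COLOR_SPACE_CONVERSION = auto()
    CONVENTIONAL_INTRA = auto()  # assumed label for a non-screen mode
    CONVENTIONAL_INTER = auto()  # assumed label for a non-screen mode


# The four screen coding modes named in the text above.
SCREEN_MODES = frozenset({
    BlockMode.INTRA_BLOCK_COPY,
    BlockMode.HASH_BASED_SEARCH,
    BlockMode.PALETTE,
    BlockMode.ADAPTIVE_COLOR_SPACE_CONVERSION,
})


def count_screen_blocks(modes: Iterable[BlockMode]) -> int:
    """Count the blocks whose chosen mode is a screen coding mode."""
    return sum(1 for mode in modes if mode in SCREEN_MODES)
```

The resulting count is the per-frame (or per-area) figure from which the ratios and threshold comparisons are derived.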
According to one or more embodiments of the present disclosure, there is provided an apparatus for encoding, including:
an acquisition module, configured to acquire an Nth frame of video image in a video; wherein N is a positive integer, in particular a positive integer that increases from 1;
a detection module, configured to detect the type of the Nth frame of video image; wherein the types include key frames and non-key frames;
an operation module, configured to, when the type of the Nth frame video image is the key frame, start a screen coding tool for at least one frame video image subsequent to the Nth frame video image, and code the at least one frame video image based on the screen coding tool;
a statistical module, configured to count, for each frame of video image in the at least one frame of video image, the number of image blocks that adopt a screen coding mode, to obtain a statistical result; and
a determining module, configured to determine, according to the statistical result, whether to use the screen coding tool for consecutive non-key frame video images that follow the at least one frame video image and precede the next key frame.
According to one or more embodiments of the present disclosure, there is provided a terminal including: at least one memory and at least one processor;
wherein the at least one memory is configured to store program code, and the at least one processor is configured to call the program code stored in the at least one memory to perform the method of any one of the above.
According to one or more embodiments of the present disclosure, there is provided a storage medium for storing program code for performing the above-described method.
The foregoing description is merely illustrative of the preferred embodiments of the disclosure and of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments formed by any combination of the features described above or their equivalents without departing from the spirit of the disclosure. For example, a technical solution may be formed by replacing the above features with features having similar functions disclosed in (but not limited to) this disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (10)

1. An encoding method, comprising:
acquiring an Nth frame of video image in a video, wherein N is a positive integer;
detecting the type of the Nth frame video image; wherein the types include key frames and non-key frames;
when the type of the Nth frame video image is the key frame, starting a screen coding tool for at least one frame video image after the Nth frame video image, and coding the at least one frame video image based on the screen coding tool;
counting the number of image blocks adopting a screen coding mode in each frame of video image in the at least one frame of video image to obtain a statistical result; and
determining, according to the statistical result, whether to use the screen coding tool for consecutive non-key frame video images that follow the at least one frame video image and precede the next key frame.
2. The method of claim 1, wherein the determining, according to the statistical result, whether to use the screen coding tool for consecutive non-key frame video images that follow the at least one frame video image and precede the next key frame comprises:
determining to use the screen coding tool for the continuous non-key frame video images when the number of the image blocks adopting the screen coding mode in each frame of the at least one frame of video images is greater than or equal to a first preset value and/or the occupation ratio is greater than or equal to a second preset value;
otherwise, determining not to use the screen coding tool on the consecutive non-key frame video images.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
turning off the turned-on screen coding tool when it is determined not to use the screen coding tool for the consecutive non-key frame video images, and keeping the screen coding tool off until a next video image of the key frame type is detected, at which point the screen coding tool is turned on again.
4. The method according to claim 1, wherein the at least one frame of video image comprises (M+1) consecutive frames of video images consisting of the Nth frame of video image and the subsequent M frames of non-key frame video images, wherein M is a positive integer.
5. The method according to claim 4, wherein the counting the number of image blocks adopting the screen coding mode in each frame of video image in the at least one frame of video image comprises:
encoding the Nth frame of video image based on the screen coding tool to obtain the number of first image blocks adopting the screen coding mode in a first image area of the Nth frame of video image;
sequentially encoding the (i+1)th image area in the (N+i)th frame of video image based on the screen coding tool, and respectively obtaining the number of (i+1)th image blocks adopting the screen coding mode in the (i+1)th image area; wherein i is an integer increasing from 1 to M; and
sequentially and respectively obtaining a first ratio of the first image blocks adopting the screen coding mode in the first image area, through an (M+1)th ratio of the (M+1)th image blocks adopting the screen coding mode in the (M+1)th image area;
wherein the statistical result includes the (M+1) ratios from the first ratio to the (M+1)th ratio.
6. The method of claim 5, wherein the determining, according to the statistical result, whether to use the screen coding tool for consecutive non-key frame video images that follow the at least one frame video image and precede the next key frame further comprises:
determining to use the screen coding tool on the consecutive non-key frame video images if each of the (M+1) ratios, from the first ratio to the (M+1)th ratio, is greater than or equal to a preset ratio value;
otherwise, determining not to use the screen coding tool on the consecutive non-key frame video images.
7. The method of claim 1, wherein the screen coding mode comprises at least one of:
an intra block copy mode, a hash-based arithmetic search mode, a palette coding mode, and an adaptive color space conversion mode.
8. An apparatus of encoding, comprising:
an acquisition module, configured to acquire an Nth frame of video image in a video; wherein N is a positive integer;
a detection module, configured to detect the type of the Nth frame of video image; wherein the types include key frames and non-key frames;
an operation module, configured to, when the type of the Nth frame video image is the key frame, start a screen coding tool for at least one frame video image subsequent to the Nth frame video image, and code the at least one frame video image based on the screen coding tool;
a statistical module, configured to count the number of image blocks adopting a screen coding mode in each frame of video image in the at least one frame of video image to obtain a statistical result; and
a determining module, configured to determine, according to the statistical result, whether to use the screen coding tool for consecutive non-key frame video images that follow the at least one frame video image and precede the next key frame.
9. A terminal, comprising:
at least one memory and at least one processor;
wherein the at least one memory is configured to store program code and the at least one processor is configured to invoke the program code stored in the at least one memory to perform the method of any of claims 1 to 7.
10. A storage medium for storing program code for performing the method of any one of claims 1 to 7.
CN202010240984.6A 2020-03-31 2020-03-31 Encoding method and apparatus, terminal and storage medium Pending CN111526363A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010240984.6A CN111526363A (en) 2020-03-31 2020-03-31 Encoding method and apparatus, terminal and storage medium
PCT/CN2021/079799 WO2021196994A1 (en) 2020-03-31 2021-03-09 Encoding method and apparatus, terminal, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010240984.6A CN111526363A (en) 2020-03-31 2020-03-31 Encoding method and apparatus, terminal and storage medium

Publications (1)

Publication Number Publication Date
CN111526363A true CN111526363A (en) 2020-08-11

Family

ID=71910602

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010240984.6A Pending CN111526363A (en) 2020-03-31 2020-03-31 Encoding method and apparatus, terminal and storage medium

Country Status (2)

Country Link
CN (1) CN111526363A (en)
WO (1) WO2021196994A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021196994A1 (en) * 2020-03-31 2021-10-07 北京字节跳动网络技术有限公司 Encoding method and apparatus, terminal, and storage medium
WO2023202177A1 (en) * 2022-04-19 2023-10-26 华为技术有限公司 Image encoding method and apparatus

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114222124B (en) * 2021-11-29 2022-09-23 广州波视信息科技股份有限公司 Encoding and decoding method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103517077A (en) * 2012-12-14 2014-01-15 深圳百科信息技术有限公司 Method and device for rapidly selecting prediction mode
US20180262760A1 (en) * 2017-03-10 2018-09-13 Intel Corporation Screen content detection for adaptive encoding
CN110312134A (en) * 2019-08-06 2019-10-08 杭州微帧信息科技有限公司 A kind of screen video coding method based on image procossing and machine learning
CN110418138A (en) * 2019-07-29 2019-11-05 北京奇艺世纪科技有限公司 Method for processing video frequency, device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111526363A (en) * 2020-03-31 2020-08-11 北京字节跳动网络技术有限公司 Encoding method and apparatus, terminal and storage medium

Also Published As

Publication number Publication date
WO2021196994A1 (en) 2021-10-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20200811