CN115118999A - Entropy context processing method, system on chip and electronic equipment - Google Patents


Info

Publication number
CN115118999A
Authority
CN
China
Prior art keywords
entropy
cores
context
storage unit
core
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210715926.3A
Other languages
Chinese (zh)
Inventor
黄异青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ARM Technology China Co Ltd
Original Assignee
ARM Technology China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ARM Technology China Co Ltd filed Critical ARM Technology China Co Ltd
Priority to CN202210715926.3A
Publication of CN115118999A

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/91 — Entropy coding, e.g. variable length coding [VLC] or arithmetic coding (under H04N19/90, coding techniques not provided for in groups H04N19/10–H04N19/85)
    • H04N19/176 — Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock (under H04N19/10, H04N19/169, H04N19/17)
    • H04N19/184 — Adaptive coding characterised by the coding unit, the unit being bits, e.g. of the compressed video stream (under H04N19/10, H04N19/169)
    • H04N19/42 — Characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application relates to the technical field of data processing, and in particular to an entropy context processing method, a system on chip, and an electronic device, which can reduce the time consumed by the entropy context save and restore process and avoid occupying system bandwidth. The method is applied to a system on chip comprising a plurality of cores and at least one storage unit, and comprises the following steps: detecting that the input bit conditions of the plurality of cores do not meet a preset condition; the plurality of cores respectively back up their cached entropy contexts to the at least one storage unit; detecting that the input bit conditions of the plurality of cores meet the preset condition; the plurality of cores respectively retrieve their corresponding entropy contexts from the at least one storage unit and re-cache them; and the plurality of cores entropy decode different macroblock rows in the input bitstream in parallel and update the cached entropy contexts.

Description

Entropy context processing method, system on chip and electronic equipment
Technical Field
The present application relates to the field of video decoding technologies, and in particular, to an entropy context processing method, a system on chip, and an electronic device.
Background
For high-definition video, mainstream video compression standards now include High Efficiency Video Coding (HEVC), Advanced Video Coding (AVC), and the Audio Video coding Standard (AVS), which perform video decoding through a Video Decoder (VDEC). Generally, a video decoder adopts a pipeline design and takes macroblocks (denoted MB, i.e. Macro Blocks, in AVC; denoted CTB, i.e. Coding Tree Blocks, in HEVC) as pipeline units. Taking HEVC as an example, the main pipeline is divided into 4 stages, whose functions are as follows: the first stage is entropy decoding (Entropy Decode); the second stage is inverse quantization (IQT) and inverse DCT transform (IDCT); the third stage is intra prediction (IPred) and image reconstruction (Rec); the fourth stage is deblocking filtering (Dblock) and sample adaptive offset (SAO).
The video decoder decodes the bitstream through an entropy engine, which parses the bitstream based on different entropy algorithms. Different video compression standards use different entropy algorithms: for example, HEVC uses Context-based Adaptive Binary Arithmetic Coding (CABAC), AVS uses Context-based Binary Arithmetic Coding (CBAC), AV1 uses a multi-symbol scheme, and so on. Regardless of the entropy algorithm used by the video standard, the entropy engine inside the video decoder always parses the bitstream in an input buffer, whose size may vary according to system requirements. In conventional video decoding systems, bits of the bitstream are loaded block by block (i.e., macroblock by macroblock). Only a very large input buffer can then hold enough of the bitstream for the entropy engine to decode normally; otherwise a "bit underflow" occurs. This means that the entropy engine has consumed all available bits in the input buffer while no new bits have yet been stored into it. Such an underflow condition is typically caused by AXI (Advanced eXtensible Interface) latency of the system bus, interconnect latency, or other system-level problems (e.g., other data occupying the system bus). A bit underflow during entropy decoding may cause decoding anomalies and thus decoding failures.
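The underflow condition described above can be sketched as a simple bookkeeping check on the input buffer. This is a minimal illustration only; the class and method names (`InputBuffer`, `load`, `consume`) are hypothetical and not part of the patent:

```python
class InputBuffer:
    """Toy model of an entropy engine's input buffer (hypothetical names)."""

    def __init__(self, capacity_bits):
        self.capacity_bits = capacity_bits
        self.available_bits = 0

    def load(self, nbits):
        # Bits arrive over the system bus (e.g., AXI); bus latency may delay
        # this call, which is what produces the underflow below.
        self.available_bits = min(self.capacity_bits, self.available_bits + nbits)

    def consume(self, nbits):
        """Entropy engine consumes bits; returns False on bit underflow."""
        if nbits > self.available_bits:
            return False  # bit underflow: engine must save its context and wait
        self.available_bits -= nbits
        return True


buf = InputBuffer(capacity_bits=1024)
buf.load(100)
assert buf.consume(64) is True
assert buf.consume(64) is False  # only 36 bits remain -> underflow
```

The second `consume` fails because the buffer holds fewer bits than the engine needs, which is exactly the condition that forces the decoder to exit and save its entropy context.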
To solve the above problem, a common method is to save the current decoding information (e.g., neighbor information, i.e. adjacent-macroblock information such as macroblock identifiers, and the entropy context) before entropy decoding starts. When the decoding process encounters a bit underflow, the video decoder exits decoding, waits for the subsequent bits to finish loading, and then restores the decoding information to return to the decoding process. These steps are typically handled by firmware in the video decoder, which saves and restores the entropy context between the video decoder's internal Static Random-Access Memory (SRAM) and the external Double Data Rate synchronous dynamic random-access memory (DDR). As a result, saving and restoring the entropy context becomes the most time-consuming part of the entropy decoding process. Moreover, when the video decoder uses multi-core decoding, the entropy context save and restore process is not only time-consuming but also consumes considerable system bandwidth.
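The conventional save/wait/restore loop described above can be modelled in a few lines. This is an illustrative sketch, not the patent's implementation: `ContextStore` stands in for the SRAM-to-DDR round trip, and `roundtrips` counts how many save/restore cycles (and hence bus transfers) the prior-art flow incurs:

```python
class ContextStore:
    """Stand-in for the external DDR save/restore path (hypothetical name)."""

    def __init__(self):
        self._saved = None
        self.roundtrips = 0

    def save(self, ctx):
        self._saved = dict(ctx)   # SRAM -> DDR in the prior art
        self.roundtrips += 1

    def restore(self):
        return dict(self._saved)  # DDR -> SRAM


def decode_stream(bits_per_mb, deliveries, ctx_store):
    """Toy single-core loop: each macroblock costs `bits_per_mb` bits, and
    `deliveries` lists the bit chunks as the bus delivers them."""
    ctx = {"mb_decoded": 0}
    available = 0
    feed = iter(deliveries)
    total_mbs = 4
    while ctx["mb_decoded"] < total_mbs:
        if available < bits_per_mb:      # bit underflow detected
            ctx_store.save(ctx)          # save entropy context, exit decoding
            available += next(feed)      # wait for the next bit load to finish
            ctx = ctx_store.restore()    # restore context, resume decoding
            continue
        available -= bits_per_mb
        ctx["mb_decoded"] += 1
    return ctx


store = ContextStore()
result = decode_stream(bits_per_mb=10, deliveries=[25, 25], ctx_store=store)
assert result["mb_decoded"] == 4
assert store.roundtrips == 2  # two underflows -> two DDR round trips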
Disclosure of Invention
The application provides an entropy context processing method, a system on chip and an electronic device, which can reduce time consumption of an entropy context storage and recovery process and avoid occupying system bandwidth.
A first aspect of the present application provides an entropy context processing method, applied in a system on chip including a plurality of cores and at least one storage unit (e.g., at least one milestone Ram hereinafter), the method including: detecting that the input bit conditions of the plurality of cores do not meet a preset condition; the plurality of cores respectively backing up their cached entropy contexts to the at least one storage unit; detecting that the input bit conditions of the plurality of cores meet the preset condition; the plurality of cores respectively retrieving their corresponding entropy contexts from the at least one storage unit and re-caching them; and the plurality of cores entropy decoding different macroblock rows in the input bitstream in parallel and updating the cached entropy contexts. In this way, since each core does not need to save and restore the entropy context in the external DDR, the time consumed by each core's save and restore process is reduced, and in a multi-core decoding scenario the system bandwidth that would be generated by saving and restoring the entropy context in the external DDR is avoided.
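The five steps of the first aspect can be sketched as one function over per-core state. All names here (`handle_underflow`, the dict fields) are illustrative assumptions, and the on-chip storage units are modelled as plain dictionaries:

```python
def handle_underflow(cores, storage_units):
    """Sketch of the first-aspect method: each core backs up its cached
    entropy context to an on-chip storage unit while the input-bit condition
    is unmet, then restores it and resumes decoding once the condition holds.
    One storage unit per core is assumed here (the 1:1 variant)."""
    # Steps 1-2: condition not met -> back up cached contexts on-chip
    for core, unit in zip(cores, storage_units):
        unit["ctx"] = dict(core["ctx"])
    # Steps 3-4: condition met again -> restore contexts from on-chip storage
    for core, unit in zip(cores, storage_units):
        core["ctx"] = dict(unit["ctx"])
    # Step 5: cores resume entropy decoding their own macroblock rows in
    # parallel, updating the cached context (modelled as a macroblock counter)
    for row, core in enumerate(cores):
        core["ctx"]["row"] = row
        core["ctx"]["mb"] = core["ctx"].get("mb", 0) + 1


cores = [{"ctx": {"mb": 3}}, {"ctx": {"mb": 5}}]
units = [{}, {}]
handle_underflow(cores, units)
assert cores[0]["ctx"]["mb"] == 4 and cores[1]["ctx"]["mb"] == 6
```

Because the backup targets live on-chip, no DDR traffic appears anywhere in this loop, which is the bandwidth saving the aspect claims.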
In a possible implementation manner of the first aspect, the preset condition is that at least one of the plurality of cores receives no bitstream input within a preset time length, or that the number of bits of the bitstream input into the plurality of cores is less than a preset number. It will be appreciated that the preset condition is used to indicate a bit underflow occurring in an input buffer connected to the plurality of cores.
In a possible implementation manner of the first aspect, the number of the at least one storage unit is the same as the number of the plurality of cores, and each core uniquely corresponds to one storage unit; each of the plurality of cores stores or retrieves its entropy context in its uniquely corresponding storage unit among the at least one storage unit.
In a possible implementation manner of the first aspect, the system on chip further includes an arbiter connected to the at least one storage unit, where the number of the at least one storage unit is smaller than the number of the plurality of cores; the method further comprises: the plurality of cores respectively sending a storage request message to the arbiter; the arbiter allocating a storage address to each of the plurality of cores based on the plurality of storage request messages; and the arbiter returning a response message allowing storage to each of the plurality of cores; wherein the entropy context cached by each core is stored in the at least one storage unit according to the storage address allocated by the arbiter.
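When storage units are shared, the arbiter's job is to hand each requesting core a distinct address range. The sketch below is a hypothetical software model of that request/response exchange; the class name, slot sizing, and message fields are assumptions for illustration, not the patent's hardware design:

```python
class Arbiter:
    """Toy arbiter: cores send storage requests, and the arbiter assigns
    each core a distinct address slot in a shared on-chip storage unit."""

    def __init__(self, unit_size, slot_size):
        self.unit = bytearray(unit_size)  # shared storage unit
        self.slot_size = slot_size
        self.next_addr = 0
        self.slots = {}  # core_id -> allocated base address

    def request_store(self, core_id, ctx_bytes):
        """Handle a storage request message; returns the allow-storage response."""
        if core_id not in self.slots:  # allocate a slot on first request
            assert self.next_addr + self.slot_size <= len(self.unit)
            self.slots[core_id] = self.next_addr
            self.next_addr += self.slot_size
        base = self.slots[core_id]
        self.unit[base:base + len(ctx_bytes)] = ctx_bytes
        # Response carries the core id, allocated address, and permission flag
        return {"core_id": core_id, "addr": base, "allowed": True}

    def restore(self, core_id, length):
        """Return the entropy context previously stored for this core."""
        base = self.slots[core_id]
        return bytes(self.unit[base:base + length])


arb = Arbiter(unit_size=256, slot_size=64)
r0 = arb.request_store(0, b"ctx-core0")
r1 = arb.request_store(1, b"ctx-core1")
assert r0["addr"] != r1["addr"]          # distinct addresses per core
assert arb.restore(1, 9) == b"ctx-core1"
```

A real arbiter would also resolve simultaneous requests by priority or round-robin; the essential contract shown here is that two cores never receive overlapping addresses.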
In a possible implementation manner of the first aspect, one of the storage request messages includes at least one of the following: identification information of the core, the corresponding entropy context cached by the core, and first indication information requesting storage of the entropy context. One of the response messages allowing storage includes at least one of the following: identification information of the corresponding core, an identification of the storage address allocated to the corresponding core, and second indication information indicating that the corresponding core is allowed to store its corresponding entropy context in the at least one storage unit.
In a possible implementation manner of the first aspect, each of the cores includes firmware, an entropy engine, and a context cache unit (i.e., a context Ram in the following); each of the cores entropy decodes the bitstream through its entropy engine and generates an entropy context; each core caches the entropy context through its context cache unit; and each core initiates a storage request to the arbiter through its firmware, backs up the entropy context cached in the corresponding context cache unit to the at least one storage unit, and restores the entropy context acquired from the at least one storage unit to the corresponding context cache unit.
In a possible implementation manner of the first aspect, the plurality of cores entropy decode the bitstream by macroblock rows, and each core entropy decodes its macroblock row macroblock by macroblock, where one macroblock row corresponds to one core, and the entropy context cached by a core corresponds to one macroblock.
In a possible implementation manner of the first aspect, the system on chip is connected to a target storage unit external to the system on chip; the method further comprises: each of the plurality of cores sending a core processing message to the target storage unit, wherein one of the core processing messages includes at least one of the following: information of the corresponding core, and information of the bitstream, macroblock or macroblock row input to the corresponding core.
A second aspect of the present application provides a system on chip including a plurality of cores and at least one storage unit. The plurality of cores are configured to detect that their input bit conditions do not meet a preset condition and to respectively back up their cached entropy contexts into the at least one storage unit; the at least one storage unit is configured to store the entropy contexts from the plurality of cores; and the plurality of cores are further configured to detect that their input bit conditions meet the preset condition, respectively acquire their corresponding entropy contexts from the at least one storage unit and re-cache them, entropy decode different macroblock rows in the input bitstream in parallel, and update the cached entropy contexts.
In a possible implementation manner of the second aspect, the preset condition is that at least one of the plurality of cores receives no bitstream input within a preset time length, or that the number of bits of the bitstream input into the plurality of cores is less than a preset number.
In a possible implementation manner of the second aspect, the number of the at least one storage unit is the same as the number of the plurality of cores, and each core uniquely corresponds to one storage unit; each of the plurality of cores stores or retrieves its entropy context in its uniquely corresponding storage unit among the at least one storage unit.
In a possible implementation manner of the second aspect, the system on chip further includes an arbiter connected to the at least one storage unit, where the number of the at least one storage unit is smaller than the number of the plurality of cores; the plurality of cores are further configured to respectively send a storage request message to the arbiter; the arbiter is configured to allocate a storage address to each of the plurality of cores based on the plurality of storage request messages and to return a response message allowing storage to each of the plurality of cores; wherein the entropy context cached by each core is stored in the at least one storage unit according to the storage address allocated by the arbiter.
In a possible implementation manner of the second aspect, one of the storage request messages includes at least one of the following: identification information of the core, the corresponding entropy context cached by the core, and first indication information requesting storage of the entropy context. One of the response messages allowing storage includes at least one of the following: identification information of the corresponding core, an identification of the storage address allocated to the corresponding core, and second indication information indicating that the corresponding core is allowed to store its corresponding entropy context in the at least one storage unit.
In a possible implementation manner of the second aspect, each of the cores includes firmware, an entropy engine, and a context cache unit; each of the cores entropy decodes the bitstream through its entropy engine and generates an entropy context; each core caches the entropy context through its context cache unit; and each core initiates a storage request to the arbiter through its firmware, backs up the entropy context cached in the corresponding context cache unit to the at least one storage unit, and restores the entropy context acquired from the at least one storage unit to the corresponding context cache unit.
In a possible implementation manner of the second aspect, the plurality of cores entropy decode the bitstream by macroblock rows, and each core entropy decodes its macroblock row macroblock by macroblock, where one macroblock row corresponds to one core, and the entropy context cached by a core corresponds to one macroblock.
In a possible implementation manner of the second aspect, the system on chip is connected to a target storage unit external to the system on chip; each of the plurality of cores is further configured to send a core processing message to the target storage unit, where the core processing message includes at least one of the following: information of the corresponding core, and information of the bitstream, macroblock or macroblock row input to the corresponding core.
A third aspect of the present application provides an electronic device comprising the system on chip as described in the second aspect above and a target storage unit external to the system on chip; the target storage unit is configured to store core processing information sent by the plurality of cores, where one piece of core processing information includes at least one of the following: information of the corresponding core, and information of the bitstream, macroblock or macroblock row input to the corresponding core.
Drawings
FIG. 1 illustrates a scene schematic diagram of an entropy decoding application, according to some embodiments of the present application;
FIG. 2A illustrates a schematic structural diagram of an electronic device, according to some embodiments of the present application;
FIG. 2B illustrates an architectural diagram of an entropy decoding module in an electronic device, according to some embodiments of the present application;
FIG. 3 illustrates an architectural diagram of an entropy decoding module in an electronic device, according to some embodiments of the present application;
FIG. 4 illustrates a flow diagram of a method of entropy context processing, according to some embodiments of the present application;
FIG. 5 illustrates an architectural diagram of an entropy decoding module in an electronic device, according to some embodiments of the present application;
FIG. 6 illustrates a flow diagram of a method of entropy context processing, according to some embodiments of the present application;
FIG. 7 is a block diagram illustrating an electronic device, according to an embodiment of the present application;
FIG. 8 is a block diagram illustrating a system on a chip according to an embodiment of the present application.
Detailed Description
Embodiments of the present application include, but are not limited to, an entropy context processing method, a readable medium, and an electronic device thereof.
The embodiment of the application provides an entropy context processing method in which an on-chip storage unit is arranged in the video decoder to store the entropy context generated when a bit underflow occurs during entropy decoding. Specifically, the video decoder entropy decodes the bitstream in the input buffer through the entropy engine; when insufficient bits in the input buffer cause a bit underflow, it stores the current entropy context into the on-chip storage unit, and when the bit underflow is resolved, it fetches the last stored entropy context from the on-chip storage unit to resume entropy decoding. In addition, an on-chip arbiter connected with the on-chip memory is also arranged in the video decoder. In a multi-core decoding scenario, the entropy contexts of the plurality of cores can be stored in the at least one on-chip memory through the on-chip arbiter, and the entropy context corresponding to each core can then be restored through the arbiter to continue entropy decoding. Since the firmware no longer needs to save and restore the entropy context in the external DDR, the time the firmware spends on the entropy context save and restore process is reduced, and in a multi-core decoding scenario the system bandwidth generated by the firmware saving and restoring the entropy context in the external DDR is avoided.
It is to be understood that the video decoder in the embodiments of the present application may be disposed in a system on chip (SoC) in an electronic device, and the on-chip storage unit described in the embodiments refers to a memory in the system on chip, such as Static Random-Access Memory (SRAM) inside the system on chip. As an example, the on-chip storage unit of the system on chip in which the video decoder is located is hereinafter referred to as a milestone Ram, and is mainly used for storing the entropy context generated when a bit underflow occurs during decoding.
In some embodiments, the video decoder may be a programmable system on a chip, where a single chip performs the main logic function of the whole system, and may have the function of programming software and hardware in the system. As an example, the system-on-chip may include one or more embedded processor cores (cores) therein, and have small capacity on-chip high speed RAM resources, as well as sufficient on-chip programmable logic resources.
In addition, it is understood that the on-chip arbiter described in the embodiments of the present application may be a functional module in the system on chip in which the video decoder is located, and may be implemented in software and/or hardware. As an example, the arbiter may be implemented by a software algorithm and is mainly used to determine which module occupies a shared resource when two or more modules need to occupy the same resource. In the following embodiments, this on-chip arbiter is simply referred to as the arbiter and is used to decide to which storage addresses in the at least one storage unit the entropy contexts of the multiple cores are stored.
The processing method of the entropy context is applied to an entropy decoding process, and is particularly applied to scenes needing to carry out coding and decoding of audio and/or video data. For example, the scheme of the application can be applied to scenes such as panoramic sound movies, ultra-high definition televisions, internet broadband audio and video services, digital audio and video broadcasting wireless broadband multimedia communication, virtual reality, augmented reality, video monitoring and the like. The following description will be made by taking a decoding process of video as an example.
It can be understood that there are many scenarios requiring video encoding: besides video shooting and compressed storage on smartphones and similar devices, video encoding is needed in video conferencing, digital TV, and scenarios where compressed video is stored on digital media such as CDs, DVDs, and memory sticks.
In some embodiments, the video compression standard to which the present application relates may be, but is not limited to, High Efficiency Video Coding (HEVC), Advanced Video Coding (AVC), or the Audio Video coding Standard (AVS). Accordingly, entropy decoding uses different entropy algorithms under different video compression standards; for example, HEVC employs Context-based Adaptive Binary Arithmetic Coding (CABAC), AVS employs Context-based Binary Arithmetic Coding (CBAC), AV1 employs a multi-symbol scheme, and the like. The entropy algorithm to which the entropy context processing method of the embodiments applies may be any one of the above examples and is not limited here. In the following embodiments, the entropy context processing method provided by the present application is mainly described by taking CBAC-based entropy decoding in AVS as an example.
Fig. 1 illustrates a video encoding and decoding scenario according to an embodiment of the present application. The scenario in Fig. 1 includes an electronic device 100 and an electronic device 200 connected over one or more networks.
In some embodiments of the present application, the electronic device 200 may encode the video based on AVS, obtaining compressed video data; the electronic device 200 may transmit the compressed video data to the electronic device 100; the electronic device 100 may decode the compressed video data to obtain decompressed video data.
In some embodiments, the electronic device 100 or the electronic device 200 may include, but is not limited to, a smart phone, a vehicle-mounted device, a personal computer, an artificial intelligence device, a tablet computer, a personal digital assistant, a smart wearable device (e.g., a smart watch or bracelet, smart glasses), a smart television (otherwise known as a smart large screen, a smart screen, or a large screen television, etc.), a virtual reality/mixed reality/augmented reality display device, a server, and the like.
The electronic device 200 performs video coding on the video to obtain compressed video data. The compressed video data may be a code stream, that is, a sequence of binary (0 or 1) codes; for example, a frame of image may be coded into several hundred bytes of binary numbers.
For example, in AVS encoding, different levels of information of one frame of image are encoded, the levels being: picture, slice, macroblock, and block. During AVS encoding, a macroblock may be used as the basic processing unit; the macroblock size may be 64x64, 8x8, or 16x16.
After dividing a frame of image into macroblocks, the electronic device 200 may extract the information of each macroblock through intra prediction, inter prediction, motion estimation, motion compensation, and other processes, and express that information as values of syntax elements, where the syntax elements include: Motion Vector Difference (MVD), motion mode (intra prediction or inter prediction), coded block pattern (CBP), block prediction residual coefficients, picture unit mode, and intra prediction mode. Then, the electronic device 200 may perform entropy coding on the values of the syntax elements to obtain the final compressed data, i.e., the code stream.
In some embodiments, entropy coding goes through three steps: binarization of syntax elements, context model selection and update, and binary arithmetic coding. Correspondingly, entropy decoding, as the inverse process of entropy coding, mainly includes binary arithmetic decoding, context model updating, de-binarization of syntax elements, and the like.
Specifically, binarization of syntax elements maps non-binary syntax elements into corresponding binary strings. That is, binarization converts the value of a syntax element into a signal string represented only by 0s and 1s; for example, when the value of a motion vector is 3, the binary symbol string obtained by binarization may be "1001", where each 0 or 1 in "1001" is called a binary value (bin) of the syntax element.
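As an illustration of the value-to-bin-string mapping, a simple unary binarization scheme is sketched below. This is one common scheme only; the exact mapping for each syntax element is defined by the video standard, so the "1001" example above may use a different mapping:

```python
def unary_binarize(value):
    """Unary binarization: value n -> n ones followed by a terminating zero.
    (Illustrative only; real standards also use truncated unary, Exp-Golomb,
    and fixed-length schemes depending on the syntax element.)"""
    return "1" * value + "0"


def unary_debinarize(bits):
    """Inverse mapping: count leading ones up to the first zero.
    Returns (value, remaining_bits)."""
    n = bits.index("0")
    return n, bits[n + 1:]


assert unary_binarize(3) == "1110"
assert unary_debinarize("1110")[0] == 3
```

De-binarization is the corresponding step on the decoder side mentioned above: the decoded bin string is mapped back to the syntax element value.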
In some embodiments, after binarization of a syntax element, selecting the appropriate context model for it is crucial for entropy coding. For example, the AVS standard may include 36 syntax elements and defines a total of 269 context models. In the actual encoding process, the parameters of each syntax element are identified by a unique context model; the context models are independent of each other, but the parameters of syntax elements within the same context model are correlated.
In some embodiments, the arithmetic coding stage of entropy encoding encodes the binary symbol string of each syntax element using the parameters of a context model, generating the code stream. The arithmetic coding of entropy coding mainly consists of regular coding and bypass coding. Regular coding encodes the binarized input bins (binary symbols) according to the parameters in a context model, and the coded value is fed back to the context model according to the binarized value to update the context model. The parameters of the context model may include the Most Probable Symbol (MPS), the Least Probable Symbol (LPS), and the like. For example, if the binary symbol string of a motion vector is "1001", the code stream generated by arithmetically encoding it may be "01".
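The MPS/LPS feedback loop can be illustrated with a toy context update. This is an assumption-laden sketch: real coders (CABAC/CBAC) use standard-defined state transition tables rather than the simple counter below, and the state cap of 62 is borrowed from CABAC's state range only for flavour:

```python
def update_context(ctx, bin_val):
    """Toy context-model update (illustrative, not the AVS CBAC tables):
    the probability state rises toward the MPS on a match; after enough
    LPS observations the MPS flips, mirroring how a real context model
    adapts to the coded content."""
    if bin_val == ctx["mps"]:
        ctx["state"] = min(ctx["state"] + 1, 62)  # MPS seen: more confident
    else:
        ctx["state"] -= 1                         # LPS seen: less confident
        if ctx["state"] < 0:                      # LPS now more probable
            ctx["mps"] ^= 1                       # flip the MPS
            ctx["state"] = 0
    return ctx


ctx = {"mps": 1, "state": 0}
for b in [1, 1, 0]:
    update_context(ctx, b)
assert ctx == {"mps": 1, "state": 1}
update_context(ctx, 0)
update_context(ctx, 0)
assert ctx["mps"] == 0  # repeated LPS observations flipped the MPS
```

This per-bin state is exactly the kind of information that makes up the entropy context the decoder must save on underflow: losing it would desynchronize the arithmetic decoder from the encoder.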
It should be noted that the entropy context provided by the embodiment of the present application may be the context model described above.
It is understood that entropy coding is the last operation of video coding and directly produces the code stream; it is a lossless compression method mainly used to remove statistical redundancy in the video signal. For example, the entropy coding of AVS uses CBAC. CBAC provides an estimation model of the conditional probability of the current syntax element: it selects a different context model according to the currently encoded content and derives the conditional probability of the current syntax element from this context model. Finally, this probability is used to reduce the redundancy of video coding, thereby achieving compression.
In some embodiments, after receiving the compressed video data (i.e., the code stream), the electronic device 100 may perform AVS decoding on it to obtain a video image. It is understood that AVS decoding is the inverse of AVS encoding. That is, in the AVS decoding process, the received code stream needs to be entropy decoded first to obtain the values of the syntax elements. Furthermore, if a bit underflow problem occurs during entropy decoding, the current decoding information, including the entropy context, neighbour information and the like, needs to be backed up until the bit underflow problem is eliminated, after which the entropy context information is restored and entropy decoding resumes.
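The backup-on-underflow flow described above can be sketched as follows. This toy decoder consumes fixed-size symbols instead of performing real arithmetic decoding, and all class and field names are illustrative assumptions, not part of any real decoder API.

```python
class ToyEntropyDecoder:
    """Consumes fixed-size 'symbols' from a bit buffer; on underflow it backs
    up its decoding state and can restore it once more bits arrive."""

    SYMBOL_BITS = 4

    def __init__(self):
        self.buffer = ""                          # stand-in for the input buffer
        self.context = {"symbols_decoded": 0}     # stand-in for the entropy context
        self.backup = None

    def feed(self, bits):
        self.buffer += bits

    def decode(self):
        out = []
        while len(self.buffer) >= self.SYMBOL_BITS:
            out.append(self.buffer[:self.SYMBOL_BITS])
            self.buffer = self.buffer[self.SYMBOL_BITS:]
            self.context["symbols_decoded"] += 1
        if self.buffer:
            # Leftover bits, but not enough for a symbol: bit underflow.
            # Back up the current decoding state until more bits arrive.
            self.backup = dict(self.context)
        return out

    def restore(self):
        # Restore the backed-up state before resuming entropy decoding
        if self.backup is not None:
            self.context = dict(self.backup)
            self.backup = None

dec = ToyEntropyDecoder()
dec.feed("101011")
first = dec.decode()      # decodes "1010", then underflows on the leftover "11"
dec.feed("00")            # the underflow is eliminated by new bits
dec.restore()
second = dec.decode()     # resumes and decodes "1100"
```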
Video compression standards typically employ a hardware accelerator as a Video Decoder (VDEC) to perform Video decoding. The video decoder according to the present application may adopt a single-core scheme or a multi-core scheme.
The single-core hardware decoder mostly adopts a pipeline design, taking macroblocks (for example, the MB, Macro Block, in AVC, or the CTB, Coding Tree Block, in HEVC) as pipeline units, and entropy decoding is usually the first of the pipeline stages.
The multi-core design of the video decoder may enable efficient parallel decoding. Specifically, multi-core parallel decoding schemes are generally developed around levels such as the Frame, Slice, Tile, and MB/CTB row (macroblock row). For example, in HEVC coding, entropy decoding is initialized directly at the head of each CTB row without waiting for entropy decoding of the previous row to finish completely, so the serial dependency between rows of entropy-decoded macroblocks is broken and parallel computation over macroblock rows is realized. In the following embodiments, the entropy context processing scheme is mainly described taking the example that the video decoder adopts a multi-core decoding scheme, that is, entropy decoding adopts the multi-core decoding scheme.
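The row-level parallelism described above can be sketched as follows: each CTB row (re)initializes its entropy context at the row head, so rows can be decoded concurrently on different cores. The core count, row sizes, and naming are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def decode_ctb_row(row_index, num_ctbs):
    # A fresh entropy context is initialized at the head of every CTB row,
    # so no state is carried over from the previous row.
    context = {"row": row_index, "initialized": True}
    return [f"row{context['row']}:ctb{c}" for c in range(num_ctbs)]

ctbs_per_row = [4, 4, 4, 4]                       # 4 CTB rows of 4 CTBs each
with ThreadPoolExecutor(max_workers=4) as pool:   # one worker per decoder core
    results = list(pool.map(decode_ctb_row, range(4), ctbs_per_row))
```

Because no row waits on its predecessor's entropy state, the mapping over rows can run fully in parallel; real decoders add dependency tracking for prediction and reconstruction, which is outside this sketch.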
The present application will mainly use a multi-core scheme as an example for a video decoder in an electronic device.
The following describes in detail the process of performing entropy decoding by the electronic device 100 (e.g., the electronic device 100a and the electronic device 100b) according to the present application with reference to the drawings.
Fig. 2A is a schematic structural diagram of the electronic device 100 according to an embodiment of the present application, showing one possible structure of the electronic device 100.
As shown in fig. 2A, the electronic device 100 may include a processor 110, a power module 140, a memory 180, a camera 170, a mobile communication module 130, a wireless communication module 120, a sensor module 190, an audio module 150, an interface module 160, a display screen 102, and the like.
It is to be understood that the illustrated structure of the embodiment of the present application does not specifically limit the electronic device 100. In other embodiments of the present application, electronic device 100 may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units, for example, processing modules or processing circuits that may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP), a Microcontroller Unit (MCU), an Artificial Intelligence (AI) processor or a Field Programmable Gate Array (FPGA), and a Video Processing Unit (VPU). The different processing units may be separate devices or may be integrated into one or more processors.
For example, in some examples of the present application, the processor 110 may perform a decoding operation such as entropy decoding on compressed video data (i.e., a code stream) through the VPU, and in particular, may perform entropy decoding on the code stream through an entropy decoding module in the VPU. Wherein the entropy decoding module may be the entropy decoding module 100a or the entropy decoding module 100b1 or the entropy decoding module 100b2 hereinafter.
The memory 180 may be used for storing data, software programs, and modules, and may be a Volatile Memory, such as a Random-Access Memory (RAM) or a Double Data Rate SDRAM (DDR); or a Non-Volatile Memory, such as a Read-Only Memory (ROM), a Flash Memory, a Hard Disk Drive (HDD) or a Solid-State Drive (SSD); or a combination of the above types of memories; or a removable storage medium such as a Secure Digital (SD) memory card. Specifically, the memory 180 may include a program storage area (not shown) and a data storage area (not shown). For example, a data storage area (such as a data storage area in DDR) may be used to store the entropy context backed up when a bit underflow occurs while the processor 110 entropy decodes the code stream.
The mobile communication module 130 may provide a solution including 2G/3G/4G/5G wireless communication applied to the electronic device 100.
The wireless communication module 120 may include an antenna, and implement transceiving of electromagnetic waves via the antenna.
The wireless communication module 120 may provide a solution for wireless communication applied to the electronic device 100, including Wireless Local Area Networks (WLANs) (e.g., Wireless Fidelity (Wi-Fi) networks), Bluetooth (BT), and the like. For example, the electronic device 100 may receive the code stream to be decoded from the electronic device 200 through the wireless communication module 120.
In some embodiments, the mobile communication module 130 and the wireless communication module 120 of the electronic device 100 may also be located in the same module.
The display screen 102 is used for displaying a human-computer interaction interface, images, videos and the like, such as videos after decoding operations, such as entropy decoding and the like.
The sensor module 190 may include a proximity light sensor, a pressure sensor, and the like.
The audio module 150 is used to convert digital audio information into an analog audio signal output or convert an analog audio input into a digital audio signal. In some embodiments, audio module 150 may include speakers, an earpiece, a microphone, and a headphone interface.
The camera 170 may be used to capture images, for example, to capture images in a video to be encoded, such as by entropy encoding.
The interface module 160 includes an external memory interface, a Universal Serial Bus (USB) interface, a Subscriber Identity Module (SIM) card interface, and the like.
It is to be understood that the hardware configuration shown in fig. 2A above does not constitute a specific limitation of the electronic device 100. In other embodiments of the present application, electronic device 100 may include more or fewer components than shown in FIG. 2A, or some components may be combined, some components may be split, or a different arrangement of components.
Fig. 2B is a block diagram illustrating an entropy decoding module 100a of an electronic device 100a according to an embodiment of the present application. As shown in fig. 2B, the entropy decoding module 100a includes: a video decoder 10a, an input buffer area 20a and an external storage area 30 a.
The video decoder 10a is configured to perform entropy decoding on the macroblocks in the bitstream one by one in combination with the entropy context, to obtain the values of the syntax elements and to update and store the entropy context information; it is further configured to back the entropy context information up to the external storage area 30a. In some embodiments, the video decoder 10a may be implemented by a system on chip in the electronic device 100a.
The video decoder 10a includes a core 0, a core 1, a core 2, and a core 3. Each core includes firmware, a context Ram, and an entropy engine (or entropy core). The firmware, which may be implemented in software, is configured to operate on the corresponding context Ram and entropy engine, for example performing read and write operations on the context Ram and controlling the entropy engine to start or stop entropy decoding.
As an example, the core 0 includes firmware 01, a context Ram 02, and an entropy engine 03. Specifically, when a bitstream is input into the core 0, the firmware 01 saves the core information of the core 0 into the external storage area 30a, and the entropy engine 03 entropy decodes the input bitstream and updates the entropy context stored in the context Ram 02. When the entropy engine 03 does not have enough bits to perform entropy decoding, it reports the problem to the firmware 01, indicating that a bit underflow has occurred in the input buffer 20a, and the firmware 01 backs the entropy context stored in the context Ram 02 up to the external storage area 30a. Once the bit underflow problem of the input buffer 20a is eliminated, the firmware 01 reads the entropy context from the external storage area 30a and saves it back into the context Ram 02. In turn, the entropy engine 03 may entropy decode the newly input bitstream in combination with the entropy context in the context Ram 02, thereby resuming entropy decoding.
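A hedged model of this slow-path flow: on underflow the firmware copies the entropy context from the on-chip context Ram out to the off-chip DDR, and copies it back once bits are available again. All class and field names here are illustrative assumptions, not the actual hardware interface.

```python
class SlowPathCore:
    """Models one decoder core whose firmware backs the entropy context up to
    off-chip DDR on underflow and restores it when bits are available."""

    def __init__(self, core_id, external_ddr):
        self.core_id = core_id
        self.context_ram = {}          # on-chip SRAM holding the entropy context
        self.external_ddr = external_ddr

    def on_underflow(self):
        # Firmware: copy the entropy context out to the off-chip DDR
        self.external_ddr[self.core_id] = dict(self.context_ram)

    def on_bits_available(self):
        # Firmware: copy the entropy context back from the off-chip DDR
        self.context_ram = dict(self.external_ddr.pop(self.core_id))

ddr = {}                               # stand-in for the external storage area
core0 = SlowPathCore(0, ddr)
core0.context_ram = {"mps": 1, "state": 37}
core0.on_underflow()                   # backup: the context now also lives in DDR
core0.context_ram = {}                 # on-chip copy is lost/overwritten
core0.on_bits_available()              # restore from DDR and resume decoding
```

Every `on_underflow`/`on_bits_available` pair here corresponds to a round trip across the chip boundary, which is exactly the time and bandwidth cost that the on-chip scheme below the fold is designed to avoid.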
It is to be understood that the above-mentioned context Ram 02 of the video decoder 10a is an on-chip Static Random Access Memory (SRAM) in the system on chip.
The input buffer area 20a is used for buffering a bit stream in compressed video data (i.e., a code stream) received from the electronic device 200.
In some embodiments, the input buffer 20a may be a storage module integrated in the entropy decoding module 100 a. In some other embodiments of the present application, the input buffer 20a may also be an on-chip memory integrated in the system-on-chip of the electronic device 100a, such as a Static Random Access Memory (SRAM) unit; the memory may also be an off-chip memory integrated on the electronic device 100a, such as a double data rate synchronous dynamic random access memory (DDR), a nonvolatile memory, and the like, which is not limited in particular.
The external storage area 30a is used for storing the core information (such as a serial number or a name of a core) of the video decoder 10a and for storing the entropy context that needs to be backed up. The external storage area 30a may be an off-chip memory on the electronic device 100a, such as a double data rate synchronous dynamic random access memory (DDR), a nonvolatile memory, and the like, which is not limited in particular. In addition, in some embodiments, the input buffer 20a and the external storage area 30a may be disposed in the same DDR in the electronic device 100a.
Similarly, for the process of interactively performing the preservation and recovery of the entropy context by each unit in the other cores 1 to 3 in the video decoder 10a and the external storage area 30a, reference may be made to the description related to the core 0, and details thereof are not repeated.
It should be noted that, in the entropy context processing scenario shown in fig. 2B, since the firmware saves and restores the entropy context between the context Ram inside the video decoder and the external DDR where the external storage area is located, saving and restoring the entropy context is the most time-consuming part of the entropy decoding process. Moreover, when the video decoder uses multi-core decoding, the entropy context saving and restoring process is not only time-consuming but also consumes high system bandwidth. This is the low-speed entropy context saving and restoring process shown in fig. 2B.
In order to optimize the process of entropy context processing, the embodiment of the present application improves the hardware of the video decoder 10a shown in fig. 2B described above.
Referring to fig. 3, fig. 3 is a block diagram illustrating components of an entropy decoding module 100b1 of an electronic device 100b according to an embodiment of the present application. As shown in fig. 3, the entropy decoding module 100b1 includes: a video decoder 10b, an input buffer area 20b and an external storage area 30 b.
It should be noted that the structure of the video decoder in the entropy decoding module 100b1 provided in the embodiments of the present application differs from that in the entropy decoding module 100a, which results in the difference between the entropy context saving and restoring flow shown in fig. 3 and that shown in fig. 2B.
More specifically, as shown in fig. 3, one or more milestone Rams 11 and an arbiter 12 are added to the video decoder 10b compared to the video decoder 10a. The one or more milestone Rams 11 are used to store the entropy contexts held in the context Rams of the respective cores, and the arbiter 12 is used to arbitrate the addresses or locations at which the entropy context information of different cores is stored in the one or more milestone Rams 11, e.g., in which milestone Ram 11 the entropy context from the context Ram of a given core is stored.
Accordingly, taking core 0 as an example, when a bit underflow occurs, the entropy context in the context Ram 02 is backed up directly into one on-chip milestone Ram 11, and when the bit underflow problem is eliminated, the entropy context can be restored from the corresponding milestone Ram 11 to the context Ram 02 through the arbiter 12. The firmware no longer needs to save and restore the entropy context between the on-chip context Ram and the DDR where the external storage area 30b is located, which reduces the time consumed by saving and restoring the entropy context and avoids the bandwidth that process would consume between the on-chip context Ram and the off-chip DDR.
Note that the input buffer 20b shown in fig. 3 is the same as the input buffer 20a shown in fig. 2B. The external storage area 30b shown in fig. 3 differs from the external storage area 30a shown in fig. 2B in that the entropy context in the entropy decoding process need not be stored in the external storage area 30b. Thus, as shown in fig. 3, high-speed entropy context saving and restoring is provided between the milestone Ram 11 and the context Ram in each core.
In some embodiments, the video decoder 10b is a video decoder that supports one video standard at a time; that is, after a core in the video decoder 10b completes decoding of one video's code stream according to one video standard, it can then decode another video's code stream according to another video standard. For example, a core in the video decoder 10b performs entropy decoding and other decoding processing on one video code stream using the AV1 standard, and then on another video code stream using the AV1 standard. As an example, in this embodiment, the one or more milestone Rams 11 may be newly added Rams in the video decoder 10b, mainly used to store the entropy context generated when a bit underflow occurs during entropy decoding.
In other embodiments, the video decoder 10b is a video decoder supporting multiple video standards, that is, the cores in the video decoder 10b may decode, in a time-division-multiplexed manner, different video streams that use multiple video standards. For example, while decoding the code stream of one video that uses the AV1 standard, a core in the video decoder 10b may, by time division multiplexing, also decode the code stream of another video that uses another video standard (e.g., the VP9 standard). It is understood that the storage resources in a video decoder 10b supporting multiple video standards are sufficient; for example, such a video decoder 10b includes Rams (denoted time-division-multiplexing Rams) corresponding to the decoding of other video standards (e.g., the VP9 standard). In the multi-standard decoding process, when a core decodes a code stream using AV1, the time-division-multiplexing Ram corresponding to VP9-standard decoding is idle and can be used as a milestone Ram 11 in the AV1-standard decoding process. Then, as an example, in this embodiment, the one or more milestone Rams 11 are all Rams in the video decoder 10b corresponding to the decoding of video standards other than AV1. Alternatively, one part of the one or more milestone Rams 11 consists of Rams corresponding to the decoding of video standards other than AV1, and the other part consists of Rams newly added to the video decoder 10b. That is, the one or more milestone Rams 11 may be implemented by existing Rams in the video decoder 10b, by newly added Rams, or both.
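The Ram-reuse idea above can be sketched as follows: Rams dedicated to a standard that is currently idle are borrowed as milestone Rams for the active standard's decode. The standard names and Ram labels are illustrative assumptions.

```python
def pick_milestone_rams(rams_by_standard, active_standard):
    # Every Ram belonging to a currently inactive standard is idle and can be
    # borrowed as a milestone Ram for the active standard's decoding.
    return [ram for standard, rams in rams_by_standard.items()
            if standard != active_standard
            for ram in rams]

rams = {"AV1": ["av1_ram0"], "VP9": ["vp9_ram0", "vp9_ram1"]}
borrowed = pick_milestone_rams(rams, active_standard="AV1")
```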
Example one
In some embodiments of the present application, the number of milestone Rams 11 in fig. 3 is less than the number of cores in the video decoder 10b. In this case, the arbiter 12 is configured to arbitrate at which storage addresses of the one or more milestone Rams 11 the entropy contexts from the different context Rams should be stored.
The process of an entropy context processing method of the present application is described in detail with reference to the block diagram of the entropy decoding module 100b1 of the electronic device 100b of fig. 3.
Fig. 4 is a flowchart illustrating an entropy context process according to an embodiment of the present application, and the specific process includes the following steps:
S401: the input buffer 20b inputs bitstreams into the cores 0 to 3 of the video decoder 10b, respectively.
As an example, the bitstream input in one core is one macroblock row, and cores 0-3 may input consecutive macroblock rows, respectively.
S402: the firmware in the cores 0-3 of the video decoder 10b stores the respective core information in the external storage area 30b.
The core information stored in the external storage area 30b is related to the macroblock row corresponding to the bitstream; for example, the core information stored for core 0 is related to the macroblock row corresponding to the bitstream to be processed by core 0, and is used to distinguish which core processes the bitstream corresponding to which macroblock row.
S403: the entropy engines in the cores 0-3 of the video decoder 10b perform entropy decoding on the input bitstream and update the entropy contexts in the respective contexts Ram, respectively.
For example, the entropy context in the context Ram 02 in the current core 0 may include parameters such as an entropy context model updated after entropy-decoding the last macroblock.
S404: when the entropy engine 03 in core 0 (among the cores 0 to 3 of the video decoder 10b) does not have enough input bits, the entropy engine 03 reports first information indicating that a bit underflow has occurred to the firmware 01.
It is understood that S404 above takes insufficient input bits at the entropy engine 03 in core 0 as an example, illustrating that the available bits in the input buffer 20b have been consumed without new available bits, i.e., a bit underflow has occurred. In other embodiments, when the input buffer 20b experiences a bit underflow, one or more of the cores 0-3 may detect insufficient input bits and trigger the first information.
S405: firmware 01 in cores 0-3 of video decoder 10b each initiates a write request to arbiter 12 requesting that the entropy context in the corresponding context Ram be written to one or more milestone rams 11.
In some embodiments, cores 0-3 in S405 may simultaneously initiate a write request to arbiter 12.
In other embodiments, core 0 may initiate a write request first, with cores 1-3 initiating write requests later. It can be understood that while core 0 exports its backup entropy context, the entropy engines in cores 1-3 may continue entropy decoding the macroblocks in the currently input bitstream; when the currently entropy-decoded macroblocks are finished, they stop entropy decoding, update the entropy contexts in their respective context Rams, and then request that the updated entropy contexts be backed up into the milestone Rams 11.
In some embodiments, the write request corresponding to each core carries core information of the core, for example, the core information may be an identifier, a name, or a number of the core.
S406: for each write request, the arbiter 12 in the video decoder 10b returns a write permission response to the firmware in the corresponding core.
It is to be understood that a write permission response indicates that the entropy context in the context Ram of the corresponding core is allowed to be written into the one or more milestone Rams 11.
In some embodiments of the present application, after receiving a write request, the arbiter 12 may query whether there is a free storage area in the one or more milestone Rams 11, the address of the free storage area, and so on. If a free area is found, a write permission response is returned; if there is no free area, a write rejection response is returned, indicating that the entropy context in the corresponding context Ram is not allowed to be written into the one or more milestone Rams 11.
In some embodiments, a write permission response may also include the storage address in the milestone Ram 11 allocated by the arbiter 12 for the entropy context in the corresponding context Ram.
S407: firmware in cores 0-3 in video decoder 10b sends the entropy context in the corresponding context Ram to arbiter 12.
It can be understood that the firmware in the cores may send the entropy contexts in the corresponding context Rams to the arbiter 12 at the same time or in sequence, which is not described in detail here.
S408: the arbiter 12 in the video decoder 10b stores the entropy context in the corresponding context Ram to the corresponding milestone Ram 11 (i.e., the milestone Ram 11 in which the allocated storage address is located) according to the storage address corresponding to each core.
In the embodiment of the present application, the video decoder 10b may keep a backup of the entropy context on-chip, without requiring a long time to store the entropy context in the off-chip DDR where the off-chip external storage area 30b is located, which reduces the time consumption and avoids the high bandwidth generated between the on-chip Ram and the off-chip DDR.
In addition, the arbiter 12 may associate a storage address of an entropy context in a core with the core, for example, the storage address of an entropy context in the milestone Ram 11 is associated with the core information of the corresponding core.
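The allocation behaviour of the arbiter described in S405-S408 can be sketched as follows: on a write request it searches for a free milestone Ram, returns a write-permission response (here, simply an address) or a rejection, and associates the allocated address with the requesting core. The class and method names are illustrative assumptions, not the hardware interface.

```python
class MilestoneArbiter:
    """Arbitrates which free milestone Ram slot each core's entropy context
    is written to, and remembers the core-to-address association."""

    def __init__(self, num_milestone_rams):
        self.rams = [None] * num_milestone_rams   # None marks a free storage area
        self.addr_of_core = {}                    # core id -> allocated address

    def request_write(self, core_id):
        for addr in range(len(self.rams)):
            if self.rams[addr] is None and addr not in self.addr_of_core.values():
                self.addr_of_core[core_id] = addr
                return addr      # write-permission response carrying the address
        return None              # write-rejection response: no free area

    def write(self, core_id, entropy_context):
        self.rams[self.addr_of_core[core_id]] = dict(entropy_context)

    def read(self, core_id):
        # Read the stored entropy context back and clear the storage area
        addr = self.addr_of_core.pop(core_id)
        ctx, self.rams[addr] = self.rams[addr], None
        return ctx

arb = MilestoneArbiter(num_milestone_rams=2)   # fewer Rams than the 4 cores
a0 = arb.request_write(0)      # core 0 is granted address 0
a1 = arb.request_write(1)      # core 1 is granted address 1
a2 = arb.request_write(2)      # no free area left: rejected
arb.write(0, {"state": 5})
restored = arb.read(0)         # core 0's context is read back, freeing the slot
a2_retry = arb.request_write(2)
```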
S409: when the video stream in the input buffer 20b is re-input to the cores 0-3 of the video decoder 10b, the firmware in the cores 0-3 initiates a read request to the arbiter 12, respectively.
It will be appreciated that the video stream in the input buffer 20b is re-input to the cores 0-3 of the video decoder 10b, indicating that there are enough bits newly stored in the input buffer 20b for consumption by the entropy engine in each core.
Wherein one read request is used for requesting to read the entropy context in the corresponding milestone Ram 11 and write into the corresponding context Ram.
In some embodiments, the cores 0-3 in S409 may initiate the read requests to the arbiter 12 simultaneously.
In some embodiments, the read request corresponding to each core carries core information of the core.
S410: the arbiter 12 in the video decoder 10b reads the entropy context corresponding to each core according to each read request.
In some embodiments of the present application, the arbiter 12 may query the memory address corresponding to a core after receiving a read request from the core, read the stored entropy context from the memory address, and clear the memory area.
S411: the arbiter 12 in the video decoder 10b saves the read entropy context into the context Ram of the corresponding core.
It is understood that the arbiter 12 may restore the entropy contexts in the milestone Rams 11 to the context Rams in the respective cores at the same time or in a sequential order (e.g., in ascending order of core number), which is not described in detail here.
S412: the entropy engines in the kernels 0-3 in the video decoder 10b re-entropy decode the newly input bitstream in conjunction with the restored entropy context in the corresponding context Ram.
It should be noted that the entropy engine may entropy decode again the macroblock row at which the previous entropy decoding was suspended or failed.
Example two
In other embodiments of the present application, the number of milestone Rams 11 is equal to the number of cores in the video decoder 10b; for example, the context generated by each core is stored in a uniquely corresponding milestone Ram 11. In this case, the arbiter 12 may be eliminated from the video decoder 10b shown in fig. 3, i.e., the arbiter 12 is not required to arbitrate in which of the multiple milestone Rams 11 the entropy context of a core is kept.
Based on the entropy decoding module 100b1 shown in fig. 3, refer to fig. 5, which shows another architecture diagram of an entropy decoding module provided by the embodiments of the present application. The entropy decoding module 100b2 shown in fig. 5 differs from the entropy decoding module 100b1 shown in fig. 3 in that four milestone Rams 11 are provided in the entropy decoding module 100b2, i.e., each core uniquely establishes a connection with a corresponding milestone Ram 11, and no arbiter 12 is provided.
Next, the process of an entropy context processing method of the present application will be described in detail with reference to the block diagram of the entropy decoding module 100b2 of the electronic device 100b in fig. 5.
Fig. 6 is a flowchart illustrating entropy context processing according to an embodiment of the present application, and a specific process includes the following steps:
S601: the input buffer 20b inputs bitstreams into the cores 0 to 3 of the video decoder 10b, respectively.
S602: the firmware in the cores 0-3 of the video decoder 10b stores the core information in the external memory area 30b, respectively.
S603: the entropy engines in the cores 0-3 of the video decoder 10b perform entropy decoding on the input bitstream and update the entropy contexts in the respective contexts Ram, respectively.
S604: when there are insufficient bits input in the entropy engine 03 in the core 0 of the cores 0 to 3 of the video decoder 10b, the entropy engine 03 reports first information indicating that a bit underflow occurs to the firmware 01.
It should be noted that the descriptions of S601-S604 shown in fig. 6 are the same as those of S401-S404 shown in fig. 4 and are not repeated here. The flow shown in fig. 6 differs from that of fig. 4 in that S405 to S412 are replaced with S605 to S607 described below; i.e., the flow shown in fig. 6 does not require the arbiter 12 to decide how the entropy context is stored in or read from the milestone Ram 11.
S605: firmware in the cores 0-3 of the video decoder 10b writes the entropy context in the corresponding context Ram into the milestone Ram 11 corresponding to the core.
In some embodiments, in S605 the firmware in the cores 0 to 3 may directly initiate a write request to the corresponding milestone Ram 11 to request writing of the entropy context information; in addition, the write request may carry the corresponding core information. It is understood that the times at which the firmware in the cores 0 to 3 initiates the write requests to the corresponding milestone Rams 11 are not specifically limited; the requests may be simultaneous or sequential. For example, while core 0 exports its backup entropy context, the entropy engines in cores 1 to 3 may continue entropy decoding the macroblocks in the currently input bitstream; when the currently entropy-decoded macroblocks are finished, they stop entropy decoding, update the entropy contexts in their respective context Rams, and then initiate write requests to the corresponding milestone Rams 11 to request backup storage of the updated entropy contexts.
In some embodiments, the write request corresponding to each core carries core information of the core, for example, the core information may be an identifier, a name, or a number of the core.
In some embodiments, a preset storage area in the milestone Ram 11 corresponding to each core is used for storing the corresponding entropy context. And, when the input buffer 20b again experiences a bit underflow, the entropy context in the preset storage area in each milestone Ram 11 is updated with the entropy context in the corresponding context Ram.
In addition, each core may be associated with a storage address preset in the corresponding milestone Ram 11 for storing the entropy context.
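With this one-to-one mapping, the save and restore operations reduce to direct accesses at a preset address, which can be sketched as follows (all names are illustrative assumptions):

```python
NUM_CORES = 4
milestone_rams = [None] * NUM_CORES        # milestone Ram i belongs to core i

def save_context(core_id, context_ram):
    # Direct write to the core's own milestone Ram: no arbitration needed
    milestone_rams[core_id] = dict(context_ram)

def restore_context(core_id):
    # Direct read back from the preset storage area of the core's own Ram
    return dict(milestone_rams[core_id])

save_context(2, {"mps": 0, "state": 12})   # core 2 backs up on underflow
restored = restore_context(2)              # core 2 restores when bits return
```

Compared with the arbiter-based sketch of example one, the core id itself is the address, which is why the arbiter and its free-slot search disappear.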
S606: when the video stream in the input buffer 20b is re-input to the cores 0-3 of the video decoder 10b, the firmware in the cores 0-3 restores the entropy context in the corresponding milestone Ram 11 to the corresponding context Ram.
It will be appreciated that the bitstream in the input buffer 20b being re-input to the cores 0-3 of the video decoder 10b indicates that enough new bits are stored in the input buffer 20b for the entropy engines in the respective cores to parse and consume.
In some embodiments, the firmware in the cores 0-3 may initiate a read request to the corresponding milestone Ram 11 for requesting to read the entropy context in the corresponding milestone Ram 11 and write to the corresponding context Ram, respectively.
In some embodiments, in S606 the cores 0 to 3 may initiate the read requests to the corresponding milestone Rams 11 simultaneously or sequentially.
In some embodiments, the read request corresponding to each core carries core information of the core.
S607: the entropy engines in the cores 0-3 in the video decoder 10b re-entropy decode the newly input bitstream in combination with the restored entropy contexts in the corresponding context Rams.
It should be noted that the entropy engine may perform entropy decoding again on the macroblock row where the entropy decoding was suspended or failed last time.
In the embodiment of the present application, the video decoder 10b may keep the backup of the entropy context on-chip, without spending a long time storing the entropy context in the off-chip DDR where the external storage area 30b is located, which reduces time consumption and avoids the high bandwidth generated between the on-chip Ram and the off-chip DDR. Moreover, because the number of milestone Rams 11 is the same as the number of context Rams, no additional arbiter is needed to arbitrate the writing or reading of entropy context information among multiple milestone Rams 11 by different context Rams, which simplifies the entropy context processing flow.
FIG. 7 schematically illustrates a block diagram of an example electronic device 100 according to various embodiments of the present application. In one embodiment, the electronic device 100 may include one or more processors 1604, system control logic 1608 coupled to at least one of the processors 1604, system memory 1612 coupled to the system control logic 1608, non-volatile memory (NVM) 1616 coupled to the system control logic 1608, and a network interface 1620 coupled to the system control logic 1608.
In some embodiments, the processor 1604 may include one or more single-core or multi-core processors. In some embodiments, the processor 1604 may comprise any combination of general-purpose processors and dedicated processors (e.g., graphics processors, application processors, baseband processors, etc.). In embodiments in which the electronic device 100 employs an eNB (enhanced Node B) or RAN (Radio Access Network) controller, the processor 1604 may be configured to perform the methods of the various embodiments, e.g., one or more of the embodiments shown in fig. 3A and 3B.
In some embodiments, the system control logic 1608 may include any suitable interface controllers to provide any suitable interface to at least one of the processors 1604 and/or any suitable device or component in communication with the system control logic 1608.
In some embodiments, the system control logic 1608 may include one or more memory controllers to provide an interface to the system memory 1612. The system memory 1612 may be used to load and store data and/or instructions. In some embodiments, the system memory 1612 may include any suitable volatile memory, such as a suitable Dynamic Random Access Memory (DRAM).
The NVM 1616 may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions. In some embodiments, the NVM 1616 may include any suitable non-volatile memory, such as flash memory, and/or any suitable non-volatile storage device, such as at least one of an HDD (Hard Disk Drive), a CD (Compact Disc) drive, and a DVD (Digital Versatile Disc) drive.
The NVM 1616 may include a portion of the storage resources on the device on which the electronic device 100 is installed, or it may be accessible by, but not necessarily a part of, the device. The NVM 1616 can be accessed over a network, for example, via the network interface 1620.
In particular, the system memory 1612 and the NVM 1616 may include a temporary copy and a permanent copy of instructions 1624. The instructions 1624 may include instructions that, when executed by at least one of the processors 1604, cause the electronic device 100 to implement the method shown in fig. 4 or fig. 6. In some embodiments, the instructions 1624, hardware, firmware, and/or software components thereof may additionally/alternatively be disposed in the system control logic 1608, the network interface 1620, and/or the processor 1604.
The network interface 1620 may include a transceiver to provide a radio interface for the electronic device 100 to communicate with any other suitable device (e.g., front end module, antenna, etc.) over one or more networks. In some embodiments, the network interface 1620 may be integrated with other components of the electronic device 100. For example, the network interface 1620 may be integrated into at least one of the processor 1604, the system memory 1612, the NVM 1616, and a firmware device (not shown) having instructions that, when executed by at least one of the processors 1604, cause the electronic device 100 to implement the method shown in fig. 5.
The network interface 1620 may further comprise any suitable hardware and/or firmware to provide a multiple-input multiple-output radio interface. For example, network interface 1620 may be a network adapter, wireless network adapter, telephone modem, and/or wireless modem.
In one embodiment, at least one of the processors 1604 may be packaged together with logic for one or more controllers of the system control logic 1608 to form a System In Package (SiP). In one embodiment, at least one of the processors 1604 may be integrated on the same die with logic for one or more controllers of the system control logic 1608 to form a system on a chip (SoC).
The electronic device 100 may further include: input/output (I/O) devices 1632.
Fig. 8 shows a block diagram of a SoC (System on Chip) 1700 according to an embodiment of the present application. The SoC 1700 is provided in the electronic device 100. In fig. 8, like parts have the same reference numerals. In fig. 8, the SoC 1700 includes: an interconnect unit 1750 coupled to an application processor 1710; a system agent unit 1770; a bus controller unit 1780; an integrated memory controller unit 1740; a video processor 1720, which also includes an entropy decoding module 100b1 or 100b2; a Static Random Access Memory (SRAM) unit 1730; and a Direct Memory Access (DMA) unit 1760. In one embodiment, the SoC 1700 may also include a processor such as, for example, a network or communication processor, a compression engine, a GPU, a high-throughput MIC processor, or an embedded processor.
Embodiments of the mechanisms disclosed herein may be implemented in hardware, software, firmware, or a combination of these implementations. Embodiments of the application may be implemented as computer programs or program code executing on programmable systems comprising at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
Program code may be applied to input instructions to perform the functions described herein and generate output information. The output information may be applied to one or more output devices in a known manner. For purposes of this application, a processing system includes any system having a processor such as, for example, a Digital Signal Processor (DSP), a microcontroller, an Application Specific Integrated Circuit (ASIC), or a microprocessor.
The program code may be implemented in a high level procedural or object oriented programming language to communicate with a processing system. The program code can also be implemented in assembly or machine language, if desired. Indeed, the mechanisms described in this application are not limited in scope to any particular programming language. In any case, the language may be a compiled or interpreted language.
In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, the instructions may be distributed via a network or via other computer-readable media. Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy diskettes, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, flash memory, or a tangible machine-readable memory used to transmit information over the Internet via electrical, optical, acoustical, or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.). Thus, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
In the drawings, some features of the structures or methods may be shown in a particular arrangement and/or order. It should be understood, however, that such specific arrangement and/or ordering may not be required, and rather, in some embodiments, such features may be arranged in a manner and/or order different from that shown in the illustrative figures. In addition, the inclusion of a structural or methodical feature in a particular figure is not meant to imply that such feature is required in all embodiments, and in some embodiments, may not be included or may be combined with other features.
It should be noted that, in the device embodiments of the present application, each unit/module is a logical unit/module. Physically, one logical unit/module may be one physical unit/module, may be a part of one physical unit/module, or may be implemented by a combination of multiple physical units/modules; the physical implementation of the logical unit/module itself is not the most important aspect, as the combination of the functions implemented by these logical units/modules is the key to solving the technical problem addressed by the present application. Furthermore, in order to highlight the innovative part of the present application, the above device embodiments do not introduce units/modules that are less closely related to solving the technical problem addressed by the present application; this does not indicate that no other units/modules exist in the above device embodiments.
It is noted that, in the examples and descriptions of this patent, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprises a" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
While the present application has been shown and described with reference to certain preferred embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present application.

Claims (17)

1. An entropy context processing method, applied to a system on chip including a plurality of cores and at least one storage unit, the method comprising:
detecting that the input bit conditions of the plurality of cores do not meet a preset condition;
the plurality of cores respectively backup the cached entropy context into the at least one storage unit;
detecting that the input bit conditions of the plurality of cores meet the preset condition;
the plurality of cores respectively acquire the corresponding entropy contexts from the at least one storage unit and re-cache the corresponding entropy contexts;
the plurality of cores entropy decode different macroblock rows in an input bitstream in parallel and update the cached entropy contexts.
2. The method according to claim 1, wherein the preset condition is that at least one of the cores has no bitstream input within a preset time length, or that the number of bits of the bitstream input to the cores is smaller than a preset number.
3. The method of claim 2, wherein the number of the at least one storage unit is the same as the number of the plurality of cores, and one of the cores corresponds uniquely to one of the storage units;
each of the plurality of cores is configured to store or retrieve an entropy context in the uniquely corresponding one of the at least one storage unit.
4. The method of claim 1, wherein the system on chip further comprises an arbiter coupled to the at least one storage unit, and the number of the at least one storage unit is less than the number of the plurality of cores;
the method further comprises:
the plurality of cores respectively send a storage request message to the arbiter;
the arbiter allocates a storage address to each of the plurality of cores based on the plurality of storage request messages;
the arbiter respectively returns a response message allowing storage to the plurality of cores;
wherein the entropy context cached by each of the cores is stored in the at least one storage unit according to the storage address allocated by the arbiter.
5. The method of claim 4, wherein one of the storage request messages includes at least one of: identification information of the corresponding core, the corresponding entropy context cached by the core, and first indication information requesting storage of the entropy context;
one of the response messages allowing storage includes at least one of: identification information of the corresponding core, an identification of the storage address allocated to the corresponding core, and second indication information indicating that the corresponding core is allowed to store the corresponding entropy context in the at least one storage unit.
6. The method of claim 5, wherein each of the cores includes firmware, an entropy engine, and a context cache unit;
wherein each of the cores entropy decodes the bitstream by its entropy engine and generates an entropy context; each core caches the entropy context through its context cache unit; and each core initiates a storage request to the arbiter through its firmware, backs up the entropy context cached in the corresponding context cache unit to the at least one storage unit, and restores the entropy context acquired from the at least one storage unit to the corresponding context cache unit.
7. The method of claim 6, wherein the plurality of cores entropy decode the bitstream in units of macroblock rows of the bitstream, and one of the cores entropy decodes the bitstream in units of macroblocks in a macroblock row, wherein one macroblock row corresponds to one core, and an entropy context cached by one of the cores corresponds to one macroblock.
8. The method of claim 7, wherein the system-on-chip is coupled to a target storage unit external to the system-on-chip;
the method further comprises the following steps:
each of the plurality of cores sends a core processing message to the target storage unit, wherein one of the core processing messages includes at least one of: information of the corresponding core, and information of the bitstream, macroblock, or macroblock row input to the corresponding core.
9. A system-on-chip, comprising a plurality of cores and at least one memory unit;
the plurality of cores are used for detecting that the input bit conditions of the plurality of cores do not meet a preset condition, and respectively backing up the cached entropy contexts into the at least one storage unit;
the at least one storage unit is used for storing the entropy contexts from the plurality of cores;
the plurality of cores are also used for detecting that the input bit conditions of the plurality of cores meet the preset condition; respectively acquiring the corresponding entropy contexts from the at least one storage unit and re-caching the corresponding entropy contexts; and entropy decoding different macroblock rows in an input bitstream in parallel and updating the cached entropy contexts.
10. The system according to claim 9, wherein the preset condition is that at least one of the cores has no bitstream input within a preset time length, or that the number of bits of the bitstream input to the cores is smaller than a preset number.
11. The system of claim 10, wherein the number of the at least one storage unit is the same as the number of the plurality of cores, and one of the cores corresponds uniquely to one of the storage units;
each of the plurality of cores is configured to store or retrieve an entropy context in the uniquely corresponding one of the at least one storage unit.
12. The system according to claim 9, further comprising an arbiter coupled to the at least one storage unit, wherein the number of the at least one storage unit is less than the number of the plurality of cores;
the plurality of cores are further used for respectively sending a storage request message to the arbiter;
the arbiter is used for allocating a storage address to each of the plurality of cores based on the plurality of storage request messages, and respectively returning a response message allowing storage to the plurality of cores;
wherein the entropy context cached by each core is stored in the at least one storage unit according to the storage address allocated by the arbiter.
13. The system of claim 12, wherein one of the storage request messages includes at least one of: identification information of the corresponding core, the corresponding entropy context cached by the core, and first indication information requesting storage of the entropy context;
one of the response messages allowing storage includes at least one of: identification information of the corresponding core, an identification of the storage address allocated to the corresponding core, and second indication information indicating that the corresponding core is allowed to store the corresponding entropy context in the at least one storage unit.
14. The system of claim 12, wherein each of the cores includes firmware, an entropy engine, and a context cache unit;
wherein each of the cores entropy decodes the bitstream by its entropy engine and generates an entropy context; each core caches the entropy context through its context cache unit; and each core initiates a storage request to the arbiter through its firmware, backs up the entropy context cached in the corresponding context cache unit to the at least one storage unit, and restores the entropy context acquired from the at least one storage unit to the corresponding context cache unit.
15. The system of claim 14, wherein the plurality of cores entropy decode the bitstream in units of macroblock rows of the bitstream, and one of the cores entropy decodes the bitstream in units of macroblocks in a macroblock row, wherein one macroblock row corresponds to one core, and an entropy context cached by one of the cores corresponds to one macroblock.
16. The system of claim 15, wherein the system-on-chip is coupled to a target storage unit external to the system-on-chip;
each of the plurality of cores is further configured to send core processing information to the target storage unit, wherein the core processing information includes at least one of: information of the corresponding core, and information of the bitstream, macroblock, or macroblock row input to the corresponding core.
17. An electronic device, characterized in that the electronic device comprises the system on chip according to any one of claims 9 to 16, and a target storage unit external to the system on chip; the target storage unit is configured to store the core processing information sent by the plurality of cores, wherein one piece of the core processing information includes at least one of: information of the corresponding core, and information of the bitstream, macroblock, or macroblock row input to the corresponding core.
CN202210715926.3A 2022-06-23 2022-06-23 Entropy context processing method, system on chip and electronic equipment Pending CN115118999A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210715926.3A CN115118999A (en) 2022-06-23 2022-06-23 Entropy context processing method, system on chip and electronic equipment


Publications (1)

Publication Number Publication Date
CN115118999A true CN115118999A (en) 2022-09-27

Family

ID=83328425



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination