CN114650422A - Video frame encoding method, video frame encoding device, electronic equipment and computer readable medium - Google Patents


Info

Publication number
CN114650422A
CN114650422A
Authority
CN
China
Prior art keywords
group
coding unit
region
threshold
video frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011500360.XA
Other languages
Chinese (zh)
Inventor
张韵东
昝劲文
隋红丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Zhongxing Micro Artificial Intelligence Chip Technology Co ltd
Original Assignee
Chongqing Zhongxing Micro Artificial Intelligence Chip Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Zhongxing Micro Artificial Intelligence Chip Technology Co ltd filed Critical Chongqing Zhongxing Micro Artificial Intelligence Chip Technology Co ltd
Priority to CN202011500360.XA priority Critical patent/CN114650422A/en
Publication of CN114650422A publication Critical patent/CN114650422A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124Quantisation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/625Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using discrete cosine transform [DCT]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/80Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Discrete Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Embodiments of the present disclosure disclose a video frame encoding method and apparatus, an electronic device, and a computer-readable medium. One embodiment of the method comprises: acquiring a video frame; performing encoding segmentation on the video frame to obtain a coding unit set; filtering the coding unit set to generate a first threshold and a second threshold; performing multi-level region division on the video frame based on the first and second thresholds to generate multi-level regions, where the multi-level regions include a region-of-interest group, a possible-region group, and a non-region-of-interest group; and encoding the region-of-interest group, the possible-region group, and the non-region-of-interest group separately to obtain a multi-encoded video frame. This embodiment addresses the region division of video frames, reduces the probability of missing the overall target object, and improves encoding efficiency.

Description

Video frame encoding method, video frame encoding device, electronic device, and computer-readable medium
Technical Field
Embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a video frame encoding method, an apparatus, an electronic device, and a computer-readable medium.
Background
Image coding refers to techniques for representing an image, or the information it contains, with as few bits as possible. A common image coding technique reduces the information redundancy of a video frame that has undergone high-precision analog-to-digital conversion, and then re-encodes the redundancy-reduced frame within an acceptable loss range.
However, encoding video frames in this way often suffers from the following technical problems:
First, the video frame cannot be accurately divided into regions, so information in the video frame is lost when each of its regions is encoded.
Second, the video frame cannot be accurately encoded by class: regions of different importance levels cannot be distinguished and encoded accordingly, so the encoding efficiency of the video frame is low.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Some embodiments of the present disclosure propose a video frame encoding method, apparatus, electronic device and computer readable medium to solve one or more of the technical problems set forth in the background section above.
In a first aspect, some embodiments of the present disclosure provide a video frame encoding method, including: acquiring a video frame; performing encoding segmentation on the video frame to obtain a coding unit set; filtering the coding unit set to generate a first threshold and a second threshold; performing multi-level region division on the video frame based on the first and second thresholds to generate multi-level regions, where the multi-level regions include a region-of-interest group, a possible-region group, and a non-region-of-interest group; and encoding the region-of-interest group, the possible-region group, and the non-region-of-interest group separately to obtain a multi-encoded video frame.
In a second aspect, some embodiments of the present disclosure provide a video frame encoding apparatus, including: an acquisition unit configured to acquire a video frame; an encoding segmentation unit configured to perform encoding segmentation on the video frame to obtain a coding unit set; a filtering unit configured to filter the coding unit set to generate a first threshold and a second threshold; a division unit configured to perform multi-level region division on the video frame based on the first and second thresholds to generate multi-level regions, where the multi-level regions include a region-of-interest group, a possible-region group, and a non-region-of-interest group; and an encoding unit configured to encode the region-of-interest group, the possible-region group, and the non-region-of-interest group separately to obtain a multi-encoded video frame.
In a third aspect, some embodiments of the present disclosure provide an electronic device, comprising: one or more processors; a storage device, on which one or more programs are stored, which, when executed by the one or more processors, cause the one or more processors to implement the method as described in any implementation manner of the first aspect.
In a fourth aspect, some embodiments of the disclosure provide a computer readable medium having a computer program stored thereon, where the program when executed by a processor implements a method as described in any of the implementations of the first aspect.
The above embodiments of the present disclosure have the following advantages. The multi-encoded video frame obtained by the video frame encoding method of some embodiments improves the precision of region division, thereby reducing the probability of misjudging or missing the overall target object and improving video frame encoding efficiency. Specifically, target objects are missed because traditional methods cannot divide regions accurately, so the target object cannot be detected completely. The video frame encoding method of some embodiments therefore proceeds as follows. First, a video frame is acquired. Second, encoding segmentation is performed on the video frame to obtain a coding unit set; partitioning the frame into blocks in this way improves its processing efficiency. Then, the coding unit set is filtered to generate a first threshold and a second threshold; the filtering determines the saliency thresholds of the coding units in preparation for the subsequent region division. Next, based on the first and second thresholds, multi-level region division is performed on the video frame to generate multi-level regions; comparing each coding unit's saliency with the two thresholds divides the video frame accurately into multi-level regions and distinguishes the importance of its different parts. Finally, the region-of-interest group, the possible-region group, and the non-region-of-interest group included in the multi-level regions are encoded separately to obtain a multi-encoded video frame.
Encoding the multi-level regions separately improves the video's encoding efficiency. The multi-encoded video frame contains not only the region-of-interest group and the non-region-of-interest group but also the possible-region group. Thanks to the participation of the possible-region group, target objects (such as vehicles) in the video frame can be detected effectively and the probability of missing them is reduced, which in turn improves target detection accuracy and video frame encoding efficiency.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
Fig. 1 is a schematic diagram of an application scenario of a video frame encoding method of some embodiments of the present disclosure;
fig. 2 is a flow diagram of some embodiments of a video frame encoding method according to the present disclosure;
fig. 3 is a schematic structural diagram of some embodiments of a video frame encoding apparatus according to the present disclosure;
fig. 4 is a schematic structural diagram of an electronic device according to a video frame encoding method of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings. The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 is a schematic diagram of an application scenario of a video frame encoding method according to some embodiments of the present disclosure.
In the application scenario of fig. 1, first, a computing device 101 may obtain a video frame 102. Then, the computing device 101 may perform encoding segmentation on the video frame 102 to obtain a set of encoding units 103. Thereafter, the computing device 101 may perform a filtering process on the set of encoding units 103 to generate the first threshold 104 and the second threshold 105. Then, the computing device 101 may perform a multi-level region division process on the video frame 102 based on the first threshold 104 and the second threshold 105 to generate a multi-level region 106, where the multi-level region 106 includes a region-of-interest group, a possible region group, and a region-of-non-interest group. Finally, the computing device 101 may perform encoding processing on the region-of-interest group, the possible region group, and the region-of-non-interest group included in the multi-level region 106, respectively, to obtain a multi-encoded video frame 107.
The computing device 101 may be hardware or software. When the computing device is hardware, it may be implemented as a distributed cluster composed of multiple servers or terminal devices, or may be implemented as a single server or a single terminal device. When the computing device is embodied as software, it may be installed in the hardware devices enumerated above. It may be implemented, for example, as multiple software or software modules to provide distributed services, or as a single software or software module. And is not particularly limited herein.
It should be understood that the number of computing devices in FIG. 1 is merely illustrative. There may be any number of computing devices, as implementation needs dictate.
With continued reference to fig. 2, a flow 200 of some embodiments of a video frame encoding method according to the present disclosure is shown. The video frame coding method comprises the following steps:
step 201, acquiring a video frame.
In some embodiments, the execution body of the video frame encoding method (e.g., computing device 101 shown in fig. 1) may obtain the video frame through a wired or wireless connection. The video frame may be any picture in a target video, for example any frame of a movie or of a television program.
Step 202, performing coding and segmentation on the video frame to obtain a coding unit set.
In some embodiments, the execution body may perform encoding segmentation on the video frame to obtain the coding unit set through the following steps:
First, the video frame is encoded to obtain a video information sequence, which may be a binary sequence of 0s and 1s.
Second, the video information sequence is evenly divided into a predetermined number of sub-sequences to generate the coding unit set.
As an example, the predetermined number may be 4.
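As a rough illustration, the segmentation into coding units can also be pictured as fixed-size block partitioning of the frame's pixel array (a common concrete form of coding units; the binary-sequence formulation above is equivalent at the bitstream level). The block size of 16 and the numpy representation are illustrative assumptions, not part of the disclosure:

```python
import numpy as np

def split_into_coding_units(frame, block=16):
    """Split a 2-D frame into non-overlapping block x block coding units.

    The frame is cropped down to a multiple of the block size; a real
    encoder would pad instead. Block size 16 is an illustrative choice.
    """
    h, w = frame.shape
    h, w = h - h % block, w - w % block
    return [frame[r:r + block, c:c + block]
            for r in range(0, h, block)
            for c in range(0, w, block)]

# A 64x64 frame yields 4 x 4 = 16 coding units of 16x16 samples each.
frame = np.arange(64 * 64, dtype=np.float64).reshape(64, 64)
units = split_into_coding_units(frame)
```

Each resulting block is then processed independently, which is what makes the per-unit saliency computation of the next step possible.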
Step 203, filtering the set of coding units to generate a first threshold and a second threshold.
In some embodiments, the execution body of the video frame encoding method may filter the coding unit set to generate a first threshold and a second threshold, where the first threshold is the region cut-off separating the non-region-of-interest group from the possible-region group, and the second threshold is the region cut-off separating the possible-region group from the region-of-interest group. The filtering process may be, for example, a moving-average smoothing filter or an amplitude-limiting (clipping) filter.
In some optional implementation manners of some embodiments, processing the encoding unit to generate the first threshold and the second threshold may include:
firstly, performing discrete cosine transform processing on each coding unit in the coding unit set to generate a frequency component distribution matrix, and obtaining a frequency component distribution matrix group. In the discrete cosine transform process, the signal of the coding unit is transformed into frequency components including a direct current component and a cosine component. The frequency component is used to represent the magnitude of the amplitude of the frequency vibration of the dc component and the cosine component.
As an example, the frequency component distribution matrix (rendered as an image in the original filing) may take the general form:

    F = | F(0,0)    F(0,1)    ...  F(0,n-1)   |
        | F(1,0)    F(1,1)    ...  F(1,n-1)   |
        | ...                                 |
        | F(n-1,0)  F(n-1,1)  ...  F(n-1,n-1) |

where F(u,v) is used to characterize the frequency component value at the (u,v) position; for example, F(0,0) is the value at the (0,0) position.
And secondly, each frequency component distribution matrix in the group is filtered to generate a filtered frequency component distribution matrix, yielding a filtered frequency component distribution matrix group. The filtering is applied per component: when the normalized frequency value corresponding to a frequency component lies in the range [0.03, 0.25], that component is subjected to the filtering; when its normalized frequency value lies outside [0.03, 0.25], the component is not subjected to the filtering.
As an example, each matrix in the filtered frequency component distribution matrix group (rendered as an image in the original filing) has the same general form as the matrix above, where F(0,0) is used to characterize the filtered frequency component value at the (0,0) position.
And thirdly, the elements included in each filtered frequency component distribution matrix in the filtered matrix group are accumulated to generate a filtered frequency component saliency, taken as the coding unit saliency, yielding a coding unit saliency group. A coding unit's saliency indicates how distinguishable that unit is from the other coding units.
And fourthly, a first threshold and a second threshold are generated based on the coding unit saliency group.
As an example, the execution body may generate the first threshold and the second threshold in various ways based on the coding unit saliency group.
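Steps one to three above (discrete cosine transform, band-pass filtering of the frequency components, and accumulation into a per-unit saliency) can be sketched as follows. This is a minimal numpy sketch under stated assumptions: the patent does not define the normalized-frequency mapping, so sqrt(u^2 + v^2) / (2n) is used here as one plausible reading, and the band-pass is realized by simply discarding out-of-band coefficients before accumulation:

```python
import numpy as np

def dct2(block):
    """Orthonormal 2-D DCT-II computed in separable matrix form: C @ X @ C.T."""
    n = block.shape[0]
    j = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j[None, :] + 1) * j[:, None] / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)  # DC row uses the 1/sqrt(n) normalization
    return C @ block @ C.T

def unit_saliency(block, band=(0.03, 0.25)):
    """Sum the magnitudes of band-passed DCT coefficients as the unit's saliency.

    The normalized frequency of coefficient (u, v) is taken here as
    sqrt(u**2 + v**2) / (2 * n) -- an assumption, since the patent does not
    specify the mapping; coefficients outside [0.03, 0.25] are discarded.
    """
    n = block.shape[0]
    F = dct2(block.astype(np.float64))
    u, v = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    freq = np.sqrt(u ** 2 + v ** 2) / (2 * n)
    in_band = (freq >= band[0]) & (freq <= band[1])
    return float(np.abs(F[in_band]).sum())
```

Under this reading, a flat block has essentially zero saliency (its energy sits entirely in the excluded DC coefficient), while textured blocks score higher, which is what the subsequent thresholding relies on.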
Optionally, generating the first threshold and the second threshold based on the coding unit saliency group may further include the following sub-steps:
the first sub-step, using the coding unit significance set as the abscissa axis and the number of the coding unit sets as the ordinate axis, constructs a plane rectangular coordinate system.
And a second substep of performing interval division processing on the abscissa axis by using a preset interval length based on the plane rectangular coordinate system to obtain a preset number of intervals.
And a third sub-step, scanning each coding unit significance in the coding unit set in sequence to obtain a coding unit significance group.
A fourth substep of constructing a coding unit histogram on the predetermined number of bins based on the coding unit saliency group.
A fifth substep of performing a scan process on the coding unit histogram to generate a first threshold value and a second threshold value, comprising:
generating the first threshold and the second threshold by the following formulas (rendered as an image in the original filing):

    TH1 = X0 + (m + k) · Δx
    TH2 = X0 + (m + k + l) · Δx

where TH1 denotes the first threshold and TH2 denotes the second threshold. X0 denotes the starting coordinate of the abscissa axis of the coding unit histogram. N(i) denotes the number of coding units in the i-th interval of the coding unit histogram, where i ranges over [1, M] and M is the predetermined number of intervals; m denotes the index of the interval in which N(i) attains its maximum. k and l are offsets, counted in intervals, within the predetermined number of intervals. m + k means that, scanning from the m-th interval in the direction of increasing i, after k intervals the first interval with N(i) < δ1 · N(m) is found and taken as the first threshold interval; the two following intervals also satisfy N(m + k + 1) < δ1 · N(m) and N(m + k + 2) < δ1 · N(m), and the index of the first threshold interval is therefore m + k. m + k + l means that, scanning from the (m + k)-th interval in the direction of increasing i, after l intervals the first interval with N(i) < δ2 · N(m) is found and taken as the second threshold interval; the two following intervals also satisfy N(m + k + l + 1) < δ2 · N(m) and N(m + k + l + 2) < δ2 · N(m), and the index of the second threshold interval is therefore m + k + l. δ1 denotes a first predetermined threshold coefficient with value 0.8; δ2 denotes a second predetermined threshold coefficient with value 0.2. Δx denotes the interval length obtained by dividing the abscissa axis of the coding unit histogram into the predetermined number of intervals.
The above formulas constitute an inventive point of the embodiments of the present disclosure and solve the second technical problem mentioned in the background: because regions of different levels cannot be accurately distinguished, the video frame cannot be accurately encoded by class and its encoding efficiency is low. Threshold inaccuracy usually stems from the following factor: without considering how the degrees of saliency of the coding units relate to one another, the coding unit thresholds cannot be set accurately, and so the regions cannot be divided accurately. Resolving this factor enables accurate region division of the video frame. To that end, the formulas first scan the saliency of every coding unit in the set to locate the saliency interval containing the largest number of coding units. Scanning then continues in the direction of increasing saliency until an interval satisfies the first-formula condition and its two adjacent, consecutive successors also satisfy it; this yields the first threshold, which separates out the non-region-of-interest group so that it contains no useful information and subsequent detection of the target object becomes more accurate. Scanning continues again in the direction of increasing saliency until an interval, together with its two adjacent, consecutive successors, satisfies the second-formula condition; this yields the second threshold, which separates out the region-of-interest group.
With two thresholds obtained on the coordinate axis, the video frame is partitioned into three disjoint regions, making the region division more accurate. This reduces the probability of missing the overall target object while improving encoding efficiency.
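Read literally, the scan procedure above can be sketched as follows. The number of bins, the last-interval fallback when no interval satisfies the condition, and the check of exactly two trailing intervals are illustrative assumptions; the patent fixes only the coefficients δ1 = 0.8 and δ2 = 0.2:

```python
import numpy as np

def find_thresholds(saliencies, num_bins=64, d1=0.8, d2=0.2):
    """Scan a saliency histogram for the two region thresholds.

    An interval qualifies when its count and the counts of the next two
    intervals all fall below delta * N(m), where m is the peak interval.
    The bin count and the fallback behavior are assumptions.
    """
    counts, edges = np.histogram(saliencies, bins=num_bins)
    dx = edges[1] - edges[0]          # interval length (delta x)
    x0 = edges[0]                     # start coordinate of the axis (X0)
    m = int(np.argmax(counts))        # peak interval index

    def scan(start, limit):
        for i in range(start, len(counts) - 2):
            if counts[i] < limit and counts[i + 1] < limit and counts[i + 2] < limit:
                return i
        return len(counts) - 1        # fallback when nothing qualifies

    i1 = scan(m + 1, d1 * counts[m])  # first threshold interval, m + k
    i2 = scan(i1 + 1, d2 * counts[m]) # second threshold interval, m + k + l
    return x0 + i1 * dx, x0 + i2 * dx
```

Because δ2 < δ1, the second scan demands a much sparser interval than the first, so TH2 always lands at or beyond TH1 on the saliency axis.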
And 204, performing multi-level region division processing on the video frame based on the first threshold and the second threshold to generate a multi-level region.
In some embodiments, the video frame is subjected to multi-level region division based on the first threshold and the second threshold to generate multi-level regions, which may include a region-of-interest group, a possible-region group, and a non-region-of-interest group. Multi-level region division is an image processing technique whose goal is to segment at least one object region from the video frame, i.e., to find the set of pixels corresponding to the object or its surface; these appear as two-dimensional blobs, one of the basic shape features of a region. The multi-level region division compares each coding unit's saliency with the thresholds to obtain the multi-level regions comprising the region-of-interest group, the possible-region group, and the non-region-of-interest group. A region of interest is a region where the target object is highly likely to appear; a possible region is a region where the target object may appear; a non-region-of-interest contains no area where the target object is likely to appear.
In some optional implementations of some embodiments, the executing body performs a multi-level region dividing process on the video frame based on the first threshold and the second threshold to generate a multi-level region, and may include the following steps:
Firstly, the coding unit saliencies in the coding unit saliency group that are smaller than the first threshold are determined as first coding unit saliencies, giving a first coding unit saliency group.
Secondly, the region corresponding to each first coding unit saliency in the first coding unit saliency group is determined as a non-region-of-interest, giving the non-region-of-interest group.
Thirdly, the coding unit saliencies that are greater than or equal to the first threshold and smaller than the second threshold are determined as second coding unit saliencies, giving a second coding unit saliency group.
Fourthly, the region corresponding to each second coding unit saliency in the second coding unit saliency group is determined as a possible region, giving the possible-region group.
Fifthly, the coding unit saliencies that are greater than or equal to the second threshold are determined as third coding unit saliencies, giving a third coding unit saliency group.
Sixthly, the region corresponding to each third coding unit saliency in the third coding unit saliency group is determined as a region of interest, giving the region-of-interest group.
Step 205, respectively encoding the region-of-interest group, the possible region group and the region-of-non-interest group included in the multi-level region to obtain a multi-encoded video frame.
In some embodiments, the execution subject performs encoding processing on a region-of-interest group, a region-of-possibility group, and a region-of-non-interest group included in the multi-level region, respectively, to obtain a multi-encoded video frame.
In some optional implementations of some embodiments, the execution body may encode the region-of-interest group, the possible-region group, and the non-region-of-interest group included in the multi-level regions respectively, obtaining a multi-encoded video frame through the following steps:
first, each interested region in the interested region group is subjected to first quantization parameter coding processing to obtain a first-level coded video frame. The first quantization parameter encoding process may be a Huffman encoding process.
And secondly, carrying out second quantization parameter coding processing on each possible area in the possible area group to obtain a two-level coded video frame. The second quantization parameter encoding process may be a Huffman encoding process.
And thirdly, carrying out third quantization parameter coding processing on each non-region-of-interest in the non-region-of-interest group to obtain a three-level coded video frame. The third quantization parameter encoding process may be a Huffman encoding process.
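One plausible realization of the three quantization parameter encodings is to assign each group its own quantization parameter, finest for the region-of-interest group. The concrete QP values below are illustrative assumptions; the disclosure only requires that the three groups be encoded with distinct parameters:

```python
def qp_for_region(region):
    """Map a region group to an (illustrative) quantization parameter.

    A smaller QP means finer quantization and more bits. The values 22,
    30, and 38 are assumptions; the method only requires that the three
    groups be encoded with distinct quantization parameters.
    """
    return {"interest": 22, "possible": 30, "non_interest": 38}[region]
```

Spending bits this way concentrates fidelity where the target object is likely to appear, which is what yields the claimed efficiency gain over uniform encoding.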
The above embodiments of the present disclosure have the following advantages: the multi-encoded video frame obtained by the video frame encoding method of some embodiments of the present disclosure improves the precision of region division, thereby reducing the probability of misjudging or missing the target object and improving video frame encoding efficiency. Specifically, the reason a target object may be missed is as follows: conventional methods cannot divide regions accurately, so the target object cannot be fully detected. On this basis, the video frame encoding method of some embodiments of the present disclosure obtains a multi-encoded video frame. First, a video frame is acquired. Second, the video frame is partitioned for encoding to obtain a set of coding units. Partitioning the video frame into a set of coding units allows the frame to be processed block by block, which improves processing efficiency. Then, the set of coding units is filtered to generate a first threshold and a second threshold; the filtering determines thresholds on the coding unit significance in preparation for the subsequent region division. Next, based on the first threshold and the second threshold, multi-level region division processing is performed on the video frame to generate multi-level regions. Comparing the significance of each coding unit against the first and second thresholds divides the video frame accurately into multi-level regions and distinguishes the relative importance of different parts of the frame. Finally, the region-of-interest group, the possible region group, and the region-of-non-interest group included in the multi-level regions are encoded separately to obtain a multi-encoded video frame.
Encoding the multi-level regions to obtain a multi-encoded video frame improves the encoding efficiency of the video. The multi-encoded video frame covers not only the region-of-interest group and the region-of-non-interest group but also the possible region group. Because the possible region group participates in encoding, target objects (such as vehicles) in the video frame can be detected effectively and the probability of missing a target object is reduced, which in turn improves both the accuracy of target object detection and video frame encoding efficiency.
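The pipeline summarized above can be sketched end to end in Python. This is an illustrative reconstruction, not the patented implementation: block variance stands in for the DCT-based coding unit significance, and simple percentiles stand in for the histogram-scanned thresholds of the claims:

```python
# Illustrative sketch: block a frame into coding units, score each unit,
# derive two thresholds, and assign every unit to one of three groups
# (0 = non-interest, 1 = possible, 2 = interest).

def split_into_units(frame, size):
    """Divide a 2-D frame (list of rows) into size x size coding units."""
    h, w = len(frame), len(frame[0])
    return [[row[x:x + size] for row in frame[y:y + size]]
            for y in range(0, h, size) for x in range(0, w, size)]

def significance(unit):
    """Stand-in significance: sample variance of the unit (the patent
    accumulates filtered DCT frequency components instead)."""
    vals = [v for row in unit for v in row]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def classify(scores, th1, th2):
    """Below th1 -> non-interest (0); below th2 -> possible (1); else interest (2)."""
    return [0 if s < th1 else 1 if s < th2 else 2 for s in scores]

frame = [[x * y for x in range(16)] for y in range(8)]   # toy 8x16 frame
units = split_into_units(frame, 4)                       # eight 4x4 coding units
scores = [significance(u) for u in units]
ranked = sorted(scores)
th1, th2 = ranked[len(ranked) // 3], ranked[2 * len(ranked) // 3]  # illustrative percentiles
labels = classify(scores, th1, th2)
```

In the patented method the two thresholds are produced by scanning a significance histogram (claims 3 and 4) rather than by taking percentiles.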
With further reference to fig. 3, as an implementation of the methods shown in the above figures, the present disclosure provides some embodiments of a video frame encoding apparatus. These apparatus embodiments correspond to the method embodiments shown in fig. 2, and the apparatus may be applied to various electronic devices.
As shown in fig. 3, the video frame encoding apparatus 300 of some embodiments comprises: an acquisition unit 301, an encoding division unit 302, a filter processing unit 303, a division processing unit 304, and an encoding processing unit 305. The acquisition unit 301 is configured to acquire a video frame; the encoding division unit 302 is configured to perform encoding division on the video frame to obtain a set of coding units; the filter processing unit 303 is configured to perform filter processing on the set of coding units to generate a first threshold and a second threshold; the division processing unit 304 is configured to perform multi-level region division processing on the video frame based on the first threshold and the second threshold to generate multi-level regions, wherein the multi-level regions include a region-of-interest group, a possible region group, and a region-of-non-interest group; and the encoding processing unit 305 is configured to perform encoding processing on the region-of-interest group, the possible region group, and the region-of-non-interest group included in the multi-level regions, respectively, to obtain a multi-encoded video frame.
It will be understood that the units described in the apparatus 300 correspond to the various steps in the method described with reference to fig. 2. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 300 and the units included therein, and are not described herein again.
Referring now to FIG. 4, shown is a block diagram of an electronic device 400 (e.g., computing device 101 of FIG. 1) suitable for implementing some embodiments of the present disclosure. The electronic device shown in fig. 4 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in fig. 4, the electronic device 400 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 401 that may perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 402 or a program loaded from a storage device 408 into a random access memory (RAM) 403. The RAM 403 also stores various programs and data necessary for the operation of the electronic device 400. The processing device 401, the ROM 402, and the RAM 403 are connected to each other through a bus 404. An input/output (I/O) interface 404 is also connected to the bus 404.
Generally, the following devices may be connected to the I/O interface 404: input devices 406 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 407 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 408 including, for example, tape, hard disk, etc.; and a communication device 409. The communication means 409 may allow the electronic device 400 to communicate wirelessly or by wire with other devices to exchange data. While fig. 4 illustrates an electronic device 400 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may be alternatively implemented or provided. Each block shown in fig. 4 may represent one device or may represent multiple devices as desired.
In particular, according to some embodiments of the present disclosure, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In some such embodiments, the computer program may be downloaded and installed from a network through the communication device 409, or from the storage device 408, or from the ROM 402. The computer program, when executed by the processing apparatus 401, performs the above-described functions defined in the methods of some embodiments of the present disclosure.
It should be noted that the computer readable medium described in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be included in the electronic device; or it may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a video frame; perform encoding division on the video frame to obtain a set of coding units; filter the set of coding units to generate a first threshold and a second threshold; perform multi-level region division processing on the video frame based on the first threshold and the second threshold to generate multi-level regions, wherein the multi-level regions include a region-of-interest group, a possible region group, and a region-of-non-interest group; and perform encoding processing on the region-of-interest group, the possible region group, and the region-of-non-interest group included in the multi-level regions, respectively, to obtain a multi-encoded video frame.
Computer program code for carrying out operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in some embodiments of the present disclosure may be implemented by software or by hardware. The described units may also be provided in a processor, which may be described as: a processor including an acquisition unit, an encoding division unit, a filter processing unit, a division processing unit, and an encoding processing unit. The names of these units do not in some cases constitute a limitation on the units themselves; for example, the acquisition unit may also be described as a "unit that acquires video frames in a video".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
The foregoing description presents only preferred embodiments of the present disclosure and an explanation of the technical principles employed. Those skilled in the art will appreciate that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, a technical solution formed by replacing the above features with features of similar functions disclosed in (but not limited to) the embodiments of the present disclosure.

Claims (9)

1. A video frame encoding method, comprising:
acquiring a video frame;
coding and dividing the video frame to obtain a coding unit set;
filtering the set of coding units to generate a first threshold and a second threshold;
performing a multi-level region division process on the video frame based on the first threshold and the second threshold to generate a multi-level region, wherein the multi-level region includes a region-of-interest group, a possible region group, and a region-of-non-interest group;
and performing encoding processing on the region-of-interest group, the possible region group, and the region-of-non-interest group included in the multi-level region, respectively, to obtain a multi-encoded video frame.
2. The method of claim 1, wherein the filtering the set of coding units to generate a first threshold and a second threshold comprises:
performing discrete cosine transform processing on each coding unit in the coding unit set to generate a frequency component distribution matrix to obtain a frequency component distribution matrix group;
filtering each frequency component distribution matrix in the frequency component distribution matrix group to generate a filtered frequency component distribution matrix as a filtering frequency component distribution matrix, so as to obtain a filtering frequency component distribution matrix group;
accumulating each element included in each filtering frequency component distribution matrix in the filtering frequency component distribution matrix group to generate a filtering frequency component significance as a coding unit significance to obtain a coding unit significance group;
generating a first threshold and a second threshold based on the coding unit significance group.
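The significance computation of claim 2 (discrete cosine transform, filtering, then accumulation of the filtered components) can be illustrated as follows. The claim does not specify the filter, so this sketch assumes it suppresses the lowest-frequency components; the O(N^4) DCT is written out directly for clarity rather than speed:

```python
import math

def dct2(block):
    """Plain 2-D DCT-II of an N x N block (O(N^4); adequate for 8x8)."""
    n = len(block)
    def c(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for y in range(n):
                for x in range(n):
                    s += (block[y][x]
                          * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            out[u][v] = c(u) * c(v) * s
    return out

def unit_significance(block, keep_from=1):
    """Sum of absolute filtered DCT coefficients.  The filter is assumed:
    coefficients with u + v < keep_from (the DC term by default) are
    discarded, keeping the detail-carrying frequency components."""
    coeffs = dct2(block)
    n = len(block)
    return sum(abs(coeffs[u][v])
               for u in range(n) for v in range(n) if u + v >= keep_from)

flat = [[100] * 8 for _ in range(8)]            # uniform block: no detail
edge = [[0] * 4 + [200] * 4 for _ in range(8)]  # strong vertical edge
s_flat = unit_significance(flat)
s_edge = unit_significance(edge)
```

A flat block carries all its energy in the discarded DC coefficient, so its significance is near zero, while a block containing an edge scores high; applying this to every coding unit yields the coding unit significance group.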
3. The method of claim 2, wherein the generating the first and second thresholds comprises:
constructing a plane rectangular coordinate system with the coding unit significance as the abscissa axis and the number of coding units as the ordinate axis;
performing, based on the plane rectangular coordinate system, interval division processing on the abscissa axis with a preset interval length to obtain a predetermined number of intervals;
sequentially scanning the significance of each coding unit in the coding unit set to obtain a coding unit significance group;
constructing a coding unit histogram over the predetermined number of bins based on the coding unit saliency group;
and performing scanning processing on the coding unit histogram to generate a first threshold value and a second threshold value.
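The histogram construction of claim 3 amounts to bucketing the coding unit significances over the predetermined intervals. A minimal sketch follows; the clamping of out-of-range values to the edge bins is an assumption of this sketch, not stated in the claim:

```python
def significance_histogram(scores, x0, dx, num_bins):
    """Count coding units per significance interval
    [x0 + i*dx, x0 + (i+1)*dx); values outside the axis range are
    clamped to the edge bins (an assumption of this sketch)."""
    counts = [0] * num_bins
    for s in scores:
        i = int((s - x0) / dx)
        counts[min(max(i, 0), num_bins - 1)] += 1
    return counts

scores = [0.5, 1.2, 1.7, 2.3, 2.4, 2.6, 7.9]  # toy coding unit significances
hist = significance_histogram(scores, x0=0.0, dx=1.0, num_bins=8)
```

The resulting counts are what the subsequent scanning step (claim 4) traverses to locate the two thresholds.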
4. The method of claim 3, wherein the scan processing the coding unit histogram to generate a first threshold and a second threshold comprises:
generating the first threshold and the second threshold by the following formulas (the published formula appears only as an image; the expressions below are reconstructed from the symbol definitions that follow):
TH1 = x0 + (m + k) · Δx
TH2 = x0 + (m + k + l) · Δx
wherein TH1 represents the first threshold; TH2 represents the second threshold; x0 represents the initial coordinate of the abscissa axis of the coding unit histogram; N(i) represents the number of coding units in the i-th interval of the coding unit histogram, where i ranges over [1, M] and M represents the predetermined number of intervals; m represents the interval number corresponding to the maximum value of N(i); k and l each represent a number of intervals among the predetermined number of intervals; m + k indicates that, scanning from the m-th interval in the direction of increasing i, after k intervals the first interval is found whose N(i) is less than δ1·N(m) and whose two following values N(m + k + 1) and N(m + k + 2) are also less than δ1·N(m); this interval is taken as the first threshold interval, and its interval number is m + k; m + k + l indicates that, continuing the scan from the (m + k)-th interval in the direction of increasing i, after l further intervals the first interval is found whose N(i) is less than δ2·N(m) and whose two following values are also less than δ2·N(m); this interval is taken as the second threshold interval, and its interval number is m + k + l; δ1 represents a preset coefficient of the first threshold, taking the value 0.8; δ2 represents a preset coefficient of the second threshold, taking the value 0.2; and Δx represents the length of each of the predetermined number of intervals into which the abscissa axis of the coding unit histogram is divided.
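The scan of claim 4 can be sketched in Python. The three-consecutive-bin drop test and the 0.8/0.2 coefficients follow the claim text; mapping a bin index j back to the significance value x0 + j·Δx, and the fallback when no sustained drop exists, are assumptions of this sketch:

```python
def scan_thresholds(counts, x0, dx, d1=0.8, d2=0.2):
    """Scan the significance histogram from its peak bin m toward
    increasing i; the first bin where three consecutive counts fall
    below d1 * N(m) fixes the first threshold, and continuing the scan
    until they fall below d2 * N(m) fixes the second."""
    m = counts.index(max(counts))
    peak = counts[m]

    def first_sustained_drop(start, frac):
        for i in range(start, len(counts) - 2):
            if all(counts[i + j] < frac * peak for j in range(3)):
                return i
        return len(counts) - 1  # fallback (an assumption): no sustained drop

    i1 = first_sustained_drop(m, d1)      # interval number m + k
    i2 = first_sustained_drop(i1, d2)     # interval number m + k + l
    return x0 + i1 * dx, x0 + i2 * dx     # map bin indices back to significances

counts = [2, 5, 10, 9, 7, 6, 3, 1, 1, 0]  # toy histogram, peak at bin 2
th1, th2 = scan_thresholds(counts, x0=0.0, dx=1.0)
```

With the toy histogram above, the counts first stay below 0.8 × 10 from bin 4 onward and below 0.2 × 10 from bin 7 onward, so the two thresholds land at those bin positions.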
5. The method of claim 4, wherein said performing a multi-level region partition process on the video frame based on the first threshold and the second threshold to generate a multi-level region comprises:
determining the coding unit significance smaller than the first threshold in the coding unit significance group as a first coding unit significance to obtain a first coding unit significance group;
determining a region corresponding to each first coding unit significance in the first coding unit significance group as a region of non-interest to obtain a region of non-interest group;
determining the coding unit significance of the coding unit significance group which is greater than or equal to the first threshold value and smaller than the second threshold value as a second coding unit significance to obtain a second coding unit significance group;
determining an area corresponding to each second coding unit significance in the second coding unit significance group as a possible area to obtain a possible area group;
determining coding unit significance greater than or equal to the second threshold in the coding unit significance group as third coding unit significance to obtain a third coding unit significance group;
and determining a region corresponding to each third coding unit significance in the third coding unit significance group as a region of interest to obtain the region-of-interest group.
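The three-way division of claim 5 compares each coding unit significance against the two thresholds; it can be written directly as the following sketch:

```python
def divide_regions(significances, th1, th2):
    """Return the coding-unit indices of the three groups of claim 5:
    significance <  th1        -> region-of-non-interest group,
    th1 <= significance < th2  -> possible region group,
    significance >= th2        -> region-of-interest group."""
    non_interest, possible, interest = [], [], []
    for idx, s in enumerate(significances):
        (non_interest if s < th1 else possible if s < th2 else interest).append(idx)
    return non_interest, possible, interest

groups = divide_regions([0.1, 3.5, 9.0, 4.2, 0.4], th1=1.0, th2=5.0)
```

Because the comparisons are exhaustive and mutually exclusive, every coding unit lands in exactly one of the three groups.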
6. The method according to claim 5, wherein the performing encoding processing on the region-of-interest group, the possible region group, and the region-of-non-interest group included in the multi-level region respectively to obtain a multi-encoded video frame comprises:
performing first quantization parameter coding processing on each region of interest in the region of interest group to obtain a primary coded video frame;
performing second quantization parameter coding processing on each possible region in the possible region group to obtain a secondary coded video frame;
and performing a third quantization parameter encoding process on each region of non-interest in the region-of-non-interest group to obtain a tertiary coded video frame.
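Claim 6 assigns a different quantization parameter (QP) to each group. The concrete QP values (22/30/38) and the linear QP-to-step mapping below are illustrative assumptions, not taken from the patent; real codecs map QP to step size exponentially:

```python
# Illustrative per-group quantization: a smaller QP means finer
# quantization, i.e. more bits spent on that region.
QP_TABLE = {"interest": 22, "possible": 30, "non_interest": 38}

def quantize(samples, qp):
    """Toy quantizer: derive a step from the QP and round each sample.
    The linear step map (1 + qp // 6) exists only to keep the sketch
    readable; it is not a real codec's mapping."""
    step = 1 + qp // 6
    return [round(v / step) for v in samples]

region = [10, 21, 33]
coarse = quantize(region, QP_TABLE["non_interest"])  # non-interest: big step
fine = quantize(region, QP_TABLE["interest"])        # interest: small step
```

Because the region-of-interest group receives the smallest QP, its samples survive quantization with the most detail, while the non-interest regions are compressed most aggressively.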
7. A video frame encoding apparatus, comprising:
an acquisition unit configured to acquire a video frame;
the encoding and dividing unit is configured to encode and divide the video frame to obtain an encoding unit set;
a filter processing unit configured to filter-process the set of encoding units to generate a first threshold value and a second threshold value;
a division processing unit configured to perform a multi-level region division processing on the video frame based on the first threshold and the second threshold to generate a multi-level region, wherein the multi-level region includes a region-of-interest group, a possible region group, and a region-of-non-interest group;
and the encoding processing unit is configured to perform encoding processing on the region-of-interest group, the possible region group and the region-of-non-interest group included in the multi-level region respectively to obtain a multi-encoded video frame.
8. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.
9. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-6.
CN202011500360.XA 2020-12-18 2020-12-18 Video frame encoding method, video frame encoding device, electronic equipment and computer readable medium Pending CN114650422A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011500360.XA CN114650422A (en) 2020-12-18 2020-12-18 Video frame encoding method, video frame encoding device, electronic equipment and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011500360.XA CN114650422A (en) 2020-12-18 2020-12-18 Video frame encoding method, video frame encoding device, electronic equipment and computer readable medium

Publications (1)

Publication Number Publication Date
CN114650422A true CN114650422A (en) 2022-06-21

Family

ID=81990495

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011500360.XA Pending CN114650422A (en) 2020-12-18 2020-12-18 Video frame encoding method, video frame encoding device, electronic equipment and computer readable medium

Country Status (1)

Country Link
CN (1) CN114650422A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115396672A (en) * 2022-08-25 2022-11-25 广东中星电子有限公司 Bit stream storage method, device, electronic equipment and computer readable medium
CN115396672B (en) * 2022-08-25 2024-04-26 广东中星电子有限公司 Bit stream storage method, device, electronic equipment and computer readable medium

Similar Documents

Publication Publication Date Title
CN109255337B (en) Face key point detection method and device
CN111918066A (en) Video encoding method, device, equipment and storage medium
CN108600783B (en) Frame rate adjusting method and device and terminal equipment
CN111784712B (en) Image processing method, device, equipment and computer readable medium
CN114140346A (en) Image processing method and device
CN112967207A (en) Image processing method and device, electronic equipment and storage medium
CN110084742B (en) Parallax map prediction method and device and electronic equipment
CN109919220B (en) Method and apparatus for generating feature vectors of video
CN109359727B (en) Method, device and equipment for determining structure of neural network and readable medium
CN114650422A (en) Video frame encoding method, video frame encoding device, electronic equipment and computer readable medium
CN110633383A (en) Method and device for identifying repeated house sources, electronic equipment and readable medium
CN111311486A (en) Method and apparatus for processing image
CN110852250B (en) Vehicle weight removing method and device based on maximum area method and storage medium
WO2023083152A1 (en) Image segmentation method and apparatus, and device and storage medium
WO2023020492A1 (en) Video frame adjustment method and apparatus, and electronic device and storage medium
CN110944211B (en) Interpolation filtering method, device, medium and electronic device for intra-frame prediction
CN110209851B (en) Model training method and device, electronic equipment and storage medium
CN112597788B (en) Target measuring method, target measuring device, electronic apparatus, and computer-readable medium
CN111737575B (en) Content distribution method, content distribution device, readable medium and electronic equipment
CN111414921B (en) Sample image processing method, device, electronic equipment and computer storage medium
CN113705386A (en) Video classification method and device, readable medium and electronic equipment
CN111680754A (en) Image classification method and device, electronic equipment and computer-readable storage medium
CN116912631B (en) Target identification method, device, electronic equipment and storage medium
CN116828180B (en) Video encoding method, apparatus, electronic device, and computer-readable medium
CN112884794B (en) Image generation method, device, electronic equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination