CN109451318B - Method, apparatus, electronic device and storage medium for facilitating VR video encoding - Google Patents

Method, apparatus, electronic device and storage medium for facilitating VR video encoding

Info

Publication number
CN109451318B
CN109451318B CN201910022693.7A
Authority
CN
China
Prior art keywords
image
pixel block
pixel
block
adaptation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910022693.7A
Other languages
Chinese (zh)
Other versions
CN109451318A (en)
Inventor
鲍金龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN201910022693.7A priority Critical patent/CN109451318B/en
Publication of CN109451318A publication Critical patent/CN109451318A/en
Application granted granted Critical
Publication of CN109451318B publication Critical patent/CN109451318B/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention provides a method, an apparatus, an electronic device and a storage medium for facilitating VR video encoding. After a VR video image to be encoded is obtained, the VR video image, which comprises a first image and a second image, is divided into pixel blocks with a preset number of rows and columns. Then, for each pixel block in the first image, a pixel block in the second image whose similarity difference value with respect to that pixel block is minimal is determined as its adaptation block, and the distance between the pixel block and the adaptation block is calculated. The depth information of the pixel block is then calculated from that distance. Before encoding, the pixel blocks are grouped according to the depth information, so that the VR video image is divided into at least two groups; during subsequent VR video encoding, a relatively higher code rate can be allocated to the high-quality group than to the low-quality group, thereby improving compression efficiency and saving bandwidth.

Description

Method, apparatus, electronic device and storage medium for facilitating VR video encoding
Technical Field
The present invention relates to the field of video coding, and in particular, to a method, an apparatus, an electronic device, and a storage medium for facilitating VR video coding.
Background
In current video coding schemes, algorithms that perform adaptive layered coding according to video content mainly fall into three types: segmenting the image, analyzing the complexity of coding blocks, and segmenting or recognizing the image content. The main problem with these algorithms is that their computational cost is generally too high, and image segmentation and recognition generally cannot run in real time, so they cannot be applied to live-broadcast video applications with strict real-time requirements.
Panoramic VR sports live broadcasting requires high video resolution and a high frame rate. If an ordinary coding method is used directly, the code rate of the video stream becomes too high, making network live broadcasting extremely difficult.
Disclosure of Invention
It is therefore an object of the present invention to provide a method, an apparatus, an electronic device and a storage medium for facilitating VR video encoding, so as to alleviate the above problems.
In a first aspect, an embodiment of the present invention provides a method for facilitating VR video encoding, where the method includes: acquiring a VR video image to be coded, wherein the VR video image comprises a first image and a second image; dividing the first image and the second image into pixel blocks with preset row and column numbers; for each pixel block in the first image, determining a pixel block in the second image as an adaptation block, and calculating the distance between the pixel block and the adaptation block, wherein the similarity difference between the pixel block and the adaptation block is the minimum; calculating depth information of a pixel block corresponding to the distance based on the distance; before encoding, each pixel block is grouped according to the depth information, so that the VR video image is divided into at least two groups.
In a second aspect, an embodiment of the present invention provides an apparatus for facilitating VR video encoding, where an obtaining module is configured to obtain a VR video image to be encoded, where the VR video image includes a first image and a second image; the dividing module is used for dividing the first image and the second image into pixel blocks with preset row and column numbers; a calculating module, configured to determine, for each pixel block in the first image, a pixel block in the second image as an adaptation block, and calculate a distance between the pixel block and the adaptation block, where a similarity difference between the pixel block and the adaptation block is minimum; the calculation module is further configured to calculate depth information of a pixel block corresponding to the distance based on the distance; and the grouping module is used for grouping each pixel block according to the depth information before coding, so that the VR video image is at least divided into two groups.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory and a processor, which are connected to each other, where the memory stores a computer program, and when the computer program is executed by the processor, the electronic device is caused to perform the method according to the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, in which a computer program is stored, and when the computer program runs on a computer, the computer is caused to execute the method according to the first aspect.
Compared with the prior art, the method, apparatus, electronic device and storage medium for facilitating VR video encoding provided by the embodiments of the invention have the following beneficial effects: after a VR video image to be encoded is obtained, the VR video image, which comprises a first image and a second image, is divided into pixel blocks with a preset number of rows and columns; then, for each pixel block in the first image, a pixel block in the second image whose similarity difference value with respect to that pixel block is minimal is determined as its adaptation block, and the distance between the pixel block and the adaptation block is calculated; the depth information of the pixel block is then calculated from that distance; before encoding, the pixel blocks are grouped according to the depth information, so that the VR video image is divided into at least two groups, and during subsequent VR video encoding a relatively higher code rate is allocated to the video content of the high-quality group than to that of the low-quality group, thereby improving compression efficiency and saving bandwidth.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a block diagram of an electronic device according to an embodiment of the present invention;
fig. 2 is a flowchart of a method for facilitating VR video encoding according to a first embodiment of the present invention;
fig. 3 is a schematic diagram of the distance from a pixel block to an adaptation block according to the first embodiment of the present invention;
FIG. 4 is a diagram illustrating neighborhood pixel blocks according to a first embodiment of the present invention;
FIG. 5 is a diagram illustrating the determination of an adaptation block according to a first embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating the calculation of constants a and b according to the first embodiment of the present invention;
fig. 7 is a block diagram of an apparatus for facilitating VR video encoding according to a second embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined or explained in subsequent figures. Meanwhile, in the description of the present invention, the terms "first", "second", and the like are used only for distinguishing the description, and are not construed as indicating or implying relative importance.
Fig. 1 is a block schematic diagram of an electronic device 100. The electronic device 100 may include: an apparatus that facilitates VR video encoding, a memory 110, a memory controller 120, a processor 130, a peripheral interface 140, an input/output unit 150, an audio unit 160, and a display unit 170. The electronic device 100 may be a user terminal, such as a Personal Computer (PC), a tablet computer, a smart phone or a Personal Digital Assistant (PDA), or a server.
The memory 110, the memory controller 120, the processor 130, the peripheral interface 140, the input/output unit 150, the audio unit 160, and the display unit 170 are electrically connected to each other directly or indirectly, so as to implement data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The means for facilitating VR video encoding includes at least one software functional module that may be stored in the memory 110 in the form of software or firmware (firmware) or solidified in an Operating System (OS) of the electronic device. The processor 130 is configured to execute executable modules stored in the memory 110, such as software functional modules or computer programs included in the apparatus for facilitating VR video encoding.
The Memory 110 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory 110 is used for storing a program; the processor 130 executes the program after receiving an execution instruction, and the method defined by the flow disclosed in any of the foregoing embodiments of the present invention may be applied to, or implemented by, the processor 130.
The processor 130 may be an integrated circuit chip having signal processing capabilities. The Processor 130 may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; but may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present invention may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The peripherals interface 140 couples various input/output devices to the processor 130 and to the memory 110. In some embodiments, the peripheral interface 140, the processor 130, and the memory controller 120 may be implemented in a single chip. In other examples, they may be implemented separately from each other.
The input/output unit 150 is used for the user to provide input data, realizing interaction between the user and the electronic device 100. The input/output unit 150 may be, but is not limited to, a mouse, a keyboard, and the like.
Audio unit 160 provides an audio interface to a user that may include one or more microphones, one or more speakers, and audio circuitry.
The display unit 170 provides an interactive interface (e.g., a user operation interface) between the electronic device 100 and a user, or is used to display image data for the user's reference. In this embodiment, the display unit 170 may be a liquid crystal display or a touch display. A touch display may be a capacitive or resistive touch screen supporting single-point and multi-point touch operations, meaning that the touch display can sense touch operations arising from one or more locations on the display at the same time and send the sensed touch operations to the processor 130 for calculation and processing.
First embodiment
Referring to fig. 2, fig. 2 is a flowchart illustrating a method for facilitating VR video encoding according to a first embodiment of the present invention, where the method is applied to an electronic device. The flow shown in fig. 2 will be described in detail below, and the method includes:
step S110: and acquiring a VR video image to be coded, wherein the VR video image comprises a first image and a second image.
Since a VR video image consists of two views, the VR video image acquired by the electronic device 100 may include a left image and a right image, between which there is a viewing-angle difference.
Optionally, the first image may be a left image, and correspondingly, the second image is a right image; optionally, the first image may be a right image, and correspondingly, the second image is a left image.
Step S120: and dividing the first image and the second image into pixel blocks with preset row and column numbers.
The preset number of rows and columns may be 8 rows and 8 columns, that is, the first image and the second image are each divided into an 8 × 8 grid of pixel blocks, every pixel block having the same size. Of course, as an alternative implementation, a larger (e.g., 9 × 9) or smaller (e.g., 6 × 6) partitioning may be used.
Of course, as an alternative embodiment, before the first image and the second image are divided into the preset number of pixel blocks, in order to improve calculation efficiency, the first image and the second image may each be binarized to obtain a first gradient image corresponding to the first image and a second gradient image corresponding to the second image. Image binarization sets the gray value of each pixel to either 0 or 255, so that the whole image shows an unmistakable black-and-white effect, where 0 represents black and 255 represents white. By choosing a suitable threshold, a binary image that still reflects the global and local features of the image is obtained from the 256-level grayscale image. For example, the gray value of a pixel whose gray value is greater than or equal to the threshold is reset to 255, and the gray value of a pixel whose gray value is less than the threshold is reset to 0.
Binarizing the image facilitates further processing: it simplifies the image, reduces the amount of data, highlights the contour of the target of interest, and reduces computational complexity.
The first image and the second image may be binarized using the Sobel or Canny operator, so as to obtain the first gradient image corresponding to the first image and the second gradient image corresponding to the second image.
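As an illustrative sketch of the preprocessing step above (the Sobel kernel and the fixed threshold of 128 are assumptions for illustration, not values specified by the patent), the gradient computation and thresholding could look like this in Python:

```python
# Sketch of the gradient + binarization preprocessing step.
# The Sobel kernel and the threshold value 128 are illustrative assumptions.

def sobel_gradient(gray):
    """Approximate gradient magnitude of a 2-D grayscale image (list of lists)."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal and vertical Sobel responses.
            gx = (gray[y-1][x+1] + 2*gray[y][x+1] + gray[y+1][x+1]
                  - gray[y-1][x-1] - 2*gray[y][x-1] - gray[y+1][x-1])
            gy = (gray[y+1][x-1] + 2*gray[y+1][x] + gray[y+1][x+1]
                  - gray[y-1][x-1] - 2*gray[y-1][x] - gray[y-1][x+1])
            out[y][x] = abs(gx) + abs(gy)
    return out

def binarize(img, threshold=128):
    """Set every pixel to 255 if its value is >= threshold, else to 0."""
    return [[255 if v >= threshold else 0 for v in row] for row in img]
```

In practice a library routine (e.g., an optimized Sobel filter) would replace the explicit loops; the sketch only mirrors the thresholding rule stated above.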
Step S130: and for each pixel block in the first image, determining a pixel block in the second image as an adaptation block, and calculating the distance between the pixel block and the adaptation block, wherein the similarity difference between the pixel block and the adaptation block is the minimum.
Since the first image and the second image belong to the same VR video image, the difference between them is mainly caused by the viewing-angle difference. Consequently, for each pixel block S in the first image, one adaptation block D corresponding to that pixel block can be found in the second image: when the similarity difference value between S and every pixel block in the second image is calculated, the adaptation block D is the one whose similarity difference value with S is minimal. That is, the pixel block D found in the second image that minimizes the SAD (the similarity difference value) is determined to be the adaptation block corresponding to the pixel block S in the first image, where S may be called the source pixel block and D the destination pixel block.
When the preset number of rows and columns is 8 × 8, optionally, each pixel block in the first image is taken in turn as a source pixel block S. The pixel block matrix corresponding to each pixel block has dimension 8 × 8, and its element values are the pixel values of the points in the block. The similarity difference value between the pixel block and each pixel block in the second image may then be calculated from the formula

SAD = Σ_{i=1}^{8} Σ_{j=1}^{8} |s_ij − d_ij|

where SAD is the similarity difference value, s_ij is the pixel value of the point in row i, column j of the source pixel block, and d_ij is the pixel value of the point in row i, column j of a candidate pixel block in the second image; i and j index the elements of the pixel block matrix.
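The SAD formula above can be sketched directly in Python (the function name and the list-of-lists block representation are illustrative assumptions):

```python
def sad(block_s, block_d):
    """Sum of absolute differences between two equal-size pixel blocks
    (8x8 in the patent's setting): SAD = sum_ij |s_ij - d_ij|."""
    return sum(abs(s - d)
               for row_s, row_d in zip(block_s, block_d)
               for s, d in zip(row_s, row_d))
```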
Referring to fig. 3, after finding the corresponding adaptive block for each pixel block in the first image, a motion displacement, i.e., a distance mv, between the pixel block (i.e., the source pixel block S) and the corresponding adaptive block (i.e., the destination pixel block D) can be calculated through window search.
To calculate mv, the coordinates (x, y) of every pixel block, whether it is a source or a destination pixel block, are determined with the upper-left corner of the half-image containing the block as the origin; the difference between the coordinates of the upper-left pixel of the source pixel block and those of the destination pixel block is the motion vector mv. Since the first image and the second image are left and right views, only the horizontal distance between the two upper-left pixel coordinates constitutes the motion vector mv.
Suppose the coordinates of the source pixel block are (x = 8, y = 16), written S(8,16), and the coordinates of the destination pixel block are (x = 25, y = 16), written D(25,16); then mv = 25 − 8 = 17. Of course, mv may also be negative: for example, if the coordinates of the destination pixel block are (x = −2, y = 16), then the motion vector mv is −2 − 8 = −10.
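Combining the SAD criterion with the horizontal-displacement definition of mv, a minimal sketch of the matching step might be the following; the candidate layout, in which each candidate block carries its x coordinate, is an assumption for illustration and not the patent's API:

```python
def block_sad(a, b):
    """Sum of absolute differences between two equal-size pixel blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def find_mv(source_block, source_x, candidates):
    """Find the adaptation block and return the horizontal motion vector mv.

    `candidates` is a list of (x_coordinate, block) pairs taken from the
    second image; the best match is the candidate with minimal SAD, and
    mv is defined, as in the text, as destination_x minus source_x.
    """
    best_x, _ = min(candidates, key=lambda c: block_sad(source_block, c[1]))
    return best_x - source_x
```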
As another alternative, for a source pixel block S in the first image, when several pixel blocks in the second image all minimize the SAD, the adaptation block of S is determined as follows: a plurality of pixel blocks adjacent to the source pixel block S are first determined as neighborhood pixel blocks, where, referring to fig. 4, the neighborhood pixel blocks may be the 8 pixel blocks centered on S in the first image.
After determining the neighborhood pixel blocks, the electronic device 100 may obtain the position area of the adaptation block corresponding to each pixel block in the neighborhood, count the position distribution of those adaptation blocks, and take the area with the highest frequency of occurrence in the distribution as the target area. Since there is a certain similarity between the neighborhood pixel blocks and the source pixel block S, the adaptation block D of S is also likely to appear in the target area. Accordingly, among the pixel blocks that minimize the SAD, the one belonging to the target area may be determined as the adaptation block corresponding to the source pixel block S. Referring to fig. 5, point A appears most frequently as an adaptation block and is therefore taken as the adaptation block corresponding to the source pixel block S.
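The neighborhood-voting tie-break described above can be sketched as follows; the region labels and the dictionary layout are illustrative assumptions, not the patent's data structures:

```python
from collections import Counter

def resolve_tie(tied_candidates, neighbor_regions):
    """Pick among SAD-tied candidate blocks using the neighborhood vote.

    `neighbor_regions` lists, for each neighborhood pixel block, the region
    label where its adaptation block lies; `tied_candidates` maps candidate
    block ids to their region labels.
    """
    # Target area: the region where neighbors' adaptation blocks occur most often.
    target_region, _ = Counter(neighbor_regions).most_common(1)[0]
    for block_id, region in tied_candidates.items():
        if region == target_region:
            return block_id
    # Fall back to an arbitrary tied candidate if none lies in the target area.
    return next(iter(tied_candidates))
```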
Step S140: and calculating the depth information of the pixel block corresponding to the distance based on the distance.
Optionally, the depth information may be calculated from the formula

Z = a / mv + b

where the depth information is the distance between the photographed object and the camera, Z is the depth information, a and b are constants, and mv is the distance between the source pixel block S and the adaptation block.
Referring to FIG. 6, the formula Z = a / mv + b can be derived as follows:
To establish a relationship between the depth information and mv, consider the distance d between the photographed object and the camera, the parallax mv between the left-eye and right-eye images, and the similarity of triangles; this gives d / dt = dv / mv, where dv is the distance between the camera and the image plane of the left- and right-eye images, and dt is the interpupillary distance. Since dv is not constant, it must be eliminated from d / dt = dv / mv.
Since the focal length of the camera is fk, the camera imaging relation gives 1/d + 1/dv = 1/fk, so that dv can be expressed in terms of d, as in the following formula.
dv=1/(1/fk-1/d)
Substituting the expression for dv into d / dt = dv / mv then gives:
d / dt = 1/((1/fk − 1/d) × mv), so that the expression no longer depends on dv.
Rearranging d / dt = 1/((1/fk − 1/d) × mv) step by step yields:
d*mv=dt/(1/fk-1/d)
d*mv=dt/((d-fk)/(fk*d))
d*mv=dt*fk*d/(d-fk)
(d-fk)*mv=dt*fk
Dividing both sides by mv and rearranging finally gives the equation: d = dt × fk / mv + fk.
Since the interpupillary distance dt and the focal length fk are both constants, letting a = dt × fk and b = fk converts the above equation into:

Z = a / mv + b
Because the above formula contains two unknown parameters a and b, two equations must be constructed and solved simultaneously. The two parameters can be obtained from two sets of real-object calibration images:

d0 = a / mv0 + b
d1 = a / mv1 + b

where d0 is the distance between the real object and the camera in the first set of images, mv0 is the motion vector between the left-eye image and the right-eye image of the object in the first set, d1 is the distance between the real object and the camera in the second set, and mv1 is the motion vector between the left-eye image and the right-eye image of the object in the second set. Since d0, d1, mv0 and mv1 can all be obtained by measurement, substituting them into the system of equations yields the values of a and b.
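The system above is linear in a and b, so it can be solved in closed form. A minimal sketch of solving for the constants and then evaluating Z = a / mv + b (function names are illustrative, and the calibration values in the test are invented):

```python
def solve_depth_constants(d0, mv0, d1, mv1):
    """Solve d0 = a/mv0 + b and d1 = a/mv1 + b for (a, b).

    d0, d1: measured object-to-camera distances for the two calibration
    shots; mv0, mv1: the corresponding measured disparities (nonzero and
    distinct, otherwise the system is degenerate).
    """
    a = (d0 - d1) / (1.0 / mv0 - 1.0 / mv1)
    b = d0 - a / mv0
    return a, b

def depth_from_mv(mv, a, b):
    """Depth of a pixel block from its disparity: Z = a/mv + b."""
    return a / mv + b
```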
Step S150: before encoding, each pixel block is grouped according to the depth information, so that the VR video image is divided into at least two groups.
When encoding an image captured by the camera, a high code rate should be allocated to some regions and a low code rate to others; in this way compression efficiency can be improved and bandwidth saved, i.e., code rate is saved on the premise that there is no obvious difference in subjective quality.
Therefore, by layering the VR video image in this way, during subsequent VR video encoding relatively more code rate is allocated, on top of the existing rate-allocation scheme, to the video images of the high-quality group than to those of the low-quality group, which improves compression efficiency and saves bandwidth while leaving subjective quality essentially unchanged.
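One possible reading of the grouping step uses a single depth threshold to form two groups (the threshold policy and the data layout are illustrative assumptions; the patent only requires "at least two groups"):

```python
def group_blocks_by_depth(block_depths, depth_threshold):
    """Split pixel blocks into a near (high-quality) and a far (low-quality)
    group by a depth threshold.

    `block_depths` maps block ids to their depth values Z.
    """
    near = {bid for bid, z in block_depths.items() if z <= depth_threshold}
    far = set(block_depths) - near
    return near, far
```

At encoding time, the near group would then be allocated a relatively higher code rate than the far group.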
According to the method for facilitating VR video encoding provided by the first embodiment of the invention, after a VR video image to be encoded is obtained, the VR video image, which comprises a first image and a second image, is divided into pixel blocks with a preset number of rows and columns; then, for each pixel block in the first image, a pixel block in the second image whose similarity difference value with respect to that pixel block is minimal is determined as its adaptation block, and the distance between the pixel block and the adaptation block is calculated; the depth information of the pixel block is then calculated from that distance; before encoding, the pixel blocks are grouped according to the depth information, so that the VR video image is divided into at least two groups, and during subsequent VR video encoding relatively more code rate is allocated to the video images of the high-quality group than to those of the low-quality group, which improves compression efficiency and saves bandwidth.
Second embodiment
Referring to fig. 7, fig. 7 is a block diagram illustrating an apparatus 400 for facilitating VR video encoding according to a second embodiment of the present invention. The block diagram of the structure shown in fig. 7 will be explained, and the illustrated apparatus includes:
an obtaining module 410, configured to obtain a VR video image to be encoded, where the VR video image includes a first image and a second image;
a dividing module 420, configured to divide the first image and the second image into pixel blocks with a preset number of rows and columns;
a calculating module 430, configured to determine, for each pixel block in the first image, a pixel block in the second image as an adaptation block, and calculate a distance between the pixel block and the adaptation block, where a similarity difference between the pixel block and the adaptation block is minimum;
the calculating module 430 is further configured to calculate depth information of the pixel block corresponding to the distance based on the distance;
a grouping module 440, configured to group each pixel block according to the depth information before encoding, so that the VR video image is divided into at least two groups.
Optionally, the apparatus further comprises: and the preprocessing module is used for respectively carrying out binarization processing on the first image and the second image to obtain a first gradient image corresponding to the first image and a second gradient image corresponding to the second image.
Optionally, the calculating module 430 is configured to calculate, for each pixel block in the first image, the similarity difference value between that pixel block and each pixel block in the second image according to the formula

SAD = Σ_{i=1}^{8} Σ_{j=1}^{8} |s_ij − d_ij|

where SAD is the similarity difference value, s_ij is the pixel value of the point in row i, column j of the pixel block in the first image, and d_ij is the pixel value of the point in row i, column j of a pixel block in the second image; when a pixel block in the second image is found that minimizes the SAD, that pixel block is determined as the adaptation block corresponding to the source pixel block.
Optionally, the calculating module 430 is further configured to, when a plurality of pixel blocks in the second image all minimize the SAD, determine a plurality of pixel blocks adjacent to the source pixel block as neighborhood pixel blocks; acquire the position area of the adaptation block corresponding to each of the neighborhood pixel blocks, and determine the area in which those adaptation blocks occur most frequently as a target area; and determine, among the plurality of pixel blocks that minimize the SAD, the pixel block belonging to the target area as the adaptation block corresponding to the source pixel block.
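As an illustration (not part of the patent text), the basic SAD matching step can be sketched as follows, without the neighborhood tie-break described above. The grid-aligned exhaustive search and the function names are assumptions; the patent does not specify a search pattern:

```python
def sad(block_a, block_b):
    """SAD = sum over i, j of |S_ij - d_ij| for two equal-sized blocks
    given as lists of rows of ints."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def find_adaptation_block(src_block, second_image, n=8):
    """Search the second image's n-by-n block grid for the block with
    minimal SAD relative to src_block; returns ((row, col), best_sad)."""
    h, w = len(second_image), len(second_image[0])
    best_pos, best_sad = None, None
    for r in range(0, h - n + 1, n):
        for c in range(0, w - n + 1, n):
            cand = [row[c:c + n] for row in second_image[r:r + n]]
            d = sad(src_block, cand)
            if best_sad is None or d < best_sad:
                best_pos, best_sad = (r, c), d
    return best_pos, best_sad
```

The horizontal offset between the source block's position and the returned position plays the role of the "distance" (mv) used below.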
Optionally, the calculating module 430 is further configured to calculate the depth information based on the formula

Z = \frac{a \cdot b}{mv}

wherein Z is the depth information, a and b are constants, and mv is the distance.
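As an illustration (not part of the patent text), the depth calculation reads as the usual stereo relation depth = focal length × baseline / disparity. The concrete constant values below are illustrative assumptions only:

```python
def depth_from_distance(mv, a=1000.0, b=0.064):
    """Depth sketch for Z = (a * b) / mv, where a and b are constants
    (e.g. focal length in pixels and camera baseline in meters, as an
    assumption) and mv is the block's displacement (disparity)."""
    if mv == 0:
        return float('inf')  # zero displacement: treated as infinitely far
    return (a * b) / mv
```

A block that shifts more between the two eye images (larger mv) is nearer to the viewer, so its computed Z is smaller.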
For the process of implementing each function of each functional module of the apparatus 400 for facilitating VR video encoding in this embodiment, please refer to the content described in the embodiments shown in fig. 1 to fig. 5, which is not described herein again.
Furthermore, an electronic device may be provided as shown in fig. 1, including a memory 110 and a processor 120 connected to each other, where the memory 110 stores a computer program; when the computer program is executed by the processor 120, the electronic device 100 performs the method for facilitating VR video encoding according to any embodiment of the present invention.
Furthermore, an embodiment of the present invention provides a computer-readable storage medium, in which a computer program is stored, and when the computer program runs on a computer, the computer is caused to execute the method for facilitating VR video encoding provided by any one of the embodiments of the present invention.
In addition, an embodiment of the present invention further provides a computer program, where the computer program may be stored in a cloud or on a local storage medium, and when the computer program runs on a computer, the computer is enabled to execute the method for facilitating VR video encoding provided in any embodiment of the present invention.
In summary, according to the method, apparatus, electronic device and storage medium for facilitating VR video encoding provided by the embodiments of the present invention: after a VR video image to be encoded is obtained, the VR video image, which includes a first image and a second image, is divided into pixel blocks with a preset number of rows and columns. Then, for each pixel block in the first image, a pixel block in the second image is determined as an adaptation block (the one whose similarity difference value with the pixel block is minimum), and the distance between the pixel block and the adaptation block is calculated. The depth information of the pixel block is then calculated from that distance. Before encoding, the pixel blocks are grouped according to the depth information, so that the VR video image is divided into at least two groups. During subsequent VR video encoding, the higher-quality group is allocated a relatively higher bit rate than the lower-quality group, which improves compression efficiency and saves bandwidth.
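As an illustration (not part of the patent text), the grouping step at the end of the pipeline can be sketched as follows. Cutting at the median depth is an assumption; the patent only requires that the image ends up in at least two groups:

```python
def group_by_depth(depths, near_quota=0.5):
    """Split blocks into two groups by a depth cut so that nearer blocks
    (group 0, small Z) can later be encoded at a higher bit rate than
    farther blocks (group 1). The cut point (median by default) and the
    two-group split are illustrative assumptions."""
    ordered = sorted(depths)
    cut = ordered[int(len(ordered) * near_quota) - 1]
    return [0 if z <= cut else 1 for z in depths]
```

The group index per block could then be fed to the encoder's rate control so that group 0 receives relatively more bits.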
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative and, for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code. It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined or explained in subsequent figures.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (7)

1. A method that facilitates VR video encoding, the method comprising:
acquiring a VR video image to be coded, wherein the VR video image comprises a first image and a second image, and the first image and the second image are images corresponding to different eyes;
dividing the first image and the second image into pixel blocks with preset row and column numbers;
for each pixel block in the first image, determining a pixel block in the second image as an adaptation block, and calculating a distance between the pixel block and the adaptation block, wherein the similarity difference value between the pixel block and the adaptation block is minimum, and the distance represents a motion displacement between the pixel block and the corresponding adaptation block;
calculating depth information of a pixel block corresponding to the distance based on the distance;
before encoding, grouping each pixel block according to the depth information, so that the VR video image is divided into at least two groups; wherein,
the preset number of rows and columns is 8 × 8, and for each pixel block in the first image, determining a pixel block in the second image as an adaptation block includes:
for each pixel block in the first image, taking the pixel block as a source pixel block, calculating a similarity difference value between the source pixel block and each pixel block in the second image based on the formula

SAD = \sum_{i=1}^{8} \sum_{j=1}^{8} |S_{ij} - d_{ij}|

wherein SAD is the similarity difference value, S_{ij} is the pixel in the i-th row and j-th column of the source pixel block, and d_{ij} is the pixel in the i-th row and j-th column of a pixel block in the second image;
when a plurality of pixel blocks in the second image all minimize the SAD, determining a plurality of pixel blocks adjacent to the source pixel block as neighborhood pixel blocks;
acquiring the position area of the adaptation block corresponding to each of the neighborhood pixel blocks, and determining the area in which those adaptation blocks occur most frequently as a target area;
determining, among the plurality of pixel blocks that minimize the SAD, the pixel block belonging to the target area as the adaptation block corresponding to the source pixel block.
2. The method of claim 1, wherein before dividing each of the first image and the second image into a preset number of blocks of pixels, the method further comprises:
and respectively carrying out binarization processing on the first image and the second image to obtain a first gradient image corresponding to the first image and a second gradient image corresponding to the second image.
3. The method of claim 1, wherein calculating depth information of the pixel block corresponding to the distance based on the distance comprises:
calculating the depth information based on the formula

Z = \frac{a \cdot b}{mv}

wherein Z is the depth information, a and b are constants, and mv is the distance.
4. An apparatus that facilitates VR video encoding, the apparatus comprising:
an acquisition module, configured to acquire a VR video image to be encoded, wherein the VR video image comprises a first image and a second image, and the first image and the second image are images corresponding to different eyes;
the dividing module is used for dividing the first image and the second image into pixel blocks with preset row and column numbers;
a calculating module, configured to determine, for each pixel block in the first image, a pixel block in the second image as an adaptation block, and calculate a distance between the pixel block and the adaptation block, where a difference in similarity between the pixel block and the adaptation block is minimum, and the distance represents a motion displacement between the pixel block and the corresponding adaptation block;
the computing module is further configured to obtain depth information of the pixel block corresponding to the distance based on the distance computation;
a grouping module, configured to group each pixel block according to the depth information before encoding, so that the VR video image is divided into at least two groups;
the calculation module is configured to, for each pixel block in the first image, take the pixel block as a source pixel block and calculate a similarity difference value between the source pixel block and each pixel block in the second image based on the formula

SAD = \sum_{i=1}^{8} \sum_{j=1}^{8} |S_{ij} - d_{ij}|

wherein SAD is the similarity difference value, S_{ij} is the pixel in the i-th row and j-th column of the source pixel block, and d_{ij} is the pixel in the i-th row and j-th column of a pixel block in the second image; when a plurality of pixel blocks in the second image all minimize the SAD, determine a plurality of pixel blocks adjacent to the source pixel block as neighborhood pixel blocks; acquire the position area of the adaptation block corresponding to each of the neighborhood pixel blocks, and determine the area in which those adaptation blocks occur most frequently as a target area; and determine, among the plurality of pixel blocks that minimize the SAD, the pixel block belonging to the target area as the adaptation block corresponding to the source pixel block.
5. The apparatus of claim 4, further comprising:
and the preprocessing module is used for respectively carrying out binarization processing on the first image and the second image to obtain a first gradient image corresponding to the first image and a second gradient image corresponding to the second image.
6. An electronic device, comprising a memory and a processor connected to each other, wherein a computer program is stored in the memory, and the computer program, when executed by the processor, causes the electronic device to perform the method of any of claims 1-3.
7. A computer-readable storage medium, in which a computer program is stored which, when run on a computer, causes the computer to carry out the method according to any one of claims 1-3.
CN201910022693.7A 2019-01-09 2019-01-09 Method, apparatus, electronic device and storage medium for facilitating VR video encoding Active CN109451318B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910022693.7A CN109451318B (en) 2019-01-09 2019-01-09 Method, apparatus, electronic device and storage medium for facilitating VR video encoding


Publications (2)

Publication Number Publication Date
CN109451318A CN109451318A (en) 2019-03-08
CN109451318B true CN109451318B (en) 2022-11-01

Family

ID=65543945


Country Status (1)

Country Link
CN (1) CN109451318B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111954085A (en) * 2020-08-06 2020-11-17 咪咕文化科技有限公司 VR video display method, device, network equipment and storage medium
CN114786037B (en) * 2022-03-17 2024-04-12 青岛虚拟现实研究院有限公司 VR projection-oriented adaptive coding compression method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101965733A * 2008-03-09 2011-02-02 LG Electronics Inc. Method and apparatus for encoding or decoding a video signal
CN104427345A (en) * 2013-09-11 2015-03-18 华为技术有限公司 Motion vector acquisition method, acquisition device, video codec and method thereof
CN104702954A (en) * 2013-12-05 2015-06-10 华为技术有限公司 Video coding method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102970529B (en) * 2012-10-22 2016-02-17 北京航空航天大学 A kind of object-based multi-view point video fractal image compression & decompression method
WO2015200820A1 (en) * 2014-06-26 2015-12-30 Huawei Technologies Co., Ltd. Method and device for providing depth based block partitioning in high efficiency video coding


Also Published As

Publication number Publication date
CN109451318A (en) 2019-03-08


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant