CN106803959B - Video image encoding method, video image decoding method, video image encoding apparatus, video image decoding apparatus, and readable storage medium

Info

Publication number: CN106803959B
Application number: CN201710114491.6A
Authority: CN
Inventors: 杨帆; 荆彦青; 魏学峰; 曹文升; 耿天平
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2017-02-28
Filing date: 2017-02-28
Publication date: 2019-12-27
Anticipated expiration: 2037-02-28
Also published as: CN106803959A

Abstract

Embodiments of the invention disclose a video image encoding and decoding method and apparatus for improving the performance of video image processing. The method comprises: acquiring multiple video frame images from a video source file, and processing any one of them as follows: determining a cropped image of the video frame image, where the cropped image contains the effective pixels of the video frame image; performing contour scanning on the cropped image to generate contour data of the cropped image; transcoding the cropped image according to the contour data to obtain a transcoded image; and compressing the transcoded image to obtain a compressed image corresponding to the video frame image. Rather than processing the whole video image, the embodiments transcode and compress only its effective pixels, which reduces the number of pixels handled during video image processing and improves processing performance.

Description

Video image encoding method, video image decoding method, video image encoding apparatus, video image decoding apparatus, and readable storage medium

Technical Field

The present application relates to the field of video technologies, and in particular, to a method and an apparatus for encoding and decoding a video image.

Background

Video compression reduces the size of an originally recorded high-definition video file as much as possible without degrading its resolution.

In the prior art, video compression is performed with video compression tools such as Format Factory. The compression process directly transcodes the acquired RGBA image into YUV420 data, video-encodes the YUV420 data, and packages the result in the corresponding format to form a video file. Playback of the compressed video reads the data in the video file frame by frame, feeds it to a video decoder for decoding, and transcodes the decoded data to display the final video image.

In the prior art, transcoding and compressing video images during encoding, and decoding and transcoding them during playback, consume considerable processing resources.

Disclosure of Invention

The embodiment of the invention provides a video image encoding and decoding method and device, which can improve the performance of a video image processing process.

In a first aspect, an embodiment of the present invention provides a video image encoding method, where the method includes:

acquiring multi-frame video frame images in a video source file, and determining a cut image of a target video frame image aiming at a target video frame image in the multi-frame video frame images, wherein the cut image comprises effective pixels in the target video frame image, and the target video frame image is any one of the multi-frame video frame images; carrying out contour scanning on the cut image to generate contour data of the cut image, and transcoding the cut image according to the contour data to obtain a transcoded image; and compressing the transcoded image to obtain a compressed image corresponding to the target video frame image.

In a second aspect, an embodiment of the present invention provides a video image decoding method, where the method includes:

decoding the video file to obtain a decoded image; acquiring contour data, and transcoding the decoded image according to the contour data to obtain a cut image; acquiring spatial position data of the cut image, and generating a video image according to the spatial position data and the cut image, wherein the spatial position data indicates the spatial position of the cut image in the video image.

In a third aspect, an embodiment of the present invention provides a video image encoding apparatus, including:

the acquisition unit is used for acquiring a plurality of frames of video frame images in a video source file;

the image cutting unit is used for determining a cut image of a target video frame image aiming at the target video frame image in the multi-frame video frame images, wherein the cut image comprises effective pixels in the target video frame image, and the target video frame image is any one frame of video frame image in the multi-frame video frame images;

the contour data generating unit is used for carrying out contour scanning on the cut image to generate contour data of the cut image;

the image transcoding unit is used for transcoding the cut image according to the contour data to obtain a transcoded image;

and the image compression unit is used for compressing the transcoded image to obtain a compressed image corresponding to the target video frame image.

In a fourth aspect, an embodiment of the present invention further provides a video image decoding apparatus, where the apparatus includes:

the decoding unit is used for decoding the video file to obtain a decoded image;

the transcoding unit is used for acquiring the contour data and transcoding the decoded image according to the contour data to obtain a cut image;

and the video image generating unit is used for acquiring the spatial position data of the cut image and generating the video image according to the spatial position data and the cut image, wherein the spatial position data indicates the spatial position of the cut image in the video image.

According to the technical scheme, the embodiment of the invention has the following advantages:

In the embodiments of the invention, during video compression, multiple video frame images are acquired from a video source file, and any one of them is processed as follows: a cropped image of the video frame image is determined, where the cropped image contains the effective pixels of the video frame image; contour scanning is performed on the cropped image to generate its contour data; the cropped image is transcoded according to the contour data to obtain a transcoded image; and the transcoded image is compressed to obtain a compressed image corresponding to the video frame image. Rather than processing the whole video image, the embodiments first crop out an image containing the effective pixels, obtain contour data by contour scanning, transcode according to the contour data, and compress the transcoded image. Only the effective pixels of the video image are transcoded and compressed, which reduces the number of pixels handled during video image processing and improves processing performance.

Drawings

FIG. 1 is a flow chart of a video image encoding method according to an embodiment of the present invention;

FIG. 2 is a diagram illustrating an embodiment of determining a cropped image from a video image;

FIG. 3 is a schematic diagram of a process for determining contour data from a cropped image according to an embodiment of the present invention;

FIG. 4 is a diagram illustrating a process for determining spatial location data of a cropped image according to an embodiment of the present invention;

FIG. 5 is a diagram illustrating a video information file composition according to an embodiment of the present invention;

FIG. 6 is a flowchart of a video image decoding method according to an embodiment of the present invention;

FIG. 7 is a diagram illustrating processing of a video information file according to an embodiment of the present invention;

FIG. 8 is a diagram illustrating two frames of video images in a video source file to be compressed according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of a clipped image obtained after clipping the two frames of images in FIG. 8 according to an embodiment of the present invention;

FIG. 10 is a schematic diagram illustrating a method for calculating contour data in an image according to an embodiment of the present invention;

FIG. 11 is a schematic diagram of a video image obtained after reducing and cropping an image according to spatial position data in an embodiment of the present invention;

FIG. 12 is a block diagram of a video encoding apparatus according to an embodiment of the present invention;

FIG. 13 is a block diagram of a functional block of an exemplary video decoding apparatus;

FIG. 14 is a schematic diagram of a hardware structure of a terminal device in the embodiment of the present invention.

Detailed Description

The embodiments of the present invention will be described in further detail below with reference to the accompanying drawings.

Video compression transcodes and compresses the original video file to reduce its size without affecting the visual result; during playback, the compressed video file is decompressed, transcoded, and played.

The video compression methods in the prior art cannot remove redundancy effectively, so transcoding, encoding, decoding, transcoding for display, and image display during compression and playback all operate on the full image, which consumes considerable processing resources. In the embodiments of the invention, only the effective pixels of the video image are processed during video compression, which greatly reduces the redundant information handled in compression and playback and lowers the processing cost.

The video image encoding method in the embodiment of the present invention is described in detail below with reference to FIG. 1.

101. Acquiring a plurality of frames of video frame images in a video source file;

The video source file consists of a sequence of frame images. When the video is compressed, the video source file to be processed is obtained, and the video images in it are read and processed frame by frame.

102. For a target video frame image among the multiple video frame images, determining a cropped image of the target video frame image;

The target video frame image is any one of the video frame images. Steps 102 to 104 are performed on the target video frame image, which means that steps 102 to 104 are performed on every frame image of the video.

The human eye is insensitive to pixels whose Alpha channel value is very small; such pixels are highly transparent and are treated as invalid pixels. In the embodiments of the invention, a pixel whose Alpha channel value is less than or equal to a preset channel threshold is defined as an invalid pixel, and a pixel whose Alpha channel value is greater than the preset channel threshold is defined as an effective pixel. Optionally, the preset channel threshold may be 30, that is, a pixel whose Alpha channel value does not exceed 30 is an invalid pixel.

The target video frame image is scanned to determine its effective pixels, and a rectangular image containing those effective pixels is obtained; this rectangular image is the cropped image of the target video frame image.

As shown in FIG. 2, the size of the video image is 512 × 480 and the size of the cropped image is 386 × 230; the cropped image contains the effective pixels of the video image.
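The effective-pixel test and the rectangle that encloses the effective pixels of one frame can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes a frame is a list of rows of (R, G, B, A) tuples with row 0 at the top, and the names ALPHA_THRESHOLD, is_effective and effective_rectangle are hypothetical.

```python
ALPHA_THRESHOLD = 30  # optional threshold value mentioned in the description


def is_effective(pixel, threshold=ALPHA_THRESHOLD):
    """A pixel is effective when its Alpha value is strictly above the threshold."""
    return pixel[3] > threshold


def effective_rectangle(frame, threshold=ALPHA_THRESHOLD):
    """Return (left, top, width, height) of the smallest rectangle that
    contains every effective pixel, or None if the frame has none.

    Assumption: frame is a list of rows, row 0 is the top row.
    """
    rows = [y for y, row in enumerate(frame)
            if any(is_effective(p, threshold) for p in row)]
    cols = [x for row in frame for x, p in enumerate(row)
            if is_effective(p, threshold)]
    if not rows:
        return None
    left, right = min(cols), max(cols)
    top, bottom = min(rows), max(rows)
    return (left, top, right - left + 1, bottom - top + 1)
```

For a frame without any Alpha information, the description later states that full-image processing is used by default; the sketch does not cover that case.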

103. Carrying out contour scanning on the cut image to generate contour data of the cut image, and transcoding the cut image according to the contour data to obtain a transcoded image;

The cropped image of the target video frame image contains the effective pixels of that frame, but because the cropped image is rectangular, it may also include some invalid pixels.

Therefore, contour scanning is performed on the cropped image to obtain contour data of its effective pixels; the contour data is the set of outermost effective pixel points in the cropped image. The cropped image is then transcoded according to the contour data to obtain a transcoded image.

Optionally, the contour data of the cropped image may be generated as follows: the cropped image is scanned line by line, and the start position point and the end position point of the effective pixels of each line are recorded. The start and end position points of the effective pixels of all lines constitute the contour data of the cropped image.

As shown in FIG. 3, for one line of the cropped image, the start position point of the effective pixels is 40 pixels from the left boundary of the cropped image, and the end position point is 30 pixels from the right boundary of the cropped image.
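The line-by-line contour scan can be sketched as below. This is an illustrative sketch under assumptions: each line is a list of (R, G, B, A) tuples, start and end are stored as column indices counted from the left boundary, and a line without effective pixels is marked with start and end both equal to the line length, following the convention given later in the description. The function name scan_contour is hypothetical.

```python
def scan_contour(cropped, threshold=30):
    """Return one (start, end) pair per line of the cropped image.

    start/end are the column indices of the first and last effective pixel.
    A line with no effective pixel is recorded as (len(row), len(row)).
    """
    contour = []
    for row in cropped:
        effective_cols = [x for x, p in enumerate(row) if p[3] > threshold]
        if effective_cols:
            contour.append((effective_cols[0], effective_cols[-1]))
        else:
            # Sentinel: both points equal to the length of the line.
            contour.append((len(row), len(row)))
    return contour
```

With this representation, the line in FIG. 3 would have start index 40, and an end index 30 pixels short of the right boundary.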

Optionally, transcoding the cropped image according to the contour data means transcoding the pixels inside the contour indicated by the contour data and setting the pixels outside that contour to zero, so that only the effective pixels of the cropped image are transcoded.
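Contour-guided transcoding can be sketched as follows. The description does not specify the conversion matrix or the chroma subsampling, so this sketch assumes a full-range BT.601 RGB to YUV conversion applied per pixel, without 4:2:0 subsampling; the function names are hypothetical.

```python
def rgb_to_yuv(r, g, b):
    """Full-range BT.601 RGB -> YUV (an assumed matrix, for illustration)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    v = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return tuple(min(255, max(0, round(c))) for c in (y, u, v))


def transcode_with_contour(cropped, contour):
    """Convert only pixels inside each line's [start, end] span; zero the rest."""
    transcoded = []
    for row, (start, end) in zip(cropped, contour):
        out_row = []
        for x, (r, g, b, _alpha) in enumerate(row):
            if start <= x <= end:
                out_row.append(rgb_to_yuv(r, g, b))
            else:
                # Outside the contour: set to zero, no conversion is computed.
                out_row.append((0, 0, 0))
        transcoded.append(out_row)
    return transcoded
```

The saving comes from skipping the conversion arithmetic for every pixel outside the span, which is where the invalid pixels of the rectangular cropped image lie.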

104. Compressing the transcoded image to obtain a compressed image corresponding to the target video frame image.

After the cropped image has been transcoded according to the contour data, the resulting transcoded image is compressed to obtain a compressed image. The compression may be lossless, lossy, or a combination of lossy and lossless compression.
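As a stand-in for the compression step, the sketch below packs a transcoded image into bytes and compresses it losslessly with zlib. This is only an illustration of the lossless option: the description later feeds YUV data to a video encoder, which zlib does not model. The function names and the flat three-bytes-per-pixel layout are assumptions.

```python
import zlib


def compress_transcoded(transcoded):
    """Pack (Y, U, V) tuples row by row into bytes and compress losslessly."""
    raw = bytes(c for row in transcoded for pixel in row for c in pixel)
    return zlib.compress(raw)


def decompress_transcoded(blob, width, height):
    """Inverse of compress_transcoded for an image of known width and height."""
    raw = zlib.decompress(blob)
    pixels = [tuple(raw[i:i + 3]) for i in range(0, len(raw), 3)]
    return [pixels[r * width:(r + 1) * width] for r in range(height)]
```

Because the pixels outside the contour were set to zero during transcoding, they form long constant runs that a general-purpose compressor handles cheaply.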

After steps 102 to 104 have been performed on every frame image of the video, the resulting compressed images are stored as a video file.

In the embodiments of the invention, the whole video image is not processed. Instead, the cropped image containing the effective pixels of the video image is cropped out first, contour scanning is performed to obtain contour data, the cropped image is transcoded according to the contour data, and the transcoded image is compressed. Only the effective pixels of the video image are therefore transcoded and compressed, which reduces the number of pixels handled during video image processing and improves processing performance.

Optionally, the cropped image of the target video frame image is determined as follows: the video image sequence to be compressed is scanned, and a cropping rectangle is determined according to the effective pixel points of each video frame image. The cropping rectangle indicates the cropping size of the video images: the cropped image of each video image is determined according to the size of the cropping rectangle, has the same length and width as the cropping rectangle, and contains the effective pixels of that video frame image.

The specific process of determining the clipping rectangle according to the effective pixel points of each video frame image may be as follows:

Each video frame image among the multiple video frame images is acquired and its effective rectangle is determined, the effective rectangle of a video frame image being the minimum rectangular region that contains the effective pixel points of that image. A cropping rectangle is then determined from the effective rectangles of the video frame images: its width value is the maximum of the width values of the effective rectangles, and its length value is the maximum of the length values of the effective rectangles. That is, the maximum length and maximum width over the effective rectangles of all frames are taken as the length and width of the cropping rectangle.
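The selection of the cropping rectangle reduces to taking two maxima over all frames, as sketched below. The function name and the (length, width) tuple layout are assumptions, and the per-frame sizes in the usage note are invented purely for illustration.

```python
def cropping_rectangle_size(effective_sizes):
    """Given the (length, width) of each frame's effective rectangle, return
    the (length, width) of the cropping rectangle shared by all frames."""
    sizes = list(effective_sizes)
    if not sizes:
        raise ValueError("at least one effective rectangle is required")
    max_length = max(length for length, _ in sizes)
    max_width = max(width for _, width in sizes)
    return (max_length, max_width)
```

For example, hypothetical effective rectangles of 300 × 420, 352 × 200 and 120 × 90 give a cropping rectangle of 352 × 420. The two maxima may come from different frames, which is why every frame's effective pixels fit inside the shared rectangle.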

In addition, in order to restore the video file during the playing of the video file, in the processing of the video image, after determining the cropped image of the target video frame image, the spatial position of the cropped image in the target video frame image needs to be determined to obtain the spatial position data of the cropped image.

Optionally, the spatial position data of the cropped image in the target video frame image is determined as follows: an offset vector of the cropped image in the target video frame image is calculated. Specifically, with the lower left corner of the video image as the coordinate origin, the offset vector is the coordinate of the lower left corner of the cropped image's rectangle relative to that origin. The offset vector is the spatial position data of the cropped image in the target video frame image.

As shown in FIG. 4, continuing the example of FIG. 2, the offset vector of the cropped image relative to the video image is (50, 60).
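If the crop position is known in the usual top-down row order of an image buffer, the lower-left-origin offset vector can be derived as in the sketch below. The top-down input convention and the function name are assumptions; only the lower-left origin comes from the description.

```python
def offset_vector(frame_height, left, top, crop_height):
    """Offset of the cropped image's lower left corner, measured from the
    lower left corner of the video image.

    left/top locate the crop's top left corner with row 0 at the top
    (an assumed input convention).
    """
    offset_x = left
    offset_y = frame_height - (top + crop_height)
    return (offset_x, offset_y)
```

With the sizes from FIG. 2 (a 512 × 480 video image and a 386 × 230 cropped image), a crop whose top left corner sits at column 50, row 190 yields the offset (50, 60).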

After the spatial position data of the cropped images is obtained, the spatial position data of the cropped image corresponding to each video image is stored in a spatial position file.

In addition, in order to restore the video file during the playing of the video file, it is necessary to store the contour data of the cropped image corresponding to each video frame image as a contour data file.

As shown in FIG. 5, the video information file formed after the video image encoding shown in FIG. 1 includes a video file, a spatial position file, and a contour data file.
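The three parts of the video information file can be pictured with a simple container type. The description does not define an on-disk layout, so the class and field names below are hypothetical and only mirror the three components listed above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Offset = Tuple[int, int]        # (x, y) from the lower left corner
LineSpan = Tuple[int, int]      # (start, end) of effective pixels in a line


@dataclass
class VideoInfo:
    """Hypothetical in-memory view of the video information file."""
    video_file: bytes                                   # compressed images
    spatial_positions: List[Offset] = field(default_factory=list)
    contours: List[List[LineSpan]] = field(default_factory=list)

    def frame_count(self) -> int:
        """One spatial position entry is kept per frame."""
        return len(self.spatial_positions)
```

Keeping the contour and position data alongside the video file is what allows the decoder to process only the effective pixels and then place the cropped image correctly.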

A method for decoding the video information file generated by the video image encoding method shown in FIG. 1 is described in detail below with reference to FIG. 6.

601. Decoding the video file to obtain a decoded image;

When a video is played, the video information file is obtained first. As shown in FIG. 7, the video information file includes a video file, a spatial position file, and a contour data file; images are decoded and transcoded according to this information, and the video image is finally displayed.

The video file is obtained from the video information file and decoded to obtain each decoded image.

602. Acquiring contour data, and transcoding the decoded image according to the contour data to obtain a cut image;

The contour data file is obtained from the video information file, and the contour data corresponding to each image is obtained from the contour data file.

Optionally, the contour data includes the start position point and the end position point of the effective pixels of each line. Transcoding the decoded image according to the contour data to obtain a cropped image then means transcoding, for each line of the decoded image, the pixels between the start position point and the end position point of that line.
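The decoder-side transcoding can be sketched in the same way as on the encoder side. The sketch assumes a full-range BT.601 YUV to RGB conversion, unsubsampled (Y, U, V) tuples per pixel, and that pixels inside the span are made fully opaque while the rest stay transparent; the description does not state how Alpha is recovered, so that part is purely an assumption, as are the function names.

```python
def yuv_to_rgb(y, u, v):
    """Full-range BT.601 YUV -> RGB (an assumed matrix, for illustration)."""
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    return tuple(min(255, max(0, round(c))) for c in (r, g, b))


def decode_with_contour(decoded, contour):
    """Convert only the pixels between start and end of each line to RGBA."""
    cropped = []
    for row, (start, end) in zip(decoded, contour):
        out_row = []
        for x, (y, u, v) in enumerate(row):
            if start <= x <= end:
                out_row.append(yuv_to_rgb(y, u, v) + (255,))  # assumed opaque
            else:
                out_row.append((0, 0, 0, 0))  # left transparent, not converted
        cropped.append(out_row)
    return cropped
```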

603. Acquiring spatial position data of the cropped image, and generating a video image according to the spatial position data and the cropped image.

The spatial position file is obtained from the video information file, and the spatial position data corresponding to each cropped image is obtained from the spatial position file; the spatial position data of a cropped image indicates the spatial position of that cropped image in the video image.

Then, the cropped image is restored to a video image according to the spatial position data. For example, if the size of the video image is 512 × 480 and the spatial position data of the cropped image is (50, 60), the video image is restored to 512 × 480 with the lower left corner of the cropped image offset by the vector (50, 60) from the lower left corner of the video image.
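The restoration step can be sketched as pasting the cropped image onto an empty canvas of the original size. The sketch assumes images are lists of rows with row 0 at the top, that the untouched area is filled with fully transparent pixels, and that the cropped image fits inside the frame at the given offset; the function name is hypothetical.

```python
def restore_frame(cropped, offset, frame_size, background=(0, 0, 0, 0)):
    """Place the cropped image into a frame of frame_size = (width, height).

    offset is the lower-left-origin vector stored as spatial position data.
    """
    frame_w, frame_h = frame_size
    off_x, off_y = offset
    crop_h = len(cropped)
    # Convert the lower-left offset into the top row index of the crop.
    top = frame_h - off_y - crop_h
    frame = [[background] * frame_w for _ in range(frame_h)]
    for dy, row in enumerate(cropped):
        frame[top + dy][off_x:off_x + len(row)] = row
    return frame
```

For the 512 × 480 example with offset (50, 60) and a 386 × 230 cropped image, the crop occupies rows 190 to 419 and columns 50 to 435 of the restored frame.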

In the embodiments of the invention, after a video image has been processed with the video image encoding method shown in FIG. 1, the video file can be decoded to obtain a decoded image, the contour data is obtained and the decoded image is transcoded according to it to obtain a cropped image, and the spatial position data of the cropped image is obtained and used together with the cropped image to generate the video image. The video file contains only the information of the effective pixels rather than the information of the whole image. During playback, the video file is decoded and transcoded with the help of the contour data file and the spatial position data to obtain the finally displayed video image, without decoding and transcoding the whole image, which reduces the redundant information handled during playback and the time spent processing the video image.

The method in the embodiments of the present invention is described below with reference to a specific application scenario.

First, a video source file to be compressed is input, whose frame images are in RGBA format. Spatial position scanning is performed according to the Alpha channel information of the input RGBA images to obtain the cropped images and the spatial position file, and contour scanning is performed on the cropped images to generate the contour data file. The specific process is as follows:

The frame image sequence in the video source file to be compressed is scanned, and the length and width of the effective rectangle of the effective pixels of each video frame are obtained. The maximum length and the maximum width over the effective rectangles of all frames are taken as the length and width of the cropping rectangle, and each frame is cropped according to the size of the cropping rectangle.

For example, the video source file to be compressed contains 3920 video images in total. FIG. 8 shows the 1024th and 2275th frames of the input image sequence, each 512 × 480 in size.

The 3920 frames of video images in the video source file are scanned, and the effective rectangular region of the effective pixels of each frame is determined. The maximum length of these regions is found to be 352 px and the maximum width 420 px, so the rectangle size of the cropped image is finally determined to be 352 × 420.

The 1024th and 2275th frames are cropped according to the 352 × 420 rectangle, and the resulting cropped images are shown in FIG. 9. As can be seen from FIG. 9, Alpha cropping reduces the image size significantly while fully preserving the effective pixels of the image.

Meanwhile, so that the video image can be restored during playback, the spatial position data of each cropped image must be stored when the image is processed; the spatial position data is the offset vector of the cropped image relative to the image before cropping. For the two frames shown in FIG. 9, the vectors (10, 50) and (160, 50) are retained, respectively, and stored in the spatial position file.

If no information indicating invalid pixels, such as an Alpha channel, is available during image processing, full-image processing is performed by default.

Second, the cropped image is scanned line by line, and the start point and end point of the effective pixels of each line are recorded.

As shown in FIG. 10, Start and End mark the start point and end point of the effective pixels of a line, respectively. The start and end points of all lines indicate the effective contour of the image, and the start and end points of each line of pixels are stored in the contour data file. If the start point and the end point of a line are both equal to the length value of the cropped image, the line contains no effective pixels.

Similarly, if no information indicating invalid pixels, such as an Alpha channel, is available while the cropped image is scanned, every pixel point is treated as an effective pixel point by default and the whole cropped image is processed.

After the contour data is obtained, targeted transcoding is performed according to it: full-image transcoding is not needed, and only the effective pixels inside the contour are transcoded. The cropped image is transcoded according to the contour data to generate the YUV file required for encoding, and the YUV file is input to an encoder and compressed to form the final video file.

During playback of the video, the video file is first decompressed to obtain YUV images.

Next, the YUV images are transcoded according to the effective-pixel contour marks in the contour data file to obtain the cropped images. Each line is transcoded only between the start point and end point marked in the contour data file, which greatly improves transcoding efficiency.

Finally, the spatial position data is acquired and the transcoded cropped image is displayed according to it. Specifically, after the cropped image is obtained, its display position is restored according to the spatial position data of each frame. As shown in FIG. 11, the spatial position data of the cropped image is the vector (10, 50), and the finally displayed video image is obtained after the position is restored according to this vector. Note that the inner rectangular frame line in FIG. 11 only illustrates the boundary of the cropped image and is not displayed in the video image.

As can be seen from FIG. 11, after an image has been processed with the video image encoding method of the embodiments of the invention, it can be completely restored when the video image is decoded and played, and the display effect is the same as that of full-image processing in the prior art.

For the images in FIG. 8 to FIG. 11, the texture refreshed per frame shrinks from 512 × 480 to 352 × 420, a reduction of 30% in texture refresh, which increases the refresh rate.

With the technical scheme of the embodiments of the invention, the transcoding and encoding work during video compression and the decoding and transcoding work during video playback are all reduced because fewer pixels are processed, so performance is greatly improved. Fewer pixels are processed because the contour data file and the spatial position file indicate the effective pixels of each image, and only those effective pixels are processed.

With the scheme of the embodiments of the invention, the performance improvement is significant on mobile client platforms (Android and iOS) as well as on embedded platforms (ARM and x86). On Android, for the example images in FIG. 8 to FIG. 11, the size of each processed frame is reduced from 512 × 480 to 352 × 420, so the number of pixels encoded, decoded, and refreshed for display is reduced by 30%, while the number of pixels transcoded during compression and playback is reduced by 75%, and the processing speed is increased by 3 times.

The above is an explanation of the method in the embodiment of the present invention, and the video image encoding apparatus and the video image decoding apparatus in the embodiment of the present invention are explained below from the perspective of functional modules.

Fig. 12 shows a functional block structure of a video image encoding apparatus according to an embodiment of the present invention, which specifically implements functions corresponding to the video image encoding methods provided in fig. 1 to 11. The functions can be realized by hardware, and the functions can also be realized by executing corresponding software programs by hardware. The hardware and software include one or more unit modules corresponding to the above functions, which may be software and/or hardware.

Specifically, the video image encoding device includes:

an obtaining unit 1201, configured to obtain a plurality of frames of video frame images in a video source file;

an image cropping unit 1202, configured to determine, for a target video frame image in multiple video frame images, a cropped image of the target video frame image, where the cropped image includes an effective pixel in the target video frame image, and the target video frame image is any one of the multiple video frame images;

a contour data generating unit 1203, configured to perform contour scanning on the cut image to generate contour data of the cut image;

an image transcoding unit 1204, configured to transcode the cut image according to the contour data to obtain a transcoded image;

and an image compression unit 1205, configured to compress the transcoded image to obtain a compressed image corresponding to the target video frame image.

In some specific embodiments, the apparatus further comprises:

a spatial position determining unit 1206, configured to determine a spatial position of the cropped image in the target video frame image to obtain spatial position data of the cropped image;

a storage unit 1207 for holding the compressed image, the contour data of the cropped image, and the spatial position data of the cropped image.

In some specific embodiments, the image cropping unit 1202 is specifically configured to determine a cropping rectangle according to effective pixel points of each video frame image in the multiple video frame images, determine a cropping image of the target video frame image according to the cropping rectangle, and the cropping image and the cropping rectangle are equal in length and width.

In some specific embodiments, the image cropping unit 1202 is specifically configured to: for each video frame image among the multiple video frame images, obtain an effective rectangle of the video frame image, where the effective rectangle is the minimum rectangular region containing the effective pixel points of the video frame image; determine a cropping rectangle from the effective rectangles of the multiple video frame images, where the width value of the cropping rectangle is the maximum of the width values of the effective rectangles and the length value of the cropping rectangle is the maximum of the length values of the effective rectangles; and determine a cropped image of the target video frame image according to the cropping rectangle, where the cropped image and the cropping rectangle have the same length and the same width.

In some specific embodiments, the spatial position determining unit 1206 is specifically configured to calculate an offset vector of the cropped image in the target video frame image, where the offset vector is spatial position data of the cropped image in the target video frame image.
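This passage does not say how the offset vector is computed, so the following Python sketch shows only one plausible reading: the offset is the top-left corner of the cropping rectangle inside the frame, chosen so that the rectangle covers the frame's effective rectangle and stays within the frame bounds. The names `crop_offset` and `crop` and the tuple layouts are hypothetical.

```python
from typing import List, Tuple


def crop_offset(effective: Tuple[int, int, int, int],
                crop_size: Tuple[int, int],
                frame_size: Tuple[int, int]) -> Tuple[int, int]:
    """Return (dx, dy), the top-left corner of the cropping rectangle.

    `effective` is (left, top, width, height) of the frame's effective
    rectangle; the corner is clamped so the crop stays inside the frame.
    """
    left, top, _, _ = effective
    crop_w, crop_h = crop_size
    frame_w, frame_h = frame_size
    dx = max(0, min(left, frame_w - crop_w))
    dy = max(0, min(top, frame_h - crop_h))
    return dx, dy


def crop(frame: List[list], offset: Tuple[int, int], crop_size: Tuple[int, int]) -> List[list]:
    """Cut a crop_size window out of `frame`, starting at `offset`."""
    dx, dy = offset
    crop_w, crop_h = crop_size
    return [row[dx:dx + crop_w] for row in frame[dy:dy + crop_h]]


frame = [[0, 0, 0, 0], [0, 0, 9, 0], [0, 0, 9, 0]]
offset = crop_offset((2, 1, 1, 2), (2, 2), (4, 3))
print(offset)                      # (2, 1)
print(crop(frame, offset, (2, 2)))  # [[9, 0], [9, 0]]
```

Storing only `(dx, dy)` per frame is enough for a decoder to put the cropped image back in place, provided the frame size and the cropping rectangle size are known.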

In some specific embodiments, the contour data generating unit 1203 is specifically configured to scan the cropped image line by line, and record a start position point and an end position point of the effective pixel of each line in the cropped image, where the start position point and the end position point of the effective pixel of each line in the cropped image are the contour data of the cropped image.
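The line-by-line contour scan can be sketched in Python as follows. The function name `scan_contour`, the use of `None` for a line with no effective pixels, and the default threshold of 0 are assumptions; the text only states that a start position point and an end position point are recorded for each line.

```python
from typing import List, Optional, Tuple

Span = Optional[Tuple[int, int]]  # (start, end) column indices, or None for an empty line


def scan_contour(alpha: List[List[int]], threshold: int = 0) -> List[Span]:
    """Scan the cropped image line by line and record, for each line,
    the first and last column whose Alpha value exceeds `threshold`."""
    contour: List[Span] = []
    for row in alpha:
        cols = [x for x, a in enumerate(row) if a > threshold]
        contour.append((cols[0], cols[-1]) if cols else None)
    return contour


cropped_alpha = [
    [0, 5, 5, 0],
    [0, 0, 0, 0],
    [7, 0, 0, 7],  # the gap between the two effective pixels lies inside the span
]
print(scan_contour(cropped_alpha))  # [(1, 2), None, (0, 3)]
```

Note that a single start/end pair per line covers any ineffective pixels lying between the first and last effective pixel of that line; this sketch does not try to exclude such interior gaps.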

In some specific embodiments, the image transcoding unit 1204 is specifically configured to transcode the pixels within the contour corresponding to the contour data to obtain the transcoded image.
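A minimal Python sketch of contour-limited transcoding is given below. The claims name YUV as the target format but this passage gives no conversion matrix, so the full-range BT.601 coefficients used here are an assumption, as are the function names, the per-line `(start, end)` span format, and the constant used for pixels outside the contour.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]
Span = Optional[Tuple[int, int]]


def rgb_to_yuv(r: int, g: int, b: int) -> Pixel:
    """Full-range BT.601 RGB -> YUV (assumed; the source names no matrix)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    v = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return tuple(max(0, min(255, round(c))) for c in (y, u, v))


def transcode_within_contour(rgb_rows: List[List[Pixel]],
                             contour: List[Span],
                             fill: Pixel = (0, 128, 128)) -> List[List[Pixel]]:
    """Convert only the pixels between each line's start and end points.

    Pixels outside the span are never converted; they are set to `fill`,
    a hypothetical placeholder value.
    """
    out = []
    for row, span in zip(rgb_rows, contour):
        new_row = [fill] * len(row)
        if span is not None:
            start, end = span
            for x in range(start, end + 1):
                new_row[x] = rgb_to_yuv(*row[x])
        out.append(new_row)
    return out


white = (255, 255, 255)
print(transcode_within_contour([[white, white, white]], [(1, 1)]))
```

Only the pixels inside each span go through the conversion arithmetic, which is where the reduction in processed pixel count comes from.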

Fig. 13 shows the functional module structure of a video image decoding apparatus according to an embodiment of the present invention, which implements the functions corresponding to the video image decoding methods shown in Figs. 1 to 11. These functions may be implemented in hardware, or by hardware executing corresponding software programs. The hardware and software include one or more unit modules corresponding to the above functions, and each unit module may be software and/or hardware.

Specifically, the video image decoding apparatus includes:

a decoding unit 1301, configured to decode the video file to obtain a decoded image;

a transcoding unit 1302, configured to obtain the contour data, and transcode the decoded image according to the contour data to obtain a cropped image;

and a video image generating unit 1303 configured to acquire spatial position data of the cropped image, and generate a video image according to the spatial position data and the cropped image, where the spatial position data indicates a spatial position of the cropped image in the video image.
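The role of the video image generating unit can be sketched in Python as pasting the cropped image into an otherwise empty frame at the recorded spatial position. The function name `place_cropped`, the `(dx, dy)` offset format, and the transparent RGBA background are assumptions made for illustration.

```python
from typing import List, Tuple

RGBA = Tuple[int, int, int, int]


def place_cropped(cropped: List[List[RGBA]],
                  offset: Tuple[int, int],
                  frame_size: Tuple[int, int],
                  blank: RGBA = (0, 0, 0, 0)) -> List[List[RGBA]]:
    """Rebuild a full frame: start from a blank (transparent) canvas and
    copy the cropped image in at the position given by `offset`."""
    frame_w, frame_h = frame_size
    dx, dy = offset
    frame = [[blank] * frame_w for _ in range(frame_h)]
    for y, row in enumerate(cropped):
        for x, pixel in enumerate(row):
            frame[dy + y][dx + x] = pixel
    return frame


red = (255, 0, 0, 255)
full = place_cropped([[red]], offset=(2, 1), frame_size=(4, 3))
print(full[1][2])  # (255, 0, 0, 255)
```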

In some specific embodiments, the contour data includes a start position point and an end position point of the effective pixels of each line;

the transcoding unit 1302 is specifically configured to transcode, for each line of pixels in the decoded image, the pixels between the start position point and the end position point corresponding to that line, so as to obtain the cropped image.
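The decoder-side row transcoding can be sketched in Python as below. This is an illustration under stated assumptions rather than the described implementation: the full-range BT.601 YUV-to-RGB coefficients, the function names, the `None` marker for an empty line, and the choice to mark restored pixels as fully opaque and all others as fully transparent are not taken from the text.

```python
from typing import List, Optional, Tuple

YUV = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]
Span = Optional[Tuple[int, int]]


def yuv_to_rgb(y: int, u: int, v: int) -> Tuple[int, int, int]:
    """Full-range BT.601 YUV -> RGB (assumed; the source names no matrix)."""
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    return tuple(max(0, min(255, round(c))) for c in (r, g, b))


def decode_rows(yuv_rows: List[List[YUV]], contour: List[Span]) -> List[List[RGBA]]:
    """For each line, convert only the pixels from the start position point
    to the end position point; everything else stays transparent."""
    transparent: RGBA = (0, 0, 0, 0)
    out = []
    for row, span in zip(yuv_rows, contour):
        new_row = [transparent] * len(row)
        if span is not None:
            start, end = span
            for x in range(start, end + 1):
                new_row[x] = yuv_to_rgb(*row[x]) + (255,)  # opaque: an assumption
        out.append(new_row)
    return out


row = [(0, 128, 128), (255, 128, 128), (0, 128, 128)]
print(decode_rows([row], [(1, 1)]))
```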

The video image encoding apparatus and the video image decoding apparatus in the embodiments of the present invention may take the form of a terminal device (e.g., a computer). Terminal devices of the present invention include desktop computers, handheld devices, vehicle-mounted devices, wearable devices, and other forms of user equipment. The handheld device may be any terminal device such as a mobile phone, a tablet computer, or a PDA (Personal Digital Assistant).

Fig. 14 is a schematic structural diagram of a terminal device according to an embodiment of the present invention. The terminal device 14 may vary considerably depending on its configuration or performance, and may include a processor (CPU) 1410 and a memory 1450. The memory 1450 stores one or more application programs, data, and an operating system; a program stored in the memory 1450 may include one or more modules, each of which includes a sequence of instruction operations. In particular, the memory 1450 stores a gaming application.

The processor 1410 communicates with the memory 1450 and calls the video image encoding method and the video image decoding method stored in the memory 1450 to implement the schemes described above with reference to Figs. 1 to 11.

Furthermore, the present invention also provides a computer storage medium storing an application program that, when executed, performs some or all of the steps of the above-described video image encoding method and video image decoding method (the embodiments shown in Figs. 1 to 11).

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a logical division, and other divisions are possible in practice: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical, or in another form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A video image encoding method, comprising:

acquiring a plurality of video frame images in a video source file;

performing spatial position scanning on a target video frame image among the plurality of video frame images according to Alpha channel information, to determine a cropped image of the target video frame image and to record information indicating invalid pixels, wherein the cropped image comprises the effective pixels in the target video frame image, and the target video frame image is any one of the plurality of video frame images; the effective pixels are pixels whose Alpha channel values are greater than a preset channel threshold; and the invalid pixels are pixels whose Alpha channel values are less than or equal to the channel threshold;

performing contour scanning on the cropped image according to the information indicating the invalid pixels, to generate contour data of the effective pixels in the cropped image, and transcoding the pixels within the contour corresponding to the contour data to obtain a YUV image;

and compressing the YUV image to obtain a compressed image corresponding to the target video frame image.

2. The method of claim 1, further comprising:

determining the spatial position of the cropped image in the target video frame image to obtain spatial position data of the cropped image;

and saving the compressed image, the contour data of the cropped image, and the spatial position data of the cropped image.

3. The method of claim 1 or 2, wherein the determining a cropped image of the target video frame image comprises:

determining a cropping rectangle according to the effective pixel points of each video frame image in the plurality of video frame images;

and determining the cropped image of the target video frame image according to the cropping rectangle, wherein the cropped image and the cropping rectangle have the same length and the same width.

4. The method of claim 3, wherein the determining a cropping rectangle according to the effective pixel points of each video frame image in the plurality of video frame images comprises:

obtaining, for each video frame image in the plurality of video frame images, an effective rectangle of that video frame image, wherein the effective rectangle is the minimum rectangular region that includes the effective pixel points of the video frame image;

and determining a cropping rectangle from the effective rectangles of the plurality of video frame images, wherein the width of the cropping rectangle is the maximum of the widths of the effective rectangles, and the length of the cropping rectangle is the maximum of the lengths of the effective rectangles.

5. The method of claim 2, wherein determining spatial position data of the cropped image in the target video frame image comprises:

calculating an offset vector of the cropped image in the target video frame image, wherein the offset vector is the spatial position data of the cropped image in the target video frame image.

6. The method of claim 1, wherein the performing contour scanning on the cropped image according to the information indicating the invalid pixels to generate contour data comprises:

scanning the cropped image line by line, and recording a start position point and an end position point of the effective pixels of each line in the cropped image according to the information indicating the invalid pixels, wherein the start position point and the end position point of the effective pixels of each line in the cropped image are the contour data of the cropped image;

and if no information indicating invalid pixels is acquired while scanning the cropped image, treating every pixel point as an effective pixel point by default.

7. A method for decoding video images, comprising:

decoding the video file to obtain a decoded YUV image;

acquiring contour data of effective pixels, and transcoding, according to the contour data, the pixels in the decoded YUV image that lie within the contour corresponding to the contour data, to obtain a cropped image; wherein the cropped image is determined, during video image encoding, by performing spatial position scanning on a target video frame image among a plurality of video frame images according to Alpha channel information; the target video frame image is any one of the plurality of video frame images; the cropped image comprises the effective pixels; the effective pixels are pixels whose Alpha channel values are greater than a preset channel threshold; invalid pixels are pixels whose Alpha channel values are less than or equal to the channel threshold; and the contour data is generated, during video image encoding, by performing contour scanning on the cropped image according to information indicating the invalid pixels that was recorded when the spatial position scanning of the target video frame image was performed;

and acquiring spatial position data of the cropped image, and generating a video image according to the spatial position data and the cropped image, wherein the spatial position data indicates the spatial position of the cropped image in the video image.

8. The method of claim 7,

the contour data includes a start position point and an end position point of effective pixels of each line;

and the transcoding, according to the contour data, the pixels in the decoded YUV image that lie within the contour corresponding to the contour data to obtain a cropped image comprises:

transcoding, for each line of pixels in the decoded YUV image, the pixels between the start position point and the end position point corresponding to that line, to obtain the cropped image.

9. A video image encoding apparatus, comprising:

an image cropping unit, configured to perform spatial position scanning on a target video frame image among a plurality of video frame images according to Alpha channel information, to determine a cropped image of the target video frame image and to record information indicating invalid pixels, wherein the cropped image comprises the effective pixels in the target video frame image, and the target video frame image is any one of the plurality of video frame images; the effective pixels are pixels whose Alpha channel values are greater than a preset channel threshold; and the invalid pixels are pixels whose Alpha channel values are less than or equal to the channel threshold;

a contour data generating unit, configured to perform contour scanning on the cropped image according to the information indicating the invalid pixels, to generate contour data of the effective pixels in the cropped image;

an image transcoding unit, configured to transcode the pixels within the contour corresponding to the contour data to obtain a YUV image;

and the image compression unit is used for compressing the YUV image to obtain a compressed image corresponding to the target video frame image.

10. The apparatus of claim 9, further comprising:

a spatial position determining unit, configured to determine the spatial position of the cropped image in the target video frame image to obtain spatial position data of the cropped image;

and a storage unit, configured to store the compressed image, the contour data of the cropped image, and the spatial position data of the cropped image.

11. The apparatus of claim 9 or 10, wherein:

the image cropping unit is specifically configured to determine a cropping rectangle according to the effective pixel points of each video frame image in the plurality of video frame images, and to determine the cropped image of the target video frame image according to the cropping rectangle, wherein the cropped image and the cropping rectangle have the same length and the same width.

12. The apparatus of claim 11, wherein:

the image cropping unit is specifically configured to: obtain, for each video frame image in the plurality of video frame images, an effective rectangle of that video frame image, wherein the effective rectangle is the minimum rectangular region that includes the effective pixel points of the video frame image; determine a cropping rectangle from the effective rectangles of the plurality of video frame images, wherein the width of the cropping rectangle is the maximum of the widths of the effective rectangles, and the length of the cropping rectangle is the maximum of the lengths of the effective rectangles; and determine the cropped image of the target video frame image according to the cropping rectangle, wherein the cropped image and the cropping rectangle have the same length and the same width.

13. The apparatus of claim 10,

the spatial position determining unit is specifically configured to calculate an offset vector of the cropped image in the target video frame image, wherein the offset vector is the spatial position data of the cropped image in the target video frame image.

14. The apparatus of claim 9, wherein:

the contour data generating unit is specifically configured to scan the cropped image line by line and record a start position point and an end position point of the effective pixels of each line in the cropped image, wherein the start position point and the end position point of the effective pixels of each line in the cropped image are the contour data of the cropped image.

15. A video image decoding apparatus, comprising:

the decoding unit is used for decoding the video file to obtain a decoded YUV image;

a transcoding unit, configured to acquire contour data of effective pixels, and to transcode, according to the contour data, the pixels in the decoded YUV image that lie within the contour corresponding to the contour data, to obtain a cropped image; wherein the cropped image is determined, during video image encoding, by performing spatial position scanning on a target video frame image among a plurality of video frame images according to Alpha channel information; the target video frame image is any one of the plurality of video frame images; the cropped image comprises the effective pixels; the effective pixels are pixels whose Alpha channel values are greater than a preset channel threshold; invalid pixels are pixels whose Alpha channel values are less than or equal to the channel threshold; and the contour data is generated, during video image encoding, by performing contour scanning on the cropped image according to information indicating the invalid pixels that was recorded when the spatial position scanning of the target video frame image was performed;

and a video image generating unit, configured to acquire spatial position data of the cropped image and to generate a video image according to the spatial position data and the cropped image, wherein the spatial position data indicates the spatial position of the cropped image in the video image.

16. The apparatus of claim 15,

the transcoding unit is specifically configured to transcode, for each line of pixels in the decoded YUV image, the pixels between the start position point and the end position point corresponding to that line, to obtain the cropped image.

17. A readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the video image encoding method according to any one of claims 1 to 6 or carries out the steps of the video image decoding method according to any one of claims 7 to 8.