CN106303366B - Video coding method and device based on regional classification coding


Info

Publication number
CN106303366B
Authority
CN
China
Prior art keywords
region
preprocessing
picture
area
preprocessed
Prior art date
Legal status
Active
Application number
CN201610685073.8A
Other languages
Chinese (zh)
Other versions
CN106303366A (en)
Inventor
程国艮
王语
Current Assignee
Global Tone Communication Technology Co ltd
Original Assignee
Global Tone Communication Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Global Tone Communication Technology Co ltd filed Critical Global Tone Communication Technology Co ltd
Priority to CN201610685073.8A priority Critical patent/CN106303366B/en
Publication of CN106303366A publication Critical patent/CN106303366A/en
Application granted granted Critical
Publication of CN106303366B publication Critical patent/CN106303366B/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/15Conference systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a video coding method and device based on region classification coding, relating to the technical field of video transmission. It solves the technical problem of how to transmit more effectively at a fixed bit rate. The technical scheme comprises: step one, identifying each content region in a video picture; and step two, preprocessing each region to reduce image noise.

Description

Video coding method and device based on regional classification coding
Technical Field
The present invention relates to the field of video transmission technologies, and in particular, to a method and an apparatus for video coding based on region classification coding.
Background
Typically, a video conference host is connected to a high-definition camera that captures the conference room, and the video is encoded and transmitted as shown in FIG. 1. However, because of lighting and camera sampling, the photographed slide area contains more noise and shifted colors than the original slide image; for example, a solid-color area on a slide may no longer be a solid color once captured by the camera. This distorts the information and lowers the compression ratio after video coding. How to transmit more effectively at a fixed bit rate has therefore become an urgent technical problem.
Disclosure of Invention
The invention aims to solve the technical problem of achieving more effective transmission at a fixed bit rate.
In order to solve the above problem, the present invention provides a method for video coding based on region classification coding, comprising:
step one, identifying each content area in a video picture;
and step two, preprocessing each region respectively to reduce image noise.
The invention also provides a video coding device based on region classification coding, which comprises:
an identification unit that identifies each content area in a video frame;
and a preprocessing unit that preprocesses each region to reduce image noise.
The technical scheme of the invention realizes a video coding method and device based on region classification coding: different regions of the image are preprocessed in different ways, which reduces image noise, highlights the content the user is interested in, and improves the user's perceived quality.
Drawings
Fig. 1 is a schematic diagram of a conventional camera connected to a video conference host;
FIG. 2 is a schematic diagram of a camera of the present invention connected to a video conference host;
FIG. 3 is a schematic diagram of a method for video coding based on region classification coding;
FIG. 4 is a flow chart of a method for video encoding based on region classification coding;
FIG. 5 is a schematic diagram of a pre-processing method for reducing spatial resolution;
fig. 6 is a schematic diagram of an apparatus for video encoding based on region classification coding.
Detailed Description
The technical solution of the present invention will be described in more detail with reference to the accompanying drawings and examples.
It should be noted that, provided they do not conflict, the embodiments of the present invention and the features of the embodiments may be combined with each other within the scope of protection of the present invention. Additionally, although a logical order is shown in the flow charts, in some cases the steps shown or described may be performed in an order different from the one given here.
In a first embodiment, a method for video coding based on region classification coding, as shown in fig. 3, includes:
step one, identifying each content area in a video picture;
and step two, preprocessing each region respectively to reduce image noise.
The technical scheme of the invention realizes a video coding method and device based on region classification coding: different regions of the image are preprocessed in different ways, which reduces image noise, highlights the content the user is interested in, and improves the user's perceived quality.
In a second embodiment, a method for video coding based on region classification coding, as shown in fig. 4, on the basis of the first embodiment, includes:
further, in the step one, each content area is divided into: a combination of one or more of a face region, a computer display region, an active region, and an inactive region.
The human eye perceives the computer display area, the face area, the active area and the inactive area with different emphasis. The face area attracts the most attention. For an active area, the eye is mainly concerned with its motion, whereas for an inactive area it is more concerned with its details. Therefore, the computer display area, the face area, the active area and the inactive area are treated differently in the preprocessing step.
The face area, the computer display area, the active area and the inactive area in the video picture are identified through pre-labeling or image analysis. Different areas of the image are then preprocessed in different ways before the conventional encoding process, which reduces image noise, highlights the content the user is interested in, and improves the user's perceived quality.
Further, in step two, when each region is preprocessed, the face region is left unprocessed.
A face detection technique is used to detect the face area in the picture, and this area is marked as A. Because the face area attracts the most attention, it is not preprocessed.
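A minimal sketch of this step follows (in Python with OpenCV). The patent only speaks of "a face detection technology" without naming one, so the Haar-cascade detector, the function name detect_face_region, and the parameter values here are illustrative assumptions rather than the patented method.

```python
import cv2

def detect_face_region(frame):
    """Detect the face region (region A) in a camera frame.

    Returns a list of (x, y, w, h) rectangles; these areas are marked as A
    and left untouched by the later preprocessing steps.
    """
    # OpenCV ships this cascade file with the opencv-python package.
    cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    detector = cv2.CascadeClassifier(cascade_path)

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [tuple(rect) for rect in faces]
```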
Further, when each area is preprocessed, the computer display area is handled as follows: the computer picture is marked on the image acquired by the camera, and the picture acquired directly from the computer is then used, via an affine transformation, to replace the marked computer display area in the camera image, as shown in FIG. 2.
With the structure of FIG. 2, the video conference host communicates with both the camera and the lecture computer, and the original desktop picture is acquired directly from the lecture computer through an API. The four corner points of the computer picture are marked on the image captured by the camera; the picture acquired from the computer is then warped by an affine transformation to replace the marked computer display area in the camera image. This effectively improves the display quality of the computer display area in the picture shown at the video conference terminal and effectively raises the compression ratio.
Because the camera is usually fixed during a video conference, the four corner points of the computer display area B can be marked in advance. For region B, the real-time picture obtained from the lecture computer is warped by an affine transformation and overlaid onto the frame image. The video conference host is directly connected to the camera and the computer, and the picture is enhanced by acquiring the computer picture in real time and transforming it onto the corresponding content of the camera picture.
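The overlay step might look like the sketch below. The patent names an affine transformation; because four corner points are marked, this sketch uses OpenCV's four-point perspective warp (cv2.getPerspectiveTransform), which reduces to the affine case when the marked quadrilateral is a parallelogram. The function name and the assumed corner ordering are illustrative.

```python
import cv2
import numpy as np

def overlay_desktop(camera_frame, desktop_frame, corners):
    """Warp the desktop capture onto the pre-marked computer display region B.

    corners: four (x, y) points of region B in the camera frame, assumed to be
    ordered top-left, top-right, bottom-right, bottom-left.
    """
    h, w = desktop_frame.shape[:2]
    src = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])
    dst = np.float32(corners)

    # Map the desktop image onto the quadrilateral marked in the camera frame.
    matrix = cv2.getPerspectiveTransform(src, dst)
    warped = cv2.warpPerspective(
        desktop_frame, matrix, (camera_frame.shape[1], camera_frame.shape[0]))

    # Replace region B in the camera frame with the warped desktop picture.
    mask = np.zeros(camera_frame.shape[:2], dtype=np.uint8)
    cv2.fillConvexPoly(mask, dst.astype(np.int32), 255)
    out = camera_frame.copy()
    out[mask == 255] = warped[mask == 255]
    return out
```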
Further, in step two, when each region is preprocessed, the active region is preprocessed to reduce its spatial resolution.
The active region C is identified, within the areas outside A and B, using a frame difference method.
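As an illustration, a simple frame-difference detector for region C might look like the following sketch. The threshold value, the function name, and the use of binary masks for regions A and B are assumptions; the patent only states that a frame difference method is applied outside A and B.

```python
import cv2
import numpy as np

def detect_active_region(prev_frame, curr_frame, exclude_mask, diff_thresh=25):
    """Mark active pixels (region C) by frame differencing, outside regions A and B.

    exclude_mask: uint8 mask that is 255 inside regions A and B.
    Returns a uint8 mask that is 255 where motion (region C) is detected.
    """
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)

    # Pixels whose inter-frame difference exceeds the threshold are "active".
    diff = cv2.absdiff(curr_gray, prev_gray)
    _, active = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)

    # Exclude regions A and B; what remains non-active becomes region D.
    active[exclude_mask == 255] = 0
    return active
```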
Further, the preprocessing method for reducing the spatial resolution is to divide the image pixels into M×N cells and replace the pixels in each cell with the average of the pixel values in that cell.
The preprocessing method for reducing the spatial resolution is as follows:
The image pixels are divided into M×N cells (typically 2×2), and the pixels in each cell are replaced with the average of the pixel values in that cell, as shown in FIG. 5. This reduces the spatial resolution and improves the video coding compression ratio.
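A minimal sketch of this block-averaging step for a single-channel region is given below; how border cells that do not divide evenly are handled is an assumption, since the patent does not specify it.

```python
import numpy as np

def reduce_spatial_resolution(region, m=2, n=2):
    """Replace each m x n cell of a single-channel region with its mean value."""
    h, w = region.shape[:2]
    h_c, w_c = h - h % m, w - w % n  # drop the ragged border cells for simplicity
    cells = region[:h_c, :w_c].reshape(h_c // m, m, w_c // n, n)
    means = cells.mean(axis=(1, 3), keepdims=True)

    out = region.copy()
    out[:h_c, :w_c] = np.broadcast_to(means, cells.shape).reshape(h_c, w_c).astype(region.dtype)
    return out
```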
Further, in step two, when each region is preprocessed, the inactive region is preprocessed to reduce its temporal resolution.
The inactive region D is identified and marked.
Further, the preprocessing method for reducing the temporal resolution is as follows: suppose the pixel value at a given point is V, the preprocessed pixel values of that point in the previous n frames are V1, V2, …, Vn, their average is Vm, and a threshold t is set. If the absolute difference between V and Vm is not greater than t, the preprocessed pixel value at that point is Vm; otherwise it remains V. This reduces the temporal resolution and improves the video coding compression ratio.
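The rule above can be sketched as follows; the concrete threshold value and the function name are illustrative assumptions, since the patent only requires that some threshold t be set.

```python
import numpy as np

def reduce_temporal_resolution(curr, history, t=8):
    """Apply the temporal rule to the inactive region D.

    curr:    current pixel values V of the region (uint8 array)
    history: list of the previous n preprocessed frames of the same region
    t:       threshold; if |V - Vm| <= t the pixel is replaced by the average Vm
    """
    vm = np.mean(np.stack(history, axis=0), axis=0)        # per-pixel average Vm
    keep_avg = np.abs(curr.astype(np.float32) - vm) <= t    # |V - Vm| <= t
    return np.where(keep_avg, vm, curr).astype(curr.dtype)  # Vm where close, else V
```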
In a third embodiment, an apparatus for video coding based on region classification coding, as shown in fig. 6, includes:
an identification unit that identifies each content area in a video frame;
and a preprocessing unit that preprocesses each region to reduce image noise.
The technical scheme of the invention realizes a video coding method and device based on region classification coding: different regions of the image are preprocessed in different ways, which reduces image noise, highlights the content the user is interested in, and improves the user's perceived quality.
In a fourth embodiment, an apparatus for video coding based on region classification coding, as shown in fig. 6, further includes, on the basis of the third embodiment:
further, the identification unit divides each content area into: a combination of one or more of a face region, a computer display region, an active region, and an inactive region.
The human eye perceives the computer display area, the face area, the active area and the inactive area with different emphasis. The face area attracts the most attention. For an active area, the eye is mainly concerned with its motion, whereas for an inactive area it is more concerned with its details. Therefore, the computer display area, the face area, the active area and the inactive area are treated differently in the preprocessing step.
The face area, the computer display area, the active area and the inactive area in the video picture are identified through pre-labeling or image analysis. Different areas of the image are then preprocessed in different ways before the conventional encoding process, which reduces image noise, highlights the content the user is interested in, and improves the user's perceived quality.
Further, the preprocessing unit preprocesses each region, and the face region is left unprocessed.
A face detection technique is used to detect the face area in the picture, and this area is marked as A. Because the face area attracts the most attention, it is not preprocessed.
Furthermore, when the preprocessing unit preprocesses each area, the computer display area is handled as follows: the computer picture is marked on the image acquired by the camera, and the picture acquired directly from the computer is then used, via an affine transformation, to replace the marked computer display area in the camera image, as shown in FIG. 2.
With the structure of FIG. 2, the video conference host communicates with both the camera and the lecture computer, and the original desktop picture is acquired directly from the lecture computer through an API. The four corner points of the computer picture are marked on the image captured by the camera; the picture acquired from the computer is then warped by an affine transformation to replace the marked computer display area in the camera image. This effectively improves the display quality of the computer display area in the picture shown at the video conference terminal and effectively raises the compression ratio.
Because the camera is usually fixed during a video conference, the four corner points of the computer display area B can be marked in advance. For region B, the real-time picture obtained from the lecture computer is warped by an affine transformation and overlaid onto the frame image. The video conference host is directly connected to the camera and the computer, and the picture is enhanced by acquiring the computer picture in real time and transforming it onto the corresponding content of the camera picture.
Further, when the preprocessing unit preprocesses each region, the active region is preprocessed to reduce its spatial resolution. The active region C is identified, within the areas outside A and B, using a frame difference method.
Further, the preprocessing method for reducing the spatial resolution is to divide the image pixels into M×N cells and replace the pixels in each cell with the average of the pixel values in that cell.
The preprocessing method for reducing the spatial resolution is as follows:
The image pixels are divided into M×N cells (typically 2×2), and the pixels in each cell are replaced with the average of the pixel values in that cell, as shown in FIG. 5. This reduces the spatial resolution and improves the video coding compression ratio.
Further, when the preprocessing unit preprocesses each region, the inactive region is preprocessed to reduce its temporal resolution. The inactive region D is identified and marked.
Further, the preprocessing method for reducing the temporal resolution is as follows: suppose the pixel value at a given point is V, the preprocessed pixel values of that point in the previous n frames are V1, V2, …, Vn, their average is Vm, and a threshold t is set. If the absolute difference between V and Vm is not greater than t, the preprocessed pixel value at that point is Vm; otherwise it remains V. This reduces the temporal resolution and improves the video coding compression ratio.
According to the method, the images in a high-definition video conference are divided into four types of areas according to the users' points of attention: a face area, a computer display area, an active area and an inactive area. Different areas of the image are preprocessed in different ways before the conventional encoding process, which reduces image noise, highlights the content the users are interested in, and improves their perceived quality.
It will be understood by those skilled in the art that all or part of the steps of the above methods may be implemented by instructing the relevant hardware through a program, and the program may be stored in a computer readable storage medium, such as a read-only memory, a magnetic or optical disk, and the like. Alternatively, all or part of the steps of the above embodiments may be implemented using one or more integrated circuits. Accordingly, each module/unit in the above embodiments may be implemented in the form of hardware, and may also be implemented in the form of a software functional module. The present invention is not limited to any specific form of combination of hardware and software.
The present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof, and it should be understood that various changes and modifications can be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (8)

1. A method for video coding based on region classification coding, comprising:
step one, identifying each content area in a video picture;
step two, respectively preprocessing each region to reduce image noise;
in step one, the content areas are classified into a combination of one or more of: a face region, a computer display region, an active region, and an inactive region;
wherein, in preprocessing each region, the inactive region is preprocessed to reduce its temporal resolution;
the preprocessing method for reducing the temporal resolution comprises: assuming that the pixel value at a given point is V, the preprocessed pixel values of that point in the previous n frames are V1, V2, …, Vn, the average of these pixel values is Vm, and a threshold t is set;
preprocessing each region, wherein the face region is not preprocessed;
and, in preprocessing each area, for the computer display area, marking the computer picture on the picture acquired by the camera, and replacing the marked computer display area in the camera picture with the picture acquired from the computer through an affine transformation.
2. The method of claim 1, wherein in step two, each region is preprocessed, and the active region is preprocessed to reduce spatial resolution.
3. The method of claim 2, wherein the preprocessing for reducing the spatial resolution is to divide the image pixels into M×N cells and replace the pixels in each cell with the average of the pixel values in that cell.
4. An apparatus for video coding based on region classification coding, comprising:
an identification unit that identifies each content area in a video frame;
a preprocessing unit that preprocesses each region to reduce image noise;
the identification unit is divided into the following content areas: a combination of one or more of a face region, a computer display region, an active region, an inactive region;
the preprocessing unit is used for preprocessing each area, and the face area is not preprocessed;
the preprocessing unit preprocesses each area, wherein, for the computer display area, the computer picture is marked on the picture acquired by the camera, and the picture acquired from the computer then replaces the marked computer display area in the camera picture through an affine transformation.
5. The apparatus of claim 4, wherein the preprocessing unit preprocesses each region, and the active region is preprocessed to reduce its spatial resolution.
6. The apparatus of claim 5, wherein the preprocessing for reducing the spatial resolution is to divide the image pixels into M×N cells and replace the pixels in each cell with the average of the pixel values in that cell.
7. The apparatus of claim 4, wherein the preprocessing unit preprocesses each region, and the inactive region is preprocessed to reduce its temporal resolution.
8. The apparatus of claim 7, wherein the preprocessing method for reducing the temporal resolution is: assuming that the pixel value at a given point is V, the preprocessed pixel values of that point in the previous n frames are V1, V2, …, Vn, the average of these pixel values is Vm, and a threshold t is set, if the absolute difference between V and Vm is not greater than t, the preprocessed pixel value at that point is Vm; otherwise it is V.
CN201610685073.8A 2016-08-18 2016-08-18 Video coding method and device based on regional classification coding Active CN106303366B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610685073.8A CN106303366B (en) 2016-08-18 2016-08-18 Video coding method and device based on regional classification coding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610685073.8A CN106303366B (en) 2016-08-18 2016-08-18 Video coding method and device based on regional classification coding

Publications (2)

Publication Number Publication Date
CN106303366A CN106303366A (en) 2017-01-04
CN106303366B 2020-06-19

Family

ID=57679842

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610685073.8A Active CN106303366B (en) 2016-08-18 2016-08-18 Video coding method and device based on regional classification coding

Country Status (1)

Country Link
CN (1) CN106303366B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109561239B (en) * 2018-08-20 2019-08-16 上海久页信息科技有限公司 Piece caudal flexure intelligent selection platform
CN114723928A (en) * 2021-01-05 2022-07-08 华为技术有限公司 Image processing method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101389014A (en) * 2007-09-14 2009-03-18 浙江大学 Resolution variable video encoding and decoding method based on regions
CN103310411A (en) * 2012-09-25 2013-09-18 中兴通讯股份有限公司 Image local reinforcement method and device
CN103888710A (en) * 2012-12-21 2014-06-25 深圳市捷视飞通科技有限公司 Video conferencing system and method
CN103929640A (en) * 2013-01-15 2014-07-16 英特尔公司 Techniques For Managing Video Streaming


Also Published As

Publication number Publication date
CN106303366A (en) 2017-01-04


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 16th Floor, Railway Building, Shijingshan District, Beijing 100040

Applicant after: Global Tone Communication Technology Co., Ltd.

Address before: 16th Floor, Railway Building, Shijingshan District, Beijing 100040

Applicant before: Mandarin Technology (Beijing) Co., Ltd.

CB02 Change of applicant information
GR01 Patent grant