CN106303366B - Video coding method and device based on regional classification coding - Google Patents
Video coding method and device based on regional classification coding
- Publication number
- CN106303366B CN106303366B CN201610685073.8A CN201610685073A CN106303366B CN 106303366 B CN106303366 B CN 106303366B CN 201610685073 A CN201610685073 A CN 201610685073A CN 106303366 B CN106303366 B CN 106303366B
- Authority
- CN
- China
- Prior art keywords
- region
- preprocessing
- picture
- area
- preprocessed
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
Images
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/85—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The invention discloses a video coding method and device based on regional classification coding, relating to the technical field of video transmission. It addresses the technical problem of how to transmit more effectively at a fixed code rate. The technical scheme comprises the following steps: step one, identifying each content region in a video picture; and step two, preprocessing each region separately to reduce image noise.
Description
Technical Field
The present invention relates to the field of video transmission technologies, and in particular, to a method and an apparatus for video coding based on region classification coding.
Background
In a typical video conference, the host is connected to a high-definition camera that captures the conference room, and the video is encoded and transmitted as shown in fig. 1. However, owing to lighting conditions and camera sampling, the captured slide area contains more noise and shifted colors compared with the original slide image; for example, a pure-color region on a slide is no longer pure color after being captured by the camera. This causes information distortion and lowers the compression ratio after video coding. How to transmit more effectively at a fixed code rate has therefore become an urgent technical problem.
Disclosure of Invention
The invention aims to solve the technical problem of achieving more effective transmission at a fixed code rate.
In order to solve the above problem, the present invention provides a method for video coding based on region classification coding, comprising:
step one, identifying each content area in a video picture;
and step two, preprocessing each region respectively to reduce image noise.
The invention also provides a video coding device based on region classification coding, which comprises:
an identification unit that identifies each content area in a video frame;
and the preprocessing unit is used for preprocessing each region respectively to reduce image noise.
The technical scheme of the invention realizes a video coding method and device based on regional classification coding. By preprocessing different regions of the image in different ways, image noise is reduced, the content of interest to the user is highlighted, and the user's perceived quality is improved.
Drawings
Fig. 1 is a schematic diagram of a conventional camera connected to a video conference host;
FIG. 2 is a schematic diagram of a camera of the present invention connected to a video conference host;
FIG. 3 is a schematic diagram of a method for video coding based on region classification coding;
FIG. 4 is a flow chart of a method for video encoding based on region classification coding;
FIG. 5 is a schematic diagram of a pre-processing method for reducing spatial resolution;
fig. 6 is a schematic diagram of an apparatus for video encoding based on region classification coding.
Detailed Description
The technical solution of the present invention will be described in more detail with reference to the accompanying drawings and examples.
It should be noted that, provided they do not conflict, the embodiments of the present invention and the features of those embodiments may be combined with one another within the scope of protection of the present invention. Additionally, although a logical order is shown in the flow diagrams, in some cases the steps shown or described may be performed in a different order.
In a first embodiment, a method for video coding based on region classification coding, as shown in fig. 3, includes:
step one, identifying each content area in a video picture;
and step two, preprocessing each region respectively to reduce image noise.
The technical scheme of the invention realizes a video coding method and device based on regional classification coding. By preprocessing different regions of the image in different ways, image noise is reduced, the content of interest to the user is highlighted, and the user's perceived quality is improved.
In a second embodiment, a method for video coding based on region classification coding, as shown in fig. 4, on the basis of the first embodiment, includes:
further, in the step one, each content area is divided into: a combination of one or more of a face region, a computer display region, an active region, and an inactive region.
The human eye weights the computer display region, the face region, the active region and the inactive region differently. The face region attracts the most attention. For an active region, the eye is more concerned with its motion, whereas for an inactive region the eye is more concerned with its details. Therefore, these four kinds of region are treated differently in the preprocessing step.
The face region, computer display region, active region and inactive region in the video picture are identified by pre-labeling or image analysis techniques, and the different regions are preprocessed in different ways before the conventional encoding process. This reduces image noise, highlights the content the user is interested in, and improves the user's perceived quality.
Further, in step two, when the regions are preprocessed, the face region is not preprocessed.
A face detection technique is used to detect the face area in the picture, and that area is marked as A. The face region is of most interest, so it is not preprocessed.
Further, when the regions are preprocessed, the computer display region is handled as follows: the computer picture is marked on the frame captured by the camera, and the marked computer display region in the camera frame is then replaced, through an affine transformation, with the picture acquired directly from the computer, as shown in fig. 2.
With the structure of fig. 2, the video conference host communicates with both the camera and the lecture computer, and acquires the original desktop picture directly from the lecture computer through an API. The four corner points of the computer picture are marked on the frame captured by the camera, and the marked computer display region is then replaced, through an affine transformation, with the picture acquired from the computer. This effectively improves the display quality of the computer display region in the video conference terminal's picture, and effectively raises the compression ratio.
Because the camera is usually fixed in a video conference, the four corner points of the computer display region B can be marked in advance. For region B, the real-time picture obtained from the lecture computer is overlaid onto the frame image through the affine transformation. The video conference host is directly connected to the camera and the computer, and the picture is enhanced by acquiring the computer picture in real time and using the affine transformation to replace the corresponding content of the camera picture.
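The corner-based replacement above can be sketched in code. This example is not part of the original patent; it is an illustrative numpy sketch under two stated assumptions: frames are grayscale arrays, and only three of the marked corner points are used, because an affine map is fully determined by three point correspondences (mapping all four marked corners exactly would in general require a projective transform). The function names `estimate_affine` and `overlay_affine` are hypothetical.

```python
import numpy as np

def estimate_affine(src_pts, dst_pts):
    """Solve the 2x3 affine matrix that maps three source points to three
    destination points (an affine map is fixed by 3 correspondences)."""
    A = np.zeros((6, 6))
    b = np.zeros(6)
    for i, (x, y) in enumerate(src_pts):
        A[2 * i] = [x, y, 1, 0, 0, 0]
        A[2 * i + 1] = [0, 0, 0, x, y, 1]
        b[2 * i], b[2 * i + 1] = dst_pts[i]
    return np.linalg.solve(A, b).reshape(2, 3)

def overlay_affine(frame, computer_pic, M):
    """Paste `computer_pic` into a grayscale `frame` under affine map M,
    using inverse mapping with nearest-neighbour sampling."""
    out = frame.copy().astype(float)
    H, W = frame.shape
    h, w = computer_pic.shape
    # Invert the 2x3 affine map (append [0, 0, 1] to make it 3x3 first).
    Minv = np.linalg.inv(np.vstack([M, [0.0, 0.0, 1.0]]))[:2]
    ys, xs = np.mgrid[0:H, 0:W]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])
    sx, sy = np.rint(Minv @ coords).astype(int)
    # Only destination pixels that map inside the computer picture are replaced.
    inside = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    flat = out.reshape(-1)
    flat[inside] = computer_pic[sy[inside], sx[inside]]
    return out
```

In practice a production system would estimate the transform from the pre-marked corners once (since the camera is fixed) and reuse it for every frame.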
Further, in step two, when the regions are preprocessed, the active region is preprocessed to reduce its spatial resolution.
The active region C is identified, within the regions that are neither A nor B, using a frame difference method.
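The frame difference method can be sketched as follows. This is not from the patent text; it is a minimal illustrative sketch assuming grayscale frames and a simple per-pixel threshold (the patent does not specify the threshold or post-processing such as morphological cleanup, which a real detector would likely add).

```python
import numpy as np

def active_mask(cur, prev, thresh=10, exclude=None):
    """Frame-difference detection: a pixel is 'active' when the absolute
    difference between consecutive frames exceeds a threshold.
    `exclude` is an optional boolean mask covering the already-labelled
    A (face) and B (computer display) regions."""
    diff = np.abs(cur.astype(np.int16) - prev.astype(np.int16))
    mask = diff > thresh
    if exclude is not None:
        mask &= ~exclude  # region C lives only in the non-A, non-B area
    return mask
```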
Further, the preprocessing method for reducing the spatial resolution is to divide the image pixels into M × N cells and replace the pixels in each cell with the average of the pixel values in that cell.
The preprocessing method for reducing the spatial resolution is as follows: the image pixels are divided into M × N cells, typically 2 × 2, and the pixels in each cell are replaced with the average of the pixel values in that cell, as shown in fig. 5. This reduces the spatial resolution and improves the video coding compression ratio.
Further, in step two, when the regions are preprocessed, the inactive region is preprocessed to reduce its temporal resolution.
The inactive region D is identified and marked.
Further, the preprocessing method for reducing the temporal resolution is as follows: suppose the pixel value at a certain point is V, the preprocessed pixel values at that point in the previous n frames are V1, V2, …, Vn, their average is Vm, and a threshold t is set. If the absolute difference between V and Vm is not greater than t, the preprocessed pixel value at that point is Vm; otherwise it remains V. This reduces the temporal resolution and improves the video coding compression ratio.
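The temporal rule above translates directly into numpy. This sketch is not part of the patent text; it assumes grayscale frames and leaves the choice of n and t (which the patent does not fix) to the caller.

```python
import numpy as np

def reduce_temporal_resolution(cur, prev_frames, t=5.0):
    """Apply the stated rule per pixel: Vm is the mean of the previously
    preprocessed values V1..Vn; output Vm when |V - Vm| <= t, otherwise
    keep the current value V."""
    vm = np.mean(np.asarray(prev_frames, dtype=float), axis=0)
    v = np.asarray(cur, dtype=float)
    return np.where(np.abs(v - vm) <= t, vm, v)
```

Pixels that only jitter within the threshold are frozen at their running average, so inter-frame differences vanish there and the encoder's temporal prediction becomes cheaper, while genuinely changed pixels pass through unmodified.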
In a third embodiment, an apparatus for video coding based on region classification coding, as shown in fig. 6, includes:
an identification unit that identifies each content area in a video frame;
and the preprocessing unit is used for preprocessing each region respectively to reduce image noise.
The technical scheme of the invention realizes a video coding method and device based on regional classification coding. By preprocessing different regions of the image in different ways, image noise is reduced, the content of interest to the user is highlighted, and the user's perceived quality is improved.
In a fourth embodiment, an apparatus for video coding based on region classification coding, as shown in fig. 6, further includes, on the basis of the third embodiment:
further, the identification unit divides each content area into: a combination of one or more of a face region, a computer display region, an active region, and an inactive region.
The human eye weights the computer display region, the face region, the active region and the inactive region differently. The face region attracts the most attention. For an active region, the eye is more concerned with its motion, whereas for an inactive region the eye is more concerned with its details. Therefore, these four kinds of region are treated differently in the preprocessing step.
The face region, computer display region, active region and inactive region in the video picture are identified by pre-labeling or image analysis techniques, and the different regions are preprocessed in different ways before the conventional encoding process. This reduces image noise, highlights the content the user is interested in, and improves the user's perceived quality.
Further, when the preprocessing unit preprocesses the regions, the face region is not preprocessed.
A face detection technique is used to detect the face area in the picture, and that area is marked as A. The face region is of most interest, so it is not preprocessed.
Furthermore, when the preprocessing unit preprocesses the regions, the computer display region is handled as follows: the computer picture is marked on the frame captured by the camera, and the marked computer display region in the camera frame is then replaced, through an affine transformation, with the picture acquired from the computer, as shown in fig. 2.
With the structure of fig. 2, the video conference host communicates with both the camera and the lecture computer, and acquires the original desktop picture directly from the lecture computer through an API. The four corner points of the computer picture are marked on the frame captured by the camera, and the marked computer display region is then replaced, through an affine transformation, with the picture acquired from the computer. This effectively improves the display quality of the computer display region in the video conference terminal's picture, and effectively raises the compression ratio.
Because the camera is usually fixed in a video conference, the four corner points of the computer display region B can be marked in advance. For region B, the real-time picture obtained from the lecture computer is overlaid onto the frame image through the affine transformation. The video conference host is directly connected to the camera and the computer, and the picture is enhanced by acquiring the computer picture in real time and using the affine transformation to replace the corresponding content of the camera picture.
Further, when the preprocessing unit preprocesses the regions, the active region is preprocessed to reduce its spatial resolution. The active region C is identified, within the regions that are neither A nor B, using a frame difference method.
Further, the preprocessing method for reducing the spatial resolution is to divide the image pixels into M × N cells and replace the pixels in each cell with the average of the pixel values in that cell.
The preprocessing method for reducing the spatial resolution is as follows: the image pixels are divided into M × N cells, typically 2 × 2, and the pixels in each cell are replaced with the average of the pixel values in that cell, as shown in fig. 5. This reduces the spatial resolution and improves the video coding compression ratio.
Further, when the preprocessing unit preprocesses the regions, the inactive region is preprocessed to reduce its temporal resolution. The inactive region D is identified and marked.
Further, the preprocessing method for reducing the temporal resolution is as follows: suppose the pixel value at a certain point is V, the preprocessed pixel values at that point in the previous n frames are V1, V2, …, Vn, their average is Vm, and a threshold t is set. If the absolute difference between V and Vm is not greater than t, the preprocessed pixel value at that point is Vm; otherwise it remains V. This reduces the temporal resolution and improves the video coding compression ratio.
According to the method, the images in a high-definition video conference are divided into four types of region according to the users' different points of attention: a face region, a computer display region, an active region and an inactive region. Different regions of the image are preprocessed in different ways before the conventional encoding process, so that image noise is reduced, the content users are interested in is highlighted, and the users' perceived quality is improved.
It will be understood by those skilled in the art that all or part of the steps of the above methods may be implemented by instructing the relevant hardware through a program, and the program may be stored in a computer readable storage medium, such as a read-only memory, a magnetic or optical disk, and the like. Alternatively, all or part of the steps of the above embodiments may be implemented using one or more integrated circuits. Accordingly, each module/unit in the above embodiments may be implemented in the form of hardware, and may also be implemented in the form of a software functional module. The present invention is not limited to any specific form of combination of hardware and software.
The present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof, and it should be understood that various changes and modifications can be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (8)
1. A method for video coding based on region classification coding, comprising:
step one, identifying each content area in a video picture;
step two, respectively preprocessing each region to reduce image noise;
in the first step, each content area is divided into: a combination of one or more of a face region, a computer display region, an active region, an inactive region;
when the regions are preprocessed, the inactive region is preprocessed to reduce temporal resolution;
the preprocessing method for reducing the temporal resolution comprises: assuming that the pixel value at a certain point is V, the preprocessed pixel values of the previous n frames are V1, V2, … and Vn respectively, the average of these pixel values is Vm, and a threshold t is set; if the absolute value of the difference between V and Vm is not higher than the threshold t, the preprocessed pixel value at that point is Vm, otherwise it is V;
when the regions are preprocessed, the face region is not preprocessed;
and in step two, when the regions are preprocessed, the computer picture is marked on the frame captured by the camera in the computer display region, and the marked computer display region in the camera frame is replaced, through an affine transformation, with the picture acquired from the computer.
2. The method of claim 1, wherein in step two, each region is preprocessed, and the active region is preprocessed to reduce spatial resolution.
3. The method of claim 2, wherein the preprocessing to reduce the spatial resolution is performed by dividing the image pixels into M × N cells and replacing the pixels in each cell with the average of the pixel values in that cell.
4. An apparatus for video coding based on region classification coding, comprising:
an identification unit that identifies each content area in a video frame;
the preprocessing unit is used for respectively preprocessing each region to reduce image noise;
the identification unit is divided into the following content areas: a combination of one or more of a face region, a computer display region, an active region, an inactive region;
the preprocessing unit is used for preprocessing each area, and the face area is not preprocessed;
the preprocessing unit is used for preprocessing each area, the computer display area marks a computer picture on a picture acquired by the camera, and then the picture acquired from the computer replaces the computer display area marked in the picture shot by the camera through affine transformation.
5. The apparatus of claim 4, wherein when the pre-processing unit pre-processes the regions, the active region is pre-processed to reduce spatial resolution.
6. The apparatus of claim 5, wherein the pre-processing to reduce the spatial resolution is performed by dividing the image pixels into M × N cells and replacing the pixels in each cell with the average of the pixel values in that cell.
7. The apparatus of claim 4, wherein when the pre-processing unit pre-processes the regions, the inactive region is pre-processed to reduce temporal resolution.
8. The apparatus of claim 7, wherein the preprocessing method for reducing the temporal resolution is: assuming that the pixel value at a certain point is V, the preprocessed pixel values of the previous n frames are V1, V2, … and Vn respectively, the average of these pixel values is Vm, and a threshold t is set; if the absolute value of the difference between V and Vm is not higher than the threshold t, the preprocessed pixel value at that point is Vm, otherwise it is V.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610685073.8A CN106303366B (en) | 2016-08-18 | 2016-08-18 | Video coding method and device based on regional classification coding |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610685073.8A CN106303366B (en) | 2016-08-18 | 2016-08-18 | Video coding method and device based on regional classification coding |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106303366A CN106303366A (en) | 2017-01-04 |
CN106303366B true CN106303366B (en) | 2020-06-19 |
Family
ID=57679842
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610685073.8A Active CN106303366B (en) | 2016-08-18 | 2016-08-18 | Video coding method and device based on regional classification coding |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106303366B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109561239B (en) * | 2018-08-20 | 2019-08-16 | 上海久页信息科技有限公司 | Piece caudal flexure intelligent selection platform |
CN114723928A (en) * | 2021-01-05 | 2022-07-08 | 华为技术有限公司 | Image processing method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101389014A (en) * | 2007-09-14 | 2009-03-18 | 浙江大学 | Resolution variable video encoding and decoding method based on regions |
CN103310411A (en) * | 2012-09-25 | 2013-09-18 | 中兴通讯股份有限公司 | Image local reinforcement method and device |
CN103888710A (en) * | 2012-12-21 | 2014-06-25 | 深圳市捷视飞通科技有限公司 | Video conferencing system and method |
CN103929640A (en) * | 2013-01-15 | 2014-07-16 | 英特尔公司 | Techniques For Managing Video Streaming |
- 2016-08-18 CN CN201610685073.8A patent/CN106303366B/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101389014A (en) * | 2007-09-14 | 2009-03-18 | 浙江大学 | Resolution variable video encoding and decoding method based on regions |
CN103310411A (en) * | 2012-09-25 | 2013-09-18 | 中兴通讯股份有限公司 | Image local reinforcement method and device |
CN103888710A (en) * | 2012-12-21 | 2014-06-25 | 深圳市捷视飞通科技有限公司 | Video conferencing system and method |
CN103929640A (en) * | 2013-01-15 | 2014-07-16 | 英特尔公司 | Techniques For Managing Video Streaming |
Also Published As
Publication number | Publication date |
---|---|
CN106303366A (en) | 2017-01-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Rao et al. | A Survey of Video Enhancement Techniques. | |
US20200005468A1 (en) | Method and system of event-driven object segmentation for image processing | |
WO2016101883A1 (en) | Method for face beautification in real-time video and electronic equipment | |
US20170339409A1 (en) | Fast and robust human skin tone region detection for improved video coding | |
US20210281718A1 (en) | Video Processing Method, Electronic Device and Storage Medium | |
CN110189336B (en) | Image generation method, system, server and storage medium | |
US9390511B2 (en) | Temporally coherent segmentation of RGBt volumes with aid of noisy or incomplete auxiliary data | |
US20130308856A1 (en) | Background Detection As An Optimization For Gesture Recognition | |
CN111383201A (en) | Scene-based image processing method and device, intelligent terminal and storage medium | |
CN112954398B (en) | Encoding method, decoding method, device, storage medium and electronic equipment | |
KR20090045288A (en) | Method and device for adaptive video presentation | |
CN112019827B (en) | Method, device, equipment and storage medium for enhancing video image color | |
WO2020108010A1 (en) | Video processing method and apparatus, electronic device and storage medium | |
US20230127009A1 (en) | Joint objects image signal processing in temporal domain | |
CN112686810A (en) | Image processing method and device | |
CN107346417B (en) | Face detection method and device | |
CN106303366B (en) | Video coding method and device based on regional classification coding | |
CN110570441B (en) | Ultra-high definition low-delay video control method and system | |
US20080181462A1 (en) | Apparatus for Monitor, Storage and Back Editing, Retrieving of Digitally Stored Surveillance Images | |
US11354925B2 (en) | Method, apparatus and device for identifying body representation information in image, and computer readable storage medium | |
CN111754412A (en) | Method and device for constructing data pairs and terminal equipment | |
CN116506677A (en) | Color atmosphere processing method and system | |
CN110958460B (en) | Video storage method and device, electronic equipment and storage medium | |
CN112911299B (en) | Video code rate control method and device, electronic equipment and storage medium | |
CN110765919A (en) | Interview image display system and method based on face detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information |
Address after: 100040 Shijingshan District railway building, Beijing, the 16 floor Applicant after: Chinese translation language through Polytron Technologies Inc Address before: 100040 Shijingshan District railway building, Beijing, the 16 floor Applicant before: Mandarin Technology (Beijing) Co., Ltd. |
CB02 | Change of applicant information | ||
GR01 | Patent grant | ||
GR01 | Patent grant |