CN106303366B - Video coding method and device based on regional classification coding


Info

Publication number
CN106303366B
Authority
CN
China
Prior art keywords
region
preprocessing
picture
area
preprocessed
Prior art date
Legal status
Active
Application number
CN201610685073.8A
Other languages
Chinese (zh)
Other versions
CN106303366A (en)
Inventor
程国艮
王语
Current Assignee
Global Tone Communication Technology Co ltd
Original Assignee
Global Tone Communication Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Global Tone Communication Technology Co ltd filed Critical Global Tone Communication Technology Co ltd
Priority to CN201610685073.8A priority Critical patent/CN106303366B/en
Publication of CN106303366A publication Critical patent/CN106303366A/en
Application granted granted Critical
Publication of CN106303366B publication Critical patent/CN106303366B/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/15Conference systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a video coding method and device based on region classification coding, relating to the technical field of video transmission. It solves the technical problem of how to transmit more effectively at a fixed bit rate. The technical scheme comprises: step one, identifying each content region in a video picture; and step two, preprocessing each region to reduce image noise.

Description

Video coding method and device based on regional classification coding
Technical Field
The present invention relates to the field of video transmission technologies, and in particular, to a method and an apparatus for video coding based on region classification coding.
Background
Typically, a video conference host is connected to a high-definition camera that captures the conference room, and the video is encoded and transmitted as shown in FIG. 1. However, because of lighting and camera sampling, the photographed slide area contains more noise and shifted colors than the original slide image; for example, a solid-color area on a slide may no longer be a solid color once captured by the camera. This distorts the information and lowers the compression ratio after video coding. How to transmit more effectively at a fixed bit rate has therefore become an urgent technical problem.
Disclosure of Invention
The invention aims to solve the technical problem of achieving more effective transmission at a fixed bit rate.
In order to solve the above problem, the present invention provides a method for video coding based on region classification coding, comprising:
step one, identifying each content area in a video picture;
and step two, preprocessing each region respectively to reduce image noise.
The invention also provides a video coding device based on region classification coding, which comprises:
an identification unit that identifies each content area in a video frame;
and a preprocessing unit that preprocesses each region to reduce image noise.
The technical scheme of the invention realizes a video coding method and device based on region classification coding: different regions of the image are preprocessed in different ways, which reduces image noise, highlights the content the user is interested in, and improves the user's perceived quality.
Drawings
Fig. 1 is a schematic diagram of a conventional camera connected to a video conference host;
FIG. 2 is a schematic diagram of a camera of the present invention connected to a video conference host;
FIG. 3 is a schematic diagram of a method for video coding based on region classification coding;
FIG. 4 is a flow chart of a method for video encoding based on region classification coding;
FIG. 5 is a schematic diagram of a pre-processing method for reducing spatial resolution;
fig. 6 is a schematic diagram of an apparatus for video encoding based on region classification coding.
Detailed Description
The technical solution of the present invention will be described in more detail with reference to the accompanying drawings and examples.
It should be noted that, provided they do not conflict, the embodiments of the present invention and the features of the embodiments may be combined with each other within the scope of protection of the present invention. Additionally, although a logical order is shown in the flow charts, in some cases the steps shown or described may be performed in an order different from the one given here.
In a first embodiment, a method for video coding based on region classification coding, as shown in fig. 3, includes:
step one, identifying each content area in a video picture;
and step two, preprocessing each region respectively to reduce image noise.
The technical scheme of the invention realizes a video coding method and device based on region classification coding: different regions of the image are preprocessed in different ways, which reduces image noise, highlights the content the user is interested in, and improves the user's perceived quality.
In a second embodiment, a method for video coding based on region classification coding, as shown in fig. 4, on the basis of the first embodiment, includes:
further, in the step one, each content area is divided into: a combination of one or more of a face region, a computer display region, an active region, and an inactive region.
The human eye perceives the computer display area, the face area, the active area and the inactive area with different emphasis. The face area attracts the most attention. For an active area, the eye is mainly concerned with its motion, whereas for an inactive area it is more concerned with its details. Therefore, the computer display area, the face area, the active area and the inactive area are treated differently in the preprocessing step.
The face area, the computer display area, the active area and the inactive area in the video picture are identified through pre-labeling or image analysis. Different areas of the image are then preprocessed in different ways before the conventional encoding process, which reduces image noise, highlights the content the user is interested in, and improves the user's perceived quality.
Further, in step two, when each region is preprocessed, the face region is left unprocessed.
A face detection technique is used to detect the face area in the picture, and this area is marked as A. Because the face area attracts the most attention, it is not preprocessed.
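A minimal sketch of this step follows (in Python with OpenCV). The patent only speaks of "a face detection technology" without naming one, so the Haar-cascade detector, the function name detect_face_region, and the parameter values here are illustrative assumptions rather than the patented method.

```python
import cv2

def detect_face_region(frame):
    """Detect the face region (region A) in a camera frame.

    Returns a list of (x, y, w, h) rectangles; these areas are marked as A
    and left untouched by the later preprocessing steps.
    """
    # OpenCV ships this cascade file with the opencv-python package.
    cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    detector = cv2.CascadeClassifier(cascade_path)

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [tuple(rect) for rect in faces]
```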
Further, when each area is preprocessed, the computer display area is handled as follows: the computer picture is marked on the image acquired by the camera, and the picture acquired directly from the computer is then used, via an affine transformation, to replace the marked computer display area in the camera image, as shown in FIG. 2.
With the structure of FIG. 2, the video conference host communicates with both the camera and the lecture computer, and the original desktop picture is acquired directly from the lecture computer through an API. The four corner points of the computer picture are marked on the image captured by the camera; the picture acquired from the computer is then warped by an affine transformation to replace the marked computer display area in the camera image. This effectively improves the display quality of the computer display area in the picture shown at the video conference terminal and effectively raises the compression ratio.
Because the camera is usually fixed during a video conference, the four corner points of the computer display area B can be marked in advance. For region B, the real-time picture obtained from the lecture computer is warped by an affine transformation and overlaid onto the frame image. The video conference host is directly connected to the camera and the computer, and the picture is enhanced by acquiring the computer picture in real time and transforming it onto the corresponding content of the camera picture.
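The overlay step might look like the sketch below. The patent names an affine transformation; because four corner points are marked, this sketch uses OpenCV's four-point perspective warp (cv2.getPerspectiveTransform), which reduces to the affine case when the marked quadrilateral is a parallelogram. The function name and the assumed corner ordering are illustrative.

```python
import cv2
import numpy as np

def overlay_desktop(camera_frame, desktop_frame, corners):
    """Warp the desktop capture onto the pre-marked computer display region B.

    corners: four (x, y) points of region B in the camera frame, assumed to be
    ordered top-left, top-right, bottom-right, bottom-left.
    """
    h, w = desktop_frame.shape[:2]
    src = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])
    dst = np.float32(corners)

    # Map the desktop image onto the quadrilateral marked in the camera frame.
    matrix = cv2.getPerspectiveTransform(src, dst)
    warped = cv2.warpPerspective(
        desktop_frame, matrix, (camera_frame.shape[1], camera_frame.shape[0]))

    # Replace region B in the camera frame with the warped desktop picture.
    mask = np.zeros(camera_frame.shape[:2], dtype=np.uint8)
    cv2.fillConvexPoly(mask, dst.astype(np.int32), 255)
    out = camera_frame.copy()
    out[mask == 255] = warped[mask == 255]
    return out
```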
Further, in step two, when each region is preprocessed, the active region is preprocessed to reduce its spatial resolution.
The active region C is identified, within the areas outside A and B, using a frame difference method.
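As an illustration, a simple frame-difference detector for region C might look like the following sketch. The threshold value, the function name, and the use of binary masks for regions A and B are assumptions; the patent only states that a frame difference method is applied outside A and B.

```python
import cv2
import numpy as np

def detect_active_region(prev_frame, curr_frame, exclude_mask, diff_thresh=25):
    """Mark active pixels (region C) by frame differencing, outside regions A and B.

    exclude_mask: uint8 mask that is 255 inside regions A and B.
    Returns a uint8 mask that is 255 where motion (region C) is detected.
    """
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)

    # Pixels whose inter-frame difference exceeds the threshold are "active".
    diff = cv2.absdiff(curr_gray, prev_gray)
    _, active = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)

    # Exclude regions A and B; what remains non-active becomes region D.
    active[exclude_mask == 255] = 0
    return active
```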
Further, the preprocessing method for reducing the spatial resolution is to divide the image pixels into M×N cells and replace the pixels in each cell with the average of the pixel values in that cell.
The preprocessing method for reducing the spatial resolution is as follows:
The image pixels are divided into M×N cells (typically 2×2), and the pixels in each cell are replaced with the average of the pixel values in that cell, as shown in FIG. 5. This reduces the spatial resolution and improves the video coding compression ratio.
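A minimal sketch of this block-averaging step for a single-channel region is given below; how border cells that do not divide evenly are handled is an assumption, since the patent does not specify it.

```python
import numpy as np

def reduce_spatial_resolution(region, m=2, n=2):
    """Replace each m x n cell of a single-channel region with its mean value."""
    h, w = region.shape[:2]
    h_c, w_c = h - h % m, w - w % n  # drop the ragged border cells for simplicity
    cells = region[:h_c, :w_c].reshape(h_c // m, m, w_c // n, n)
    means = cells.mean(axis=(1, 3), keepdims=True)

    out = region.copy()
    out[:h_c, :w_c] = np.broadcast_to(means, cells.shape).reshape(h_c, w_c).astype(region.dtype)
    return out
```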
Further, in step two, when each region is preprocessed, the inactive region is preprocessed to reduce its temporal resolution.
The inactive region D is identified and marked.
Further, the preprocessing method for reducing the temporal resolution is as follows: suppose the pixel value at a given point is V, the preprocessed pixel values of that point in the previous n frames are V1, V2, …, Vn, their average is Vm, and a threshold t is set. If the absolute difference between V and Vm is not greater than t, the preprocessed pixel value at that point is Vm; otherwise it remains V. This reduces the temporal resolution and improves the video coding compression ratio.
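The rule above can be sketched as follows; the concrete threshold value and the function name are illustrative assumptions, since the patent only requires that some threshold t be set.

```python
import numpy as np

def reduce_temporal_resolution(curr, history, t=8):
    """Apply the temporal rule to the inactive region D.

    curr:    current pixel values V of the region (uint8 array)
    history: list of the previous n preprocessed frames of the same region
    t:       threshold; if |V - Vm| <= t the pixel is replaced by the average Vm
    """
    vm = np.mean(np.stack(history, axis=0), axis=0)        # per-pixel average Vm
    keep_avg = np.abs(curr.astype(np.float32) - vm) <= t    # |V - Vm| <= t
    return np.where(keep_avg, vm, curr).astype(curr.dtype)  # Vm where close, else V
```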
In a third embodiment, an apparatus for video coding based on region classification coding, as shown in fig. 6, includes:
an identification unit that identifies each content area in a video frame;
and a preprocessing unit that preprocesses each region to reduce image noise.
The technical scheme of the invention realizes a video coding method and device based on region classification coding: different regions of the image are preprocessed in different ways, which reduces image noise, highlights the content the user is interested in, and improves the user's perceived quality.
In a fourth embodiment, an apparatus for video coding based on region classification coding, as shown in fig. 6, further includes, on the basis of the third embodiment:
further, the identification unit divides each content area into: a combination of one or more of a face region, a computer display region, an active region, and an inactive region.
The human eye perceives the computer display area, the face area, the active area and the inactive area with different emphasis. The face area attracts the most attention. For an active area, the eye is mainly concerned with its motion, whereas for an inactive area it is more concerned with its details. Therefore, the computer display area, the face area, the active area and the inactive area are treated differently in the preprocessing step.
The face area, the computer display area, the active area and the inactive area in the video picture are identified through pre-labeling or image analysis. Different areas of the image are then preprocessed in different ways before the conventional encoding process, which reduces image noise, highlights the content the user is interested in, and improves the user's perceived quality.
Further, the preprocessing unit preprocesses each region, and the face region is left unprocessed.
A face detection technique is used to detect the face area in the picture, and this area is marked as A. Because the face area attracts the most attention, it is not preprocessed.
Furthermore, when the preprocessing unit preprocesses each area, the computer display area is handled as follows: the computer picture is marked on the image acquired by the camera, and the picture acquired directly from the computer is then used, via an affine transformation, to replace the marked computer display area in the camera image, as shown in FIG. 2.
With the structure of FIG. 2, the video conference host communicates with both the camera and the lecture computer, and the original desktop picture is acquired directly from the lecture computer through an API. The four corner points of the computer picture are marked on the image captured by the camera; the picture acquired from the computer is then warped by an affine transformation to replace the marked computer display area in the camera image. This effectively improves the display quality of the computer display area in the picture shown at the video conference terminal and effectively raises the compression ratio.
Because the camera is usually fixed during a video conference, the four corner points of the computer display area B can be marked in advance. For region B, the real-time picture obtained from the lecture computer is warped by an affine transformation and overlaid onto the frame image. The video conference host is directly connected to the camera and the computer, and the picture is enhanced by acquiring the computer picture in real time and transforming it onto the corresponding content of the camera picture.
Further, when the preprocessing unit preprocesses each region, the active region is preprocessed to reduce its spatial resolution. The active region C is identified, within the areas outside A and B, using a frame difference method.
Further, the preprocessing method for reducing the spatial resolution is to divide the image pixels into M×N cells and replace the pixels in each cell with the average of the pixel values in that cell.
The preprocessing method for reducing the spatial resolution is as follows:
The image pixels are divided into M×N cells (typically 2×2), and the pixels in each cell are replaced with the average of the pixel values in that cell, as shown in FIG. 5. This reduces the spatial resolution and improves the video coding compression ratio.
Further, when the preprocessing unit preprocesses each region, the inactive region is preprocessed to reduce its temporal resolution. The inactive region D is identified and marked.
Further, the preprocessing method for reducing the temporal resolution is as follows: suppose the pixel value at a given point is V, the preprocessed pixel values of that point in the previous n frames are V1, V2, …, Vn, their average is Vm, and a threshold t is set. If the absolute difference between V and Vm is not greater than t, the preprocessed pixel value at that point is Vm; otherwise it remains V. This reduces the temporal resolution and improves the video coding compression ratio.
According to the method, the images in a high-definition video conference are divided into four types of areas according to the users' points of attention: a face area, a computer display area, an active area and an inactive area. Different areas of the image are preprocessed in different ways before the conventional encoding process, which reduces image noise, highlights the content the users are interested in, and improves their perceived quality.
It will be understood by those skilled in the art that all or part of the steps of the above methods may be implemented by instructing the relevant hardware through a program, and the program may be stored in a computer readable storage medium, such as a read-only memory, a magnetic or optical disk, and the like. Alternatively, all or part of the steps of the above embodiments may be implemented using one or more integrated circuits. Accordingly, each module/unit in the above embodiments may be implemented in the form of hardware, and may also be implemented in the form of a software functional module. The present invention is not limited to any specific form of combination of hardware and software.
The present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof, and it should be understood that various changes and modifications can be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (8)

1. A method for video coding based on region classification coding, comprising:
step one, identifying each content area in a video picture;
step two, respectively preprocessing each region to reduce image noise;
in step one, the content areas are classified into a combination of one or more of: a face region, a computer display region, an active region, and an inactive region;
wherein, in preprocessing each region, the inactive region is preprocessed to reduce its temporal resolution;
the preprocessing method for reducing the temporal resolution comprises: assuming that the pixel value at a given point is V, the preprocessed pixel values of that point in the previous n frames are V1, V2, …, Vn, the average of these pixel values is Vm, and a threshold t is set;
preprocessing each region, wherein the face region is not preprocessed;
and, in preprocessing each area, for the computer display area, marking the computer picture on the picture acquired by the camera, and replacing the marked computer display area in the camera picture with the picture acquired from the computer through an affine transformation.
2. The method of claim 1, wherein in step two, each region is preprocessed, and the active region is preprocessed to reduce spatial resolution.
3. The method of claim 2, wherein the preprocessing for reducing the spatial resolution is to divide the image pixels into M×N cells and replace the pixels in each cell with the average of the pixel values in that cell.
4. An apparatus for video coding based on region classification coding, comprising:
an identification unit that identifies each content area in a video frame;
a preprocessing unit that preprocesses each region to reduce image noise;
the identification unit is divided into the following content areas: a combination of one or more of a face region, a computer display region, an active region, an inactive region;
the preprocessing unit is used for preprocessing each area, and the face area is not preprocessed;
the preprocessing unit preprocesses each area, wherein, for the computer display area, the computer picture is marked on the picture acquired by the camera, and the picture acquired from the computer then replaces the marked computer display area in the camera picture through an affine transformation.
5. The apparatus of claim 4, wherein the preprocessing unit preprocesses each region, and the active region is preprocessed to reduce its spatial resolution.
6. The apparatus of claim 5, wherein the preprocessing for reducing the spatial resolution is to divide the image pixels into M×N cells and replace the pixels in each cell with the average of the pixel values in that cell.
7. The apparatus of claim 4, wherein the preprocessing unit preprocesses each region, and the inactive region is preprocessed to reduce its temporal resolution.
8. The apparatus of claim 7, wherein the preprocessing method for reducing the temporal resolution is: assuming that the pixel value at a given point is V, the preprocessed pixel values of that point in the previous n frames are V1, V2, …, Vn, the average of these pixel values is Vm, and a threshold t is set, if the absolute difference between V and Vm is not greater than t, the preprocessed pixel value at that point is Vm; otherwise it is V.
CN201610685073.8A 2016-08-18 2016-08-18 Video coding method and device based on regional classification coding Active CN106303366B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610685073.8A CN106303366B (en) 2016-08-18 2016-08-18 Video coding method and device based on regional classification coding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610685073.8A CN106303366B (en) 2016-08-18 2016-08-18 Video coding method and device based on regional classification coding

Publications (2)

Publication Number Publication Date
CN106303366A CN106303366A (en) 2017-01-04
CN106303366B 2020-06-19

Family

ID=57679842

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610685073.8A Active CN106303366B (en) 2016-08-18 2016-08-18 Video coding method and device based on regional classification coding

Country Status (1)

Country Link
CN (1) CN106303366B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109561239B (en) * 2018-08-20 2019-08-16 上海久页信息科技有限公司 Piece caudal flexure intelligent selection platform
CN114723928A (en) * 2021-01-05 2022-07-08 华为技术有限公司 Image processing method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101389014A (en) * 2007-09-14 2009-03-18 浙江大学 Resolution variable video encoding and decoding method based on regions
CN103310411A (en) * 2012-09-25 2013-09-18 中兴通讯股份有限公司 Image local reinforcement method and device
CN103888710A (en) * 2012-12-21 2014-06-25 深圳市捷视飞通科技有限公司 Video conferencing system and method
CN103929640A (en) * 2013-01-15 2014-07-16 英特尔公司 Techniques For Managing Video Streaming


Also Published As

Publication number Publication date
CN106303366A (en) 2017-01-04


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 16th Floor, Railway Building, Shijingshan District, Beijing 100040

Applicant after: Global Tone Communication Technology Co., Ltd.

Address before: 16th Floor, Railway Building, Shijingshan District, Beijing 100040

Applicant before: Mandarin Technology (Beijing) Co., Ltd.

CB02 Change of applicant information
GR01 Patent grant