CN112651350A - Video processing method and device - Google Patents


Info

Publication number
CN112651350A
CN112651350A (application CN202011602544.7A)
Authority
CN
China
Prior art keywords: image, component, frequency component, low frequency
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011602544.7A
Other languages
Chinese (zh)
Inventor
张传金
刘治国
万海峰
陶维俊
马金星
姚莉莉
邵磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ANHUI CREARO TECHNOLOGY CO LTD
Original Assignee
ANHUI CREARO TECHNOLOGY CO LTD
Application filed by ANHUI CREARO TECHNOLOGY CO LTD filed Critical ANHUI CREARO TECHNOLOGY CO LTD
Priority claimed from application CN202011602544.7A
Publication of CN112651350A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/90 Dynamic range modification of images or parts thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/13 Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/90 Determination of colour characteristics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00 Television systems
    • H04N 7/14 Systems for two-way working
    • H04N 7/15 Conference systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person
    • G06T 2207/30201 Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a video processing method for a video conference system. The method detects whether a real face image exists, processes the face image, and outputs the processed face image; a video conference picture including the face image is then displayed according to the image signal. The three color channels of the face RGB image are separated to obtain an R component image, a G component image and a B component image, high-frequency and low-frequency optimization is performed, and the optimized face image is obtained and displayed. According to the invention, the low-frequency component, which mainly corresponds to the background, is suppressed, and the high-frequency component image is enhanced based on an illumination parameter, so that a more accurate pre-processed image is obtained, facilitating high-accuracy image display and processing in dark environments with uneven illumination.

Description

Video processing method and device
Technical Field
The present invention relates to the field of video image processing technologies, and in particular, to a video processing method and apparatus.
Background
Video conference systems, whether software- or hardware-based, connect individuals or groups in two or more different places. They distribute various data, such as static and dynamic images, voice, text, and pictures, to each user's computer over existing telecommunication transmission media, so that geographically dispersed users can be gathered together as if in one place and can exchange information through graphics, sound, and other modes, improving both parties' understanding of the content. At present, video conferencing is developing toward multi-network cooperation and high definition.
Video conferencing is among the most advanced communication technologies currently available. Relying only on the internet, it enables efficient, high-definition remote meetings and remote office work; it offers unique advantages in continuously improving users' communication efficiency, reducing enterprise travel costs, and improving management effectiveness, has already partially replaced business travel, and has become the newest mode of remote work. In recent years the application range of video conferencing has expanded rapidly: it can be seen everywhere, from government, public security, the military, courts, science and technology, energy, medical care, and education, covering all aspects of social life. Recent industry data show that the domestic video conference market in 2009 was about 4.36 billion yuan.
High-definition display of face images in a video conference is a technical challenge: differences in illumination and occlusion across conference scenes leave the face image unclear. The traditional face image processing pipeline mainly consists of light compensation, gray level transformation, histogram equalization, normalization, geometric correction, filtering, and sharpening of the face image. Face image recognition in scenes with weak illumination remains an open technical problem; the color clarity of the face and its surroundings should be preserved as far as possible, so as to facilitate subsequent video display, processing, and analysis.
Disclosure of Invention
In view of the above, the present invention provides a video processing method and apparatus for a video conference system, which use frequency band optimization to address the technical problems of uneven illumination and the difficulty of displaying and processing high-definition face images in dark environments.
The technical scheme of the invention is as follows:
a video processing method for a video conference system including a video conference device and a display device, the method comprising:
detecting sound generated by a sound source of the conference space and outputting a positioning signal;
judging whether a real face image exists in the sub-image block of the conference image corresponding to the sound source according to the positioning signal so as to process the face image and output the processed face image;
and displaying a video conference picture comprising a human face image according to the image signal.
Correspondingly, the step of processing the face image and outputting the processed image signal includes:
separating three different color channels of a human face RGB image to obtain an R component image, a G component image and a B component image, and obtaining an R low-frequency component image, an R high-frequency component image, a G low-frequency component image, a G high-frequency component image, a B low-frequency component image and a B high-frequency component image which respectively correspond to the R component image, the G component image and the B component image;
and respectively carrying out optimization processing on the low-frequency component image and the high-frequency component image.
Correspondingly, the step of performing optimization processing on the low-frequency component image and the high-frequency component image respectively includes:
respectively executing low-frequency suppression on the R low-frequency component image, the G low-frequency component image and the B low-frequency component image to obtain an R component low-frequency suppression image, a G component low-frequency suppression image and a B component low-frequency suppression image which respectively correspond to the R low-frequency component image, the G low-frequency component image and the B low-frequency component image;
enhancing the R high-frequency component image, the G high-frequency component image and the B high-frequency component image;
correspondingly, the step of performing optimization processing on the low-frequency component image and the high-frequency component image respectively includes:
acquiring incident light parameters, and respectively enhancing the R high-frequency component image, the G high-frequency component image and the B high-frequency component image;
the incident light parameter r0 is set to a constant K divided by the average gray value of the image;
and obtaining the corresponding R high-frequency component enhanced image, G high-frequency component enhanced image and B high-frequency component enhanced image based on the product of the incident light parameter r0 and the R high-frequency component image, the G high-frequency component image and the B high-frequency component image respectively.
Correspondingly, the generating step of synthesizing the processed face image based on the R component low-frequency suppression image, the G component low-frequency suppression image, and the B component low-frequency suppression image includes:
generating an R component optimized image based on the R component low-frequency suppression image and the R high-frequency component enhanced image;
generating a G component optimized image based on the G component low-frequency suppression image and the G high-frequency component enhanced image;
generating a B component optimized image based on the B component low-frequency suppression image and the B high-frequency component enhanced image;
and synthesizing and generating a processed face image based on the R component optimized image, the G component optimized image and the B component optimized image.
In addition, the present invention also provides a video processing apparatus for a video conference system, where the video conference system includes a video conference device and a display device, the apparatus includes:
the detection module is used for detecting the sound generated by the sound source of the conference space and outputting a positioning signal;
the processing module is used for judging whether a real face image exists in the sub-image block of the conference image corresponding to the sound source according to the positioning signal, so as to process the face image and output the processed face image;
and the display module displays a video conference picture comprising a human face image according to the image signal.
Correspondingly, the step of processing the face image and outputting the processed face image includes:
separating three different color channels of a human face RGB image to obtain an R component image, a G component image and a B component image, and obtaining an R low-frequency component image, an R high-frequency component image, a G low-frequency component image, a G high-frequency component image, a B low-frequency component image and a B high-frequency component image which respectively correspond to the R component image, the G component image and the B component image;
and respectively carrying out optimization processing on the low-frequency component image and the high-frequency component image.
Correspondingly, the step of performing optimization processing on the low-frequency component image and the high-frequency component image respectively includes:
respectively executing low-frequency suppression on the R low-frequency component image, the G low-frequency component image and the B low-frequency component image to obtain an R component low-frequency suppression image, a G component low-frequency suppression image and a B component low-frequency suppression image which respectively correspond to the R low-frequency component image, the G low-frequency component image and the B low-frequency component image;
enhancing the R high-frequency component image, the G high-frequency component image and the B high-frequency component image;
correspondingly, the step of performing optimization processing on the low-frequency component image and the high-frequency component image respectively includes:
acquiring incident light parameters, and respectively enhancing the R high-frequency component image, the G high-frequency component image and the B high-frequency component image;
the incident light parameter r0 is set to a constant K divided by the average gray value of the image;
and obtaining the corresponding R high-frequency component enhanced image, G high-frequency component enhanced image and B high-frequency component enhanced image based on the product of the incident light parameter r0 and the R high-frequency component image, the G high-frequency component image and the B high-frequency component image respectively.
Correspondingly, the generating step of synthesizing the processed face image based on the R component low-frequency suppression image, the G component low-frequency suppression image, and the B component low-frequency suppression image includes:
generating an R component optimized image based on the R component low-frequency suppression image and the R high-frequency component enhanced image;
generating a G component optimized image based on the G component low-frequency suppression image and the G high-frequency component enhanced image;
generating a B component optimized image based on the B component low-frequency suppression image and the B high-frequency component enhanced image;
and synthesizing and generating a processed face image based on the R component optimized image, the G component optimized image and the B component optimized image.
In the scheme of the embodiment of the invention, the video processing method is used for a video conference system, the video conference system comprises video conference equipment and display equipment, and the method comprises the following steps: detecting sound generated by a sound source of the conference space and outputting a positioning signal; judging whether a real face image exists in the sub-image block of the conference image corresponding to the sound source according to the positioning signal so as to process the face image and output the processed face image; and displaying a video conference picture comprising a human face image according to the image signal. The method comprises the steps of separating three different color channels of a human face RGB image to obtain an R component image, a G component image and a B component image, and obtaining an R low-frequency component image, an R high-frequency component image, a G low-frequency component image, a G high-frequency component image, a B low-frequency component image and a B high-frequency component image which respectively correspond to the R component image, the G component image and the B component image; a low-frequency optimization step, namely performing low-frequency suppression on the R component image, the G component image and the B component image to obtain an R component low-frequency suppression image, a G component low-frequency suppression image and a B component low-frequency suppression image which respectively correspond to the R component, the G component and the B component; a high-frequency optimization step of enhancing the R high-frequency component image, the G high-frequency component image and the B high-frequency component image; and a generating step of synthesizing the processed face image based on the R component low-frequency suppression image, the G component low-frequency suppression image and the B component low-frequency suppression image. 
According to the invention, the low-frequency component, which mainly corresponds to the background, is suppressed, and the high-frequency component image is enhanced based on the illumination parameter, so that a more accurate pre-processed image is obtained, solving the technical problems of uneven illumination and the difficulty of high-accuracy image display and processing in dark environments.
Drawings
FIG. 1 is a flowchart of a method according to a first embodiment of the present invention;
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
This embodiment of the invention provides a video processing method for a video conference system, where the video conference system includes a video conference device and a display device. The method includes:
detecting sound generated by a sound source of the conference space and outputting a positioning signal;
judging whether a real face image exists in the sub-image block of the conference image corresponding to the sound source according to the positioning signal so as to process the face image and output the processed face image;
specifically, when a video conference is performed, the processor acquires a conference image of the conference space via the image detection device. When the sound source detection device detects the sound source of the conference space, the sound source detection device outputs a positioning signal. In the embodiment, the sound source detecting device determines whether the sound intensity of the sound source exceeds a threshold (Thresholds) of the sound intensity and the sound duration, for example, to determine whether to output the localization signal to the processor. The processor receives the positioning signal to judge whether a real human face image exists in the sub-image block of the conference image corresponding to the sound source according to the positioning signal so as to output an image signal. The display device displays a close-up conference picture including a real face image and a speaker's voice by dialing in accordance with the image signal. Therefore, the video conference method of the embodiment can enable the video conference device to actively track the sound of the conference member, and can operate the display device to synchronously display the face image of the conference member emitting the sound, so as to provide a good video conference effect.
And displaying a video conference picture comprising a human face image according to the image signal.
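As a sketch of the gating logic described in this embodiment, the check on sound intensity and duration can be written as a simple predicate in Python; the function name and the threshold values are assumptions introduced for illustration, not taken from the patent:

```python
def should_output_positioning_signal(intensity_db, duration_s,
                                     intensity_threshold_db=45.0,
                                     duration_threshold_s=0.5):
    """Return True when the detected sound source qualifies as a speaking
    event, i.e. both the intensity and the duration exceed their thresholds.
    Threshold values are illustrative, not specified by the patent."""
    return (intensity_db > intensity_threshold_db
            and duration_s > duration_threshold_s)
```

Only when this predicate holds would the positioning signal be sent to the processor, which then looks for a real face image in the corresponding sub-image block.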
Correspondingly, the step of processing the face image and outputting the processed image signal includes:
separating three different color channels of a human face RGB image to obtain an R component image, a G component image and a B component image, and obtaining an R low-frequency component image, an R high-frequency component image, a G low-frequency component image, a G high-frequency component image, a B low-frequency component image and a B high-frequency component image which respectively correspond to the R component image, the G component image and the B component image;
and respectively carrying out optimization processing on the low-frequency component image and the high-frequency component image.
Specifically, acquiring the R component image, G component image and B component image of an image is well known in the prior art and is not described further in this embodiment.
Specifically, to acquire the R low-frequency component image, R high-frequency component image, G low-frequency component image, G high-frequency component image, B low-frequency component image and B high-frequency component image corresponding to the R component image, G component image and B component image, a DCT transform, wavelet transform or Fourier transform may be performed on the R component image, G component image and B component image respectively. Taking the Fourier transform as an example: physically, the Fourier transform converts the image from the spatial domain to the frequency domain, transforming the gray level distribution function of the image into its frequency distribution function, while the inverse Fourier transform converts the frequency distribution function back into the gray level distribution function. The low-frequency component (low-frequency signal) represents the regions where brightness or gray value changes slowly, i.e. the large flat areas of the image; it describes the main body of the image and is a comprehensive measure of the intensity of the whole image. The high-frequency component (high-frequency signal) corresponds to the regions of the image that change sharply, namely the edges (contours), noise and fine detail; it mainly measures the edges and contours of the image, and the human eye is more sensitive to the high-frequency component. Noise is counted as high-frequency because image noise is high-frequency in most cases.
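Taking the Fourier-transform branch above, the split of one component image into low- and high-frequency component images can be sketched with NumPy; the ideal circular cutoff and the `cutoff_ratio` value are assumptions, since the patent does not fix the filter shape:

```python
import numpy as np

def split_low_high(channel, cutoff_ratio=0.1):
    """Split one color component image into low- and high-frequency
    component images using a centered 2-D FFT and an ideal circular mask."""
    f = np.fft.fftshift(np.fft.fft2(channel.astype(float)))
    h, w = channel.shape
    yy, xx = np.ogrid[:h, :w]
    radius = np.hypot(yy - h / 2, xx - w / 2)
    low_mask = radius <= cutoff_ratio * min(h, w)
    # The two masks partition the spectrum, so low + high == channel.
    low = np.fft.ifft2(np.fft.ifftshift(f * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(f * ~low_mask)).real
    return low, high
```

Because the two masks partition the spectrum, the low- and high-frequency component images sum back to the original component image, which makes the later recombination step straightforward.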
A low-frequency optimization step, in which low-frequency suppression is respectively performed on the R low-frequency component image, the G low-frequency component image and the B low-frequency component image, and an R component low-frequency suppression image, a G component low-frequency suppression image and a B component low-frequency suppression image which respectively correspond to the R low-frequency component image, the G low-frequency component image and the B low-frequency component image are obtained;
specifically, the R low-frequency component image, the G low-frequency component image, and the B low-frequency component image are respectively input to a high-pass filter, so that low frequency is further suppressed, a high-frequency portion is enhanced, and edges or lines of the face image are clearer.
A high-frequency optimization step of enhancing the R high-frequency component image, the G high-frequency component image and the B high-frequency component image;
preferably, the high-frequency optimizing step of enhancing the R high-frequency component image, the G high-frequency component image, and the B high-frequency component image includes:
acquiring incident light parameters, and respectively enhancing the R high-frequency component image, the G high-frequency component image and the B high-frequency component image;
the incident light parameter r0 is set to a constant K/average gray value of the image;
here, the average gray value of the image should be the average gray corresponding to the current R, G, and B high-frequency component images; where a constant K is used to control the overall brightness of the enhanced image.
Preferably, enhancing the R high-frequency component image, the G high-frequency component image, and the B high-frequency component image includes:
and obtaining the corresponding R high-frequency component enhanced image, G high-frequency component enhanced image and B high-frequency component enhanced image based on the product of the incident light parameter r0 and the R high-frequency component image, the G high-frequency component image and the B high-frequency component image respectively.
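The enhancement by the incident light parameter r0 = K / average gray value can be sketched as follows; using the mean absolute value as the "average gray" of a high-frequency component image (whose signed mean is near zero) and the default value of K are both assumptions:

```python
import numpy as np

def enhance_high_frequency(high_img, k=128.0):
    """Scale a high-frequency component image by the incident light
    parameter r0 = K / average gray value. The abs-mean gray estimate
    and the default K are assumptions, not fixed by the patent."""
    mean_gray = float(np.mean(np.abs(high_img))) + 1e-6  # avoid division by zero
    r0 = k / mean_gray
    return r0 * high_img
```

A larger K brightens the enhanced image overall, matching the role of the constant K described above.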
And a generating step of synthesizing the preprocessed face image based on the R component low-frequency suppression image, the G component low-frequency suppression image and the B component low-frequency suppression image.
Preferably, the generating step of synthesizing the preprocessed face image based on the R component low-frequency suppression image, the G component low-frequency suppression image, and the B component low-frequency suppression image includes:
generating an R component optimized image based on the R component low-frequency suppression image and the R high-frequency component enhanced image;
generating a G component optimized image based on the G component low-frequency suppression image and the G high-frequency component enhanced image;
generating a B component optimized image based on the B component low-frequency suppression image and the B high-frequency component enhanced image;
and synthesizing and generating a preprocessed face image based on the R component optimized image, the G component optimized image and the B component optimized image.
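Putting the steps of this embodiment together, an end-to-end sketch might look as follows; the ideal-filter cutoff, the low-band attenuation factor `alpha`, the brightness constant `K`, and the abs-mean gray estimate are all assumptions introduced for illustration, not choices fixed by the patent:

```python
import numpy as np

def process_face_image(rgb, cutoff_ratio=0.1, alpha=0.5, k=128.0):
    """Per-channel frequency split, low-band suppression (factor alpha),
    high-band enhancement by r0 = K / average gray, then recombination
    of the R, G and B optimized component images."""
    h, w = rgb.shape[:2]
    yy, xx = np.ogrid[:h, :w]
    low_mask = np.hypot(yy - h / 2, xx - w / 2) <= cutoff_ratio * min(h, w)
    out = np.empty(rgb.shape, dtype=float)
    for c in range(3):  # process R, G and B component images independently
        f = np.fft.fftshift(np.fft.fft2(rgb[..., c].astype(float)))
        low = np.fft.ifft2(np.fft.ifftshift(f * low_mask)).real
        high = np.fft.ifft2(np.fft.ifftshift(f * ~low_mask)).real
        r0 = k / (float(np.mean(np.abs(high))) + 1e-6)  # incident light parameter
        out[..., c] = alpha * low + r0 * high           # component optimized image
    return np.clip(out, 0.0, 255.0).astype(np.uint8)
```

The three component optimized images are synthesized back into the processed RGB face image, as in the generating step above.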
Example two
In addition, the present invention also provides a video processing apparatus for a video conference system, where the video conference system includes a video conference device and a display device, the apparatus includes:
the detection module is used for detecting the sound generated by the sound source of the conference space and outputting a positioning signal;
the processing module is used for judging whether a real face image exists in the sub-image block of the conference image corresponding to the sound source according to the positioning signal, so as to process the face image and output the processed face image;
and the display module displays a video conference picture comprising a human face image according to the image signal.
Correspondingly, the step of processing the face image and outputting the processed face image includes:
separating three different color channels of a human face RGB image to obtain an R component image, a G component image and a B component image, and obtaining an R low-frequency component image, an R high-frequency component image, a G low-frequency component image, a G high-frequency component image, a B low-frequency component image and a B high-frequency component image which respectively correspond to the R component image, the G component image and the B component image;
and respectively carrying out optimization processing on the low-frequency component image and the high-frequency component image.
Correspondingly, the step of performing optimization processing on the low-frequency component image and the high-frequency component image respectively includes:
respectively executing low-frequency suppression on the R low-frequency component image, the G low-frequency component image and the B low-frequency component image to obtain an R component low-frequency suppression image, a G component low-frequency suppression image and a B component low-frequency suppression image which respectively correspond to the R low-frequency component image, the G low-frequency component image and the B low-frequency component image;
enhancing the R high-frequency component image, the G high-frequency component image and the B high-frequency component image;
correspondingly, the step of performing optimization processing on the low-frequency component image and the high-frequency component image respectively includes:
acquiring incident light parameters, and respectively enhancing the R high-frequency component image, the G high-frequency component image and the B high-frequency component image;
the incident light parameter r0 is set to a constant K divided by the average gray value of the image;
and obtaining the corresponding R high-frequency component enhanced image, G high-frequency component enhanced image and B high-frequency component enhanced image based on the product of the incident light parameter r0 and the R high-frequency component image, the G high-frequency component image and the B high-frequency component image respectively.
Correspondingly, the generating step of synthesizing the processed face image based on the R component low-frequency suppression image, the G component low-frequency suppression image, and the B component low-frequency suppression image includes:
generating an R component optimized image based on the R component low-frequency suppression image and the R high-frequency component enhanced image;
generating a G component optimized image based on the G component low-frequency suppression image and the G high-frequency component enhanced image;
generating a B component optimized image based on the B component low-frequency suppression image and the B high-frequency component enhanced image;
and synthesizing and generating a processed face image based on the R component optimized image, the G component optimized image and the B component optimized image.
In the scheme of the embodiment of the invention, the video processing method is used for a video conference system, the video conference system comprises video conference equipment and display equipment, and the method comprises the following steps: detecting sound generated by a sound source of the conference space and outputting a positioning signal; judging whether a real face image exists in the sub-image block of the conference image corresponding to the sound source according to the positioning signal so as to process the face image and output the processed face image; and displaying a video conference picture comprising a human face image according to the image signal. The method comprises the steps of separating three different color channels of a human face RGB image to obtain an R component image, a G component image and a B component image, and obtaining an R low-frequency component image, an R high-frequency component image, a G low-frequency component image, a G high-frequency component image, a B low-frequency component image and a B high-frequency component image which respectively correspond to the R component image, the G component image and the B component image; a low-frequency optimization step, namely performing low-frequency suppression on the R component image, the G component image and the B component image to obtain an R component low-frequency suppression image, a G component low-frequency suppression image and a B component low-frequency suppression image which respectively correspond to the R component, the G component and the B component; a high-frequency optimization step of enhancing the R high-frequency component image, the G high-frequency component image and the B high-frequency component image; and a generating step of synthesizing the processed face image based on the R component low-frequency suppression image, the G component low-frequency suppression image and the B component low-frequency suppression image. 
By suppressing the low-frequency component, which mainly carries the background, and enhancing the high-frequency component images based on the illumination parameter, the invention obtains a more accurate pre-processed image and solves the technical problem that uneven illumination and dark environments make high-accuracy image display and processing difficult.
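A minimal sketch of the suppression, enhancement, and synthesis steps follows. The suppression factor `alpha` and the additive recombination are assumptions; the text specifies only that the incident-light parameter r0 = K / average gray value scales the high-frequency part and that the low-frequency part is suppressed.

```python
import numpy as np

def optimize_channel(low, high, k=128.0, alpha=0.5):
    """Suppress the low-frequency part and enhance the high-frequency
    part of one colour channel (alpha and the recombination are assumed)."""
    r0 = k / max(low.mean(), 1e-6)   # incident-light parameter r0 = K / average gray
    suppressed = alpha * low          # low-frequency suppression
    enhanced = r0 * high              # high-frequency enhancement
    return np.clip(suppressed + enhanced, 0.0, 255.0)

def synthesize(components, **kwargs):
    """Recombine the optimized R, G and B channels into the processed face image."""
    channels = [optimize_channel(*components[name], **kwargs) for name in "RGB"]
    return np.stack(channels, axis=-1)
```

Note that r0 grows as the average gray value falls, so darker images receive stronger high-frequency (detail) amplification, which matches the stated goal of handling dark, unevenly lit scenes.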
It should be noted that the division of the above apparatus into modules is only a logical division; in an actual implementation the modules may be wholly or partially integrated into one physical entity, or may be physically separate. The modules may all be implemented as software invoked by a processing element, entirely in hardware, or partly as software invoked by a processing element and partly in hardware. For example, the determining module 310 may be a separately arranged processing element, may be integrated into a chip of the apparatus, or may be stored in a memory of the apparatus in the form of program code that a processing element of the apparatus calls to execute the function of the determining module 310. The other modules are implemented similarly. In addition, all or some of the modules may be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal-processing capability. In implementation, each step of the above method, or each of the above modules, may be completed by an integrated logic circuit of hardware in a processor element or by instructions in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above methods, such as one or more Application Specific Integrated Circuits (ASICs), one or more Digital Signal Processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs). For another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor that can call program code. As another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SoC).
The bus 130 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus 130 may be divided into an address bus, a data bus, a control bus, and so on. Although the bus is drawn as a single line in the figures of the present application for ease of illustration, this does not mean that there is only one bus or one type of bus.
In addition, an embodiment of the invention further provides a readable storage medium storing computer-executable instructions which, when executed by a processor, implement the video processing method described above.
Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing detailed disclosure is intended to be presented by way of example only and is not limiting. Various modifications, improvements, and amendments may occur to those skilled in the art, though not expressly stated herein. Such modifications, improvements, and amendments are suggested by this specification and still fall within the spirit and scope of its exemplary embodiments.
Also, this specification uses specific words to describe its embodiments. Terms such as "one possible implementation," "one possible example," and/or "exemplary" mean that a particular feature, structure, or characteristic described in connection with at least one embodiment of the specification is included. Therefore, it is emphasized that two or more references to "one possible implementation," "one possible example," and/or "exemplary" in various places throughout this specification are not necessarily referring to the same embodiment. Furthermore, particular features, structures, or characteristics of one or more embodiments of the specification may be combined as appropriate.
Moreover, those skilled in the art will appreciate that aspects of this specification may be illustrated and described in terms of several patentable classes or contexts, including any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof. Accordingly, aspects of this specification may be implemented entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.), or in a combination of hardware and software, which may generally be referred to herein as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of this specification may take the form of a computer product, including computer-readable program code, embodied in one or more computer-readable media.
The computer storage medium may comprise a propagated data signal with the computer program code embodied therewith, for example, on baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, etc., or any suitable combination. A computer storage medium may be any computer-readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated over any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or any combination of the preceding.
Computer program code required for the operation of various portions of this specification may be written in any one or more programming languages, including object-oriented languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, and Python; conventional procedural languages such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, and ABAP; dynamic languages such as Python, Ruby, and Groovy; or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or a big-data platform. In the latter scenario, the remote computer may be connected to the user's computer through any form of network, such as a local area network (LAN) or a wide area network (WAN); the connection may also be made to an external computer (for example, through the Internet), in a cloud computing environment, or as a service such as software as a service (SaaS).
Additionally, the order in which elements and sequences are processed, the use of alphanumeric characters, or the use of other designations in this specification is not intended to limit the order of the processes and methods of this specification, unless otherwise specified in the claims. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented through interactive services, they may also be implemented through software-only solutions, such as installing the described system on an existing big-data platform or mobile device.
Similarly, it should be noted that in the preceding description of embodiments of this specification, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the embodiments. This method of disclosure, however, is not to be interpreted as implying that the claimed subject matter requires more features than are expressly recited in each claim. Indeed, claimed embodiments may lie in less than all features of a single embodiment disclosed above.
It is to be understood that the descriptions, definitions and/or uses of terms in the accompanying materials of this specification shall control if they are inconsistent or contrary to the descriptions and/or uses of terms in this specification.
Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments of the present disclosure. Other variations are also possible within the scope of the present description. Thus, by way of example, and not limitation, alternative configurations of the embodiments of the specification can be considered consistent with the teachings of the specification. Accordingly, the embodiments of the present description are not limited to only those embodiments explicitly described and depicted herein.

Claims (10)

1. A video processing method for use in a video conferencing system including a video conferencing device and a display device, the method comprising:
detecting sound generated by a sound source of the conference space and outputting a positioning signal;
judging whether a real face image exists in the sub-image block of the conference image corresponding to the sound source according to the positioning signal so as to process the face image and output the processed face image;
and displaying a video conference picture comprising a human face image according to the image signal.
2. The video processing method according to claim 1, wherein processing the face image and outputting the processed image signal comprises:
separating three different color channels of a human face RGB image to obtain an R component image, a G component image and a B component image, and obtaining an R low-frequency component image, an R high-frequency component image, a G low-frequency component image, a G high-frequency component image, a B low-frequency component image and a B high-frequency component image which respectively correspond to the R component image, the G component image and the B component image;
and respectively carrying out optimization processing on the low-frequency component image and the high-frequency component image.
3. The video processing method according to claim 2, wherein the performing optimization processing on the low-frequency component image and the high-frequency component image respectively comprises:
respectively executing low-frequency suppression on the R low-frequency component image, the G low-frequency component image and the B low-frequency component image to obtain an R component low-frequency suppression image, a G component low-frequency suppression image and a B component low-frequency suppression image which respectively correspond to the R low-frequency component image, the G low-frequency component image and the B low-frequency component image;
and enhancing the R high-frequency component image, the G high-frequency component image and the B high-frequency component image.
4. The video processing method according to claim 3, wherein the performing optimization processing on the low-frequency component image and the high-frequency component image respectively comprises:
acquiring incident light parameters, and respectively enhancing the R high-frequency component image, the G high-frequency component image and the B high-frequency component image;
the incident light parameter r0Set to a constant K/average gray value of the image;
and multiplying the R high-frequency component image, the G high-frequency component image and the B high-frequency component image respectively by the incident light parameter r0 to obtain a corresponding R high-frequency component enhanced image, G high-frequency component enhanced image and B high-frequency component enhanced image.
5. The video processing method according to claim 4, wherein the generating step of synthesizing the processed face image based on the R component low-frequency suppression image, the G component low-frequency suppression image, and the B component low-frequency suppression image comprises:
generating an R component optimized image based on the R component low-frequency suppression image and the R high-frequency component enhanced image;
generating a G component optimized image based on the G component low-frequency suppression image and the G high-frequency component enhanced image;
generating a B component optimized image based on the B component low-frequency suppression image and the B high-frequency component enhanced image;
and synthesizing and generating a processed face image based on the R component optimized image, the G component optimized image and the B component optimized image.
6. A video processing apparatus for use in a video conference system including a video conference device and a display device, the apparatus comprising:
the detection module is used for detecting the sound generated by the sound source of the conference space and outputting a positioning signal;
the processing module is used for judging, according to the positioning signal, whether a real face image exists in the sub-image block of the conference image corresponding to the sound source, so as to process the face image and output the processed face image;
and the display module displays a video conference picture comprising a human face image according to the image signal.
7. The video processing apparatus according to claim 6, wherein said processing the face image and outputting the processed face image comprises:
separating three different color channels of a human face RGB image to obtain an R component image, a G component image and a B component image, and obtaining an R low-frequency component image, an R high-frequency component image, a G low-frequency component image, a G high-frequency component image, a B low-frequency component image and a B high-frequency component image which respectively correspond to the R component image, the G component image and the B component image;
and respectively carrying out optimization processing on the low-frequency component image and the high-frequency component image.
8. The video processing apparatus according to claim 7, wherein the performing optimization processing on each of the low-frequency component image and the high-frequency component image comprises:
respectively executing low-frequency suppression on the R low-frequency component image, the G low-frequency component image and the B low-frequency component image to obtain an R component low-frequency suppression image, a G component low-frequency suppression image and a B component low-frequency suppression image which respectively correspond to the R low-frequency component image, the G low-frequency component image and the B low-frequency component image;
and enhancing the R high-frequency component image, the G high-frequency component image and the B high-frequency component image.
9. The video processing apparatus according to claim 8, wherein the performing optimization processing on each of the low-frequency component image and the high-frequency component image includes:
acquiring incident light parameters, and respectively enhancing the R high-frequency component image, the G high-frequency component image and the B high-frequency component image;
the incident light parameter r0Set to a constant K/average gray value of the image;
and multiplying the R high-frequency component image, the G high-frequency component image and the B high-frequency component image respectively by the incident light parameter r0 to obtain a corresponding R high-frequency component enhanced image, G high-frequency component enhanced image and B high-frequency component enhanced image.
10. The video processing apparatus according to claim 9, wherein the generating step of synthesizing the processed face image based on the R component low-frequency suppression image, the G component low-frequency suppression image, and the B component low-frequency suppression image comprises:
generating an R component optimized image based on the R component low-frequency suppression image and the R high-frequency component enhanced image;
generating a G component optimized image based on the G component low-frequency suppression image and the G high-frequency component enhanced image;
generating a B component optimized image based on the B component low-frequency suppression image and the B high-frequency component enhanced image;
and synthesizing and generating a processed face image based on the R component optimized image, the G component optimized image and the B component optimized image.
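The incident-light parameter of claims 4 and 9 can be illustrated with made-up numbers; the constant K, the average gray value, and the 2x2 high-frequency component below are all hypothetical.

```python
import numpy as np

K = 128.0                              # hypothetical constant K
average_gray = 64.0                    # hypothetical average gray value of the image
r0 = K / average_gray                  # incident-light parameter r0, here 2.0

high_r = np.array([[3.0, -1.0],
                   [0.5,  2.0]])       # a tiny R high-frequency component image
enhanced_r = r0 * high_r               # element-wise multiplication per claims 4 and 9
# enhanced_r is now [[6.0, -2.0], [1.0, 4.0]]
```

The same multiplication is applied to the G and B high-frequency component images with the same r0, since r0 depends only on the image's average gray value.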
CN202011602544.7A 2020-12-29 2020-12-29 Video processing method and device Pending CN112651350A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011602544.7A CN112651350A (en) 2020-12-29 2020-12-29 Video processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011602544.7A CN112651350A (en) 2020-12-29 2020-12-29 Video processing method and device

Publications (1)

Publication Number Publication Date
CN112651350A true CN112651350A (en) 2021-04-13

Family

ID=75364091

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011602544.7A Pending CN112651350A (en) 2020-12-29 2020-12-29 Video processing method and device

Country Status (1)

Country Link
CN (1) CN112651350A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1713717A (en) * 2004-06-25 2005-12-28 北京中星微电子有限公司 Digital sound control orienting method for camera site of camera
CN1917623A (en) * 2005-08-17 2007-02-21 索尼株式会社 Camera controller and teleconferencing system
CN101080000A (en) * 2007-07-17 2007-11-28 华为技术有限公司 Method, system, server and terminal for displaying speaker in video conference
CN102368816A (en) * 2011-12-01 2012-03-07 中科芯集成电路股份有限公司 Intelligent front end system of video conference
CN102750682A (en) * 2012-07-17 2012-10-24 中国矿业大学(北京) Image preprocessing method for processing nonuniform illumination of miner face image and coal surface
CN102903081A (en) * 2012-09-07 2013-01-30 西安电子科技大学 Low-light image enhancement method based on red green blue (RGB) color model
CN109118444A (en) * 2018-07-26 2019-01-01 东南大学 A kind of regularization facial image complex illumination minimizing technology based on character separation
CN111263106A (en) * 2020-02-25 2020-06-09 厦门亿联网络技术股份有限公司 Picture tracking method and device for video conference

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Shangguan Wei et al.: "Improving the Quality of Ultrasound Medical Images by Comprehensive Use of Gray-Scale Transformation Methods" *
Zhang Zhilong et al.: "A New Histogram Equalization Algorithm That Preserves Image Details" *
Yang Tong et al.: "Optimization of Brachial Plexus Ultrasound Images Based on Deep Learning and Adaptive Contrast Enhancement" *

Similar Documents

Publication Publication Date Title
JP6615917B2 (en) Real-time video enhancement method, terminal, and non-transitory computer-readable storage medium
KR101621614B1 (en) Method and apparatus for enhancing digital image, and apparatus for image processing using the same
WO2014169579A1 (en) Color enhancement method and device
Jung et al. Optimized perceptual tone mapping for contrast enhancement of images
WO2022088976A1 (en) Image processing method and device
CN111583103B (en) Face image processing method and device, electronic equipment and computer storage medium
CN111899197A (en) Image brightening and denoising method and device, mobile terminal and storage medium
US20240046537A1 (en) Image processing method and apparatus, device and readable storage medium
CN111738950B (en) Image processing method and device
CN117218039A (en) Image processing method, device, computer equipment and storage medium
CN113538304A (en) Training method and device of image enhancement model, and image enhancement method and device
CN112651350A (en) Video processing method and device
CN112215237B (en) Image processing method and device, electronic equipment and computer readable storage medium
Wang et al. Nighttime image dehazing using color cast removal and dual path multi-scale fusion strategy
CN111784726A (en) Image matting method and device
CN110555799A (en) Method and apparatus for processing video
CN117474827A (en) Image definition detection method and device
Tao et al. An effective and robust underwater image enhancement method based on color correction and artificial multi-exposure fusion
CN112801997B (en) Image enhancement quality evaluation method, device, electronic equipment and storage medium
CN112613458A (en) Image preprocessing method and device for face recognition
CN114266803A (en) Image processing method, image processing device, electronic equipment and storage medium
CN106651815B (en) Method and device for processing Bayer format video image
Zhang et al. An adaptive tone mapping algorithm for high dynamic range images
CN111756954B (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN113379631B (en) Image defogging method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination