CN117422617B

CN117422617B - Method and system for realizing image stitching of video conference system

Info

Publication number: CN117422617B
Application number: CN202311321062.8A
Authority: CN
Inventors: 陶卿; 李森; 马月姣; 刘永珺; 杨龙保; 樊磊; 胡丰; 王海达; 陈川; 胡林
Original assignee: Huaneng Lancang River Hydropower Co Ltd
Current assignee: Huaneng Lancang River Hydropower Co Ltd
Priority date: 2023-10-12
Filing date: 2023-10-12
Publication date: 2024-04-09
Anticipated expiration: 2043-10-12
Also published as: CN117422617A

Abstract

The invention relates to the technical field of image processing, and discloses a method and a system for realizing image stitching of a video conference system, wherein the method comprises the following steps: carrying out image framing treatment on the conference video stream to obtain a framed conference image, and carrying out face detection treatment on the framed conference image to obtain a face detection image; performing gray level conversion on the face monitoring image to obtain a gray level face image, performing illumination equalization processing on the gray level face image to obtain an equalized face image, and performing scale standardization processing on the equalized face image to obtain a standard face image; carrying out image registration processing on the standard face image to obtain a registered face image; and performing image stitching processing on the registered face images to obtain stitched face images, performing smoothing processing on the stitched face images to obtain target stitched images, and sending the target stitched images to a terminal of the video conference for visualization operation to obtain a visualization result. The invention can improve the rationality of video conference system image splicing on the practical basis.

Description

Method and system for realizing image stitching of video conference system

Technical Field

The invention relates to the technical field of image processing, in particular to a method and a system for realizing image stitching of a video conference system.

Background

Video conferencing is a real-time visual connection between two or more remote parties that simulates a face-to-face conference, image stitching refers to combining multiple video streams or image sources in a video conference into a single large-screen display, which can increase interaction and communication between participants, and improve the efficiency and quality of the conference, while image stitching techniques can combine these video streams into a single screen to provide a better viewing experience.

At present, a picture-in-picture method is mainly adopted in the image splicing method of the video conference system, namely, a video is displayed on a main video window, meanwhile, other video windows are displayed above or beside the main video window, the main video window is a video for emphasizing a main speaker or an important participant, and the windows are video windows of other participants, but the method cannot clearly see the facial information of each participant in the conference, so that the emotional state of the participant cannot be known, a substitute meeting is easy to appear, or the windows display the faces of other people around the participant, so that the efficiency of the conference is greatly reduced.

Disclosure of Invention

In order to solve the problems, the invention provides a method and a system for realizing image stitching of a video conference system, which can improve the rationality of the image stitching of the video conference system on the practical basis.

In a first aspect, the present invention provides a method for implementing image stitching of a video conference system, including:

acquiring a conference video stream corresponding to a video conference, carrying out image framing treatment on the conference video stream to obtain a framed conference image, and carrying out face detection treatment on the framed conference image to obtain a face detection image;

performing gray level conversion on the face monitoring image to obtain a gray level face image, performing illumination equalization processing on the gray level face image to obtain an equalized face image, and performing scale standardization processing on the equalized face image to obtain a standard face image;

extracting image feature points in the standard face image, determining the optimal viewpoint position of each image in the standard face image according to the image feature points, and carrying out image registration processing on the standard face image according to the optimal viewpoint position to obtain a registered face image;

acquiring image parameters of the registered face images, performing image stitching processing on the registered face images according to the image parameters to obtain stitched face images, performing smoothing processing on the stitched face images to obtain target stitched images, and sending the target stitched images to a terminal of a video conference for visualization operation to obtain a visualization result.

The step of carrying out face detection processing on the framing conference image to obtain a face detection image comprises the following steps:

carrying out noise reduction treatment on the framing conference image to obtain a noise reduction conference image, and identifying a main image in the noise reduction conference image;

performing face detection on the main body image by using a preset face detection algorithm to obtain a face main body image, and acquiring user information corresponding to the framing conference image;

scheduling a user registration image corresponding to the user information, and calculating the coincidence ratio of the face main body image and the user registration image to obtain the face coincidence ratio;

screening the face main body image according to the face overlapping ratio to obtain a target face main body image;

and drawing a face detection frame corresponding to the target face main image, and outputting the noise reduction conference image according to the face detection frame and the target face main image to obtain a face detection image.

The calculating the coincidence ratio of the face main body image and the user registration image to obtain the face coincidence ratio comprises the following steps:

calculating the coincidence ratio of the face main image and the user registration image through the following formula:

；

Wherein,the degree of coincidence of the human faces is indicated,representing the pixel mean value corresponding to the face subject image,representing the pixel mean value corresponding to the user registered image,the number of pixels representing the face subject image,pixel covariance of the face subject image and the user registration image,the pixel variance corresponding to the face subject image,representing the pixel variance corresponding to the user registered image,representing the number of pixels of the user registered image.

The step of carrying out gray level transformation on the face monitoring image to obtain a gray level face image comprises the following steps:

identifying image pixel points in the face monitoring image, and reading three primary color component values of each pixel point in the image pixel points;

calculating the pixel gray value of each pixel point in the image pixel points according to the three primary color component values;

setting a color component value of the image pixel point according to the pixel gray value;

and generating a gray level image corresponding to the face monitoring image according to the color component value to obtain a gray level face image.

The step of carrying out illumination equalization processing on the gray-scale face image to obtain an equalized face image comprises the following steps:

measuring the number of pixels corresponding to the gray value of each pixel point in the gray face image, and constructing a gray histogram corresponding to the gray value of each pixel point in the gray face image according to the number of pixels;

The probability of occurrence of each gray value in the gray histogram is calculated by the following formula:

；

wherein,represents the probability of occurrence of each gray value in the gray histogram,the number of pixels with a gray value of a in the gray histogram is represented,representing the total number of pixels in the gray level histogram,representing the total number of different gray values in the gray histogram;

determining the gray level of the gray face image according to the occurrence probability;

according to the gray level and the occurrence probability, carrying out balanced mapping processing on the gray value of each pixel point in the gray face image by using a preset mapping function to obtain balanced gray values;

and generating the balanced face image of the gray face image according to the balanced gray value.

The extracting the image feature points in the standard face image comprises the following steps:

recognizing the face texture in the standard face image, and extracting the characteristics of the face texture to obtain the face texture characteristics;

determining a pixel direction corresponding to each pixel point in the standard face image, and determining a gray level change direction of each pixel point in the standard face image according to the pixel direction;

calculating the gray scale change rate of each pixel point in the standard face image, and determining important texture features in the face texture features according to the gray scale change direction and the gray scale change rate;

And according to the important texture features, positioning feature points in the standard face image to obtain image feature points in the standard face image.

The calculating the gray scale change rate of each pixel point in the standard face image comprises the following steps:

the gray scale change rate of each pixel point in the standard face image is calculated by the following formula:

；

wherein,represents the gray scale rate of each pixel point in the standard face image,represents the gray value of the j-th pixel point in the standard face image,represents the gray value of the j+1th pixel point in the standard face image,a forward difference quotient representing the gray value of the j-th and j+1th pixel points in the standard face image,the value obtained by deriving the j-th pixel point in the standard face image is represented,and the backward difference quotient of gray values of the jth and the (j+1) th pixel points in the standard face image is represented.

And performing image registration processing on the standard face image according to the optimal viewpoint position to obtain a registered face image, wherein the image registration processing comprises the following steps:

acquiring a time sequence corresponding to a pixel point in the standard face image, and calculating a pixel average value corresponding to the time sequence;

calculating a sequence variance corresponding to the time sequence according to the pixel average value, and analyzing the time domain characteristics of the standard face image according to the sequence variance and the pixel average value;

Performing Fourier transform on the standard face image to obtain a frequency domain image, and performing feature extraction on the frequency domain image to obtain frequency domain features;

and combining the time domain features, the frequency domain features and the optimal viewpoint positions, and performing image registration processing on the standard face image to obtain a registered face image.

The step of performing smoothing processing on the spliced face image to obtain a target spliced image comprises the following steps:

identifying an excessive region in the spliced face image, and extracting regional texture characteristics of the excessive region;

calculating the matching degree between the regional texture features, and performing fine adjustment processing on the regional texture features according to the matching degree to obtain target texture features;

according to the target texture characteristics, carrying out area updating on the transition area to obtain a target transition area;

carrying out fuzzification treatment on the target excessive region to obtain a fuzzy excessive region;

and according to the fuzzy transition region, carrying out image update on the spliced face image to obtain a target spliced image.

In a second aspect, the present invention provides a system for implementing image stitching in a videoconferencing system, the system comprising:

The face detection module is used for acquiring conference video streams corresponding to the video conference, carrying out image framing processing on the conference video streams to obtain framed conference images, and carrying out face detection processing on the framed conference images to obtain face detection images;

the image scale processing module is used for carrying out gray level transformation on the face monitoring image to obtain a gray level face image, carrying out illumination equalization processing on the gray level face image to obtain an equalized face image, and carrying out scale standardization processing on the equalized face image to obtain a standard face image;

the image registration module is used for extracting image feature points in the standard face image, determining the optimal viewpoint position of each image in the standard face image according to the image feature points, and carrying out image registration processing on the standard face image according to the optimal viewpoint position to obtain a registered face image;

the image stitching module is used for acquiring the image parameters of the registered face images, performing image stitching processing on the registered face images according to the image parameters to obtain stitched face images, performing smoothing processing on the stitched face images to obtain target stitched images, and sending the target stitched images to a terminal of a video conference for visualization operation to obtain a visualization result.

Compared with the prior art, the technical principle and beneficial effect of this scheme lie in:

according to the invention, the conference video stream is subjected to image framing processing, so that a conference image corresponding to the conference video stream can be obtained, the video is converted into an image, the difficulty of image stitching processing is reduced, further, the face monitoring image can be converted into a gray image through gray level conversion of the face monitoring image, so that color information is removed, the dimension of the face monitoring image can be reduced, the gray level face image can be subjected to illumination balance processing in the follow-up process, the image quality is improved, the information such as facial emotion of a conference person can be observed better in a video conference, further, the image feature point in the standard face image is extracted, the image representation of the standard face image can be obtained, the face distribution condition of the standard face image can be known through the image feature point, the determination of the follow-up optimal viewpoint position is facilitated, the relevant information of the face image can be reduced, the face image is stitched together can be conveniently, the practical registration processing is conveniently carried out, and the video conference image is displayed on a basis, and the video conference system is convenient to be realized. Therefore, the method and the system for realizing image splicing of the video conference system can improve the accuracy of realizing image splicing of the video conference system on the practical basis.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.

In order to more clearly illustrate the embodiments of the invention or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to a person skilled in the art that other drawings can be obtained from these drawings without inventive effort.

Fig. 1 is a flowchart of a method for implementing image stitching of a video conference system according to an embodiment of the present invention;

fig. 2 is a schematic block diagram of a system for implementing image stitching of a video conference system according to an embodiment of the present invention.

Detailed Description

It should be understood that the detailed description is presented by way of example only and is not intended to limit the invention.

The embodiment of the invention provides a method for realizing video conference system image stitching, and an execution subject of the method for realizing video conference system image stitching comprises, but is not limited to, at least one of a server, a terminal and the like which can be configured to execute the method provided by the embodiment of the invention. In other words, the method for implementing image stitching of the video conference system may be performed by software or hardware installed in a terminal device or a server device, where the software may be a blockchain platform. The service end includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like. The server may be an independent server, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and basic cloud computing services such as big data and artificial intelligence platforms.

Fig. 1 is a flowchart of a method for implementing image stitching in a video conference system according to an embodiment of the present invention. The method for realizing image stitching of the video conference system depicted in fig. 1 comprises the following steps:

s1, acquiring a conference video stream corresponding to a video conference, carrying out image framing processing on the conference video stream to obtain a framed conference image, and carrying out face detection processing on the framed conference image to obtain a face detection image.

The invention can further obtain the conference image corresponding to the conference video stream by carrying out image framing processing on the conference video stream, and convert the video into the image, thereby reducing the difficulty of image stitching processing, wherein the conference video stream is video data received during video conference, the framed conference image is the image corresponding to the conference video stream, a video person keeps a certain gesture during general video conference, the action amplitude is basically small, and further, the image framing processing on the conference video stream can be realized by an optical flow method through image representation of the conference video stream.

The invention can obtain the image about the human face in the frame conference image by carrying out the face detection processing on the frame conference image, improves the accuracy of the processing of the subsequent frame conference image, reduces the calculation amount of the subsequent image processing, can verify the human face and avoids the phenomenon that people replace meetings, wherein the face detection image is the image about the human face in the face detection image.

As an embodiment of the present invention, the performing face detection processing on the framed conference image to obtain a face detection image includes: the method comprises the steps of conducting noise reduction processing on a framing conference image to obtain a noise reduction conference image, identifying a main body image in the noise reduction conference image, conducting face detection on the main body image by means of a preset face detection algorithm to obtain a face main body image, obtaining user information corresponding to the framing conference image, scheduling a user registration image corresponding to the user information, calculating the coincidence ratio of the face main body image and the user registration image to obtain a face coincidence ratio, conducting screening on the face main body image according to the face coincidence ratio to obtain a target face main body image, drawing a face detection frame corresponding to the target face main body image, conducting image output on the noise reduction conference image according to the face detection frame and the target face main body image to obtain a face detection image.

The noise reduction conference image is an image obtained by removing noise and interference in the frame conference image, the main image is obvious content in the noise reduction conference image, such as a human face, an object or a photo, the preset human face detection algorithm is an algorithm for detecting the human face, such as a Viola-Jones algorithm, the human face main image is an image of the main image about the human face, if a non-conference person or the human face in the photo is identified, therefore, the human face main image needs to be verified to avoid display errors, the user registration image is a human face image used for verifying identity in a company by a user, the human face coincidence degree represents the coincidence degree of the human face main image and the user registration image, and the human face detection frame is a human face frame corresponding to the target human face main image so as to highlight the human face position.

Further, the noise reduction processing of the framing conference image may be implemented by a mean filtering algorithm, the mean filtering algorithm includes a gaussian filtering algorithm, the recognition of the main image in the noise reduction conference image may be implemented by a RetinaNet algorithm, the user information may be obtained by obtaining an IP address corresponding to the framing conference image, a user registration image corresponding to the user information may be obtained by scheduling from a corresponding information base, and the face detection image may be obtained by performing corresponding image clipping on the noise reduction conference image according to the face detection frame and the target face main image.

Further, the calculating the contact ratio of the face main image and the user registration image to obtain the face contact ratio includes:

；

wherein,the degree of coincidence of the human faces is indicated,representing the pixel mean value corresponding to the face subject image,representing the pixel mean value corresponding to the user registered image,the number of pixels representing the face subject image,pixel covariance of the face subject image and the user registration image, The pixel variance corresponding to the face subject image,representing the pixel variance corresponding to the user registered image,representing the number of pixels of the user registered image.

S2, carrying out gray level transformation on the face monitoring image to obtain a gray level face image, carrying out illumination equalization processing on the gray level face image to obtain an equalized face image, and carrying out scale standardization processing on the equalized face image to obtain a standard face image.

According to the invention, the face monitoring image can be converted into the gray image by carrying out gray conversion on the face monitoring image, so that color information is removed, the dimension of the face monitoring image can be reduced, the subsequent illumination balancing treatment can be carried out on the gray face image, the image quality is improved, and the information such as the facial emotion of a conference person can be observed better in a video conference, wherein the gray face image is an image obtained by converting the face monitoring image from color into gray.

As an embodiment of the present invention, the performing gray level transformation on the face monitoring image to obtain a gray level face image includes: and identifying image pixel points in the face monitoring image, reading three primary color component values of each pixel point in the image pixel points, calculating pixel gray values of each pixel point in the image pixel points according to the three primary color component values, setting color component values of the image pixel points according to the pixel gray values, and generating a gray image corresponding to the face monitoring image according to the color component values to obtain a gray face image.

The image pixel points are basic constituent elements in the face monitoring image, the three primary color component values are color values corresponding to RGB channels of each pixel point in the image pixel points, the pixel gray values represent brightness levels corresponding to the image pixel points, and the color component values are values obtained after three primary colors of the image pixel points are reset.

Further, the image pixels in the face monitoring image can be identified by a pixel detector, the pixel detector is compiled by a script language, reading the three primary color component values of each pixel in the image pixels can be realized by combining an image [ r, g, b ] grammar with a programming code, the average value can be calculated and used as the pixel gray value of each pixel in the image pixels, and generating the gray image corresponding to the face monitoring image can be realized by an image generator which is compiled by a Java language.

According to the invention, the illumination equalization processing is carried out on the gray-scale face image, so that the brightness influence generated by different environments can be reduced, and the definition degree and the visual effect of the gray-scale face image are improved, wherein the equalization face image is an image with balanced brightness obtained after the illumination equalization processing is carried out on the gray-scale face image.

As an embodiment of the present invention, the performing an illumination equalization process on the gray-scale face image to obtain an equalized face image includes: measuring the number of pixels corresponding to the gray value of each pixel point in the gray face image, constructing a gray histogram corresponding to the gray value of each pixel point in the gray face image according to the number of pixels, calculating the occurrence probability of each gray value in the gray histogram, determining the gray level of the gray face image according to the occurrence probability, carrying out balanced mapping processing on the gray value of each pixel point in the gray face image according to the gray level and the occurrence probability by using a preset mapping function to obtain balanced gray values, and generating the balanced face image of the gray face image according to the balanced gray values.

The number of pixels is the total number of gray values of each pixel point in the gray face image, the gray histogram is a distribution image of gray values of each pixel point in the gray face image, the occurrence probability represents the probability of occurrence of each gray value in the gray face image, the gray level represents the corresponding final gray level of the gray face image, i.e. the gray value with the highest occurrence probability, and the preset mapping function is a function for mapping and converting the gray value of each pixel point in the gray face image, such as a linear mapping function, and the balanced gray value is a value close to the gray value of the occurrence probability.

Further, the measurement of the number of pixels corresponding to the gray value of each pixel in the gray face image may be achieved through a counter, the construction of the gray histogram corresponding to the gray value of each pixel in the gray face image may be achieved through a drawing tool, for example, a visio tool, and the gray level of the gray face image may be determined according to the magnitude of the occurrence probability.

Further, the calculating the occurrence probability of each gray value in the gray histogram includes:

；

wherein,represents the probability of occurrence of each gray value in the gray histogram,the number of pixels with a gray value of a in the gray histogram is represented,representing the total number of pixels in the gray level histogram,representing the total number of different gray values in the gray histogram.

Further, according to the gray level and the occurrence probability, performing an equalizing mapping process on the gray value of each pixel point in the gray face image by using a preset mapping function to obtain an equalizing gray value, including:

the specific calculation formula of the preset mapping function is as follows:

；

Wherein,represents an equilibrium gray value, h represents a gray level, i represents a sequence of pixels in a gray face image,representing the total number of pixels in the gray-scale face image,represents the occurrence probability corresponding to the ith pixel point in the gray-scale face image,and the pixel value of the ith pixel point in the gray-scale face image is represented.

The invention can uniformly process the size of the balanced face image by performing scale standardization processing on the balanced face image, and provides convenience for the subsequent image registration processing on the standard face image, wherein the standard face image is an image obtained by uniformly processing the size of the balanced face image, and further, the standard face image can be obtained by scaling the balanced face image.

S3, extracting image feature points in the standard face image, determining the optimal viewpoint position of each image in the standard face image according to the image feature points, and carrying out image registration processing on the standard face image according to the optimal viewpoint position to obtain a registered face image.

According to the invention, the image characteristic points in the standard face image are extracted, so that the image representation of the standard face image can be obtained, and the face distribution condition of the standard face image can be known through the image characteristic points, so that the subsequent determination of the optimal viewpoint position is facilitated, wherein the image characteristic points are representative representations in the standard face image.

As one embodiment of the present invention, the extracting the image feature points in the standard face image includes: and recognizing the face texture in the standard face image, carrying out feature extraction on the face texture to obtain face texture features, determining the pixel direction corresponding to each pixel point in the standard face image, determining the gray level change direction of each pixel point in the standard face image according to the pixel direction, calculating the gray level change rate of each pixel point in the standard face image, determining important texture features in the face texture features according to the gray level change direction and the gray level change rate, and carrying out feature point positioning in the standard face image according to the important texture features to obtain image feature points in the standard face image.

The face texture is the face texture of the face in the standard face image, the face texture features are characterized textures in the face texture, the gray level change direction is the direction in which the direction of each pixel point in the standard face image changes fastest, the gray level change rate represents the gray level change intensity of each pixel point and surrounding pixel points in the standard face image, and the important texture features are important features in the face texture features.

Further, recognizing the face texture in the standard face image can be achieved through an LBP algorithm, feature extraction on the face texture can be achieved through a gray level co-occurrence matrix, determining the pixel direction corresponding to each pixel point in the standard face image can be achieved through a Sobel operator, the gray level change rate of each pixel point in the standard face image is calculated, important texture features in the face texture features are determined according to the gray level change direction and the gray level change rate, and feature point positioning is conducted in the standard face image according to the important texture features to obtain image feature points in the standard face image.

Further, the calculating the gray scale change rate of each pixel point in the standard face image includes:

；

wherein,represents the gray scale rate of each pixel point in the standard face image,represents the gray value of the j-th pixel point in the standard face image,represents the gray value of the j+1th pixel point in the standard face image,a forward difference quotient representing the gray value of the j-th and j+1th pixel points in the standard face image, The value obtained by deriving the j-th pixel point in the standard face image is represented,and the backward difference quotient of gray values of the jth and the (j+1) th pixel points in the standard face image is represented.

According to the invention, the optimal viewpoint position of each image in the standard face image is determined according to the image characteristic points, so that the optimal observation position corresponding to the standard face image can be obtained, and further the visual feeling of the video conference is improved, wherein the optimal viewpoint position is the optimal observation angle or observation position of each image in the standard face image, further, a characteristic image can be formed by fitting the image characteristic points, the image center point of the characteristic image is calculated, and the optimal viewpoint position of each image in the standard face image is determined according to the image center point.

According to the invention, the image registration processing is carried out on the standard face images according to the optimal viewpoint positions, so that the spatial consistency of each image in the standard face images can be kept, and a guarantee is provided for the subsequent image stitching processing, wherein the images obtained after the spatial adjustment of each image in the standard face images are registered.

As an embodiment of the present invention, the performing image registration processing on the standard face image according to the optimal viewpoint position to obtain a registered face image includes: obtaining a time sequence corresponding to pixel points in the standard face image, calculating a pixel average value corresponding to the time sequence, calculating a sequence variance corresponding to the time sequence according to the pixel average value, analyzing time domain features of the standard face image according to the sequence variance and the pixel average value, carrying out Fourier transformation on the standard face image to obtain a frequency domain image, carrying out feature extraction on the frequency domain image to obtain the frequency domain feature, and carrying out image registration processing on the standard face image by combining the time domain feature, the frequency domain feature and the optimal viewpoint position to obtain a registered face image.

The time sequence is a time sequence of pixel value changes corresponding to pixels in the standard face image, the sequence variance represents a measure of the change amplitude of the pixels in the time sequence, the time domain feature is a feature of the time sequence changes of the pixels in the standard face image, and the frequency domain feature is a feature in the frequency domain image, such as frequency domain amplitude, frequency domain frequency and the like.

Further, a time sequence corresponding to a pixel point in the standard face image may be obtained according to a time corresponding to the conference video stream, the average value of the pixels in the time sequence may be obtained by calculating an average value of the pixel values in the time sequence through an average function, a sequence variance corresponding to the time sequence may be obtained by calculating a variance calculator, a time domain feature of the standard face image may be obtained according to a numerical similarity degree of the sequence variance and the average value of the pixels, fourier transformation may be performed on the standard face image may be performed through a fourier transformation function, a frequency domain feature may be obtained by extracting a frequency spectrum and a frequency feature corresponding to the frequency domain image, a discrete coefficient related to the time domain feature, the frequency domain feature and the optimal viewpoint position between each image in the standard face image may be calculated respectively, and image registration processing may be performed on the standard face image according to the discrete coefficient.

S4, acquiring image parameters of the registered face images, performing image stitching processing on the registered face images according to the image parameters to obtain stitched face images, performing smoothing processing on the stitched face images to obtain target stitched images, and sending the target stitched images to a terminal of a video conference for visualization operation to obtain a visualization result.

According to the invention, related information of the images can be known by acquiring the image parameters of the registered face images, the registered face images are subjected to image stitching processing, so that the registered face images are conveniently gathered together, video presentation of a video conference is facilitated, the rationality of image stitching of a video conference system is improved on the practical basis, wherein the image parameters are image information, such as conference name information or time information, of the registered face images, the stitched face images are images after the registered face images are stitched together, further, the content of the registered face images can be updated according to the image parameters, then a stitching coordinate system is established, and the registered face images are stitched together according to the corresponding image parameters in the stitching coordinate system.

The method and the device can eliminate the boundary lines in the spliced face images by carrying out smoothing processing on the spliced face images, increase the continuity between the images and improve the overall visual perception, wherein the target spliced image is an image obtained by eliminating the boundary lines in the spliced face images after smoothing processing.

As an embodiment of the present invention, the smoothing processing of the stitched face image to obtain a target stitched image includes: identifying an excessive region in the spliced face image, extracting regional texture features of the excessive region, calculating the matching degree between the regional texture features, performing fine adjustment processing on the regional texture features according to the matching degree to obtain target texture features, performing region updating on the excessive region according to the target texture features to obtain a target excessive region, performing blurring processing on the target excessive region to obtain a blurred excessive region, and performing image updating on the spliced face image according to the blurred excessive region to obtain a target spliced image.

The transition region is a place where each image in the spliced face image is connected, the region texture features are total texture features of the transition region, the matching degree represents the matching degree between the region texture features, the target texture features are texture features obtained after the region texture features are slightly adjusted according to the numerical value of the matching degree, the target transition region is a region with higher texture matching degree between the region texture features, and the fuzzy transition region is a region obtained after the details, definition and sharpness in the target transition region are subjected to fuzzy processing.

Further, identifying the excessive region in the spliced face image may be achieved through an edge detection algorithm, for example, a canny algorithm, extracting the regional texture features of the excessive region may be achieved through the gray level co-occurrence matrix, the matching degree between the regional texture features may be obtained through calculating cosine similarity, fine tuning processing may be performed on the texture trend and texture in the regional texture features according to the matching degree, and blurring processing may be performed on the target excessive region may be achieved through a gaussian blurring method.

According to the method and the device for displaying the target spliced image, the target spliced image is sent to the terminal of the video conference to conduct visualization operation, and then the target spliced image can be displayed through the terminal, so that the video conference can be conducted, and the video effect of the video conference is improved.

According to the invention, the conference video stream is subjected to image framing processing, so that a conference image corresponding to the conference video stream can be obtained, the video is converted into an image, the difficulty of image stitching processing is reduced, further, the face monitoring image can be converted into a gray image through gray level conversion of the face monitoring image, so that color information is removed, the dimension of the face monitoring image can be reduced, the gray level face image can be subjected to illumination balance processing in the follow-up process, the image quality is improved, the information such as facial emotion of a conference person can be observed better in a video conference, further, the image feature point in the standard face image is extracted, the image representation of the standard face image can be obtained, the face distribution condition of the standard face image can be known through the image feature point, the determination of the follow-up optimal viewpoint position is facilitated, the relevant information of the face image can be reduced, the face image is stitched together can be conveniently, the practical registration processing is conveniently carried out, and the video conference image is displayed on a basis, and the video conference system is convenient to be realized. Therefore, the method for realizing image splicing of the video conference system provided by the embodiment of the invention can improve the accuracy of realizing image splicing of the video conference system on the practical basis.

As shown in fig. 2, a functional block diagram of a system for implementing image stitching in a video conference system according to the present invention is shown.

The system 100 for implementing image stitching of a video conference system according to the present invention may be installed in an electronic device. Depending on the implemented functions, the system for implementing image stitching of the video conference system may include a face detection module 101, an image scale processing module 102, an image registration module 103, and an image stitching module 104. The module of the invention, which may also be referred to as a unit, refers to a series of computer program segments, which are stored in the memory of the electronic device, capable of being executed by the processor of the electronic device and of performing a fixed function.

In the embodiment of the present invention, the functions of each module/unit are as follows:

the face detection module 101 is configured to obtain a conference video stream corresponding to a video conference, perform image framing processing on the conference video stream to obtain a framed conference image, and perform face detection processing on the framed conference image to obtain a face detection image;

the image scale processing module 102 is configured to perform gray level transformation on the face monitoring image to obtain a gray level face image, perform illumination equalization processing on the gray level face image to obtain an equalized face image, and perform scale standardization processing on the equalized face image to obtain a standard face image;

The image registration module 103 is configured to extract image feature points in the standard face image, determine an optimal viewpoint position of each image in the standard face image according to the image feature points, and perform image registration processing on the standard face image according to the optimal viewpoint position to obtain a registered face image;

the image stitching module 104 is configured to obtain image parameters of the registered face image, perform image stitching processing on the registered face image according to the image parameters to obtain a stitched face image, perform smoothing processing on the stitched face image to obtain a target stitched image, and send the target stitched image to a terminal of a video conference for performing visualization operation to obtain a visualization result.

In detail, the modules in the system 100 for implementing image stitching of a video conference system in the embodiment of the present invention use the same technical means as the method for implementing image stitching of a video conference system described in fig. 1, and can produce the same technical effects, which are not described herein.

The present invention also provides a storage medium storing a computer program which, when executed by a processor of an electronic device, can implement:

Claims

1. A method for implementing image stitching for a video conferencing system, the method comprising:

acquiring image parameters of the registered face images, performing image stitching processing on the registered face images according to the image parameters to obtain stitched face images, performing smoothing processing on the stitched face images to obtain target stitched images, and sending the target stitched images to a terminal of a video conference for visualization operation to obtain a visualization result;

combining the time domain features, the frequency domain features and the optimal viewpoint positions, and performing image registration processing on the standard face image to obtain a registered face image;

drawing a face detection frame corresponding to the target face main image, and outputting the noise reduction conference image according to the face detection frame and the target face main image to obtain a face detection image;

wherein a represents the face overlap ratio, μb represents the pixel mean value corresponding to the face main image, μc represents the pixel mean value corresponding to the user registration image, D1 represents the number of pixels of the face main image, σbc represents the pixel covariance of the face main image and the user registration image, σb represents the pixel variance corresponding to the face main image, σc represents the pixel variance corresponding to the user registration image, and D2 represents the number of pixels of the user registration image.

2. The method for implementing image stitching of a video conference system according to claim 1, wherein the performing gray level transformation on the face monitoring image to obtain a gray level face image includes:

3. The method for implementing image stitching of a video conference system according to claim 1, wherein the performing illumination equalization processing on the gray-scale face image to obtain an equalized face image includes:

Wherein E represents the probability of occurrence of each gray value in the gray histogram, F _a The number of pixels with a gray value of a in the gray level histogram is represented, b represents the total number of pixels in the gray level histogram, and d represents the total number of different gray values in the gray level histogram;

4. A method for implementing image stitching in a video conferencing system as claimed in claim 1, wherein said extracting image feature points in said standard face image comprises:

5. The method for implementing image stitching of a video conferencing system as claimed in claim 4, wherein said calculating a gray scale rate of each pixel in said standard face image comprises:

wherein H represents the gray scale change rate of each pixel point in the standard face image, M _j Representing gray value, M of jth pixel point in standard face image _j+1 Represents the gray value, e (M) _j +M _j+1 ) Forward difference quotient of gray value representing jth and (j+1) th pixel points in standard face image, e (M _j ) Represent the value of the j-th pixel point in the standard face image after derivation, e (M _j -M _j+1 ) Backward difference representing gray value of jth and (j+1) th pixel points in standard face image And (5) a quotient.

6. The method for implementing image stitching of a video conference system according to claim 1, wherein the smoothing the stitched face image to obtain a target stitched image includes:

7. A system for implementing video conferencing system image stitching according to any of claims 1-6, wherein the system comprises: