CN116600147B - Method and system for remote multi-person real-time cloud group photo - Google Patents


Info

Publication number: CN116600147B (application CN202211716741.0A)
Authority: CN (China)
Prior art keywords: image, photo, user, real, group
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Other versions: CN116600147A (en)
Inventors: 郑娃龙, 张威, 黄浩辉
Current assignee: Guangzhou Ziweiyun Technology Co., Ltd. (the listed assignees may be inaccurate)
Original assignee: Guangzhou Ziweiyun Technology Co., Ltd.
Events: application filed by Guangzhou Ziweiyun Technology Co., Ltd.; publication of CN116600147A; application granted; publication of CN116600147B; currently active, with anticipated expiration tracked.


Classifications

    • H04N 21/21805 — Source of audio or video content, e.g. local disk arrays, enabling multiple viewpoints, e.g. using a plurality of cameras
    • H04N 21/23418 — Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N 21/23424 — Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • H04N 21/44008 — Client-side processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N 21/44016 — Client-side processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H04N 5/272 — Means for inserting a foreground image in a background image, i.e. inlay, outlay
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a method for remote multi-person real-time cloud group photos based on portrait segmentation and image fusion technology. Each client encodes its image data and pushes it to a streaming media server over the RTMP protocol; the server pulls and decodes each client's real-time video stream, performs portrait segmentation, and fuses the segmented portraits to obtain the group photo image data, which is encoded and pushed so that the clients can pull the corresponding stream. Relying on AI portrait segmentation, background fusion, and streaming media technology, a user only needs a camera to capture personal full-body portrait video, push it to the server, and pull the newly fused video stream to see everyone's same-frame group photo in real time. In addition, to ensure the effect, the users' standing positions can be adjusted dynamically, the background of the group photo can be switched in real time, front and back rows can be set, the portrait scaling can be configured, and the background can even be replaced with a dynamic video, with the front end updating dynamically accordingly.

Description

Method and system for remote multi-person real-time cloud group photo
Technical Field
The invention relates to the technical field of computers, in particular to a method and a system for remote multi-person real-time cloud group photo.
Background
Electronic technology is developing rapidly today. Conventional group photography requires the participants to be present at the scene: they stand in an arranged order, are photographed under the guidance of a photographer, and the pictures are retouched afterwards. Anyone who cannot attend in person unfortunately misses the group photo, unless they are added later through photo editing. This photographing mode is inflexible and cannot meet users' personalized needs. Especially in the current environment, where many activities are held online and cannot be attended in person, the demand for cloud group photos keeps growing.
Cloud group-photo products already exist on the market, but they basically require each person to take a static selfie and upload it to a server for splicing and fusion, such as the "cloud graduation" photos some schools produced during the epidemic. These products share a common pattern: a user selects a specific position, takes a selfie, uploads it, and the server scales the portrait and fuses it with the background. The user cannot see the composition process, only the final result; if dissatisfied, the user must find the right angle and expression again, reshoot, and re-upload. Everyone photographs alone, with no communication during the whole process, no photographer's guidance, and none of the lively atmosphere of shooting a collective group photo.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems existing in the prior art. The invention discloses a method for remote multi-person real-time cloud group photo based on portrait segmentation and image fusion technology, in which image data is encoded and pushed to a streaming media server using the RTMP protocol; the real-time video stream of each client is pulled from the streaming media server and decoded, portrait segmentation is performed, and image fusion processing then yields the group photo image data, which is encoded and pushed so that the clients can pull the corresponding stream. The method for remote multi-person real-time cloud group photo comprises the following steps:
step 1, when receiving a group photo request submitted by a user, the real-time cloud group photo platform creates a group photo task, where each task is assigned a separate room number to avoid conflicts between multiple simultaneous group photo tasks, and a server of the real-time cloud group photo platform issues, for each room, the participating user ids, the video push stream addresses, and the stream address of the fused group photo;
step 2, a client of a user collects real-time image data through a video capture device, encodes it in an H264 or H265 coding mode, and sends the encoded video data over a streaming media protocol; after communicating with the back end to obtain the assigned room number, the client pushes the encoded video to the streaming media server at the designated stream address using the RTMP protocol;
step 3, the real-time video streams pushed by the clients are pulled and decoded into normal video data based on ffmpeg, and a portrait segmentation algorithm is then invoked for real-time matting; since multiple client-pushed video streams exist in the same room, multi-way portrait fusion is performed after the mask data of each image are acquired;
step 4, the initial region of each portrait in the background is calculated from the priority set by each user, the number of people participating in the group photo, and each user's portrait scaling parameter, and the portrait data matted out in step 3 are then fused onto the designated background according to an arrangement strategy;
and step 5, since the participating user information, room information, and fused stream address issued by the back end of the real-time cloud group photo platform have been acquired, the fused image data, i.e. the real-time group photo, is video-encoded and compressed with the same scheme as the client, pushed to the streaming media server at the designated stream address using the RTMP protocol, and pulled by the clients via the video stream address, so that the fused group photo image can be seen in real time.
Furthermore, the client is a Windows PC device, and the video capture device is a USB camera that collects the real-time image data.
Still further, the real-time matting further includes: to improve the matting effect, the portrait matting algorithm is designed on a MobileNet-v3 or ResNet-50 deep learning backbone and deployed with the TensorRT inference engine.
Still further, the step 4 further includes: the client first informs the back end of the real-time cloud group photo platform of the relevant user information; the back end then tells the server how many users participate in the group photo and how many camera feeds each contributes; the group photo region of each user in the background is calculated, and person regions exceeding the boundary of the background group photo region are automatically cropped according to an adjustable scaling coefficient; if the video capture devices of the users are judged to have different camera environments, the corresponding coefficients can be updated manually through a background management platform so that the picture looks coordinated and natural, reducing abruptness; the arrangement logic for the multiple video streams places them from the middle outward to the two sides, with the left side preferred over the right, so that the block remains centred whether the number of video streams is odd or even.
Furthermore, Kalman tracking is applied to the images acquired in real time to reduce jitter of the portrait box and thereby alleviate picture shake caused by the person moving back and forth: the minimum rectangular box segmented in the current frame is compared with the tracked box of the previous frame by IoU, and the horizontal widths of the two boxes are also compared; when neither change exceeds a certain coefficient, the previous frame's box is reused as the current frame's portrait box, which removes a certain amount of picture jitter.
Further, the calculation and arrangement processing in the step 4 includes the following steps:
step 401, calculating the width of the user picture in the background area: the width user_group_photo_w occupied by the user picture in the background area is the width bg_width of the background image designated by the back end, multiplied by a default scaling factor group_photo_ratio and by the ratio of the user's camera count user_camera_num to the total camera count total_camera_num of all participants; the calculation formula of user_group_photo_w is:
user_group_photo_w=bg_width*group_photo_ratio*(user_camera_num/total_camera_num);
in step 402, the user's camera picture is scaled uniformly and cropped where it exceeds user_group_photo_w; when the person in the picture appears too large or too small in the background area, the real-time cloud group photo platform can dynamically adjust the person's scale so that the figures stay coordinated; when the user's camera picture exceeds the slot width user_group_photo_w, the cropping position is recalculated, where crop_x and crop_y are the x and y coordinates of the upper-left corner of the cropped camera picture and crop_width and crop_height its width and height, and the user's camera picture is expressed as:
image=resize(image,scale)
image=crop(image,crop_x,crop_y,crop_width,crop_height);
step 403, the starting point coordinates locate_x and locate_y of each camera picture in the background area are calculated from the total width of the participating users' camera pictures, keeping the fused overall picture centred to obtain the placement of the user picture; the starting point coordinates locate_x and locate_y are expressed as:
locate_x=total_camera_width-user_group_photo_w
locate_y=bg_height-distance_from_bottom-height;
step 404, merging the original image and the background together by weighting image pixels:
bg_b=src_b*cls+(1-cls)*bg_b
bg_g=src_g*cls+(1-cls)*bg_g
bg_r=src_r*cls+(1-cls)*bg_r
where src is the original image, cls is the probability map produced by the AI segmentation algorithm, and bg is the background image.
The invention also discloses a system for remote multi-person real-time cloud group photo, which is based on portrait segmentation and image fusion technology and in which image data is encoded and pushed to a streaming media server using the RTMP protocol; the system for remote multi-person real-time cloud group photo comprises the following modules:
the system comprises a connection establishing module, a real-time cloud group photo platform, a video processing module and a group photo processing module, wherein the connection establishing module establishes group photo tasks when receiving group photo requests submitted by users, and the group photo tasks establish an independent room number to avoid conflict among a plurality of group photo tasks at the same time, wherein a server of the real-time cloud group photo platform transmits user ids, video push stream addresses and stream addresses after group photo participation in each room;
the system comprises an image acquisition and uploading module, wherein a client of a user acquires real-time image data through a video data acquisition device, the client encodes the real-time image data by adopting an H264 or H265 encoding mode and then transmits the encoded video data by adopting a streaming media protocol, and the acquired video data is encoded and processed according to an assigned room number and a rear end and then is pushed to a streaming media server by adopting an RTMP protocol according to an appointed streaming address after being communicated with the rear end;
the image real-time segmentation module is used for pulling and decoding a real-time video stream pushed by a client into normal video data based on ffmpeg, then invoking a portrait segmentation algorithm to perform real-time matting, wherein a plurality of video streams pushed by the client exist in the same room, and after mask data of an image are acquired, multi-path portrait fusion is performed;
the image real-time fusion module is used for calculating the initial area position of the photo in the background according to the priority of each user, the number of people participating in the photo and the scaling parameter of the people, which are set by each user, and then fusing the buckled portrait data on the appointed background according to the ranking strategy;
and the image issuing module acquires user information, room information and a fused reasoning address of the participation photo issued by the rear end of the real-time cloud photo platform, and aims at fused image data, namely the real-time photo, adopts the scheme which is the same as that of the client to carry out video coding compression, then pushes the video coding compression to a streaming media server according to a designated streaming address by using an RTMP protocol, and the client pulls the video streaming address, so that the fused photo image can be seen in real time.
Furthermore, the client is a Windows PC device, and the video capture device is a USB camera that collects the real-time image data.
Still further, the real-time matting further includes: to improve the matting effect, the portrait matting algorithm is designed on a MobileNet-v3 or ResNet-50 deep learning backbone and deployed with the TensorRT inference engine.
Still further, the image real-time segmentation module further includes: the client first informs the back end of the real-time cloud group photo platform of the relevant user information; the back end then tells the server how many users participate in the group photo and how many camera feeds each contributes; the group photo region of each user in the background is calculated, and person regions exceeding the boundary of the background group photo region are automatically cropped according to an adjustable scaling coefficient; if the video capture devices of the users are judged to have different camera environments, the corresponding coefficients can be updated manually through a background management platform so that the picture looks coordinated and natural, reducing abruptness; the arrangement logic for the multiple video streams places them from the middle outward to the two sides, with the left side preferred over the right, so that the block remains centred whether the number of video streams is odd or even.
Furthermore, Kalman tracking is applied to the images acquired in real time to reduce jitter of the portrait box and thereby alleviate picture shake caused by the person moving back and forth: the minimum rectangular box segmented in the current frame is compared with the tracked box of the previous frame by IoU, and the horizontal widths of the two boxes are also compared; when neither change exceeds a certain coefficient, the previous frame's box is reused as the current frame's portrait box, which removes a certain amount of picture jitter.
Compared with the prior art, the invention has the following beneficial effects. Relying on AI portrait segmentation, background fusion, and streaming media technology, a user only needs a camera to capture personal full-body portrait video, push it to the server, and pull the newly fused video stream to see everyone's same-frame group photo in real time. In addition, to ensure the effect, the users' standing positions can be adjusted dynamically, the background of the group photo can be switched in real time, front and back rows can be set, the portrait scaling can be configured, and the background can even be replaced with a dynamic video, with the front end updating dynamically accordingly. In this way, people in different places can take a group photo conveniently and quickly, and can even "check in" at scenic spots in the cloud. An audio function can also be added, enabling face-to-face voice communication and mutual guidance on standing positions during a remote group photo. The scheme is not only quick and convenient but also efficient and interactive.
Drawings
The invention will be further understood from the following description taken in conjunction with the accompanying drawings. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the embodiments. In the figures, like reference numerals designate corresponding parts throughout the different views.
Fig. 1 is an overall flowchart of a remote multi-person real-time cloud group photo method of the present invention.
FIG. 2 is a flow chart of model transformation and deployment according to an embodiment of the present invention.
Fig. 3 is a schematic flow chart of multi-person group photo in the system of the remote multi-person real-time cloud group photo of the invention.
Detailed Description
As shown in figs. 1-3, the present invention relies on portrait segmentation and image fusion technology: image data is encoded and pushed to a streaming media server using the RTMP protocol; the real-time video stream of each client is pulled from the streaming media server and decoded, portrait segmentation is performed, and image fusion processing then yields the group photo image data, which is encoded and pushed so that the clients can pull the corresponding video stream.
For each group photo task, a separate room number is created to avoid conflicts between multiple simultaneous group photo tasks. The participating user ids, video push stream addresses, and post-fusion stream address for each room are issued by the server, which guarantees uniqueness.
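As an illustration, the room-assignment step might look like the following minimal sketch; the function name, the URL layout, and the media-server host are hypothetical, since the patent does not specify them:

```python
import uuid

BASE_URL = "rtmp://media.example.com/live"  # placeholder media server

def create_group_photo_room(user_ids):
    """Create a unique room and issue per-user push addresses plus the
    shared pull address of the fused group-photo stream."""
    room_id = uuid.uuid4().hex  # unique room number avoids task conflicts
    return {
        "room_id": room_id,
        # each user pushes their own camera stream to a distinct address
        "push_urls": {uid: f"{BASE_URL}/{room_id}/{uid}" for uid in user_ids},
        # every participant pulls the single fused stream
        "pull_url": f"{BASE_URL}/{room_id}/fused",
    }
```

Deriving every address from the random room id is what gives each task the uniqueness the text requires.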
The whole scheme flow chart is shown in fig. 1 in detail, and the specific method and flow are as follows:
1. video data acquisition and transmission
Because video image data is large, transmitting raw frames consumes considerable bandwidth, so the video data is generally encoded in a mode such as H264 or H265 and then transmitted over a streaming media protocol. The client is typically a Windows PC device that uses a USB camera to collect real-time image data. Built on ffmpeg's powerful audio and video processing capability, this module encodes the collected video data according to the room number and stream address assigned after communicating with the back end, and pushes it to the streaming media server using the RTMP protocol.
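A minimal sketch of such a push pipeline, assembled as an ffmpeg command line, could look as follows; the capture device and RTMP URL are placeholders, and on a Windows client the capture input would use `-f dshow` rather than the Linux `v4l2` shown here:

```python
def build_push_command(rtmp_url, device="/dev/video0"):
    """Build an ffmpeg command that captures a webcam, encodes it with
    H.264 (libx265 would give H.265 analogously), and pushes it to an
    RTMP server. Returns the argv list for subprocess.Popen."""
    return [
        "ffmpeg",
        "-f", "v4l2", "-i", device,   # capture from the USB camera
        "-c:v", "libx264",            # H.264 encoding
        "-preset", "veryfast",
        "-tune", "zerolatency",       # keep the end-to-end delay low
        "-f", "flv", rtmp_url,        # RTMP carries an FLV container
    ]
```

Running it is then a single `subprocess.Popen(build_push_command(url))` call; the pull direction (decoding the fused stream) is symmetric, with the RTMP URL as the input.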
2. Data streaming and portrait segmentation
The real-time video stream pushed by each client is pulled and decoded into normal video data based on ffmpeg, and the portrait segmentation algorithm is then invoked for real-time matting.
To improve the matting effect, the matting algorithm is designed with deep learning on backbones such as MobileNet-v3 and ResNet-50. MobileNet-v3 is a few milliseconds faster than ResNet-50 with little loss of accuracy, while ResNet-50 has a slightly better visual effect but a much larger model. The algorithm is deployed with the TensorRT inference engine and achieves good real-time performance and matting quality on graphics cards such as the NVIDIA P40, RTX 3060, and RTX 2080 Ti; the specific model conversion and deployment are shown in fig. 2.
Multiple client-pushed video streams exist in the same room; after the mask data of the images are acquired, multi-way portrait fusion is required.
3. Portrait fusion scheme
The portrait data matted out in part 2 must be fused onto a designated background. How can all the users participating in the group photo be guaranteed to see one another in the same background picture, with a good overall effect?
The approximate region of each portrait in the background is calculated from parameters such as the priority set by each user, the number of people participating in the group photo, and the person scaling. The general calculation and arrangement logic is described as follows:
the method comprises the steps that firstly, the client side informs the back end of relevant user information, the back end sends the back end to the server to explain that a plurality of users participate in the photo, the photo number of each user is what, then the photo area of each user in the background is calculated, and according to the adjustable scaling coefficient, the person area exceeding the boundary of the background photo area is automatically cut. According to different camera environments, corresponding coefficients can be manually updated through a background management platform, so that a relatively coordinated and natural picture is achieved, and the abrupt sense is reduced.
For the group photo arrangement of the multiple video streams, a relatively simple algorithm is adopted: streams are placed from the middle outward to the two sides, with the left side preferred over the right, and the block is kept centred whether the number of video streams is odd or even.
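One reading of this "middle outward, left before right" rule can be sketched as follows; the function name and the exact centre-slot choice for an even stream count are assumptions, since the patent does not pin them down:

```python
def arrange_center_out(streams):
    """Map streams (ordered by priority) onto left-to-right slots so the
    first stream takes the centre slot and later streams alternate
    outward, filling the left slot before the right at each distance."""
    n = len(streams)
    mid = (n - 1) // 2            # centre slot (left-of-centre for even n)
    order = [mid]                 # slot-filling order: centre first
    for d in range(1, n):
        if mid - d >= 0:
            order.append(mid - d)  # left side preferred ...
        if mid + d < n:
            order.append(mid + d)  # ... then the right side
    slots = [None] * n
    for stream, slot in zip(streams, order):
        slots[slot] = stream
    return slots
```

For three streams in priority order A, B, C this yields the on-screen order B, A, C: A in the middle, B to its left, C to its right.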
To address picture shake caused by the matted person moving back and forth, the product additionally applies Kalman tracking to reduce jitter of the portrait box. The minimum rectangular box segmented in the current frame is compared with the tracked box of the previous frame by IoU, and the horizontal widths of the two boxes are also compared; when neither change exceeds a certain coefficient, the previous frame's box is reused as the current frame's portrait box, which removes a certain amount of picture jitter.
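The reuse-the-previous-box rule can be sketched as below; the concrete thresholds are illustrative assumptions, since the text only speaks of "a certain coefficient" (a full Kalman filter would additionally smooth the box coordinates over time):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def stabilise_box(prev_box, cur_box, iou_thresh=0.8, width_ratio_thresh=0.1):
    """Reuse the previous frame's box when the new box has barely moved:
    a high IoU and a small relative width change mean the 'motion' is
    more likely segmentation noise than a real position change."""
    if prev_box is None:
        return cur_box
    width_prev = prev_box[2] - prev_box[0]
    width_cur = cur_box[2] - cur_box[0]
    width_change = abs(width_cur - width_prev) / float(width_prev)
    if iou(prev_box, cur_box) >= iou_thresh and width_change <= width_ratio_thresh:
        return prev_box  # keep the old box, so the picture stops jittering
    return cur_box
```

A box that shifts by only a couple of pixels is snapped back to the previous frame's box, while a genuinely moved person updates normally.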
The general calculation and arrangement processing is as follows:
1. Calculate the width of the user picture in the background area.
The width user_group_photo_w occupied by the user picture in the background area is the width bg_width of the background image designated by the back end, multiplied by a default scaling factor group_photo_ratio and by the ratio of the user's camera count user_camera_num to the total camera count total_camera_num of all participants. The slot widths are accumulated into the running total total_camera_width:
user_group_photo_w=bg_width*group_photo_ratio*(user_camera_num/total_camera_num)
total_camera_width+=user_group_photo_w
2. The user's camera picture is scaled uniformly and cropped where it exceeds user_group_photo_w. When the person in the camera picture appears too large or too small in the background area, the management end can dynamically adjust the person scaling scale so that the figures stay coordinated. When the user's camera picture exceeds the slot width user_group_photo_w, the cropping position is recalculated, where crop_x and crop_y are the x and y coordinates of the upper-left corner of the cropped camera picture and crop_width and crop_height its width and height.
image=resize(image,scale)
image=crop(image,crop_x,crop_y,crop_width,crop_height)
3. Placement of the user picture. The starting point coordinates locate_x and locate_y of each camera picture in the background area are derived from the total width of the participating users' camera pictures, keeping the fused overall picture as centred as possible.
locate_x=total_camera_width-user_group_photo_w
locate_y=bg_height-distance_from_bottom-height
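Steps 1-3 can be combined into one layout routine. The function below follows the formulas above; the final centring offset is an assumption, since the text only says the fused picture should stay as centred as possible:

```python
def layout_slots(bg_width, bg_height, group_photo_ratio,
                 camera_nums, heights, distance_from_bottom):
    """Compute each user's slot width and top-left anchor.
    camera_nums[i] is user i's camera count, heights[i] the scaled
    portrait height in pixels."""
    total_camera_num = sum(camera_nums)
    slots = []
    total_camera_width = 0.0
    for user_camera_num, height in zip(camera_nums, heights):
        user_group_photo_w = (bg_width * group_photo_ratio
                              * (user_camera_num / total_camera_num))
        total_camera_width += user_group_photo_w
        locate_x = total_camera_width - user_group_photo_w  # slot's left edge
        locate_y = bg_height - distance_from_bottom - height
        slots.append({"w": user_group_photo_w, "x": locate_x, "y": locate_y})
    offset = (bg_width - total_camera_width) / 2.0  # centre the whole block
    for slot in slots:
        slot["x"] += offset
    return slots
```

For a 1000-pixel-wide background, ratio 0.8, and two users with one camera each, every slot is 400 pixels wide and the 800-pixel block is shifted 100 pixels right so it sits centred.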
4. Background fusion. The original image and the background are combined by weighting image pixels, where src is the original image, cls is the probability map produced by the AI segmentation algorithm, and bg is the background image.
bg_b=src_b*cls+(1-cls)*bg_b
bg_g=src_g*cls+(1-cls)*bg_g
bg_r=src_r*cls+(1-cls)*bg_r
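The three per-channel equations amount to an alpha blend with the segmentation probability map as the weight. A NumPy sketch (the tiny 2x2 images are made-up test data, not from the patent):

```python
import numpy as np

def fuse(src: np.ndarray, bg: np.ndarray, cls: np.ndarray) -> np.ndarray:
    """Per-pixel weighted blend: out = src*cls + (1-cls)*bg, per channel."""
    cls = cls[..., None]                  # (H, W) -> (H, W, 1), broadcasts over B/G/R
    out = src.astype(np.float32) * cls + (1.0 - cls) * bg.astype(np.float32)
    return out.astype(np.uint8)

src = np.full((2, 2, 3), 200, np.uint8)           # matted foreground person
bg = np.full((2, 2, 3), 100, np.uint8)            # background image
cls = np.array([[1.0, 0.0], [0.5, 0.25]])         # segmentation probability map
blended = fuse(src, bg, cls)
```

A probability of 1.0 keeps the person pixel, 0.0 keeps the background, and fractional values produce the soft edges that make the matting look natural.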
5. Group photo data output
Since the user information, room information and fused inference address of the group photo participants are issued by the back end, the fused image data, namely the real-time group photo, must be video-encoded and compressed with the same scheme as the client, then pushed to the streaming media server over the RTMP protocol according to the designated stream address. The client pulls the video stream address and can see the fused group photo image in real time.
When a user moves back and forth or left and right, the corresponding figure in the group photo also changes in real time.
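The encode-and-push step can be sketched as assembling an ffmpeg command line; the libx264 codec choice, the input file name and the rtmp:// URL are assumptions for illustration, not details from the patent (RTMP delivery carries the stream in an FLV container, hence `-f flv`):

```python
# Hypothetical helper assembling the RTMP push command described above.
def build_push_command(input_url: str, stream_url: str,
                       codec: str = "libx264") -> list[str]:
    return [
        "ffmpeg",
        "-re",                 # read input at its native frame rate
        "-i", input_url,       # fused group-photo frames
        "-c:v", codec,         # re-encode with the same scheme as the client
        "-f", "flv",           # RTMP payloads are carried in FLV
        stream_url,            # designated stream address issued by the back end
    ]

cmd = build_push_command("fused_output.mp4", "rtmp://media.example/live/room42")
```

In the real system this command (or an equivalent libavformat call) would run per room, one push per group photo task.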
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article or apparatus that comprises it.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
While the invention has been described above with reference to various embodiments, it should be understood that many changes and modifications can be made without departing from its scope. The foregoing detailed description is therefore to be regarded as illustrative rather than limiting, and it is the following claims, including all equivalents, that define the spirit and scope of the invention. Various changes and modifications may be made by one skilled in the art after reading the teachings herein, and such equivalent changes and modifications likewise fall within the scope defined by the appended claims.

Claims (10)

1. A method for remote multi-person real-time cloud group photo based on portrait segmentation and image fusion technology, in which a streaming media server decodes the real-time video stream of each client, performs image segmentation and then image fusion processing to obtain the image data of a group photo, encodes and streams the data, and the corresponding clients pull the video, characterized by comprising the following steps:
step 1, when receiving a group photo request submitted by a user, a real-time cloud group photo platform establishes a group photo task, wherein the group photo task creates a unique room number to avoid conflicts among multiple simultaneous group photo tasks, and a server of the real-time cloud group photo platform issues the user IDs of the group photo participants, the video inference address and the stream address for each room;
step 2, a client of a user collects real-time image data through a video data collection device; after collecting the real-time image data, the client encodes it in H264 or H265 and sends the encoded video data over a streaming media protocol; after communicating with the back end according to the assigned room number, the client pushes the encoded video data to a streaming media server over the RTMP protocol according to the designated stream address;
step 3, pulling the real-time video stream pushed by the client and decoding it into normal video data based on ffmpeg, then calling a portrait segmentation algorithm to perform real-time matting, wherein multiple video streams pushed by clients exist in the same room, and after the mask data of the images are acquired, multi-path portrait fusion is performed;
step 4, calculating the initial area position of each portrait in the background according to the priority set by each user, the number of people participating in the group photo and the person scaling parameter, and then fusing the portrait data matted out in step 3 onto the designated background according to a ranking strategy, wherein the calculation and ranking processing comprises:
step 401, calculating the width of the user picture in the background area, wherein the width user_group_photo_w of the user picture in the background area is obtained by multiplying the width bg_width of the background image designated by the back end by a default proportionality coefficient group_photo_ratio, and by the ratio of the current user camera count user_camera_num to the total camera count total_camera_num participating in the group photo, the calculation formula of user_group_photo_w being:
user_group_photo_w=bg_width*group_photo_ratio*(user_camera_num/
total_camera_num);
step 402, scaling the user's camera picture uniformly and cropping the portion exceeding user_group_photo_w, wherein when the person in the picture appears too large or too small in the background area, the real-time cloud group photo platform dynamically adjusts the person scaling factor to keep the figures coordinated, and when the user's camera picture exceeds the allotted width user_group_photo_w, the cropping position is recalculated, crop_x representing the x coordinate of the upper-left corner of the cropped camera picture, crop_y representing the y coordinate of the upper-left corner, crop_width representing the width of the cropped picture and crop_height representing its height, the user's camera picture being expressed as:
image=resize(image,scale)
image=crop(image,crop_x,crop_y,crop_width,crop_height);
step 403, calculating the starting-point coordinates locate_x and locate_y of each camera picture in the background area from the total width of the cameras of the users participating in the group photo, keeping the fused overall picture centered, so as to obtain the placement position of the user picture, the starting-point coordinates locate_x and locate_y being expressed as:
locate_x=total_camera_width-user_group_photo_w
locate_y=bg_height-distance_from_bottom-height;
step 404, combining the original image with the background by means of image pixel weighting,
bg_b=src_b*cls+(1-cls)*bg_b
bg_g=src_g*cls+(1-cls)*bg_g
bg_r=src_r*cls+(1-cls)*bg_r
wherein src is an original image, cls is an AI algorithm for segmentation to obtain a probability map, bg is a background image;
and step 5, since the user information, room information and fused inference address of the group photo participants issued by the back end of the real-time cloud group photo platform have been acquired, video-encoding and compressing the fused image data, namely the real-time group photo, with the same scheme as the client, and pushing it to a streaming media server over the RTMP protocol according to the designated stream address, the client pulling the video stream address so that the fused group photo image can be seen in real time.
2. The method for remote multi-person real-time cloud group photo as claimed in claim 1, wherein the client is a Windows PC device, and the video data acquisition device is a USB camera for acquiring real-time image data.
3. The method for remote multi-person real-time cloud group photo as claimed in claim 1, wherein the real-time matting further comprises: to improve the matting effect, designing a portrait matting algorithm based on a MobileNet-v3 or ResNet50 deep learning backbone, and deploying the matting algorithm with a TensorRT-based inference engine.
4. The method for remote multi-person real-time cloud group photo as claimed in claim 1, wherein step 4 further comprises: the client first reports the relevant user information to the back end of the real-time cloud group photo platform, the back end then informs the server of the number of users in the group photo, the group photo area of each user in the background is calculated, and person regions exceeding the boundary of the background group photo area are automatically cropped according to an adjustable scaling coefficient; if the video data acquisition devices of the users are judged to have different camera environments, the corresponding coefficients are updated manually through the background management platform so as to achieve a coordinated, natural picture and reduce abruptness; the arrangement logic of the multiple video streams places pictures from the middle outward to both sides, the left side taking priority over the right, so that the arrangement remains centered whether the number of video paths is odd or even.
5. The method of claim 1, wherein Kalman tracking is applied to the images collected in real time to reduce jitter of the portrait frame and thereby alleviate picture shake caused by a person walking back and forth; the minimal rectangular frame segmented from the current frame is compared with the tracking frame of the previous frame by IoU, and the lateral width ratio of the two frames is also compared; when neither exceeds a certain coefficient, the frame of the previous frame is used as the portrait frame of the current frame, reducing jitter in the picture.
6. A system for remote multi-person real-time cloud group photo based on portrait segmentation and image fusion technology, characterized by comprising the following modules:
a connection establishing module, wherein when receiving a group photo request submitted by a user, a real-time cloud group photo platform establishes a group photo task, the group photo task creates a unique room number to avoid conflicts among multiple simultaneous group photo tasks, and a server of the real-time cloud group photo platform issues the user IDs of the group photo participants, the video inference address and the stream address for each room;
an image acquisition and uploading module, wherein a client of a user acquires real-time image data through a video data acquisition device, encodes the real-time image data in H264 or H265, sends the encoded video data over a streaming media protocol, and, after communicating with the back end according to the assigned room number, pushes the encoded video data to a streaming media server over the RTMP protocol according to the designated stream address;
an image real-time segmentation module, which pulls the real-time video stream pushed by the client and decodes it into normal video data based on ffmpeg, then calls a portrait segmentation algorithm to perform real-time matting, wherein multiple video streams pushed by clients exist in the same room, and after the mask data of the images are acquired, multi-path portrait fusion is performed;
an image real-time fusion module, which calculates the initial area position of each portrait in the background according to the priority set by each user, the number of people participating in the group photo and the person scaling parameter, and then fuses the matted portrait data onto the designated background according to a ranking strategy, wherein the calculation and ranking processing comprises:
step 401, calculating the width of the user picture in the background area, wherein the width user_group_photo_w of the user picture in the background area is obtained by multiplying the width bg_width of the background image designated by the back end by a default proportionality coefficient group_photo_ratio, and by the ratio of the current user camera count user_camera_num to the total camera count total_camera_num participating in the group photo, the calculation formula of user_group_photo_w being:
user_group_photo_w=bg_width*group_photo_ratio*(user_camera_num/
total_camera_num);
step 402, scaling the user's camera picture uniformly and cropping the portion exceeding user_group_photo_w, wherein when the person in the picture appears too large or too small in the background area, the real-time cloud group photo platform dynamically adjusts the person scaling factor to keep the figures coordinated, and when the user's camera picture exceeds the allotted width user_group_photo_w, the cropping position is recalculated, crop_x representing the x coordinate of the upper-left corner of the cropped camera picture, crop_y representing the y coordinate of the upper-left corner, crop_width representing the width of the cropped picture and crop_height representing its height, the user's camera picture being expressed as:
image=resize(image,scale)
image=crop(image,crop_x,crop_y,crop_width,crop_height);
step 403, calculating the starting-point coordinates locate_x and locate_y of each camera picture in the background area from the total width of the cameras of the users participating in the group photo, keeping the fused overall picture centered, so as to obtain the placement position of the user picture, the starting-point coordinates locate_x and locate_y being expressed as:
locate_x=total_camera_width-user_group_photo_w
locate_y=bg_height-distance_from_bottom-height;
step 404, combining the original image with the background by means of image pixel weighting,
bg_b=src_b*cls+(1-cls)*bg_b
bg_g=src_g*cls+(1-cls)*bg_g
bg_r=src_r*cls+(1-cls)*bg_r
wherein src is an original image, cls is an AI algorithm for segmentation to obtain a probability map, bg is a background image;
and an image issuing module, which acquires the user information, room information and fused inference address of the group photo participants issued by the back end of the real-time cloud group photo platform, video-encodes and compresses the fused image data, namely the real-time group photo, with the same scheme as the client, and pushes it to a streaming media server over the RTMP protocol according to the designated stream address, the client pulling the video stream address so that the fused group photo image can be seen in real time.
7. The system of claim 6, wherein the client is a Windows PC device and the video data acquisition device is a USB camera for acquiring real-time image data.
8. The system for remote multi-person real-time cloud group photo as claimed in claim 6, wherein the real-time matting further comprises: to improve the matting effect, designing a portrait matting algorithm based on a MobileNet-v3 or ResNet50 deep learning backbone, and deploying the matting algorithm with a TensorRT-based inference engine.
9. The system of claim 6, wherein the image real-time segmentation module further comprises: the client first reports the relevant user information to the back end of the real-time cloud group photo platform, the back end then informs the server of the number of users in the group photo, the group photo area of each user in the background is calculated, and person regions exceeding the boundary of the background group photo area are automatically cropped according to an adjustable scaling coefficient; if the video data acquisition devices of the users are judged to have different camera environments, the corresponding coefficients are updated manually through the background management platform so as to achieve a coordinated, natural picture and reduce abruptness; the arrangement logic of the multiple video streams places pictures from the middle outward to both sides, the left side taking priority over the right, so that the arrangement remains centered whether the number of video paths is odd or even.
10. The system for remote multi-person real-time cloud group photo as claimed in claim 9, wherein Kalman tracking is applied to the images collected in real time to reduce jitter of the portrait frame and thereby alleviate picture shake caused by a person walking back and forth; the minimal rectangular frame segmented from the current frame is compared with the tracking frame of the previous frame by IoU, and the lateral width ratio of the two frames is also compared; when neither exceeds a certain coefficient, the frame of the previous frame is used as the portrait frame of the current frame, reducing jitter in the picture.
CN202211716741.0A 2022-12-29 2022-12-29 Method and system for remote multi-person real-time cloud group photo Active CN116600147B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211716741.0A CN116600147B (en) 2022-12-29 2022-12-29 Method and system for remote multi-person real-time cloud group photo


Publications (2)

Publication Number Publication Date
CN116600147A CN116600147A (en) 2023-08-15
CN116600147B true CN116600147B (en) 2024-03-29

Family

ID=87588613

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211716741.0A Active CN116600147B (en) 2022-12-29 2022-12-29 Method and system for remote multi-person real-time cloud group photo

Country Status (1)

Country Link
CN (1) CN116600147B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102625129A (en) * 2012-03-31 2012-08-01 福州一点通广告装饰有限公司 Method for realizing remote reality three-dimensional virtual imitated scene interaction
CN107404617A (en) * 2017-07-21 2017-11-28 努比亚技术有限公司 A kind of image pickup method and terminal, computer-readable storage medium
CN109118558A (en) * 2017-06-23 2019-01-01 沈瑜越 The method taken a group photo to multiple group photo objects in different time points or place
CN112601044A (en) * 2020-12-08 2021-04-02 深圳市焦点数字科技有限公司 Conference scene picture self-adaption method
CN112954221A (en) * 2021-03-11 2021-06-11 深圳市几何数字技术服务有限公司 Method for real-time photo shooting
CN115209111A (en) * 2022-07-26 2022-10-18 北京新方通信技术有限公司 Home and office video monitoring method and system supporting real-time background replacement
CN115423728A (en) * 2021-05-13 2022-12-02 海信集团控股股份有限公司 Image processing method, device and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113141346B (en) * 2021-03-16 2023-04-28 青岛小鸟看看科技有限公司 VR one-to-multiple system and method based on series flow


Also Published As

Publication number Publication date
CN116600147A (en) 2023-08-15

Similar Documents

Publication Publication Date Title
US9692964B2 (en) Modification of post-viewing parameters for digital images using image region or feature information
JP4057241B2 (en) Improved imaging system with virtual camera
US7911513B2 (en) Simulating short depth of field to maximize privacy in videotelephony
US9129381B2 (en) Modification of post-viewing parameters for digital images using image region or feature information
CN111402399B (en) Face driving and live broadcasting method and device, electronic equipment and storage medium
GB2440376A (en) Wide angle video conference imaging
CN113099245B (en) Panoramic video live broadcast method, system and computer readable storage medium
US11076127B1 (en) System and method for automatically framing conversations in a meeting or a video conference
CN103873453B (en) Immerse communication customer end, server and the method for obtaining content view
CN104169842B (en) For controlling method, the method for operating video clip, face orientation detector and the videoconference server of video clip
CN109547724B (en) Video stream data processing method, electronic equipment and storage device
JP2005117163A (en) Camera server apparatus, control method thereof, computer program and computer-readable storage medium
CN113391644A (en) Unmanned aerial vehicle shooting distance semi-automatic optimization method based on image information entropy
JP2003111041A (en) Image processor, image processing system, image processing method, storage medium and program
CN116600147B (en) Method and system for remote multi-person real-time cloud group photo
JP2004266670A (en) Image pickup device and method, image information providing system and program
WO2021200184A1 (en) Information processing device, information processing method, and program
JP2005142765A (en) Apparatus and method for imaging
CN112887653B (en) Information processing method and information processing device
CN113784084A (en) Processing method and device
WO2021184326A1 (en) Control method and apparatus for electronic apparatus, and device and system
CN115997379A (en) Restoration of image FOV for stereoscopic rendering
CN108989327A (en) A kind of virtual reality server system
CN109862419B (en) Intelligent digital laser television interaction method and system
JP2003125389A (en) Apparatus and method for delivering image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant