CN116600147B - Method and system for remote multi-person real-time cloud group photo - Google Patents
- Publication number
- CN116600147B (granted); application CN202211716741.0A
- Authority
- CN
- China
- Prior art keywords
- image
- photo
- user
- real
- group
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/21—Server components or server architectures
- H04N21/218—Source of audio or video content, e.g. local disk arrays
- H04N21/21805—Source of audio or video content, e.g. local disk arrays enabling multiple viewpoints, e.g. using a plurality of cameras
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
- H04N21/23418—Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
- H04N21/23424—Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
- H04N21/44016—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
- H04N5/272—Means for inserting a foreground image in a background image, i.e. inlay, outlay
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a method for remote multi-person real-time cloud group photos based on portrait segmentation and image fusion. Image data is encoded and pushed to a streaming media server using the RTMP protocol; the real-time video stream of each client is pulled from the streaming media server and decoded, portrait segmentation is performed, and image fusion then produces the group-photo image data, which is encoded, pushed, and correspondingly pulled by the clients. Relying on AI portrait segmentation, background fusion, and streaming media technology, a user only needs a camera to capture full-body portrait video, push it to the server, and pull the newly fused video stream to see everyone composited into the same frame in real time. In addition, to ensure a good result, users' standing positions can be adjusted dynamically, the group-photo background can be switched in real time, front and back standing positions can be set, the scaling of each figure can be set, and the background can even be replaced with a dynamic video, with the front end updating accordingly.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a method and a system for remote multi-person real-time cloud group photo.
Background
Electronic technology has developed rapidly. Conventional group photography requires the participants to be physically present: they line up in an arranged order, are photographed under the guidance of a photographer, and the pictures are retouched afterwards. Anyone who cannot attend in person misses the group photo, unless they are added later through manual image editing. This mode is inflexible and cannot meet users' personalized needs. In the current environment in particular, many activities are held online and cannot be attended in person, so the demand for cloud group photos is growing.
Cloud group-photo schemes already exist on the market, but they basically require each person to take a static selfie and upload it to a server for stitching and fusion, as in the cloud graduation ceremonies held by schools during the epidemic. These schemes share a common pattern: the user selects a specific position, takes a selfie, uploads it, and the server performs scaling and background fusion. The user cannot see the compositing process, only the final result. If the result is unsatisfactory, the user can only find a better angle and expression, reshoot, and upload again. Each person photographs alone, there is no communication throughout the process and no guidance from a photographer, and the lively atmosphere of shooting a collective group photo is missing.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems in the prior art. The invention discloses a method for remote multi-person real-time cloud group photos based on portrait segmentation and image fusion: image data is encoded and pushed to a streaming media server using the RTMP protocol; the real-time video stream of each client is pulled from the streaming media server and decoded, portrait segmentation is performed, and image fusion then produces the group-photo image data, which is encoded, pushed, and correspondingly pulled by the clients. The method comprises the following steps:
step 1, upon receiving a group-photo request submitted by a user, the real-time cloud group-photo platform creates a group-photo task, assigning it a unique room number to avoid conflicts between concurrent group-photo tasks; the server of the platform issues, for each room, the participating user ids, the video push-stream addresses, and the stream address of the fused group photo;
step 2, the user's client collects real-time image data through a video data collection device, encodes it in H264 or H265, and sends the encoded video data over a streaming media protocol; after communicating with the back end using the assigned room number, the client pushes the encoded video data to the streaming media server at the designated stream address using the RTMP protocol;
step 3, the real-time video stream pushed by each client is pulled and decoded into normal video data based on ffmpeg, and a portrait segmentation algorithm is then invoked for real-time matting; multiple client streams exist in the same room, and once the mask data of each image is acquired, the multiple portraits are fused;
step 4, the initial area position of each portrait in the background is calculated from the priority, the number of participants, and the figure-scaling parameter set by each user, and the portrait data matted in step 3 is then fused onto the designated background according to an arrangement strategy;
and step 5, since the participating user information, room information, and the address of the fused stream issued by the back end of the platform have been acquired, the fused image data, i.e. the real-time group photo, is video-encoded and compressed with the same scheme as the client, pushed to the streaming media server at the designated stream address using the RTMP protocol, and pulled by the clients, which can thus see the fused group-photo image in real time.
Furthermore, the client is a Windows PC, and the video data collection device is a USB camera that collects real-time image data.
Still further, the real-time matting further includes: to improve the matting effect, the portrait matting algorithm is designed on a MobileNet-v3 or ResNet-50 deep-learning backbone and deployed with the TensorRT inference engine.
Still further, step 4 further includes: the client first reports the relevant user information to the back end of the real-time cloud group-photo platform, which tells the server how many users participate in the group photo; the group-photo area of each user in the background is then calculated, and any portrait region that exceeds the boundary of its background area is automatically cropped according to an adjustable scaling coefficient. If the video data collection devices of the users are judged to have different camera environments, the corresponding coefficients can be updated manually through the background management platform to achieve a coordinated, natural picture and reduce abruptness. The arrangement logic for the multiple video streams places them from the middle outward, with the left side taking priority over the right, and keeps the group centered whether the number of streams is odd or even.
Furthermore, Kalman tracking is applied to the images collected in real time to reduce jitter of the portrait bounding box and thus the picture jitter caused by a figure moving back and forth: the minimal bounding rectangle segmented in the current frame is compared with the tracked box of the previous frame by IoU and by horizontal width ratio, and when neither change exceeds a set coefficient, the previous frame's box is reused as the current portrait box, which suppresses a certain amount of jitter.
Further, the calculating and ranking processing manner in the step 4 includes the following steps:
step 401, calculating the width of the user's picture in the background area: the width user_group_photo_w occupied by the user's picture is obtained by multiplying the width bg_width of the background image designated by the back end by a default scaling factor group_photo_ratio and by the ratio of the user's camera count user_camera_num to the total camera count total_camera_num participating in the group photo; the calculation formula of user_group_photo_w is:
user_group_photo_w=bg_width*group_photo_ratio*(user_camera_num/total_camera_num);
in step 402, the user's camera picture is scaled uniformly and cropped where it exceeds user_group_photo_w. When the figure in the camera picture is too large or too small relative to the background area, the real-time cloud group-photo platform can dynamically adjust the figure scale to keep the figures coordinated. When the user's camera picture exceeds the allotted width user_group_photo_w, the cropping position is recalculated: crop_x and crop_y denote the x and y coordinates of the upper-left corner of the cropped picture, and crop_width and crop_height denote its width and height. The user's camera picture is transformed as:
image=resize(image,scale)
image=crop(image,crop_x,crop_y,crop_width,crop_height);
step 403, calculating the starting-point coordinates locate_x and locate_y of each camera picture in the background area from the total width of the participating users' cameras, keeping the fused overall picture centered to obtain the placement position of each user's picture; the starting-point coordinates locate_x and locate_y are expressed as:
locate_x=total_camera_width-user_group_photo_w
locate_y=bg_height-distance_from_bottom-height;
step 404, merging the original image and the background by weighted blending of image pixels:
bg_b=src_b*cls+(1-cls)*bg_b
bg_g=src_g*cls+(1-cls)*bg_g
bg_r=src_r*cls+(1-cls)*bg_r
Wherein src is the original image, cls is the probability map obtained from the AI segmentation algorithm, and bg is the background image.
The invention also discloses a system for remote multi-person real-time cloud group photos based on portrait segmentation and image fusion, in which image data is encoded and pushed to a streaming media server using the RTMP protocol. The system comprises the following modules:
the system comprises a connection establishing module, a real-time cloud group photo platform, a video processing module and a group photo processing module, wherein the connection establishing module establishes group photo tasks when receiving group photo requests submitted by users, and the group photo tasks establish an independent room number to avoid conflict among a plurality of group photo tasks at the same time, wherein a server of the real-time cloud group photo platform transmits user ids, video push stream addresses and stream addresses after group photo participation in each room;
the system comprises an image acquisition and uploading module, wherein a client of a user acquires real-time image data through a video data acquisition device, the client encodes the real-time image data by adopting an H264 or H265 encoding mode and then transmits the encoded video data by adopting a streaming media protocol, and the acquired video data is encoded and processed according to an assigned room number and a rear end and then is pushed to a streaming media server by adopting an RTMP protocol according to an appointed streaming address after being communicated with the rear end;
the image real-time segmentation module is used for pulling and decoding a real-time video stream pushed by a client into normal video data based on ffmpeg, then invoking a portrait segmentation algorithm to perform real-time matting, wherein a plurality of video streams pushed by the client exist in the same room, and after mask data of an image are acquired, multi-path portrait fusion is performed;
the image real-time fusion module is used for calculating the initial area position of the photo in the background according to the priority of each user, the number of people participating in the photo and the scaling parameter of the people, which are set by each user, and then fusing the buckled portrait data on the appointed background according to the ranking strategy;
and the image issuing module acquires user information, room information and a fused reasoning address of the participation photo issued by the rear end of the real-time cloud photo platform, and aims at fused image data, namely the real-time photo, adopts the scheme which is the same as that of the client to carry out video coding compression, then pushes the video coding compression to a streaming media server according to a designated streaming address by using an RTMP protocol, and the client pulls the video streaming address, so that the fused photo image can be seen in real time.
Furthermore, the client is a Windows PC, and the video data collection device is a USB camera that collects real-time image data.
Still further, the real-time matting further includes: to improve the matting effect, the portrait matting algorithm is designed on a MobileNet-v3 or ResNet-50 deep-learning backbone and deployed with the TensorRT inference engine.
Still further, the real-time image segmentation module further includes: the client first reports the relevant user information to the back end of the real-time cloud group-photo platform, which tells the server how many users participate in the group photo; the group-photo area of each user in the background is then calculated, and any portrait region that exceeds the boundary of its background area is automatically cropped according to an adjustable scaling coefficient. If the video data collection devices of the users are judged to have different camera environments, the corresponding coefficients can be updated manually through the background management platform to achieve a coordinated, natural picture and reduce abruptness. The arrangement logic for the multiple video streams places them from the middle outward, with the left side taking priority over the right, and keeps the group centered whether the number of streams is odd or even.
Furthermore, Kalman tracking is applied to the images collected in real time to reduce jitter of the portrait bounding box and thus the picture jitter caused by a figure moving back and forth: the minimal bounding rectangle segmented in the current frame is compared with the tracked box of the previous frame by IoU and by horizontal width ratio, and when neither change exceeds a set coefficient, the previous frame's box is reused as the current portrait box, which suppresses a certain amount of jitter.
Compared with the prior art, the invention has the following beneficial effects: relying on AI portrait segmentation, background fusion, and streaming media technology, a user only needs a camera to capture full-body portrait video, push it to the server, and pull the newly fused video stream to see everyone composited into the same frame in real time. In addition, to ensure a good result, users' standing positions can be adjusted dynamically, the group-photo background can be switched in real time, front and back standing positions can be set, the scaling of each figure can be set, and the background can even be replaced with a dynamic video, with the front end updating accordingly. In this way, people in different places can take a group photo quickly and conveniently, and can even check in at scenic spots from the cloud. An audio function can also be added, allowing face-to-face voice communication and mutual guidance on standing positions while shooting the remote group photo. The scheme is fast, convenient, efficient, and interactive.
Drawings
The invention will be further understood from the following description taken in conjunction with the accompanying drawings. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the embodiments. In the figures, like reference numerals designate corresponding parts throughout the different views.
Fig. 1 is an overall flowchart of a remote multi-person real-time cloud group photo method of the present invention.
FIG. 2 is a flow chart of model conversion and deployment according to an embodiment of the present invention.
Fig. 3 is a schematic flow chart of multi-person group photo in the system of the remote multi-person real-time cloud group photo of the invention.
Detailed Description
As shown in figs. 1-3, the present invention relies on portrait segmentation and image fusion: image data is encoded and pushed to a streaming media server using the RTMP protocol, the real-time video stream of each client is pulled from the streaming media server and decoded, portrait segmentation is performed, and image fusion then produces the group-photo image data, which the clients that pushed their encoded streams correspondingly pull.
For each group-photo task a separate room number is created to avoid conflicts between concurrent group-photo tasks. The user ids of the participants in each room, the video push-stream addresses, and the stream address of the fused group photo are issued by the server, which guarantees uniqueness.
The whole scheme flow chart is shown in fig. 1 in detail, and the specific method and flow are as follows:
1. Video data acquisition and transmission
Because raw video frames are large, transmitting them frame by frame consumes considerable bandwidth, so video data is generally encoded in H264, H265, or a similar format and then transmitted over a streaming media protocol. The client is typically a Windows PC with a USB camera collecting real-time image data. Built on the powerful audio and video processing capability of ffmpeg, this module encodes the collected video data and, after communicating with the back end using the assigned room number and designated stream address, pushes it to the streaming media server over the RTMP protocol.
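As an illustration, the push step described above can be sketched with ffmpeg's command-line encoder. This is a minimal sketch under assumptions: the camera device name, room id, and server host below are hypothetical placeholders, not values from this document.

```python
# Hypothetical sketch of the client-side push: capture from a USB camera on a
# Windows PC, encode with H.264 (libx264), and push to the streaming server
# over RTMP. Device name, room id, and host are illustrative placeholders.
def build_push_command(room_id, server_host):
    # Stream address of the form the back end might assign for a room.
    rtmp_url = "rtmp://%s/live/%s" % (server_host, room_id)
    return [
        "ffmpeg",
        "-f", "dshow",                 # DirectShow capture (Windows)
        "-i", "video=USB Camera",      # placeholder camera device name
        "-c:v", "libx264",             # H.264 encoding
        "-preset", "ultrafast",        # favor low latency over compression
        "-tune", "zerolatency",
        "-f", "flv",                   # RTMP carries an FLV-muxed payload
        rtmp_url,
    ]

cmd = build_push_command("room42", "stream.example.com")
```

Running such a command (e.g. via `subprocess.run(cmd)`) would start pushing; the fused group-photo stream is later pulled from the separate address the back end issues for the room.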
2. Data streaming and portrait segmentation
The real-time video stream pushed by each client is pulled and decoded into normal video data based on ffmpeg, and a portrait segmentation algorithm is then invoked for real-time matting.
To improve the matting effect, the portrait matting algorithm is built with a deep-learning approach on backbones such as MobileNet-v3 and ResNet-50. MobileNet-v3 is a few milliseconds faster than ResNet-50 with little loss of accuracy; ResNet-50 gives a slightly better visual effect but is a much larger model. The algorithm is deployed with the TensorRT inference engine and achieves good real-time performance and matting quality on graphics cards such as the NVIDIA P40, RTX 3060, and RTX 2080 Ti. The model conversion and deployment flow is shown in fig. 2.
Multiple clients push video streams into the same room, and once the mask data of each image is acquired, the multiple portrait streams must be fused.
3. Portrait fusion scheme
The portrait data matted in section 2 must be fused onto the designated background. How can every participant be guaranteed to see the others in the same background picture, with a good visual result?
The approximate area position of each portrait in the background is calculated from parameters such as the priority set by each user, the number of participants, and the figure scaling. The general calculation and arrangement logic is as follows:
the method comprises the steps that firstly, the client side informs the back end of relevant user information, the back end sends the back end to the server to explain that a plurality of users participate in the photo, the photo number of each user is what, then the photo area of each user in the background is calculated, and according to the adjustable scaling coefficient, the person area exceeding the boundary of the background photo area is automatically cut. According to different camera environments, corresponding coefficients can be manually updated through a background management platform, so that a relatively coordinated and natural picture is achieved, and the abrupt sense is reduced.
For the arrangement logic of the multiple video streams, a relatively simple algorithm is adopted: streams are placed from the middle outward, with the left side taking priority over the right, and the group is kept centered whether the number of streams is odd or even.
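One possible reading of this middle-out, left-first placement can be sketched as follows. This is an illustrative interpretation, not the patent's exact algorithm; the tie-breaking for even stream counts is an assumption.

```python
def arrangement_order(n):
    """Slot indices (0..n-1, left to right) in the order the n streams are
    placed: middle slot first, then outward, left before right at each step.
    For even n the left-of-center slot is treated as the middle (assumption)."""
    mid = (n - 1) // 2
    order = [mid]
    for offset in range(1, n):
        left, right = mid - offset, mid + offset
        if left >= 0 and len(order) < n:
            order.append(left)
        if right < n and len(order) < n:
            order.append(right)
    return order
```

For five streams this yields placement order `[2, 1, 3, 0, 4]`: the first stream takes the center slot, the second goes just left of it, the third just right, and so on outward.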
To suppress the picture jitter caused by a matted figure moving back and forth, Kalman tracking is added to reduce jitter of the portrait bounding box. The minimal bounding rectangle segmented in the current frame is compared with the tracked box of the previous frame by IoU and by horizontal width ratio; when neither change exceeds a set coefficient, the previous frame's box is reused as the current portrait box, which suppresses a certain amount of jitter.
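The IoU-and-width check described here (shown without the Kalman prediction step) can be sketched as follows; the threshold values are illustrative assumptions, not values stated in this document.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def stabilized_box(prev_box, cur_box, iou_thresh=0.85, width_ratio_thresh=0.1):
    """Reuse the previous frame's portrait box when the new box has barely
    moved, suppressing frame-to-frame jitter. Thresholds are illustrative."""
    if prev_box is None:
        return cur_box
    w_prev = prev_box[2] - prev_box[0]
    w_cur = cur_box[2] - cur_box[0]
    width_change = abs(w_cur - w_prev) / w_prev
    if iou(prev_box, cur_box) >= iou_thresh and width_change <= width_ratio_thresh:
        return prev_box          # small change: keep the old box, picture stays still
    return cur_box               # large change: accept the new box
```

A small translation of the box between frames keeps the previous box, while a genuine move (low IoU or a large width change) updates it.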
The general calculation and arrangement processing mode is as follows:
1. Calculate the width of the user's picture in the background area.
The width user_group_photo_w occupied by the user's picture in the background area is obtained by multiplying the width bg_width of the background image designated by the back end by a default proportionality coefficient group_photo_ratio and by the ratio of the user's camera count user_camera_num to the total camera count total_camera_num participating in the group photo.
user_group_photo_w=bg_width*group_photo_ratio*(user_camera_num/total_camera_num)
total_camera_width+=user_group_photo_w
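The width formula above can be expressed directly; the background width, ratio, and camera counts below are hypothetical example values, not parameters from this document.

```python
def user_group_photo_width(bg_width, group_photo_ratio, user_camera_num, total_camera_num):
    """Width (in background pixels) allotted to one user's picture, per
    user_group_photo_w = bg_width * group_photo_ratio * (user_num / total_num)."""
    return bg_width * group_photo_ratio * (user_camera_num / total_camera_num)

# Illustrative example: a 1920-px-wide background, default ratio 0.8,
# and four users each contributing one camera.
widths = [user_group_photo_width(1920, 0.8, 1, 4) for _ in range(4)]
total_camera_width = sum(widths)   # accumulated as in the formula above
```

Each of the four users is allotted 384 px, and the accumulated total_camera_width (1536 px) is what the placement step below distributes across the background.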
2. The user's camera picture is scaled uniformly and cropped where it exceeds user_group_photo_w. When the figure in the camera picture is too large or too small relative to the background area, the management end can dynamically adjust the figure scale to keep the figures coordinated. When the user's camera picture exceeds the allotted width user_group_photo_w, the cropping position is recalculated: crop_x and crop_y denote the x and y coordinates of the upper-left corner of the cropped picture, and crop_width and crop_height denote its width and height.
image=resize(image,scale)
image=crop(image,crop_x,crop_y,crop_width,crop_height)
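The `resize`/`crop` pair above can be sketched on NumPy arrays. This is a stand-in, not the production code: a real pipeline would likely use something like `cv2.resize`, and the nearest-neighbour resize here only illustrates the uniform-scale semantics; the frame size and crop values are illustrative.

```python
import numpy as np

def resize_nn(image, scale):
    """Nearest-neighbour resize by a uniform scale factor
    (illustrative stand-in for a real resize)."""
    h, w = image.shape[:2]
    ys = (np.arange(int(h * scale)) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(int(w * scale)) / scale).astype(int).clip(0, w - 1)
    return image[ys][:, xs]

def crop(image, crop_x, crop_y, crop_width, crop_height):
    """Crop with the (crop_x, crop_y) upper-left-corner semantics above."""
    return image[crop_y:crop_y + crop_height, crop_x:crop_x + crop_width]

frame = np.zeros((480, 640, 3), dtype=np.uint8)
frame = resize_nn(frame, 0.5)          # -> shape (240, 320, 3)
frame = crop(frame, 10, 20, 300, 200)  # -> shape (200, 300, 3)
```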
3. Placement of the user picture. The starting coordinates locate_x and locate_y of each camera picture in the background area are derived from the accumulated total width of the participating users' cameras, keeping the fused overall picture as centered as possible.
locate_x=total_camera_width-user_group_photo_w
locate_y=bg_height-distance_from_bottom-height
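The placement formulas above can be written as a small helper; the names follow the formulas, while the sample values (two accumulated 384-px slots on a 1080-px-high background, a 40-px bottom margin, a 200-px person height) are illustrative assumptions:

```python
def placement(total_camera_width, user_group_photo_w,
              bg_height, distance_from_bottom, height):
    """Upper-left placement point of one user's slot, per the formulas above:
    x grows left-to-right as slot widths accumulate; y is anchored a fixed
    distance above the bottom edge of the background."""
    locate_x = total_camera_width - user_group_photo_w
    locate_y = bg_height - distance_from_bottom - height
    return locate_x, locate_y

# e.g. after accumulating two 384-px slots on a 1080-px-high background:
x, y = placement(768, 384, 1080, 40, 200)  # -> (384, 840)
```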
4. Background fusion. Here src is the original image, cls is the probability map produced by the AI segmentation algorithm, and bg is the background image; the original image and the background are combined by per-pixel weighting:
bg_b=src_b*cls+(1-cls)*bg_b
bg_g=src_g*cls+(1-cls)*bg_g
bg_r=src_r*cls+(1-cls)*bg_r
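The three per-channel equations above are a standard alpha blend with the segmentation probability map as the alpha. A minimal NumPy sketch, with illustrative 2×2 test images:

```python
import numpy as np

def fuse(src, cls, bg):
    """Per-pixel weighted blend of source over background, where cls is the
    segmentation probability map in [0, 1] (one value per pixel)."""
    cls = cls[..., None]  # add channel axis so it broadcasts over B, G, R
    out = src.astype(np.float32) * cls + (1.0 - cls) * bg.astype(np.float32)
    return out.astype(np.uint8)

src = np.full((2, 2, 3), 200, dtype=np.uint8)   # "person" pixels
bg = np.full((2, 2, 3), 100, dtype=np.uint8)    # background pixels
cls = np.array([[1.0, 0.0], [0.5, 0.5]])        # probability map
out = fuse(src, cls, bg)  # pixels: 200 (person), 100 (bg), 150 (blend)
```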
5. Photo data output
Because the user information of the participants, the room information, the fused inference address and so on issued by the back end have been acquired, the fused image data, namely the real-time group photo, is video-encoded and compressed with the same scheme as the client and then pushed to the streaming media server over the RTMP protocol at the designated stream address. The client pulls the video stream address and can see the fused group photo image in real time.
When a user moves back and forth or side to side, the figures in the group photo also change in real time.
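The final push step above (encode the fused frames, push over RTMP) is commonly done by piping raw frames into ffmpeg. The sketch below only constructs such a command; the stream address, frame size, and codec flags are illustrative assumptions and nothing is executed here:

```python
def rtmp_push_cmd(stream_url, size="1920x1080", codec="libx264"):
    """Build an ffmpeg command that reads raw BGR frames on stdin and
    pushes an H.264 FLV stream to the given RTMP address (a sketch of
    the server-side push described above)."""
    return [
        "ffmpeg",
        "-f", "rawvideo", "-pix_fmt", "bgr24",
        "-s", size, "-i", "-",          # fused frames piped on stdin
        "-c:v", codec,
        "-f", "flv", stream_url,
    ]

# Hypothetical stream address issued by the back end:
cmd = rtmp_push_cmd("rtmp://media.example.com/live/room_42")
```

In practice the command would be launched with `subprocess.Popen(cmd, stdin=subprocess.PIPE)` and each fused frame written to its stdin.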
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
While the invention has been described above with reference to various embodiments, it should be understood that many changes and modifications can be made without departing from its scope. The foregoing detailed description is therefore to be regarded as illustrative rather than limiting, and the above examples should be understood as illustrative only. Various equivalent changes and modifications made by those skilled in the art after reading the teachings herein are intended to fall within the scope of the invention as defined by the following claims, including all equivalents.
Claims (10)
1. A method for remote multi-person real-time cloud group photo based on portrait segmentation and image fusion technology, in which a streaming media server decodes the real-time video stream of each client, performs image segmentation and image fusion to obtain the image data of a group photo, encodes and pushes that data as a stream, and the corresponding client pulls the video, characterized by comprising the following steps:
step 1, when receiving a group photo request submitted by a user, a real-time cloud group photo platform establishes a group photo task, wherein the group photo task creates a unique room number to avoid conflicts between multiple simultaneous group photo tasks, and a server of the real-time cloud group photo platform transmits, for each room, the ids of the participating users, the video inference address and the stream address;
step 2, a client of a user collects real-time image data through a video data collection device; after collection, the client encodes the data with H264 or H265 and sends the encoded video data using a streaming media protocol; after communicating with the back end using the assigned room number, the client encodes the collected video data and pushes it to a streaming media server over the RTMP protocol at the designated stream address;
step 3, pulling the real-time video stream pushed by the client and decoding it into normal video data based on ffmpeg, then calling a portrait segmentation algorithm to perform real-time matting, wherein multiple video streams pushed by clients exist in the same room, and after the mask data of each image is acquired, multi-path portrait fusion is performed;
and step 4, calculating the initial area position of each person in the background according to each user's set priority, the number of people participating in the photo and the person scaling parameter, and then fusing the portrait data matted in step 3 onto the designated background according to an arrangement strategy, wherein the calculation and arrangement processing comprises the following steps:
step 401, calculating the width of the user picture in the background area, wherein the width user_group_photo_w occupied by the user picture in the background area is obtained by multiplying the width bg_width of the background image designated by the back end by a default proportionality coefficient group_photo_ratio and by the ratio of the current user's camera count user_camera_num to the total camera count total_camera_num participating in the photo; the calculation formula of user_group_photo_w is as follows:
user_group_photo_w=bg_width*group_photo_ratio*(user_camera_num/total_camera_num);
step 402, the user's camera image is uniformly scaled and cropped where it exceeds user_group_photo_w; when the person appears too large or too small in the background area, the real-time cloud group photo platform can dynamically adjust the person scaling factor to coordinate the figures; when the user's camera picture exceeds the allotted width user_group_photo_w, the cropping position is recalculated, wherein crop_x represents the x coordinate of the upper-left corner of the cropped camera picture, crop_y represents the y coordinate of the upper-left corner, crop_width represents the width of the cropped picture, and crop_height represents its height; the processing of the user's camera image is expressed as:
image=resize(image,scale)
image=crop(image,crop_x,crop_y,crop_width,crop_height);
step 403, calculating the starting coordinates locate_x and locate_y of each camera picture in the background area from the accumulated total width of the participating users' cameras, keeping the fused overall picture centered, to obtain the placement position of the user picture, wherein locate_x and locate_y are expressed as:
locate_x=total_camera_width-user_group_photo_w
locate_y=bg_height-distance_from_bottom-height;
step 404, combining the original image with the background by means of image pixel weighting,
bg_b=src_b*cls+(1-cls)*bg_b
bg_g=src_g*cls+(1-cls)*bg_g
bg_r=src_r*cls+(1-cls)*bg_r
wherein src is an original image, cls is an AI algorithm for segmentation to obtain a probability map, bg is a background image;
and step 5, because the user information of the participants, the room information and the fused inference address issued by the back end of the real-time cloud group photo platform have been acquired, the fused image data, namely the real-time group photo, is video-encoded and compressed with the same scheme as the client and pushed to a streaming media server over the RTMP protocol at the designated stream address; the client pulls the video stream address and can see the fused group photo image in real time.
2. The method for remote multi-person real-time cloud group photo as claimed in claim 1, wherein said client is a Windows PC device, and said video data acquisition device is a USB camera for acquiring real-time image data.
3. The method for remote multi-person real-time cloud group photo as claimed in claim 1, wherein said real-time matting further comprises: to improve the matting effect, a portrait matting algorithm is designed based on a MobileNet-v3 or ResNet50 deep learning model and is deployed using a TensorRT-based inference engine.
4. The method of remote multi-person real-time cloud group photo as claimed in claim 1, wherein said step 4 further comprises: the client first informs the back end of the real-time cloud group photo platform of the relevant user information; the back end then issues to the server the number of users in the group photo; the group photo area of each user in the background is then calculated, and any person area exceeding the boundary of the background group photo area is automatically cropped according to an adjustable scaling coefficient; if the video data acquisition devices of several users are judged to have different camera environments, the corresponding coefficient can be updated manually through the background management platform to achieve a relatively coordinated and natural picture and reduce abruptness; the arrangement logic of the multiple video streams is from the middle toward the two sides, with the left side filled in preference to the right, so that the arrangement is centered whether the number of video streams is odd or even.
5. The method of claim 1, wherein Kalman tracking is added to the image collected in real time to reduce jitter of the portrait frame and thereby alleviate the picture shake caused by a person walking back and forth; the minimum rectangular frame segmented in the current frame is compared with the tracking frame of the previous frame by IoU, together with the ratio of the transverse widths of the two frames, and when a set coefficient is not exceeded, the frame of the previous frame is used as the portrait frame of the current frame, reducing picture jitter to a certain extent.
6. A system for remote multi-person real-time cloud group photo based on portrait segmentation and image fusion technology, comprising the following modules:
the connection establishing module: when receiving a group photo request submitted by a user, the real-time cloud group photo platform establishes a group photo task, wherein the group photo task creates an independent room number to avoid conflicts between multiple simultaneous group photo tasks, and a server of the real-time cloud group photo platform transmits, for each room, the ids of the participating users, the video inference address and the stream address;
the image acquisition and uploading module: a client of a user acquires real-time image data through a video data acquisition device; the client encodes the real-time image data with H264 or H265 and sends the encoded video data using a streaming media protocol; after communicating with the back end using the assigned room number, the client encodes the collected video data and pushes it to a streaming media server over the RTMP protocol at the designated stream address;
the image real-time segmentation module: the real-time video stream pushed by the client is pulled and decoded into normal video data based on ffmpeg, and a portrait segmentation algorithm is then invoked to perform real-time matting, wherein multiple video streams pushed by clients exist in the same room, and after the mask data of each image is acquired, multi-path portrait fusion is performed;
the image real-time fusion module: the initial area position of each person in the background is calculated according to each user's set priority, the number of people participating in the photo and the person scaling parameter, and the matted portrait data is then fused onto the designated background according to an arrangement strategy, wherein the calculation and arrangement processing comprises the following steps:
step 401, calculating the width of the user picture in the background area, wherein the width user_group_photo_w occupied by the user picture in the background area is obtained by multiplying the width bg_width of the background image designated by the back end by a default proportionality coefficient group_photo_ratio and by the ratio of the current user's camera count user_camera_num to the total camera count total_camera_num participating in the photo; the calculation formula of user_group_photo_w is as follows:
user_group_photo_w=bg_width*group_photo_ratio*(user_camera_num/total_camera_num);
step 402, the user's camera image is uniformly scaled and cropped where it exceeds user_group_photo_w; when the person appears too large or too small in the background area, the real-time cloud group photo platform can dynamically adjust the person scaling factor to coordinate the figures; when the user's camera picture exceeds the allotted width user_group_photo_w, the cropping position is recalculated, wherein crop_x represents the x coordinate of the upper-left corner of the cropped camera picture, crop_y represents the y coordinate of the upper-left corner, crop_width represents the width of the cropped picture, and crop_height represents its height; the processing of the user's camera image is expressed as:
image=resize(image,scale)
image=crop(image,crop_x,crop_y,crop_width,crop_height);
step 403, calculating the starting coordinates locate_x and locate_y of each camera picture in the background area from the accumulated total width of the participating users' cameras, keeping the fused overall picture centered, to obtain the placement position of the user picture, wherein locate_x and locate_y are expressed as:
locate_x=total_camera_width-user_group_photo_w
locate_y=bg_height-distance_from_bottom-height;
step 404, combining the original image with the background by means of image pixel weighting,
bg_b=src_b*cls+(1-cls)*bg_b
bg_g=src_g*cls+(1-cls)*bg_g
bg_r=src_r*cls+(1-cls)*bg_r
wherein src is an original image, cls is an AI algorithm for segmentation to obtain a probability map, bg is a background image;
and the image issuing module: the user information of the participants, the room information and the fused inference address issued by the back end of the real-time cloud group photo platform are acquired; the fused image data, namely the real-time group photo, is video-encoded and compressed with the same scheme as the client and pushed to a streaming media server over the RTMP protocol at the designated stream address; the client pulls the video stream address and can see the fused group photo image in real time.
7. The system of claim 6, wherein the client is a Windows PC device and the video data acquisition device is a USB camera for acquiring real-time image data.
8. The system for remote multi-person real-time cloud group photo as recited in claim 6, wherein said real-time matting further comprises: to improve the matting effect, a portrait matting algorithm is designed based on a MobileNet-v3 or ResNet50 deep learning model and is deployed using a TensorRT-based inference engine.
9. The system of claim 6, wherein the image real-time segmentation module further comprises: the client first informs the back end of the real-time cloud group photo platform of the relevant user information; the back end then issues to the server the number of users in the group photo; the group photo area of each user in the background is then calculated, and any person area exceeding the boundary of the background group photo area is automatically cropped according to an adjustable scaling coefficient; if the video data acquisition devices of several users are judged to have different camera environments, the corresponding coefficient can be updated manually through the background management platform to achieve a relatively coordinated and natural picture and reduce abruptness; the arrangement logic of the multiple video streams is from the middle toward the two sides, with the left side filled in preference to the right, so that the arrangement is centered whether the number of video streams is odd or even.
10. The system for remote multi-person real-time cloud group photo as claimed in claim 9, wherein Kalman tracking is added to the image collected in real time to reduce jitter of the portrait frame and thereby alleviate the picture shake caused by a person walking back and forth; the minimum rectangular frame segmented in the current frame is compared with the tracking frame of the previous frame by IoU, together with the ratio of the transverse widths of the two frames, and when a set coefficient is not exceeded, the frame of the previous frame is used as the portrait frame of the current frame, reducing picture jitter to a certain extent.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211716741.0A CN116600147B (en) | 2022-12-29 | 2022-12-29 | Method and system for remote multi-person real-time cloud group photo |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116600147A CN116600147A (en) | 2023-08-15 |
CN116600147B true CN116600147B (en) | 2024-03-29 |
Family
ID=87588613
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211716741.0A Active CN116600147B (en) | 2022-12-29 | 2022-12-29 | Method and system for remote multi-person real-time cloud group photo |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116600147B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102625129A (en) * | 2012-03-31 | 2012-08-01 | 福州一点通广告装饰有限公司 | Method for realizing remote reality three-dimensional virtual imitated scene interaction |
CN107404617A (en) * | 2017-07-21 | 2017-11-28 | 努比亚技术有限公司 | A kind of image pickup method and terminal, computer-readable storage medium |
CN109118558A (en) * | 2017-06-23 | 2019-01-01 | 沈瑜越 | The method taken a group photo to multiple group photo objects in different time points or place |
CN112601044A (en) * | 2020-12-08 | 2021-04-02 | 深圳市焦点数字科技有限公司 | Conference scene picture self-adaption method |
CN112954221A (en) * | 2021-03-11 | 2021-06-11 | 深圳市几何数字技术服务有限公司 | Method for real-time photo shooting |
CN115209111A (en) * | 2022-07-26 | 2022-10-18 | 北京新方通信技术有限公司 | Home and office video monitoring method and system supporting real-time background replacement |
CN115423728A (en) * | 2021-05-13 | 2022-12-02 | 海信集团控股股份有限公司 | Image processing method, device and system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113141346B (en) * | 2021-03-16 | 2023-04-28 | 青岛小鸟看看科技有限公司 | VR one-to-multiple system and method based on series flow |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9692964B2 (en) | Modification of post-viewing parameters for digital images using image region or feature information | |
JP4057241B2 (en) | Improved imaging system with virtual camera | |
US7911513B2 (en) | Simulating short depth of field to maximize privacy in videotelephony | |
US9129381B2 (en) | Modification of post-viewing parameters for digital images using image region or feature information | |
CN111402399B (en) | Face driving and live broadcasting method and device, electronic equipment and storage medium | |
GB2440376A (en) | Wide angle video conference imaging | |
CN113099245B (en) | Panoramic video live broadcast method, system and computer readable storage medium | |
US11076127B1 (en) | System and method for automatically framing conversations in a meeting or a video conference | |
CN103873453B (en) | Immerse communication customer end, server and the method for obtaining content view | |
CN104169842B (en) | For controlling method, the method for operating video clip, face orientation detector and the videoconference server of video clip | |
CN109547724B (en) | Video stream data processing method, electronic equipment and storage device | |
JP2005117163A (en) | Camera server apparatus, control method thereof, computer program and computer-readable storage medium | |
CN113391644A (en) | Unmanned aerial vehicle shooting distance semi-automatic optimization method based on image information entropy | |
JP2003111041A (en) | Image processor, image processing system, image processing method, storage medium and program | |
CN116600147B (en) | Method and system for remote multi-person real-time cloud group photo | |
JP2004266670A (en) | Image pickup device and method, image information providing system and program | |
WO2021200184A1 (en) | Information processing device, information processing method, and program | |
JP2005142765A (en) | Apparatus and method for imaging | |
CN112887653B (en) | Information processing method and information processing device | |
CN113784084A (en) | Processing method and device | |
WO2021184326A1 (en) | Control method and apparatus for electronic apparatus, and device and system | |
CN115997379A (en) | Restoration of image FOV for stereoscopic rendering | |
CN108989327A (en) | A kind of virtual reality server system | |
CN109862419B (en) | Intelligent digital laser television interaction method and system | |
JP2003125389A (en) | Apparatus and method for delivering image |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||