CN116600147B - Method and system for remote multi-person real-time cloud group photo - Google Patents
- Publication number
- CN116600147B (granted); application CN202211716741.0A
- Authority
- CN
- China
- Prior art keywords
- image
- photo
- user
- real
- group
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/21—Server components or server architectures
- H04N21/218—Source of audio or video content, e.g. local disk arrays
- H04N21/21805—Source of audio or video content, e.g. local disk arrays enabling multiple viewpoints, e.g. using a plurality of cameras
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
- H04N21/23418—Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
- H04N21/23424—Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
- H04N21/44016—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
- H04N5/272—Means for inserting a foreground image in a background image, i.e. inlay, outlay
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a method for remote multi-person real-time cloud group photos based on portrait segmentation and image fusion. Image data is encoded and pushed to a streaming media server using the RTMP protocol; the real-time video stream of each client is pulled from the streaming media server and decoded, portrait segmentation is performed, and image fusion then produces the group-photo image data, which is encoded, pushed, and correspondingly pulled by the clients. Relying on AI portrait segmentation, background fusion, and streaming media technology, a user only needs a camera to capture full-body portrait video, push it to the server, and pull the newly fused video stream to see everyone composited into the same frame in real time. In addition, to ensure a good result, users' standing positions can be adjusted dynamically, the group-photo background can be switched in real time, front and back standing positions can be set, the scaling of each figure can be set, and the background can even be replaced with a dynamic video, with the front end updating accordingly.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a method and a system for remote multi-person real-time cloud group photo.
Background
Electronic technology has developed rapidly. Conventional group photography requires the participants to be physically present: they line up in an arranged order, are photographed under the guidance of a photographer, and the pictures are retouched afterwards. Anyone who cannot attend in person misses the group photo, unless they are added later through manual image editing. This mode is inflexible and cannot meet users' personalized needs. In the current environment in particular, many activities are held online and cannot be attended in person, so the demand for cloud group photos is growing.
Cloud group-photo schemes already exist on the market, but they basically require each person to take a static selfie and upload it to a server for stitching and fusion, as in the cloud graduation ceremonies held by schools during the epidemic. These schemes share a common pattern: the user selects a specific position, takes a selfie, uploads it, and the server performs scaling and background fusion. The user cannot see the compositing process, only the final result. If the result is unsatisfactory, the user can only find a better angle and expression, reshoot, and upload again. Each person photographs alone, there is no communication throughout the process and no guidance from a photographer, and the lively atmosphere of shooting a collective group photo is missing.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems in the prior art. The invention discloses a method for remote multi-person real-time cloud group photos based on portrait segmentation and image fusion: image data is encoded and pushed to a streaming media server using the RTMP protocol; the real-time video stream of each client is pulled from the streaming media server and decoded, portrait segmentation is performed, and image fusion then produces the group-photo image data, which is encoded, pushed, and correspondingly pulled by the clients. The method comprises the following steps:
step 1, upon receiving a group-photo request submitted by a user, the real-time cloud group-photo platform creates a group-photo task, assigning it a unique room number to avoid conflicts between concurrent group-photo tasks; the server of the platform issues, for each room, the participating user ids, the video push-stream addresses, and the stream address of the fused group photo;
step 2, the user's client collects real-time image data through a video data collection device, encodes it in H264 or H265, and sends the encoded video data over a streaming media protocol; after communicating with the back end using the assigned room number, the client pushes the encoded video data to the streaming media server at the designated stream address using the RTMP protocol;
step 3, the real-time video stream pushed by each client is pulled and decoded into normal video data based on ffmpeg, and a portrait segmentation algorithm is then invoked for real-time matting; multiple client streams exist in the same room, and once the mask data of each image is acquired, the multiple portraits are fused;
step 4, the initial area position of each portrait in the background is calculated from the priority, the number of participants, and the figure-scaling parameter set by each user, and the portrait data matted in step 3 is then fused onto the designated background according to an arrangement strategy;
and step 5, since the participating user information, room information, and the address of the fused stream issued by the back end of the platform have been acquired, the fused image data, i.e. the real-time group photo, is video-encoded and compressed with the same scheme as the client, pushed to the streaming media server at the designated stream address using the RTMP protocol, and pulled by the clients, which can thus see the fused group-photo image in real time.
Furthermore, the client is a Windows PC, and the video data collection device is a USB camera that collects real-time image data.
Still further, the real-time matting further includes: to improve the matting effect, the portrait matting algorithm is designed on a MobileNet-v3 or ResNet-50 deep-learning backbone and deployed with the TensorRT inference engine.
Still further, step 4 further includes: the client first reports the relevant user information to the back end of the real-time cloud group-photo platform, which tells the server how many users participate in the group photo; the group-photo area of each user in the background is then calculated, and any portrait region that exceeds the boundary of its background area is automatically cropped according to an adjustable scaling coefficient. If the video data collection devices of the users are judged to have different camera environments, the corresponding coefficients can be updated manually through the background management platform to achieve a coordinated, natural picture and reduce abruptness. The arrangement logic for the multiple video streams places them from the middle outward, with the left side taking priority over the right, and keeps the group centered whether the number of streams is odd or even.
Furthermore, Kalman tracking is applied to the images collected in real time to reduce jitter of the portrait bounding box and thus the picture jitter caused by a figure moving back and forth: the minimal bounding rectangle segmented in the current frame is compared with the tracked box of the previous frame by IoU and by horizontal width ratio, and when neither change exceeds a set coefficient, the previous frame's box is reused as the current portrait box, which suppresses a certain amount of jitter.
Further, the calculating and ranking processing manner in the step 4 includes the following steps:
step 401, calculating the width of the user's picture in the background area: the width user_group_photo_w occupied by the user's picture is obtained by multiplying the width bg_width of the background image designated by the back end by a default scaling factor group_photo_ratio and by the ratio of the user's camera count user_camera_num to the total camera count total_camera_num participating in the group photo; the calculation formula of user_group_photo_w is:
user_group_photo_w=bg_width*group_photo_ratio*(user_camera_num/total_camera_num);
in step 402, the user's camera picture is scaled uniformly and cropped where it exceeds user_group_photo_w. When the figure in the camera picture is too large or too small relative to the background area, the real-time cloud group-photo platform can dynamically adjust the figure scale to keep the figures coordinated. When the user's camera picture exceeds the allotted width user_group_photo_w, the cropping position is recalculated: crop_x and crop_y denote the x and y coordinates of the upper-left corner of the cropped picture, and crop_width and crop_height denote its width and height. The user's camera picture is transformed as:
image=resize(image,scale)
image=crop(image,crop_x,crop_y,crop_width,crop_height);
step 403, calculating the starting-point coordinates locate_x and locate_y of each camera picture in the background area from the total width of the participating users' cameras, keeping the fused overall picture centered to obtain the placement position of each user's picture; the starting-point coordinates locate_x and locate_y are expressed as:
locate_x=total_camera_width-user_group_photo_w
locate_y=bg_height-distance_from_bottom-height;
step 404, merging the original image and the background by weighted blending of image pixels:
bg_b=src_b*cls+(1-cls)*bg_b
bg_g=src_g*cls+(1-cls)*bg_g
bg_r=src_r*cls+(1-cls)*bg_r
Wherein src is the original image, cls is the probability map obtained from the AI segmentation algorithm, and bg is the background image.
The invention also discloses a system for remote multi-person real-time cloud group photos based on portrait segmentation and image fusion, in which image data is encoded and pushed to a streaming media server using the RTMP protocol. The system comprises the following modules:
the system comprises a connection establishing module, a real-time cloud group photo platform, a video processing module and a group photo processing module, wherein the connection establishing module establishes group photo tasks when receiving group photo requests submitted by users, and the group photo tasks establish an independent room number to avoid conflict among a plurality of group photo tasks at the same time, wherein a server of the real-time cloud group photo platform transmits user ids, video push stream addresses and stream addresses after group photo participation in each room;
the system comprises an image acquisition and uploading module, wherein a client of a user acquires real-time image data through a video data acquisition device, the client encodes the real-time image data by adopting an H264 or H265 encoding mode and then transmits the encoded video data by adopting a streaming media protocol, and the acquired video data is encoded and processed according to an assigned room number and a rear end and then is pushed to a streaming media server by adopting an RTMP protocol according to an appointed streaming address after being communicated with the rear end;
the image real-time segmentation module is used for pulling and decoding a real-time video stream pushed by a client into normal video data based on ffmpeg, then invoking a portrait segmentation algorithm to perform real-time matting, wherein a plurality of video streams pushed by the client exist in the same room, and after mask data of an image are acquired, multi-path portrait fusion is performed;
the image real-time fusion module is used for calculating the initial area position of the photo in the background according to the priority of each user, the number of people participating in the photo and the scaling parameter of the people, which are set by each user, and then fusing the buckled portrait data on the appointed background according to the ranking strategy;
and the image issuing module acquires user information, room information and a fused reasoning address of the participation photo issued by the rear end of the real-time cloud photo platform, and aims at fused image data, namely the real-time photo, adopts the scheme which is the same as that of the client to carry out video coding compression, then pushes the video coding compression to a streaming media server according to a designated streaming address by using an RTMP protocol, and the client pulls the video streaming address, so that the fused photo image can be seen in real time.
Furthermore, the client is a Windows PC, and the video data collection device is a USB camera that collects real-time image data.
Still further, the real-time matting further includes: to improve the matting effect, the portrait matting algorithm is designed on a MobileNet-v3 or ResNet-50 deep-learning backbone and deployed with the TensorRT inference engine.
Still further, the real-time image segmentation module further includes: the client first reports the relevant user information to the back end of the real-time cloud group-photo platform, which tells the server how many users participate in the group photo; the group-photo area of each user in the background is then calculated, and any portrait region that exceeds the boundary of its background area is automatically cropped according to an adjustable scaling coefficient. If the video data collection devices of the users are judged to have different camera environments, the corresponding coefficients can be updated manually through the background management platform to achieve a coordinated, natural picture and reduce abruptness. The arrangement logic for the multiple video streams places them from the middle outward, with the left side taking priority over the right, and keeps the group centered whether the number of streams is odd or even.
Furthermore, Kalman tracking is applied to the images collected in real time to reduce jitter of the portrait bounding box and thus the picture jitter caused by a figure moving back and forth: the minimal bounding rectangle segmented in the current frame is compared with the tracked box of the previous frame by IoU and by horizontal width ratio, and when neither change exceeds a set coefficient, the previous frame's box is reused as the current portrait box, which suppresses a certain amount of jitter.
Compared with the prior art, the invention has the following beneficial effects: relying on AI portrait segmentation, background fusion, and streaming media technology, a user only needs a camera to capture full-body portrait video, push it to the server, and pull the newly fused video stream to see everyone composited into the same frame in real time. In addition, to ensure a good result, users' standing positions can be adjusted dynamically, the group-photo background can be switched in real time, front and back standing positions can be set, the scaling of each figure can be set, and the background can even be replaced with a dynamic video, with the front end updating accordingly. In this way, people in different places can take a group photo quickly and conveniently, and can even check in at scenic spots from the cloud. An audio function can also be added, allowing face-to-face voice communication and mutual guidance on standing positions while shooting the remote group photo. The scheme is fast, convenient, efficient, and interactive.
Drawings
The invention will be further understood from the following description taken in conjunction with the accompanying drawings. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the embodiments. In the figures, like reference numerals designate corresponding parts throughout the different views.
Fig. 1 is an overall flowchart of a remote multi-person real-time cloud group photo method of the present invention.
FIG. 2 is a flow chart of model conversion and deployment according to an embodiment of the present invention.
Fig. 3 is a schematic flow chart of multi-person group photo in the system of the remote multi-person real-time cloud group photo of the invention.
Detailed Description
As shown in figs. 1-3, the present invention relies on portrait segmentation and image fusion: image data is encoded and pushed to a streaming media server using the RTMP protocol, the real-time video stream of each client is pulled from the streaming media server and decoded, portrait segmentation is performed, and image fusion then produces the group-photo image data, which the clients that pushed their encoded streams correspondingly pull.
For each group-photo task a separate room number is created to avoid conflicts between concurrent group-photo tasks. The user ids of the participants in each room, the video push-stream addresses, and the stream address of the fused group photo are issued by the server, which guarantees uniqueness.
The whole scheme flow chart is shown in fig. 1 in detail, and the specific method and flow are as follows:
1. Video data acquisition and transmission
Because raw video frames are large, transmitting them frame by frame consumes considerable bandwidth, so video data is generally encoded in H264, H265, or a similar format and then transmitted over a streaming media protocol. The client is typically a Windows PC with a USB camera collecting real-time image data. Built on the powerful audio and video processing capability of ffmpeg, this module encodes the collected video data and, after communicating with the back end using the assigned room number and designated stream address, pushes it to the streaming media server over the RTMP protocol.
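As an illustration, the push step described above can be sketched with ffmpeg's command-line encoder. This is a minimal sketch under assumptions: the camera device name, room id, and server host below are hypothetical placeholders, not values from this document.

```python
# Hypothetical sketch of the client-side push: capture from a USB camera on a
# Windows PC, encode with H.264 (libx264), and push to the streaming server
# over RTMP. Device name, room id, and host are illustrative placeholders.
def build_push_command(room_id, server_host):
    # Stream address of the form the back end might assign for a room.
    rtmp_url = "rtmp://%s/live/%s" % (server_host, room_id)
    return [
        "ffmpeg",
        "-f", "dshow",                 # DirectShow capture (Windows)
        "-i", "video=USB Camera",      # placeholder camera device name
        "-c:v", "libx264",             # H.264 encoding
        "-preset", "ultrafast",        # favor low latency over compression
        "-tune", "zerolatency",
        "-f", "flv",                   # RTMP carries an FLV-muxed payload
        rtmp_url,
    ]

cmd = build_push_command("room42", "stream.example.com")
```

Running such a command (e.g. via `subprocess.run(cmd)`) would start pushing; the fused group-photo stream is later pulled from the separate address the back end issues for the room.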
2. Data streaming and portrait segmentation
The real-time video stream pushed by each client is pulled and decoded into normal video data based on ffmpeg, and a portrait segmentation algorithm is then invoked for real-time matting.
To improve the matting effect, the portrait matting algorithm is built with a deep-learning approach on backbones such as MobileNet-v3 and ResNet-50. MobileNet-v3 is a few milliseconds faster than ResNet-50 with little loss of accuracy; ResNet-50 gives a slightly better visual effect but is a much larger model. The algorithm is deployed with the TensorRT inference engine and achieves good real-time performance and matting quality on graphics cards such as the NVIDIA P40, RTX 3060, and RTX 2080 Ti. The model conversion and deployment flow is shown in fig. 2.
Multiple clients push video streams into the same room, and once the mask data of each image is acquired, the multiple portrait streams must be fused.
3. Portrait fusion scheme
The portrait data matted in section 2 must be fused onto the designated background. How can every participant be guaranteed to see the others in the same background picture, with a good visual result?
The approximate area position of each portrait in the background is calculated from parameters such as the priority set by each user, the number of participants, and the figure scaling. The general calculation and arrangement logic is as follows:
the method comprises the steps that firstly, the client side informs the back end of relevant user information, the back end sends the back end to the server to explain that a plurality of users participate in the photo, the photo number of each user is what, then the photo area of each user in the background is calculated, and according to the adjustable scaling coefficient, the person area exceeding the boundary of the background photo area is automatically cut. According to different camera environments, corresponding coefficients can be manually updated through a background management platform, so that a relatively coordinated and natural picture is achieved, and the abrupt sense is reduced.
For the arrangement logic of the multiple video streams, a relatively simple algorithm is adopted: streams are placed from the middle outward, with the left side taking priority over the right, and the group is kept centered whether the number of streams is odd or even.
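One possible reading of this middle-out, left-first placement can be sketched as follows. This is an illustrative interpretation, not the patent's exact algorithm; the tie-breaking for even stream counts is an assumption.

```python
def arrangement_order(n):
    """Slot indices (0..n-1, left to right) in the order the n streams are
    placed: middle slot first, then outward, left before right at each step.
    For even n the left-of-center slot is treated as the middle (assumption)."""
    mid = (n - 1) // 2
    order = [mid]
    for offset in range(1, n):
        left, right = mid - offset, mid + offset
        if left >= 0 and len(order) < n:
            order.append(left)
        if right < n and len(order) < n:
            order.append(right)
    return order
```

For five streams this yields placement order `[2, 1, 3, 0, 4]`: the first stream takes the center slot, the second goes just left of it, the third just right, and so on outward.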
To suppress the picture jitter caused by a matted figure moving back and forth, Kalman tracking is added to reduce jitter of the portrait bounding box. The minimal bounding rectangle segmented in the current frame is compared with the tracked box of the previous frame by IoU and by horizontal width ratio; when neither change exceeds a set coefficient, the previous frame's box is reused as the current portrait box, which suppresses a certain amount of jitter.
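The IoU-and-width check described here (shown without the Kalman prediction step) can be sketched as follows; the threshold values are illustrative assumptions, not values stated in this document.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def stabilized_box(prev_box, cur_box, iou_thresh=0.85, width_ratio_thresh=0.1):
    """Reuse the previous frame's portrait box when the new box has barely
    moved, suppressing frame-to-frame jitter. Thresholds are illustrative."""
    if prev_box is None:
        return cur_box
    w_prev = prev_box[2] - prev_box[0]
    w_cur = cur_box[2] - cur_box[0]
    width_change = abs(w_cur - w_prev) / w_prev
    if iou(prev_box, cur_box) >= iou_thresh and width_change <= width_ratio_thresh:
        return prev_box          # small change: keep the old box, picture stays still
    return cur_box               # large change: accept the new box
```

A small translation of the box between frames keeps the previous box, while a genuine move (low IoU or a large width change) updates it.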
The general calculation and arrangement processing mode is as follows:
1. Calculate the width of the user's picture in the background area.
The width user_group_photo_w occupied by the user's picture in the background area is obtained by multiplying the width bg_width of the background image designated by the back end by a default proportionality coefficient group_photo_ratio and by the ratio of the user's camera count user_camera_num to the total camera count total_camera_num participating in the group photo.
user_group_photo_w=bg_width*group_photo_ratio*(user_camera_num/total_camera_num)
total_camera_width+=user_group_photo_w
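The width formula above can be expressed directly; the background width, ratio, and camera counts below are hypothetical example values, not parameters from this document.

```python
def user_group_photo_width(bg_width, group_photo_ratio, user_camera_num, total_camera_num):
    """Width (in background pixels) allotted to one user's picture, per
    user_group_photo_w = bg_width * group_photo_ratio * (user_num / total_num)."""
    return bg_width * group_photo_ratio * (user_camera_num / total_camera_num)

# Illustrative example: a 1920-px-wide background, default ratio 0.8,
# and four users each contributing one camera.
widths = [user_group_photo_width(1920, 0.8, 1, 4) for _ in range(4)]
total_camera_width = sum(widths)   # accumulated as in the formula above
```

Each of the four users is allotted 384 px, and the accumulated total_camera_width (1536 px) is what the placement step below distributes across the background.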
2. The user's camera picture is scaled uniformly and cropped where it exceeds user_group_photo_w. When the figure in the camera picture is too large or too small relative to the background area, the management end can dynamically adjust the figure scale to keep the figures coordinated. When the user's camera picture exceeds the allotted width user_group_photo_w, the cropping position is recalculated: crop_x and crop_y denote the x and y coordinates of the upper-left corner of the cropped picture, and crop_width and crop_height denote its width and height.
image=resize(image,scale)
image=crop(image,crop_x,crop_y,crop_width,crop_height)
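The `resize`/`crop` pair above can be sketched on NumPy arrays. This is a stand-in, not the production code: a real pipeline would likely use something like `cv2.resize`, and the nearest-neighbour resize here only illustrates the uniform-scale semantics; the frame size and crop values are illustrative.

```python
import numpy as np

def resize_nn(image, scale):
    """Nearest-neighbour resize by a uniform scale factor
    (illustrative stand-in for a real resize)."""
    h, w = image.shape[:2]
    ys = (np.arange(int(h * scale)) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(int(w * scale)) / scale).astype(int).clip(0, w - 1)
    return image[ys][:, xs]

def crop(image, crop_x, crop_y, crop_width, crop_height):
    """Crop with the (crop_x, crop_y) upper-left-corner semantics above."""
    return image[crop_y:crop_y + crop_height, crop_x:crop_x + crop_width]

frame = np.zeros((480, 640, 3), dtype=np.uint8)
frame = resize_nn(frame, 0.5)          # -> shape (240, 320, 3)
frame = crop(frame, 10, 20, 300, 200)  # -> shape (200, 300, 3)
```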
3. Placement of the user picture. The starting coordinates locate_x and locate_y of each camera picture in the background area are derived from the accumulated total width of the participating users' cameras, keeping the fused overall picture as centered as possible.
locate_x=total_camera_width-user_group_photo_w
locate_y=bg_height-distance_from_bottom-height
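The placement formulas above can be written as a small helper; the names follow the formulas, while the sample values (two accumulated 384-px slots on a 1080-px-high background, a 40-px bottom margin, a 200-px person height) are illustrative assumptions:

```python
def placement(total_camera_width, user_group_photo_w,
              bg_height, distance_from_bottom, height):
    """Upper-left placement point of one user's slot, per the formulas above:
    x grows left-to-right as slot widths accumulate; y is anchored a fixed
    distance above the bottom edge of the background."""
    locate_x = total_camera_width - user_group_photo_w
    locate_y = bg_height - distance_from_bottom - height
    return locate_x, locate_y

# e.g. after accumulating two 384-px slots on a 1080-px-high background:
x, y = placement(768, 384, 1080, 40, 200)  # -> (384, 840)
```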
4. Background fusion. Here src is the original image, cls is the probability map produced by the AI segmentation algorithm, and bg is the background image; the original image and the background are combined by per-pixel weighting:
bg_b=src_b*cls+(1-cls)*bg_b
bg_g=src_g*cls+(1-cls)*bg_g
bg_r=src_r*cls+(1-cls)*bg_r
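The three per-channel equations above are a standard alpha blend with the segmentation probability map as the alpha. A minimal NumPy sketch, with illustrative 2×2 test images:

```python
import numpy as np

def fuse(src, cls, bg):
    """Per-pixel weighted blend of source over background, where cls is the
    segmentation probability map in [0, 1] (one value per pixel)."""
    cls = cls[..., None]  # add channel axis so it broadcasts over B, G, R
    out = src.astype(np.float32) * cls + (1.0 - cls) * bg.astype(np.float32)
    return out.astype(np.uint8)

src = np.full((2, 2, 3), 200, dtype=np.uint8)   # "person" pixels
bg = np.full((2, 2, 3), 100, dtype=np.uint8)    # background pixels
cls = np.array([[1.0, 0.0], [0.5, 0.5]])        # probability map
out = fuse(src, cls, bg)  # pixels: 200 (person), 100 (bg), 150 (blend)
```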
5. Photo data output
Because the user information of the participants, the room information, the fused inference address and so on issued by the back end have been acquired, the fused image data, namely the real-time group photo, is video-encoded and compressed with the same scheme as the client and then pushed to the streaming media server over the RTMP protocol at the designated stream address. The client pulls the video stream address and can see the fused group photo image in real time.
When a user moves back and forth or side to side, the figures in the group photo also change in real time.
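The final push step above (encode the fused frames, push over RTMP) is commonly done by piping raw frames into ffmpeg. The sketch below only constructs such a command; the stream address, frame size, and codec flags are illustrative assumptions and nothing is executed here:

```python
def rtmp_push_cmd(stream_url, size="1920x1080", codec="libx264"):
    """Build an ffmpeg command that reads raw BGR frames on stdin and
    pushes an H.264 FLV stream to the given RTMP address (a sketch of
    the server-side push described above)."""
    return [
        "ffmpeg",
        "-f", "rawvideo", "-pix_fmt", "bgr24",
        "-s", size, "-i", "-",          # fused frames piped on stdin
        "-c:v", codec,
        "-f", "flv", stream_url,
    ]

# Hypothetical stream address issued by the back end:
cmd = rtmp_push_cmd("rtmp://media.example.com/live/room_42")
```

In practice the command would be launched with `subprocess.Popen(cmd, stdin=subprocess.PIPE)` and each fused frame written to its stdin.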
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
While the invention has been described above with reference to various embodiments, it should be understood that many changes and modifications can be made without departing from its scope. The foregoing detailed description is therefore to be regarded as illustrative rather than limiting, and the above examples should be understood as illustrative only. Various equivalent changes and modifications made by those skilled in the art after reading the teachings herein are intended to fall within the scope of the invention as defined by the following claims, including all equivalents.
Claims (10)
1. A method for remote multi-person real-time cloud group photo based on portrait segmentation and image fusion technology, in which a streaming media server decodes the real-time video stream of each client, performs image segmentation and image fusion to obtain the image data of a group photo, encodes and pushes that data as a stream, and the corresponding client pulls the video, characterized by comprising the following steps:
step 1, when receiving a group photo request submitted by a user, a real-time cloud group photo platform establishes a group photo task, wherein the group photo task creates a unique room number to avoid conflicts between multiple simultaneous group photo tasks, and a server of the real-time cloud group photo platform transmits, for each room, the ids of the participating users, the video inference address and the stream address;
step 2, a client of a user collects real-time image data through a video data collection device; after collection, the client encodes the data with H264 or H265 and sends the encoded video data using a streaming media protocol; after communicating with the back end using the assigned room number, the client encodes the collected video data and pushes it to a streaming media server over the RTMP protocol at the designated stream address;
step 3, pulling the real-time video stream pushed by the client and decoding it into normal video data based on ffmpeg, then calling a portrait segmentation algorithm to perform real-time matting, wherein multiple video streams pushed by clients exist in the same room, and after the mask data of each image is acquired, multi-path portrait fusion is performed;
and step 4, calculating the initial area position of each person in the background according to each user's set priority, the number of people participating in the photo and the person scaling parameter, and then fusing the portrait data matted in step 3 onto the designated background according to an arrangement strategy, wherein the calculation and arrangement processing comprises the following steps:
step 401, calculating the width of the user picture in the background area, wherein the width user_group_photo_w occupied by the user picture in the background area is obtained by multiplying the width bg_width of the background image designated by the back end by a default proportionality coefficient group_photo_ratio and by the ratio of the current user's camera count user_camera_num to the total camera count total_camera_num participating in the photo; the calculation formula of user_group_photo_w is as follows:
user_group_photo_w=bg_width*group_photo_ratio*(user_camera_num/total_camera_num);
step 402, the user's camera image is uniformly scaled and cropped where it exceeds user_group_photo_w; when the person appears too large or too small in the background area, the real-time cloud group photo platform can dynamically adjust the person scaling factor to coordinate the figures; when the user's camera picture exceeds the allotted width user_group_photo_w, the cropping position is recalculated, wherein crop_x represents the x coordinate of the upper-left corner of the cropped camera picture, crop_y represents the y coordinate of the upper-left corner, crop_width represents the width of the cropped picture, and crop_height represents its height; the processing of the user's camera image is expressed as:
image=resize(image,scale)
image=crop(image,crop_x,crop_y,crop_width,crop_height);
step 403, calculating the starting coordinates locate_x and locate_y of each camera picture in the background area from the accumulated total width of the participating users' cameras, keeping the fused overall picture centered, to obtain the placement position of the user picture, wherein locate_x and locate_y are expressed as:
locate_x=total_camera_width-user_group_photo_w
locate_y=bg_height-distance_from_bottom-height;
step 404, combining the original image with the background by means of image pixel weighting,
bg_b=src_b*cls+(1-cls)*bg_b
bg_g=src_g*cls+(1-cls)*bg_g
bg_r=src_r*cls+(1-cls)*bg_r
wherein src is an original image, cls is an AI algorithm for segmentation to obtain a probability map, bg is a background image;
and step 5, because the user information of the participants, the room information and the fused inference address issued by the back end of the real-time cloud group photo platform have been acquired, the fused image data, namely the real-time group photo, is video-encoded and compressed with the same scheme as the client and pushed to a streaming media server over the RTMP protocol at the designated stream address; the client pulls the video stream address and can see the fused group photo image in real time.
2. The method for remote multi-person real-time cloud group photo as claimed in claim 1, wherein said client is a Windows PC device, and said video data acquisition device is a USB camera for acquiring real-time image data.
3. The method for remote multi-person real-time cloud group photo as claimed in claim 1, wherein said real-time matting further comprises: to improve the matting effect, a portrait matting algorithm is designed based on a MobileNet-v3 or ResNet50 deep learning model and is deployed using a TensorRT-based inference engine.
4. The method of remote multi-person real-time cloud group photo as claimed in claim 1, wherein said step 4 further comprises: the client first informs the back end of the real-time cloud group photo platform of the relevant user information; the back end then issues to the server the number of users in the group photo; the group photo area of each user in the background is then calculated, and any person area exceeding the boundary of the background group photo area is automatically cropped according to an adjustable scaling coefficient; if the video data acquisition devices of several users are judged to have different camera environments, the corresponding coefficient can be updated manually through the background management platform to achieve a relatively coordinated and natural picture and reduce abruptness; the arrangement logic of the multiple video streams is from the middle toward the two sides, with the left side filled in preference to the right, so that the arrangement is centered whether the number of video streams is odd or even.
5. The method of claim 1, wherein Kalman tracking is added to the image collected in real time to reduce jitter of the portrait frame and thereby alleviate the picture shake caused by a person walking back and forth; the minimum rectangular frame segmented in the current frame is compared with the tracking frame of the previous frame by IoU, together with the ratio of the transverse widths of the two frames, and when a set coefficient is not exceeded, the frame of the previous frame is used as the portrait frame of the current frame, reducing picture jitter to a certain extent.
6. A system for remote multi-person real-time cloud group photo based on portrait segmentation and image fusion technology, comprising the following modules:
the connection establishing module: when receiving a group photo request submitted by a user, the real-time cloud group photo platform establishes a group photo task, wherein the group photo task creates an independent room number to avoid conflicts between multiple simultaneous group photo tasks, and a server of the real-time cloud group photo platform transmits, for each room, the ids of the participating users, the video inference address and the stream address;
the image acquisition and uploading module: a client of a user acquires real-time image data through a video data acquisition device; the client encodes the real-time image data with H264 or H265 and sends the encoded video data using a streaming media protocol; after communicating with the back end using the assigned room number, the client encodes the collected video data and pushes it to a streaming media server over the RTMP protocol at the designated stream address;
the image real-time segmentation module: the real-time video stream pushed by the client is pulled and decoded into normal video data based on ffmpeg, and a portrait segmentation algorithm is then invoked to perform real-time matting, wherein multiple video streams pushed by clients exist in the same room, and after the mask data of each image is acquired, multi-path portrait fusion is performed;
the image real-time fusion module: the initial area position of each person in the background is calculated according to each user's set priority, the number of people participating in the photo and the person scaling parameter, and the matted portrait data is then fused onto the designated background according to an arrangement strategy, wherein the calculation and arrangement processing comprises the following steps:
step 401, calculating the width of the user picture in the background area, wherein the width user_group_photo_w occupied by the user picture in the background area is obtained by multiplying the width bg_width of the background image designated by the back end by a default proportionality coefficient group_photo_ratio and by the ratio of the current user's camera count user_camera_num to the total camera count total_camera_num participating in the photo; the calculation formula of user_group_photo_w is as follows:
user_group_photo_w=bg_width*group_photo_ratio*(user_camera_num/total_camera_num);
step 402, the user's camera image is uniformly scaled and cropped where it exceeds user_group_photo_w; when the person appears too large or too small in the background area, the real-time cloud group photo platform can dynamically adjust the person scaling factor to coordinate the figures; when the user's camera picture exceeds the allotted width user_group_photo_w, the cropping position is recalculated, wherein crop_x represents the x coordinate of the upper-left corner of the cropped camera picture, crop_y represents the y coordinate of the upper-left corner, crop_width represents the width of the cropped picture, and crop_height represents its height; the processing of the user's camera image is expressed as:
image=resize(image,scale)
image=crop(image,crop_x,crop_y,crop_width,crop_height);
step 403, calculating the starting coordinates locate_x and locate_y of each camera picture in the background area from the accumulated total width of the participating users' cameras, keeping the fused overall picture centered, to obtain the placement position of the user picture, wherein locate_x and locate_y are expressed as:
locate_x=total_camera_width-user_group_photo_w
locate_y=bg_height-distance_from_bottom-height;
step 404, combining the original image with the background by means of image pixel weighting,
bg_b=src_b*cls+(1-cls)*bg_b
bg_g=src_g*cls+(1-cls)*bg_g
bg_r=src_r*cls+(1-cls)*bg_r
wherein src is an original image, cls is an AI algorithm for segmentation to obtain a probability map, bg is a background image;
and the image issuing module: the user information of the participants, the room information and the fused inference address issued by the back end of the real-time cloud group photo platform are acquired; the fused image data, namely the real-time group photo, is video-encoded and compressed with the same scheme as the client and pushed to a streaming media server over the RTMP protocol at the designated stream address; the client pulls the video stream address and can see the fused group photo image in real time.
7. The system of claim 6, wherein the client is a Windows PC device and the video data acquisition device is a USB camera for acquiring real-time image data.
8. The system for remote multi-person real-time cloud group photo as recited in claim 6, wherein said real-time matting further comprises: to improve the matting effect, a portrait matting algorithm is designed based on a MobileNet-v3 or ResNet50 deep learning model and is deployed using a TensorRT-based inference engine.
9. The system of claim 6, wherein the image real-time segmentation module further comprises: the client first informs the back end of the real-time cloud group photo platform of the relevant user information; the back end then issues to the server the number of users in the group photo; the group photo area of each user in the background is then calculated, and any person area exceeding the boundary of the background group photo area is automatically cropped according to an adjustable scaling coefficient; if the video data acquisition devices of several users are judged to have different camera environments, the corresponding coefficient can be updated manually through the background management platform to achieve a relatively coordinated and natural picture and reduce abruptness; the arrangement logic of the multiple video streams is from the middle toward the two sides, with the left side filled in preference to the right, so that the arrangement is centered whether the number of video streams is odd or even.
10. The system for remote multi-person real-time cloud group photo as claimed in claim 9, wherein Kalman tracking is added to the image collected in real time to reduce jitter of the portrait frame and thereby alleviate the picture shake caused by a person walking back and forth; the minimum rectangular frame segmented in the current frame is compared with the tracking frame of the previous frame by IoU, together with the ratio of the transverse widths of the two frames, and when a set coefficient is not exceeded, the frame of the previous frame is used as the portrait frame of the current frame, reducing picture jitter to a certain extent.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211716741.0A CN116600147B (en) | 2022-12-29 | 2022-12-29 | Method and system for remote multi-person real-time cloud group photo |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116600147A CN116600147A (en) | 2023-08-15 |
CN116600147B true CN116600147B (en) | 2024-03-29 |
Family
ID=87588613
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211716741.0A Active CN116600147B (en) | 2022-12-29 | 2022-12-29 | Method and system for remote multi-person real-time cloud group photo |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116600147B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102625129A (en) * | 2012-03-31 | 2012-08-01 | 福州一点通广告装饰有限公司 | Method for realizing remote reality three-dimensional virtual imitated scene interaction |
CN107404617A (en) * | 2017-07-21 | 2017-11-28 | 努比亚技术有限公司 | A kind of image pickup method and terminal, computer-readable storage medium |
CN109118558A (en) * | 2017-06-23 | 2019-01-01 | 沈瑜越 | The method taken a group photo to multiple group photo objects in different time points or place |
CN112601044A (en) * | 2020-12-08 | 2021-04-02 | 深圳市焦点数字科技有限公司 | Conference scene picture self-adaption method |
CN112954221A (en) * | 2021-03-11 | 2021-06-11 | 深圳市几何数字技术服务有限公司 | Method for real-time photo shooting |
CN115209111A (en) * | 2022-07-26 | 2022-10-18 | 北京新方通信技术有限公司 | Home and office video monitoring method and system supporting real-time background replacement |
CN115423728A (en) * | 2021-05-13 | 2022-12-02 | 海信集团控股股份有限公司 | Image processing method, device and system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113141346B (en) * | 2021-03-16 | 2023-04-28 | 青岛小鸟看看科技有限公司 | VR one-to-multiple system and method based on series flow |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9692964B2 (en) | Modification of post-viewing parameters for digital images using image region or feature information | |
JP4057241B2 (en) | Improved imaging system with virtual camera | |
US7911513B2 (en) | Simulating short depth of field to maximize privacy in videotelephony | |
US9129381B2 (en) | Modification of post-viewing parameters for digital images using image region or feature information | |
CN111402399B (en) | Face driving and live broadcasting method and device, electronic equipment and storage medium | |
GB2440376A (en) | Wide angle video conference imaging | |
CN113099245B (en) | Panoramic video live broadcast method, system and computer readable storage medium | |
US11076127B1 (en) | System and method for automatically framing conversations in a meeting or a video conference | |
CN103873453B (en) | Immerse communication customer end, server and the method for obtaining content view | |
CN104169842B (en) | For controlling method, the method for operating video clip, face orientation detector and the videoconference server of video clip | |
CN109547724B (en) | Video stream data processing method, electronic equipment and storage device | |
JP2005117163A (en) | Camera server apparatus, control method thereof, computer program and computer-readable storage medium | |
CN113391644A (en) | Unmanned aerial vehicle shooting distance semi-automatic optimization method based on image information entropy | |
JP2003111041A (en) | Image processor, image processing system, image processing method, storage medium and program | |
CN116600147B (en) | Method and system for remote multi-person real-time cloud group photo | |
JP2004266670A (en) | Image pickup device and method, image information providing system and program | |
WO2021200184A1 (en) | Information processing device, information processing method, and program | |
JP2005142765A (en) | Apparatus and method for imaging | |
CN112887653B (en) | Information processing method and information processing device | |
CN113784084A (en) | Processing method and device | |
WO2021184326A1 (en) | Control method and apparatus for electronic apparatus, and device and system | |
CN115997379A (en) | Restoration of image FOV for stereoscopic rendering | |
CN108989327A (en) | A kind of virtual reality server system | |
CN109862419B (en) | Intelligent digital laser television interaction method and system | |
JP2003125389A (en) | Apparatus and method for delivering image |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||