WO2023055365A1 - System, method and computer-readable medium for video processing - Google Patents

System, method and computer-readable medium for video processing

Info

Publication number
WO2023055365A1
WO2023055365A1 (PCT/US2021/052779)
Authority
WO
WIPO (PCT)
Prior art keywords
user
region
video
user terminal
interactive
Prior art date
Application number
PCT/US2021/052779
Other languages
English (en)
Inventor
Shao Yuan Wu
Po-Sheng Chiu
Yuchuan Chang
Ming-Che Cheng
Original Assignee
17Live Japan Inc.
17Live (Usa) Corp.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 17Live Japan Inc., 17Live (Usa) Corp. filed Critical 17Live Japan Inc.
Priority to PCT/US2021/052779 priority Critical patent/WO2023055365A1/fr
Priority to JP2022522347A priority patent/JP7426021B2/ja
Priority to US17/881,743 priority patent/US20230101606A1/en
Publication of WO2023055365A1 publication Critical patent/WO2023055365A1/fr

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 - Television systems
    • H04N7/14 - Systems for two-way working
    • H04N7/141 - Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/147 - Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 - Structure of client; Structure of client peripherals
    • H04N21/422 - Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/4223 - Cameras
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431 - Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312 - Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442 - Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44213 - Monitoring of end-user related data
    • H04N21/44218 - Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his face expression during a TV program
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 - End-user applications
    • H04N21/472 - End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/4722 - End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for requesting additional data associated with the content
    • H04N21/4725 - End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for requesting additional data associated with the content using interactive regions of the image, e.g. hot spots
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 - End-user applications
    • H04N21/478 - Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788 - Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting

Definitions

  • the present disclosure relates to image processing or video processing in a live video streaming or a video conference call.
  • the applications include live streaming, live conference calls and the like. As these applications increase in popularity, user demand for improved interactive experience during the communication is rising.
  • In one embodiment, a method for video processing is provided.
  • the method includes displaying a live video of a first user in a first region on a user terminal and displaying a video of a second user in a second region on the user terminal. A portion of the live video of the first user extends to the second region on the user terminal.
  • In one embodiment, a system for video processing includes one or a plurality of processors, and the one or plurality of processors execute a machine-readable instruction to perform: displaying a live video of a first user in a first region on a user terminal and displaying a video of a second user in a second region on the user terminal. A portion of the live video of the first user extends to the second region on the user terminal.
  • In one embodiment, a non-transitory computer-readable medium includes a program for video processing, and the program causes one or a plurality of computers to execute: displaying a live video of a first user in a first region on a user terminal and displaying a video of a second user in a second region on the user terminal. A portion of the live video of the first user extends to the second region on the user terminal.
  • FIG. 1 shows an example of a group call.
  • FIG. 2 shows an example of a group call in accordance with some embodiments of the present disclosure.
  • FIG. 3 shows an example of a group call in accordance with some embodiments of the present disclosure.
  • FIG. 4 shows an example of a group call in accordance with some embodiments of the present disclosure.
  • FIG. 5 shows an example of a group call in accordance with some embodiments of the present disclosure.
  • FIG. 6 shows an example of a group call in accordance with some embodiments of the present disclosure.
  • FIG. 7 shows a schematic configuration of a communication system according to some embodiments of the present disclosure.
  • FIG. 8 shows an exemplary functional configuration of a communication system according to some embodiments of the present disclosure.
  • FIG. 9 shows an exemplary sequence chart illustrating an operation of a communication system in accordance with some embodiments of the present disclosure.
  • FIG. 1 shows an example of a group call.
  • S1 is a screen of a user terminal displaying the group call.
  • RA is a region within the screen S1 displaying a live video of a user A.
  • RB is a region within the screen S1 displaying a live video of a user B.
  • the live video of user A may be taken and provided by a video capturing device, such as a camera, positioned in the vicinity of user A.
  • the live video of user B may be taken and provided by a video capturing device, such as a camera, positioned in the vicinity of user B.
  • the video of user A can only be shown in region RA, and cannot be shown in region RB.
  • the video of user B can only be shown in region RB, and cannot be shown in region RA. That may cause inconvenience or hinder some applications during the communication.
  • For example, when user B is presenting a newly developed product to user A in the group call, user A cannot precisely point out a portion or a part of the product for detailed discussion. Therefore, it is desired to have more interaction during a group call or a conference call.
  • FIG. 2 shows an example of a group call in accordance with some embodiments of the present disclosure.
  • a portion A1 of user A extends to or is reproduced/duplicated in the region RB wherein user B is displayed.
  • the portion A1 is a hand of user A in region RA
  • the portion A11 is the extended, reproduced or duplicated version of the portion A1 displayed in region RB.
  • the portion A11 points to or is directed toward an object B1 in region RB.
  • the video of user B shown in region RB is a live video.
  • the video of user B shown in region RB is a replayed video.
  • the portion A11 follows the movement or the trajectory of the portion A1. In some embodiments, the portion A11 moves synchronously with the portion A1. The user A may control or move the portion A11 to point to a position in region RB about which the user A wants to discuss by simply moving his hand, which is the portion A1. In some embodiments, the portion A11 may be represented or displayed as a graphical object or an animated object.
  • the boundary A3 defines a region A31 and a region A32 within region RA.
  • the region A31 surrounds the region A32.
  • the region A31 may be referred to as or defined as an interactive region.
  • the portion A1, which extends to or is reproduced in region RB, is within the interactive region A31.
  • the portion A1 extends towards user B in region RA.
  • only portions in the interactive region A31 can be extended to or displayed in region RB.
  • if user A wants to interact with user B by extending a portion of user A to region RB, user A simply moves the portion to the interactive region A31 and the portion will then be displayed in region RB.
  • the region RA and the region RB are separated from each other.
  • the region RA and the region RB may be at least partially overlapped on the screen S1.
  • the boundary B3 defines a region B31 and a region B32 within region RB.
  • the region B31 surrounds the region B32.
  • the region B31 may be referred to as or defined as an interactive region.
  • portions in the interactive region B31 can be extended to or displayed in region RA.
  • if user B wants to interact with user A by extending a portion of user B to region RA, user B simply moves the portion to the interactive region B31 and the portion will then be displayed in region RA.
  • the boundary A3 and/or the boundary B3 may not be displayed on the region RA and/or the region RB.
  • user A and user B, or region RA and region RB, are aligned in a lateral direction on the screen S1 of the user terminal, and the portion A1 of the live video of user A extends towards user B in the region RA.
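The interactive-region test described above can be illustrated with a minimal geometric sketch. This is not part of the patent disclosure; it only assumes rectangular regions in screen coordinates, with the interactive region A31 being the part of region RA outside the inner region A32 enclosed by boundary A3. All names and the example dimensions are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Rect:
    """Axis-aligned rectangle in screen coordinates (pixels)."""
    x: int
    y: int
    w: int
    h: int

    def contains(self, px: int, py: int) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


def in_interactive_region(point: tuple[int, int], region_ra: Rect, inner_a32: Rect) -> bool:
    """True if a detected point of user A lies in the interactive region A31,
    i.e. inside region RA but outside the inner region A32 enclosed by boundary A3."""
    px, py = point
    return region_ra.contains(px, py) and not inner_a32.contains(px, py)


# Example: region RA occupies the left half of a 1280x720 screen S1;
# boundary A3 leaves a 60-pixel interactive ring around the inner region A32.
ra = Rect(0, 0, 640, 720)
a32 = Rect(60, 60, 520, 600)
print(in_interactive_region((620, 360), ra, a32))  # True: near the border with region RB
print(in_interactive_region((320, 360), ra, a32))  # False: inside A32
```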
  • FIG. 3 shows another example of a group call in accordance with some embodiments of the present disclosure.
  • user A and user B are aligned in a vertical direction on the screen S1 of the user terminal.
  • a portion A2 of user A extends to or is reproduced/ duplicated in the region RB wherein user B is displayed.
  • the portion A2 includes a hand of user A and an object held by the hand, and the portion A21 is the extended, reproduced or duplicated version of the portion A2 displayed in region RB.
  • the portion A21 approaches or is directed toward user B in region RB.
  • a special effect SP1 is displayed in region RB when the portion A21 touches user B.
  • the special effect SP1 may include a graphical object or an animated object.
  • the special effect SP1 may include a sound effect.
  • the portion A21 follows the movement or the trajectory of the portion A2. In some embodiments, the portion A21 moves synchronously with the portion A2. The user A may control or move the portion A21 to point to or touch a position in region RB with which the user A wants to interact by simply moving his hand, which may hold an object. In some embodiments, the portion A21 may be represented or displayed as a graphical object or an animated object.
  • the boundary A3 defines a region A31 and a region A32 within region RA.
  • the region A31 surrounds the region A32.
  • the region A31 may be referred to as or defined as an interactive region.
  • the portion A2, which extends to or is reproduced in region RB, is within the interactive region A31.
  • the portion A2 extends towards user B in region RA.
  • only portions in the interactive region A31 can be extended to or displayed in region RB.
  • if user A wants to interact with user B by extending a portion of user A to region RB, user A simply moves the portion to the interactive region A31, and the portion will then be displayed in region RB.
  • FIG. 4 shows another example of a group call in accordance with some embodiments of the present disclosure.
  • user A and user D are aligned in a diagonal direction on the screen S1 of the user terminal.
  • a portion A1 of user A extends to or is reproduced/duplicated in the region RD wherein user D is displayed.
  • the portion A1 is a hand of user A, and the portion A11 is the extended, reproduced or duplicated version of the portion A1 displayed in region RD.
  • the portion A11 points to or is directed toward user D in region RD.
  • the portion A11 follows the movement or the trajectory of the portion A1. In some embodiments, the portion A11 moves synchronously with the portion A1. The user A may control or move the portion A11 to point to a position in region RD about which the user A wants to interact by simply moving his hand, which is the portion A1. In some embodiments, the portion A11 may be represented or displayed as a graphical object or an animated object.
  • the boundary A3 defines a region A31 and a region A32 within region RA.
  • the region A31 surrounds the region A32.
  • the region A31 may be referred to as or defined as an interactive region.
  • the interactive region A31 includes a subregion A311.
  • the portion A1, which extends to or is reproduced in region RD, is within the subregion A311.
  • the subregion A311 is between user A and user D.
  • the subregion A311 is located in a position of region RA that faces towards region RD from user A's point of view.
  • a direction towards which a portion of user A extends in region RA may determine the region wherein the extended, duplicated or reproduced version of the portion of user A is displayed. Therefore, user A may determine which region (and the corresponding user) to interact with by simply moving or extending the portion of user A towards the corresponding direction. For example, user A may extend a portion in a lateral direction to interact with a user whose display region is aligned or positioned in a lateral direction with respect to user A on the screen S1.
  • user A may extend a portion in a vertical direction to interact with a user whose display region is aligned or positioned in a vertical direction with respect to user A on the screen S1.
  • user A may extend a portion in a diagonal direction to interact with a user whose display region is aligned or positioned in a diagonal direction with respect to user A on the screen S1.
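One possible reading of this direction-based selection can be sketched as a small lookup, shown below. It assumes a 2x2 layout (RA top-left, RB to the right of RA, RC below RA, RD on the diagonal), with border BR2 facing RB and border BR1 facing RC; the function name, parameters and layout are illustrative assumptions, not the claimed method.

```python
from typing import Optional

def destination_region(px: int, py: int, border_br2_x: int, border_br1_y: int) -> Optional[str]:
    """Map the position (px, py) of a portion detected in the interactive region A31 of
    region RA to a destination region, assuming a 2x2 layout: RA top-left, RB to the right
    of RA (lateral), RC below RA (vertical), RD diagonal. border_br2_x is the x coordinate
    of border BR2 (towards RB); border_br1_y is the y coordinate of border BR1 (towards RC)."""
    right = px >= border_br2_x   # crossed border BR2 towards region RB
    below = py >= border_br1_y   # crossed border BR1 towards region RC
    if right and below:
        return "RD"              # diagonal neighbour (as in FIG. 4)
    if right:
        return "RB"              # lateral neighbour (as in FIG. 2)
    if below:
        return "RC"              # vertical neighbour (as in FIG. 3)
    return None                  # not beyond either border: nothing to extend

# Example: border BR2 at x = 580 and border BR1 at y = 660 inside region RA.
print(destination_region(620, 300, 580, 660))  # RB
print(destination_region(620, 700, 580, 660))  # RD
```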
  • FIG. 5 shows another example of a group call in accordance with some embodiments of the present disclosure.
  • the boundary A3 defines the interactive region A31, which is a region user A utilizes to interact with other users, as described in previous exemplary embodiments.
  • the interactive region A31 includes a subregion A311.
  • the boundary A3 includes at least a border BR1 and a border BR2.
  • user A may adjust a position of the border BR1 and/or a position of the border BR2 to adjust the shape of the interactive region A31 and the shape of the subregion A311.
  • the border BR1 corresponds to a direction of user C or the region RC with respect to the region RA, and is between the region RA and the region RC.
  • the border BR2 corresponds to a direction of user B or the region RB with respect to the region RA, and is between the region RA and the region RB.
  • user A may drag or move the border BR1 closer to user A, such that a subregion A312 of the interactive region A31 that is between user A and user C becomes wider and closer to user A. In this way, it is easier for user A to interact with user C with a portion of user A.
  • User A only needs to extend the portion of user A for a relatively shorter distance to cross the border BR1 and reach the subregion A312 of the interactive region A31, and then the portion will be extended, duplicated or reproduced in region RC wherein user C is displayed.
  • user A may drag or move the border BR2 closer to user A, such that a subregion A313 of the interactive region A31 that is between user A and user B becomes wider and closer to user A. In this way, it is easier for user A to interact with user B with a portion of user A. User A only needs to extend the portion of user A for a relatively shorter distance to cross the border BR2 and reach the subregion A313 of the interactive region A31, and then the portion will be extended, duplicated or reproduced in region RB wherein user B is displayed.
  • user A may drag or move the border BR1 and/or the border BR2 closer to user A, such that the subregion A311 of the interactive region A31 that is between user A and user D becomes wider and closer to user A. In this way, it is easier for user A to interact with user D with a portion of user A. User A only needs to extend the portion of user A for a relatively shorter distance in a diagonal direction to reach the subregion A311 of the interactive region A31, and then the portion will be extended, duplicated or reproduced in region RD wherein user D is displayed.
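The border-dragging behaviour can be pictured with a tiny, hypothetical helper: moving border BR2 towards user A widens the facing subregion so a shorter reach crosses it. The function name, pixel values and drag convention are assumptions for illustration only.

```python
def drag_border_br2(border_br2_x: float, drag_towards_user_px: float, min_x: float = 0.0) -> float:
    """Hypothetical UI handler: dragging border BR2 towards user A (to the left in the
    FIG. 5 layout) moves the border so the subregion A313 facing region RB becomes wider,
    and a shorter reach of user A's hand crosses BR2."""
    return max(min_x, border_br2_x - drag_towards_user_px)

# Dragging BR2 from x = 580 by 120 px moves it to x = 460, widening subregion A313 by 120 px,
# so destination_region(...) above would report "RB" for positions from x = 460 onwards.
print(drag_border_br2(580, 120))  # 460.0
```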
  • FIG. 6 shows another example of a group call in accordance with some embodiments of the present disclosure.
  • only portions outside of the interactive region are extracted to be displayed on the screen S1. More specifically, for user A, only the region enclosed by the boundary A3 is shown on the screen S1. For user B, user C and user D, only the regions enclosed by the boundaries B3, C3 and D3 are shown on the screen S1. This may improve the realism of the interaction. For example, when user A extends a portion to another user's display region, the portion will not be shown in user A's display region.
  • FIG. 7 shows a schematic configuration of a communication system 1 according to some embodiments of the present disclosure.
  • the communication system 1 may provide a live streaming service with interaction via a content.
  • content refers to a digital content that can be played on a computer device.
  • the communication system 1 enables a user to participate in real-time interaction with other users on-line.
  • the communication system 1 includes a plurality of user terminals 10, a backend server 30, and a streaming server 40.
  • the user terminals 10, the backend server 30 and the streaming server 40 are connected via a network 90, which may be the Internet, for example.
  • the backend server 30 may be a server for synchronizing interaction between the user terminals and/ or the streaming server 40.
  • the backend server 30 may be referred to as the origin server of an application (APP) provider.
  • the streaming server 40 is a server for handling or providing streaming data or video data.
  • the backend server 30 and the streaming server 40 may be independent servers.
  • the backend server 30 and the streaming server 40 may be integrated into one server.
  • the user terminals 10 are client devices for the live streaming.
  • the user terminal 10 may be referred to as viewer, streamer, anchor, podcaster, audience, listener or the like.
  • Each of the user terminal 10, the backend server 30, and the streaming server 40 is an example of an information-processing device.
  • the streaming may be live streaming or video replay.
  • the streaming may be audio streaming and/or video streaming.
  • the streaming may include contents such as online shopping, talk shows, talent shows, entertainment events, sports events, music videos, movies, comedy, concerts, group calls, conference calls or the like.
  • FIG. 8 shows an exemplary functional configuration of a communication system according to some embodiments of the present disclosure.
  • the backend server 30 includes a message unit 32.
  • the message unit 32 is configured to receive data or information from user terminals, process and/or store those data, and transmit the data to user terminals.
  • the message unit 32 may be a separate unit from the backend server 30.
  • the streaming server 40 includes a data receiver 400 and a data transmitter 402.
  • the data receiver 400 is configured to receive data or information from various user terminals, such as streaming data or video data.
  • the data transmitter 402 is configured to transmit data or information to user terminals, such as streaming data or video data.
  • the user terminal 10A may be a user terminal operated by a user A.
  • the user terminal 10A includes a camera 700, a renderer 702, a display 704, an encoder 706, a decoder 708, a result sender 710, a matting unit 712, and an object recognizing unit 714.
  • the camera 700 may be or may include any type of video capturing device.
  • the camera 700 is configured to capture video data of, for example, user A.
  • the renderer 702 is configured to receive video data from the camera 700 (video data of user A), to receive video data from the decoder 708 (which may include video data from user B), and to generate a rendered video (such as a video displaying a group call wherein user A and user B are displayed) that is to be displayed on the display 704.
  • the display 704 is configured to display the rendered video from the renderer 702.
  • the display 704 may be a screen on the user terminal 10A.
  • the encoder 706 is configured to encode the video data from camera 700, and transmit the encoded video data to the data receiver 400 of the streaming server 40.
  • the encoded data may be transmitted as streaming data.
  • the decoder 708 is configured to receive video data or streaming data (which may include video data from user B) from the data transmitter 402 of the streaming server 40, decode them into decoded video data, and transmit the decoded video data to the renderer 702 for rendering.
  • the matting unit 712 is configured to perform a matting process (image matting or video matting) on the video data from the camera 700, which is video data of user A.
  • the matting process may include a contour recognizing process, an image comparison process, a moving object detection process, and/or a cropping process.
  • the matting process may be executed with techniques including constant-color matting, difference matting, and natural image matting.
  • the algorithms involved in the matting process may include Bayesian matting, Poisson matting, or Robust matting.
  • the image comparison process periodically compares an initial or default background image with a current or live image to detect a portion of user A in an interactive region.
  • the matting unit 712 receives video data of user A from camera 700.
  • the video data may include an interactive region as described above with examples in FIG. 2, FIG. 3, FIG. 4 and FIG. 5.
  • the matting unit 712 performs a matting process to detect or to extract a contour of user A in the video data.
  • the matting unit 712 performs a matting process to detect or to extract a portion of user A in the interactive region (such as a hand of user A, or a hand of user A holding an object).
  • the matting unit 712 performs a cropping process to remove a region or a portion outside of the interactive region from the video data of user A.
  • the matting unit 712 detects, recognizes or determines a position in the interactive region wherein the portion of user A is detected.
  • a contour recognizing process or an image comparison process may be performed before a cropping process, which may improve an accuracy of the detection of the portion of user A in the interactive region.
  • the interactive region, and the corresponding boundary or border may be defined by a processor (not shown) of the user terminal 10A or an application enabling the group call.
  • the interactive region, and the corresponding boundary or border, may be determined by user A by a UI (user interface) unit (not shown) of the user terminal 10A.
  • the matting unit 712 detects or determines the portion of user A (or the portion of the live video of user A) in the interactive region by detecting a portion of user A crossing a border in the region RA.
  • the border in the region RA could be, for example, the border BRI or the border BR2 in FIG. 5.
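The image-comparison and cropping steps attributed to the matting unit 712 could be realized in many ways; the sketch below shows one plausible form using background differencing with OpenCV, assuming a stored default background frame and a rectangular inner region A32. The bounding box it returns can also serve as the position at which a portion of user A crossed into the interactive region. Function names, the threshold value and the rectangle convention are assumptions, not the patented implementation.

```python
import cv2
import numpy as np

def detect_portion_in_interactive_region(background: np.ndarray,
                                          frame: np.ndarray,
                                          inner_a32: tuple[int, int, int, int],
                                          threshold: int = 30):
    """Difference-matting sketch: compare the current frame of user A with a stored
    background image, keep only the changed pixels that fall in the interactive region A31
    (outside the inner region A32), and return a binary mask plus the bounding box of the
    detected portion, or None if nothing was detected."""
    # 1. Pixel-wise difference between the live frame and the default background.
    diff = cv2.absdiff(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY),
                       cv2.cvtColor(background, cv2.COLOR_BGR2GRAY))
    _, mask = cv2.threshold(diff, threshold, 255, cv2.THRESH_BINARY)

    # 2. Cropping step: zero out everything enclosed by boundary A3 (the inner region A32),
    #    so only changes inside the interactive ring A31 remain.
    x, y, w, h = inner_a32
    mask[y:y + h, x:x + w] = 0

    # 3. Contour step: bounding box of the largest changed blob, e.g. user A's hand.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return mask, None
    largest = max(contours, key=cv2.contourArea)
    return mask, cv2.boundingRect(largest)   # (x, y, w, h) of the detected portion
```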
  • the object recognizing unit 714 is configured to perform an object recognizing process on the output data from the matting unit 712.
  • the output data may include a detected portion or an extracted portion of user A (such as a hand of user A, or a hand of user A holding an object).
  • the object recognizing unit 714 performs the object recognizing process to determine if the detected portion of user A includes any predetermined pattern, object and/or gesture.
  • the object recognizing process may include techniques such as template matching, pattern matching, contour matching, gesture recognizing, skin recognizing, outline matching, color or shape matching, and feature based matching.
  • the object recognizing unit 714 calculates a matching correlation between the detected portion of user A (or a part of which) and a set of predetermined patterns to determine if any pattern is matched or recognized within the detected portion of user A. In some embodiments, the object recognizing unit 714 detects, recognizes or determines a position in the interactive region wherein the portion of user A is detected. In some embodiments, the object recognizing process may be performed on an image or video from the matting unit 712 wherein a cropping process is not performed yet, which may improve an accuracy of the object recognizing process. In some embodiments, the object recognizing unit 714 recognizes and extracts the image or video of the portion of user A in the interactive region, and transmits the extracted image or video to the result sender 710.
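The matching-correlation step could, for instance, be implemented with normalized cross-correlation against a small set of predetermined templates, as sketched below. The pattern names, the correlation threshold, and the assumption that templates are stored as grayscale images are illustrative, not taken from the disclosure.

```python
import cv2
import numpy as np

def recognize_pattern(portion: np.ndarray,
                      patterns: dict[str, np.ndarray],
                      min_correlation: float = 0.8):
    """Sketch of a matching step for the object recognizing unit 714: compute a normalized
    cross-correlation between the detected portion of user A and each predetermined pattern
    (e.g. an open hand, a pointing finger, a held object, stored as grayscale templates)
    and report the best match above a threshold, or None."""
    best_name, best_score, best_loc = None, 0.0, None
    gray = cv2.cvtColor(portion, cv2.COLOR_BGR2GRAY)
    for name, template in patterns.items():
        if template.shape[0] > gray.shape[0] or template.shape[1] > gray.shape[1]:
            continue  # template larger than the detected portion; skip it
        result = cv2.matchTemplate(gray, template, cv2.TM_CCOEFF_NORMED)
        _, score, _, loc = cv2.minMaxLoc(result)
        if score > best_score:
            best_name, best_score, best_loc = name, score, loc
    if best_score >= min_correlation:
        return {"pattern": best_name, "correlation": best_score, "position": best_loc}
    return None  # no predetermined pattern, object or gesture recognized
```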
  • the result sender 710 is configured to transmit the output result of the object recognizing unit 714 (which may include the output of the matting unit 712) to the message unit 32 of the backend server 30. In some embodiments, the result sender 710 may transmit the output directly to the result receiver 810 instead of transmitting via the message unit 32.
  • the user terminal 10B may be a user terminal operated by a user B.
  • the user terminal 10B includes a camera 800, a renderer 802, a display 804, an encoder 806, a decoder 808, a result receiver 810, and an image processor 812.
  • the camera 800 may be or may include any type of video capturing device.
  • the camera 800 is configured to capture video data of, for example, user B.
  • the camera 800 transmits the captured video data to the encoder 806, the Tenderer 802, and/or the image processor 812.
  • the renderer 802 is configured to receive video data from the camera 800 (e.g., video data of user B), to receive video data from the decoder 808 (which may include video data from another user such as user A), to receive output data of the image processor 812, and to generate a rendered video (such as a video displaying a group call wherein user A and user B are displayed) that is to be displayed on the display 804.
  • the display 804 is configured to display the rendered video from the renderer 802.
  • the display 804 may be a screen on the user terminal 10B.
  • the encoder 806 is configured to encode data, which includes the video data from the camera 800, and/or video data from the image processor 812.
  • the encoder 806 transmits the encoded video data to the data receiver 400 of the streaming server 40.
  • the encoded data may be transmitted as streaming data.
  • the decoder 808 is configured to receive video data or streaming data (which may include video data from user A) from the data transmitter 402 of the streaming server 40, decode them into decoded video data, and transmit the decoded video data to the renderer 802 for rendering.
  • the result receiver 810 is configured to receive output data from the message unit 32 of the backend server 30, and transmit the data to the image processor 812.
  • the output data from the message unit 32 includes data or information from the matting unit 712 and the object recognizing unit 714.
  • the output data from the message unit 32 includes a result of the object recognizing process executed by the object recognizing unit 714.
  • the output data from the message unit 32 may include information regarding a matched or recognized pattern, object or gesture.
  • the output data from the message unit 32 includes information regarding a position in the interactive region (on the user terminal 10A) wherein the portion of user A is detected, for example, by the matting unit 712 of the user terminal 10A or the object recognizing unit 714.
  • the output data from the message unit 32 includes a video or image of a detected/ recognized portion of user A in the interactive region.
  • the image processor 812 is configured to receive video data from the camera 800, and/or data or information from the result receiver 810. In some embodiments, the image processor 812 performs image processing or video processing on the video data received from the camera 800 based on data or information received from the result receiver 810. For example, if the data received from the result receiver 810 indicates that the object recognizing process executed by the object recognizing unit 714 successfully recognized a predetermined pattern in the portion of user A (which is in the interactive region on a screen of the user terminal 10A), the image processor 812 may include, render, or overlap a special effect corresponding to the predetermined pattern onto the video data received from the camera 800. The overlapped video is later transmitted to the renderer 802 and may subsequently be displayed on the display 804. In some embodiments, the special effect data may be stored in a storage on the user terminal 10B (not shown).
  • the message unit 32 determines a destination of output data of the message unit 32 based on data from the matting unit 712 and/or data from the object recognizing unit 714. In some embodiments, the message unit 32 determines the region to extend, duplicate or reproduce the portion of user A based on the position of the portion of user A detected in the interactive region.
  • the message unit 32 may determine the user terminal of user C to be the destination to send the output data of the message unit 32.
  • the portion of user A will then extend to or be duplicated/ reproduced/ displayed in region RC, which could be done by an image processor of the user terminal of user C.
  • the message unit 32 may determine the user terminal of user D to be the destination to send the output data of the message unit 32.
  • the portion of user A will then extend to or be duplicated/reproduced/displayed in region RD, which could be done with cooperation of an image processor and/or a renderer in the user terminal of user D.
  • the message unit 32 may determine the user terminal of user B to be the destination to send the output data of the message unit 32.
  • the portion of user A will then extend to or be duplicated/reproduced/displayed in region RB, which could be done with cooperation of an image processor and/or a renderer in the user terminal of user B.
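The destination selection performed by the message unit 32 can be pictured as a simple dispatch over per-region connections, as in the hypothetical sketch below. The routing table, the message fields and the send() interface are assumptions for illustration; the disclosure does not specify this implementation.

```python
from typing import Any

# Hypothetical routing table on the backend server 30: destination region -> connection
# to the result receiver of the corresponding user terminal,
# e.g. {"RB": ws_terminal_10B, "RC": ws_terminal_10C, "RD": ws_terminal_10D}.
connections: dict[str, Any] = {}

def route_result(message: dict) -> None:
    """Sketch of the message unit 32: forward the matting / recognition result coming from
    user terminal 10A to the terminal whose display region the detected portion should
    extend to. The 'destination' field is assumed to carry the region chosen from the
    position of the portion in the interactive region (see destination_region above)."""
    destination = message.get("destination")   # e.g. "RB", "RC" or "RD"
    connection = connections.get(destination)
    if connection is None:
        return                                 # no terminal currently shows that region
    connection.send(message)                   # deliver to that terminal's result receiver
```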
  • the output data of the message unit 32 may include an image or video of the detected portion of user A in the interactive region of region RA.
  • the image processor 812 may subsequently overlap, duplicate or reproduce the portion of user A onto the video of user B, which is received from the camera 800.
  • the portion of user A in the interactive region may extend to the region RB without being represented as a graphical or animated object.
  • the image processor 812 may receive the image or video data of user A through the decoder 808, and then utilize information from the message unit 32 (which may include a range, outline or contour information regarding the portion of user A detected in the interactive region) to overlap, duplicate or reproduce the portion of user A in the interactive region onto the video of user B received from the camera 800.
  • the portion of user A in the interactive region may extend to the region RB without being represented as a graphical or animated object.
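A minimal sketch of the overlap step attributed to the image processor 812 is shown below: the extracted portion of user A is pasted onto user B's camera frame using the matting mask, so only the portion itself (not its background) appears in region RB. Clipping at frame edges and coordinate conversion between regions are omitted; all names are assumptions.

```python
import numpy as np

def overlay_portion(frame_b: np.ndarray, portion_a: np.ndarray,
                    mask_a: np.ndarray, top_left: tuple[int, int]) -> np.ndarray:
    """Paste the extracted portion of user A (e.g. A11 or A21) onto user B's camera frame
    at the reported position, copying only the pixels selected by the matting mask."""
    out = frame_b.copy()
    x, y = top_left
    h, w = portion_a.shape[:2]
    roi = out[y:y + h, x:x + w]                  # assumes the portion fits inside frame_b
    keep = mask_a[..., None].astype(bool)        # 1-channel mask -> boolean per pixel
    roi[:] = np.where(keep, portion_a, roi)      # copy only the masked pixels of user A
    return out
```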
  • FIG. 9 shows an exemplary sequence chart illustrating an operation of a communication system in accordance with some embodiments of the present disclosure. In some embodiments, FIG. 9 illustrates how a portion of a user (for example, user A) extends to a region wherein another user (for example, user B) is displayed.
  • step S200 the camera 700 of the user terminal 10A transmits the video data of user A to the matting unit 712 of the user terminal 10A.
  • the matting unit 712 detects a portion of user A in the interactive region on a screen of the user terminal 10A.
  • the detection may include a matting process and/or a cropping process.
  • the matting unit 712 determines a position within the interactive region wherein the portion of user A is detected.
  • the object recognizing unit 714 of the user terminal 10A receives output data from the matting unit 712, and performs an object recognizing process on the output of the matting unit 712 to determine if any predetermined pattern, gesture or object can be recognized in the detected portion of user A in the interactive region.
  • the object recognizing process may include a matching process, a gesture recognizing process and/or a skin recognizing process.
  • step S206 the object recognizing unit 714 recognizes a predetermined pattern, gesture or object, and then the object recognizing unit 714 collects related information of the predetermined pattern, gesture or object, such as position and size, for determining the destination to whom the data should be transmitted.
  • step S208 the output of the object recognizing unit 714 is transmitted to the message unit 32 of the backend server 30 through the result sender 710 of the user terminal 10A.
  • step S210 the message unit 32 determines a destination to transmit the data from the user terminal 10A according to information regarding the position of the portion of user A in the interactive region included in the data from the user terminal 10A.
  • the information could be determined in step S206, for example.
  • step S211 the message unit 32 transmits the data from the user terminal 10A to the result receiver 810 of the user terminal 10B (in an exemplary scenario that the message unit 32 determines the destination to be user B or region RB).
  • step S212 the result receiver 810 transmits the received data to the image processor 812 of the user terminal 10B.
  • step S214 the image processor 812 overlaps or superimposes the detected portion of user A (or a portion of the detected portion of user A, which is in the interactive region of region RA), onto the video data of user B.
  • the image or video data of the detected portion of user A is transmitted to the user terminal 10B through the streaming server 40.
  • the image or video data of the detected portion of user A is transmitted to the user terminal 10B through the message unit 32.
  • the image or video data of user B is transmitted to the image processor 812 from the camera 800 of the user terminal 10B.
  • step S216 the image processor 812 transmits the processed image or video data to the renderer 802 of the user terminal 10B for rendering.
  • the processed image or video data may be rendered together with video data from the decoder 808 of the user terminal 10B and/or video data from the camera 800.
  • step S218 the rendered video data is transmitted to the display 804 of the user terminal 10B for displaying on the screen of the user terminal 10B.
  • step S220 the image processor 812 transmits the processed image or video data to the encoder 806 of the user terminal 10B for an encoding process.
  • step S222 the encoded video data is transmitted to the streaming server 40.
  • step S224 the streaming server 40 transmits the encoded video data (from the user terminal 10B) to the decoder 708 of the user terminal 10A for a decoding process.
  • step S226 the decoded video data is transmitted to the renderer 702 of the user terminal 10A for a rendering process.
  • step S228, the rendered video data is transmitted to the display 704 for displaying on the screen of the user terminal 10A.
  • the matting unit 712 continuously or periodically detects a portion of user A in the interactive region.
  • the object recognizing unit 714 continuously or periodically performs a recognizing process on the portion of user A in the interactive region.
  • the message unit 32 continuously or periodically determines a destination to send the data received from the user terminal 10A.
  • the image processor 812 of the user terminal 10B continuously or periodically performs an overlapping or a superimposing process based on information received from the message unit 32, to make sure the extended or reproduced/ duplicated portion of user A in the region RB moves synchronously with the portion of user A in the region RA.
  • the user terminal 10B has a processing unit, such as a CPU or a GPU, to determine if the extended or reproduced portion of user A in the region RB touches the image or video of user B. The result of the determination may be utilized by the image processor 812 to decide whether or not to include a special effect in the region RB.
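One simple way to realize such a touch check is a bounding-box overlap test between the extended portion of user A and the detected region of user B, as sketched below; its result could then drive whether the special effect SP1 is rendered. The coordinate convention and box representation are assumptions for illustration.

```python
def boxes_touch(box_a: tuple[int, int, int, int], box_b: tuple[int, int, int, int]) -> bool:
    """Touch-check sketch: the extended portion of user A (e.g. A21) is considered to touch
    user B when their bounding boxes in region RB overlap. Boxes are (x, y, w, h) in
    region-RB coordinates (an assumption)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah
```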
  • the present disclosure makes conference calls or group calls more convenient, interesting or interactive.
  • the present disclosure can prevent misunderstanding when a user wants to discuss an object in another user's display region.
  • the present disclosure can boost users' motivation to participate in a group call chat room, which could be in a live streaming form.
  • the present disclosure can attract more streamers or viewers to join in a live streaming group call.
  • the processing and procedures described in the present disclosure may be realized by software, hardware, or any combination of these in addition to what was explicitly described.
  • the processing and procedures described in the specification may be realized by implementing a logic corresponding to the processing and procedures in a medium such as an integrated circuit, a volatile memory, a non-volatile memory, a non-transitory computer-readable medium and a magnetic disk.
  • the processing and procedures described in the specification can be implemented as a computer program corresponding to the processing and procedures, and can be executed by various kinds of computers.
  • the system or method described in the above embodiments may be integrated into programs stored in a computer-readable non-transitory medium such as a solid state memory device, an optical disk storage device, or a magnetic disk storage device.
  • the programs may be downloaded from a server via the Internet and be executed by processors.

Abstract

The present disclosure relates to a system, a method and a computer-readable medium for video processing. The method includes the steps of: displaying a live video of a first user in a first region on a user terminal; and displaying a video of a second user in a second region on the user terminal. A portion of the live video of the first user extends to the second region on the user terminal. The present disclosure can improve interaction during a conference call or a group call.
PCT/US2021/052779 2021-09-30 2021-09-30 System, method and computer-readable medium for video processing WO2023055365A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
PCT/US2021/052779 WO2023055365A1 (fr) 2021-09-30 2021-09-30 System, method and computer-readable medium for video processing
JP2022522347A JP7426021B2 (ja) 2021-09-30 2021-09-30 System, method, and computer-readable medium for video processing
US17/881,743 US20230101606A1 (en) 2021-09-30 2022-08-05 System, method and computer-readable medium for video processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2021/052779 WO2023055365A1 (fr) 2021-09-30 2021-09-30 System, method and computer-readable medium for video processing

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/073182 Continuation-In-Part WO2023129181A1 (fr) 2021-09-30 2021-12-30 System, method and computer-readable medium for image recognition

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/881,743 Continuation-In-Part US20230101606A1 (en) 2021-09-30 2022-08-05 System, method and computer-readable medium for video processing

Publications (1)

Publication Number Publication Date
WO2023055365A1 true WO2023055365A1 (fr) 2023-04-06

Family

ID=85783375

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/052779 WO2023055365A1 (fr) 2021-09-30 2021-09-30 System, method and computer-readable medium for video processing

Country Status (2)

Country Link
JP (1) JP7426021B2 (fr)
WO (1) WO2023055365A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090033737A1 (en) * 2007-08-02 2009-02-05 Stuart Goose Method and System for Video Conferencing in a Virtual Environment
US20170339372A1 (en) * 2014-11-14 2017-11-23 Pcms Holdings, Inc. System and method for 3d telepresence
US20200192550A1 (en) * 2017-09-08 2020-06-18 Nokia Technologies Oy Methods, apparatus, systems, computer programs for enabling mediated reality

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5149043B2 (ja) 2008-03-10 2013-02-20 株式会社日立製作所 Video display method in remote communication
JP6019947B2 (ja) 2012-08-31 2016-11-02 オムロン株式会社 Gesture recognition device, control method therefor, display apparatus, and control program
JP6287382B2 (ja) 2014-03-12 2018-03-07 オムロン株式会社 Gesture recognition device and control method for gesture recognition device
WO2018074262A1 (fr) 2016-10-20 2018-04-26 ソニー株式会社 Communication device, communication method, and program

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090033737A1 (en) * 2007-08-02 2009-02-05 Stuart Goose Method and System for Video Conferencing in a Virtual Environment
US20170339372A1 (en) * 2014-11-14 2017-11-23 Pcms Holdings, Inc. System and method for 3d telepresence
US20200192550A1 (en) * 2017-09-08 2020-06-18 Nokia Technologies Oy Methods, apparatus, systems, computer programs for enabling mediated reality

Also Published As

Publication number Publication date
JP2023543369A (ja) 2023-10-16
JP7426021B2 (ja) 2024-02-01

Similar Documents

Publication Publication Date Title
US11863813B2 (en) System and methods for interactive filters in live streaming media
JP7295851B2 (ja) Optimizing audio delivery for virtual reality applications
US10127724B2 (en) System and method for providing augmented reality on mobile devices
US10013857B2 (en) Using haptic technologies to provide enhanced media experiences
EP2566158B1 (fr) Dispositif de reproduction de contenu et système de reproduction de contenu
US8876601B2 (en) Method and apparatus for providing a multi-screen based multi-dimension game service
CN109982148B (zh) Live streaming method, apparatus, computer device and storage medium
CN112135155B (zh) Audio and video co-hosting stream merging method and apparatus, electronic device, and storage medium
WO2016188276A1 (fr) Computer storage medium, client and video playback method
CN113965813B (zh) Video playback method, system, device and medium in a live streaming room
JP2023001324A (ja) Computer program for performing video coding
CN113573090A (zh) Content display method, apparatus, system and storage medium in game live streaming
CN111225287A (zh) Bullet-comment (danmaku) processing method and apparatus, electronic device, and storage medium
US11223662B2 (en) Method, system, and non-transitory computer readable record medium for enhancing video quality of video call
JP2023522266A (ja) Method, apparatus, device and medium for multimedia data distribution
CN114139491A (zh) Data processing method, apparatus and storage medium
CN114830636A (zh) Parameters for overlay handling for immersive teleconferencing and telepresence for remote terminals
WO2023055365A1 (fr) System, method and computer-readable medium for video processing
CN109862385B (zh) Live streaming method, apparatus, computer-readable storage medium and terminal device
TWI772192B (zh) System, method and computer-readable medium for video processing
CN112995692B (zh) Interactive data processing method, apparatus, device and medium
KR102516831B1 (ko) Method, computer device, and computer program for providing high-quality video of a region of interest using a single stream
US20230101606A1 (en) System, method and computer-readable medium for video processing
WO2022155107A1 (fr) Synchronisation de contenu audiovisuel secondaire sur la base de transitions d'image dans du contenu de diffusion en continu
CN112565655A (zh) Method and apparatus for detecting pornographic content in video data, electronic device, and storage medium

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 2022522347

Country of ref document: JP

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21959621

Country of ref document: EP

Kind code of ref document: A1