WO2023129182A1 - System, method and computer-readable medium for video processing - Google Patents

System, method and computer-readable medium for video processing

Info

Publication number
WO2023129182A1
Authority
WO
WIPO (PCT)
Prior art keywords
message
user
live video
region
video
Prior art date
Application number
PCT/US2021/073183
Other languages
English (en)
Inventor
Shao Yuan Wu
Ming-Che Cheng
Original Assignee
17Live Japan Inc.
17Live (Usa) Corp.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 17Live Japan Inc., 17Live (Usa) Corp. filed Critical 17Live Japan Inc.
Priority to PCT/US2021/073183 priority Critical patent/WO2023129182A1/fr
Priority to JP2022528663A priority patent/JP7449519B2/ja
Priority to US17/881,743 priority patent/US20230101606A1/en
Publication of WO2023129182A1 publication Critical patent/WO2023129182A1/fr

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/2187 Live feed
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/02 Details
    • H04L12/16 Arrangements for providing special services to substations
    • H04L12/18 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
    • H04L12/1813 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast for computer conferences, e.g. chat rooms
    • H04L12/1827 Network arrangements for conference optimisation or adaptation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/07 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail characterised by the inclusion of specific contents
    • H04L51/10 Multimedia information
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/4728 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region

Definitions

  • The present disclosure relates to video processing in video streaming.
  • Applications include live streaming, live conference calls and the like. As these applications grow in popularity, user demand for improved communication efficiency and a better understanding of one another's messages is rising.
  • One embodiment is a method for live video processing. The method includes receiving a message from a user, and enlarging a region of the live video in the vicinity of a predetermined object.
  • Another embodiment is a system for live video processing that includes one or more processors, which execute machine-readable instructions to perform: receiving a message from a user, and enlarging a region of the live video in the vicinity of a predetermined object.
  • Yet another embodiment is a non-transitory computer-readable medium including a program for live video processing, the program causing one or more computers to execute: receiving a message from a user, and enlarging a region of the live video in the vicinity of a predetermined object.
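  • As an illustration, the following Python sketch walks through that flow end to end; the helper names, the fixed bounding box, and the frame size are illustrative assumptions, not part of the disclosure.

```python
import numpy as np

def recognize_object(frame, message):
    # Placeholder detector: a real system would locate the predetermined
    # object here; we return a fixed bounding box (x, y, w, h) for the demo.
    return (100, 80, 120, 90)

def determine_region(box, frame_hw, margin=40):
    # The region "in the vicinity of" the object: its box plus a margin,
    # clipped to the frame bounds (frame_hw is (height, width)).
    x, y, w, h = box
    return (max(x - margin, 0), max(y - margin, 0),
            min(x + w + margin, frame_hw[1]), min(y + h + margin, frame_hw[0]))

def enlarge_region(frame, region, scale=2):
    # Nearest-neighbour upscale of the cropped region via np.repeat.
    x0, y0, x1, y1 = region
    crop = frame[y0:y1, x0:x1]
    return crop.repeat(scale, axis=0).repeat(scale, axis=1)

def process(frame, message):
    # Receive a message from a user, then enlarge the region of the live
    # video in the vicinity of the predetermined object.
    box = recognize_object(frame, message)
    region = determine_region(box, frame.shape[:2])
    return enlarge_region(frame, region)

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in live-video frame
enlarged = process(frame, "zoom in")             # a (340, 400, 3) enlarged view
```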
  • FIG. 1 shows an example of a live streaming.
  • FIG. 2A, FIG. 2B, FIG. 2C, and FIG. 2D show exemplary streamings in accordance with some embodiments of the present disclosure.
  • FIG. 3 shows an exemplary streaming in accordance with some embodiments of the present disclosure.
  • FIG. 4 shows a schematic configuration of a communication system according to some embodiments of the present disclosure.
  • FIG. 5 shows a block diagram of a user terminal according to some embodiments of the present disclosure.
  • FIG. 6 shows an exemplary look-up table in accordance with some embodiments of the present disclosure.
  • On-line communication has some disadvantages that may reduce communication efficiency or increase the chance of misunderstanding.
  • In a live video or a live streaming communication, it is difficult to keep the focus on the correct region, especially when there are distractions such as comments or special effects on the display on which the live video is shown.
  • In a live video or a live streaming communication, it is also difficult to see the details of the video content, due to the limited size of the display or the limited resolution of the video.
  • FIG. 1 shows an example of a live streaming.
  • S1 is a screen of a user terminal displaying the live streaming.
  • RA is a display region within the screen S1 displaying a live video of a user A.
  • the live video of user A may be taken and provided by a video capturing device, such as a camera, positioned in the vicinity of user A.
  • user A may be a streamer or a broadcaster who is distributing a live video to teach viewers how to cook.
  • User A would like viewers of this live video to be able to focus on the right region of the video, and to be able to see the details of the region, in order for the viewers to get the correct knowledge such as cooking steps or cooking materials.
  • user A may need to bring the object of interest (such as a pan or a chopping board) closer to the camera for viewers to see it clearly.
  • user A may need to adjust a direction, a position or a focus of the camera for users to see the details user A wants to emphasize.
  • the above actions are inconvenient for user A and interrupt the cooking process.
  • FIG. 2A, FIG. 2B, FIG. 2C, and FIG. 2D show exemplary streamings in accordance with some embodiments of the present disclosure.
  • the message M1 is a voice message indicating “zoom in.”
  • the message M1 may be a gesture message expressed by user A.
  • user A may use a body portion (such as a hand) to form a gesture message.
  • the message M1 may be a facial expression message expressed by user A.
  • the message M1 is part of the video (including audio data) of user A.
  • the message M1 may be received by a user terminal used to capture the video of user A, such as a smartphone, a tablet, a laptop or any device with a video capturing function.
  • the message M1 is recognized by a user terminal used to produce or deliver the video of user A.
  • the message M1 is recognized by a system that provides the streaming service.
  • the message M1 is recognized by a server that supports the streaming service.
  • the message M1 is recognized by an application that supports the streaming service.
  • the message M1 is recognized by a voice recognition process, a gesture recognition process and/or a facial expression recognition process.
  • the message M1 may be an electrical signal, and can be transmitted and received over wireless connections.
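  • For illustration, a toy recognition pass might scan an already-transcribed voice stream for predetermined words; this is a hypothetical sketch, and real deployments would use dedicated voice, gesture, and facial-expression models for M1.

```python
PREDETERMINED_MESSAGES = ("zoom in", "focus", "pan", "board please")

def recognize_message(transcript):
    """Return the first predetermined word found in the transcript, or None."""
    text = transcript.lower()
    for word in PREDETERMINED_MESSAGES:
        if word in text:
            return word
    return None

assert recognize_message("could you ZOOM IN a little?") == "zoom in"
```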
  • objects O1 are recognized, and a region R1 is determined.
  • the objects O1 are recognized according to the message M1.
  • the recognition of the object O1 follows the receiving of the message M1.
  • the receiving of the message M1 triggers the recognition of the object O1.
  • a recognition of the message M1 is done before the recognition of the object O1.
  • the object O1 is set, taught or determined to be a body part (hands) of user A.
  • the object O1 may be determined to be a non-body object such as a chopping board or a pan.
  • the object O1 may be determined to be a wearable object on user A such as a watch, a bracelet or a sticker.
  • the object O1 may be predetermined or set to be any object in the video of user A.
  • the region R1 is determined to be a region in the vicinity of the object O1.
  • the region R1 may be determined to be a region enclosing or surrounding all objects O1, so that user A may conveniently control the size of the region R1 by controlling the positions of the objects O1 (in this case, the objects O1 are her hands).
  • a distance between an edge of the region R1 and the object O1 may be determined according to the actual practice; one concrete reading is sketched below.
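  • One plausible reading of "enclosing or surrounding all objects" is the smallest box containing every recognized object, padded by that margin; the sketch below assumes axis-aligned (x, y, w, h) boxes.

```python
def enclosing_region(boxes, margin, frame_hw):
    # boxes: list of (x, y, w, h), one per object O1; frame_hw: (height, width)
    x0 = min(x for x, y, w, h in boxes) - margin
    y0 = min(y for x, y, w, h in boxes) - margin
    x1 = max(x + w for x, y, w, h in boxes) + margin
    y1 = max(y + h for x, y, w, h in boxes) + margin
    return (max(x0, 0), max(y0, 0), min(x1, frame_hw[1]), min(y1, frame_hw[0]))

# Two hands as the objects O1: moving them apart enlarges R1 accordingly.
hands = [(100, 100, 50, 50), (300, 120, 50, 50)]
print(enclosing_region(hands, margin=30, frame_hw=(480, 640)))  # (70, 70, 380, 200)
```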
  • different messages M1 may correspond to different predetermined objects O1.
  • user A may choose the object to be recognized, and the region to be determined, simply by sending out the corresponding message.
  • user A may speak “pan,” and then a pan (which is a predetermined object corresponding to the message “pan”) is recognized, and the region R1 is determined to be a region in the vicinity of the pan.
  • an object O1 is recognized by a user terminal used to capture the live video of user A. In some embodiments, an object O1 is recognized by a user terminal used to produce or deliver the video of user A. In some embodiments, an object O1 is recognized by a system that provides the streaming service. In some embodiments, an object O1 is recognized by a server that supports the streaming service. In some embodiments, an object O1 is recognized by an application that supports the streaming service.
  • the region R1 is determined by a user terminal used to capture the live video of user A. In some embodiments, the region R1 is determined by a user terminal used to produce or deliver the video of user A. In some embodiments, the region R1 is determined by a system that provides the streaming service. In some embodiments, the region R1 is determined by a server that supports the streaming service. In some embodiments, the region R1 is determined by an application that supports the streaming service.
  • the region R1 is enlarged such that details of the video content within the region R1 can be seen clearly.
  • the enlarged region R1 may cover or overlap a portion of the video of user A that is outside the region R1.
  • the enlarged region R1 may be displayed on any region of the screen S1.
  • the enlarging process is performed by a user terminal used to capture the live video of user A. In some embodiments, the enlarging process is performed by a user terminal used to produce or deliver the video of user A. In some embodiments, the enlarging process is performed by a system that provides the streaming service. In some embodiments, the enlarging process is performed by a server that supports the streaming service. In some embodiments, the enlarging process is performed by an application that supports the streaming service. In some embodiments, the enlarging process is performed by a user terminal displaying the video of user A, such as a user terminal of a viewer.
  • the user terminal can be configured to capture the region R1 (the region R1 may move according to a movement of an object O1) with a higher resolution than another region outside of the region R1. Therefore, the region of the live video to be enlarged has a higher resolution than a region of the live video not to be enlarged, and the emphasized region carries more information for a viewer to see the details.
  • regions within the display region RA may be processed such that the enlarged region R1 stands out and becomes more obvious.
  • other regions may be darkened or blurred, such that a viewer can focus more easily on the region R1.
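  • A minimal sketch of that emphasis step, assuming numpy frames; blurring the surroundings instead of darkening them would follow the same pattern.

```python
import numpy as np

def emphasize(frame, region, dim=0.35):
    """Darken everything outside the region so it stands out."""
    x0, y0, x1, y1 = region
    out = (frame.astype(np.float32) * dim).astype(frame.dtype)  # darken all
    out[y0:y1, x0:x1] = frame[y0:y1, x0:x1]                     # restore R1
    return out

frame = np.full((480, 640, 3), 200, dtype=np.uint8)  # stand-in frame
highlighted = emphasize(frame, (70, 70, 380, 200))
```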
  • FIG. 3 shows an exemplary streaming in accordance with some embodiments of the present disclosure.
  • the object O1 is determined to be a wearable device or a wearable object on user A.
  • the object O1 moves synchronously with a movement of user A, and the region of the live video to be enlarged moves synchronously with a movement of the object O1. Therefore, it is convenient for user A to determine which region to enlarge or emphasize simply by controlling the position of the object O1.
  • enlarging a region of a live video and/or moving the enlarged region are done with video processes executed by a user terminal, a server, or an application. Therefore, a direction of a video capturing device used to capture the live video can be kept fixed while the region of the live video to be enlarged moves synchronously with the movement of the predetermined object.
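  • In code, keeping the camera fixed while the enlarged view follows the object reduces to re-centering a crop window on every frame. In this hypothetical sketch, `locate` stands in for whatever tracker reports the object's centre.

```python
import numpy as np

def follow_object(frames, locate, view=(200, 150)):
    """Yield a crop of each frame centred on the tracked object.

    The physical camera never moves; only the crop window does.
    """
    w, h = view
    for frame in frames:
        cx, cy = locate(frame)                      # object centre (assumed hook)
        x0 = int(np.clip(cx - w // 2, 0, frame.shape[1] - w))
        y0 = int(np.clip(cy - h // 2, 0, frame.shape[0] - h))
        yield frame[y0:y0 + h, x0:x0 + w]           # region to be enlarged
```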
  • a user may send out a first message to trigger a message recognition process, and then send out a second message to indicate which object to recognize. The recognized object then determines the region to be enlarged.
  • the first message and/or the second message can be or can include a voice message, a gesture message or a facial expression message.
  • the first message can be referred to as a trigger message.
  • user A may speak “focus” or “zoom in” to indicate that whatever he or she sends out next is for recognizing the object O1.
  • user A may speak “pan” such that a pan in the video would be recognized as the object O1.
  • a region in the vicinity of the pan would be enlarged.
  • the above configuration may save the resources used in message recognition.
  • a constantly ongoing message recognition process (which may include comparing the video information with a message table) can focus only on the first message, which may be a single voice message.
  • the second message may have more variants, each corresponding to a different object in the video.
  • the message recognition process for the second message can be turned on only when the first message is received and/or detected.
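  • The two-stage scheme can be sketched as a tiny state machine: a cheap, always-on check for the single trigger word, and a richer object-word matcher that runs only after the trigger fires. The word lists below reuse examples from this disclosure; everything else is illustrative.

```python
class TwoStageRecognizer:
    TRIGGERS = ("focus", "zoom in")
    OBJECT_WORDS = ("pan", "board please")

    def __init__(self):
        self.armed = False  # becomes True once a trigger message is heard

    def feed(self, utterance):
        """Return the object word to recognize, or None."""
        text = utterance.lower()
        if not self.armed:
            # Stage 1: the always-on check looks only for the trigger.
            self.armed = any(t in text for t in self.TRIGGERS)
            return None
        # Stage 2: active for one utterance after the trigger.
        self.armed = False
        return next((w for w in self.OBJECT_WORDS if w in text), None)

r = TwoStageRecognizer()
r.feed("focus")        # arms the recognizer, returns None
print(r.feed("pan"))   # -> "pan": enlarge the region around the pan
```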
  • FIG. 4 shows a schematic configuration of a communication system according to some embodiments of the present disclosure.
  • the communication system 1 may provide a live streaming service with interaction via a content.
  • “content” refers to digital content that can be played on a computer device.
  • the communication system 1 enables a user to participate in real-time interaction with other users on-line.
  • the communication system 1 includes a plurality of user terminals 10, a backend server 30, and a streaming server 40.
  • the user terminals 10, the backend server 30 and the streaming server 40 are connected via a network 90, which may be the Internet, for example.
  • the backend server 30 may be a server for synchronizing interaction between the user terminals and/or the streaming server 40.
  • the backend server 30 may be referred to as the origin server of an application (APP) provider.
  • the streaming server 40 is a server for handling or providing streaming data or video data.
  • the backend server 30 and the streaming server 40 may be independent servers.
  • the backend server 30 and the streaming server 40 may be integrated into one server.
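  • As a topology sketch only (the class and URLs are illustrative placeholders, not any real API), the two deployment choices differ merely in whether the two endpoints coincide:

```python
from dataclasses import dataclass

@dataclass
class CommunicationSystem:
    backend_url: str    # backend server 30: synchronizes interaction
    streaming_url: str  # streaming server 40: handles streaming/video data

# Independent servers, as in one embodiment...
split = CommunicationSystem("https://api.example.test", "https://edge.example.test")
# ...or both roles integrated into one server, as in another.
merged = CommunicationSystem("https://one.example.test", "https://one.example.test")
```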
  • the user terminals 10 are client devices for the live streaming.
  • a user of a user terminal 10 may be referred to as a viewer, a streamer, an anchor, a podcaster, an audience member, a listener or the like.
  • Each of the user terminals 10, the backend server 30, and the streaming server 40 is an example of an information-processing device.
  • the streaming may be live streaming or video replay.
  • the streaming may be audio streaming and/or video streaming.
  • the streaming may include contents such as online shopping, talk shows, talent shows, entertainment events, sports events, music videos, movies, comedy, concerts, group calls, conference calls or the like.
  • FIG. 5 shows a block diagram of a user terminal according to some embodiments of the present disclosure.
  • the user terminal 10S is a user terminal of a streamer or a broadcaster.
  • the user terminal 10S includes a live video capturing unit 12, a message reception unit 13, an object identifying unit 14, a region determining unit 15, an enlarging unit 16, and a transmitting unit 17.
  • the live video capturing unit 12 includes a camera 122 and a microphone 124, and is configured to capture live video data (including audio data) of the streamer.
  • the message reception unit 13 is configured to monitor a voice stream (or, in some embodiments, an image stream) in the live video, and to recognize a predetermined word (for example, “focus” or “zoom-in”) in that stream.
  • the object identifying unit 14 is configured to identify one or more predetermined objects in the live video, and to recognize the identified one or more objects in the image or the live video.
  • the identification of objects may be done with a look-up table keyed by the predetermined word recognized by the message reception unit 13, as described later. In another embodiment, the identification of objects may be done by the message reception unit 13.
  • the region determining unit 15 is configured to determine a region in the live video to be enlarged.
  • the region to be enlarged is a region in the vicinity of the identified or recognized object.
  • the enlarging unit 16 is configured to perform video processes related to enlarging a region of a live video.
  • the camera 122 may be involved in the enlarging process.
  • the transmitting unit 17 is configured to transmit the enlarged live video (or a live video with a region enlarged) to a server (such as a streaming server) if the enlarging process is performed. If an enlarging process is not performed, the transmitting unit 17 transmits the live video captured by the live video capturing unit 12.
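  • Reduced to code, the 10S pipeline is a chain of the units above; this sketch treats each unit as a pluggable callable and mirrors FIG. 5 rather than any real implementation.

```python
class StreamerTerminal:
    """Hypothetical stand-in for user terminal 10S."""

    def __init__(self, recognize, identify, determine, enlarge, transmit):
        self.recognize = recognize   # message reception unit 13
        self.identify = identify     # object identifying unit 14
        self.determine = determine   # region determining unit 15
        self.enlarge = enlarge       # enlarging unit 16
        self.transmit = transmit     # transmitting unit 17

    def on_frame(self, frame, transcript):
        word = self.recognize(transcript)
        if word is None:
            return self.transmit(frame)       # no message: send captured video
        box = self.identify(frame, word)
        if box is None:
            return self.transmit(frame)       # object not found in this frame
        region = self.determine(box, frame.shape[:2])
        return self.transmit(self.enlarge(frame, region))
```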
  • FIG. 6 shows an exemplary look-up table in accordance with some embodiments of the present disclosure, which may be utilized by the object identifying unit 14 of FIG. 5.
  • the column “predetermined word” indicates the words to be identified in the voice stream of the live video.
  • the column “object” indicates the object corresponding to each predetermined word to be recognized. In this example, an identified “zoom-in” leads to recognition of the streamer’s hand in the live video, an identified “pan” leads to recognition of a pan in the live video, and an identified “board please” leads to recognition of a chopping board in the live video.
  • the predetermined words or the objects are pre-set by a user. In some embodiments, the predetermined words or the objects may be auto-created through AI or machine learning.
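  • The FIG. 6 table maps naturally onto a plain dictionary; the entries below are the three examples from the figure, and anything beyond them would be whatever a user (or an AI/ML step) registers.

```python
WORD_TO_OBJECT = {
    "zoom-in": "streamer_hand",      # predetermined word -> object to recognize
    "pan": "pan",
    "board please": "chopping_board",
}

def object_for(word):
    """Look up the object corresponding to a recognized predetermined word."""
    return WORD_TO_OBJECT.get(word.lower())
```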
  • processing and procedures described in the present disclosure may be realized by software, hardware, or any combination of these, in addition to what was explicitly described.
  • the processing and procedures described in the specification may be realized by implementing a logic corresponding to the processing and procedures in a medium such as an integrated circuit, a volatile memory, a non-volatile memory, a non-transitory computer-readable medium or a magnetic disk.
  • the processing and procedures described in the specification can be implemented as a computer program corresponding to the processing and procedures, and can be executed by various kinds of computers.
  • the system or method described in the above embodiments may be integrated into programs stored in a computer-readable non-transitory medium such as a solid state memory device, an optical disk storage device, or a magnetic disk storage device.
  • the programs may be downloaded from a server via the Internet and be executed by processors.
  • a person having ordinary skill in the technical field of the present invention may still make many variations and modifications without departing from the teaching and disclosure of the present invention. Therefore, the scope of the present invention is not limited to the embodiments already disclosed, but includes any variation or modification that does not depart from the present disclosure, as covered by the scope of the patent application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The present disclosure relates to a system, a method, and a computer-readable medium for live video processing. The method includes receiving a message from a user, and enlarging a region of the live video in the vicinity of a predetermined object. The present disclosure can enable better presentation of, and focusing on, a live video.
PCT/US2021/073183 2021-09-30 2021-12-30 System, method and computer-readable medium for video processing WO2023129182A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
PCT/US2021/073183 WO2023129182A1 (fr) 2021-12-30 2021-12-30 System, method and computer-readable medium for video processing
JP2022528663A JP7449519B2 (ja) 2021-12-30 2021-12-30 System, method, and computer-readable medium for video processing
US17/881,743 US20230101606A1 (en) 2021-09-30 2022-08-05 System, method and computer-readable medium for video processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2021/073183 WO2023129182A1 (fr) 2021-12-30 2021-12-30 System, method and computer-readable medium for video processing

Related Child Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/073182 Continuation-In-Part WO2023129181A1 (fr) 2021-09-30 2021-12-30 System, method and computer-readable medium for image recognition

Publications (1)

Publication Number Publication Date
WO2023129182A1 (fr) 2023-07-06

Family

ID=87000027

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/073183 WO2023129182A1 (fr) 2021-09-30 2021-12-30 System, method and computer-readable medium for video processing

Country Status (2)

Country Link
JP (1) JP7449519B2 (fr)
WO (1) WO2023129182A1 (fr)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4232419B2 (ja) Transmission device, transmission method, content distribution device, content distribution method, and program
CN107851334A (zh) Information processing device
JP2020021225A (ja) Display control system, display control method, and display control program
TW202133118A (zh) Reality-simulation panorama system and method thereof

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210099505A1 (en) * 2013-02-13 2021-04-01 Guy Ravine Techniques for Optimizing the Display of Videos
US20160205341A1 (en) * 2013-08-20 2016-07-14 Smarter Tv Ltd. System and method for real-time processing of ultra-high resolution digital video
US20210365707A1 (en) * 2020-05-20 2021-11-25 Qualcomm Incorporated Maintaining fixed sizes for target objects in frames

Also Published As

Publication number Publication date
JP7449519B2 (ja) 2024-03-14
JP2024501091A (ja) 2024-01-11

Similar Documents

Publication Publication Date Title
CN111818359B (zh) Method and apparatus for processing live interactive video, electronic device, and server
US8893168B2 (en) Method and system for synchronization of dial testing and audience response utilizing automatic content recognition
US20150334344A1 (en) Virtual Window
CN106105246B (zh) Live display method, apparatus and system
US20180077461A1 (en) Electronic device, interactive method therefor, user terminal and server
US20150326925A1 (en) Embedding Interactive Objects into a Video Session
EP2756671B1 (fr) Cooperative provision of personalized user functions using shared and personal devices
CN111343476A (zh) Video sharing method and apparatus, electronic device, and storage medium
CN108337925A (zh) Method for identifying video segments and displaying the option to view from an alternative source and/or on an alternative device
JP6289651B2 (ja) Method and apparatus for synchronizing playback on two electronic devices
EP3316582B1 (fr) Multimedia information processing method and system, standardized server and live broadcast terminal
US20150029342A1 (en) Broadcasting providing apparatus, broadcasting providing system, and method of providing broadcasting thereof
CN105163152A (zh) Interactive access method for a television interaction system
KR101900471B1 (ko) Broadcasting system with inserted reaction effects
US9774907B1 (en) Tailored audio content delivery
WO2015035247A1 (fr) Virtual window
WO2019119643A1 (fr) Terminal and interaction method for mobile live streaming, and computer-readable storage medium
US9332206B2 (en) Frame sharing
CN113784180A (zh) Video display method, video pushing method, apparatus, device and storage medium
US11825170B2 (en) Apparatus and associated methods for presentation of comments
US10764535B1 (en) Facial tracking during video calls using remote control input
WO2023129182A1 (fr) System, method and computer-readable medium for video processing
US20170094367A1 (en) Text Data Associated With Separate Multimedia Content Transmission
CN105187934A (zh) Terminal platform of a television interaction system
TW202327366A (zh) System, method and computer-readable medium for video processing

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase

Ref document number: 2022528663

Country of ref document: JP

121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21970189

Country of ref document: EP

Kind code of ref document: A1