WO2025001850A1 - Video processing method, device and storage medium - Google Patents

Video processing method, device and storage medium

Info

Publication number
WO2025001850A1
WO2025001850A1, PCT/CN2024/098739, CN2024098739W
Authority
WO
WIPO (PCT)
Prior art keywords
target
video
conference
shared data
terminal
Prior art date
Application number
PCT/CN2024/098739
Other languages
English (en)
French (fr)
Inventor
杨亮 (YANG Liang)
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2025001850A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/15Conference systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/451Execution arrangements for user interfaces
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40Support for services or applications
    • H04L65/401Support for services or applications wherein the services involve a main real-time session and one or more additional parallel real-time or time sensitive sessions, e.g. white board sharing or spawning of a subconference

Definitions

  • the present disclosure relates to the field of video processing technology, and in particular to a video processing method, device and storage medium.
  • Video conferencing uses voice and video to achieve remote face-to-face communication, while data conferencing uses a shared data screen (such as a shared whiteboard or a shared desktop) to achieve remote information exchange.
  • In many remote conference scenarios, the video conference and the data conference need to work together synchronously to meet user needs.
  • In one aspect, a video processing method is provided, including: obtaining an identification of a target person and a shared data screen, the shared data screen being a video screen corresponding to shared data; wherein the target person is a person among the participants who participates in the discussion and/or operates the shared data; the participants are persons who use a participating terminal to take part in a video conference and a data conference, and the shared data is the data shared by the data conference; superimposing the identification of the target person on the shared data screen to obtain a first target video; and sending the first target video to the participating terminal.
  • In another aspect, a video processing device is provided, including: an acquisition unit, used to acquire the identification of a target person and a shared data screen, the shared data screen being a video screen corresponding to shared data; wherein the target person is a person among the participants who participates in the discussion and/or operates the shared data; the participants are persons who use a participating terminal to take part in a video conference and a data conference, and the shared data is the data shared by the data conference; a processing unit, used to superimpose the identification of the target person on the shared data screen to obtain a first target video; and a sending unit, used to send the first target video to the participating terminal.
  • In yet another aspect, a video processing device is provided, comprising a memory and a processor, the memory and the processor being coupled; the memory is used to store a computer program, and the processor implements the video processing method described in any one of the above aspects or embodiments when executing the computer program.
  • In yet another aspect, a computer-readable storage medium is provided, on which computer program instructions are stored; when the computer program instructions are executed by a processor, the video processing method described in any one of the above aspects or embodiments is implemented.
  • In yet another aspect, a computer program product is provided, which includes computer program instructions; when the computer program instructions are executed by a processor, the video processing method described in any one of the above aspects or embodiments is implemented.
  • FIG. 1 is a schematic diagram of an implementation environment involved in a video processing method provided in some embodiments of the present disclosure.
  • FIG. 2 is a flowchart of a video processing method provided by some embodiments of the present disclosure.
  • FIG. 3 is a schematic diagram of a terminal interface provided by some embodiments of the present disclosure.
  • FIG. 4 is a schematic diagram of another terminal interface provided in some embodiments of the present disclosure.
  • FIG. 5 is a schematic diagram of another terminal interface provided by some embodiments of the present disclosure.
  • FIG. 6 is a schematic diagram of another terminal interface provided in some embodiments of the present disclosure.
  • FIG. 7 is a schematic diagram of another terminal interface provided in some embodiments of the present disclosure.
  • FIG. 8 is a schematic diagram of another terminal interface provided by some embodiments of the present disclosure.
  • FIG. 9 is a schematic diagram of the structure of a video processing device provided in some embodiments of the present disclosure.
  • FIG. 10 is a schematic diagram of the structure of another video processing device provided in some embodiments of the present disclosure.
  • The terms “first”, “second”, etc. are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features.
  • Thus, a feature defined with “first”, “second”, etc. may explicitly or implicitly include one or more of the features.
  • In conventional technology, an auxiliary video channel is usually added to the video conference to run the data conference synchronously.
  • This method displays the shared data screen on multiple conference terminals through the added auxiliary video. The user can see the content written on the shared data screen on the screen of the conference terminal.
  • In conventional video conferences and data conferences, the video and the shared data are displayed separately, so participants cannot see the shared data and the persons discussing or operating the shared data on the same interface at the same time.
  • When multiple persons participate in the discussion or write on the shared data screen, the other participants cannot know in time which person is currently speaking or writing, which is inconvenient for the participants and reduces their experience and sense of realism.
  • In view of this, the embodiments of the present disclosure propose a video processing method in which the identification of the target person is superimposed on the shared data screen and sent to the participating terminal.
  • The video screen displayed on the participating terminal thus includes both the identification of the target person and the shared data screen.
  • When the target person speaks or writes on the shared data screen, the participating terminal synchronously displays the identification of that target person, so that the participants can perceive in time the identity of the person currently speaking or writing, which increases the participants' sense of realism and experience.
  • FIG. 1 shows an implementation environment involved in a video processing method provided by an embodiment of the present disclosure; the implementation environment includes: a multi-point control unit (MCU) 110, a data conferencing device 120 and a terminal 130.
  • MCU 110 is the key device in a multi-point video conference system. Its role is equivalent to that of a switch: it synchronously demultiplexes the information flows from the conference sites, extracts audio, video, data and signaling, feeds the information and signaling of each site into the same processing module to complete the corresponding audio mixing or switching, video mixing or switching, data broadcasting and routing, timing and conference control, and finally recombines the various information required by each site and sends it to the corresponding terminal system devices.
  • MCU 110 is used to hold a video conference, encode and decode videos, and coordinate communications between various devices.
  • MCU 110 can support multiple terminals to conduct a video conference, and the terminals participating in the video conference all communicate with MCU 110.
  • MCU 110 manages video conference resources, negotiates between multiple terminals to determine the audio or video encoder/decoder to be used, and can process media streams.
  • the MCU 110 includes a decoding module 111 and a data conference processing module 112.
  • the decoding module 111 may be a hardware or software module having video decoding and video processing functions.
  • the data conference processing module 112 may be a hardware or software module for communicating with a data conference device.
  • the data conference device 120 is used to hold a data conference and can support multiple terminals to share data.
  • The shared data is, for example, content displayed on a shared desktop or content displayed on a shared whiteboard.
  • the data conferencing device 120 may be a computing device such as a server, a terminal, a notebook, a desktop computer, or a handheld computer.
  • the terminal 130 is used to participate in a video conference and/or a data conference, and can send and receive information, video or data conference requests, etc., and can encode and decode videos, and directly or indirectly capture videos.
  • A terminal 130 participating in a video conference and/or a data conference is called a participating terminal, and a participating terminal that participates in the discussion and/or operates the shared data is called a target terminal.
  • the terminal 130 may include a mobile phone, a tablet computer, a desktop computer, a laptop computer, a PDA or other devices.
  • the MCU 110 and the data conferencing device 120 can be integrated together or independently set and connected through a network.
  • the disclosed embodiment does not limit how the MCU 110 and the data conferencing device 120 are specifically set.
  • There may be one or more of each of the MCU 110, the data conferencing device 120 and the terminal 130. The devices may be located in different places and are connected through the network. The embodiment of the present disclosure does not limit the number of the above devices.
  • In one application scenario, multiple terminals 130 conduct a video conference through the MCU 110 and a data conference through the data conference device 120. Some of the terminals 130 participate in the discussion or operate the shared data screen. After the MCU 110 obtains the identifications of the target persons of those terminals, it superimposes the identifications of the target persons on the shared data screen and sends the result to the multiple terminals 130, whose displayed screens then contain both the identifications of the target persons and the shared data screen.
  • the video processing method provided in the embodiment of the present disclosure can be applied to MCU 110.
  • the execution subject of the video processing method provided in the embodiment of the present disclosure can also be a video processing device.
  • The video processing device can be an MCU, or a control module in the MCU for executing the video processing method. The following description takes the case where the method provided in the embodiments of the present disclosure is applied to an MCU as an example.
  • FIG. 2 is a flowchart of a video processing method provided by an embodiment of the present disclosure. As shown in FIG. 2, the method may include S101-S109.
  • S101 The MCU creates a video conference.
  • Video conferencing refers to a meeting in which people in multiple locations have face-to-face conversations using communication equipment and the Internet.
  • the MCU may first receive an operation to create a video conference, and create a video conference in response to the operation.
  • the MCU may receive a request to create a video conference sent by any participating terminal, and respond to the request to create a video conference.
  • the above-mentioned participating terminals are terminals participating in the video conference.
  • S102 The MCU establishes a video conference connection with the participating terminals.
  • the MCU establishes a video conference connection with the participating terminal, that is, the participating terminal joins the video conference created by the MCU.
  • the MCU sends a video conference invitation to the participating terminal based on the identifier of the participating terminal. After the participating terminal responds to the video conference invitation, the MCU establishes a video conference connection with the participating terminal.
  • the MCU is preset with the identification of the participating terminal.
  • the identification of the participating terminal preset by the MCU can be directly input on the MCU, can be pre-stored on the MCU, or can be sent by any terminal, which is not limited in the embodiments of the present disclosure.
  • the identification of the participating terminal can be the code (identity document, ID) of the participating terminal or the Internet protocol (internet protocol, IP) address of the participating terminal.
  • the participating terminals are two mobile phones, and the identifications of the participating terminals are the mobile phone numbers "18000xxxx01" and "18023xxxx28" of the two mobile phones.
  • After the MCU creates the video conference, it dials the mobile phone numbers "18000xxxx01" and "18023xxxx28" respectively to invite the participating terminals to join the video conference.
  • the participating terminals accept the invitation sent by the MCU, the MCU completes establishing a video conference connection with the participating terminals.
  • the MCU accepts a request from a participant terminal to join a video conference to establish a video conference connection with the participant terminal.
  • After the MCU creates a video conference, it generates a conference number "AABB0102". Participants using a participating terminal can join the video conference by entering "AABB0102" in the conference joining interface displayed on the terminal.
  • FIG. 3 shows a schematic diagram of the interface of a participating terminal with the theme "XX Conference", which includes a "Conference Number" input field. After the video conference number "AABB0102" is entered and "Confirm" is clicked, the terminal joins the video conference.
  • S103 The MCU sends a creation request to the data conference device.
  • the create request is used to create a data conference for the participating terminals.
  • a data conference is a meeting where people in multiple locations discuss or operate shared data through communication equipment and the Internet.
  • the process can be implemented through S103a-S103c.
  • S103a The MCU sends a creation request to the data conference device.
  • the MCU may first receive an operation to create a data conference, and in response to the operation, send a creation request to the data conference device.
  • the MCU may receive a request for creating a data conference sent by any participating terminal, and in response to the request, send a creation request to the data conference device.
  • S103b The data conference device creates a data conference in response to the creation request sent by the MCU.
  • S103c The data conference device establishes a data conference connection with the participating terminal.
  • the data conference device establishes a data conference connection with the participating terminal, indicating that the participating terminal has joined the data conference created by the data conference device.
  • the following are three methods for establishing a data conference connection between a data conference device and a participating terminal.
  • the data conferencing device sends a data conference invitation to the participating terminal based on the identifier of the participating terminal. After the participating terminal responds to the data conference invitation, the data conferencing device establishes a data conference connection with the participating terminal.
  • the identification of the participating terminal may be sent to the data conference device by the MCU, or may be sent to the data conference device by any terminal.
  • the disclosed embodiment does not limit how the data conference device obtains the identification of the participating terminal.
  • the participating terminals are two mobile phones, and the identification of the participating terminals is the mobile phone numbers "18000xxxx01" and "18023xxxx28".
  • After the data conferencing device creates the data conference, it dials the mobile phone numbers "18000xxxx01" and "18023xxxx28" respectively to invite the participating terminals to join the data conference.
  • After the participating terminals accept the invitation, the data conferencing device establishes a data conference connection with them.
  • the data conference device accepts a request from a conference participant terminal to join a data conference, so as to establish a data conference connection with the conference participant terminal.
  • After the data conference device creates a data conference, it generates a conference number "AABB0101".
  • Participants using a participating terminal can join the data conference by entering "AABB0101" in the conference joining interface displayed on the terminal.
  • FIG. 4 shows a schematic diagram of the interface of a participating terminal with the theme "XX Conference", which includes a "Conference Number" input field. After the data conference number "AABB0101" is entered and "Confirm" is clicked, the terminal joins the data conference.
  • After the data conference device creates the data conference, it first sends a message indicating that the data conference has been created to the MCU. After receiving this message, the MCU sends a prompt message to the participating terminals that have joined the video conference, prompting them to join the data conference.
  • FIG. 5 shows a schematic diagram of the interface of a participating terminal with the theme "XX Conference", which includes the prompt message "Please join the data conference" sent by the MCU to the participating terminal that has joined the video conference. If "Join" is selected in the interface, the participating terminal joins the data conference.
  • the MCU may first execute S102 to establish a video conference connection with the participating terminal, or first execute S103 to enable the data conference device to create a data conference, or execute S102 and S103 simultaneously.
  • the embodiment of the present disclosure does not limit the order of executing S102 and S103.
  • the embodiments of the present disclosure do not limit the order in which the participating terminals join a video conference or a data conference.
  • S104 (in some embodiments): The data conferencing device receives requests from target terminals to participate in the discussion and/or operate the shared data, creates a target list, and sends the target list to the MCU.
  • the target list is used to record the identifiers of the target terminals.
  • the target terminal is the terminal used by the target person among the participating terminals.
  • the target terminal can be one terminal or multiple terminals.
  • Target persons are those who participate in discussions and/or operate shared data among the participants; participants are those who participate in video conferences and data conferences using participating terminals.
  • Shared data refers to the data shared in a data conference.
  • For example, the shared data may be content written on a shared whiteboard or content displayed on a shared desktop (such as a document).
  • the content displayed on the shared desktop can be the content displayed on the shared desktop of any participating terminal.
  • FIG. 6 shows a schematic diagram of a conference interface with the theme “XX Conference” on a participating terminal; the interface includes a video display area for displaying the first target video.
  • After the participating terminal establishes a data conference connection with the data conference device, a function button for applying to “Participate in the Discussion” is displayed on the terminal interface.
  • A participant joins the discussion or explains the shared data by selecting “Participate in the Discussion” on the terminal interface.
  • At this point, a participating terminal that has joined “Participate in the Discussion” is a target terminal, and the person who uses that target terminal to participate in the discussion or operate the shared data is a target person.
  • the target list can be updated in real time.
  • When a target terminal joins or withdraws from the discussion or operation of the shared data, the data conferencing device can add or delete the identifier of that target terminal in the target list in real time and send the updated target list to the MCU.
  • S105: The MCU obtains the identification of the target person.
  • In some embodiments, when the identification of the target person is a video including the target person's picture, the MCU obtaining the identification of the target person includes: the MCU obtaining a second target video.
  • The second target video is a video, collected by the target terminal, that includes the target person's picture; when there are multiple target terminals, there are correspondingly multiple second target videos.
  • The video collected by the target terminal that includes the target person's picture refers to the video captured, during the video conference, by the camera of the target terminal or a camera connected to the target terminal.
  • This video includes the target person's picture, that is, the picture of the person participating in the discussion or operating the shared data.
  • In one possible implementation, an example implementation of S105 may include S105a-S105b.
  • S105a: Based on the identifiers of the target terminals in the target list, the MCU sends requests for obtaining the identification of the target person to the target terminals.
  • In one possible implementation, if the target list contains the IP addresses of multiple target terminals, the MCU sends a request for obtaining the identification of the target person to each of those IP addresses.
  • S105b: After receiving the request sent by the MCU, the target terminal sends the identification of the target person to the MCU.
  • In some embodiments, when the identification of the target person is a video including the target person's picture, the target terminal sends the collected second target video to the MCU.
  • Correspondingly, the MCU receives the second target video sent by the target terminal.
  • Since the video conference proceeds in real time, the second target video is a video stream collected by the target terminal in real time.
  • the identifier of the target person may also be a name, nickname, and/or work number, etc., which is not limited in the embodiments of the present disclosure.
  • In general, participants (including target persons) take part in video conferences or data conferences by logging into conference software on the participating terminal.
  • When logging into the conference software, a participant needs an account name and password.
  • The account name can be the user's name, mobile phone number, ID card information, nickname or work number, etc.
  • The account name can therefore serve as the participant's identification.
  • The identification of the target person can take various forms, and using the target person's picture as the identification represents the target person more intuitively.
  • When the first target video containing the target person's picture is later displayed, the participants can clearly see the target person's movements and expressions, which improves the participants' experience.
  • S106: The MCU obtains the shared data screen. The shared data screen includes: a shared whiteboard or a shared desktop.
  • In one possible implementation, the shared data screen is a shared whiteboard.
  • The MCU sends a request for obtaining the shared whiteboard to the data conferencing device.
  • After receiving the request, the data conferencing device creates a shared whiteboard and sends it to the MCU.
  • In another possible implementation, the shared data screen is a shared desktop.
  • The participating terminal sends the shared desktop to the data conferencing device, and the data conferencing device forwards it to the MCU after receiving it.
  • S107 The MCU superimposes the identification of the target person on the shared data screen to obtain a first target video.
  • When the identification of the target person is a video including the target person's picture, one possible implementation is presented below, which may include S107a-S107e.
  • S107a The MCU extracts the image of the target person from the image frame of the second target video to obtain a third target video.
  • the MCU performs boundary value analysis on the image frame of the second target video to identify the image of the target person in the image frame of the second target video.
  • the MCU extracts the image of the target person in the image frame of the second target video to obtain a third target video.
  • the MCU when there are multiple second target videos, the MCU extracts images of the target person from the image frames of the multiple second target videos respectively, and obtains multiple third target videos accordingly.
  • the MCU performs boundary value analysis on each image frame of the second target video, identifies the target person image in each image frame, extracts the target person image in each image frame, and then combines each image frame containing only the target person image into a third target video.
  • This method is only one possible implementation mode, and other methods may also be used to extract the image of the target person in the second target video, which is not limited in the present disclosure.
  • S107b (in some embodiments): Based on the size of the shared data screen, the MCU adjusts the size of the target person picture in the third target video to obtain a fourth target video.
  • Adjusting the size of the target person picture in the third target video includes: reducing or enlarging the size of the target person picture in the third target video.
  • This exemplary step prevents the target person picture in the third target video from being too large or too small relative to the shared data screen, which would otherwise produce a poor superimposition effect.
  • S107c The MCU determines the target position on the shared data screen.
  • The target position is the position where the fourth target video is superimposed on the shared data screen; when there are multiple fourth target videos, there are correspondingly multiple target positions.
  • The multiple target positions do not overlap each other and can be located in sequence on the upper, lower, left or right side of the shared data screen.
  • The target positions can also be set in other ways, provided the shared data screen is not blocked, which is not limited in this disclosure.
  • S107d The MCU superimposes the fourth target video on the target position of the shared data screen.
  • the MCU overlays the plurality of fourth target videos on the plurality of target positions of the shared data screen respectively.
  • S107e: When the target person pictures in the multiple fourth target videos superimposed on the shared data screen occlude one another, the MCU applies transparency processing to the occluded target person pictures respectively, to obtain the first target video.
  • Since the target persons in the videos collected by the target terminals are moving, occlusion among multiple target persons sometimes occurs and sometimes does not; S107e applies transparency processing only when at least two target persons occlude each other, or applies it locally only to the occluded part, and no transparency processing is needed when there is no occlusion.
  • The transparency processing prevents the key movements of an occluded target person from being hidden by other target persons, while rendering opaquely when there is no occlusion lets the participants see the target persons' movements more clearly. Dynamically making the target person pictures transparent based on the occlusion situation can enhance the participants' experience.
  • FIG. 7 shows a schematic diagram of one frame of the first target video displayed on a target terminal; the frame includes target person A, target person B, target person C, target person D and a shared whiteboard.
  • The "Xxxxxx" on the shared whiteboard is the shared data discussed or written by the target persons. Since target person C and target person D occlude each other in this frame, the pictures of target person C and target person D are displayed after transparency processing.
  • The above S107e is one possible processing method.
  • In practice, after the MCU superimposes multiple target persons on the shared data screen, the picture of a target person who is discussing or operating the shared data is displayed at the target position, while for a target person who is not discussing or operating the shared data, an identification such as the person's name is displayed at the target position instead.
  • This processing method can also reduce the occurrence of occlusion to a certain extent and makes the screen look cleaner.
  • S108 The MCU sends the first target video to the participating terminals.
  • In some embodiments, when target person pictures are superimposed in the first target video, the MCU may also send identifications such as the names, nicknames and/or work numbers of the target persons to the participating terminal.
  • Sending these identifications enables the participating terminal to display them synchronously on the interface, so that the participants know in time who the target persons are and who is currently discussing or operating the shared data, giving the participants a more intuitive experience.
  • S109 The participating terminal displays the first target video.
  • FIG. 8 shows a schematic diagram of the interface of a participating terminal taking part in the “XX Conference”, in which the first target video and the names of the target persons are displayed; the first target video includes the shared whiteboard and the pictures of the target persons, whose names are “Zhang xx”, “Li xx”, “Wang x” and “Zhao x”, and the shared data “Xxxxxx...” is displayed on the shared whiteboard.
  • When any target person discusses or operates the shared data, a “microphone” icon is displayed to the right of that person's name.
  • the MCU can superimpose the identification of the target person on the shared data screen and send it to each participating terminal, so that the screen displayed by the participating terminal contains both the identification of the target person and the shared data screen.
  • the participating terminal will synchronously display the identification of the target person, allowing the participating persons to timely perceive the identity of the person currently speaking or writing, thereby increasing the sense of reality and experience of the participating persons.
  • In order to implement the above functions, the video processing device contains hardware structures and/or software modules corresponding to each function.
  • The present disclosure can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is executed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present disclosure.
  • The embodiments of the present disclosure may divide the video processing device into functional modules according to the above method embodiments.
  • For example, each functional module may correspond to one function, or two or more functions may be integrated into one functional module; the integrated module may be implemented in the form of hardware or software.
  • It should be noted that the division of modules in the embodiments of the present disclosure is schematic and is only a logical functional division; there may be other division methods in actual implementation. The following takes the division of one functional module per function as an example.
  • FIG. 9 is a schematic diagram of the structure of a video processing device provided by an embodiment of the present disclosure; the video processing device can execute the video processing method provided by the above method embodiments.
  • the video processing device 200 includes: an acquisition unit 201, which is used to acquire the identification of the target person and the shared data screen; the shared data screen is a video screen corresponding to the shared data; wherein the target person is a person who participates in the discussion and/or operates the shared data among the participants; the participants are persons who use the participating terminals to participate in the video conference and the data conference, and the shared data is the data shared by the data conference; the processing unit 202 is used to superimpose the identification of the target person on the shared data screen to obtain the first target video; the sending unit 203 is used to send the first target video to the participating terminal.
  • the acquisition unit 201 can be applied to S105 and S106 in the method embodiment
  • the processing unit 202 can be applied to S107 in the method embodiment
  • the sending unit 203 can be applied to S108 in the method embodiment.
  • In some embodiments, the identification of the target person is a video including the target person's picture; the acquisition unit 201 can be used to acquire a second target video; the second target video is a video, collected by the target terminal, that includes the target person's picture; the target terminal is the terminal used by the target person among the participating terminals.
  • the shared data screen includes: a shared whiteboard or a shared desktop; the shared data includes: content written on the shared whiteboard or content displayed on the shared desktop.
  • the video processing device 200 further includes a connection unit 204, which is used to establish a video conference connection with a participating terminal before obtaining the identification of the target person and the shared data screen; the sending unit 203 is also used to initiate a creation request to the data conference device, and the creation request is used to create a data conference for the participating terminal.
  • the connection unit 204 can be applied to S103 in the method embodiment, and the sending unit 203 can be applied to S104 in the method embodiment.
  • the video processing device 200 further includes a receiving unit 205, which is used to receive a target list sent by the data conference device after initiating a creation request to the data conference device; the target list is used to record the identification of the target terminal; the target terminal is the terminal used by the target person in the participating terminal; the acquisition unit 201 can be used to receive the shared data screen sent by the data conference device; based on the identification of the target terminal in the target list, send a request to the target terminal to obtain the identification of the target person; receive the identification of the target person sent by the target terminal.
  • the receiving unit 205 can be applied to S104 in the method embodiment
  • the acquisition unit 201 can be applied to S105a-S105b in the method embodiment.
  • the processing unit 202 can be used to extract the image of the target person from the image frame of the second target video to obtain a third target video; adjust the size of the target person image in the third target video based on the size of the shared data screen to obtain a fourth target video; determine the target position on the shared data screen; and superimpose the fourth target video on the target position in the shared data screen to obtain the first target video.
  • the processing unit 202 can be applied to S107a-S107e in the method embodiment.
  • the processing unit 202 can be used to, when there are multiple fourth target videos, determine multiple target positions on the shared data screen; respectively superimpose the multiple fourth target videos on the multiple target positions in the shared data screen; when there is occlusion between the target person images in the multiple fourth target videos superimposed on the shared data screen, respectively perform transparency processing on the target person images in the multiple fourth target videos that are blocked to obtain the first target video.
  • the processing unit 202 can be applied to S107a-S107e in the method embodiment.
  • the plurality of target locations are located at the top, bottom, left, or right of the shared data screen.
  • the processing unit 202 can be used to perform boundary value analysis on the image frame of the second target video to identify the image of the target person in the image frame of the second target video; extract the image of the target person in the image frame of the second target video to obtain the third target video.
  • the processing unit 202 can be applied to S107a in the method embodiment.
  • the embodiment of the present disclosure provides another possible structure of the video processing device involved in the above-mentioned embodiment.
  • As shown in FIG. 10, the video processing device 300 includes: a processor 302 and a bus 304.
  • In some embodiments, the video processing device may further include a memory 301; in some embodiments, it may further include a communication interface 303.
  • The processor 302 can implement or execute the various exemplary logic blocks, modules and circuits described in connection with the embodiments of the present disclosure.
  • The processor 302 can be a central processing unit, a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a transistor logic device, a hardware component or any combination thereof.
  • The processor 302 can also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
  • the communication interface 303 is used to connect with other devices through a communication network.
  • the communication network can be Ethernet, wireless access network, wireless local area network (WLAN), etc.
  • the memory 301 may be a read-only memory (ROM) or other types of static storage devices that can store static information and instructions, a random access memory (RAM) or other types of dynamic storage devices that can store information and instructions, or an electrically erasable programmable read-only memory (EEPROM), a disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program codes in the form of instructions or data structures and can be accessed by a computer, but is not limited thereto.
  • the memory 301 may exist independently of the processor 302, and the memory 301 may be connected to the processor 302 via a bus 304 to store instructions or program codes.
  • the processor 302 calls and executes the instructions or program codes stored in the memory 301, the video processing method provided in the embodiment of the present disclosure can be implemented.
  • the memory 301 may also be integrated with the processor 302 .
  • the bus 304 may be an extended industry standard architecture (EISA) bus, etc.
  • the bus 304 may be divided into an address bus, a data bus, a control bus, etc.
  • For ease of representation, only one thick line is used in FIG. 10, but this does not mean that there is only one bus or one type of bus.
  • Some embodiments of the present disclosure provide a computer-readable storage medium (e.g., a non-transitory computer-readable storage medium), in which computer program instructions are stored.
  • When the computer program instructions are run on a computer, the computer executes the video processing method as described in any of the above embodiments.
  • the above-mentioned computer-readable storage media may include, but are not limited to: magnetic storage devices (e.g., hard disks, floppy disks or magnetic tapes, etc.), optical disks (e.g., Compact Disks (CDs), Digital Versatile Disks (DVDs), etc.), smart cards and flash memory devices (e.g., Erasable Programmable Read-Only Memory (EPROMs), cards, sticks or key drives, etc.).
  • the various computer-readable storage media described in the present disclosure may represent one or more devices and/or other machine-readable storage media for storing information.
  • the term "machine-readable storage medium" may include, but is not limited to, wireless channels and various other media capable of storing, containing and/or carrying instructions and/or data.
  • the embodiments of the present disclosure provide a computer program product including instructions.
  • When the computer program product is run on a computer, the computer is caused to execute the video processing method described in any one of the above embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The present disclosure discloses a video processing method, device and storage medium. The method includes: obtaining an identification of a target person and a shared data screen, the shared data screen being a video screen corresponding to shared data; wherein the target person is a person among the participants who participates in the discussion and/or operates the shared data; the participants are persons who use participating terminals to take part in a video conference and a data conference, and the shared data is the data shared by the data conference; superimposing the identification of the target person on the shared data screen to obtain a first target video; and sending the first target video to the participating terminals.

Description

Video processing method, device and storage medium
The present disclosure claims priority to Chinese patent application No. 202310791399.9, filed on June 29, 2023, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of video processing technology, and in particular to a video processing method, device and storage medium.
Background
Video conferencing achieves remote face-to-face communication through voice and video, while data conferencing achieves remote information exchange through a shared data screen (for example, a shared whiteboard or a shared desktop). In many remote conference scenarios, the video conference and the data conference need to work together synchronously to meet user needs.
Summary
In one aspect, a video processing method is provided, including: obtaining an identification of a target person and a shared data screen, the shared data screen being a video screen corresponding to shared data; wherein the target person is a person among the participants who participates in the discussion and/or operates the shared data; the participants are persons who use participating terminals to take part in a video conference and a data conference, and the shared data is the data shared by the data conference; superimposing the identification of the target person on the shared data screen to obtain a first target video; and sending the first target video to the participating terminals.
In another aspect, a video processing device is provided, including: an acquisition unit, used to obtain an identification of a target person and a shared data screen, the shared data screen being a video screen corresponding to shared data; wherein the target person is a person among the participants who participates in the discussion and/or operates the shared data; the participants are persons who use participating terminals to take part in a video conference and a data conference, and the shared data is the data shared by the data conference; a processing unit, used to superimpose the identification of the target person on the shared data screen to obtain a first target video; and a sending unit, used to send the first target video to the participating terminals.
In yet another aspect, a video processing device is provided, including a memory and a processor, the memory and the processor being coupled; the memory is used to store a computer program, and the processor implements the video processing method described in any one of the above aspects or embodiments when executing the computer program.
In yet another aspect, a computer-readable storage medium is provided, on which computer program instructions are stored; when the computer program instructions are executed by a processor, the video processing method described in any one of the above aspects or embodiments is implemented.
In yet another aspect, a computer program product is provided, which includes computer program instructions; when the computer program instructions are executed by a processor, the video processing method described in any one of the above aspects or embodiments is implemented.
Brief Description of the Drawings
To describe the technical solutions in the present disclosure more clearly, the accompanying drawings used in some embodiments of the present disclosure are briefly introduced below. Obviously, the drawings described below are only drawings of some embodiments of the present disclosure, and a person of ordinary skill in the art may obtain other drawings based on these drawings.
FIG. 1 is a schematic diagram of an implementation environment involved in a video processing method provided in some embodiments of the present disclosure.
FIG. 2 is a flowchart of a video processing method provided in some embodiments of the present disclosure.
FIG. 3 is a schematic diagram of a terminal interface provided in some embodiments of the present disclosure.
FIG. 4 is a schematic diagram of another terminal interface provided in some embodiments of the present disclosure.
FIG. 5 is a schematic diagram of another terminal interface provided in some embodiments of the present disclosure.
FIG. 6 is a schematic diagram of another terminal interface provided in some embodiments of the present disclosure.
FIG. 7 is a schematic diagram of another terminal interface provided in some embodiments of the present disclosure.
FIG. 8 is a schematic diagram of another terminal interface provided in some embodiments of the present disclosure.
FIG. 9 is a schematic diagram of the structure of a video processing device provided in some embodiments of the present disclosure.
FIG. 10 is a schematic diagram of the structure of another video processing device provided in some embodiments of the present disclosure.
Detailed Description
The technical solutions in the present disclosure will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present disclosure, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
It should be noted that in the present disclosure, words such as "exemplarily" or "for example" are used to indicate an example, illustration or explanation. Any embodiment or design described as "exemplarily" or "for example" in the present disclosure should not be construed as more preferred or advantageous than other embodiments or designs. Rather, the use of words such as "exemplarily" or "for example" is intended to present related concepts in a concrete manner.
Hereinafter, the terms "first", "second", etc. are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, a feature defined with "first", "second", etc. may explicitly or implicitly include one or more of the features.
In the description of the present disclosure, unless otherwise stated, "/" means "or"; for example, A/B may mean A or B. "And/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may mean: only A, only B, or both A and B. In addition, "at least one" means one or more, and "a plurality of" means two or more.
In conventional technology, an auxiliary video channel is usually added to a video conference to run the data conference synchronously; this method displays the shared data screen on multiple conference terminals through the added auxiliary video channel, and users can see the content written on the shared data screen on the screens of their conference terminals. In conventional video conferences and data conferences, the video and the shared data are displayed separately, so participants cannot see the shared data and the persons discussing or operating the shared data on the same interface at the same time. When multiple persons participate in the discussion or write content on the shared data screen, the other participants cannot know in time which person is currently speaking or writing, which is inconvenient for the participants and reduces their experience and sense of realism.
In view of this, the embodiments of the present disclosure propose a video processing method in which the identification of the target person is superimposed on the shared data screen and sent to the participating terminals. It can be understood that the video screen displayed on a participating terminal contains both the identification of the target person and the shared data screen; when the target person speaks or writes on the shared data screen, the participating terminal synchronously displays the identification of that target person, so that the participants can perceive in time the identity of the person currently speaking or writing, which increases the participants' sense of realism and experience.
The implementation of the embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
Referring to FIG. 1, which shows an implementation environment involved in a video processing method provided by an embodiment of the present disclosure, the implementation environment includes: a multi-point control unit (MCU) 110, a data conference device 120 and a terminal 130.
The MCU 110 is the key device in a multi-point video conference system. Its role is equivalent to that of a switch: it synchronously demultiplexes the information flows from the conference sites, extracts audio, video, data and signaling, feeds the information and signaling of each site into the same processing module to complete the corresponding audio mixing or switching, video mixing or switching, data broadcasting and routing, timing and conference control, and finally recombines the various information required by each site and sends it to the corresponding terminal system devices.
In the embodiments of the present disclosure, the MCU 110 is used to hold video conferences, encode and decode video, and coordinate communication between devices. For example, the MCU 110 can support multiple terminals in a video conference; all terminals participating in the video conference communicate with the MCU 110. The MCU 110 manages video conference resources, negotiates among the terminals to determine the audio or video encoder/decoder to be used, and can process media streams.
In one possible implementation, the MCU 110 includes a decoding module 111 and a data conference processing module 112. The decoding module 111 may be a hardware or software module with video decoding and video processing functions. The data conference processing module 112 may be a hardware or software module that communicates with the data conference device.
The data conference device 120 is used to hold data conferences and can support data sharing among multiple terminals; the shared data is, for example, content displayed on a shared desktop or content displayed on a shared whiteboard.
Exemplarily, the data conference device 120 may be a computing device such as a server, a terminal, a notebook, a desktop computer or a handheld computer.
The terminal 130 is used to participate in video conferences and/or data conferences; it can send and receive information and video or data conference requests, encode and decode video, and directly or indirectly capture video. In the embodiments of the present disclosure, a terminal 130 participating in a video conference and/or a data conference is called a participating terminal, and a participating terminal that participates in the discussion and/or operates the shared data is called a target terminal.
Exemplarily, the terminal 130 may include a mobile phone, a tablet computer, a desktop computer, a laptop computer, a handheld computer or other devices.
In the embodiments of the present disclosure, the MCU 110 and the data conference device 120 may be integrated together, or may be set up independently and connected through a network. The embodiments of the present disclosure do not limit how the MCU 110 and the data conference device 120 are specifically arranged.
In the embodiments of the present disclosure, there may be one or more of each of the MCU 110, the data conference device 120 and the terminal 130; the devices may be located in different places and are connected through a network. The embodiments of the present disclosure do not limit the number of the above devices.
In one application scenario, multiple terminals 130 hold a video conference through the MCU 110 and a data conference through the data conference device 120. Some of the terminals 130 participate in the discussion or operate the shared data screen. After obtaining the identifications of the target persons of those terminals, the MCU 110 superimposes the identifications of the target persons on the shared data screen and sends the result to the multiple terminals 130. The screens displayed by the multiple terminals 130 then contain both the identifications of the target persons and the shared data screen.
The video processing method provided by the embodiments of the present disclosure is described below:
The video processing method provided in the embodiments of the present disclosure can be applied to the MCU 110. The execution subject of the method may also be a video processing device, which may be an MCU or a control module in the MCU for executing the video processing method. The following description takes the case where the method is applied to an MCU as an example.
Referring to FIG. 2, which is a flowchart of a video processing method provided by an embodiment of the present disclosure. As shown in FIG. 2, the method may include S101-S109.
S101: The MCU creates a video conference.
A video conference is a conference in which persons at multiple locations talk face to face through communication equipment and a network.
In some embodiments, the MCU may first receive an operation for creating a video conference and, in response to the operation, create the video conference.
In some embodiments, the MCU may receive a request for creating a video conference sent by any participating terminal and, in response to the request, create the video conference.
The above participating terminals are terminals participating in the video conference.
S102: The MCU establishes a video conference connection with the participating terminals.
The MCU establishing a video conference connection with a participating terminal means that the participating terminal has joined the video conference created by the MCU.
Two ways for the MCU to establish a video conference connection with a participating terminal are presented below.
In one possible implementation, the MCU sends a video conference invitation to the participating terminal based on the identifier of the participating terminal; after the participating terminal responds to the invitation, the MCU establishes the video conference connection with the participating terminal.
The MCU is preset with the identifiers of the participating terminals. The preset identifiers may be input directly on the MCU, stored on the MCU in advance, or sent by any terminal, which is not limited in the embodiments of the present disclosure. As an example, the identifier of a participating terminal may be the code (identity document, ID) of the participating terminal or the Internet protocol (IP) address of the participating terminal.
In one example, the participating terminals are two mobile phones, and the identifiers of the participating terminals are the phone numbers "18000xxxx01" and "18023xxxx28" of the two phones. After creating the video conference, the MCU dials "18000xxxx01" and "18023xxxx28" respectively to invite the participating terminals to join the video conference; after the participating terminals accept the invitation sent by the MCU, the MCU completes establishing the video conference connection with the participating terminals.
In another possible implementation, the MCU accepts a participating terminal's request to join the video conference, thereby establishing the video conference connection with the participating terminal.
In one example, after creating the video conference, the MCU generates a conference number "AABB0102"; a participant using a participating terminal can join the video conference by entering "AABB0102" in the conference joining interface displayed on the terminal. As shown in FIG. 3, which is a schematic diagram of the interface of a participating terminal with the theme "XX Conference", the interface includes a "Conference Number" input field; after the video conference number "AABB0102" is entered and "Confirm" is clicked, the terminal joins the video conference.
S103: The MCU sends a creation request to the data conference device.
The creation request is used to create a data conference for the participating terminals.
A data conference is a conference in which persons at multiple locations discuss or operate shared data through communication equipment and a network.
For example, this process may be implemented through S103a-S103c.
S103a: The MCU sends a creation request to the data conference device.
In some embodiments, the MCU may first receive an operation for creating a data conference and, in response to the operation, send the creation request to the data conference device.
In some embodiments, the MCU may receive a request for creating a data conference sent by any participating terminal and, in response to the request, send the creation request to the data conference device.
S103b: The data conference device creates a data conference in response to the creation request sent by the MCU.
S103c: The data conference device establishes a data conference connection with the participating terminals.
The data conference device establishing a data conference connection with a participating terminal indicates that the participating terminal has joined the data conference created by the data conference device.
Three ways for the data conference device to establish a data conference connection with a participating terminal are presented below.
In one possible implementation, the data conference device sends a data conference invitation to the participating terminal based on the identifier of the participating terminal; after the participating terminal responds to the invitation, the data conference device establishes the data conference connection with the participating terminal.
The identifier of the participating terminal may be sent to the data conference device by the MCU or by any terminal. The embodiments of the present disclosure do not limit how the data conference device obtains the identifier of the participating terminal.
In one example, the participating terminals are two mobile phones with the identifiers "18000xxxx01" and "18023xxxx28". After creating the data conference, the data conference device dials "18000xxxx01" and "18023xxxx28" respectively to invite the participating terminals to join the data conference; after the participating terminals accept the invitation, the data conference device establishes the data conference connection with the participating terminals.
In another possible implementation, the data conference device accepts a participating terminal's request to join the data conference, thereby establishing the data conference connection with the participating terminal.
In one example, after creating the data conference, the data conference device generates a conference number "AABB0101"; a participant using a participating terminal can join the data conference by entering "AABB0101" in the conference joining interface displayed on the terminal. As shown in FIG. 4, which is a schematic diagram of the interface of a participating terminal with the theme "XX Conference", the interface includes a "Conference Number" input field; after the data conference number "AABB0101" is entered and "Confirm" is clicked, the terminal joins the data conference.
In yet another possible implementation, after creating the data conference, the data conference device first sends a message indicating that the data conference has been created to the MCU. After receiving this message, the MCU sends a prompt message to the participating terminals that have joined the video conference, prompting them to join the data conference.
In one example, as shown in FIG. 5, which is a schematic diagram of the interface of a participating terminal with the theme "XX Conference", the interface contains the prompt message "Please join the data conference" sent by the MCU to the participating terminal that has joined the video conference; if "Join" is selected in this interface, the participating terminal joins the data conference.
It can be understood that the MCU may first execute S102 to establish the video conference connection with the participating terminals, or first execute S103 so that the data conference device creates the data conference, or execute S102 and S103 simultaneously. The embodiments of the present disclosure do not limit the order of executing S102 and S103.
The embodiments of the present disclosure likewise do not limit the order in which a participating terminal joins the video conference or the data conference.
S104 (in some embodiments): The data conference device receives requests from target terminals to participate in the discussion and/or operate the shared data, builds a target list, and sends the target list to the MCU.
The target list is used to record the identifiers of the target terminals.
A target terminal is the terminal used by a target person among the participating terminals. There may be one target terminal or multiple target terminals.
Target persons are those among the participants who participate in the discussion and/or operate the shared data; participants are persons who use participating terminals to take part in the video conference and the data conference.
The shared data is the data shared in the data conference, for example, content written on a shared whiteboard or content displayed on a shared desktop (such as a document). The content displayed on the shared desktop may be the content displayed on the shared desktop of any participating terminal.
In one example, as shown in FIG. 6, which is a schematic diagram of a conference interface with the theme "XX Conference" on a participating terminal, the interface includes a video display area for displaying the first target video. After the participating terminal establishes the data conference connection with the data conference device, a function button for applying to "Participate in the Discussion" is displayed on the terminal interface; a participant can join the discussion or explain the shared data by selecting "Participate in the Discussion". At this point, a participating terminal that has joined "Participate in the Discussion" is a target terminal, and the person who uses that target terminal to participate in the discussion or operate the shared data is a target person.
It can be understood that the target list may be updated in real time. When a new target terminal joins the discussion or operation of the shared data at some moment, or a target terminal withdraws from it, the data conference device can add or delete the identifier of that target terminal in the target list in real time and send the updated target list to the MCU, as illustrated by the sketch below.
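By way of illustration, such a real-time target list could be maintained as follows. This is a minimal sketch; the class and method names (TargetList, join_discussion, notify_mcu) and the callback-based delivery to the MCU are assumptions made for illustration and are not specified by the disclosure.

```python
# Minimal sketch of a real-time target list, assuming a callback that
# delivers each updated list to the MCU (all names are illustrative).
from typing import Callable, Set


class TargetList:
    """Records identifiers of target terminals and notifies the MCU on change."""

    def __init__(self, notify_mcu: Callable[[Set[str]], None]):
        self._ids: Set[str] = set()
        self._notify_mcu = notify_mcu

    def join_discussion(self, terminal_id: str) -> None:
        """A terminal applied to 'Participate in the Discussion'."""
        if terminal_id not in self._ids:
            self._ids.add(terminal_id)
            self._notify_mcu(set(self._ids))  # push the updated list

    def leave_discussion(self, terminal_id: str) -> None:
        """A terminal withdrew from the discussion."""
        if terminal_id in self._ids:
            self._ids.discard(terminal_id)
            self._notify_mcu(set(self._ids))


# Usage: print stands in for the network call that sends the list to the MCU.
targets = TargetList(notify_mcu=lambda ids: print("send to MCU:", sorted(ids)))
targets.join_discussion("18000xxxx01")
targets.join_discussion("18023xxxx28")
targets.leave_discussion("18000xxxx01")
```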
S105: The MCU obtains the identification of the target person.
In some embodiments, when the identification of the target person is a video including the target person's picture, the MCU obtaining the identification of the target person includes: the MCU obtaining a second target video.
The second target video is a video, collected by the target terminal, that includes the target person's picture; when there are multiple target terminals, there are correspondingly multiple second target videos.
The video collected by the target terminal that includes the target person's picture refers to the video captured, during the video conference, by the camera of the target terminal or a camera connected to the target terminal; this video includes the target person's picture, that is, the picture of the person participating in the discussion or operating the shared data.
In one possible implementation, an example implementation of S105 may include S105a-S105b.
S105a: Based on the identifiers of the target terminals in the target list, the MCU sends requests for obtaining the identification of the target person to the target terminals.
In one possible implementation, if the target list contains the IP addresses of multiple target terminals, the MCU sends a request for obtaining the identification of the target person to each of those IP addresses, along the lines of the sketch below.
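As a concrete illustration of this fan-out, the following sketch sends one request per IP address in the target list. The plain HTTP transport and the /target-identification endpoint are purely hypothetical; the disclosure does not specify a protocol.

```python
# Sketch of the S105a fan-out, assuming a plain HTTP GET per target
# terminal (the transport and URL path are illustrative assumptions).
import urllib.request


def request_target_identifications(target_ips):
    """Ask each target terminal in the list for its target person identification."""
    replies = {}
    for ip in target_ips:
        url = f"http://{ip}/target-identification"  # hypothetical endpoint
        with urllib.request.urlopen(url, timeout=5) as resp:
            replies[ip] = resp.read()               # e.g. an encoded frame
    return replies
```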
S105b: After receiving the request for obtaining the identification of the target person sent by the MCU, the target terminal sends the identification of the target person to the MCU.
In some embodiments, when the identification of the target person is a video including the target person's picture, the target terminal sends the collected second target video to the MCU.
Correspondingly, the MCU receives the second target video sent by the target terminal.
Since the video conference proceeds in real time, the second target video is a video stream collected by the target terminal in real time.
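For illustration, the target terminal's side of this step could look like the capture loop below, which reads frames from the local camera and hands each encoded frame to a send function. The send_to_mcu parameter is a placeholder assumption; the disclosure does not specify a transport.

```python
# Sketch of a target terminal streaming its camera as the second target
# video; send_to_mcu is a placeholder for whatever transport is used.
import cv2


def stream_second_target_video(send_to_mcu, camera_index: int = 0) -> None:
    cap = cv2.VideoCapture(camera_index)  # camera of the target terminal
    try:
        while cap.isOpened():
            ok, frame = cap.read()        # one BGR image frame
            if not ok:
                break
            # Compress each frame to JPEG before sending it upstream.
            ok, buf = cv2.imencode(".jpg", frame)
            if ok:
                send_to_mcu(buf.tobytes())
    finally:
        cap.release()
```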
In some embodiments, the identification of the target person may also be a name, nickname and/or work number, etc., which is not limited in the embodiments of the present disclosure.
It can be understood that, in general, participants (including target persons) take part in the video conference or the data conference by logging into conference software on the participating terminal. When logging in, a participant needs an account name and password; the account name may be the user's name, phone number, ID card information, nickname or work number, and this account name can serve as the participant's identification.
It can be understood that the identification of the target person can take many forms, and using the target person's picture as the identification represents the target person more intuitively. When the first target video containing the target person's picture is later displayed, the participants can clearly see the target person's movements and expressions, which improves the participants' experience.
S106: The MCU obtains the shared data screen.
The shared data screen includes: a shared whiteboard or a shared desktop.
In one possible implementation, the shared data screen is a shared whiteboard: the MCU sends a request for obtaining the shared whiteboard to the data conference device; after receiving the request, the data conference device creates the shared whiteboard and sends it to the MCU.
In another possible implementation, the shared data screen is a shared desktop: a participating terminal sends the shared desktop to the data conference device, and the data conference device forwards it to the MCU after receiving it.
S107: The MCU superimposes the identification of the target person on the shared data screen to obtain the first target video.
When the identification of the target person is a video including the target person's picture, one possible implementation is presented below, which may include S107a-S107e.
S107a: The MCU extracts the image of the target person from the image frames of the second target video to obtain a third target video.
For example, in a first step, the MCU performs boundary value analysis on the image frames of the second target video to identify the image of the target person in those frames.
In a second step, the MCU extracts the image of the target person from the image frames of the second target video to obtain the third target video.
In some embodiments, when there are multiple second target videos, the MCU extracts the images of the target persons from the image frames of the multiple second target videos respectively, correspondingly obtaining multiple third target videos.
In this possible method, the MCU performs boundary value analysis on each image frame of the second target video, identifies the target person image in each frame, extracts it, and then combines the frames that contain only the target person image into the third target video.
This method is only one possible implementation; other methods may also be used to extract the image of the target person from the second target video, which is not limited in the present disclosure.
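As one concrete possibility for such a per-frame extraction, the sketch below uses OpenCV's GrabCut segmentation as a stand-in for the boundary value analysis; the disclosure does not fix an algorithm, so GrabCut, the rough bounding rectangle, and the function name extract_person are illustrative assumptions.

```python
# Sketch of S107a under the assumption that GrabCut serves as the
# "boundary value analysis": each frame yields the target person's pixels
# plus a foreground mask; the per-frame results form the third target video.
import cv2
import numpy as np


def extract_person(frame: np.ndarray, rect: tuple) -> tuple:
    """Return (person_pixels, mask) for one frame of the second target video.

    rect = (x, y, w, h) is a rough box around the person, e.g. from a detector.
    """
    mask = np.zeros(frame.shape[:2], np.uint8)
    bgd = np.zeros((1, 65), np.float64)   # background model for GrabCut
    fgd = np.zeros((1, 65), np.float64)   # foreground model for GrabCut
    cv2.grabCut(frame, mask, rect, bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)
    # Pixels marked (probably) foreground belong to the target person.
    person_mask = np.where(
        (mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 255, 0
    ).astype(np.uint8)
    person = cv2.bitwise_and(frame, frame, mask=person_mask)
    return person, person_mask
```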
S107b (in some embodiments): Based on the size of the shared data screen, the MCU adjusts the size of the target person picture in the third target video to obtain a fourth target video.
Adjusting the size of the target person picture in the third target video includes: shrinking or enlarging the target person picture in the third target video.
This exemplary step prevents the target person picture in the third target video from being too large or too small relative to the shared data screen, which would produce a poor superimposition effect later.
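For illustration, such a size adjustment might scale the extracted person to a fixed fraction of the shared data screen's height, as sketched below. The 1/4 ratio is an illustrative assumption, not a value taken from the disclosure.

```python
# Sketch of S107b: scale the extracted person (and its mask) so the person
# occupies a fixed fraction of the shared data screen's height.
import cv2
import numpy as np


def resize_person(person: np.ndarray, mask: np.ndarray,
                  screen_h: int, frac: float = 0.25) -> tuple:
    scale = (screen_h * frac) / person.shape[0]  # shrink or enlarge
    size = (max(1, int(person.shape[1] * scale)),
            max(1, int(person.shape[0] * scale)))
    person_resized = cv2.resize(person, size, interpolation=cv2.INTER_AREA)
    mask_resized = cv2.resize(mask, size, interpolation=cv2.INTER_NEAREST)
    return person_resized, mask_resized
```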
S107c: The MCU determines the target position on the shared data screen.
A target position is the position at which a fourth target video is superimposed on the shared data screen; when there are multiple fourth target videos, there are correspondingly multiple target positions.
The multiple target positions do not overlap each other and may be located in sequence on the upper, lower, left or right side of the shared data screen. The target positions may also be set in other ways, as long as the shared data screen is not blocked, which is not limited in the present disclosure.
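As a concrete illustration of one such layout, the sketch below spreads n non-overlapping slots evenly along the bottom edge of the shared data screen. The bottom-edge choice and the slot size are assumptions for illustration; any edge placement that does not block the shared data screen would do.

```python
# Sketch of S107c: lay out n non-overlapping slots along the bottom edge
# of a shared data screen of size (W, H).
def target_positions(n: int, W: int, H: int, slot_w: int, slot_h: int):
    """Return the top-left (x, y) of n equally spaced slots on the bottom edge."""
    assert n * slot_w <= W, "slots must fit side by side without overlapping"
    gap = (W - n * slot_w) // (n + 1)  # equal spacing between slots
    y = H - slot_h                     # anchored to the bottom edge
    return [(gap + i * (slot_w + gap), y) for i in range(n)]


# Example: four 320x240 slots on a 1920x1080 shared whiteboard.
print(target_positions(4, 1920, 1080, 320, 240))
```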
S107d: The MCU superimposes the fourth target video at the target position on the shared data screen.
In some embodiments, the MCU superimposes the multiple fourth target videos at the multiple target positions on the shared data screen respectively.
S107e: When the target person pictures in the multiple fourth target videos superimposed on the shared data screen occlude one another, the MCU applies transparency processing to the occluded target person pictures in the fourth target videos respectively, to obtain the first target video.
Since the target persons in the videos collected by the target terminals are moving, occlusion among multiple target persons sometimes occurs and sometimes does not. S107e applies transparency processing to the target persons only when at least two of them occlude each other, or applies it locally only to the occluded part; no transparency processing is needed when there is no occlusion.
The transparency processing prevents the key movements of an occluded target person from being hidden by other target persons, while rendering opaquely when there is no occlusion lets the participants see the target persons' movements more clearly. This way of dynamically making the target person pictures transparent based on the occlusion situation can enhance the participants' experience.
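By way of illustration, S107d and S107e together could be realized per frame as sketched below: each resized person is pasted at its target position, and only where two person masks intersect (occlusion) are the pixels alpha-blended instead of pasted opaquely. The 0.5 alpha value and the compose_first_frame name are illustrative assumptions; positions are assumed to lie within the screen bounds.

```python
# Sketch of S107d/S107e: paste each person at its target position; where a
# person's mask overlaps an already placed person (occlusion), blend at the
# given alpha instead of pasting opaquely.
import numpy as np


def compose_first_frame(screen: np.ndarray, persons: list, alpha: float = 0.5):
    """persons: list of (image, mask, (x, y)) tuples from S107a-S107c."""
    out = screen.copy()
    covered = np.zeros(screen.shape[:2], dtype=bool)  # pixels already used
    for img, mask, (x, y) in persons:
        h, w = mask.shape
        roi = out[y:y + h, x:x + w]
        fg = mask > 0
        occluded = fg & covered[y:y + h, x:x + w]     # overlap with others
        opaque = fg & ~occluded
        roi[opaque] = img[opaque]                     # no occlusion: opaque
        roi[occluded] = (alpha * img[occluded]        # occlusion: semi-transparent
                         + (1 - alpha) * roi[occluded]).astype(roi.dtype)
        covered[y:y + h, x:x + w] |= fg
    return out
```

Blending only the intersecting pixels matches the "local transparency" variant described above; blending the whole picture of each occluded person would instead apply alpha to every foreground pixel of those persons.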
In one example, if the shared data screen is a shared whiteboard, as shown in FIG. 7, which is a schematic diagram of one frame of the first target video displayed on a target terminal, the frame includes target person A, target person B, target person C, target person D and the shared whiteboard. The "Xxxxxx..." on the shared whiteboard is the shared data discussed or written by the target persons. Since target person C and target person D occlude each other in this frame, the pictures of target person C and target person D are displayed after transparency processing.
The above S107e is one possible processing method. In practice, after the MCU superimposes multiple target persons on the shared data screen, the picture of a target person who is discussing or operating the shared data is displayed at the target position on the shared data screen, while for a target person who is not discussing or operating the shared data, an identification such as the target person's name is displayed at the corresponding target position instead. This processing method can also reduce the occurrence of occlusion to a certain extent and makes the screen look cleaner.
S108: The MCU sends the first target video to the participating terminals.
In some embodiments, when target person pictures are superimposed in the first target video, the MCU may also send identifications such as the names, nicknames and/or work numbers of the target persons to the participating terminals.
It can be understood that sending the names, nicknames and/or work numbers of the target persons to the participating terminals enables the terminals to display these identifications synchronously on the interface, so that the participants know in time who the target persons are and who is currently discussing or operating the shared data, giving the participants a more intuitive experience.
S109: The participating terminal displays the first target video.
In one example, as shown in FIG. 8, which is a schematic diagram of the interface of a participating terminal taking part in the "XX Conference", the interface displays the first target video and the names of the target persons; the first target video contains the shared whiteboard and the pictures of the target persons, whose names are "Zhang xx", "Li xx", "Wang x" and "Zhao x", and the shared data "Xxxxxx..." is displayed on the shared whiteboard. When any target person discusses or operates the shared data, a "microphone" icon is displayed to the right of that person's name.
In the video processing method proposed by the embodiments of the present disclosure, the MCU can superimpose the identification of the target person on the shared data screen and send the result to each participating terminal, so that the screen displayed by the participating terminal contains both the identification of the target person and the shared data screen. When the target person speaks or writes on the shared data screen, the participating terminal synchronously displays the identification of that target person, allowing the participants to perceive in time the identity of the person currently speaking or writing, which increases the participants' sense of realism and experience.
It can be understood that, in order to implement the above functions, the video processing device contains hardware structures and/or software modules corresponding to each function. Those skilled in the art should readily appreciate that, in combination with the algorithm steps of the examples described in the embodiments of the present disclosure, the present disclosure can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is executed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present disclosure.
The embodiments of the present disclosure may divide the video processing device into functional modules according to the above method embodiments; for example, each functional module may correspond to one function, or two or more functions may be integrated into one functional module. The integrated module may be implemented in the form of hardware or software. It should be noted that the division of modules in the embodiments of the present disclosure is schematic and is only a logical functional division; there may be other division methods in actual implementation. The following takes the division of one functional module per function as an example.
FIG. 9 is a schematic diagram of the structure of a video processing device provided by an embodiment of the present disclosure; the video processing device can execute the video processing method provided by the above method embodiments. As shown in FIG. 9, the video processing device 200 includes: an acquisition unit 201, used to obtain the identification of the target person and the shared data screen, the shared data screen being the video screen corresponding to the shared data; wherein the target person is a person among the participants who participates in the discussion and/or operates the shared data; the participants are persons who use participating terminals to take part in the video conference and the data conference, and the shared data is the data shared by the data conference; a processing unit 202, used to superimpose the identification of the target person on the shared data screen to obtain the first target video; and a sending unit 203, used to send the first target video to the participating terminals. For example, with reference to FIG. 2, the acquisition unit 201 can be applied to S105 and S106 in the method embodiments, the processing unit 202 can be applied to S107, and the sending unit 203 can be applied to S108.
In some embodiments, the identification of the target person is a video including the target person's picture; the acquisition unit 201 may be used to obtain the second target video; the second target video is a video, collected by the target terminal, that includes the target person's picture; the target terminal is the terminal used by the target person among the participating terminals.
In some embodiments, the shared data screen includes: a shared whiteboard or a shared desktop; the shared data includes: content written on the shared whiteboard or content displayed on the shared desktop.
In some embodiments, the video processing device 200 further includes a connection unit 204, used to establish the video conference connection with the participating terminals before the identification of the target person and the shared data screen are obtained; the sending unit 203 is further used to initiate the creation request to the data conference device, the creation request being used to create the data conference for the participating terminals. For example, with reference to FIG. 2, the connection unit 204 can be applied to S103 in the method embodiments, and the sending unit 203 can be applied to S104.
In some embodiments, the video processing device 200 further includes a receiving unit 205, used to receive, after the creation request is initiated to the data conference device, the target list sent by the data conference device; the target list is used to record the identifiers of the target terminals; the target terminal is the terminal used by the target person among the participating terminals. The acquisition unit 201 may be used to: receive the shared data screen sent by the data conference device; send, based on the identifiers of the target terminals in the target list, requests for obtaining the identification of the target person to the target terminals; and receive the identification of the target person sent by the target terminal. For example, with reference to FIG. 2, the receiving unit 205 can be applied to S104 in the method embodiments, and the acquisition unit 201 can be applied to S105a-S105b.
In some embodiments, the processing unit 202 may be used to: extract the image of the target person from the image frames of the second target video to obtain the third target video; adjust, based on the size of the shared data screen, the size of the target person picture in the third target video to obtain the fourth target video; determine the target position on the shared data screen; and superimpose the fourth target video at the target position on the shared data screen to obtain the first target video. For example, the processing unit 202 can be applied to S107a-S107e in the method embodiments.
In some embodiments, the processing unit 202 may be used to: when there are multiple fourth target videos, determine multiple target positions on the shared data screen; superimpose the multiple fourth target videos at the multiple target positions on the shared data screen respectively; and, when the target person pictures in the multiple fourth target videos superimposed on the shared data screen occlude one another, apply transparency processing to the occluded target person pictures respectively to obtain the first target video. For example, the processing unit 202 can be applied to S107a-S107e in the method embodiments.
In some embodiments, the multiple target positions are located on the upper, lower, left or right side of the shared data screen.
In some embodiments, the processing unit 202 may be used to: perform boundary value analysis on the image frames of the second target video to identify the image of the target person in those frames; and extract the image of the target person from the image frames of the second target video to obtain the third target video. For example, the processing unit 202 can be applied to S107a in the method embodiments.
When the functions of the above integrated modules are implemented in the form of hardware, the embodiments of the present disclosure provide another possible structure of the video processing device involved in the above embodiments. As shown in FIG. 10, the video processing device 300 includes a processor 302 and a bus 304. In some embodiments, the video processing device may further include a memory 301; in some embodiments, it may further include a communication interface 303.
The processor 302 can implement or execute the various exemplary logic blocks, modules and circuits described in connection with the embodiments of the present disclosure. The processor 302 may be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a transistor logic device, a hardware component or any combination thereof. The processor 302 may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a digital signal processor (DSP) and a microprocessor.
The communication interface 303 is used to connect to other devices through a communication network, which may be an Ethernet, a radio access network, a wireless local area network (WLAN), etc.
The memory 301 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a disk storage medium or other magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, but is not limited thereto.
As one possible implementation, the memory 301 may exist independently of the processor 302 and be connected to the processor 302 via the bus 304 to store instructions or program code. When the processor 302 calls and executes the instructions or program code stored in the memory 301, the video processing method provided by the embodiments of the present disclosure can be implemented.
In another possible implementation, the memory 301 may also be integrated with the processor 302.
The bus 304 may be an extended industry standard architecture (EISA) bus, etc. The bus 304 may be divided into an address bus, a data bus, a control bus, etc. For ease of representation, only one thick line is used in FIG. 10, but this does not mean that there is only one bus or one type of bus.
Some embodiments of the present disclosure provide a computer-readable storage medium (for example, a non-transitory computer-readable storage medium) in which computer program instructions are stored; when the computer program instructions run on a computer, the computer executes the video processing method described in any of the above embodiments.
Exemplarily, the above computer-readable storage medium may include, but is not limited to: magnetic storage devices (for example, hard disks, floppy disks or magnetic tapes), optical disks (for example, compact disks (CDs), digital versatile disks (DVDs)), smart cards and flash memory devices (for example, erasable programmable read-only memory (EPROM), cards, sticks or key drives). The various computer-readable storage media described in the present disclosure may represent one or more devices and/or other machine-readable storage media for storing information. The term "machine-readable storage medium" may include, but is not limited to, wireless channels and various other media capable of storing, containing and/or carrying instructions and/or data.
The embodiments of the present disclosure provide a computer program product containing instructions; when the computer program product runs on a computer, the computer is caused to execute the video processing method described in any of the above embodiments.
The above are only specific implementations of the present disclosure, but the protection scope of the present disclosure is not limited thereto; any variation or replacement within the technical scope disclosed by the present disclosure shall be covered by the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (11)

  1. A video processing method, comprising:
    obtaining an identification of a target person and a shared data screen; the shared data screen is a video screen corresponding to shared data; wherein the target person is a person among the participants who participates in the discussion and/or operates the shared data; the participants are persons who use participating terminals to take part in a video conference and a data conference, and the shared data is the data shared by the data conference;
    superimposing the identification of the target person on the shared data screen to obtain a first target video; and
    sending the first target video to the participating terminals.
  2. The method according to claim 1, wherein the identification of the target person is a video including a picture of the target person; obtaining the identification of the target person comprises:
    obtaining a second target video; the second target video is a video, collected by a target terminal, that includes the picture of the target person; the target terminal is the terminal used by the target person among the participating terminals.
  3. The method according to claim 1 or 2, wherein the shared data screen comprises: a shared whiteboard or a shared desktop; the shared data comprises: content written on the shared whiteboard or content displayed on the shared desktop.
  4. The method according to claim 1 or 2, wherein before obtaining the identification of the target person and the shared data screen, the method further comprises:
    establishing a video conference connection with the participating terminals;
    initiating a creation request to a data conference device, the creation request being used to create the data conference for the participating terminals.
  5. The method according to claim 4, wherein after initiating the creation request to the data conference device, the method further comprises:
    receiving a target list sent by the data conference device; the target list is used to record an identifier of a target terminal; the target terminal is the terminal used by the target person among the participating terminals;
    obtaining the identification of the target person and the shared data screen comprises:
    receiving the shared data screen sent by the data conference device;
    sending, based on the identifier of the target terminal in the target list, a request for obtaining the identification of the target person to the target terminal;
    receiving the identification of the target person sent by the target terminal.
  6. The method according to claim 2, wherein superimposing the identification of the target person on the shared data screen to obtain the first target video comprises:
    extracting an image of the target person from image frames of the second target video to obtain a third target video;
    adjusting, based on a size of the shared data screen, a size of the picture of the target person in the third target video to obtain a fourth target video;
    determining a target position on the shared data screen;
    superimposing the fourth target video at the target position on the shared data screen to obtain the first target video.
  7. The method according to claim 6, wherein, when there are multiple fourth target videos, determining the target position on the shared data screen comprises:
    determining multiple target positions on the shared data screen;
    superimposing the fourth target video at the target position on the shared data screen to obtain the first target video comprises:
    superimposing the multiple fourth target videos at the multiple target positions on the shared data screen respectively;
    when pictures of target persons in the multiple fourth target videos superimposed on the shared data screen occlude one another, applying transparency processing respectively to the occluded pictures of the target persons in the multiple fourth target videos to obtain the first target video.
  8. The method according to claim 6 or 7, wherein the target position is located on an upper side, a lower side, a left side or a right side of the shared data screen.
  9. The method according to claim 6, wherein extracting the image of the target person from the image frames of the second target video to obtain the third target video comprises:
    performing boundary value analysis on the image frames of the second target video to identify the image of the target person in the image frames of the second target video;
    extracting the image of the target person from the image frames of the second target video to obtain the third target video.
  10. A video processing device, comprising a memory and a processor; the memory is coupled to the processor; the memory is configured to store computer program code, the computer program code comprising computer instructions; wherein, when the processor executes the computer instructions, the video processing device executes the video processing method according to any one of claims 1-9.
  11. A computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions; wherein, when the computer instructions run on a video processing device, the video processing device executes the video processing method according to any one of claims 1-9.
PCT/CN2024/098739 2023-06-29 2024-06-12 Video processing method, device and storage medium WO2025001850A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202310791399.9A CN119232872A (zh) 2023-06-29 2023-06-29 Video processing method, device and storage medium
CN202310791399.9 2023-06-29

Publications (1)

Publication Number Publication Date
WO2025001850A1 true WO2025001850A1 (zh) 2025-01-02

Family

ID=93937581

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2024/098739 WO2025001850A1 (zh) 2023-06-29 2024-06-12 Video processing method, device and storage medium

Country Status (2)

Country Link
CN (1) CN119232872A (zh)
WO (1) WO2025001850A1 (zh)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111880695A * 2020-08-03 2020-11-03 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.) Screen sharing method, apparatus, device and storage medium
JP7043110B1 * 2020-10-29 2022-03-29 株式会社パルケ Online conference support device, online conference support program, and online conference support system
CN114489910A * 2022-02-10 2022-05-13 北京字跳网络技术有限公司 (Beijing Zitiao Network Technology Co., Ltd.) Video conference data display method, apparatus, device and medium
CN114942806A * 2022-04-18 2022-08-26 阿里巴巴(中国)有限公司 (Alibaba (China) Co., Ltd.) Interface display method, display processing method and device
CN115002084A * 2022-08-01 2022-09-02 广州迈聆信息科技有限公司 Writing handwriting processing method and apparatus, electronic device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ANONYMOUS: "The Shared Screen Supports Portrait Overlay, Making Online Sharing More Vivid", VOOV MEETING, 5 January 2021 (2021-01-05), XP093253937, Retrieved from the Internet <URL:https://meeting.tencent.com/news/gongxiang.html> *

Also Published As

Publication number Publication date
CN119232872A (zh) 2024-12-31

Similar Documents

Publication Publication Date Title
US8917306B2 (en) Previewing video data in a video communication environment
US9124765B2 (en) Method and apparatus for performing a video conference
CN105407408B (zh) Method for implementing multi-person audio and video on a mobile terminal, and mobile terminal
US9485284B2 (en) Customizing participant information in an online conference
RU2637469C2 (ru) Способ, устройство и система осуществления вызовов в видеоконференциях, основанных на унифицированном общении
US20130198657A1 (en) Integrated Public/Private Online Conference
US20130290870A1 (en) Inter-Conference Alerts for Simultaneous Online Conferences
US20150032809A1 (en) Conference Session Handoff Between Devices
US11924581B2 (en) Multi-device teleconferences
WO2013182056A1 (zh) Video communication method, and terminal, server and system for video communication
US8861704B2 (en) Systems, methods, and computer programs for transitioning from a phone-only mode to a web conference mode
US20130298040A1 (en) Systems, Methods, and Computer Programs for Providing Simultaneous Online Conferences
US11956561B2 (en) Immersive scenes
CN104283857A (zh) Method, apparatus and system for establishing a multimedia conference
WO2016206471A1 (zh) Multimedia service processing method, system and apparatus
WO2025020653A1 (zh) Network call method, apparatus, device and medium
WO2022116033A1 (zh) Collaborative operation method, apparatus, terminal and storage medium
KR20160085302A (ko) Synchronous communication system and method
WO2025001850A1 (zh) Video processing method, device and storage medium
EP4047875A1 (en) Method, computer program and system for configuring a multi-point video conferencing session
WO2009134261A1 (en) Messaging between events
EP2852092A1 (en) Method and system for videoconferencing
JP7456162B2 (ja) Program, communication method, communication terminal and communication system
US20110069143A1 (en) Communications Prior To A Scheduled Event
JP7243323B2 (ja) Communication terminal, communication system, display control method and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24830496

Country of ref document: EP

Kind code of ref document: A1