WO2016008209A1 - Tool for a mobile terminal and server for intelligently integrating audio and video - Google Patents

Tool for a mobile terminal and server for intelligently integrating audio and video (一种移动终端的工具及智能整合音视频的服务器)

Info

Publication number
WO2016008209A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
audio
real
communication
mobile terminal
Prior art date
Application number
PCT/CN2014/086576
Other languages
English (en)
French (fr)
Inventor
宋晨枫
Original Assignee
北京小鱼儿科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京小鱼儿科技有限公司
Priority to US15/326,248 (published as US10349008B2)
Publication of WO2016008209A1

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00 — Television systems
    • H04N 7/14 — Systems for two-way working
    • H04N 7/141 — Systems for two-way working between two video terminals, e.g. videophone
    • H04N 7/147 — Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
    • H04N 7/15 — Conference systems
    • H04N 7/152 — Multipoint control units therefor
    • H04N 7/142 — Constructional details of the terminal equipment, e.g. arrangements of the camera and the display
    • H04N 2007/145 — Handheld terminals

Definitions

  • the present invention relates to communication and image processing technologies, and in particular, to a tool for a mobile terminal and a server for intelligently integrating audio and video.
  • One of the technical problems solved by the present invention is enabling the person monitoring to see the entire monitored scene, rather than only a part of it, when the monitored scene exceeds the shooting range of a single camera.
  • a tool installed in a mobile terminal, comprising: a sending unit configured to, in response to a first trigger, send a request for integrated video of the real-time video collected by a plurality of communication terminals, wherein the plurality of communication terminals each collect real-time video of a part of a specific scene, and the real-time video collected by the plurality of communication terminals is integrated to form real-time video of the specific scene; and a receiving unit configured to receive the integrated video of the real-time video collected by the plurality of communication terminals, wherein the sending unit, based on a first communication terminal set among the plurality of communication terminals corresponding to the video displayed on the display of the mobile terminal, sends a request for integrated audio of the real-time audio collected by the communication terminals in the first communication terminal set, and the receiving unit receives the integrated audio of the real-time audio collected by the communication terminals in the first communication terminal set, wherein the video displayed on the display of the mobile terminal is part of the integrated video of the real-time video collected by the plurality of communication terminals.
  • the tool further includes: a configuration unit configured to receive the user's configuration for integrating the video and audio collected by the plurality of communication terminals.
  • the sending unit further initiates a connection request to the communication terminal in the first communication terminal set, and establishes two-way communication with the communication terminal in the first communication terminal set in response to the automatic response of the communication terminal in the first communication terminal set.
  • the tool further includes: a scaling unit configured to scale the video displayed on the display of the mobile terminal in response to the user zooming that video, so that the first communication terminal set corresponding to the video displayed on the display changes.
  • the tool further includes: a sliding unit configured to slide the video displayed on the display of the mobile terminal in response to a sliding operation by the user on that video, so that the first communication terminal set corresponding to the video displayed on the display changes.
  • the first trigger includes any one of the following: booting of the mobile terminal; activation of the tool while the mobile terminal is powered on; a specific action on the user interface while the mobile terminal is powered on; a specific voice received while the mobile terminal is powered on; the light sensed by the mobile terminal becoming stronger while it is powered on.
  • the sending unit, in response to receiving a selection of a specific person in the specific scene, sends a request for integrated video and audio of the real-time video and audio related to the specific person among the real-time video and audio collected by the plurality of communication terminals, and the receiving unit receives the integrated video and audio related to the specific person among the real-time video and audio collected by the plurality of communication terminals.
  • the sending unit, in response to receiving the selection of the specific person in the specific scene, initiates a connection request to the communication terminal that collects real-time video and audio related to the specific person and, in response to the automatic response of that communication terminal, establishes two-way communication with the communication terminal that collects real-time video and audio related to the specific person.
  • the selection of a particular person in the particular scene is a click on that person in the video displayed on the display of the mobile terminal, or speaking that person's name.
  • a server for intelligently integrating real-time audio and video, comprising: a video and audio receiving device configured to receive real-time video and audio from a plurality of communication terminals, a request from a mobile terminal for integrated video of the real-time video collected by the plurality of communication terminals, and a request from the mobile terminal for integrated audio of the real-time audio collected by the communication terminals in a first communication terminal set among the plurality of communication terminals; a video and audio integration device configured to integrate the real-time video collected by the plurality of communication terminals in response to the request from the mobile terminal for the integrated video, and to integrate the real-time audio collected by the communication terminals in the first communication terminal set in response to the request from the mobile terminal for the integrated audio; and a video and audio transmitting device configured to send the integrated video or/and integrated audio to the mobile terminal.
  • the server further comprises: a communication establishing unit configured to, in response to receiving a connection request from the mobile terminal to the communication terminals in the first communication terminal set, forward the connection request to the communication terminals in the first communication terminal set and, in response to the automatic response of the communication terminals in the first communication terminal set, establish two-way communication between the mobile terminal and the communication terminals in the first communication terminal set.
  • the video and audio integration device includes: a video picture comparison module configured to compare the real-time video collected by the multiple communication terminals in real time and determine the overlapping portions between the real-time videos collected by the multiple communication terminals; and an overlap elimination module configured to eliminate the overlapping portions between the real-time videos collected by the plurality of communication terminals, thereby integrating the real-time video collected by the plurality of communication terminals.
  • the server further includes: an identifying device which, in response to receiving from the mobile terminal a request for integrated video and audio of the real-time video and audio related to the specific person among the real-time video and audio collected by the plurality of communication terminals, identifies the real-time video and audio related to the specific person among the real-time video and audio collected by the plurality of communication terminals; the video and audio integration device integrates the real-time video and audio related to the specific person, and the video and audio transmitting device transmits the integrated real-time video and audio related to the specific person to the mobile terminal.
  • the server further includes: an identifying device which, in response to receiving a connection request from the mobile terminal to the communication terminal that collects real-time video and audio related to the specific person, identifies the real-time video and audio related to the specific person among the real-time video and audio collected by the plurality of communication terminals, thereby identifying the communication terminal that collects real-time video and audio related to the specific person; the communication establishing unit forwards the connection request to the communication terminal that collects real-time video and audio related to the specific person and, in response to the automatic response of that communication terminal, establishes two-way communication between the mobile terminal and the communication terminal that collects real-time video and audio related to the specific person.
  • a plurality of communication terminals each collect real-time video of a part of a specific scene, and the real-time video collected by the plurality of communication terminals is integrated to form real-time video of the specific scene; thus, after the mobile terminal sends a request for the integrated video, the integrated video can be displayed on the mobile terminal, achieving the effect that the person monitoring can see the entire monitored scene even when the monitored scene exceeds the shooting range of a single camera.
  • since the monitored scene is, for example, long and narrow, the monitoring user may at a certain point in time only need to monitor a part of the scene, that is, see the video of that part of the scene and hear its audio; therefore, an embodiment of the present invention may, based on the first communication terminal set among the plurality of communication terminals corresponding to the video displayed on the display of the mobile terminal, transmit a request for integrated audio of the real-time audio collected by the communication terminals in the first communication terminal set, and receive only that integrated audio.
  • when the mobile terminal receives the integrated video of the real-time video collected by the plurality of communication terminals, it automatically knows, from the size of the display and the size of the picture currently shown on the display, which part of the integrated video the displayed video corresponds to, as well as the corresponding first communication terminal set, and obtains the integrated audio of the real-time audio collected by the communication terminals in that set; that is, this embodiment ensures that the video displayed on the display and the audio heard by the user correspond, effectively avoiding the interference with the displayed partial video that would be caused by receiving all the audio.
  • the tool of an embodiment of the present invention further includes a configuration unit configured to receive the user's configuration for integrating the video and audio collected by the plurality of communication terminals; that is, the plurality of communication terminals are specified by the user and bound to the user's mobile terminal, so that on the next first trigger the tool knows which communication terminals' integrated video to request. In this way, the user can specify the communication terminals, bound to his or her mobile terminal, whose video and audio are to be integrated, achieving the beneficial effect that the user can flexibly specify such terminals as needed.
  • the tool installed in the mobile terminal may initiate a connection request to the communication terminals in the first communication terminal set and, in response to the automatic response of the communication terminals in the first communication terminal set, establish two-way communication with them; this embodiment can therefore automatically initiate a connection request to the communication terminals in the identified set and establish communication with them, so that the monitoring user can communicate two-way with whoever is shown on the display as easily as watching them. Current conference monitoring systems cannot do this; it is a pioneering feature for surveillance systems.
  • because the communication terminals in the first communication terminal set answer automatically, people in the monitored conference scene, for example, do not notice the handover; seamless conference monitoring is realized, and the fluency of the conference and the call is not interrupted.
  • the tool installed in the mobile terminal provided by one embodiment of the present invention may further include a zooming unit and/or a sliding unit, so that the first communication terminal set corresponding to the video displayed on the display changes in response to the user's zooming and/or sliding operation.
  • the user can thus scale and move the video picture at will according to viewing needs: if the monitoring user wants to talk to another person in the monitored scene, swiping the video picture changes what the display shows; if there is more than one person in the currently monitored picture but the monitoring user only wants to talk to one of them, zooming enlarges the picture until only that person is shown, so the monitoring user can select, as desired, the person in the monitored scene to watch and communicate with.
  • the tool installed in the mobile terminal provided by one embodiment of the present invention can, in response to receiving a selection of a specific person in the specific scene, transmit a request for integrated video and audio of the real-time video and audio related to the specific person among the real-time video and audio collected by the plurality of communication terminals, and receive the integrated video and audio related to the specific person. Thus, when the user next to the mobile terminal is clear about which people in the specific scene they need to watch or talk to in real time, simply speaking or entering those people's names quickly locks onto and displays the integrated video and audio involving those people, without zooming or sliding the picture on the display, saving the time and effort of manual screening. This too is a pioneering feature for conference monitoring systems.
  • the tool installed in the mobile terminal provided by one embodiment of the present invention can, in response to receiving a selection of a specific person in the specific scene, initiate a connection request to the communication terminal that collects real-time video and audio related to the specific person and, in response to that terminal's automatic response, establish two-way communication with it. Thus, when the user next to the mobile terminal is clear about whom in the specific scene they need to talk to in real time, simply speaking or entering those people's names quickly locks onto, and further establishes two-way communication with, the communication terminals next to those people, without zooming or sliding the picture on the display, effectively saving the time and effort of manual screening. This too is a pioneering feature for conference monitoring systems.
  • the selection of a specific person in the specific scene is a click on the specific person in the video displayed on the display of the mobile terminal, or speaking the specific person's name, so that the user can conveniently select, by voice or by a manual operation, a specific person appearing in the specific scene, and thereby further trigger either the request for integrated video and audio of the real-time video and audio related to the specific person among the real-time video and audio collected by the plurality of communication terminals, or the connection request to the communication terminal that collects real-time video and audio related to the specific person. That is, according to an embodiment of the present invention, the user's speaking or manual selection triggers a series of subsequent steps that are completed automatically; this simple triggering method saves the user a great deal of time and effort.
  • a server for intelligently integrating real-time audio and video can integrate the video and audio captured by a plurality of communication terminals according to a request from a mobile terminal for the corresponding integrated video and audio, and transmit the integrated video and audio to the mobile terminal, so that the person monitoring can see the entire monitored scene, rather than only a part of it, even when the monitored scene exceeds the shooting range of a single camera.
  • the server may, according to the request of the mobile terminal, integrate the audio collected by only some of the plurality of communication terminals and transmit that integrated audio to the mobile terminal, or it may integrate the audio collected by all of the plurality of communication terminals.
  • the server provided by this embodiment can adaptively adjust the audio returned to the mobile terminal according to the specific request of the mobile terminal, so that the user of the mobile terminal can very flexibly receive a specific portion of the integrated audio from the server. For example, when only a part of the monitored scene is displayed on the display of the mobile terminal, only the audio corresponding to that part of the scene is sent to the user of the mobile terminal, so that the video the user sees and the audio the user hears correspond, without interference from audio of other parts of the scene.
  • the server provided according to an embodiment of the present invention may further, in response to receiving a connection request from the mobile terminal to the communication terminals in the first communication terminal set, forward the connection request to the communication terminals in the first communication terminal set and, in response to the automatic response of the communication terminals in the first communication terminal set, establish two-way communication between the mobile terminal and the communication terminals in the first communication terminal set. In this way, a connection between the mobile terminal and the specific communication terminals whose pictures are shown on the display can be established automatically through the server, achieving the effect that the user can communicate with whoever is displayed.
  • the server provided according to an embodiment of the present invention can also compare the real-time video collected by multiple communication terminals in real time and eliminate the overlapping portions between the real-time videos, so that the processed video looks like a coherent whole.
  • for example, a plurality of communication terminals are placed in a conference venue, and each communication terminal separately collects part of the real-time audio and video of the venue; since the capture lens of a communication terminal is usually wide-angle, adjacent or nearby communication terminals inevitably produce overlapping areas in the video pictures they collect. By comparing the video pictures and eliminating the overlapping parts, the video pictures collected by the multiple communication terminals are finally integrated into one whole, complete video picture, and the overall picture presented to the user is not perceived as having been collected by multiple communication terminals, but rather as having been shot by a single independent shooting device with a very wide lens, giving a very strong sense of a unified picture.
  • the server provided by one embodiment of the present invention may, in response to receiving from the mobile terminal a request for integrated video and audio of the real-time video and audio related to the specific person among the real-time video and audio collected by the plurality of communication terminals, identify the real-time video and audio related to the specific person among the real-time video and audio collected by the plurality of communication terminals, integrate the real-time video and audio related to the specific person, and transmit the integrated real-time video and audio related to the specific person to the mobile terminal. In this way, when the user next to the mobile terminal is clear about which people in the specific scene they need to watch or talk to in real time, sending a request for the video and audio of those people, without further browsing and selecting within the entire monitored scene, quickly locks onto and displays the integrated video and audio involving those people, saving the time and effort of manual screening.
  • the server provided by one embodiment of the present invention may further include an identifying device which, in response to receiving a connection request from the mobile terminal to the communication terminal that collects real-time video and audio related to the specific person, identifies the real-time video and audio related to the specific person among the real-time video and audio collected by the plurality of communication terminals, thereby identifying the communication terminal that collects real-time video and audio related to the specific person; the communication establishing unit forwards the connection request to the communication terminal that collects real-time video and audio related to the specific person and, in response to the automatic response of that communication terminal, establishes two-way communication between the mobile terminal and the communication terminal that collects real-time video and audio related to the specific person. In this way, when the user next to the mobile terminal is clear about whom in the specific scene they need to communicate with, sending connection requests only to the communication terminals associated with those people is enough to establish two-way communication with those terminals.
  • FIG. 1 shows a schematic block diagram of a tool 11 mounted to a mobile terminal 1 according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram showing real-time video and audio collection by a plurality of communication terminals according to a preferred embodiment of the present invention
  • FIG. 3(a) shows a video taken by six communication terminals integrated by a server according to an embodiment of the present invention
  • FIG. 3(b) shows an initial screen displayed on the display of the mobile terminal after activation of the tool 11 installed in the mobile terminal 1 according to an embodiment of the present invention
  • FIG. 3(c) shows the result of scaling the picture displayed on the display of FIG. 3(b) in accordance with one embodiment of the present invention
  • FIG. 3(d) shows the result of sliding the screen displayed on the display of FIG. 3(b) in accordance with one embodiment of the present invention
  • FIG. 3(e) illustrates a situation in which a video of a specific person after integration is displayed on a display when a user selects a specific person according to an embodiment of the present invention
  • FIG. 4 is a schematic diagram showing a mobile terminal directly establishing a connection with a communication terminal in a first communication set according to a preferred embodiment of the present invention
  • FIG. 5 is a schematic block diagram of a server for intelligently integrating real-time audio and video according to an embodiment of the present invention
  • FIG. 6 shows a schematic diagram of establishing communication between a mobile terminal and a communication terminal based on a server according to a preferred embodiment of the present invention
  • FIG. 7 shows a schematic block diagram of a video and audio integration device according to an embodiment of the present invention.
  • Fig. 1 shows a schematic block diagram of a tool 11 mounted to a mobile terminal 1 in accordance with one embodiment of the present invention.
  • the tool 11 installed in the mobile terminal 1 includes:
  • the sending unit 101 is configured to, in response to the first trigger, send a request for the integrated video of the real-time video collected by the plurality of communication terminals 2, wherein the plurality of communication terminals 2 respectively collect real-time video of a part of the specific scene, The real-time video collected by the plurality of communication terminals 2 is integrated to form a real-time video of the specific scene;
  • the receiving unit 102 is configured to receive an integrated video of the real-time video collected by the multiple communication terminals 2;
  • the transmitting unit 101, based on the first communication terminal set among the plurality of communication terminals 2 corresponding to the video displayed on the display of the mobile terminal 1, transmits a request for integrated audio of the real-time audio collected by the communication terminals 2 in the first communication terminal set, and the receiving unit 102 receives the integrated audio of the real-time audio collected by the communication terminals 2 in the first communication terminal set, wherein the video displayed on the display of the mobile terminal 1 is part of the integrated video of the real-time video collected by the plurality of communication terminals 2.
  • the integration of the above video and audio includes, but is not limited to, deduplication and splicing of multiple video pictures, and deduplication and noise reduction of multiple audio streams.
  • Chinese patent application No. 201410117927.3, for example, discloses splicing multiple images into one image.
  • the tool 11 installed in the mobile terminal 1 may be installed on the mobile terminal as an application (app), displayed in the form of a corresponding application icon, or the app may be solidified in a chip that is inserted into the mobile terminal, in which case the tool 11 installed in the mobile terminal 1 is embodied as that chip.
  • the first trigger refers to an action that causes the transmitting unit to transmit a request for an integrated video of the live video captured by the plurality of communication terminals 2.
  • it may include any one of the following: booting of the mobile terminal; activation of the tool while the mobile terminal is powered on; a specific action on the user interface while the mobile terminal is powered on; a specific voice received while the mobile terminal is powered on; the light sensed by the mobile terminal becoming stronger while it is powered on.
  • if booting is used as the trigger, the integrated video can be received as soon as the mobile terminal boots, and the user does not need to activate the tool, avoiding extra operations.
  • alternatively, the first trigger is the activation of the tool while the mobile terminal is powered on.
  • the advantage is that the user can decide after booting whether to receive the integrated video, avoiding the situation where the function is activated automatically after booting even though the user does not need it.
  • the first trigger can also be a specific action on the user interface while the mobile terminal is powered on, such as a click, a double-click, or a long press; the benefit, likewise, is that the user can decide after booting whether to receive the integrated video, avoiding automatic activation that the user does not need.
  • the first trigger may also be the light sensed by the mobile terminal becoming stronger while it is powered on, so that, for example, when the user pulls the mobile terminal out of a pocket, the sensed light becomes strong and the tool is triggered automatically.
  • the beneficial effect is that the tool is not triggered merely by booting, because even if the mobile terminal is in the user's pocket the user may not need the integrated video and audio; as soon as the user pulls the mobile terminal out of the pocket, the video and audio integration function is turned on automatically, sparing the user the extra operation of turning it on.
  • the first trigger may also be other manners.
  • the triggering manner of the tool is not limited.
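  • To make the trigger handling above concrete, the following is a minimal, non-limiting sketch of how a tool might dispatch the different first-trigger variants (boot, activation, UI action, voice, light becoming strong); the event names and the light threshold are illustrative assumptions, not part of the invention.

```python
# Sketch of dispatching the "first trigger": any of these events causes the sending
# unit to request the integrated video. Event names and the light threshold are assumed.
from typing import Optional

FIRST_TRIGGERS = {"boot", "tool_activated", "ui_action", "voice_command"}
LIGHT_THRESHOLD = 200.0  # assumed lux value above which "the light becomes strong"

def should_request_integrated_video(event: str, light_level: Optional[float] = None) -> bool:
    if event in FIRST_TRIGGERS:
        return True
    if event == "light_changed" and light_level is not None:
        return light_level > LIGHT_THRESHOLD  # e.g. the phone is pulled out of a pocket
    return False

print(should_request_integrated_video("boot"))                               # True
print(should_request_integrated_video("light_changed", light_level=350.0))   # True
```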
  • the mobile terminal 1 includes, but is not limited to, any communication device capable of human-computer interaction with the user; it is not limited here.
  • the communication terminal 2 includes, but is not limited to, any electronic product that can interact with a user through a touchpad, a remote control device, a voice control device, a keyboard, or the like, such as a computer or a tablet (PAD); those skilled in the art will understand that other devices that may be suitable for use in the present invention are also intended to be included within its scope.
  • the communication terminal 2 can perform real-time video collection by any device having a video capture function, such as a camera, and the communication terminal 2 can perform real-time audio collection through any device having an audio collection function, such as a recording unit.
  • the communication terminal 2 can upload the collected video and audio to a corresponding server in real time or periodically, based on, for example, the Transmission Control Protocol (TCP) or the User Datagram Protocol (UDP), and the server performs unified integration processing on the video and audio uploaded by the plurality of communication terminals 2.
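  • As a rough illustration of the upload just described, the sketch below shows a communication terminal pushing captured audio/video chunks to a server over TCP; the frame header, server address, and capture helper are assumptions made for illustration only, since the patent does not prescribe a wire format.

```python
# Minimal sketch of a communication terminal uploading captured chunks over TCP.
# Hypothetical helpers and frame format; the patent does not define a wire protocol.
import socket
import struct
import time

SERVER_ADDR = ("server.example.com", 9000)  # hypothetical integration server
TERMINAL_ID = 2                             # e.g. the 2nd communication terminal

def capture_frame() -> bytes:
    """Placeholder for the camera/microphone capture; returns one encoded AV chunk."""
    return b"\x00" * 1024  # stand-in for an encoded video/audio payload

def upload_loop() -> None:
    with socket.create_connection(SERVER_ADDR) as sock:
        while True:
            payload = capture_frame()
            timestamp_ms = int(time.time() * 1000)
            # Header: terminal id, capture timestamp, payload length.
            header = struct.pack("!IQI", TERMINAL_ID, timestamp_ms, len(payload))
            sock.sendall(header + payload)
            time.sleep(1 / 30)  # roughly 30 chunks per second

if __name__ == "__main__":
    upload_loop()
```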
  • the plurality of communication terminals 2 are usually located in a specific scenario, and each communication terminal 2 is generally responsible for collecting a part of real-time video of a specific scene, and each communication terminal 2 uploads the collected video and audio information to the corresponding server in real time. These video and audio are integrated by the server to obtain complete real-time video and audio of the specific scene.
  • the server may integrate the video and audio uploaded by a part of the communication terminals of the plurality of communication terminals 2, and may also integrate the video and audio uploaded by all of the plurality of communication terminals.
  • FIG. 2 is a schematic diagram of real-time video and audio collection by multiple communication terminals according to an embodiment of the present invention. As shown in FIG. 2, in a long and narrow party venue, six communication terminals 2 are placed, and each communication terminal 2 is responsible for collecting the video and audio information (determined by its field of view) of a certain area of the venue.
  • communication terminals 2 in adjacent or nearby positions usually capture video or audio that overlaps. For example, if two adjacent communication terminals 2 simultaneously capture the same person, or simultaneously capture the speech of several people, the server integrates the videos uploaded by the two adjacent communication terminals 2 that both contain that person, or the audio streams that both capture that speech; in the integrated video picture, only one integrated picture of that person is included, instead of two separate pictures with overlapping portions, and in the integrated audio, only one integrated audio stream of the captured speech is included, instead of superimposed audio from two separate streams containing the overlap.
  • in FIG. 2, the six communication terminals 2 respectively capture the video and audio of six people p1-p6, one person per communication terminal.
  • the specific scene may be a large conference venue, a banquet venue, or any other scene in which multiple communication terminals need to perform real-time video and audio collection.
  • FIG. 3(a) shows the video captured by the six communication terminals and integrated by a server according to an embodiment of the present invention. Assume that the six people p1-p6 in the monitored scene appear in the videos 6-1, 6-2, ..., 6-6 collected by the six communication terminals, and the portion of the integrated video collected by each communication terminal is called a "window". If the entire integrated video in FIG. 3(a) were displayed on the display of the mobile terminal 1, each window would be too small to be seen clearly. Accordingly, one embodiment of the present invention displays only some of the windows on the display 180 of the mobile terminal 1, as shown in FIG. 3(b).
  • the transmitting unit 101 knows at this time which communication terminals (the 2nd and 3rd communication terminals in this example) captured the video displayed on the display of the mobile terminal 1, so that it can, based on the first communication terminal set among the plurality of communication terminals 2 corresponding to the video displayed on the display of the mobile terminal 1 (i.e., the 2nd and 3rd communication terminals), transmit a request for integrated audio of the real-time audio collected by the communication terminals 2 in the first communication terminal set. The receiving unit 102 receives the integrated audio of the real-time audio collected by the communication terminals 2 in the first communication terminal set, so that the speaker of the mobile terminal 1 outputs only the integrated audio of the real-time audio from the communication terminals 2 in the first communication terminal set (the 2nd and 3rd communication terminals in this example), rather than integrated audio of the real-time audio collected by all six communication terminals 2.
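  • The mapping just described, from the portion of the integrated video visible on the display to the first communication terminal set and the corresponding audio request, could be sketched as follows; the side-by-side window layout, pixel widths, and request format are illustrative assumptions rather than details taken from the patent.

```python
# Sketch: derive the "first communication terminal set" from the visible viewport
# and request integrated audio only for those terminals. Names are illustrative.
from dataclasses import dataclass
from typing import List

@dataclass
class Window:
    terminal_id: int
    x_start: float  # left edge of this terminal's window in the integrated video, in pixels
    x_end: float    # right edge

# Six windows 6-1 .. 6-6 laid out side by side in the integrated panorama (assumed layout).
WINDOWS = [Window(i + 1, i * 640.0, (i + 1) * 640.0) for i in range(6)]

def visible_terminal_set(viewport_x: float, viewport_width: float) -> List[int]:
    """Return ids of terminals whose windows overlap the displayed viewport."""
    left, right = viewport_x, viewport_x + viewport_width
    return [w.terminal_id for w in WINDOWS if w.x_end > left and w.x_start < right]

def request_integrated_audio(terminal_ids: List[int]) -> dict:
    """Build the request the sending unit would transmit to the server (format assumed)."""
    return {"type": "integrated_audio", "terminal_set": terminal_ids}

# Example: the display currently shows windows 6-2 and 6-3.
first_set = visible_terminal_set(viewport_x=700.0, viewport_width=1200.0)
print(first_set)                            # [2, 3]
print(request_integrated_audio(first_set))
```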
  • the block diagrams shown in FIG. 1 are for illustrative purposes only and are not intended to limit the scope of the invention; in some cases, certain units or devices may be added or removed as appropriate.
  • in a preferred embodiment, the transmitting unit 101 also initiates a connection request to the communication terminals 2 in the first communication terminal set and, in response to the automatic response of the communication terminals 2 in the first communication terminal set, establishes two-way communication with the communication terminals 2 in the first communication terminal set.
  • FIG. 4 shows a schematic diagram of a mobile terminal directly establishing a connection with a communication terminal in a first communication set in accordance with a preferred embodiment of the present invention.
  • in this way, the user next to the mobile terminal does not need to switch from the currently played video page to a separate page for initiating a connection request to the communication terminals 2, so that the user can keep watching the current video page without interruption while communication is being established between the mobile terminal and the communication terminals 2. For example, windows 6-2 and 6-3 are displayed on the display as shown in FIG. 3(b); therefore, a connection request for communication is initiated to the 2nd and 3rd communication terminals associated with windows 6-2 and 6-3 (i.e., the terminals that captured the videos of windows 6-2 and 6-3).
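  • A minimal sketch of the connection request and automatic answer described above is given below; the message shapes and helper names are assumptions, intended only to show that no user action is needed on the monitored side.

```python
# Sketch of the auto-answer handshake: the tool sends a connection request to each
# terminal in the first set; the terminal answers automatically, so two-way
# communication starts without anyone at the monitored scene pressing "accept".
from typing import Dict, List, Tuple

def make_connect_request(mobile_id: str, terminal_id: int) -> Dict:
    return {"type": "connect", "from": mobile_id, "to": terminal_id}

def terminal_auto_answer(request: Dict) -> Dict:
    # The communication terminal answers automatically, without user interaction.
    return {"type": "answer", "from": request["to"], "to": request["from"], "accepted": True}

def establish_two_way(mobile_id: str, first_set: List[int]) -> List[Tuple[str, int]]:
    sessions = []
    for tid in first_set:
        answer = terminal_auto_answer(make_connect_request(mobile_id, tid))
        if answer["accepted"]:
            sessions.append((mobile_id, tid))  # both sides now exchange AV streams
    return sessions

print(establish_two_way("mobile-1", [2, 3]))  # [('mobile-1', 2), ('mobile-1', 3)]
```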
  • in a preferred embodiment, the tool 11 further includes a scaling unit 104 configured to scale the video displayed on the display of the mobile terminal 1 in response to a user zooming operation on that video, so that the first communication terminal set corresponding to the video displayed on the display changes. As shown in FIG. 3(c), when the user sees the video of windows 6-2 and 6-3 shown in FIG. 3(b) and only wants to watch the video of window 6-2 and listen to the voice of person p2, the user enlarges the picture on the display so that only window 6-2, containing person p2, is displayed. At this time, the speaker of the mobile terminal outputs only the audio collected by the communication terminal corresponding to that window, so the user can monitor person p2 separately and obtain only the video and audio related to p2, without interference from others.
  • the scaling unit 104 may reduce or enlarge the video picture currently displayed by the mobile terminal 1 in response to an operation such as a two-finger pinch or slide by the user; when a condition on the size of the video picture (for example, a threshold set by default in the tool or set by the user) is satisfied, the first communication terminal set corresponding to the scaled video changes.
  • the tool 11 according to a preferred embodiment of the present invention further includes:
  • the sliding unit 105 is configured to slide the video displayed on the display of the mobile terminal 1 in response to a sliding operation by the user on that video, so that the first communication terminal set corresponding to the video displayed on the display changes.
  • as shown in FIG. 3(d), when the user sees the video of windows 6-2 and 6-3 shown in FIG. 3(b) and wants to see who is on the right side of p3, the user can slide the window to the right, so that windows 6-3 and 6-4, instead of windows 6-2 and 6-3, are displayed on the display.
  • in this way, the user obtains the video and audio related to people p3 and p4, instead of the video and audio related to people p2 and p3.
  • the sliding unit 105 may slide the video currently displayed on the display of the mobile terminal 1 in response to a user operation such as dragging, long-press-and-slide, or a simple swipe, when a condition such as the sliding distance exceeding a certain threshold is satisfied.
  • the user can also zoom and slide the currently displayed video picture at the same time, or zoom first and then slide, or slide first and then zoom; the first communication terminal set then changes correspondingly.
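  • Under the same assumed window layout as the earlier viewport sketch, the zoom and slide operations amount to resizing and moving the viewport and then recomputing the first communication terminal set; a possible sketch, with assumed thresholds and viewport model:

```python
# Sketch: zoom and slide simply move/resize the viewport over the integrated video,
# then the first communication terminal set is recomputed (see the earlier viewport
# sketch). The viewport model and limits below are assumptions.
class Viewport:
    def __init__(self, x: float, width: float, full_width: float):
        self.x, self.width, self.full_width = x, width, full_width

    def zoom(self, factor: float) -> None:
        """factor > 1 enlarges the picture (narrower viewport), factor < 1 shrinks it."""
        center = self.x + self.width / 2
        self.width = min(self.full_width, self.width / factor)
        self.x = max(0.0, min(center - self.width / 2, self.full_width - self.width))

    def slide(self, dx: float) -> None:
        """Positive dx slides the view toward the right of the scene."""
        self.x = max(0.0, min(self.x + dx, self.full_width - self.width))

vp = Viewport(x=700.0, width=1200.0, full_width=6 * 640.0)
vp.zoom(2.0)     # enlarge: the visible region narrows around the current center
vp.slide(640.0)  # slide right by one window width: p3/p4 come into view
```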
  • in a preferred embodiment, the transmitting unit 101, in response to receiving a selection of a specific person in the specific scene, transmits a request for integrated video and audio of the real-time video and audio related to the specific person among the real-time video and audio collected by the plurality of communication terminals 2, and the receiving unit 102 receives the corresponding integrated video and audio related to the specific person.
  • receiving the selection of a specific person in the specific scene may be implemented, for example, by the tool 11 recognizing that the currently playing video or the received video contains the picture of a specific person, circling the recognized person's avatar, and providing it to the user for selection in the form of a menu; or by responding to the user's clicking, double-clicking, or similar operation on a specific person in the video displayed on the display of the mobile terminal 1; or by receiving audio of the user speaking the specific person's name, etc.
  • for example, if the user only wants to know what people p2 and p5 are doing and hear what p2 and p5 are saying, the user directly says the names of p2 and p5, and the tool 11 recognizes p2 and p5 by voice recognition.
  • the server then recognizes that the 2nd and 5th communication terminals associated with windows 6-2 and 6-5 collect the video and audio of p2 and p5 respectively, integrates the video and audio collected by the 2nd and 5th communication terminals, and sends the integrated video and audio to the receiving unit 102 of the tool 11.
  • the audio output of the mobile terminal is then also the audio corresponding to the windows of p2 and p5, so that the user sees and hears only the selected people.
  • to identify whether the currently played video or the received video contains the picture of a specific person, the tool 11 may store in advance in memory the pattern of the specific person's face and/or the specific person's voice frequency; when the received or currently played video and/or audio matches the stored face pattern and/or voice frequency of a specific person, the specific person's avatar is cropped from the video picture, circled, and provided to the user for selection.
  • the tool can also employ a self-learning method to identify video or/and audio containing a particular person's picture.
  • in the self-learning case, a prompt may be displayed on the display of the mobile terminal 1 showing the identification of the specific person, so that the user next to the mobile terminal 1 can judge and name the person; if the user finds an identification error, feedback is input on the display and returned to the tool, and the tool corrects itself according to the historical feedback information in the next recognition.
  • alternatively, the pattern of the specific person's face and/or the specific person's voice frequency may not be stored in the memory of the tool in advance.
  • in that case, the sending unit 101, in response to receiving the selection of the specific person in the specific scene, simply transmits the request for integrated video and audio of the real-time video and audio related to the specific person among the real-time video and audio collected by the plurality of communication terminals 2, and the receiving unit 102 receives the corresponding integrated video and audio.
  • the communication terminal 2 can identify a specific person based on one or more of face recognition, height recognition, and voice recognition.
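  • One hedged way to approximate the face-pattern / voice-frequency matching described above is to compare stored reference features with features extracted from the incoming streams; the feature values and threshold below are placeholders, and the patent does not mandate any particular recognition algorithm.

```python
# Schematic sketch of identifying a specific person by comparing stored reference
# features against features extracted from incoming video/audio. The feature
# vectors are placeholders; the patent only requires face/voice/height matching.
import math
from typing import Dict, List, Optional

REFERENCES: Dict[str, List[float]] = {
    "p2": [0.1, 0.8, 0.3],   # stored face-pattern / voice-frequency feature (assumed)
    "p5": [0.7, 0.2, 0.9],
}

def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def identify(observed: List[float], threshold: float = 0.9) -> Optional[str]:
    """Return the name of the best-matching stored person, if similar enough."""
    best_name, best_score = None, 0.0
    for name, ref in REFERENCES.items():
        score = cosine_similarity(observed, ref)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None

print(identify([0.12, 0.79, 0.31]))  # matches "p2" under these assumed features
```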
  • in a preferred embodiment, the transmitting unit 101, in response to receiving a selection of a specific person in the specific scene, initiates a connection request to the communication terminal 2 that collects real-time video and audio related to the specific person and, in response to the automatic response of that communication terminal 2, establishes communication with the communication terminal 2 that collects real-time video and audio related to the specific person.
  • in this way, the user carrying the mobile terminal 1 not only sees the video and hears the audio of the desired person according to his or her own wishes; the desired person, in turn, sees the user's video and hears the user's audio, so two-way communication with the desired person is achieved.
  • the sending unit 101 may also initiate the connection request directly to the communication terminal 2 that collects real-time video and audio related to the specific person, thereby establishing communication directly between the mobile terminal 1 and the communication terminal 2, so that the mobile terminal 1 communicates in real time with the specific one or more communication terminals 2 and they acquire each other's real-time video and audio.
  • the mobile terminal 1 may be one or more. When there are multiple mobile terminals 1, each mobile terminal 1 may be associated with each other or may be independent of each other.
  • FIG. 5 is a schematic block diagram of a server for intelligently integrating real-time audio and video according to an embodiment of the present invention.
  • the server comprises:
  • the video and audio receiving device 301 is configured to receive real-time video and audio from a plurality of communication terminals 2, a request from the mobile terminal 1 for integrated video of the real-time video collected by the plurality of communication terminals 2, and a request from the mobile terminal 1 for integrated audio of the real-time audio collected by the communication terminals 2 in a first communication terminal set among the plurality of communication terminals 2;
  • the video and audio integration device 302 is configured to integrate the real-time video collected by the plurality of communication terminals 2 in response to a request from the mobile terminal 1 for the integrated video of the real-time video collected by the plurality of communication terminals 2, And responding to the request from the mobile terminal 1 for the integrated audio of the real-time audio collected by the communication terminal 2 of the first communication terminal set of the plurality of communication terminals 2, the first communication of the plurality of communication terminals 2 Real-time audio collected by the communication terminal 2 in the terminal set is integrated;
  • the video and audio transmitting device 303 is configured to transmit the integrated video or/and the integrated audio to the mobile terminal 1.
  • the server 3 may include, but is not limited to, a single network server, a plurality of network server sets, or a cloud composed of multiple servers.
  • the server 3, on the one hand, receives the video and audio uploaded by the plurality of communication terminals 2 in real time or periodically, and on the other hand receives from the mobile terminal 1 requests for integration of the real-time video or/and real-time audio collected by the plurality of communication terminals 2; according to the received requests, it integrates the corresponding video or/and audio and sends the integrated video or/and audio to the mobile terminal 1.
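  • The receive-integrate-send flow of the server could be organized roughly as in the following sketch, which reuses the hypothetical request shapes from the earlier examples; it keeps only the latest chunk per terminal and omits overlap removal for brevity.

```python
# Rough sketch of the server-side flow: keep the latest AV chunks per terminal,
# and answer integration requests from the mobile terminal. Data shapes are assumed.
from typing import Dict

class IntegrationServer:
    def __init__(self) -> None:
        self.latest_video: Dict[int, bytes] = {}  # terminal_id -> latest video chunk
        self.latest_audio: Dict[int, bytes] = {}  # terminal_id -> latest audio chunk

    def on_upload(self, terminal_id: int, video: bytes, audio: bytes) -> None:
        self.latest_video[terminal_id] = video
        self.latest_audio[terminal_id] = audio

    def handle_request(self, request: Dict) -> bytes:
        if request["type"] == "integrated_video":
            # Integrate video from all terminals (overlap removal omitted here).
            return b"".join(self.latest_video[t] for t in sorted(self.latest_video))
        if request["type"] == "integrated_audio":
            # Integrate audio only for the requested first communication terminal set.
            return b"".join(self.latest_audio[t] for t in request["terminal_set"]
                            if t in self.latest_audio)
        raise ValueError("unknown request type")

server = IntegrationServer()
server.on_upload(2, b"v2", b"a2")
server.on_upload(3, b"v3", b"a3")
print(server.handle_request({"type": "integrated_audio", "terminal_set": [2, 3]}))  # b'a2a3'
```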
  • in a preferred embodiment, the server 3 further comprises a communication establishing unit 305 configured to, in response to receiving a connection request from the mobile terminal 1 to the communication terminals 2 in the first communication terminal set, forward the connection request to the communication terminals 2 in the first communication terminal set and, in response to the automatic response of the communication terminals 2 in the first communication terminal set, establish two-way communication between the mobile terminal 1 and the communication terminals 2 in the first communication terminal set.
  • FIG. 6 is a schematic diagram showing establishing communication between a mobile terminal and a communication terminal based on a server according to a preferred embodiment of the present invention.
  • as shown in FIG. 6, the server 3 receives the connection request from the mobile terminal 1 addressed to the communication terminals in the first communication terminal set, or to one or more specific communication terminals, forwards the connection request to the target communication terminals and, after receiving the automatic response of the target communication terminals, establishes a two-way communication connection between the mobile terminal 1 and the target communication terminals 2.
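  • The relay behaviour of the communication establishing unit 305 could look roughly like the sketch below; the forwarding callback and message fields are assumptions used only to illustrate the forward-and-auto-answer pattern.

```python
# Sketch of the communication establishing unit acting as a relay: forward the
# mobile terminal's connection request to the target terminals and report the
# automatic answers back. Transport and message shapes are assumptions.
from typing import Callable, Dict, List

def relay_connection(request: Dict,
                     forward: Callable[[int, Dict], Dict]) -> List[int]:
    """Forward `request` to each target terminal; return ids that auto-answered."""
    connected = []
    for terminal_id in request["terminal_set"]:
        answer = forward(terminal_id, {"type": "connect", "from": request["from"]})
        if answer.get("accepted"):
            connected.append(terminal_id)  # the server now bridges AV both ways
    return connected

# Stand-in for delivering the request to a terminal, which answers automatically.
def fake_forward(terminal_id: int, message: Dict) -> Dict:
    return {"accepted": True, "terminal": terminal_id}

print(relay_connection({"from": "mobile-1", "terminal_set": [2, 3]}, fake_forward))
```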
  • FIG. 7 is a schematic block diagram of an audio-video integration apparatus according to an embodiment of the present invention.
  • the video and audio integration device 302 includes:
  • the video frame comparison module 3021 is configured to perform real-time comparison on the real-time video collected by the plurality of communication terminals 2, and determine an overlapping portion between the real-time videos collected by the plurality of communication terminals 2;
  • the overlapping portion eliminating module 3022 is configured to eliminate overlapping portions between the real-time videos collected by the plurality of communication terminals 2, thereby integrating real-time video collected by the plurality of communication terminals 2.
  • each of the plurality of communication terminals 2 is generally responsible for collecting part of the audio and video of a specific scene. Since the video is usually captured at a wide angle, and in order to capture video of all views of the specific scene, the audio and video collected by communication terminals in adjacent or nearby positions usually have overlapping parts. In order to integrate the video collected by the multiple communication terminals into one complete video with no visible seams, as if it had been collected by a single communication terminal with an unlimited field of view, the overlapping parts of the video and audio collected by the multiple communication terminals need to be eliminated, keeping only one copy of the video and audio collected for the same portion of the scene.
  • the real-time video collected by the plurality of communication terminals 2 needs to be compared in real time to determine and eliminate the overlapping video images.
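  • One possible implementation of this comparison and overlap-elimination step is classical feature matching and homography-based stitching, for example with OpenCV as sketched below; this is only an illustration and is not necessarily the method of the patent or of Chinese application 201410117927.3 mentioned earlier.

```python
# One possible way to detect and remove the overlap between two adjacent terminals'
# frames: match features, estimate a homography, and stitch. OpenCV is used purely
# as an illustration; the patent does not mandate a specific stitching algorithm.
import cv2
import numpy as np

def stitch_pair(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    orb = cv2.ORB_create(1000)
    kp1, des1 = orb.detectAndCompute(left, None)
    kp2, des2 = orb.detectAndCompute(right, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des2, des1), key=lambda m: m.distance)[:100]

    src = np.float32([kp2[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp1[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    homography, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    # Warp the right frame into the left frame's coordinates; the overlapping
    # region is therefore represented only once in the output canvas.
    h, w = left.shape[:2]
    canvas = cv2.warpPerspective(right, homography, (w * 2, h))
    canvas[0:h, 0:w] = left
    return canvas

# Usage (frames would come from two adjacent communication terminals):
# panorama = stitch_pair(frame_terminal_2, frame_terminal_3)
```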
  • in a preferred embodiment, the server 3 further comprises an identification device 304 which, in response to receiving from the mobile terminal 1 a request for integrated video and audio of the real-time video and audio related to the specific person among the real-time video and audio collected by the plurality of communication terminals 2, identifies the real-time video and audio related to the specific person among the real-time video and audio collected by the plurality of communication terminals 2; the video and audio integration device 302 integrates the real-time video and audio related to the specific person, and the video and audio transmitting device 303 transmits the integrated real-time video and audio related to the specific person to the mobile terminal 1.
  • the server 3 may identify, among the video and audio received from the plurality of communication terminals 2, the real-time video and audio related to the specific person by storing the specific person's face pattern and/or voice frequency in a memory in advance, or by self-learning; the recognized real-time video and audio is then filtered out of all the received video and audio, integrated, and transmitted to the mobile terminal 1.
  • in a preferred embodiment, the server 3 further comprises an identification device 304 which, in response to receiving a connection request from the mobile terminal 1 to the communication terminal 2 that collects real-time video and audio related to the specific person, identifies the real-time video and audio related to the specific person among the real-time video and audio collected by the plurality of communication terminals 2, thereby identifying the communication terminal 2 that collects real-time video and audio related to the specific person; the communication establishing unit 305 forwards the connection request to that communication terminal 2 and, in response to its automatic response, establishes two-way communication between the mobile terminal 1 and the communication terminal 2 that collects real-time video and audio related to the specific person.
  • the server 3 can thus also act as a communication relay station: it receives the connection request from the mobile terminal 1 to the communication terminal 2 that collects real-time video and audio related to a specific person, and establishes a two-way communication connection between the mobile terminal 1 and the communication terminal 2 that collects real-time video and audio related to the specific person.
  • the present invention can be implemented as a device, an apparatus, a method, or a computer program product; therefore, the present disclosure may be embodied as entirely hardware, entirely software, or a combination of hardware and software.
  • each block of the flowchart or block diagram can represent a module, a program segment, or a portion of code that includes one or more executable instructions.
  • the functions noted in the blocks may also occur in a different order than that illustrated in the drawings. For example, two consecutive blocks may be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

Abstract

本发明公开了一种安装于移动终端的工具和一种智能整合实时音视频的服务器,其中,安装于移动终端的工具包括:发送单元,被配置为响应于第一触发,发送对多个通信终端采集的实时视频的整合的视频的请求;接收单元,被配置为接收所述多个通信终端采集的实时视频的整合的视频,其中,发送单元基于在移动终端的显示器上显示的视频对应的、所述多个通信终端中的第一通信终端集合,发送对第一通信终端集合中的通信终端采集的实时音频的整合的音频的请求,接收单元接收第一通信终端集合中的通信终端采集的实时音频的整合的音频。本发明在被监视的场景超出了一个摄像头的拍摄范围的情况下能够让监视的人看到整个被监视场景,而不是被监视场景的一部分。

Description

一种移动终端的工具及智能整合音视频的服务器
本申请要求了2014年7月15日提交的、申请号为201410337180.2、发明名称为“一种移动终端的工具及智能整合音视频的服务器”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本发明涉及通信和图像处理技术,尤其涉及一种移动终端的工具及智能整合音视频的服务器。
背景技术
现有技术中,例如长桌会议等环境下,由于会议场景狭长,超出了一个摄像头的拍摄范围,因此在利用远程摄像头等进行监控或使用视频终端进行远程双向视频通话的应用中,监控或通话的人只能通过该一个摄像头采集的视频,观看到会议场景的一部分。
发明内容
本发明解决的技术问题之一是在被监视的场景超出了一个摄像头的拍摄范围的情况下能够让监视的人看到整个被监视场景,而不是被监视场景的一部分。
根据本发明的一个实施例,提供了一种安装于移动终端的工具,包括:发送单元,被配置为响应于第一触发,发送对多个通信终端采集的实时视频的整合的视频的请求,其中所述多个通信终端分别采集特定场景的一部分的实时视频,所述多个通信终端分别采集的实时视频整合后构成所述特定场景的实时视频;接收单元,被配置为接收所述多个通信终端采集的实时视频的整合的视频,其中,发送单元基于在移动终端的显示器上显示的视频对应的、所述多个通信终端中的第一通信终端集合,发送对第一通信终端集合中的通信终端采集的实时音频的整合的音频的请求,接收单元接收第一通信终端集合中的通信终端采集的实时音频的整合的音频,其中在移动终端的显示器上 显示的视频是所述多个通信终端采集的实时视频的整合的视频的一部分。
可选地,该工具还包括:配置单元,用于接收用户对所述多个通信终端采集的视音频进行整合的配置。
可选地,发送单元还向第一通信终端集合中的通信终端发起连接请求,并响应于第一通信终端集合中的通信终端的自动应答,与第一通信终端集合中的通信终端建立双向通信。
可选地,该工具还包括:缩放单元,被配置为响应于用户对移动终端的显示器上显示的视频的缩放操作,对移动终端的显示器上显示的视频进行缩放,从而显示器上显示的视频对应的第一通信终端集合改变。
可选地,该工具还包括:滑动单元,被配置为响应于用户对移动终端的显示器上显示的视频的滑动操作,对移动终端的显示器上显示的视频进行滑动,从而显示器上显示的视频对应的第一通信终端集合改变。
可选地,所述第一触发包括以下中的任一种:所述移动终端的开机;所述移动终端开机状态下所述工具的激活;所述移动终端开机状态下用户界面上的特定动作;所述移动终端开机状态下接收到的特定语音;所述移动终端开机状态下感测到的光线变强。
可选地,发送单元响应于接收到针对所述特定场景中特定人的选择,发送对所述多个通信终端采集的实时视、音频中涉及所述特定人的实时视、音频的整合的视、音频的请求,接收单元接收所述多个通信终端采集的实时视、音频中涉及所述特定人的实时视、音频的整合的视、音频。
可选地,发送单元响应于接收到针对所述特定场景中特定人的选择,向采集了涉及所述特定人的实时视、音频的通信终端发起连接请求,并响应于采集了涉及所述特定人的实时视、音频的通信终端的自动应答,与采集了涉及所述特定人的实时视、音频的通信终端建立双向通信。
可选地,针对所述特定场景中特定人的选择是对在移动终端的显示器上显示的视频中特定人的点击或说出特定人的名字。
根据本发明的一个实施例,还提供了一种智能整合实时音视频的服务器,包括:视、音频接收装置,被配置为接收来自多个通信终端的实时视、音频、来自移动终端的对所述多个通信终端采集的实时视频的整合的视频的 请求、来自移动终端的对所述多个通信终端中第一通信终端集合中的通信终端采集的实时音频的整合的音频的请求;视、音频整合装置,被配置为响应于来自移动终端的对所述多个通信终端采集的实时视频的整合的视频的请求,对所述多个通信终端采集的实时视频进行整合,并响应于来自移动终端的对所述多个通信终端中第一通信终端集合中的通信终端采集的实时音频的整合的音频的请求,对所述多个通信终端中第一通信终端集合中的通信终端采集的实时音频进行整合;视、音频发送装置,被配置为将整合的视频或/和整合的音频发送到移动终端。
可选地,服务器还包括:通信建立单元,被配置为响应于接收到来自移动终端的向所述第一通信终端集合中的通信终端的连接请求,向所述第一通信终端集合中的通信终端转发该连接请求,并响应于第一通信终端集合中的通信终端的自动应答,在移动终端和第一通信终端集合中的通信终端间建立双向通信。
可选地,视、音频整合装置包括:视频画面比对模块,被配置为将所述多个通信终端采集的实时视频进行实时对比,确定所述多个通信终端采集的实时视频之间的重叠部分;重叠部分消除模块,被配置为消除所述多个通信终端采集的实时视频之间的重叠部分,从而对所述多个通信终端采集的实时视频进行整合。
可选地,服务器还包括:识别装置,响应于接收到来自移动终端的对所述多个通信终端采集的实时视、音频中涉及所述特定人的实时视、音频的整合的视、音频的请求,识别所述多个通信终端采集的实时视、音频中涉及所述特定人的实时视、音频,并且所述视、音频整合装置整合所述涉及所述特定人的实时视、音频,所述视、音频发送装置向移动终端发送整合的所述涉及所述特定人的实时视、音频。
可选地,服务器还包括:识别装置,响应于接收到来自移动终端的向采集了涉及所述特定人的实时视、音频的通信终端的连接请求,识别所述多个通信终端采集的实时视、音频中涉及所述特定人的实时视、音频,从而识别采集了涉及所述特定人的实时视、音频的通信终端,并且所述通信建立单元向采集了涉及所述特定人的实时视、音频的通信终端转发连接请求,并响 应于采集了涉及所述特定人的实时视、音频的通信终端的自动应答,在移动终端和采集了涉及所述特定人的实时视、音频的通信终端之间建立双向通信。
由于本发明的一个实施例中,多个通信终端分别采集特定场景的一部分的实时视频,所述多个通信终端分别采集的实时视频整合后构成所述特定场景的实时视频,这样,移动终端发送对该整合视频的请求后,该整合视频就能显示在移动终端,达到了在被监视的场景超出了一个摄像头的拍摄范围的情况下能够让监视的人看到整个被监视场景的效果。
另外,由于被监视场景是例如狭长的,监视用户可能在某一时间点只需监视一部分场景,即看到这一部分场景的视频,听到这一部分场景的音频,因此本发明的实施例可以基于在移动终端的显示器上显示的视频对应的、所述多个通信终端中的第一通信终端集合,发送对第一通信终端集合中的通信终端采集的实时音频的整合的音频的请求,并只接收第一通信终端集合中的通信终端采集的实时音频的整合的音频。如此,当移动终端收到来自多个通信终端采集的实时视频的整合视频时,根据显示器的尺寸和视频画面在显示器当前可显示的画面大小自动知道显示器显示的视频对应于整合视频的哪一部分、以及其对应的第一通信终端集合,并获取对该第一通信终端集合中的通信终端采集的实时音频的整合音频,也即,本实施例确保在显示器上显示的视频和用户听到的音频是对应的,达到了有效避免因接收所有音频而造成其它部分音频对显示器显示的部分视频的干扰的有益效果。一旦音频与视频不对应,监视用户会难以分清声音是否来自于当前显示的画面中的人,造成困惑。能够只听显示器画面中的人说话,同时抑制其他通信终端所采集到的音频,而不是听整个场景中所有的人说话,目前是监视系统尤其是会议监视系统的一个创举。
由于本发明的一个实施例的工具还包括配置单元,用于接收用户对所述多个通信终端采集的视音频进行整合的配置,也就是说,所述多个通信终端是由用户指定与用户的移动终端绑定的,这样,下次响应于第一触发,才能知道请求哪些移动终端的整合的视频。这样,可以实现由用户来指定与 其移动终端绑定的用户希望整合其视音频的多个通信终端,达到了用户可以根据需要灵活指定与其终端绑定、并整合其视音频的通信终端的有益效果。
由于本发明的一个实施例提供的安装于移动终端的工具可以向第一通信终端集合中的通信终端发起连接请求,并响应于第一通信终端集合中的通信终端的自动应答,与第一通信终端集合中的通信终端建立双向通信,这样,本实施例可以根据识别出的特定的通信终端集合,向该集合中的通信终端自动发起连接请求,从而与识别出的通信终端建立通信,达到监视用户在显示器上看见谁、就能跟谁像打电话一样无障碍双向交流的有益效果,这是目前的会议监视系统做不到的,是监视系统目前的一个创举。另外,第一通信终端集合中的通信终端自动应答,确保了例如被监视会议场景的人感觉不到这种切换,实现了无缝会议监视,使开会和通话的流畅性不被打断。
由于本发明的一个实施例提供的安装于移动终端的工具还可以包括缩放单元和/或滑动单元,通过响应于用户的缩放操作和/或滑动操作,改变显示器上显示的视频所对应的第一通信终端集合。根据该实施例,用户可以根据观看视频的需要,任意地缩放和移动视频画面,这样,监视用户如果想跟被监视场景中的另一个人说话,就滑动视频画面,使显示器显示的画面变成含有那个人的画面;如果当前显示器的画面中含有多个人,但监视用户只想跟一个人说话,可以缩放显示器显示的画面变成只含有该人,这样,达到了监视用户随心所欲选择和被监视场景中的任何人说话的目的。这也是会议监视系统中的创举。
由于本发明的一个实施例提供的安装于移动终端的工具可以响应于接收到针对所述特定场景中特定人的选择,发送对所述多个通信终端采集的实时视、音频中涉及所述特定人的实时视、音频的整合的视、音频的请求并接收所述多个通信终端采集的实时视、音频中涉及所述特定人的实时视、音频的整合的视、音频,从而使得移动终端旁的用户非常清楚需要与特定场景中的哪些人实时对话时,仅说出或输入这些人的名字就不用再缩放或滑动显示器上的画面就能快速锁定并观看其中涉及这些人的整合的视音频,有效节省人工筛选的时间和精力。这也是会议监视系统的创举。
由于本发明的一个实施例提供的安装于移动终端的工具可以响应于接收到针对所述特定场景中特定人的选择,向采集了涉及所述特定人的实时视、音频的通信终端发起连接请求,并响应于采集了涉及所述特定人的实时视、音频的通信终端的自动应答,与采集了涉及所述特定人的实时视、音频的通信终端建立双向通信,从而使得移动终端旁的用户非常清楚需要与特定场景中的哪些人实时对话时,仅说出或输入这些人的名字就不用再缩放或滑动显示器上的画面就能快速锁定并进一步直接与这些人旁边的通信终端建立双向通信,有效节省人工筛选的时间和精力。这也是会议监视系统的创举。
根据本发明的一个实施例,针对所述特定场景中特定人的选择是对在移动终端的显示器上显示的视频中特定人的点击或说出特定人的名字,如此,用户可以通过说话或者手动操作的方式方便地选择特定场景中出现的特定人,并可以进一步触发发送对多个通信终端采集的实时视、音频中涉及所述特定人的实时视、音频的整合的视、音频的请求,或进一步触发向采集了涉及所述特定人的实时视、音频的通信终端发起连接请求,也即,根据本发明的实施例,可以响应于用户的说话或手动选择,触发一系列后续步骤的自动完成,对于用户而言,这种简单的触发方式节省了大量时间和精力。
由于根据本发明的另一个方面的一个实施例,提供了一种智能整合实时音视频的服务器,其可以根据来自移动终端的整合相应视音频的请求,对多个通信终端拍摄的视音频进行整合并将整合后的视音频发送给移动终端,从而实现了在被监视的场景超出了一个摄像头的拍摄范围的情况下能够让监视的人看到整个被监视场景,而不是被监视场景的一部分。
在本发明的一个实施例中,服务器既可以根据移动终端的请求来整合所有多个通信终端中的部分通信终端采集的音频并将整合的音频发送给移动终端,也可以整合所有多个通信终端采集的音频。无论如何,本实施例提供的服务器可以根据移动终端的具体请求自适应调整返回给移动终端的音频,从而使得移动终端的用户可以非常灵活地从服务器接收特定部分的整合的音频。例如,当移动终端的显示器上仅显示被监视场景中的一部分时,可以只向移动终端的用户发送与这一部分场景相对应的音频,这样,监视用户看到的视频和音频是对应的,不受其它部分音频干扰。
由于根据本发明的一个实施例提供的服务器还可以响应于接收到来自移动终端的向第一通信终端集合中的通信终端的连接请求,向所述第一通信终端集合中的通信终端转发连接请求,并响应于第一通信终端集合中的通信终端的自动应答,在移动终端和第一通信终端集合中的通信终端间建立双向通信,由此,通过该服务器,可以自动建立移动终端与显示器上显示画面中的特定通信终端的连接,达到显示谁、就能和谁之间双向交流的效果。
由于根据本发明的一个实施例提供的服务器还可以对多个通信终端采集的实时视频进行实时对比,并消除实时视频之间的重叠部分,从而使得处理后的视频看上去的整体感更强。例如,在一个大型的会议场所,为了拍摄整个会议场所的所有视角,放置了多台通信终端,每台通信终端分别采集该会议场所的一部分实时音视频,由于通信终端的音视频采集镜头通常是广角的,因而相邻或邻近的通信终端所采集的视频画面必然存在重叠画面,本实施例通过对视频画面进行比对并对其中的重叠部分予以消除,使得最后整合的来自多个通信终端所采集的视频画面形成一个整体的、完整的视频画面,最后给用户呈现的整体画面使用户感觉不到是由多个通信终端分别采集而得的,而是感觉由一个独立的具有很长很宽的镜头的拍摄设备单独拍摄完成,画面的整体感很强。
由于本发明的一个实施例提供的服务器可以响应于接收到来自移动终端的对所述多个通信终端采集的实时视、音频中涉及所述特定人的实时视、音频的整合的视、音频的请求,识别所述多个通信终端采集的实时视、音频中涉及所述特定人的实时视、音频,并且整合所述涉及特定人的实时视音频,并向移动终端发送整合的所述涉及所述特定人的实时视音频,从而使得移动终端旁的用户非常清楚需要与特定场景中的哪些人实时对话时,仅发送对这些人的视、音频的请求不用再进一步地浏览整个被监视场景并选择,就能快速锁定并观看其中涉及这些人的整合的视音频,有效节省人工筛选的 时间和精力。
由于本发明的一个实施例提供的服务器还可以包括识别装置,响应于接收到来自移动终端的向采集了涉及所述特定人的实时视、音频的通信终端的连接请求,识别所述多个通信终端采集的实时视、音频中涉及所述特定人的实时视、音频,从而识别采集了涉及所述特定人的实时视、音频的通信终端,并且,所述通信建立单元向采集了涉及所述特定人的实时视音频的通信终端转发连接请求,并响应于采集了涉及所述特定人的实时视音频的通信终端的自动应答,在移动终端和采集了涉及所述特定人的实时视音频的通信终端之间建立双向通信,从而使得移动终端旁的用户非常清楚需要与特定场景中的哪些人实时对话时,仅发送向这些人相关的通信终端的连接请求,从而与这些人相关的通信终端建立连接,就能快速与需要的人建立直接通信,有效节省人工筛选的时间和精力。
本领域普通技术人员将了解,虽然下面的详细说明将参考图示实施例、附图进行,但本发明并不仅限于这些实施例。而是,本发明的范围是广泛的,且意在仅通过后附的权利要求限定本发明的范围。
附图说明
通过阅读参照以下附图所作的对非限制性实施例所作的详细描述,本发明的其它特征、目的和优点将会变得更明显:
图1示出根据本发明一个实施例的安装于移动终端1的工具11的示意性框图;
图2示出了根据本发明一个优选实施例的多个通信终端进行实时视音频采集的示意图;
图3(a)示出了根据本发明一个实施例的由服务器整合后的六个通信终端拍摄的视频;
图3(b)示出了根据本发明一个实施例的安装于移动终端1的工具11激活后移动终端的显示器上显示的初始画面;
图3(c)示出了根据本发明一个实施例的缩放图3(b)中显示器上显示的画面后的结果;
图3(d)示出了根据本发明一个实施例的滑动图3(b)中显示器上显示的画面后的结果;
图3(e)示出了根据本发明一个实施例的当用户选择特定人时显示器上显示整合后的特定人所在的视频的情形;
图4示出了根据本发明一个优选实施例的移动终端与第一通信集合中的通信终端直接建立连接的示意图;
图5示出了根据本发明一个实施例的智能整合实时音视频的服务器的示意性框图;
图6示出了根据本发明一个优选实施例的基于服务器在移动终端和通信终端之间建立通信的示意图;
图7示出了根据本发明一个实施例的视音频整合装置的示意性框图;
附图中相同或相似的附图标记代表相同或相似的部件。
具体实施方式
下面结合附图对本发明作进一步详细描述。
图1示出了根据本发明一个实施例的安装于移动终端1的工具11的示意性框图。根据图1,所述安装于移动终端1的工具11,包括:
发送单元101,被配置为响应于第一触发,发送对多个通信终端2采集的实时视频的整合的视频的请求,其中所述多个通信终端2分别采集特定场景的一部分的实时视频,所述多个通信终端2分别采集的实时视频整合后构成所述特定场景的实时视频;
接收单元102,被配置为接收所述多个通信终端2采集的实时视频的整合的视频;
其中,发送单元101基于在移动终端1的显示器上显示的视频对应的、所述多个通信终端2中的第一通信终端集合,发送对第一通信终端集合中的通信终端2采集的实时音频的整合的音频的请求,接收单元102接收第一通信终端集合中的通信终端2采集的实时音频的整合的音频,其中在移动终端1的显示器上显示的视频是所述多个通信终端2采集的实时视频的整合的视 频的一部分。
需要说明的是,上述视音频的整合包括但不限于多个视频画面的去重和拼接,多个音频的去重和降噪等。现有技术存在多种对图像进行整合的技术,例如申请号为“201410117927.3”、发明名称为“一种多路视频监控图像数据处理方法及系统”的专利公开了将多路图像拼接成一幅图像的技术方案。
上文中,所述安装于移动终端1的工具11以诸如应用程序(app)的方式安装于移动终端上,并以相应的应用图标的形式予以展示,或者app固化在一个芯片内插入移动终端,安装于移动终端1的工具11体现为该芯片。
第一触发指某种动作,该动作使发送单元发送对多个通信终端2采集的实时视频的整合的视频的请求。例如,它可以包括以下中的任一种:所述移动终端的开机;所述移动终端开机状态下所述工具的激活;所述移动终端开机状态下用户界面上的特定动作;所述移动终端开机状态下感测到的光线变强。其中,开机作为触发,就可以使得一开机就接收到整合的视频,用户不用激活工具,避免复杂操作。所述移动终端开机状态下所述工具的激活作为触发,好处是用户可以在开机之后再次决定是否要接收整合的视频,避免开机后自动激活但用户并不需要的情况。也可以通过所述移动终端开机状态下用户界面上的特定动作诸如点击、双击、长按等来进行第一触发,它的好处也是用户可以在开机之后再次决定是否要接收整合的视频,避免开机后自动激活但用户并不需要的情况。另外,还可以通过所述移动终端开机状态下感测到的光线变强进行第一触发,这样,实现例如用户从口袋里掏出移动终端使得移动终端感测到的光线变强而自动触发的有益效果,它不是开机作为触发,因为即使开机由于移动终端在用户的口袋里用户也不可能需要整合的视、音频,只要用户从口袋里掏出移动终端,它就自动开启整合视、音频的功能,避免了用户再开启整合功能的复杂操作。
所述第一触发还可以是其它方式,在此,对于所述工具的触发方式不作限定。
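As an editorial illustration only (not part of the original disclosure), the triggers listed above can be read as a simple dispatch problem. The Python sketch below assumes hypothetical names such as `send_integration_request` and an arbitrary light threshold; it merely shows how power-on, tool activation, a UI gesture, a voice command, or a rise in sensed ambient light might all funnel into one request for the integrated video.

```python
# Hypothetical sketch: dispatching any of the "first triggers" described above
# to a single request for the integrated video. Names and thresholds are assumptions.

LIGHT_THRESHOLD_LUX = 50.0  # assumed value for "sensed light becomes stronger"

class FirstTriggerDispatcher:
    def __init__(self, send_integration_request):
        # send_integration_request: callable that asks the server for the
        # integrated video of all bound communication terminals
        self.send = send_integration_request
        self.last_lux = None

    def on_power_on(self):
        self.send(reason="power_on")

    def on_tool_activated(self):
        self.send(reason="tool_activated")

    def on_ui_action(self, action):
        if action in ("tap", "double_tap", "long_press"):
            self.send(reason=f"ui_{action}")

    def on_voice_command(self, text):
        if "整合视频" in text or "show scene" in text.lower():
            self.send(reason="voice")

    def on_light_sample(self, lux):
        # trigger when light jumps from dark (e.g. in a pocket) to bright
        if self.last_lux is not None and self.last_lux < LIGHT_THRESHOLD_LUX <= lux:
            self.send(reason="light_increase")
        self.last_lux = lux
```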
所述移动终端1包括但不限于任何一种可与用户进行人机交互的通信 设备,在此不作限定。所述通信终端2包括但不限于任何一种可与用户通过触摸板、遥控设备、声控设备或键盘等进行人机交互的电子产品,例如计算机、平板电脑(PAD)等,本领域技术人员应能理解,其他设备如可适用于本发明,也应包含在本发明保护范围以内。
其中,通信终端2可以通过任何具有视频采集功能的装置(诸如摄像头)进行实时视频的采集,通信终端2可以通过任何具有音频采集功能的装置(诸如录音单元)进行实时音频的采集。所述通信终端2可以基于诸如传输控制协议(TCP)或用户数据报协议(UDP)等将实时采集的视音频实时或定时上传到相应的服务器,由服务器对多个通信终端2上传的视音频进行统一地整合处理。
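A minimal sketch, assuming a hypothetical TCP endpoint and a simple length-prefixed framing, of how a communication terminal 2 might push its encoded audio/video chunks to the server as described above; the actual transport, codec, and framing are not specified in this document.

```python
# Hypothetical sketch of a communication terminal pushing captured chunks to the
# server over TCP; the endpoint, header layout, and chunk source are assumptions.
import socket
import struct
import time

def upload_stream(frame_source, host="server.example", port=9000, terminal_id=2):
    """frame_source yields already-encoded audio/video chunks as bytes."""
    with socket.create_connection((host, port)) as sock:
        for chunk in frame_source:
            # header: terminal id (1 byte), payload length (4 bytes), timestamp (8 bytes)
            header = struct.pack("!BId", terminal_id, len(chunk), time.time())
            sock.sendall(header + chunk)
```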
实践中,所述多个通信终端2通常位于特定的场景,各个通信终端2通常负责采集特定场景的一部分实时视频,当各个通信终端2实时将所采集的各部分视音频信息上传到相应的服务器,由服务器对这些视音频进行整合,得到该特定场景的完整的实时视音频。当然,服务器可以对所述多个通信终端2中的一部分通信终端上传的视音频进行整合,也可以对全部所述多个通信终端上传的视音频进行整合。典型地,请参考图2,图2示出了根据本发明一个实施例的多个通信终端进行实时视音频采集的示意图。如图2所示,在一个长型场所,放置六台通信终端2,每个通信终端2负责采集该宴会场所的一定区域的视音频信息(由对应的视场决定),位置相邻或相近的通信终端2通常所采集的视音频存在交叉或重叠,例如,相邻的两个通信终端2同时拍摄到同一个人,或同时捕捉到多个人的发言,则服务器经过对相邻的两个通信终端2上传的包含同一个人的多个视频或同时捕捉到多个人的发言的多个音频进行整合处理,在该整合的视频画面中,仅包含这个人的整合之后的整体画面,而不是包含这个人的具有画面重叠部分的两个独立的画面;在该整合的音频中,仅包含捕捉到的多个人的整合之后的一份音频,而不是包含捕捉到的多个人的重叠的两份独立音频的叠加音频。在图2中,6个通信终端2分别捕捉到6个人p1-p6的视音频,每个通信终端捕捉到一个人的视音频。
特定的场景可以是大型会议场所、宴会场所等,还可以是其他需要多个通信终端进行现场的实时视音频采集的场所。
图3(a)示出了根据本发明一个实施例的由服务器整合后的六个通信终端拍摄的视频。假设被监视场景中的六个人p1-p6分别位于六个通信终端采集的视频6-1、6-2……6-6中,其中每个通信终端采集的视频部分在整合视频中称为“窗口”。如果将图3(a)中整个的整合视频显示在移动终端1的显示器上,会导致每个窗口太小,看不清人。因此,本发明的一个实施例允许在移动终端1的显示器180上只显示部分窗口。如图3(b)所示,在安装于移动终端1的工具11激活后移动终端1的显示器180上显示的初始画面中只包括窗口6-2和6-3,即人p2、p3所在窗口。
由于显示器180上显示的是两个窗口,如果移动终端的扬声器输出所有窗口中(即所有通信终端采集的声音)的话,持有移动终端1的用户就会发生困惑,因为有些声音来自窗口6-2和6-3这两个窗口以外的窗口,用户会不知道是否是这两个窗口中的人发出的声音。因此,有必要此时让用户仅听到这两个窗口中的人相关的声音。发送单元101此时知道在移动终端1的显示器上显示的视频对应着哪些通信终端(在本例中第2、3个通信终端),因此,它可以基于在移动终端1的显示器上显示的视频对应的、所述多个通信终端2中的第一通信终端集合(即第2、3个通信终端),发送对第一通信终端集合中的通信终端2采集的实时音频的整合的音频的请求,接收单元102接收第一通信终端集合中的通信终端2采集的实时音频的整合的音频,从而移动终端1的扬声器只输出所述来自第一通信终端集合的通信终端2(在本例中即第2、3个通信终端)采集的实时音频的整合的音频,而不是来自所有6个通信终端2采集的实时音频的整合的音频。
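A minimal sketch of the mapping just described, under the assumption that the stitched video lays the six windows out side by side with equal width; the window width, request payload, and `send` callback are hypothetical and not taken from the disclosure.

```python
# Hypothetical sketch: deriving the "first communication terminal set" from the
# viewport shown on the display, then requesting only that set's integrated audio.

def visible_terminal_set(viewport_left, viewport_right, window_width, num_windows):
    """Return 1-based indices of windows (terminals) overlapping the viewport."""
    terminals = []
    for i in range(num_windows):
        win_left, win_right = i * window_width, (i + 1) * window_width
        if win_right > viewport_left and win_left < viewport_right:
            terminals.append(i + 1)
    return terminals

def request_integrated_audio(send, viewport_left, viewport_right,
                             window_width=640, num_windows=6):
    first_set = visible_terminal_set(viewport_left, viewport_right,
                                     window_width, num_windows)
    send({"type": "integrated_audio_request", "terminals": first_set})
    return first_set

# Example: the display shows windows 6-2 and 6-3 of the stitched video.
# request_integrated_audio(send, viewport_left=640, viewport_right=1920)
# -> terminals [2, 3], so only their integrated audio is played back.
```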
应当理解,图1所示的框图仅仅是为了示例的目的,而不是对本发明范围的限制。在某些情况下,可以根据具体情况增加或减少某些单元或装置。
根据本发明的一个优选实施例的工具11,发送单元101还向第一通信终端集合中的通信终端2发起连接请求,并响应于第一通信终端集合中的通信终端2的自动应答,与第一通信终端集合中的通信终端2建立双向通信。对此可参考图4,图4示出了根据本发明一个优选实施例的移动终端与第一通信集合中的通信终端直接建立连接的示意图。由此,无需移动终端旁的用户手动选择待发起连接请求的对象,也无需在选定通信对象后手动启动通信连接请求。这样,移动终端旁的用户无需将当前播放的视频页面切换至向通信终端2发起连接请求的页面,因而使得移动终端旁用户可以在本移动终端与通信终端2建立通信的过程中无打扰地观看当前视频页面。例如,在图3(b)所示的显示器中显示窗口6-2、6-3,因此,发起向与窗口6-2、6-3相关(即拍摄了窗口6-2、6-3的视频)的第2、3个通信终端建立通信的连接请求。
根据本发明的一个优选实施例的工具11,还包括:缩放单元104,被配置为响应于用户对移动终端1的显示器上显示的视频的缩放操作,对移动终端1的显示器上显示的视频进行缩放,从而显示器上显示的视频对应的第一通信终端集合改变。如图3(c)所示,当用户看到图3(b)所示的窗口6-2、6-3的视频后仅想看窗口6-2的视频、听人p2的声音时,可以放大显示器上的画面,使显示器上只显示有人p2的窗口6-2,此时移动终端的扬声器只输出该窗口对应的通信终端采集的声音,因此,用户可以与人p2进行单独监视,可以只获得与p2有关的视、音频而不受其他人的干扰。
具体而言,缩放单元104可以响应于用户诸如双指移动或滑动的操作,对移动终端1当前显示的视频画面进行缩小或放大,当满足诸如视频画面的大小位于诸如根据该工具默认的或用户预先设定的视频画面大小的范围内等条件时,缩放后的视频对应的第一通信终端集合改变。
根据本发明的一个优选实施例的工具11,还包括:
滑动单元105,被配置为响应于用户对移动终端1的显示器上显示的视频的滑动操作,对移动终端1的显示器上显示的视频进行滑动,从而显示器上显示的视频对应的第一通信终端集合改变。如图3(d)所示,当用户看到图3(b)所示的窗口6-2、6-3的视频后想看p3的右边还有谁,可以向右滑动窗口,此时取代窗口6-2、6-3,窗口6-3、6-4显示在显示器上。此时,用户可以获得与人p3、p4有关的视、音频,取代与人p2、p3有关的视、音频。
具体而言,滑动单元105可以响应于用户诸如拖动、长按滑动、仅滑动等操作,对移动终端1的显示器上当前显示的视频进行滑动,当满足诸如滑动的距离超过一定的阈值等条件时,滑动后的视频对应的第一通信终端集合改变。
当然,在上文中,用户可以同时缩放和滑动当前显示的视频画面,也可以先缩放后滑动当前显示的视频画面,还可以先滑动后缩放当前显示的视频画面,则第一通信终端集合进行相应的改变。
根据本发明的一个优选实施例的工具11,发送单元101响应于接收到针对所述特定场景中特定人的选择,发送对所述多个通信终端2采集的实时视、音频中涉及所述特定人的实时视、音频的整合的视音频的请求,接收单元102接收第一通信终端集合中的通信终端2采集的实时音频的整合的音频。
具体地,所述接收到对所述特定场景中特定人的选择可以通过诸如以下的方式进行:例如,工具11识别出当前播放视频或接收到的视频中包含特定人的画面,将所识别出的特定人头像圈出以菜单的形式提供给用户进行选择;又如,通过响应于用户对在移动终端1的显示器上显示的视频中特定人的点击、双击等操作或接收到用户说出特定人的名字的音频等。如图3(e)所示,用户仅想知道人p2和p5在干什么,听到p2和p5在说什么,就直接说出p2和p5的名字,工具11通过语音识别从而识别出p2和p5,向服务器发送对p2和p5的视、音频的整合的视音频的请求。服务器识别出与窗口6-2、6-5相关联的第二通信终端、第五通信终端分别采集了p2、p5的视音频,将第二通信终端、第五通信终端采集的视频及音频分别整合,发送给工具11的接收单元102。这样,在移动终端的显示器上出现了图3(e)所示的整合后的窗口p2、p5,并且移动终端的扬声器输出的也是与窗口p2、p5对应的音频,达到了用户仅看到自己感兴趣的人的视频、听到自己感兴趣的人的音频的效果。
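A small sketch of the client-side handling of such a spoken selection, assuming a hypothetical speech transcript and request format; it only illustrates turning recognized names (here p2 and p5) into a request for the integrated audio/video involving those people.

```python
# Hypothetical sketch: mapping a spoken selection to a person-specific A/V request.
KNOWN_NAMES = {"p1", "p2", "p3", "p4", "p5", "p6"}

def handle_voice_selection(transcript, send):
    """Find known names mentioned in the transcript and request their integrated A/V."""
    selected = sorted(name for name in KNOWN_NAMES if name in transcript)
    if selected:
        send({"type": "person_integrated_av_request", "persons": selected})
    return selected

# handle_voice_selection("show me p2 and p5", send) -> ["p2", "p5"]
```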
其中,所述工具11在识别当前播放视频或接收到的视频中包含特定人的画面的情况下,可以预先将特定人的人脸的模式和/或声音频率存储在存储器中,当接收到的视音频或当前播放的视音频中存在特定人的人脸的模式匹配或/和存在特定人的声音频率的匹配,则将特定人的头像从视频画面中截取并圈出,提供给用户进行选择。当然,所述工具也可以采用自学习的方法来识别包含特定人的画面的视频或/和音频。例如, 如果接收到的视音频中频繁出现某个人的画面或/和某个人的声音频率,则可以在移动终端1的显示器上显示提示,提示的内容为识别出特定人,请移动终端1旁的用户判断并命名,如果移动终端旁的用户发现识别错误,则在显示器上输入反馈信息返回至该工具,在下一次识别中该工具根据历史反馈信息进行相应地纠正。在自学习的方式下,可以不预先将特定人的人脸的模式或/和声音频率存储在存储器中。
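A sketch of the matching step described above, under the assumption that face embeddings have already been extracted elsewhere and that stored reference patterns are available; the embedding size, threshold, and helper names are illustrative only and not taken from the disclosure.

```python
# Hypothetical sketch of matching pre-stored face patterns against features
# extracted from incoming streams; embedding extraction is assumed to exist elsewhere.
import numpy as np

KNOWN_PEOPLE = {                  # name -> reference embedding (placeholder; would be precomputed)
    "p2": np.random.rand(128),
    "p5": np.random.rand(128),
}
MATCH_THRESHOLD = 0.8             # assumed cosine-similarity threshold

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def identify(face_embedding):
    """Return the best-matching known person, or None if nobody clears the threshold."""
    best_name, best_score = None, MATCH_THRESHOLD
    for name, ref in KNOWN_PEOPLE.items():
        score = cosine(face_embedding, ref)
        if score > best_score:
            best_name, best_score = name, score
    return best_name

def terminals_with_person(stream_embeddings, person):
    """stream_embeddings: {terminal_id: [face embeddings seen in that stream]}."""
    return [tid for tid, faces in stream_embeddings.items()
            if any(identify(f) == person for f in faces)]
```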
当用户做出选择后,发送单元101响应于接收到针对所述特定场景中特定人的选择,发送对所述多个通信终端2采集的实时视音频中涉及所述特定人的实时视音频的整合的视音频的请求并由接收单元102接收相应的整合的音频。其中,通信终端2可以基于人脸识别、身高识别、声音识别中的一个或多个来识别特定人。
根据本发明的一个优选实施例的工具11,发送单元101响应于接收到针对所述特定场景中特定人的选择,向采集了涉及所述特定人的实时视音频的通信终端2发起连接请求,并响应于采集了涉及所述特定人的实时视音频的通信终端2的自动应答,与采集了涉及所述特定人的实时视音频的通信终端2建立通信。这样,携带移动终端1的用户就不只是按照自己的意愿看到希望的人的视频、听到希望的人的音频而已,希望的人也看到了自己的视频,听到了自己的音频,即实现了与希望的人的双向通信。
具体地,发送单元101还可以向采集了涉及所述特定人的实时视音频的通信终端2发起连接请求,由此直接在移动终端1和通信终端2之间建立通信,以便于移动终端1直接与特定的一个或多个通信终端2进行实时通信,互相获取对方的实时视音频。
当然,上述移动终端1可以为一个或多个,当移动终端1为多个时,各移动终端1之间可以是相互关联的,也可以是相互独立的。
根据本发明的另一个方面的一个实施例,提供了一种智能整合实时音视频的服务器3。请参考图5,图5示出了根据本发明一个实施例的智能整合实时音视频的服务器的示意性框图。根据图5,所述服务器包括:
视、音频接收装置301,被配置为接收来自多个通信终端2的实时 视音频、来自移动终端1的对所述多个通信终端采集的实时视频的整合的视频的请求、来自移动终端1的对所述多个通信终端2中第一通信终端集合中的通信终端2采集的实时音频的整合的音频的请求;
视音频整合装置302,被配置为响应于来自移动终端1的对所述多个通信终端2采集的实时视频的整合的视频的请求,对所述多个通信终端2采集的实时视频进行整合,并响应于来自移动终端1的对所述多个通信终端2中第一通信终端集合中的通信终端2采集的实时音频的整合的音频的请求,对所述多个通信终端2中第一通信终端集合中的通信终端2采集的实时音频进行整合;
视音频发送装置303,被配置为将整合的视频或/和整合的音频发送到移动终端1。
其中,所述服务器3可以包括但不限于单个网络服务器、多个网络服务器集或多个服务器构成的云。该服务器3一方面接收来自多个通信终端2实时或及时上传的视音频,一方面还可以接收来自移动终端1的对所述多个通信终端2采集的实时视频或/和实时音频的整合后的视频或/和音频,根据所接收到的对实时视频或/和实时音频的整合的请求,对相应的视频或/和音频进行整合并将整合后的视频或/和音频发送至移动终端1。
根据本发明的一个实施例,所述服务器3还包括:通信建立单元305,被配置为响应于接收到来自移动终端1的向所述第一通信终端集合中的通信终端2的连接请求,向所述第一通信终端集合中的通信终端2转发该连接请求,并响应于第一通信终端集合中的通信终端2的自动应答,在移动终端1和第一通信终端集合中的通信终端2间建立双向通信。
在该实施例中,所述服务器还可以作为通信中转站,在移动终端1和通信终端2之间建立通信。请参考图6,图6示出了根据本发明一个优选实施例的基于服务器在移动终端和通信终端之间建立通信的示意图。具体而言,服务器3接收到移动终端1的向第一通信终端集合中的通信终端或向特定的一个或多个通信终端发出的连接请求,则根据接收到的连接请求,向目标通信终端转发该连接请求,收到目标通信终端的自动 应答后,与移动终端1和目标通信终端2建立双向通信连接。
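A minimal asyncio sketch of the relay behaviour described above: forward the connection request, wait for the terminal's automatic answer, then pump bytes both ways. The message strings and transport are assumptions, since the document does not specify a signalling protocol.

```python
# Hypothetical relay sketch: the server bridges a mobile terminal and a target
# communication terminal after the terminal auto-answers the forwarded request.
import asyncio

async def relay(mobile_reader, mobile_writer, terminal_host, terminal_port):
    term_reader, term_writer = await asyncio.open_connection(terminal_host, terminal_port)

    term_writer.write(b"CONNECT_REQUEST\n")        # forward the connection request
    await term_writer.drain()
    answer = await term_reader.readline()          # terminal auto-answers
    if answer.strip() != b"AUTO_ANSWER_OK":
        mobile_writer.close()
        return

    async def pump(src, dst):                      # copy bytes in one direction
        while data := await src.read(4096):
            dst.write(data)
            await dst.drain()

    # two-way communication: run both directions until either side closes
    await asyncio.gather(pump(mobile_reader, term_writer),
                         pump(term_reader, mobile_writer))
```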
请参考图7,图7示出了根据本发明一个实施例的视音频整合装置的示意性框图。根据本发明的一个实施例,所述视、音频整合装置302包括:
视频画面比对模块3021,被配置为将所述多个通信终端2采集的实时视频进行实时比对,确定所述多个通信终端2采集的实时视频之间的重叠部分;
重叠部分消除模块3022,被配置为消除所述多个通信终端2采集的实时视频之间的重叠部分,从而对所述多个通信终端2采集的实时视频进行整合。
具体而言,由于多个通信终端2中的每个通信终端通常负责采集特定场景的一部分音视频,由于采集的视频通常都是广角拍摄的,而为了采集特定场景的所有视角的视频,相邻或相近位置的通信终端采集的音视频通常存在重叠部分,而为了将多个通信终端采集的视频整合成一整幅完整的、无整合痕迹的、看上去由一个具有无限视场的通信终端采集的视频,需要对多个通信终端采集的视音频中重叠的部分予以消除,仅保留一份对相同场景采集的视音频。而为了将整合的视频实时发送至移动终端,需要对多个通信终端2采集的实时视频进行实时比对,以确定并消除其中重叠的视频画面。
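A deliberately simplified NumPy sketch of the compare-and-eliminate idea described above: estimate how many pixel columns two neighbouring frames share, keep one copy of the overlap, and concatenate. A production system would use feature matching, warping, and blending; nothing below is claimed to be the patented method.

```python
# Hypothetical sketch of overlap estimation and elimination between frames captured
# by adjacent terminals; assumes same-height frames ordered left to right.
import numpy as np

def estimate_overlap(left_frame, right_frame, max_overlap=200):
    """Return the overlap width (in columns) minimising mean absolute difference."""
    best_w, best_err = 0, float("inf")
    for w in range(10, max_overlap):
        err = np.mean(np.abs(left_frame[:, -w:].astype(np.float32) -
                             right_frame[:, :w].astype(np.float32)))
        if err < best_err:
            best_w, best_err = w, err
    return best_w

def stitch(left_frame, right_frame):
    w = estimate_overlap(left_frame, right_frame)
    return np.hstack([left_frame, right_frame[:, w:]])  # drop the duplicated columns

def stitch_all(frames):
    """Integrate frames from terminals ordered left to right into one wide frame."""
    panorama = frames[0]
    for frame in frames[1:]:
        panorama = stitch(panorama, frame)
    return panorama
```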
根据本发明的一个实施例,所述服务器3还包括:识别装置304,响应于接收到来自移动终端1的对所述多个通信终端2采集的实时视音频中涉及所述特定人的实时视音频的整合的视音频的请求,识别所述多个通信终端2采集的视音频中涉及所述特定人的实时视音频,并且
所述视音频整合装置302整合所述涉及所述特定人的实时视、音频,
所述视音频发送装置303向移动终端1发送整合的所述涉及所述特定人的实时视、音频。
其中,服务器3也可以通过预先将特定人的人脸模式和/或声音频率存储在存储器或自学习等方式来识别所接收的来自多个通信终端2采 集的视音频中涉及特定人的实时视音频,并对所识别出的实时视音频从所接收的所有视音频中筛选并进行整合,并发送给移动终端1。
根据本发明的一个实施例,所述服务器3还包括:识别装置304,响应于接收到来自移动终端1的向采集了涉及所述特定人的实时视音频的通信终端2的连接请求,识别所述多个通信终端2采集的实时视音频中涉及所述特定人的实时视音频,从而识别采集了涉及所述特定人的实时视音频的通信终端2,并且
所述通信建立单元305向采集了涉及所述特定人的实时视、音频的通信终端2转发连接请求,并响应于采集了涉及所述特定人的实时视、音频的通信终端2的自动应答,在移动终端1和采集了涉及所述特定人的实时视、音频的通信终端2之间建立双向通信。
在该实施例中,所述服务器3同样作为通信中转站,接收到来自移动终端1的向采集了涉及特定人的实时视音频的通信终端2的连接请求,在所述移动终端1和所述涉及特定人的实时视音频的通信终端2之间建立双向通信连接。
所属技术领域的技术人员知道,本发明可以实现为设备、装置、方法或计算机程序产品。因此,本公开可以具体实现为以下形式,即:可以是完全的硬件,也可以是完全的软件,还可以是硬件和软件结合的形式。
附图中的流程图和框图显示了根据本发明的多个实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段或代码的一部分,所述模块、程序段或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个连续的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组 合来实现。
对于本领域技术人员而言,显然本发明不限于上述示范性实施例的细节,而且在不背离本发明的精神或基本特征的情况下,能够以其他的具体形式实现本发明。因此,无论从哪一点来看,均应将实施例看作是示范性的,而且是非限制性的,本发明的范围由所附权利要求而不是上述说明限定,因此旨在将落在权利要求的等同要件的含义和范围内的所有变化囊括在本发明内。不应将权利要求中的任何附图标记视为限制所涉及的权利要求。

Claims (14)

  1. 一种安装于移动终端(1)的工具(11),包括:
    发送单元(101),被配置为响应于第一触发,发送对多个通信终端(2)采集的实时视频的整合的视频的请求,其中所述多个通信终端(2)分别采集特定场景的一部分的实时视频,所述多个通信终端(2)分别采集的实时视频整合后构成所述特定场景的实时视频;
    接收单元(102),被配置为接收所述多个通信终端(2)采集的实时视频的整合的视频,
    其中,发送单元(101)基于在移动终端(1)的显示器上显示的视频对应的、所述多个通信终端(2)中的第一通信终端集合,发送对第一通信终端集合中的通信终端(2)采集的实时音频的整合的音频的请求,接收单元(102)接收第一通信终端集合中的通信终端(2)采集的实时音频的整合的音频,其中在移动终端(1)的显示器上显示的视频是所述多个通信终端(2)采集的实时视频的整合的视频的一部分。
  2. 根据权利要求1所述的工具(11),还包括:配置单元(103),用于接收用户对所述多个通信终端(2)采集的视音频进行整合的配置。
  3. 根据权利要求1所述的工具(11),其中发送单元(101)还向第一通信终端集合中的通信终端(2)发起连接请求,并响应于第一通信终端集合中的通信终端(2)的自动应答,与第一通信终端集合中的通信终端(2)建立双向通信。
  4. 根据权利要求1所述的工具(11),还包括:
    缩放单元(104),被配置为响应于用户对移动终端(1)的显示器上显示的视频的缩放操作,对移动终端(1)的显示器上显示的视频进行缩放,从而显示器上显示的视频对应的第一通信终端集合改变。
  5. 根据权利要求1所述的工具(11),还包括:
    滑动单元(105),被配置为响应于用户对移动终端(1)的显示器上显示的视频的滑动操作,对移动终端(1)的显示器上显示的视频进行滑动,从而显示器上显示的视频对应的第一通信终端集合改变。
  6. 根据权利要求1所述的工具(11),其中所述第一触发包括以下中的任一种:
    所述移动终端的开机;
    所述移动终端开机状态下所述工具的激活;
    所述移动终端开机状态下用户界面上的特定动作;
    所述移动终端开机状态下接收到的特定语音;
    所述移动终端开机状态下感测到的光线变强。
  7. 根据权利要求1所述的工具(11),其中发送单元(101)响应于接收到针对所述特定场景中特定人的选择,发送对所述多个通信终端(2)采集的实时视、音频中涉及所述特定人的实时视、音频的整合的视、音频的请求,接收单元(102)接收所述多个通信终端(2)采集的实时视、音频中涉及所述特定人的实时视、音频的整合的视、音频。
  8. 根据权利要求1所述的工具(11),其中发送单元(101)响应于接收到针对所述特定场景中特定人的选择,向采集了涉及所述特定人的实时视、音频的通信终端(2)发起连接请求,并响应于采集了涉及所述特定人的实时视、音频的通信终端(2)的自动应答,与采集了涉及所述特定人的实时视、音频的通信终端(2)建立双向通信。
  9. 根据权利要求1所述的工具(11),其中针对所述特定场景中特定人的选择是对在移动终端(1)的显示器上显示的视频中特定人的点击或说出特定人的名字。
  10. 一种智能整合实时音视频的服务器(3),包括:
    视、音频接收装置(301),被配置为接收来自多个通信终端(2)的实时视、音频、来自移动终端(1)的对所述多个通信终端(2)采集的实时视频的整合的视频的请求、来自移动终端(1)的对所述多个通信终端(2)中第一通信终端集合中的通信终端(2)采集的实时音频的整合的音频的请求;
    视、音频整合装置(302),被配置为响应于来自移动终端(1)的对所述多个通信终端(2)采集的实时视频的整合的视频的请求,对所述多个通信终端(2)采集的实时视频进行整合,并响应于来自移动终端(1)的对所述多个通信终端(2)中第一通信终端集合中的通信终端(2)采集的实时音 频的整合的音频的请求,对所述多个通信终端(2)中第一通信终端集合中的通信终端(2)采集的实时音频进行整合;
    视、音频发送装置(303),被配置为将整合的视频或/和整合的音频发送到移动终端(1)。
  11. 根据权利要求10所述的服务器(3),还包括:通信建立单元(305),被配置为响应于接收到来自移动终端(1)的向所述第一通信终端集合中的通信终端(2)的连接请求,向所述第一通信终端集合中的通信终端(2)转发该连接请求,并响应于第一通信终端集合中的通信终端(2)的自动应答,在移动终端(1)和第一通信终端集合中的通信终端(2)间建立双向通信。
  12. 根据权利要求10所述的服务器(3),其中视、音频整合装置(302)包括:
    视频画面比对模块(3021),被配置为将所述多个通信终端(2)采集的实时视频进行实时对比,确定所述多个通信终端(2)采集的实时视频之间的重叠部分;
    重叠部分消除模块(3022),被配置为消除所述多个通信终端(2)采集的实时视频之间的重叠部分,从而对所述多个通信终端(2)采集的实时视频进行整合。
  13. 根据权利要求10所述的服务器(3),还包括:识别装置(304),响应于接收到来自移动终端(1)的对所述多个通信终端(2)采集的实时视、音频中涉及所述特定人的实时视、音频的整合的视、音频的请求,识别所述多个通信终端(2)采集的实时视、音频中涉及所述特定人的实时视、音频,并且
    所述视、音频整合装置(302)整合所述涉及所述特定人的实时视、音频,
    所述视、音频发送装置(303)向移动终端(1)发送整合的所述涉及所述特定人的实时视、音频。
  14. 根据权利要求11所述的服务器(3),还包括:识别装置(304),响应于接收到来自移动终端(1)的向采集了涉及所述特定人的实时视、音频的通信终端(2)的连接请求,识别所述多个通信终端(2)采集的实时视、 音频中涉及所述特定人的实时视、音频,从而识别采集了涉及所述特定人的实时视、音频的通信终端(2),并且
    所述通信建立单元(305)向采集了涉及所述特定人的实时视、音频的通信终端(2)转发连接请求,并响应于采集了涉及所述特定人的实时视、音频的通信终端(2)的自动应答,在移动终端(1)和采集了涉及所述特定人的实时视、音频的通信终端(2)之间建立双向通信。
PCT/CN2014/086576 2014-07-15 2014-09-15 一种移动终端的工具及智能整合音视频的服务器 WO2016008209A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/326,248 US10349008B2 (en) 2014-07-15 2014-09-15 Tool of mobile terminal and intelligent audio-video integration server

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410337180.2 2014-07-15
CN201410337180.2A CN104135641B (zh) 2014-07-15 2014-07-15 一种移动终端的工具及智能整合音视频的服务器

Publications (1)

Publication Number Publication Date
WO2016008209A1 true WO2016008209A1 (zh) 2016-01-21

Family

ID=51808153

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/086576 WO2016008209A1 (zh) 2014-07-15 2014-09-15 一种移动终端的工具及智能整合音视频的服务器

Country Status (3)

Country Link
US (1) US10349008B2 (zh)
CN (1) CN104135641B (zh)
WO (1) WO2016008209A1 (zh)

Families Citing this family (9)

Publication number Priority date Publication date Assignee Title
CN104135641B (zh) * 2014-07-15 2018-10-02 北京小鱼在家科技有限公司 一种移动终端的工具及智能整合音视频的服务器
CN105959614A (zh) * 2016-06-21 2016-09-21 维沃移动通信有限公司 一种视频会议的处理方法及系统
CN106331838A (zh) * 2016-08-25 2017-01-11 刘华英 多媒体播放日志的管理方法和装置
CN108933914B (zh) * 2017-05-24 2021-09-28 中兴通讯股份有限公司 一种使用移动终端进行视频会议的方法及系统
CN108833820B (zh) * 2018-05-29 2021-03-12 Oppo广东移动通信有限公司 视频通话方法及相关产品
CN109215688B (zh) * 2018-10-10 2020-12-22 麦片科技(深圳)有限公司 同场景音频处理方法、装置、计算机可读存储介质及系统
CN109600628A (zh) * 2018-12-21 2019-04-09 广州酷狗计算机科技有限公司 视频制作方法、装置、计算机设备及存储介质
EP4171022B1 (en) * 2021-10-22 2023-11-29 Axis AB Method and system for transmitting a video stream
CN116886856B (zh) * 2023-09-08 2023-12-15 湖北华中电力科技开发有限责任公司 基于视频通讯的电力应急会商方法及系统

Citations (2)

Publication number Priority date Publication date Assignee Title
CN101867769A (zh) * 2010-05-26 2010-10-20 广东亿迅科技有限公司 一种网络多媒体通信方法与系统
CN102892032A (zh) * 2012-11-02 2013-01-23 湖南正海智慧网真设备有限公司 实时互动高清网络视频通讯系统

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
US20050015444A1 (en) * 2003-07-15 2005-01-20 Darwin Rambo Audio/video conferencing system
US7453829B2 (en) * 2003-10-29 2008-11-18 Tut Systems, Inc. Method for conducting a video conference
EA200702509A1 (ru) * 2005-05-13 2008-06-30 КЭПЧЕ-КЭМ АйПи ПиТиУай ЛТД. Способ и система для передачи видеосигнала на мобильный терминал
US8181115B2 (en) * 2008-02-11 2012-05-15 Dialogic Corporation System and method for performing video collaboration
JP2009192949A (ja) * 2008-02-15 2009-08-27 Sony Corp 画像処理装置と画像処理方法および画像処理システム
CN101534413B (zh) * 2009-04-14 2012-07-04 华为终端有限公司 一种远程呈现的系统、装置和方法
US20110088068A1 (en) * 2009-10-09 2011-04-14 Sony Ericsson Mobile Communications Ab Live media stream selection on a mobile device
US9152303B2 (en) * 2012-03-01 2015-10-06 Harris Corporation Systems and methods for efficient video analysis
CN104135641B (zh) * 2014-07-15 2018-10-02 北京小鱼在家科技有限公司 一种移动终端的工具及智能整合音视频的服务器

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN101867769A (zh) * 2010-05-26 2010-10-20 广东亿迅科技有限公司 一种网络多媒体通信方法与系统
CN102892032A (zh) * 2012-11-02 2013-01-23 湖南正海智慧网真设备有限公司 实时互动高清网络视频通讯系统

Also Published As

Publication number Publication date
US10349008B2 (en) 2019-07-09
CN104135641B (zh) 2018-10-02
CN104135641A (zh) 2014-11-05
US20180176507A1 (en) 2018-06-21

Similar Documents

Publication Publication Date Title
WO2016008209A1 (zh) 一种移动终端的工具及智能整合音视频的服务器
US9641585B2 (en) Automated video editing based on activity in video conference
US9912907B2 (en) Dynamic video and sound adjustment in a video conference
JP6117446B2 (ja) リアルタイム・ビデオの提供方法、リアルタイム・ビデオの提供装置、サーバ、端末装置、プログラム及び記録媒体
US10057542B2 (en) System for immersive telepresence
US9473741B2 (en) Teleconference system and teleconference terminal
US11115227B2 (en) Terminal and method for bidirectional live sharing and smart monitoring
JP6179834B1 (ja) テレビ会議装置
KR20170023699A (ko) 동영상 촬영 방법, 그 장치, 프로그램 및 기록매체
US11076127B1 (en) System and method for automatically framing conversations in a meeting or a video conference
WO2011109578A1 (en) Digital conferencing for mobile devices
WO2012072008A1 (zh) 视频信号的辅助信息叠加方法及装置
JP2018519679A (ja) ビデオ処理方法、装置、プログラム及び記録媒体
WO2017036616A1 (en) Apparatus for video communication
CN111988555B (zh) 一种数据处理方法、装置、设备和机器可读介质
CN113093578A (zh) 控制方法及装置、电子设备和存储介质
WO2015131577A1 (zh) 一种远程协助方法、装置和计算机存储介质
CN108924529B (zh) 图像显示的控制方法及装置
CN109218612B (zh) 一种追踪拍摄系统及拍摄方法
JP2010004480A (ja) 撮像装置、その制御方法及びプログラム
JP6544209B2 (ja) 情報処理装置、会議システム、情報処理方法およびプログラム
CN111526295A (zh) 音视频处理系统、采集方法、装置、设备及存储介质
CN217546174U (zh) 智能会议系统
CN109194918B (zh) 一种基于移动载体的拍摄系统
CN112437279B (zh) 视频分析方法及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14897668

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 03/05/2017)

WWE Wipo information: entry into national phase

Ref document number: 15326248

Country of ref document: US

122 Ep: pct application non-entry in european phase

Ref document number: 14897668

Country of ref document: EP

Kind code of ref document: A1