CN112929592A

CN112929592A - Video call method, display device and server

Info

Publication number: CN112929592A
Application number: CN202110168641.8A
Authority: CN
Inventors: 李鑫
Original assignee: Qingdao Hisense Media Network Technology Co Ltd
Current assignee: Qingdao Hisense Media Network Technology Co Ltd; Juhaokan Technology Co Ltd
Priority date: 2021-02-07
Filing date: 2021-02-07
Publication date: 2021-06-08

Abstract

The invention discloses a video call method, a display device and a server. After a video call request for a target object has been sent to the server, the display device, in response to receiving first connected state information sent by the server, obtains from the server second audio and video data of the target object and first audio and video data of the initiator user. The server sends the first connected state information after the peer device has connected the video call and the server has found that the display device lacks video and/or audio acquisition capability. The server then obtains a first target terminal capable of making up the acquisition capability of the display device, and receives the first audio and video data collected and uploaded by the first target terminal as well as the second audio and video data. The display device displays the video data of the first and second audio and video data in different windows and plays the audio data of both. In this way, an effective two-way video call between the users at both ends can be achieved even when the display device lacks acquisition capability.

Description

Video call method, display device and server

Technical Field

The invention relates to the field of intelligent interaction, and in particular to a video call method, a display device and a server.

Background

A communication application can be installed on a display device such as a television. After a video call is started, the local user can join a virtual room and video-chat with friends in that room on the large screen. However, this is limited by the hardware configuration and model of the display device: many display devices currently have no camera or microphone, so they cannot capture the local user's audio and video data and upload it to the server, and the friends at the peer end therefore cannot obtain the local user's audio and video data from the server.

In everyday life, a user may own one or more old mobile phones. Such phones are generally equipped with basic hardware such as a camera and a microphone, so they have audio and video acquisition capability and basic processing capability, but they sit idle because they can no longer meet the user's growing requirements in other respects. Taking into account the characteristics and shortcomings of both the display device and the old mobile phone, the present method and system use an idle old mobile phone to make up for the display device's inability to collect audio and video data, thereby enabling an effective video call and improving the interactive experience.

Disclosure of Invention

To solve the technical problems discussed in the background, the invention provides a video call method, a display device and a server that combine the functions of the display device with those of the user's idle old mobile phone, thereby improving the interactive experience of video calls between users.

The display device provided in the first aspect, which corresponds to a device side that initiates a video call request, includes:

a display for playing video data;

a sound player for playing audio data;

a communicator for communicatively connecting the display device with the server;

a controller for performing:

after a video call request for a target object has been sent to the server, if the display device does not have video and/or audio acquisition capability, obtaining from the server, in response to receiving first connected state information sent by the server, second audio and video data of the target object and first audio and video data of the initiator user. The server sends the first connected state information after the peer device used by the target object has connected the video call and the server has found that the display device lacks video and/or audio acquisition capability; the server then obtains a first target terminal capable of making up the acquisition capability of the display device, receives the first audio and video data that the first target terminal collects and uploads according to first control information sent by the server, and receives the second audio and video data of the target object. The first control information instructs the first target terminal to control its acquisition device to collect the first audio and video data;

and controlling the display to show the video data of the first audio and video data and of the second audio and video data in different windows, and controlling the sound player to play the audio data of the first and second audio and video data.
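The initiator-side behaviour described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: `Stream`, `Playback`, `on_first_connected_state` and the `pull` callback are hypothetical names, and real stream pulling, decoding and rendering are reduced to placeholder byte strings.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Stream:
    """Hypothetical container for one pulled audio/video stream."""
    video: bytes
    audio: bytes


@dataclass
class Playback:
    """What the display device is currently rendering."""
    windows: dict = field(default_factory=dict)       # window name -> video data
    audio_tracks: list = field(default_factory=list)  # audio data being played


def on_first_connected_state(
    has_video_capture: bool,
    has_audio_capture: bool,
    pull: Callable[[str], Stream],
) -> Optional[Playback]:
    """React to the server's first connected state information.

    `pull(owner)` is a hypothetical callback that fetches a stream from the server.
    """
    if has_video_capture and has_audio_capture:
        # Full capability: the device captures and pushes its own stream,
        # so the helper-terminal path does not apply.
        return None
    first = pull("initiator")  # collected and uploaded by the first target terminal
    second = pull("target")    # collected at the target object's end
    playback = Playback()
    # Video of each party goes to its own window.
    playback.windows["initiator"] = first.video
    playback.windows["target"] = second.video
    # Audio data of both streams is played.
    playback.audio_tracks.extend([first.audio, second.audio])
    return playback
```

Passing a dictionary lookup such as `streams.__getitem__` as `pull` is enough to exercise the logic without a server.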

The display device provided in the second aspect, corresponding to the device side invited to access the video call, includes:

a display for playing video data;

a sound player for playing audio data;

a controller for performing:

when a video call request sent by the server is received, controlling the display to show an incoming call prompt interface;

receiving a click on the answer control in the incoming call prompt interface and, if the display device does not have video and/or audio acquisition capability, obtaining from the server, in response to receiving indication information sent by the server, first audio and video data of the initiator user and second audio and video data collected by a second target terminal. The server sends the indication information after the display device has connected the video call and the server has found that the display device lacks video and/or audio acquisition capability; the server then obtains a second target terminal capable of making up the acquisition capability of the display device, receives the second audio and video data that the second target terminal collects and uploads according to second control information sent by the server, and receives the first audio and video data of the initiator user. The second control information instructs the second target terminal to control its acquisition device to collect the second audio and video data;
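The invited side's sequence (show the prompt, accept the answer click, then pull both streams once the server's indication information arrives) can be modelled as a small state machine. The sketch below is illustrative only; the state names, event strings and payload keys are assumptions that do not come from the source.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RINGING = auto()   # incoming call prompt interface is shown
    ANSWERED = auto()  # answer control clicked, waiting for the server
    IN_CALL = auto()   # both streams are being pulled


class InvitedDevice:
    """Hypothetical model of the display device invited to the video call."""

    def __init__(self, can_capture: bool) -> None:
        self.can_capture = can_capture
        self.state = State.IDLE
        self.pulled: list = []

    def on_event(self, event: str, payload: dict = None) -> State:
        if self.state is State.IDLE and event == "call_request":
            self.state = State.RINGING
        elif self.state is State.RINGING and event == "answer_clicked":
            self.state = State.ANSWERED
        elif (self.state is State.ANSWERED and event == "indication"
              and not self.can_capture):
            # Pull the initiator's stream and the one uploaded by the
            # second target terminal (hypothetical payload keys).
            self.pulled = [payload["initiator_stream"],
                           payload["second_terminal_stream"]]
            self.state = State.IN_CALL
        # Any other event leaves the state unchanged.
        return self.state
```

Events arriving out of order, such as indication information before the answer click, are simply ignored in this sketch.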

A third aspect provides a server comprising:

a communicator for communicatively connecting the server with a first display device of an initiator user and a second display device of a target object;

a controller for performing:

receiving a video call request initiated by the first display device to the target object, and forwarding the video call request to the second display device so that the second display device displays an incoming call prompt interface;

receiving call answering information sent by the second display device, and querying the device capability levels of the first display device and the second display device respectively;

if at least one target device among the first display device and the second display device does not have video and/or audio acquisition capability, traversing the device list to which the target device belongs, searching the device list for a target terminal capable of making up the acquisition capability of the target device, and sending control information to the target terminal, the control information instructing the target terminal to collect audio and video data of the end user to whom the target terminal belongs;
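The device-list traversal performed by the server can be sketched as follows, assuming (hypothetically) that a capability level is represented as a set of capability flags and the device list as an ordered list of `(terminal_id, capabilities)` pairs. This version returns control information for the first terminal that covers every missing acquisition capability; the flag names, the message format and the first-match rule are all assumptions, since the source does not specify them.

```python
from typing import Iterable, Optional, Set, Tuple

# Acquisition capabilities a display device may lack (hypothetical flag names).
CAPTURE_FLAGS = frozenset({"audio_capture", "video_capture"})


def find_compensating_terminal(
    device_caps: Set[str],
    device_list: Iterable[Tuple[str, Set[str]]],
) -> Optional[dict]:
    """Return control information for the first terminal covering all missing capture."""
    missing = CAPTURE_FLAGS - device_caps
    if not missing:
        return None  # the target device can capture by itself
    for terminal_id, caps in device_list:  # traverse in device-list order
        if missing <= caps:
            # Control information instructing the terminal to start capturing.
            return {"to": terminal_id, "action": "capture", "media": sorted(missing)}
    return None  # no terminal in the list can make up the missing capability
```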

when the second audio and video data of the target object and the first audio and video data of the initiator user have been received, sending pull-stream prompt information to the first display device and the second display device respectively, the pull-stream prompt information prompting the first display device and the second display device to start receiving the audio and video data they each require.
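One way to realize the rule that pull-stream prompts are sent only after both streams have arrived is a small gate object on the server. The class, the device identifiers and the message shape below are assumptions for illustration, not the source's implementation.

```python
from typing import Callable, Set


class PullStreamGate:
    """Hypothetical server-side helper: send pull-stream prompts once both streams exist."""

    REQUIRED = {"first", "second"}  # first: initiator user, second: target object

    def __init__(self, notify: Callable[[str, dict], None]) -> None:
        self.notify = notify  # notify(device_id, message)
        self.received: Set[str] = set()
        self.prompted = False

    def on_stream_received(self, which: str) -> None:
        self.received.add(which)
        if not self.prompted and self.REQUIRED <= self.received:
            self.prompted = True  # prompt each display device only once
            for device_id in ("first_display", "second_display"):
                self.notify(device_id, {"type": "pull_stream"})
```

Guarding with `prompted` keeps a re-uploaded stream from triggering duplicate prompts.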

The video call method provided in the fourth aspect is applied to a display device that initiates a video call request, and includes:

after a video call request for a target object has been sent to the server, if the display device does not have video and/or audio acquisition capability, obtaining from the server, in response to receiving first connected state information sent by the server, second audio and video data of the target object and first audio and video data of the initiator user. The server sends the first connected state information after the peer device used by the target object has connected the video call and the server has found that the display device lacks video and/or audio acquisition capability; the server then obtains a first target terminal capable of making up the acquisition capability of the display device, receives the first audio and video data that the first target terminal collects and uploads according to first control information sent by the server, and receives the second audio and video data of the target object. The first control information instructs the first target terminal to control its acquisition device to collect the first audio and video data;

and displaying the video data of the first audio and video data and of the second audio and video data in different windows, and playing the audio data of the first and second audio and video data.

The video call method provided in the fifth aspect is applied to a display device invited to join a video call, and includes the following steps:

when a video call request sent by the server is received, displaying an incoming call prompt interface;

The video call method provided in the sixth aspect is applied to the server side, where the first display device corresponds to the initiator user and the second display device corresponds to the target object whom the initiator user invites to join the video call; the method includes:

The technical solution of the present application involves three ends: the first display device that initiates the video call request, the server, and the second display device that is invited to join the video call. For either the first display device or the second display device, the device capability level can be set according to the device's own hardware configuration. When a display device has both audio and video acquisition and playback capability, it can itself capture and push the audio and video data, and pull the peer user's audio and video stream from the server.
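How a display device's capability level might determine its media role can be sketched as below. The `Capability` fields and the returned plan keys are hypothetical. As a simplification, the sketch hands all capture to a helper terminal whenever any acquisition capability is missing, and it assumes the display device itself can always play audio and video.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """Hypothetical capability level of one display device."""
    video_capture: bool
    audio_capture: bool
    video_play: bool
    audio_play: bool


def plan_media(cap: Capability) -> dict:
    """Decide who captures for this device and which streams it pulls."""
    can_capture = cap.video_capture and cap.audio_capture
    can_play = cap.video_play and cap.audio_play
    if not can_play:
        raise ValueError("sketch assumes the display device can play audio and video")
    if can_capture:
        # The device captures and pushes its own stream; it pulls only the peer's.
        return {"push_self": True, "need_helper": False, "pull": ["peer"]}
    # Capture is missing: a helper terminal pushes this user's stream, and the
    # device pulls it back together with the peer's stream.
    return {"push_self": False, "need_helper": True, "pull": ["peer", "own_via_helper"]}
```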

If the display device does not have video and/or audio acquisition capability, that is, its acquisition capability is missing, a target terminal can be selected from the user's idle old terminals to make up for it, and audio and video acquisition and playback are achieved through cooperation between the display device and the target terminal. In other words, the target terminal (slave device) acts as an external acquisition device of the display device (master device). For example, suppose user A uses a display device A1, a terminal device A2 and a terminal device A3. The user can set the capability levels of A1, A2 and A3 in the video call application, and the setting for each device is synchronized to the server. The server maintains a device list that records the capability level currently set for each device used by user A. After user B answers a video call initiated by user A, if the server finds that display device A1 has only audio and video playback capability and lacks both audio and video acquisition capability, it selects a target terminal from terminal devices A2 and A3. If, for example, A2 has both audio and video acquisition capability while A3 has only audio acquisition capability, A2 is clearly the better match for A1, so A2 is selected as the target terminal; A2 collects user A's audio and video data and pushes it to the server. Display device A1 then pulls from the server both the second audio and video data of user B and the first audio and video data collected by A2, and plays and displays both locally, so that user A can see the video pictures of user A and user B in different windows and hear the voices of both. User B's display device at the peer end directly pulls the first audio and video data collected by A2, so user B can see user A's video picture and hear user A's voice.

This enables an effective two-way video call between the users at both ends even when user A's display device A1 lacks acquisition capability. User B does not perceive that the peer end uses master and slave devices cooperating to push and pull the audio and video streams, so the video call experience is unchanged. The solution also makes full use of idle old terminals and, by combining the functions of master and slave devices, saves the cost of fitting the display device with an external acquisition device. No physical connection between the display device and the terminal is needed, and, unlike screen casting during a call, usage is not restricted to the same Wi-Fi network, which reduces the strict dependence on the network and the associated security risks. The display device and the terminal do not each need complete capabilities; they only need to complement each other, thereby providing users with a more convenient video interaction service.
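The choice of A2 over A3 in this example suggests ranking candidate terminals by how many of the display device's missing acquisition capabilities each one covers. The sketch below implements that ranking as an assumption; the source only states that A2 is "more adapted" and does not define a scoring rule. The flag names are hypothetical, and ties keep device-list order.

```python
from typing import Iterable, List, Set, Tuple

CAPTURE_FLAGS = {"audio_capture", "video_capture"}  # hypothetical flag names


def rank_terminals(
    display_caps: Set[str],
    terminals: Iterable[Tuple[str, Set[str]]],
) -> List[str]:
    """Order candidate terminals by how many missing capture flags each covers."""
    missing = CAPTURE_FLAGS - display_caps
    scored = []
    for index, (terminal_id, caps) in enumerate(terminals):
        coverage = len(missing & caps)
        if coverage:  # ignore terminals that add nothing the display lacks
            scored.append((coverage, index, terminal_id))
    # Higher coverage first; for equal coverage keep device-list order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [terminal_id for _, _, terminal_id in scored]
```

With A1 able only to play, A2 (audio and video capture) ranks ahead of A3 (audio capture only), matching the example.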

Drawings

To describe the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 illustrates a usage scenario of a display device according to some embodiments;

FIG. 2 illustrates a hardware configuration block diagram of the control apparatus 100 according to some embodiments;

FIG. 3 illustrates a hardware configuration block diagram of the display apparatus 200 according to some embodiments;

FIG. 4 illustrates a software configuration diagram of the display device 200 according to some embodiments;

FIG. 5 illustrates an icon control interface of applications in the display device 200 according to some embodiments;

FIG. 6 is a flowchart of a video call method performed by the first display device;

FIG. 7 is a flowchart of a video call method performed by the second display device;

FIG. 8 is a schematic diagram of a first interaction logic of a video call among the first display device, the server and the second display device;

FIG. 9 is a schematic diagram of a second interaction logic of a video call among the first display device, the server and the second display device;

FIG. 10 is a schematic diagram of a third interaction logic of a video call among the first display device, the server and the second display device;

FIG. 11 is a schematic diagram of a fourth interaction logic of a video call among the first display device, the server and the second display device;

FIG. 12 is a schematic diagram of a fifth interaction logic of a video call among the first display device, the server and the second display device;

FIG. 13 is a schematic diagram of a sixth interaction logic of a video call among the first display device, the server and the second display device;

FIG. 14 is a schematic diagram of a seventh interaction logic of a video call among the first display device, the server and the second display device;

FIG. 15 is a schematic diagram of an eighth interaction logic of a video call among the first display device, the server and the second display device.

Detailed Description

To make the purpose and embodiments of the present application clearer, the following will clearly and completely describe the exemplary embodiments of the present application with reference to the attached drawings in the exemplary embodiments of the present application, and it is obvious that the described exemplary embodiments are only a part of the embodiments of the present application, and not all of the embodiments.

It should be noted that the brief descriptions of the terms in the present application are only for the convenience of understanding the embodiments described below, and are not intended to limit the embodiments of the present application. These terms should be understood in their ordinary and customary meaning unless otherwise indicated.

The terms "first," "second," "third," and the like in the description and claims of this application and in the above-described drawings are used for distinguishing between similar or analogous objects or entities and not necessarily for describing a particular sequential or chronological order, unless otherwise indicated. It is to be understood that the terms so used are interchangeable under appropriate circumstances.

The terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or apparatus that comprises a list of elements is not necessarily limited to all elements expressly listed, but may include other elements not expressly listed or inherent to such product or apparatus.

The term "module" refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the functionality associated with that element.

FIG. 1 is a schematic diagram of a usage scenario of a display device according to an embodiment. As shown in FIG. 1, the display apparatus 200 is in data communication with a server 400, and a user can operate the display apparatus 200 through the smart device 300 or the control device 100.

In some embodiments, the control apparatus 100 may be a remote controller. Communication between the remote controller and the display device includes at least one of infrared protocol communication, Bluetooth protocol communication and other short-range communication methods, and the remote controller controls the display device 200 in a wireless or wired manner. The user may control the display apparatus 200 by inputting user instructions through at least one of keys on the remote controller, voice input, control panel input, and the like.

In some embodiments, the smart device 300 may include any of a mobile terminal, a tablet, a computer, a laptop, an AR/VR device, and the like.

In some embodiments, the smart device 300 may also be used to control the display device 200. For example, the display device 200 is controlled using an application program running on the smart device.

In some embodiments, the smart device 300 and the display device may also be used for communication of data.

In some embodiments, the display device 200 may also be controlled by means other than the control apparatus 100 and the smart device 300. For example, the user's voice instructions may be received directly by a voice-acquisition module configured inside the display device 200, or by a voice control apparatus provided outside the display device 200.

In some embodiments, the display device 200 is also in data communication with the server 400. The display device 200 may be communicatively connected through a local area network (LAN), a wireless local area network (WLAN), or other networks. The server 400 may provide various content and interactions to the display apparatus 200, and may be one cluster or multiple clusters, including one or more types of servers.

In some embodiments, software steps executed by one executing entity may, on demand, be migrated to another executing entity in data communication with it. For example, software steps performed by the server may be migrated to a display device in data communication with it as required, and vice versa.

Fig. 2 exemplarily shows a block diagram of a configuration of the control apparatus 100 according to an exemplary embodiment. As shown in fig. 2, the control device 100 includes a controller 110, a communication interface 130, a user input/output interface 140, a memory, and a power supply. The control apparatus 100 may receive an input operation instruction from a user and convert the operation instruction into an instruction recognizable and responsive by the display device 200, serving as an interaction intermediary between the user and the display device 200.

In some embodiments, the communication interface 130 is used for external communication and includes at least one of a Wi-Fi chip, a Bluetooth module, an NFC module, or an alternative module.

In some embodiments, the user input/output interface 140 includes at least one of a microphone, a touchpad, a sensor, a key, or an alternative module.

Fig. 3 shows a hardware configuration block diagram of the display apparatus 200 according to an exemplary embodiment.

In some embodiments, the display apparatus 200 includes at least one of a tuner demodulator 210, a communicator 220, a detector 230, an external device interface 240, a controller 250, a display 260, an audio output interface 270, a memory, a power supply, a user interface.

In some embodiments the controller comprises a central processor, a video processor, an audio processor, a graphics processor, a RAM, a ROM, a first interface to an nth interface for input/output.

In some embodiments, the display 260 includes a display screen component for presenting pictures and a driving component for driving image display. The display 260 receives image signals output by the controller and displays video content, image content, menu manipulation interfaces, user manipulation UI interfaces, and the like.

In some embodiments, the display 260 may be at least one of a liquid crystal display, an OLED display, and a projection display, and may also be a projection device and a projection screen.

In some embodiments, the tuner demodulator 210 receives broadcast television signals via wired or wireless reception, and demodulates audio/video signals, such as EPG data signals, from a plurality of wireless or wired broadcast television signals.

In some embodiments, the communicator 220 is a component for communicating with external devices or servers according to various types of communication protocols. For example, the communicator may include at least one of a Wi-Fi module, a Bluetooth module, a wired Ethernet module, other network communication protocol chips or near-field communication protocol chips, and an infrared receiver. The display apparatus 200 may transmit and receive control signals and data signals to and from the control device 100 or the server 400 through the communicator 220.

In some embodiments, the detector 230 is used to collect signals of the external environment or interaction with the outside. For example, detector 230 includes a light receiver, a sensor for collecting ambient light intensity; alternatively, the detector 230 includes an image collector, such as a camera, which may be used to collect external environment scenes, attributes of the user, or user interaction gestures, or the detector 230 includes a sound collector, such as a microphone, which is used to receive external sounds.

In some embodiments, the external device interface 240 may include, but is not limited to, the following: a High-Definition Multimedia Interface (HDMI), an analog or digital high-definition component input interface (component), a composite video input interface (CVBS), a USB input interface, an RGB port, and the like. The interface may also be a composite input/output interface formed by several of these interfaces.

In some embodiments, the controller 250 and the tuner demodulator 210 may be located in separate devices; that is, the tuner demodulator 210 may be located in a device external to the main device containing the controller 250, such as an external set-top box.

In some embodiments, the controller 250 controls the operation of the display device and responds to user operations through various software control programs stored in memory. The controller 250 controls the overall operation of the display apparatus 200. For example: in response to receiving a user command for selecting a UI object to be displayed on the display 260, the controller 250 may perform an operation related to the object selected by the user command.

In some embodiments, the object may be any selectable object, such as a hyperlink, an icon, or another operable control. Operations related to the selected object include displaying the page, document, image or the like linked by a hyperlink, or running the program corresponding to an icon.

In some embodiments, the controller includes at least one of a central processing unit (CPU), a video processor, an audio processor, a graphics processing unit (GPU), a random access memory (RAM), a read-only memory (ROM), first to nth interfaces for input/output, a communication bus, and the like.

The CPU executes the operating system and application instructions stored in the memory, and runs various applications, data and content according to interactive instructions received from external input, so as to finally display and play various audio and video content. The CPU may include multiple processors, for example a main processor and one or more sub-processors.

In some embodiments, the graphics processor generates various graphic objects, such as at least one of icons, operation menus and graphics displaying user input instructions. The graphics processor includes an arithmetic unit, which performs operations on the interactive instructions input by the user and displays the resulting objects according to their display attributes, and a renderer, which renders the objects produced by the arithmetic unit for presentation on the display.

In some embodiments, the video processor receives an external video signal and performs at least one of decompression, decoding, scaling, noise reduction, frame rate conversion, resolution conversion, image synthesis and other video processing according to the standard codec protocol of the input signal, to obtain a signal that can be displayed or played directly on the display device 200.

In some embodiments, the video processor includes at least one of a demultiplexing module, a video decoding module, an image synthesis module, a frame rate conversion module, a display formatting module, and the like. The demultiplexing module demultiplexes the input audio and video data stream. The video decoding module processes the demultiplexed video signal, including decoding and scaling. The image synthesis module superimposes and mixes the GUI signal, input by the user or generated by the graphics generator itself, with the scaled video image to produce an image signal for display. The frame rate conversion module converts the frame rate of the input video. The display formatting module converts the frame-rate-converted video output signal into a signal conforming to the display format, such as an RGB data signal.

In some embodiments, the audio processor receives an external audio signal, decompresses and decodes it according to the standard codec protocol of the input signal, and performs at least one of noise reduction, digital-to-analog conversion and amplification to obtain a sound signal that can be played through the speaker.
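The module chain described above (demultiplexing, decoding and scaling, image synthesis, frame rate conversion, display formatting) can be illustrated with a toy pipeline. Frames are plain strings, decoding is a stand-in, and frame rate conversion uses simple nearest-frame resampling (repeating or dropping frames); real video processors typically use more sophisticated methods such as motion-compensated interpolation. None of these function names come from the source.

```python
from typing import Dict, List, Tuple


def demultiplex(packets: List[Dict[str, str]]) -> Tuple[List[str], List[str]]:
    """Split an interleaved packet stream into video and audio payloads."""
    video = [p["payload"] for p in packets if p["kind"] == "video"]
    audio = [p["payload"] for p in packets if p["kind"] == "audio"]
    return video, audio


def decode_and_scale(video: List[str]) -> List[str]:
    # Stand-in for real decoding and scaling.
    return [f"dec({v})" for v in video]


def compose(frames: List[str], gui_layer: str) -> List[str]:
    # Overlay the GUI signal (e.g. a menu) on each scaled video frame.
    return [f"{f}+{gui_layer}" if gui_layer else f for f in frames]


def convert_frame_rate(frames: List[str], src_fps: int, dst_fps: int) -> List[str]:
    """Nearest-frame resampling: repeat or drop frames to reach dst_fps."""
    n_out = len(frames) * dst_fps // src_fps
    return [frames[min(i * src_fps // dst_fps, len(frames) - 1)] for i in range(n_out)]


def format_for_display(frames: List[str]) -> List[Tuple[str, str]]:
    # Tag each frame with the output format expected by the panel.
    return [("RGB", f) for f in frames]


def process_video(packets, gui_layer, src_fps, dst_fps):
    video, _audio = demultiplex(packets)  # audio would go to the audio processor
    frames = decode_and_scale(video)
    frames = compose(frames, gui_layer)
    frames = convert_frame_rate(frames, src_fps, dst_fps)
    return format_for_display(frames)
```

Converting 30 fps to 60 fps here repeats every frame once, while 60 fps to 30 fps keeps every other frame.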

In some embodiments, a user may enter user commands on a Graphical User Interface (GUI) displayed on display 260, and the user input interface receives the user input commands through the Graphical User Interface (GUI). Alternatively, the user may input the user command by inputting a specific sound or gesture, and the user input interface receives the user input command by recognizing the sound or gesture through the sensor.

In some embodiments, a "user interface" is a media interface for interaction and information exchange between an application or operating system and a user that enables conversion between an internal form of information and a form that is acceptable to the user. A commonly used presentation form of the User Interface is a Graphical User Interface (GUI), which refers to a User Interface related to computer operations and displayed in a graphical manner. It may be an interface element such as an icon, a window, a control, etc. displayed in the display screen of the electronic device, where the control may include at least one of an icon, a button, a menu, a tab, a text box, a dialog box, a status bar, a navigation bar, a Widget, etc. visual interface elements.

In some embodiments, user interface 280 is an interface that may be used to receive control inputs (e.g., physical buttons on the body of the display device, or the like).

In some embodiments, the system of a display device may include a kernel, a command parser (shell), a file system, and application programs. The kernel, shell and file system together make up the basic operating system structure that allows users to manage files, run programs and use the system. After power-on, the kernel starts, activates kernel space, abstracts the hardware, initializes hardware parameters, and runs and maintains virtual memory, the scheduler, signals and inter-process communication (IPC). After the kernel has started, the shell and user applications are loaded. An application is compiled into machine code when it is started, forming a process.

Referring to fig. 4, in some embodiments, the system is divided into four layers, which are an Application (Applications) layer (abbreviated as "Application layer"), an Application Framework (Application Framework) layer (abbreviated as "Framework layer"), an Android runtime (Android runtime) and system library layer (abbreviated as "system runtime library layer"), and a kernel layer from top to bottom.

In some embodiments, at least one application program runs in the application program layer, and the application programs may be windows (windows) programs carried by an operating system, system setting programs, clock programs or the like; or an application developed by a third party developer. In particular implementations, the application packages in the application layer are not limited to the above examples.

The framework layer provides an application programming interface (API) and a programming framework for the applications in the application layer. It includes a number of predefined functions and acts as a processing center that decides the actions of the applications in the application layer. Through the API, an application can access system resources and obtain system services during execution.

As shown in fig. 4, in the embodiments of the present application the application framework layer includes managers, a content provider and the like, where the managers include at least one of the following modules: an activity manager, which interacts with all activities running in the system; a location manager, which gives system services or applications access to the system location service; a package manager, which retrieves information about the application packages currently installed on the device; a notification manager, which controls the display and clearing of notification messages; and a window manager, which manages the icons, windows, toolbars, wallpapers and desktop components on a user interface.

In some embodiments, the activity manager manages the lifecycle of each application and the general back-navigation function, such as controlling how applications exit, open and go back. The window manager manages all window programs, for example obtaining the display screen size, determining whether a status bar exists, locking the screen, taking screenshots, and controlling changes of the display window (for example shrinking it, or applying shake or distortion effects).

In some embodiments, the system runtime library layer supports the layer above it, the framework layer: when the framework layer is used, the Android operating system runs the C/C++ libraries contained in the system runtime library layer to implement the functions required by the framework layer.

In some embodiments, the kernel layer sits between hardware and software. As shown in fig. 4, the kernel layer includes at least one of the following drivers: an audio driver, display driver, Bluetooth driver, camera driver, Wi-Fi driver, USB driver, HDMI driver, sensor drivers (for fingerprint, temperature, pressure sensors and so on), power driver, etc.

In some embodiments, after being started, the display device may go directly to the interface of a preset video-on-demand (VOD) program. As shown in fig. 5, that interface may include at least a navigation bar 510 and a content display area below it, whose content changes with the control selected in the navigation bar. Programs in the application layer can be integrated into the VOD program and displayed through a control in the navigation bar, or displayed after an application control in the navigation bar is selected.

In some embodiments, after being started, the display device may go directly to the display interface of the signal source selected last time, or to a signal source selection interface. The signal source may be a preset VOD program, or at least one of an HDMI interface, a live TV interface and the like; after the user selects a different signal source, the display shows the content obtained from that source.

In the above embodiments, the display device has a detector 230, which includes a sound collector for capturing user audio data and an image collector for capturing user video data. A display device configured with the detector 230 has video capture, video display, audio capture and audio playback capabilities; that is, by itself it can capture, push, pull, play and display the audio and video streams of a video call. When a user initiates a video call request, the server creates a virtual room. If the initiator user invites only one target object, the call is a one-to-one video call, and the virtual room carries only 2 streams once the target object answers. If the user invites N (N > 1) target objects at the same time, the call is a many-to-many video call, and once every target object has joined, the virtual room carries N + 1 streams. The audio and video data of each member in the virtual room are captured and uploaded to the server, so during the call each display device pulls from the server the audio and video data of every member other than its local user and displays and plays them locally.
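The stream accounting described above can be sketched in Python. This is an illustrative model only: `VirtualRoom`, `open_room` and the member labels are hypothetical names, and the sketch assumes that each member pushes exactly one combined audio/video stream and that every display device captures its own user, which is the case this paragraph describes.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualRoom:
    """Hypothetical server-side virtual room: one pushed stream per member."""
    members: list[str] = field(default_factory=list)

    def join(self, member: str) -> None:
        # Joining twice must not create a second stream for the same member.
        if member not in self.members:
            self.members.append(member)

    def stream_count(self) -> int:
        # Each member's audio/video is captured and uploaded as one stream.
        return len(self.members)

    def pull_set(self, local_member: str) -> list[str]:
        # A display device that captures its own user pulls everyone else.
        return [m for m in self.members if m != local_member]


def open_room(initiator: str, answered: list[str]) -> VirtualRoom:
    """Create a room for the initiator plus every target object that answered."""
    room = VirtualRoom()
    room.join(initiator)
    for target in answered:
        room.join(target)
    return room


one_to_one = open_room("A", ["B"])
print(one_to_one.stream_count())         # 2 streams
group = open_room("A", ["B", "C", "D"])  # N = 3 invitees
print(group.stream_count())              # N + 1 = 4 streams
print(group.pull_set("C"))               # ['A', 'B', 'D']
```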

In practice, many display devices are not equipped with capture hardware such as a camera and/or microphone. Such devices can only play audio and video; without video and/or audio capture capability they cannot capture the local user's audio and/or video data. A partial or complete lack of capture capability means the peer user cannot see the local user's video, cannot hear the local user's voice, or cannot hold an audio-visual conversation at all. The peer user can therefore tell that the local display device lacks capture capability, which degrades the video call experience on display devices.

One existing approach uses a small-screen terminal to capture the audio and video streams and casts the terminal-initiated video call to an application on the display device for playback. However, this approach requires the small-screen terminal and the large-screen display device to be connected to the same Wi-Fi network, which restricts where it can be used and makes it heavily dependent on network conditions; it may also carry security risks. In addition, the terminal and the display device need the same device capability level, so an idle older terminal cannot be paired with the display device.

The embodiments of the present application make full use of the capabilities, such as audio capture and/or video capture, of old terminals the user has left idle. When the display device lacks video and/or audio capture capability, a target terminal that can compensate for the missing capture capability is selected from several old terminals, and effective video call interaction is achieved through the combined capabilities of the target terminal and the display device.

In some embodiments, at least one idle old terminal may be placed at a preset position relative to the display device, for example fixed on top of the display's screen frame with the terminal's camera facing the front of the display device, so that it can capture video of a user in front of the display device. Alternatively, the terminal device may be placed where the user often sits, for example near the sofa, so that it can capture a close-up video of the user sitting there.

In some embodiments, a video call application may be installed on the display device and on each idle terminal device, and the user can set the device capability level of the display device and of each terminal device in the application. A device capability level may enable all or only part of a device's capabilities, and can be adjusted dynamically at any time through the relevant pages of the video call application, but it can never exceed the upper limit of the device's own hardware configuration. For example, terminal device A2 has video display, video capture, audio playback and audio capture capabilities, but the user would rather play audio and video on the large-screen display device. The device capability level of A2 can therefore be set to video display N (NO, not available), video capture Y (YES, available), audio playback N and audio capture Y, so that only the audio and video capture capabilities of A2 are enabled and A2 acts purely as an audio and video capture device. After A2 joins a video call, it neither plays the audio and video data it captures in its own application interface nor pulls the streams of the other members of the virtual room for display and playback; it only captures audio and video and pushes the streams to the server, without pulling any stream.
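A minimal sketch of how a capability setting can be capped by the hardware, and how the resulting level decides whether a device pushes and/or pulls streams. `CapabilityLevel`, `clamp_to`, `pushes` and `pulls` are hypothetical names; modelling the cap as a per-flag logical AND is an assumption consistent with the rule that a setting cannot exceed the hardware configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapabilityLevel:
    """Four capability flags; True stands for "Y" and False for "N"."""
    video_display: bool
    video_capture: bool
    audio_playback: bool
    audio_capture: bool

    def clamp_to(self, hardware: "CapabilityLevel") -> "CapabilityLevel":
        # A setting may switch a capability off but never enable one the
        # hardware lacks, so every flag is ANDed with the hardware flag.
        return CapabilityLevel(
            video_display=self.video_display and hardware.video_display,
            video_capture=self.video_capture and hardware.video_capture,
            audio_playback=self.audio_playback and hardware.audio_playback,
            audio_capture=self.audio_capture and hardware.audio_capture,
        )

    def pushes(self) -> bool:
        # Any enabled capture capability means the device uploads a stream.
        return self.video_capture or self.audio_capture

    def pulls(self) -> bool:
        # Only devices that display or play something pull streams.
        return self.video_display or self.audio_playback


a2_hardware = CapabilityLevel(True, True, True, True)
a2 = CapabilityLevel(video_display=False, video_capture=True,
                     audio_playback=False, audio_capture=True).clamp_to(a2_hardware)
print(a2.pushes(), a2.pulls())  # True False
```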

In some embodiments, when a display device and several terminal devices log in to the same account of the video call application, the devices become associated. The display device, which can display video and play audio on a large screen, serves as the master device, the associated terminal devices serve as slave devices, and together they form the device set of the current user account. The server can create a device list for each user account that records the device information of every device in the device set. The device information includes a device identifier and the current device capability level: the device identifier, such as the device name, IP address or MAC address, identifies a specific device, and the device capability level indicates whether the device has audio playback, video display, video capture and audio capture capabilities. Each time the user adjusts a device capability level, the display device or terminal device synchronizes device information made of the device identifier and the modified capability level to the server, which updates the corresponding entry in the device list and thus maintains it dynamically. As a result, once a video call is connected, the server can accurately query each device's capability level and determine whether the display device needs a target terminal and, if so, which slave device to select.

In some embodiments, after the display device or a terminal device synchronizes its device information to the server, the server reads the device identifier in the device information and checks whether that identifier is already stored in the device list. If it is, the corresponding device was stored in the device list before, and the server simply updates the capability level associated with the identifier. If it is not, the corresponding device is a newly added slave device, and the server inserts a new record into the device list to store its device information.
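The insert-or-update logic can be sketched as follows, assuming the server keys its device list by device identifier. `DeviceList`, `DeviceRecord` and the `stored_order` field are hypothetical; `stored_order` is included only because a later selection rule refers to the order in which devices were stored, and deleting devices is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceRecord:
    device_id: str           # e.g. device name, IP address or MAC address
    capability: dict         # capability name -> True ("Y") / False ("N")
    stored_order: int        # position at which the record was first inserted


@dataclass
class DeviceList:
    """Hypothetical per-account device list maintained by the server."""
    account: str
    records: dict = field(default_factory=dict)

    def sync(self, device_id: str, capability: dict) -> bool:
        """Apply synchronized device information; return True if a record was inserted."""
        record = self.records.get(device_id)
        if record is not None:
            # Identifier already stored: only refresh the capability level.
            record.capability = dict(capability)
            return False
        # Identifier not stored yet: the device is newly added, insert a record.
        self.records[device_id] = DeviceRecord(device_id, dict(capability), len(self.records))
        return True


devices = DeviceList("user-A")
devices.sync("A1", {"video_capture": False, "audio_capture": False})
devices.sync("A2", {"video_capture": True, "audio_capture": True})
print(devices.sync("A2", {"video_capture": True, "audio_capture": False}))  # False
```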

In some embodiments, suppose user A uses a display device A1 and terminal devices A2 and A3. The user can set the capability levels of A1, A2 and A3 in the video call application, each setting is synchronized to the server, and the server maintains user A's device list, which records the capability level currently set for each of user A's devices. After user B answers a video call initiated by user A, the server finds that display device A1 only has audio and video playback capability and lacks audio and video capture capability, so it obtains a target terminal from terminal devices A2 and A3. Suppose A2 has both audio and video capture capability while A3 only has audio capture capability. A2 then complements A1 better and is selected as the target terminal. A2 captures user A's audio and video data and pushes it to the server; display device A1 pulls from the server both user B's second audio and video data and the first audio and video data captured by A2, and plays and displays them locally, so that user A sees the video of user A and user B in different windows and hears both voices. User B's display device at the peer end pulls from the server the first audio and video data captured by A2, so user B can see user A's video and hear user A's voice. In this way, a two-way video call between the users at both ends remains effective even though display device A1 of user A lacks capture capability. User B does not perceive that the peer end uses master and slave devices cooperatively to push and pull the audio and video streams, so the call experience is unchanged. The approach also makes full use of idle old terminals, and combining the functions of master and slave devices saves the cost of buying capture hardware for the display device. No physical connection between the display device and the terminal is needed, and, unlike screen casting of a call, the devices do not have to share the same Wi-Fi network: information can be sent to the slave device through long-connection message push, which reduces network contention and interference as well as potential security risks. Finally, the display device and the target terminal need not have equal capabilities; only their complementarity matters, so users get a more convenient video interaction service.
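The push and pull routing in this example can be expressed as a small sketch. `Endpoint`, `push_plan` and `pull_plan` are hypothetical names, and one assumption goes beyond the text: a display device that captures its own user (such as user B's) is taken to render its local preview directly rather than pulling its own stream back from the server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    user: str
    display_device: str
    capture_device: str  # same as display_device when the display captures itself


def push_plan(endpoints: list[Endpoint]) -> dict[str, str]:
    """Capture device -> user whose audio/video it pushes to the server."""
    return {ep.capture_device: ep.user for ep in endpoints}


def pull_plan(endpoints: list[Endpoint]) -> dict[str, list[str]]:
    """Display device -> users whose streams it pulls from the server."""
    plan = {}
    for ep in endpoints:
        pulled = []
        for other in endpoints:
            if other.user != ep.user:
                pulled.append(other.user)  # peer streams are always pulled
            elif ep.capture_device != ep.display_device:
                pulled.append(other.user)  # own user's stream comes back via the server
        plan[ep.display_device] = pulled
    return plan


call = [Endpoint("A", "A1", "A2"), Endpoint("B", "B1", "B1")]
print(push_plan(call))  # {'A2': 'A', 'B1': 'B'}
print(pull_plan(call))  # {'A1': ['A', 'B'], 'B1': ['A']}
```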

In some embodiments, the terminal device and the display device do not communicate directly but through an intermediate server. The video call application on the terminal device stays running at all times so that it receives instructions from the server immediately and keeps real-time communication with the server. When a terminal device is selected as the target terminal, the server sends it control information that may carry an access request asking it to join the video call, so that it can push the captured audio and video streams to the server. Because the target terminal is a slave device, the user does not need to confirm answering: on receiving the control information, the target terminal responds to the access request directly, joins the video call, and calls its own capture devices, such as the camera and microphone, to capture video of the user within the camera's field of view and the user's audio. It then uploads the resulting audio and video data to the server (i.e., pushes the stream). After receiving the audio and video data from the target terminal, the server can send a notification to the master device associated with the target terminal, so that the user knows the target terminal has started capturing the user's audio and video.
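A minimal sketch of the target terminal's side of this exchange, assuming the server delivers control information as messages with a type and a room identifier. `ControlMessage`, `TargetTerminal` and the message types `"access_request"` and `"hang_up"` are hypothetical, capture is represented by a boolean flag, and the server's notification to the master device is not modelled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ControlMessage:
    kind: str     # hypothetical types: "access_request" or "hang_up"
    room_id: str


@dataclass
class TargetTerminal:
    device_id: str
    room_id: Optional[str] = None
    capturing: bool = False
    pushed: list = field(default_factory=list)

    def on_control(self, msg: ControlMessage) -> None:
        if msg.kind == "access_request":
            # Slave device: join at once, with no answer prompt for the user.
            self.room_id = msg.room_id
            self.capturing = True  # stands in for opening camera and microphone
        elif msg.kind == "hang_up" and msg.room_id == self.room_id:
            self.capturing = False
            self.room_id = None

    def push(self, frame: str) -> bool:
        """Upload one captured frame to the server; return False while idle."""
        if not self.capturing:
            return False
        self.pushed.append(frame)
        return True


a2 = TargetTerminal("A2")
a2.on_control(ControlMessage("access_request", "room-1"))
a2.push("frame-0")
a2.on_control(ControlMessage("hang_up", "room-1"))
print(a2.push("frame-1"), a2.pushed)  # False ['frame-0']
```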

In some embodiments, since the terminal device consumes power to run the application and the related hardware, abnormal situations may occur during capture and stream pushing, such as the target terminal shutting down because its battery suddenly runs out. The target terminal's stream pushing logic is then interrupted, and the server stops receiving the audio and video data it captures. In that case the server may send a prompt to the display device (i.e., the master device) associated with the target terminal, telling user A "the target terminal is abnormal, please attempt reconnection", and send the peer device information that user A has exited the current video call, so that peer user B knows user A has left. In other words, the video call between user A and user B can be terminated automatically, and user B does not perceive that user A was using a target terminal or that it failed. After viewing the prompt, user A knows the target terminal is abnormal and can remedy it, for example by charging and powering on the target terminal and restarting the video call application on it; after the display device rejoins the virtual room of the video call, the call continues in the same way. Alternatively, after the current target terminal exits abnormally, the server selects a new target terminal from the other slave devices in the device list and switches to it so that it continues capturing audio and video data. Quickly switching to another standby terminal with the required capture capability avoids the long delay of reconnecting the failed target terminal.
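The two recovery options can be sketched as a server-side decision. The timeout threshold, function names and message texts are hypothetical; the source does not say how the server detects that pushing has stopped, so a simple no-data timeout is assumed here.

```python
PUSH_TIMEOUT_S = 5.0  # hypothetical threshold; the text does not specify one


def push_stalled(last_packet_at: float, now: float, timeout: float = PUSH_TIMEOUT_S) -> bool:
    """True when no audio/video data has arrived from the target terminal for too long."""
    return now - last_packet_at > timeout


def on_target_lost(failed: str, standby: list[str], master: str, peer: str,
                   allow_switch: bool) -> list[tuple[str, str]]:
    """Return the (recipient, message) pairs the server sends after a target terminal fails.

    `standby` is assumed to hold slave devices that already meet the required
    capture capability, ordered by the selection priority.
    """
    candidates = [d for d in standby if d != failed]
    if allow_switch and candidates:
        # Option 2: switch to a standby terminal so the call continues.
        return [(candidates[0], "access_request")]
    # Option 1: prompt the master device and end the call for the peer.
    return [
        (master, "target terminal is abnormal, please attempt reconnection"),
        (peer, "peer user exited the call"),
    ]


print(on_target_lost("A2", ["A2", "A3"], "A1", "B1", allow_switch=True))
# [('A3', 'access_request')]
```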

In some embodiments, the device capability levels of the slave devices in the device list may be the same or different. When they differ, the target terminal can be screened according to the capability level of the associated master device. When two or more slave devices have the same capability level and each is sufficient to compensate for the master device's missing capture capability, the selection can follow a preset priority, for example the order in which the devices were stored in the device list, how often each has been selected as the target terminal, or when each device's capability level was last updated. The selection priority is not limited to these examples.

For example, the capability level of display device A1, the master device, is "video display Y, audio playback Y, video capture N, audio capture N". The master device has three slave devices, terminal devices A2, A3 and A4. The capability level of A2 is "video display N, audio playback N, video capture Y, audio capture Y", that of A3 is also "video display N, audio playback N, video capture Y, audio capture Y", and that of A4 is "video display N, audio playback N, video capture Y, audio capture N". Both A2 and A3 therefore qualify as the target terminal, and their capability levels are identical. The server queries when A2 and A3 were stored in the device list, finds that A2 was stored earlier than A3, and selects A2 as the target terminal.
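A sketch of the screening and tie-break in this example, assuming capabilities are represented as sets of enabled names and that the earliest storage order is the preset priority. `Device`, `missing_capture` and `select_target` are hypothetical names; the other priority rules mentioned above (selection frequency, last update time) would only change the `key` passed to `min`.

```python
from dataclasses import dataclass
from typing import Optional

CAPTURE = ("video_capture", "audio_capture")


@dataclass(frozen=True)
class Device:
    device_id: str
    enabled: frozenset  # names of capabilities set to "Y"
    stored_order: int   # when the device was stored in the device list


def missing_capture(master: Device) -> set:
    """Capture capabilities the master device lacks."""
    return {c for c in CAPTURE if c not in master.enabled}


def select_target(master: Device, slaves: list) -> Optional[Device]:
    missing = missing_capture(master)
    if not missing:
        return None  # the master captures by itself; no target terminal needed
    # Keep only slaves whose capabilities cover everything the master lacks.
    qualified = [s for s in slaves if missing <= s.enabled]
    if not qualified:
        return None
    # Tie-break by the earliest time of storage in the device list.
    return min(qualified, key=lambda s: s.stored_order)


a1 = Device("A1", frozenset({"video_display", "audio_playback"}), 0)
a2 = Device("A2", frozenset({"video_capture", "audio_capture"}), 1)
a3 = Device("A3", frozenset({"video_capture", "audio_capture"}), 2)
a4 = Device("A4", frozenset({"video_capture"}), 3)
print(select_target(a1, [a3, a4, a2]).device_id)  # A2
```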

In the foregoing embodiments, the server automatically screens and matches the target terminal according to its dynamically maintained device list and the capability level of the master device. In some embodiments, the user can also choose the target terminal manually. The server can synchronize the device list it maintains to the display device. When the display device initiates or answers a video call, it queries its own device capability level; if it lacks video and/or audio capture capability, it shows the device list in the application interface, listing the device information of all master and slave devices under the current user account as a reference for the user's choice of target terminal.

In some embodiments, when the first display device at the initiating end receives a request to start a video call to a target object, it queries its own device capability level. If it lacks video and/or audio capture capability, it shows the initiator user's device list in the application interface. When the user selects a target terminal in the list (called the first target terminal for distinction), the first display device can include the device information of the user-selected first target terminal in the video call request sent to the server. If the server recognizes device information in the request, it does not automatically match a first target terminal for the first display device; instead, it sends control information to the first target terminal indicated by that device information, instructing it to capture the initiator user's first audio and video data. If the server finds no device information in the request, it later matches a target terminal automatically from the device list it maintains.
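The server's choice between a user-selected and an automatically matched first target terminal can be sketched as follows. The request is modelled as a dictionary; the field names `caller_account` and `target_device_id` and the function names are hypothetical, and `match_from_list` is only a stand-in for the automatic matching described elsewhere in this document.

```python
from typing import Callable, Optional


def resolve_first_target(request: dict,
                         auto_match: Callable[[str], Optional[str]]) -> Optional[str]:
    """Pick the first target terminal for a video call request.

    A device identifier carried in the request (the user's own choice) wins;
    otherwise the server falls back to matching from its device list.
    """
    chosen = request.get("target_device_id")  # hypothetical field name
    if chosen is not None:
        return chosen
    return auto_match(request["caller_account"])


def match_from_list(account: str) -> Optional[str]:
    # Stand-in for automatic matching against the server's device list.
    return {"user-A": "A2"}.get(account)


print(resolve_first_target({"caller_account": "user-A", "target_device_id": "A3"}, match_from_list))  # A3
print(resolve_first_target({"caller_account": "user-A"}, match_from_list))  # A2
```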

In some embodiments, the second display device at the target object's end, invited to the video call by the initiator user, shows an incoming call prompt interface in the application interface after receiving the video call request forwarded by the server. The prompt interface provides an answer control and a reject control. There are three cases:

First, when the target object clicks the reject control, the second display device reports a rejection state to the server, which forwards it to the first display device to inform the initiator user that the target object has declined, and the video call is cancelled. Second, if the target object clicks neither control within a preset time, i.e., the server receives no state information from the second display device within that time, the server sends invitation-timeout state information to the first display device, and the video call is cancelled. Third, when the target object clicks the answer control, the second display device queries its own device capability level. If it lacks video and/or audio capture capability, it shows the answering user's device list in the application interface; after the answering user selects a target terminal in the list (called the second target terminal for distinction), the answer state information sent to the server can carry the device information of that user-selected second target terminal. If the server identifies device information in the answer state information, it does not automatically match a second target terminal for the second display device but sends control information directly to the indicated second target terminal, instructing it to capture the answering user's second audio and video data.
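The three cases can be summarised as a server-side handler. `Reply`, `handle_callee_reply` and the message strings are hypothetical; a missing reply (`None`) models the case in which no state information arrives within the preset time.

```python
from enum import Enum
from typing import Optional


class Reply(Enum):
    REJECT = "reject"
    ANSWER = "answer"


def handle_callee_reply(reply: Optional[Reply], caller_display: str,
                        second_target: Optional[str] = None) -> list[tuple[str, str]]:
    """Return (recipient, message) pairs; reply is None when the preset time elapsed."""
    if reply is None:
        return [(caller_display, "invitation_timeout")]  # call cancelled
    if reply is Reply.REJECT:
        return [(caller_display, "rejected")]  # call cancelled
    messages = [(caller_display, "answered")]
    if second_target is not None:
        # The answering user picked a second target terminal: start it directly.
        messages.append((second_target, "access_request"))
    return messages


print(handle_callee_reply(Reply.ANSWER, "A1", second_target="B2"))
# [('A1', 'answered'), ('B2', 'access_request')]
```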

In this application, "first" and "second" distinguish the initiating end from the invited end: technical features related to the initiating end carry the prefix "first", and those related to the invited/answering end carry the prefix "second". For example, the initiator user's audio and video data are the first audio and video data, and the slave device matched with the first display device is the first target terminal; the invited user's audio and video data are the second audio and video data, and the slave device matched with the second display device is the second target terminal. Other features follow the same pattern and are not described again here.

In some embodiments, a hang-up control is generally provided on the call interfaces of the first and second display devices during a video call. If the virtual room has only two members, i.e., in a one-to-one video call, when the user at either end clicks the hang-up control, both display devices execute the hang-up logic: they stop capturing and pushing the local audio and video stream and stop pulling the peer's stream, thereby leaving the current video call.

In some embodiments, for a one-to-one video call in which the first display device lacks video and/or audio capture capability while the second display device has audio and video capture capabilities, when the first display device receives the initiator user's operation to hang up, it stops receiving (pulling) both the first audio and video data captured by the first target terminal and the target object's second audio and video data, and sends first hang-up indication information to the server. On receiving it, the server knows that the first display device has hung up and left the virtual room, and sends first hang-up state information to the second display device (the peer device) and second hang-up state information to the first target terminal. On receiving the first hang-up state information, the second display device has its own capture device stop capturing the second audio and video data and stops receiving the first audio and video data; that is, it stops capturing, pushing and pulling audio and video streams and exits the current video call at the same time. On receiving the second hang-up state information, the first target terminal has its capture device stop capturing the first audio and video data and thus stops uploading the initiator user's stream to the server, which prevents it from performing useless capture and stream pushing after the call ends.
Stream pushing and stream pulling in the embodiments of the present application are carried out by opening the corresponding data transmission channels; to stop pushing or pulling, only the corresponding channels need to be closed.
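Because pushing and pulling are both tied to data transmission channels, a device's streaming state reduces to two sets of open channels. The sketch below is a hypothetical model (`Channels` and the stream labels are illustrative names), shown for the case in which display device A1 lacks capture capability and terminal A2 captures on its behalf.

```python
from dataclasses import dataclass, field


@dataclass
class Channels:
    """Hypothetical bookkeeping of one device's open data-transmission channels."""
    push: set = field(default_factory=set)  # streams this device uploads
    pull: set = field(default_factory=set)  # streams this device downloads

    def open_push(self, stream: str) -> None:
        self.push.add(stream)

    def open_pull(self, stream: str) -> None:
        self.pull.add(stream)

    def close_all(self) -> None:
        # Stopping push or pull only means closing the corresponding channels.
        self.push.clear()
        self.pull.clear()

    def idle(self) -> bool:
        return not self.push and not self.pull


# One-to-one call in which A1 lacks capture capability and A2 captures for it.
a1, a2, b1 = Channels(), Channels(), Channels()
a2.open_push("first")
a1.open_pull("first")
a1.open_pull("second")
b1.open_push("second")
b1.open_pull("first")
for device in (a1, b1, a2):  # the hang-up reaches every participant
    device.close_all()
print(all(d.idle() for d in (a1, a2, b1)))  # True
```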

In some embodiments, for a one-to-one video call in which both the first and second display devices have audio and video capture capabilities, when the first display device receives the initiator user's operation to hang up, it has its own capture device stop capturing the first audio and video data, stops receiving the target object's second audio and video data (i.e., stops pushing and pulling), and sends second hang-up indication information to the server. On receiving it, the server knows that the first display device has hung up and left the virtual room, and sends only first hang-up state information to the second display device. On receiving the first hang-up state information, the second display device has its capture device stop capturing the second audio and video data and stops receiving the first audio and video data; that is, it stops capturing, pushing and pulling audio and video streams and exits the current video call at the same time.

In some embodiments, for a one-to-one video call in which the first display device has audio and video capture capabilities but the second display device lacks video and/or audio capture capability, when the first display device receives the initiator user's operation to hang up, it has its own capture device stop capturing the first audio and video data, stops receiving the target object's second audio and video data (i.e., stops pushing and pulling), and sends third hang-up indication information to the server. On receiving it, the server knows that the first display device has hung up and left the virtual room, and sends third hang-up state information to the second display device and fourth hang-up state information to the second target terminal. On receiving the third hang-up state information, the second display device stops receiving the initiator user's first audio and video data and the second audio and video data captured by the second target terminal, and exits the current video call at the same time. On receiving the fourth hang-up state information, the second target terminal has its capture device stop capturing the second audio and video data and thus stops uploading the answering user's stream to the server, which prevents it from performing useless capture and stream pushing after the call ends.

In some embodiments, for a one-to-one video call in which both the first and second display devices lack video and/or audio capture capability, when the first display device receives the initiator user's operation to hang up, it stops receiving (pulling) the first audio and video data captured by the first target terminal and the target object's second audio and video data, and sends fourth hang-up indication information to the server. On receiving it, the server knows that the first display device has hung up and left the virtual room, and sends second hang-up state information to the first target terminal, third hang-up state information to the second display device and fourth hang-up state information to the second target terminal. On receiving the second hang-up state information, the first target terminal has its capture device stop capturing the first audio and video data and stops uploading the initiator user's first audio and video data to the server. On receiving the third hang-up state information, the second display device stops receiving the initiator user's first audio and video data and the second audio and video data captured by the second target terminal, and exits the current video call at the same time. On receiving the fourth hang-up state information, the second target terminal has its capture device stop capturing the second audio and video data and stops uploading the answering user's second audio and video data to the server.
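The four initiator hang-up cases above follow one rule: the server notifies the hanging side's own target terminal (if any), the peer display device, and the peer's target terminal (if any), and the peer display device stops capturing only when it captures by itself. A minimal sketch of that rule follows; `Side`, `hangup_fanout` and the action strings are hypothetical, the hanging display device's own local shutdown is not modelled, and the numbered hang-up state messages are replaced by descriptive actions. Swapping the arguments also reproduces the answering-side hang-up cases.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Side:
    display: str
    target: Optional[str] = None  # target terminal, present when the display cannot capture


def hangup_fanout(hanging: Side, peer: Side) -> list[tuple[str, str]]:
    """(recipient, action) pairs the server sends after `hanging` reports a hang-up."""
    out = []
    if hanging.target is not None:
        # The hanging side's target terminal stops capturing, hence stops pushing.
        out.append((hanging.target, "stop_capture"))
    if peer.target is not None:
        # The peer display only pulls; its target terminal does the capturing.
        out.append((peer.display, "stop_pull"))
        out.append((peer.target, "stop_capture"))
    else:
        out.append((peer.display, "stop_capture_and_pull"))
    return out


caller = Side("A1", target="A2")  # lacks capture capability
callee = Side("B1")               # captures by itself
print(hangup_fanout(caller, callee))
# [('A2', 'stop_capture'), ('B1', 'stop_capture_and_pull')]
```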

The four foregoing embodiments describe the processing logic when the initiator user hangs up the video call on the first display device; the following describes the processing logic when the receiver user hangs up the video call on the second display device.

In some embodiments, for the one-to-one video call mode, if the first display device lacks video and/or audio acquisition capability but the second display device has both audio and video acquisition capabilities, then when the second display device receives the receiver user's operation of hanging up the video call, it controls its own acquisition device to stop acquiring the second audio and video data, stops receiving the first audio and video data of the initiator user, and at the same time sends fifth hang-up indication information to the server. Upon receiving the fifth hang-up indication information, the server knows that the second display device has hung up and exited the virtual room, so it sends fifth hang-up state information to the first display device (i.e., the peer device) and second hang-up state information to the first target terminal. After receiving the fifth hang-up state information, the first display device stops receiving the second audio and video data of the receiver user and the first audio and video data acquired by the first target terminal, and synchronously exits the current video call. After receiving the second hang-up state information, the first target terminal controls its own acquisition device to stop acquiring the first audio and video data and thus stops uploading the initiator user's first audio and video data to the server.

In some embodiments, for the one-to-one video call mode, if both the first display device and the second display device have audio and video acquisition capabilities, then when the second display device receives the receiver user's operation of hanging up the video call, it controls its own acquisition device to stop acquiring the second audio and video data, stops receiving the first audio and video data of the initiator user (i.e., stops stream pushing and pulling), and at the same time sends sixth hang-up indication information to the server. Upon receiving the sixth hang-up indication information, the server knows that the second display device has hung up and exited the virtual room, so it sends sixth hang-up state information to the first display device. After receiving the sixth hang-up state information, the first display device controls its acquisition device to stop acquiring the first audio and video data and stops receiving the second audio and video data; that is, the first display device terminates acquisition, stream pushing, and stream pulling of its audio and video streams and synchronously exits the current video call.

In some embodiments, for the one-to-one video call mode, if the first display device has both audio and video acquisition capabilities but the second display device lacks video and/or audio acquisition capability, then when the second display device receives the receiver user's operation of hanging up the video call, it stops receiving the first audio and video data and the second audio and video data acquired by the second target terminal, and at the same time sends seventh hang-up indication information to the server. Upon receiving the seventh hang-up indication information, the server knows that the second display device has hung up and exited the virtual room, so it sends sixth hang-up state information to the first display device and fourth hang-up state information to the second target terminal. After receiving the sixth hang-up state information, the first display device controls its own acquisition device to stop acquiring the first audio and video data, stops receiving the second audio and video data, and synchronously exits the current video call. After receiving the fourth hang-up state information, the second target terminal controls its acquisition device to stop acquiring the second audio and video data and thus stops uploading the receiver user's second audio and video data to the server.

In some embodiments, for the one-to-one video call mode, if both the first display device and the second display device lack video and/or audio acquisition capability, then when the second display device receives the receiver user's operation of hanging up the video call, it stops receiving the second audio and video data acquired by the second target terminal and the first audio and video data of the initiator user (i.e., stops stream pulling), and at the same time sends eighth hang-up indication information to the server. Upon receiving the eighth hang-up indication information, the server knows that the second display device has hung up and exited the virtual room, so it sends fifth hang-up state information to the first display device, second hang-up state information to the first target terminal, and fourth hang-up state information to the second target terminal. After receiving the fifth hang-up state information, the first display device stops receiving the second audio and video data of the receiver user and the first audio and video data acquired by the first target terminal, and synchronously exits the current video call. After receiving the second hang-up state information, the first target terminal controls its own acquisition device to stop acquiring the first audio and video data and thus stops uploading the initiator user's first audio and video data to the server. After receiving the fourth hang-up state information, the second target terminal controls its acquisition device to stop acquiring the second audio and video data and thus stops uploading the receiver user's second audio and video data to the server.
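Likewise, the four receiver-side embodiments fix both which hang-up indication the second display device sends and which hang-up state messages the server fans out. A hypothetical sketch, assuming one boolean per display device for "has both video and audio acquisition capability"; `receiver_hangup`, `RECEIVER_INDICATION`, and the recipient keys are illustrative names only, and the ordinals are taken from the text.

```python
# Hypothetical sketch of a receiver (second display device) hang-up in a
# one-to-one call: which indication it sends and where the server fans out.

# Hang-up indication sent by the second display device, keyed by
# (first_capable, second_capable), following the ordinals in the text.
RECEIVER_INDICATION = {
    (False, True): "fifth",
    (True, True): "sixth",
    (True, False): "seventh",
    (False, False): "eighth",
}


def receiver_hangup(first_capable: bool, second_capable: bool) -> tuple[str, dict[str, str]]:
    """Return (indication sent to the server, {recipient: hang-up state})."""
    indication = RECEIVER_INDICATION[(first_capable, second_capable)]
    fanout: dict[str, str] = {}
    # A capable first display device stops capture and pull (sixth state);
    # one relying on a slave terminal only stops pulling (fifth state).
    fanout["first_display"] = "sixth" if first_capable else "fifth"
    # Each recruited slave terminal is told to stop capturing and uploading.
    if not first_capable:
        fanout["first_target_terminal"] = "second"
    if not second_capable:
        fanout["second_target_terminal"] = "fourth"
    return indication, fanout
```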

In some embodiments, if a virtual room has more than two call members, i.e., a many-to-many video call, the hang-up logic is essentially the same as in the one-to-one video call described above, except that when a call member A hangs up and exits the virtual room, the end of call member A stops acquiring, pushing, and pulling audio and video streams, while the other call members remaining in the virtual room continue acquiring and pushing their local audio and video streams and only stop pulling the audio and video streams of call member A.
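The many-to-many rule can be illustrated with a toy room model in which each member tracks whether it is pushing and whose streams it pulls. This is a hypothetical sketch for illustration only; `Member`, `VirtualRoom`, and `hang_up` are assumed names, and real stream handling is reduced to boolean flags and sets of member names.

```python
# Hypothetical model of a many-to-many virtual room, used only to show
# which streams change when one member hangs up.
from dataclasses import dataclass, field


@dataclass
class Member:
    name: str
    pushing: bool = True                            # acquiring and pushing own A/V stream
    pulling: set[str] = field(default_factory=set)  # members whose streams are pulled


class VirtualRoom:
    def __init__(self, names: list[str]) -> None:
        # Every member initially pulls the streams of all other members.
        self.members = {
            n: Member(n, pulling={o for o in names if o != n}) for n in names
        }

    def hang_up(self, name: str) -> Member:
        """Member `name` leaves; the others only stop pulling its stream."""
        leaver = self.members.pop(name)
        leaver.pushing = False      # stops acquisition and pushing
        leaver.pulling.clear()      # stops pulling everyone else
        for member in self.members.values():
            member.pulling.discard(name)  # keep own push, drop the leaver's stream
        return leaver
```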

Referring to the above description, fig. 6 shows a video call method executed by the first display device (the party initiating the video call). The execution subject of the method is the controller of the first display device, and the method focuses on the video call logic of the first display device under different acquisition capabilities, regardless of whether the second display device is paired with a slave device. The method includes:

Step S10, receiving an operation of initiating a video call to the target object, and sending a video call request to the server. The video call application provides a friend list, from which the user can select one or more target objects to whom the video call request is sent.

Step S20, querying whether the first display device has audio and video acquisition capability; if not, i.e., the first display device lacks video and/or audio acquisition capability, steps S301 and S302 are executed; otherwise, if the first display device has audio and video acquisition capability, steps S401 to S404 are executed.
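The branch in step S20 can be sketched as a function that returns the steps to run and the server message the first display device waits for before pulling streams. This is an illustrative sketch; `first_device_plan` and the message strings are hypothetical labels for the first and second connected state information described in the text.

```python
# Hypothetical sketch of the step S20 branch on the first display device.


def first_device_plan(has_video: bool, has_audio: bool) -> tuple[list[str], str]:
    """Return (steps to execute, server message to wait for before pulling)."""
    if has_video and has_audio:
        # Capture and preview locally, then pull only the target object's stream.
        return ["S401", "S402", "S403", "S404"], "second_connected_state"
    # Capture is delegated to the first target terminal found by the server.
    return ["S301", "S302"], "first_connected_state"
```

Lacking either the camera or the microphone is enough to take the S301/S302 branch, which matches the "video and/or audio" wording of the text.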

Step S301, in response to receiving first connected state information sent by a server, second audio and video data of a target object and first audio and video data uploaded by a first target terminal are acquired from the server.

After receiving the video call request sent by the first display device, the server obtains the target object information carried in the request and forwards the request to the second display device corresponding to the target object. On receiving the video call request, the second display device displays the incoming call prompt interface; the target object clicks the answer control in that interface to connect the video call, and the second display device feeds back incoming-call answering information to the server. After receiving the answering information sent by the second display device, the server queries the capability level of the first display device. If the first display device lacks video and/or audio acquisition capability, the server obtains, from the initiator user's device list, a first target terminal that can compensate for the missing acquisition capability of the first display device, and sends first control information to the first target terminal. After receiving the first control information, the first target terminal controls its own acquisition device to acquire the first audio and video data and uploads it to the server; meanwhile, the second display device or the second target terminal acquires the second audio and video data of the target object and uploads it to the server. When the server receives the first audio and video data, it sends prompt information to the first display device indicating that the first target terminal has started acquiring audio and video data. Once the server has received both the first and the second audio and video data, the data of both ends is ready, so the server sends the first connected state information to the first display device, informing it to start pulling the second audio and video data of the target object and the first audio and video data acquired and uploaded by the first target terminal.
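The server-side readiness rule above (send a prompt when the slave terminal's stream arrives, send the connected state once both ends are ready) can be sketched as a small state tracker. This is a hypothetical sketch seen only from the first display device's side; `CallSession`, the stream ids `first_av`/`second_av`, and the message strings are assumed names, and the capable case follows the second connected state described for step S403.

```python
# Hypothetical server-side readiness tracking for one call, seen from the
# first display device. Stream ids and message names are illustrative.


class CallSession:
    def __init__(self, first_uses_slave: bool) -> None:
        # True when the first display device lacks capture capability and a
        # first target terminal uploads the first A/V data instead.
        self.first_uses_slave = first_uses_slave
        self.received: set[str] = set()
        self.outbox: list[tuple[str, str]] = []  # (recipient, message) to send

    def on_stream(self, stream: str) -> None:
        """Record that "first_av" or "second_av" has started arriving."""
        if stream in self.received:
            return  # only the first arrival of each stream triggers messages
        self.received.add(stream)
        if stream == "first_av" and self.first_uses_slave:
            # Tell the first display device that its slave terminal has started.
            self.outbox.append(("first_display", "acquisition_started_prompt"))
        if {"first_av", "second_av"} <= self.received:
            # Both ends are ready: the first display device may start pulling.
            state = ("first_connected_state" if self.first_uses_slave
                     else "second_connected_state")
            self.outbox.append(("first_display", state))
```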

Step S302, controlling the display to respectively display the video data in the first audio and video data and the second audio and video data in different windows, and controlling the sound player to respectively play the audio data in the first audio and video data and the second audio and video data.

If the initiator user initiates a video call request to N target objects (N ≥ 1), N+1 windows may be displayed in the video call interface: one window displays the video data of the initiator user and the other N windows display the video data of the respective target objects, and each window may have a voice playback control for playing the voice of the corresponding call member. When a target object joins the call, its video picture is displayed in the next new window, while windows of target objects that have not yet joined may display a prompt indicating that they are waiting to join. The UI display and audio/video playback of the video call interface can follow existing approaches and are not specifically limited in this application.
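The N+1 window arrangement can be sketched as a pure layout function. The ordering (initiator first, then joined targets in join order, then waiting targets) is an assumption consistent with "displayed in the next new window"; `layout_windows` and the "video"/"waiting" labels are hypothetical.

```python
# Hypothetical sketch of the N+1 window layout of the video call interface.


def layout_windows(initiator: str, targets: list[str],
                   joined_in_order: list[str]) -> list[tuple[str, str]]:
    """Return (member, content) per window: 1 initiator + N target windows."""
    windows = [(initiator, "video")]
    # Targets that have joined fill the next windows in the order they joined.
    windows += [(t, "video") for t in joined_in_order]
    # Remaining targets show a waiting-to-join prompt.
    windows += [(t, "waiting") for t in targets if t not in joined_in_order]
    return windows
```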

Step S401, controlling the acquisition device of the first display device to acquire the first audio and video data and uploading the first audio and video data to the server.

Step S402, controlling the display to display the video data in the first audio and video data and controlling the sound player to play the audio data in the first audio and video data.

Steps S401 and S402 can be executed synchronously: since the first display device can acquire the first audio and video data itself, the data can be displayed and played directly in the local video call interface as soon as it is acquired.

Step S403, in response to receiving the second connected state information sent by the server, acquiring the second audio and video data of the target object from the server.

After receiving the incoming-call answering information fed back by the second display device, the server finds that the first display device has both video and audio acquisition capability; once it then receives the second audio and video data of the target object, the server sends the second connected state information to the first display device to inform it that the peer's data is ready and stream pulling can start.

Step S404, controlling the display to display the video data in the second audio and video data in a new window, and controlling the sound player to play the audio data in the second audio and video data.

Referring to the above description, fig. 7 shows a video call method executed by the second display device (the invitee). The execution subject of the method is the controller of the second display device, and the method focuses on the video call logic of the second display device under different acquisition capabilities, regardless of whether the first display device is paired with a slave device. The method includes:

Step S50, when a video call request sent by the server is received, controlling the display to display an incoming call prompt interface.

Step S60, receiving a click operation on the answer control in the incoming call prompt interface.

The incoming call prompt interface includes information about the initiator user (such as a user name and an avatar), an answer control, a reject control, and so on. The processing logic for the cases where the user clicks the reject control or the invitation times out has been described above and is not repeated here. This embodiment focuses on the case where the user clicks the answer control to connect the video call.

Step S70, querying whether the second display device has audio and video acquisition capability; if not, i.e., the second display device lacks video and/or audio acquisition capability, steps S801 and S802 are executed; otherwise, if the second display device has audio and video acquisition capability, steps S901 to S904 are executed.

Step S801, in response to receiving the first indication information sent by the server, obtaining, from the server, first audio and video data of the initiator user and second audio and video data acquired by the second target terminal.

After receiving the click operation on the answer control, the second display device feeds back incoming-call answering information to the server. After receiving the answering information, the server queries the capability level of the second display device. If the second display device lacks video and/or audio acquisition capability, the server obtains, from the receiver user's device list, a second target terminal that can compensate for the missing acquisition capability, and sends second control information to the second target terminal. After receiving the second control information, the second target terminal controls its own acquisition device to acquire the second audio and video data and uploads it to the server; meanwhile, the first display device or the first target terminal acquires the first audio and video data of the initiator user and uploads it to the server. When the server receives the second audio and video data, it sends prompt information to the second display device indicating that the second target terminal has started acquiring audio and video data. Once the server has received both the first and the second audio and video data, the data of both ends is ready, so the server sends the first indication information to the second display device, informing it to start pulling the first audio and video data of the initiator user and the second audio and video data acquired and uploaded by the second target terminal.

Step S802, controlling the display to respectively display the video data in the first audio and video data and the second audio and video data in different windows, and controlling the sound player to respectively play the audio data in the first audio and video data and the second audio and video data.

Step S901, controlling the acquisition device of the second display device to acquire the second audio and video data and uploading the second audio and video data to the server.

Step S902, controlling the display to display the video data in the second audio and video data, and controlling the sound player to play the audio data in the second audio and video data.

Steps S901 and S902 can be executed synchronously: since the second display device can acquire the second audio and video data itself, the data can be displayed and played directly in the local video call interface as soon as it is acquired.

Step S903, in response to receiving the second indication information sent by the server, acquiring the first audio and video data of the initiator user from the server.

Step S904, controlling the display to display the video data in the first audio and video data in the new window, and controlling the sound player to play the audio data in the first audio and video data.

The first audio and video data of the initiator user may be acquired and uploaded either by the first display device or by the first target terminal. Regardless of how it is acquired, once the second display device has connected the video call and the server has received the first audio and video data, the initiator's data is ready, and the server can send the second indication information to the second display device to inform it to start pulling streams. After receiving the second indication information, the second display device pulls the first audio and video data from the server, displays the video data of the initiator user in a new window, and plays the audio data of the initiator user. The processing logic for hanging up the video call on the first display device and the second display device has been described in the foregoing embodiments and is not repeated here.

According to the above technical solution, the video call processing logic on the server side is as follows. The server receives a video call request initiated by the first display device to a target object and sends it to the second display device, so that the second display device displays an incoming call prompt interface. The server then receives incoming-call answering information sent by the second display device and queries the device capability levels of the first display device and the second display device. If at least one target device among the first and second display devices lacks video and/or audio acquisition capability, the server traverses the device list to which that target device belongs, searches it for a target terminal (the first target terminal and/or the second target terminal) that can compensate for the acquisition capability of the target device, and sends control information to the target terminal, instructing it to acquire the audio and video data of the user at its own end. Depending on the capability levels of the two ends, the control information may be first control information and/or second control information: the first control information controls the first target terminal to acquire the first audio and video data, and the second control information controls the second target terminal to acquire the second audio and video data. When the second audio and video data of the target object and the first audio and video data of the initiator user have been received, the server sends pull stream prompt information to the first display device and the second display device respectively, prompting each of them to start receiving the audio and video data it requires.
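The device-list traversal can be sketched as a linear search for an idle device that covers what the display device lacks. This is a hypothetical sketch: `Device`, `find_target_terminal`, the `idle` flag (suggested by the "idle" slave devices of figs. 8 to 15), and the rule "cover only the missing capabilities" are assumptions; the text does not specify what happens when no suitable terminal exists, so the sketch simply returns `None`.

```python
# Hypothetical sketch of the server searching a user's device list for a
# target terminal that can make up a display device's missing capability.
from dataclasses import dataclass
from typing import Optional

REQUIRED = frozenset({"video", "audio"})


@dataclass(frozen=True)
class Device:
    device_id: str
    capabilities: frozenset[str]  # subset of {"video", "audio"}
    idle: bool = True             # not currently busy with another task


def find_target_terminal(display: Device, device_list: list[Device]) -> Optional[Device]:
    """Return the first idle device covering what `display` lacks, else None."""
    missing = REQUIRED - display.capabilities
    if not missing:
        return None  # the display device can acquire audio and video itself
    for dev in device_list:
        if dev.device_id == display.device_id or not dev.idle:
            continue  # skip the display device itself and busy devices
        if missing <= dev.capabilities:
            return dev
    return None  # no suitable terminal found; handling is not specified here
```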

Taking the first display device as an example, if the first display device has audio and video acquisition capability, the pull stream prompt information prompts it to pull only the second audio and video data; if the first display device lacks video and/or audio acquisition capability, the pull stream prompt information prompts it to pull both the second audio and video data and the first audio and video data acquired by the first target terminal. The server-side video call processing logic above is a higher-level summary that takes into account the capability levels of both the first display device and the second display device; when interacting with the terminals, the server may specifically execute the logic of the embodiments described with reference to figs. 6 to 15.
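The content of the pull stream prompt can be sketched as a function of the device's side and capability. This is an illustrative sketch; `streams_to_pull` and the stream ids `first_av`/`second_av` are hypothetical, and the second display device case is the symmetric counterpart of the first display device example given in the text.

```python
# Hypothetical sketch: which streams a display device is prompted to pull.


def streams_to_pull(side: str, capable: bool) -> list[str]:
    """side: "first" or "second"; capable: has both video and audio capture."""
    if side not in ("first", "second"):
        raise ValueError(f"unknown side: {side!r}")
    own = f"{side}_av"
    peer = "second_av" if side == "first" else "first_av"
    # The peer's stream is always pulled; the device's own stream is pulled
    # from the server only when a target terminal acquired it instead.
    return [peer] if capable else [peer, own]
```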

Figs. 8 to 15 show several interaction logics of a video call among the first display device, the server, and the second display device, where the first display device is the master device that initiates the video call request, the second display device is the master device used by the target object whom the initiator user invites to join the video call, and each of the first and second display devices may have several idle slave devices. It should be noted that, in practical applications, if a display device itself has a complete capability level and can acquire and play audio and video data, it need not be paired with any slave device.

In some embodiments, figs. 8 and 9 show two interaction logics when the first display device lacks video and/or audio acquisition capability but the second display device has audio and video acquisition capability; the two interaction logics differ only in the processing logic for hanging up the video call, and involve the following program steps:

Step (A1), the first display device side: receiving an operation of initiating a video call to a target object and sending a video call request to the server; finding that the first display device lacks video and/or audio acquisition capability and therefore cannot acquire the first audio and video data itself, so it must wait for the first connected state information from the server before pulling streams;

Step (A2), the server: after receiving the video call request sent by the first display device, obtaining the target object information carried in the request and sending the video call request to the second display device corresponding to the target object;

Step (A3), the second display device side: after receiving the video call request, displaying the incoming call prompt interface; when a click operation on the answer control is received, feeding back incoming-call answering information to the server, controlling the acquisition device of the second display device to acquire the second audio and video data, uploading it to the server, displaying the video data in the second audio and video data, and playing the audio data in the second audio and video data;

Step (A4), the server: after receiving the incoming-call answering information sent by the second display device, querying the capability levels of the first and second display devices; upon finding that the first display device lacks video and/or audio acquisition capability, obtaining a first target terminal from the initiator user's device list and sending first control information to it; upon finding that the second display device has audio and video acquisition capability, directly receiving the second audio and video data uploaded by the second display device without obtaining a second target terminal;

Step (A5), the first target terminal: after receiving the first control information, responding to the access request to join the video call, controlling its own acquisition device to acquire the first audio and video data, and uploading it to the server;

Step (A6), the server: when the first audio and video data is received, sending prompt information to the first display device indicating that the first target terminal has started acquisition;

Step (A7), the server: sending the first connected state information to the first display device and the second indication information to the second display device;

Step (A8), the first display device side: in response to receiving the first connected state information sent by the server, acquiring from the server the second audio and video data of the target object and the first audio and video data uploaded by the first target terminal, displaying the video data in the first and second audio and video data in different windows, and playing the audio data in the first and second audio and video data respectively;

Step (A9), the second display device side: in response to receiving the second indication information sent by the server, acquiring the first audio and video data of the initiator user from the server, displaying the video data in the first audio and video data in a new window, and playing the audio data in the first audio and video data.

Referring to fig. 8, when the first display device hangs up the video call, the hang-up logic includes steps (A10) to (A13):

Step (A10), the first display device side: in response to receiving the initiator user's operation of hanging up the video call, stopping receiving the first audio and video data acquired by the first target terminal and the second audio and video data of the target object, thereby exiting the current video call, and at the same time sending first hang-up indication information to the server;

Step (A11), the server: receiving the first hang-up indication information, sending first hang-up state information to the second display device, and sending second hang-up state information to the first target terminal;

Step (A12), the second display device side: after receiving the first hang-up state information, controlling its own acquisition device to stop acquiring the second audio and video data, stopping receiving the first audio and video data, and synchronously exiting the current video call;

Step (A13), the first target terminal: after receiving the second hang-up state information, controlling its own acquisition device to stop acquiring the first audio and video data, thereby ceasing to upload the first audio and video data to the server, and synchronously exiting the current video call.

Referring to fig. 9, when the second display device hangs up the video call, the hang-up logic includes steps (A14) to (A17):

Step (A14), the second display device side: in response to receiving the receiver user's operation of hanging up the video call, controlling its own acquisition device to stop acquiring the second audio and video data and stopping receiving the first audio and video data of the initiator user, thereby exiting the current video call, and at the same time sending fifth hang-up indication information to the server;

Step (A15), the server: after receiving the fifth hang-up indication information, sending fifth hang-up state information to the first display device and second hang-up state information to the first target terminal;

Step (A16), the first display device side: after receiving the fifth hang-up state information, stopping receiving the second audio and video data of the receiver user and the first audio and video data acquired by the first target terminal, and synchronously exiting the current video call;

Step (A17), the first target terminal: after receiving the second hang-up state information, controlling its own acquisition device to stop acquiring the first audio and video data, thereby ceasing to upload the first audio and video data to the server, and synchronously exiting the current video call.

In some embodiments, figs. 10 and 11 show two interaction logics when both the first display device and the second display device have audio and video acquisition capabilities; the two interaction logics differ only in the processing logic for hanging up the video call, and involve the following program steps:

Step (B1), the first display device side: receiving an operation of initiating a video call to a target object and sending a video call request to the server; at the same time, finding that the first display device has audio and video acquisition capability, controlling its own acquisition device to acquire the first audio and video data, uploading it to the server, displaying the video data in the first audio and video data, and playing the audio data in the first audio and video data;

Step (B2), the server: after receiving the video call request sent by the first display device, obtaining the target object information carried in the request, sending the video call request to the second display device corresponding to the target object, and receiving the first audio and video data uploaded by the first display device;

Step (B3), the second display device side: after receiving the video call request, displaying the incoming call prompt interface; when a click operation on the answer control is received, feeding back incoming-call answering information to the server, controlling the acquisition device of the second display device to acquire the second audio and video data, uploading it to the server, displaying the video data in the second audio and video data, and playing the audio data in the second audio and video data;

Step (B4), the server: after receiving the incoming-call answering information sent by the second display device, querying only the capability level of the second display device, because the first audio and video data has already been received and the capability level of the first display device therefore need not be queried; upon finding that the second display device has audio and video acquisition capability, directly receiving the second audio and video data uploaded by the second display device without obtaining a second target terminal;
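The shortcut in step (B4), querying only ends whose stream has not already arrived, can be sketched as follows. This is a hypothetical generalization for illustration: `plan_on_answer`, `STREAM_OF_END`, and the capability sets are assumed names, and treating "stream already received" as proof of capability is the reading suggested by (B4), not a rule the text states for every case.

```python
# Hypothetical sketch of the capability query made when the call is answered.

REQUIRED = {"video", "audio"}
# Which uploaded stream belongs to which end.
STREAM_OF_END = {"first_display": "first_av", "second_display": "second_av"}


def plan_on_answer(received: set[str], capabilities: dict[str, set[str]]) -> dict[str, bool]:
    """Return {end: needs_target_terminal} for every end that must be queried."""
    plan: dict[str, bool] = {}
    for end, stream in STREAM_OF_END.items():
        if stream in received:
            continue  # stream already arriving, so no capability query is needed
        # A target terminal is needed if either video or audio capture is missing.
        plan[end] = not REQUIRED <= capabilities[end]
    return plan
```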

Step (B5), the server: when the second audio and video data is received, sending the second connected state information to the first display device and the second indication information to the second display device;

step (B6), the first display device side: responding to the received second connected state information sent by the server, acquiring second audio and video data of the target object from the server, displaying video data in the second audio and video data in a new window, and playing audio data in the second audio and video data;

Step (B7), the second display device side: in response to receiving the second indication information sent by the server, acquiring the first audio and video data of the initiator user from the server, displaying the video data in the first audio and video data in a new window, and playing the audio data in the first audio and video data.

Referring to fig. 10, when the first display device hangs up the video call, the hang-up logic includes steps (B8) to (B10):

Step (B8), the first display device side: in response to receiving the initiator user's operation of hanging up the video call, controlling its own acquisition device to stop acquiring the first audio and video data, stopping receiving the second audio and video data of the target object, and at the same time sending second hang-up indication information to the server;

step (B9), the server: receiving the second hang-up indication information and sending first hang-up state information to the second display device;

step (B10), the second display device side: after receiving the first hang-up state information, controlling its own acquisition device to stop acquiring the second audio and video data, stopping receiving the first audio and video data, and exiting the current video call synchronously.
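The local teardown in steps B8 and B10 can be sketched with a hypothetical `CallEndpoint` model in which capture and stream pulls are plain flags. This is an illustrative assumption, not code from the source; it also assumes, consistent with the later scenarios, that a display without its own acquisition capability has no capture to stop and only stops pulling streams.

```python
from dataclasses import dataclass, field


@dataclass
class CallEndpoint:
    """Hypothetical model of a display device's local resources during a call."""

    can_capture: bool                            # has its own camera and microphone
    capturing: bool = False
    receiving: set = field(default_factory=set)  # ids of streams being pulled
    in_call: bool = True
    end_reason: str = ""

    def _teardown(self) -> None:
        # Only a display with its own acquisition device has capture to stop;
        # a display served by a target terminal only stops pulling streams.
        if self.can_capture:
            self.capturing = False
        self.receiving.clear()
        self.in_call = False

    def hang_up(self, indication: str) -> str:
        # Local user hangs up (step B8): release resources and return the
        # hang-up indication to be sent to the server.
        self._teardown()
        self.end_reason = "local hang-up"
        return indication

    def on_hang_up_state(self, state: str) -> None:
        # Server reports that the peer hung up (step B10): same teardown.
        self._teardown()
        self.end_reason = state


caller = CallEndpoint(can_capture=True, capturing=True, receiving={"second_av"})
callee = CallEndpoint(can_capture=True, capturing=True, receiving={"first_av"})
print(caller.hang_up("second_hang_up_indication"))  # second_hang_up_indication
callee.on_hang_up_state("first_hang_up_state")      # relayed by the server in step B9
print(callee.capturing, callee.receiving, callee.in_call)  # False set() False
```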

Referring to fig. 11, when the second display device hangs up the video call, the hang-up logic includes steps (B11) to (B13):

step (B11), the second display device side: in response to receiving the receiver user's operation of hanging up the video call, controlling its own acquisition device to stop acquiring the second audio and video data and stopping receiving the first audio and video data of the initiator user, thereby exiting the current video call, and sending sixth hang-up indication information to the server;

step (B12), the server: after receiving the sixth hang-up indication information, sending sixth hang-up state information to the first display device;

step (B13), the first display device side: after receiving the sixth hang-up state information, controlling its own acquisition device to stop acquiring the first audio and video data, stopping receiving the second audio and video data, and exiting the current video call synchronously.

In some embodiments, fig. 12 and fig. 13 show two interaction logics for the case in which the first display device has both audio and video capture capabilities but the second display device lacks video and/or audio capture capability. The two logics differ in how hanging up the video call is handled, and involve the following program steps:

step (C1), the first display device side: receiving an operation of initiating a video call to a target object, sending a video call request to the server, and, upon querying that the first display device has audio and video acquisition capability, controlling its own acquisition device to acquire first audio and video data, uploading the first audio and video data to the server, displaying the video data in the first audio and video data, and playing the audio data in the first audio and video data;

step (C2), the server: after receiving the video call request sent by the first display device, obtaining the target object information carried in the video call request, forwarding the video call request to the second display device corresponding to the target object, and receiving the first audio and video data uploaded by the first display device;

step (C3), the second display device side: after receiving the video call request, displaying an incoming call prompt interface; upon receiving a click operation on the answer control, feeding back call-answering information to the server; upon querying that the second display device lacks video and/or audio acquisition capability and therefore cannot acquire the second audio and video data itself, waiting for first indication information from the server before pulling streams;

step (C4), the server: after receiving the call-answering information sent by the second display device, querying only the capability level of the second display device, not that of the first display device, because the first audio and video data is already being received; when the second display device lacks video and/or audio acquisition capability, obtaining a second target terminal from the device list of the answering user and sending second control information to the second target terminal;

step (C5), the second target terminal: after receiving the second control information, accepting the access request to join the video call, controlling its own acquisition device to acquire second audio and video data, and uploading the second audio and video data to the server;

step (C6), the server: when the second audio and video data is received, sending to the second display device prompt information indicating that the second target terminal has started capturing;

step (C7), the server: sending second connected state information to the first display device and first indication information to the second display device;

step (C8), the first display device side: in response to receiving the second connected state information sent by the server, acquiring the second audio and video data of the target object from the server, displaying the video data in the second audio and video data in a new window, and playing the audio data in the second audio and video data;

step (C9), the second display device side: in response to receiving the first indication information sent by the server, acquiring from the server the first audio and video data of the initiator user and the second audio and video data captured by the second target terminal, displaying the video data in the first and second audio and video data in different windows, and playing the audio data in the first and second audio and video data respectively.
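The server-side branch taken after the callee answers (steps B4 to B5 versus C4 to C7) can be sketched as an ordered list of actions. The function name, the action tuples, and the device-list field `can_capture` (read here as "has both video and audio acquisition") are hypothetical. Choosing the first capable entry is also an assumption; the text only says a second target terminal is obtained from the answering user's device list.

```python
def on_callee_answered(callee_can_capture, callee_devices):
    """Ordered server actions after the callee answers (sketch of B4-B5 / C4-C7).

    The caller's stream is assumed to be arriving already, so only the callee's
    capability is checked. All names and message kinds are hypothetical.
    """
    if callee_can_capture:
        # Step B4: no target terminal needed; take the display's own upload.
        return [
            ("await_upload", "second_display"),
            ("send", "first_display", "second_connected_state"),  # step B5
            ("send", "second_display", "second_indication"),
        ]
    # Step C4: pick a terminal from the answering user's device list
    # (first capable entry; this selection policy is an assumption).
    terminal = next((d["id"] for d in callee_devices if d["can_capture"]), None)
    if terminal is None:
        raise LookupError("no device can make up the missing capture capability")
    return [
        ("send", terminal, "second_control_info"),
        ("await_upload", terminal),                            # step C5
        ("send", "second_display", "capture_started_prompt"),  # step C6
        ("send", "first_display", "second_connected_state"),   # step C7
        ("send", "second_display", "first_indication"),
    ]


devices = [{"id": "living_room_tv", "can_capture": False},
           {"id": "phone", "can_capture": True}]
for action in on_callee_answered(False, devices):
    print(action)
```

The connected state and indication are sent only after the upload is awaited, matching the text's ordering in which the peer is told to pull streams once the server actually has data to serve.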

Referring to fig. 12, when the first display device hangs up the video call, the hang-up logic includes steps (C10) to (C13):

step (C10), the first display device side: in response to receiving the initiator user's operation of hanging up the video call, controlling its own acquisition device to stop acquiring the first audio and video data, stopping receiving the second audio and video data of the target object, and sending third hang-up indication information to the server;

step (C11), the server: receiving the third hang-up indication information, sending third hang-up state information to the second display device, and sending fourth hang-up state information to the second target terminal;

step (C12), the second display device side: after receiving the third hang-up state information, stopping receiving the first audio and video data of the initiator user and the second audio and video data captured by the second target terminal, and exiting the current video call synchronously;

step (C13), the second target terminal: after receiving the fourth hang-up state information, controlling its own acquisition device to stop acquiring the second audio and video data, so that the second target terminal stops uploading the second audio and video data to the server, and exiting the current video call synchronously.
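The target terminal's side of the call (joining in step C5, leaving in step C13) reduces to a small lifecycle, sketched below with a hypothetical `TargetTerminal` class. The flags stand in for the real capture and upload pipeline, which the text does not specify.

```python
class TargetTerminal:
    """Hypothetical companion device (for example a phone) capturing for a display."""

    def __init__(self, device_id: str):
        self.device_id = device_id
        self.capturing = False
        self.uploading = False
        self.in_call = False

    def on_control_info(self) -> None:
        # Steps C5 / D5 / D6: accept the request, join the call, and start
        # capturing and uploading audio and video data to the server.
        self.in_call = True
        self.capturing = True
        self.uploading = True

    def on_hang_up_state(self) -> None:
        # Steps C13 / D14 / D15: stopping capture also ends the upload,
        # and the terminal leaves the call.
        self.capturing = False
        self.uploading = False
        self.in_call = False


phone = TargetTerminal("phone")
phone.on_control_info()    # second control information received
phone.on_hang_up_state()   # fourth hang-up state information received
print(phone.capturing, phone.uploading, phone.in_call)  # False False False
```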

Referring to fig. 13, when the second display device hangs up the video call, the hang-up logic includes steps (C14) to (C17):

step (C14), the second display device side: in response to receiving the receiver user's operation of hanging up the video call, stopping receiving the first audio and video data and the second audio and video data captured by the second target terminal, and sending seventh hang-up indication information to the server;

step (C15), the server: after receiving the seventh hang-up indication information, sending sixth hang-up state information to the first display device and fourth hang-up state information to the second target terminal;

step (C16), the first display device side: after receiving the sixth hang-up state information, controlling its own acquisition device to stop acquiring the first audio and video data, stopping receiving the second audio and video data, and exiting the current video call synchronously;

step (C17), the second target terminal: after receiving the fourth hang-up state information, controlling its own acquisition device to stop acquiring the second audio and video data, so that the second target terminal stops uploading the second audio and video data to the server.

In some embodiments, fig. 14 and fig. 15 show two interaction logics for the case in which both the first display device and the second display device lack video and/or audio capture capability. The two logics differ in how hanging up the video call is handled, and involve the following program steps:

step (D1), the first display device side: receiving an operation of initiating a video call to a target object and sending a video call request to the server; upon querying that the first display device lacks video and/or audio acquisition capability and therefore cannot acquire the first audio and video data itself, waiting for first connected state information from the server before pulling streams;

step (D2), the server: after receiving the video call request sent by the first display device, obtaining the target object information carried in the video call request and forwarding the video call request to the second display device corresponding to the target object;

step (D3), the second display device side: after receiving the video call request, displaying an incoming call prompt interface; upon receiving a click operation on the answer control, feeding back call-answering information to the server; upon querying that the second display device lacks video and/or audio acquisition capability and therefore cannot acquire the second audio and video data itself, waiting for first indication information from the server before pulling streams;

step (D4), the server: after receiving the call-answering information sent by the second display device, querying the capability levels of both the first display device and the second display device; when the first display device lacks video and/or audio acquisition capability, obtaining a first target terminal from the device list of the initiator user and sending first control information to the first target terminal; and when the second display device lacks video and/or audio acquisition capability, obtaining a second target terminal from the device list of the answering user and sending second control information to the second target terminal;

step (D5), the first target terminal: after receiving the first control information, accepting the access request to join the video call, controlling its own acquisition device to acquire first audio and video data, and uploading the first audio and video data to the server;

step (D6), the second target terminal: after receiving the second control information, accepting the access request to join the video call, controlling its own acquisition device to acquire second audio and video data, and uploading the second audio and video data to the server;

step (D7), the server: when the first audio and video data is received, sending to the first display device prompt information indicating that the first target terminal has started capturing; and when the second audio and video data is received, sending to the second display device prompt information indicating that the second target terminal has started capturing;

step (D8), the server: sending first connected state information to the first display device, and sending first indication information to the second display device;

step (D9), the first display device side: in response to receiving the first connected state information sent by the server, acquiring from the server the second audio and video data of the target object and the first audio and video data uploaded by the first target terminal, displaying the video data in the first and second audio and video data in different windows, and playing the audio data in the first and second audio and video data respectively;

step (D10), the second display device side: in response to receiving the first indication information sent by the server, acquiring from the server the first audio and video data of the initiator user and the second audio and video data captured by the second target terminal, displaying the video data in the first and second audio and video data in different windows, and playing the audio data in the first and second audio and video data respectively.
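The capability handling in steps B4, C4 and D4, together with the messages in B5, C7 and D8, appears to follow one rule, which can be sketched as a small decision function. This consolidation is a hypothetical reading, not code from the source: all names are illustrative, and the remaining combination (caller lacks capability, callee capable) is taken from the abstract and claims rather than from a step list in this section.

```python
def plan_capture(caller_can_capture: bool, callee_can_capture: bool):
    """Sketch of the capability queries (B4/C4/D4) and messages (B5/C7/D8)."""
    # A caller that captures by itself is already uploading when the callee
    # answers, so only the callee is queried (B4, C4); otherwise both are (D4).
    to_query = (["second_display"] if caller_can_capture
                else ["first_display", "second_display"])
    capable = {"first_display": caller_can_capture,
               "second_display": callee_can_capture}
    control = {"first_display": ("first_target_terminal", "first_control_info"),
               "second_display": ("second_target_terminal", "second_control_info")}
    # Control information goes to a target terminal for each queried display
    # that lacks capture capability.
    dispatch = [control[d] for d in to_query if not capable[d]]
    # "first" variants tell a display to pull its own user's stream as well
    # as the peer's; "second" variants tell it to pull only the peer's.
    messages = [
        ("first_display",
         "second_connected_state" if caller_can_capture else "first_connected_state"),
        ("second_display",
         "second_indication" if callee_can_capture else "first_indication"),
    ]
    return to_query, dispatch, messages


print(plan_capture(False, False))
```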

Referring to fig. 14, when the first display device hangs up the video call, the hang-up logic includes steps (D11) to (D15):

step (D11), the first display device side: in response to receiving the initiator user's operation of hanging up the video call, stopping receiving the first audio and video data captured by the first target terminal and the second audio and video data of the target object, and sending fourth hang-up indication information to the server;

step (D12), the server: receiving the fourth hang-up indication information, sending second hang-up state information to the first target terminal, third hang-up state information to the second display device, and fourth hang-up state information to the second target terminal;

step (D13), the second display device side: after receiving the third hang-up state information, stopping receiving the first audio and video data of the initiator user and the second audio and video data captured by the second target terminal, and exiting the current video call synchronously;

step (D14), the first target terminal: after receiving the second hang-up state information, controlling its own acquisition device to stop acquiring the first audio and video data, thereby stopping the upload of the first audio and video data to the server, and exiting the current video call synchronously;

step (D15), the second target terminal: after receiving the fourth hang-up state information, controlling its own acquisition device to stop acquiring the second audio and video data, so that the second target terminal stops uploading the second audio and video data to the server, and exiting the current video call synchronously.

Referring to fig. 15, when the second display device hangs up the video call, the hang-up logic includes steps (D16) to (D20):

step (D16), the second display device side: in response to receiving the receiver user's operation of hanging up the video call, stopping receiving the second audio and video data captured by the second target terminal and the first audio and video data of the initiator user, and sending eighth hang-up indication information to the server;

step (D17), the server: after receiving the eighth hang-up indication information, sending fifth hang-up state information to the first display device, second hang-up state information to the first target terminal, and fourth hang-up state information to the second target terminal;

step (D18), the first display device side: after receiving the fifth hang-up state information, stopping receiving the second audio and video data of the receiver user and the first audio and video data captured by the first target terminal, and exiting the current video call synchronously;

step (D19), the first target terminal: after receiving the second hang-up state information, controlling its own acquisition device to stop acquiring the first audio and video data, thereby stopping the upload of the first audio and video data to the server, and exiting the current video call synchronously;

step (D20), the second target terminal: after receiving the fourth hang-up state information, controlling its own acquisition device to stop acquiring the second audio and video data, so that the second target terminal stops uploading the second audio and video data to the server, and exiting the current video call synchronously.
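The server's hang-up fan-out in steps B9, B12, C11, C15, D12 and D17 (and claim 4) can be summarised as one function of who hung up and which displays capture by themselves. This is a hypothetical consolidation for illustration. The state names follow the numbering in the text. The combination "callee hangs up while the caller lacks capability but the callee is capable" is not spelled out in this section, so the function's output for that case is an extrapolation.

```python
def hang_up_fan_out(hung_up_by: str, caller_can_capture: bool, callee_can_capture: bool):
    """Return the (recipient, hang-up state) pairs the server sends (sketch)."""
    if hung_up_by == "caller":
        # B9 versus C11 / D12: the state sent to the peer display depends on
        # whether that display captures by itself.
        out = [("second_display",
                "first_hang_up_state" if callee_can_capture else "third_hang_up_state")]
    elif hung_up_by == "callee":
        # B12 / C15 versus D17.
        out = [("first_display",
                "sixth_hang_up_state" if caller_can_capture else "fifth_hang_up_state")]
    else:
        raise ValueError(f"unknown party: {hung_up_by}")
    # Every target terminal in the call is told to stop capturing and uploading.
    if not caller_can_capture:
        out.append(("first_target_terminal", "second_hang_up_state"))
    if not callee_can_capture:
        out.append(("second_target_terminal", "fourth_hang_up_state"))
    return out


for recipient, state in hang_up_fan_out("caller", False, False):  # step D12
    print(recipient, state)
```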

It should be noted that the capability level of each of the first and second display devices can change dynamically, and the number and capability levels of the slave devices associated with each display device may also change. Many other application scenarios, and therefore many other video call interaction logics, exist in practice, and the present invention is not limited to the foregoing embodiments. Based on the interaction logic in fig. 8 to fig. 15, the video call interaction logic can be adapted to the actual application scenario; these variants are not enumerated in this application. In addition, content of the video call application other than the video call method described in this application may be implemented with reference to the prior art.
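Because capability levels may change at run time, one way to model them is as bit flags that are evaluated at call time rather than cached. The sketch below uses the four capabilities listed in claim 3; the bit values, the function names, and the rule that a terminal qualifies when it covers every capture capability the display lacks are assumptions, since the text only says the server obtains a terminal "capable of making up" the missing acquisition capability.

```python
from enum import IntFlag


class Capability(IntFlag):
    """The four capabilities named in claim 3; the bit values are assumptions."""
    AUDIO_PLAY = 1
    VIDEO_DISPLAY = 2
    VIDEO_CAPTURE = 4
    AUDIO_CAPTURE = 8


CAPTURE = Capability.VIDEO_CAPTURE | Capability.AUDIO_CAPTURE


def missing_capture(level: Capability) -> Capability:
    # Capture capabilities the device currently lacks (empty flag if none).
    return Capability(CAPTURE.value & ~level.value)


def pick_compensating_terminal(display_level, terminals):
    """Return the id of the first terminal covering what the display lacks.

    Returns None if the display needs no help or no terminal qualifies.
    Levels are passed in at call time because they may change dynamically.
    """
    need = missing_capture(display_level)
    if not need:
        return None
    for device_id, level in terminals:
        if (level & need) == need:
            return device_id
    return None


tv = Capability.AUDIO_PLAY | Capability.VIDEO_DISPLAY | Capability.AUDIO_CAPTURE  # no camera
devices = [("smart_speaker", Capability.AUDIO_PLAY | Capability.AUDIO_CAPTURE),
           ("phone", Capability.AUDIO_PLAY | Capability.VIDEO_DISPLAY | CAPTURE)]
print(pick_compensating_terminal(tv, devices))  # phone
```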

Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented by software plus a necessary general-purpose hardware platform. In a specific implementation, the invention also provides a computer storage medium that can store a program. When the computer storage medium is located in any one of the first/second display devices, the server, or a terminal device, the program, when executed, may perform the steps of the video call method that the controller of the respective device is configured to execute. The computer storage medium may be a magnetic disk, an optical disc, a Read-Only Memory (ROM), or a Random Access Memory (RAM).

In the present application, the video call method performed by each individual device, as configured, is covered by the interaction logic among the multiple devices described above; identical or similar parts of the embodiments may be cross-referenced, and the related content is not repeated here.

Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

The foregoing description, for purposes of explanation, has been presented in conjunction with specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed above. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and the practical application, to thereby enable others skilled in the art to best utilize the embodiments and various embodiments with various modifications as are suited to the particular use contemplated.

Claims

1. A display device, comprising:

a display for playing video data;

a sound player for playing audio data;

a controller for performing:

2. The display device of claim 1, wherein after initiating the video call, the controller is further configured to perform:

if the display device has video and audio acquisition capability, controlling the acquisition device of the display device to acquire first audio and video data, uploading the first audio and video data to a server, controlling the display to display the video data in the first audio and video data, and controlling the sound player to play the audio data in the first audio and video data;

in response to receiving second connected state information sent by the server, acquiring second audio and video data of the target object from the server, wherein the second connected state information is sent by the server after the peer device answers the video call, the server determines by query that the display device has video and audio acquisition capability, and the server receives the second audio and video data of the target object;

and controlling the display to display the video data in the second audio and video data in the new window, and controlling the sound player to play the audio data in the second audio and video data.

3. The display device according to claim 1, wherein the controller is further configured to perform, when initiating a video call:

when the display device lacks video and/or audio acquisition capability, controlling the display to display a device list, wherein the device list records device information of the display device and device information of at least one terminal device that is used by the initiator user and logged in to the same video application account as the display device; the device information comprises a device identifier and a device capability level, the device capability level indicating whether the device has audio playing capability, video display capability, video acquisition capability, and audio acquisition capability;

and receiving the user's selection of a first target terminal in the device list, and sending to the server a video call request carrying the device information of the first target terminal, so that the server obtains the first target terminal selected by the initiator user.

4. The display device according to claim 2, wherein the controller is further configured to perform:

if the display device lacks video and/or audio acquisition capability, upon receiving the initiator user's operation of hanging up the video call, stopping receiving the first audio and video data and the second audio and video data, and sending first hang-up indication information to the server, wherein the first hang-up indication information instructs the server to send first hang-up state information to the peer device and second hang-up state information to the first target terminal; the first hang-up state information instructs the peer device to stop capturing the second audio and video data and stop receiving the first audio and video data; and the second hang-up state information instructs the first target terminal to stop capturing the first audio and video data;

or, if the display device has video and audio acquisition capability, upon receiving the initiator user's operation of hanging up the video call, controlling the acquisition device of the display device to stop acquiring the first audio and video data, stopping receiving the second audio and video data, and sending second hang-up indication information to the server, wherein the second hang-up indication information instructs the server to send the first hang-up state information only to the peer device.

5. A display device, comprising:

a display for playing video data;

a sound player for playing audio data;

a controller for performing:

receiving a click operation on the answer control in the incoming call prompt interface, and, if the display device lacks video and/or audio acquisition capability, in response to first indication information sent by a server, acquiring from the server first audio and video data of an initiator user and second audio and video data captured by a second target terminal; wherein the first indication information is sent by the server after the display device answers the video call, the server determines by query that the display device lacks video and/or audio acquisition capability, obtains a second target terminal capable of making up the acquisition capability of the display device, receives the second audio and video data captured and uploaded by the second target terminal according to second control information sent by the server, and receives the first audio and video data of the initiator user; the second control information instructs the second target terminal to control its own acquisition device to capture the second audio and video data;

6. The display device according to claim 5, wherein after receiving a click operation on an answer control in the incoming call prompt interface, the controller is further configured to:

when the display device has video and audio acquisition capability, controlling the acquisition device of the display device to acquire second audio and video data, uploading the second audio and video data to a server, controlling the display to display the video data in the second audio and video data, and controlling the sound player to play the audio data in the second audio and video data;

in response to second indication information sent by the server, acquiring the first audio and video data of the initiator user from the server;

and controlling the display to display the video data in the first audio and video data in the new window, and controlling the sound player to play the audio data in the first audio and video data.

7. A server, comprising:

a controller for performing:

8. A video call method, comprising:

9. A video call method, comprising:

10. A video call method for a server, comprising: