CN111669662A

CN111669662A - Display device, video call method and server

Info

Publication number: CN111669662A
Application number: CN202010635659.XA
Authority: CN
Inventors: 王大勇; 王卫明; 吴超
Original assignee: Hisense Visual Technology Co Ltd
Current assignee: Hisense Visual Technology Co Ltd
Priority date: 2020-07-03
Filing date: 2020-07-03
Publication date: 2020-09-15

Abstract

The embodiments of the present application provide a display device, a video call method and a server. The display device comprises: a camera configured to acquire a first depth image; a display; and a controller connected to the display and the camera respectively. The controller is configured to: in response to receiving a control signal, input by a user, indicating a mixed call, send a mixed call request to a called terminal; send the first depth image to a server upon receiving a confirmation signal from the called terminal; and control the display to display a first mixed image according to the mixed image received from the server. The first mixed image comprises a depth image obtained by the server rendering a second person into the first depth image according to depth information of the second person, where the second person is a person in a second depth image and the second depth image is a depth image captured by the called terminal. The method and device solve the problem that the two parties of a video call appear against different backgrounds, and improve the user experience.

Description

Display device, video call method and server

Technical Field

The present application relates to the technical field of display devices, and in particular to a display device, a video call method and a server.

Background

In today's fast-paced lifestyle, opportunities to meet friends and family in person are decreasing, and more emotional contact takes place through video calls. At present, a mobile device with a camera can make video calls by installing a video call application. However, the display screen of a mobile device is small and the device is usually hand-held, so the user can typically see only the head of the other party, and the call experience is poor. Smart televisions support television-based video calls by adding a camera assembly. Because the screen of a smart television is large and the user can keep some distance from it, the user can see more of the other party during the call.

Disclosure of Invention

To solve the above technical problem, the present application provides a display device, a video call method and a server.

In a first aspect, the present application provides a display device comprising:

a camera configured to acquire a first depth image;

a display configured to display a user interface and, in the user interface, a selector indicating that an item in the user interface is selected;

a controller connected to the display and the camera respectively, the controller being configured to:

in response to receiving a control signal, input by a user, indicating a mixed call, send a mixed call request to a called terminal;

send the first depth image to a server upon receiving a confirmation signal from the called terminal; and

control the display to display a first mixed image according to the mixed image received from the server, wherein the first mixed image comprises a depth image obtained by the server rendering a second person into the first depth image according to depth information of the second person, the second person is a person in a second depth image, and the second depth image is a depth image captured by the called terminal.
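The rendering step above is not spelled out in detail. One common way to place a person captured by one depth camera into a scene captured by another is a per-pixel depth (z-buffer) test. The sketch below is offered under several assumptions. Both depth images are taken to be already registered to the same viewpoint and resolution. Depth is measured in meters, with smaller values closer to the camera. Pixels outside the second person are marked `None`. The function name `render_person_into_scene` is hypothetical, and the sketch is not the applicant's implementation.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), 8 bits per channel


def render_person_into_scene(
    scene_rgb: List[List[Pixel]],
    scene_depth: List[List[float]],
    person_rgb: List[List[Pixel]],
    person_depth: List[List[Optional[float]]],
) -> List[List[Pixel]]:
    """Composite a segmented person into a scene using a per-pixel depth test.

    Assumption: all four inputs share one resolution and one viewpoint.
    person_depth[y][x] is None wherever the pixel is not part of the person.
    """
    height, width = len(scene_rgb), len(scene_rgb[0])
    out = [row[:] for row in scene_rgb]  # copy rows so the input scene is untouched
    for y in range(height):
        for x in range(width):
            d = person_depth[y][x]
            # Draw the person only where it is closer than the scene surface.
            if d is not None and d < scene_depth[y][x]:
                out[y][x] = person_rgb[y][x]
    return out
```

Because the test keeps the scene pixel wherever the scene surface is nearer than the person, foreground objects in the first depth image can occlude the inserted person. This occlusion is what lets the composite read as one shared real background rather than a flat overlay.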

In some embodiments, the controller is further configured to:

in response to receiving a control signal, input by a user, indicating a scene switch, send a scene switching request to the server; and

control the display to switch from the first mixed image to the second mixed image according to the second mixed image received from the server, wherein the second mixed image comprises a depth image obtained by the server rendering a first person into the second depth image according to depth information of the first person, and the first person is a person in the first depth image.
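Switching scenes amounts to swapping which terminal supplies the background depth image and which supplies the person. Below is a minimal, hypothetical model of that server-side state. The class and method names are illustrative assumptions and do not come from the application.

```python
from dataclasses import dataclass


@dataclass
class MixedCallSession:
    """Hypothetical server-side state for one mixed call between two terminals."""

    caller_id: str
    callee_id: str
    background_owner: str = ""  # terminal whose depth image is the background

    def __post_init__(self) -> None:
        # First mixed image: the callee's person rendered into the caller's scene.
        if not self.background_owner:
            self.background_owner = self.caller_id

    def foreground_owner(self) -> str:
        """Terminal whose person is rendered into the background scene."""
        if self.background_owner == self.caller_id:
            return self.callee_id
        return self.caller_id

    def switch_scene(self) -> None:
        """Handle a scene switching request by swapping scene and person roles."""
        self.background_owner = self.foreground_owner()
```

After `switch_scene()` the server renders the first person into the second depth image, which corresponds to the second mixed image described above.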

In a second aspect, an embodiment of the present application provides a display device, comprising:

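a camera configured to acquire a second depth image;

a display; and

a controller connected to the display and the camera respectively, the controller being configured to:

in response to receiving a mixed call request from a calling terminal, control the display to display second prompt information requesting a mixed call;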

in response to receiving a control signal input by the user on the second prompt information, send a confirmation signal to the server;

extract depth information of a second person from the second depth image according to a person depth image request received from the server, and send the depth information of the second person to the server; and

control the display to display a first mixed image received from the server, wherein the first mixed image comprises a depth image obtained by the server rendering the second person into a first depth image according to the depth information of the second person, and the first depth image is a depth image captured by the calling terminal.
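The application does not state how the called terminal isolates the second person from the second depth image. One possible approach, offered purely as an assumption, is the simplest one: keep pixels whose depth falls inside a band in front of the camera. The function name and the 0.5 m to 3.0 m defaults below are hypothetical. A practical system would more likely combine depth with RGB-based person segmentation.

```python
from typing import List, Optional


def extract_person_depth(
    depth: List[List[float]],
    near_m: float = 0.5,
    far_m: float = 3.0,
) -> List[List[Optional[float]]]:
    """Keep depth values inside [near_m, far_m] and mark all other pixels None.

    Crude stand-in for person segmentation: it assumes the person is the only
    object within this depth band in front of the camera.
    """
    return [
        [d if near_m <= d <= far_m else None for d in row]
        for row in depth
    ]
```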

In some embodiments, the controller is further configured to send the second depth image to the server, so that the server renders a first person into the second depth image according to depth information of the first person to obtain a second mixed image, wherein the first person is a person in the first depth image, and the first depth image is the depth image captured by the calling terminal.

In a third aspect, an embodiment of the present application provides a video call method, comprising:

sending a mixed call request from a calling terminal to a called terminal;

acquiring, upon receiving a confirmation signal from the called terminal, a first depth image captured by the calling terminal;

acquiring depth information of a second person at the called terminal;

rendering the second person into the first depth image according to the depth information of the second person to obtain a first mixed image; and

sending the first mixed image to both the calling terminal and the called terminal.
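The five steps of the method can be sketched as a single server-side routine. Terminals are modeled here as hypothetical in-memory objects, and the rendering step is passed in as a function. The sketch therefore shows only the ordering of the steps. It does not model any real network protocol, and it is not the applicant's implementation.

```python
from typing import Any, Callable, List, Optional


class Terminal:
    """Hypothetical in-memory stand-in for a display device taking part in a call."""

    def __init__(self, name: str, accepts: bool, depth_image: Any, person_depth: Any) -> None:
        self.name = name
        self.accepts = accepts            # whether the user confirms a mixed call
        self.depth_image = depth_image    # depth image captured by this terminal
        self.person_depth = person_depth  # depth information of the person in it
        self.displayed: List[Any] = []    # mixed images received for display

    def on_mixed_call_request(self, caller_name: str) -> bool:
        # A real device would show prompt information and wait for the user.
        return self.accepts

    def display(self, image: Any) -> None:
        self.displayed.append(image)


def run_mixed_call(
    caller: Terminal, callee: Terminal, render: Callable[[Any, Any], Any]
) -> Optional[Any]:
    # 1. Forward the calling terminal's mixed call request to the called terminal.
    if not callee.on_mixed_call_request(caller.name):
        return None  # no confirmation signal, so no mixed image is produced
    # 2. On confirmation, acquire the first depth image from the calling terminal.
    first_depth_image = caller.depth_image
    # 3. Acquire depth information of the second person from the called terminal.
    second_person_depth = callee.person_depth
    # 4. Render the second person into the first depth image.
    first_mixed_image = render(first_depth_image, second_person_depth)
    # 5. Send the first mixed image to both terminals.
    caller.display(first_mixed_image)
    callee.display(first_mixed_image)
    return first_mixed_image
```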

In a fourth aspect, an embodiment of the present application provides a server configured to:

send a mixed call request from a calling terminal to a called terminal;

acquire depth information of a second person at the called terminal;

The display device, the video call method and the server provided by the present application have the following beneficial effects:

In the embodiments of the present application, 3D camera modules capture depth information of both parties to a call. The person on one side is then rendered into the depth image of the other side according to this depth information. Both parties are thus displayed in real time against the same real background. This solves the problem that the two parties appear against different backgrounds on the call interface, and improves the user's video call experience.

Drawings

To explain the technical solutions of the present application more clearly, the drawings used in the embodiments are briefly described below. Those skilled in the art can derive other drawings from these drawings without creative effort.

Fig. 1 is a schematic diagram of an operation scenario between a display device and a control apparatus according to some embodiments;

Fig. 2 is a block diagram of the hardware configuration of a display device 200 according to some embodiments;

Fig. 3 is a block diagram of the hardware configuration of a control device 100 according to some embodiments;

Fig. 4 is a schematic diagram of the software configuration of a display device 200 according to some embodiments;

Fig. 5 is a schematic diagram of an icon control interface of applications in a display device 200 according to some embodiments;

Fig. 6 is a schematic diagram of an AR mixed call according to some embodiments;

Fig. 7 is an interaction diagram of a mixed call according to some embodiments;

Fig. 8 is a schematic diagram of a video call interface according to some embodiments;

Fig. 9 is a schematic diagram of a mixed call interface according to some embodiments;

Fig. 10 is a schematic diagram of a mixed call interface according to further embodiments;

Fig. 11 is a flowchart of a video call method according to some embodiments.

Detailed Description

To make the objectives, embodiments and advantages of the present application clearer, exemplary embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described exemplary embodiments are only some, not all, of the embodiments of the present application.

All other embodiments that a person skilled in the art derives from the exemplary embodiments described herein without inventive effort fall within the scope of the appended claims. In addition, although the disclosure is presented in terms of one or more exemplary examples, each aspect of the disclosure may also constitute a complete embodiment on its own.

It should be noted that the brief descriptions of the terms in the present application are only for the convenience of understanding the embodiments described below, and are not intended to limit the embodiments of the present application. These terms should be understood in their ordinary and customary meaning unless otherwise indicated.

The terms "first," "second," "third," and the like in the description, the claims and the drawings of this application distinguish between similar or analogous objects or entities. Unless otherwise indicated, they do not necessarily imply a particular order or sequence. The terms so used are interchangeable where appropriate, so that the embodiments described herein can, for example, operate in sequences other than those illustrated or described herein.

Furthermore, the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or device that comprises a list of elements is not necessarily limited to those elements explicitly listed, but may include other elements not expressly listed or inherent to such product or device.

The term "module," as used herein, refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the functionality associated with that element.

The term "remote control" as used in this application refers to a component of an electronic device (such as the display device disclosed in this application) that can typically control the device wirelessly over a relatively short distance. It usually connects to the electronic device using infrared and/or radio frequency (RF) signals and/or Bluetooth, and may also include WiFi, wireless USB, Bluetooth, motion sensors and the like. For example, a hand-held touch remote controller replaces most of the physical built-in hard keys of a conventional remote control with a user interface on a touch screen.

The term "gesture" as used in this application refers to a user behavior, such as a change in hand shape or a hand movement, used to convey an intended idea, action, purpose or result.

Fig. 1 is a schematic diagram illustrating an operation scenario between a display device and a control apparatus according to an embodiment. As shown in fig. 1, a user may operate the display device 200 through the mobile terminal 300 and the control apparatus 100.

In some embodiments, the control apparatus 100 may be a remote controller. The remote controller communicates with the display device via infrared protocol communication, Bluetooth protocol communication or other short-range communication methods, and controls the display device 200 wirelessly or by wire. The user may input user commands through keys on the remote controller, voice input, control panel input and the like to control the display device 200. For example, the user can input corresponding control commands through the remote controller's volume up/down keys, channel control keys, up/down/left/right navigation keys, voice input key, menu key, power key and so on, to control the display device 200.

In some embodiments, mobile terminals, tablets, computers, laptops, and other smart devices may also be used to control the display device 200. For example, the display device 200 is controlled using an application program running on the smart device. The application, through configuration, may provide the user with various controls in an intuitive User Interface (UI) on a screen associated with the smart device.

In some embodiments, the mobile terminal 300 and the display device 200 may each install a software application and connect and communicate through a network communication protocol, achieving one-to-one control operation and data communication. For example, the mobile terminal 300 and the display device 200 can establish a control instruction protocol. The remote control keyboard is then synchronized to the mobile terminal 300, and the display device 200 is controlled through the user interface on the mobile terminal 300. Audio and video content displayed on the mobile terminal 300 can also be transmitted to the display device 200 for synchronized display.

As also shown in Fig. 1, the display device 200 also performs data communication with the server 400 through various communication means. The display device 200 may be communicatively connected through a local area network (LAN), a wireless local area network (WLAN) and other networks. The server 400 may provide various content and interactions to the display device 200. For example, by sending and receiving information, the display device 200 receives software program updates, accesses a remotely stored digital media library, and performs electronic program guide (EPG) interactions. The server 400 may be one cluster or multiple clusters and may include one or more types of servers. Other web service content, such as video on demand and advertising services, is also provided through the server 400.

The display device 200 may be a liquid crystal display, an OLED display or a projection display device. The specific type, size and resolution of the display device are not limited, and those skilled in the art will appreciate that the performance and configuration of the display device 200 may be changed as required.

In addition to the broadcast-receiving TV function, the display device 200 may also provide computer-supported smart network TV functions, including but not limited to network TV, smart TV and Internet Protocol TV (IPTV).

A hardware configuration block diagram of a display device 200 according to an exemplary embodiment is exemplarily shown in fig. 2.

In some embodiments, at least one of the controller 250, the tuner demodulator 210, the communicator 220, the detector 230, the input/output interface 255, the display 275, the audio output interface 285, the memory 260, the power supply 290, the user interface 265, and the external device interface 240 is included in the display apparatus 200.

In some embodiments, the display 275 receives image signals output by the first processor and displays video content, images and components of the menu manipulation interface.

In some embodiments, the display 275, includes a display screen assembly for presenting a picture, and a driving assembly that drives the display of an image.

In some embodiments, the displayed video content may come from broadcast television content or from various broadcast signals received via wired or wireless communication protocols. Alternatively, the display may show various image content received via network communication protocols from a network server.

In some embodiments, the display 275 is used to present a user-manipulated UI interface generated in the display apparatus 200 and used to control the display apparatus 200.

In some embodiments, a driver assembly for driving the display is also included, depending on the type of display 275.

In some embodiments, display 275 is a projection display and may also include a projection device and a projection screen.

In some embodiments, communicator 220 is a component for communicating with external devices or external servers according to various communication protocol types. For example: the communicator may include at least one of a Wifi chip, a bluetooth communication protocol chip, a wired ethernet communication protocol chip, and other network communication protocol chips or near field communication protocol chips, and an infrared receiver.

In some embodiments, the display apparatus 200 may establish control signal and data signal transmission and reception with the external control device 100 or the content providing apparatus through the communicator 220.

In some embodiments, the user interface 265 may be configured to receive infrared control signals from a control device 100 (e.g., an infrared remote control, etc.).

In some embodiments, the detector 230 is a component used by the display device 200 to collect signals from the external environment or to interact with the outside.

In some embodiments, the detector 230 includes a light receiver, i.e. a sensor for collecting the intensity of ambient light, so that display parameters can be adapted according to the collected ambient light.

In some embodiments, the detector 230 may further include an image collector, such as a camera. The image collector may be configured to capture external environment scenes, capture user attributes or gestures used for interaction, adaptively change display parameters, and recognize user gestures, so as to implement interaction with the user.

In some embodiments, the detector 230 may also include a temperature sensor or the like for sensing the ambient temperature.

In some embodiments, the display device 200 may adaptively adjust the display color temperature of an image. For example, the display device 200 may be adjusted to display a cooler tone in a high-temperature environment and a warmer tone in a low-temperature environment.
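The sketch below illustrates the adaptive color temperature behavior by mapping ambient temperature to a white point. The thresholds (28 °C and 15 °C) and the presets (9300 K, 6500 K, 5500 K) are assumptions chosen for illustration. Higher kelvin values appear cooler (bluer), and lower values appear warmer.

```python
def choose_color_temperature_k(
    ambient_c: float, hot_c: float = 28.0, cold_c: float = 15.0
) -> int:
    """Map ambient temperature in degrees Celsius to a display white point in kelvin.

    Thresholds and preset values are illustrative assumptions only.
    """
    if ambient_c >= hot_c:
        return 9300  # cool (bluish) tone in a hot environment
    if ambient_c <= cold_c:
        return 5500  # warm (reddish) tone in a cold environment
    return 6500      # neutral, D65-like white point otherwise
```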

In some embodiments, the detector 230 may also include a sound collector such as a microphone. It may receive the user's voice, for example a voice signal containing the user's control instruction for the display device 200. It may also collect ambient sound for recognizing the type of ambient scene, so that the display device 200 can adapt to ambient noise.

In some embodiments, as shown in Fig. 2, the input/output interface 255 is configured to allow data transfer between the controller 250 and other external devices or other controllers 250, such as receiving video signal data, audio signal data or command instruction data from an external device.

In some embodiments, the external device interface 240 may include, but is not limited to, any one or more of a high-definition multimedia interface (HDMI), an analog or data high-definition component input interface, a composite video input interface, a USB input interface, an RGB port and the like. Multiple interfaces may form a composite input/output interface.

In some embodiments, as shown in Fig. 2, the tuner demodulator 210 is configured to receive broadcast television signals in a wired or wireless manner. It performs modulation/demodulation processing such as amplification, mixing and resonance, and demodulates audio and video signals from multiple wireless or wired broadcast television signals. These audio and video signals may include the television audio and video signals carried on the television channel frequency selected by the user, as well as EPG data signals.

In some embodiments, the controller 250 controls the frequency demodulated by the tuner demodulator 210. The controller 250 sends control signals according to the user's selection, so that the tuner demodulator responds to the television signal frequency selected by the user and demodulates the television signal carried on that frequency.

In some embodiments, broadcast television signals may be classified by broadcasting system into terrestrial, cable, satellite or internet broadcast signals and the like. Alternatively, they may be classified by modulation type into digitally modulated signals, analog modulated signals and the like, or by signal type into digital signals, analog signals and the like.

In some embodiments, the controller 250 and the tuner demodulator 210 may be located in separate devices. That is, the tuner demodulator 210 may be located in a device external to the main device where the controller 250 is located, such as an external set-top box. In this case, the set-top box outputs the television audio and video signals demodulated from the received broadcast television signals to the main device, and the main device receives them through the first input/output interface.

In some embodiments, the controller 250 controls the operation of the display device and responds to user operations through various software control programs stored in memory. The controller 250 may control the overall operation of the display apparatus 200. For example: in response to receiving a user command for selecting a UI object to be displayed on the display 275, the controller 250 may perform an operation related to the object selected by the user command.

In some embodiments, the object may be any one of selectable objects, such as a hyperlink or an icon. Operations related to the selected object, such as: displaying an operation connected to a hyperlink page, document, image, or the like, or performing an operation of a program corresponding to the icon. The user command for selecting the UI object may be a command input through various input means (e.g., a mouse, a keyboard, a touch pad, etc.) connected to the display apparatus 200 or a voice command corresponding to a voice spoken by the user.

As shown in Fig. 2, the controller 250 includes at least one of a random access memory (RAM) 251, a read-only memory (ROM) 252, a video processor 270, an audio processor 280, other processors 253 (e.g., a graphics processing unit (GPU)), a central processing unit (CPU) 254, a communication interface, and a communication bus 256 that connects the respective components.

In some embodiments, the RAM 251 is used to store temporary data of the operating system or other running programs.

In some embodiments, ROM252 is used to store instructions for various system boots.

In some embodiments, the ROM 252 is used to store a Basic Input Output System (BIOS). The BIOS completes the power-on self-test of the system, initializes each functional module in the system, provides drivers for basic system input/output, and boots the operating system.

In some embodiments, when the power-on signal is received, the display device 200 starts to power up, the CPU executes the system boot instruction in the ROM252, and copies the temporary data of the operating system stored in the memory to the RAM 251 so as to start or run the operating system. After the start of the operating system is completed, the CPU copies the temporary data of the various application programs in the memory to the RAM 251, and then, the various application programs are started or run.

In some embodiments, the CPU 254 is used to execute the operating system and application program instructions stored in the memory. It also executes various applications, data and content according to interaction instructions received from outside, so as to finally display and play various audio and video content.

In some exemplary embodiments, the CPU 254 may include multiple processors, namely a main processor and one or more sub-processors. The main processor performs some operations of the display device 200 in a pre-power-up mode and/or displays the screen in a normal mode. The one or more sub-processors perform operations in a standby mode or the like.

In some embodiments, the graphics processor 253 is used to generate various graphic objects, such as icons, operation menus and graphics displaying user input instructions. It includes an arithmetic unit and a renderer. The arithmetic unit performs operations on the various interaction instructions input by the user and displays various objects according to their display attributes. The renderer renders the objects produced by the arithmetic unit for display on the display.

In some embodiments, the video processor 270 is configured to receive an external video signal and perform video processing according to the standard codec protocol of the input signal. This processing includes decompression, decoding, scaling, noise reduction, frame rate conversion, resolution conversion and image synthesis, and yields a signal that can be displayed or played directly on the display device 200.

In some embodiments, video processor 270 includes a demultiplexing module, a video decoding module, an image synthesis module, a frame rate conversion module, a display formatting module, and the like.

The demultiplexing module demultiplexes the input audio and video data stream. For example, if the input is an MPEG-2 stream, the demultiplexing module separates it into a video signal and an audio signal.
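For an MPEG-2 transport stream, demultiplexing means splitting a sequence of fixed-size 188-byte packets by their 13-bit packet identifier (PID). The sketch below assumes the video and audio PIDs are already known; a real demultiplexer obtains them by parsing the PAT and PMT tables. It returns the concatenated packet payloads, which are still PES packets rather than raw elementary streams. The function name `demux_ts` is hypothetical.

```python
from typing import Dict

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def demux_ts(stream: bytes, video_pid: int, audio_pid: int) -> Dict[str, bytearray]:
    """Split an MPEG-2 transport stream into video and audio payload bytes by PID."""
    out = {"video": bytearray(), "audio": bytearray()}
    # Walk complete 188-byte packets; a trailing partial packet is ignored.
    for offset in range(0, len(stream) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        pkt = stream[offset:offset + TS_PACKET_SIZE]
        if pkt[0] != SYNC_BYTE:
            raise ValueError(f"lost sync at byte {offset}")
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]          # 13-bit packet identifier
        adaptation_field_control = (pkt[3] >> 4) & 0x03
        if adaptation_field_control in (0, 2):          # reserved, or no payload
            continue
        start = 4                                       # payload follows 4-byte header
        if adaptation_field_control == 3:               # adaptation field, then payload
            start = 5 + pkt[4]                          # skip length byte + field
        payload = pkt[start:]
        if pid == video_pid:
            out["video"] += payload
        elif pid == audio_pid:
            out["audio"] += payload
    return out
```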

The video decoding module processes the demultiplexed video signal, including decoding and scaling.

The image synthesis module superimposes and mixes the GUI signal, input by the user or generated by the graphics generator, with the scaled video image to generate an image signal for display.
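The overlay performed by the image synthesis module can be illustrated with standard per-pixel alpha blending: out = alpha * GUI + (1 - alpha) * video. The sketch assumes 8-bit RGBA GUI pixels and 8-bit RGB video pixels at the same resolution. The function names are hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def blend_pixel(gui: RGBA, video: RGB) -> RGB:
    """Alpha-blend one GUI pixel over one video pixel (alpha 0..255), rounding to nearest."""
    r, g, b, a = gui
    return tuple(
        (gc * a + vc * (255 - a) + 127) // 255
        for gc, vc in zip((r, g, b), video)
    )


def blend_frame(gui_frame: List[List[RGBA]], video_frame: List[List[RGB]]) -> List[List[RGB]]:
    """Overlay a whole GUI layer onto a scaled video frame of the same size."""
    return [
        [blend_pixel(g, v) for g, v in zip(gui_row, video_row)]
        for gui_row, video_row in zip(gui_frame, video_frame)
    ]
```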

The frame rate conversion module converts the frame rate of the input video, for example from 60 Hz to 120 Hz or 240 Hz, usually by frame interpolation.

The display formatting module converts the frame-rate-converted video output signal into a signal that conforms to the display format, for example by outputting RGB data signals.
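The simplest form of frame interpolation for a 60 Hz to 120 Hz conversion inserts one synthesized frame between each pair of input frames. The sketch below averages neighboring frames. For brevity, it assumes frames are flat lists of 8-bit luma values. Commercial TVs typically use motion-estimated, motion-compensated interpolation (MEMC) instead, because plain averaging produces ghosting on moving objects.

```python
from typing import List

Frame = List[int]  # flat list of 8-bit luma samples (assumption for brevity)


def double_frame_rate(frames: List[Frame]) -> List[Frame]:
    """Convert N frames to 2N frames by inserting averaged in-between frames."""
    if not frames:
        return []
    out: List[Frame] = []
    for cur, nxt in zip(frames, frames[1:]):
        out.append(cur)
        # Interpolated frame halfway between the current and the next frame.
        out.append([(a + b) // 2 for a, b in zip(cur, nxt)])
    # The last frame has no successor, so repeat it to keep exactly 2N frames.
    out.append(frames[-1])
    out.append(frames[-1])
    return out
```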
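Outputting an RGB data signal generally requires converting decoded YCbCr video to RGB. The sketch below uses the full-range ITU-R BT.601 coefficients used by JFIF. Broadcast video is usually limited range, and HD content uses BT.709 coefficients, so a real display pipeline would select the matrix from the stream's signaling.

```python
from typing import Tuple


def _clamp8(x: float) -> int:
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, int(round(x))))


def ycbcr_to_rgb(y: int, cb: int, cr: int) -> Tuple[int, int, int]:
    """Full-range BT.601 (JFIF) YCbCr to 8-bit RGB conversion."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return _clamp8(r), _clamp8(g), _clamp8(b)
```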

In some embodiments, the graphics processor 253 and the video processor may be integrated or configured separately. When integrated, they jointly process the graphics signals output to the display. When configured separately, they perform different functions, for example in a GPU + FRC (Frame Rate Conversion) architecture.

In some embodiments, the audio processor 280 is configured to receive an external audio signal, decompress and decode the received audio signal according to a standard codec protocol of the input signal, and perform noise reduction, digital-to-analog conversion, and amplification processes to obtain an audio signal that can be played in a speaker.

In some embodiments, video processor 270 may comprise one or more chips. The audio processor may also comprise one or more chips.

In some embodiments, the video processor 270 and the audio processor 280 may be separate chips or may be integrated together with the controller in one or more chips.

In some embodiments, under the control of the controller 250, the audio output receives the sound signal output by the audio processor 280. Besides the speaker 286 built into the display device 200, the audio output may include an external sound output terminal that outputs to a sound-generating device of an external device, such as an external sound interface or an earphone interface. It may also include a near field communication module in the communication interface, such as a Bluetooth module for outputting sound to a Bluetooth speaker.

Under the control of the controller 250, the power supply 290 supplies the display device 200 with power from an external power source. The power supply 290 may include a built-in power supply circuit installed inside the display device 200, or a power interface installed outside the display device 200 that provides external power to the display device 200.

The user interface 265 receives user input signals and transmits them to the controller 250. The user input signal may be a remote controller signal received through an infrared receiver, and various user control signals may also be received through the network communication module.

In some embodiments, the user inputs a user command through the control apparatus 100 or the mobile terminal 300, the user input interface receives the input, and the display device 200 responds to it through the controller 250.

In some embodiments, a user may enter user commands on a Graphical User Interface (GUI) displayed on the display 275, and the user input interface receives the user input commands through the Graphical User Interface (GUI). Alternatively, the user may input the user command by inputting a specific sound or gesture, and the user input interface receives the user input command by recognizing the sound or gesture through the sensor.

In some embodiments, a "user interface" is a media interface for interaction and information exchange between an application or operating system and a user that enables conversion between an internal form of information and a form that is acceptable to the user. A commonly used presentation form of the User Interface is a Graphical User Interface (GUI), which refers to a User Interface related to computer operations and displayed in a graphical manner. It may be an interface element such as an icon, a window, a control, etc. displayed in the display screen of the electronic device, where the control may include a visual interface element such as an icon, a button, a menu, a tab, a text box, a dialog box, a status bar, a navigation bar, a Widget, etc.

The memory 260 stores various software modules for driving the display device 200. For example, the software modules stored in the first memory include at least one of a basic module, a detection module, a communication module, a display control module, a browser module, and various service modules.

The basic module is a bottom-layer software module used for signal communication between the hardware components of the display device 200 and for sending processing and control signals to upper-layer modules. The detection module is a management module that collects information from the sensors or user input interfaces and performs digital-to-analog conversion and analysis management.

For example, a voice recognition module includes a voice analysis module and a voice instruction database module. The display control module controls the display to show image content and can be used to play multimedia image content, UI interfaces, and other information. The communication module performs control and data communication with external devices. The browser module performs data communication with browsing servers. The service module provides various services and includes various application programs. The memory 260 is also used to store received external data and user data, images of the items in the various user interfaces, visual effect maps of focus objects, and the like.

Fig. 3 exemplarily shows a block diagram of a configuration of the control apparatus 100 according to an exemplary embodiment. As shown in fig. 3, the control device 100 includes a controller 110, a communication interface 130, a user input/output interface, a memory, and a power supply.

The control apparatus 100 is configured to control the display device 200: it receives an operation instruction input by the user and converts it into an instruction that the display device 200 can recognize and respond to, serving as an interaction intermediary between the user and the display device 200. For example, when the user operates the channel up/down key on the control device 100, the display device 200 responds to the channel up/down operation.

In some embodiments, the control device 100 may be a smart device. For example, various applications for controlling the display device 200 may be installed on the control apparatus 100 according to user demands.

In some embodiments, as shown in fig. 1, a mobile terminal 300 or other intelligent electronic device may function similarly to the control apparatus 100 after an application for manipulating the display device 200 is installed. For example, by installing an application, the user can use the function keys or virtual buttons of a graphical user interface provided on the mobile terminal 300 or other intelligent electronic device to implement the functions of the physical keys of the control apparatus 100.

The controller 110 includes a processor 112, a RAM 113, a ROM 114, a communication interface 130, and a communication bus. The controller controls the operation of the control device 100, the communication and cooperation among its internal components, and its external and internal data processing functions.

The communication interface 130 exchanges control signals and data signals with the display apparatus 200 under the control of the controller 110; for example, it transmits the received user input signal to the display apparatus 200. The communication interface 130 may include at least one of a WiFi chip 131, a Bluetooth module 132, an NFC module 133, and other near field communication modules.

In the user input/output interface 140, the input interface includes at least one of a microphone 141, a touch pad 142, a sensor 143, keys 144, and other input interfaces. For example, the user can input an instruction by voice, touch, gesture, pressing, and the like; the input interface converts the received analog signal into a digital signal, converts the digital signal into a corresponding instruction signal, and sends the instruction signal to the display device 200.

The output interface includes an interface that transmits the received user instruction to the display apparatus 200. In some embodiments, the interface may be an infrared interface or a radio frequency (RF) interface. For example, when the infrared signal interface is used, the user input instruction is converted into an infrared control signal according to an infrared control protocol and sent to the display device 200 through the infrared sending module. As another example, when the RF signal interface is used, the user input instruction is converted into a digital signal, modulated according to the RF control signal modulation protocol, and then transmitted to the display device 200 through the RF transmitting terminal.

In some embodiments, the control device 100 includes at least one of the communication interface 130 and the input/output interface 140. When the control device 100 is configured with the communication interface 130, such as a WiFi, Bluetooth, or NFC module, the user input command may be encoded according to the WiFi, Bluetooth, or NFC protocol and transmitted to the display device 200.

The memory 190 stores, under the control of the controller, the various operation programs, data, and applications for driving and controlling the control apparatus 100. The memory 190 may also store various control signal commands input by the user.

The power supply 180 provides operating power to each element of the control device 100 under the control of the controller, and may include a battery and associated control circuitry.

In some embodiments, the system may include a kernel, a command parser (shell), a file system, and application programs. The kernel, shell, and file system together make up the basic operating system structure that allows users to manage files, run programs, and use the system. After power-on, the kernel starts, activates kernel space, abstracts the hardware, initializes hardware parameters, and runs and maintains virtual memory, the scheduler, signals, and interprocess communication (IPC). After the kernel has started, the shell and user application programs are loaded. An application program is compiled into machine code when it is started, forming a process.

Referring to fig. 4, in some embodiments, the system is divided into four layers, which are an Application (Applications) layer (abbreviated as "Application layer"), an Application Framework (Application Framework) layer (abbreviated as "Framework layer"), an Android runtime (Android runtime) and system library layer (abbreviated as "system runtime library layer"), and a kernel layer from top to bottom.

In some embodiments, at least one application program runs in the application layer. The application programs may be window programs built into the operating system, system setting programs, clock programs, camera applications, and the like, or may be applications developed by third-party developers, such as a hi program, a karaoke program, or a magic mirror program. In specific implementations, the application packages in the application layer are not limited to the above examples and may include other application packages, which is not limited in the embodiments of the present application.

The framework layer provides an application programming interface (API) and a programming framework for the applications in the application layer. The application framework layer includes a number of predefined functions and acts as a processing center that decides the actions of the applications in the application layer. Through the API, an application can access system resources and obtain system services during execution.

As shown in fig. 4, in the embodiment of the present application, the application framework layer includes managers, a content provider, and the like, where the managers include at least one of the following modules: an activity manager (ActivityManager), used to interact with all activities running in the system; a location manager (LocationManager), used to give system services or applications access to the system location service; a package manager (PackageManager), used to retrieve information about the application packages currently installed on the device; a notification manager (NotificationManager), used to control the display and clearing of notification messages; and a window manager (WindowManager), used to manage the icons, windows, toolbars, wallpapers, and desktop components on the user interface.

In some embodiments, the activity manager is used to manage the life cycle of each application program and the general back-navigation function, such as controlling the exit of an application program (including switching the user interface currently displayed in the display window to the system desktop), its opening, and going back (including switching the user interface currently displayed in the display window to the previously displayed user interface).

In some embodiments, the window manager is configured to manage all window programs, such as obtaining the display size, determining whether there is a status bar, locking the screen, taking screenshots, and controlling display changes (e.g., shrinking, shaking, or distorting the display).

In some embodiments, the system runtime library layer provides support for the layer above it, i.e., the framework layer; when the framework layer is used, the Android operating system runs the C/C++ libraries included in the system runtime library layer to implement the functions required by the framework layer.

In some embodiments, the kernel layer is a layer between hardware and software. As shown in fig. 4, the kernel layer includes at least one of the following drivers: an audio driver, a display driver, a Bluetooth driver, a camera driver, a WiFi driver, a USB driver, an HDMI driver, sensor drivers (for a fingerprint sensor, temperature sensor, touch sensor, pressure sensor, etc.), and so on.

In some embodiments, the kernel layer further comprises a power driver module for power management.

In some embodiments, software programs and/or modules corresponding to the software architecture of fig. 4 are stored in the first memory or the second memory shown in fig. 2 or 3.

In some embodiments, taking the magic mirror application (a photographing application) as an example, when the remote control receiving device receives a remote control input operation, a corresponding hardware interrupt is sent to the kernel layer. The kernel layer processes the input operation into a raw input event (including information such as the value of the input operation and its timestamp), and the raw input event is stored in the kernel layer. The application framework layer obtains the raw input event from the kernel layer, identifies the control corresponding to the input event according to the current position of the focus, and recognizes the input operation as a confirmation operation whose target control is the magic mirror application icon. The magic mirror application then calls an interface of the application framework layer to start itself, and in turn calls the kernel layer to start the camera driver, so that a still image or video is captured through the camera.
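The event path above can be sketched in Python as a toy model. The class and method names (`RawInputEvent`, `FrameworkLayer`, `dispatch`) and the `"OK"` key value are hypothetical illustrations, not Android framework APIs; the sketch only shows the order of responsibilities: the kernel produces a raw event, the framework pulls it, resolves the focused control, and treats a confirmation as launching that control's application.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RawInputEvent:
    """Kernel-layer record of an input operation (key value and timestamp)."""
    key_value: str
    timestamp_ms: int


@dataclass
class FrameworkLayer:
    """Hypothetical framework layer mapping the focused control to a launcher."""
    focused_control: str
    launchers: Dict[str, Callable[[], str]]
    pending: List[RawInputEvent] = field(default_factory=list)

    def fetch_from_kernel(self, event: RawInputEvent) -> None:
        # The kernel layer produces and stores raw events; the framework
        # layer pulls them before interpreting them.
        self.pending.append(event)

    def dispatch(self) -> List[str]:
        results = []
        while self.pending:
            event = self.pending.pop(0)
            # Only a confirmation key activates the control that has focus.
            if event.key_value == "OK" and self.focused_control in self.launchers:
                results.append(self.launchers[self.focused_control]())
        return results


def start_magic_mirror() -> str:
    # Stand-in for the app calling the framework, which in turn asks the
    # kernel layer to start the camera driver.
    return "camera driver started"


framework = FrameworkLayer("magic_mirror_icon", {"magic_mirror_icon": start_magic_mirror})
framework.fetch_from_kernel(RawInputEvent("OK", 1000))
print(framework.dispatch())  # ['camera driver started']
```

Keeping the focus lookup in the framework layer, rather than in the kernel, mirrors the division of labour described above: the kernel only records what was pressed, not what it means.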

In some embodiments, taking a split-screen operation on a display device with a touch function as an example, the display device receives an input operation (such as a split-screen operation) applied by the user to the display screen, and the kernel layer generates a corresponding input event according to the input operation and reports the event to the application framework layer. The activity manager of the application framework layer sets the window mode (such as multi-window mode) corresponding to the input operation, the position and size of the window, and the like. The window manager of the application framework layer draws the window according to the settings of the activity manager and sends the drawn window data to the display driver of the kernel layer, which displays the corresponding application interfaces in different display areas of the display screen.

In some embodiments, as shown in fig. 5, the application layer, which contains at least one application, may display corresponding icon controls on the display, such as a live television application icon control, a video on demand application icon control, a media center application icon control, an application center icon control, and a game application icon control.

In some embodiments, the live television application may provide live television via different signal sources. For example, a live television application may provide television signals using input from cable television, radio broadcasts, satellite services, or other types of live television services. And, the live television application may display video of the live television signal on the display device 200.

In some embodiments, a video on demand application may provide video from different storage sources. Unlike a live television application, video on demand plays video from a storage source. For example, the video on demand may come from a cloud storage server or from local hard disk storage containing stored video programs.

In some embodiments, the media center application may provide various applications for playing multimedia content. For example, distinct from live television and video on demand, the media center may provide services through which a user can access various images or audio.

In some embodiments, the application center may store various applications. An application may be a game, an application program, or some other application associated with a computer system or other device that can run on the smart television. The application center may obtain these applications from different sources, store them in local storage, and then run them on the display device 200.

The hardware or software architecture in some embodiments may be based on the description in the above embodiments, and in other embodiments may be based on similar hardware or software architectures, as long as the technical solution of the present application can be implemented.

In some embodiments, the image collector of the display device 200 may include a camera, and the user can hold a video call, through a video call application installed on the display device 200, with a user of another display device 200. In the related art, the call interface displayed by the display device 200 includes two windows, and the images collected by the two devices in the video call are displayed in separate windows. However, in this scenario the people in the two windows are usually against different backgrounds, so each party of the video call remains in its own background; this cannot break through the hard boundary of the screen of the display device 200, and the user experience is poor.

To solve this technical problem, the present application provides a mixed call scheme based on AR technology. In this scheme, the camera of the display device 200 is a 3D camera module, which makes the AR mixed call possible. In some embodiments, the 3D camera module may include a 3D camera and other cameras, such as a wide-angle camera, a macro camera, and a main camera; in other embodiments, the 3D camera module may include only a 3D camera.

Referring to fig. 6, an AR hybrid call according to some embodiments is illustrated. As shown in fig. 6, the 3D camera modules of the two display devices 200 each collect depth images and upload them to the server, and the server mixes the people on both sides of the video call into the same background according to the two depth images, so that both display devices 200 display the two parties in the same background, improving the video chat experience.

The mixed call process is described further below. Referring to fig. 7, a diagram of mixed call interaction according to some embodiments is shown. As shown in fig. 7, in some embodiments, after two users start a video call through two display devices 200, the calling end and the called end may conduct a mixed call through the server. The display device 200 that sends the mixed call request is referred to as the calling end, and the display device 200 that receives the mixed call request is referred to as the called end.

In some embodiments, the video call application also has a voice call function and a function for switching a voice call to a video call, so the hybrid call scheme provided by the embodiments of the present application also applies to voice call scenarios, allowing a user to switch from a voice call to a hybrid call.

Referring to fig. 8, which is a schematic diagram of a video call interface according to some embodiments: as shown in fig. 8, the video call interface includes two windows, one displaying the person and background at the calling end and the other displaying the person and background at the called end. For ease of distinction, the user at the calling end is referred to as the first person and the user at the called end as the second person. The backgrounds of the first person and the second person are generally different; in fig. 8, horizontal stripes represent the background of the first person and vertical stripes represent the background of the second person.

In some embodiments, after the video call application is started, the controller of the display apparatus 200 may query whether the AR hybrid call is supported. If the video call application satisfies the enabling conditions of the 3D camera module, the AR hybrid call is determined to be supported. The enabling conditions may include: the display apparatus 200 has a 3D camera module, the video call application has permission to use the 3D camera module, and the 3D camera module operates normally. If the display device 200 detects that the video call application satisfies the enabling conditions, it may control the display to show the hybrid call control on the video call interface, as shown in fig. 8. The hybrid call control may be named "AR call" and may be triggered by voice, by clicking, or in other ways; its trigger signal serves as the control signal indicating the mixed call. The control signal indicating the mixed call may also be another signal, such as a preset gesture or a double-click at any position on the screen.
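A minimal sketch of the enabling-condition check, assuming the three conditions listed above are available as booleans. `CameraModuleStatus` and `supports_ar_hybrid_call` are hypothetical names; how a real device queries camera presence, permissions, or health is not specified in the description.

```python
from dataclasses import dataclass


@dataclass
class CameraModuleStatus:
    """Hypothetical snapshot of the three enabling conditions."""
    has_3d_camera_module: bool  # the display device has a 3D camera module
    app_has_permission: bool    # the video call app may use the module
    module_working: bool        # the module currently operates normally


def supports_ar_hybrid_call(status: CameraModuleStatus) -> bool:
    # All three conditions must hold; if any one fails, the "AR call"
    # control is not shown (or the request is not sent).
    return (
        status.has_3d_camera_module
        and status.app_has_permission
        and status.module_working
    )


print(supports_ar_hybrid_call(CameraModuleStatus(True, True, True)))   # True
print(supports_ar_hybrid_call(CameraModuleStatus(True, False, True)))  # False
```

The same predicate can be evaluated either when the application starts or only after the user triggers the control, which corresponds to the two timing options described in this section.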

In some embodiments, the controller of the display device 200 may display the hybrid call control shown in fig. 8 directly after the video call application is started, and check whether the video call application satisfies the enabling conditions of the 3D camera module only after receiving the control signal indicating the hybrid call.

When the user inputs a control signal indicating a mixed call, for example by clicking the mixed call control on the display device 200, that display device 200 becomes the calling end and its user becomes the first person.

In some embodiments, the calling end displays the mixed call control only after detecting that the AR mixed call is supported, so it can generate a mixed call request directly upon receiving the control signal indicating the mixed call and send the request to the called end through the server, saving the time needed to check the 3D camera module. In other embodiments, the calling end does not check whether the AR mixed call is supported before receiving the control signal indicating the mixed call. Because the enabling conditions can change at any time (for example, the user may revoke permission to use the 3D camera module), the calling end checks whether the AR mixed call is supported after the user inputs the control signal, ensuring that its 3D camera module can be enabled normally. Once it detects that the 3D camera module can be enabled normally, the calling end generates a mixed call request and sends it to the server, which forwards it to the called end to query whether the called end supports and accepts the AR mixed call.

After receiving the mixed call request, the called end can query whether it supports the AR mixed call; the AR mixed call is supported if the video call application satisfies the enabling conditions of the 3D camera module. If the AR mixed call is not supported, the called end may feed back to the server a signal indicating that it does not support the AR mixed call, and the server forwards this signal to the calling end, so that the calling end can display prompt information that the other party does not support the AR mixed call. If the AR mixed call is supported, the called end generates second prompt information and controls the display to show it. The second prompt information may include a prompt box and selection controls. The prompt box may ask whether to accept the mixed call, e.g. "Confirm to make the mixed call?", and there may be two selection controls: triggering one indicates that the user of the called end accepts the mixed call, and triggering the other indicates that the user rejects it.

When the control signal that the user inputs at the called end in response to the second prompt information is a signal rejecting the mixed call, the called end generates a rejection signal and sends it to the server, which may forward the rejection signal to the calling end.

Upon receiving the rejection signal, the calling end generates third prompt information and controls the display to show it. The third prompt information may include a prompt box whose content indicates that the other party has rejected the mixed call, such as "The other party has rejected the mixed call".

When the control signal that the user inputs at the called end in response to the second prompt information is a signal accepting the mixed call, the called end generates a confirmation signal and sends it to the server.
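The called end's branching can be summarized as a small decision function. This is a sketch under assumed names (`Reply`, `handle_hybrid_call_request`, `CALLER_MESSAGES`); the prompt texts come from the description, while the transport of each reply through the server is left out.

```python
from enum import Enum
from typing import Optional


class Reply(Enum):
    NOT_SUPPORTED = "not_supported"  # caller shows "does not support" notice
    REJECT = "reject"                # caller shows the third prompt
    CONFIRM = "confirm"              # caller may start collecting depth images


def handle_hybrid_call_request(supports_ar: bool, user_accepts: Optional[bool]) -> Reply:
    """Called-end reaction to a forwarded hybrid call request.

    `user_accepts` models the choice made on the second prompt
    ("Confirm to make the mixed call?"); it is ignored when AR is not
    supported, because no prompt is shown in that case.
    """
    if not supports_ar:
        return Reply.NOT_SUPPORTED
    return Reply.CONFIRM if user_accepts else Reply.REJECT


# Text the calling end might display for the negative replies (hypothetical).
CALLER_MESSAGES = {
    Reply.NOT_SUPPORTED: "The other party does not support AR mixed call",
    Reply.REJECT: "The other party has rejected the mixed call",
}

print(handle_hybrid_call_request(True, False).value)  # reject
```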

In some embodiments, the server may forward the confirmation signal directly to the calling end; upon receiving it, the calling end controls the 3D camera module to collect the first depth image and sends the first depth image to the server.

In some embodiments, the two users were in a voice call before the calling end sent the mixed call request. If the user at the calling end touched the mixed call control by mistake and the other party accepted the request, starting the 3D camera module directly upon the confirmation signal could expose the privacy of the calling end; alternatively, even if the control was not touched by mistake and the user really wants to establish the mixed call, the user may not yet be ready for the camera to be turned on. In such embodiments, after the called end confirms, the server may send a first prompt signal to the calling end, and the calling end displays first prompt information. The first prompt information may include a prompt box and selection controls. The prompt box may ask whether to make the mixed call, e.g. "Confirm to make the mixed call?", and there may be two selection controls: triggering one indicates that the user at the calling end confirms the mixed call, and triggering the other indicates that the user cancels it. After the user at the calling end triggers the selection control indicating confirmation of the mixed call, the video call application of the calling end controls the 3D camera module to collect the first depth image and sends it to the server.
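The calling end's gating of the 3D camera module can be sketched as follows, assuming the capture and upload steps are passed in as callables. `on_confirmation_signal` and its parameters are hypothetical. With `ask_first_prompt=False` the sketch corresponds to the embodiment in which the camera starts directly on the confirmation signal; with `True`, the camera stays off unless the user confirms the first prompt.

```python
from typing import Callable, Optional


def on_confirmation_signal(
    ask_first_prompt: bool,
    user_confirms: Optional[bool],
    capture_depth_image: Callable[[], list],
    send_to_server: Callable[[list], None],
) -> bool:
    """Calling-end handling of the called end's confirmation (hypothetical API).

    Returns True when the 3D camera module was started and the first depth
    image was sent to the server.
    """
    if ask_first_prompt and not user_confirms:
        # Cancelled (or no answer) on the first prompt: the camera stays off,
        # which covers both an accidental tap and an unprepared caller.
        return False
    first_depth_image = capture_depth_image()  # e.g. a point cloud
    send_to_server(first_depth_image)
    return True


uploaded = []
print(on_confirmation_signal(True, False, lambda: [1.2, 3.4], uploaded.append))  # False
print(on_confirmation_signal(True, True, lambda: [1.2, 3.4], uploaded.append))   # True
```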

The first depth image may include a point cloud containing depth information. In some embodiments, the video call application program of the calling end generates a mixed stream according to the first depth image, the audio collected by the microphone, and the video collected by other cameras of the 3D camera module, and sends the mixed stream to the server, so that the server performs audio and video processing, such as portrait background blurring, portrait beautifying, sound effect setting, and the like.

In some embodiments, after receiving the first depth image sent by the calling end, the server may send a person depth image request to the called end to request the spatial information, i.e., the depth information, of the person at the called end. Upon receiving the person depth image request, the called end controls its 3D camera module to collect a second depth image, which may include a point cloud containing depth information; the called end then extracts the depth information of the second person, i.e., the person's spatial segmentation information, from the second depth image and sends it to the server. Extracting the depth information of the second person from the second depth image includes: performing human body recognition on the second depth image with a human body recognition algorithm to locate the second person in the second depth image; and performing background segmentation according to that position to strip the depth information of the second person from the second depth image, thereby obtaining the depth information of the second person.
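The two extraction steps can be illustrated with a deliberately simple stand-in. Human body recognition is assumed to have already produced a bounding box, and background segmentation is replaced by an assumed rule: keep the pixels inside the box whose depth lies within a tolerance of the median depth there. The function name, the box format, and the 0.3 m tolerance are illustrative assumptions, not the claimed method; the median rule also assumes the person fills most of the box.

```python
from statistics import median
from typing import Dict, List, Tuple

DepthImage = List[List[float]]   # depth in metres, 0.0 = no measurement
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


def extract_person_depth(
    image: DepthImage, box: Box, tolerance: float = 0.3
) -> Dict[Tuple[int, int], float]:
    """Strip the person's depth pixels out of a depth image.

    `box` stands in for the output of a human body recognition algorithm.
    Pixels near the median depth inside the box are kept as the person;
    everything else in the box is treated as background.
    """
    top, left, bottom, right = box
    inside = [
        image[r][c]
        for r in range(top, bottom)
        for c in range(left, right)
        if image[r][c] > 0
    ]
    if not inside:
        return {}
    person_depth = median(inside)
    return {
        (r, c): image[r][c]
        for r in range(top, bottom)
        for c in range(left, right)
        if image[r][c] > 0 and abs(image[r][c] - person_depth) <= tolerance
    }


scene = [
    [3.0, 3.0, 3.0, 3.0],
    [3.0, 1.5, 1.6, 3.0],
    [3.0, 1.5, 0.0, 3.0],
]
print(extract_person_depth(scene, (0, 1, 3, 3)))
# {(1, 1): 1.5, (1, 2): 1.6, (2, 1): 1.5}
```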

In some embodiments, the server may send a depth image request to the called terminal after receiving the first depth image sent by the calling terminal, so as to request the called terminal to provide the depth information of the called terminal. The called terminal can control the 3D camera module to collect a second depth image according to the received depth image request, the second depth image is sent to the server, and the server extracts the depth information of the second person from the second depth image.

The server can render the second person into the first depth image according to the depth information of the second person and the depth information of the first depth image to obtain a first mixed image, and the first mixed image is respectively sent to the called end and the calling end. And after receiving the first mixed image, the calling end and the called end respectively control the respective displays to display the first mixed image. Referring to fig. 9, a diagram of a hybrid telephony interface is shown, in accordance with some embodiments. As shown in fig. 9, in the first mixed image, the first person and the second person are both in the same background, and the background is the real background of the first person.
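One common way to render a segmented person into another depth image is a per-pixel depth test (a z-buffer), so that scene objects nearer to the camera still occlude the inserted person. The description does not say which compositing technique the server uses; the sketch below assumes a z-buffer over plain Python lists, and `render_person` and its `offset` parameter are hypothetical.

```python
from typing import Dict, List, Set, Tuple

DepthImage = List[List[float]]              # 0.0 = no measurement
PersonDepth = Dict[Tuple[int, int], float]  # (row, col) -> depth


def render_person(
    scene: DepthImage, person: PersonDepth, offset: Tuple[int, int] = (0, 0)
) -> Tuple[DepthImage, Set[Tuple[int, int]]]:
    """Depth-test the person's pixels against the scene (z-buffer style).

    Returns the mixed depth image and the set of pixels where the person is
    visible; colour pixels would be copied using the same mask.
    """
    mixed = [row[:] for row in scene]  # leave the input image untouched
    width = len(mixed[0]) if mixed else 0
    visible = set()
    dr, dc = offset
    for (r, c), depth in person.items():
        rr, cc = r + dr, c + dc
        if not (0 <= rr < len(mixed) and 0 <= cc < width):
            continue  # clipped: falls outside the frame
        current = mixed[rr][cc]
        if current <= 0 or depth < current:  # empty pixel or person is nearer
            mixed[rr][cc] = depth
            visible.add((rr, cc))
    return mixed, visible


scene = [[3.0, 3.0], [1.0, 3.0]]  # a near object sits at (1, 0)
person = {(0, 0): 1.5, (1, 0): 1.5, (1, 1): 1.5}
mixed, visible = render_person(scene, person)
print(mixed)  # [[1.5, 3.0], [1.0, 1.5]]
```

Here the pixel at (1, 0) stays at 1.0 because the existing object is closer than the person, which is what lets the inserted person appear to stand inside the first person's room rather than pasted on top of it.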

In some embodiments, the server may perform audio and video processing on the first mixed image to obtain an AR mixed stream, and send the AR mixed stream to the calling end and the called end respectively, so that the calling end and the called end display the processed first mixed image and audio.

In some embodiments, the hybrid call interface may provide a background switching control, named for example "Switch background" as shown in fig. 9. When the user at the calling end or the called end triggers the background switching control, the server may switch the first mixed image to the second mixed image shown in fig. 10, whose background is the real background of the second person. Taking the case in which the user at the calling end triggers the background switching control as an example, the background switching process is as follows:

When the user at the calling end inputs a control signal indicating background switching, for example by clicking the background switching control, the calling end responds to the received control signal by sending a background switching request to the server.

Switching to the background of the second person requires the background depth information of the second person. In some embodiments the called end sends the entire second depth image to the server, while in others it sends only the depth information of the second person. The server therefore first determines whether the background depth information of the second person is available. If it is, the server extracts the depth information of the first person from the first depth image (using the same method as for extracting the depth information of the second person from the second depth image) and renders the first person into the second depth image to obtain the second mixed image. If it is not, the server sends a depth image request to the called end asking it to provide the second depth image, then extracts the depth information of the first person from the first depth image and renders the first person into the second depth image to obtain the second mixed image.
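The server's branch on whether the second person's background depth is available might look like this. `SwitchContext`, `switch_to_second_background`, and the injected callables are hypothetical placeholders for the transport, extraction, and rendering steps; the sketch only fixes the order of operations.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

DepthImage = List[List[float]]
PersonDepth = Dict[Tuple[int, int], float]


@dataclass
class SwitchContext:
    """Per-call state kept by the server (hypothetical structure)."""
    first_depth_image: DepthImage
    # None when the called end uploaded only the second person's depth,
    # i.e. the second person's background depth is not available.
    second_depth_image: Optional[DepthImage] = None
    requests_sent: List[str] = field(default_factory=list)


def switch_to_second_background(
    ctx: SwitchContext,
    request_second_depth_image: Callable[[], DepthImage],
    extract_person: Callable[[DepthImage], PersonDepth],
    render: Callable[[DepthImage, PersonDepth], DepthImage],
) -> DepthImage:
    """Build the second mixed image: the first person on the second background."""
    if ctx.second_depth_image is None:
        # Background depth is missing, so ask the called end for the full image.
        ctx.requests_sent.append("depth_image_request")
        ctx.second_depth_image = request_second_depth_image()
    first_person = extract_person(ctx.first_depth_image)
    return render(ctx.second_depth_image, first_person)
```

Caching the full second depth image in the context means a later switch back and forth does not trigger another request to the called end.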

After generating the second mixed image, the server sends it to both the called end and the calling end. After receiving the second mixed image, the calling end and the called end each control their displays to switch from the first mixed image to the second mixed image.

As shown in fig. 10, the interface for the second blended image may retain a switch background control for the user to select to switch the second blended image to the first blended image.

To further explain the above hybrid call scheme, an embodiment of the present application further provides a video call method, and referring to fig. 11, the video call method may include the following steps:

Step S110: Send the mixed call request of the calling terminal to the called terminal.

After receiving the mixed call request of the calling terminal, the server can send the mixed call request of the calling terminal to the called terminal.

Step S120: Acquire the first depth image collected by the calling terminal according to the received confirmation signal of the called terminal.

In some embodiments, after receiving the called end's confirmation signal for the mixed call request, the server may forward the confirmation signal to the calling end. On receiving it, the calling end controls the 3D camera module to capture the first depth image and sends the first depth image to the server.

In other embodiments, after receiving the called end's confirmation signal for the mixed call request, the server may instead send a first prompt signal to the calling end, which then displays the first prompt information. After receiving the user's confirmation signal in response to the first prompt information, the calling end controls the 3D camera module to capture the first depth image and sends it to the server.

Step S130: Acquire the depth information of the second person at the called terminal.

After receiving the first depth image, the server may send a person depth image request to the called end to obtain the depth information of the second person in the second depth image.

Step S140: Render the second person into the first depth image according to the depth information of the second person to obtain the first mixed image.

Based on the depth information of the second person and the background depth information in the first depth image, the server may render the second person at a suitable position in the first depth image, for example on the same horizontal level as the first person, scale the second person to the same size as the first person, and finally synthesize the first mixed image.
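One way to realize this rendering step is a depth-tested composite. The NumPy sketch below is illustrative only and rests on several assumptions not stated in the source: the first depth image is given as an RGB array plus a per-pixel depth map with a boolean mask of the first person; the second person arrives as a cropped RGB/depth/mask triple; "same horizontal position" is taken to mean aligning the lowest pixels (the feet) of both persons; and "same size" means equal pixel height. Nearest-neighbour scaling, the median-depth shift and the hypothetical `gap` parameter are illustrative choices.

```python
import numpy as np


def resize_nearest(arr: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize over the first two axes (extra axes are kept)."""
    in_h, in_w = arr.shape[:2]
    rows = (np.arange(out_h) * in_h / out_h).astype(int)
    cols = (np.arange(out_w) * in_w / out_w).astype(int)
    return arr[rows[:, None], cols]


def bbox(mask: np.ndarray):
    """Inclusive (top, bottom, left, right) of the True pixels; mask must be non-empty."""
    ys, xs = np.nonzero(mask)
    return int(ys.min()), int(ys.max()), int(xs.min()), int(xs.max())


def composite_person(scene_rgb, scene_depth, first_mask,
                     person_rgb, person_depth, person_mask, gap=10):
    """Hypothetical sketch: render the second person into the first depth image."""
    out_rgb, out_depth = scene_rgb.copy(), scene_depth.copy()
    t1, b1, _, r1 = bbox(first_mask)
    t2, b2, l2, r2 = bbox(person_mask)
    # Crop the second person to its own bounding box.
    crop = (slice(t2, b2 + 1), slice(l2, r2 + 1))
    rgb, depth, mask = person_rgb[crop], person_depth[crop], person_mask[crop]
    # Scale so the second person is as tall (in pixels) as the first person.
    out_h = b1 - t1 + 1
    out_w = max(1, int(round(rgb.shape[1] * out_h / rgb.shape[0])))
    rgb, depth, mask = [resize_nearest(a, out_h, out_w) for a in (rgb, depth, mask)]
    # Shift the second person to the first person's distance from the camera.
    depth = depth - np.median(depth[mask]) + np.median(scene_depth[first_mask])
    # Align the lowest pixels of both persons; place the second one `gap` px to the right.
    y0, x0 = b1 - out_h + 1, r1 + 1 + gap
    h, w = scene_depth.shape
    ys0, xs0 = max(y0, 0), max(x0, 0)
    ys1, xs1 = min(y0 + out_h, h), min(x0 + out_w, w)
    if ys0 >= ys1 or xs0 >= xs1:
        return out_rgb, out_depth  # the second person falls outside the frame
    src = (slice(ys0 - y0, ys1 - y0), slice(xs0 - x0, xs1 - x0))
    dst = (slice(ys0, ys1), slice(xs0, xs1))
    # Depth test: draw only person pixels that are nearer than the scene behind them.
    draw = mask[src] & (depth[src] < out_depth[dst])
    out_rgb[dst][draw] = rgb[src][draw]
    out_depth[dst][draw] = depth[src][draw]
    return out_rgb, out_depth
```

The depth test is what allows objects in the calling end's room that are nearer to the camera to occlude the inserted person, which a plain 2D overlay would not do.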

Step S150: Send the first mixed image to both the calling terminal and the called terminal.

The server sends the first mixed image to both the calling terminal and the called terminal, each of which displays it on its own display.
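Steps S110 to S150 can be sketched from the server's point of view as a single function. This is a hypothetical sketch: the message names, the `send`/`receive`/`render` callables and the early return when the called end declines are assumptions, since the source does not specify a transport or message format.

```python
from typing import Any, Callable, Optional


def run_mixed_call(
    send: Callable[[str, str, Any], None],  # hypothetical: send(endpoint, message_type, payload)
    receive: Callable[[str, str], Any],     # hypothetical: receive(endpoint, message_type) -> payload
    render: Callable[[Any, Any], Any],      # hypothetical: render(person_depth, first_depth_image)
    request: Any,
) -> Optional[Any]:
    # S110: forward the calling end's mixed call request to the called end.
    send("called", "mixed_call_request", request)
    # S120: wait for the called end's confirmation, then obtain the first depth image.
    confirmation = receive("called", "confirm")
    if not confirmation:
        return None  # assumption: the called end declined, so no mixed image is produced
    send("calling", "confirm", confirmation)
    first_depth_image = receive("calling", "depth_image")
    # S130: request the second person's depth information from the called end.
    send("called", "person_depth_request", None)
    person_depth = receive("called", "person_depth")
    # S140: render the second person into the first depth image.
    mixed = render(person_depth, first_depth_image)
    # S150: send the first mixed image to both ends for display.
    send("calling", "mixed_image", mixed)
    send("called", "mixed_image", mixed)
    return mixed
```

In a live call, S130 to S150 would repeat for every frame; the single pass here only shows the order of the messages.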

Further, the server can receive a background switching request from either the calling terminal or the called terminal and, in response, switch the first mixed image to the second mixed image, or switch the second mixed image back to the first mixed image.
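Because a video call produces a new pair of depth images for every frame, one way to model this switch is as a per-call flag that selects which party's scene serves as the background for subsequent frames. The class, method and callable names below are hypothetical; this is a sketch of the toggling behaviour only.

```python
from typing import Any, Callable


class MixedCallCompositor:
    """Hypothetical per-call state: which party's scene is the shared background."""

    def __init__(self,
                 extract_person: Callable[[Any], Any],
                 render: Callable[[Any, Any], Any]):
        self.extract_person = extract_person  # depth image -> person depth information
        self.render = render                  # (person, background image) -> mixed image
        self.use_calling_background = True    # the first mixed image is the default

    def on_background_switch_request(self) -> None:
        # Either end may send the request; each request flips the background.
        self.use_calling_background = not self.use_calling_background

    def mix(self, first_depth_image: Any, second_depth_image: Any) -> Any:
        # Called once per frame pair; the result is sent to both ends.
        if self.use_calling_background:
            # First mixed image: second person in the calling end's scene.
            return self.render(self.extract_person(second_depth_image), first_depth_image)
        # Second mixed image: first person in the called end's scene.
        return self.render(self.extract_person(first_depth_image), second_depth_image)
```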

An embodiment of the present application further provides a server configured to execute the video call method described above.

In the embodiments above, the 3D camera module collects depth information of both parties to the call, and the person of one party is rendered into the depth image of the other party according to that depth information. Both parties are thus displayed in real time against the same real background, which solves the problem of the two parties appearing in different backgrounds on the call interface and improves the user's video call experience.

Because the embodiments above are described with reference to and in combination with one another, they share common portions. For the same or similar portions, the embodiments in this specification may be read with reference to one another, and these portions are not described again in detail here.

It should be noted that, in this specification, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," and any other variants thereof are intended to cover a non-exclusive inclusion, so that a circuit structure, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such circuit structure, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of additional identical elements in the circuit structure, article, or apparatus that comprises the element.

Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims. The above embodiments of the present application do not limit the scope of the present application.

Claims

1. A display device, comprising:

the camera is used for acquiring a first depth image;

2. The display device of claim 1, wherein the controller is further configured to:

3. The display device of claim 1, wherein responding to receiving the control signal input by the user for indicating a mixed call further comprises:

establishing a video call connection with the called terminal;

controlling the display to display a mixed call control on the user interface of the video call, wherein the mixed call control generates the control signal indicating a mixed call when triggered.

4. The display device of claim 1, wherein sending the first depth image to the server further comprises:

controlling the display to display first prompt information for confirming the mixed call;

and receiving a control signal input by the user in response to the first prompt information.

5. A display device, comprising:

the camera is used for acquiring a second depth image;

6. The display device of claim 5, wherein the controller is further configured to: send the second depth image to the server, so that the server renders the first person into the second depth image according to the depth information of the first person to obtain a second mixed image, wherein the first person is a person in the first depth image, and the first depth image is the depth image captured by the calling terminal.

7. The display device according to claim 5, wherein extracting the depth information of the second person from the second depth image comprises:

carrying out human body recognition on the second depth image to obtain the position of the second person in the second depth image;

and performing background segmentation according to the position of the second person in the second depth image to obtain the depth information of the second person.

8. A video call method, comprising:

sending the mixed call request of the calling terminal to the called terminal;

acquiring depth information of a second person of the called terminal;

9. The video call method of claim 8, further comprising:

extracting depth information of a first person from the first depth image according to the received background switching request;

rendering the first person into the second depth image according to the depth information of the first person to obtain a second mixed image;

and respectively sending the second mixed image to the calling terminal and the called terminal.

10. A server, wherein the server is configured to:

sending the mixed call request of the calling terminal to the called terminal;

acquiring depth information of a second person of the called terminal;