WO2020248829A1

WO2020248829A1 - Audio and video processing method and display device

Info

Publication number: WO2020248829A1
Application number: PCT/CN2020/093101
Authority: WO
Inventors: 杨香斌; 王峰
Original assignee: 海信视像科技股份有限公司
Priority date: 2019-06-10
Filing date: 2020-05-29
Publication date: 2020-12-17
Also published as: CN112073663A; CN112073663B

Abstract

The present application provides an audio adjustment method, a video call method, and a display device, which are suitable for social TVs. The method comprises: obtaining focal length information corresponding to a current image during a video call; obtaining a microphone gain according to the focal length information; and adjusting the audio received by a microphone according to the obtained microphone gain. According to the present application, the focal length information obtained when processing video images during a video call is used to determine the microphone gain, so that gain processing is performed on the audio data on the basis of the distance between the person and the microphone during the video call, thereby keeping the sound stable during the video call.

Description

Audio and video processing method and display device

This application claims priority to Chinese patent application No. 201910497121.4, filed with the Chinese Patent Office on June 10, 2019 and entitled "Microphone gain adjustment method, video chat method and display device", and to Chinese patent application No. 201910736428.5, filed with the Chinese Patent Office on August 9, 2019 and entitled "Audio gain adjustment method, video chat method and display device", the entire contents of which are incorporated in this application by reference.

Technical field

This application relates to the technical field of display devices, and in particular to an audio and video processing method and a display device.

Background

With the development of smart TVs, cameras for voice and video calls have gradually been added to them to realize the function of "watching and chatting" on the TV. Smart TVs are usually fixed in relatively large spaces such as living rooms, and people often keep a certain distance from them during use; moreover, people often move around while using a smart TV for a voice and video call. If the person on the call keeps moving, the sound becomes unstable. For example, the volume fluctuates during the call, and in some cases this even prevents the call from proceeding.

However, voice and video calls are currently used mainly on handheld mobile devices such as mobile phones. To ensure voice quality during calls, such devices mostly apply noise reduction or gain processing to the sound. Because most calls on mobile phones and other handheld mobile devices are near-field calls, this noise reduction or gain processing is mostly designed for sound at a roughly fixed distance. A voice and video call on a smart TV, by contrast, is a far-field call: the distance between the speaker and the microphone on the TV is usually relatively large and may vary as the speaker moves. Therefore, the sound processing technology used by mobile phones and other handheld mobile devices cannot meet the demand for sound stability in voice and video calls on smart TVs.

Summary of the invention

This application provides an audio adjustment method, a video call method, and a display device to ensure the stability of the sound in the video call.

In a first aspect, the present application provides an audio and video processing method. The method includes: receiving a current image generated from a local image collected by a camera, and receiving audio generated from local sound collected by a microphone; obtaining focal length information corresponding to the current image; obtaining a microphone gain according to the focal length information and a preset correspondence, where different microphone gains in the preset correspondence correspond to different focal length information; adjusting the audio according to the obtained microphone gain; and sending the adjusted audio to the peer device of the video call.
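The steps of the first aspect can be illustrated with a minimal sketch. The table values, the millimetre and decibel units, the assumption that a longer focal length calls for a larger gain, and the names `PRESET_GAIN_TABLE`, `lookup_gain_db`, `apply_gain` and `adjust_audio` are all hypothetical; the application only states that a preset correspondence maps different focal length information to different microphone gains.

```python
# Hypothetical preset correspondence: (upper bound of focal length in mm, gain in dB).
# The numbers are illustrative only; the application gives no concrete values here.
PRESET_GAIN_TABLE = [
    (4.0, 0.0),
    (8.0, 6.0),
    (16.0, 12.0),
]
MAX_GAIN_DB = 18.0  # assumed gain for focal lengths beyond the last table entry


def lookup_gain_db(focal_length_mm):
    """Return the gain whose focal length range contains the given value."""
    for upper_bound_mm, gain_db in PRESET_GAIN_TABLE:
        if focal_length_mm <= upper_bound_mm:
            return gain_db
    return MAX_GAIN_DB


def apply_gain(samples, gain_db):
    """Scale 16-bit PCM samples by a gain in dB, clipping to the int16 range."""
    factor = 10.0 ** (gain_db / 20.0)
    adjusted = []
    for sample in samples:
        value = int(round(sample * factor))
        adjusted.append(max(-32768, min(32767, value)))
    return adjusted


def adjust_audio(samples, focal_length_mm):
    """Look up the gain for the current focal length and apply it to the audio."""
    return apply_gain(samples, lookup_gain_db(focal_length_mm))
```

In this sketch the adjusted samples would then be handed to whatever component sends audio to the peer device; that transport step is outside the scope of the example.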

In a second aspect, this application provides an audio and video processing method. The method includes: receiving a current image generated from a local image collected by a camera, and receiving audio generated from local sound collected by a microphone; if in a video call state, adjusting the audio according to the focal length information of the current image, and sending the adjusted audio and the current image to the peer device of the video call; and if in a recording state, generating a video file from the current image and the audio without adjusting the audio according to the focal length information of the current image.
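The state-dependent handling in the second aspect can be sketched as a simple branch. This is an illustration under assumptions: the `Mode` enum, the `process_frame` function, the returned dictionary, and the caller-supplied `gain_for_focal_length` function (returning a linear gain factor) are hypothetical names that do not appear in the application.

```python
from enum import Enum


class Mode(Enum):
    VIDEO_CALL = "video_call"
    RECORDING = "recording"


def process_frame(mode, image, audio, focal_length_mm, gain_for_focal_length):
    """Decide what to do with one image and its audio depending on the state.

    gain_for_focal_length is a hypothetical callable that maps a focal
    length to a linear gain factor.
    """
    if mode is Mode.VIDEO_CALL:
        # Video call: adjust the audio by focal length, then send to the peer.
        factor = gain_for_focal_length(focal_length_mm)
        adjusted = [sample * factor for sample in audio]
        return {"action": "send_to_peer", "image": image, "audio": adjusted}
    # Recording: keep the audio unchanged and write it to a file with the image.
    return {"action": "write_file", "image": image, "audio": list(audio)}
```

A caller would dispatch on the `action` field, for example passing the result either to a network sender or to a file muxer.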

In a third aspect, the present application provides an audio and video processing method. The method includes: an auxiliary chip transmits to a main chip the video image obtained by applying automatic zoom processing to the image collected by a camera, and transmits the focal length information corresponding to the video image to the main chip; the main chip receives the video image and the focal length information; the main chip obtains a microphone gain according to the focal length information, and performs gain processing on the audio corresponding to the video image according to the microphone gain, in order to reduce fluctuation in the volume of the audio sent from the local end to the peer end; and the main chip synchronizes the gain-processed audio with the video image, and transmits the synchronized audio and video to the display device at the peer end.
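The main chip's part of the third aspect can be sketched as follows. This is a sketch under stated assumptions, not the method of the application: the `VideoFrame` and `AudioChunk` types, timestamp-based pairing, the `max_skew_ms` tolerance, and the `gain_for_focal_length` callable (returning a linear gain factor) are all hypothetical; the application does not specify how synchronization is performed.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class VideoFrame:
    timestamp_ms: int
    pixels: bytes
    focal_length_mm: float  # assumed to arrive from the auxiliary chip with the frame


@dataclass
class AudioChunk:
    timestamp_ms: int
    samples: List[float]


def main_chip_process(
    frames: List[VideoFrame],
    chunks: List[AudioChunk],
    gain_for_focal_length: Callable[[float], float],
    max_skew_ms: int = 20,
) -> List[Tuple[VideoFrame, AudioChunk]]:
    """Apply focal-length-based gain to audio and pair it with video frames."""
    synced = []
    for frame in frames:
        if not chunks:
            break
        # Pick the audio chunk closest in time to this frame.
        nearest = min(chunks, key=lambda c: abs(c.timestamp_ms - frame.timestamp_ms))
        if abs(nearest.timestamp_ms - frame.timestamp_ms) > max_skew_ms:
            continue  # no audio close enough to this frame
        factor = gain_for_focal_length(frame.focal_length_mm)
        adjusted = AudioChunk(
            nearest.timestamp_ms, [s * factor for s in nearest.samples]
        )
        synced.append((frame, adjusted))
    return synced
```

The returned pairs stand in for the synchronized audio and video that would be encoded and transmitted to the peer end.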

In a fourth aspect, the present application provides a display device. The display device includes: a camera; a microphone; and a controller configured to: receive a current image generated from a local image collected by the camera, and receive audio generated from local sound collected by the microphone; obtain focal length information corresponding to the current image; obtain a microphone gain according to the focal length information and a preset correspondence, where different microphone gains in the preset correspondence correspond to different focal length information; adjust the audio according to the obtained microphone gain; and send the adjusted audio to the peer device of the video call.

In a fifth aspect, the present application provides a display device. The display device includes: a camera; a microphone; and a controller configured to: receive a current image generated from a local image collected by the camera, and receive audio generated from local sound collected by the microphone; if in a video call state, adjust the audio according to the focal length information of the current image, and send the adjusted audio and the current image to the peer device of the video call; and if in a non-video call state, generate an audio and video file from the current image and the audio without adjusting the audio according to the focal length information of the current image.

In a sixth aspect, the present application provides a display device. The display device includes: a camera; a microphone; and a main chip and an auxiliary chip connected to each other. The auxiliary chip receives the local image collected by the camera, transmits to the main chip the current image generated by applying automatic zoom processing to the local image, and transmits the focal length information corresponding to the current image to the main chip. The main chip receives the current image and the focal length information; obtains a microphone gain according to the focal length information; performs gain processing on the audio corresponding to the current image according to the microphone gain, to reduce fluctuation in the volume of the audio sent from the local end to the peer end; synchronizes the gain-processed audio with the video image; and transmits the synchronized audio and video to the display device at the peer end.

Description of the drawings

In order to explain the implementation of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 exemplarily shows a schematic diagram of an operation scenario between a display device and a control device according to an embodiment;

FIG. 2 exemplarily shows a block diagram of the hardware configuration of the control device 100 according to the embodiment;

FIG. 3 exemplarily shows a block diagram of the hardware configuration of the display device 200 according to the embodiment;

FIG. 4 exemplarily shows a block diagram of the hardware architecture of the display device 200 according to FIG. 3;

FIG. 5 exemplarily shows a schematic diagram of the functional configuration of the display device 200 according to the embodiment;

FIG. 6a exemplarily shows a schematic diagram of software configuration in the display device 200 according to the embodiment;

FIG. 6b exemplarily shows a configuration diagram of an application program in the display device 200 according to the embodiment;

FIG. 7 exemplarily shows a schematic diagram of the user interface in the display device 200 according to the embodiment;

FIG. 8 exemplarily shows a schematic flowchart of an audio adjustment method according to an embodiment;

FIG. 9 exemplarily shows the calculation principle diagram of the focal length information according to the embodiment;

FIG. 10 exemplarily shows a schematic flowchart of a video call method according to an embodiment.

Detailed description

In order to make the purpose, implementation and advantages of the present application clearer, the exemplary embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described exemplary embodiments are only some of the embodiments of this application, not all of them.

For the convenience of users, various external device interfaces are usually provided on the display device so that different peripheral devices or cables can be connected to realize corresponding functions. When a high-definition camera is connected to an interface of the display device, if the hardware system of the display device has no hardware interface capable of receiving the source stream of the high-pixel camera, the data captured by the camera cannot be presented on the display screen of the display device.

In addition, because of its hardware structure, the hardware system of a traditional display device supports only one hard decoding resource, and usually supports video decoding only up to 4K resolution. Therefore, to video chat while watching Internet TV without reducing the definition of the network video picture, the hard decoding resource (usually the GPU in the hardware system) must be used to decode the network video. In this case, the video chat picture can only be processed by soft decoding on the general-purpose processor (such as the CPU) in the hardware system.

Using soft decoding to process the video chat picture greatly increases the data processing burden on the CPU. When that burden is too heavy, the picture may freeze or stutter. Further, limited by the data processing capability of the CPU, CPU soft decoding of the video chat picture usually cannot support multi-channel video calls. When the user wants to video chat with multiple other users simultaneously in the same chat scene, access may be blocked.

Some embodiments of the present application disclose a dual hardware system architecture to implement multiple channels of video chat data (at least one local video).

The following first describes the concepts involved in the present application with reference to the drawings. It should be pointed out here that the following description of each concept is only to make the content of this application easier to understand, and does not mean to limit the protection scope of this application.

The term "module" used in the various embodiments of the present application can refer to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that can perform the functions associated with the component.

The term "remote control" used in the various embodiments of this application refers to a component of an electronic device (such as the display device disclosed in this application) that can generally control the electronic device wirelessly over a short distance. The component is generally connected to the electronic device using infrared and/or radio frequency (RF) signals and/or Bluetooth, and may also include at least one functional module such as WiFi, wireless USB, Bluetooth, or a motion sensor. For example, a handheld touch remote control replaces most of the physical built-in hard keys of a general remote control device with a user interface on a touch screen.

The term "gesture" used in the embodiments of the present application refers to a user's behavior through a change of hand shape or hand movement to express expected ideas, actions, goals, and/or results.

The term "hardware system" used in the various embodiments of this application may include at least one of the physical components that have computing, control, storage, input and output functions, such as integrated circuits (IC), printed circuit boards (PCB) and other mechanical, optical, electrical, and magnetic devices.

FIG. 1 exemplarily shows a schematic diagram of an operation scenario between a display device and a control device according to an embodiment. As shown in FIG. 1, the user can operate the display device 200 through the control device 100.

The control device 100 may be a remote controller 100A, which can communicate with the display device 200 through at least one of infrared protocol communication, Bluetooth protocol communication, ZigBee protocol communication, or other short-distance communication methods, and is used to control the display device 200 wirelessly or by other wired means. The user can control the display device 200 by inputting user instructions through keys on the remote controller, voice input, control panel input, and so on. For example, the user can input corresponding control commands through the volume up/down keys, channel control keys, up/down/left/right movement keys, voice input key, menu key, and power key on the remote controller to control the functions of the display device 200.

The control device 100 can also be a smart device, such as a mobile terminal 100B, a tablet computer, a computer, or a notebook computer, which can communicate with the display device 200 through at least one of a local area network (LAN), a wide area network (WAN), a wireless local area network (WLAN), or other networks, and can control the display device 200 through an application program corresponding to the display device 200, for example an application program running on the smart device. The application can provide various controls to the user through an intuitive user interface (UI) on the screen associated with the smart device.

For example, software applications can be installed on both the mobile terminal 100B and the display device 200, so that the two can connect and communicate through a network communication protocol, thereby realizing one-to-one control operation and data communication. For example, the mobile terminal 100B can establish a control command protocol with the display device 200, synchronize the remote control keyboard to the mobile terminal 100B, and control the display device 200 through the user interface on the mobile terminal 100B; alternatively, the audio and video content displayed on the mobile terminal 100B can be transmitted to the display device 200 to realize a synchronous display function.

As shown in FIG. 1, the display device 200 can also communicate with the server 300 through multiple communication methods. In various embodiments of the present application, the display device 200 may be allowed to communicate with the server 300 through at least one of a local area network, a wireless local area network, or other networks. The server 300 may provide various contents and interactions to the display device 200.

Illustratively, the display device 200 transmits and receives information, interacts with an Electronic Program Guide (EPG), receives software program updates, or accesses a remotely stored digital media library. The server 300 may be one or more groups of servers, and may be of one or more types. The server 300 provides other network service content such as video on demand and advertising services.

The display device 200 may be a liquid crystal display, an OLED (Organic Light Emitting Diode) display, a projection display device, or a smart TV. The specific display device type, size, resolution, etc. are not limited, and those skilled in the art can understand that the display device 200 can make some changes in performance and configuration as required.

In addition to providing the broadcast receiving TV function, the display device 200 may additionally provide a smart network TV function that provides a computer support function. Examples include Internet TV, Smart TV, Internet Protocol TV (IPTV) and so on.

As shown in Figure 1, the display device may be connected or provided with a camera, which is used to present the picture captured by the camera on the display interface of the display device or other display devices to realize interactive chats between users. In some embodiments, the picture captured by the camera may be displayed on the display device in full screen, half screen, or in any selectable area.

As an optional connection method, the camera is connected to the rear shell of the display through a connecting plate and is fixedly installed at the upper middle of the rear shell. As an alternative installation, it can be fixedly installed at any position on the rear shell of the display, provided that its image capture area is not blocked by the rear shell; for example, the image capture area faces the same direction as the display device.

As another optional connection method, the camera can be connected to the rear shell of the display through a connecting plate or other conceivable connectors. A lifting motor is installed on the connector. When the user or an application wants to use the camera, the camera is raised above the display; when the camera is not needed, it can be retracted behind the rear shell to protect it from damage.

As an embodiment, the camera used in this application may have 16 million pixels to achieve ultra-high-definition display. In actual use, a camera with more or fewer than 16 million pixels can also be used.

When a camera is installed on the display device, the content displayed in different application scenarios of the display device can be merged in many different ways, so as to achieve functions that cannot be achieved by traditional display devices.

Exemplarily, the user can video chat with at least one other user while watching a video program. The video program can be presented as the background picture, with the video chat window displayed on top of it. Figuratively, this function can be called "watch and chat".

Optionally, in the scenario of "watching while chatting", while watching live video or network video, at least one video chat is conducted across terminals.

In another example, the user can video chat with at least one other user while learning in an education application. For example, students can interact remotely with teachers while studying content in the education application. Figuratively, this function can be called "learning and chatting".

In another example, when a user is playing a card game, the user video chats with other players in the game. For example, when a player enters a game application to take part in a game, the player can interact remotely with other players. Figuratively, this function can be called "watch and play".

Optionally, the game scene is integrated with the video picture, and the portrait in the video picture is cut out and displayed on the game picture to improve user experience.

Optionally, in somatosensory games (such as ball games, boxing games, running games, dancing games, etc.), the camera is used to acquire human postures and movements, to detect and track the body, and to detect human skeleton key point data, which are then fused with the animations in the game to realize games in scenes such as sports and dance.

In another example, the user can interact with at least one other user by video and voice in a karaoke application. Figuratively, this function can be called "watch and sing". Preferably, when at least one user enters the application in the chat scene, multiple users can jointly complete the recording of a song.

In another example, the user can turn on the camera locally to capture pictures and videos. Figuratively, this function can be called "look in the mirror".

In other examples, more functions can be added or the aforementioned functions can be reduced. This application does not specifically limit the function of the display device.

FIG. 2 exemplarily shows a configuration block diagram of the control device 100 according to an exemplary embodiment. As shown in FIG. 2, the control device 100 includes a controller 110, a communicator 130, a user input/output interface 140, a memory 190, and a power supply 180.

The control device 100 is configured to control the display device 200: it receives operation instructions input by the user and converts them into instructions that the display device 200 can recognize and respond to, acting as an intermediary for interaction between the user and the display device 200. For example, when the user operates the channel up/down keys on the control device 100, the display device 200 responds to the channel up/down operation.

In some embodiments, the control device 100 may be a smart device. For example, the control device 100 can install various applications for controlling the display device 200 according to user requirements.

In some embodiments, as shown in FIG. 1, the mobile terminal 100B or other smart electronic devices can perform functions similar to those of the control device 100 after an application for controlling the display device 200 is installed. For example, by installing an application, the user can use the function keys or virtual buttons of a graphical user interface provided on the mobile terminal 100B or other smart electronic devices to perform the functions of the physical keys of the control device 100.

The controller 110 includes at least one of a processor 112, a RAM 113 and a ROM 114, a communication interface, and a communication bus. The controller 110 is used to control the running and operation of the control device 100, the communication and cooperation between its internal components, and external and internal data processing functions.

The communicator 130 realizes communication of control signals and data signals with the display device 200 under the control of the controller 110. For example, the received user input signal is sent to the display device 200. The communicator 130 may include at least one of communication modules such as a WIFI module 131, a Bluetooth module 132, and an NFC module 133.

In the user input/output interface 140, the input interface includes at least one of a microphone 141, a touch panel 142, a sensor 143, a button 144, and other input interfaces. For example, the user can input user instructions through voice, touch, gesture, pressing, and other actions. The input interface converts the received analog signal into a digital signal, converts the digital signal into a corresponding instruction signal, and sends it to the display device 200.

The output interface includes an interface for sending the received user instruction to the display device 200. In some embodiments, it may be an infrared interface or a radio frequency interface. For example, in the case of an infrared signal interface, the user input instruction needs to be converted into an infrared control signal according to the infrared control protocol, and sent to the display device 200 via the infrared sending module. For another example, in the case of a radio frequency signal interface, a user input instruction needs to be converted into a digital signal, which is then modulated according to the radio frequency control signal modulation protocol, and then sent to the display device 200 by the radio frequency transmitting terminal.

In some embodiments, the control device 100 includes at least one of a communicator 130 and an output interface. The control device 100 is configured with a communicator 130, such as: WIFI, Bluetooth, NFC and other modules, which can encode user input instructions through the WIFI protocol, or Bluetooth protocol, or NFC protocol, and send to the display device 200.

The memory 190 is used to store various operating programs, data and applications for driving and controlling the control device 100 under the control of the controller 110. The memory 190 can store various control signal instructions input by the user.

The power supply 180 is used to provide operating power support for each element of the control device 100 under the control of the controller 110. The power supply may be a battery and a related control circuit.

FIG. 3 exemplarily shows a hardware configuration block diagram of a hardware system in the display device 200 according to an exemplary embodiment.

When the dual hardware system architecture is adopted, the structural relationship of the hardware system can be as shown in FIG. 3. For ease of description, one hardware system in the dual hardware system architecture is referred to as the first hardware system, A system, or A chip, and the other hardware system is referred to as the second hardware system, N system, or N chip. The A chip includes the controller and various interfaces of the A chip, and the N chip includes the controller and various interfaces of the N chip. An independent operating system may be installed in each of the A chip and the N chip, so that there are two independent but interrelated subsystems in the display device 200.

As shown in Figure 3, the A chip and the N chip can realize connection, communication and power supply through multiple different types of interfaces. The interface type of the interface between the A chip and the N chip may include at least one of general-purpose input/output (GPIO), USB interface, HDMI interface, UART interface, and the like. One or more of these interfaces can be used between the A chip and the N chip for communication or power transmission. For example, as shown in Figure 3, in the dual hardware system architecture, the N chip can be powered by an external power source, and the A chip can be powered by the N chip instead of the external power source.

In addition to the interface for connecting with the N chip, the A chip may also include interfaces for connecting other devices or components, such as the MIPI interface for connecting to a camera (Camera) shown in FIG. 3, a Bluetooth interface, etc.

Similarly, in addition to the interface for connecting with the A chip, the N chip may also include a VBY interface for connecting to the display screen TCON (Timing Controller); an I2S interface for connecting a power amplifier (Amplifier, AMP) and a speaker (Speaker); and at least one of an IR/Key interface, a USB interface, a WiFi interface, a Bluetooth interface, an HDMI interface, a Tuner interface, and the like.

The dual hardware system architecture of the present application will be further described below in conjunction with FIG. 4. It should be noted that FIG. 4 is only an exemplary description of the dual hardware system architecture of the present application, and does not represent a limitation to the present application. In practical applications, both hardware systems can contain more or fewer hardware components or interfaces as required.

FIG. 4 exemplarily shows a hardware architecture block diagram of the display device 200 according to FIG. 3. As shown in FIG. 4, the hardware system of the display device 200 may include an A chip and an N chip, and modules connected to the A chip or the N chip through various interfaces.

The N chip may include at least one of a tuner and demodulator 220, a communicator 230, an external device interface 250, a controller 210, a memory 290, a user input interface, a video processor 260-1, an audio processor 260-2, a display 280, an audio output interface 272, and a power supply. In other embodiments, the N chip may also include more or fewer modules.

The tuner and demodulator 220 is used to perform modulation and demodulation processing, such as amplification, mixing, and resonance, on broadcast television signals received by wired or wireless means, thereby demodulating, from among multiple wireless or wired broadcast television signals, the audio and video signals carried on the frequency of the television channel selected by the user, together with additional information (such as EPG data signals). Depending on the television signal broadcasting system, the signal path of the tuner and demodulator 220 can vary, for example terrestrial broadcasting, cable broadcasting, satellite broadcasting, or Internet broadcasting; depending on the modulation type, the modulation method may be digital or analog; and depending on the type of television signal received, the tuner and demodulator 220 may demodulate analog signals and/or digital signals.

The tuner and demodulator 220 is also used, according to the user's selection and under the control of the controller 210, to respond to the TV channel frequency selected by the user and the TV signal carried on that frequency.

In some other exemplary embodiments, the tuner demodulator 220 may also be in an external device, such as an external set-top box. In this way, the set-top box outputs TV audio and video signals through modulation and demodulation, and inputs them to the display device 200 through the external device interface 250.

The communicator 230 is a component for communicating with external devices or external servers according to various communication protocol types. For example, the communicator 230 may include a WIFI module 231, a Bluetooth communication protocol module 232, a wired Ethernet communication protocol module 233, an infrared communication protocol module, and other network communication protocol modules or near-field communication protocol modules.

The display device 200 may establish a control signal and a data signal connection with an external control device or content providing device through the communicator 230. For example, the communicator may receive the control signal of the remote controller 100 according to the control of the controller.

The external device interface 250 is a component that provides data transmission between the controller 210 of the N chip and the A chip or other external devices. The external device interface can be connected to external devices such as set-top boxes, game devices, and notebook computers in a wired or wireless manner, and can receive data from the external device, such as video signals (such as moving images), audio signals (such as music), and additional information (such as EPG).

The external device interface 250 may include any one or more of: a high-definition multimedia interface (HDMI) terminal 251, a composite video blanking synchronization (CVBS) terminal 252, an analog or digital component terminal 253, a universal serial bus (USB) terminal 254, and a red, green, and blue (RGB) terminal (not shown in the figure). This application does not limit the number and types of external device interfaces.

The controller 210 controls the work of the display device 200 and responds to user operations by running various software control programs (such as an operating system and/or various application programs) stored on the memory 290.

As shown in FIG. 4, the controller 210 includes at least one of a read-only memory (ROM) 213, a random access memory (RAM) 214, a graphics processor 216, a CPU processor 212, a communication interface 218, and a communication bus. The ROM 213, the RAM 214, the graphics processor 216, the CPU processor 212, and the communication interface 218 are connected by the bus.

The ROM 213 is used to store various system startup instructions. For example, when a power-on signal is received, the power supply of the display device 200 starts up, and the CPU processor 212 runs the system startup instructions in the ROM and copies the temporary data of the operating system stored in the memory 290 to the RAM 214 in order to start the operating system. After the operating system has started, the CPU processor 212 copies the temporary data of the various application programs in the memory 290 to the RAM 214 and then starts running the various application programs.

The graphics processor 216 is used to generate various graphics objects, such as icons, operation menus, and graphics displayed in response to user input instructions. It includes an arithmetic unit, which performs operations on the various interactive commands input by the user and displays various objects according to their display attributes, and a renderer, which renders the objects obtained by the arithmetic unit; the rendering result is displayed on the display 280.

The CPU processor 212 is configured to execute the operating system and application program instructions stored in the memory 290, and to execute various applications, data, and content according to the interactive instructions received from external input, so as to finally display and play various audio and video content.

In some exemplary embodiments, the CPU processor 212 may include multiple processors: one main processor and one or more sub-processors. The main processor is used to perform some operations of the display device 200 in the pre-power-on mode and/or to display images in the normal mode. The one or more sub-processors are used to perform operations in standby mode and other states.

The communication interface may include the first interface 218-1 to the nth interface 218-n. These interfaces may be network interfaces connected to external devices via a network.

The controller 210 may control the overall operation of the display device 200. For example, in response to receiving a user command for selecting a UI object to be displayed on the display 280, the controller 210 may perform an operation related to the object selected by the user command.

The object may be any selectable object, such as a hyperlink or an icon. Operations related to the selected object include, for example, displaying the page, document, or image that a hyperlink points to, or performing the operation corresponding to the icon. The user command for selecting the UI object may be a command input through various input devices (for example, a mouse, a keyboard, a touch pad, etc.) connected to the display device 200, or a voice command corresponding to speech uttered by the user.

The memory 290 stores various software modules for driving and controlling the display device 200. For example, the software modules stored in the memory 290 include at least one of a basic module, a detection module, a communication module, a display control module, a browser module, and various service modules.

Among them, the basic module is the underlying software module used for signal communication between various hardware in the display device 200 and sending processing and control signals to the upper module. The detection module is a management module used to collect various information from various sensors or user input interfaces, and perform digital-to-analog conversion and analysis management.

For example: the voice recognition module includes a voice analysis module and a voice command database module. The display control module is a module for controlling the display 280 to display image content, and can be used to play information such as multimedia image content and UI interfaces. The communication module is a module used for control and data communication with external devices. The browser module is a module used to perform data communication between browsing servers. The service module is a module used to provide various services and various applications.

In addition, the memory 290 is also used to store received external data and user data, images of the items in the various user interfaces, and visual effect images of the focus object.

The user input interface is used to send a user's input signal to the controller 210, or to transmit a signal output from the controller to the user. Exemplarily, the control device (such as a mobile terminal or a remote control) may send input signals input by the user, such as a power switch signal, a channel selection signal, and a volume adjustment signal, to the user input interface, and then the user input interface forwards the input signal to the controller; Alternatively, the control device may receive output signals such as audio, video, or data output from the user input interface processed by the controller, and display the received output signal or output the received output signal as audio or vibration.

In some embodiments, the user may input a user command on a graphical user interface (GUI) displayed on the display 280, and the user input interface receives the user input command through the graphical user interface (GUI). Alternatively, the user can input a user command by inputting a specific sound or gesture, and the user input interface recognizes the sound or gesture through the sensor to receive the user input command.

The video processor 260-1 is used to receive video signals and, according to the standard codec protocol of the input signal, perform video data processing such as decompression, decoding, scaling, noise reduction, frame rate conversion, resolution conversion, and image synthesis, to obtain a video signal that can be displayed or played directly on the display 280.

Illustratively, the video processor 260-1 includes a demultiplexing module, a video decoding module, an image synthesis module, a frame rate conversion module, a display formatting module, and the like.

Among them, the demultiplexing module is used to demultiplex the input audio and video data stream. For example, if MPEG-2 is input, the demultiplexing module will demultiplex into video signals and audio signals.
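As an illustration of what demultiplexing means here, the following is a minimal Python sketch that splits an interleaved packet sequence into separate video and audio streams. The `Packet` type, the stream labels, and the `demultiplex` function are hypothetical names chosen for this example; a real MPEG-2 demultiplexer works on transport stream packets and packet identifiers rather than on labeled tuples.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Packet(NamedTuple):
    stream: str      # hypothetical stream label, e.g. "video" or "audio"
    payload: bytes   # opaque data belonging to that stream


def demultiplex(packets: Iterable[Packet]) -> Dict[str, List[bytes]]:
    """Split an interleaved packet sequence into per-stream payload lists, keeping order."""
    streams: Dict[str, List[bytes]] = defaultdict(list)
    for packet in packets:
        streams[packet.stream].append(packet.payload)
    return dict(streams)


muxed = [Packet("video", b"v0"), Packet("audio", b"a0"), Packet("video", b"v1")]
elementary = demultiplex(muxed)
```

In this sketch the video payloads would then go to the video decoding module and the audio payloads to the audio processor.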

The video decoding module is used to process the demultiplexed video signal, including decoding and scaling.

The image synthesis module, such as an image synthesizer, is used to superimpose and mix the GUI signal, which the graphics generator produces according to user input or on its own, with the scaled video image to generate an image signal for display.

The frame rate conversion module is used to convert the frame rate of the input video, for example converting input video at 24 Hz, 25 Hz, 30 Hz, or 60 Hz to a frame rate of 60 Hz, 120 Hz, or 240 Hz. The input frame rate may be related to the source video stream, and the output frame rate may be related to the refresh rate of the display. The conversion is usually implemented by a method such as frame insertion.

The display formatting module is used to convert the signal output by the frame rate conversion module into a signal that conforms to the display format of the display, for example by format-converting the signal output by the frame rate conversion module to output RGB data signals.

The display 280 is used to receive the image signal input from the video processor 260-1 and to display video content, images, and a menu control interface. The display 280 includes a display component for presenting a picture and a driving component for driving image display. The displayed video content can come from the video in the broadcast signal received by the tuner and demodulator 220, or from video content input through the communicator or the external device interface. The display 280 also displays a user manipulation interface (UI) that is generated in the display device 200 and used to control the display device 200.
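The simplest form of frame insertion is frame repetition, which can be sketched in Python as follows. This is only an illustrative assumption about how such a module could behave: the function name `convert_frame_rate` is hypothetical, and practical converters often use motion-compensated interpolation rather than plain repetition.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def convert_frame_rate(frames: Sequence[Frame], in_fps: int, out_fps: int) -> List[Frame]:
    """Resample a frame sequence from in_fps to out_fps by repeating (or dropping) source frames."""
    if in_fps <= 0 or out_fps <= 0:
        raise ValueError("frame rates must be positive")
    n_out = len(frames) * out_fps // in_fps
    # Output frame k is shown at time k / out_fps; pick the source frame on screen at that instant.
    return [frames[k * in_fps // out_fps] for k in range(n_out)]


doubled = convert_frame_rate(["A", "B"], 30, 60)           # each frame shown twice
pulled = convert_frame_rate(["A", "B", "C", "D"], 24, 60)  # frames shown 3, 2, 3, 2 times
```

With a 24 Hz source and a 60 Hz output, the source frames are repeated alternately three times and twice, which keeps the total duration unchanged.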

And, depending on the type of the display 280, it also includes a driving component for driving the display. Alternatively, if the display 280 is a projection display, it may also include a projection device and a projection screen.

The audio processor 260-2 is used to receive audio signals and, according to the standard codec protocol of the input signal, perform decompression and decoding as well as audio data processing such as noise reduction, digital-to-analog conversion, and amplification, to obtain an audio signal that can be played by the speaker 272.

The audio output interface 270 is used to receive, under the control of the controller 210, the audio signal output by the audio processor 260-2. The audio output interface may include a speaker 272, or an external audio output terminal 274 that outputs to the sound-generating device of an external device, such as an external audio terminal or a headphone output terminal.

In some other exemplary embodiments, the video processor 260-1 may include one or more chips. The audio processor 260-2 may also include one or more chips.

And, in some other exemplary embodiments, the video processor 260-1 and the audio processor 260-2 may be separate chips, or they may be integrated with the controller 210 in one or more chips.

The power supply is used to provide power supply support for the display device 200 with power input from an external power supply under the control of the controller 210. The power supply may include a built-in power supply circuit installed inside the display device 200, or may be a power supply installed outside the display device 200, such as a power interface that provides an external power supply in the display device 200.

Similar to the N chip, as shown in FIG. 4, the A chip may include a controller 310, a communicator 330, a detector 340, and a memory 390. In some embodiments, it may also include a user input interface, a video processor, an audio processor, a display, and an audio output interface. In some embodiments, there may also be a power supply that independently powers the A chip.

The communicator 330 is a component for communicating with external devices or external servers according to various communication protocol types. For example, the communicator 330 may include a WIFI module 331, a Bluetooth communication protocol module 332, a wired Ethernet communication protocol module 333, and an infrared communication protocol module and other network communication protocol modules or near field communication protocol modules.

The communicator 330 of the A chip and the communicator 230 of the N chip also interact with each other. For example, the WiFi module 231 of the N chip is used to connect to an external network and generate network communication with an external server and the like. The WiFi module 331 of the A chip is used to connect to the WiFi module 231 of the N chip, and does not directly connect to an external network or the like. Therefore, for the user, a display device as in the above embodiment can externally display a WiFi account.

The detector 340 is a component used by the A chip of the display device to collect signals from the external environment or to interact with the outside. The detector 340 may include a light receiver 342, a sensor used to collect the intensity of ambient light so that display parameters can be adapted to the ambient light. It may also include an image collector 341, such as a camera, which can be used to collect external environment scenes as well as user attributes or gestures for interacting with the user, so that display parameters can be changed adaptively and user gestures can be recognized to realize interaction with the user.

The external device interface 350 is a component that provides data transmission between the controller 310 and the N chip or other external devices. The external device interface can be connected to external devices such as set-top boxes, game devices, and notebook computers in a wired or wireless manner.

The controller 310 controls the work of the display device 200 and responds to user operations by running various software control programs (such as installed third-party applications, etc.) stored on the memory 390 and interacting with the N chip.

As shown in FIG. 4, the controller 310 includes at least one of a read-only memory ROM313, a random access memory RAM314, a graphics processor 316, a CPU processor 312, a communication interface 318, and a communication bus. Among them, the ROM 313 and the RAM 314, the graphics processor 316, the CPU processor 312, and the communication interface 318 are connected by a bus.

The ROM 313 is used to store various system startup instructions. The CPU processor 312 runs the system startup instructions in the ROM and copies the temporary data of the operating system stored in the memory 390 to the RAM 314 in order to start the operating system. After the operating system has started, the CPU processor 312 copies the temporary data of the various application programs in the memory 390 to the RAM 314 and then starts running the various application programs.

The CPU processor 312 is used to execute the operating system and application instructions stored in the memory 390, to communicate with the N chip and exchange signals, data, instructions, and the like, and to execute various applications, data, and content according to the interactive instructions received from external input, so as to finally display and play various audio and video content.

The communication interface may include the first interface 318-1 to the nth interface 318-n. These interfaces may be network interfaces connected to external devices via a network, or network interfaces connected to the N chip via a network.

The controller 310 may control the overall operation of the display device 200. For example, in response to receiving a user command for selecting a UI object to be displayed on the display 280, the controller 310 may perform an operation related to the object selected by the user command.

The graphics processor 316 is used to generate various graphics objects, such as icons, operation menus, and graphics displayed in response to user input instructions. It includes an arithmetic unit, which performs operations on the various interactive commands input by the user and displays various objects according to their display attributes, and a renderer, which renders the objects obtained by the arithmetic unit; the rendering result is displayed on the display 280.

Both the graphics processor 316 of the A chip and the graphics processor 216 of the N chip can generate various graphics objects. The difference is that, if application 1 is installed on the A chip and application 2 is installed on the N chip, then when the user is in the interface of application 1 and inputs instructions in application 1, the graphics processor 316 of the A chip generates the graphics object; when the user is in the interface of application 2 and inputs instructions in application 2, the graphics processor 216 of the N chip generates the graphics object.
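The division of work between the two graphics processors can be pictured as a simple lookup from the application in the foreground to the chip that hosts it. The Python sketch below is only an illustration: the table contents and the function name `graphics_processor_for` are hypothetical, and the reference numerals 316 and 216 are used merely as labels.

```python
# Hypothetical table: which chip each application is installed on.
APP_HOST_CHIP = {"application 1": "A", "application 2": "N"}

# Reference numerals of the graphics processor on each chip, used here only as labels.
GRAPHICS_PROCESSOR = {"A": 316, "N": 216}


def graphics_processor_for(app_name: str) -> int:
    """Return the graphics processor that generates graphics objects for the given application."""
    return GRAPHICS_PROCESSOR[APP_HOST_CHIP[app_name]]


gpu_for_app1 = graphics_processor_for("application 1")
gpu_for_app2 = graphics_processor_for("application 2")
```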

Fig. 5 exemplarily shows a schematic diagram of a functional configuration of a display device according to an exemplary embodiment.

In some embodiments, as shown in FIG. 5, the memory 390 of the A chip and the memory 290 of the N chip are respectively used to store operating systems, applications, content, and user data, and, under the control of the controller 310 of the A chip and the controller 210 of the N chip, to drive the system operation of the display device 200 and respond to the user's various operations. The memory 390 of the A chip and the memory 290 of the N chip may include volatile and/or nonvolatile memory.

For the N chip, the memory 290 is used to store the operating program that drives the controller 210 in the display device 200, and to store the various application programs built into the display device 200, the various application programs downloaded by the user from external devices, the various graphical user interfaces related to the applications, the various objects related to the graphical user interfaces, user data information, and the various internal data supporting the applications. The memory 290 is used to store system software such as an operating system (OS) kernel, middleware, and applications, and to store input video data, audio data, and other user data.

The memory 290 is used to store driver programs and related data for components such as the video processor 260-1, the audio processor 260-2, the display 280, the communicator 230, the tuner and demodulator 220, and the input/output interface.

In some embodiments, the memory 290 may store software and/or programs. The software programs that make up an operating system (OS) include, for example, a kernel, middleware, application programming interfaces (APIs), and/or application programs. Exemplarily, the kernel may control or manage system resources or the functions implemented by other programs (such as the middleware, APIs, or application programs), and the kernel may provide interfaces that allow the middleware, APIs, or applications to access the controller in order to control or manage system resources.

For example, the memory 290 includes at least one of a broadcast receiving module 2901, a channel control module 2902, a volume control module 2903, an image control module 2904, a display control module 2905, an audio control module 2906, an external command recognition module 2907, a communication control module 2908, an optical receiving module 2909, a power control module 2910, an operating system 2911, other application programs 2912, a browser module, and so on. By executing the various software programs in the memory 290, the controller 210 performs functions such as: broadcast television signal reception and demodulation, TV channel selection control, volume selection control, image control, display control, audio control, external command recognition, communication control, optical signal reception, power control, a software control platform supporting the various functions, and browser functions.

The memory 390 stores various software modules for driving and controlling the display device 200. For example, the software modules stored in the memory 390 include at least one of a basic module, a detection module, a communication module, a display control module, a browser module, and various service modules. Since the functions of the memory 390 and the memory 290 are relatively similar, refer to the memory 290 for the related parts, which will not be repeated here.

For example, the memory 390 includes an image control module 3904, an audio control module 3906, an external command recognition module 3907, a communication control module 3908, an optical receiving module 3909, an operating system 3911, other application programs 3912, a browser module, and so on. By executing the various software programs in the memory 390, the controller 310 performs functions such as: image control, display control, audio control, external command recognition, communication control, optical signal reception, power control, a software control platform supporting the various functions, and browser functions.

The difference is that the external command recognition module 2907 of the N chip and the external command recognition module 3907 of the A chip can recognize different commands.

Exemplarily, because an image receiving device such as the camera is connected to the A chip, the external command recognition module 3907 of the A chip may include a graphic recognition module 3907-1. The graphic recognition module 3907-1 stores a graphics database, and when the camera receives an external graphic command, the command is matched against the instructions in the graphics database in order to control the display device. Since the voice receiving device and the remote controller are connected to the N chip, the external command recognition module 2907 of the N chip may include a voice recognition module 2907-2. The voice recognition module 2907-2 stores a voice database, and when the voice receiving device or the like receives an external voice command, the command is matched against the instructions in the voice database in order to control the display device. Similarly, a control device 100 such as a remote controller is connected to the N chip, and a key command recognition module interacts with the control device 100.
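The matching of external commands against a database can be sketched as a dictionary lookup. The following Python example is a minimal illustration under stated assumptions: the database entries, the source labels "camera" and "microphone", and the function name `recognize` are all hypothetical, and real gesture or speech recognition would precede the lookup.

```python
from typing import Optional, Tuple

# Hypothetical database contents; the source does not list concrete entries.
GRAPHIC_DATABASE = {"palm_open": "pause", "swipe_left": "previous_channel"}
VOICE_DATABASE = {"volume up": "volume_up", "turn off": "power_off"}


def recognize(source: str, recognized_input: str) -> Tuple[str, Optional[str]]:
    """Return (chip, instruction) for an already-recognized gesture or phrase.

    Camera input is handled on the A chip (graphic recognition),
    microphone input on the N chip (voice recognition).
    """
    if source == "camera":
        return "A", GRAPHIC_DATABASE.get(recognized_input)
    if source == "microphone":
        return "N", VOICE_DATABASE.get(recognized_input)
    raise ValueError("unknown input source: " + source)


gesture_result = recognize("camera", "palm_open")
voice_result = recognize("microphone", "volume up")
unknown_result = recognize("microphone", "hello")
```

An input that has no entry in the database yields `None` as the instruction, so the caller can ignore it.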

Fig. 6a exemplarily shows a configuration block diagram of the software system in the display device 200 according to an exemplary embodiment.

For the N chip, as shown in Figure 6a, the operating system 2911 includes operating software for processing various basic system services and for implementing hardware-related tasks, acting as a medium for data processing between application programs and hardware components.

In some embodiments, part of the operating system kernel may include a series of software to manage the hardware resources of the display device and provide services for other programs or software codes.

In some other embodiments, part of the operating system kernel may include one or more device drivers. A device driver may be a set of software code in the operating system that helps operate or control the devices or hardware associated with the display device. The drivers may contain code to operate video, audio, and/or other multimedia components; examples include display, camera, Flash, WiFi, and audio drivers.

Among them, the accessibility module 2911-1 is used to modify or access the application program, so as to realize the accessibility of the application program and the operability of its display content.

The communication module 2911-2 is used to connect to other peripherals via related communication interfaces and communication networks.

The user interface module 2911-3 is used to provide objects that display the user interface for access by various applications, and can realize user operability.

The control application 2911-4 is used to control process management, including runtime applications.

The event transmission system 2914 can be implemented in the operating system 2911 or in the application 2912. In some embodiments, it is implemented partly in the operating system 2911 and partly in the application program 2912. It is used to monitor various user input events and, according to the recognition results for the various events or sub-events, to execute one or more sets of predefined handlers in response.

Among them, the event monitoring module 2914-1 is used to monitor input events or sub-events of the user input interface.

The event recognition module 2914-1 is used to apply the definitions of the various events to the inputs from the various user input interfaces, recognize the various events or sub-events, and pass them on for processing so that the corresponding one or more groups of handlers are executed.

Among them, an event or sub-event refers to an input detected by one or more sensors in the display device 200 or an input from an external control device (such as the control device 100), for example: the various sub-events of voice input, the gesture input sub-events of gesture recognition, and the sub-events of remote control key command input from the control device. For example, the sub-events from the remote control take multiple forms, including but not limited to one or a combination of pressing the up/down/left/right keys, the confirm key, and other key presses, as well as operations on non-physical keys, such as moving, pressing, and releasing.

The interface layout management module 2913 directly or indirectly receives the user input events or sub-events monitored by the event transmission system 2914 and is used to update the layout of the user interface, including but not limited to the position of each control or sub-control in the interface and the size, position, and level of containers, as well as other operations related to the interface layout.
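The monitor, recognize, and dispatch flow described for the event transmission system can be sketched in Python as below. This is a simplified illustration, not the implementation used in the display device: the class layout, the event type names, and the shape of the raw input dictionaries are assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional

Handler = Callable[[dict], None]


class EventTransmissionSystem:
    """Toy model: recognize a raw input as an event type, then run the registered handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def register(self, event_type: str, handler: Handler) -> None:
        self._handlers.setdefault(event_type, []).append(handler)

    @staticmethod
    def recognize(raw: dict) -> Optional[str]:
        # Hypothetical event definitions for remote keys, voice, and gestures.
        source = raw.get("source")
        if source == "remote" and raw.get("key") in {"up", "down", "left", "right", "ok"}:
            return "remote_key"
        if source == "voice":
            return "voice_input"
        if source == "gesture":
            return "gesture_input"
        return None

    def on_input(self, raw: dict) -> int:
        """Monitor entry point: returns the number of handlers that were executed."""
        event_type = self.recognize(raw)
        handlers = self._handlers.get(event_type, []) if event_type else []
        for handler in handlers:
            handler(raw)
        return len(handlers)


log: List[str] = []
system = EventTransmissionSystem()
system.register("remote_key", lambda e: log.append("move selector " + e["key"]))
handled = system.on_input({"source": "remote", "key": "left"})
ignored = system.on_input({"source": "unknown"})
```

A handler registered this way could, for instance, ask a layout manager to update the selector position.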

Since the functions of the operating system 3911 of the A chip and the operating system 2911 of the N chip are relatively similar, please refer to the operating system 2911 for related details, and will not be repeated here.

As shown in FIG. 6b for the application control in the interactive interface, the application layer of the display device includes various applications that can be executed on the display device 200.

The application layer 2912 of the N chip may include, but is not limited to, one or more applications, such as video-on-demand applications, application centers, and game applications. The application layer 3912 of the A chip may include, but is not limited to, one or more applications, such as a live TV application, a media center application, and so on. It should be noted that the application programs contained on the A chip and the N chip are determined according to the operating system and other designs. This application does not need to specifically limit and divide the application programs contained on the A chip and the N chip.

Live TV applications can provide live TV through different sources. For example, a live TV application may use input from cable TV, wireless broadcasting, satellite services, or other types of live TV services to provide TV signals. And, the live TV application can display the video of the live TV signal on the display device 200.

Video-on-demand applications can provide videos from different storage sources. Unlike live TV applications, video on demand displays video from storage sources. For example, the video on demand can come from the server side of cloud storage or from local hard disk storage that contains stored video programs.

Media center applications can provide various multimedia content playback applications. For example, the media center can provide services that are different from live TV or video on demand, and users can access various images or audio through the media center application.

The application center can provide storage for various applications. An application may be a game, an application program, or some other application that is related to a computer system or other device but can be run on the display device. The application center can obtain these applications from different sources, store them in local storage, and then run them on the display device 200.

FIG. 7 exemplarily shows a schematic diagram of a user interface in the display device 200 according to an exemplary embodiment. As shown in FIG. 7, the user interface includes multiple view display areas, for example, a first view display area 201 and a play screen 202, where the play screen includes layout of one or more different items. And the user interface also includes a selector indicating that the item is selected, and the position of the selector can be moved through user input to change the selection of different items.

It should be noted that multiple view display areas can present display screens of different levels. For example, the display area of the first view may present the content of the video chat item, and the display area of the second view may present the content of the application layer item (eg, webpage video, VOD display, application screen, etc.).

In some embodiments, the content of the display area of the second view includes content displayed on the video layer and part of the content displayed on the floating layer, and the content of the display area of the first view includes content displayed on the floating layer. The floating layers used in the first view display area and the second view display area are different floating layers.

In some embodiments, different view display areas are presented with different priorities, and view display areas with different priorities differ in display priority. For example, the priority of the system layer (such as the video layer) is higher than that of the application layer. When the user moves the selector or switches screens in the application layer, the picture displayed in the view display area of the system layer is not blocked; and when the size and position of the view display area of the application layer change according to the user's selection, the size and position of the view display area of the system layer are not affected.

In some implementations, such as picture-in-picture, two different display windows can be drawn in the same layer to achieve display at the same level. In this case, the selector can switch between the first view display area and the second view display area (that is, between the two display windows). In some embodiments, when the size and position of the first view display area change, the size and position of the second view display area may change accordingly.

In some embodiments, for a dual-chip smart TV 200, independent operating systems may be installed in the A chip and the N chip, so that there are two independent but related subsystems in the display device 200. For example, both the A chip and the N chip can independently run Android and various apps, so that each chip can implement certain functions on its own and the A chip and the N chip can implement certain functions in cooperation.

In some embodiments, for a smart TV 200 that is not dual-chip (for example, a single-chip smart TV), there is a system chip, and the operating system controls the realization of all functions of the smart TV.

In the display device provided by the embodiment of the present application, the camera is connected to the auxiliary chip, which can perform artificial intelligence operations on the images obtained by the camera; the microphone is connected to the main chip, which performs gain processing on the sound collected by the microphone. When a video call is made through the display device provided in the embodiment of the application, the auxiliary chip collects video images through the camera and applies artificial intelligence techniques such as face recognition and motion (lip shape) recognition. The picture during the video call is therefore no longer limited to a fixed focal length; instead, by combining face recognition and lip recognition, the video can zoom and focus on the target speaker. Whichever corner the person is in and however far the person is from the camera, the face is automatically recognized and focused on locally, so that the face stays the same size in the picture displayed at the opposite end. That is, during a video call through the display device provided by the embodiment of the application, the displayed size of the face does not change as the distance between the person and the camera changes. However, when the distance between the person and the camera changes, the distance between the person and the microphone (far-field microphone) also changes. To keep the sound reasonably stable while the displayed size of the face remains unchanged, the embodiment of the present application also provides an audio adjustment method.

FIG. 8 is a schematic flowchart of an audio adjustment method provided by an embodiment of the application. As shown in Figure 8, the audio adjustment method provided by the embodiment of the present application includes:

S101: Obtain focal length information corresponding to the current image in the video call.

In some embodiments, the controller may include a main chip and an auxiliary chip. During a video call, the auxiliary chip obtains an image through the camera and automatically focuses on the face while the image is collected. In the embodiments of the present application, face autofocus is implemented by the phase method. Phase-method focusing judges whether the image is in focus from the order in which light beams reach the photosensitive element, that is, from the amount of phase shift. During autofocus, a grid plate with alternating light-transmitting and opaque lines is placed parallel to the photosensitive element, and two light-receiving elements are placed symmetrically about the optical axis at an appropriate position behind the grid plate.

When focusing, the grid plate moves back and forth in the direction perpendicular to the optical axis. When the focal plane coincides with the grid plate (i.e., in focus), the light passing through the grid plate reaches the two light-receiving elements behind the plate at the same time; when they do not coincide (in the case of front focus or back focus), the two beams reach the light-receiving elements one after the other, and there is a phase difference between the output signals. Because the peak positions for front focus and back focus are different, the camera can quickly determine in which direction to shift, instead of moving back and forth several times to reach focus as in contrast-based focusing. Refer to Figure 9 for the calculation principle, which will not be repeated here.

In some embodiments, the image transmitted to the opposite end is a cropped face image. That is, because of autofocus, the face in the collected image is transmitted to the opposite end regardless of the distance between the person and the TV, so the video received by the opposite end does not reveal changes in the distance between the local person and the TV. If the sound used a fixed gain, however, the face received by the opposite end would not reflect the change in distance, but the sound received by the opposite end would still vary with the distance between the local person and the TV.

When the camera of the auxiliary chip collects images, it focuses automatically in real time and outputs focal length information in real time. In image processing, the sharpness and focus of an image are determined by the amount of high-frequency components in the image: the more high-frequency components, the sharper the image; otherwise the image is blurred and the focus needs to be adjusted. Methods for judging whether an image is sharp include the Fourier transform (FFT) and the discrete cosine transform (DCT). Each frame therefore yields a value that characterizes whether the image is sharp, such as the image distance. In the embodiments of the present application, methods for calculating the image distance include the high-frequency component method, the smoothing method, the threshold integration method, the gray difference method, the Laplacian image energy function, and so on.

In order to quickly output the focal length information corresponding to the current image, an improved gray-scale difference method can be used as the image sharpness evaluation function: the sum of the squares of the brightness differences between every pixel of an image and its neighboring pixels is taken as the focus evaluation function of the image, and the value of the evaluation function is calculated for adjacent images of the same field. The focus evaluation function is as follows:

f(x, y) denotes the brightness value of the pixel in the x-th row and y-th column. The algorithm compares each pixel with two adjacent pixels (the pixels to the left of and above the pixel f(x, y)). When the image is in focus, F(x, y) is largest, that is, the corresponding image distance value is largest. The image distance is calculated in real time while the focal length of the lens is adjusted adaptively; when the relative maximum is reached, autofocus is complete and the corresponding focal length information is output.
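The gray-scale difference measure described here can be sketched in Python. The exact formula is an assumption reconstructed from the verbal description: F = sum over x, y of [f(x, y) - f(x, y-1)]^2 + [f(x, y) - f(x-1, y)]^2. Skipping the first row and first column (which lack an upper or left neighbor) is also an assumption, and the names `focus_measure` and `best_focus` are hypothetical.

```python
def focus_measure(image):
    """Sum of squared brightness differences to the left and upper neighbors.

    `image` is a list of rows; image[x][y] is the brightness f(x, y) of the
    pixel in row x, column y. Border pixels without both neighbors are
    skipped (an assumption of this sketch).
    """
    total = 0
    for x in range(1, len(image)):
        for y in range(1, len(image[x])):
            left = image[x][y] - image[x][y - 1]   # difference to the left pixel
            up = image[x][y] - image[x - 1][y]     # difference to the upper pixel
            total += left * left + up * up
    return total


def best_focus(frames_by_focal_length):
    """Return the focal length whose frame has the largest focus measure."""
    return max(frames_by_focal_length,
               key=lambda fl: focus_measure(frames_by_focal_length[fl]))


# A high-contrast (sharp) patch and a low-contrast (blurred) patch.
sharp = [[0, 255, 0], [255, 0, 255], [0, 255, 0]]
blurred = [[120, 130, 120], [130, 120, 130], [120, 130, 120]]
```

A sharper frame contains larger neighbor-to-neighbor differences, so it scores higher, and `best_focus` picks the focal length at which the measure peaks.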

In some embodiments, the controller may not distinguish between the main chip and the auxiliary chip. The controller starts the camera to collect the local image according to the input operation, and generates the current image according to the local image, and controls the microphone to collect the local sound to generate audio. Since the current image corresponds to the focal length information, the focal length information of the current image is acquired to adjust the sound.

In some embodiments, the application that starts the camera and microphone for audio and video collection may be a video call application or a video recording/selfie application. Therefore, after collecting the local image and local sound, the controller also needs to determine whether it is in a video call state. If it is, the application that started the camera and microphone is a video call application, and the audio needs to be adjusted according to the focal length information of the current image; after adjusting the audio according to the focal length information of the current image, the controller sends the adjusted audio and the current image to the peer device of the video call. If it is not in the video call state, the application that started the camera and microphone is a video recording/selfie application, and there is no need to adjust the audio according to the focal length information of the current image; the controller directly generates an audio and video file from the current image and the audio.
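The branch described above can be sketched as follows. This is a minimal illustration, not the controller's actual implementation: the `Capture` type, the `handle_capture` function, the action labels, and the `gain_for_focal_length` callback are all hypothetical names, and the conversion of a dB gain to a linear factor (10^(dB/20)) is the usual amplitude convention rather than something the text specifies.

```python
from dataclasses import dataclass


@dataclass
class Capture:
    image: str              # placeholder for the current image
    audio: list             # audio samples as floats
    focal_length_mm: float  # focal length information of the current image


def handle_capture(capture, in_video_call, gain_for_focal_length):
    """Adjust audio only in the video call state; otherwise pass it through."""
    if in_video_call:
        gain_db = gain_for_focal_length(capture.focal_length_mm)
        factor = 10 ** (gain_db / 20)  # dB to linear amplitude factor
        adjusted = [sample * factor for sample in capture.audio]
        return ("send_to_peer", capture.image, adjusted)
    # Recording/selfie: the audio is written to the file unchanged.
    return ("write_file", capture.image, capture.audio)
```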

In some embodiments, the video call state can be obtained by having the application manager of the controller mark the application that starts the camera and microphone for audio and video collection: the state is marked as a video call state when that application is a video call application, and as a non-video-call state when it is another application, such as a recording/selfie application.

In some embodiments, the user selects the video recording/selfie application to preview or record audio and video through the camera and microphone. The controller activates the camera to collect video images and activates the microphone to collect sound. The application can first display a preview interface, in which the current image, obtained by data processing of the image collected by the camera, is displayed. The data processing can be at least one of adjusting image quality (such as brightness, contrast, chroma, and color temperature), adding controls (decorative controls, layers, etc.), or other processing. The preview interface can also be provided with a control for generating audio and video files. In response to the user selecting this control, the controller generates the current image from the image collected by the camera, generates audio from the sound collected by the microphone, and combines the current image and the audio into an audio and video file. In some embodiments, after the user selects the control for generating audio and video files, the interface of the application may continue to display the current image.

In some embodiments, buffer data is continuously and periodically generated while the audio and video are being recorded, and upon receiving an input operation instruction to save the video (for example, a recording end operation, or the only operation instruction), a video file is generated from the buffer data. This shortens the video file generation time perceived by the user.

S102: Acquire microphone gain according to the focal length information.

In some embodiments, the change in the image distance reflects the change in the focal length of the current lens, which in turn corresponds to the change in the distance between the current user and the camera of the display device, that is, the change in distance relative to the previous moment of the call. Generally, when the image distance becomes larger, the distance between the current user and the display device (camera) has become larger, and when the image distance becomes smaller, the distance between the current user and the display device (camera) has become smaller. Therefore, according to the image distance and its changes, the focal length at which the image distance reaches its relative maximum is found, that is, the focal length information corresponding to the current image. The distance between the current user and the far-field microphone is determined according to the focal length information, which gives the change in distance from the far-field microphone; the microphone gain is then obtained, and gain processing is performed on the collected audio data using the microphone gain.

In some embodiments, in order to facilitate the acquisition of the microphone gain, the distance between the current user and the display device (camera) and the corresponding focal length information are calculated, and empirical values are used to establish a preset correspondence between the microphone gain and the focal length information. When the focal length information corresponding to the current image in the video call is determined, the microphone gain can be obtained according to the preset correspondence between the microphone gain and the focal length information.

The preset correspondence between the microphone gain and the focal length information is established based on empirical values, as shown in Table 1. Table 1 is given only as an example and is not a limitation of this application.

Table 1:

Corresponding distance	Focal length information	Microphone gain
3 meters	0.3 mm	0 dB
4 meters	0.4 mm	10 dB
2 meters	0.2 mm	-10 dB
2.5 meters	0.25 mm	-8 dB
3.5 meters	0.35 mm	8 dB

Therefore, when the acquired focal length information is 0.2 mm, the corresponding microphone gain of -10 dB can be obtained from the preset correspondence between the microphone gain and the focal length information shown in Table 1, and the acquired microphone gain is then used to perform gain processing on the correspondingly collected audio data.

If the acquired focal length information is not in the table, the corresponding microphone gain is calculated by interpolation. For example, if the acquired focal length information is 0.375 mm, the microphone gain can be calculated by the following equation, where X is the corresponding microphone gain.

It can be calculated that X=9 dB, that is, when the focal length information is 0.375 mm, the corresponding microphone gain is 9 dB.
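The result X = 9 dB is consistent with linear interpolation between the two neighboring rows of Table 1, (0.35 mm, 8 dB) and (0.4 mm, 10 dB): (X - 8) / (10 - 8) = (0.375 - 0.35) / (0.4 - 0.35), which gives X = 9. A minimal Python sketch of the lookup follows; the names `GAIN_TABLE` and `microphone_gain_db` are hypothetical, and clamping to the end rows for focal lengths outside the table is an assumption not stated in the text.

```python
from bisect import bisect_left

# Table 1, sorted by focal length: (focal length in mm, microphone gain in dB).
GAIN_TABLE = [(0.2, -10.0), (0.25, -8.0), (0.3, 0.0), (0.35, 8.0), (0.4, 10.0)]


def microphone_gain_db(focal_length_mm):
    """Look up the gain for a focal length, interpolating between table rows."""
    focal_lengths = [f for f, _ in GAIN_TABLE]
    # Assumption: values outside the table are clamped to the nearest row.
    if focal_length_mm <= focal_lengths[0]:
        return GAIN_TABLE[0][1]
    if focal_length_mm >= focal_lengths[-1]:
        return GAIN_TABLE[-1][1]
    i = bisect_left(focal_lengths, focal_length_mm)
    f0, g0 = GAIN_TABLE[i - 1]
    f1, g1 = GAIN_TABLE[i]
    # Linear interpolation between the two neighboring rows.
    return g0 + (g1 - g0) * (focal_length_mm - f0) / (f1 - f0)
```

Calling `microphone_gain_db(0.375)` reproduces the worked example (up to floating-point rounding), and focal lengths that appear in the table return the tabulated gain.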

In some embodiments, in order to facilitate the acquisition of the microphone gain, statistics on the distance between the current user and the display device (camera) and the corresponding image distance may also be collected to establish a preset function model of the microphone gain and the focal length information. When the focal length information corresponding to the current image in the video call is determined, the preset function model of the microphone gain and the focal length information is obtained, and the microphone gain can be obtained from the focal length information in combination with the preset function model.
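The text does not specify the form of the preset function model. As one possible illustration only, the sketch below fits a straight line to the example values of Table 1 by ordinary least squares; the linear form, the sample data reuse, and the names `fit_linear_model` and `gain_from_model` are all assumptions of this sketch.

```python
# Example samples taken from Table 1: (focal length in mm, gain in dB).
SAMPLES = [(0.2, -10.0), (0.25, -8.0), (0.3, 0.0), (0.35, 8.0), (0.4, 10.0)]


def fit_linear_model(samples):
    """Least-squares fit of gain = slope * focal_length + intercept."""
    n = len(samples)
    mean_f = sum(f for f, _ in samples) / n
    mean_g = sum(g for _, g in samples) / n
    sxy = sum((f - mean_f) * (g - mean_g) for f, g in samples)
    sxx = sum((f - mean_f) ** 2 for f, _ in samples)
    slope = sxy / sxx
    intercept = mean_g - slope * mean_f
    return slope, intercept


def gain_from_model(focal_length_mm, model):
    """Evaluate the fitted model at a focal length."""
    slope, intercept = model
    return slope * focal_length_mm + intercept


MODEL = fit_linear_model(SAMPLES)
```

With these samples the fitted slope is about 112 dB per mm and the model returns roughly 0 dB at 0.3 mm. Unlike the table lookup, a fitted model does not pass exactly through every empirical point, which is the trade-off of replacing a table with a closed-form function.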

In some embodiments, in order to facilitate the acquisition of the microphone gain, an adaptive method may be adopted to acquire the microphone gain according to the focal length information.

In some implementations, since the application that starts the camera and microphone for audio and video collection may be a video call application, in order to ensure the display effect of the video call, the controller receives the local image collected by the camera and crops the local image according to the position of the person in it to generate a current image of a preset size. Because the camera adjusts the focal length during image collection to obtain a clear image of the person, the current image corresponds to a piece of focal length information.

In some implementations, the distance between the person and the display device changes as the person moves. Because the image transmitted to the opposite end is cropped from the local image, the opposite end may not be able to see from the image that the person has moved relative to the display device; however, the change in the distance between the person and the display device changes the volume of the sound data collected by the microphone. Adjusting the gain of the audio data according to the different focal lengths corresponding to the current image can therefore offset the volume change caused by the change in distance between the person and the display device.

In some embodiments, the distance between the person and the display device changes as the person moves, but because the image transmitted to the opposite end is the region corresponding to the face/human body cropped from the local image, the opposite end may not be able to see from the image that the person has moved relative to the display device. Since the focal length of the camera changes with the face/human body, adjusting the gain of the audio data according to the different focal lengths corresponding to the current image can offset the change in the volume of the collected sound data caused by the change in distance between the person and the display device, thus ensuring the consistency of the sound and image sent to the peer device.

S103: Adjust the audio received by the microphone according to the acquired microphone gain value.

In some embodiments, the microphone gain obtained from the focal length information is used to adjust the audio received by the microphone, that is, the obtained microphone gain value is used to perform gain processing on the audio received by the microphone, which helps keep the volume of the audio received by the microphone stable.
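As an illustration of this gain processing, the sketch below applies a gain in dB to 16-bit PCM samples. The sample format, the clipping behavior, and the name `apply_gain` are assumptions; the text does not specify how the gain is applied. The dB value is converted to a linear amplitude factor with the usual relation factor = 10^(dB/20).

```python
def apply_gain(samples, gain_db, limit=32767):
    """Scale integer PCM samples by a gain in dB, clipping to the 16-bit range."""
    factor = 10 ** (gain_db / 20)  # dB to linear amplitude factor
    out = []
    for sample in samples:
        scaled = int(round(sample * factor))
        # Clip so that large positive gains cannot overflow the sample range.
        out.append(max(-limit - 1, min(limit, scaled)))
    return out
```

A gain of 0 dB leaves the samples unchanged, +20 dB multiplies the amplitude by ten, and samples that would exceed the 16-bit range are clipped rather than wrapped.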

In some embodiments, the microphone gain is obtained through the focal length information, but the speaker gain is not processed, so that the audio of the opposite end can be output normally.

Therefore, the audio adjustment method provided by the present application includes obtaining the focal length information of the current video image in the video call and obtaining the microphone gain according to the focal length information. During a zoomable video call through a smart TV, the focal length information is obtained from the automatic zoom processing of the video image, the corresponding microphone gain is obtained from the focal length information, and gain processing is performed on the current audio data in the video call using the obtained microphone gain. In this application, the focal length information used to process the video image in the video call is also used to determine the microphone gain, so that gain processing of the audio data is based on the distance between the person and the microphone during the video call. This reduces the fluctuation in the volume of the voice sent locally to the opposite end caused by changes in the distance between the person and the TV, keeps that volume basically unchanged, and ensures the stability of the sound during the video call.

Based on the audio adjustment method provided by the implementation of this application, an embodiment of this application also provides a video call method. FIG. 10 is a schematic flowchart of a video call method provided by an embodiment of the application.

As shown in FIG. 10, the video call method provided by the embodiment of the present application includes:

S201: The auxiliary chip transmits the video image processed by the automatic zoom to the main chip, and transmits the focal length information corresponding to the video image to the main chip.

In some embodiments, the auxiliary chip receives the initial video image generated from the local image collected by the camera and performs automatic zoom processing on the initial video image to generate a zoomed image. In some embodiments, since the acquisition and transmission of video images are continuous, the auxiliary chip determines whether the focal length information corresponding to the current image is the same as the focal length information corresponding to the image at the previous moment. When the two differ, the focal length information corresponding to the video image is transmitted to the main chip; when they are the same, either the focal length information is not transmitted to the main chip, or an identifier indicating that the focal length information remains unchanged is transmitted to the main chip. In this case, the focal length information is generated from the focal length information of the automatic zoom.

In some embodiments, the auxiliary chip can directly transmit the initial image collected by the camera to the main chip. The camera performs physical zoom processing while the initial image is collected, so it is sufficient to crop the initial image to generate the current image. In this case, the focal length information is generated from the physical focal length information of the camera.
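The change-detection behavior on the auxiliary chip can be sketched as follows. The class name `FocalLengthReporter`, the message dictionaries, and the `send_unchanged_marker` option are hypothetical; the text only states that the focal length information is sent when it changes and that otherwise either nothing or an "unchanged" identifier is sent.

```python
class FocalLengthReporter:
    """Decide what the auxiliary chip sends to the main chip for each frame."""

    def __init__(self, send_unchanged_marker=False):
        self._last = None  # focal length reported for the previous frame
        self._send_unchanged_marker = send_unchanged_marker

    def message_for(self, focal_length_mm):
        """Return the message to transmit, or None if nothing is sent."""
        if focal_length_mm != self._last:
            self._last = focal_length_mm
            return {"focal_length_mm": focal_length_mm}
        if self._send_unchanged_marker:
            return {"unchanged": True}
        return None
```

On the receiving side, the main chip would keep using the last gain it computed whenever it receives no focal length information or an "unchanged" identifier.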

In some embodiments, the auxiliary chip may perform automatic zoom processing on the initial image obtained by the physical zoom of the camera, and then generate a zoomed image and send it to the main chip. At this time, the focal length information is generated according to the physical focal length information of the camera and the focal length information of the automatic zoom.

In some embodiments, the auxiliary chip receives the initial video image generated from the local image collected by the camera, automatically zooms the initial video image according to the position of the human face or human body, and crops it to a preset size to generate a zoomed image, which is sent to the main chip. The camera can be a zoom camera or a fixed-focus camera. When the camera is a zoom camera, the initial video image is generated by controlling the focal length of the camera according to the position of the face or human body. In this case, the initial video image can be used directly as the zoomed image, or automatic zoom processing can be continued.

S202: The main chip receives the video image and the focal length information.

In some embodiments, the main chip recognizes the face or human body in the zoomed image and crops the image according to the relative position of the face or human body in the image to generate the current image to be sent to the opposite device.
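A minimal sketch of the cropping step is given below. It assumes that face detection has already produced the face center in pixel coordinates and that the preset crop size is no larger than the image; the function name `crop_box` and its parameters are hypothetical.

```python
def crop_box(face_center, image_size, crop_size):
    """Return (left, top, right, bottom) of a crop centered on the face.

    The box is shifted as needed so that it stays inside the image.
    Assumes crop_size fits within image_size.
    """
    cx, cy = face_center
    width, height = image_size
    crop_w, crop_h = crop_size
    left = min(max(cx - crop_w // 2, 0), width - crop_w)
    top = min(max(cy - crop_h // 2, 0), height - crop_h)
    return left, top, left + crop_w, top + crop_h
```

Because the crop has a fixed preset size, the face occupies a stable region of the transmitted image even when the person moves toward a corner of the camera's field of view.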

In some embodiments, since cropping does not change the focal length of the image, the focal length information of the current image is the focal length information of the image after zooming.

In some embodiments, a face or a human body is recognized on the initial image, and the image is cut according to the relative position of the face or the human body in the image to generate the current image to send to the opposite device.

S203: The main chip obtains the microphone gain according to the focal length information, and performs gain processing on the audio corresponding to the video image according to the microphone gain, so as to reduce the fluctuation of the audio volume sent locally to the peer.

S204: The main chip synchronizes the gain-processed audio with the video image, and transmits the synchronized audio and video to the opposite device.

In some embodiments, the synchronized audio and video are periodically encapsulated into data packets and sent to a peer display device, so that the peer display device parses and plays audio and video.

In some embodiments, the auxiliary chip collects a video image through the camera, obtains an automatically zoomed video image through automatic zoom processing during the acquisition, and outputs the focal length information corresponding to the automatically zoomed video image. The auxiliary chip transmits the automatically zoomed video image to the main chip and at the same time transmits the focal length information corresponding to the video image to the main chip. In the embodiments of the present application, the main chip and the auxiliary chip communicate through at least one of a network, a serial port, USB, and HDMI. Therefore, the auxiliary chip can transmit the video image and the corresponding focal length information to the main chip through the network, serial port, USB, or HDMI. The auxiliary chip can dynamically select any of these communication modes based on the stability of communication between the auxiliary chip and the main chip, which is not specifically limited here.

When the auxiliary chip transmits the video image and the focal length information corresponding to the video image to the main chip through the network, serial port, USB or HDMI, the main chip receives the video image and the focal length information corresponding to the video image.

In some embodiments, the video image processed by the automatic zoom is an image cropped in the image collected by the camera according to the focal length. In some embodiments, the video image processed by the automatic zoom is a face image cropped according to the tracking result of the automatic zoom.

In some embodiments, the main chip collects audio information through the microphone, where the audio information collected by the microphone is synchronized with the video image collected by the camera. When the main chip receives the video image and the corresponding focal length information transmitted by the auxiliary chip, it determines the microphone gain according to the focal length information corresponding to the video image, and then performs gain processing on the collected audio information using the determined microphone gain to obtain the gain-processed audio.

After performing gain processing on the collected audio information, the main chip synchronizes the gain-processed audio with the video image to obtain audio-video synchronized video call data, and transmits this data to be shown in the display frame at the opposite end, completing the video call.
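The text does not describe how the audio and video are synchronized. One common approach, shown here purely as an assumption, is to pair each video frame with the audio chunk whose capture timestamp is nearest, within a tolerance. The function name `pair_by_timestamp`, the millisecond timestamps, and the 20 ms tolerance are all hypothetical.

```python
def pair_by_timestamp(frames, audio_chunks, tolerance_ms=20):
    """Pair each frame with the nearest audio chunk by timestamp.

    `frames` and `audio_chunks` are lists of (timestamp_ms, payload).
    Frames with no audio chunk within the tolerance are left unpaired.
    """
    pairs = []
    for frame_ts, frame in frames:
        if not audio_chunks:
            break
        audio_ts, audio = min(audio_chunks, key=lambda c: abs(c[0] - frame_ts))
        if abs(audio_ts - frame_ts) <= tolerance_ms:
            pairs.append((frame_ts, frame, audio))
    return pairs
```

In this sketch the audio payloads would already be gain-processed, so each pair holds a video image together with audio adjusted for the focal length in effect at that moment.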

The video call method provided in the embodiments of this application realizes cooperation between the main chip and the auxiliary chip, and relieves the computing pressure that arises when a single chip must both support the instant communication functions of the video call (video encoding/decoding and transmission) and run real-time artificial intelligence algorithms (face recognition, lip shape recognition). At the same time, during a zoomable video call through a smart TV, the focal length information is obtained from the automatic zoom processing of the video image, the corresponding microphone gain is obtained from the focal length information, and gain processing is performed on the current audio data in the video call using the obtained microphone gain. Gain processing of the audio data is thus based on the distance between the person and the microphone during the video call, which ensures the stability of the sound; that is, during the video call the picture can zoom in real time to focus on the face while the sound remains stable and smooth in real time.

In some embodiments, the distinction between the main chip and the auxiliary chip may not be set, and all the corresponding operations are directly executed by the controller.

In order to facilitate the acquisition of microphone gain, in some embodiments, in the video call method provided, the main chip acquiring the microphone gain according to the focal length information includes:

The main chip obtains the preset correspondence between the microphone gain and the focal length information;

The preset correspondence between the focal length information and the microphone gain is searched according to the focal length information to obtain the microphone gain.

Or, in order to facilitate the acquisition of microphone gain, in the video call method provided in the embodiment of the present application, the main chip acquiring the microphone gain according to the focal length information includes:

The main chip acquires a preset function model of microphone gain and focal length information;

Acquire microphone gain according to the focal length information and a preset function model combining the microphone gain and focal length information.

For the steps of obtaining the microphone gain through the preset correspondence between the microphone gain and the focal length information, or through the preset function model of the microphone gain and the focal length information, refer to the audio adjustment method provided in the foregoing embodiments; they are not repeated here. The acquisition of the microphone gain in the video call method provided in the embodiments of the present application is not limited to the preset correspondence or the preset function model; an adaptive method may also be used to obtain the microphone gain according to the focal length information.

In some implementation manners, in the video call method provided by the embodiments of the present application, the transmitting the focal length information corresponding to the video image to the main chip includes:

Determining whether the focal length information corresponding to the video image is the same as the focal length information corresponding to the video image at the previous moment;

When the focal length information corresponding to the video image is different from the focal length information corresponding to the video image at the previous moment, the focal length information corresponding to the video image is transmitted to the main chip.

In this way, when the auxiliary chip transmits the automatically zoomed video image to the main chip, it compares the focal length information corresponding to the video image at the current moment with that at the previous moment to determine whether the focal length information has changed. Only when the focal length information at the current moment has changed compared with the previous moment is it transmitted to the main chip, which reduces the computational load on the main chip.

In some embodiments, a display device is also provided. The display device provided by the embodiment of the present application includes a display, and the display is configured to display a user interface;

A controller communicatively connected with the display, the controller being configured to perform the presentation of the user interface:

A main chip connected to the display and an auxiliary chip connected to the main chip through at least one of network, serial port, USB, and HDMI communication modes, wherein the main chip is configured to perform the audio adjustment method provided in the foregoing embodiments; or,

The main chip and the auxiliary chip are configured to cooperatively execute the video call method provided in the foregoing embodiment.

In some implementation manners, the present application provides an audio and video processing method, the method includes:

Receive the current image generated by the local image collected by the camera, and receive the audio generated by the local sound collected by the microphone; obtain the focal length information corresponding to the current image; obtain the microphone gain according to the focal length information and the preset correspondence, wherein the preset corresponds Different microphone gains in the relationship correspond to different focal length information; adjust the audio according to the acquired microphone gain value; and send the adjusted audio to the peer device of the video call.

In some embodiments of the present application, the obtaining the microphone gain according to the focal length information includes: obtaining a preset correspondence between the microphone gain and the focal length information; searching the preset correspondence according to the focal length information to obtain the microphone gain .

In some embodiments of the present application, the obtaining microphone gain according to the focal length information includes: obtaining a preset function model of microphone gain and focal length information; according to the focal length information and a preset combination of the microphone gain and focal length information Set up a function model to obtain microphone gain.

In some embodiments of the present application, after receiving the current image generated according to the local image collected by the camera, and receiving the audio generated according to the local sound collected by the microphone, the method further includes: sending the current image to the video caller The opposite device.

In some embodiments, before the obtaining the focal length information corresponding to the current image, the method further includes: determining whether it is currently in a video call state, and then performing the step of obtaining focal length information corresponding to the current image to process the audio; if If it is not in a video call state, the step of obtaining focal length information corresponding to the current image is not performed to process the audio.

In some embodiments of the present application, receiving the current image generated from the local image collected by the camera, and receiving the audio generated by collecting the local sound from the microphone; if it is in a video call state, adjust the audio according to the focal length information of the current image, And send the adjusted audio and the current image to the peer device of the video call; if it is in the recording state, there is no need to adjust the audio according to the focal length information of the current image, and generate according to the current image and the audio Video files.

In some embodiments of the present application, the adjusting the audio according to the focal length information of the current image, and sending the adjusted audio and the current image to the peer device of the video call includes: obtaining the focal length corresponding to the current image Information; obtain microphone gain according to the focal length information and a preset correspondence, wherein different microphone gains in the preset correspondence correspond to different focal length information; adjust the audio according to the acquired microphone gain value; after the adjustment The audio and the current image are sent to the peer device of the video call.

In some embodiments of the present application, generating a video file according to the current image and the audio includes: generating cache data by superimposing the current image and the audio; and, upon receiving an input operation instruction to save the video, generating the video file from the cache data.
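The cache-then-save behavior can be sketched as follows. The class `RecordingBuffer` and its length-prefixed file layout are hypothetical; the application does not describe a container format, and superimposing is modeled here simply as pairing each image with its audio.

```python
class RecordingBuffer:
    """Holds combined image/audio pairs until a save instruction arrives."""

    def __init__(self):
        self._entries = []

    def append(self, image, audio):
        # Cache one image together with its (unadjusted) audio.
        self._entries.append((bytes(image), bytes(audio)))

    def save(self, path):
        """Write the cached entries to path and return how many were written."""
        with open(path, "wb") as f:
            for image, audio in self._entries:
                # Hypothetical layout: 4-byte big-endian length before each payload.
                f.write(len(image).to_bytes(4, "big") + image)
                f.write(len(audio).to_bytes(4, "big") + audio)
        count = len(self._entries)
        self._entries.clear()
        return count
```

Nothing is written to storage until `save` is called, which mirrors generating the video file only after the save instruction is received.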

For the audio adjustment method and the video call method, refer to the above-mentioned embodiments. For other features of the display device provided in this embodiment of the present application, refer to the display device 200 or the other non-dual-chip display devices provided in the above-mentioned embodiments; they are not repeated here.

Based on the exemplary embodiments shown in this application, all other embodiments obtained by a person of ordinary skill in the art without creative work shall fall within the protection scope of the appended claims of this application. In addition, although the disclosure in this application is introduced in accordance with one or several exemplary examples, it should be understood that various aspects of these disclosures can also constitute a complete implementation separately.

It should be understood that the terms "first", "second", "third", etc. in the specification and claims of this application and in the above-mentioned drawings are used to distinguish similar objects or entities of the same kind, and are not necessarily used to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable under appropriate circumstances unless otherwise specified; for example, the embodiments can be implemented in an order other than that given in the illustration or description of the embodiments of the present application.

In addition, the terms "including" and "having" and any variations of them are intended to cover a non-exclusive inclusion. For example, a product or device that includes a series of components need not be limited to the components clearly listed, but may include other components that are not clearly listed or that are inherent to the product or device.

Claims

1. An audio and video processing method, characterized in that the method includes:

Receiving the current image generated from the local image collected by the camera, and receiving the audio generated from the local sound collected by the microphone;

Obtaining focal length information corresponding to the current image;

Obtaining a microphone gain according to the focal length information and a preset correspondence, wherein different microphone gains in the preset correspondence correspond to different focal length information;

Adjusting the audio according to the obtained microphone gain;

Sending the adjusted audio to the peer device of the video call.
2. The audio and video processing method according to claim 1, wherein the obtaining microphone gain according to the focal length information comprises:

Obtaining the preset correspondence between microphone gain and focal length information;

Looking up the microphone gain in the preset correspondence according to the focal length information.
3. The audio and video processing method according to claim 1, wherein the obtaining microphone gain according to the focal length information comprises:

Obtaining a preset function model relating microphone gain to focal length information;

Obtaining the microphone gain according to the focal length information and the preset function model.
4. The audio and video processing method according to claim 1, characterized in that, after receiving the current image generated from the local image collected by the camera and receiving the audio generated from the local sound collected by the microphone, the method further comprises: sending the current image to the peer device of the video call.
5. The audio and video processing method according to claim 1, characterized in that, before the obtaining focal length information corresponding to the current image, the method further comprises:

Determining whether the device is currently in a video call state, and if so, executing the step of obtaining focal length information corresponding to the current image in order to process the audio;

If it is not in a video call state, not executing the step of obtaining focal length information corresponding to the current image to process the audio.
6. An audio and video processing method, including:

Receiving the current image generated from the local image collected by the camera, and receiving the audio generated from the local sound collected by the microphone;

If it is in a video call state, adjusting the audio according to the focal length information of the current image, and sending the adjusted audio and the current image to the peer device of the video call; if it is in a recording state, not adjusting the audio according to the focal length information of the current image, and generating a video file according to the current image and the audio.
7. The audio and video processing method of claim 6, wherein adjusting the audio according to the focal length information of the current image and sending the adjusted audio and the current image to the peer device of the video call comprises:

Obtaining focal length information corresponding to the current image;

Obtaining a microphone gain according to the focal length information and a preset correspondence, wherein different microphone gains in the preset correspondence correspond to different focal length information;

Adjusting the audio according to the obtained microphone gain;

Sending the adjusted audio and the current image to the peer device of the video call.
8. The audio and video processing method of claim 6, wherein generating a video file according to the current image and the audio comprises:

Generating cache data by superimposing the current image and the audio;

Receiving an input operation instruction for saving the video, and generating a video file according to the cache data.
9. An audio and video processing method, characterized in that the method includes:

The auxiliary chip transmits the video image collected by the camera after the automatic zoom processing to the main chip, and transmits the focal length information corresponding to the video image to the main chip;

The main chip receives the video image and the focal length information;

The main chip obtains the microphone gain according to the focal length information, and performs gain processing on the audio corresponding to the video image according to the microphone gain, so as to reduce the fluctuation of the audio volume sent locally to the peer;

The main chip synchronizes the audio after gain processing with the video image, and transmits the synchronized audio and video to the display frame of the opposite end.
10. A display device, including:

A camera;

A microphone;

A controller configured to:

Receive the current image generated from the local image collected by the camera, and receive the audio generated from the local sound collected by the microphone;

Obtain focal length information corresponding to the current image;

Obtain a microphone gain according to the focal length information and a preset correspondence, wherein different microphone gains in the preset correspondence correspond to different focal length information;

Adjust the audio according to the obtained microphone gain;

Send the adjusted audio to the peer device of the video call.
11. The display device of claim 10, wherein the controller is further configured to:

Send the current image to the peer device of the video call.
12. The display device according to claim 10, wherein before obtaining the focal length information corresponding to the current image, the controller is further configured to:

Determine whether the display device is currently in a video call state, and if so, execute the step of obtaining focal length information corresponding to the current image in order to process the audio;

If it is not in a video call state, not execute the step of obtaining focal length information corresponding to the current image to process the audio.
13. A display device, characterized in that it includes:

A camera;

A microphone;

A controller configured to:

Receive the current image generated by the local image collected by the camera, and receive the audio generated by the local sound collected by the microphone;

If it is in a video call state, adjust the audio according to the focal length information of the current image, and send the adjusted audio and the current image to the peer device of the video call;

If it is in a non-video call state, there is no need to adjust the audio according to the focal length information of the current image, and an audio and video file is generated according to the current image and the audio.
14. A display device, characterized by comprising:

A camera;

A microphone;

A main chip and an auxiliary chip connected to each other;

The auxiliary chip receives the local image collected by the camera, transmits the local image to the main chip after automatic zoom processing, and transmits the focal length information corresponding to the current image to the main chip;

The main chip receives the current image and the focal length information, and generates the current image according to the local image after automatic zoom processing;

The main chip obtains the microphone gain according to the focal length information, and performs gain processing on the audio corresponding to the current image according to the microphone gain, so as to reduce the fluctuation of the audio volume sent locally to the peer;

The main chip synchronizes the audio after gain processing with the video image, and transmits the synchronized audio and video to the display device of the opposite end.
15. The display device of claim 14, wherein the transmitting the focal length information corresponding to the current image to the main chip comprises:

Judging whether the focal length information corresponding to the current image is the same as the focal length information corresponding to the current image at the previous moment;

Transmitting the focal length information corresponding to the current image to the main chip when the focal length information corresponding to the current image is different from the focal length information corresponding to the current image at the previous moment;

When the focal length information corresponding to the current image is the same as the focal length information corresponding to the current image at the previous moment, the focal length information is not transmitted to the main chip or an identifier indicating that the focal length information remains unchanged is transmitted to the main chip.
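As an illustration only, and not as part of the claims, the change-detection transmission between the auxiliary chip and the main chip can be sketched as follows. The classes `FocalLengthSender` and `GainTracker`, the message tuples, and the `send` and `gain_for` callables are hypothetical; the application does not define the inter-chip protocol.

```python
class FocalLengthSender:
    """Auxiliary-chip side: forward the focal length only when it changes."""

    def __init__(self, send, send_unchanged_marker=False):
        self._send = send  # hypothetical callable that delivers a message to the main chip
        self._send_unchanged_marker = send_unchanged_marker
        self._previous = None  # focal length of the image at the previous moment

    def on_image(self, focal_length):
        if focal_length != self._previous:
            self._previous = focal_length
            self._send(("focal_length", focal_length))
        elif self._send_unchanged_marker:
            # Alternative branch: send an identifier meaning "unchanged".
            self._send(("unchanged", None))


class GainTracker:
    """Main-chip side: reuse the last gain until a new focal length arrives."""

    def __init__(self, gain_for):
        self._gain_for = gain_for  # hypothetical callable: focal length -> gain
        self.gain = None

    def on_message(self, message):
        kind, value = message
        if kind == "focal_length":
            self.gain = self._gain_for(value)
        return self.gain
```

Skipping unchanged values means the main chip recomputes the gain only when the focal length actually changes; the optional marker covers the variant in which an identifier is transmitted instead of nothing.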