CN115550559A - Video picture display method, device, equipment and storage medium


Info

Publication number: CN115550559A (application CN202210384109.4A)
Granted publication: CN115550559B
Authority: CN (China)
Prior art keywords: camera, video, cameras, terminal, audio information
Legal status: Granted; Active
Other languages: Chinese (zh)
Inventors: Zhang Dong (张东), Li Xiaoxue (李晓雪)
Current and Original Assignee: Honor Device Co Ltd
Application filed by Honor Device Co Ltd; priority to CN202210384109.4A

Classifications

    • H04N (Pictorial communication, e.g. television)
    • H04N 21/431: Generation of visual interfaces for content selection or interaction; content or additional data rendering (under H04N 21/00, selective content distribution, e.g. interactive television or video on demand [VOD]; H04N 21/40, client devices; H04N 21/43, processing of content or additional data)
    • H04N 7/141: Systems for two-way working between two video terminals, e.g. videophone (under H04N 7/00, television systems; H04N 7/14, systems for two-way working)
    • H04N 7/181: Closed-circuit television [CCTV] systems for receiving images from a plurality of remote sources (under H04N 7/18, CCTV systems in which the video signal is not broadcast)

Abstract

The application discloses a video picture display method, apparatus, device and storage medium, belonging to the technical field of display. The method comprises the following steps: obtaining the video picture shot by each camera in a plurality of cameras and obtaining environmental audio information; then, according to the environmental audio information, determining a first camera from the plurality of cameras, a human voice sound source existing in the shooting area of the first camera; and finally, displaying the video picture shot by the first camera and the video picture shot by a second camera in a differentiated manner. According to the method and the device, the display mode of each of the multiple video pictures can be dynamically adjusted according to the audio information in the environment where each of the cameras is located, so that differentiated display of the video pictures is realized and whichever video picture contains a speaking person is highlighted. Therefore, the flexibility of video picture display can be improved, and users can watch the video pictures more conveniently.

Description

Video picture display method, device, equipment and storage medium
Technical Field
The present application relates to the field of display technologies, and in particular, to a method, an apparatus, a device, and a storage medium for displaying a video frame.
Background
With the development of terminal technology, terminals have gradually integrated communication, shooting, video and audio functions, and have become an indispensable part of daily life. At present, a terminal can be provided with a front camera and a rear camera, and can realize a dual-video recording function by using both of them. Specifically, when the terminal performs dual-video recording, the front camera and the rear camera can be started simultaneously to record, and the video picture shot by the front camera and the video picture shot by the rear camera can both be displayed in the recording interface.
Disclosure of Invention
The application provides a video picture display method, a video picture display device, video picture display equipment and a storage medium, which can improve the flexibility of video picture display. The technical scheme is as follows:
In a first aspect, a video picture display method is provided and applied to a terminal. In the method, the video picture shot by each camera in a plurality of cameras is obtained, and environmental audio information is obtained, wherein the environmental audio information comprises the audio information in the environment where each of the plurality of cameras is located. Then, according to the environmental audio information, a first camera is determined from the multiple cameras, a human voice sound source existing in the shooting area of the first camera. The video picture shot by the first camera and the video picture shot by a second camera are displayed in a differentiated manner, the second camera being a camera other than the first camera among the multiple cameras.
In this application, the terminal performing differentiated display on the video picture shot by the first camera and the video picture shot by the second camera means that the two video pictures are displayed in different display modes so as to highlight the video picture shot by the first camera. That is, a video picture captured by a camera that has a human voice sound source in its shooting area is displayed in a display mode different from the other video pictures, so that the video picture containing the speaking person is distinguished from the other video pictures. Therefore, if a person in a certain video picture speaks, that video picture and the other video pictures are displayed in a differentiated manner, which can improve the flexibility of video picture display and make it easier for users to watch the video pictures. In addition, whichever video picture contains a speaking person is highlighted, which can improve the user's interactive experience to a certain extent.
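As a rough illustration of this flow (not the claimed implementation), the following Java sketch separates the two steps: deciding which cameras are "first cameras" and then rendering their pictures differently. All type and method names here (CameraFeed, Display, hasVoiceSourceInShootingArea and so on) are assumptions made for the example, not terms defined by this application.

import java.util.ArrayList;
import java.util.List;

public class DifferentiatedDisplaySketch {

    interface CameraFeed {
        byte[] latestFrame();                                        // most recent video picture from this camera
        boolean hasVoiceSourceInShootingArea(List<short[]> audio);   // the "first camera" test from the method above
    }

    interface Display {
        void showHighlighted(byte[] frame);   // e.g. enlarged, or the picture-in-picture main picture
        void showNormal(byte[] frame);        // e.g. reduced, or a picture-in-picture sub picture
    }

    // One refresh pass: highlight the pictures whose camera covers a speaking person.
    static void refresh(List<CameraFeed> cameras, List<short[]> ambientAudio, Display display) {
        List<CameraFeed> firstCameras = new ArrayList<>();
        for (CameraFeed camera : cameras) {
            if (camera.hasVoiceSourceInShootingArea(ambientAudio)) {
                firstCameras.add(camera);     // a human voice sound source lies in this camera's shooting area
            }
        }
        for (CameraFeed camera : cameras) {
            if (firstCameras.contains(camera)) {
                display.showHighlighted(camera.latestFrame());
            } else {
                display.showNormal(camera.latestFrame());
            }
        }
    }
}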
Optionally, the multiple cameras are n groups of cameras, different groups of cameras are arranged in different devices, the same group of cameras are arranged in the same device, and n is a positive integer. In this case, the operation of acquiring the environmental audio information may be: acquiring n pieces of audio information, wherein the n pieces of audio information are the environmental audio information, the n pieces of audio information correspond to the n groups of cameras one to one, and each piece of audio information in the n pieces of audio information is the audio information in the environment where the corresponding group of cameras are located.
In this application, the terminal is capable of acquiring the video picture taken by the camera of each of the n devices. In this case, the terminal can acquire the audio information in the environment where each of the n devices is located, and can therefore determine, from the audio information, which device's camera may have captured a speaking person in its video picture.
Optionally, according to the environmental audio information, the operation of determining the first camera from the plurality of cameras may be: at least one target audio information is determined from the n audio information, the target audio information is audio information with human voice, and then a first camera is determined from a group of cameras corresponding to each target audio information in the at least one target audio information.
The determining of the first camera from the group of cameras corresponding to each target audio information in the at least one target audio information may be: for any target audio information in the at least one target audio information, if a group of cameras corresponding to the target audio information comprises j cameras, positioning a human voice sound source according to the target audio information to obtain the direction of the human voice sound source, wherein j is an integer greater than or equal to 2, and the shooting directions of the j cameras are different; and determining a first camera from the j cameras according to the direction of the human voice sound source and the shooting direction of each camera in the j cameras.
In this application, the direction in which the human voice sound source is located can be analyzed according to the target audio information, and then, by combining the direction of the human voice sound source with the shooting direction of the camera, it can be analyzed whether the human voice in the environment where the camera is located comes from the camera's shooting direction. In this way, it can be determined whether a human voice sound source exists in the shooting area of the camera, that is, whether the camera is the first camera, which improves the accuracy of determining the first camera.
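The comparison between the located voice direction and a camera's shooting direction could, for example, be a simple angular test, as in the hedged sketch below; the half field-of-view window is an assumption for illustration, since this application does not fix a particular comparison rule.

public class DirectionMatchSketch {

    // True if the located human voice source falls within the camera's shooting direction.
    static boolean voiceSourceInShootingDirection(double voiceSourceDeg,
                                                  double cameraDirectionDeg,
                                                  double halfFovDeg) {
        double diff = Math.abs(voiceSourceDeg - cameraDirectionDeg) % 360.0;
        double angularDistance = Math.min(diff, 360.0 - diff);   // shortest angular distance between the two directions
        return angularDistance <= halfFovDeg;
    }

    public static void main(String[] args) {
        // Voice source located at 170 degrees; a rear camera facing 180 degrees with an assumed 40-degree half FOV covers it.
        System.out.println(voiceSourceInShootingDirection(170.0, 180.0, 40.0));   // true
        // A front camera facing 0 degrees does not cover the same source.
        System.out.println(voiceSourceInShootingDirection(170.0, 0.0, 40.0));     // false
    }
}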
Optionally, the operation of performing differentiated display on the video picture shot by the first camera and the video picture shot by the second camera may be: the video picture shot by the first camera is displayed enlarged and the video picture shot by the second camera is displayed reduced, so that the display area of the first camera's video picture is larger than that of the second camera's video picture and the first camera's video picture is highlighted. Alternatively, the video picture shot by the first camera is used as the main picture of a picture-in-picture mode and the video picture shot by the second camera is used as a sub picture, and the two are displayed in picture-in-picture mode, in which the sub pictures are displayed over a small area of the main picture while the main picture is displayed full screen, so that the main picture is highlighted.
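A possible picture-in-picture layout computation is sketched below; the concrete sizes and margins are illustrative assumptions, since the application only requires that the first camera's picture be visually emphasized.

import java.util.ArrayList;
import java.util.List;

public class PictureInPictureLayoutSketch {

    record Rect(int x, int y, int width, int height) {}

    // Picture-in-picture: the main picture (first camera) fills the screen, and each sub picture
    // (second camera) occupies a small region stacked along the right edge over the main picture.
    static List<Rect> layout(int screenWidth, int screenHeight, int subPictureCount) {
        List<Rect> rects = new ArrayList<>();
        rects.add(new Rect(0, 0, screenWidth, screenHeight));   // main picture, full screen
        int subWidth = screenWidth / 4;                         // assumed sub-picture size: one quarter of the screen
        int subHeight = screenHeight / 4;
        int margin = 16;
        for (int i = 0; i < subPictureCount; i++) {
            int x = screenWidth - subWidth - margin;
            int y = margin + i * (subHeight + margin);
            rects.add(new Rect(x, y, subWidth, subHeight));     // sub picture over a small area of the main picture
        }
        return rects;
    }
}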
Optionally, the multiple cameras are all disposed on the terminal, and shooting directions of the multiple cameras are different. In this case, before the video image captured by each of the plurality of cameras is acquired, the plurality of cameras may be started after receiving the multi-camera instruction. In this case, the operation of acquiring the environmental audio information may be: the audio information collected by a plurality of microphones of the terminal is obtained, the audio information collected by the plurality of microphones is the environment audio information, and the plurality of microphones are arranged at different positions of the terminal.
In this application, in the multi-camera shooting process, when the video picture shot by each of the terminal's multiple cameras is displayed in the multi-camera shooting interface, the video picture shot by the first camera among the multiple cameras can be highlighted. That is, a video picture shot by a camera that has a human voice sound source in its shooting area is displayed in a display mode different from the other video pictures, so that the video picture of the speaking person is distinguished from the other video pictures. In this way, whichever video picture contains a speaking person is highlighted, which improves the flexibility of video picture display, makes it easier for users to watch the video pictures, and improves the interactive experience to a certain extent.
Optionally, the plurality of cameras are at least two groups of cameras, one of which is disposed on the terminal and the other at least one group of which is disposed on at least one cooperative device in a multi-screen cooperative state with the terminal. In this case, before the video picture shot by each of the multiple cameras is acquired, the camera of the terminal may be started after a cooperative video recording instruction is received, and each of the at least one cooperative device is instructed to start its own camera. The operation of acquiring the video picture taken by each of the plurality of cameras may then be: acquiring the video picture shot by the camera of the terminal, and receiving the video picture shot by the camera of each cooperative device in the at least one cooperative device. The operation of acquiring the environmental audio information may be: acquiring the audio information collected by the microphone of the terminal, and receiving the audio information, collected by its own microphone, sent by each cooperative device in the at least one cooperative device; the audio information collected by the microphone of the terminal and the audio information collected by the microphone of each cooperative device together constitute the environmental audio information.
In this application, in the collaborative video recording process, when the terminal displays, in the collaborative recording interface, the video pictures shot by its own camera and by the camera of each of the at least one cooperative device, the video picture shot by the first camera among these cameras can be highlighted. That is, a video picture shot by a camera that has a human voice sound source in its shooting area is displayed in a display mode different from the other video pictures, so that the video picture containing the speaking person is distinguished from the other video pictures. In this way, whichever video picture contains a speaking person is highlighted, which improves the flexibility of video picture display, makes it easier for users to watch the video pictures, and improves the interactive experience to a certain extent.
Optionally, the multiple cameras are all disposed on the terminal, and shooting directions of the multiple cameras are different. In this case, before the video picture captured by each of the plurality of cameras is obtained, the plurality of cameras may be started after a video call instruction is received, where the video call instruction is used to instruct a call to be made with the far-end call device. In this case, the operation of acquiring the environmental audio information may be: the method comprises the steps of obtaining audio information collected by a plurality of microphones of the terminal, wherein the audio information collected by the plurality of microphones is the environment audio information, and the plurality of microphones are arranged at different positions of the terminal. The operation of performing differentiated display on the video image shot by the first camera and the video image shot by the second camera may be: and displaying the video picture shot by the first camera on the video call interface, not displaying the video picture shot by the second camera, and sending the video picture shot by the first camera to the far-end call equipment for displaying.
In this application, during a video call, the terminal displays the video picture shot by its first camera in its own video call interface, does not display the video picture shot by its second camera, and sends the video picture shot by the first camera to the far-end call device for display. That is, the video picture shot by the camera, among the terminal's multiple cameras, that has a human voice sound source in its shooting area is displayed on the video call interfaces of both the terminal and the far-end call device, so that the picture containing the speaking person is the one shown on both interfaces. Therefore, when a person in the video picture shot by one of the terminal's cameras speaks, that camera's video picture is displayed on the video call interfaces of the terminal and the far-end call device, which improves the flexibility of video picture display, makes it easier for users to watch the video picture, and improves the interactive experience to a certain extent.
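The routing in the video call case can be summarized by the following sketch; LocalCallScreen and RemoteCallDevice are hypothetical stand-ins for the terminal's call interface and the transmission path to the far-end call device, and the actual signalling and encoding path is outside the scope of this illustration.

public class VideoCallRoutingSketch {

    interface LocalCallScreen { void show(byte[] frame); }     // the terminal's video call interface
    interface RemoteCallDevice { void send(byte[] frame); }    // transmission path to the far-end call device

    // Only the first camera's picture is shown locally and sent to the far end;
    // the second camera's picture is intentionally neither displayed nor sent.
    static void route(byte[] firstCameraFrame, byte[] secondCameraFrame,
                      LocalCallScreen screen, RemoteCallDevice remote) {
        screen.show(firstCameraFrame);
        remote.send(firstCameraFrame);
    }
}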
Furthermore, the terminal can not only adjust the display of the video pictures but also generate video files corresponding to the video pictures shot by the cameras. A video file contains video data and audio data; the video file format places the video data and the audio data in one file so that they can be played back together, realizing video playback. The terminal can generate the video files corresponding to the video pictures shot by the cameras in the following two possible ways.
A first possible way: in the shooting process of the n groups of cameras, for any one group of cameras in the n groups, multi-channel audio data are generated according to the audio information corresponding to that group of cameras, and channel separation is performed on the multi-channel audio data to obtain the audio data of each camera in the group, where the channels of the audio data of different cameras in the group are different. Then, for any one camera in the n groups of cameras, a video file corresponding to that camera is generated from the video data of the video picture shot by that camera and the audio data of that camera. Thus, when the shooting of the n groups of cameras ends, the terminal outputs one video file for each camera in the n groups, that is, multiple video files.
A second possible way: in the shooting process of the n groups of cameras, for any one group of cameras in the n groups, multi-channel audio data are generated according to the audio information corresponding to that group of cameras, and channel separation is performed on the multi-channel audio data to obtain the audio data of each camera in the group, where the channels of the audio data of different cameras in the group are different. If a video picture shot by the first camera exists among all the currently displayed video pictures, a video file is generated from the video data of all the currently displayed video pictures and the audio data of the first camera. If no video picture shot by the first camera exists among all the currently displayed video pictures, an audio mixing operation is performed on the audio data of the cameras to which all the currently displayed video pictures belong to obtain mixed audio data, and a video file is generated from the video data of all the currently displayed video pictures and the mixed audio data. Thus, when the shooting of the n groups of cameras ends, the terminal outputs one video file. The video data in the video file are the video data of the fused picture of all the video pictures displayed by the terminal, and the audio data in the video file are the audio data of the highlighted video picture or the mixed audio data of all the jointly displayed video pictures.
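The two output modes can be summarized by the sketch below. CameraTrack, mix() and mux() are hypothetical helpers; a real implementation would rely on the platform's codec and container APIs (for example a media muxer) rather than the placeholder shown here.

import java.util.ArrayList;
import java.util.List;

public class VideoFileGenerationSketch {

    record CameraTrack(String cameraId, byte[] videoData, short[] audioData) {}

    // First way: one video file per camera, each muxed with that camera's channel-separated audio.
    static List<byte[]> perCameraFiles(List<CameraTrack> tracks) {
        List<byte[]> files = new ArrayList<>();
        for (CameraTrack track : tracks) {
            files.add(mux(track.videoData(), track.audioData()));
        }
        return files;
    }

    // Second way: a single file of the fused picture; the audio is the first camera's audio if a
    // highlighted picture exists, otherwise a mix of the audio of all currently displayed cameras.
    static byte[] singleFile(byte[] fusedPictureVideo, List<CameraTrack> displayed, CameraTrack firstCamera) {
        short[] audio = (firstCamera != null) ? firstCamera.audioData() : mix(displayed);
        return mux(fusedPictureVideo, audio);
    }

    // Naive sample-wise average, clamped to the 16-bit range.
    static short[] mix(List<CameraTrack> tracks) {
        int length = tracks.stream().mapToInt(t -> t.audioData().length).min().orElse(0);
        short[] mixed = new short[length];
        for (int i = 0; i < length; i++) {
            int sum = 0;
            for (CameraTrack track : tracks) {
                sum += track.audioData()[i];
            }
            int average = sum / Math.max(1, tracks.size());
            mixed[i] = (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, average));
        }
        return mixed;
    }

    // Placeholder for container muxing; a real implementation would write an MP4 or similar.
    static byte[] mux(byte[] videoData, short[] audioData) {
        return videoData;
    }
}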
In a second aspect, there is provided a video screen display apparatus having a function of realizing the behavior of the video screen display method in the first aspect described above. The video picture display device comprises at least one module, and the at least one module is used for realizing the video picture display method provided by the first aspect.
In a third aspect, a video screen display device is provided, which comprises a processor and a memory, wherein the memory is used for storing a program for supporting the video screen display device to execute the video screen display method provided by the first aspect, and storing data for realizing the video screen display method provided by the first aspect. The processor is configured to execute programs stored in the memory. The video picture display apparatus may further include a communication bus for establishing a connection between the processor and the memory.
In a fourth aspect, there is provided a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to execute the video picture display method of the first aspect described above.
In a fifth aspect, there is provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the video picture display method of the first aspect described above.
The technical effects obtained by the second, third, fourth and fifth aspects are similar to the technical effects obtained by the corresponding technical means in the first aspect, and are not described herein again.
Drawings
Fig. 1 is a schematic structural diagram of a terminal according to an embodiment of the present application;
fig. 2 is a block diagram of a software system of a terminal according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a first video frame according to an embodiment of the present application;
fig. 4 is a flowchart of a first video frame display method according to an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating a second video frame according to an embodiment of the present application;
fig. 6 is a flowchart of a second video frame display method according to an embodiment of the present application;
FIG. 7 is a diagram illustrating a display of a third video frame according to an embodiment of the present application;
FIG. 8 is a diagram illustrating a fourth video frame according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a first software system provided in an embodiment of the present application;
fig. 10 is a flowchart of a video frame display process provided by an embodiment of the present application;
FIG. 11 is a schematic diagram illustrating a fifth video frame according to an embodiment of the present application;
FIG. 12 is a flowchart illustrating a third method for displaying video frames according to an embodiment of the present disclosure;
fig. 13 is a schematic diagram illustrating a sixth video frame according to an embodiment of the present application;
fig. 14 is a schematic display diagram of a seventh video frame provided in the embodiment of the present application;
FIG. 15 is a schematic diagram of a second software system provided by an embodiment of the present application;
fig. 16 is a schematic view of a display of an eighth video frame according to an embodiment of the present application;
fig. 17 is a flowchart of a fourth video frame display method according to an embodiment of the present application;
fig. 18 is a schematic display diagram of a ninth video frame provided in the embodiment of the present application;
fig. 19 is a schematic display diagram of a tenth video frame provided in the embodiment of the present application;
fig. 20 is a flowchart of a fifth video frame display method according to an embodiment of the present application;
fig. 21 is a schematic display diagram of an eleventh video frame provided in the embodiment of the present application;
fig. 22 is a schematic display diagram of a twelfth video frame according to an embodiment of the present application;
fig. 23 is a flowchart of a sixth video frame display method according to an embodiment of the present application;
fig. 24 is a schematic diagram illustrating a display of a thirteenth video frame according to an embodiment of the present application;
fig. 25 is a schematic diagram illustrating a display of a fourteenth video frame according to an embodiment of the present application;
FIG. 26 is a schematic diagram of a third software system provided in an embodiment of the present application;
fig. 27 is a schematic diagram of a video file provided in an embodiment of the present application;
fig. 28 is a schematic structural diagram of a video screen display device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
It should be understood that reference to "a plurality" in this application means two or more. In the description of the present application, "/" means "or" unless otherwise stated; for example, A/B may mean A or B. "And/or" herein only describes an association relationship between associated objects and means that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, for the convenience of clearly describing the technical solutions of the present application, the terms "first", "second", and the like are used to distinguish between identical or similar items having substantially the same functions and effects. Those skilled in the art will appreciate that the terms "first", "second", and the like do not limit quantity or execution order, nor do they denote any difference in importance.
The statement "one embodiment", "some embodiments", or the like described in this application means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment", "in some embodiments", "in other embodiments", and the like in various places throughout this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments" unless specifically stated otherwise. Furthermore, the terms "comprising", "including", "having", and variations thereof mean "including but not limited to", unless expressly specified otherwise.
The following describes a terminal according to an embodiment of the present application.
Fig. 1 is a schematic structural diagram of a terminal according to an embodiment of the present application. Referring to fig. 1, the terminal 100 may include a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a Subscriber Identity Module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It is to be understood that the illustrated structure of the embodiment of the present application does not constitute a specific limitation to the terminal 100. In other embodiments of the present application, terminal 100 may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 110 may include one or more processing units, such as: the processor 110 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a memory, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), etc. Wherein, the different processing units may be independent devices or may be integrated in one or more processors.
The controller may be, among other things, a neural center and a command center of the terminal 100. The controller can generate an operation control signal according to the instruction operation code and the timing signal to complete the control of instruction fetching and instruction execution.
A memory may also be provided in processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Avoiding repeated accesses reduces the latency of the processor 110, thereby increasing the efficiency of the system.
The charging management module 140 is configured to receive a charging input from a charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive charging input from a wired charger via the USB interface 130. In some wireless charging embodiments, the charging management module 140 may receive a wireless charging input through a wireless charging coil of the terminal 100. The charging management module 140 may also supply power to the terminal 100 through the power management module 141 while charging the battery 142.
The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140, and supplies power to the processor 110, the internal memory 121, the external memory, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be used to monitor parameters such as battery capacity, battery cycle count, battery state of health (leakage, impedance), etc. In some other embodiments, the power management module 141 may also be disposed in the processor 110. In other embodiments, the power management module 141 and the charging management module 140 may be disposed in the same device.
The wireless communication function of the terminal 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The mobile communication module 150 may provide a solution including wireless communication of 2G/3G/4G/5G, etc. applied on the terminal 100. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a Low Noise Amplifier (LNA), and the like. The mobile communication module 150 may receive the electromagnetic wave from the antenna 1, filter, amplify, etc. the received electromagnetic wave, and transmit the electromagnetic wave to the modem processor for demodulation. The mobile communication module 150 may also amplify the signal modulated by the modem processor, and convert the signal into electromagnetic wave through the antenna 1 to radiate the electromagnetic wave. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the same device as at least some of the modules of the processor 110.
The wireless communication module 160 may provide solutions for wireless communication applied to the terminal 100, including Wireless Local Area Networks (WLANs) (e.g., wireless fidelity (Wi-Fi) networks), bluetooth (BT), global Navigation Satellite System (GNSS), frequency Modulation (FM), near Field Communication (NFC), infrared (IR), and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering processing on electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, perform frequency modulation and amplification on the signal, and convert the signal into electromagnetic waves through the antenna 2 to radiate the electromagnetic waves.
The terminal 100 implements a display function through the GPU, the display screen 194, and the application processor, etc. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
The terminal 100 may implement a photographing function through the ISP, the camera 193, the video codec, the GPU, the display screen 194, and the application processor, etc.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the memory capability of the terminal 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. Such as saving files of music, video, etc. in an external memory card.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The processor 110 executes various functional applications of the terminal 100 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required by at least one function, and the like. The storage data area may store data (e.g., audio data, a phonebook, etc.) created during use of the terminal 100, and the like. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (UFS), and the like.
The terminal 100 may implement audio functions, such as playing music, recording, etc., through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor.
The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.
The SIM card interface 195 is used to connect a SIM card. The SIM card can be attached to and detached from the terminal 100 by being inserted into the SIM card interface 195 or being pulled out of the SIM card interface 195. The terminal 100 may support 1 or N SIM card interfaces, where N is an integer greater than 1. The SIM card interface 195 may support a Nano SIM card, a Micro SIM card, a SIM card, etc. The same SIM card interface 195 can be inserted with multiple cards at the same time. The types of the plurality of cards may be the same or different. The SIM card interface 195 is also compatible with different types of SIM cards. The SIM card interface 195 may also be compatible with external memory cards. The terminal 100 interacts with the network through the SIM card to implement functions such as communication and data communication. In some embodiments, the terminal 100 employs esims, namely: an embedded SIM card. The eSIM card can be embedded in the terminal 100 and cannot be separated from the terminal 100.
Next, a software system of the terminal 100 will be explained.
The software system of the terminal 100 may adopt a hierarchical architecture, an event-driven architecture, a micro-core architecture, a micro-service architecture, or a cloud architecture. In the embodiment of the present application, an Android (Android) system with a layered architecture is taken as an example to exemplarily explain a software system of the terminal 100.
Fig. 2 is a block diagram of a software system of the terminal 100 according to an embodiment of the present disclosure. Referring to fig. 2, the layered architecture divides the software into several layers, each layer having a clear role and division of labor. The layers communicate with each other through a software interface. In some embodiments, the Android system is divided from top to bottom into an Application layer (Application), an Application Framework layer (Framework), an Android runtime (Android runtime) and system layer, an extension layer, and a Kernel layer (Kernel).
The application layer may include a series of application packages. As shown in fig. 2, the application package may include applications such as camera, gallery, calendar, call, map, instant messaging, WLAN, multi-screen collaboration, bluetooth, short message, etc.
The application framework layer provides an Application Programming Interface (API) and a programming framework for the application program of the application layer. The application framework layer includes a number of predefined functions. As shown in fig. 2, the application Framework layer may include an Audio Framework (Audio Framework), a Camera Framework (Camera Framework), a window manager, a content provider, a view system, a phone manager, a resource manager, a notification manager, and the like.
The audio framework comprises an AudioTrack, an AudioRecord and an AudioSystem, and the AudioTrack, the AudioRecord and the AudioSystem are all Android application program framework API classes, wherein the AudioTrack is responsible for outputting playback data, the AudioRecord is responsible for acquiring recording data, and the AudioSystem is responsible for comprehensively managing audio transactions. The camera framework is used for providing a functional interface for the upper application. The window manager is used for managing window programs. The window manager can obtain the size of the display screen, judge whether a status bar exists, lock the screen, intercept the screen and the like. The content provider is used to store and retrieve data, which may include video, images, audio, calls made and received, browsing history and bookmarks, phone books, etc., and makes the data accessible to applications. The view system includes visual controls such as controls to display text, controls to display pictures, and the like. The view system can be used for constructing a display interface of an application program, and the display interface can be composed of one or more views, such as a view for displaying a short message notification icon, a view for displaying characters and a view for displaying pictures. The phone manager is used to provide communication functions of the terminal 100, such as management of call states (including connection, disconnection, etc.). The resource manager provides various resources, such as localized strings, icons, pictures, layout files, video files, etc., to the application. The notification manager enables the application to display notification information in the status bar, can be used to convey notification-type messages, can disappear automatically after a brief dwell, and does not require user interaction. For example, a notification manager is used to notify download completion, message alerts, and the like. The notification manager may also be a notification that appears in the form of a chart or scrollbar text in a status bar at the top of the system, such as a notification of a background running application. The notification manager may also be a notification that appears on the screen in the form of a dialog window, such as prompting a text message in a status bar, sounding a prompt tone, vibrating the electronic device, flashing an indicator light, etc.
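For context, a minimal example of capturing recording data with the AudioRecord framework class mentioned above is shown below; the sample rate, channel configuration and buffer handling are illustrative, and permission checks and error handling are omitted.

import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;

public class AudioRecordExample {

    // Capture one buffer of recording data from the microphone.
    public short[] recordOnce() {
        int sampleRate = 48000;
        int channelConfig = AudioFormat.CHANNEL_IN_STEREO;
        int audioFormat = AudioFormat.ENCODING_PCM_16BIT;
        int bufferSize = AudioRecord.getMinBufferSize(sampleRate, channelConfig, audioFormat);

        AudioRecord recorder = new AudioRecord(MediaRecorder.AudioSource.MIC,
                sampleRate, channelConfig, audioFormat, bufferSize);
        short[] pcm = new short[bufferSize];
        recorder.startRecording();
        recorder.read(pcm, 0, pcm.length);
        recorder.stop();
        recorder.release();
        return pcm;
    }
}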
The Android Runtime comprises a core library and a virtual machine, and is responsible for scheduling and managing the Android system. The core library comprises two parts: one part is the functions that the java language needs to call, and the other part is the core library of Android. The application layer and the application framework layer run in the virtual machine. The virtual machine executes the java files of the application layer and the application framework layer as binary files, and is used to perform functions such as object life-cycle management, stack management, thread management, security and exception management, and garbage collection.
The system layer may include a plurality of functional modules, such as: Audio Services, Camera Services, a surface manager, Media Libraries, three-dimensional graphics processing libraries (e.g., OpenGL ES), and two-dimensional graphics engines (e.g., SGL). The audio service includes AudioPolicyService and AudioFlinger, where AudioPolicyService is the audio policy maker, responsible for policy decisions such as the switching of audio devices and volume adjustment policies, and AudioFlinger is the audio policy executor, responsible for managing the input and output stream devices and for processing and transmitting audio stream data. The camera service is used to interact with the camera hardware abstraction layer. The surface manager is used to manage the display subsystem and provides fusion of 2D and 3D layers for multiple applications. The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files. The media library may support a variety of audio and video encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG. The three-dimensional graphics processing library is used to implement three-dimensional graphics drawing, image rendering, composition, layer processing, and the like. The two-dimensional graphics engine is an engine for 2D drawing.
The extension layer may also be referred to as a Hardware Abstraction Layer (HAL), which may implement encapsulation of the kernel driver, provide an interface upwards, and shield implementation details of lower layers. The expansion layer is upwards connected with Android Runtime and Framework, and is downwards connected with a driver. The extension layers may include an Audio hardware abstraction layer (Audio HAL) responsible for interaction with Audio hardware devices and a Camera hardware abstraction layer (Camera HAL) responsible for interaction with Camera hardware devices.
The kernel layer is a layer between hardware and software. The core layer may include a display driver, a camera driver, an audio driver, a sensor driver, and the like.
Before explaining the embodiments of the present application in detail, application scenarios related to the embodiments of the present application will be described.
Currently, as shown in fig. 3, in many shooting scenes, a terminal such as a mobile phone, a tablet computer, a notebook computer, a desktop computer, a management device, etc. can display a video picture 31 shot by each of a plurality of cameras, that is, the terminal can display a plurality of video pictures 31. Several possible shooting scenarios are described below.
In a first shooting scenario, the mobile phone has a front camera and a rear camera. In a double-camera shooting scene, the mobile phone can start a front camera and a rear camera of the mobile phone, and then can display a video picture shot by the front camera and a video picture shot by the rear camera on a video recording interface (also called a video preview interface).
In the second shooting scenario, the mobile phone and the tablet computer are in a multi-screen cooperative state, the mobile phone has a camera (a front camera and/or a rear camera), and the tablet computer also has a camera (a front camera and/or a rear camera). In a multi-screen collaborative video recording scene, the mobile phone may start its own camera and instruct the tablet computer to start the camera of the tablet computer, and then the mobile phone may display a video picture shot by its own camera and a video picture shot by its camera and sent by the tablet computer on a video recording interface (also referred to as a video preview interface).
In a third shooting scenario, the mobile phone performs a video call with a far-end call device; the mobile phone has a camera (a front camera and/or a rear camera), and the far-end call device also has a camera (a front camera and/or a rear camera). In a video call scene, the mobile phone can start its own camera, the far-end call device can also start its own camera, and the mobile phone can then display, on the video call interface, the video picture shot by its own camera and the video picture shot by the far-end call device's camera and sent by the far-end call device.
In a fourth shooting scenario, the notebook computer holds a video conference with at least one other device (which may be called a participating device); the notebook computer has a camera (a front camera and/or a rear camera), and each of the at least one other participating device also has a camera (a front camera and/or a rear camera). In a video conference scene, the notebook computer can start its own camera, each of the other participating devices can also start its own camera, and the notebook computer can then display, in the video conference interface, the video picture shot by its own camera and the video picture shot by the camera of each of the other participating devices.
In a fifth shooting scenario, a management device monitors a plurality of different areas through a plurality of monitoring devices installed in different areas, each of the plurality of monitoring devices having a camera. In the centralized monitoring scene, the management device may instruct each of the multiple monitoring devices to start a camera, and then the management device may display, on the monitoring interface, a video picture sent by each of the multiple monitoring devices and captured by its camera.
In the above shooting scenes, the terminal can display multiple video pictures shot by multiple cameras. In order to improve the display effect of these video pictures, an embodiment of the present application provides a video picture display method that can dynamically adjust the display modes of the multiple video pictures according to the audio information in the environment where each of the multiple cameras is located, so as to realize differentiated display of the video pictures: whichever video picture contains a speaking person is highlighted. Therefore, the flexibility of video picture display can be improved, users can watch the video pictures more conveniently, and the interactive experience of users can be improved to a certain extent.
The following explains the video screen display method provided in the embodiments of the present application in detail.
Fig. 4 is a flowchart of a video frame display method according to an embodiment of the present application. The method is applied to a terminal, which may be the terminal described above in the embodiments of fig. 1-2. Referring to fig. 4, the method includes the steps of:
step 401: the terminal acquires a video picture shot by each camera in the plurality of cameras and acquires environment audio information, wherein the environment audio information comprises audio information in the environment where each camera in the plurality of cameras is located.
The plurality of cameras are cameras by which the terminal can acquire video pictures shot by the terminal. The terminal can display the video pictures shot by each camera in the plurality of cameras.
The multiple cameras may all be cameras of the terminal itself, in which case the terminal displays the video pictures shot by its own cameras. Alternatively, some of the multiple cameras may be cameras of the terminal itself and the others may be cameras of other devices, in which case the terminal can display both the video pictures shot by its own cameras and the video pictures shot by the cameras of other devices communicating with the terminal. Alternatively, the multiple cameras may all be cameras of other devices, in which case the terminal displays the video pictures, shot by their cameras, that are sent by the other devices communicating with the terminal.
The audio information in the environment where each of the multiple cameras is located refers to the audio information in the environment of the device where that camera is located, and can be collected by that device. That is, when the multiple cameras are cameras of the terminal itself, the environmental audio information may be the audio information, collected by the terminal, in the environment where the terminal is located. When some of the multiple cameras are cameras of the terminal and the others are cameras of other devices, the environmental audio information may include the audio information collected by the terminal in its own environment and the audio information collected by the other devices in their environments. When the multiple cameras are all cameras of other devices, the environmental audio information may include the audio information collected by those devices in their respective environments.
Optionally, the multiple cameras are n groups of cameras, where n is a positive integer. Cameras of different groups are arranged on different devices, and cameras of the same group are arranged on the same device. That is, the n groups of cameras are arranged on n different devices in one-to-one correspondence, with one group per device. The environmental audio information may therefore comprise the audio information in the environment where each of the n devices is located, i.e. n pieces of audio information.
In this case, the operation of step 401 may be: the terminal acquires n pieces of audio information, the n pieces of audio information are the environmental audio information, the n pieces of audio information correspond to the n groups of cameras one to one, and each piece of audio information in the n pieces of audio information is the audio information in the environment where the corresponding group of cameras are located.
Specifically, for any one of the n groups of cameras, if the device in which the group of cameras is located is the terminal, that is, if the group of cameras is located in the terminal, the terminal may collect audio information in an environment in which the terminal is located as audio information corresponding to the group of cameras. For example, the terminal may collect audio information through a microphone of the terminal, and use the collected audio information as one audio information corresponding to the group of cameras.
If the device where the group of cameras is located is not the terminal, that is, if the group of cameras is disposed in another device, the another device may collect audio information in the environment where the another device is located, for example, the another device may collect audio information through a microphone of the another device. The other device may then send the captured audio information to the terminal. After the terminal receives the audio information sent by the other device, the received audio information can be used as the audio information corresponding to the group of cameras.
It should be noted that, in this embodiment of the application, the terminal can acquire the video picture taken by the camera of each of the n devices. In this case, the terminal can acquire the audio information in the environment where each of the n devices is located, and can therefore determine from it which device's camera may have captured a speaking person in its video picture.
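A sketch of this gathering step is given below; LocalMicrophone and RemoteDevice are hypothetical stand-ins for the terminal's own microphones and for devices that send their collected audio to the terminal over an existing connection.

import java.util.ArrayList;
import java.util.List;

public class AmbientAudioGatheringSketch {

    interface LocalMicrophone { short[] capture(); }       // the terminal's own microphone(s)
    interface RemoteDevice { short[] receiveAudio(); }     // a device that sends its collected audio to the terminal

    record CameraGroup(String deviceId, boolean onThisTerminal) {}

    // Returns one piece of audio information per camera group, in the same order as the groups.
    static List<short[]> gather(List<CameraGroup> groups, LocalMicrophone microphone, List<RemoteDevice> remotes) {
        List<short[]> ambientAudio = new ArrayList<>();
        int remoteIndex = 0;
        for (CameraGroup group : groups) {
            if (group.onThisTerminal()) {
                ambientAudio.add(microphone.capture());                      // the terminal collects its own environment
            } else {
                ambientAudio.add(remotes.get(remoteIndex++).receiveAudio()); // another device sends its audio
            }
        }
        return ambientAudio;
    }
}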
Step 402: the terminal determines a first camera from the plurality of cameras according to the environment audio information, and a human voice sound source exists in a shooting area of the first camera.
The human voice sound source refers to a sound source which emits human voice. The person sound source exists in the shooting area of the first camera, which indicates that the shooting object of the first camera is likely to be a person and is speaking, i.e. the person who is speaking is likely to appear in the video picture shot by the first camera.
After the terminal acquires the audio information in the environment where each camera in the multiple cameras is located, the audio information can be analyzed to determine which cameras in the multiple cameras have the human voice source in the shooting area, namely, to determine which cameras in the multiple cameras shoot the person who is speaking in the video picture.
In the case that the environmental audio information includes the n pieces of audio information, the operation of step 402 may be: the terminal determines at least one target audio information from the n audio information, wherein the target audio information is audio information with human voice; the terminal determines a first camera from a group of cameras corresponding to each target audio information in the at least one target audio information.
In this way, the terminal determines the first camera from the group of cameras corresponding to target audio information containing a human voice, that is, from the cameras of the devices in whose environment a human voice is present, so the first camera can be determined more accurately.
For any one of the n pieces of audio information, the terminal first detects whether the audio information contains voice, if the audio information contains voice, the audio information is determined to be target audio information, and then a first camera is determined from a group of cameras corresponding to the target audio information.
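In sketch form, the selection of target audio information might look like the following; VoiceDetector is a hypothetical placeholder for whatever voice activity detection the terminal actually applies.

import java.util.ArrayList;
import java.util.List;

public class TargetAudioSelectionSketch {

    interface VoiceDetector { boolean containsHumanVoice(short[] pcm); }

    record GroupAudio(int groupIndex, short[] pcm) {}

    // Keep only the audio information in which a human voice is detected; the first camera
    // is then searched only among the camera groups corresponding to these target audios.
    static List<GroupAudio> selectTargets(List<GroupAudio> allAudios, VoiceDetector detector) {
        List<GroupAudio> targets = new ArrayList<>();
        for (GroupAudio audio : allAudios) {
            if (detector.containsHumanVoice(audio.pcm())) {
                targets.add(audio);
            }
        }
        return targets;
    }
}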
It should be noted that, for any one of the n groups of cameras, the group of cameras may include only one camera, or may include at least two cameras. If the group of cameras comprises at least two cameras, the shooting directions of the at least two cameras are different, that is, the shooting directions of the at least two cameras arranged on the same device are different. The shooting direction of the camera of each of the n devices can be recorded in the terminal.
In this case, the operation of the terminal determining the first camera from the group of cameras corresponding to the target audio information may be as follows. If the group of cameras corresponding to the target audio information includes only one camera, the terminal may directly determine that this camera is the first camera. Alternatively, if the group includes only one camera, the terminal may perform human voice sound source localization according to the target audio information to obtain the direction in which the human voice sound source is located; if this direction is the same as the shooting direction of the camera, the camera is determined to be the first camera, and if it is different, the camera is determined not to be the first camera. Alternatively, if the group of cameras corresponding to the target audio information includes j cameras, the terminal first performs human voice sound source localization according to the target audio information to obtain the direction in which the human voice sound source is located, and then determines the first camera from the j cameras according to this direction and the shooting direction of each of the j cameras, where j is an integer greater than or equal to 2 and the shooting directions of the j cameras are different.
Optionally, the operation of the terminal determining the first camera from the j cameras according to the direction in which the human voice sound source is located and the shooting direction of each of the j cameras may be as follows. The terminal determines that a camera among the j cameras whose shooting direction is the same as the direction in which the human voice sound source is located is the first camera, and that a camera whose shooting direction is different from that direction is not the first camera. Alternatively, the terminal performs sound source separation on the target audio information according to the shooting direction of each of the j cameras, obtaining the audio of the sound source located in the shooting direction of each camera, that is, j separated audios that correspond one to one to the j cameras, each audio being the audio of the sound source in the shooting direction of its corresponding camera. The terminal then determines the sound source energy of each of the j audios and detects whether human voice exists in each of the j audios to obtain a human voice detection result. Next, according to the direction in which the human voice sound source is located, the sound source energy of each of the j audios, and the human voice detection result, the terminal determines the human voice ratio of the sound source in the shooting direction of each of the j cameras. Finally, if the human voice ratio of the sound source in the shooting direction of a camera is greater than or equal to the human voice ratio threshold, the terminal determines that this camera is the first camera; if the human voice ratio is less than the threshold, the terminal determines that this camera is not the first camera. The human voice ratio threshold may be preset: if the human voice ratio of a sound source is greater than or equal to the threshold, the sound source is likely to be a human voice sound source, and if it is less than the threshold, the sound source is likely not a human voice sound source.
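To make the voice-ratio decision described above concrete, the following minimal Java sketch assumes that sound source separation has already produced, for each of the j cameras, the energy of the separated audio and a human voice detection result; the particular formula used for the ratio, the threshold value, and all names are assumptions, since the application does not prescribe them.

import java.util.ArrayList;
import java.util.List;

class VoiceRatioDecision {
    static final double VOICE_RATIO_THRESHOLD = 0.6; // assumed preset value

    // energies[i]     : sound source energy of the audio separated for camera i
    // voiceDetected[i]: whether human voice was detected in that audio
    // Returns the indices of the cameras judged to be first cameras.
    static List<Integer> findFirstCameras(double[] energies, boolean[] voiceDetected) {
        double totalEnergy = 0;
        for (double e : energies) {
            totalEnergy += e;
        }
        List<Integer> firstCameras = new ArrayList<>();
        for (int i = 0; i < energies.length; i++) {
            // Assumed definition of the human voice ratio: the share of the total
            // energy coming from this camera's shooting direction, counted only
            // when human voice is detected in that direction.
            double voiceRatio = (voiceDetected[i] && totalEnergy > 0)
                    ? energies[i] / totalEnergy
                    : 0.0;
            if (voiceRatio >= VOICE_RATIO_THRESHOLD) {
                firstCameras.add(i);
            }
        }
        return firstCameras;
    }
}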
It is worth mentioning that, in the embodiment of the present application, the direction in which the human voice sound source is located may be analyzed according to the target audio information, and then, in combination with this direction and the shooting direction of the camera, it may be analyzed whether the human voice in the environment where the camera is located comes from the shooting direction of the camera. In this way, it can be determined whether a human voice sound source exists in the shooting area of the camera, that is, whether the camera is the first camera, and the accuracy of determining the first camera can be improved.
It should be noted that the process of determining, by the terminal in step 402, the first camera from the plurality of cameras according to the environmental audio information is a process of performing audio directivity analysis according to the environmental audio information to obtain directivity data. The audio directivity analysis is to analyze the directivity of the audio from the human voice source in the environmental audio information to determine whether the direction in which the human voice source is located is the shooting direction of the camera, and accordingly obtain directivity data, where the directivity data is used to indicate which camera has the same shooting direction as the direction in which the human voice source is located, that is, to indicate which camera is the first camera.
Step 403: the terminal performs differentiated display on the video picture shot by the first camera and the video picture shot by the second camera, where the second cameras are the cameras other than the first camera among the multiple cameras.
Performing differentiated display on the video picture shot by the first camera and the video picture shot by the second camera means displaying them in different display manners so as to highlight the video picture shot by the first camera. That is, the video picture shot by the camera whose shooting area contains a human voice sound source is displayed in a manner different from the other video pictures, so that the video picture in which the speaking person appears is distinguished from the others. Thus, when a person in a certain video picture speaks, that video picture is displayed differently from the other video pictures, which improves the flexibility of video picture display and helps the user watch the video pictures. Moreover, the effect that whichever video picture contains a speaking person is the one that gets highlighted can be achieved, which can improve the user's interactive experience to a certain extent.
Optionally, the operation of the terminal performing differentiated display on the video picture shot by the first camera and the video picture shot by the second camera may be as follows. The terminal enlarges the video picture shot by the first camera and shrinks the video picture shot by the second camera, so that the display area of the first camera's picture is larger than that of the second camera's picture and the first camera's picture is highlighted. Alternatively, the terminal uses the video picture shot by the first camera as the main picture of a picture-in-picture mode and the video picture shot by the second camera as the sub-picture, and displays the two pictures in picture-in-picture mode; picture-in-picture means that while the main picture is displayed in full screen, the sub-picture is displayed simultaneously in a small area over the main picture, so the main picture is highlighted. Alternatively, the terminal displays only the video picture shot by the first camera, for example in full screen, and does not display the video picture shot by the second camera, which also highlights the first camera's picture. Of course, the terminal may also display the two video pictures in a differentiated manner in other ways, which is not limited in this embodiment of the application.
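For illustration only, the following sketch maps each camera to a display mode according to whether it is the first camera; the enum values and class name are invented here, and an analogous mapping would apply for the picture-in-picture mode (main picture versus sub-picture).

import java.util.HashMap;
import java.util.Map;

class DisplayPlanner {
    enum Mode { ENLARGED, REDUCED } // enlarged display for the highlighted picture

    // cameraIds: all cameras currently shooting; firstCameraId: the camera whose
    // shooting area contains a human voice sound source.
    static Map<Integer, Mode> plan(int[] cameraIds, int firstCameraId) {
        Map<Integer, Mode> plan = new HashMap<>();
        for (int id : cameraIds) {
            plan.put(id, id == firstCameraId ? Mode.ENLARGED : Mode.REDUCED);
        }
        return plan;
    }
}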
It should be noted that, if the terminal does not determine a first camera from the multiple cameras in step 402, that is, if no first camera exists among the multiple cameras, the terminal does not perform step 403; instead, it displays the video pictures shot by each of the multiple cameras in the same display manner, or performs differentiated display on them according to other criteria.
When the terminal performs differential display on the video pictures shot by each camera in the multiple cameras according to other standards, the video pictures shot by each camera in the multiple cameras can be displayed in a differential mode according to the number of people in the video pictures, the priority of the cameras, the loudness of environmental audio and other standards. For example, the terminal may detect the number of persons appearing in the video picture taken by each of the plurality of cameras and then highlight the video picture in which the number of persons appearing is the largest. Alternatively, the priority of each of the plurality of cameras may be recorded in the terminal, and the video image captured by the camera with the highest priority may be highlighted. Or, the terminal may determine one audio information with the highest loudness from the n audio information, and highlight the video frames captured by a group of cameras corresponding to the one audio information. The manner in which the terminal highlights a certain video frame may be various. For example, the terminal may enlarge and display the video image, and reduce and display other video images; or, the terminal may use the video picture as a main picture in the pip mode, and use the other video pictures as sub-pictures in the pip mode, and display the video picture and the other video pictures in the pip mode; alternatively, the terminal may display only this video picture without displaying the other video pictures. Of course, the terminal may highlight the video frame in other manners, which is not limited in the embodiment of the present application.
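As one possible reading of the fallback criteria above, the sketch below picks the picture with the largest number of people when no first camera exists; the helper name and the choice of criterion are assumptions, and camera priority or audio loudness could be used in the same way.

class FallbackSelector {
    // personCounts[i]: number of people detected in the picture of camera i.
    // Assumes at least one camera; returns the index of the camera whose picture
    // should be highlighted according to the "largest number of people" criterion.
    static int selectByPersonCount(int[] personCounts) {
        int best = 0;
        for (int i = 1; i < personCounts.length; i++) {
            if (personCounts[i] > personCounts[best]) {
                best = i;
            }
        }
        return best;
    }
}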
It should be noted that the terminal may continuously perform the above steps 401 to 403 during the process of capturing the video pictures by the plurality of cameras. Therefore, the terminal can dynamically adjust the display mode of the video pictures shot by each camera in the plurality of cameras according to the real-time audio in the environment where each camera in the plurality of cameras is located in the whole shooting process, so that the effect of highlighting which video picture is shot when a person in which video picture speaks is achieved.
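To show how the continuous adjustment of steps 401 to 403 might fit together, here is a schematic loop; every type and method it calls is hypothetical and merely stands in for the processing described above, including the assumed analysis interval.

// Schematic control loop (all names hypothetical): repeat audio acquisition,
// first-camera determination, and display adjustment while recording.
class DisplayLoop {
    static void run(Recorder recorder) throws InterruptedException {
        while (recorder.isRecording()) {
            Object ambientAudio = recorder.collectAmbientAudio();            // step 401
            int firstCameraId = recorder.determineFirstCamera(ambientAudio); // step 402
            recorder.updateDisplay(firstCameraId);                           // step 403
            Thread.sleep(200); // assumed analysis interval in milliseconds
        }
    }
}

// Stand-in interface for the terminal-side processing described in the text.
interface Recorder {
    boolean isRecording();
    Object collectAmbientAudio();
    int determineFirstCamera(Object ambientAudio);
    void updateDisplay(int firstCameraId);
}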
The following is an exemplary description of the possible implementation of the video display method described in the embodiment of fig. 4 in various shooting scenarios.
The first shooting scenario is explained first:
First shooting scenario: multi-camera video recording scenario
In such a shooting scene, the terminal has a plurality of cameras, and shooting directions of the plurality of cameras are different.
The terminal can start a multi-camera function so as to record video simultaneously through a plurality of cameras of the terminal, and display video pictures shot by each camera in the plurality of cameras in a multi-camera interface. Illustratively, the terminal may have a front camera and a rear camera, and after the terminal starts the multi-camera function, the terminal starts its own front camera and rear camera, and then, as shown in fig. 5, the terminal may display a video screen 521 taken by its own front camera and a video screen 522 taken by its own rear camera in the multi-camera interface 51.
In the multi-camera shooting scenario, n is 1; that is, the multiple cameras belong to one group, all of them are disposed on the terminal, and their shooting directions are different.
The method for displaying video frames in a multi-camera scene will be described with reference to the embodiments of fig. 6-10 below.
Fig. 6 is a flowchart of a video frame display method according to an embodiment of the present application. Referring to fig. 6, the method includes:
step 601: after receiving the multi-camera shooting instruction, the terminal starts a plurality of cameras of the terminal to obtain video pictures shot by each camera in the plurality of cameras.
The multi-camera recording instruction is used for indicating multi-camera recording, that is, recording simultaneously through multiple cameras of the terminal. The instruction can be triggered by a user through operations such as a click operation, a sliding operation, a voice operation, a gesture operation, or a motion-sensing operation, which is not limited in this embodiment of the application.
After the terminal receives the multi-camera shooting and recording instruction, a plurality of cameras of the terminal can be started, and after the cameras are started, video pictures can be shot. In this case, as shown in fig. 5, the terminal may display a video picture taken by each of the plurality of cameras. Moreover, the terminal may continuously acquire the environmental audio information during the process of capturing the video image by the multiple cameras, as described in step 602 below.
Step 602: the terminal acquires audio information acquired by a plurality of microphones of the terminal, wherein the audio information acquired by the plurality of microphones is environment audio information.
When the multiple cameras are all disposed on the terminal, the environmental audio information includes one piece of audio information, namely the audio information in the environment where the terminal is located. The terminal can therefore collect this audio information through multiple microphones of its own and use it as the environmental audio information; in this case, the multiple cameras form one group of cameras corresponding to this audio information. The multiple microphones can be disposed at different positions of the terminal so as to collect audio information in all directions, and the terminal can start them to collect audio information while starting its multiple cameras to record video. For example, the terminal may have three microphones disposed at the top, the bottom, and the back of the terminal respectively.
Step 603: the terminal determines a first camera from the plurality of cameras according to the environment audio information, and a human voice sound source exists in a shooting area of the first camera.
The human voice sound source refers to a sound source which emits human voice. A human voice sound source exists in the shooting area of the first camera, which indicates that the shooting object of the first camera is likely to be a human and speaking, namely, the human who is speaking is likely to appear in the video picture shot by the first camera.
After the terminal acquires the audio information in the environment where the terminal is located, the terminal can analyze the audio information to determine which cameras in the multiple cameras of the terminal have the human voice source in the shooting area, namely, determine which cameras in the multiple cameras shoot the person who is speaking in the video picture.
Specifically, the operation of step 603 may be: the terminal detects whether the voice exists in the environment audio information or not, if the voice exists in the environment audio information, the environment audio information is determined to be target audio information, and then a first camera is determined from the multiple cameras corresponding to the environment audio information.
The operation of the terminal determining the first camera from the multiple cameras corresponding to the environmental audio information may be: the terminal firstly carries out human voice source positioning according to the environment audio information to obtain the direction of the human voice source, and then determines a first camera from the plurality of cameras according to the direction of the human voice source and the shooting direction of each camera in the plurality of cameras.
Optionally, the operation of the terminal determining the first camera from the multiple cameras according to the direction in which the human voice sound source is located and the shooting direction of each of the multiple cameras may be as follows. The terminal determines that a camera among the multiple cameras whose shooting direction is the same as the direction in which the human voice sound source is located is the first camera, and that a camera whose shooting direction is different from that direction is not the first camera. Alternatively, the terminal performs sound source separation on the environmental audio information according to the shooting direction of each of the multiple cameras, obtaining the audio of the sound source located in the shooting direction of each camera, that is, j separated audios, where j is the number of the multiple cameras; the j audios correspond one to one to the multiple cameras, and each audio is the audio of the sound source in the shooting direction of its corresponding camera. The terminal then determines the sound source energy of each of the j audios and detects whether human voice exists in each of the j audios to obtain a human voice detection result. Next, according to the direction in which the human voice sound source is located, the sound source energy of each of the j audios, and the human voice detection result, the terminal determines the human voice ratio of the sound source in the shooting direction of each of the multiple cameras. Finally, if the human voice ratio of the sound source in the shooting direction of a camera is greater than or equal to the human voice ratio threshold, the terminal determines that this camera is the first camera; if the human voice ratio is less than the threshold, the terminal determines that this camera is not the first camera.
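As an illustration of matching the localized direction of the human voice sound source against each camera's shooting direction, consider the sketch below; representing directions as azimuth angles and the angular tolerance used for the comparison are assumptions, not something specified by this application.

import java.util.ArrayList;
import java.util.List;

class DirectionMatcher {
    static final double TOLERANCE_DEGREES = 45.0; // assumed angular tolerance

    // voiceDirection     : direction of the human voice sound source, in degrees
    // shootingDirections : shooting direction of each camera, in degrees
    // Returns the indices of the cameras whose shooting direction matches the
    // direction of the human voice sound source, i.e. the first cameras.
    static List<Integer> matchFirstCameras(double voiceDirection,
                                           double[] shootingDirections) {
        List<Integer> firstCameras = new ArrayList<>();
        for (int i = 0; i < shootingDirections.length; i++) {
            double diff = Math.abs(voiceDirection - shootingDirections[i]) % 360.0;
            if (diff > 180.0) {
                diff = 360.0 - diff; // shortest angular distance
            }
            if (diff <= TOLERANCE_DEGREES) {
                firstCameras.add(i);
            }
        }
        return firstCameras;
    }
}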
It is worth mentioning that, in the embodiment of the present application, the direction in which the human voice sound source is located may be analyzed according to the environmental audio information, and then, in combination with this direction and the shooting direction of the camera, it may be analyzed whether the human voice in the environment where the camera is located comes from the shooting direction of the camera. In this way, it can be determined whether a human voice sound source exists in the shooting area of the camera, that is, whether the camera is the first camera, and the first camera can be determined fairly accurately.
After the terminal determines the first camera from the multiple cameras according to the environmental audio information in step 603, the other cameras except the first camera from the multiple cameras may be referred to as second cameras.
Step 604: the terminal displays a video picture shot by the first camera and a video picture shot by the second camera in the multi-camera interface, and compared with the video picture shot by the second camera, the video picture shot by the first camera is highlighted.
It should be noted that, in the embodiment of the present application, when the terminal displays the video picture shot by each of the multiple cameras in the multi-camera interface during multi-camera recording, the video picture shot by the first camera may be highlighted; that is, the video picture shot by the camera whose shooting area contains a human voice sound source is displayed in a manner different from the other video pictures, so that the video picture in which the speaking person appears is distinguished from the others. In this way, whichever video picture contains a speaking person is the one that gets highlighted, which improves the flexibility of video picture display, helps the user watch the video pictures, and can improve the user's interactive experience to a certain extent.
Optionally, the terminal may perform zoom-in display on a video picture shot by the first camera in the multi-camera interface, and perform zoom-out display on a video picture shot by the second camera; or the terminal uses the video picture shot by the first camera as a main picture of the picture-in-picture mode, uses the video picture shot by the second camera as a sub-picture of the picture-in-picture mode, and displays the video picture shot by the first camera and the video picture shot by the second camera in the picture-in-picture mode in the multi-camera video interface. Of course, the terminal may also highlight the video picture captured by the first camera in other manners, which is not limited in this embodiment of the present application.
For example, the terminal has a front camera and a rear camera, and as shown in fig. 5, the terminal can display a video screen 521 captured by the front camera and a video screen 522 captured by the rear camera in the multi-camera interface 51. Then, if the terminal determines that the front camera is the first camera, that is, determines that a human voice source exists in the shooting area of the front camera, that is, determines that a speaking person exists in the video picture 521 shot by the front camera, the terminal displays the video picture 521 shot by the front camera in an enlarged manner and displays the video picture 522 shot by the rear camera in a multi-camera interface 51 in a reduced manner, as shown in fig. 7 (a). Alternatively, if the terminal determines that the rear camera is the first camera, that is, determines that a human voice source exists in the shooting area of the rear camera, that is, determines that a speaking person exists in the video picture 522 shot by the rear camera, the terminal displays the video picture 522 shot by the rear camera in an enlarged manner and displays the video picture 521 shot by the front camera in a reduced manner in the multi-camera interface 51 as shown in fig. 7 (b). In this way, the effect of displaying the enlarged image 521 captured by the front camera when the person in the image 521 captured by the front camera speaks can be achieved, and displaying the enlarged image 522 captured by the rear camera when the person in the image 522 captured by the rear camera speaks.
For another example, the terminal has a front camera and a rear camera, and as shown in fig. 5, the terminal can display the video picture 521 shot by the front camera and the video picture 522 shot by the rear camera in the multi-camera interface 51. Then, if the terminal determines that the front camera is the first camera, that is, determines that a human voice sound source exists in the shooting area of the front camera and therefore that a speaking person appears in the video picture 521 shot by the front camera, the terminal, as shown in fig. 8 (a), uses the video picture 521 shot by the front camera as the main picture of the picture-in-picture mode and the video picture 522 shot by the rear camera as the sub-picture, and displays the two pictures in picture-in-picture mode in the multi-camera interface 51. Alternatively, if the terminal determines that the rear camera is the first camera, that is, determines that a human voice sound source exists in the shooting area of the rear camera and therefore that a speaking person appears in the video picture 522 shot by the rear camera, the terminal, as shown in fig. 8 (b), uses the video picture 522 shot by the rear camera as the main picture of the picture-in-picture mode and the video picture 521 shot by the front camera as the sub-picture, and displays the two pictures in picture-in-picture mode in the multi-camera interface 51. In this way, the effect can be achieved that when a person in the video picture 521 shot by the front camera speaks, the video picture 521 is displayed as the main picture, and when a person in the video picture 522 shot by the rear camera speaks, the video picture 522 is displayed as the main picture.
It should be noted that, if the terminal does not determine a first camera from the multiple cameras according to the environmental audio information in step 603, that is, if no first camera exists among the multiple cameras, the terminal does not perform step 604; instead, it displays the video pictures shot by each of the multiple cameras in the same display manner in the multi-camera interface, or performs differentiated display on them in the multi-camera interface according to other criteria.
When the terminal performs differentiated display on the video pictures shot by each camera in the multiple cameras through the multiple-camera interface according to other standards, the terminal can perform differentiated display on the video pictures shot by each camera in the multiple cameras through the multiple-camera interface according to standards such as the number of people in the video pictures, the priority of the cameras and the like. For example, the terminal may detect the number of people appearing in the video frames captured by each of the plurality of cameras, and then highlight the video frame with the largest number of people appearing in the multi-camera interface. Or, the terminal may record the priority of each of the plurality of cameras, and highlight the video frame captured by the camera with the highest priority on the multi-camera interface. The terminal can display a certain video picture in a plurality of modes in a highlighted manner on the multi-camera interface. For example, the terminal can enlarge and display the video picture on the multi-camera interface and reduce and display other video pictures; or, the terminal may use the video picture as a main picture in picture-in-picture mode, use the other video pictures as sub-pictures in picture-in-picture mode, and display the video picture and the other video pictures in picture-in-picture mode on the multi-camera interface. Of course, the terminal may highlight the video frame at the multi-camera interface in other manners, which is not limited in the embodiment of the present application.
It should be noted that the terminal may continuously perform the above steps 602 to 604 during the process of capturing the video pictures by the plurality of cameras. Therefore, the terminal can dynamically adjust the display mode of the video picture shot by each camera in the plurality of cameras according to the real-time audio in the environment where the terminal is located in the whole multi-camera shooting process, so that the effect of highlighting the video picture when the person in the video picture speaks is achieved.
To facilitate understanding of the multi-camera video scene, the video frame display method in the embodiment of fig. 6 is described below with reference to the software system shown in fig. 9 and the flowchart of the video frame display process shown in fig. 10.
Referring to fig. 9, the software system of the terminal may include a Camera application in the application layer, an Audio framework (not shown) and a Camera framework (not shown) in the application framework layer, an Audio service (not shown) and a Camera service (not shown) in the system layer, and an Audio HAL and a Camera HAL in the extension layer. The audio framework includes AudioRecord, which is responsible for acquiring recording data. The camera framework includes MediaCodec (not shown in the figure), a class for encoding and decoding audio and video that implements its codec functions by accessing an underlying codec. The audio service includes AudioFlinger, which is the executor of the audio policy and is responsible for managing input and output stream devices and for processing and transmitting audio stream data. The Audio HAL includes AudioInputStream, which is used to obtain the audio input stream.
In addition, referring to fig. 9, the terminal also has a plurality of cameras that can take video pictures and a plurality of microphones (mic) that can collect audio information. For example, the terminal may have two cameras, one of which is a front camera disposed at the front of the terminal and the other of which is a rear camera disposed at the back of the terminal. Illustratively, the terminal may have three microphones, one of which may be disposed at the top of the terminal, another of which may be disposed at the bottom of the terminal, and yet another of which may be disposed at the back of the terminal, and in some embodiments, the three microphones may be microphones for implementing an audio zoom (recording focus) function.
Referring to fig. 10, the video screen display process may include the following steps 1001-1008.
Step 1001: after the Camera application starts, the MediaCodec is instructed to create an encoder instance, and the AudioRecord is instructed to create an Audio instance, while the Camera HAL is instructed to start the multiple cameras of the terminal.
The encoder instance is used for encoding the data stream collected by the camera. The Audio example is used to collect recorded data.
And after the camera application program is started, the terminal displays a camera interface. At this point, the camera application prepares for recording, i.e., creates an encoder instance, creates an Audio instance, and starts multiple cameras of the terminal. After the plurality of cameras are started, video pictures can be shot, and a camera application program can acquire the video pictures shot by each camera in the plurality of cameras.
Step 1002: the camera application instructs MediaCodec to start the encoder instance and instructs AudioRecord to start the Audio instance after the camera interface receives a click operation on the multi-camcorder button.
And after the user clicks the multi-camera shooting button on the camera interface, the multi-camera shooting instruction is triggered. At this time, the camera application program may start the created encoder instance and Audio instance to record through each of the plurality of cameras of the terminal. In this case, the camera interface is the multi-camera interface described above.
Step 1003: audioRecord instructs the Audio HAL to obtain Audio information after starting an Audio instance.
Optionally, the AudioRecord may call the Audio HAL through the AudioFlinger to instruct the Audio HAL to obtain the Audio information.
Step 1004: the Audio HAL receives Audio information collected by a plurality of microphones of the terminal as ambient Audio information.
Alternatively, the Audio information collected by multiple microphones of the terminal may be received by the AudioInputStream in the Audio HAL, where the Audio information collected by the multiple microphones is the Audio information in the environment where the terminal is located, that is, the environment Audio information.
Step 1005: the Audio HAL performs Audio directivity analysis according to the environmental Audio information to obtain directivity data.
The audio directivity analysis analyzes the directivity of the audio coming from the human voice sound source in the environmental audio information, so as to determine whether the direction in which the human voice sound source is located is the shooting direction of a camera, and accordingly obtains directivity data. The directivity data is used to indicate which camera's shooting direction is the same as the direction in which the human voice sound source is located, that is, which camera is the first camera.
The Audio HAL performs Audio directivity analysis according to the environmental Audio information to obtain directivity data, that is, the Audio HAL determines the first camera from the cameras of the terminal according to the environmental Audio information. The operation of the Audio HAL determining the first camera from the multiple cameras of the terminal according to the environmental Audio information is the same as the operation of the terminal determining the first camera from the multiple cameras according to the environmental Audio information in step 603, and details of this embodiment are not repeated herein.
It is noted that the Audio HAL includes an Audio directivity analysis algorithm for performing the Audio directivity analysis of the environmental Audio information in step 1005. In addition, the Audio HAL may further include a recording algorithm to generate corresponding Audio data for the video image captured by each of the plurality of cameras according to the environmental Audio information, so as to output a corresponding video file in the subsequent process.
Step 1006: the Audio HAL sends the directional data to the Camera HAL.
Alternatively, the Camera HAL may register a callback function with the Audio HAL after starting the multiple cameras of the terminal in step 1001. In this way, after obtaining the directivity data, the Audio HAL may call the callback function to pass the directivity data to the Camera HAL.
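In spirit, the callback handover between the Camera HAL and the Audio HAL might look like the conceptual Java sketch below; the interfaces are invented for illustration and do not reproduce the real HAL interfaces.

// Conceptual sketch (invented interfaces): the Camera HAL registers a callback
// with the Audio HAL, and the Audio HAL invokes it once directivity data is ready.
interface DirectivityCallback {
    void onDirectivityData(byte directivity); // e.g. 0 = front, 1 = rear, 2 = both or none
}

class AudioHalStub {
    private DirectivityCallback callback;

    void registerCallback(DirectivityCallback cb) {
        this.callback = cb;
    }

    // Called after the audio directivity analysis has produced directivity data.
    void publishDirectivity(byte directivity) {
        if (callback != null) {
            callback.onDirectivityData(directivity);
        }
    }
}

class CameraHalStub {
    void start(AudioHalStub audioHal) {
        // Register so that directivity data can later be passed back to the Camera HAL.
        audioHal.registerCallback(directivity -> forwardToCameraApplication(directivity));
    }

    void forwardToCameraApplication(byte directivity) {
        // In the text, this happens via the Meta reporting path (step 1007).
    }
}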
Step 1007: camera HAL sends the directivity data to the Camera application.
Alternatively, camera HAL may report the directivity data to the Camera application via the Meta report path.
The Meta reporting path refers to reporting data using the Camera_metadata data structure. The Camera_metadata data structure enables parameter passing between the Camera HAL and the camera application.
Exemplarily, in the case where the terminal has a front camera and a rear camera, the TAG information in the Camera_metadata data structure may be as follows:
public static final CaptureRequest.Key<Byte> AUIDO_DIRECTION_ =
        KeyGenerator.generateCaptureRequestKey("metadata.auidodirection", byte.class);
Here, a Byte value of 0 indicates that only the front camera is recognized as the first camera; a value of 1 indicates that only the rear camera is recognized as the first camera; and a value of 2 indicates either that both the front camera and the rear camera are recognized as first cameras, or that neither the front camera nor the rear camera is recognized as a first camera.
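Assuming the Byte value reported through the TAG carries the meaning just described, the camera application might interpret it roughly as follows; the method names and the way the result is applied to the interface are illustrative assumptions only.

class DirectivityInterpreter {
    static final byte FRONT_IS_FIRST = 0;
    static final byte REAR_IS_FIRST = 1;
    static final byte BOTH_OR_NONE = 2;

    // Decide which video picture, if any, should be highlighted in the interface.
    static void apply(byte directivity) {
        switch (directivity) {
            case FRONT_IS_FIRST:
                highlightFrontPicture(); // e.g. enlarge it, or use it as the main picture
                break;
            case REAR_IS_FIRST:
                highlightRearPicture();
                break;
            case BOTH_OR_NONE:
            default:
                keepCurrentLayout();     // no single first camera identified
                break;
        }
    }

    static void highlightFrontPicture() { /* adjust the interface layout */ }
    static void highlightRearPicture()  { /* adjust the interface layout */ }
    static void keepCurrentLayout()     { /* leave the display unchanged */ }
}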
Step 1008: the camera application dynamically adjusts a plurality of video frames displayed in the camera interface according to the directional data.
The camera application displays the video image captured by each of the multiple cameras in the camera interface, and highlights the video image captured by the first camera indicated by the directional data compared to other video images, and for a specific highlighting manner, reference may be made to the relevant description in step 604, which is not described again in this embodiment of the present application.
It is noted that the above steps 1004 to 1008 can be performed continuously during multi-camera recording. In this way, the Audio HAL can acquire the environmental audio information in real time and perform audio directivity analysis on it to obtain directivity data, and the camera application can dynamically adjust the multiple video pictures displayed in the camera interface according to the directivity data. Throughout the multi-camera recording process, whichever video picture contains a speaking person is the one that gets highlighted, which not only improves the flexibility of video picture display and helps the user watch the video pictures, but can also improve the user's interactive experience to a certain extent.
The second shooting scenario is explained next:
Second shooting scenario: collaborative video recording scenario
In the shooting scene, the terminal and at least one other device (which may be called a cooperative device) are in a multi-screen cooperative state, the terminal and the at least one cooperative device both have cameras, and the terminal can shoot a video picture by means of the camera of each of the at least one cooperative device.
The terminal can start a collaborative video recording function to simultaneously shoot video pictures through the camera of the terminal and the camera of each collaborative device in the at least one collaborative device, and then display the video pictures shot by the camera of the terminal and the video pictures shot by the camera of each collaborative device in the at least one collaborative device in a collaborative video recording interface. Illustratively, the terminal and a cooperative device each have a camera, and after the terminal activates the cooperative video recording function, the terminal activates its own camera and the camera of the cooperative device, and then, as shown in fig. 11, the terminal 1101 may display a video screen 1121 taken by its own camera and a video screen 1122 taken by the camera of the cooperative device 1102 in the cooperative video recording interface 111.
In the collaborative video recording scene, n is an integer greater than or equal to 2, that is, there are at least two groups of cameras, one of which is disposed on the terminal, and one of the other at least one group of cameras is disposed on at least one device (i.e., a collaborative device) in a multi-screen collaborative state with the terminal.
Next, a method for displaying a video frame in a collaborative recording scene will be described with reference to the following embodiments of fig. 12 to 15.
Fig. 12 is a flowchart of a video frame display method according to an embodiment of the present application. Referring to fig. 12, the method includes:
step 1201: after receiving the collaborative video recording instruction, the terminal starts a camera of the terminal, and indicates each collaborative device in at least one collaborative device in a multi-screen collaborative state with the terminal to start the camera of the terminal.
The collaborative video recording instruction is used for indicating collaborative video recording, and the collaborative video recording refers to simultaneously recording by using a camera of the terminal and a camera of collaborative equipment in a multi-screen collaborative state with the terminal. The collaborative video recording instruction can be triggered by a user, and the user can trigger the collaborative video recording instruction through operations such as click operation, sliding operation, voice operation, gesture operation and somatosensory operation.
After the terminal receives the collaborative video recording instruction, the terminal can start a camera of the terminal and instruct each piece of collaborative equipment in the at least one piece of collaborative equipment to start the camera, and after the cameras are started, video pictures can be shot. After each cooperative device in the at least one cooperative device starts its own camera, the video picture shot by its own camera can be sent to the terminal.
Step 1202: the terminal acquires a video picture shot by a camera of the terminal and receives the video picture shot by the camera of the terminal and sent by each cooperative device in the at least one cooperative device.
In this case, as shown in fig. 11, the terminal 1101 may display a video screen 1121 captured by its own camera and a video screen 1122 captured by a camera of the cooperative device 1102.
Moreover, the terminal may further continuously acquire the environmental audio information during the process that the camera of the terminal and the camera of each of the at least one cooperative device capture the video frame, which is specifically described in step 1203 below.
Step 1203: the terminal acquires audio information acquired by a microphone of the terminal and receives audio information acquired by the microphone of the terminal and sent by each cooperative device of the at least one cooperative device, wherein the audio information acquired by the microphone of the terminal and the audio information acquired by the microphone of each cooperative device of the at least one cooperative device are environmental audio information.
In this case, the environmental audio information includes at least two pieces of audio information: one is the audio information collected by the microphone of the terminal in the environment where the terminal is located, and the other at least one piece is the audio information collected by the microphone of each of the at least one cooperative device in the environment where that cooperative device is located. The audio information collected by the microphone of the terminal corresponds to the camera of the terminal, and the audio information collected by the microphone of a cooperative device corresponds to the camera of that cooperative device.
The terminal can start a camera of the terminal to record video and can start a microphone of the terminal to collect audio information. Similarly, each cooperative device in the at least one cooperative device can start its own microphone to collect audio information while starting its own camera to record video, so that the terminal can be sent with the video image shot by its own camera and the audio information collected by its own microphone.
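Purely for illustration, the correspondence between each piece of audio information and the cameras it belongs to could be kept in a simple mapping such as the one sketched below; the device identifiers and field names are assumptions.

import java.util.HashMap;
import java.util.Map;

// Minimal sketch (hypothetical types): keep each device's audio information
// together with the cameras of that device, so that later analysis knows which
// group of cameras each piece of audio information corresponds to.
class AmbientAudio {
    // deviceId -> audio samples captured by that device's microphone
    private final Map<String, byte[]> audioByDevice = new HashMap<>();
    // deviceId -> identifiers of the cameras disposed on that device
    private final Map<String, int[]> camerasByDevice = new HashMap<>();

    void addDevice(String deviceId, byte[] audio, int[] cameraIds) {
        audioByDevice.put(deviceId, audio);
        camerasByDevice.put(deviceId, cameraIds);
    }

    byte[] audioFor(String deviceId) {
        return audioByDevice.get(deviceId);
    }

    int[] camerasFor(String deviceId) {
        return camerasByDevice.get(deviceId);
    }
}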
Step 1204: the terminal determines a first camera from the camera of the terminal and the camera of each cooperative device in the at least one cooperative device according to the environmental audio information, and a human voice sound source exists in a shooting area of the first camera.
The human voice sound source refers to a sound source which emits human voice. The person sound source exists in the shooting area of the first camera, which indicates that the shooting object of the first camera is likely to be a person and is speaking, i.e. the person who is speaking is likely to appear in the video picture shot by the first camera.
After the terminal acquires the audio information in the environment where it is located and the audio information in the environment where each of the at least one cooperative device is located, it can analyze the audio information to determine in which shooting areas, among those of the terminal's camera and the camera of each of the at least one cooperative device, a human voice sound source exists, that is, to determine which cameras have a speaking person in their video pictures.
The operation of step 1204 is similar to the operation of determining the first camera from the multiple cameras by the terminal according to the environmental audio information in step 402, and details of this embodiment of the present application are omitted.
After the terminal determines the first camera from the camera of the terminal and the camera of each of the at least one cooperative device according to the environmental audio information in step 1204, the other cameras except the first camera from the camera of the terminal and the camera of each of the at least one cooperative device may be referred to as second cameras.
Step 1205: the terminal displays a video picture shot by the first camera and a video picture shot by the second camera in the collaborative video recording interface, and highlights the video picture shot by the first camera compared with the video picture shot by the second camera.
It should be noted that, in the embodiment of the present application, when the terminal displays in the collaborative video recording interface the video pictures shot by its own camera and by the camera of each of the at least one cooperative device, the video picture shot by the first camera among these cameras may be highlighted; that is, the video picture shot by the camera whose shooting area contains a human voice sound source is displayed in a manner different from the other video pictures, so that the video picture in which the speaking person appears is distinguished from the others. In this way, whichever video picture contains a speaking person is the one that gets highlighted, which improves the flexibility of video picture display, helps the user watch the video pictures, and can improve the user's interactive experience to a certain extent.
Optionally, the terminal may perform zoom-in display on a video image shot by the first camera in the collaborative video interface, and perform zoom-out display on a video image shot by the second camera; or, the terminal may use the video picture shot by the first camera as a main picture in the picture-in-picture mode, use the video picture shot by the second camera as a sub-picture in the picture-in-picture mode, and display the video picture shot by the first camera and the video picture shot by the second camera in the picture-in-picture mode in the collaborative video recording interface.
For example, as shown in fig. 11, the terminal 1101 and the cooperative device 1102 each have a camera, and the terminal 1101 may display a video screen 1121 captured by its own camera and a video screen 1122 captured by the camera of the cooperative device 1102 on the cooperative video recording interface 111. After that, if the terminal 1101 determines that the camera of the terminal is the first camera, that is, determines that a human voice sound source exists in the shooting area of the camera of the terminal, that is, it determines that a person speaking exists in the video screen 1121 shot by the camera of the terminal, as shown in (a) of fig. 13, the terminal 1101 displays the video screen 1121 shot by the camera of the terminal on the collaborative video interface 111 in an enlarged manner and displays the video screen 1122 shot by the camera of the collaborative device 1102 in a reduced manner. Alternatively, if the terminal 1101 determines that the camera of the cooperative device 1102 is the first camera, that is, that a human voice source exists in the shooting area of the camera of the cooperative device 1102, that is, that a person speaking exists in the video screen 1122 shot by the camera of the cooperative device 1102, the terminal 1101 performs an enlargement display of the video screen 1122 shot by the camera of the cooperative device 1102 and a reduction display of the video screen 1121 shot by the camera of the terminal 1101 in the cooperative video recording interface 111 as shown in (b) of fig. 13. In this way, when a person in the video screen 1121 captured by the camera of the terminal 1101 speaks, the video screen 1121 captured by the camera of the terminal 1101 is displayed in an enlarged manner, and when a person in the video screen 1122 captured by the camera of the cooperative apparatus 1102 speaks, the video screen 1122 captured by the camera of the cooperative apparatus 1102 is displayed in an enlarged manner.
For another example, as shown in fig. 11, the terminal 1101 and the cooperative device 1102 each have a camera, and the terminal 1101 may display a video screen 1121 captured by its own camera and a video screen 1122 captured by the camera of the cooperative device 1102 on the cooperative video recording interface 111. Then, if the terminal 1101 determines that the camera of the terminal is the first camera, that is, determines that a human voice sound source exists in the shooting area of the camera of the terminal, that is, it determines that a person speaking exists in the video screen 1121 shot by the camera of the terminal, as shown in (a) of fig. 14, the terminal 1101 uses the video screen 1121 shot by the camera of the terminal as a main screen in a picture-in-picture mode, uses the video screen 1122 shot by the camera of the cooperative device 1102 as a sub-screen in a picture-in-picture mode, and displays the video screen 1121 shot by the camera of the terminal and the video screen 1122 shot by the camera of the cooperative device 1102 in the picture-in-picture mode on the cooperative video interface 111. Alternatively, if the terminal 1101 determines that the camera of the cooperative device 1102 is the first camera, that is, it determines that a human voice source exists in the shooting area of the camera of the cooperative device 1102, that is, it determines that a person speaking exists in the video screen 1122 shot by the camera of the cooperative device 1102, as shown in (b) of fig. 14, the terminal 1101 displays the video screen 1122 shot by the camera of the cooperative device 1102 in the pip mode as a main screen, the video screen 1121 shot by the camera of the terminal 1101 in the pip mode as a sub-screen, and the video screen 1122 shot by the camera of the cooperative device 1102 and the video screen 1121 shot by the camera of the terminal 1101 in the pip mode in the cooperative video interface 111. In this way, it is possible to achieve an effect that the video screen 1121 captured by the camera of the terminal 1101 is displayed as a main screen when a person is speaking in the video screen 1121 captured by the camera of the terminal 1101, and the video screen 1122 captured by the camera of the cooperative apparatus 1102 is displayed as a main screen when a person is speaking in the video screen 1122 captured by the camera of the cooperative apparatus 1102.
It should be noted that if the terminal does not determine the first camera from the camera of the terminal and the camera of each of the at least one cooperative device according to the environmental audio information in the above step 1204, that is, if the first camera does not exist in the cameras, the terminal does not perform the above step 1205, but displays the video frames captured by the camera of the terminal and the camera of each of the at least one cooperative device in the same display manner on the cooperative video interface, or performs differential display on the video frames captured by the camera of the terminal and the camera of each of the at least one cooperative device in the cooperative video interface according to other criteria.
When the terminal performs differentiated display on video pictures shot by the camera of the terminal and the camera of each cooperative device in the at least one cooperative device in the cooperative video interface according to other standards, the terminal may perform differentiated display on the video pictures shot by the camera of the terminal and the camera of each cooperative device in the at least one cooperative device in the cooperative video interface according to standards such as the number of people in the video pictures, the priority of the camera, and the ambient audio loudness. For example, the terminal may detect the number of people appearing in the video frames captured by the camera of the terminal and the camera of each of the at least one collaborative device, and then highlight the video frame with the largest number of people appearing in the collaborative video interface. Or, the terminal may record therein priorities of the camera of the terminal and the camera of each of the at least one cooperative device, and highlight a video picture taken by the camera with the highest priority on the cooperative video interface. Or, the terminal may determine, from the audio information corresponding to the camera of the terminal (i.e., the audio information collected by the microphone of the terminal) and the audio information corresponding to the camera of each of the at least one cooperative device (i.e., the audio information collected by the microphone of each cooperative device), the audio information with the highest loudness, and highlight, at the cooperative video interface, a video frame shot by the camera corresponding to the audio information. The terminal may highlight a certain video frame in the collaborative video interface in various ways. For example, the terminal may perform zoom-in display on the video screen and perform zoom-out display on other video screens in the collaborative video interface; or, the terminal may use the video picture as a main picture in the pip mode and use the other video pictures as sub-pictures in the pip mode, and display the video picture and the other video pictures in the pip mode on the collaborative video recording interface. Of course, the terminal may also highlight the video frame in the collaborative video interface in other ways, which is not limited in this embodiment of the present application.
It should be noted that the terminal may continuously perform the above steps 1203 to 1205 during the process of capturing the video picture by the camera of the terminal and the camera of each of the at least one cooperative device. Therefore, the terminal can dynamically adjust the display modes of the video pictures shot by the camera of the terminal and the camera of each cooperative device in the at least one cooperative device according to the real-time audio in the environment where the terminal and each cooperative device in the at least one cooperative device are located in the whole cooperative video recording process, so that the effect of highlighting which video picture is displayed when the person in which video picture speaks is achieved.
To facilitate understanding of the above-mentioned collaborative video recording scenario, the following describes an example of the video screen display method according to the embodiment of fig. 12 with reference to the software system shown in fig. 15.
Referring to fig. 15, the software system of the terminal may include a camera application and a multi-screen cooperative application (not shown in the drawing) in an application layer, and an Audio framework in an application framework layer, and an Audio service in a system layer, and an Audio HAL in an extension layer. The software system of the collaborative device may include an Audio framework in an application framework layer, and an Audio service in a system layer, and an Audio HAL in an extension layer. Wherein the audio framework comprises an AudioRecord. The audio service includes AudioFlinger. In addition, the terminal and the cooperative device are provided with a camera (not shown in the figure) and a microphone, the camera can shoot video pictures, and the microphone can collect audio information.
In this case, the video picture display process may include the following steps (1) to (5).
(1) After the camera application in the terminal is started, the terminal starts its own camera. After detecting that the camera interface is opened, the multi-screen collaborative application in the terminal instructs the collaborative device to start its own camera.
And after a camera application program in the terminal is started, the terminal displays a camera interface. At this time, the camera application program in the terminal prepares for video recording, namely, a camera of the terminal is started. After the camera of the terminal is started, a video picture can be shot, and a camera application program in the terminal can acquire the video picture shot by the camera of the terminal.
After the cooperative device starts the camera of the cooperative device, the video picture shot by the camera of the cooperative device can be sent to the terminal, so that a camera application program in the terminal can acquire the video picture shot by the camera of the cooperative device.
(2) And after the camera interface receives the click operation of the collaborative video recording button, the camera application program in the terminal instructs an audio framework in the terminal to acquire the audio information in the environment where the terminal is located. After detecting the click operation of the collaborative video recording button, a multi-screen collaborative application program in the terminal indicates the collaborative device to acquire the audio information in the environment where the collaborative device is located.
When the user clicks the collaborative video recording button on the camera interface, a collaborative video recording instruction is triggered. In this case, the camera interface is the above-described collaborative recording interface.
Optionally, the AudioRecord in the Audio framework in the terminal may call the Audio HAL in the terminal through the AudioFlinger in the terminal, to instruct the Audio HAL in the terminal to acquire audio information in the environment where the terminal is located. The Audio HAL in the terminal may receive the audio information collected by the microphone of the terminal and then pass that audio information to the AudioRecord in the terminal via the AudioFlinger in the terminal.
Optionally, the AudioRecord in the Audio framework of the cooperative device may call the Audio HAL in the cooperative device through the AudioFlinger in the cooperative device, to instruct the Audio HAL in the cooperative device to acquire audio information in the environment where the cooperative device is located. The Audio HAL in the cooperative device may receive the audio information collected by the microphone of the cooperative device and pass it to the AudioRecord in the cooperative device through the AudioFlinger in the cooperative device; the AudioRecord in the cooperative device then passes it to the Audio framework in the terminal.
The audio information in the environment where the terminal is located (i.e. the audio information collected by the microphone of the terminal) and the audio information in the environment where the cooperative device is located (i.e. the audio information collected by the microphone of the cooperative device) are environment audio information. In this case, the environment audio information includes two pieces of audio information, one of which corresponds to the camera of the terminal and the other of which corresponds to the camera of the cooperative device.
(3) The audio framework in the terminal performs audio directivity analysis according to the environmental audio information to obtain directivity data.
The audio directivity analysis analyzes the directivity of the audio emitted by the human voice sound source in the environmental audio information, so as to determine whether the direction in which the human voice sound source is located matches the shooting direction of a camera, and obtains directivity data accordingly. The directivity data indicates which camera's shooting direction coincides with the direction of the human voice sound source, that is, which camera is the first camera.
Optionally, the audio framework in the terminal includes an audio directivity analysis algorithm, so as to implement the audio directivity analysis of the environmental audio information in step (3) above. The audio directivity analysis algorithm may be executed, for example, by AudioPolicy in the audio framework in the terminal. AudioPolicy is the audio policy maker and is responsible for, among other things, the policy for switching audio devices and the volume adjustment policy.
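The patent does not give the directivity analysis algorithm itself. The following is a minimal sketch, under the assumption that each camera direction already has an associated beamformed (or per-microphone) mono signal, of deciding which camera's shooting direction is most likely to contain a human voice source by comparing short-term signal energy against a threshold. The class name, method name, and threshold value are illustrative only.

```java
import java.util.Map;

// Hypothetical directivity analysis: one mono PCM window per camera direction.
class DirectivityAnalyzer {
    private static final double VOICE_ENERGY_THRESHOLD = 1e6; // assumed, tuned empirically

    // Returns the identifier of the camera whose direction most likely contains
    // a human voice source, or null if no direction exceeds the threshold.
    static String analyze(Map<String, short[]> pcmPerCamera) {
        String best = null;
        double bestEnergy = VOICE_ENERGY_THRESHOLD;
        for (Map.Entry<String, short[]> entry : pcmPerCamera.entrySet()) {
            double energy = 0;
            for (short sample : entry.getValue()) {
                energy += (double) sample * sample;
            }
            energy /= entry.getValue().length; // mean energy of the window
            if (energy > bestEnergy) {
                bestEnergy = energy;
                best = entry.getKey();
            }
        }
        return best; // corresponds to the "first camera" in the directivity data
    }
}
```

A real implementation would additionally check that the dominant energy lies in the speech band (roughly 300 Hz to 3.4 kHz) before declaring a human voice source, rather than relying on raw energy alone.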
When the audio framework in the terminal performs audio directivity analysis according to the environmental audio information to obtain directivity data, it is in effect determining the first camera from the camera of the terminal and the camera of the cooperative device according to the environmental audio information. This operation is similar to the operation, in step 1204, of determining the first camera from the camera of the terminal and the camera of each of the at least one cooperative device according to the environmental audio information, and is not described in detail again in this embodiment of the present application.
(4) The audio framework in the terminal sends the directivity data to the camera application in the terminal.
Optionally, the audio framework in the terminal may also send the environmental audio information to the camera application in the terminal. The environmental audio information includes two pieces of audio information, and each piece may carry a device identifier (Device ID) of the device to which it belongs, that is, one piece carries the device identifier of the terminal and the other carries the device identifier of the cooperative device. In this way, the camera application in the terminal can subsequently generate, according to the two pieces of audio information, the audio data corresponding to the video pictures shot by the camera of the terminal and by the camera of the cooperative device, and output the corresponding video files accordingly. Illustratively, the audio handler (AudioHandler) in the camera application in the terminal may be configured to generate, according to the two pieces of audio information, the audio data corresponding to the video pictures shot by the camera of the terminal and by the camera of the cooperative device respectively.
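To illustrate how device-tagged audio might be routed to the matching camera stream, here is a small sketch; the TaggedAudio record, the per-camera recorder, and the appendAudio method are assumptions for illustration and not part of the patent's software system.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical container for one piece of environmental audio information.
record TaggedAudio(String deviceId, byte[] pcm) {}

// Hypothetical per-camera recorder that accumulates the audio belonging to one video picture.
class PerCameraRecorder {
    private int bytesReceived;

    void appendAudio(byte[] pcm) {
        bytesReceived += pcm.length; // a real recorder would encode and mux this data
    }
}

class AudioRouter {
    private final Map<String, PerCameraRecorder> recorders = new HashMap<>();

    // Route each tagged piece of audio to the recorder of the camera on the same device.
    void onAudio(TaggedAudio audio) {
        recorders.computeIfAbsent(audio.deviceId(), id -> new PerCameraRecorder())
                 .appendAudio(audio.pcm());
    }

    public static void main(String[] args) {
        AudioRouter router = new AudioRouter();
        router.onAudio(new TaggedAudio("terminal", new byte[960]));
        router.onAudio(new TaggedAudio("cooperative-device", new byte[960]));
        System.out.println(router.recorders.keySet()); // terminal and cooperative-device (order may vary)
    }
}
```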
(5) The camera application in the terminal dynamically adjusts the multiple video pictures displayed in the camera interface according to the directivity data.
The camera application in the terminal displays the video pictures shot by the camera of the terminal and the camera of the cooperative device in the camera interface, and highlights the video picture shot by the first camera indicated by the directional data compared with other video pictures, and the specific highlighting manner may refer to the relevant description in step 1205, which is not described herein again in this embodiment.
It should be noted that steps (2) to (5) above may be performed continuously during the collaborative video recording process. In this way, the audio framework in the terminal can acquire the environmental audio information in real time and perform audio directivity analysis to obtain directivity data, and the camera application can dynamically adjust the multiple video pictures displayed in the camera interface according to the directivity data. Throughout the collaborative video recording process, whichever video picture contains a person who is speaking is the one that is highlighted, so the flexibility of video picture display can be improved, the user can watch the video pictures more comfortably, and the interactive experience of the user can be improved to a certain extent.
The third shooting scenario is explained below:
The third shooting scene: video call scenario
In such a shooting scene, the terminal has a plurality of cameras, and the terminal is in a video call state with another device (which may be referred to as a far-end call device).
In the process of carrying out video call between the terminal and the far-end call equipment, each camera in a plurality of cameras of the terminal shoots a video picture. The terminal can display a video picture shot by one camera in the multiple cameras of the terminal in a video call interface of the terminal, and can display the received video picture sent by the far-end call equipment in the video call interface of the terminal. Meanwhile, the terminal can send a video picture shot by a camera of the terminal and displayed in a video call interface of the terminal to the far-end call equipment so that the far-end call equipment can display the video call interface.
Illustratively, as shown in fig. 16, the terminal 1601 may have a front camera and a rear camera. The terminal 1601 and the far-end call device 1602 conduct a video call. During the video call, the front camera and the rear camera of the terminal 1601 both capture video pictures, and the terminal 1601 displays the video picture 1631 captured by the front camera of the terminal and displays the received video picture 1641 sent by the far-end call device 1602 in the video call interface 161 of the terminal. Meanwhile, the terminal 1601 sends the video frame 1631 captured by the front camera of the terminal displayed in the video call interface 161 of the terminal to the far-end call device 1602, so that the far-end call device 1602 can display the video call interface 162.
In the video call scene, n is 1; that is, the multiple cameras belong to one group, are all disposed at the terminal, and have different shooting directions.
Next, a method for displaying video frames in a video call scene is described by the following embodiments in fig. 17 to 18.
Fig. 17 is a flowchart of a video frame display method according to an embodiment of the present application. Referring to fig. 17, the method includes:
step 1701: after the terminal receives the video call instruction, a plurality of cameras of the terminal are started, and video pictures shot by each camera in the plurality of cameras are obtained.
The video call instruction is used for indicating the video call with the far-end call device. The video call instruction can be triggered by a user, and the user can trigger the video call instruction through operations such as click operation, sliding operation, voice operation, gesture operation and somatosensory operation.
After the terminal receives the video call instruction, the terminal establishes video call connection with the far-end call device, at the moment, the terminal can start a plurality of cameras of the terminal, and after the plurality of cameras are started, video pictures can be shot. In this case, the terminal may also continuously obtain the environmental audio information during the process of capturing the video images by the multiple cameras, as described in step 1702.
Step 1702: the terminal acquires audio information acquired by a plurality of microphones of the terminal, wherein the audio information acquired by the plurality of microphones is environment audio information.
When the multiple cameras are all disposed at the terminal, the environmental audio information includes one piece of audio information, namely the audio information in the environment where the terminal on which the multiple cameras are located is situated. Therefore, the terminal can collect audio information through its multiple microphones and use the audio information collected by these microphones as the environmental audio information; in this case, the multiple cameras, as one group of cameras, correspond to this environmental audio information. The multiple microphones can be arranged at different positions of the terminal so as to collect audio information from all directions. When the terminal starts its multiple cameras for the video call, it can also start its multiple microphones to collect audio information. For example, the terminal may have three microphones, disposed at the top, the bottom, and the back of the terminal respectively.
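As a hedged illustration of collecting ambient audio from several built-in microphones on an Android-style terminal, the sketch below enumerates input devices and opens one AudioRecord per built-in microphone. Whether each physical microphone is exposed as a separate input device is device dependent, and the sample rate and buffer size chosen here are assumptions.

```java
import android.media.AudioDeviceInfo;
import android.media.AudioFormat;
import android.media.AudioManager;
import android.media.AudioRecord;
import android.media.MediaRecorder;
import java.util.ArrayList;
import java.util.List;

// Sketch: one AudioRecord per built-in microphone, used as environmental audio input.
class AmbientAudioCapture {
    private static final int SAMPLE_RATE = 48000; // assumed

    static List<AudioRecord> openBuiltInMics(AudioManager audioManager) {
        List<AudioRecord> recorders = new ArrayList<>();
        int minBuf = AudioRecord.getMinBufferSize(SAMPLE_RATE,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
        for (AudioDeviceInfo device : audioManager.getDevices(AudioManager.GET_DEVICES_INPUTS)) {
            if (device.getType() != AudioDeviceInfo.TYPE_BUILTIN_MIC) {
                continue; // only the terminal's own microphones contribute here
            }
            AudioRecord record = new AudioRecord(MediaRecorder.AudioSource.MIC, SAMPLE_RATE,
                    AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, minBuf * 2);
            record.setPreferredDevice(device); // hint: capture from this particular microphone
            record.startRecording();
            recorders.add(record);
        }
        return recorders; // the per-microphone PCM streams form the environmental audio information
    }
}
```

Recording audio requires the RECORD_AUDIO permission; in practice the terminal's audio HAL and recording algorithm, rather than application code, would combine these streams.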
Step 1703: the terminal determines a first camera from the plurality of cameras according to the environment audio information, and a human voice sound source exists in a shooting area of the first camera.
The human voice sound source refers to a sound source emitting human voice. The person sound source exists in the shooting area of the first camera, which indicates that the shooting object of the first camera is likely to be a person and is speaking, i.e. the person who is speaking is likely to appear in the video picture shot by the first camera.
After the terminal acquires the audio information in the environment where the terminal is located, the terminal can analyze the audio information to determine which cameras in the multiple cameras of the terminal have the human voice source in the shooting area, namely, determine which cameras in the multiple cameras shoot the person who is speaking in the video picture.
The operation of step 1703 is similar to the operation of the terminal determining the first camera from the multiple cameras according to the environmental audio information in step 603, and this is not described again in this embodiment of the present application.
After the terminal determines the first camera from the multiple cameras according to the environmental audio information in step 1703, the other cameras except the first camera from the multiple cameras may be referred to as second cameras.
Step 1704: the terminal displays the video pictures shot by the first camera and does not display the video pictures shot by the second camera in the video call interface of the terminal, and displays the received video pictures sent by the far-end call equipment in the video call interface of the terminal. And the terminal sends the video picture shot by the first camera to the far-end call equipment so that the far-end call equipment can display the video picture on a video call interface of the far-end call equipment.
In the process of carrying out video call with the far-end call device, the terminal needs to send a video picture shot by a camera of the terminal to the far-end call device so that the far-end call device can display the video call interface of the far-end call device. And the far-end call equipment can send the video picture shot by the far-end call equipment to the terminal so as to be displayed on a video call interface of the terminal by the terminal. That is, the video call interface of the terminal not only displays the video pictures shot by the camera of the terminal, but also displays the video pictures shot by the far-end call device, and the video pictures shot by the camera of the terminal and displayed in the video call interface of the terminal are also synchronously sent to the video call interface of the far-end call device for display.
In this case, during the video call, the terminal displays the video picture shot by the first camera of the terminal in the video call interface of the terminal, but does not display the video picture shot by the second camera of the terminal, and sends the video picture shot by the first camera to the video call interface of the far-end call device for display. That is, in the embodiment of the present application, the video pictures shot by the camera having the vocal sound source in the shooting area among the multiple cameras of the terminal are displayed on the video call interfaces of the terminal and the far-end call device, that is, the video pictures of the person speaking in the video pictures shot by the multiple cameras of the terminal are displayed on the video call interfaces of the terminal and the far-end call device. Therefore, when a person in a video picture shot by a camera in the terminal speaks, the video picture shot by the camera is displayed on the video call interface of the terminal and the far-end call equipment, so that the flexibility of video picture display can be improved, a user can watch the video picture better, and the interactive experience of the user can be improved to a certain extent.
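As a rough illustration of this behavior, the sketch below switches which local camera's picture is shown locally and sent to the far end whenever the detected speaking camera changes. The hasVoice flags, the display and sendToRemote callbacks, and the default camera index are assumptions made for the example; they are not APIs defined by the patent.

```java
import java.util.function.IntConsumer;

// Hypothetical video-call stream selector for a terminal with several cameras.
class CallStreamSelector {
    private final int defaultCamera;        // camera shown when nobody is speaking
    private final IntConsumer display;      // shows camera i in the local call interface
    private final IntConsumer sendToRemote; // sends camera i's picture to the far-end device
    private int current = -1;

    CallStreamSelector(int defaultCamera, IntConsumer display, IntConsumer sendToRemote) {
        this.defaultCamera = defaultCamera;
        this.display = display;
        this.sendToRemote = sendToRemote;
    }

    // hasVoice[i] is true when a human voice source lies in camera i's shooting area.
    void onAudioAnalysis(boolean[] hasVoice) {
        int selected = defaultCamera;
        for (int i = 0; i < hasVoice.length; i++) {
            if (hasVoice[i]) { selected = i; break; } // the "first camera"
        }
        if (selected != current) {          // switch only when the speaking camera changes
            current = selected;
            display.accept(selected);
            sendToRemote.accept(selected);
        }
    }

    public static void main(String[] args) {
        CallStreamSelector s = new CallStreamSelector(0,
                i -> System.out.println("display camera " + i),
                i -> System.out.println("send camera " + i + " to far end"));
        s.onAudioAnalysis(new boolean[]{true, false});  // front camera speaking
        s.onAudioAnalysis(new boolean[]{false, true});  // rear camera speaking -> switch
    }
}
```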
For example, as shown in fig. 18, the terminal 1601 has a front camera and a rear camera, and the terminal 1601 conducts a video call with the far-end call device 1602. During the video call, the terminal 1601 activates both its front camera and its rear camera. If the terminal 1601 determines that the front camera is the first camera, that is, that there is a human voice sound source in the shooting area of the front camera and therefore a person speaking in the video picture shot by the front camera, then, as shown in (a) of fig. 18, the terminal 1601 displays, in its video call interface 161, the video picture 1621 shot by the front camera and the received video picture 1631 sent by the far-end call device 1602, and sends the video picture 1621 shot by the front camera to the far-end call device 1602 for display on the video call interface 162 of the far-end call device 1602. At this time, both the terminal 1601 and the far-end call device 1602 display the video picture shot by the front camera of the terminal 1601 and do not display the video picture shot by the rear camera of the terminal 1601. As the video call continues, if the terminal 1601 determines that the rear camera is the first camera, that is, that there is a human voice sound source in the shooting area of the rear camera and therefore a person speaking in the video picture shot by the rear camera, then, as shown in (b) of fig. 18, the terminal 1601 displays, in the video call interface 161, the video picture 1622 shot by the rear camera and the received video picture 1631 sent by the far-end call device 1602, and sends the video picture 1622 shot by the rear camera to the far-end call device 1602 for display on the video call interface 162 of the far-end call device 1602. At this time, the terminal 1601 and the far-end call device 1602 switch from displaying the video picture 1621 shot by the front camera of the terminal 1601 to displaying the video picture 1622 shot by the rear camera of the terminal 1601. In this way, during the video call, when a person in the video picture 1621 shot by the front camera of the terminal 1601 speaks, the video picture 1621 is displayed in the video call interfaces of the terminal 1601 and the far-end call device 1602; and when a person in the video picture 1622 shot by the rear camera of the terminal 1601 speaks, the video picture 1622 is displayed in the video call interfaces of the terminal 1601 and the far-end call device 1602.
It is worth noting that this video call mode can produce an effect similar to a multi-party video call. For example, several friends gathered in one place may use a mobile phone for a video call with another friend who is far away. In this case, the mobile phone starts its front camera and rear camera, and by default sends the video picture shot by the front camera to the video call interface of the far-end call device used by the distant friend for display, while displaying, on its own video call interface, the video picture shot by its front camera and the video picture sent by the far-end call device. Suppose one of the friends holds the mobile phone and is in the shooting area of its front camera; the front camera then shoots a video picture containing that friend's image, while the distant friend holds the far-end call device, which shoots a video picture containing his or her own image. The video call interfaces of the mobile phone and of the far-end call device both display the video picture containing the first friend's image and the video picture containing the distant friend's image. Then, if another friend in the gathering is in the shooting area of the rear camera of the mobile phone and starts speaking, the mobile phone recognizes that a human voice sound source exists in the shooting area of its rear camera, and therefore determines that the rear camera is the first camera. The mobile phone then switches its video call interface to the video picture, shot by the rear camera, containing the other friend's image, and sends that video picture to the video call interface of the far-end call device for display. At this point, the video call interfaces of the mobile phone and of the far-end call device both display the video picture containing the other friend's image as well as the video picture containing the distant friend's image. In this way, during the video call, two friends at the gathering and the distant friend take part in turn, producing an effect similar to a multi-party video call.
It should be noted that if the terminal does not determine a first camera from the multiple cameras in step 1703, that is, if no first camera exists among the multiple cameras, the terminal does not perform step 1704. Instead, it displays, on its video call interface, the video picture shot by a previously specified camera among the multiple cameras (i.e., the camera displayed by default), or it may select one camera from the multiple cameras according to other criteria and display the video picture shot by the selected camera on its video call interface. In this case, the video pictures shot by the other cameras among the multiple cameras are not displayed on the video call interface of the terminal, the received video picture sent by the far-end call device is displayed on the video call interface of the terminal, and the terminal sends the video picture shot by the selected camera to the far-end call device for display on the video call interface of the far-end call device. When selecting one camera from the multiple cameras according to other criteria, the terminal can use criteria such as the number of people in each video picture or the priority of each camera. For example, the terminal may detect the number of people appearing in the video picture shot by each of the multiple cameras and then select the camera whose video picture contains the most people. Alternatively, the terminal may record the priority of each of the multiple cameras and select the camera with the highest priority.
It should be noted that the terminal may continue to perform steps 1702 to 1704 above while the multiple cameras are shooting video pictures. In this way, throughout the video call, the terminal can dynamically adjust the video picture displayed in the video call interface according to the real-time audio in the environment where the terminal is located, so that whenever a person in the video picture shot by one of the cameras speaks, the video picture shot by that camera is the one displayed in the video call interface.
The fourth shooting scenario is explained below:
The fourth shooting scene: video conference scenario
In such a shooting scene, the terminal and at least one other device both have cameras, and the terminal and the at least one device (which may be called a participating device) are in a video conference state.
In the process of carrying out the video conference between the terminal and the at least one conference participating device, the camera of the terminal and the camera of each conference participating device in the at least one conference participating device shoot video pictures. The terminal can display the video pictures shot by the camera of the terminal and the camera of each of the at least one conferencing device in the video conference interface of the terminal.
Illustratively, as shown in fig. 19, the terminal 1901 and the three conferencing devices 1902 each have a camera, and the terminal 1901 and the three conferencing devices 1902 conduct a video conference. The terminal 1901 displays, in the video conference interface 1911, the video picture 1921 shot by its own camera and the received video picture 1922 shot by the camera of each of the three conferencing devices 1902 and sent by that device. That is, the terminal 1901 displays four video pictures on the video conference interface 1911: the video picture 1921 shot by its own camera and the video picture 1922 shot by the camera of each of the three conferencing devices 1902.
In a video conference scene, n is an integer greater than or equal to 2, that is, the plurality of cameras are divided into a plurality of groups, one group of cameras in the plurality of groups of cameras is arranged at the terminal, and other groups of cameras are arranged at each conference participating device which performs a video conference with the terminal.
Next, a video frame display method in a video conference scene is explained by the following embodiments of fig. 20 to 21.
Fig. 20 is a flowchart of a video frame display method according to an embodiment of the present application. Referring to fig. 20, the method includes:
step 2001: After the terminal receives the video conference instruction, the terminal starts its camera.
The video conference instruction is used for instructing the terminal to carry out video conference with other conference-participating equipment. The video conference instruction can be triggered by a user, and the user can trigger the video conference instruction through operations such as click operation, sliding operation, voice operation, gesture operation and motion sensing operation.
After the terminal receives the video conference instruction, the terminal establishes a video call connection with each of the at least one conferencing device. At this point, the terminal can start its camera, which can shoot video pictures once started. Each of the at least one conferencing device can likewise start its camera, shoot video pictures through it, and send the video pictures it shoots to the terminal.
Step 2002: the terminal acquires a video picture shot by a camera of the terminal and receives the video picture shot by the camera of each of the at least one conferencing equipment.
In this case, as shown in fig. 19, the terminal 1901 may display, in the video conference interface 1911, a video picture 1921 captured by the camera of the terminal 1901, and may also display a received video picture 1922 captured by the camera of each of the at least one participating device 1902 and transmitted by the at least one participating device 1902.
Moreover, the terminal may also continuously obtain the environmental audio information during the video conference, as described in step 2003 below.
Step 2003: the terminal acquires audio information acquired by a microphone of the terminal and receives the audio information acquired by the microphone of each of the at least one conferencing equipment, wherein the audio information acquired by the microphone of the terminal and the audio information acquired by the microphone of each of the at least one conferencing equipment are environmental audio information.
In this case, the environmental audio information includes multiple pieces of audio information: one piece is the audio information collected by the microphone of the terminal in the environment where the terminal is located, and each of the other pieces is the audio information collected by the microphone of one of the at least one conferencing device in the environment where that conferencing device is located. The audio information collected by the microphone of the terminal corresponds to the camera of the terminal, and the audio information collected by the microphone of a given conferencing device corresponds to the camera of that conferencing device.
The terminal can start a camera of the terminal to carry out a video conference and can start a microphone of the terminal to collect audio information. Similarly, when a certain conference-participating device starts a camera of the conference-participating device to perform a video conference, a microphone of the conference-participating device can be started to acquire audio information, so that the conference-participating device can send a video picture shot by the camera of the conference-participating device to the terminal and can also send the audio information acquired by the microphone of the conference-participating device to the terminal.
Step 2004: the terminal determines a first camera from the camera of the terminal and the camera of each conference-participating device in the at least one conference-participating device according to the environment audio information, and a human voice sound source exists in a shooting area of the first camera.
The human voice sound source refers to a sound source which emits human voice. The person sound source exists in the shooting area of the first camera, which indicates that the shooting object of the first camera is likely to be a person and is speaking, i.e. the person who is speaking is likely to appear in the video picture shot by the first camera.
After the terminal acquires the audio information in the environment where the terminal is located and the audio information in the environment where each of the at least one conferencing device is located, it can analyze these pieces of audio information to determine in which cameras' shooting areas, among the camera of the terminal and the camera of each of the at least one conferencing device, a human voice sound source exists, that is, to determine which of these cameras have a person speaking in the video picture they shoot.
The operation of step 2004 is similar to the operation of determining the first camera from the multiple cameras by the terminal according to the environmental audio information in step 402, which is not described again in this embodiment of the present application.
In the foregoing step 2004, after the terminal determines the first camera from the camera of the terminal and the camera of each of the at least one conferencing apparatus according to the environmental audio information, the other cameras, except the first camera, of the camera of the terminal and the camera of each of the at least one conferencing apparatus may be referred to as second cameras.
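The patent refers back to an earlier step for the details of this determination. As one possible realization, the sketch below runs a crude per-stream voice-activity check over the audio corresponding to each camera (the terminal's own microphone and each conferencing device's microphone) and treats the camera whose audio contains voice-like energy as the first camera. The threshold, the RMS measure, and the window handling are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: decide which camera has a human voice source in its shooting area,
// given one short PCM window per camera (terminal camera plus each conferencing device's camera).
class ConferenceSpeakerDetector {
    private static final double VOICE_THRESHOLD = 0.01; // assumed, normalized RMS

    static String findFirstCamera(Map<String, short[]> audioPerCamera) {
        for (Map.Entry<String, short[]> entry : audioPerCamera.entrySet()) {
            if (rms(entry.getValue()) > VOICE_THRESHOLD) {
                return entry.getKey(); // this camera's shooting area contains a voice source
            }
        }
        return null; // no first camera: fall back to the default display mode
    }

    private static double rms(short[] pcm) {
        double sum = 0;
        for (short s : pcm) {
            double normalized = s / 32768.0;
            sum += normalized * normalized;
        }
        return Math.sqrt(sum / pcm.length);
    }

    public static void main(String[] args) {
        Map<String, short[]> windows = new LinkedHashMap<>();
        windows.put("terminal", new short[]{0, 1, -1, 0});               // near silence
        windows.put("device-A", new short[]{4000, -3500, 3800, -4200});  // speech-like energy
        System.out.println(findFirstCamera(windows)); // device-A
    }
}
```

A real detector would use a proper voice activity detection algorithm rather than plain RMS, since loud non-speech sounds would otherwise be mistaken for a speaking person.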
Step 2005: the terminal displays a video picture shot by the first camera and a video picture shot by the second camera in a video conference interface, and compared with the video picture shot by the second camera, the video picture shot by the first camera is highlighted.
It should be noted that, in the embodiment of the present application, when the terminal displays, in the video conference interface during a video conference, the video picture shot by its own camera and the video picture shot by the camera of each of the at least one conferencing device, the video picture shot by the first camera among these cameras can be highlighted; that is, the video picture shot by the camera that has a human voice sound source in its shooting area is displayed in a manner different from the other video pictures, so that the video picture in which someone is speaking is distinguished from the others. In this way, whichever video picture contains a person who is speaking can be the one that is highlighted, so the flexibility of video picture display can be improved, the user can pay more attention to the person speaking in the video conference, and the interactive experience of the user can be improved to a certain extent.
Optionally, the terminal may perform zoom-in display on a video picture shot by the first camera in the video conference interface, and perform zoom-out display on a video picture shot by the second camera; or, the terminal may use the video picture shot by the first camera as a main picture in the picture-in-picture mode, use the video picture shot by the second camera as a sub-picture in the picture-in-picture mode, and display the video picture shot by the first camera and the video picture shot by the second camera in the picture-in-picture mode in the video conference interface. Of course, the terminal may also highlight the video picture taken by the first camera in other manners, which is not limited in this embodiment of the application.
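The sketch below shows one way an application could realize the enlarge/reduce and picture-in-picture choices described above; the ViewPort type and the hard-coded layout fractions are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical layout computation for highlighting one video picture among several.
record ViewPort(String cameraId, int x, int y, int width, int height) {}

class HighlightLayout {
    // Picture-in-picture: the first camera fills the interface, the others become small sub-pictures.
    static List<ViewPort> pictureInPicture(String firstCamera, List<String> others,
                                           int screenW, int screenH) {
        List<ViewPort> layout = new ArrayList<>();
        layout.add(new ViewPort(firstCamera, 0, 0, screenW, screenH)); // main picture
        int subW = screenW / 4, subH = screenH / 4, margin = 16;       // assumed sub-picture size
        int y = margin;
        for (String camera : others) {
            layout.add(new ViewPort(camera, screenW - subW - margin, y, subW, subH));
            y += subH + margin;                                        // stack sub-pictures vertically
        }
        return layout;
    }

    public static void main(String[] args) {
        List<ViewPort> layout = pictureInPicture("terminal", List.of("device-A", "device-B"), 1920, 1080);
        layout.forEach(System.out::println);
    }
}
```

The zoom-in/zoom-out alternative would be computed the same way, except that the first camera's viewport takes most, rather than all, of the interface and the other pictures keep reduced tiles beside it.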
For example, as shown in fig. 19, the terminal 1901 and each of the at least one conferencing device 1902 have a camera, and the terminal 1901 may display a video 1921 captured by its camera in the video conferencing interface 1911 and display a video 1922 captured by the camera of each of the at least one conferencing device 1902. Then, if the terminal 1901 determines that the camera of the terminal is the first camera, that is, it determines that there is a human voice sound source in the shooting area of the camera of the terminal, that is, it determines that there is a person speaking in the video picture 1921 shot by the camera of the terminal, as shown in (a) of fig. 21, the terminal 1901 displays the video picture 1921 shot by the camera of the terminal in an enlarged manner and displays the video picture 1922 shot by the camera of each of the at least one conferencing apparatus 1902 in a reduced manner in the video conference interface 1911. As the video conference continues, if the terminal 1901 determines that the camera of one of the at least one participating device 1902 is the first camera, that is, it determines that a human voice sound source exists in the shooting area of the camera of the participating device 1902, that is, it determines that a person in conversation exists in the video pictures 1922 shot by the camera of the participating device 1902, then as shown in fig. 21 (b), the terminal 1901 displays the video pictures 1922 shot by the camera of the participating device 1902 in an enlarged manner in the video conference interface 1911, and displays the video pictures 1922 shot by the cameras of the other participating devices 1902, except the participating device 1902, in the at least one participating device 1902 in a reduced manner in the video conference interface 1911, and displays the video pictures 1921 shot by the camera of the terminal 1901 in a reduced manner. In this way, when a person in a video picture 1921 captured by the camera of terminal 1901 speaks, the video picture 1921 captured by the camera of terminal 1901 is magnified and displayed, and when a person in a video picture 1922 captured by the camera of any one of at least one conferencing apparatus 1902 speaks, the video picture 1922 captured by the camera of that conferencing apparatus 1902 is magnified and displayed.
It should be noted that if the terminal does not determine the first camera from the cameras of the terminal and each of the at least one conferencing device in the step 2004 according to the environmental audio information, that is, if there is no first camera in the plurality of cameras, the terminal does not perform the step 2005, but displays the video frames taken by the cameras of the terminal and each of the at least one conferencing device in the same display manner on the video conference interface, or displays the video frames taken by the cameras of the terminal and the at least one conferencing device in a differentiated manner on the video conference interface according to other criteria.
When the terminal uses other criteria to display, in a differentiated manner on the video conference interface, the video pictures shot by the camera of the terminal and by the camera of each of the at least one conferencing device, it may use criteria such as the number of people in each video picture, the priority of each camera, or the loudness of the ambient audio. For example, the terminal may detect the number of people appearing in the video pictures shot by these cameras and then highlight, in the video conference interface, the video picture in which the largest number of people appears. Alternatively, the terminal may record the priorities of these cameras and highlight, in the video conference interface, the video picture shot by the camera with the highest priority. Alternatively, the terminal may determine, from the audio information corresponding to the camera of the terminal (i.e., the audio information collected by the microphone of the terminal) and the audio information corresponding to the camera of each of the at least one conferencing device (i.e., the audio information collected by the microphone of each conferencing device), the piece of audio information with the highest loudness, and highlight, in the video conference interface, the video picture shot by the corresponding camera. The terminal may highlight a video picture in the video conference interface in various ways. For example, the terminal may display that video picture enlarged and the other video pictures reduced; or it may use that video picture as the main picture in picture-in-picture mode, use the other video pictures as sub-pictures, and display them in picture-in-picture mode in the video conference interface. Of course, the terminal may also highlight the video picture in other ways, which is not limited in the embodiment of the present application.
It should be noted that, while the camera of the terminal and the camera of each of the at least one conferencing device are shooting video pictures, the terminal may continuously perform steps 2003 to 2005 above. In this way, throughout the video conference, the terminal can dynamically adjust, according to the real-time audio in the environment of the terminal and of each conferencing device, how the video picture shot by each camera is displayed, so that whichever video picture contains a person who is speaking is the one that is highlighted.
The following explains a fifth shooting scenario:
The fifth shooting scene: centralized monitoring scenario
In such a shooting scene, the terminal communicates with a plurality of monitoring apparatuses as a management apparatus. Each of the plurality of monitoring devices has a camera. The shooting area of the camera of each monitoring device in the multiple monitoring devices is different, that is, the multiple monitoring devices can be installed in different areas to monitor different areas. The terminal may have a camera or may not have a camera, which is not limited in the embodiment of the present application. Under the condition that the terminal is provided with the camera, the shooting area of the camera of the terminal can be different from the shooting areas of the cameras of the plurality of monitoring devices, namely, the camera of the terminal can independently realize the monitoring of one area.
In the process that the terminal monitors through the plurality of monitoring devices, the camera of each monitoring device in the plurality of monitoring devices shoots a video picture and transmits the shot video picture to the terminal. The terminal can display the received video pictures shot by the camera of each monitoring device in the plurality of monitoring devices in the monitoring interface. Optionally, in a case that the terminal also starts a camera for monitoring, the terminal may further display a video picture taken by the camera of the terminal in the monitoring interface. In this case, the video pictures taken by the respective cameras may also be referred to as monitoring pictures.
Illustratively, the terminal communicates with three monitoring devices so as to perform monitoring through them, and each of the three monitoring devices has a camera. As shown in fig. 22, the terminal displays, in the monitoring interface 221, the video picture 2221 sent by each of the three monitoring devices and shot by its camera, that is, the monitoring picture of the shooting area of each monitoring device's camera. Assuming that the shooting areas of the three monitoring devices are the master bedroom, the secondary bedroom, and the living room respectively, the terminal displays the monitoring picture of the master bedroom, the monitoring picture of the secondary bedroom, and the monitoring picture of the living room on the monitoring interface 221.
In the centralized monitoring scene, n is an integer greater than or equal to 2, that is, the plurality of cameras are divided into a plurality of groups, and each group of cameras in the plurality of groups of cameras is arranged in each monitoring device. Optionally, there may be one camera set in the terminal among the multiple cameras.
The following describes a method for displaying video pictures in a centralized monitoring scene by using the embodiments of fig. 23 to 25.
Fig. 23 is a flowchart of a video frame display method according to an embodiment of the present application. Referring to fig. 23, the method includes:
step 2301: and after receiving the monitoring instruction, the terminal instructs each monitoring device in the plurality of monitoring devices to start the camera.
The monitoring instruction is used for indicating the monitoring of the area. The monitoring instruction can be triggered by a user, and the user can trigger the monitoring instruction through operations such as click operation, sliding operation, voice operation, gesture operation and motion sensing operation.
After the terminal receives the monitoring instruction, it can instruct each of the monitoring devices that communicate with it to start its camera; once started, the camera of each monitoring device can shoot video pictures, and the monitoring device can send the video pictures it shoots to the terminal. Further, if the terminal itself also has a camera, the terminal may also start that camera to shoot video pictures and obtain the video pictures shot by it.
Step 2302: the terminal receives the video pictures shot by the camera of each monitoring device in the plurality of monitoring devices.
In this case, as shown in fig. 22, the terminal may display, in the monitoring interface 221, the received video screen 2221 transmitted by each of the plurality of monitoring apparatuses and captured by its camera. Further, when the camera of the terminal also monitors, the video picture taken by the camera of the terminal may be displayed on the monitoring interface.
The terminal may also continuously obtain the environmental audio information during the monitoring process, which is described in step 2303 below.
Step 2303: the terminal receives audio information which is sent by each monitoring device in the multiple monitoring devices and collected by a microphone of the monitoring device, and the audio information which is collected by the microphone of each monitoring device in the multiple monitoring devices is environmental audio information.
Further, under the condition that the camera of the terminal also monitors, the terminal may also acquire audio information collected by a microphone of the terminal, where the audio information collected by the microphone of the terminal is also environmental audio information.
In this case, the environmental audio information includes multiple pieces of audio information, and each piece is the audio information collected by the microphone of one of the monitoring devices in the environment where that monitoring device is located. The audio information collected by the microphone of a given monitoring device corresponds to the camera of that monitoring device. Optionally, one of the pieces of audio information may be collected by the microphone of the terminal in the environment where the terminal is located; the audio information collected by the microphone of the terminal corresponds to the camera of the terminal.
When a certain monitoring device starts a camera of the monitoring device to monitor, a microphone of the monitoring device can be started to collect audio information, so that a video picture shot by the camera of the monitoring device can be sent to the terminal, and meanwhile, the audio information collected by the microphone of the monitoring device can also be sent to the terminal. Similarly, the terminal can start a camera of the terminal to monitor and simultaneously start a microphone of the terminal to collect audio information.
Step 2304: the terminal determines a first camera from the cameras of each monitoring device in the plurality of monitoring devices according to the environment audio information, and a human voice sound source exists in a shooting area of the first camera.
The human voice sound source refers to a sound source which emits human voice. The person sound source exists in the shooting area of the first camera, which indicates that the shooting object of the first camera is likely to be a person and is speaking, i.e. the person who is speaking is likely to appear in the video picture shot by the first camera.
When a video picture shot by the camera of the terminal is also displayed on the monitoring interface, the environmental audio information includes the audio information collected by the microphone of the terminal. In this case, the terminal may determine the first camera from the camera of the terminal and the camera of each of the multiple monitoring devices according to the environmental audio information.
After the terminal acquires the audio information in the environment where each of the multiple monitoring devices is located, and the audio information in the environment where the terminal itself is located, it can analyze these pieces of audio information to determine in which cameras' shooting areas, among the cameras of the monitoring devices and the camera of the terminal, a human voice sound source exists, that is, to determine which cameras are shooting a person who is speaking.
The operation of step 2304 is similar to the operation of determining the first camera from the multiple cameras by the terminal according to the environmental audio information in step 402, and details of this embodiment are not repeated herein.
After the terminal determines the first camera from the camera of each of the multiple monitoring devices and the camera of the terminal according to the environmental audio information in step 2304, the camera of each of the multiple monitoring devices and the other cameras except the first camera in the camera of the terminal may be referred to as second cameras.
Step 2305: the terminal displays the video pictures shot by the first camera and the video pictures shot by the second camera in a differentiated mode on a monitoring interface.
The terminal displays the video pictures shot by the first camera and the video pictures shot by the second camera in a differentiated mode, namely displays the video pictures shot by the first camera and the video pictures shot by the second camera in different display modes so as to realize the highlight display of the video pictures shot by the first camera. That is, a video image captured by a camera having a human voice sound source in a capturing area is displayed in a display manner different from other video images, so that the video image in which the person speaking is located is different from other video images. Therefore, if a person in a certain video picture speaks, the video picture and other video pictures are displayed in a differentiated mode, so that the flexibility of video picture display can be improved, a user can watch the video pictures better, and the monitoring effect is improved. In addition, the effect of highlighting which video picture is highlighted when the person in which video picture speaks can be achieved, so that the interactive experience of the user can be improved to a certain extent.
The operation in step 2305 is similar to the operation of the terminal performing differentiated display on the video picture shot by the first camera and the video picture shot by the second camera in step 403, and details of this operation are not repeated in this embodiment of the present application.
For example, the terminal communicates with three monitoring devices so as to perform monitoring through them, and each of the three monitoring devices has a camera. Assuming that the shooting areas of the three monitoring devices are the master bedroom, the secondary bedroom, and the living room respectively, then, as shown in fig. 22, the terminal displays, in the monitoring interface 221, the video picture 2221 shot by the camera of each of the three monitoring devices, that is, the monitoring picture of the master bedroom, the monitoring picture of the secondary bedroom, and the monitoring picture of the living room. If the terminal then determines that the camera of the monitoring device whose shooting area is the secondary bedroom is the first camera, that is, that a human voice sound source exists in the secondary bedroom and therefore someone is speaking in the monitoring picture of the secondary bedroom, then, as shown in fig. 24, the terminal displays enlarged, in the monitoring interface 221, the video picture 2221 shot by the camera of that monitoring device (i.e., the monitoring picture of the secondary bedroom) and displays reduced the video pictures 2221 shot by the cameras of the other monitoring devices (i.e., the monitoring picture of the master bedroom and the monitoring picture of the living room). In this way, during centralized monitoring, whenever a human voice sound source exists in a given area, the monitoring picture of that area is displayed enlarged on the monitoring interface.
For another example, the terminal communicates with three monitoring devices so as to perform monitoring through them, and each of the three monitoring devices has a camera. Assuming that the shooting areas of the three monitoring devices are the master bedroom, the secondary bedroom, and the living room respectively, then, as shown in fig. 22, the terminal displays, in the monitoring interface 221, the video picture 2221 sent by each of the three monitoring devices and shot by its camera, that is, the monitoring picture of the master bedroom, the monitoring picture of the secondary bedroom, and the monitoring picture of the living room. If the terminal then determines that the camera of the monitoring device whose shooting area is the secondary bedroom is the first camera, that is, that a human voice sound source exists in the secondary bedroom and therefore someone is speaking in the monitoring picture of the secondary bedroom, then, as shown in fig. 25, the terminal displays, in the monitoring interface 221, only the video picture 2221 shot by the camera of that monitoring device (i.e., the monitoring picture of the secondary bedroom) and does not display the video pictures 2221 shot by the cameras of the other monitoring devices (i.e., the monitoring picture of the master bedroom and the monitoring picture of the living room). In this way, during centralized monitoring, whenever a human voice sound source exists in a given area, the monitoring picture of that area is displayed alone on the monitoring interface. In this mode, the monitoring interface keeps switching to display the monitoring picture of whichever area currently has a human voice sound source.
It should be noted that, if the terminal does not determine the first camera in the step 2304, that is, if the first camera does not exist in the cameras, the terminal does not perform the step 2305, but displays the video frames captured by the cameras on the monitoring interface in the same display manner, or performs differentiated display on the video frames captured by the cameras on the monitoring interface according to other standards.
When the terminal uses other criteria to display, in a differentiated manner on the monitoring interface, the video pictures shot by the cameras, it may use criteria such as the number of people in each video picture, the priority of each camera, or the loudness of the ambient audio. For example, the terminal may detect the number of people appearing in the video picture shot by each camera and highlight the video picture in which the largest number of people appears. Alternatively, the terminal may record the priority of each camera and highlight the video picture shot by the camera with the highest priority. Alternatively, the terminal may determine, from the audio information corresponding to each camera, the piece of audio information with the highest loudness and highlight the video picture shot by the corresponding camera. The terminal may highlight a video picture in many ways: it may display that video picture enlarged and the other video pictures reduced, or display only that video picture and hide the others. Of course, the terminal may also highlight the video picture in other ways, which is not limited in the embodiment of the present application.
It should be noted that the terminal may continuously perform steps 2303 to 2305 above while the cameras of the multiple monitoring devices are shooting video pictures. In this way, throughout the monitoring process, the terminal can dynamically adjust, according to the real-time audio in the environment of each monitoring device, how the video picture shot by the camera of each monitoring device is displayed, so that whichever video picture contains a person who is speaking is the one that is highlighted.
Further, after the display adjustment of the video pictures is realized through the embodiments of fig. 3 to fig. 25, the terminal may also generate a video file corresponding to the video picture shot by each camera. A video file includes video data and audio data, and the video file format places the video data and the audio data in one file, so that they can be played back together conveniently to realize video playback.
Optionally, the operation of the terminal generating the video file corresponding to the video picture shot by each camera may include the following three possible modes.
A first possible way: the terminal outputs multiple video files when shooting ends. This specifically includes the following steps (1) and (2).
(1) In the shooting process of the n groups of cameras, the terminal generates multi-channel audio data for any one group of cameras in the n groups of cameras according to the audio information corresponding to the group of cameras, and performs channel separation on the multi-channel audio data of the group of cameras to obtain the audio data of each camera in the group of cameras.
When the terminal generates multi-channel audio data according to the audio information corresponding to the group of cameras, the audio information corresponding to the group of cameras can be processed by using audio processing algorithms such as a recording algorithm and audio zooming to obtain the multi-channel audio data.
The multi-channel audio data includes the sound of multiple channels and simulates surround stereo sound. For example, the multi-channel audio data may be 5.1-channel audio data, 7.1-channel audio data, or the like. The 5.1 channels include a center channel, a front left channel, a front right channel, a rear left surround channel, a rear right surround channel, and a subwoofer channel (i.e., the 0.1 channel). The 7.1 channels include a left front surround channel, a right front surround channel, a center surround channel, a left rear surround channel, a right rear surround channel, a left surround channel, and a right surround channel. Illustratively, the multi-channel audio data may be pulse code modulation (PCM) data.
The terminal performs channel separation on the multi-channel audio data of the group of cameras, that is, the multi-channel audio data is disassembled to use the audio data of each part channel in the multi-channel audio data as the audio data of each camera in the group of cameras. At this time, the audio channels of the audio data of different cameras in the group of cameras are different, so that the audio data of different cameras in the group of cameras are different.
For example, the group of cameras includes two cameras, one being a front camera and the other being a rear camera. The multi-channel audio data is 7.1-channel audio data. After the 7.1 channel audio data is subjected to channel separation, the audio data of the first channel and the audio data of the second channel in the 7.1 channel audio data may be used as the audio data of the front camera, and the audio data of the third channel and the audio data of the fourth channel in the 7.1 channel audio data may be used as the audio data of the rear camera. At this time, the audio data of the front camera and the audio data of the rear camera are both dual-channel audio data.
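The channel separation described above can be illustrated with a minimal sketch. The mapping of interleaved 7.1-channel PCM channels 0-1 to the front camera and channels 2-3 to the rear camera follows the example in the preceding paragraph; the class name and the 16-bit interleaved sample layout are assumptions.

public class ChannelSeparator {

    private static final int TOTAL_CHANNELS = 8; // 7.1 channels

    /**
     * pcm is interleaved 16-bit PCM: [c0, c1, ..., c7, c0, c1, ...].
     * Returns { frontCameraStereo, rearCameraStereo }.
     */
    public static short[][] split(short[] pcm) {
        int frames = pcm.length / TOTAL_CHANNELS;
        short[] front = new short[frames * 2]; // channels 0 and 1 -> front camera
        short[] rear = new short[frames * 2];  // channels 2 and 3 -> rear camera
        for (int f = 0; f < frames; f++) {
            int base = f * TOTAL_CHANNELS;
            front[f * 2] = pcm[base];
            front[f * 2 + 1] = pcm[base + 1];
            rear[f * 2] = pcm[base + 2];
            rear[f * 2 + 1] = pcm[base + 3];
        }
        return new short[][] { front, rear };
    }
}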
In the shooting process of the n groups of cameras, the terminal can acquire the video picture shot by each camera in the n groups of cameras, and thus the video data of that video picture. In this way, the terminal may execute the following step (2) during the shooting process, so as to generate the video file corresponding to each camera in the n groups of cameras in real time.
(2) For any one camera in the n groups of cameras, the terminal generates a video file corresponding to the camera according to the video data of the video picture shot by the camera and the audio data of the camera.
Therefore, when the shooting of the n groups of cameras is finished, the terminal can output the video file corresponding to each camera in the n groups of cameras, namely, a plurality of video files. Illustratively, the format of the video file may be MP4 format.
The first possible way is exemplified below with reference to the software system of the terminal shown in fig. 26.
Referring to fig. 26, assuming that n is 1, the group of cameras includes two cameras, both of which are disposed on the terminal. The video file generation process may include the following steps A to D:
Step A: the Camera HAL acquires the video data of the video picture shot by each of the two cameras of the terminal, and transmits the video data of the video picture shot by each of the two cameras to the camera application program.
Optionally, the Camera HAL may include an Image Sensor (Sensor), an Image Front End (IFE), an Image Processing Engine (IPE), and the like. The data stream transmitted by each of the two cameras of the terminal can be transmitted to the upper-layer application after being processed by the Sensor, the IFE, the IPE, and the like in the Camera HAL.
As shown in (a) of fig. 26, camera HAL may transmit video data of a video picture photographed by each of the two cameras to an upper Camera application through a Camera service and a Camera framework.
Step B: the Audio HAL acquires audio information, generates multi-channel audio data from the audio information, and transmits the multi-channel audio data to the camera application program.
Optionally, the Audio HAL may include an AudioInputStream for receiving the audio information collected by the plurality of microphones of the terminal. The Audio HAL may further comprise audio processing algorithms, such as a recording algorithm and audio zooming, for processing the received audio information to obtain the multi-channel audio data.
As shown in fig. 26 (a), the Audio HAL may transmit the multi-channel audio data to the camera application program of the upper layer through an audio service (e.g., AudioFlinger) and an audio framework (e.g., AudioRecord).
Step C: the camera application program performs channel separation on the multi-channel audio data to obtain the audio data of each of the two cameras, and transmits the video data of the video picture shot by each of the two cameras and the audio data of each camera to the encoding framework.

The camera application program may include an AudioHandler and an audio splitter. The AudioHandler may receive the multi-channel audio data, and the audio splitter may perform channel separation on the multi-channel audio data.
Step D: the encoding framework generates a corresponding video file according to the video data of the video picture shot by each of the two cameras and the audio data of that camera, so as to obtain two video files.

As shown in fig. 26 (b), the encoding framework includes a video/audio multiplexer (Muxer). The multiplexer encodes and synthesizes the video data of the video picture shot by one camera and the audio data of that camera, and then outputs the video file corresponding to that camera.
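For illustration only, on an Android-style software system the role of the Muxer in step D could look roughly like the following sketch. It assumes the encoded video and audio samples are already produced by MediaCodec encoders; the encoder plumbing is omitted, and the PerCameraMuxer wrapper class is a hypothetical name, not a component of the embodiment.

import android.media.MediaCodec;
import android.media.MediaFormat;
import android.media.MediaMuxer;

import java.io.IOException;
import java.nio.ByteBuffer;

public class PerCameraMuxer {
    private final MediaMuxer muxer;
    private final int videoTrack;
    private final int audioTrack;

    // One instance per camera: two cameras therefore yield two MP4 files.
    public PerCameraMuxer(String outputMp4Path, MediaFormat videoFormat, MediaFormat audioFormat)
            throws IOException {
        muxer = new MediaMuxer(outputMp4Path, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4);
        videoTrack = muxer.addTrack(videoFormat);
        audioTrack = muxer.addTrack(audioFormat);
        muxer.start();
    }

    /** Called for every encoded video sample of this camera's video picture. */
    public void writeVideoSample(ByteBuffer data, MediaCodec.BufferInfo info) {
        muxer.writeSampleData(videoTrack, data, info);
    }

    /** Called for every encoded audio sample separated out for this camera. */
    public void writeAudioSample(ByteBuffer data, MediaCodec.BufferInfo info) {
        muxer.writeSampleData(audioTrack, data, info);
    }

    public void finish() {
        muxer.stop();
        muxer.release();
    }
}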
A second possible way: the terminal outputs a single video file when shooting is finished, which specifically comprises the following steps (1) to (4).
(1) In the shooting process of the n groups of cameras, the terminal generates multi-channel audio data for any one group of cameras in the n groups of cameras according to the audio information corresponding to the group of cameras, and performs channel separation on the multi-channel audio data of the group of cameras to obtain the audio data of each camera in the group of cameras.
The operation of step (1) in the second possible manner is the same as the operation of step (1) in the first possible manner, and details of this embodiment are not repeated herein.
(2) The terminal judges whether a video picture shot by the first camera exists among all the video pictures currently displayed.
If the video picture shot by the first camera exists in all the video pictures currently displayed, the video picture shot by the first camera is the video picture which is highlighted in all the video pictures currently displayed.
If the video pictures shot by the first camera exist in all the video pictures currently displayed, executing the following step (3); if the video picture shot by the first camera does not exist in all the video pictures currently displayed, the following step (4) is executed.
(3) And if the video pictures shot by the first camera exist in all the video pictures currently displayed, the terminal generates a video file according to the video data of all the video pictures currently displayed and the audio data of the first camera.
In this case, the video data of all the video pictures currently being displayed is the video data of the merged picture of all the video pictures currently being displayed. The video data in the video file generated at this time is the video data of the fusion picture of all the displayed video pictures, and the audio data is the audio data of the first camera (i.e. the audio data of the highlighted video picture).
(4) And if the video pictures shot by the first camera do not exist in all the video pictures currently displayed, the terminal performs audio mixing operation on the audio data of the cameras to which all the video pictures currently displayed belong to obtain mixed audio data, and generates a video file according to the video data of all the video pictures currently displayed and the mixed audio data.
In this case, the video data of all the video pictures currently being displayed is the video data of the merged picture of all the video pictures currently being displayed. The video data in the video file generated at this time is the video data of the fusion picture of all the displayed video pictures, and the audio data is the mixed audio data of the cameras to which all the displayed video pictures belong (i.e. the mixed audio data of all the displayed video pictures).
Thus, when the shooting of the n groups of cameras is finished, the terminal can output a video file. The video data in the video file is the video data of the fusion picture of all the video pictures displayed by the terminal, and the audio data in the video file is the audio data of the video picture which is highlighted or the mixed audio data of all the video pictures which are commonly displayed.
For example, as shown in fig. 27, the three consecutive frames of video images of the video file each include a video image captured by the front camera and a video image captured by the rear camera, that is, the three frames of video images are each a fusion image of the video image captured by the front camera and the video image captured by the rear camera. If the front camera in the first frame of video image is the first camera, that is, the video picture shot by the front camera in the first frame of video image is a main picture (that is, a highlighted video picture), the audio data of the front camera can be used as the audio data of the first frame of video image. The rear camera in the second frame of video image is the first camera, that is, the video picture shot by the rear camera in the second frame of video image is the main picture, so that the audio data of the rear camera can be used as the audio data of the second frame of video image. The first camera does not exist in the third frame of video image, that is, the video picture shot by the front camera and the video picture shot by the rear camera in the third frame of video image are not main pictures, so that the mixed audio data of the front camera and the rear camera can be used as the audio data of the third frame of video image.
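For illustration only, the per-frame audio selection described above (use the highlighted main picture's audio when it exists, otherwise mix the audio of all displayed cameras) could be sketched as follows. The simple average-based mixer is an assumption; the embodiment does not specify a mixing algorithm.

public class FrameAudioSelector {

    /**
     * audioPerCamera holds one block of 16-bit PCM per displayed camera for the current frame;
     * highlightedIndex is the index of the first camera's audio, or -1 if no picture is highlighted.
     */
    public static short[] selectAudio(short[][] audioPerCamera, int highlightedIndex) {
        if (highlightedIndex >= 0) {
            return audioPerCamera[highlightedIndex]; // main picture: use its audio directly
        }
        return mix(audioPerCamera);                  // no main picture: mix all displayed audio
    }

    private static short[] mix(short[][] streams) {
        int length = streams[0].length;
        short[] out = new short[length];
        for (int i = 0; i < length; i++) {
            int sum = 0;
            for (short[] s : streams) {
                sum += s[i];
            }
            int avg = sum / streams.length; // averaging keeps the result within the 16-bit range
            out[i] = (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, avg));
        }
        return out;
    }
}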
A third possible way: after shooting is finished, the terminal edits the multiple video files obtained by shooting to obtain a fused video file, which may specifically include the following steps (1) and (2).
(1) When the shooting of the n groups of cameras is finished, the terminal obtains, in the first possible way, the video file corresponding to each camera in the n groups of cameras, that is, a plurality of video files.
For any one of the video files, the video file may carry history display information. The history display information is used to indicate how the video data in the video file was displayed on the terminal during the shooting of the n groups of cameras, that is, the display time period of each frame of video image in the video data during the shooting and whether the video image was in a highlighted state when displayed. For example, the history display information may be written into the video file as Tag information of the video file.
(2) After detecting an editing instruction for the plurality of video files, the terminal processes the video data and the audio data in each of the plurality of video files according to the history display information of each of the plurality of video files to obtain a fused video file.

The editing instruction is used for indicating that the plurality of video files are to be fused into one video file. The editing instruction can be triggered by a user, for example through a click operation, a sliding operation, a voice operation, a gesture operation, or a somatosensory operation.
The operation of the terminal processing the video data and the audio data in each of the plurality of video files according to the history display information of each of the plurality of video files may be as follows. For each display time point, the terminal determines, according to the history display information of each of the plurality of video files, at least one video file that displays a video image at the display time point, and determines whether the at least one video file contains a video file whose video image is in a highlighted state. If such a video file exists, the terminal fuses the video data in the at least one video file in the highlighted display manner to obtain the target video data of the display time point, and uses the audio data in the video file whose video image is in the highlighted state as the target audio data of the display time point. If no such video file exists, the terminal fuses the video data in the at least one video file in the common display manner to obtain the target video data of the display time point, and mixes the audio data in the at least one video file to obtain the target audio data of the display time point. Finally, the terminal generates the fused video file according to the target video data and the target audio data of all the display time points.
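A hedged sketch of this editing pass is given below. TimedFrame, FusedWriter, and the fuse/mix helpers are hypothetical names; only the decision logic (prefer the highlighted file's audio at each display time point, otherwise mix) follows the description above.

import java.util.ArrayList;
import java.util.List;

public class VideoFileFuser {

    static class TimedFrame {
        byte[] video;        // decoded video image at this display time point
        short[] audio;       // decoded audio at this display time point
        boolean highlighted; // taken from the history display information (e.g. Tag info)
    }

    /** Abstracts picture fusion, audio mixing, and file output, which the embodiment leaves open. */
    interface FusedWriter {
        byte[] fuseHighlighted(List<TimedFrame> visible, TimedFrame highlighted);
        byte[] fuseEqually(List<TimedFrame> visible);
        short[] mixAudio(List<TimedFrame> visible);
        void write(long timePointUs, byte[] fusedVideo, short[] audio);
    }

    public static void fuse(List<List<TimedFrame>> framesPerFile, long[] displayTimePoints,
                            FusedWriter writer) {
        for (int t = 0; t < displayTimePoints.length; t++) {
            List<TimedFrame> visible = new ArrayList<>();
            TimedFrame highlighted = null;
            for (List<TimedFrame> file : framesPerFile) {
                TimedFrame frame = file.get(t);
                if (frame == null) continue; // this file shows no image at this time point
                visible.add(frame);
                if (frame.highlighted) highlighted = frame;
            }
            byte[] fusedVideo = (highlighted != null)
                    ? writer.fuseHighlighted(visible, highlighted)
                    : writer.fuseEqually(visible);
            short[] audio = (highlighted != null)
                    ? highlighted.audio          // highlighted picture: keep its own audio
                    : writer.mixAudio(visible);  // no highlight: mix all visible audio
            writer.write(displayTimePoints[t], fusedVideo, audio);
        }
    }
}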
Fig. 28 is a schematic structural diagram of a video screen display apparatus provided in this embodiment, where the apparatus may be implemented as part or all of a computer device by software, hardware, or a combination of the two, and the computer device may be a terminal as described in the above embodiments of fig. 1 to fig. 2. Referring to fig. 28, the apparatus includes: an acquisition module 2801, a determination module 2802, and a display module 2803.
An obtaining module 2801, configured to obtain a video picture taken by each of the multiple cameras, and obtain environmental audio information, where the environmental audio information includes audio information in an environment where each of the multiple cameras is located;
a determining module 2802, configured to determine, according to the environmental audio information, a first camera from the multiple cameras, where a human voice source exists in a shooting area of the first camera;
the display module 2803 is used for displaying a video picture shot by a first camera and a video picture shot by a second camera in a differentiated manner, wherein the second camera is another camera except the first camera in the multiple cameras.
Optionally, the multiple cameras are n groups of cameras, different groups of cameras are arranged in different devices, the same group of cameras is arranged in the same device, and n is a positive integer; an obtaining module 2801 is configured to:
acquiring n pieces of audio information, wherein the n pieces of audio information are environmental audio information, the n pieces of audio information correspond to the n groups of cameras one to one, and each piece of audio information in the n pieces of audio information is audio information in the environment where the corresponding group of cameras are located.
Optionally, the determining module 2802 is configured to:
determining at least one target audio information from the n audio information, wherein the target audio information is audio information with human voice; and determining a first camera from a group of cameras corresponding to each target audio information in the at least one target audio information.
Optionally, the determining module 2802 is configured to:
for any target audio information in at least one target audio information, if a group of cameras corresponding to the target audio information comprises j cameras, positioning a human voice sound source according to the target audio information to obtain the direction of the human voice sound source, wherein j is an integer greater than or equal to 2, and the shooting directions of the j cameras are different; and determining a first camera from the j cameras according to the direction of the human voice sound source and the shooting direction of each camera in the j cameras.
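A minimal sketch of this direction matching is shown below, assuming the human voice sound source localization yields an azimuth angle and that each camera's shooting direction is also known as an azimuth. The selection rule (smallest angular difference) and all names are assumptions; the embodiment only states that the sound source direction is compared with the shooting directions.

public class FirstCameraSelector {

    /** Returns the index of the camera whose shooting direction is closest to the voice azimuth. */
    public static int select(double voiceAzimuthDeg, double[] cameraAzimuthsDeg) {
        int best = 0;
        double bestDiff = Double.MAX_VALUE;
        for (int i = 0; i < cameraAzimuthsDeg.length; i++) {
            double diff = Math.abs(angularDifference(voiceAzimuthDeg, cameraAzimuthsDeg[i]));
            if (diff < bestDiff) {
                bestDiff = diff;
                best = i;
            }
        }
        return best;
    }

    // Signed angular difference wrapped into [-180, 180] degrees.
    private static double angularDifference(double a, double b) {
        double d = (a - b) % 360.0;
        if (d > 180.0) d -= 360.0;
        if (d < -180.0) d += 360.0;
        return d;
    }
}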
Optionally, the display module 2803 is configured to:
the video picture shot by the first camera is displayed in an enlarged mode, and the video picture shot by the second camera is displayed in a reduced mode; or the video picture shot by the first camera is a main picture in a picture-in-picture mode, the video picture shot by the second camera is a sub picture in the picture-in-picture mode, and the video picture shot by the first camera and the video picture shot by the second camera are displayed in the picture-in-picture mode.
Optionally, a plurality of cameras are arranged on the device, and shooting directions of the plurality of cameras are different; the device also includes:
the starting module is used for starting the plurality of cameras after receiving the multi-camera shooting instruction;
an obtaining module 2801 is configured to:
the audio information that a plurality of microphones of acquireing the device gathered, the audio information that a plurality of microphones gathered is environment audio information, and a plurality of microphones set up on the different positions of the device.
Optionally, the plurality of cameras are at least two groups of cameras, one of the at least two groups of cameras is arranged on the apparatus, and the other at least one group of cameras is arranged on at least one cooperative device in a multi-screen cooperative state with the apparatus; the device also includes:
the starting module is used for starting the camera of the device after receiving the collaborative video recording instruction, and instructing each cooperative device in the at least one cooperative device to start its own camera;
an obtaining module 2801 is configured to:
acquiring a video picture shot by a camera of the device, and receiving the video picture shot by the camera of each cooperative device in at least one cooperative device; the method comprises the steps of acquiring audio information acquired by a microphone of the device, and receiving audio information acquired by the microphone of each cooperative device in at least one cooperative device, wherein the audio information acquired by the microphone of the device and the audio information acquired by the microphone of each cooperative device in at least one cooperative device are environmental audio information.
Optionally, a plurality of cameras are arranged on the device, and the shooting directions of the cameras are different; the device also includes:
the starting module is used for starting the plurality of cameras after receiving a video call instruction, and the video call instruction is used for indicating a call with the far-end call equipment;
an obtaining module 2801 is configured to:
acquiring audio information acquired by a plurality of microphones of the device, wherein the audio information acquired by the plurality of microphones is environmental audio information, and the plurality of microphones are arranged at different positions of the device;
the display module 2803 is to:
and displaying the video picture shot by the first camera on the video call interface, not displaying the video picture shot by the second camera, and sending the video picture shot by the first camera to the far-end call equipment for displaying.
Optionally, the apparatus further comprises:
the first generation module is used for generating multi-channel audio data for any one group of cameras in the n groups of cameras according to audio information corresponding to the group of cameras in the shooting process of the n groups of cameras, and performing channel separation on the multi-channel audio data of the group of cameras to obtain the audio data of each camera in the group of cameras, wherein the channels of the audio data of different cameras in the group of cameras are different;
and the second generation module is used for generating a video file corresponding to one camera according to the video data of the video picture shot by the one camera and the audio data of the one camera for any one camera in the n groups of cameras.
Optionally, the apparatus further comprises:
the first generation module is used for generating multi-channel audio data for any one group of cameras in the n groups of cameras according to audio information corresponding to the group of cameras in the shooting process of the n groups of cameras, and performing channel separation on the multi-channel audio data of the group of cameras to obtain the audio data of each camera in the group of cameras, wherein the channels of the audio data of different cameras in the group of cameras are different;
the third generation module is used for generating a video file according to the video data of all the video pictures currently displayed and the audio data of the first camera if the video pictures shot by the first camera exist in all the video pictures currently displayed; and if the video pictures shot by the first camera do not exist in all the video pictures currently displayed, performing audio mixing operation on the audio data of the cameras to which all the video pictures currently displayed belong to obtain mixed audio data, and generating a video file according to the video data and the mixed audio data of all the video pictures currently displayed.
In the embodiment of the application, the video picture shot by the first camera and the video picture shot by the second camera in the multiple cameras are displayed in a differentiated manner, that is, they are displayed in different display modes, so that the video picture shot by the first camera is highlighted. In other words, the video picture captured by the camera whose shooting area contains a human voice sound source is displayed in a display mode different from that of the other video pictures, so that the video picture in which the speaking person appears is distinguished from the other video pictures. Therefore, if the person in a certain video picture speaks, that video picture is displayed differently from the other video pictures, which improves the flexibility of video picture display and helps the user watch the video pictures. Moreover, whichever video picture contains the person who is speaking is highlighted, so the interactive experience of the user can be improved to a certain extent.
It should be noted that: in the video image display apparatus provided in the foregoing embodiment, when displaying a video image, only the division of the above functional modules is illustrated, and in practical applications, the above functions may be distributed by different functional modules according to needs, that is, the internal structure of the apparatus is divided into different functional modules to complete all or part of the above described functions.
Each functional unit and module in the above embodiments may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only used for distinguishing one functional unit from another, and are not used for limiting the protection scope of the embodiments of the present application.
The video frame display apparatus and the video frame display method provided in the above embodiments belong to the same concept, and for the specific working processes of the units and modules and the technical effects brought about in the above embodiments, reference may be made to the method embodiment section, which is not described herein again.
In the above embodiments, the implementation may be wholly or partly realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored on a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that includes one or more available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a digital versatile disc (DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)), among others.
The above description is not intended to limit the present application to the particular embodiments disclosed, but rather, the present application is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present application.

Claims (13)

1. A video picture display method is applied to a terminal, and the method comprises the following steps:
acquiring video pictures shot by each camera in a plurality of cameras and acquiring environment audio information, wherein the environment audio information comprises audio information in the environment where each camera in the plurality of cameras is located;
determining a first camera from the plurality of cameras according to the environmental audio information, wherein a human voice sound source exists in a shooting area of the first camera;
and displaying, in a differentiated manner, the video picture shot by the first camera and the video picture shot by a second camera, wherein the second camera is a camera other than the first camera among the plurality of cameras.
2. The method of claim 1, wherein the plurality of cameras are n groups of cameras, different groups of cameras are disposed on different devices, the same group of cameras is disposed on the same device, and n is a positive integer;
the acquiring of the environmental audio information includes:
acquiring n pieces of audio information, wherein the n pieces of audio information are the environmental audio information, the n pieces of audio information correspond to the n groups of cameras one to one, and each piece of audio information in the n pieces of audio information is the audio information in the environment where the corresponding group of cameras are located.
3. The method of claim 2, wherein determining a first camera from the plurality of cameras based on the environmental audio information comprises:
determining at least one target audio information from the n audio information, wherein the target audio information is audio information with human voice;
and determining the first camera from a group of cameras corresponding to each target audio information in the at least one target audio information.
4. The method of claim 3, wherein the determining the first camera from a set of cameras corresponding to each of the at least one target audio information comprises:
for any target audio information in the at least one target audio information, if a group of cameras corresponding to the target audio information comprises j cameras, positioning a human voice sound source according to the target audio information to obtain the direction of the human voice sound source, wherein j is an integer greater than or equal to 2, and the shooting directions of the j cameras are different;
and determining the first camera from the j cameras according to the direction of the human voice sound source and the shooting direction of each camera in the j cameras.
5. The method of any one of claims 1-4, wherein the displaying, in a differentiated manner, of the video picture shot by the first camera and the video picture shot by the second camera comprises:
displaying the video picture shot by the first camera in an enlarged mode, and displaying the video picture shot by the second camera in a reduced mode; or,

displaying the video picture shot by the first camera and the video picture shot by the second camera in a picture-in-picture mode by taking the video picture shot by the first camera as a main picture of the picture-in-picture mode and the video picture shot by the second camera as a sub picture of the picture-in-picture mode.
6. The method according to any one of claims 1 to 5, wherein the plurality of cameras are all arranged on the terminal, and shooting directions of the plurality of cameras are different;
before the acquiring of the video picture shot by each of the plurality of cameras, the method further comprises:
after receiving a multi-camera shooting and recording instruction, starting the plurality of cameras;
the acquiring of the environmental audio information includes:
and acquiring audio information acquired by a plurality of microphones of the terminal, wherein the audio information acquired by the plurality of microphones is the environment audio information, and the plurality of microphones are arranged at different positions of the terminal.
7. The method according to any one of claims 1 to 5, wherein the plurality of cameras are at least two groups of cameras, one of the at least two groups of cameras is disposed on the terminal, and at least one other group of cameras is disposed on at least one cooperative device in a multi-screen cooperative state with the terminal;
before the acquiring of the video picture shot by each of the plurality of cameras, the method further comprises:

after receiving a collaborative video recording instruction, starting the camera of the terminal, and instructing each cooperative device in the at least one cooperative device to start its own camera;

the acquiring of the video picture shot by each of the plurality of cameras comprises:
acquiring a video picture shot by a camera of the terminal, and receiving the video picture shot by the camera of each cooperative device in the at least one cooperative device;
the acquiring of the environmental audio information includes:
acquiring audio information collected by a microphone of the terminal, and receiving audio information that is collected by a microphone of each cooperative device in the at least one cooperative device and sent by the cooperative device, wherein the audio information collected by the microphone of the terminal and the audio information collected by the microphone of each cooperative device in the at least one cooperative device are the environmental audio information.
8. The method according to any one of claims 1 to 4, wherein the plurality of cameras are all arranged on the terminal, and shooting directions of the plurality of cameras are different;
before the acquiring of the video picture shot by each of the plurality of cameras, the method further comprises:
after a video call instruction is received, starting the plurality of cameras, wherein the video call instruction is used for indicating a call with a far-end call device;
the acquiring of the environmental audio information includes:
acquiring audio information acquired by a plurality of microphones of the terminal, wherein the audio information acquired by the plurality of microphones is the environmental audio information, and the plurality of microphones are arranged at different positions of the terminal;
the displaying, in a differentiated manner, of the video picture shot by the first camera and the video picture shot by the second camera comprises:
and displaying the video picture shot by the first camera on a video call interface, not displaying the video picture shot by the second camera, and sending the video picture shot by the first camera to the far-end call equipment for displaying.
9. The method of any of claims 2-8, wherein the method further comprises:
in the shooting process of the n groups of cameras, generating multi-channel audio data for any one group of cameras in the n groups of cameras according to the audio information corresponding to the group of cameras, and performing channel separation on the multi-channel audio data of the group of cameras to obtain the audio data of each camera in the group of cameras, wherein the channels of the audio data of different cameras in the group of cameras are different;
and for any one of the n groups of cameras, generating a video file corresponding to the camera according to the video data of the video picture shot by the camera and the audio data of the camera.
10. The method of any of claims 2-8, further comprising:
in the shooting process of the n groups of cameras, generating multi-channel audio data for any one group of cameras in the n groups of cameras according to the audio information corresponding to the group of cameras, and performing channel separation on the multi-channel audio data of the group of cameras to obtain the audio data of each camera in the group of cameras, wherein the channels of the audio data of different cameras in the group of cameras are different;
if the video pictures shot by the first camera exist in all the video pictures currently displayed, generating a video file according to the video data of all the video pictures currently displayed and the audio data of the first camera;
and if the video pictures shot by the first camera do not exist in all the video pictures currently displayed, performing audio mixing operation on the audio data of the cameras to which all the video pictures currently displayed belong to obtain mixed audio data, and generating a video file according to the video data of all the video pictures currently displayed and the mixed audio data.
11. A video picture display apparatus, characterized in that the apparatus comprises:
the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring video pictures shot by each camera in a plurality of cameras and acquiring environment audio information, and the environment audio information comprises audio information in the environment where each camera in the plurality of cameras is located;
the determining module is used for determining a first camera from the plurality of cameras according to the environmental audio information, and a human voice sound source exists in a shooting area of the first camera;
and the display module is used for displaying, in a differentiated manner, the video picture shot by the first camera and the video picture shot by the second camera, wherein the second camera is a camera other than the first camera among the multiple cameras.
12. A computer arrangement comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the computer program, when executed by the processor, implementing the method of any one of claims 1-10.
13. A computer-readable storage medium having stored therein instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1-10.
CN202210384109.4A 2022-04-13 2022-04-13 Video picture display method, device, equipment and storage medium Active CN115550559B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210384109.4A CN115550559B (en) 2022-04-13 2022-04-13 Video picture display method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210384109.4A CN115550559B (en) 2022-04-13 2022-04-13 Video picture display method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115550559A true CN115550559A (en) 2022-12-30
CN115550559B CN115550559B (en) 2023-07-25

Family

ID=84724672

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210384109.4A Active CN115550559B (en) 2022-04-13 2022-04-13 Video picture display method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115550559B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117389507A (en) * 2023-12-12 2024-01-12 荣耀终端有限公司 Audio data processing method, electronic device and storage medium

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1416538A (en) * 2001-01-12 2003-05-07 皇家菲利浦电子有限公司 Method and appts. for determining camera movement control criteria
JP2004179971A (en) * 2002-11-27 2004-06-24 Fuji Photo Film Co Ltd Monitor camera
US20070091178A1 (en) * 2005-10-07 2007-04-26 Cotter Tim S Apparatus and method for performing motion capture using a random pattern on capture surfaces
CN102215372A (en) * 2010-04-07 2011-10-12 苹果公司 Remote control operations in a video conference
CN102281425A (en) * 2010-06-11 2011-12-14 华为终端有限公司 Method and device for playing audio of far-end conference participants and remote video conference system
CN102891984A (en) * 2011-07-20 2013-01-23 索尼公司 Transmitting device, receiving system, communication system, transmission method, reception method, and program
CN103595953A (en) * 2013-11-14 2014-02-19 华为技术有限公司 Method and device for controlling video shooting
US20150049163A1 (en) * 2013-03-15 2015-02-19 James Paul Smurro Network system apparatus and method of use adapted for visual neural networking with multi-channel multiplexed streaming medical imagery and packetized clinical informatics
CN107770477A (en) * 2017-11-07 2018-03-06 广东欧珀移动通信有限公司 Video call method, device, terminal and storage medium
US20180196585A1 (en) * 2017-01-10 2018-07-12 Cast Group Of Companies Inc. Systems and Methods for Tracking and Interacting With Zones in 3D Space
CN111669636A (en) * 2020-06-19 2020-09-15 海信视像科技股份有限公司 Audio-video synchronous video recording method and display equipment
CN112995566A (en) * 2019-12-17 2021-06-18 佛山市云米电器科技有限公司 Sound source positioning method based on display equipment, display equipment and storage medium
CN113365012A (en) * 2020-03-06 2021-09-07 华为技术有限公司 Audio processing method and device
CN113727021A (en) * 2021-08-27 2021-11-30 维沃移动通信(杭州)有限公司 Shooting method and device and electronic equipment
WO2022068537A1 (en) * 2020-09-29 2022-04-07 华为技术有限公司 Image processing method and related apparatus

Also Published As

Publication number Publication date
CN115550559B (en) 2023-07-25

Similar Documents

Publication Publication Date Title
CN110109636B (en) Screen projection method, electronic device and system
CN110231905B (en) Screen capturing method and electronic equipment
KR20170091913A (en) Method and apparatus for providing video service
CN111243632A (en) Multimedia resource generation method, device, equipment and storage medium
CN111246300B (en) Method, device and equipment for generating clip template and storage medium
CN112398855B (en) Method and device for transferring application contents across devices and electronic device
CN114040242B (en) Screen projection method, electronic equipment and storage medium
JP7416519B2 (en) Multi-terminal multimedia data communication method and system
CN113726950A (en) Image processing method and electronic equipment
CN114185503B (en) Multi-screen interaction system, method, device and medium
CN115756270B (en) Content sharing method, device and system
WO2024045801A1 (en) Method for screenshotting, and electronic device, medium and program product
CN115550597A (en) Shooting method, system and electronic equipment
CN114697742A (en) Video recording method and electronic equipment
US11870941B2 (en) Audio processing method and electronic device
CN115550559B (en) Video picture display method, device, equipment and storage medium
CN114356195B (en) File transmission method and related equipment
CN114827581A (en) Synchronization delay measuring method, content synchronization method, terminal device, and storage medium
CN112822544A (en) Video material file generation method, video synthesis method, device and medium
CN115242994B (en) Video call system, method and device
CN114780268A (en) Notification message display method and electronic equipment
CN111294509A (en) Video shooting method, device, terminal and storage medium
CN115016871B (en) Multimedia editing method, electronic device and storage medium
WO2024022307A1 (en) Screen mirroring method and electronic device
CN116301541A (en) Method for sharing file, electronic device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant