CN115550559B - Video picture display method, device, equipment and storage medium
- Publication number
- CN115550559B (application CN202210384109.4A)
- Authority
- CN
- China
- Prior art keywords
- camera
- video
- cameras
- terminal
- audio information
- Prior art date
- Legal status
- Active
Classifications
- H04N21/431—Generation of visual interfaces for content selection or interaction; Content or additional data rendering (under H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]; H04N21/40 Client devices; H04N21/43 Processing of content or additional data)
- H04N7/141—Systems for two-way working between two video terminals, e.g. videophone (under H04N7/00 Television systems; H04N7/14 Systems for two-way working)
- H04N7/181—Closed-circuit television [CCTV] systems for receiving images from a plurality of remote sources (under H04N7/18 Closed-circuit television systems, i.e. systems in which the video signal is not broadcast)
Abstract
The application discloses a video picture display method, apparatus, device, and storage medium, belonging to the technical field of display. The method comprises the following steps: acquiring the video picture shot by each of a plurality of cameras and acquiring environmental audio information; then, according to the environmental audio information, determining a first camera from the plurality of cameras, a human voice sound source being present in the shooting area of the first camera; and finally, differentially displaying the video picture shot by the first camera and the video picture shot by the second camera. With this method, the display mode of each video picture can be dynamically adjusted according to the audio information in the environment where each camera is located, so that the video pictures are displayed differentially and whichever video picture contains a speaking person is highlighted. The flexibility of video picture display is thereby improved, and the user can watch the video pictures more conveniently.
Description
Technical Field
The present disclosure relates to the field of display technologies, and in particular, to a method, an apparatus, a device, and a storage medium for displaying a video picture.
Background
With the development of terminal technology, terminals have gradually integrated communication, shooting, video, audio and other functions and have become an indispensable part of people's daily lives. At present, a terminal may be provided with a front camera and a rear camera, and the terminal can use the front camera and the rear camera to implement a dual-camera video recording function. Specifically, when the terminal performs dual-camera video recording, the front camera and the rear camera can be started simultaneously to record video, and the video picture shot by the front camera and the video picture shot by the rear camera can both be displayed in the video recording interface.
Disclosure of Invention
The application provides a video picture display method, a device, equipment and a storage medium, which can improve the flexibility of video picture display. The technical scheme is as follows:
In a first aspect, a video picture display method is provided, which is applied to a terminal. In the method, the video picture shot by each of a plurality of cameras is acquired, and environmental audio information is acquired, where the environmental audio information comprises audio information in the environment where each of the plurality of cameras is located. Then, according to the environmental audio information, a first camera is determined from the plurality of cameras, a human voice sound source being present in the shooting area of the first camera; and the video picture shot by the first camera and the video picture shot by the second camera are differentially displayed, where the second camera is any camera other than the first camera among the plurality of cameras.
In this application, the terminal differentially displays the video picture shot by the first camera and the video picture shot by the second camera, which means that the two video pictures are displayed in different display modes so as to highlight the video picture shot by the first camera. That is, the video picture shot by the camera that has a human voice sound source in its shooting area is displayed in a manner different from the other video pictures, so that the video picture containing the speaking person is distinguished from the other video pictures. Therefore, if a person in a certain video picture is speaking, that video picture and the other video pictures are displayed differentially, which improves the flexibility of video picture display and lets the user watch the video pictures more conveniently. In addition, whichever video picture contains a speaking person is highlighted, so the interaction experience of the user can be improved to a certain extent.
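As an illustration of this first aspect, the following Java sketch outlines the overall flow; the class and interface names (CameraFrame, VoiceLocator, Display) are assumptions introduced for this example only and are not part of the claimed method or of any concrete API.

```java
// Illustrative sketch only: CameraFrame, VoiceLocator and Display are assumed
// abstractions introduced for this example; they are not part of the patent
// claims or of the Android SDK.
import java.util.List;

public class DifferentialRenderer {

    /** One preview picture together with the id of the camera that shot it. */
    public static class CameraFrame {
        final String cameraId;
        final byte[] pixels;
        public CameraFrame(String cameraId, byte[] pixels) {
            this.cameraId = cameraId;
            this.pixels = pixels;
        }
    }

    /** Decides, from the ambient audio of every device, which cameras have a
     *  human voice sound source in their shooting area ("first" cameras). */
    public interface VoiceLocator {
        List<String> firstCameraIds(List<byte[]> ambientAudioPerDevice);
    }

    /** Two display modes: a highlighted mode and an ordinary mode. */
    public interface Display {
        void showHighlighted(CameraFrame frame); // e.g. enlarged, or PiP main picture
        void showSecondary(CameraFrame frame);   // e.g. shrunk, or PiP sub-picture
    }

    public static void render(List<CameraFrame> frames,
                              List<byte[]> ambientAudioPerDevice,
                              VoiceLocator locator,
                              Display display) {
        List<String> first = locator.firstCameraIds(ambientAudioPerDevice);
        for (CameraFrame f : frames) {
            if (first.contains(f.cameraId)) {
                display.showHighlighted(f); // a talking person is in this camera's shooting area
            } else {
                display.showSecondary(f);   // all other ("second") cameras
            }
        }
    }
}
```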
Optionally, the plurality of cameras are n groups of cameras, the cameras of different groups are arranged in different devices, the same group of cameras are arranged in the same device, and n is a positive integer. In this case, the operation of acquiring the environmental audio information may be: acquiring n pieces of audio information, wherein the n pieces of audio information are the environmental audio information, the n pieces of audio information are in one-to-one correspondence with the n groups of cameras, and each piece of audio information in the n pieces of audio information is the audio information in the environment where the corresponding group of cameras are located.
In the application, the terminal can acquire video pictures shot by the camera of each of the n devices, and in this case, the terminal can acquire audio information in the environment where each of the n devices is located, so that it can be determined which device's camera can possibly have a talking person in the video pictures shot by the camera.
Optionally, determining the first camera from the plurality of cameras according to the environmental audio information may be: determining at least one piece of target audio information from the n pieces of audio information, where the target audio information is audio information in which a human voice is present, and then determining the first camera from the group of cameras corresponding to each piece of the at least one piece of target audio information.
The operation of determining the first camera from the group of cameras corresponding to each piece of the at least one piece of target audio information may be: for any one piece of the at least one piece of target audio information, if the group of cameras corresponding to that piece of target audio information comprises j cameras, performing human voice sound source localization according to that piece of target audio information to obtain the direction in which the human voice sound source is located, where j is an integer greater than or equal to 2 and the shooting directions of the j cameras are different; and determining the first camera from the j cameras according to the direction of the human voice sound source and the shooting direction of each of the j cameras.
In this application, the direction of the human voice sound source can be estimated from the target audio information; then, by combining the direction of the human voice sound source with the shooting direction of a camera, it can be analyzed whether the human voice in the camera's environment comes from that camera's shooting direction, and therefore whether a human voice sound source exists in the camera's shooting area, that is, whether the camera is the first camera. In this way, the accuracy of the determined first camera can be improved.
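A minimal sketch of this direction matching is given below, assuming the shooting directions and the estimated voice direction are expressed as azimuth angles and that a fixed angular tolerance decides whether they match; the angle representation, tolerance value and helper names are assumptions of this example, not values fixed by the application.

```java
// Assumed representation: each camera's shooting direction and the estimated
// direction of the human voice sound source are azimuth angles in degrees
// relative to the device; the tolerance below is an illustrative value.
public final class FirstCameraSelector {

    public static class Camera {
        final String id;
        final double shootingAzimuthDeg;
        public Camera(String id, double shootingAzimuthDeg) {
            this.id = id;
            this.shootingAzimuthDeg = shootingAzimuthDeg;
        }
    }

    private static final double TOLERANCE_DEG = 45.0; // assumed matching tolerance

    /**
     * Returns the camera among the j cameras of one device whose shooting direction
     * matches the direction of the human voice sound source, or null if the voice
     * does not come from any camera's shooting area.
     */
    public static Camera selectFirstCamera(Camera[] cameras, double voiceAzimuthDeg) {
        Camera best = null;
        double bestDiff = Double.MAX_VALUE;
        for (Camera c : cameras) {
            // Smallest absolute angular difference, wrapped to [0, 180] degrees.
            double diff = Math.abs(((voiceAzimuthDeg - c.shootingAzimuthDeg) % 360 + 540) % 360 - 180);
            if (diff <= TOLERANCE_DEG && diff < bestDiff) {
                bestDiff = diff;
                best = c;
            }
        }
        return best;
    }
}
```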
Optionally, the operation of differentially displaying the video picture shot by the first camera and the video picture shot by the second camera may be: displaying the video picture shot by the first camera in an enlarged manner and the video picture shot by the second camera in a reduced manner, so that the display area of the video picture shot by the first camera is larger than that of the video picture shot by the second camera and the former is highlighted. Alternatively, the video picture shot by the first camera is used as the main picture of a picture-in-picture mode and the video picture shot by the second camera is used as the sub-picture, and the two video pictures are displayed in the picture-in-picture mode, in which the sub-picture is displayed over a small area of the main picture while the main picture is displayed full screen, so that the main picture is highlighted.
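The following fragment shows one possible picture-in-picture arrangement on Android, assuming the two preview pictures are child Views of a single FrameLayout container; the container structure and the sub-picture size are assumptions of this sketch rather than requirements of the application.

```java
// Minimal picture-in-picture style differential display, assuming the two video
// previews are child Views inside one FrameLayout container (an assumption about
// the UI structure, not dictated by the patent).
import android.view.Gravity;
import android.view.View;
import android.widget.FrameLayout;

public final class DifferentialDisplay {

    /** Shows the first camera's view full screen (main picture) and the second
     *  camera's view as a small sub-picture in a corner. */
    public static void applyPictureInPicture(View firstCameraView, View secondCameraView) {
        // Main picture fills the container.
        firstCameraView.setLayoutParams(new FrameLayout.LayoutParams(
                FrameLayout.LayoutParams.MATCH_PARENT,
                FrameLayout.LayoutParams.MATCH_PARENT));

        // Sub-picture occupies a small area over the main picture (sizes are illustrative).
        FrameLayout.LayoutParams sub = new FrameLayout.LayoutParams(320, 480);
        sub.gravity = Gravity.TOP | Gravity.END;
        secondCameraView.setLayoutParams(sub);
        secondCameraView.bringToFront();
    }
}
```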
Optionally, the plurality of cameras are all arranged at the terminal, and the shooting directions of the plurality of cameras are different. In this case, before the video picture shot by each of the plurality of cameras is acquired, the plurality of cameras may be started after receiving the multi-shot video command. In this case, the operation of acquiring the environmental audio information may be: and acquiring audio information acquired by a plurality of microphones of the terminal, wherein the audio information acquired by the plurality of microphones is the environmental audio information, and the plurality of microphones are arranged at different positions of the terminal.
In this application, when the terminal displays the video picture shot by each of the plurality of cameras in the multi-camera video recording interface during multi-camera video recording, the video picture shot by the first camera among the plurality of cameras can be highlighted; that is, the video picture shot by the camera that has a human voice sound source in its shooting area is displayed in a manner different from the other video pictures, so that the video picture containing the speaking person is distinguished from the other video pictures. In this way, whichever video picture contains a speaking person is highlighted, which improves the flexibility of video picture display, lets the user watch the video pictures more conveniently, and improves the interactive experience of the user to a certain extent.
Optionally, the plurality of cameras are at least two groups of cameras, one group of the at least two groups is disposed in the terminal, and the other at least one group is disposed in at least one cooperative device that is in a multi-screen cooperation state with the terminal. In this case, before the video picture shot by each of the plurality of cameras is acquired, the camera of the terminal may be started after a cooperative video recording instruction is received, and each of the at least one cooperative device may be instructed to start its own camera. The operation of acquiring the video picture shot by each of the plurality of cameras may then be: acquiring the video picture shot by the camera of the terminal, and receiving, from each of the at least one cooperative device, the video picture shot by that cooperative device's camera. The operation of acquiring the environmental audio information may be: acquiring the audio information collected by the microphone of the terminal, and receiving, from each of the at least one cooperative device, the audio information collected by that cooperative device's microphone, where the audio information collected by the microphone of the terminal and the audio information collected by the microphone of each of the at least one cooperative device constitute the environmental audio information.
In this application, when the terminal displays, in the cooperative video recording interface during cooperative video recording, the video picture shot by the camera of the terminal and the video picture shot by the camera of each of the at least one cooperative device, the video picture shot by the first camera among these cameras can be highlighted; that is, the video picture shot by the camera that has a human voice sound source in its shooting area is displayed in a manner different from the other video pictures, so that the video picture containing the speaking person is distinguished from the other video pictures. In this way, whichever video picture contains a speaking person is highlighted, which improves the flexibility of video picture display, lets the user watch the video pictures more conveniently, and improves the interactive experience of the user to a certain extent.
Optionally, the plurality of cameras are all disposed in the terminal, and the shooting directions of the plurality of cameras are different. In this case, before the video picture shot by each of the plurality of cameras is acquired, the plurality of cameras may be started after a video call instruction is received, where the video call instruction is used to instruct the terminal to make a call with a far-end call device. The operation of acquiring the environmental audio information may be: acquiring the audio information collected by a plurality of microphones of the terminal, where the audio information collected by the plurality of microphones is the environmental audio information and the plurality of microphones are disposed at different positions of the terminal. The operation of differentially displaying the video picture shot by the first camera and the video picture shot by the second camera may be: displaying the video picture shot by the first camera, and not the video picture shot by the second camera, on the video call interface, and sending the video picture shot by the first camera to the far-end call device for display.
In this application, the terminal displays the video picture shot by its first camera in its video call interface, does not display the video picture shot by its second camera, and sends the video picture shot by the first camera to the far-end call device for display in that device's video call interface. In other words, among the video pictures shot by the plurality of cameras of the terminal, the one shot by the camera that has a human voice sound source in its shooting area, that is, the one containing the speaking person, is displayed on the video call interfaces of both the terminal and the far-end call device. Therefore, whichever of the terminal's cameras is shooting a speaking person, the video picture shot by that camera is the one displayed on the video call interfaces of the terminal and the far-end call device, which improves the flexibility of video picture display, lets the user watch the video picture more conveniently, and improves the interaction experience of the user to a certain extent.
Further, the terminal can not only adjust the display of the video pictures but also generate a video file corresponding to the video picture shot by each camera. A video file comprises video data and audio data, and the video file format places the video data and the audio data in one file so that they can be played back together, thereby realizing video playback. The operation of the terminal generating the video file corresponding to the video picture shot by each camera may include the following two possible ways.
A first possible way is: in the shooting process of the n groups of cameras, generating multi-channel audio data according to audio information corresponding to one group of cameras for any group of cameras in the n groups of cameras, and carrying out channel separation on the multi-channel audio data of the group of cameras to obtain audio data of each camera in the group of cameras, wherein the channels of the audio data of different cameras in the group of cameras are different; and for any one of the n groups of cameras, generating a video file corresponding to the one camera according to video data of a video picture shot by the one camera and audio data of the one camera. Thus, when the shooting of the n groups of cameras is finished, the terminal can output the video files corresponding to each camera in the n groups of cameras, namely, a plurality of video files.
A second possible way is: in the shooting process of the n groups of cameras, generating multi-channel audio data according to audio information corresponding to one group of cameras for any group of cameras in the n groups of cameras, and carrying out channel separation on the multi-channel audio data of the group of cameras to obtain audio data of each camera in the group of cameras, wherein the channels of the audio data of different cameras in the group of cameras are different; if all the video pictures currently being displayed have the video pictures shot by the first camera, generating a video file according to the video data of all the video pictures currently being displayed and the audio data of the first camera; if the video pictures shot by the first camera do not exist in all the video pictures currently being displayed, audio data of cameras to which all the video pictures currently being displayed belong are subjected to audio mixing operation to obtain mixed audio data, and a video file is generated according to the video data of all the video pictures currently being displayed and the mixed audio data. Thus, when the shooting of the n groups of cameras is finished, the terminal can output a video file. The video data in the video file is the video data of the fusion picture of all video pictures displayed by the terminal, and the audio data in the video file is the audio data of the highlighted video picture or the mixed audio data of all video pictures displayed in common.
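As an illustration of the channel separation step shared by both ways, the sketch below assumes the group's multi-channel audio is interleaved 16-bit PCM with one channel per camera; the sample format is an assumption of this example and is not prescribed by the application.

```java
// Assumed layout: interleaved 16-bit PCM, one channel per camera of the group.
public final class ChannelSeparator {

    /** Extracts the audio data of one camera (one channel) from the group's multi-channel audio. */
    public static short[] extractChannel(short[] interleaved, int channelCount, int channelIndex) {
        short[] out = new short[interleaved.length / channelCount];
        for (int frame = 0; frame < out.length; frame++) {
            out[frame] = interleaved[frame * channelCount + channelIndex];
        }
        return out;
    }
}
```

For the second way, the choice between the highlighted camera's audio and mixed audio could be sketched as follows; AudioMixer and Muxer are assumed helper interfaces introduced only for illustration, not components named by the application.

```java
import java.util.List;

public final class FusedRecordingWriter {

    /** Writes one segment of the single output video file. */
    public interface Muxer { void write(byte[] fusedVideoData, byte[] audioData); }

    /** Mixes the audio data of all currently displayed cameras. */
    public interface AudioMixer { byte[] mix(List<byte[]> perCameraAudio); }

    /**
     * @param fusedVideoData   video data of the fused picture of all displayed video pictures
     * @param highlightedAudio audio data of the first camera, or null if no displayed picture
     *                         currently comes from a first camera
     * @param displayedAudio   audio data of the cameras to which all displayed pictures belong
     */
    public static void writeSegment(byte[] fusedVideoData,
                                    byte[] highlightedAudio,
                                    List<byte[]> displayedAudio,
                                    AudioMixer mixer,
                                    Muxer muxer) {
        if (highlightedAudio != null) {
            muxer.write(fusedVideoData, highlightedAudio);          // use the highlighted picture's audio
        } else {
            muxer.write(fusedVideoData, mixer.mix(displayedAudio)); // no highlighted picture: mix all audio
        }
    }
}
```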
In a second aspect, there is provided a video picture display device having a function of realizing the behavior of the video picture display method in the first aspect described above. The video picture display device comprises at least one module for implementing the video picture display method provided in the first aspect.
In a third aspect, a video picture display device is provided, which includes a processor and a memory in its structure, where the memory is configured to store a program for supporting the video picture display device to execute the video picture display method provided in the first aspect, and store data related to implementing the video picture display method described in the first aspect. The processor is configured to execute a program stored in the memory. The video picture display device may further comprise a communication bus for establishing a connection between the processor and the memory.
In a fourth aspect, there is provided a computer readable storage medium having instructions stored therein, which when run on a computer, cause the computer to perform the video picture display method of the first aspect described above.
In a fifth aspect, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the video picture display method of the first aspect described above.
The technical effects obtained by the second, third, fourth and fifth aspects are similar to the technical effects obtained by the corresponding technical means in the first aspect, and are not described in detail herein.
Drawings
Fig. 1 is a schematic structural diagram of a terminal according to an embodiment of the present application;
Fig. 2 is a block diagram of a software system of a terminal according to an embodiment of the present application;
Fig. 3 is a schematic display diagram of a first video picture according to an embodiment of the present application;
Fig. 4 is a flowchart of a first video picture display method according to an embodiment of the present application;
Fig. 5 is a schematic display diagram of a second video picture according to an embodiment of the present application;
Fig. 6 is a flowchart of a second video picture display method according to an embodiment of the present application;
Fig. 7 is a schematic display diagram of a third video picture according to an embodiment of the present application;
Fig. 8 is a schematic display diagram of a fourth video picture according to an embodiment of the present application;
Fig. 9 is a schematic diagram of a first software system according to an embodiment of the present application;
Fig. 10 is a flowchart of a video picture display process according to an embodiment of the present application;
Fig. 11 is a schematic display diagram of a fifth video picture according to an embodiment of the present application;
Fig. 12 is a flowchart of a third video picture display method according to an embodiment of the present application;
Fig. 13 is a schematic display diagram of a sixth video picture according to an embodiment of the present application;
Fig. 14 is a schematic display diagram of a seventh video picture according to an embodiment of the present application;
Fig. 15 is a schematic diagram of a second software system according to an embodiment of the present application;
Fig. 16 is a schematic display diagram of an eighth video picture according to an embodiment of the present application;
Fig. 17 is a flowchart of a fourth video picture display method according to an embodiment of the present application;
Fig. 18 is a schematic display diagram of a ninth video picture according to an embodiment of the present application;
Fig. 19 is a schematic display diagram of a tenth video picture according to an embodiment of the present application;
Fig. 20 is a flowchart of a fifth video picture display method according to an embodiment of the present application;
Fig. 21 is a schematic display diagram of an eleventh video picture according to an embodiment of the present application;
Fig. 22 is a schematic display diagram of a twelfth video picture according to an embodiment of the present application;
Fig. 23 is a flowchart of a sixth video picture display method according to an embodiment of the present application;
Fig. 24 is a schematic display diagram of a thirteenth video picture according to an embodiment of the present application;
Fig. 25 is a schematic display diagram of a fourteenth video picture according to an embodiment of the present application;
Fig. 26 is a schematic diagram of a third software system according to an embodiment of the present application;
Fig. 27 is a schematic diagram of a video file according to an embodiment of the present application;
Fig. 28 is a schematic structural diagram of a video picture display device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
It should be understood that reference herein to "a plurality" means two or more. In the description of the present application, "/" means "or" unless otherwise indicated; for example, A/B may represent A or B. "And/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist together, or B exists alone. In addition, to clearly describe the technical solutions of the present application, the words "first", "second", and the like are used to distinguish between identical or similar items having substantially the same function and effect. Those skilled in the art will appreciate that the words "first", "second", and the like do not limit the number or order of execution and do not necessarily indicate a difference.
The statements of "one embodiment" or "some embodiments" and the like, described in this application, mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in various places throughout this application are not necessarily all referring to the same embodiment, but mean "one or more, but not all, embodiments" unless expressly specified otherwise. Furthermore, the terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless otherwise specifically noted.
The terminal according to the embodiment of the present application will be described below.
Fig. 1 is a schematic structural diagram of a terminal according to an embodiment of the present application. Referring to fig. 1, the terminal 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a key 190, a motor 191, an indicator 192, a camera 193, a display 194, a subscriber identity module (subscriber identity module, SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It should be understood that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the terminal 100. In other embodiments of the present application, terminal 100 may include more or less components than illustrated, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units, such as: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a memory, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors.
The controller may be a neural hub and a command center of the terminal 100, among others. The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache. The memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from this memory. Repeated accesses are thus avoided, and the waiting time of the processor 110 is reduced, which improves the efficiency of the system.
The charge management module 140 is configured to receive a charge input from a charger. The charger can be a wireless charger or a wired charger. In some wired charging embodiments, the charge management module 140 may receive a charging input of a wired charger through the USB interface 130. In some wireless charging embodiments, the charge management module 140 may receive wireless charging input through a wireless charging coil of the terminal 100. The charging management module 140 may also supply power to the terminal 100 through the power management module 141 while charging the battery 142.
The power management module 141 is used for connecting the battery 142, and the charge management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 to power the processor 110, the internal memory 121, the external memory, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be configured to monitor battery capacity, battery cycle number, battery health (leakage, impedance) and other parameters. In other embodiments, the power management module 141 may also be provided in the processor 110. In other embodiments, the power management module 141 and the charge management module 140 may be disposed in the same device.
The wireless communication function of the terminal 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The mobile communication module 150 may provide a solution including 2G/3G/4G/5G wireless communication applied to the terminal 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA), etc. The mobile communication module 150 may receive electromagnetic waves from the antenna 1, perform processes such as filtering, amplifying, and the like on the received electromagnetic waves, and transmit the processed electromagnetic waves to the modem processor for demodulation. The mobile communication module 150 can amplify the signal modulated by the modem processor, and convert the signal into electromagnetic waves through the antenna 1 to radiate. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be provided in the same device as at least some of the modules of the processor 110.
The wireless communication module 160 may provide solutions for wireless communication including wireless local area network (wireless local area networks, WLAN) (e.g., wireless fidelity (wireless fidelity, wi-Fi) network), bluetooth (BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field wireless communication technology (near field communication, NFC), infrared technology (IR), etc., applied on the terminal 100. The wireless communication module 160 may be one or more devices that integrate at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, modulates the electromagnetic wave signals, filters the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, frequency modulate it, amplify it, and convert it to electromagnetic waves for radiation via the antenna 2.
Terminal 100 implements display functions via a GPU, display 194, and application processor, etc. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The terminal 100 may implement photographing functions through an ISP, a camera 193, a video codec, a GPU, a display 194, an application processor, and the like.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the storage capability of the terminal 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function, for example storing music, video, and other files in the external memory card.
The internal memory 121 may be used to store computer-executable program code that includes instructions. The processor 110 performs various functional applications of the terminal 100 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a storage program area and a storage data area. The storage program area may store an application program (such as a sound playing function, an image playing function, etc.) required for at least one function of the operating system, etc. The storage data area may store data (e.g., audio data, phonebook, etc.) created by the terminal 100 during use, and the like. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (universal flash storage, UFS), and the like.
The terminal 100 may implement audio functions such as music playing, recording, etc. through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, etc.
The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or a portion of the functional modules of the audio module 170 may be disposed in the processor 110.
The SIM card interface 195 is used to connect a SIM card. A SIM card can be brought into contact with or separated from the terminal 100 by being inserted into or withdrawn from the SIM card interface 195. The terminal 100 may support 1 or N SIM card interfaces, N being an integer greater than 1. The SIM card interface 195 may support Nano SIM cards, Micro SIM cards, and the like. Multiple cards may be inserted into the same SIM card interface 195 simultaneously, and the types of the cards may be the same or different. The SIM card interface 195 may also be compatible with different types of SIM cards and with external memory cards. The terminal 100 interacts with the network through the SIM card to implement functions such as calls and data communication. In some embodiments, the terminal 100 uses an eSIM, i.e. an embedded SIM card; the eSIM card may be embedded in the terminal 100 and cannot be separated from the terminal 100.
The software system of the terminal 100 will be described next.
The software system of the terminal 100 may employ a layered architecture, an event driven architecture, a micro-core architecture, a micro-service architecture, or a cloud architecture. In this embodiment, a software system of the terminal 100 is exemplarily described by taking an Android (Android) system with a hierarchical architecture as an example.
Fig. 2 is a block diagram of the software system of the terminal 100 according to an embodiment of the present application. Referring to fig. 2, the layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided, from top to bottom, into an application layer (Application), an application framework layer (Framework), an Android Runtime and system layer, an extension layer, and a kernel layer (Kernel).
The application layer may include a series of application packages. As shown in fig. 2, the application package may include applications such as camera, gallery, calendar, phone call, map, instant messaging, WLAN, multi-screen collaboration, bluetooth, short message, etc.
The application framework layer provides an application programming interface (application programming interface, API) and programming framework for application programs of the application layer. The application framework layer includes a number of predefined functions. As shown in fig. 2, the application Framework layer may include an Audio Framework (Audio Framework), a Camera Framework (Camera Framework), a window manager, a content provider, a view system, a phone manager, a resource manager, a notification manager, and the like.
The audio framework comprises AudioTrack, audioRecord and AudioSystem, audioTrack, and AudioRecord and AudioSystem are all Android application program framework API classes, wherein AudioTrack is responsible for outputting playback data, audioRecord is responsible for collecting recording data, and AudioSystem is responsible for comprehensive management of audio transactions. The camera frame is used for providing a functional interface for upper-layer applications. The window manager is used for managing window programs. The window manager can acquire the size of the display screen, judge whether a status bar exists, lock the screen, intercept the screen and the like. The content provider is used to store and retrieve data, which may include video, images, audio, calls made and received, browsing history and bookmarks, phonebooks, etc., and make such data accessible to the application. The view system includes visual controls, such as controls to display text, controls to display pictures, and the like. The view system may be used to construct a display interface for an application, which may be comprised of one or more views, such as a view that includes displaying a text notification icon, a view that includes displaying text, and a view that includes displaying a picture. The telephony manager is used to provide communication functions of the terminal 100, such as management of call status (including on, off, etc.). The resource manager provides various resources for the application program, such as localization strings, icons, pictures, layout files, video files, and the like. The notification manager allows the application to display notification information in a status bar, can be used to communicate notification type messages, can automatically disappear after a short dwell, and does not require user interaction. For example, a notification manager is used to inform that the download is complete, a message alert, etc. The notification manager may also be a notification that appears in the system top status bar in the form of a chart or a scroll bar text, such as a notification of a background running application. The notification manager may also be a notification that appears on the screen in the form of a dialog window, such as a text message being prompted in a status bar, a notification sound being emitted, the electronic device vibrating, a flashing indicator light, etc.
The Android Runtime includes a core library and a virtual machine, and is responsible for scheduling and management of the Android system. The core library consists of two parts: one part is the function libraries that the Java language needs to call, and the other part is the core libraries of Android. The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files, and performs functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
The system layer may include a plurality of functional modules, such as Audio Services, Camera Services, a surface manager, Media Libraries, a three-dimensional graphics processing library (e.g., OpenGL ES), and a two-dimensional graphics engine (e.g., SGL). The audio services comprise AudioPolicyService and AudioFlinger: AudioPolicyService is the maker of audio policy and is responsible for policy decisions such as audio device switching and volume adjustment, while AudioFlinger is responsible for the management of input and output stream devices and for the processing and transmission of audio stream data. The camera service is used to interact with the camera hardware abstraction layer. The surface manager is used to manage the display subsystem and provides the fusion of 2D and 3D layers for multiple applications. The media libraries support playback and recording of a variety of commonly used audio and video formats as well as still image files, and may support a variety of audio and video encoding formats such as MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG. The three-dimensional graphics processing library is used to implement three-dimensional graphics drawing, image rendering, composition, layer processing, and the like. The two-dimensional graphics engine is a drawing engine for two-dimensional drawing.
The extension layer, which may also be referred to as a hardware abstraction layer (HAL), encapsulates the kernel drivers, provides interfaces upward, and hides the low-level implementation details. The extension layer connects upward to the Android Runtime and framework and downward to the drivers. The extension layer may include an audio hardware abstraction layer (Audio HAL), which is responsible for interaction with the audio hardware devices, and a camera hardware abstraction layer (Camera HAL), which is responsible for interaction with the camera hardware devices.
The kernel layer is a layer between hardware and software. The kernel layer may contain display drivers, camera drivers, audio drivers, sensor drivers, etc.
Before explaining the embodiments of the present application in detail, application scenarios related to the embodiments of the present application are explained.
Currently, as shown in fig. 3, in many photographing scenes, a terminal such as a mobile phone, a tablet computer, a notebook computer, a desktop computer, a management device, etc. may display a video picture 31 photographed by each of a plurality of cameras, that is, the terminal may display a plurality of video pictures 31. Several possible shooting scenarios are described below.
In a first shooting scenario, a cell phone has a front camera and a rear camera. In a double-shot video scene, the mobile phone can start a front camera and a rear camera of the mobile phone, and then can display video pictures shot by the front camera and video pictures shot by the rear camera on a video interface (also called a video preview interface).
In the second shooting scene, the mobile phone and a tablet computer are in a multi-screen cooperation state; the mobile phone has a camera (a front camera and/or a rear camera), and the tablet computer also has a camera (a front camera and/or a rear camera). In a multi-screen cooperative video recording scene, the mobile phone can start its own camera and instruct the tablet computer to start the tablet computer's camera, and the mobile phone can then display, on the video recording interface (also called the video preview interface), the video picture shot by its own camera and the video picture shot by the tablet computer's camera and sent by the tablet computer.
In a third shooting scene, the mobile phone makes a video call with a far-end call device; the mobile phone has a camera (a front camera and/or a rear camera), and the far-end call device also has a camera (a front camera and/or a rear camera). In the video call scene, the mobile phone can start its own camera, the far-end call device can also start its own camera, and the mobile phone can then display, on the video call interface, the video picture shot by its own camera and the video picture shot by the far-end call device's camera and sent by the far-end call device.
In a fourth shooting scene, a notebook computer holds a video conference with at least one other device (which may be referred to as conference devices); the notebook computer has a camera (a front camera and/or a rear camera), and each of the at least one other conference device also has a camera (a front camera and/or a rear camera). In the video conference scene, the notebook computer can start its own camera, each of the at least one other conference device can also start its own camera, and the notebook computer can then display, on the video conference interface, the video picture shot by its own camera and the video picture shot by the camera of each of the at least one other conference device.
In a fifth photographing scene, the management apparatus monitors a plurality of different areas through a plurality of monitoring apparatuses installed in the different areas, each of the plurality of monitoring apparatuses having a camera. In the centralized monitoring scene, the management device may instruct each of the plurality of monitoring devices to start the camera, and then the management device may display, on the monitoring interface, a video picture sent by each of the plurality of monitoring devices and captured by the camera thereof.
In each of the above shooting scenes, the terminal can display a plurality of video pictures shot by a plurality of cameras. In order to improve the display effect of the plurality of video pictures, the embodiments of this application provide a video picture display method that can dynamically adjust the display mode of the plurality of video pictures according to the audio information in the environment where each of the plurality of cameras is located, so as to display the plurality of video pictures differentially and to highlight whichever video picture contains a speaking person. Therefore, the flexibility of video picture display can be improved, the user can watch the video pictures more conveniently, and the interactive experience of the user can be improved to a certain extent.
The following explains in detail the video picture display method provided in the embodiment of the present application.
Fig. 4 is a flowchart of a video frame display method according to an embodiment of the present application. The method is applied to a terminal, which may be the terminal described in the embodiments of fig. 1-2 above. Referring to fig. 4, the method includes the steps of:
step 401: the terminal acquires video pictures shot by each of the plurality of cameras and acquires environmental audio information, wherein the environmental audio information comprises audio information in the environment where each of the plurality of cameras is located.
The plurality of cameras are cameras with which the terminal can acquire video pictures photographed by the terminal. The terminal can display video pictures shot by each camera in the plurality of cameras.
The plurality of cameras can be cameras of the terminal, and the terminal can display video pictures shot by the plurality of cameras. Or, some cameras in the plurality of cameras may be cameras of the terminal, and another part of cameras may be cameras of other devices, so that the terminal can display not only video pictures shot by the cameras of the terminal, but also video pictures shot by the cameras of other devices communicating with the terminal. Alternatively, the plurality of cameras may be cameras of other devices, and the terminal can display video pictures shot by the cameras of the other devices communicating with the terminal.
The audio information in the environment where each camera is located in the plurality of cameras refers to the audio information in the environment where the equipment where each camera is located, and the audio information in the environment where each camera is located can be collected by the equipment where each camera is located. That is, when the plurality of cameras are cameras of the terminal, the environmental audio information may be audio information in the environment where the terminal is located, which is collected by the terminal. When one part of the cameras are cameras of the terminal and the other part of the cameras are cameras of other equipment, the environment audio information can comprise the audio information in the environment where the terminal is located and the audio information in the environment where the terminal is located, which are collected by the other equipment. When the plurality of cameras are all cameras of other devices, the environmental audio information may include audio information collected by the other devices in the environment in which the cameras are located.
Optionally, the plurality of cameras are n groups of cameras, where n is a positive integer. Cameras in different groups are disposed in different devices, and cameras in the same group are disposed in the same device. That is, the n groups of cameras are disposed in n different devices in one-to-one correspondence, each device having one group of cameras. The environmental audio information may include audio information in the environment where each of the n devices is located, that is, the environmental audio information may include n pieces of audio information.
In this case, the operation of step 401 may be: the terminal acquires n pieces of audio information, wherein the n pieces of audio information are the environmental audio information, the n pieces of audio information are in one-to-one correspondence with the n groups of cameras, and each piece of audio information in the n pieces of audio information is the audio information in the environment where the corresponding group of cameras are located.
Specifically, for any one of the n groups of cameras, if the device in which the one group of cameras is located is the terminal, that is, if the one group of cameras is disposed in the terminal, the terminal may collect audio information in the environment in which the terminal is located as one audio information corresponding to the one group of cameras. For example, the terminal may collect audio information through a microphone of the terminal, and use the collected audio information as one audio information corresponding to the group of cameras.
If the device where the group of cameras is located is not the terminal, that is, if the group of cameras is set in other devices, the other devices can collect audio information in the environment where the other devices are located, for example, the other devices can collect audio information through microphones of the other devices. The other device may then send the collected audio information to the terminal. After the terminal receives the audio information sent by the other devices, the received audio information can be used as one audio information corresponding to the group of cameras.
It should be noted that, in the embodiments of this application, the terminal can acquire the video picture shot by the camera of each of the n devices. In this case, the terminal can acquire the audio information in the environment where each of the n devices is located, so as to subsequently determine, according to this audio information, which device's camera may have shot a video picture containing a person who is speaking.
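A small sketch of this acquisition step is given below; MicCapture and DeviceLink are assumed abstractions standing in for the terminal's own microphone and the communication channel to another device, and are not concrete Android APIs.

```java
// Sketch of step 401 for grouped cameras: one piece of ambient audio per device.
import java.util.ArrayList;
import java.util.List;

public final class AmbientAudioCollector {

    public interface MicCapture { byte[] captureLocalAudio(); }                      // terminal's own microphone(s)
    public interface DeviceLink { byte[] receiveAudioFromDevice(String deviceId); }  // audio sent by another device

    /** Collects n audio buffers, one per camera group / device, in group order. */
    public static List<byte[]> collect(List<String> deviceIds, String terminalId,
                                       MicCapture mic, DeviceLink link) {
        List<byte[]> ambient = new ArrayList<>();
        for (String id : deviceIds) {
            if (id.equals(terminalId)) {
                ambient.add(mic.captureLocalAudio());          // group of cameras on the terminal itself
            } else {
                ambient.add(link.receiveAudioFromDevice(id));  // group of cameras on another device
            }
        }
        return ambient;
    }
}
```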
Step 402: and the terminal determines a first camera from the plurality of cameras according to the environmental audio information, wherein a human voice sound source exists in a shooting area of the first camera.
The human sound source refers to a sound source that emits human sound. The presence of a human sound source in the shooting area of the first camera indicates that the shooting object of the first camera is likely to be a person and is speaking, that is, the person who is speaking is likely to appear in the video picture shot by the first camera.
After the terminal obtains the audio information in the environment where each of the plurality of cameras is located, the audio information can be analyzed to determine which cameras of the plurality of cameras have voice sound sources in the shooting areas, namely, determine which cameras of the plurality of cameras shoot video pictures in which people who speak appear.
In the case where the environmental audio information includes the above-mentioned n pieces of audio information, the operation of step 402 may be: the terminal determines at least one target audio information from the n audio information, wherein the target audio information is audio information with voice; the terminal determines a first camera from a group of cameras corresponding to each target audio information in the at least one target audio information.
Therefore, the terminal determines the first camera from the group of cameras corresponding to the target audio information with the voice, namely, the terminal determines the first camera from the cameras of the equipment with the voice in the environment, so that the first camera can be accurately determined.
For any one of the n pieces of audio information, the terminal firstly detects whether the voice exists in the one piece of audio information, if so, the one piece of audio information is determined to be one piece of target audio information, and then a first camera is determined from a group of cameras corresponding to the one piece of target audio information.
It is noted that for any one of the above n groups of cameras, the group of cameras may include only one camera or may include at least two cameras. If the group of cameras includes at least two cameras, the shooting directions of the at least two cameras are different, that is, the shooting directions of the at least two cameras disposed in the same device are different. The terminal may record a shooting direction of a camera that each of the n devices has.
In this case, the operation of the terminal determining the first camera from the group of cameras corresponding to one piece of target audio information may be as follows. If the group of cameras corresponding to that target audio information includes only one camera, the terminal directly determines that camera to be the first camera. Alternatively, if the group includes only one camera, the terminal performs human voice sound source localization on the target audio information to obtain the direction in which the human voice sound source is located; if that direction is the same as the shooting direction of the camera, the camera is determined to be the first camera, and if the directions differ, the camera is determined not to be the first camera. Alternatively, if the group of cameras corresponding to that target audio information includes j cameras, the terminal first performs human voice sound source localization on the target audio information to obtain the direction of the human voice sound source, and then determines the first camera from the j cameras according to that direction and the shooting direction of each of the j cameras, where j is an integer greater than or equal to 2 and the shooting directions of the j cameras are different.
Optionally, the operation of the terminal determining the first camera from the j cameras according to the direction of the human voice sound source and the shooting direction of each of the j cameras may be as follows. The terminal determines that a camera among the j cameras whose shooting direction is the same as the direction of the human voice sound source is the first camera, and that a camera whose shooting direction differs from that direction is not the first camera. Alternatively, the terminal performs sound source separation on the target audio information according to the shooting direction of each of the j cameras to obtain the audio of the sound source in the shooting direction of each camera, that is, j audios are separated out, corresponding one-to-one to the j cameras, each audio being the audio of the sound source in the shooting direction of the corresponding camera. The terminal then determines the sound source energy of each of the j audios and detects whether a human voice is present in each of the j audios to obtain a voice detection result. Next, the terminal determines the voice ratio of the sound source in the shooting direction of each of the j cameras according to the direction of the human voice sound source, the sound source energy of each of the j audios, and the voice detection result. Finally, for any one of the j cameras, if the voice ratio of the sound source in the shooting direction of that camera is greater than or equal to a voice ratio threshold, the terminal determines that camera to be the first camera; if the voice ratio is less than the voice ratio threshold, the terminal determines that camera not to be the first camera. The voice ratio threshold may be set in advance: a voice ratio greater than or equal to the threshold indicates that the sound source is likely a human voice sound source, and a voice ratio less than the threshold indicates that it is likely not.
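By way of illustration only, the per-direction voice-ratio decision described above can be sketched roughly as follows; the separation, energy, voice-detection and voice-ratio routines are placeholders for whatever algorithms are actually used, and the threshold value is an example, not a requirement of this application.
import java.util.ArrayList;
import java.util.List;

class FirstCameraSelector {
    static final double VOICE_RATIO_THRESHOLD = 0.5; // example value, set in advance

    // Returns the indices of the cameras judged to be first cameras among the j cameras of one device.
    List<Integer> selectFirstCameras(short[] targetAudio, double[] cameraDirections) {
        List<Integer> firstCameras = new ArrayList<>();
        int j = cameraDirections.length;
        for (int i = 0; i < j; i++) {
            // Separate out the audio of the sound source in this camera's shooting direction.
            short[] separated = separateByDirection(targetAudio, cameraDirections[i]);
            double energy = sourceEnergy(separated);          // sound source energy
            boolean hasVoice = detectHumanVoice(separated);   // human voice detection result
            double voiceRatio = hasVoice ? voiceRatio(separated, energy) : 0.0;
            if (voiceRatio >= VOICE_RATIO_THRESHOLD) {
                firstCameras.add(i);  // a human voice sound source exists in this shooting area
            }
        }
        return firstCameras;
    }

    // Placeholders only: the concrete algorithms are not specified in this sketch.
    short[] separateByDirection(short[] audio, double direction) { return audio; }
    double sourceEnergy(short[] audio) { return 0.0; }
    boolean detectHumanVoice(short[] audio) { return false; }
    double voiceRatio(short[] audio, double energy) { return 0.0; }
}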
It is worth noting that, in the embodiment of the present application, the direction in which the human voice sound source is located may be analyzed from the target audio information; then, by combining that direction with the shooting direction of a camera, it is analyzed whether the human voice in the environment where the camera is located comes from the camera's shooting direction, so that whether a human voice sound source exists in the shooting area of the camera, that is, whether the camera is the first camera, can be determined. In this way, the accuracy of the determined first camera can be improved.
It should be noted that, in step 402, the process of determining the first camera from the plurality of cameras according to the environmental audio information is a process of performing audio directivity analysis on the environmental audio information to obtain directivity data. Audio directivity analysis refers to analyzing the directivity of the audio coming from the human voice sound source in the environmental audio information to determine whether the direction of the human voice sound source is a camera's shooting direction, thereby obtaining directivity data; the directivity data indicates which camera's shooting direction is the same as the direction of the human voice sound source, that is, which camera is the first camera.
Step 403: the terminal performs differential display on video pictures shot by the first camera and video pictures shot by the second camera, wherein the second camera is other cameras except the first camera in the plurality of cameras.
The terminal performing differential display on the video picture shot by the first camera and the video picture shot by the second camera means displaying the two in different display manners, so as to highlight the video picture shot by the first camera. That is, the video picture captured by the camera that has a human voice sound source in its shooting area is displayed in a manner different from the other video pictures, so that the video picture containing the speaking person is distinguished from the others. Therefore, if a person in a certain video picture is speaking, that video picture is displayed differently from the other video pictures, which improves the flexibility of video picture display and helps the user watch the video pictures better. In addition, whichever video picture contains the person who is speaking is highlighted, so the interaction experience of the user can be improved to a certain extent.
Optionally, the operation of the terminal differentially displaying the video picture shot by the first camera and the video picture shot by the second camera may be as follows. The terminal enlarges the video picture shot by the first camera and shrinks the video picture shot by the second camera; in this case, the display area of the video picture shot by the first camera is larger than that of the video picture shot by the second camera, so the video picture shot by the first camera is highlighted. Alternatively, the terminal uses the video picture shot by the first camera as the main picture of a picture-in-picture mode and the video picture shot by the second camera as the sub-picture, and displays the two in the picture-in-picture mode; in the picture-in-picture mode, while the main picture is displayed full screen, the sub-picture is displayed simultaneously on a small area over the main picture, so the main picture is highlighted. Alternatively, the terminal displays the video picture shot by the first camera and does not display the video picture shot by the second camera; for example, the terminal may display the video picture shot by the first camera full screen, thereby highlighting it. Of course, the terminal may also differentially display the video picture shot by the first camera and the video picture shot by the second camera in other manners, which is not limited in the embodiment of the present application.
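By way of illustration only, the enlarge/shrink style of differential display could be expressed as in the following sketch; the VideoPane class, the applyLayout method and the size ratios are hypothetical placeholders for whatever UI framework and layout policy the terminal actually uses.
class VideoPane {
    void applyLayout(int widthPx, int heightPx) { /* resize this video picture's pane on screen */ }
}

class DifferentialDisplay {
    // Enlarge the picture of the first camera and shrink the pictures of the second cameras,
    // so that the first camera's picture occupies the larger display area.
    void highlightFirstCamera(VideoPane firstCameraPane,
                              java.util.List<VideoPane> secondCameraPanes,
                              int screenWidth, int screenHeight) {
        firstCameraPane.applyLayout(screenWidth, (int) (screenHeight * 0.7));
        int thumbWidth = screenWidth / Math.max(1, secondCameraPanes.size());
        for (VideoPane pane : secondCameraPanes) {
            pane.applyLayout(thumbWidth, (int) (screenHeight * 0.3));
        }
    }
}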
It should be noted that, if the terminal does not determine the first camera from the plurality of cameras according to the environmental audio information in the above step 402, that is, if the first camera does not exist in the plurality of cameras, the terminal does not execute the above step 403, but displays the video image captured by each of the plurality of cameras in the same display manner, or displays the video image captured by each of the plurality of cameras in a different manner according to other standards.
When the terminal differentially displays the video pictures shot by the cameras according to other standards, it may do so according to standards such as the number of people in the video pictures, the priority of the cameras, or the loudness of the environmental audio. For example, the terminal may detect the number of people appearing in the video picture shot by each of the plurality of cameras and highlight the video picture with the largest number of people. Alternatively, the terminal may record the priority of each of the cameras and highlight the video picture shot by the camera with the highest priority. Alternatively, the terminal may determine the piece of audio information with the greatest loudness among the n pieces of audio information and highlight the video picture shot by the group of cameras corresponding to that piece of audio information. The terminal may highlight a given video picture in various manners. For example, the terminal can enlarge that video picture and shrink the other video pictures; or the terminal may use that video picture as the main picture of the picture-in-picture mode, use the other video pictures as sub-pictures, and display them in the picture-in-picture mode; or the terminal may display only that video picture and not the other video pictures. Of course, the terminal may highlight the video picture in other manners, which is not limited in the embodiments of the present application.
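The application lists these fallback standards as alternatives; purely as an illustration, the sketch below combines them in one ordering (people count, then camera priority, then loudness). The ordering, the Candidate fields and the assumption of a non-empty candidate list are illustrative choices, not requirements of this application.
class FallbackHighlight {
    // Each entry describes one camera's video picture.
    static class Candidate {
        int cameraIndex;
        int peopleCount;     // number of people detected in the video picture
        int priority;        // recorded priority of the camera
        double loudness;     // loudness of the corresponding environmental audio
    }

    // Pick the video picture to highlight when no first camera was determined.
    int pickHighlight(java.util.List<Candidate> candidates) {
        Candidate best = candidates.get(0);  // assumes at least one candidate
        for (Candidate c : candidates) {
            if (c.peopleCount > best.peopleCount
                    || (c.peopleCount == best.peopleCount && c.priority > best.priority)
                    || (c.peopleCount == best.peopleCount && c.priority == best.priority
                        && c.loudness > best.loudness)) {
                best = c;
            }
        }
        return best.cameraIndex;
    }
}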
It should be noted that the terminal may continuously perform the steps 401-403 during the process of capturing the video frames by the plurality of cameras. Therefore, the terminal can dynamically adjust the display mode of the video picture shot by each camera according to the real-time audio frequency of the environment where each camera is positioned in the plurality of cameras in the whole shooting process so as to achieve the effect of highlighting the video picture when the person in the video picture speaks.
The following is an exemplary description of possible implementations of the video picture display method described in the embodiment of fig. 4 above in a plurality of shooting scenarios.
First, a first shooting scene will be described:
first kind of shooting scene: multi-shot video scene
In such a shooting scene, the terminal has a plurality of cameras, and shooting directions of the plurality of cameras are different.
The terminal can start a multi-camera video recording function to simultaneously record videos through a plurality of cameras of the terminal, and video pictures shot by each of the cameras are displayed in a multi-camera video recording interface. For example, the terminal may have a front camera and a rear camera, and after the terminal starts the multi-shot video recording function, the terminal starts its front camera and rear camera, and then, as shown in fig. 5, the terminal may display a video frame 521 shot by its front camera and a video frame 522 shot by its rear camera in the multi-shot video recording interface 51.
In the multi-shot video scene, n is 1, that is, the plurality of cameras belong to one group, and the plurality of cameras are all arranged at the terminal, and the shooting directions of the plurality of cameras are different.
Next, a method for displaying video pictures in a multi-shot video scene will be described by the embodiments of fig. 6 to 10 below.
Fig. 6 is a flowchart of a video frame display method according to an embodiment of the present application. Referring to fig. 6, the method includes:
step 601: after receiving the multi-camera video recording instruction, the terminal starts a plurality of cameras of the terminal to acquire video pictures shot by each camera in the cameras.
The multi-shot video recording instruction is used for indicating to perform multi-shot video recording, and the multi-shot video recording refers to simultaneous video recording by using a plurality of cameras of the terminal. The multi-shot video instruction may be triggered by a user, and the user may trigger through operations such as clicking operation, sliding operation, voice operation, gesture operation, somatosensory operation, etc., which is not limited in this embodiment of the present application.
After receiving the multi-shot video recording instruction, the terminal can start a plurality of cameras of the terminal, and after the cameras are started, video pictures can be shot. In this case, as shown in fig. 5, the terminal may display a video picture photographed by each of the plurality of cameras. In addition, the terminal may also continuously acquire the environmental audio information during the process of capturing the video frames by the plurality of cameras, as described in step 602.
Step 602: the terminal acquires audio information acquired by a plurality of microphones of the terminal, wherein the audio information acquired by the microphones is environmental audio information.
In the case where the plurality of cameras are all disposed on the terminal, the environmental audio information includes one piece of audio information, namely the audio information in the environment where the plurality of cameras are located. Therefore, the terminal can collect audio information through its plurality of microphones and use the collected audio information as the environmental audio information; in this case, the plurality of cameras, as one group of cameras, correspond to this environmental audio information. The plurality of microphones may be disposed at different locations of the terminal so as to collect audio information omni-directionally. The terminal can start the plurality of microphones to collect audio information while starting the plurality of cameras to record video. For example, the terminal may have three microphones disposed at the top, the bottom and the back of the terminal, respectively.
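A minimal sketch of capturing environmental audio on an Android terminal with the standard AudioRecord API is shown below; which physical microphones feed the stream is decided by the device's audio pipeline rather than by this code, the parameter values are examples, and the RECORD_AUDIO permission is assumed to have been granted.
import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;

class EnvironmentalAudioCapture {
    private static final int SAMPLE_RATE = 48000;                       // example sample rate
    private static final int CHANNEL = AudioFormat.CHANNEL_IN_STEREO;   // example channel layout
    private static final int ENCODING = AudioFormat.ENCODING_PCM_16BIT;

    // Reads one buffer of environmental audio information from the microphone input.
    short[] captureOneBuffer() {
        int minBufBytes = AudioRecord.getMinBufferSize(SAMPLE_RATE, CHANNEL, ENCODING);
        AudioRecord recorder = new AudioRecord(MediaRecorder.AudioSource.MIC,
                SAMPLE_RATE, CHANNEL, ENCODING, minBufBytes);
        short[] buffer = new short[minBufBytes / 2];
        recorder.startRecording();
        recorder.read(buffer, 0, buffer.length);   // environmental audio information
        recorder.stop();
        recorder.release();
        return buffer;
    }
}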
Step 603: and the terminal determines a first camera from the plurality of cameras according to the environmental audio information, wherein a human voice sound source exists in a shooting area of the first camera.
The human sound source refers to a sound source that emits human sound. The presence of a human sound source in the shooting area of the first camera indicates that the shooting object of the first camera is likely to be a person and is speaking, that is, the person who is speaking is likely to appear in the video picture shot by the first camera.
After the terminal obtains the audio information in the environment where the terminal is located, the audio information can be analyzed to determine which cameras of the plurality of cameras of the terminal have voice sound sources in the shooting areas, namely, determine which cameras of the plurality of cameras shoot video pictures in which people who speak appear.
Specifically, the operation of step 603 may be: the terminal detects whether human voice exists in the environmental audio information, if the human voice exists in the environmental audio information, the terminal determines that the environmental audio information is target audio information, and then determines a first camera from the cameras corresponding to the environmental audio information.
The operation of the terminal to determine the first camera from the plurality of cameras corresponding to the environmental audio information may be: the terminal firstly carries out the localization of the voice sound source according to the environmental audio information to obtain the direction of the voice sound source, and then determines the first camera from the cameras according to the direction of the voice sound source and the shooting direction of each camera in the cameras.
Optionally, the operation of the terminal determining the first camera from the plurality of cameras according to the direction of the human voice sound source and the shooting direction of each of the plurality of cameras may be as follows. The terminal determines that a camera among the plurality of cameras whose shooting direction is the same as the direction of the human voice sound source is the first camera, and that a camera whose shooting direction differs from that direction is not the first camera. Alternatively, the terminal performs sound source separation on the environmental audio information according to the shooting direction of each of the plurality of cameras to obtain the audio of the sound source in the shooting direction of each camera, that is, j audios are separated out, where j is the number of the plurality of cameras, the j audios correspond one-to-one to the plurality of cameras, and each audio is the audio of the sound source in the shooting direction of the corresponding camera. The terminal then determines the sound source energy of each of the j audios and detects whether a human voice is present in each of the j audios to obtain a voice detection result. Next, the terminal determines the voice ratio of the sound source in the shooting direction of each of the plurality of cameras according to the direction of the human voice sound source, the sound source energy of each of the j audios, and the voice detection result. Finally, for any one of the plurality of cameras, if the voice ratio of the sound source in the shooting direction of that camera is greater than or equal to the voice ratio threshold, the terminal determines that camera to be the first camera; if the voice ratio is less than the voice ratio threshold, the terminal determines that camera not to be the first camera.
It is worth noting that, in the embodiment of the present application, the direction in which the human voice sound source is located may be analyzed from the environmental audio information; then, by combining that direction with the shooting direction of a camera, it is analyzed whether the human voice in the environment where the camera is located comes from the camera's shooting direction, so as to determine whether a human voice sound source exists in the shooting area of the camera, that is, whether the camera is the first camera. In this way, the first camera can be determined accurately.
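In the simplest branch described above, the decision reduces to comparing the localized direction of the human voice sound source with each camera's shooting direction. The sketch below illustrates that comparison under the assumption that directions are represented as coarse labels (front/rear); the labels, class names and localization routine are illustrative only.
import java.util.ArrayList;
import java.util.List;

class DirectionMatcher {
    enum Direction { FRONT, REAR }   // example coarse shooting directions of the terminal's cameras

    static class TerminalCamera {
        String cameraId;
        Direction shootingDirection;
        TerminalCamera(String id, Direction d) { cameraId = id; shootingDirection = d; }
    }

    // Cameras whose shooting direction equals the localized voice source direction are first cameras.
    List<TerminalCamera> findFirstCameras(List<TerminalCamera> cameras, Direction voiceSourceDirection) {
        List<TerminalCamera> firstCameras = new ArrayList<>();
        for (TerminalCamera c : cameras) {
            if (c.shootingDirection == voiceSourceDirection) {
                firstCameras.add(c);
            }
        }
        return firstCameras;
    }
}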
In step 603, after determining the first camera from the plurality of cameras according to the environmental audio information, the other cameras except the first camera may be referred to as a second camera.
Step 604: the terminal displays the video picture shot by the first camera and the video picture shot by the second camera in the multi-shot video interface, and compared with the video picture shot by the second camera, the terminal highlights the video picture shot by the first camera.
It should be noted that, in the embodiment of the present application, when the terminal displays the video picture shot by each of the plurality of cameras in the multi-shot video recording interface during multi-shot video recording, the video picture shot by the first camera among the plurality of cameras may be highlighted, that is, the video picture shot by the camera with a human voice sound source in its shooting area is displayed in a display manner different from the other video pictures, so that the video picture containing the speaking person is distinguished from the others. In this way, whichever video picture contains the person who is speaking is highlighted, which improves the flexibility of video picture display, helps the user watch the video pictures better, and improves the interactive experience of the user to a certain extent.
Optionally, the terminal may zoom in and display the video picture shot by the first camera in the multi-shot video interface, and zoom out and display the video picture shot by the second camera; or the terminal takes the video picture shot by the first camera as a main picture of the picture-in-picture mode, takes the video picture shot by the second camera as a sub-picture of the picture-in-picture mode, and displays the video picture shot by the first camera and the video picture shot by the second camera in the picture-in-picture mode in a multi-camera video interface. Of course, the terminal may also highlight the video frame shot by the first camera in other manners, which is not limited in the embodiment of the present application.
For example, the terminal has a front camera and a rear camera, and as shown in fig. 5, the terminal may display a video frame 521 captured by its front camera and a video frame 522 captured by its rear camera in the multi-shot video interface 51. Then, if the terminal determines that the front camera is the first camera, that is, that a voice sound source exists in the shooting area of the front camera, that is, that a person speaking is present in the video frame 521 shot by the front camera, as shown in fig. 7 (a), the terminal performs an enlarged display on the video frame 521 shot by the front camera and a reduced display on the video frame 522 shot by the rear camera in the multi-shot video interface 51. Or, if the terminal determines that the rear camera is the first camera, that is, determines that a voice sound source exists in the shooting area of the rear camera, that is, determines that a speaking person exists in the video frame 522 shot by the rear camera, as shown in (b) of fig. 7, the terminal performs enlarged display on the video frame 522 shot by the rear camera and reduced display on the video frame 521 shot by the front camera in the multi-shot video interface 51. Thus, the effect of magnifying and displaying the video picture 521 shot by the front camera when the person speaks in the video picture 521 shot by the front camera and magnifying and displaying the video picture 522 shot by the rear camera when the person speaks in the video picture 522 shot by the rear camera can be achieved.
For another example, the terminal has a front camera and a rear camera, and as shown in fig. 5, the terminal may display a video frame 521 captured by its front camera and a video frame 522 captured by its rear camera in the multi-shot video interface 51. Then, if the terminal determines that the front camera is the first camera, that is, that a voice sound source exists in the shooting area of the front camera, that is, that a person speaking is present in the video frame 521 shot by the front camera, as shown in fig. 8 (a), the terminal uses the video frame 521 shot by the front camera as a main frame in a picture-in-picture mode, uses the video frame 522 shot by the rear camera as a sub-frame in a picture-in-picture mode, and displays the video frame 521 shot by the front camera and the video frame 522 shot by the rear camera in a picture-in-picture mode in the multi-shot video interface 51. Or, if the terminal determines that the rear camera is the first camera, that is, determines that a voice sound source exists in the shooting area of the rear camera, that is, determines that a person speaking is present in the video frame 522 shot by the rear camera, as shown in fig. 8 (b), the terminal uses the video frame 522 shot by the rear camera as a main frame in a picture-in-picture mode, uses the video frame 521 shot by the front camera as a sub-frame in the picture-in-picture mode, and displays the video frames 522 shot by the rear camera and the video frame 521 shot by the front camera in a picture-in-picture mode in the multi-shot video interface 51. In this way, the effect can be achieved that when a person in the video frame 521 photographed by the front camera speaks, the video frame 521 photographed by the front camera is displayed as a main frame, and when a person in the video frame 522 photographed by the rear camera speaks, the video frame 522 photographed by the rear camera is displayed as a main frame.
It should be noted that, if the terminal does not determine the first camera from the plurality of cameras according to the environmental audio information in the above step 603, that is, if the first camera does not exist in the plurality of cameras, the terminal does not execute the above step 604, but displays the video picture shot by each of the plurality of cameras on the multi-camera video interface in the same display manner, or displays the video picture shot by each of the plurality of cameras on the multi-camera video interface in a different manner according to other standards.
When the terminal differentially displays the video pictures shot by each of the cameras on the multi-shot video interface according to other standards, the terminal can differentially display the video pictures shot by each of the cameras on the multi-shot video interface according to the standards of the number of people in the video pictures, the priority of the cameras and the like. For example, the terminal may detect the number of people appearing in the video frames photographed by each of the plurality of cameras, and then highlight the video frame having the largest number of people appearing on the multi-shot video interface. Or, the terminal may record the priority of each of the plurality of cameras, and highlight the video picture shot by the camera with the highest priority on the multi-shot video interface. The terminal may highlight a video frame on the multi-shot video interface in various ways. For example, the terminal can enlarge and display the video picture on a multi-shot video interface and reduce and display other video pictures; alternatively, the terminal may use the video picture as a main picture in a pip mode, use other video pictures as sub-pictures in a pip mode, and display the video picture and other video pictures in a pip mode at a multi-shot video interface. Of course, the terminal may also highlight this video frame on the multi-shot interface in other ways, which is not limited by the embodiments of the present application.
It should be noted that the terminal may continuously perform the steps 602-604 during the process of capturing the video frames by the plurality of cameras. Therefore, the terminal can dynamically adjust the display mode of the video picture shot by each camera in the plurality of cameras according to the real-time audio frequency in the environment of the terminal in the whole multi-camera video recording process so as to achieve the effect of highlighting the video picture when the person in the video picture speaks.
In order to facilitate understanding of the above-described multi-shot video scene, the following will exemplify the video picture display method described in the above-described embodiment of fig. 6, in conjunction with the software system shown in fig. 9 and the flowchart of the video picture display process shown in fig. 10.
Referring to fig. 9, the software system of the terminal may include a Camera application in the application layer, an Audio framework (not shown) and a Camera framework (not shown) in the application framework layer, an Audio service (not shown) and a Camera service (not shown) in the system layer, and an Audio HAL and a Camera HAL in the extension layer. The Audio framework includes AudioRecord, which is responsible for collecting recording data. The Camera framework includes MediaCodec (not shown in the figure), a class for encoding and decoding audio and video that implements encoding and decoding by accessing the underlying codec. The Audio service includes AudioFlinger; AudioFlinger is the executor of audio policies and is responsible for managing input and output stream devices and for processing and transmitting audio stream data. The Audio HAL includes AudioInputStream, which is used to obtain an audio input stream.
In addition, referring to fig. 9, the terminal further has a plurality of cameras that can capture video pictures and a plurality of microphones (mic) that can collect audio information. For example, the terminal may have two cameras, one of which is a front camera disposed on the front side of the terminal and the other of which is a rear camera disposed on the back side of the terminal. For example, the terminal may have three microphones, one of which may be disposed at the top of the terminal and another one of which may be disposed at the bottom of the terminal, and yet another one of which may be disposed at the back of the terminal, and in some embodiments, the three microphones may be microphones for implementing an AudioZoom function.
Referring to fig. 10, the video picture display process may include the following steps 1001-1008.
Step 1001: after the Camera application program is started, the MediaCodec is instructed to create an encoder instance, the AudioRecord is instructed to create an Audio instance, and meanwhile the Camera HAL is instructed to start a plurality of cameras of the terminal.
The encoder instance is used to encode the data stream acquired by the camera. The Audio instance is used for collecting recording data.
After the camera application program is started, the terminal displays a camera interface. At this time, the camera application program prepares for video recording, i.e., creates an encoder instance, creates an Audio instance, and starts up a plurality of cameras of the terminal. After the cameras are started, video pictures can be shot, and the camera application program can acquire the video pictures shot by each camera in the cameras.
Step 1002: after the camera interface receives the click operation of the multi-shot video button, the camera application program instructs MediaCodec to start the encoder instance and instructs AudioRecord to start the Audio instance.
After clicking the multi-shot video button on the camera interface, the user triggers a multi-shot video instruction. At this time, the camera application program can start the created encoder instance and Audio instance to record video through each of the plurality of cameras of the terminal. In this case, the camera interface is the above-described multi-shot video interface.
Step 1003: after the AudioRecord starts the Audio instance, the audiohal is instructed to acquire the Audio information.
Alternatively, the AudioRecord may call the audiohal through an AudioFlinger to instruct the audiohal to acquire Audio information.
Step 1004: the Audio HAL receives Audio information collected by a plurality of microphones of the terminal as environmental Audio information.
Optionally, AudioInputStream in the Audio HAL may receive the audio information collected by the plurality of microphones of the terminal; the audio information collected by the plurality of microphones is the audio information in the environment where the terminal is located, that is, the environmental audio information.
Step 1005: and the Audio HAL performs Audio directivity analysis according to the environmental Audio information to obtain directivity data.
The audio directivity analysis refers to analyzing the directivity of the audio coming from the human voice sound source in the environmental audio information to determine whether the direction of the human voice sound source is a camera's shooting direction, thereby obtaining directivity data; the directivity data indicates which camera's shooting direction is the same as the direction of the human voice sound source, namely, which camera is the first camera.
The Audio HAL performs Audio directivity analysis according to the environmental Audio information to obtain directivity data, namely, the Audio HAL determines a first camera from a plurality of cameras of the terminal according to the environmental Audio information. The operation of determining the first camera by the Audio HAL from the plurality of cameras of the terminal according to the environmental Audio information is the same as the operation of determining the first camera by the terminal from the plurality of cameras according to the environmental Audio information in the above step 603, which is not described in detail in this embodiment of the present application.
It is noted that the Audio HAL includes an Audio directivity analysis algorithm for implementing the Audio directivity analysis of the environmental Audio information in step 1005. In addition, the Audio HAL may further include a recording algorithm, so as to generate corresponding Audio data for the video frames shot by each of the plurality of cameras according to the environmental Audio information, so as to output corresponding video files subsequently.
Step 1006: the Audio HAL transmits the directivity data to the Camera HAL.
Optionally, after the Camera HAL starts the plurality of cameras of the terminal in step 1001, it may register a callback function with the Audio HAL. Thus, after the Audio HAL obtains the directivity data, it can call the callback function to transfer the directivity data to the Camera HAL.
Step 1007: the Camera HAL transmits the directivity data to the Camera application.
Alternatively, the Camera HAL may report the directivity data to the Camera application through a Meta report path.
The Meta report path refers to reporting data using a camera_metadata data structure. The camera_metadata data structure may enable parameter transfer between the Camera HAL and the Camera application.
For example, in the case that the terminal has a front Camera and a rear Camera, TAG information in the camera_metadata data structure may be as follows:
public static final CaptureRequest.Key<Byte> AUIDO_DIRECTION_ =
        KeyGenerator.generateCaptureRequestKey("metadata.auidodirection", byte.class);
Here, a Byte value of 0 indicates that only the front camera is recognized as the first camera; a value of 1 indicates that only the rear camera is recognized as the first camera; and a value of 2 indicates either that both the front camera and the rear camera are recognized as the first camera, or that neither of them is recognized as the first camera.
Step 1008: the camera application program dynamically adjusts a plurality of video pictures displayed in the camera interface according to the directivity data.
The camera application displays the video picture shot by each of the plurality of cameras in the camera interface, and compared with other video pictures, the video picture shot by the first camera indicated by the directivity data is highlighted, and the specific highlighting mode can refer to the related description in the step 604.
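Purely as an illustration, the camera application's handling of the reported directivity byte could look like the sketch below; the handler class, the UI helper methods and the way they adjust the camera interface are hypothetical and stand in for whatever display logic the application actually uses.
class DirectivityHandler {
    static final byte FRONT_IS_FIRST = 0;     // only the front camera is the first camera
    static final byte REAR_IS_FIRST = 1;      // only the rear camera is the first camera
    static final byte BOTH_OR_NEITHER = 2;    // both, or neither, recognized as the first camera

    void onDirectivityData(byte value) {
        switch (value) {
            case FRONT_IS_FIRST:
                highlightFrontPicture();   // e.g. enlarge it, or use it as the picture-in-picture main picture
                break;
            case REAR_IS_FIRST:
                highlightRearPicture();
                break;
            case BOTH_OR_NEITHER:
            default:
                showDefaultLayout();       // same display manner, or fall back to other standards
                break;
        }
    }

    void highlightFrontPicture() { /* adjust the camera interface layout */ }
    void highlightRearPicture() { /* adjust the camera interface layout */ }
    void showDefaultLayout() { /* adjust the camera interface layout */ }
}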
It is noted that steps 1004-1008 described above may be performed continuously during the multi-shot video recording process. In this way, the Audio HAL can acquire the environmental audio information in real time and perform audio directivity analysis to obtain directivity data, and the camera application can dynamically adjust the plurality of video pictures displayed in the camera interface according to the directivity data, so that, throughout the multi-shot video recording process, whichever video picture contains the person who is speaking is highlighted. This improves the flexibility of video picture display, helps the user watch the video pictures better, and improves the interactive experience of the user to a certain extent.
Next, the second shooting scene will be described:
Second shooting scene: Collaborative video scene
In this shooting scenario, the terminal and at least one other device (which may be referred to as a cooperative device) are in a multi-screen cooperative state, and the terminal and the at least one cooperative device both have cameras, and the terminal can shoot video pictures by means of the cameras of each of the at least one cooperative device.
The terminal can start a collaborative video recording function so as to shoot video pictures through the camera of the terminal and the camera of each collaborative device in the at least one collaborative device at the same time, and then the video pictures shot by the camera of the terminal and the video pictures shot by the camera of each collaborative device in the at least one collaborative device are displayed in a collaborative video recording interface. Illustratively, the terminal and a cooperating device each have a camera, and after the terminal initiates the cooperating video recording function, the terminal initiates the own camera and the camera of the cooperating device, and thereafter, as shown in fig. 11, the terminal 1101 may display a video frame 1121 captured by the own camera and a video frame 1122 captured by the camera of the cooperating device 1102 in the cooperating video recording interface 111.
In the collaborative video scene, n is an integer greater than or equal to 2, that is, at least two groups of cameras exist, one group of cameras is arranged at the terminal, and the other at least one group of cameras are arranged at least one device (i.e., collaborative device) in a multi-screen collaborative state with the terminal one by one.
Next, a method for displaying video pictures in a collaborative video scene will be described by the embodiments of fig. 12 to 15.
Fig. 12 is a flowchart of a video frame display method according to an embodiment of the present application. Referring to fig. 12, the method includes:
step 1201: after receiving the collaborative video command, the terminal starts a camera of the terminal, and instructs each collaborative device in at least one collaborative device in a multi-screen collaborative state with the terminal to start the camera of the terminal.
The collaborative video recording instruction is used for indicating to perform collaborative video recording, and collaborative video recording refers to using a camera of the terminal and a camera of collaborative equipment in a multi-screen collaborative state with the terminal to perform video recording simultaneously. The collaborative video instruction may be triggered by a user, and the user may trigger through operations such as clicking operation, sliding operation, voice operation, gesture operation, somatosensory operation, etc., which is not limited in this embodiment of the present application.
After receiving the collaborative video command, the terminal can start the camera of the terminal and instruct each collaborative device in the at least one collaborative device to start the cameras of the terminal, and after the cameras are started, the video picture can be shot. After each of the at least one cooperative device starts the own camera, the video picture shot by the own camera can be sent to the terminal.
Step 1202: the terminal acquires a video picture shot by a camera of the terminal, and receives the video picture shot by the camera of the terminal sent by each cooperative device in the at least one cooperative device.
In this case, as shown in FIG. 11, the terminal 1101 may display a video frame 1121 captured by its own camera and a video frame 1122 captured by the camera of the collaborative device 1102.
In addition, the terminal may also continuously obtain the environmental audio information during the process of capturing the video image by the camera of the terminal and the camera of each of the at least one cooperative device, which is described in step 1203.
Step 1203: the terminal acquires audio information collected by a microphone of the terminal, receives the audio information collected by a microphone of the terminal and transmitted by each cooperative device in the at least one cooperative device, and the audio information collected by the microphone of the terminal and the audio information collected by the microphone of each cooperative device in the at least one cooperative device are environmental audio information.
In this case, the environmental audio information includes at least two audio information, wherein one audio information is audio information in an environment where the terminal is located collected by a microphone of the terminal, and the other at least one audio information is audio information in an environment where the microphone of each of the at least one cooperative device is located. The audio information collected by the microphone of the terminal corresponds to the camera of the terminal, and the audio information collected by the microphone of a certain cooperative device corresponds to the camera of the cooperative device.
The terminal can start the own microphone to collect audio information while starting the own camera to record video. Similarly, each of the at least one cooperative device may start its own microphone to collect audio information while starting its own camera to record video, so that when a video picture captured by its own camera is sent to the terminal, the audio information collected by its own microphone may also be sent to the terminal.
Step 1204: and the terminal determines a first camera from the camera of the terminal and the camera of each cooperative device in the at least one cooperative device according to the environmental audio information, and a voice sound source exists in a shooting area of the first camera.
The human sound source refers to a sound source that emits human sound. The presence of a human sound source in the shooting area of the first camera indicates that the shooting object of the first camera is likely to be a person and is speaking, that is, the person who is speaking is likely to appear in the video picture shot by the first camera.
After the terminal obtains the audio information in the environment where the terminal is located and the audio information in the environment where each of the at least one cooperative device is located, the audio information can be analyzed to determine which cameras in the cameras of the terminal and each of the at least one cooperative device have voice sound sources in the shooting areas, i.e. determine which cameras shoot video pictures in which people who are speaking appear.
The operation in step 1204 is similar to the operation of determining the first camera from the plurality of cameras according to the environmental audio information in step 402, which is not described in detail in the embodiment of the present application.
In step 1204, after determining the first camera from the camera of the terminal and the camera of each of the at least one cooperative device according to the environmental audio information, the camera of the terminal and the other cameras except the first camera in the cameras of each of the at least one cooperative device may be referred to as the second camera.
Step 1205: the terminal displays the video picture shot by the first camera and the video picture shot by the second camera in the collaborative video interface, and compared with the video picture shot by the second camera, the terminal highlights the video picture shot by the first camera.
It should be noted that, in this embodiment of the present application, when the terminal displays, in the collaborative video recording interface during collaborative video recording, the video pictures captured by its own camera and by the camera of each of the at least one collaborative device, the video picture captured by the first camera among these cameras may be highlighted, that is, the video picture captured by the camera with a human voice sound source in its shooting area is displayed in a display manner different from the other video pictures, so that the video picture containing the speaking person is distinguished from the others. In this way, whichever video picture contains the person who is speaking is highlighted, which improves the flexibility of video picture display, helps the user watch the video pictures better, and improves the interactive experience of the user to a certain extent.
Optionally, the terminal may zoom in and display the video picture shot by the first camera in the collaborative video interface, and zoom out and display the video picture shot by the second camera; or, the terminal may use the video picture shot by the first camera as a main picture of the picture-in-picture mode, and use the video picture shot by the second camera as a sub-picture of the picture-in-picture mode, and display the video picture shot by the first camera and the video picture shot by the second camera in the picture-in-picture mode in the collaborative video interface.
For example, as shown in FIG. 11, the terminal 1101 and the collaborative device 1102 each have a camera, and the terminal 1101 may display a video frame 1121 captured by its own camera and a video frame 1122 captured by the camera of the collaborative device 1102 in the collaborative video interface 111. Then, if the terminal 1101 determines that the own camera is the first camera, that is, that a voice sound source exists in the shooting area of the own camera, that is, that a person speaking is present in the video frame 1121 shot by the own camera, as shown in fig. 13 (a), the terminal 1101 performs an enlarged display on the video frame 1121 shot by the own camera and a reduced display on the video frame 1122 shot by the camera of the collaborative device 1102 in the collaborative video interface 111. Alternatively, if the terminal 1101 determines that the camera of the collaborative device 1102 is the first camera, that is, that a human sound source exists in a photographing area of the camera of the collaborative device 1102, that is, that a person speaking is present in a video frame 1122 photographed by the camera of the collaborative device 1102, the terminal 1101 enlarges and reduces the video frame 1122 photographed by the camera of the collaborative device 1102 and the video frame 1121 photographed by the camera of the terminal 1101 in the collaborative video interface 111 as shown in fig. 13 (b). Thus, the effect of magnifying the video frame 1121 captured by the camera of the terminal 1101 when the person in the video frame 1121 captured by the camera of the terminal 1101 is speaking and magnifying the video frame 1122 captured by the camera of the co-device 1102 when the person in the video frame 1122 captured by the camera of the co-device 1102 is speaking can be achieved.
As another example, as shown in FIG. 11, the terminal 1101 and the collaborative device 1102 each have a camera, and the terminal 1101 may display a video frame 1121 captured by its own camera and a video frame 1122 captured by the camera of the collaborative device 1102 in the collaborative video interface 111. Then, if the terminal 1101 determines that the own camera is the first camera, that is, that a voice sound source exists in the shooting area of the own camera, that is, that a person speaking is present in the video frame 1121 shot by the own camera, as shown in fig. 14 (a), the terminal 1101 displays the video frame 1121 shot by the own camera and the video frame 1122 shot by the camera of the cooperative apparatus 1102 in the co-video interface 111 in the picture-in-picture mode with the video frame 1121 shot by the own camera as the main frame of the picture-in-picture mode and the video frame 1122 shot by the camera of the cooperative apparatus 1102 as the sub-frame of the picture-in-picture mode. Alternatively, if the terminal 1101 determines that the camera of the collaborative device 1102 is the first camera, that is, that a human sound source exists in a photographing area of the camera of the collaborative device 1102, that is, that a person speaking is present in a video frame 1122 photographed by the camera of the collaborative device 1102, as shown in fig. 14 (b), the terminal 1101 displays the video frame 1122 photographed by the camera of the collaborative device 1102 and the video frame 1121 photographed by the camera of the terminal 1101 in a picture-in-picture mode in the collaborative video interface 111 with the video frame 1122 photographed by the camera of the collaborative device 1102 as a main frame in the picture-in-picture mode and the video frame 1121 photographed by the camera of the terminal 1101 as a sub-frame in the picture-in-picture mode. Thus, the effect can be achieved that when a person in the video frame 1121 captured by the camera of the terminal 1101 speaks, the video frame 1121 captured by the camera of the terminal 1101 is displayed as a main frame, and when a person in the video frame 1122 captured by the camera of the cooperative apparatus 1102 speaks, the video frame 1122 captured by the camera of the cooperative apparatus 1102 is displayed as a main frame.
It should be noted that, if the terminal does not determine the first camera from the camera of the terminal and the camera of each of the at least one cooperative device according to the environmental audio information in the step 1204, that is, if the first camera does not exist in the cameras, the terminal does not execute the step 1205, but displays, on the collaborative video interface, the video frames captured by the camera of the terminal and the camera of each of the at least one cooperative device in the same display manner, or displays, on the collaborative video interface, the video frames captured by the camera of the terminal and the camera of each of the at least one cooperative device in a different manner according to other standards.
When the terminal performs differential display on video pictures shot by the camera of the terminal and the camera of each cooperative device in the at least one cooperative device on the cooperative video interface according to other standards, the terminal can perform differential display on video pictures shot by the camera of the terminal and the camera of each cooperative device in the at least one cooperative device on the cooperative video interface according to the standards of the number of people in the video pictures, the priority of the camera, the loudness of the environmental audio and the like. For example, the terminal may detect the number of people appearing in the video frames captured by the camera of the terminal and the camera of each of the at least one cooperative device, and then highlight the video frame with the largest number of people appearing in the cooperative video interface. Or, the priority of the camera of the terminal and the priority of the camera of each cooperative device in the at least one cooperative device may be recorded in the terminal, and the video picture shot by the camera with the highest priority is highlighted on the cooperative video interface. Or, the terminal may determine one audio information with the largest loudness from the audio information corresponding to the camera of the terminal (i.e. the audio information collected by the microphone of the terminal) and the audio information corresponding to the camera of each of the at least one cooperative device (i.e. the audio information collected by the microphone of each cooperative device), and highlight the video picture captured by the camera corresponding to the audio information on the cooperative video interface. The terminal may highlight a video frame on the collaborative video interface in a plurality of ways. For example, the terminal can enlarge and display the video picture on the collaborative video interface and reduce and display other video pictures; alternatively, the terminal may use the video picture as a main picture in a pip mode, use other video pictures as sub-pictures in a pip mode, and display the video picture and other video pictures in a pip mode at a collaborative video interface. Of course, the terminal may also highlight this video frame in the collaborative video interface in other ways, which embodiments of the present application are not limited in this regard.
Another point to be noted is that the terminal may continue to perform steps 1203-1205 described above during the capturing of the video frames by the camera of the terminal and the camera of each of the at least one cooperating device. Therefore, the terminal can dynamically adjust the display modes of the video pictures shot by the camera of the terminal and the camera of each of the at least one cooperative device according to the real-time audio frequency in the environment where the terminal and each of the at least one cooperative device are positioned in the whole cooperative video recording process, so as to achieve the effect of highlighting the video pictures when people in the video pictures speak.
In order to facilitate understanding of the collaborative video scene, the method for displaying a video picture described in the embodiment of fig. 12 is illustrated below in conjunction with the software system shown in fig. 15.
Referring to fig. 15, the software system of the terminal may include a camera application and a multi-screen collaborative application (not shown) in the application layer, an Audio framework in the application framework layer, an Audio service in the system layer, and an Audio HAL in the extension layer. The software system of the collaborative device may include an Audio framework in the application framework layer, an Audio service in the system layer, and an Audio HAL in the extension layer. The Audio framework includes AudioRecord, and the Audio service includes AudioFlinger. In addition, the terminal and the collaborative device each have a camera (not shown in the figure) that can capture video pictures and a microphone that can collect audio information.
In this case, the video picture display process may include the following steps (1) - (5).
(1) After the camera application program in the terminal is started, the camera of the terminal is started. And after detecting that the camera interface is opened, the multi-screen cooperative application program in the terminal instructs the cooperative device to start the camera of the cooperative device.
After the camera application program in the terminal is started, the terminal displays a camera interface. At this time, the camera application program in the terminal prepares for video recording, i.e. starts the camera of the terminal. After the camera of the terminal is started, a video picture can be shot, and a camera application program in the terminal can acquire the video picture shot by the camera of the terminal.
After the camera of the cooperative device is started, the video picture shot by the camera of the cooperative device can be sent to the terminal, so that a camera application program in the terminal can acquire the video picture shot by the camera of the cooperative device.
(2) After the camera application program in the terminal receives clicking operation on the collaborative video button at the camera interface, the camera application program instructs an audio framework in the terminal to acquire audio information in the environment where the terminal is located. After detecting clicking operation on a collaborative video button, a multi-screen collaborative application program in the terminal instructs the collaborative device to acquire audio information in an environment where the collaborative device is located.
After clicking the collaborative video button on the camera interface, the user triggers a collaborative video command. In this case, the camera interface is the collaborative video interface described above.
Optionally, the AudioRecord in the audio framework in the terminal may call the Audio HAL in the terminal through the AudioFlinger in the terminal, so as to instruct the Audio HAL in the terminal to obtain the audio information in the environment where the terminal is located. The Audio HAL in the terminal can receive the audio information collected by the microphone of the terminal, and then transfer the audio information to the AudioRecord in the terminal through the AudioFlinger in the terminal.
Optionally, the AudioRecord in the audio framework in the cooperative device may call the Audio HAL in the cooperative device through the AudioFlinger in the cooperative device, so as to instruct the Audio HAL in the cooperative device to acquire the audio information in the environment where the cooperative device is located. The Audio HAL in the cooperative device may receive the audio information collected by the microphone of the cooperative device, then transfer the audio information to the AudioRecord in the cooperative device through the AudioFlinger in the cooperative device, and the AudioRecord in the cooperative device then transfers the audio information to the audio framework in the terminal.
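The capture path described above ultimately relies on an AudioRecord-style recording on each device. A minimal sketch of such a capture, assuming the standard android.media.AudioRecord API (the application itself does not prescribe concrete code, and the transfer of the captured buffer to the terminal is omitted), might look like:

```java
// Minimal sketch of microphone capture on the terminal or a cooperative device,
// assuming the standard android.media.AudioRecord API; requires the RECORD_AUDIO
// permission. Buffer handling is simplified for illustration.
import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;

public final class MicCapture {
    private static final int SAMPLE_RATE = 48000;

    public static short[] captureOneBuffer() {
        int minBuf = AudioRecord.getMinBufferSize(
                SAMPLE_RATE,
                AudioFormat.CHANNEL_IN_STEREO,
                AudioFormat.ENCODING_PCM_16BIT);
        AudioRecord record = new AudioRecord(
                MediaRecorder.AudioSource.MIC,
                SAMPLE_RATE,
                AudioFormat.CHANNEL_IN_STEREO,
                AudioFormat.ENCODING_PCM_16BIT,
                minBuf);
        short[] buffer = new short[minBuf];
        record.startRecording();
        int read = record.read(buffer, 0, buffer.length);  // blocks until samples are available
        record.stop();
        record.release();
        return read > 0 ? buffer : new short[0];
    }
}
```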
The audio information in the environment where the terminal is located (i.e., the audio information collected by the microphone of the terminal) and the audio information in the environment where the cooperative device is located (i.e., the audio information collected by the microphone of the cooperative device) are environmental audio information. In this case, the environmental audio information includes two audio information, one of which corresponds to the camera of the terminal and the other of which corresponds to the camera of the cooperative apparatus.
(3) The audio framework in the terminal performs audio directivity analysis according to the environmental audio information to obtain directivity data.
Audio directivity analysis refers to analyzing the directivity of the audio coming from the human voice sound source in the environmental audio information, so as to determine whether the direction of the human voice sound source is the shooting direction of a camera. The resulting directivity data indicates which camera's shooting direction is the same as the direction of the human voice sound source, that is, which camera is the first camera.
Optionally, the audio framework in the terminal includes an audio directivity analysis algorithm to implement the audio directivity analysis of the environmental audio information in step (3). The audio directivity analysis algorithm may, for example, be executed by AudioPolicy in the audio framework in the terminal. AudioPolicy is the formulator of audio policies and is responsible for policy selection such as audio device switching and volume adjustment policies.
The audio framework in the terminal performs audio directivity analysis according to the environmental audio information to obtain directivity data, namely, the audio framework in the terminal determines a first camera from the cameras of the terminal and the cameras of the cooperative equipment according to the environmental audio information. The operation of determining, by the audio framework in the terminal, the first camera from the camera of the terminal and the cameras of the cooperative devices according to the environmental audio information is similar to the operation of determining, by the terminal in the above step 1204, the first camera from the camera of the terminal and the camera of each cooperative device in the at least one cooperative device according to the environmental audio information, which is not described in detail in this embodiment of the present application.
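The application does not disclose the concrete directivity algorithm. As a rough illustration only, the following sketch assumes that each camera is associated with one audio buffer and that a simple energy score is used as a stand-in for detecting a human voice sound source; the real AudioPolicy-based analysis would be more elaborate:

```java
// Toy sketch of directivity analysis: the camera whose associated audio buffer
// carries the highest energy above a threshold is reported as the first camera.
// Associating one buffer per camera and using average amplitude as a voice
// indicator are simplifying assumptions, not the patented algorithm.
import java.util.Map;

final class DirectivityAnalyzer {
    /** Crude voice-presence score: average absolute amplitude of the buffer. */
    private static double voiceScore(short[] pcm) {
        long sum = 0;
        for (short s : pcm) sum += Math.abs(s);
        return pcm.length == 0 ? 0 : (double) sum / pcm.length;
    }

    /**
     * @param audioPerCamera one audio buffer per camera, keyed by camera id
     * @param threshold      minimum score treated as "a person is speaking"
     * @return the camera id reported as the first camera, or null if none qualifies
     */
    static String analyze(Map<String, short[]> audioPerCamera, double threshold) {
        String best = null;
        double bestScore = threshold;
        for (Map.Entry<String, short[]> e : audioPerCamera.entrySet()) {
            double score = voiceScore(e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;   // encodes the "directivity data" passed to the camera application
    }
}
```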
(4) The audio framework in the terminal sends the directivity data to the camera application in the terminal.
Optionally, the audio framework in the terminal may also send the environmental audio information to the camera application in the terminal. The environmental audio information here includes two pieces of audio information, each of which may carry the device identifier (Device ID) of the device it belongs to; that is, one piece carries the device identifier of the terminal and the other carries the device identifier of the cooperative device. The camera application in the terminal can thus generate, from the two pieces of audio information, the corresponding audio data for the video pictures shot by the camera of the terminal and by the camera of the cooperative device respectively, and output the corresponding video files accordingly. For example, an audio processing program (AudioHandler) in the camera application in the terminal may generate, according to the two pieces of audio information, the corresponding audio data for the video pictures shot by the camera of the terminal and by the camera of the cooperative device, respectively.
(5) And the camera application program in the terminal dynamically adjusts a plurality of video pictures displayed in the camera interface according to the directivity data.
The camera application in the terminal displays the video pictures shot by the camera of the terminal and by the camera of the cooperative device in the camera interface, and highlights the video picture shot by the first camera indicated by the directivity data relative to the other video pictures. The specific highlighting manner may refer to the related description in step 1205, which is not repeated in the embodiments of the present application.
It should be noted that, during the collaborative video recording process, the above steps (2)-(5) may be performed continuously. In this way, the audio framework in the terminal can acquire the environmental audio information in real time and perform audio directivity analysis to obtain directivity data, and the camera application can dynamically adjust the plurality of video pictures displayed in the camera interface according to the directivity data, so as to achieve, throughout the collaborative video recording process, the effect of highlighting whichever video picture contains a person who is speaking. This not only improves the flexibility of video picture display and makes it easier for users to view the video pictures, but also improves the interactive experience of users to a certain extent.
The following describes a third shooting scenario:
third shooting scene: video call scene
In this shooting scenario, the terminal has a plurality of cameras, and the terminal is in a video call state with another device (which may be referred to as a far-end call device).
And in the process of carrying out video call between the terminal and the far-end call equipment, each camera in the plurality of cameras of the terminal shoots a video picture. The terminal can display the video picture shot by one camera of the cameras of the terminal in the video call interface of the terminal, and can display the received video picture sent by the far-end call equipment in the video call interface of the terminal. Meanwhile, the terminal can send the video picture shot by one camera of the terminal displayed in the video call interface of the terminal to the far-end call equipment so that the far-end call equipment can display the video call interface of the far-end call equipment.
For example, as shown in fig. 16, the terminal 1601 may have a front camera and a rear camera, and the terminal 1601 performs a video call with the far-end call device 1602. During the video call, the front camera and the rear camera of the terminal 1601 each shoot a video picture, and the terminal 1601 displays, in its video call interface 161, the video picture 1631 shot by its front camera and the received video picture 1641 sent by the far-end call device 1602. Meanwhile, the terminal 1601 sends the video picture 1631 shot by its front camera, which is displayed in the video call interface 161, to the far-end call device 1602, so that the far-end call device 1602 displays it in its video call interface 162.
In the video call scene, n is 1, that is, the plurality of cameras belong to a group, and the plurality of cameras are all arranged at the terminal, and the shooting directions of the plurality of cameras are different.
Next, a method of displaying a video picture in a video call scene will be described by the embodiments of fig. 17 to 18 below.
Fig. 17 is a flowchart of a video frame display method according to an embodiment of the present application. Referring to fig. 17, the method includes:
step 1701: after receiving the video call instruction, the terminal starts a plurality of cameras of the terminal to acquire video pictures shot by each camera in the cameras.
The video call instruction is used for indicating to carry out video call with the far-end call equipment. The video call instruction may be triggered by a user, and the user may trigger through operations such as clicking operation, sliding operation, voice operation, gesture operation, somatosensory operation, etc., which is not limited in this embodiment of the present application.
After receiving the video call instruction, the terminal establishes video call connection with the far-end call equipment, at the moment, the terminal can start a plurality of cameras of the terminal, and after the cameras are started, video pictures can be shot. In this case, the terminal may also continuously acquire the environmental audio information during the process of capturing the video frames by the plurality of cameras, as described in step 1702.
Step 1702: the terminal acquires audio information acquired by a plurality of microphones of the terminal, wherein the audio information acquired by the microphones is environmental audio information.
In the case where the plurality of cameras are all disposed on the terminal, the environmental audio information includes one piece of audio information, namely the audio information in the environment where the plurality of cameras are located. Therefore, the terminal may collect audio information through a plurality of microphones of the terminal and use the audio information collected by the plurality of microphones as the environmental audio information; in this case, the plurality of cameras, as one group of cameras, correspond to this environmental audio information. The plurality of microphones may be disposed at different positions of the terminal so as to collect audio information omnidirectionally. The terminal may start the plurality of microphones to collect audio information while starting the plurality of cameras for the video call. For example, the terminal may have three microphones, which may be disposed at the top, the bottom and the back of the terminal, respectively.
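As an illustration of the multi-microphone setup, the built-in microphones of the terminal could be enumerated with the standard AudioManager API before capture is started; the sketch below is an assumption about the platform API, not part of the claimed method:

```java
// Illustrative sketch: enumerate the terminal's built-in microphones before starting
// multi-microphone capture, assuming the standard android.media.AudioManager API.
// How the terminal in this application actually routes each microphone is not specified.
import android.content.Context;
import android.media.AudioDeviceInfo;
import android.media.AudioManager;
import java.util.ArrayList;
import java.util.List;

public final class MicEnumerator {
    public static List<AudioDeviceInfo> builtInMics(Context context) {
        AudioManager am = (AudioManager) context.getSystemService(Context.AUDIO_SERVICE);
        List<AudioDeviceInfo> mics = new ArrayList<>();
        for (AudioDeviceInfo device : am.getDevices(AudioManager.GET_DEVICES_INPUTS)) {
            if (device.getType() == AudioDeviceInfo.TYPE_BUILTIN_MIC) {
                mics.add(device);   // e.g. the top, bottom and back microphones
            }
        }
        return mics;
    }
}
```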
Step 1703: and the terminal determines a first camera from the plurality of cameras according to the environmental audio information, wherein a human voice sound source exists in a shooting area of the first camera.
The human sound source refers to a sound source that emits human sound. The presence of a human sound source in the shooting area of the first camera indicates that the shooting object of the first camera is likely to be a person and is speaking, that is, the person who is speaking is likely to appear in the video picture shot by the first camera.
After the terminal obtains the audio information in the environment where the terminal is located, the audio information can be analyzed to determine which cameras of the plurality of cameras of the terminal have voice sound sources in the shooting areas, namely, determine which cameras of the plurality of cameras shoot video pictures in which people who speak appear.
The operation of step 1703 is similar to the operation of determining the first camera from the plurality of cameras according to the environmental audio information in the above step 603, which is not described herein.
In step 1703, after determining the first camera from the plurality of cameras according to the environmental audio information, the other cameras except the first camera in the plurality of cameras may be referred to as the second camera.
Step 1704: the terminal displays the video picture shot by the first camera and the video picture shot by the second camera in the video call interface of the terminal, and displays the received video picture sent by the far-end call equipment in the video call interface of the terminal. And the terminal sends the video picture shot by the first camera to the far-end call equipment so that the far-end call equipment can display the video call interface of the far-end call equipment.
In the process of carrying out video call with the far-end call equipment, the terminal needs to send the video picture shot by the camera of the terminal to the far-end call equipment so as to display the video call interface of the far-end call equipment. And the far-end call equipment can send the video picture shot by the far-end call equipment to the terminal so as to display the video call interface of the terminal. That is, the video call interface of the terminal displays not only the video picture shot by the camera of the terminal, but also the video picture shot by the remote call device, and the video picture shot by the camera of the terminal displayed in the video call interface of the terminal is also synchronously sent to the video call interface of the remote call device for display.
In this case, the terminal displays the video picture shot by the first camera of the terminal in the video call interface of the terminal, but does not display the video picture shot by the second camera of the terminal, and sends the video picture shot by the first camera to the video call interface of the far-end call device for display. In other words, in the embodiment of the present application, the video frames captured by the cameras having the voice sound source in the capturing area in the plurality of cameras of the terminal are displayed on the video call interfaces of the terminal and the far-end call device, that is, the video frames having the speaking person in the video frames captured by the plurality of cameras of the terminal are displayed on the video call interfaces of the terminal and the far-end call device. Therefore, the effect that when people in the video pictures shot by which camera in the terminal speak, the video pictures shot by which camera are displayed on the video call interfaces of the terminal and the far-end call equipment can be achieved, the flexibility of video picture display can be improved, users can watch the video pictures better, and the interaction experience of the users can be improved to a certain extent.
For example, as shown in fig. 18, the terminal 1601 has a front camera and a rear camera, and the terminal 1601 performs a video call with the far-end call device 1602. During the video call, the terminal 1601 starts its own front camera and rear camera. Then, if the terminal 1601 determines that the front camera is the first camera, that is, determines that a human voice sound source exists in the shooting area of the front camera, that is, determines that a person who is speaking appears in the video picture shot by the front camera, as shown in fig. 18 (a), the terminal 1601 displays the video picture 1621 shot by the front camera and the received video picture 1631 sent by the far-end call device 1602 in its video call interface 161, and the terminal 1601 sends the video picture 1621 shot by the front camera to the far-end call device 1602 for display in the video call interface 162 of the far-end call device 1602. At this time, the terminal 1601 and the far-end call device 1602 each display the video picture shot by the front camera of the terminal 1601, and neither displays the video picture shot by the rear camera of the terminal 1601. As the video call proceeds, if the terminal 1601 determines that the rear camera is the first camera, that is, determines that a human voice sound source exists in the shooting area of the rear camera, that is, determines that a person who is speaking appears in the video picture shot by the rear camera, as shown in fig. 18 (b), the terminal 1601 displays the video picture 1622 shot by the rear camera and the received video picture 1631 sent by the far-end call device 1602 in the video call interface 161, and the terminal 1601 sends the video picture 1622 shot by the rear camera to the far-end call device 1602 for display in the video call interface 162 of the far-end call device 1602. At this time, the terminal 1601 and the far-end call device 1602 switch from displaying the video picture 1621 shot by the front camera of the terminal 1601 to displaying the video picture 1622 shot by the rear camera of the terminal 1601. In this way, the following effect is achieved during the video call: when a person in the video picture 1621 shot by the front camera of the terminal 1601 speaks, the video picture 1621 is displayed in the video call interfaces of the terminal 1601 and the far-end call device 1602; and when a person in the video picture 1622 shot by the rear camera of the terminal 1601 speaks, the video picture 1622 is displayed in the video call interfaces of the terminal 1601 and the far-end call device 1602.
It should be noted that this video call mode can produce an effect similar to a multiparty video call. For example, several friends gathered in the same place may use one mobile phone to conduct a video call with another friend at a remote location. In this case, the mobile phone starts its front camera and rear camera, sends the video picture shot by the front camera to the video call interface of the far-end call device used by the remote friend for display by default, and at the same time displays, on its own video call interface, the video picture shot by its front camera and the video picture sent by the far-end call device. Assume that one of the friends holds the mobile phone and is in the shooting area of its front camera, so that the front camera shoots a video picture containing the image of this friend, while the remote friend holds the far-end call device and shoots a video picture containing his or her own image. The video call interfaces of the mobile phone and of the far-end call device then both display the video picture containing the image of this friend and the video picture containing the image of the remote friend. If another friend in the group is in the shooting area of the rear camera of the mobile phone and starts speaking, the mobile phone recognizes that a human voice sound source exists in the shooting area of its rear camera, and can therefore determine that the rear camera is the first camera. The mobile phone then switches its video call interface to display the video picture, shot by the rear camera, containing the image of this other friend, and sends this video picture to the video call interface of the far-end call device for display. At this time, the video call interfaces of the mobile phone and of the far-end call device both display the video picture containing the image of the other friend and the video picture containing the image of the remote friend. In this way, during the video call, a video call between two friends at the same place and the remote friend is realized, forming an effect similar to a multiparty video call.
It should be noted that, if the terminal does not determine a first camera from the plurality of cameras according to the environmental audio information in step 1703 above, that is, if no first camera exists among the plurality of cameras, the terminal does not perform step 1704. Instead, the terminal displays, on its video call interface, the video picture shot by a pre-designated camera (i.e. the camera displayed by default) among the plurality of cameras, or selects one camera from the plurality of cameras according to other criteria and displays the video picture shot by the selected camera on its video call interface. In this case, the video call interface of the terminal does not display the video pictures shot by the other cameras, the video call interface of the terminal still displays the received video picture sent by the far-end call device, and the terminal sends the video picture shot by that camera to the far-end call device for display in the video call interface of the far-end call device. When selecting one camera from the plurality of cameras according to other criteria, the terminal may do so according to criteria such as the number of people appearing in the video pictures or the priority of the cameras, as sketched below. For example, the terminal may detect the number of people appearing in the video picture shot by each of the plurality of cameras and then select the camera whose video picture contains the largest number of people. Alternatively, the terminal may record the priority of each of the plurality of cameras and select the camera with the highest priority.
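A hedged sketch of the priority-based fallback mentioned above, with the priority table assumed to be maintained by the terminal, might be:

```java
// Hypothetical fallback selection when no camera has a human voice sound source in
// its shooting area: prefer a recorded per-camera priority, otherwise fall back to
// a default camera id. The priority table is an illustrative assumption.
import java.util.List;
import java.util.Map;

final class FallbackCameraSelector {
    static String select(List<String> cameraIds,
                         Map<String, Integer> priorityByCamera,
                         String defaultCameraId) {
        String best = defaultCameraId;
        int bestPriority = Integer.MIN_VALUE;
        for (String id : cameraIds) {
            Integer p = priorityByCamera.get(id);
            if (p != null && p > bestPriority) {
                bestPriority = p;
                best = id;
            }
        }
        return best;   // its video picture is shown and sent to the far-end call device
    }
}
```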
It should be noted that the terminal may continuously perform the steps 1702-1704 during the process of capturing the video frames by the plurality of cameras. Therefore, the terminal can dynamically adjust the video picture displayed in the video call interface according to the real-time audio in the environment where the terminal is positioned in the whole video call process, so as to achieve the effect of displaying the video picture shot by the camera on the video call interface when the person in the video picture shot by the camera in the terminal speaks.
A fourth shooting scenario is described below:
fourth shooting scene: video conference scene
In such a shooting scenario, the terminal and at least one other device each have a camera, and the terminal and the at least one device (which may be referred to as a conferencing device) are in a video conference state.
In the process of video conference between the terminal and the at least one conference device, the camera of the terminal and the camera of each conference device in the at least one conference device all shoot video pictures. The terminal may display video pictures taken by the camera of the terminal and the camera of each of the at least one conferencing device in a video conferencing interface of the terminal.
Illustratively, as shown in fig. 19, the terminal 1901 and three meeting devices 1902 each have a camera. The terminal 1901 and the three meeting devices 1902 are in a video conference. The terminal 1901 displays a video picture 1921 captured by its own camera in the video conference interface 1911, and displays a received video picture 1922 captured by its camera transmitted by each of the three conference devices 1902. That is, the terminal 1901 displays four video pictures on the video conference interface 1911, namely, a video picture 1921 captured by the camera of the terminal itself and a video picture 1922 captured by the camera of each of the three conference devices 1902.
In the video conference scene, n is an integer greater than or equal to 2, that is, the cameras are divided into a plurality of groups, one group of cameras in the plurality of groups of cameras is arranged at the terminal, and other groups of cameras are arranged at each conference device for performing video conference with the terminal.
Next, a video picture display method in a video conference scene will be described by the embodiments of fig. 20 to 21 below.
Fig. 20 is a flowchart of a video frame display method according to an embodiment of the present application. Referring to fig. 20, the method includes:
Step 2001: and after receiving the video conference instruction, the terminal starts the camera of the terminal.
The video conference instruction is used for indicating the terminal to conduct video conference with other conference devices. The video conference instruction may be triggered by a user, and the user may trigger through operations such as clicking operation, sliding operation, voice operation, gesture operation, somatosensory operation, etc., which is not limited in the embodiment of the present application.
After receiving the video conference instruction, the terminal establishes video call connection with each of at least one conference device. At this time, the terminal can start the camera of the terminal, and the camera of the terminal can shoot video pictures after being started. Each of the at least one conference device can start a camera of the conference device, shoot video pictures through the camera of the conference device, and can send the video pictures shot by the camera of the conference device to the terminal.
Step 2002: the terminal acquires a video picture shot by a camera of the terminal, and receives the video picture shot by the camera of each conference device in the at least one conference device.
In this case, as shown in fig. 19, the terminal 1901 may display, in the video conference interface 1911, a video picture 1921 captured by a camera of the terminal 1901, and may also display a received video picture 1922 captured by a camera of each of the at least one conference devices 1902 transmitted by the conference device 1902.
In addition, the terminal may also continuously acquire the environmental audio information during the videoconference, as described in step 2003.
Step 2003: the terminal acquires the audio information collected by the microphone of the terminal, and receives the audio information that each of the at least one conference device collects with its microphone and sends to the terminal. The audio information collected by the microphone of the terminal and the audio information collected by the microphone of each conference device together constitute the environmental audio information.
In this case, the environmental audio information includes a plurality of audio information, wherein one audio information is audio information in an environment in which the terminal is located collected by a microphone of the terminal, and the other at least one audio information is audio information in an environment in which each microphone of the at least one conference device is located collected by a microphone of the at least one conference device. The audio information collected by the microphone of the terminal corresponds to the camera of the terminal. The audio information collected by the microphone of a certain conferencing device corresponds to the camera of this conferencing device.
The terminal can start the camera of the terminal to carry out video conference and can start the microphone of the terminal to collect audio information. Similarly, when a certain conference device starts a camera of the conference device to perform a video conference, a microphone of the conference device can be started to collect audio information, so that the conference device can send a video picture shot by the camera of the conference device to the terminal and simultaneously send the audio information collected by the microphone of the conference device to the terminal.
Step 2004: and the terminal determines a first camera from the camera of the terminal and the camera of each of the at least one conference device according to the environmental audio information, and a human voice sound source exists in a shooting area of the first camera.
The human sound source refers to a sound source that emits human sound. The presence of a human sound source in the shooting area of the first camera indicates that the shooting object of the first camera is likely to be a person and is speaking, that is, the person who is speaking is likely to appear in the video picture shot by the first camera.
After the terminal obtains the audio information in the environment where the terminal is located and the audio information in the environment where each of the at least one conference device is located, the plurality of audio information can be analyzed to determine which cameras in the cameras of the terminal and each of the at least one conference device have voice sound sources in the shooting areas, i.e., determine which cameras in the plurality of cameras shoot video pictures in which people who are speaking appear.
The operation in step 2004 is similar to the operation of determining the first camera from the plurality of cameras according to the environmental audio information in step 402, which is not described in detail in the embodiment of the present application.
After determining the first camera from the camera of the terminal and the camera of each of the at least one conference device according to the environmental audio information in the above step 2004, the camera of the terminal and the other cameras except the first camera in the camera of each of the at least one conference device may be referred to as the second camera.
Step 2005: the terminal displays the video picture shot by the first camera and the video picture shot by the second camera in the video conference interface, and compared with the video picture shot by the second camera, the terminal highlights the video picture shot by the first camera.
It should be noted that, in the embodiment of the present application, when the terminal displays, in the video conference interface, the video frames captured by the camera of the terminal and the camera of each of the at least one conference device, the video frames captured by the first camera of the at least one conference device may be highlighted, that is, the video frames captured by the camera having the voice source in the capturing area may be displayed in a display manner different from other video frames, so as to distinguish the video frame in which the person speaking is located from other video frames. Therefore, the effect of highlighting which video picture can be achieved when the person in which video picture speaks can be achieved, so that the flexibility of video picture display can be improved, a user can pay attention to the person speaking in the video conference better, and the interaction experience of the user can be improved to a certain extent.
Optionally, the terminal may zoom in and display the video frame shot by the first camera in the video conference interface, and zoom out and display the video frame shot by the second camera; or the terminal can use the video picture shot by the first camera as a main picture of the picture-in-picture mode, use the video picture shot by the second camera as a sub-picture of the picture-in-picture mode, and display the video picture shot by the first camera and the video picture shot by the second camera in the picture-in-picture mode in the video conference interface. Of course, the terminal may also highlight the video frame shot by the first camera in other manners, which is not limited in the embodiment of the present application.
For example, as shown in fig. 19, the terminal 1901 and each of the at least one conference devices 1902 have a camera, and the terminal 1901 may display a video frame 1921 captured by its camera and a video frame 1922 captured by the camera of each of the at least one conference devices 1902 in the video conference interface 1911. Then, if the terminal 1901 determines that the own camera is the first camera, that is, determines that a voice sound source exists in the shooting area of the own camera, that is, determines that a speaking person exists in the video frame 1921 shot by the own camera, as shown in fig. 21 (a), the terminal 1901 performs enlarged display on the video frame 1921 shot by the own camera and performs reduced display on the video frame 1922 shot by the camera of each of the at least one conference devices 1902 in the video conference interface 1911. As the video conference proceeds, if the terminal 1901 determines that the camera of a certain conference device 1902 in the at least one conference device 1902 is the first camera, that is, determines that a sound source exists in a shooting area of the camera of the conference device 1902, that is, determines that a speaking person exists in a video picture 1922 shot by the camera of the conference device 1902, as shown in (b) of fig. 21, the terminal 1901 performs zoom-in display on the video picture 1922 shot by the camera of the conference device 1902 in the video conference interface 1911, and performs zoom-out display on the video picture 1922 shot by the camera of other conference devices 1902 except for the conference device 1902 in the at least one conference device 1902 in the video conference interface 1911, and performs zoom-out display on the video picture 1921 shot by the camera of the terminal 1901. In this way, when a person in the video screen 1921 captured by the camera of the terminal 1901 speaks, the video screen 1921 captured by the camera of the terminal 1901 is displayed in an enlarged manner, and when a person in the video screen 1922 captured by the camera of any one of the at least one conference devices 1902 speaks, the video screen 1922 captured by the camera of the conference device 1902 is displayed in an enlarged manner.
It should be noted that, if the terminal does not determine the first camera from the cameras of the terminal and the cameras of each of the at least one conference device according to the environmental audio information in the above step 2004, that is, if the first camera does not exist in the plurality of cameras, the terminal does not perform the above step 2005, but displays the video frames captured by the cameras of the terminal and each of the at least one conference device in the same display manner on the video conference interface, or displays the video frames captured by the cameras of the terminal and each of the at least one conference device differently on the video conference interface according to other standards.
When the terminal differentially displays, on the video conference interface, the video pictures shot by the cameras of the terminal and of the at least one conference device according to other criteria, it may do so according to criteria such as the number of people appearing in the video pictures, the priority of the cameras, or the loudness of the environmental audio. For example, the terminal may detect the number of people appearing in the video pictures shot by the cameras of the terminal and of the at least one conference device, and then highlight the video picture with the largest number of people on the video conference interface. Alternatively, the priorities of the cameras of the terminal and of the at least one conference device may be recorded in the terminal, and the video picture shot by the camera with the highest priority is highlighted on the video conference interface. Alternatively, the terminal may determine the audio information with the largest loudness from the audio information corresponding to the camera of the terminal (i.e. the audio information collected by the microphone of the terminal) and the audio information corresponding to the camera of each conference device (i.e. the audio information collected by the microphone of each conference device), and highlight the video picture shot by the camera corresponding to that audio information on the video conference interface. The terminal may highlight a video picture on the video conference interface in a plurality of ways. For example, the terminal may enlarge that video picture on the video conference interface and reduce the other video pictures; alternatively, the terminal may use that video picture as the main picture of a picture-in-picture mode, use the other video pictures as sub-pictures of the picture-in-picture mode, and display them in the picture-in-picture mode on the video conference interface. Of course, the terminal may also highlight that video picture on the video conference interface in other ways, which is not limited in the embodiments of the present application.
Another point to be noted is that the terminal may continue to perform the above-mentioned steps 2003-2005 during the capturing of video pictures by the camera of the terminal and the camera of each of the at least one conferencing device. In this way, the terminal can dynamically adjust the display mode of the video pictures shot by the cameras of the terminal and each of the at least one conference equipment according to the real-time audio frequency in the environment where the terminal and each of the at least one conference equipment are located in the whole video conference process, so as to achieve the effect of highlighting which video picture when a person in which video picture speaks.
A fifth shooting scenario is described below:
fifth shooting scene: centralized monitoring scene
In such a shooting scene, the terminal communicates with a plurality of monitoring devices as a management device. Each of the plurality of monitoring devices has a camera. The photographing area of the camera of each of the plurality of monitoring devices is different, that is, the plurality of monitoring devices may be installed in different areas for realizing the monitoring of the different areas. The terminal may or may not have a camera, which is not limited in this embodiment of the present application. In the case where the terminal has a camera, the photographing area of the camera of the terminal may be different from the photographing areas of the cameras of the plurality of monitoring devices, that is, the camera of the terminal may separately realize monitoring of one area.
In the process that the terminal monitors through the plurality of monitoring devices, a camera of each monitoring device in the plurality of monitoring devices shoots a video picture, and the shot video picture is transmitted to the terminal. The terminal may display, in the monitoring interface, the received video picture transmitted by each of the plurality of monitoring devices and captured by its camera. Optionally, in the case that the terminal also starts the camera to monitor, the terminal may also display a video picture captured by the camera of the terminal in the monitoring interface. In this case, the video pictures photographed by the respective cameras may also be referred to as monitoring pictures.
For example, the terminal communicates with three monitoring devices, each having a camera, and monitors through the three monitoring devices. As shown in fig. 22, the terminal displays, in the monitoring interface 221, the video picture 2221 sent by each of the three monitoring devices and shot by its camera, that is, the monitoring picture of the shooting area of the camera of each of the three monitoring devices. Assuming that the shooting areas of the three monitoring devices are the master bedroom, the second bedroom and the living room respectively, the terminal displays a master bedroom monitoring picture, a second bedroom monitoring picture and a living room monitoring picture on the monitoring interface 221.
In the centralized monitoring scene, n is an integer greater than or equal to 2, that is, the cameras are divided into a plurality of groups, and each group of cameras in the plurality of groups of cameras is arranged in each monitoring device. Optionally, one group of cameras may be disposed on the terminal.
Next, a method of displaying video pictures in a centralized monitoring scene will be described by the embodiments of fig. 23 to 25 below.
Fig. 23 is a flowchart of a video frame display method according to an embodiment of the present application. Referring to fig. 23, the method includes:
step 2301: after receiving the monitoring instruction, the terminal instructs each monitoring device in the plurality of monitoring devices to start the camera.
The monitoring instruction is used for indicating to monitor the area. The monitoring instruction may be triggered by a user, and the user may trigger through operations such as clicking operation, sliding operation, voice operation, gesture operation, somatosensory operation, and the like, which is not limited in the embodiment of the present application.
After receiving the monitoring instruction, the terminal can instruct each monitoring device in the plurality of monitoring devices which are communicated with the terminal to start the camera, after the camera of each monitoring device is started, the terminal can shoot a video picture, and the shot video picture can be sent to the terminal. Further, if the terminal itself also has a camera, the terminal may also start the camera to capture a video picture, and acquire a video picture captured by the camera.
Step 2302: the terminal receives video pictures shot by a camera of each monitoring device in the plurality of monitoring devices.
In this case, as shown in fig. 22, the terminal may display, in the monitoring interface 221, a received video screen 2221 shot by the camera thereof transmitted by each of the plurality of monitoring devices. Further, when the camera of the terminal also monitors, a video image captured by the camera of the terminal may be displayed on the monitoring interface.
The terminal may also continuously obtain environmental audio information during the monitoring process, as described in step 2303.
Step 2303: the terminal receives the audio information collected by the microphone of each monitoring device in the plurality of monitoring devices, wherein the audio information collected by the microphone of each monitoring device in the plurality of monitoring devices is environment audio information.
Further, under the condition that the camera of the terminal also monitors, the terminal can acquire audio information acquired by the microphone of the terminal, and the audio information acquired by the microphone of the terminal is also environmental audio information.
In this case, the environmental audio information includes a plurality of audio information, each of which is the audio information in the environment in which the microphone of each monitoring device is located. The audio information collected by the microphone of a certain monitoring device corresponds to the camera of the monitoring device. Optionally, there may be one audio information among the plurality of audio information that is collected by a microphone of the terminal in an environment in which the terminal is located. The audio information collected by the microphone of the terminal corresponds to the camera of the terminal.
When a certain monitoring device starts a camera of the monitoring device to monitor, a microphone of the monitoring device can be started to collect audio information, so that when a video picture shot by the camera of the monitoring device is sent to the terminal, the audio information collected by the microphone of the monitoring device can be sent to the terminal. In the same way, the terminal can start the own microphone to collect audio information while starting the own camera to monitor.
Step 2304: and the terminal determines a first camera from the cameras of each monitoring device in the plurality of monitoring devices according to the environmental audio information, and a human voice sound source exists in a shooting area of the first camera.
The human sound source refers to a sound source that emits human sound. The presence of a human sound source in the shooting area of the first camera indicates that the shooting object of the first camera is likely to be a person and is speaking, that is, the person who is speaking is likely to appear in the video picture shot by the first camera.
And under the condition that the monitoring interface displays a video picture shot by a camera of the terminal, the environment audio information comprises audio information acquired by a microphone of the terminal. In this case, the terminal may determine the first camera from among the camera of the terminal and the camera of each of the plurality of monitoring devices according to the environmental audio information.
After the terminal obtains the audio information in the environment where each monitoring device is located and the audio information in the environment where the terminal is located, the audio information can be analyzed to determine whether the camera of each monitoring device in the plurality of monitoring devices and the shooting areas of the cameras in the camera of the terminal have a voice sound source, namely, whether the video pictures shot by the cameras have a talking person.
The operation of step 2304 is similar to the operation of determining the first camera from the plurality of cameras according to the environmental audio information in step 402, which is not described herein.
After determining the first camera from the camera of each of the plurality of monitoring devices and the camera of the terminal according to the environmental audio information in step 2304, the camera of each of the plurality of monitoring devices and the camera of the terminal other than the first camera may be referred to as a second camera.
Step 2305: the terminal displays the video picture shot by the first camera and the video picture shot by the second camera in a differentiated mode on the monitoring interface.
The terminal performs differential display on the video picture shot by the first camera and the video picture shot by the second camera, namely, displays the video picture shot by the first camera and the video picture shot by the second camera in different display modes so as to realize the highlighting of the video picture shot by the first camera. That is, the video picture captured by the camera having the human voice sound source in the capturing area is displayed in a different manner from other video pictures, so that the video picture in which the person speaking is located is distinguished from the other video pictures. Therefore, if a person in a certain video picture is speaking, the video picture and other video pictures are subjected to differential display, so that the flexibility of video picture display can be improved, a user can watch the video picture better, and the monitoring effect is improved. In addition, the effect of highlighting which video picture can be achieved when the person in which video picture speaks, so that the interaction experience of the user can be improved to a certain extent.
The operation in step 2305 is similar to the operation in step 403 that the terminal performs differential display on the video frame captured by the first camera and the video frame captured by the second camera, which is not described in detail in this embodiment of the present application.
For example, the terminal communicates with three monitoring devices, each of which has a camera, and monitors through the three monitoring devices. Assuming that the shooting areas of the three monitoring devices are the master bedroom, the second bedroom and the living room respectively, as shown in fig. 22, the terminal displays, in the monitoring interface 221, the video picture 2221 shot by the camera of each of the three monitoring devices, that is, the master bedroom monitoring picture, the second bedroom monitoring picture and the living room monitoring picture. Then, if the terminal determines that the camera of the monitoring device whose shooting area is the second bedroom is the first camera, that is, determines that a human voice sound source exists in the second bedroom, that is, determines that a person is speaking in the second bedroom monitoring picture, as shown in fig. 24, the terminal enlarges, in the monitoring interface 221, the video picture 2221 shot by the camera of that monitoring device (i.e. the second bedroom monitoring picture), and reduces the video pictures 2221 shot by the cameras of the other monitoring devices (i.e. the master bedroom monitoring picture and the living room monitoring picture). In this way, during centralized monitoring, the effect of enlarging, on the monitoring interface, the monitoring picture of whichever area contains a human voice sound source can be achieved.
For another example, the terminal communicates with three monitoring devices, each of which has a camera, and monitors through the three monitoring devices. Assuming that the shooting areas of the three monitoring devices are the master bedroom, the second bedroom and the living room respectively, as shown in fig. 22, the terminal displays, in the monitoring interface 221, the video picture 2221 shot by the camera of each of the three monitoring devices, that is, the master bedroom monitoring picture, the second bedroom monitoring picture and the living room monitoring picture. Then, if the terminal determines that the camera of the monitoring device whose shooting area is the second bedroom is the first camera, that is, determines that a human voice sound source exists in the second bedroom, that is, determines that a person is speaking in the second bedroom monitoring picture, as shown in fig. 25, the terminal displays, in the monitoring interface 221, only the video picture 2221 shot by the camera of that monitoring device (i.e. the second bedroom monitoring picture), and does not display the video pictures 2221 shot by the cameras of the other monitoring devices (i.e. the master bedroom monitoring picture and the living room monitoring picture). In this way, during centralized monitoring, the effect of individually displaying, on the monitoring interface, the monitoring picture of whichever area contains a human voice sound source can be achieved. In this case, during centralized monitoring, the monitoring interface may continuously switch among and display the monitoring pictures of the areas where a human voice sound source exists.
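As a rough illustration of this switching behaviour, the sketch below keeps only the monitoring picture of the first camera visible; MonitorView is a placeholder for whatever UI object actually renders a monitoring picture and is not defined by this application:

```java
// Illustrative sketch of the "show only the area with a human voice sound source"
// behaviour in the centralized monitoring scene: when the first camera changes,
// only that monitoring picture remains visible. MonitorView is an assumed placeholder.
import java.util.Map;

interface MonitorView {
    void setVisible(boolean visible);   // e.g. backed by a view's visibility in practice
}

final class MonitorSwitcher {
    static void showOnly(Map<String, MonitorView> viewsByCamera, String firstCameraId) {
        for (Map.Entry<String, MonitorView> e : viewsByCamera.entrySet()) {
            e.getValue().setVisible(e.getKey().equals(firstCameraId));
        }
    }
}
```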
It should be noted that, if the terminal does not determine the first camera in the above step 2304, that is, if the first camera does not exist in the cameras, the terminal does not perform the above step 2305, but displays the video frames captured by the cameras on the monitoring interface in the same display manner, or displays the video frames captured by the cameras differently on the monitoring interface according to other standards.
When the terminal displays the video pictures shot by the cameras in a differentiated mode on the monitoring interface according to other standards, the video pictures shot by the cameras can be displayed in a differentiated mode on the monitoring interface according to the number of people in the video pictures, the priority of the cameras, the loudness of the environmental audio and other standards. For example, the terminal may detect the number of people appearing in the video frames photographed by each of the cameras, and then highlight the video frame having the largest number of people appearing. Or, the terminal may record the priority of each of the cameras, and highlight the video picture shot by the camera with the highest priority. Or the terminal can determine one audio information with the largest loudness from the audio information corresponding to each camera in the cameras, and highlight the video picture shot by the camera corresponding to the audio information. The terminal may highlight a certain video frame in various manners. For example, the terminal can enlarge and display the video picture, and reduce and display other video pictures; alternatively, the terminal may display only this video picture and not other video pictures. Of course, the terminal may highlight this video frame in other manners, which are not limited by the embodiments of the present application.
Another point to be noted is that the terminal may continuously perform the above steps 2303-2305 during the process of capturing video pictures by the cameras of the plurality of monitoring devices. Therefore, the terminal can dynamically adjust the display mode of the video picture shot by the camera of each monitoring device according to the real-time audio frequency in the environment of each monitoring device in the plurality of monitoring devices in the whole monitoring process so as to achieve the effect of highlighting the video picture when the person in the video picture speaks.
Further, after the display adjustment of the video pictures is achieved through the embodiments of fig. 3 to 25, the terminal may also generate a video file corresponding to the video picture shot by each camera. The video file includes video data and audio data, and the video file format may place the video data and the audio data in one file so that they can be played back simultaneously, thereby realizing video playback.
Alternatively, the operation of the terminal to generate the video file corresponding to the video frames shot by each camera may include the following three possible manners.
A first possible way: the terminal outputs multiple video files at the end of shooting. Specifically, this may include the following steps (1)-(2).
(1) During the shooting process of the n groups of cameras, for any one group of cameras in the n groups of cameras, the terminal generates multi-channel audio data according to the audio information corresponding to that group of cameras, and performs channel separation on the multi-channel audio data of that group of cameras to obtain the audio data of each camera in that group of cameras.
When the terminal generates multi-channel audio data according to the audio information corresponding to the group of cameras, audio processing algorithms such as a recording algorithm and audio zooming can be used for processing the audio information corresponding to the group of cameras, so that the multi-channel audio data are obtained.
The multi-channel audio data includes a plurality of channels of sound, which are analog stereo data. For example, the multi-channel audio data may be 5.1 channel audio data, 7.1 channel audio data, or the like. The 5.1 channels include a center channel, a front left front, a front right channel, a rear left surround channel, a rear right surround channel, and a subwoofer channel (i.e., 0.1 channel). The 7.1 sound channels include a left front surround sound channel, a right front surround sound channel, a center surround sound channel, a left rear surround sound channel, a right rear surround sound channel, a left surround sound channel, and a right surround sound channel. The multi-channel audio data may be pulse code modulated (pulse code modulation, PCM) data, for example.
The terminal performing channel separation on the multi-channel audio data of the group of cameras means disassembling the multi-channel audio data so that the audio data of a subset of the channels serves as the audio data of each camera in the group. The audio data of different cameras in the group therefore occupies different channels, so the audio data of different cameras in the group differs.
For example, suppose the group of cameras includes two cameras, a front camera and a rear camera, and the multi-channel audio data is 7.1-channel audio data. After channel separation, the audio data of the first and second channels of the 7.1-channel audio data may serve as the audio data of the front camera, and the audio data of the third and fourth channels may serve as the audio data of the rear camera. The audio data of the front camera and the audio data of the rear camera are then each two-channel audio data.
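The channel-separation step can be sketched as follows; the de-interleaving and the particular channel-to-camera mapping (channels 0-1 to the front camera, 2-3 to the rear camera) are illustrative assumptions rather than a prescribed scheme.

```python
# Sketch of channel separation: de-interleave multi-channel PCM and group
# whole channels per camera. The channel-to-camera mapping is illustrative.
from typing import Dict, List, Sequence

def split_channels(pcm: Sequence[int], num_channels: int) -> List[List[int]]:
    """De-interleave interleaved PCM into one sample list per channel."""
    return [list(pcm[ch::num_channels]) for ch in range(num_channels)]

def per_camera_audio(pcm: Sequence[int], num_channels: int,
                     mapping: Dict[str, List[int]]) -> Dict[str, List[List[int]]]:
    """Assign disjoint channel groups to cameras, e.g. {'front': [0, 1]}."""
    channels = split_channels(pcm, num_channels)
    return {cam: [channels[i] for i in idxs] for cam, idxs in mapping.items()}

# 8 interleaved channels (7.1 layout), two frames of samples
pcm = list(range(16))
audio = per_camera_audio(pcm, 8, {"front": [0, 1], "rear": [2, 3]})
print(audio["front"])  # [[0, 8], [1, 9]] -> two-channel audio for the front camera
```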
During the shooting of the n groups of cameras, the terminal can acquire the video picture shot by each camera in the n groups of cameras, and thereby obtain the video data of the video picture shot by each of these cameras. The terminal can therefore execute the following step (2) during the shooting of the n groups of cameras so as to generate, in real time, the video file corresponding to each camera in the n groups of cameras.
(2) For any one camera in the n groups of cameras, the terminal generates the video file corresponding to that camera according to the video data of the video picture shot by that camera and the audio data of that camera.
Thus, when the shooting of the n groups of cameras is finished, the terminal can output the video files corresponding to each camera in the n groups of cameras, namely, a plurality of video files. The video file may be in MP4 format, for example.
The first possible way described above is illustrated below in connection with the software system of the terminal shown in fig. 26.
Referring to fig. 26, assume n is 1 and that the group of cameras includes two cameras, both provided on the terminal. The video file generation process may include the following steps A-D:
Step A: the Camera HAL acquires the video data of the video picture shot by each of the two cameras of the terminal, and transmits the video data of the video picture shot by each of the two cameras to the camera application program.
Optionally, the Camera HAL may include an image sensor (Sensor), an image front end (IFE), an image processing engine (IPE), and the like. The data stream from each of the two cameras of the terminal can be transmitted to the upper-layer application after being processed by the Sensor, IFE, IPE, and other components in the Camera HAL.
As shown in fig. 26 (a), the Camera HAL can transmit the video data of the video picture shot by each of the two cameras to the upper-layer camera application through the camera service and the camera framework.
Step B: the Audio HAL acquires the audio information, generates multi-channel audio data according to the audio information, and transmits the multi-channel audio data to the camera application program.
Optionally, the Audio HAL may include an AudioInputStream for receiving the audio information collected by the multiple microphones of the terminal. The Audio HAL may also include audio processing algorithms such as a recording algorithm and an audio zoom algorithm for processing the received audio information to obtain the multi-channel audio data.
As shown in fig. 26 (a), the Audio HAL may transmit the multi-channel audio data to the upper-layer camera application through an audio service (e.g., AudioPlayer) and an audio framework (e.g., AudioRecord).
Step C: the camera application program performs channel separation on the multi-channel audio data to obtain the audio data of each of the two cameras, and transmits the video data of the video picture shot by each of the two cameras, together with that camera's audio data, to the encoding framework.
The camera application may include an AudioHandler and an audio splitter (AudioSplit). The AudioHandler may receive the multi-channel audio data, and the AudioSplit may perform channel separation on the multi-channel audio data.
Step D: the encoding framework generates a corresponding video file according to the video data of the video picture shot by each of the two cameras and the audio data of each camera, thereby obtaining two video files.
As shown in fig. 26 (b), the encoding framework includes video-audio multiplexers (Muxer). A video-audio multiplexer encodes and synthesizes the video data of the video picture shot by one camera together with that camera's audio data, and then outputs the video file corresponding to that camera.
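A conceptual stand-in for this per-camera multiplexing step is sketched below; it only pairs each camera's video samples with its separated audio samples, and the VideoFile container here is a placeholder rather than a real MP4 muxer.

```python
# Conceptual stand-in for the encoder/Muxer step: one output "file" per camera,
# pairing that camera's video samples with its separated audio samples.
# A real implementation would hand the streams to a platform muxer; the
# container format itself is not modelled here.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class VideoFile:
    camera_id: str
    video_samples: List[bytes] = field(default_factory=list)
    audio_samples: List[bytes] = field(default_factory=list)

def mux_per_camera(video: Dict[str, List[bytes]],
                   audio: Dict[str, List[bytes]]) -> List[VideoFile]:
    """Produce one VideoFile per camera from its video and audio streams."""
    return [VideoFile(cam, video[cam], audio.get(cam, [])) for cam in video]

files = mux_per_camera({"front": [b"vf0", b"vf1"], "rear": [b"vr0"]},
                       {"front": [b"af0"], "rear": [b"ar0"]})
print([f.camera_id for f in files])  # ['front', 'rear'] -> two video files
```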
A second possible way: the terminal outputs a single video file at the end of shooting. Specifically, this may include the following steps (1)-(4).
(1) During the shooting of the n groups of cameras, for any one group of cameras among the n groups, the terminal generates multi-channel audio data according to the audio information corresponding to that group of cameras, and performs channel separation on the multi-channel audio data of that group of cameras to obtain the audio data of each camera in the group.
The operation of step (1) in the second possible way is the same as that of step (1) in the first possible way, and is not described again in this embodiment of the present application.
(2) The terminal judges whether a video picture shot by the first camera exists among all the video pictures currently being displayed.
If a video picture shot by the first camera exists among all the video pictures currently being displayed, the video picture shot by the first camera is the highlighted one among all the video pictures currently being displayed.
If a video picture shot by the first camera exists among all the video pictures currently being displayed, the following step (3) is executed; if no video picture shot by the first camera exists among all the video pictures currently being displayed, the following step (4) is executed.
(3) If a video picture shot by the first camera exists among all the video pictures currently being displayed, the terminal generates the video file according to the video data of all the video pictures currently being displayed and the audio data of the first camera.
In this case, the video data of all the video pictures currently being displayed is the video data of the fused picture of all those video pictures. The video data in the generated video file is therefore the video data of the fused picture of all the displayed video pictures, and the audio data is the audio data of the first camera (i.e., the audio data of the highlighted video picture).
(4) If no video picture shot by the first camera exists among all the video pictures currently being displayed, the terminal performs a mixing operation on the audio data of the cameras to which all the currently displayed video pictures belong to obtain mixed audio data, and generates the video file according to the video data of all the video pictures currently being displayed and the mixed audio data.
In this case as well, the video data of all the video pictures currently being displayed is the video data of the fused picture of all those video pictures. The video data in the generated video file is therefore the video data of the fused picture of all the displayed video pictures, and the audio data is the mixed audio data of the cameras to which all the displayed video pictures belong (i.e., the mixed audio data of all the displayed video pictures).
Thus, when the shooting of the n groups of cameras ends, the terminal can output a single video file. The video data in the video file is the video data of the fused picture of all the video pictures displayed by the terminal, and the audio data in the video file is, at each moment, either the audio data of the highlighted video picture or the mixed audio data of all the video pictures displayed together.
For example, as shown in fig. 27, three consecutive frames of video images in the video file each contain a video picture shot by the front camera and a video picture shot by the rear camera; that is, every one of the three frames is a fused image of the front-camera picture and the rear-camera picture. In the first frame, the front camera is the first camera, i.e., the video picture shot by the front camera is the main picture (the highlighted video picture), so the audio data of the front camera can be used as the audio data of the first frame. In the second frame, the rear camera is the first camera, i.e., the video picture shot by the rear camera is the main picture, so the audio data of the rear camera can be used as the audio data of the second frame. In the third frame, no first camera exists, i.e., neither the front-camera picture nor the rear-camera picture is a main picture, so the mixed audio data of the front camera and the rear camera can be used as the audio data of the third frame.
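The per-frame audio selection of this second way can be sketched as follows; the equal-weight averaging used for mixing is an assumption made for illustration, not a mixing rule stated in the text.

```python
# Sketch of the second way's audio selection: for every fused frame, use the
# highlighted (main-picture) camera's audio if one exists, otherwise mix the
# audio of all displayed cameras. Mixing by simple averaging is an assumption.
from typing import Dict, List, Optional

def mix(tracks: List[List[float]]) -> List[float]:
    """Average equally long sample lists into one mixed track."""
    return [sum(samples) / len(tracks) for samples in zip(*tracks)]

def frame_audio(displayed: List[str], highlighted: Optional[str],
                audio: Dict[str, List[float]]) -> List[float]:
    if highlighted in displayed:
        return audio[highlighted]                   # audio of the main picture
    return mix([audio[cam] for cam in displayed])   # no main picture: mix all

audio = {"front": [0.2, 0.4], "rear": [0.6, 0.8]}
print(frame_audio(["front", "rear"], "front", audio))  # [0.2, 0.4]
print(frame_audio(["front", "rear"], None, audio))     # approx. [0.4, 0.6]
```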
A third possible way: after shooting ends, the terminal edits the plurality of video files obtained by shooting to obtain a fused video file. Specifically, this may include the following steps (1)-(2).
(1) When the shooting of the n groups of cameras ends, the terminal obtains, through the first possible way described above, the video file corresponding to each camera in the n groups of cameras, i.e., a plurality of video files.
Any one of the plurality of video files may carry history display information indicating how the video data in that file was displayed on the terminal during the shooting of the n groups of cameras, that is, indicating the display period of each frame of video image in that file's video data during the shooting and whether it was in the highlighted state when displayed. The history display information may, for example, be written into the video file as its tag information.
(2) After detecting an editing instruction for the plurality of video files, the terminal processes the video data and the audio data in each of the plurality of video files according to the history display information of each of them, to obtain the fused video file.
The editing instruction is used to indicate that the plurality of video files are to be merged into one video file. The editing instruction may be triggered by a user, for example through a click operation, a sliding operation, a voice operation, a gesture operation, a somatosensory operation, or the like, which is not limited in the embodiments of the present application.
The terminal processing the video data and the audio data in each of the plurality of video files according to their history display information may proceed as follows. For each display time point, the terminal determines, according to the history display information of each video file, the at least one video file that displays a video image at that time point, and determines whether any of them has its video image in the highlighted state. If, among the at least one video file, there is a video file whose video image is in the highlighted state, the terminal fuses the video data of the at least one video file in the highlighted manner to obtain the target video data of that time point, and uses the audio data of the video file whose image is highlighted as the target audio data of that time point. If no video file among the at least one video file has its video image in the highlighted state, the terminal fuses the video data of the at least one video file in the ordinary display manner to obtain the target video data of that time point, and mixes the audio data of the at least one video file to obtain the target audio data of that time point. Finally, the terminal generates the fused video file according to the target video data and the target audio data of all display time points.
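The editing-time fusion can be sketched per display time point as follows; the record layout used for the history display information and the string placeholders for video and audio data are illustrative assumptions, not the tag format described above.

```python
# Sketch of editing-time fusion at one display time point. History display
# info is modelled per file as shown/highlighted flags; the concrete record
# layout is an assumption made only for illustration.
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass
class FileAtTime:
    shown: bool
    highlighted: bool
    video: str   # placeholder for the frame's video data
    audio: str   # placeholder for the frame's audio data

def fuse_time_point(entries: Dict[str, FileAtTime]) -> Tuple[str, str]:
    shown = {name: e for name, e in entries.items() if e.shown}
    lit = [e for e in shown.values() if e.highlighted]
    video = "+".join(e.video for e in shown.values())  # fused picture
    if lit:
        # highlight mode: fused picture plus the highlighted file's audio
        return video, lit[0].audio
    # ordinary mode: fused picture plus mixed audio of all shown files
    return video, "mix(" + ",".join(e.audio for e in shown.values()) + ")"

entries = {"front.mp4": FileAtTime(True, True, "Vf", "Af"),
           "rear.mp4": FileAtTime(True, False, "Vr", "Ar")}
print(fuse_time_point(entries))  # ('Vf+Vr', 'Af')
```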
Fig. 28 is a schematic structural diagram of a video screen display apparatus provided in the embodiment of the present application, where the apparatus may be implemented by software, hardware, or a combination of both as part or all of a computer device, and the computer device may be a terminal as described in the embodiment of fig. 1-2. Referring to fig. 28, the apparatus includes: an acquisition module 2801, a determination module 2802, and a display module 2803.
An obtaining module 2801, configured to obtain a video picture captured by each of the plurality of cameras, and obtain environmental audio information, where the environmental audio information includes audio information in an environment where each of the plurality of cameras is located;
a determining module 2802, configured to determine, according to the environmental audio information, a first camera from the plurality of cameras, where a human voice sound source exists in a shooting area of the first camera;
the display module 2803 is configured to differentially display a video frame captured by the first camera and a video frame captured by a second camera, where the second camera is another camera of the plurality of cameras except the first camera.
Optionally, the plurality of cameras are n groups of cameras, different groups of cameras are arranged on different devices, the same group of cameras are arranged on the same device, and n is a positive integer; the acquisition module 2801 is configured to:
acquiring n pieces of audio information, where the n pieces of audio information are the environmental audio information, the n pieces of audio information correspond one-to-one to the n groups of cameras, and each piece of audio information is the audio information in the environment where the corresponding group of cameras is located.
Optionally, the determining module 2802 is configured to:
determining at least one target audio information from the n audio information, the target audio information being audio information of a voice of a person; a first camera is determined from a set of cameras corresponding to each of the at least one target audio information.
Optionally, the determining module 2802 is configured to:
for any one of at least one piece of target audio information, if a group of cameras corresponding to the target audio information comprises j cameras, performing voice sound source positioning according to the target audio information to obtain the direction of the voice sound source, wherein j is an integer greater than or equal to 2, and the shooting directions of the j cameras are different; and determining a first camera from the j cameras according to the direction of the voice source and the shooting direction of each camera in the j cameras.
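A minimal sketch of this direction-based selection, assuming shooting directions and the located voice direction are both expressed as azimuths in degrees (a representation not specified in the text), is as follows.

```python
# Sketch of choosing the first camera from a located voice direction: pick the
# camera whose shooting direction (azimuth, degrees) is closest to the source.
from typing import Dict

def angular_gap(a: float, b: float) -> float:
    """Smallest absolute difference between two azimuths in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def choose_first_camera(source_azimuth: float,
                        camera_azimuths: Dict[str, float]) -> str:
    return min(camera_azimuths,
               key=lambda cam: angular_gap(source_azimuth, camera_azimuths[cam]))

cams = {"front": 0.0, "rear": 180.0}
print(choose_first_camera(10.0, cams))   # front
print(choose_first_camera(170.0, cams))  # rear
```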
Optionally, the display module 2803 is configured to:
amplifying and displaying the video picture shot by the first camera, and reducing and displaying the video picture shot by the second camera; or the video picture shot by the first camera is taken as a main picture of the picture-in-picture mode, the video picture shot by the second camera is taken as a sub-picture of the picture-in-picture mode, and the video picture shot by the first camera and the video picture shot by the second camera are displayed in the picture-in-picture mode.
Optionally, the plurality of cameras are all arranged on the device, and shooting directions of the plurality of cameras are different; the apparatus further comprises:
the starting module is used for starting a plurality of cameras after receiving the multi-shot video recording instruction;
the acquisition module 2801 is configured to:
the method comprises the steps of acquiring audio information acquired by a plurality of microphones of the device, wherein the audio information acquired by the plurality of microphones is environmental audio information, and the plurality of microphones are arranged at different positions of the device.
Optionally, the plurality of cameras are at least two groups of cameras, one group of cameras in the at least two groups of cameras is arranged on the device, and the other at least one group of cameras are arranged on at least one cooperative device in a multi-screen cooperative state with the device one by one; the apparatus further comprises:
the starting module is used for starting the camera of the device after receiving the collaborative video recording instruction, and for instructing each cooperative device in the at least one cooperative device to start its own camera;
the acquisition module 2801 is configured to:
acquiring the video picture shot by the camera of the device, and receiving, from each cooperative device in the at least one cooperative device, the video picture shot by that cooperative device's own camera; and acquiring the audio information acquired by the microphone of the device, and receiving, from each cooperative device in the at least one cooperative device, the audio information acquired by that cooperative device's own microphone, wherein the audio information acquired by the microphone of the device and the audio information acquired by the microphone of each cooperative device in the at least one cooperative device are the environmental audio information.
Optionally, the plurality of cameras are all arranged on the device, and shooting directions of the plurality of cameras are different; the apparatus further comprises:
the starting module is used for starting the cameras after receiving the video call instruction, and the video call instruction is used for indicating to call with the far-end call equipment;
the acquisition module 2801 is configured to:
acquiring audio information acquired by a plurality of microphones of the device, wherein the audio information acquired by the plurality of microphones is environmental audio information, and the plurality of microphones are arranged at different positions of the device;
the display module 2803 is for:
displaying, on the video call interface, the video picture shot by the first camera without displaying the video picture shot by the second camera, and sending the video picture shot by the first camera to the far-end call device for display.
Optionally, the apparatus further comprises:
the first generation module is used for generating multi-channel audio data according to audio information corresponding to one group of cameras for any group of cameras in the n groups of cameras in the shooting process of the n groups of cameras, and carrying out channel separation on the multi-channel audio data of the group of cameras to obtain audio data of each camera in the group of cameras, wherein the channels of the audio data of different cameras in the group of cameras are different;
The second generation module is used for generating a video file corresponding to one camera according to video data of a video picture shot by the one camera and audio data of the one camera for any one camera in the n groups of cameras.
Optionally, the apparatus further comprises:
the first generation module is used for generating multi-channel audio data according to audio information corresponding to one group of cameras for any group of cameras in the n groups of cameras in the shooting process of the n groups of cameras, and carrying out channel separation on the multi-channel audio data of the group of cameras to obtain audio data of each camera in the group of cameras, wherein the channels of the audio data of different cameras in the group of cameras are different;
the third generation module is used for generating a video file according to the video data of all the video pictures currently being displayed and the audio data of the first camera if the video pictures shot by the first camera exist in all the video pictures currently being displayed; if the video pictures shot by the first camera do not exist in all the video pictures currently being displayed, audio data of the cameras to which all the video pictures currently being displayed belong are subjected to audio mixing operation to obtain mixed audio data, and a video file is generated according to the video data and the mixed audio data of all the video pictures currently being displayed.
In the embodiments of the present application, the video picture shot by the first camera and the video pictures shot by the second cameras among the plurality of cameras are displayed in a differentiated manner, that is, in different display modes, so that the video picture shot by the first camera is highlighted. In other words, the video picture shot by the camera whose shooting area contains a human voice sound source is displayed differently from the other video pictures, so that the video picture in which a person is speaking stands out from the rest. Therefore, when a person in a certain video picture is speaking, that video picture is displayed differently from the others, which improves the flexibility of video picture display and helps the user watch the video picture better. In addition, because the video picture in which a person is speaking is highlighted, the user's interactive experience can be improved to a certain extent.
It should be noted that the video picture display apparatus provided in the above embodiment is illustrated only by the division of the above functional modules when displaying video pictures. In practical applications, the above functions may be allocated to different functional modules as required, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above.
The functional units and modules in the above embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the embodiments of the present application.
The video image display device provided in the above embodiment and the video image display method embodiment belong to the same concept, and specific working processes and technical effects of the units and modules in the above embodiment can be referred to in the method embodiment section, and are not repeated herein.
In the above embodiments, the implementation may be realized entirely or partly by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized entirely or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a digital versatile disc (DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)).
The above embodiments are not intended to limit the present application, and any modifications, equivalent substitutions, improvements, etc. within the technical scope of the present disclosure should be included in the protection scope of the present application.
Claims (11)
1. A video picture display method, applied to a terminal, comprising:
the method comprises the steps of obtaining video pictures shot by each camera of a plurality of cameras, and obtaining environment audio information, wherein the plurality of cameras are n groups of cameras, different groups of cameras are arranged on different devices, the same group of cameras are arranged on the same device, the environment audio information is n pieces of audio information, the n pieces of audio information are in one-to-one correspondence with the n groups of cameras, each piece of audio information in the n pieces of audio information is the audio information in the environment where the corresponding group of cameras are located, and n is a positive integer;
determining a first camera from the plurality of cameras according to the environmental audio information, wherein a human sound source exists in a shooting area of the first camera;
performing differential display on a video picture shot by the first camera and a video picture shot by a second camera, wherein the second camera is other cameras except the first camera in the plurality of cameras;
The method further comprises the steps of:
in the shooting process of the n groups of cameras, for any one group of cameras in the n groups of cameras, generating multi-channel audio data according to the audio information corresponding to the group of cameras, and carrying out channel separation on the multi-channel audio data of the group of cameras to obtain audio data of each camera in the group of cameras, wherein the channels of the audio data of different cameras in the group of cameras are different;
if all the video pictures currently being displayed have the video pictures shot by the first camera, generating a video file according to the video data of all the video pictures currently being displayed and the audio data of the first camera;
and if the video pictures shot by the first camera do not exist in all the video pictures currently being displayed, audio mixing operation is carried out on the audio data of the cameras to which all the video pictures currently being displayed belong, mixed audio data are obtained, and a video file is generated according to the video data of all the video pictures currently being displayed and the mixed audio data.
2. The method of claim 1, wherein said determining a first camera from said plurality of cameras based on said environmental audio information comprises:
determining at least one target audio information from the n audio information, wherein the target audio information is audio information with human voice;
and determining the first camera from a group of cameras corresponding to each piece of target audio information in the at least one piece of target audio information.
3. The method of claim 2, wherein the determining the first camera from a group of cameras corresponding to each of the at least one target audio information comprises:
for any one of the at least one piece of target audio information, if a group of cameras corresponding to the one piece of target audio information comprises j cameras, performing voice sound source positioning according to the one piece of target audio information to obtain the direction of the voice sound source, wherein j is an integer greater than or equal to 2, and the shooting directions of the j cameras are different;
and determining the first camera from the j cameras according to the direction of the voice sound source and the shooting direction of each camera in the j cameras.
4. The method of claim 1, wherein the differentially displaying the video captured by the first camera and the video captured by the second camera comprises:
amplifying and displaying the video picture shot by the first camera, and reducing and displaying the video picture shot by the second camera; or,
and displaying the video picture shot by the first camera and the video picture shot by the second camera in the picture-in-picture mode by taking the video picture shot by the first camera as a main picture of the picture-in-picture mode and taking the video picture shot by the second camera as a sub-picture of the picture-in-picture mode.
5. The method according to any one of claims 1 to 4, wherein the plurality of cameras are all disposed at the terminal, and the shooting directions of the plurality of cameras are different;
before the video picture shot by each camera in the plurality of cameras is acquired, the method further comprises:
after receiving a multi-shot video recording instruction, starting the cameras;
the obtaining environmental audio information includes:
the method comprises the steps of acquiring audio information acquired by a plurality of microphones of the terminal, wherein the audio information acquired by the plurality of microphones is the environmental audio information, and the plurality of microphones are arranged at different positions of the terminal.
6. The method of any one of claims 1-4, wherein the plurality of cameras are at least two groups of cameras, one group of cameras in the at least two groups of cameras is arranged at the terminal, and the other at least one group of cameras is arranged, one group per device, on at least one cooperative device in a multi-screen cooperative state with the terminal;
Before the video picture shot by each camera in the plurality of cameras is acquired, the method further comprises:
after receiving the collaborative video instruction, starting the camera of the terminal, and instructing each cooperative device in the at least one cooperative device to start its own camera;
the obtaining the video picture shot by each camera in the plurality of cameras comprises the following steps:
acquiring the video picture shot by the camera of the terminal, and receiving, from each cooperative device in the at least one cooperative device, the video picture shot by that cooperative device's own camera;
the obtaining environmental audio information includes:
the method comprises the steps of acquiring audio information acquired by a microphone of the terminal, and receiving the audio information acquired by the microphone of the terminal, which is transmitted by each of at least one cooperative device, wherein the audio information acquired by the microphone of the terminal and the audio information acquired by the microphone of each of the at least one cooperative device are the environmental audio information.
7. A method according to any one of claims 1 to 3, wherein the plurality of cameras are all arranged at the terminal, and the shooting directions of the plurality of cameras are different;
Before the video picture shot by each camera in the plurality of cameras is acquired, the method further comprises:
after receiving a video call instruction, starting the cameras, wherein the video call instruction is used for indicating to call with far-end call equipment;
the obtaining environmental audio information includes:
acquiring audio information acquired by a plurality of microphones of the terminal, wherein the audio information acquired by the plurality of microphones is the environmental audio information, and the plurality of microphones are arranged at different positions of the terminal;
the differential display of the video picture shot by the first camera and the video picture shot by the second camera comprises the following steps:
and displaying the video picture shot by the first camera on a video call interface without displaying the video picture shot by the second camera, and sending the video picture shot by the first camera to the far-end call equipment for display.
8. The method of any one of claims 1-4, wherein, after the audio data of each camera in the group of cameras is obtained, the method further comprises:
and for any one camera in the n groups of cameras, generating a video file corresponding to the one camera according to video data of a video picture shot by the one camera and audio data of the one camera.
9. A video picture display device, the device comprising:
the device comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for acquiring video pictures shot by each camera in a plurality of cameras and acquiring environmental audio information, the plurality of cameras are n groups of cameras, different groups of cameras are arranged on different devices, the same group of cameras are arranged on the same device, the environmental audio information is n pieces of audio information, the n pieces of audio information are in one-to-one correspondence with the n groups of cameras, each piece of audio information in the n pieces of audio information is the audio information in the environment where the corresponding group of cameras are located, and n is a positive integer;
the determining module is used for determining a first camera from the plurality of cameras according to the environmental audio information, wherein a human voice sound source exists in a shooting area of the first camera;
the display module is used for carrying out differential display on the video picture shot by the first camera and the video picture shot by the second camera, and the second camera is the other cameras except the first camera in the plurality of cameras;
the apparatus further comprises:
the first generation module is used for generating multi-channel audio data according to audio information corresponding to any group of cameras in the n groups of cameras in the shooting process of the n groups of cameras, and carrying out channel separation on the multi-channel audio data of the group of cameras to obtain audio data of each camera in the group of cameras, wherein the channels of the audio data of different cameras in the group of cameras are different;
A third generating module, configured to generate a video file according to video data of all video frames currently being displayed and audio data of the first camera if there are video frames captured by the first camera in all video frames currently being displayed; and if the video pictures shot by the first camera do not exist in all the video pictures currently being displayed, audio mixing operation is carried out on the audio data of the cameras to which all the video pictures currently being displayed belong, mixed audio data are obtained, and a video file is generated according to the video data of all the video pictures currently being displayed and the mixed audio data.
10. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, which computer program, when executed by the processor, implements the method according to any of claims 1-8.
11. A computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the method of any of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210384109.4A CN115550559B (en) | 2022-04-13 | 2022-04-13 | Video picture display method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115550559A CN115550559A (en) | 2022-12-30 |
CN115550559B true CN115550559B (en) | 2023-07-25 |
Family
ID=84724672
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210384109.4A Active CN115550559B (en) | 2022-04-13 | 2022-04-13 | Video picture display method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115550559B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117389507B (en) * | 2023-12-12 | 2024-05-10 | 荣耀终端有限公司 | Audio data processing method, electronic device and storage medium |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1416538A (en) * | 2001-01-12 | 2003-05-07 | 皇家菲利浦电子有限公司 | Method and appts. for determining camera movement control criteria |
CN102281425A (en) * | 2010-06-11 | 2011-12-14 | 华为终端有限公司 | Method and device for playing audio of far-end conference participants and remote video conference system |
CN102891984A (en) * | 2011-07-20 | 2013-01-23 | 索尼公司 | Transmitting device, receiving system, communication system, transmission method, reception method, and program |
CN107770477A (en) * | 2017-11-07 | 2018-03-06 | 广东欧珀移动通信有限公司 | Video call method, device, terminal and storage medium |
CN111669636A (en) * | 2020-06-19 | 2020-09-15 | 海信视像科技股份有限公司 | Audio-video synchronous video recording method and display equipment |
CN112995566A (en) * | 2019-12-17 | 2021-06-18 | 佛山市云米电器科技有限公司 | Sound source positioning method based on display equipment, display equipment and storage medium |
CN113365012A (en) * | 2020-03-06 | 2021-09-07 | 华为技术有限公司 | Audio processing method and device |
CN113727021A (en) * | 2021-08-27 | 2021-11-30 | 维沃移动通信(杭州)有限公司 | Shooting method and device and electronic equipment |
WO2022068537A1 (en) * | 2020-09-29 | 2022-04-07 | 华为技术有限公司 | Image processing method and related apparatus |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004179971A (en) * | 2002-11-27 | 2004-06-24 | Fuji Photo Film Co Ltd | Monitor camera |
US8659668B2 (en) * | 2005-10-07 | 2014-02-25 | Rearden, Llc | Apparatus and method for performing motion capture using a random pattern on capture surfaces |
CN102215372B (en) * | 2010-04-07 | 2015-04-15 | 苹果公司 | Remote control operations in a video conference |
US20150049163A1 (en) * | 2013-03-15 | 2015-02-19 | James Paul Smurro | Network system apparatus and method of use adapted for visual neural networking with multi-channel multiplexed streaming medical imagery and packetized clinical informatics |
CN103595953B (en) * | 2013-11-14 | 2017-06-20 | 华为技术有限公司 | A kind of method and apparatus for controlling video capture |
US10552014B2 (en) * | 2017-01-10 | 2020-02-04 | Cast Group Of Companies Inc. | Systems and methods for tracking and interacting with zones in 3D space |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |