CN117880731A - Audio and video recording method and device and storage medium

Audio and video recording method and device and storage medium

Info

Publication number
CN117880731A
CN117880731A (Application CN202211247669.1A)
Authority
CN
China
Prior art keywords
audio
video
sound source
information
recording
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211247669.1A
Other languages
Chinese (zh)
Inventor
余俊飞
史润宇
刘念
刘晗宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd filed Critical Beijing Xiaomi Mobile Software Co Ltd
Priority to CN202211247669.1A priority Critical patent/CN117880731A/en
Publication of CN117880731A publication Critical patent/CN117880731A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S 7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S 7/30 Control circuits for electronic adaptation of the sound field
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10K SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K 11/00 Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K 11/16 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K 11/175 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound
    • G10K 11/178 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase
    • G10K 11/1785 Methods, e.g. algorithms; Devices
    • G10K 11/17853 Methods, e.g. algorithms; Devices of the filter
    • G10K 11/17854 Methods, e.g. algorithms; Devices of the filter the filter being an adaptive filter
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L 19/16 Vocoder architecture
    • G10L 19/167 Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L 21/0208 Noise filtering
    • G10L 21/0216 Noise filtering characterised by the method used for estimating noise
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 Details of television systems
    • H04N 5/76 Television signal recording
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S 2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S 2400/15 Aspects of sound capture and related signal processing for recording or reproduction

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The disclosure relates to an audio and video recording method, an audio and video recording device and a storage medium. The audio and video recording method comprises the following steps: acquiring audio data and acquiring video data; determining sound source spatial position information of a sound source included in the video data relative to the recording device; fusing the audio data and the video data based on the sound source spatial position information to obtain fused audio and video information; and generating an audio and video recording file based on the fused audio and video information. With the method and the device, the recording equipment can track and acquire the spatial position information of the sound source in real time while recording spatial audio and video, focus the audio data according to this information to obtain a focused audio signal, and use that signal for spatial audio coding to finally obtain a spatial audio/video file. This effectively improves the sound source tracking capability of the recording equipment during audio and video recording, enhances the stereoscopic sense of the spatial audio and video, and gives the user a surround listening experience.

Description

Audio and video recording method and device and storage medium
Technical Field
The disclosure relates to the field of audio and video recording, and in particular relates to an audio and video recording method, an audio and video recording device and a storage medium.
Background
When recording audio and video, current recording devices such as smartphones and video recorders generally use the audio collected by a microphone directly as the audio signal of the video. Audio recorded this way, however, performs poorly in terms of spatiality and stereoscopic sense, and it is difficult to give users an "immersive" listening experience.
In the related art, audio is generally recorded through a microphone, then encoded and stored, and the resulting audio data is used directly in the video.
This approach, however, does not adaptively adjust the spatial audio according to the spatial position of the sound source; it simply stores the signal received by the microphone. The audio data therefore lacks spatial information and sounds flat, without spatial or stereoscopic sense.
Disclosure of Invention
In order to overcome the problems in the related art, the present disclosure provides an audio/video recording method that enables recording equipment to track and acquire the spatial position information of a sound source in real time while recording spatial audio and video. The audio data is then focused according to this information to obtain a focused audio signal, and spatial audio coding is performed with that signal to finally obtain a spatial audio/video file. This effectively improves the sound source tracking capability of the recording equipment during audio/video recording, greatly enhances the stereoscopic sense of the spatial audio and video, and ultimately gives the user a surround listening experience.
According to a first aspect of embodiments of the present disclosure, there is provided an audio/video recording method, applied to a recording device, the method including: acquiring audio data and acquiring video data; determining sound source space position information of a sound source included in the video data relative to the recording device; based on the sound source space position information, fusing the audio data and the video data to obtain fused audio and video information; and generating an audio and video recording file based on the fused audio and video information.
In one embodiment, the fusing the audio data and the video data based on the spatial position information of the sound source to obtain fused audio/video information includes: determining the position information of the recording equipment and determining the beam pointing information; and carrying out sound source focusing on the audio data based on the sound source space position information, the position information of the recording equipment and the beam pointing information to obtain the audio and video information after the sound source focusing.
In one embodiment, performing sound source focusing on the audio data based on the sound source spatial position information, the position information of the recording device, and the beam pointing information to obtain the audio and video information after sound source focusing includes: preprocessing the audio data, wherein the preprocessing comprises noise reduction processing and/or framing processing; and performing sound source focusing on the audio data based on the sound source spatial position information, the position information of the recording device, the beam pointing information and the preprocessed audio data to obtain the audio and video information after sound source focusing.
In one embodiment, the determining beam pointing information includes: determining beam pointing information based on pre-configured beam pointing information; or acquiring a beam pointing operation performed by a user on the recording device, and determining beam pointing information based on the beam pointing operation.
In one embodiment, the generating an audio-video recording file based on the fused audio-video information includes: and carrying out spatial audio coding on the fused audio and video information, and storing the audio and video information after the spatial audio coding to obtain a spatial audio and video recording file.
In one embodiment, the determining the sound source spatial position information of the sound source included in the video data relative to the recording apparatus includes: determining a relative position between a video recording device and a sound source included in the video data; and determining sound source space position information of the sound source in each time unit based on the relative positions.
According to a second aspect of embodiments of the present disclosure, there is provided an audio-video recording apparatus applied to a recording device, the apparatus including: the audio data acquisition module is used for acquiring audio data; the video data acquisition module is used for acquiring video data; a sound source space information calculation module, configured to determine sound source space position information of a sound source included in the video data relative to the recording device; the audio focusing module is used for fusing the audio data and the video data based on the sound source space position information to obtain fused audio and video information; and the spatial audio coding storage module is used for generating an audio and video recording file based on the fused audio and video information.
In one embodiment, the audio focusing module fuses the audio data and the video data based on the sound source spatial position information in the following manner to obtain fused audio/video information: determining the position information of the recording equipment and determining the beam pointing information; and carrying out sound source focusing on the audio data based on the sound source space position information, the position information of the recording equipment and the beam pointing information to obtain the audio and video information after the sound source focusing.
In one embodiment, the audio focusing module performs sound source focusing on the audio data based on the sound source spatial position information, the position information of the recording device and the beam pointing information in the following manner to obtain the audio and video information after sound source focusing: preprocessing the audio data, wherein the preprocessing comprises noise reduction processing and/or framing processing; and performing sound source focusing on the audio data based on the sound source spatial position information, the position information of the recording device, the beam pointing information and the preprocessed audio data to obtain the audio and video information after sound source focusing.
In one embodiment, the determining beam pointing information includes: determining beam pointing information based on pre-configured beam pointing information; or acquiring a beam pointing operation performed by a user on the recording device, and determining beam pointing information based on the beam pointing operation.
In one embodiment, the spatial audio coding storage module generates the audio/video recording file based on the fused audio/video information in the following manner: and carrying out spatial audio coding on the fused audio and video information, and storing the audio and video information after the spatial audio coding to obtain a spatial audio and video recording file.
In one embodiment, the sound source space information calculating module determines sound source space position information of a sound source included in the video data with respect to the recording device in the following manner: determining a relative position between a video recording device and a sound source included in the video data; and determining sound source space position information of the sound source in each time unit based on the relative positions.
According to a third aspect of the embodiments of the present disclosure, there is provided a photographing apparatus, including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to perform the audio/video recording method of any one of claims 1 to 6.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium; when instructions in the storage medium are executed by a processor of a mobile terminal, the mobile terminal is enabled to perform the audio/video recording method of any one of claims 1 to 6.
The technical scheme provided by the embodiments of the disclosure can have the following beneficial effects: the recording device tracks and acquires the spatial position information of the sound source in real time while recording spatial audio and video, focuses the audio data according to this information to obtain a focused audio signal, and performs spatial audio coding on that signal to finally obtain a spatial audio/video file. This effectively improves the sound source tracking capability of the recording device during audio/video recording, greatly enhances the stereoscopic sense of the spatial audio, and ultimately gives the user a surround listening experience.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a flowchart of an audio file production method in the related art.
Fig. 2 is a flowchart illustrating a method of recording audio and video according to an exemplary embodiment.
Fig. 3 is a flowchart illustrating an audio-video recording method according to an exemplary embodiment.
Fig. 4 is a flowchart illustrating an audio-video recording method according to an exemplary embodiment.
Fig. 5 is a flowchart illustrating an audio-video recording method according to an exemplary embodiment.
Fig. 6 illustrates a flowchart of an audio-video recording method shown in an exemplary embodiment of the present disclosure.
Fig. 7 is a flowchart of an audio data preprocessing method shown in an exemplary embodiment of the present disclosure.
Fig. 8 is a block diagram illustrating an audio-video recording apparatus 100 according to an exemplary embodiment.
Fig. 9 is a block diagram illustrating an audio-video recording apparatus 200 according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure.
The method and the device are applied to the field of spatial audio and video recording, and enable audio/video recording equipment to generate spatial audio/video files by tracking and acquiring the spatial position information of the sound source in real time during recording.
Fig. 1 is a flowchart of a related-art audio file production method. As shown in fig. 1, in the related art, audio recorded by a microphone is used as the audio data, and the recorded audio data is encoded and stored to obtain an audio file. The prepared audio file is then added to the corresponding video file, producing an audio/video file. Because no further processing is applied to the audio data, this method leaves the audio data without spatial information: the audio is flat and lacks spatial and stereoscopic sense, and an audio/video file made from it lacks them as well.
The embodiment of the disclosure provides an audio and video recording method that records audio in real time and acquires the spatial angle information of the sound source from the recorded video, providing an important reference for rendering the spatial audio data. This improves the environmental adaptability of the recording device during spatial audio/video recording and further improves the stereoscopic and spatial sense of the spatial audio and video.
The audio and video recording method provided by the embodiment of the disclosure is applied to recording equipment having both an audio recording function and a video recording function. The audio recording function records the ambient sound and the sound emitted by sound sources such as people or objects in the environment, and the video recording function records images containing all target sound sources. The recording device in the embodiment of the disclosure includes, but is not limited to, a mobile phone, a video recorder, a network camera, a microphone, a recording stick, a recording pen, a mini recorder, a planar recording board, professional recording equipment, and the like.
Fig. 2 is a flowchart illustrating an audio/video recording method according to an exemplary embodiment; as shown in fig. 2, the method includes steps S101 to S104.
In step S101, audio data is acquired, and video data is acquired.
The audio acquisition device used by the recording equipment to acquire audio data (generally a plurality of microphones, or another device capable of recording and storing audio data) is a normally-open device set to a normally-open mode. It continuously captures the ambient sound and the sound emitted by sound sources such as people or objects in its environment as the initial audio data; after filtering, this data is rendered with the sound source angle to obtain spatial audio data. For example, the voice of each member in a conference room is collected; for another example, the sounds of pedestrians and passing vehicles are collected.
The video acquisition device used by the recording equipment to acquire video data (generally one or more cameras, or another device capable of recording and storing video data) is also a normally-open device, and continuously captures images including all target sound sources. For example, images in a conference room scene are captured; for another example, images of pedestrians and passing vehicles are acquired.
In step S102, sound source spatial position information of a sound source included in the video data with respect to the recording apparatus is determined.
In step S103, audio data and video data are fused based on the sound source spatial position information, and fused audio-video information is obtained.
In step S104, an audio/video recording file is generated based on the fused audio/video information.
The fused audio and video information is an intermediate file, which can be stored as a spatial audio/video file after further transcoding.
In the embodiment of the disclosure, audio information is acquired by an audio acquisition device such as a microphone, and video data is acquired by a video acquisition device such as a video recorder. Sound source spatial position information for the audio data is extracted from the collected video data, which yields the relative position of the sound source and the audio acquisition device. The audio data and the video data are then fused based on this sound source spatial information to obtain the fused audio and video information. Finally, the fused audio and video information is encoded to generate an audio/video recording file.
The embodiment of the disclosure describes a flow of fusing audio data and video data to obtain fused audio/video information.
Fig. 3 is a flowchart illustrating an audio/video recording method according to an exemplary embodiment; as shown in fig. 3, the method includes steps S201 and S202.
In step S201, position information of the recording apparatus is determined, and beam pointing information is determined.
Here, the recording device is stationary, and its position information may be predetermined.
The beam pointing information may be predetermined, or may be determined by a beam pointing operation performed by the user in the recording apparatus.
In step S202, audio data is subjected to sound source focusing based on the sound source spatial position information, the position information of the recording device, and the beam pointing information, so as to obtain audio and video information after the sound source focusing.
In the embodiment of the disclosure, the position information and the beam pointing information of the recording device are determined, and sound source focusing is performed on the audio data using the position information of the recording device, the beam pointing information and the sound source spatial position information, to obtain the audio and video information after sound source focusing.
The following embodiment describes the flow of obtaining the audio and video information after sound source focusing.
Fig. 4 is a flowchart illustrating an audio/video recording method according to an exemplary embodiment; as shown in fig. 4, the method includes steps S301 and S302.
In step S301, the audio data is subjected to preprocessing including noise reduction processing and/or framing processing.
The noise reduction processing of the audio data may be based on an adaptive filter, so as to improve the signal-to-noise ratio of the audio signal.
In one implementation of the embodiments of the disclosure, the adaptive filter used for noise reduction of the audio data may be designed using an FIR filter and a time-domain adaptive filtering method.
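One possible realization of such a filter — a minimal sketch, not the disclosure's mandated design — is a normalized LMS (NLMS) adaptive FIR filter. The Python snippet below assumes a separate noise-reference channel is available (for example, a second microphone placed away from the target source); a single-channel device would need some other noise estimate:

```python
import numpy as np

def nlms_denoise(primary, reference, order=32, mu=0.5, eps=1e-8):
    """Time-domain NLMS adaptive FIR filter (one possible adaptive denoiser).

    primary   : microphone signal containing the target sound plus noise
    reference : noise reference signal (an assumption of this sketch)
    Returns the error signal, which approximates the de-noised audio.
    """
    primary = np.asarray(primary, dtype=float)
    reference = np.asarray(reference, dtype=float)
    w = np.zeros(order)                         # FIR taps, adapted per sample
    out = np.zeros_like(primary)
    for n in range(order, len(primary)):
        x = reference[n - order:n][::-1]        # most recent reference samples
        y = np.dot(w, x)                        # filter's estimate of the noise
        e = primary[n] - y                      # error = de-noised output sample
        w += mu * e * x / (np.dot(x, x) + eps)  # normalized LMS tap update
        out[n] = e
    return out
```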
An audio signal is a non-stationary process, so digital signal processing techniques designed for stationary signals cannot be applied to it directly: the characteristic parameters describing its essential properties all change with time. On the other hand, although audio is time-varying, its properties remain essentially unchanged, i.e. relatively stable, over a short interval (generally taken to be 10-30 ms), so it can be regarded as a quasi-stationary process; in other words, the speech signal exhibits short-term stationarity. Therefore, in the embodiment of the disclosure, framing the audio data divides a signal whose overall characteristic parameters change continuously (i.e. is unstable as a whole) into a number of segments whose characteristics are stable, which facilitates the acquisition of the spatial information.
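A minimal framing helper is sketched below; the frame and hop lengths are illustrative design choices (10-30 ms is the classic quasi-stationary window, while the detailed example later in this disclosure frames at 1 second):

```python
import numpy as np

def frame_signal(x, fs, frame_ms=20.0, hop_ms=10.0):
    """Split a 1-D audio signal into short quasi-stationary frames.

    x  : audio samples
    fs : sampling rate in Hz
    Returns an array of shape (num_frames, frame_len).
    """
    frame_len = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    x = np.asarray(x)
    if len(x) < frame_len:
        return np.empty((0, frame_len))
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])
```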
In step S302, sound source focusing is performed on the audio data based on the sound source spatial position information, the position information of the recording device, the beam pointing information, and the preprocessed audio data, so as to obtain audio/video information after sound source focusing.
In one implementation of the disclosed embodiments, a robust adaptive diagonal loading method is used to obtain the spatial audio/video data after sound source focusing.
In the embodiment of the disclosure, the collected audio data is processed into noise-filtered audio data with stable characteristics, and the preprocessed audio data is sound-source focused based on the various information containing the spatial position of the sound source, to obtain the spatial audio and video data.
In one embodiment, determining beam pointing information includes:
determining beam pointing information based on pre-configured beam pointing information; or
acquiring a beam pointing operation performed by a user on the recording device, and determining beam pointing information based on the beam pointing operation.
In an exemplary embodiment of the disclosure, a robust adaptive diagonal loading method is adopted to form the beam and determine the beam pointing information. Robust adaptive beamforming based on the diagonal loading technique is a common algorithm: it corrects the sample covariance matrix by adding a small loading amount to it, which suppresses noise beams without changing the eigenvector structure of the matrix.
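The loading step itself is compact; the sketch below shows one way to compute diagonally loaded Capon/MVDR weights, assuming narrowband complex snapshots and a known steering vector (the loading level is an illustrative value, not one specified by the disclosure):

```python
import numpy as np

def diagonally_loaded_weights(snapshots, steering, loading=1e-2):
    """Robust adaptive (Capon/MVDR) beamformer weights via diagonal loading.

    snapshots : (num_mics, num_snapshots) complex microphone data
    steering  : (num_mics,) steering vector for the desired beam direction
    loading   : relative diagonal load added to the sample covariance
    """
    m = snapshots.shape[0]
    R = snapshots @ snapshots.conj().T / snapshots.shape[1]  # sample covariance
    R = R + loading * np.trace(R).real / m * np.eye(m)       # diagonal loading
    w = np.linalg.solve(R, steering)                         # R^{-1} a
    return w / (steering.conj() @ w)                         # distortionless constraint
```

Because the load is proportional to the average diagonal power of the sample covariance, the correction stays small relative to the data while keeping the matrix inversion well conditioned.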
In one embodiment, generating an audio-video recording file based on the fused audio-video information includes:
and carrying out spatial audio coding on the fused audio and video information, and storing the audio and video information subjected to the spatial audio coding to obtain a spatial audio and video recording file.
The fused audio and video information is an intermediate file; it cannot be stored as the final file and must be processed further.
In the embodiment of the disclosure, the sound-source-focused audio and video information is encoded, and a spatial audio/video file is obtained and stored.
The following describes a procedure of determining sound source spatial position information of a sound source included in video data with respect to a recording apparatus.
Fig. 5 is a flowchart illustrating an audio/video recording method according to an exemplary embodiment; as shown in fig. 5, the method includes steps S401 and S402.
In step S401, the relative position between the video recording apparatus and the sound source included in the video data is determined.
In step S402, sound source spatial position information of the sound source at each time unit is determined based on the relative positions.
The sound source spatial position information is calculated from the relative positions of the recording device and the sound source in the video. Because the sound source position is not fixed and may change over time, the video signal must be processed continuously to obtain the spatial position information of the sound source at each moment and thereby determine the relative position of the audio recording device and the sound source.
In the embodiment of the disclosure, the relative position of the audio recording device and the sound source is obtained continuously, and the spatial position information of the sound source at each moment is determined from it, so that the sound source's spatial position information can be obtained even while the sound source keeps moving.
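The disclosure does not prescribe a particular geometry computation; as one hedged illustration, the sketch below maps the horizontal pixel position of a sound source detected in a video frame to an azimuth relative to the camera's optical axis using a pinhole camera model (the field-of-view value and the per-frame detection step are assumptions of this example):

```python
import math

def source_azimuth(pixel_x, image_width, horizontal_fov_deg=70.0):
    """Estimate the sound source azimuth (degrees) from its pixel position.

    pixel_x            : horizontal pixel coordinate of the detected source
    image_width        : frame width in pixels
    horizontal_fov_deg : camera's horizontal field of view (assumed here)
    """
    offset = pixel_x / image_width - 0.5          # normalized offset from center
    half_fov = math.radians(horizontal_fov_deg) / 2.0
    # Pinhole model: the angle grows with tan(), not linearly with pixels.
    return math.degrees(math.atan(2.0 * offset * math.tan(half_fov)))
```

Running this once per video frame (or per audio frame) yields the per-time-unit angle sequence that tracks a moving source.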
In one implementation of the disclosed embodiments, adaptive noise reduction is performed on the acquired audio data, and audio framing processing is performed on the adaptive noise reduced audio data.
The following describes a flow of an audio/video recording method in an exemplary embodiment of the present disclosure.
Fig. 6 shows a flowchart of a spatial audio/video recording method according to an exemplary embodiment of the present disclosure. As shown in fig. 6, the process of obtaining spatial audio/video data is as follows. The audio data acquisition module collects the recorded audio signals and ambient sound signals to obtain audio data, and sends the audio data to the filtering noise reduction module to obtain noise-reduced audio data. The video data acquisition module records the corresponding video signal and sends it to the sound source spatial information calculation module, which derives the angle information between the sound source and the recording equipment, i.e. the spatial information of the sound source. The noise-reduced audio data and the sound source spatial information then enter the audio focusing module together to obtain focused audio data. Finally, this data passes through the spatial audio coding storage module to obtain the spatial audio/video file. The audio data acquisition module consists of a plurality of microphones, the video data acquisition module consists of a camera, and the spatial audio coding storage module consists of a spatial audio signal coding module and a spatial audio signal storage module.
In an exemplary embodiment of the disclosure, the audio data acquisition module, the video data acquisition module and the sound source spatial information calculation module are normally-open devices in the intelligent recording mode: environmental audio and video are recorded continuously while the equipment runs. The audio data and the video data are processed separately, the two information paths are fused by the audio focusing module to obtain the focused audio signal, and the spatial audio/video file is finally obtained through the spatial audio coding storage module.
For the audio part, the audio data S recorded by the plurality of microphones is sent to the filtering noise reduction module. The filtering noise reduction module performs noise reduction on the audio data S to obtain the noise-reduced audio data S_pre, and sends it to the audio focusing module. Specifically, the audio data S is first fed into an adaptive filter whose goal is to improve the signal-to-noise ratio of the audio signal; the noise-reduced audio data S_denoise is obtained through iterative calculation, which can be expressed as S_denoise = Ada_Filter(S). The adaptive filter Ada_Filter may be designed using an FIR filter and a time-domain adaptive filtering method (for example, the NLMS sketch given earlier). The de-noised signal S_denoise is then uniformly framed, with an audio frame duration of 1 second in this example, to obtain the processed signal S_pre.
For the video part, the sound source spatial information calculation module processes the video data and calculates the spatial position information Ang of the sound source at each moment from the relative positions of the recording equipment and the sound source in the video.
In the audio focusing module, using the position information Pos of the microphones, the beam pointing information Beam, the spatial position information Ang of the sound source and the noise-reduced audio signal S_denoise, the sound-source-focused signal S_focus can be obtained with a robust adaptive diagonal loading method (a common beamforming method).
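A minimal sketch of this focusing step follows, under assumptions the disclosure does not fix: a planar microphone array described by Pos, far-field propagation, STFT-domain processing, and the Beam direction already resolved into the azimuth passed in. The array geometry, bin layout and loading level are illustrative:

```python
import numpy as np

C = 343.0  # speed of sound in air, m/s

def steering_vector(mic_positions, azimuth_deg, freq):
    """Far-field steering vector for a planar mic array (from Pos and Ang)."""
    theta = np.radians(azimuth_deg)
    direction = np.array([np.cos(theta), np.sin(theta)])  # unit vector to source
    delays = mic_positions @ direction / C                # per-mic delay, seconds
    return np.exp(-2j * np.pi * freq * delays)

def focus_frame(frame_stft, mic_positions, azimuth_deg, freqs, loading=1e-2):
    """Diagonally loaded MVDR focusing of one audio frame, per frequency bin.

    frame_stft : (num_mics, num_bins, num_snapshots) STFT of S_denoise
    freqs      : center frequency of each bin (len == num_bins)
    Returns S_focus with shape (num_bins, num_snapshots).
    """
    m = frame_stft.shape[0]
    s_focus = np.empty(frame_stft.shape[1:], dtype=complex)
    for k, f in enumerate(freqs):
        x = frame_stft[:, k, :]
        a = steering_vector(mic_positions, azimuth_deg, f)
        R = x @ x.conj().T / x.shape[1]                  # sample covariance
        R += loading * np.trace(R).real / m * np.eye(m)  # diagonal loading
        w = np.linalg.solve(R, a)                        # R^{-1} a
        w /= a.conj() @ w                                # distortionless constraint
        s_focus[k] = w.conj() @ x                        # beamformer output
    return s_focus
```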
In the spatial audio coding storage module, the signal S_focus of each frame is spatially encoded and stored, finally producing the spatial audio/video file.
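Wiring these pieces together per frame might look like the sketch below; it reuses focus_frame from the previous sketch, and encode_spatial_frame is a hypothetical placeholder, since the disclosure does not name a particular spatial audio codec API:

```python
import numpy as np

def encode_spatial_frame(s_focus):
    # Placeholder: a real encoder would emit a compressed spatial-audio packet.
    return np.asarray(s_focus).tobytes()

def record_spatial_audio(frames_stft, mic_positions, angles, freqs):
    """frames_stft : iterable of (num_mics, num_bins, num_snapshots) arrays,
    one per 1-second frame; angles : per-frame sound source azimuth Ang."""
    chunks = []
    for frame, ang in zip(frames_stft, angles):
        s_focus = focus_frame(frame, mic_positions, ang, freqs)  # S_focus
        chunks.append(encode_spatial_frame(s_focus))
    return b"".join(chunks)  # the spatial audio stream stored in the file
```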
The following describes the flow of audio data preprocessing in an exemplary embodiment of the present disclosure.
Fig. 7 shows a flowchart of an audio data preprocessing method according to an exemplary embodiment of the present disclosure; as shown in fig. 7, the method includes steps S601 and S602.
In step S601, adaptive noise reduction processing is performed on the acquired audio data.
In step S602, the audio data subjected to the adaptive noise reduction processing is subjected to framing processing.
In the exemplary embodiment of the disclosure, the adaptive filter is used to filter and de-noise the audio data, improving its signal-to-noise ratio. The filtered audio data is then framed to obtain audio data with stable characteristics, which makes it convenient to focus the sound source and obtain the spatial audio/video data.
It can be understood that, in order to achieve the above functions, the audio and video recording apparatus provided in the embodiments of the present disclosure includes corresponding hardware structures and/or software modules that perform the respective functions. The disclosed embodiments may be implemented in hardware or a combination of hardware and computer software, in combination with the various example elements and algorithm steps disclosed in the embodiments of the disclosure. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Those skilled in the art may implement the described functionality using different approaches for each particular application, but such implementation is not to be considered as beyond the scope of the embodiments of the present disclosure.
Fig. 8 is a block diagram of an audio-video recording apparatus 100 according to an exemplary embodiment, applied to a recording device, the apparatus including:
an audio data acquisition module 101 for acquiring audio data;
the video data acquisition module 102 is used for acquiring video data;
a sound source space information calculating module 103, configured to determine sound source space position information of a sound source included in the video data relative to the recording device;
the audio focusing module 104 is configured to fuse audio data and video data based on the spatial position information of the sound source, and obtain fused audio/video information;
the spatial audio coding storage module 105 is configured to generate an audio/video recording file based on the fused audio/video information.
In one embodiment, the audio focusing module 104 fuses the audio data and the video data based on the spatial position information of the sound source in the following manner to obtain fused audio/video information:
determining position information of recording equipment and determining beam pointing information;
and carrying out sound source focusing on the audio data based on the sound source space position information, the position information of the recording equipment and the beam pointing information to obtain the audio and video information after the sound source focusing.
In one embodiment, the audio focusing module 104 performs sound source focusing on the audio data based on the sound source spatial position information, the position information of the recording device, and the beam pointing information in the following manner, so as to obtain audio and video information after the sound source focusing:
preprocessing the audio data, wherein the preprocessing comprises noise reduction processing and/or framing processing;
and carrying out sound source focusing on the audio data based on the sound source space position information, the position information of the recording equipment, the beam pointing information and the preprocessed audio data to obtain the audio and video information after the sound source focusing.
In one embodiment, determining beam pointing information includes:
determining beam pointing information based on pre-configured beam pointing information; or
acquiring a beam pointing operation performed by a user on the recording device, and determining beam pointing information based on the beam pointing operation.
In one embodiment, the spatial audio coding storage module generates the audio/video recording file based on the fused audio/video information in the following manner:
and carrying out spatial audio coding on the fused audio and video information, and storing the audio and video information subjected to the spatial audio coding to obtain a spatial audio and video recording file.
In one embodiment, the sound source spatial information calculation module 103 determines sound source spatial position information of a sound source included in the video data with respect to the recording apparatus in the following manner:
determining a relative position between the video recording device and a sound source included in the video data;
based on the relative positions, sound source spatial position information of the sound source at each time unit is determined.
Fig. 9 is a block diagram illustrating an apparatus 200 for photographing according to an exemplary embodiment. For example, apparatus 200 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, exercise device, personal digital assistant, or the like.
Referring to fig. 9, the apparatus 200 may include one or more of the following components: a processing component 202, a memory 204, a power component 206, a multimedia component 208, an audio component 210, an input/output (I/O) interface 212, a sensor component 214, and a communication component 216.
The processing component 202 generally controls overall operation of the apparatus 200, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 202 may include one or more processors 220 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 202 can include one or more modules that facilitate interactions between the processing component 202 and other components. For example, the processing component 202 may include a multimedia module to facilitate interaction between the multimedia component 208 and the processing component 202.
The memory 204 is configured to store various types of data to support operations at the apparatus 200. Examples of such data include instructions for any application or method operating on the device 200, contact data, phonebook data, messages, pictures, videos, and the like. The memory 204 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power component 206 provides power to the various components of the device 200. The power components 206 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device 200.
The multimedia component 208 includes a screen between the device 200 and the user that provides an output interface. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 208 includes a front-facing camera and/or a rear-facing camera. The front camera and/or the rear camera may receive external multimedia data when the apparatus 200 is in an operation mode, such as a photographing mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
The audio component 210 is configured to output and/or input audio signals. For example, the audio component 210 includes a Microphone (MIC) configured to receive external audio signals when the device 200 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 204 or transmitted via the communication component 216. In some embodiments, audio component 210 further includes a speaker for outputting audio signals.
The I/O interface 212 provides an interface between the processing assembly 202 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 214 includes one or more sensors for providing status assessments of various aspects of the apparatus 200. For example, the sensor assembly 214 may detect the on/off state of the device 200 and the relative positioning of components such as its display and keypad; it may also detect a change in position of the device 200 or of one of its components, the presence or absence of user contact with the device 200, the orientation or acceleration/deceleration of the device 200, and a change in its temperature. The sensor assembly 214 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 214 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 214 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 216 is configured to facilitate wired or wireless communication between the apparatus 200 and other devices. The device 200 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 216 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 216 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 200 may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 204, including instructions executable by processor 220 of apparatus 200 to perform the above-described method. For example, the non-transitory computer readable storage medium may be ROM, random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
It is understood that the term "plurality" in this disclosure means two or more, and other quantifiers are similar. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. The singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It is further understood that the terms "first," "second," and the like are used to describe various information, but such information should not be limited to these terms. These terms are only used to distinguish one type of information from another and do not denote a particular order or importance. Indeed, the expressions "first", "second", etc. may be used entirely interchangeably. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure.
It will be further understood that "connected" includes both direct connection, where no intervening element is present, and indirect connection, where an intervening element is present, unless specifically stated otherwise.
It will be further understood that although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow the general principles of the disclosure and include such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the scope of the appended claims.

Claims (14)

1. An audio/video recording method, applied to a recording device, comprising:
acquiring audio data and acquiring video data;
determining sound source space position information of a sound source included in the video data relative to the recording device;
based on the sound source space position information, fusing the audio data and the video data to obtain fused audio and video information;
and generating an audio and video recording file based on the fused audio and video information.
2. The method of claim 1, wherein the fusing the audio data and the video data based on the sound source spatial location information to obtain fused audio-video information comprises:
determining the position information of the recording equipment and determining the beam pointing information;
and carrying out sound source focusing on the audio data based on the sound source space position information, the position information of the recording equipment and the beam pointing information to obtain the audio and video information after the sound source focusing.
3. The method according to claim 2, wherein performing sound source focusing on the audio data based on the sound source spatial position information, the position information of the recording device, and the beam pointing information to obtain sound source focused audio-video information, comprises:
preprocessing the audio data, wherein the preprocessing comprises noise reduction processing and/or framing processing;
and carrying out sound source focusing on the audio data based on the sound source space position information, the position information of the recording equipment, the beam pointing information and the preprocessed audio data to obtain sound and video information after sound source focusing.
4. The method of claim 2, wherein the determining beam pointing information comprises:
determining beam pointing information based on pre-configured beam pointing information; or
acquiring a beam pointing operation performed by a user on the recording device, and determining beam pointing information based on the beam pointing operation.
5. The method according to any one of claims 1 to 4, wherein generating an audio-video recording file based on the fused audio-video information includes:
and carrying out spatial audio coding on the fused audio and video information, and storing the audio and video information after the spatial audio coding to obtain a spatial audio and video recording file.
6. The method according to claim 1, wherein said determining sound source spatial position information of a sound source included in said video data with respect to said recording apparatus comprises:
determining a relative position between a video recording device and a sound source included in the video data;
and determining sound source space position information of the sound source in each time unit based on the relative positions.
7. An audio/video recording apparatus, for use with a recording device, the apparatus comprising:
the audio data acquisition module is used for acquiring audio data;
the video data acquisition module is used for acquiring video data;
a sound source space information calculation module, configured to determine sound source space position information of a sound source included in the video data relative to the recording device;
the audio focusing module is used for fusing the audio data and the video data based on the sound source space position information to obtain fused audio and video information;
and the spatial audio coding storage module is used for generating an audio and video recording file based on the fused audio and video information.
8. The apparatus of claim 7, wherein the audio focusing module fuses the audio data and the video data based on the sound source spatial location information to obtain fused audio-video information by:
determining the position information of the recording equipment and determining the beam pointing information;
and carrying out sound source focusing on the audio data based on the sound source space position information, the position information of the recording equipment and the beam pointing information to obtain the audio and video information after the sound source focusing.
9. The apparatus of claim 8, wherein the audio focusing module performs sound source focusing on the audio data based on the sound source spatial position information, the position information of the recording device, and the beam pointing information to obtain the audio-video information after sound source focusing by:
preprocessing the audio data, wherein the preprocessing comprises noise reduction processing and/or framing processing;
and carrying out sound source focusing on the audio data based on the sound source space position information, the position information of the recording equipment, the beam pointing information and the preprocessed audio data to obtain sound and video information after sound source focusing.
10. The apparatus of claim 8, wherein the determining beam pointing information comprises:
determining beam pointing information based on pre-configured beam pointing information; or
acquiring a beam pointing operation performed by a user on the recording device, and determining beam pointing information based on the beam pointing operation.
11. The apparatus according to any one of claims 7 to 10, wherein the spatial audio coding storage module generates an audio-video recording file based on the fused audio-video information by:
and carrying out spatial audio coding on the fused audio and video information, and storing the audio and video information after the spatial audio coding to obtain a spatial audio and video recording file.
12. The apparatus of claim 7, wherein the sound source spatial information calculation module determines sound source spatial position information of a sound source included in the video data with respect to the recording device by:
determining a relative position between a video recording device and a sound source included in the video data;
and determining sound source space position information of the sound source in each time unit based on the relative positions.
13. A photographing apparatus, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the audio/video recording method of any one of claims 1 to 6.
14. A non-transitory computer readable storage medium, characterized in that instructions in the storage medium, when executed by a processor of a mobile terminal, enable the mobile terminal to perform the audio video recording method of any one of claims 1 to 6.
CN202211247669.1A 2022-10-12 2022-10-12 Audio and video recording method and device and storage medium Pending CN117880731A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211247669.1A CN117880731A (en) 2022-10-12 2022-10-12 Audio and video recording method and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211247669.1A CN117880731A (en) 2022-10-12 2022-10-12 Audio and video recording method and device and storage medium

Publications (1)

Publication Number Publication Date
CN117880731A (en) 2024-04-12

Family

ID=90590610

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211247669.1A Pending CN117880731A (en) 2022-10-12 2022-10-12 Audio and video recording method and device and storage medium

Country Status (1)

Country Link
CN (1) CN117880731A (en)

Similar Documents

Publication Publication Date Title
US20170304735A1 (en) Method and Apparatus for Performing Live Broadcast on Game
EP3163748A2 (en) Method, device and terminal for adjusting volume
WO2016176951A1 (en) Method and device for optimizing sound signal
CN105120301B (en) Method for processing video frequency and device, smart machine
CN106791535B (en) Video recording method and device
CN111340731A (en) Image processing method and device, electronic equipment and storage medium
CN111696554B (en) Translation method and device, earphone and earphone storage device
CN110675355B (en) Image reconstruction method and device, electronic equipment and storage medium
CN106060707B (en) Reverberation processing method and device
CN112312039A (en) Audio and video information acquisition method, device, equipment and storage medium
CN111739538B (en) Translation method and device, earphone and server
WO2023231686A9 (en) Video processing method and terminal
CN117813652A (en) Audio signal encoding method, device, electronic equipment and storage medium
CN117880731A (en) Audio and video recording method and device and storage medium
CN113315903B (en) Image acquisition method and device, electronic equipment and storage medium
CN110213531B (en) Monitoring video processing method and device
CN112752191A (en) Audio acquisition method, device and storage medium
CN114418865A (en) Image processing method, device, equipment and storage medium
CN107707819B (en) Image shooting method, device and storage medium
CN111986688B (en) Method, device and medium for improving voice definition
CN112637416A (en) Volume adjusting method and device and storage medium
CN116402695B (en) Video data processing method, device and storage medium
CN110278401A (en) Video call method and device
CN111917438B (en) Voice acquisition method, device and system and voice acquisition equipment
CN117880732A (en) Spatial audio recording method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination