CN105979469B - recording processing method and terminal - Google Patents

Info

Publication number
CN105979469B
Authority
CN
China
Prior art keywords
sound
information
sound source
scene image
recording
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610509141.5A
Other languages
Chinese (zh)
Other versions
CN105979469A (en)
Inventor
黄业伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd
Priority to CN201610509141.5A
Publication of CN105979469A
Application granted
Publication of CN105979469B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S5/00 Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/15 Aspects of sound capture and related signal processing for recording or reproduction

Abstract

The invention provides a sound recording processing method and a terminal, wherein the sound recording processing method comprises the steps of obtaining scene image information collected by a camera and sound information collected by a microphone during sound recording, obtaining position information of each sound source in a scene image according to the scene image information, generating sound channel coefficient information corresponding to the position information of each sound source according to the position information of each sound source and a plurality of sound channels required to be adopted during sound recording playing, and synthesizing the sound information collected by the microphone into multi-channel audio data according to the sound channel coefficient information.

Description

recording processing method and terminal
Technical Field
The invention relates to the technical field of terminals, and in particular to a recording processing method and a terminal.
Background
Mobile terminal technology is developing rapidly, and people often use mobile terminals to record audio and video of events in their lives.
Stereo audio and video recording reproduces a scene more faithfully: playback through the dual loudspeakers or earphones of a mobile terminal sounds more three-dimensional, which improves the user experience. A common method for stereo recording is to collect sound with multiple microphones in the mobile terminal, which allows sound sources to be localized better.
However, many mobile terminals have only a single microphone, and multi-microphone configurations are generally limited by the size of the terminal: if the mobile terminal is small, the microphones sit close together, sound localization is poor, and the audio and video recording effect suffers.
Disclosure of Invention
In view of the above, the present invention provides a recording processing method and a terminal, which solve the problem that it is difficult for existing mobile terminals to synthesize multi-channel sound when recording with a single microphone.
In order to solve the above technical problem, in one aspect, the present invention provides a sound recording processing method, which is applied to a terminal, and the method includes:
acquiring scene image information acquired by a camera and sound information acquired by a microphone during recording;
acquiring position information of each sound source in the scene image according to the scene image information;
generating sound channel coefficient information corresponding to the position information of each sound source according to the position information of each sound source and a plurality of sound channels required to be adopted when the sound record is played;
and synthesizing sound information collected by the microphone into multi-channel audio data according to the sound channel coefficient information.
In another aspect, the present invention further provides a terminal, including:
the acquisition module is used for acquiring scene image information acquired by the camera and sound information acquired by the microphone during recording;
the position information acquisition module is used for acquiring the position information of each sound source in the scene image according to the scene image information;
a sound channel coefficient determining module, configured to generate sound channel coefficient information corresponding to the position information of each sound source according to the position information of each sound source and a plurality of sound channels required to be used when a recording is played;
and the synthesis module is used for synthesizing the sound information collected by the microphone into multi-channel audio data according to the sound channel coefficient information.
The technical scheme of the invention has the following beneficial effects:
Multi-channel audio data can be synthesized from the single-channel sound information collected by one microphone, so that a plurality of microphones do not need to be arranged in the terminal for recording, and the cost of the terminal for recording is reduced.
Drawings
FIG. 1 is a flowchart illustrating a recording processing method according to a first embodiment of the present invention;
FIG. 2 is a flowchart illustrating a recording processing method according to a second embodiment of the present invention;
FIG. 3 is a flowchart illustrating a recording processing method according to a third embodiment of the present invention;
FIG. 4 is a block diagram of a terminal according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings and are not intended to limit the scope of the invention.
Referring to FIG. 1, FIG. 1 is a flowchart of a recording processing method according to a first embodiment of the invention, where the method is applied to a terminal and includes the following steps:
Step S11: acquiring scene image information collected by a camera and sound information collected by a microphone during recording.
The sound information collected by the microphone is single-channel (mono) sound information.
Step S12: acquiring the position information of each sound source in the scene image according to the scene image information.
The number of sound sources in the scene image may be one, or may be more than one.
The position information of a sound source refers to its position in the scene image. It may indicate, for example, whether the sound source is located in the left half or the right half of the scene image, or it may be a scale factor of the sound source in the lateral direction of the scene image: for example, the scale factor is [1,0] at the leftmost end of the scene image in the lateral direction, [0,1] at the rightmost end, and [0.5,0.5] at the central point.
Step S13: generating the sound channel coefficient information corresponding to the position information of each sound source according to the position information of each sound source and a plurality of sound channels required to be adopted when the recording is played.
The channel coefficient information is the proportion coefficient of each channel when the recording is played.
For example, the plurality of channels used when the recording is played include a left channel and a right channel, and the position information of the sound source indicates whether it is located in the left half or the right half of the scene image.
In another example, the plurality of channels used when the recording is played include a left channel and a right channel, and the position information of the sound source is the scale factor of the sound source in the lateral direction of the scene image. Assuming that this scale factor is [0.2, 0.8], 20% of the sound of the sound source is played by the left channel and 80% is played by the right channel.
Of course, more than two channels may be used when playing back a recording, for example three channels, whose channel coefficient information may be similar to [0.2, 0.4, 0.4].
Step S14: synthesizing the sound information collected by the microphone into multi-channel audio data according to the sound channel coefficient information.
In the embodiment of the invention, multi-channel audio data can be synthesized from the single-channel sound information collected by one microphone, so that a plurality of microphones are not required to be arranged in the terminal for recording, and the cost of the terminal for recording is reduced.
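Step S14 can be illustrated with a minimal Python sketch. It assumes that "synthesizing" means scaling every mono sample by the coefficient of each channel, which the text does not state explicitly; the function name `synthesize_multichannel` and the sample values are hypothetical.

```python
from typing import List, Sequence


def synthesize_multichannel(
    mono: Sequence[float], coeffs: Sequence[float]
) -> List[List[float]]:
    """Scale one mono signal into len(coeffs) channels (step S14, sketched).

    Returns one list of samples per channel: channel k is mono * coeffs[k].
    Simple per-sample scaling is an assumption, not the patent's stated method.
    """
    if any(c < 0 for c in coeffs):
        raise ValueError("channel coefficients must be non-negative")
    return [[sample * c for sample in mono] for c in coeffs]


# Hypothetical mono samples from the single microphone.
mono_samples = [0.0, 0.5, -0.5, 1.0]

# Two channels with coefficients [0.2, 0.8]: 20% left, 80% right.
left, right = synthesize_multichannel(mono_samples, [0.2, 0.8])

# Three channels, using the example coefficients [0.2, 0.4, 0.4].
three = synthesize_multichannel(mono_samples, [0.2, 0.4, 0.4])
```

The same function covers any channel count, because the coefficient list alone decides how many output channels are produced.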
The terminal executing the recording processing method in the above embodiment may be the terminal that makes the recording, or may be a different terminal that is used only for processing the recording. For example, the terminal may be a computer and the recording device may be a video camera: the video camera transmits the recording to the computer, and the computer performs the multi-channel sound synthesis.
That is, the multi-channel sound may be synthesized while the audio or video is being recorded, or afterwards, for example when the audio or video recording is played back.
In addition, the terminal executing the recording processing method may also be used to play the synthesized multi-channel audio data; that is, the channels used to play the recording are channels of the terminal executing the recording processing method. In this case, after the step of synthesizing the sound information collected by the microphone into multi-channel audio data, the method may further include: playing the multi-channel audio data.
Of course, the channels used to play the recording may instead belong to another playing device rather than to the terminal executing the recording processing method. In this case, after the step of synthesizing the sound information collected by the microphone into multi-channel audio data, the method may further include: transmitting the multi-channel audio data to the playing device for playing, where the multiple channels are those of the playing device.
Referring to FIG. 2, FIG. 2 is a flowchart of a recording processing method according to a second embodiment of the present invention, where the method is applied to a terminal that includes a camera and a microphone, and the method includes the following steps:
Step S21: when a request for opening a recording function is received, starting the camera to collect scene image information and starting the microphone to collect sound information.
The recording function may be a recording function in camera application software in the terminal, or a recording function in real-time communication application software in the terminal, such as the video chat function of WeChat.
Step S22: acquiring scene image information collected by the camera and sound information collected by the microphone during recording.
Step S23: acquiring the position information of each sound source in the scene image according to the scene image information.
Step S24: generating sound channel coefficient information corresponding to the position information of each sound source according to the position information of each sound source and a plurality of sound channels required to be adopted when the recording is played.
Step S25: synthesizing the sound information collected by the microphone into multi-channel audio data according to the sound channel coefficient information.
In this embodiment of the invention, the terminal executing the recording processing method is the terminal that records the audio and video, so multi-channel audio data can be synthesized while the audio and video are being recorded.
In the embodiments of the present invention, image recognition technology may be adopted to obtain the position information of each sound source in the scene image, as described below by way of example.
Referring to FIG. 3, FIG. 3 is a flowchart of a recording processing method according to a third embodiment of the present invention, where the method is applied to a terminal and includes the following steps:
Step S31: acquiring scene image information collected by a camera and sound information collected by a microphone during recording.
Step S32: identifying the sound-producing organisms in the scene image according to the scene image information.
Such organisms include humans and animals.
Step S33: performing facial recognition on the sound-producing organisms in the scene image, and determining each sound source.
For example, changes in the lips and face of an organism are recognized across successive images, and the sound source is determined accordingly.
Step S34: acquiring the position information of each sound source.
Step S35: generating the sound channel coefficient information corresponding to the position information of each sound source according to the position information of each sound source and a plurality of sound channels required to be adopted when the recording is played.
Step S36: synthesizing the sound information collected by the microphone into multi-channel audio data according to the sound channel coefficient information.
In this embodiment of the invention, the position information of the sound source is determined by facial recognition technology, which is simple to implement.
Of course, in other embodiments of the present invention, the position information of the sound source may be determined by other methods.
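The lip-movement check of step S33 could look roughly like the sketch below. The text names no specific algorithm, so everything here is an assumption: the per-frame "mouth opening" measurement, the threshold value, and the names `speaking_faces` and `lip_gaps` are hypothetical, and a real system would obtain the measurements from a face-landmark detector.

```python
from typing import Dict, List, Sequence


def speaking_faces(
    mouth_openings: Dict[str, Sequence[float]], threshold: float = 2.0
) -> List[str]:
    """Return ids of faces whose mouth opening varies across successive frames.

    mouth_openings maps a face id to one measurement per frame (for example the
    lip gap in pixels); both the measurement and the threshold are assumptions.
    """
    active = []
    for face_id, series in mouth_openings.items():
        # A face counts as a sound source if its lip gap changes noticeably.
        if len(series) >= 2 and max(series) - min(series) > threshold:
            active.append(face_id)
    return active


# Hypothetical measurements over five successive frames.
lip_gaps = {
    "face_a": [3.0, 9.0, 4.0, 10.0, 3.5],  # lips moving: likely speaking
    "face_b": [2.0, 2.1, 2.0, 1.9, 2.0],   # lips nearly still
}
sources = speaking_faces(lip_gaps)
```

Using the range of the measurement keeps the sketch short; a variance or a temporal filter would be equally plausible choices.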
As mentioned in the above embodiments, the position information of the sound source may indicate whether the sound source is located in the left half or the right half of the scene image. When the position information indicates that the sound source is located in the left half of the scene image, the channel coefficient information corresponding to the position information is configured so that the sound information of the sound source is played by the left channel and the right channel is silent, i.e. the channel coefficient information may be represented as [1,0]. When the position information indicates that the sound source is located in the right half of the scene image, the channel coefficient information is configured so that the sound information of the sound source is played by the right channel and the left channel is silent, i.e. the channel coefficient information may be represented as [0,1].
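The left-half/right-half rule can be sketched in a few lines of Python. The function name `half_image_coeffs`, the use of a pixel x-coordinate, and the treatment of a source lying exactly on the centre line are assumptions made for illustration.

```python
from typing import List


def half_image_coeffs(x: float, image_width: float) -> List[int]:
    """Return [left, right] coefficients from the half of the frame holding the source."""
    if not 0 <= x <= image_width:
        raise ValueError("x must lie inside the image")
    # Assumption: a source exactly on the centre line is assigned to the right half.
    return [1, 0] if x < image_width / 2 else [0, 1]


# Hypothetical 1920-pixel-wide frame.
coeffs_left_source = half_image_coeffs(100, 1920)
coeffs_right_source = half_image_coeffs(1500, 1920)
```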
When a plurality of sound sources are present within the same time period, the channel coefficient information may be expressed as a matrix, for example a matrix covering both sound sources when two sound sources are present within a time period.
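The matrix itself does not survive in this text, so the following Python sketch shows only one plausible layout and is an assumption: one row per sound source and one column per playback channel. The source names and coefficient values are hypothetical.

```python
from typing import Dict, List

# Hypothetical per-source [left, right] coefficients for one time period.
source_coeffs: Dict[str, List[float]] = {
    "source_1": [1.0, 0.0],  # located in the left half of the image
    "source_2": [0.0, 1.0],  # located in the right half of the image
}

# Rows are sound sources, columns are playback channels (left, right).
coeff_matrix: List[List[float]] = list(source_coeffs.values())


def channel_column(matrix: List[List[float]], channel: int) -> List[float]:
    """Return the coefficient of every source for one playback channel."""
    return [row[channel] for row in matrix]


left_column = channel_column(coeff_matrix, 0)
```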
In the above embodiment, the position information of the sound source may also be a scale factor of the sound source in the lateral direction of the scene image: for example, the scale factor is [1,0] when the sound source is at the leftmost end of the scene image in the lateral direction, [0,1] when it is at the rightmost end, and [0.5,0.5] when it is at the central point. In this case, the step of generating channel coefficient information corresponding to the position information of each sound source according to the position information of each sound source and a plurality of channels required to be used when playing the recording includes: calculating the scale factor of the sound source in the lateral direction of the scene image according to the position information of the sound source; and calculating the coefficient information occupied by the left channel and the right channel according to that scale factor, to obtain the channel coefficient information corresponding to the position information of the sound source.
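A linear mapping reproduces the three reference points given above ([1,0] at the left edge, [0.5,0.5] at the centre, [0,1] at the right edge). The sketch below assumes linear interpolation between those points, which the text does not prescribe; the function name `lateral_scale_factor` and the pixel-coordinate input are hypothetical.

```python
from typing import List


def lateral_scale_factor(x: float, image_width: float) -> List[float]:
    """Map a horizontal position to [left, right] coefficients.

    x = 0 is the leftmost end and x = image_width the rightmost end.
    Linear interpolation between the two ends is an assumption.
    """
    if image_width <= 0:
        raise ValueError("image_width must be positive")
    if not 0 <= x <= image_width:
        raise ValueError("x must lie inside the image")
    right = x / image_width
    return [1.0 - right, right]


# Hypothetical 1920-pixel-wide frame.
leftmost = lateral_scale_factor(0, 1920)
centre = lateral_scale_factor(960, 1920)
rightmost = lateral_scale_factor(1920, 1920)
```

The two coefficients always sum to 1, so the overall level of the source stays constant as it moves across the frame.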
The position information of the sound source may also be front-back information of the sound source in the scene image, that is, information on the relative distance of the sound source from the camera. This suits a playback terminal whose multiple channels are arranged front to back.
Of course, the position information of the sound source may be of other types, which are not enumerated here.
Referring to FIG. 4, an embodiment of the present invention further provides a terminal, including:
the acquisition module is used for acquiring scene image information acquired by the camera and sound information acquired by the microphone during recording;
the position information acquisition module is used for acquiring the position information of each sound source in the scene image according to the scene image information;
a sound channel coefficient determining module, configured to generate sound channel coefficient information corresponding to the position information of each sound source according to the position information of each sound source and a plurality of sound channels required to be used when a recording is played;
and the synthesis module is used for synthesizing the sound information collected by the microphone into multi-channel audio data according to the sound channel coefficient information.
The terminal can be a mobile phone, a tablet computer, a camera or a desktop computer.
Preferably, the terminal further includes:
a playing module, used for playing the multi-channel audio data.
Preferably, the terminal further includes:
the camera and the microphone; and
a control module, used for controlling the camera to start and collect scene image information and controlling the microphone to start and collect sound information when a request for starting the audio and video recording function is received.
In an embodiment of the present invention, the location information obtaining module includes:
a first recognition unit configured to recognize a sound-producing organism in the scene image based on the scene image information;
a second recognition unit configured to perform facial recognition on the sound-producing organism in the scene image and determine each sound source;
an obtaining unit configured to obtain the position information of each sound source.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (11)

1. A recording processing method, applied to a terminal, characterized in that the method includes:
acquiring scene image information acquired by a camera and sound information acquired by a microphone during recording; the sound information collected by the microphone is single sound channel sound information;
acquiring position information of each sound source in the scene image according to the scene image information;
generating sound channel coefficient information corresponding to the position information of each sound source according to the position information of each sound source and a plurality of sound channels required to be adopted when the sound record is played;
synthesizing sound information collected by the microphone into multi-channel audio data according to the sound channel coefficient information;
the multiple sound channels required to be adopted when the recording is played comprise a left sound channel and a right sound channel of a terminal used for playing the recording, and the step of generating sound channel coefficient information corresponding to the position information of each sound source according to the position information of each sound source and the multiple sound channels required to be adopted when the recording is played comprises the following steps:
calculating a proportionality coefficient of the sound source in the transverse direction of the scene image according to the position information of the sound source;
calculating coefficient information occupied by a left sound channel and a right sound channel according to the proportional coefficient of the sound source in the transverse direction of the scene image to obtain sound channel coefficient information corresponding to the position information of the sound source;
the plurality of sound channels required to be adopted when the recording is played further include a plurality of sound channels arranged front to back on the terminal used for playing the recording, and the position information of the sound source further includes: front-back information of the sound source in the scene image.
2. The recording processing method according to claim 1, wherein after the step of synthesizing the sound information collected by the microphone into multi-channel audio data, the method further comprises:
and playing the multi-channel audio data.
3. The audio recording processing method according to claim 1, wherein the terminal includes the camera and the microphone, and before the step of acquiring the scene image information collected by the camera and the sound information collected by the microphone during audio recording, the method further includes:
when a request for opening a recording function is received, the camera is started to collect scene image information, and the microphone is started to collect sound information.
4. The recording processing method according to claim 3, wherein the recording function is a recording function in a camera application in the terminal or a recording function in a real-time communication application in the terminal.
5. The recording processing method according to claim 1, wherein the step of obtaining position information of each sound source in the scene image according to the scene image information comprises:
identifying a sounding organism in the scene image according to the scene image information;
carrying out facial recognition on the sounding organisms in the scene images, and determining each sound source;
the position information of each sound source is acquired.
6. The recording processing method of claim 1, wherein the step of generating channel coefficient information corresponding to the position information of each of the sound sources based on the position information of each of the sound sources and a plurality of channels required to be used when the recording is played comprises:
when the position information of the sound source indicates that the sound source is positioned in the left half part of the scene image, the sound channel coefficient information corresponding to the position information of the sound source is configured to play the sound information of the sound source by using a left sound channel;
when the position information of the sound source indicates that the sound source is located in the right-half part of the scene image, the channel coefficient information corresponding to the position information of the sound source is configured to play the sound information of the sound source in the right channel.
7. The recording processing method according to claim 1, wherein the recording processing method is executed by the terminal while the audio and video are being recorded, or is executed by the terminal when the audio and video recording is played.
8. A terminal, comprising:
the acquisition module is used for acquiring scene image information acquired by the camera and sound information acquired by the microphone during recording; the sound information collected by the microphone is single sound channel sound information;
the position information acquisition module is used for acquiring the position information of each sound source in the scene image according to the scene image information;
a sound channel coefficient determining module, configured to generate sound channel coefficient information corresponding to the position information of each sound source according to the position information of each sound source and a plurality of sound channels required to be used when a recording is played;
the synthesis module is used for synthesizing the sound information collected by the microphone into multi-channel audio data according to the sound channel coefficient information;
the plurality of sound channels required to be adopted when the recording is played comprise a left sound channel and a right sound channel of a terminal for playing the recording;
the sound channel coefficient determining module is used for calculating a proportionality coefficient of the sound source in the transverse direction of the scene image according to the position information of the sound source; calculating coefficient information occupied by a left sound channel and a right sound channel according to the proportional coefficient of the sound source in the transverse direction of the scene image to obtain sound channel coefficient information corresponding to the position information of the sound source;
the plurality of sound channels required to be adopted when the recording is played further include a plurality of sound channels arranged front to back on the terminal used for playing the recording, and the position information of the sound source further includes: front-back information of the sound source in the scene image.
9. The terminal of claim 8, further comprising:
a playing module, used for playing the multi-channel audio data.
10. The terminal of claim 8, further comprising:
the camera and the microphone; and
a control module, used for controlling the camera to start and collect scene image information and controlling the microphone to start and collect sound information when a request for starting the audio and video recording function is received.
11. The terminal of claim 8, wherein the location information obtaining module comprises:
a first recognition unit configured to recognize a sound-producing organism in the scene image based on the scene image information;
a second recognition unit configured to perform facial recognition on the sound-producing organism in the scene image and determine each sound source;
an obtaining unit configured to obtain the position information of each sound source.
CN201610509141.5A 2016-06-29 2016-06-29 recording processing method and terminal Active CN105979469B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610509141.5A CN105979469B (en) 2016-06-29 2016-06-29 recording processing method and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610509141.5A CN105979469B (en) 2016-06-29 2016-06-29 recording processing method and terminal

Publications (2)

Publication Number Publication Date
CN105979469A CN105979469A (en) 2016-09-28
CN105979469B true CN105979469B (en) 2020-01-31

Family

ID=56953545

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610509141.5A Active CN105979469B (en) 2016-06-29 2016-06-29 recording processing method and terminal

Country Status (1)

Country Link
CN (1) CN105979469B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190278802A1 (en) * 2016-11-04 2019-09-12 Dirac Research Ab Constructing an audio filter database using head-tracking data
CN106982316A (en) * 2017-05-03 2017-07-25 张德明 A kind of sound field collection based on ball base camera renders supervising device
CN108052312B (en) * 2017-12-06 2021-04-27 晶晨半导体(上海)股份有限公司 Method for realizing multi-channel recording based on android system and audio system
CN116389982A (en) * 2023-05-19 2023-07-04 零束科技有限公司 Audio processing method, device, electronic equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102483928A (en) * 2009-09-04 2012-05-30 株式会社尼康 Voice data synthesis device

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006314078A (en) * 2005-04-06 2006-11-16 Sony Corp Imaging apparatus, voice recording apparatus, and the voice recording method
DE102005043641A1 (en) * 2005-05-04 2006-11-09 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for generating and processing sound effects in spatial sound reproduction systems by means of a graphical user interface
JP5067595B2 (en) * 2005-10-17 2012-11-07 ソニー株式会社 Image display apparatus and method, and program
EP2124486A1 (en) * 2008-05-13 2009-11-25 Clemens Par Angle-dependent operating device or method for generating a pseudo-stereophonic audio signal
JP5155092B2 (en) * 2008-10-10 2013-02-27 オリンパスイメージング株式会社 Camera, playback device, and playback method
CN103198834B (en) * 2012-01-04 2016-12-14 中国移动通信集团公司 A kind of acoustic signal processing method, device and terminal
CN103379424B (en) * 2012-04-24 2016-08-10 华为技术有限公司 A kind of sound mixing method and multipoint control server
DE102012017296B4 (en) * 2012-08-31 2014-07-03 Hamburg Innovation Gmbh Generation of multichannel sound from stereo audio signals

Also Published As

Publication number Publication date
CN105979469A (en) 2016-09-28

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant