CN114390133A - Recording method and device and electronic equipment - Google Patents

Publication number
CN114390133A
Authority
CN
China
Prior art keywords
sound source
sound
voice
directory
voice signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210082304.1A
Other languages
Chinese (zh)
Inventor
高志稳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd filed Critical Vivo Mobile Communication Co Ltd
Priority to CN202210082304.1A priority Critical patent/CN114390133A/en
Publication of CN114390133A publication Critical patent/CN114390133A/en
Priority to PCT/CN2023/072987 priority patent/WO2023138632A1/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M1/00Substation equipment, e.g. for use by subscribers
    • H04M1/64Automatic arrangements for answering calls; Automatic arrangements for recording messages for absent subscribers; Arrangements for recording conversations
    • H04M1/642Automatic arrangements for answering calls; Automatic arrangements for recording messages for absent subscribers; Arrangements for recording conversations storing speech in digital form
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M1/00Substation equipment, e.g. for use by subscribers
    • H04M1/02Constructional features of telephone sets
    • H04M1/19Arrangements of transmitters, receivers, or complete sets to prevent eavesdropping, to attenuate local noise or to prevent undesired transmission; Mouthpieces or receivers specially adapted therefor
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The application discloses a recording method, a recording device and electronic equipment, and belongs to the field of electronic equipment. The terminal determines the position of a target sound source through a microphone array; wherein the target sound source is a sound source generating a voice signal whose sound intensity satisfies a first condition, and the microphone array includes at least three microphones that are not arranged in a straight line; and acquiring voice signals in a directional acquisition mode according to the position of the target sound source to obtain a voice file.

Description

Recording method and device and electronic equipment
Technical Field
The application belongs to the field of electronic equipment, and particularly relates to a recording method and device and electronic equipment.
Background
With the continuous progress of science and technology, smartphone functions have become increasingly rich and varied, and users expect more of recording technology and recording quality. Good recording technology gives users a good experience, and rich, faithful sound gives listeners a more realistic impression.
However, smartphone recording suppresses environmental noise only weakly. Especially in scenes such as a multi-person conference, slightly louder surrounding sounds are also recorded, so that the recorded speech is hard to distinguish and the noise level is high.
Disclosure of Invention
An object of the embodiments of the present application is to provide a recording method, a recording device and electronic equipment that can solve the problem that the capability to suppress environmental noise is weak and that, particularly in scenes such as a multi-person conference, slightly louder surrounding sounds are also recorded, so that the recorded speech is hard to distinguish and the noise level is high.
In a first aspect, an embodiment of the present application provides a recording method, where the method includes:
the terminal determines the position of a target sound source through a microphone array; wherein the target sound source is a sound source generating a voice signal whose sound intensity satisfies a first condition, and the microphone array includes at least three microphones that are not arranged in a straight line;
and acquiring voice signals in a directional acquisition mode according to the position of the target sound source to obtain a voice file.
In a second aspect, an embodiment of the present application provides an audio recording apparatus, including:
the sound source positioning module is used for determining the position of a target sound source through the microphone array; wherein the target sound source is a sound source generating a voice signal whose sound intensity satisfies a first condition, and the microphone array includes at least three microphones that are not arranged in a straight line;
and the voice acquisition module is used for acquiring voice signals in a directional acquisition mode according to the position of the target sound source to obtain a voice file.
In a third aspect, embodiments of the present application provide an electronic device, which includes a processor and a memory, where the memory stores a program or instructions executable on the processor, and the program or instructions, when executed by the processor, implement the steps of the method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the method according to the first aspect.
In a sixth aspect, embodiments of the present application provide a computer program product, stored on a storage medium, for execution by at least one processor to implement the method according to the first aspect.
In the embodiment of the application, the terminal determines the position of a target sound source through a microphone array; wherein the target sound source is a sound source generating a voice signal whose sound intensity satisfies a first condition, and the microphone array includes at least three microphones that are not arranged in a straight line; and collecting voice signals in a directional collection mode according to the position of the target sound source. According to the embodiment of the invention, the determined target sound source is positioned and directionally collected, so that the voice signal of the target sound source can be effectively separated from the noise environment and collected, and a clearer sound segment can be obtained.
Drawings
Fig. 1 is a schematic flowchart of a recording method according to an embodiment of the present application;
Fig. 2 is a schematic diagram of a microphone array structure provided in an embodiment of the present application;
Fig. 3 is a schematic flowchart of another recording method provided in the embodiments of the present application;
Fig. 4 is a schematic flowchart of another recording method according to an embodiment of the present application;
Fig. 5 is a schematic flowchart of another recording method according to an embodiment of the present application;
Fig. 6 is a schematic flowchart of an identity recognition method for a sound clip according to an embodiment of the present application;
Fig. 7 is a schematic display diagram of a display interface provided in an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a terminal group according to an embodiment of the present application;
Fig. 9 is a schematic display diagram of another display interface provided in an embodiment of the present application;
Fig. 10 is a schematic display diagram of another display interface provided by an embodiment of the present application;
Fig. 11 is a schematic structural diagram of a recording apparatus according to an embodiment of the present application;
Fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
Fig. 13 is a hardware configuration diagram of an electronic device implementing an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present disclosure.
The terms "first", "second" and the like in the description and claims of the present application are used to distinguish between similar objects and not necessarily to describe a particular sequential or chronological order. It should be understood that terms so used are interchangeable under appropriate circumstances, so that the embodiments of the application can be practiced in sequences other than those illustrated or described herein. Objects distinguished by "first", "second" and the like are generally of one type, and the number of objects is not limited; for example, the first object may be one or more than one. In addition, "and/or" in the description and claims means at least one of the connected objects, and the character "/" generally means that the preceding and succeeding related objects are in an "or" relationship.
The recording method provided by the embodiment of the present application is described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.
As shown in fig. 1, an embodiment of the present application provides a sound recording method, where an execution subject of the method is a terminal, and a microphone array is disposed on the terminal, where the microphone array includes at least three microphones and requires that the at least three microphones are not disposed on a same straight line. As shown in fig. 2, three microphones 101 not in the same line are respectively disposed at the top, middle and bottom of the terminal 100 to form a microphone array. The recording method may include the following steps.
Step 110, the terminal determines the position of a target sound source through a microphone array; wherein the target sound source is a sound source generating a voice signal whose sound intensity satisfies a first condition, and the microphone array includes at least three microphones that are not arranged in a straight line.
The target sound source is a sound source corresponding to the voice signal meeting the first condition in the voice signals collected by the microphone array.
The first condition may be set according to actual needs, may be set based on the sound intensity of the voice signal, or may be set based on the duration of the voice signal.
In one embodiment, the first condition is that the sound intensity of the voice signal exceeds a first threshold. The terminal monitors, through the microphone array, the sound intensity of the voice signals collected in the environment and, when the sound intensity of a voice signal exceeds the first threshold, determines the sound source generating that voice signal as the target sound source.
The first condition may also be that the sound intensity stays above the first threshold for longer than a first duration, or that the sound intensity is the highest among the voice signals in the current environment whose sound intensity exceeds the first threshold. For simplicity, the following embodiments take as an example the first condition that the sound intensity of the voice signal exceeds a first threshold A.
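The three variants of the first condition described above can be sketched as follows. This is only an illustrative sketch: the function names, the use of decibel-like level values, and the fixed frame length are assumptions and do not come from the application.

```python
def exceeds_threshold(level, first_threshold):
    """Variant 1: the sound intensity exceeds the first threshold."""
    return level > first_threshold


def exceeds_for_duration(levels, first_threshold, frame_seconds, first_duration):
    """Variant 2: the intensity stays above the threshold longer than first_duration.

    levels is a sequence of per-frame intensity values (hypothetical framing).
    """
    run_frames = 0
    for level in levels:
        # Count consecutive frames above the threshold; reset on any drop.
        run_frames = run_frames + 1 if level > first_threshold else 0
        if run_frames * frame_seconds > first_duration:
            return True
    return False


def loudest_above_threshold(level_by_source, first_threshold):
    """Variant 3: among sources above the threshold, pick the loudest one."""
    candidates = {s: v for s, v in level_by_source.items() if v > first_threshold}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)
```

In this sketch a source that fails all checks is simply not treated as a target sound source; how the levels are measured is left open, as in the text above.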
The terminal can locate the target sound source through the microphone array so as to acquire the position of the target sound source. A sound source can be located with a plurality of microphones in various ways; for example, the position in three-dimensional space of the target sound source generating the voice signal can be calculated from the differences, in factors such as sound intensity and phase, between the voice signals collected by different microphones in the microphone array.
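The application does not specify a localization algorithm. As one possible illustration of why the microphones must not lie on a straight line, the sketch below estimates only the direction (not the distance) of a far-away source in a two-dimensional plane from arrival-time differences. The far-field plane-wave model, the microphone coordinates, and all function names are assumptions made for this example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def direction_from_delays(mics, delays):
    """Estimate the bearing (radians) toward a far-field source.

    mics:   three (x, y) microphone positions in meters.
    delays: arrival time at mics[1] and mics[2] minus arrival time at mics[0].
    Model:  (p_i - p_0) . u = -c * delay_i, where u is the unit vector
            pointing from the array toward the source.
    """
    (x0, y0), (x1, y1), (x2, y2) = mics
    a11, a12 = x1 - x0, y1 - y0
    a21, a22 = x2 - x0, y2 - y0
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        # Collinear microphones give a singular system: the direction
        # cannot be resolved uniquely.
        raise ValueError("microphones are collinear; direction is ambiguous")
    b1 = -SPEED_OF_SOUND * delays[0]
    b2 = -SPEED_OF_SOUND * delays[1]
    # Cramer's rule for the 2x2 linear system.
    ux = (b1 * a22 - a12 * b2) / det
    uy = (a11 * b2 - a21 * b1) / det
    return math.atan2(uy, ux)


def simulate_delays(mics, angle):
    """Generate ideal, noise-free delays for a source at the given bearing."""
    ux, uy = math.cos(angle), math.sin(angle)
    proj = [x * ux + y * uy for x, y in mics]
    return [-(proj[i] - proj[0]) / SPEED_OF_SOUND for i in (1, 2)]
```

With collinear microphones the determinant is zero and the function raises an error, which mirrors the requirement that the at least three microphones are not arranged in a straight line.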
Step 120, acquiring voice signals in a directional acquisition mode according to the position of the target sound source to obtain a voice file.
After the position of the target sound source is determined, the terminal can control the microphone array to enter a directional acquisition mode and directionally acquire the voice signals in the direction of the target sound source. When the sound intensity of the acquired voice signals falls below a second threshold, or stays below the second threshold for longer than a second duration, the terminal exits the directional acquisition mode and saves the acquired voice information as a voice file. The terminal then returns to the normal acquisition mode and monitors whether a voice signal meeting the first condition appears again in the current environment. At this time, if a voice signal that does not meet the first condition is collected, that voice signal is masked and saved in the voice file as background noise. The second threshold may be equal to or less than the first threshold.
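The switching between the normal acquisition mode and the directional acquisition mode can be pictured as a small two-state machine. The sketch below is an assumption-laden illustration: it implements only the simpler exit rule (intensity below the second threshold), works on hypothetical per-frame intensity values, and uses mode names chosen for this example.

```python
def track_modes(levels, first_threshold, second_threshold):
    """Return the acquisition mode ('normal' or 'directional') for each frame.

    Enter directional mode when a level exceeds first_threshold; leave it
    when a level falls below second_threshold (second_threshold is assumed
    to be less than or equal to first_threshold).
    """
    mode = "normal"
    modes = []
    for level in levels:
        if mode == "normal" and level > first_threshold:
            mode = "directional"
        elif mode == "directional" and level < second_threshold:
            mode = "normal"
        modes.append(mode)
    return modes
```

Choosing a second threshold lower than the first gives hysteresis, so a speaker whose level dips briefly between the two thresholds does not cause the terminal to leave the directional acquisition mode.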
Further, the directional acquisition mode in step 120 may be various, and may include:
setting an acquisition area by taking the target sound source as a center;
and acquiring voice signals from the acquisition area to obtain sound fragments.
Furthermore, the acquisition area may be determined in various ways. It may be a circular area centered on the target sound source with a given radius, or a sector area whose vertex is the microphone array and whose center line is the line connecting the microphone array to the target sound source, where the opening angle of the sector area may be set to a first angle X. As shown in fig. 3, voice information is collected with the sector area formed around the target sound source 200 as the acquisition area. For simplicity, the following embodiments take the sector area as an example.
In one embodiment, to ensure intelligibility of the speech signal acquired within the acquisition region, the method further comprises:
shielding or suppressing the voice signals outside the acquisition area. As shown in fig. 3, the voice signals generated by other sound sources 201 outside the sector area are shielded or suppressed.
The method for shielding or suppressing the voice signals can be set according to actual needs. For example, different gains can be formed in different directions by adjusting the parameters of each microphone in the microphone array, increasing the gain in the direction of the acquisition area and decreasing the gain outside the acquisition area; the voice signals outside the acquisition area can also be filtered out or suppressed by software algorithms.
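A sector-shaped acquisition area with opening angle X can be tested geometrically. The following sketch is illustrative only: it works in a two-dimensional plane, and the function names and the suppression gain of 0.1 are assumptions rather than values from the application.

```python
import math


def in_sector(array_pos, target_pos, point, opening_angle_deg):
    """True if point lies inside the sector whose vertex is the array and
    whose center line runs from the array to the target sound source."""
    center = math.atan2(target_pos[1] - array_pos[1], target_pos[0] - array_pos[0])
    bearing = math.atan2(point[1] - array_pos[1], point[0] - array_pos[0])
    # Wrap the angular difference into [-pi, pi) so sectors that straddle
    # the +/-180 degree boundary are handled correctly.
    diff = (bearing - center + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= math.radians(opening_angle_deg) / 2


def directional_gain(array_pos, target_pos, point, opening_angle_deg, outside_gain=0.1):
    """Full gain inside the acquisition area, reduced gain outside it."""
    if in_sector(array_pos, target_pos, point, opening_angle_deg):
        return 1.0
    return outside_gain
```

A real microphone array would shape its gain smoothly with direction; the hard in-or-out gain here only illustrates the idea of raising the gain toward the acquisition area and lowering it elsewhere.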
Fig. 4 shows an example of the recording method according to the embodiment of the present application.
Starting recording;
monitoring whether the sound intensity of voice signals generated by different sound sources in the current environment exceeds a first threshold value A; sequentially judging whether the sound intensity of the voice signals of the sound sources including the sound source 1, the sound source 2 and the sound source 3 exceeds a first threshold value A or not;
if the sound intensity of the voice information of any sound source exceeds a first threshold value A, determining the sound source as a target sound source and positioning the sound source so as to carry out X-angle fan-shaped directional acquisition on the voice signal of the sound source and simultaneously shield or inhibit the voice signals generated by other sound sources;
if the first threshold value A is not exceeded, the voice signal of the next sound source continues to be monitored, and so on until the recording is finished.
After the recording is finished, the terminal can obtain the complete voice file of the recording.
According to the technical scheme provided by the embodiment of the invention, the position of the target sound source is determined by the microphone array; wherein the target sound source is a sound source generating a voice signal whose sound intensity satisfies a first condition, and the microphone array includes at least three microphones that are not arranged in a straight line; and collecting voice signals in a directional collection mode according to the position of the target sound source. According to the embodiment of the invention, the determined target sound source is positioned and directionally collected, so that the voice signal of the target sound source can be effectively separated from the noise environment and collected, and a clearer sound segment can be obtained.
Based on the above embodiment, further, as shown in fig. 5, during or after the voice signal is collected in step 120, the method further includes:
Step 130, determining a sound source corresponding to the voice signal according to the voiceprint characteristics of the voice signal.
When the voice signal is collected, the identity recognition of the sound source generating the voice signal can be started. The identity recognition mode can be various, and the embodiment of the present application only exemplifies a mode of matching based on voiceprint features.
As shown in fig. 6, the voiceprint features of the voice signal to be recognized are extracted through feature extraction and matched against the existing voiceprint features of the existing sound sources of the terminal. If the matching succeeds, the sound source generating the voice signal is determined to be an existing sound source, and the sound fragment containing the voice signal is extracted and recorded into the corresponding directory. If the matching fails, the sound source generating the voice signal is determined to be a new sound source; the identity of the new sound source is named according to a naming rule, the new sound source and its voiceprint features are stored so that they become an existing sound source and existing voiceprint features, and the sound fragment containing the voice signal is extracted and recorded into the corresponding directory.
Further, after the step 130, the method further includes:
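The match-or-register flow of fig. 6 can be sketched as below. The application does not state how voiceprint features are represented or compared; the fixed-length feature vectors, the cosine-similarity score, the 0.8 threshold, and the letter-based naming rule are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_source(features, known_sources, threshold=0.8):
    """Return (source_name, is_new).

    known_sources maps a source name to its stored feature vector and is
    updated in place when a new sound source is registered.
    """
    best_name, best_score = None, threshold
    for name, reference in known_sources.items():
        score = cosine_similarity(features, reference)
        if score >= best_score:
            best_name, best_score = name, score
    if best_name is not None:
        return best_name, False
    # Hypothetical naming rule: sound source A, B, C, ... in order of arrival
    # (valid for up to 26 sources in this sketch).
    new_name = "sound source " + chr(ord("A") + len(known_sources))
    known_sources[new_name] = list(features)
    return new_name, True
```

Each call either matches an existing sound source or stores the new features, so later voice signals from the same speaker resolve to the same name.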
extracting a sound segment including the voice signal according to the sound intensity;
The sound clip may be extracted in various ways, and the embodiment of the present application gives only one example. The sound segments may be extracted based on the first condition and according to the change in the sound intensity of the voice signal: the start of each sound segment may be marked when the sound intensity satisfies the first condition, or has satisfied it for a first duration, and the end may be marked when the sound intensity no longer satisfies the first condition, or has not satisfied it for a second duration.
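A minimal sketch of intensity-based segment extraction follows. It assumes hypothetical per-frame intensity values and implements only the simplest marking rule (start when the level exceeds the threshold, end when it no longer does); the duration-based variants are omitted.

```python
def extract_segments(levels, first_threshold):
    """Return (start, end) frame index pairs, end exclusive, covering the
    runs of frames whose level exceeds first_threshold."""
    segments = []
    start = None
    for index, level in enumerate(levels):
        if level > first_threshold and start is None:
            start = index  # start flag: condition becomes satisfied
        elif level <= first_threshold and start is not None:
            segments.append((start, index))  # end flag: condition no longer satisfied
            start = None
    if start is not None:
        # The recording ended while a segment was still open.
        segments.append((start, len(levels)))
    return segments
```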
And recording the sound clips into a first directory corresponding to the sound source on a display interface, wherein the first directory is positioned under a second directory corresponding to the voice file.
The arrangement rule and the display rule of the directory in the display interface can be set according to actual needs, and the embodiment of the application only provides an example of one implementation manner. As shown in fig. 7, a two-level directory is established on a display interface. A second directory corresponding to the voice file is used as a first-level directory, and a directory name of the second directory may be an identifier of the voice file. A first directory corresponding to each sound source is established under the second directory as a second-level directory, and a directory name of the first directory may be an identifier of the corresponding sound source. The sound segments corresponding to each sound source, such as A-1, A-2 and A-3, are recorded under the respective first directory, and the name of each sound segment may include the identifier and a serial number of the corresponding sound source.
In the process of identifying the identity of a sound clip, if an existing sound source corresponding to the sound clip is matched, recording the sound clip into a first directory corresponding to the existing sound source; and if the existing sound source corresponding to the sound clip is not matched, a first directory corresponding to the new sound source is newly built in a second directory, and the sound clip is recorded in the newly built first directory.
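The two-level directory of fig. 7 can be modeled as a nested mapping. The sketch below is hypothetical: the dictionary layout and the function name are assumptions, and only the clip naming pattern (sound source identifier plus serial number, such as A-1) follows the text.

```python
def record_clip(catalog, voice_file, source_id):
    """Record one sound clip under voice_file -> source_id and return its name.

    catalog maps a voice file identifier (second directory) to a mapping of
    sound source identifiers (first directories) to lists of clip names.
    Missing directories are created on demand, as for a new sound source.
    """
    first_directories = catalog.setdefault(voice_file, {})
    clips = first_directories.setdefault(source_id, [])
    name = f"{source_id}-{len(clips) + 1}"
    clips.append(name)
    return name
```

For example, recording two clips for sound source A and one for sound source B under the same voice file yields the names A-1, A-2 and B-1.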
Further, the method further comprises:
receiving a first input to a display interface; the first input may be generated by a touch operation, for example, a click operation, a long-press operation, a slide operation, or the like, and may also be generated by a voice operation or a gesture operation, which is not specifically limited herein.
In response to the first input, playing the sound clip or voice file.
The user selects, through a first input to the display interface, to play the voice file or a sound clip displayed on the display interface. For example, as shown in fig. 7, the user may play or pause the voice file containing each sound clip by performing a first operation on the second directory corresponding to the voice file; the user can also expand or collapse the list of first directories under the second directory by performing a second operation on the second directory; the user can also expand or collapse the list of sound clips in a first directory by performing a third operation on the expanded first directory; the user can also play or pause a sound clip by performing a fourth operation on the sound clip.
According to the technical scheme provided by the embodiment of the invention, the embodiment of the invention determines the sound source generating the voice signal according to the voiceprint characteristics of the voice signal. Through the embodiment of the invention, the identity of the sound source of the voice signal is identified, the corresponding sound segment is extracted and recorded into the corresponding directory, and therefore, the sound segment can be efficiently managed, reasonably displayed and played.
Based on the above embodiment, further, if the terminal is a main terminal of a terminal group, where the terminal group includes at least one main terminal and N auxiliary terminals, where N is greater than or equal to 1, the method further includes:
and receiving and summarizing the voice signals collected by the auxiliary terminal to obtain a voice file.
As shown in fig. 8, a terminal group consisting of a plurality of terminals may be formed according to actual needs; for example, when a multi-terminal conference needs to be held, the terminals entering the conference are formed into a terminal group. In one embodiment, at least one terminal is set as the main terminal and the other terminals are set as auxiliary terminals, and the main terminal records the whole process of the multi-terminal conference.
Each terminal can determine and position a target sound source of the surrounding environment according to the manner of the embodiment, then collects voice signals in a directional collection mode according to the position of the target sound source, sends the voice signals to the main terminal, and the main terminal collects the voice signals according to the time sequence to form a total voice file.
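The summarizing step on the main terminal amounts to merging per-terminal streams in time order. The sketch below assumes, for illustration only, that each terminal delivers a list of (timestamp, terminal identifier, payload) tuples already sorted by timestamp, and that the terminals share a common clock; neither assumption is stated in the application.

```python
import heapq


def merge_by_time(streams):
    """Merge per-terminal lists of (timestamp, terminal_id, payload) tuples,
    each sorted by timestamp, into one list ordered by timestamp."""
    return list(heapq.merge(*streams, key=lambda item: item[0]))
```

For instance, merging one terminal's entries at 0.0 s and 2.0 s with another terminal's entries at 1.0 s and 3.0 s interleaves them into a single time-ordered sequence that can be saved as the total voice file.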
Further, in the process of receiving the voice signal collected by the auxiliary terminal, the method further includes:
a sound source corresponding to the speech signal is determined.
It should be understood that the process of identifying the identity of the sound source corresponding to each voice signal can be completed by each terminal, and the identification result is then sent to the main terminal along with the voice signal; the identification process may also be performed by the main terminal.
Extracting a sound segment including the voice signal from the voice file.
The sound segments may be extracted based on the first condition and according to the change in the sound intensity of the voice signal: the start of each sound segment may be marked when the sound intensity satisfies the first condition, or has satisfied it for a first duration, and the end may be marked when the sound intensity no longer satisfies the first condition, or has not satisfied it for a second duration.
Recording the sound fragments into a first catalogue of corresponding sound sources on a display interface; the first directory is located under a third directory corresponding to the terminal, and the third directory is located under a second directory corresponding to the voice file.
And the main terminal records the sound fragment into a corresponding directory of a display interface according to the sound source corresponding to the sound fragment in the identification result of the sound fragment and the terminal for collecting the sound fragment.
The arrangement rule and the display rule of the directory in the display interface can be set according to actual needs, and the embodiment of the application only gives an example. As shown in fig. 9, a two-level directory is established on a display interface. A second directory corresponding to the voice file is used as a first-level directory, and a directory name of the second directory may be an identifier of the voice file. A third directory corresponding to each terminal is established under the second directory as a second-level directory, and a directory name of the third directory may be an identifier of the terminal, such as terminal 1, terminal 2 and terminal 3. The corresponding sound clips are recorded in the third directory, and the name of each sound clip may include the identifier of the terminal and the identity and serial number of the corresponding sound source, such as 1-A-1, 1-A-2 and 1-B-1.
In another embodiment, a three-level directory may be established on a display interface. A second directory corresponding to the voice file is used as a first-level directory, and a directory name of the second directory may be an identifier of the voice file. A third directory corresponding to each terminal is established under the second directory as a second-level directory, and a directory name of the third directory may be an identifier of the terminal, such as terminal 1, terminal 2 and terminal 3. A first directory corresponding to each sound source is established under the third directory as a third-level directory, and a directory name of the first directory may be the identity of the corresponding sound source, such as sound source A, sound source B and sound source C. The sound segments corresponding to each sound source, such as 1-A-1, 1-A-2 and 1-B-1, are recorded under the respective first directory.
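The three-level layout (voice file, then terminal, then sound source) can be sketched as a nested mapping in the same spirit. The data layout and the function name below are assumptions; only the clip naming pattern of terminal identifier, sound source identity and serial number (such as 1-A-1) follows the text.

```python
def record_clip_for_terminal(catalog, voice_file, terminal_id, source_id):
    """Record a clip under voice_file -> terminal_id -> source_id and return
    a name of the form '<terminal>-<source>-<serial>'."""
    third_directories = catalog.setdefault(voice_file, {})
    first_directories = third_directories.setdefault(terminal_id, {})
    clips = first_directories.setdefault(source_id, [])
    # The serial number counts clips of this sound source on this terminal.
    name = f"{terminal_id}-{source_id}-{len(clips) + 1}"
    clips.append(name)
    return name
```

Swapping the order of the two inner keys would give the reversed hierarchy in which the sound source directory sits above the terminal directory.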
In another embodiment, the hierarchical relationship between the first directory and the third directory in the above three-level directory may be reversed: the first directory is used as the second-level directory, the third directory is established under that second-level directory as the third-level directory, and the corresponding sound clips are recorded in the third directory.
For simplicity, the following embodiments take the two-level directory shown in fig. 9 as an example.
In one embodiment, the main terminal extracts and identifies each sound clip in the total voice file. While identifying a sound clip, it determines the identity of the corresponding sound source through the voiceprint feature matching process, records the sound clip into the corresponding third directory, and names the sound clip.
Further, the method further comprises:
receiving a first input to a display interface; the first input may be generated by a touch operation, for example, a click operation, a long-press operation, a slide operation, or the like, and may also be generated by a voice operation or a gesture operation, which is not specifically limited herein.
In response to the first input, playing the sound clip or voice file.
The user selects, through a first input to the display interface, to play the voice file or a sound clip displayed on the display interface. For example, as shown in fig. 9, the user may play or pause the voice file containing each sound clip by performing a first operation on the second directory corresponding to the voice file; the user can also expand or collapse the list of third directories under the second directory by performing a second operation on the second directory; the user can also expand or collapse the list of sound clips in a third directory by performing a third operation on the expanded third directory; the user can also play or pause a sound clip by performing a fourth operation on the sound clip.
As can be seen from the above technical solutions provided in the embodiments of the present invention, when the terminal is the main terminal of a terminal group that includes at least one main terminal and N auxiliary terminals, where N is greater than or equal to 1, the main terminal receives the sound segments collected by the auxiliary terminals and summarizes and stores them as a voice file. The sound source of each sound clip is identified and the clip is recorded in the corresponding directory, so that sound clips can be managed efficiently and displayed and played in a reasonable manner.
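The directory operations described for fig. 9 can be pictured with a small tree model. The following is only a minimal sketch: the `Node` class, its field names, and the `visible_rows` helper are hypothetical and do not come from the application, which does not specify how the display interface is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One row of the display interface: a directory or a sound clip (hypothetical model)."""
    name: str
    children: list = field(default_factory=list)
    expanded: bool = False
    playing: bool = False

    def toggle_expand(self):
        # Second/third operation: expand or collapse the list under this directory.
        self.expanded = not self.expanded
        return self.expanded

    def toggle_play(self):
        # First/fourth operation: play or pause the voice file or sound clip.
        self.playing = not self.playing
        return self.playing


def visible_rows(node, depth=0):
    """Return (depth, name) pairs for every row currently shown on the interface."""
    rows = [(depth, node.name)]
    if node.expanded:
        for child in node.children:
            rows.extend(visible_rows(child, depth + 1))
    return rows
```

In this sketch a collapsed second directory shows a single row; expanding it reveals the third directories, and expanding one of those reveals its sound clips.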
The recording method provided by the embodiment of the application may be executed by a recording device. The recording device provided in the embodiment of the present application is described below, taking the case where the recording device executes the recording method as an example.
As shown in fig. 11, the recording apparatus includes: a sound source positioning module 111 and a voice collecting module 112; the sound source positioning module 111 is configured to determine a position of a target sound source through a microphone array; wherein the target sound source is a sound source generating a voice signal whose sound intensity satisfies a first condition, and the microphone array includes at least three microphones that are not arranged in a straight line; the voice collecting module 112 is configured to collect a voice signal in a directional collecting mode according to the position of the target sound source, so as to obtain a voice file.
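The application does not state which localization algorithm the sound source positioning module uses. One common approach is time difference of arrival (TDOA), and it illustrates why three microphones that are not on a straight line are useful: under a far-field (plane wave) assumption, two delay measurements give two linear equations for the two components of the arrival direction, and the system is solvable only when the microphone offsets are linearly independent. The sketch below makes those assumptions explicit; the function name, the speed-of-sound constant, and the 2D geometry are illustrative choices, not part of the application.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def direction_from_tdoa(mics, delays):
    """Estimate the bearing (radians) of a far-field source in the array plane.

    mics:   three (x, y) microphone positions in metres.
    delays: (t1 - t0, t2 - t0), arrival-time differences in seconds
            relative to microphone 0.
    """
    (x0, y0), (x1, y1), (x2, y2) = mics
    # Rows of the 2x2 system are the offsets of mics 1 and 2 from mic 0.
    a11, a12 = x1 - x0, y1 - y0
    a21, a22 = x2 - x0, y2 - y0
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("microphones are collinear; bearing is ambiguous")
    # A microphone closer to the source hears it earlier: (p_i - p_0) . u = -c * dt_i
    b1 = -SPEED_OF_SOUND * delays[0]
    b2 = -SPEED_OF_SOUND * delays[1]
    # Cramer's rule for the direction vector u = (ux, uy).
    ux = (b1 * a22 - a12 * b2) / det
    uy = (a11 * b2 - a21 * b1) / det
    return math.atan2(uy, ux)
```

With collinear microphones the determinant is zero and the function refuses to answer, which mirrors the requirement that the microphones not be arranged in a straight line.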
Further, the first condition includes:
the sound intensity of the speech signal exceeds a first threshold.
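As an illustration of the first condition, the sketch below measures the sound intensity of a block of samples as an RMS level in decibels relative to full scale and compares it with a threshold. The application does not define how intensity is measured or what the first threshold is, so the dBFS measure, the function names, and the default of -30 dB are assumptions.

```python
import math


def rms_dbfs(samples):
    """Level of float samples in [-1, 1], in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def satisfies_first_condition(samples, first_threshold_db=-30.0):
    """True when the measured intensity exceeds the (assumed) first threshold."""
    return rms_dbfs(samples) > first_threshold_db
```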
Further, the voice acquisition module is configured to:
setting an acquisition area by taking the target sound source as a center;
and collecting voice signals from the collection area.
Furthermore, the collecting area is a sector area which takes the microphone array as a vertex and takes a connecting line from the microphone array to the target sound source as a central line.
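The sector-shaped collection area can be expressed as a simple angular test: a point belongs to the sector when its bearing from the microphone array differs from the bearing of the target sound source by no more than half the sector's opening angle. The sketch below assumes 2D coordinates and a hypothetical half-angle of 15 degrees; the application does not give an opening angle.

```python
import math


def bearing(origin, point):
    """Angle (radians) of the line from origin to point."""
    return math.atan2(point[1] - origin[1], point[0] - origin[0])


def in_sector(array_pos, target_pos, point, half_angle_deg=15.0):
    """True when point lies in the sector whose vertex is the microphone array
    and whose centre line runs from the array to the target sound source."""
    center = bearing(array_pos, target_pos)
    angle = bearing(array_pos, point)
    # Wrap the difference into [-pi, pi) so sectors straddling +/-180 degrees work.
    diff = (angle - center + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= math.radians(half_angle_deg)
```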
Further, the voice acquisition module is further configured to:
and shielding or suppressing the voice signal outside the acquisition area.
According to the technical scheme provided by the embodiment of the invention, the position of the target sound source is determined through the microphone array; wherein the target sound source is a sound source generating a voice signal whose sound intensity satisfies a first condition, and the microphone array includes at least three microphones that are not arranged in a straight line; and collecting voice signals in a directional collection mode according to the position of the target sound source. According to the embodiment of the invention, the determined target sound source is positioned and directionally collected, so that the voice signal of the target sound source can be effectively separated from the noise environment and collected, and a clearer sound segment can be obtained.
Based on the foregoing embodiment, further, in the process of acquiring a voice signal in the directional acquisition mode, the voice acquisition module is further configured to:
and determining a sound source corresponding to the voice signal according to the voiceprint characteristics of the voice signal.
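Voiceprint matching is often done by comparing a feature vector extracted from the voice signal with enrolled reference vectors. The application does not specify the features or the matching rule, so the sketch below assumes that feature vectors are already available and uses cosine similarity with a hypothetical acceptance threshold of 0.8.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def identify_source(feature, enrolled, min_similarity=0.8):
    """Return the name of the best-matching enrolled sound source, or None
    when no enrolled voiceprint reaches the (assumed) similarity threshold."""
    best_name, best_score = None, min_similarity
    for name, reference in enrolled.items():
        score = cosine(feature, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

Returning `None` for an unmatched voice leaves room for the caller to create a new, unnamed sound source entry.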
Further, after determining the sound source corresponding to the voice signal, the voice acquisition module is further configured to:
extracting a sound segment including the voice signal according to the sound intensity;
and recording the sound clips into a first directory corresponding to the sound source on a display interface, wherein the first directory is positioned under a second directory corresponding to the voice file.
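The two steps above, extracting sound segments according to sound intensity and recording them under the sound source's directory, can be sketched as follows. The per-frame levels, the threshold, the minimum segment length, and the nested-dictionary catalog are all assumptions made for illustration.

```python
def extract_segments(frame_levels_db, threshold_db=-30.0, min_frames=2):
    """Return (start, end) frame ranges whose level stays above threshold_db.

    end is exclusive; runs shorter than min_frames are discarded.
    """
    segments, start = [], None
    for i, level in enumerate(frame_levels_db):
        if level > threshold_db:
            if start is None:
                start = i  # a new segment begins
        elif start is not None:
            if i - start >= min_frames:
                segments.append((start, i))
            start = None
    # Close a segment that runs to the end of the recording.
    if start is not None and len(frame_levels_db) - start >= min_frames:
        segments.append((start, len(frame_levels_db)))
    return segments


def file_segments(voice_file, source, segments, catalog=None):
    """Record segments in the first directory (sound source) located under
    the second directory (voice file) of a nested-dict catalog."""
    catalog = {} if catalog is None else catalog
    catalog.setdefault(voice_file, {}).setdefault(source, []).extend(segments)
    return catalog
```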
Further, after the sound clip is recorded in the first directory corresponding to the corresponding identification information, the voice collecting module is further configured to:
receiving a first input to a display interface;
in response to the first input, playing the sound clip or voice file.
As can be seen from the above technical solution, the embodiment of the invention determines the sound source that generates the voice signal according to the voiceprint features of the voice signal. The identity of the sound source corresponding to the voice signal is identified, and the corresponding sound segment is extracted and recorded in the corresponding directory, so that sound segments can be managed efficiently and displayed and played in a reasonable manner.
Based on the above embodiment, further, when the recording device is a main terminal of a terminal group, the terminal group includes at least one main terminal and N auxiliary terminals, where N is greater than or equal to 1, the voice acquisition module is further configured to:
and receiving and summarizing the voice signals collected by the auxiliary terminal to obtain a voice file.
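The application does not describe how the main terminal summarizes the signals it receives. One plausible reading is a merge of time-stamped chunks from all terminals onto a single timeline, sketched below. The data layout, the `summarize` name, and the assumption that the terminals' clocks are synchronized are all hypothetical.

```python
import heapq


def summarize(streams):
    """Merge per-terminal chunk lists into one time-ordered list.

    streams: {terminal_id: [(timestamp_s, chunk), ...]}, each list already
             sorted by timestamp (clocks assumed synchronized).
    Returns a list of (timestamp_s, terminal_id, chunk) tuples.
    """
    tagged = (
        [(t, terminal, chunk) for t, chunk in chunks]
        for terminal, chunks in streams.items()
    )
    return list(heapq.merge(*tagged, key=lambda item: item[0]))
```

Keeping the terminal identifier on each chunk preserves the information needed later to file clips under the directory of the terminal that collected them.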
Further, in the process of receiving the voice signal acquired by the auxiliary terminal, the voice acquisition module is further configured to:
determining a sound source corresponding to the voice signal;
extracting a sound segment including the voice signal from the voice file;
recording the sound clip to a third directory corresponding to a terminal for collecting the sound clip on a display interface; and the third directory is positioned under a second directory corresponding to the voice file.
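As a sketch of the resulting layout, the function below groups sound clips by the terminal that collected them (third directory) under the voice file (second directory) and renders the result as an indented listing. The record keys `terminal`, `source`, and `name` are hypothetical, as is the text rendering.

```python
from collections import defaultdict


def build_directory(voice_file, clips):
    """Render clips grouped by collecting terminal under the voice file.

    clips: iterable of dicts with 'terminal', 'source' and 'name' keys.
    Returns a list of text lines, indented by directory level.
    """
    by_terminal = defaultdict(list)
    for clip in clips:
        by_terminal[clip["terminal"]].append(f'{clip["name"]} ({clip["source"]})')
    lines = [voice_file]
    for terminal in sorted(by_terminal):
        lines.append(f"  {terminal}")
        lines.extend(f"    {entry}" for entry in by_terminal[terminal])
    return lines
```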
Further, after the sound clip is recorded in the first directory corresponding to the corresponding identification information, the voice collecting module is further configured to:
receiving a first input to a display interface;
in response to the first input, playing the sound clip or voice file.
As can be seen from the above technical solutions provided in the embodiments of the present invention, when the device is the main terminal of a terminal group that includes at least one main terminal and N auxiliary terminals, where N is greater than or equal to 1, the main terminal receives the sound segments collected by the auxiliary terminals and summarizes and stores them as a voice file. The sound source of each sound clip is identified and the clip is recorded in the corresponding directory, so that sound clips can be managed efficiently and displayed and played in a reasonable manner.
The sound recording apparatus in the embodiment of the present application may be an electronic device, or may be a component in the electronic device, such as an integrated circuit or a chip. The electronic device may be a terminal, or may be a device other than a terminal. The electronic device may be, for example, a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted electronic device, a Mobile Internet Device (MID), an Augmented Reality (AR)/Virtual Reality (VR) device, a robot, a wearable device, an Ultra-Mobile Personal Computer (UMPC), a netbook, or a Personal Digital Assistant (PDA), and may also be a server, a Network Attached Storage (NAS), a Personal Computer (PC), a Television (TV), a teller machine, a self-service machine, and the like; the embodiments of the present application are not particularly limited in this respect.
The recording device in the embodiment of the present application may be a device having an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, and the embodiments of the present application are not specifically limited in this respect.
The recording device provided in the embodiment of the present application can implement each process implemented by the method embodiments in fig. 1 to fig. 10, and is not described here again to avoid repetition.
Optionally, as shown in fig. 12, an electronic device 1200 is further provided in an embodiment of the present application, and includes a processor 1201 and a memory 1202, where the memory 1202 stores a program or an instruction that can be executed on the processor 1201, and when the program or the instruction is executed by the processor 1201, the steps of the foregoing recording method embodiment are implemented, and the same technical effect can be achieved, and no repeated description is provided here to avoid repetition.
It should be noted that the electronic device in the embodiment of the present application includes the mobile electronic device and the non-mobile electronic device described above.
Fig. 13 is a schematic hardware structure diagram of an electronic device implementing an embodiment of the present application.
The electronic device 1300 includes, but is not limited to: a radio frequency unit 1301, a network module 1302, an audio output unit 1303, an input unit 1304, a sensor 1305, a display unit 1306, a user input unit 1307, an interface unit 1308, a memory 1309, a processor 1310, and the like.
Those skilled in the art will appreciate that the electronic device 1300 may further comprise a power source (e.g., a battery) for supplying power to the various components, and the power source may be logically connected to the processor 1310 via a power management system, so that charging, discharging, and power consumption are managed via the power management system. The electronic device structure shown in fig. 13 does not constitute a limitation of the electronic device, and the electronic device may include more or fewer components than those shown, combine some components, or arrange the components differently, which is not described in detail here.
Wherein, the processor 1310 is configured to determine a position of the target sound source through the input unit 1304; wherein the target sound source is a sound source which generates a voice signal whose sound intensity satisfies a first condition; the input unit 1304 includes at least three microphones that are not arranged in a straight line;
and an input unit 1304, configured to acquire a voice signal in a directional acquisition mode according to the position of the target sound source, so as to obtain a voice file.
Further, the first condition includes:
the sound intensity of the speech signal exceeds a first threshold.
Further, the input unit 1304 is configured to:
setting an acquisition area by taking the target sound source as a center;
and collecting voice signals from the collection area.
Furthermore, the collecting area is a sector area which takes the microphone array as a vertex and takes a connecting line from the microphone array to the target sound source as a central line.
Further, the input unit 1304 is further configured to:
and shielding or suppressing the voice signal outside the acquisition area.
According to the embodiment of the invention, the determined target sound source is positioned and directionally collected, so that the voice signal of the target sound source can be effectively separated from the noise environment and collected, and a clearer sound segment can be obtained.
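Shielding or suppressing signals from outside the collection area can be pictured as a direction-dependent gain: unity inside the sector and a strong attenuation outside it. The sketch below is a simplification that assumes the angular offset of each signal from the sector's centre line is already known; a real device would more likely achieve this with beamforming across the microphone channels, which the application does not detail. The function names and default values are hypothetical.

```python
def sector_gain(offset_deg, half_angle_deg=15.0, suppression_db=-40.0):
    """Linear gain for a signal arriving offset_deg away from the centre line.

    Inside the sector the signal passes unchanged; outside it is attenuated
    by suppression_db (float('-inf') gives zero gain, i.e. full shielding).
    """
    if abs(offset_deg) <= half_angle_deg:
        return 1.0
    return 10.0 ** (suppression_db / 20.0)


def apply_gain(samples, gain):
    """Scale a list of float samples by a linear gain."""
    return [s * gain for s in samples]
```

Choosing a finite `suppression_db` corresponds to "suppressing", while an infinite negative value corresponds to "shielding" in the wording above.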
Based on the foregoing embodiment, further, in the process of acquiring the voice signal in the directional acquisition mode, the processor 1310 is further configured to: and determining a sound source corresponding to the voice signal according to the voiceprint characteristics of the voice signal.
Further, after determining the sound source corresponding to the voice signal, the processor 1310 is further configured to:
extracting a sound segment including the voice signal according to the sound intensity;
and recording the sound clips into a first directory corresponding to the sound source on a display interface, wherein the first directory is positioned under a second directory corresponding to the voice file.
Further, the user input unit 1307 is configured to receive a first input to a display interface;
the audio output unit 1303 is configured to play the sound clip or the voice file in response to the first input.
Through the embodiment of the invention, the identity of the sound source corresponding to the voice signal is identified, the corresponding sound segment is extracted and recorded into the corresponding directory, and therefore, the sound segment can be efficiently managed, reasonably displayed and played.
Further, when the sound recording device is a main terminal of a terminal group, the terminal group includes at least one main terminal and N auxiliary terminals, where N is greater than or equal to 1, the radio frequency unit 1301 is further configured to: receive and summarize the voice signals collected by the auxiliary terminals to obtain a voice file.
Further, in the process of receiving the voice signal collected by the auxiliary terminal, the processor 1310 is further configured to:
determining a sound source corresponding to the voice signal;
extracting a sound segment including the voice signal from the voice file;
recording the sound clip to a third directory corresponding to a terminal for collecting the sound clip on a display interface; and the third directory is positioned under a second directory corresponding to the voice file.
Through the embodiment of the invention, the sound source of the sound clip is identified and recorded in the corresponding directory, so that the sound clip can be efficiently managed, and reasonably displayed and played.
It should be understood that, in the embodiment of the present application, the input unit 1304 may include a Graphics Processing Unit (GPU) 13041 and a microphone 13042. The graphics processing unit 13041 processes image data of still pictures or videos obtained by an image capturing apparatus (such as a camera) in a video capturing mode or an image capturing mode. The display unit 1306 may include a display panel 13061, which may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like. The user input unit 1307 includes at least one of a touch panel 13071 and other input devices 13072. The touch panel 13071, also referred to as a touch screen, may include two parts: a touch detection device and a touch controller. Other input devices 13072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail herein.
The memory 1309 may be used to store software programs as well as various data. The memory 1309 may mainly include a first storage area storing programs or instructions and a second storage area storing data, wherein the first storage area may store an operating system, application programs or instructions required for at least one function (such as a sound playing function, an image playing function, etc.), and the like. Further, the memory 1309 may comprise volatile memory or non-volatile memory, or may comprise both. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), a Static RAM (SRAM), a Dynamic RAM (DRAM), a Synchronous DRAM (SDRAM), a Double Data Rate SDRAM (DDR SDRAM), an Enhanced SDRAM (ESDRAM), a Synchronous Link DRAM (SLDRAM), or a Direct Rambus RAM (DRRAM). The memory 1309 in the embodiments of the present application includes, but is not limited to, these and any other suitable types of memory.
Processor 1310 may include one or more processing units; optionally, the processor 1310 integrates an application processor, which mainly handles operations related to the operating system, user interface, application programs, etc., and a modem processor, which mainly handles wireless communication signals, such as a baseband processor. It will be appreciated that the modem processor described above may not be integrated into processor 1310.
The embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the program or the instruction implements each process of the recording method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a computer read only memory ROM, a random access memory RAM, a magnetic or optical disk, and the like.
The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement each process of the above-mentioned recording method embodiment, and can achieve the same technical effect, and for avoiding repetition, the description is omitted here.
It should be understood that the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip.
The embodiment of the present application provides a computer program product, where the program product is stored in a storage medium, and the program product is executed by at least one processor to implement the processes of the foregoing recording method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not described here again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order based on the functions involved; for example, the methods described may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a computer software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (13)

1. A method of recording a sound, comprising:
the terminal determines the position of a target sound source through a microphone array; wherein the target sound source is a sound source generating a voice signal whose sound intensity satisfies a first condition, and the microphone array includes at least three microphones that are not arranged in a straight line;
and acquiring voice signals in a directional acquisition mode according to the position of the target sound source to obtain a voice file.
2. The method of claim 1, wherein the collecting the voice signal in a directional collection mode according to the position of the target sound source comprises:
setting an acquisition area by taking the target sound source as a center;
and collecting voice signals from the collection area.
3. The method of claim 2, wherein the collecting area is a sector area which takes the microphone array as a vertex and takes a connecting line from the microphone array to the target sound source as a center line.
4. The method of claim 2, further comprising:
and shielding or suppressing the voice signal outside the acquisition area.
5. The method of claim 1, wherein during acquisition of the speech signal in the directional acquisition mode, the method further comprises:
and determining a sound source corresponding to the voice signal according to the voiceprint characteristics of the voice signal.
6. The method of claim 5, wherein after determining the sound source corresponding to the speech signal, the method further comprises:
extracting a sound segment including the voice signal according to the sound intensity;
and recording the sound clips into a first directory corresponding to the sound source on a display interface, wherein the first directory is positioned under a second directory corresponding to the voice file.
7. The method of claim 2, wherein in the case where the terminal is a main terminal of a terminal group, the terminal group comprising at least one main terminal and N auxiliary terminals, where N is greater than or equal to 1, the method further comprises:
and receiving and summarizing the voice signals collected by the auxiliary terminal to obtain a voice file.
8. The method according to claim 7, wherein in receiving the voice signal collected by the auxiliary terminal, the method further comprises:
determining a sound source corresponding to the voice signal;
extracting a sound segment including the voice signal from the voice file;
recording the sound clip to a third directory corresponding to a terminal for collecting the sound clip on a display interface; and the third directory is positioned under a second directory corresponding to the voice file.
9. The method of claim 6 or 8, wherein after recording the sound clip in the first directory corresponding to the corresponding identification information, the method further comprises:
receiving a first input to a display interface;
in response to the first input, playing the sound clip or voice file.
10. The method of claim 1, wherein the first condition comprises:
the sound intensity of the speech signal exceeds a first threshold.
11. A sound recording apparatus, comprising:
the sound source positioning module is used for determining the position of a target sound source through the microphone array; wherein the target sound source is a sound source generating a voice signal whose sound intensity satisfies a first condition, and the microphone array includes at least three microphones that are not arranged in a straight line;
and the voice acquisition module is used for acquiring voice signals in a directional acquisition mode according to the position of the target sound source to obtain a voice file.
12. An electronic device, comprising a processor and a memory, the memory storing a program or instructions executable on the processor, the program or instructions, when executed by the processor, implementing the steps of the sound recording method as claimed in any one of claims 1-10.
13. A readable storage medium, on which a program or instructions are stored, which program or instructions, when executed by a processor, carry out the steps of the sound recording method as claimed in any one of claims 1 to 10.
CN202210082304.1A 2022-01-24 2022-01-24 Recording method and device and electronic equipment Pending CN114390133A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210082304.1A CN114390133A (en) 2022-01-24 2022-01-24 Recording method and device and electronic equipment
PCT/CN2023/072987 WO2023138632A1 (en) 2022-01-24 2023-01-18 Voice recording method and apparatus, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210082304.1A CN114390133A (en) 2022-01-24 2022-01-24 Recording method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN114390133A true CN114390133A (en) 2022-04-22

Family

ID=81203197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210082304.1A Pending CN114390133A (en) 2022-01-24 2022-01-24 Recording method and device and electronic equipment

Country Status (2)

Country Link
CN (1) CN114390133A (en)
WO (1) WO2023138632A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023138632A1 (en) * 2022-01-24 2023-07-27 维沃移动通信有限公司 Voice recording method and apparatus, and electronic device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102562287B1 (en) * 2016-10-14 2023-08-02 삼성전자주식회사 Electronic device and audio signal processing method thereof
CN108538320B (en) * 2018-03-30 2020-09-11 Oppo广东移动通信有限公司 Recording control method and device, readable storage medium and terminal
CN111060874B (en) * 2019-12-10 2021-10-29 深圳市优必选科技股份有限公司 Sound source positioning method and device, storage medium and terminal equipment
CN113053368A (en) * 2021-03-09 2021-06-29 锐迪科微电子(上海)有限公司 Speech enhancement method, electronic device, and storage medium
CN114390133A (en) * 2022-01-24 2022-04-22 维沃移动通信有限公司 Recording method and device and electronic equipment

Also Published As

Publication number Publication date
WO2023138632A1 (en) 2023-07-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination