WO2023138632A1 - Voice recording method and apparatus, and electronic device - Google Patents

Voice recording method and apparatus, and electronic device

Info

Publication number
WO2023138632A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound source
voice
sound
directory
terminal
Prior art date
Application number
PCT/CN2023/072987
Other languages
French (fr)
Chinese (zh)
Inventor
高志稳
Original Assignee
维沃移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 维沃移动通信有限公司
Publication of WO2023138632A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/64 Automatic arrangements for answering calls; Automatic arrangements for recording messages for absent subscribers; Arrangements for recording conversations
    • H04M1/642 Automatic arrangements for answering calls; Automatic arrangements for recording messages for absent subscribers; Arrangements for recording conversations storing speech in digital form
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/02 Constructional features of telephone sets
    • H04M1/19 Arrangements of transmitters, receivers, or complete sets to prevent eavesdropping, to attenuate local noise or to prevent undesired transmission; Mouthpieces or receivers specially adapted therefor
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Definitions

  • The present application belongs to the field of electronic equipment, and in particular relates to a recording method, a recording device and an electronic device.
  • Smartphone recording suppresses environmental noise poorly. Especially in scenarios such as multi-person conferences, slightly louder surrounding sounds are also recorded, resulting in low sound recognizability and loud noise.
  • The purpose of the embodiments of the present application is to provide a recording method, a recording device and an electronic device that can solve the problem of poor suppression of environmental noise during recording, especially in scenarios such as multi-person conferences, where slightly louder surrounding sounds are also recorded, resulting in low sound recognizability and high noise.
  • the embodiment of the present application provides a recording method, the method comprising:
  • the terminal determines the position of a target sound source through a microphone array; wherein the target sound source is a sound source whose generated voice signal has a sound intensity that satisfies a first condition, and the microphone array includes at least three microphones that are not arranged in a straight line;
  • according to the position of the target sound source, the voice signal is collected in a directional collection mode to obtain a voice file.
  • an embodiment of the present application provides a recording device, the device comprising:
  • the sound source localization module is used to determine the position of the target sound source through the microphone array; wherein the target sound source is a sound source whose sound intensity of the generated speech signal meets the first condition, and the microphone array includes at least three microphones that are not arranged in a straight line;
  • the voice collection module is used to collect voice signals in a directional collection mode according to the position of the target sound source to obtain a voice file.
  • an embodiment of the present application provides an electronic device, the electronic device includes a processor and a memory, the memory stores programs or instructions that can run on the processor, and when the programs or instructions are executed by the processor, the steps of the method described in the first aspect are implemented.
  • an embodiment of the present application provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or instruction is executed by a processor, the steps of the method described in the first aspect are implemented.
  • the embodiment of the present application provides a chip, the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run programs or instructions to implement the method described in the first aspect.
  • an embodiment of the present application provides a computer program product, the program product is stored in a storage medium, and the program product is executed by at least one processor to implement the method described in the first aspect.
  • the terminal determines the position of the target sound source through the microphone array, wherein the target sound source is a sound source whose generated speech signal has a sound intensity that meets the first condition, and the microphone array includes at least three microphones not arranged in a straight line; according to the position of the target sound source, the speech signal is collected in a directional collection mode.
  • the determined target sound source is first located and then collected directionally, so that the speech signal of the target sound source can be effectively separated from the noisy environment, yielding clearer sound clips.
  • Fig. 1 is a schematic flow chart of a recording method provided by the embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a microphone array provided by an embodiment of the present application.
  • Fig. 3 is a schematic flow chart of another recording method provided by the embodiment of the present application.
  • Fig. 4 is a schematic flow chart of another recording method provided by the embodiment of the present application.
  • Fig. 5 is a schematic flow chart of another recording method provided by the embodiment of the present application.
  • Fig. 6 is a schematic flow chart of a sound clip identification method provided by an embodiment of the present application.
  • FIG. 7 is a schematic display diagram of a display interface provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a terminal group provided by an embodiment of the present application.
  • Fig. 9 is a schematic display diagram of another display interface provided by the embodiment of the present application.
  • Fig. 10 is a schematic display diagram of another display interface provided by the embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a recording device provided in an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 13 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
  • The terms "first", "second" and the like in the specification and claims of the present application are used to distinguish similar objects, not to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable under appropriate circumstances, so that the embodiments of the present application can be implemented in an order other than those illustrated or described here. Objects distinguished by "first", "second" and so on are generally of one type, and the number of objects is not limited; for example, there can be one or more first objects.
  • "And/or" in the specification and claims means at least one of the connected objects, and the character "/" generally means that the related objects are in an "or" relationship.
  • the embodiment of the present application provides a recording method.
  • the execution body of the method is a terminal, and a microphone array is arranged on the terminal.
  • the microphone array includes at least three microphones, and it is required that the at least three microphones are not arranged on the same straight line.
  • three microphones 101 that are not on the same straight line are respectively installed on the top, middle and bottom of the terminal 100 to form a microphone array.
  • the recording method may include the following steps.
  • Step 110 the terminal determines the position of the target sound source through the microphone array; wherein the target sound source is a sound source whose sound intensity of the generated voice signal meets the first condition, and the microphone array includes at least three microphones that are not arranged in a straight line.
  • the target sound source is a sound source corresponding to a speech signal satisfying a first condition among the speech signals collected by the microphone array.
  • the first condition can be set according to actual needs; it can be set based on the sound intensity of the voice signal, or based on the duration of the voice signal.
  • the first condition is that the sound intensity of the voice signal exceeds a first threshold.
  • the microphone array monitors the sound intensity of the voice signals collected in the environment, and when the sound intensity of a voice signal exceeds the first threshold, the sound source that generates that voice signal is determined as the target sound source.
  • the first condition may also be that the sound intensity exceeds the first threshold for longer than a first duration; or that, among the voice signals in the current environment whose sound intensity exceeds the first threshold, the voice signal has the highest sound intensity.
  • In the following, the case where the first condition is that the sound intensity of the voice signal exceeds a first threshold A is taken as an example for illustration.
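The threshold form of the first condition can be sketched as follows. This is a minimal illustration, not the application's implementation: it assumes sound intensity is measured as a per-frame RMS level in dB relative to full scale, and the names `frame_level_db`, `meets_first_condition` and `min_frames` are hypothetical.

```python
import math
from typing import Sequence


def frame_level_db(samples: Sequence[float]) -> float:
    """RMS level of one audio frame in dB relative to full scale (1.0).

    Assumption: samples are floats in [-1.0, 1.0]; silence maps to -inf.
    """
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def meets_first_condition(levels_db, threshold_db, min_frames=1):
    """Return True once `min_frames` consecutive frames exceed the threshold.

    min_frames=1 models "intensity exceeds the first threshold A";
    a larger value models "exceeds it for longer than a first duration".
    """
    run = 0
    for level in levels_db:
        run = run + 1 if level > threshold_db else 0
        if run >= min_frames:
            return True
    return False
```

With `min_frames=1` the check fires on the first loud frame; raising it trades latency for robustness against short noise bursts.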
  • the terminal can locate the target sound source by using the microphone array, so as to acquire the position of the target sound source.
  • the manner of using multiple microphones to locate the sound source can be varied, and the position in three-dimensional space of the target sound source that generates the speech signal can be calculated by using the differences in the sound intensity and phase of the speech signals collected by different microphones in the microphone array.
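One common way to use such differences is time-difference-of-arrival (TDOA) estimation. The sketch below is a simplified stand-in for the localization described above, not the application's method: it assumes a far-field source (plane wave), a planar array, a fixed speed of sound, and already-measured arrival delays, and it recovers only the bearing of the source in the array plane rather than a full three-dimensional position. The function name and parameters are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second; assumed room-temperature value


def bearing_from_delays(mics, delays):
    """Estimate the source bearing (degrees) from two arrival-time differences.

    mics:   three (x, y) microphone positions in metres.
    delays: [t1 - t0, t2 - t0], arrival times at mics 1 and 2 relative to mic 0.

    For a plane wave arriving from unit direction u, t_i - t_0 = -((p_i - p_0) . u) / c,
    which gives a 2x2 linear system in u. The system is solvable only when the
    three microphones are not collinear.
    """
    (x0, y0), (x1, y1), (x2, y2) = mics
    a11, a12 = x1 - x0, y1 - y0
    a21, a22 = x2 - x0, y2 - y0
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("microphones are collinear; bearing is ambiguous")
    b1 = -SPEED_OF_SOUND * delays[0]
    b2 = -SPEED_OF_SOUND * delays[1]
    # Cramer's rule for the direction vector (ux, uy).
    ux = (b1 * a22 - a12 * b2) / det
    uy = (a11 * b2 - a21 * b1) / det
    return math.degrees(math.atan2(uy, ux))
```

The determinant check shows why the array needs at least three microphones that are not on one straight line: with collinear microphones the system is singular and a source on either side of the line produces the same delays.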
  • Step 120 according to the position of the target sound source, collect the voice signal in a directional collection mode to obtain a voice file.
  • the terminal may control the microphone array to adopt a directional collection mode and collect voice signals in the direction in which the target sound source is located. When the sound intensity of the collected voice signals falls below a second threshold, or stays below the second threshold for longer than a second duration, the terminal exits the directional collection mode and saves the collected voice information as a voice file.
  • the second threshold may be equal to the first threshold, or smaller than the first threshold.
  • the directional collection mode in step 120 can take various forms, and can include:
  • within a collection area determined according to the position of the target sound source, the voice signal is collected to obtain a sound segment.
  • the acquisition area can be determined in a variety of ways. It can be within a radius range centered on the target sound source, or it can be a fan-shaped area with the microphone array as the apex and a line connecting the microphone array to the target sound source as the centerline.
  • the opening angle of the fan-shaped area can be set to a first angle X.
  • the fan-shaped area formed with the target sound source 200 as the center is used as the collection area to collect voice information.
  • the fan-shaped area is taken as an example for illustration.
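The fan-shaped collection area can be illustrated with a small geometric test. This sketch assumes two-dimensional coordinates, treats the first angle X as the full opening angle of the sector, and places the apex at the microphone array with the centerline through the target sound source; the function name and argument layout are hypothetical.

```python
import math


def in_collection_sector(array_pos, target_pos, point, opening_deg):
    """Return True if `point` lies inside the fan-shaped collection area.

    The sector has its apex at `array_pos`, its centerline along the line from
    the array to `target_pos`, and a full opening angle of `opening_deg`.
    Radial extent is not limited in this sketch.
    """
    cx, cy = target_pos[0] - array_pos[0], target_pos[1] - array_pos[1]
    px, py = point[0] - array_pos[0], point[1] - array_pos[1]
    centre = math.atan2(cy, cx)
    angle = math.atan2(py, px)
    # Wrap the angular difference into [-pi, pi) before comparing.
    diff = (angle - centre + math.pi) % (2.0 * math.pi) - math.pi
    return abs(diff) <= math.radians(opening_deg) / 2.0
```

For example, with the array at the origin, the target at (1, 0) and X = 60 degrees, a source at (1, 0.2) falls inside the sector while a source at (0, 1) falls outside it.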
  • the method further includes:
  • the voice signals outside the collection area are shielded or suppressed. As shown in FIG. 3 , the voice signals generated by other sound sources 201 outside the fan-shaped area are shielded or suppressed.
  • the manner of shielding or suppressing the voice signal can be set according to actual needs, for example, different gains in different directions can be formed by adjusting the parameters of each microphone in the microphone array.
  • the gain in the direction of the collection area is increased, while the gain outside the collection area is decreased; the voice signal outside the collection area can also be filtered or suppressed by software algorithm.
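Direction-dependent gain of this kind is often obtained with delay-and-sum beamforming, which is used here as an illustrative technique; the application does not state which method it uses. The sketch assumes integer per-channel delays (in samples) that are already known from the located target direction, and the function name is hypothetical.

```python
def delay_and_sum(channels, delays):
    """Align each microphone channel on the target direction and average.

    channels: list of equal-length sample lists, one per microphone.
    delays:   delays[i] is the number of samples by which channel i lags the
              reference for a wavefront arriving from the target direction.

    Signals from the target add coherently (gain close to 1), while signals
    from other directions stay misaligned and are attenuated by the averaging.
    """
    n = len(channels[0])
    out = []
    for t in range(n):
        total = 0.0
        for channel, delay in zip(channels, delays):
            idx = t + delay
            if 0 <= idx < n:  # samples shifted past the buffer count as zero
                total += channel[idx]
        out.append(total / len(channels))
    return out
```

Steering the array toward a different direction only requires a different set of delays, which is one way the gain inside the collection area can be raised while the gain outside it is lowered.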
  • FIG. 4 shows an example of the recording method of the embodiment of the present application.
  • the sound source is determined as the target sound source and located, so that its voice signal is collected directionally within a sector of opening angle X, while the voice signals generated by other sound sources are shielded or suppressed;
  • the terminal can obtain the complete voice file of this recording.
  • the embodiments of the present invention determine the position of the target sound source through the microphone array; wherein, the target sound source is a sound source whose sound intensity of the generated speech signal meets the first condition, and the microphone array includes at least three microphones that are not arranged in a straight line; according to the position of the target sound source, the speech signal is collected in a directional collection mode.
  • the determined target sound source is first located and then collected directionally, so that the speech signal of the target sound source can be effectively separated from the noisy environment, yielding clearer sound clips.
  • the method further includes:
  • Step 130 Determine a sound source corresponding to the voice signal according to the voiceprint feature of the voice signal.
  • While collecting the voice signal, identification of the sound source that generates the voice signal may also be enabled. There may be various ways of identifying the identity, and this embodiment of the present application only uses the way of matching based on voiceprint features as an example for illustration.
  • The voiceprint feature of the speech signal to be recognized is extracted and matched against the stored voiceprint features of the sound sources already known to the terminal. If the matching succeeds, the sound source generating the speech signal is determined to be an existing sound source, and the sound segment containing the speech signal is extracted and recorded in the corresponding directory. If the matching fails, the sound source is determined to be a new sound source: its identity is named according to the naming rules, the new sound source and its voiceprint feature are stored as a new existing sound source and existing voiceprint feature, and the sound segment containing the speech signal is then extracted and recorded in the corresponding directory.
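The match-or-enroll logic can be sketched as below. Several things here are assumptions rather than details from the application: voiceprint features are represented as fixed-length numeric vectors, matching uses cosine similarity against a hypothetical threshold of 0.8, and the naming rule simply assigns "sound source A", "sound source B", and so on (which only covers 26 sources).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_source(voiceprint, known, min_similarity=0.8):
    """Return the ID of the best-matching known source, enrolling a new one on failure.

    known: dict mapping source ID -> stored voiceprint vector; updated in place
           when the voiceprint matches no existing source.
    """
    best_id, best_score = None, min_similarity
    for source_id, reference in known.items():
        score = cosine_similarity(voiceprint, reference)
        if score >= best_score:
            best_id, best_score = source_id, score
    if best_id is None:
        # Hypothetical naming rule: next unused letter.
        best_id = "sound source " + chr(ord("A") + len(known))
        known[best_id] = list(voiceprint)
    return best_id
```

Calling `identify_source` repeatedly with an initially empty `known` dictionary enrolls each unfamiliar speaker once and returns the same ID for later segments from that speaker.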
  • the method further includes:
  • the sound segment can be extracted based on the first condition, and according to the variation of the sound intensity of the voice signal, the start mark of each sound segment can be when the sound intensity meets the first condition or meets the first condition for a first duration, and the end mark is when the sound intensity does not meet the first condition or does not meet the first condition for a second duration.
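The start and end marking can be sketched as a small state machine over per-frame intensity values. The sketch assumes the first condition is a simple threshold and expresses the first and second durations as frame counts; `extract_segments`, `start_frames` and `end_frames` are hypothetical names.

```python
def extract_segments(levels, threshold, start_frames=1, end_frames=1):
    """Split a sequence of per-frame intensities into sound segments.

    A segment starts once `start_frames` consecutive frames exceed the
    threshold and ends once `end_frames` consecutive frames do not.
    Returns a list of (start, end) frame indices with `end` exclusive.
    A segment still open at the end of the input runs to the last frame.
    """
    segments = []
    start = None
    above = below = 0
    for i, level in enumerate(levels):
        if level > threshold:
            above += 1
            below = 0
            if start is None and above >= start_frames:
                start = i - above + 1  # first frame of the loud run
        else:
            below += 1
            above = 0
            if start is not None and below >= end_frames:
                segments.append((start, i - below + 1))  # first quiet frame
                start = None
    if start is not None:
        segments.append((start, len(levels)))
    return segments
```

A larger `end_frames` keeps short pauses inside one segment, so a speaker who hesitates briefly is not split across several clips.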
  • the sound segment is recorded into a first directory corresponding to the sound source, and the first directory is located under a second directory corresponding to the voice file.
  • the arrangement rules and display rules of the directories in the display interface can be set according to actual needs, and this embodiment of the present application only gives an example for illustration.
  • the directory name of the second directory can be the identifier of the voice file; under the second directory, a first directory corresponding to each sound source is set up as a second-level directory.
  • the directory name of each first directory can be the identifier of the corresponding sound source (sound source A, sound source B and sound source C), and the sound segments corresponding to each sound source are recorded under its first directory, for example A-1, A-2 and A-3.
  • the title of each sound segment can include the corresponding sound source ID and a sequence number.
  • if the sound source is an existing sound source, the sound clip is recorded in the first directory corresponding to that existing sound source; if the sound source is a new sound source, a first directory corresponding to the new sound source is created, and the sound clip is recorded into the newly created first directory.
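The two-level layout can be modelled with nested dictionaries. This is only an in-memory sketch of the naming scheme described above; the function name, the tuple layout of `clips`, and the use of dictionaries instead of a real file system or UI list are assumptions.

```python
def file_clips(voice_file_id, clips):
    """Arrange sound clips under per-source first directories.

    clips: iterable of (source_id, clip) pairs in recording order, where
           source_id is a short label such as "A".
    Returns {voice_file_id: {"sound source A": {"A-1": clip, ...}, ...}},
    i.e. the second directory containing one first directory per source.
    """
    first_dirs = {}
    for source_id, clip in clips:
        # Creates the first directory on the first clip of a new sound source.
        directory = first_dirs.setdefault(f"sound source {source_id}", {})
        title = f"{source_id}-{len(directory) + 1}"
        directory[title] = clip
    return {voice_file_id: first_dirs}
```

Because `setdefault` creates a directory only when a source is seen for the first time, the same call covers both the existing-source and the new-source case.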
  • the method also includes:
  • the first input can be generated by a touch operation, for example, a click operation, a long press operation, or a slide operation, etc., or can also be generated by a voice operation or a gesture operation, which is not specifically limited here.
  • the sound clip or voice file is played.
  • the user chooses to play the voice file or sound segment displayed on the display interface. For example, as shown in Figure 7, the user can play or pause the audio file that contains each sound clip by performing a first operation on the second directory corresponding to the audio file; the user can also expand or collapse the list of first directories under the second directory by performing a second operation on the second directory; the user can also expand or collapse the list of sound clips under a first directory by performing a third operation on the expanded first directory.
  • the embodiments of the present invention determine the sound source that generates the speech signal according to the voiceprint feature of the speech signal. Through the embodiment of the present invention, the sound source of the voice signal is identified, and the corresponding sound clips are extracted, and then recorded in the corresponding directory, so that the sound clips can be efficiently managed, reasonably displayed and played.
  • the method further includes:
  • a terminal group composed of multiple terminals can be formed.
  • multiple terminals joining the conference can be formed into a terminal group.
  • at least one of the terminals can be set as the main terminal, and the other terminals can be set as auxiliary terminals, and the main terminal can record the whole process of the multi-terminal conference.
  • Each terminal can determine and locate the target sound source in the surrounding environment according to the method of the above-mentioned embodiment, and then, according to the position of the target sound source, collect voice signals in a directional collection mode, and then send the voice signals to the main terminal, and the main terminal summarizes them in chronological order to form a total voice file.
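The chronological aggregation on the main terminal can be sketched as a timestamp sort. The sketch assumes every clip arrives with a start timestamp on a clock shared by all terminals, which the application does not specify; the function name and data layout are hypothetical.

```python
def merge_streams(streams):
    """Combine clips from several terminals into one chronological list.

    streams: dict mapping terminal ID -> list of (start_time, clip) pairs.
    Returns a list of (start_time, terminal_id, clip) sorted by start time;
    clips with equal start times keep the order in which they were listed.
    """
    tagged = [
        (start_time, terminal_id, clip)
        for terminal_id, items in streams.items()
        for start_time, clip in items
    ]
    return sorted(tagged, key=lambda entry: entry[0])
```

Keeping the terminal ID next to each clip preserves the information needed later to file the clip under the directory of the terminal that collected it.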
  • the method further includes:
  • a sound source corresponding to the speech signal is determined.
  • the identification process of the sound source corresponding to each voice signal can be completed by each terminal itself, and then the identification result together with the voice signal is sent to the main terminal; the identification process can also be performed by the main terminal.
  • a sound segment including the voice signal is extracted from the voice file.
  • the sound segment can be extracted based on the first condition, and according to the variation of the sound intensity of the voice signal, the start mark of each sound segment can be when the sound intensity meets the first condition or meets the first condition for a first duration, and the end mark is when the sound intensity does not meet the first condition or does not meet the first condition for a second duration.
  • on the display interface, the sound segment is recorded into a first directory corresponding to the sound source; wherein the first directory is located under a third directory corresponding to the terminal, and the third directory is located under a second directory corresponding to the voice file.
  • the main terminal records the sound clip into a corresponding directory on the display interface according to the sound source corresponding to the sound clip in the identification result of the sound clip and the terminal that collects the sound clip.
  • a three-level directory can also be established on the display interface, and the second directory corresponding to the voice file is used as the first-level directory.
  • the directory name of the second directory can be the identification of the voice file.
  • a third directory corresponding to each terminal is established as the second-level directory.
  • the directory name of the third directory can be the identification of the terminal: Terminal 1, Terminal 2, and Terminal 3.
  • the first directory corresponding to each sound source is established under the third directory as the third-level directory, and the directory name of the first directory can be the identity of the corresponding sound source: sound source A, Sound source B and sound source C, and record sound clips corresponding to each sound source in each first directory: 1-A-1, 1-A-2, 1-B-1, etc.
  • the hierarchical relationship between the first directory and the third directory in the above-mentioned three-level directory can also be reversed: the first directory is used as the second-level directory, the third directory is established under the first directory as the third-level directory, and the sound clips of the corresponding sound source are recorded in the third directory.
  • the establishment of the secondary directory as shown in FIG. 9 is taken as an example for illustration.
  • the main terminal extracts and identifies each sound segment in the total voice file; in the process of identifying a sound segment, it determines the identity of the corresponding sound source through voiceprint feature matching, then records the sound segment in the corresponding third directory and names it.
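The three-level layout with titles such as 1-A-1 can be sketched in the same nested-dictionary style. As before, this is an in-memory illustration under assumed data shapes, with hypothetical names, and it follows the ordering in which terminal directories contain sound source directories.

```python
def file_group_clips(voice_file_id, clips):
    """Arrange clips as voice file -> terminal -> sound source -> clip.

    clips: iterable of (terminal_no, source_id, clip) in recording order,
           e.g. (1, "A", data).
    Clip titles follow the pattern "<terminal>-<source>-<sequence>".
    """
    third_dirs = {}
    for terminal_no, source_id, clip in clips:
        first_dirs = third_dirs.setdefault(f"Terminal {terminal_no}", {})
        segment_dir = first_dirs.setdefault(f"sound source {source_id}", {})
        title = f"{terminal_no}-{source_id}-{len(segment_dir) + 1}"
        segment_dir[title] = clip
    return {voice_file_id: third_dirs}
```

Swapping the two `setdefault` calls would give the reversed hierarchy, with sound source directories at the second level and terminal directories beneath them.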
  • the method also includes:
  • the first input can be generated by a touch operation, for example, a click operation, a long press operation, or a slide operation, etc., or can also be generated by a voice operation or a gesture operation, which is not specifically limited here.
  • the sound clip or voice file is played.
  • the user chooses to play the voice file or sound segment displayed on the display interface. For example, as shown in Figure 9, the user can play or pause the audio file containing each sound clip by performing a first operation on the second directory corresponding to the audio file; the user can also expand or collapse the list of third directories under the second directory by performing a second operation on the second directory; the user can also expand or collapse the list of sound clips under a third directory by performing a third operation on the expanded third directory.
  • the terminal is the main terminal of a terminal group, and the terminal group includes at least one main terminal and N auxiliary terminals, where N is greater than or equal to 1, the main terminal receives the sound clips collected by the auxiliary terminals, and saves them together as a voice file.
  • the sound source of the sound clip is identified and recorded in a corresponding directory, so that the sound clip can be efficiently managed, displayed and played reasonably.
  • the recording method provided in the embodiment of the present application may be executed by a recording device.
  • the recording method performed by the recording device is taken as an example to describe the recording device provided in the embodiment of the present application.
  • the recording device comprises a sound source localization module 111 and a voice collection module 112. The sound source localization module 111 is used to determine the position of the target sound source through the microphone array, wherein the target sound source is a sound source whose generated speech signal has a sound intensity that meets the first condition, and the microphone array comprises at least three microphones that are not arranged in a straight line. The voice collection module 112 is used to collect voice signals in a directional collection mode according to the position of the target sound source to obtain a voice file.
  • the first condition includes:
  • the sound intensity of the speech signal exceeds a first threshold.
  • voice collection module is used for:
  • voice signals are collected.
  • the collection area is a fan-shaped area with the microphone array as an apex and a line connecting the microphone array to the target sound source as a centerline.
  • voice collection module is also used for:
  • the position of the target sound source is determined through the microphone array in the embodiment of the present invention; wherein, the target sound source is a sound source whose sound intensity of the generated speech signal meets the first condition, and the microphone array includes at least three microphones that are not arranged in a straight line; according to the position of the target sound source, the speech signal is collected in a directional collection mode.
  • the determined target sound source is first located and then collected directionally, so that the speech signal of the target sound source can be effectively separated from the noisy environment, yielding clearer sound clips.
  • the voice collection module is also used for:
  • a sound source corresponding to the voice signal is determined according to the voiceprint feature of the voice signal.
  • the voice collection module is also used for:
  • the sound segment is recorded into a first directory corresponding to the sound source, and the first directory is located under a second directory corresponding to the voice file.
  • the voice collection module is also used for:
  • the sound clip or voice file is played.
  • the embodiments of the present invention determine the sound source that generates the speech signal according to the voiceprint feature of the speech signal. Through the embodiment of the present invention, the identity of the sound source corresponding to the voice signal is identified, and the corresponding sound segment is extracted, and then recorded in the corresponding directory, so that the sound segment can be efficiently managed, reasonably displayed and played.
  • the voice collection module is further configured to:
  • the voice collection module is also used for:
  • the voice collection module is also used for:
  • the sound clip or voice file is played.
  • the terminal group includes at least one main terminal and N auxiliary terminals, where N is greater than or equal to 1, and the main terminal receives the sound clips collected by the auxiliary terminals, and saves them together as a voice file.
  • the sound source of the sound clip is identified and recorded in a corresponding directory, so that the sound clip can be efficiently managed, displayed and played reasonably.
  • the recording device in the embodiment of the present application may be an electronic device, or may be a component in the electronic device, such as an integrated circuit or a chip.
  • the electronic device may be a terminal, or other devices other than the terminal.
  • the electronic device may be a mobile phone, a tablet computer, a notebook computer, a handheld computer, a vehicle electronic device, a mobile Internet device (Mobile Internet Device, MID), an augmented reality (Augmented Reality, AR)/virtual reality (Virtual Reality, VR) device, a robot, a wearable device, an ultra-mobile personal computer (Ultra-Mobile Personal Computer, UMPC), a netbook or a personal digital assistant (Personal Digital Assistant, PDA), etc.
  • It can also be a server, a network attached storage (Network Attached Storage, NAS), a personal computer (Personal Computer, PC), a television (television, TV), a teller machine or a self-service machine, etc., which are not specifically limited in this embodiment of the present application.
  • the recording device in this embodiment of the present application may be a device with an operating system.
  • the operating system may be an Android operating system, an iOS operating system, or other possible operating systems, which are not specifically limited in this embodiment of the present application.
  • the recording device provided in the embodiment of the present application can realize various processes realized by the method embodiments in FIG. 1 to FIG. 10 , and details are not repeated here to avoid repetition.
  • the embodiment of the present application further provides an electronic device 1200, including a processor 1201 and a memory 1202.
  • the memory 1202 stores programs or instructions that can run on the processor 1201.
  • when the programs or instructions are executed by the processor 1201, each step of the above-mentioned recording method embodiment is implemented, and the same technical effect can be achieved. To avoid repetition, details are not repeated here.
  • the electronic devices in the embodiments of the present application include the above-mentioned mobile electronic devices and non-mobile electronic devices.
  • FIG. 13 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
  • the electronic device 1300 includes, but is not limited to: a radio frequency unit 1301, a network module 1302, an audio output unit 1303, an input unit 1304, a sensor 1305, a display unit 1306, a user input unit 1307, an interface unit 1308, a memory 1309, and a processor 1310.
  • the electronic device 1300 can also include a power supply (such as a battery) for supplying power to various components, and the power supply can be logically connected to the processor 1310 through the power management system, so that functions such as management of charging, discharging, and power consumption management can be realized through the power management system.
  • the structure of the electronic device shown in FIG. 13 does not constitute a limitation to the electronic device.
  • the electronic device may include more or fewer components than shown in the figure, or combine some components, or arrange different components, which will not be repeated here.
  • the processor 1310 is configured to determine the position of the target sound source through the input unit 1304; wherein, the target sound source is a sound source whose sound intensity of the generated speech signal meets the first condition; the input unit 1304 includes at least three microphones that are not arranged in a straight line;
  • the input unit 1304 is configured to collect voice signals in a directional collection mode according to the position of the target sound source to obtain a voice file.
  • the first condition includes:
  • the sound intensity of the speech signal exceeds a first threshold.
  • the input unit 1304 is used for:
  • voice signals are collected.
  • the collection area is a fan-shaped area with the microphone array as an apex and a line connecting the microphone array to the target sound source as a centerline.
  • the input unit 1304 is also used for:
  • the determined target sound source is firstly positioned and directional collected, so that the speech signal of the target sound source can be effectively separated from the noise environment and collected to obtain clearer sound clips.
  • the processor 1310 is further configured to: determine the sound source corresponding to the voice signal according to the voiceprint feature of the voice signal.
  • the processor 1310 is further configured to:
  • the sound segment is recorded into a first directory corresponding to the sound source, and the first directory is located under a second directory corresponding to the voice file.
  • the user input unit 1307 is configured to receive a first input to the display interface
  • the audio output unit 1303 is configured to play the sound clip or voice file in response to the first input.
  • the identity of the sound source corresponding to the voice signal is identified, and the corresponding sound segment is extracted, and then recorded in the corresponding directory, so that the sound segment can be efficiently managed, reasonably displayed and played.
  • the radio frequency unit 1301 is further configured to: receive and summarize voice signals collected by the auxiliary terminals to obtain a voice file.
  • the processor 1310 is further configured to:
  • the sound source of the sound clip is identified and recorded in a corresponding directory, so that the sound clip can be efficiently managed, displayed and played reasonably.
  • the input unit 1304 may include a graphics processor (Graphics Processing Unit, GPU) 13041 and a microphone 13042, and the graphics processor 13041 processes image data of still pictures or videos obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode.
  • the display unit 1306 may include a display panel 13061, and the display panel 13061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like.
  • the user input unit 1307 includes at least one of a touch panel 13071 and other input devices 13072 .
  • the touch panel 13071 is also called a touch screen.
  • the touch panel 13071 may include two parts, a touch detection device and a touch controller.
  • Other input devices 13072 may include, but are not limited to, physical keyboards, function keys (such as volume control keys, switch keys, etc.), trackballs, mice, and joysticks, which will not be repeated here.
  • the memory 1309 can be used to store software programs as well as various data.
  • the memory 1309 may mainly include a first storage area for storing programs or instructions and a second storage area for storing data, wherein the first storage area may store an operating system and an application program or instructions required by at least one function (such as a sound playing function or an image playing function).
  • memory 1309 can include volatile memory or nonvolatile memory, or, memory 1309 can include both volatile and nonvolatile memory.
  • the non-volatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM) or a flash memory.
  • the volatile memory may be a random access memory (Random Access Memory, RAM), a static random access memory (Static RAM, SRAM), a dynamic random access memory (Dynamic RAM, DRAM), a synchronous dynamic random access memory (Synchronous DRAM, SDRAM), a double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), an enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), a synchlink dynamic random access memory (Synchlink DRAM, SLDRAM), or a direct Rambus random access memory (Direct Rambus RAM, DRRAM).
  • the memory 1309 in the embodiment of the present application includes but is not limited to these and any other suitable types of memory.
  • the processor 1310 may include one or more processing units; optionally, the processor 1310 integrates an application processor and a modem processor, wherein the application processor mainly processes operations related to the operating system, user interface, and application programs, and the modem processor mainly processes wireless communication signals, such as a baseband processor. It can be understood that the foregoing modem processor may not be integrated into the processor 1310 .
  • the embodiment of the present application also provides a readable storage medium.
  • the readable storage medium stores a program or an instruction.
  • when the program or instruction is executed by a processor, each process of the above-mentioned recording method embodiment is implemented, and the same technical effect can be achieved. To avoid repetition, details are not repeated here.
  • the readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
  • the embodiment of the present application further provides a chip, the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is used to run programs or instructions to implement the various processes of the above-mentioned recording method embodiment, and can achieve the same technical effect. To avoid repetition, details are not repeated here.
  • chips mentioned in the embodiments of the present application may also be called system-level chips, system chips, chip systems, or system-on-chip chips.
  • the embodiment of the present application provides a computer program product, the program product is stored in a storage medium, and the program product is executed by at least one processor to implement the various processes in the above-mentioned recording method embodiment, and can achieve the same technical effect. To avoid repetition, details are not repeated here.
  • the methods in the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course they can also be implemented by hardware, but in many cases the former is the better implementation.
  • the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a computer software product.
  • the computer software product is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk), and includes several instructions to make a terminal (which can be a mobile phone, computer, server, or network device, etc.) execute the method described in each embodiment of the application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The present application belongs to the field of electronic devices. Disclosed are a voice recording method and apparatus, and an electronic device. The method comprises: a terminal determining the position of a target sound source by means of a microphone array, wherein the target sound source is a sound source for which the sound intensity of a generated voice signal meets a first condition, and the microphone array comprises at least three microphones which are not arranged on one straight line; and according to the position of the target sound source, collecting voice signals in a directional collection mode, so as to obtain a voice file.

Description

Recording method, apparatus, and electronic device
Cross-reference
The present invention claims priority to Chinese patent application No. 202210082304.1, filed with the China Patent Office on January 24, 2022 and entitled "Recording method, apparatus, and electronic device", the entire content of which is incorporated herein by reference.
Technical field
The present application belongs to the field of electronic devices, and in particular relates to a recording method, an apparatus, and an electronic device.
Background
With the continuous progress of science and technology, the functions of smartphones are becoming increasingly rich and diverse, and people's requirements for recording technology and recording quality keep rising. Good recording technology gives users a good experience, and rich, vivid sound likewise gives listeners a more authentic listening experience.
Smartphone recording is weak at suppressing environmental noise. Especially in scenarios such as multi-person conferences, any slightly louder surrounding sound is recorded as well, resulting in low voice recognizability and heavy noise.
Summary
The purpose of the embodiments of the present application is to provide a recording method, an apparatus, and an electronic device that can solve the problem of weak suppression of environmental noise, in which, especially in scenarios such as multi-person conferences, any slightly louder surrounding sound is recorded as well, resulting in low voice recognizability and heavy noise.
In a first aspect, an embodiment of the present application provides a recording method, the method including:
a terminal determining the position of a target sound source through a microphone array, wherein the target sound source is a sound source for which the sound intensity of the generated voice signal meets a first condition, and the microphone array includes at least three microphones that are not arranged in a straight line; and
collecting voice signals in a directional collection mode according to the position of the target sound source, to obtain a voice file.
In a second aspect, an embodiment of the present application provides a recording apparatus, the apparatus including:
a sound source localization module, configured to determine the position of a target sound source through a microphone array, wherein the target sound source is a sound source for which the sound intensity of the generated voice signal meets a first condition, and the microphone array includes at least three microphones that are not arranged in a straight line; and
a voice collection module, configured to collect voice signals in a directional collection mode according to the position of the target sound source, to obtain a voice file.
In a third aspect, an embodiment of the present application provides an electronic device. The electronic device includes a processor and a memory, the memory stores programs or instructions that can run on the processor, and when the programs or instructions are executed by the processor, the steps of the method described in the first aspect are implemented.
In a fourth aspect, an embodiment of the present application provides a readable storage medium. A program or an instruction is stored on the readable storage medium, and when the program or instruction is executed by a processor, the steps of the method described in the first aspect are implemented.
In a fifth aspect, an embodiment of the present application provides a chip. The chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run programs or instructions to implement the method described in the first aspect.
In a sixth aspect, an embodiment of the present application provides a computer program product. The program product is stored in a storage medium and is executed by at least one processor to implement the method described in the first aspect.
In the embodiments of the present application, the terminal determines the position of the target sound source through the microphone array, wherein the target sound source is a sound source for which the sound intensity of the generated voice signal meets the first condition, and the microphone array includes at least three microphones that are not arranged in a straight line; and voice signals are collected in a directional collection mode according to the position of the target sound source. In the embodiments of the present invention, the determined target sound source is first located and then collected directionally, so that the voice signal of the target sound source can be effectively separated from the noisy environment and collected, yielding clearer sound segments.
Brief description of the drawings
FIG. 1 is a schematic flowchart of a recording method provided by an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a microphone array provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of another recording method provided by an embodiment of the present application;
FIG. 4 is a schematic flowchart of another recording method provided by an embodiment of the present application;
FIG. 5 is a schematic flowchart of another recording method provided by an embodiment of the present application;
FIG. 6 is a schematic flowchart of a method for identifying the identity of a sound segment provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of a display interface provided by an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a terminal group provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of another display interface provided by an embodiment of the present application;
FIG. 10 is a schematic diagram of another display interface provided by an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a recording apparatus provided by an embodiment of the present application;
FIG. 12 is a schematic structural diagram of an electronic device provided by an embodiment of the present application;
FIG. 13 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
Detailed description
The technical solutions in the embodiments of the present application are described clearly below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application fall within the protection scope of the present application.
The terms "first", "second", and the like in the specification and claims of the present application are used to distinguish between similar objects, not to describe a specific order or sequence. It should be understood that the data used in this way are interchangeable where appropriate, so that the embodiments of the present application can be implemented in orders other than those illustrated or described herein. Objects distinguished by "first", "second", and the like are usually of one type, and their number is not limited; for example, there may be one first object or multiple first objects. In addition, "and/or" in the specification and claims indicates at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.
The recording method provided by the embodiments of the present application is described in detail below through specific embodiments and their application scenarios, with reference to the accompanying drawings.
As shown in FIG. 1, an embodiment of the present application provides a recording method. The method is executed by a terminal on which a microphone array is arranged. The microphone array includes at least three microphones, and the at least three microphones must not be arranged on the same straight line. As shown in FIG. 2, three microphones 101 that are not on the same straight line are provided at the top, middle, and bottom of the terminal 100, respectively, to form a microphone array. The recording method may include the following steps.
Step 110: the terminal determines the position of a target sound source through the microphone array, wherein the target sound source is a sound source for which the sound intensity of the generated voice signal meets a first condition, and the microphone array includes at least three microphones that are not arranged in a straight line.
The target sound source is the sound source corresponding to a voice signal that meets the first condition, among the voice signals collected by the microphone array.
The first condition can be set according to actual needs. It may be set based on the sound intensity of the voice signal, or based on the duration of the voice signal.
In one implementation, the first condition is that the sound intensity of the voice signal exceeds a first threshold. The microphone array monitors the sound intensity of the voice signals collected in the environment, and when the sound intensity of a voice signal exceeds the first threshold, the sound source that generates that voice signal is determined as the target sound source.
The first condition may also be that the sound intensity exceeds the first threshold for longer than a first duration, or that the voice signal has the highest sound intensity among the voice signals in the current environment whose sound intensity exceeds the first threshold. For simplicity, the following embodiments all take, as an example, the case where the first condition is that the sound intensity of the voice signal exceeds a first threshold A.
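As a non-limiting illustration, the three variants of the first condition described above could be sketched as follows. The function names, the per-frame intensity values, and the units are assumptions introduced only for this sketch and are not defined in the present application.

```python
from typing import Dict, Optional, Sequence


def exceeds_threshold(intensity: float, threshold_a: float) -> bool:
    """Variant 1: the sound intensity exceeds the first threshold A."""
    return intensity > threshold_a


def exceeds_for_duration(frames: Sequence[float], threshold_a: float,
                         frame_s: float, first_duration_s: float) -> bool:
    """Variant 2: the intensity stays above A for longer than the first duration.

    `frames` holds one intensity value per frame of length `frame_s` seconds
    (a hypothetical framing chosen for this sketch).
    """
    run = 0.0
    for level in frames:
        # Accumulate time while above A; any frame at or below A resets the run.
        run = run + frame_s if level > threshold_a else 0.0
        if run > first_duration_s:
            return True
    return False


def loudest_above_threshold(levels: Dict[str, float],
                            threshold_a: float) -> Optional[str]:
    """Variant 3: among the sources above A, pick the one with the highest intensity."""
    candidates = {src: lvl for src, lvl in levels.items() if lvl > threshold_a}
    if not candidates:
        return None
    return max(candidates, key=lambda src: candidates[src])
```

Keeping the variants as separate predicates reflects the statement that the first condition can be set according to actual needs.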
The terminal can locate the target sound source through the microphone array to obtain the position of the target sound source. Multiple microphones can be used to locate a sound source in various ways. For example, the differences in factors such as sound intensity and phase between the voice signals collected by different microphones in the microphone array can be used to calculate the position, in three-dimensional space, of the target sound source that generates the voice signal.
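The present application does not specify a localization algorithm. As one possible sketch under simplifying assumptions, the direction of a distant source in a plane can be estimated from the arrival-time differences at three microphones. The far-field, two-dimensional model, the function name `direction_from_tdoa`, and the assumed speed of sound are all introduced for illustration only; the sketch estimates a bearing rather than a full three-dimensional position. It also shows why the microphones must not be collinear: the linear system below has no unique solution otherwise.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def direction_from_tdoa(mics, delays):
    """Estimate the bearing (degrees) of a far-field source in the plane.

    mics:   three (x, y) microphone positions in metres.
    delays: arrival times at microphones 1 and 2 minus the arrival time
            at microphone 0, in seconds.

    For a plane wave arriving from unit direction u (pointing from the array
    toward the source), (p_i - p_0) . u = -c * delay_i, which gives a 2x2
    linear system in u. It is solved here with Cramer's rule.
    """
    (x0, y0), (x1, y1), (x2, y2) = mics
    a11, a12 = x1 - x0, y1 - y0
    a21, a22 = x2 - x0, y2 - y0
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        # Collinear microphones cannot resolve a direction in the plane.
        raise ValueError("microphones must not lie on one straight line")
    b1 = -SPEED_OF_SOUND * delays[0]
    b2 = -SPEED_OF_SOUND * delays[1]
    ux = (b1 * a22 - a12 * b2) / det
    uy = (a11 * b2 - a21 * b1) / det
    return math.degrees(math.atan2(uy, ux))
```

In practice the delays would themselves have to be estimated from the collected signals, for example by cross-correlation, which this sketch does not cover.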
Step 120: collect voice signals in a directional collection mode according to the position of the target sound source, to obtain a voice file.
After the position of the target sound source is determined, the terminal may control the microphone array to use the directional collection mode and collect voice signals in the direction of the target sound source. When the sound intensity of the collected voice signal falls below a second threshold, or remains below the second threshold for longer than a second duration, the terminal exits the directional collection mode, and the collected voice information is saved as a voice file. The terminal then returns to the normal collection mode and monitors whether a voice signal meeting the first condition appears again in the current environment. During this time, any collected voice signal that does not meet the first condition is treated as background noise and saved into the voice file as an overlay. The second threshold may be equal to or smaller than the first threshold.
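The switching between the normal collection mode and the directional collection mode described above can be viewed as a small state machine with two thresholds. The following is a minimal sketch assuming per-frame intensity values; the function name and the `hold_frames` parameter (the second duration expressed in frames) are hypothetical.

```python
from typing import Iterable, List


def collection_modes(levels: Iterable[float], first_threshold: float,
                     second_threshold: float, hold_frames: int) -> List[str]:
    """Return the collection mode ('normal' or 'directional') for each frame.

    Directional mode is entered when a frame exceeds the first threshold and
    is left once the level has stayed below the second threshold for more
    than `hold_frames` consecutive frames. With hold_frames=0 the mode is
    left on the first frame below the second threshold.
    """
    mode, quiet, modes = "normal", 0, []
    for level in levels:
        if mode == "normal":
            if level > first_threshold:
                mode, quiet = "directional", 0
        else:
            # Count consecutive frames below the second threshold.
            quiet = quiet + 1 if level < second_threshold else 0
            if quiet > hold_frames:
                mode = "normal"
        modes.append(mode)
    return modes
```

Choosing a second threshold below the first threshold gives hysteresis, so that brief dips in a speaker's volume do not end directional collection.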
Further, the directional collection mode in step 120 may take various forms, and may include:
setting a collection area centered on the target sound source; and
collecting voice signals from within the collection area to obtain sound segments.
Further, the collection area may be determined in various ways. It may be the area within a radius centered on the target sound source, or it may be a fan-shaped area with the microphone array as the apex and the line from the microphone array to the target sound source as the centerline; the opening angle of the fan-shaped area may be set to a first angle X. As shown in FIG. 3, the fan-shaped area centered on the target sound source 200 serves as the collection area in which voice information is collected. For simplicity, the following embodiments all take the fan-shaped area as an example.
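The fan-shaped collection area can be illustrated with a simple geometric test. The sketch below works in two dimensions and assumes that the first angle X denotes the full opening angle of the sector, which the present application does not state explicitly; the function name and the coordinate convention are also assumptions.

```python
import math


def in_sector(array_pos, source_pos, point, opening_deg):
    """Return True if `point` lies inside the fan-shaped collection area.

    The sector has its apex at `array_pos`, its centerline points toward
    `source_pos`, and `opening_deg` is its full opening angle (assumed
    meaning of the first angle X). All positions are (x, y) tuples.
    """
    center = math.atan2(source_pos[1] - array_pos[1], source_pos[0] - array_pos[0])
    bearing = math.atan2(point[1] - array_pos[1], point[0] - array_pos[0])
    # Wrap the angular difference into [-pi, pi) before comparing.
    diff = (bearing - center + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= math.radians(opening_deg) / 2
```

For example, with the array at the origin, the target at (1, 0), and X = 60 degrees, a second source at (0, 1) falls outside the sector and would be a candidate for shielding or suppression.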
In one implementation, to ensure the clarity of the voice signals collected within the collection area, the method further includes:
shielding or suppressing voice signals outside the collection area; as shown in FIG. 3, the voice signals generated by other sound sources 201 outside the fan-shaped area are shielded or suppressed.
The manner of shielding or suppressing voice signals can be set according to actual needs. For example, the parameters of each microphone in the microphone array can be adjusted to form different gains in different directions, so that the gain in the direction of the collection area is raised while the gain outside the collection area is lowered. Voice signals outside the collection area can also be filtered out or suppressed by a software algorithm.
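The present application does not name a specific technique for forming direction-dependent gain. One commonly used approach, shown here only as an assumed example, is delay-and-sum beamforming: each channel is shifted by the delay expected for the target direction and the channels are averaged, so that sound from the target direction adds coherently while sound from other directions is attenuated. The integer-sample delays and the function name are simplifications made for this sketch.

```python
from typing import List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Align each channel by its steering delay (in samples) and average.

    channels: one sample sequence per microphone.
    delays:   non-negative number of samples by which the target's sound
              arrives later at each microphone than at the earliest one.
    """
    # Only the span covered by every shifted channel is produced.
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(n)
    ]
```

A pulse that reaches three microphones at samples 3, 5, and 4 is fully preserved when the steering delays are (0, 2, 1), and is reduced to one third of its amplitude when the array is steered with delays (0, 0, 0).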
FIG. 4 shows an example of the recording method of the embodiments of the present application.
Recording starts.
The terminal monitors whether the sound intensity of the voice signals generated by different sound sources in the current environment exceeds the first threshold A, determining in turn, for each sound source including sound source 1, sound source 2, and sound source 3, whether the sound intensity of its voice signal exceeds the first threshold A.
If the sound intensity of the voice information of any sound source exceeds the first threshold A, that sound source is determined as the target sound source and located, so that its voice signal is directionally collected within a sector of angle X while the voice signals generated by other sound sources are shielded or suppressed.
If the first threshold A is not exceeded, the terminal continues by monitoring the voice signal of the next sound source, and so on, until recording ends.
After recording ends, the terminal obtains the complete voice file of this recording.
It can be seen from the technical solutions provided by the above embodiments of the present invention that the embodiments of the present invention determine the position of the target sound source through the microphone array, wherein the target sound source is a sound source for which the sound intensity of the generated voice signal meets the first condition, and the microphone array includes at least three microphones that are not arranged in a straight line; and voice signals are collected in a directional collection mode according to the position of the target sound source. In the embodiments of the present invention, the determined target sound source is first located and then collected directionally, so that the voice signal of the target sound source can be effectively separated from the noisy environment and collected, yielding clearer sound segments.
Based on the above embodiments, further, as shown in FIG. 5, during the collection of voice signals in step 120 or after the collection is completed, the method further includes:
Step 130: determine the sound source corresponding to the voice signal according to the voiceprint feature of the voice signal.
While voice signals are being collected, identification of the sound source that generates the voice signal can also be enabled. Identification can be performed in various ways; the embodiments of the present application take matching based on voiceprint features only as an example.
As shown in FIG. 6, the voiceprint feature of the voice signal to be identified is obtained through feature extraction and matched against the existing voiceprint features of the existing sound sources on the terminal. If the matching succeeds, the sound source that generates the voice signal is determined to be an existing sound source, and the sound segment containing the voice signal is extracted and recorded in the corresponding directory. If the matching fails, the sound source that generates the voice signal is determined to be a new sound source; the identity of the new sound source is named according to a naming rule, and the new sound source and its voiceprint feature are stored in the library as a new existing sound source and existing voiceprint feature. The sound segment containing the voice signal is then extracted and recorded in the corresponding directory.
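The match-or-enroll flow of FIG. 6 can be sketched as follows, assuming that a voiceprint feature is already available as a numeric vector. The feature extraction itself, the cosine-similarity measure, the 0.8 threshold, and the `source-N` naming rule are all assumptions made for this illustration and are not specified in the present application.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(feature: Sequence[float], enrolled: Dict[str, List[float]],
             threshold: float = 0.8) -> Tuple[str, bool]:
    """Match a voiceprint feature against the enrolled sound sources.

    Returns (source_id, is_new). On a failed match the feature is stored in
    `enrolled` under a newly generated identity (hypothetical naming rule).
    """
    best_id, best_score = None, -1.0
    for source_id, reference in enrolled.items():
        score = cosine(feature, reference)
        if score > best_score:
            best_id, best_score = source_id, score
    if best_id is not None and best_score >= threshold:
        return best_id, False
    new_id = f"source-{len(enrolled) + 1}"
    enrolled[new_id] = list(feature)
    return new_id, True
```

The returned identity can then be used to select the directory in which the extracted sound segment is recorded.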
Further, after step 130, the method further includes:
extracting, according to the sound intensity, the sound segment that includes the voice signal;
wherein the sound segment may be extracted in various ways, and the embodiments of the present application give only one example. The sound segment can be extracted based on the first condition: according to how the sound intensity of the voice signal changes, the start marker of each sound segment may be the moment the sound intensity meets the first condition, or has met the first condition for a first duration, and the end marker may be the moment the sound intensity no longer meets the first condition, or has failed to meet the first condition for a second duration; and
recording, on the display interface, the sound segment into a first directory corresponding to the sound source, wherein the first directory is located under a second directory corresponding to the voice file.
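The intensity-based extraction of sound segments described above can be sketched as follows, taking the first condition to be "the intensity exceeds a threshold". The frame-based representation and the parameters `start_frames` and `end_frames`, which stand in for the first and second durations, are assumptions made for this sketch.

```python
from typing import List, Optional, Sequence, Tuple


def extract_segments(levels: Sequence[float], threshold: float,
                     start_frames: int = 1,
                     end_frames: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of sound segments, end exclusive.

    A segment starts once the level has exceeded `threshold` for
    `start_frames` consecutive frames and ends once it has stayed at or
    below `threshold` for `end_frames` consecutive frames.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    loud = quiet = 0
    for i, level in enumerate(levels):
        if level > threshold:
            loud, quiet = loud + 1, 0
            if start is None and loud >= start_frames:
                start = i - loud + 1  # first frame of the loud run
        else:
            quiet, loud = quiet + 1, 0
            if start is not None and quiet >= end_frames:
                segments.append((start, i - quiet + 1))  # first quiet frame
                start = None
    if start is not None:
        # A segment still open when the recording ends runs to the last frame.
        segments.append((start, len(levels)))
    return segments
```

With `end_frames` greater than 1, short pauses inside an utterance stay within a single sound segment instead of splitting it.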
在所述显示界面中目录的排列规则和显示规则可以根据实际的需要进行设定,本申请实施例仅给出了其中的一种实施方式进行举例说明。如图7所示,在显示界面建立两级目录,以所述语音文件对应的第二目录作为第一级目录,所述第二目录的目录名可以为所述语音文件的标识,在所述第二目录下建立与各声源对应的第一目录作为第二级目录,所述第一目录的目录名可以为对应声源的身份标识:声源A、声源B和声源C,并在各第一目录下记录与各声源对应的声音片段,A-1、A-2和A-3,各声音片段的名称可以包括对应声源的身份标识和序号。The arrangement rules and display rules of the directories in the display interface can be set according to actual needs, and this embodiment of the present application only gives an example for illustration. As shown in Figure 7, set up two-level directory in display interface, with the second directory corresponding to described audio file as first-level directory, the directory name of described second directory can be the mark of described audio file, under described second directory, set up the first directory corresponding to each sound source as second-level directory, the directory name of described first directory can be the identification mark of corresponding sound source: sound source A, sound source B and sound source C, and record the sound segment corresponding to each sound source under each first directory, A-1, A-2 and A-3, the title of each sound segment can include corresponding sound Source ID and sequence number.
在对声音片段进行身份识别的过程中,若匹配到与所述声音片段对应的已有声源,则将所述声音片段记录到与所述已有声源对应的第一目录中;若没有匹配到与所述声音片段对应的已有声源,则在第二目录下新建与新的声 源对应的第一目录,并将所述声音片段记录到新建的第一目录下。In the process of identifying the sound clip, if the existing sound source corresponding to the sound clip is matched, the sound clip is recorded in the first directory corresponding to the existing sound source; The first directory corresponding to the source, and record the sound clip into the newly created first directory.
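The match-or-create behaviour described above can be illustrated with a small in-memory index. The sketch assumes that voiceprint matching has already produced a sound source identifier; the class name `RecordingIndex` and its methods are hypothetical and serve only to show the directory logic and the "identifier plus sequence number" naming.

```python
from typing import Dict, List


class RecordingIndex:
    """Two-level index: one voice file (second directory) holding one
    first directory per sound source, each listing its sound segments."""

    def __init__(self, file_id: str) -> None:
        self.file_id = file_id                   # name of the second directory
        self.sources: Dict[str, List[str]] = {}  # first directories by source id

    def record(self, source_id: str) -> str:
        """Record one segment for `source_id` and return the segment name.

        If the source is already known, its existing first directory is
        reused; otherwise a new first directory is created.
        """
        segments = self.sources.setdefault(source_id, [])
        name = f"{source_id}-{len(segments) + 1}"
        segments.append(name)
        return name
```

For example, recording segments for sources A, B, and A again yields the names A-1, B-1, and A-2, matching the naming shown in FIG. 7.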
Further, the method also includes:
Receiving a first input on the display interface, where the first input may be generated by a touch operation, such as a tap, long-press, or slide operation, or by a voice operation or gesture operation, which is not specifically limited here.
In response to the first input, playing the sound segment or the voice file.
Through the first input on the display interface, the user selects a voice file or sound segment displayed on the display interface for playback. For example, as shown in FIG. 7, the user may perform a first operation on the second directory corresponding to the voice file to play or pause the voice file containing the sound segments; perform a second operation on the second directory to expand or collapse the list of first directories under the second directory; perform a third operation on an expanded first directory to expand or collapse the list of sound segments under that first directory; and perform a fourth operation on a sound segment to play or pause that sound segment.
As can be seen from the technical solutions provided by the above embodiments of the present invention, the sound source that generates the voice signal is determined according to the voiceprint features of the voice signal. In the embodiments of the present invention, the sound source of the voice signal is identified, the corresponding sound segments are extracted and then recorded into the corresponding directory, so that the sound segments can be managed efficiently and displayed and played in an organized manner.
Based on the above embodiment, further, if the terminal is a main terminal of a terminal group, the terminal group including at least one main terminal and N auxiliary terminals, where N is greater than or equal to 1, the method further includes:
Receiving and aggregating the voice signals collected by the auxiliary terminals to obtain a voice file.
As shown in FIG. 8, a terminal group consisting of multiple terminals may be formed according to actual needs; for example, when a multi-terminal conference is to be held, the terminals joining the conference form a terminal group. In one implementation, at least one of the terminals may be set as the main terminal and the other terminals as auxiliary terminals, and the main terminal records the entire multi-terminal conference.
Each terminal may determine and locate a target sound source in its surroundings in the manner of the above embodiments, collect voice signals in the directional collection mode according to the position of the target sound source, and send the voice signals to the main terminal. The main terminal aggregates them in chronological order to form an overall voice file.
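The chronological aggregation performed by the main terminal could look roughly like the sketch below. It assumes that each terminal delivers its voice signal as a list of timestamped chunks already sorted by start time, and that the terminals share a common clock; the `Chunk` type and `merge_chunks` function are hypothetical names introduced for illustration.

```python
import heapq
from typing import Iterable, List, NamedTuple


class Chunk(NamedTuple):
    start_time: float   # seconds on the assumed shared clock
    terminal_id: str    # terminal that collected the chunk
    samples: bytes      # raw audio payload (format not specified here)


def merge_chunks(per_terminal: Iterable[List[Chunk]]) -> List[Chunk]:
    """Merge per-terminal chunk lists into one list ordered by start time.

    Each input list must already be sorted by `start_time`.
    """
    return list(heapq.merge(*per_terminal, key=lambda c: c.start_time))
```

Keeping the terminal identifier on every chunk preserves the information needed later to file each sound segment under the directory of the terminal that collected it.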
Further, in the process of receiving the voice signals collected by the auxiliary terminals, the method also includes:
Determining the sound source corresponding to the voice signal.
It should be understood that the identification of the sound source corresponding to each voice signal may be completed by each terminal itself, which then sends the identification result to the main terminal together with the voice signal; alternatively, the identification may be performed by the main terminal.
Extracting, from the voice file, a sound segment that includes the voice signal.
The sound segments may be extracted based on the first condition: according to changes in the sound intensity of the voice signal, each sound segment starts when the sound intensity meets the first condition, or has met the first condition for a first duration, and ends when the sound intensity no longer meets the first condition, or has failed to meet it for a second duration.
On the display interface, recording the sound segment into a first directory corresponding to the sound source, where the first directory is located under a third directory corresponding to the terminal, and the third directory is located under the second directory corresponding to the voice file.
According to the sound source identified for the sound segment and the terminal that collected the sound segment, the main terminal records the sound segment into the corresponding directory on the display interface.
The arrangement and display rules of the directories on the display interface may be set according to actual needs; the embodiments of the present application give only one example. As shown in FIG. 9, a two-level directory structure is established on the display interface. The second directory corresponding to the voice file serves as the first-level directory, and its directory name may be the identifier of the voice file. Third directories corresponding to the respective terminals are established under the second directory as second-level directories, and the name of each third directory may be the identifier of the terminal: terminal 1, terminal 2, and terminal 3. The corresponding sound segments are recorded in each third directory, and the name of each sound segment may include the identifier of the terminal, the identity identifier of the corresponding sound source, and a sequence number: 1-A-1, 1-A-2, 1-B-1, and so on.
In another implementation, a three-level directory structure may be established on the display interface. The second directory corresponding to the voice file serves as the first-level directory, and its directory name may be the identifier of the voice file. Third directories corresponding to the respective terminals are established under the second directory as second-level directories, and the name of each third directory may be the identifier of the terminal: terminal 1, terminal 2, and terminal 3. First directories corresponding to the respective sound sources are established under each third directory as third-level directories, and the name of each first directory may be the identity identifier of the corresponding sound source: sound source A, sound source B, and sound source C. The sound segments corresponding to each sound source are recorded under the respective first directory: 1-A-1, 1-A-2, 1-B-1, and so on.
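The three-level structure and the segment naming scheme (terminal identifier, sound source identifier, sequence number) can be sketched with nested dictionaries. The function name `build_tree` and the input format, a list of (terminal, sound source) pairs in the order the segments were extracted, are assumptions made for illustration; the voice file level is left implicit as the single root.

```python
from typing import Dict, List, Tuple


def build_tree(segments: List[Tuple[str, str]]) -> Dict[str, Dict[str, List[str]]]:
    """Group segments as terminal (third directory) -> sound source
    (first directory) -> segment names such as "1-A-1".

    The sequence number counts segments per terminal and sound source.
    """
    tree: Dict[str, Dict[str, List[str]]] = {}
    for terminal_id, source_id in segments:
        names = tree.setdefault(terminal_id, {}).setdefault(source_id, [])
        names.append(f"{terminal_id}-{source_id}-{len(names) + 1}")
    return tree
```

Flattening the innermost level, so that each terminal maps directly to its list of segment names, gives the two-level layout of FIG. 9.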
In another implementation, the hierarchical relationship between the first directory and the second directory in the above three-level structure may be reversed: the first directory serves as the second-level directory, the third directory is established under the second directory as the third-level directory, and the corresponding sound source is recorded in the third directory.
For simplicity, the following embodiments all take the two-level directory structure shown in FIG. 9 as an example.
In one implementation, the main terminal extracts and identifies each sound segment in the overall voice file. During identification, the identity identifier of the sound source corresponding to the sound segment is determined through voiceprint feature matching, and the sound segment is recorded into the corresponding third directory and named.
Further, the method also includes:
Receiving a first input on the display interface, where the first input may be generated by a touch operation, such as a tap, long-press, or slide operation, or by a voice operation or gesture operation, which is not specifically limited here.
In response to the first input, playing the sound segment or the voice file.
Through the first input on the display interface, the user selects a voice file or sound segment displayed on the display interface for playback. For example, as shown in FIG. 9, the user may perform a first operation on the second directory corresponding to the voice file to play or pause the voice file containing the sound segments; perform a second operation on the second directory to expand or collapse the list of third directories under the second directory; perform a third operation on an expanded third directory to expand or collapse the list of sound segments under that third directory; and perform a fourth operation on a sound segment to play or pause that sound segment.
As can be seen from the technical solutions provided by the above embodiments of the present invention, if the terminal is a main terminal of a terminal group, the terminal group including at least one main terminal and N auxiliary terminals, where N is greater than or equal to 1, the main terminal receives the sound segments collected by the auxiliary terminals and aggregates and saves them as a voice file. In the embodiments of the present invention, the sound source of each sound segment is identified and the segment is recorded into the corresponding directory, so that the sound segments can be managed efficiently and displayed and played in an organized manner.
The recording method provided in the embodiments of the present application may be executed by a recording apparatus. The embodiments of the present application describe the recording apparatus by taking, as an example, the recording apparatus executing the recording method.
As shown in FIG. 11, the recording apparatus includes a sound source localization module 111 and a voice collection module 112. The sound source localization module 111 is configured to determine the position of a target sound source through a microphone array, where the target sound source is a sound source that generates a voice signal whose sound intensity meets a first condition, and the microphone array includes at least three microphones that are not arranged in a straight line. The voice collection module 112 is configured to collect voice signals in a directional collection mode according to the position of the target sound source to obtain a voice file.
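The layout requirement on the microphone array, at least three microphones that do not all lie on one straight line, can be checked with a cross product. The sketch below assumes the microphone positions are given as 2D coordinates in a common plane; the function name and the tolerance `eps` are hypothetical.

```python
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, float]


def has_non_collinear_triple(mics: Sequence[Point], eps: float = 1e-9) -> bool:
    """Return True if some three microphones do not lie on one straight line.

    Three points are collinear when the cross product of the two vectors
    from the first point to the other two is zero (within `eps`).
    """
    for (x1, y1), (x2, y2), (x3, y3) in combinations(mics, 3):
        cross = (x2 - x1) * (y3 - y1) - (y2 - y1) * (x3 - x1)
        if abs(cross) > eps:
            return True
    return False
```

A layout that passes this check spans a plane rather than a single axis, which is what allows a source position in that plane to be resolved without a mirror-image ambiguity.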
Further, the first condition includes:
The sound intensity of the voice signal exceeds a first threshold.
Further, the voice collection module is configured to:
Set a collection area centered on the target sound source;
Collect voice signals from within the collection area.
Further, the collection area is a fan-shaped area whose vertex is the microphone array and whose centerline is the line connecting the microphone array to the target sound source.
Further, the voice collection module is also configured to:
Shield or suppress voice signals from outside the collection area.
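Whether a sound arrives from inside or outside the fan-shaped collection area can be decided by comparing bearings from the microphone array. The sketch below works in 2D and assumes the sector is described by a half opening angle, a parameter the application does not specify; the function and parameter names are hypothetical, and no radius limit is applied.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def in_sector(array_pos: Point, target_pos: Point, point: Point,
              half_angle_deg: float) -> bool:
    """Return True if `point` lies within the sector whose vertex is the
    microphone array and whose centerline points at the target sound source."""
    center = math.atan2(target_pos[1] - array_pos[1], target_pos[0] - array_pos[0])
    bearing = math.atan2(point[1] - array_pos[1], point[0] - array_pos[0])
    # Wrap the angular difference into [-pi, pi) before comparing.
    diff = (bearing - center + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= math.radians(half_angle_deg)
```

Signals whose estimated origin fails this test would be the ones to shield or suppress; how the suppression itself is carried out (for example by beamforming) is outside the scope of this sketch.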
As can be seen from the technical solutions provided by the above embodiments of the present invention, the position of a target sound source is determined through a microphone array, where the target sound source is a sound source that generates a voice signal whose sound intensity meets a first condition, and the microphone array includes at least three microphones that are not arranged in a straight line; voice signals are then collected in a directional collection mode according to the position of the target sound source. In the embodiments of the present invention, the determined target sound source is first located and then collected directionally, so that the voice signal of the target sound source can be effectively separated from a noisy environment and collected, yielding clearer sound segments.
Based on the above embodiment, further, in the process of collecting voice signals in the directional collection mode, the voice collection module is also configured to:
Determine the sound source corresponding to the voice signal according to the voiceprint features of the voice signal.
Further, after determining the sound source corresponding to the voice signal, the voice collection module is also configured to:
Extract, according to the sound intensity, a sound segment that includes the voice signal;
On the display interface, record the sound segment into a first directory corresponding to the sound source, the first directory being located under a second directory corresponding to the voice file.
Further, after recording the sound segment into the first directory corresponding to the identity identification information, the voice collection module is also configured to:
Receive a first input on the display interface;
In response to the first input, play the sound segment or the voice file.
As can be seen from the technical solutions provided by the above embodiments of the present invention, the sound source that generates the voice signal is determined according to the voiceprint features of the voice signal. In the embodiments of the present invention, the sound source corresponding to the voice signal is identified, the corresponding sound segments are extracted and then recorded into the corresponding directory, so that the sound segments can be managed efficiently and displayed and played in an organized manner.
Based on the above embodiment, further, where the recording apparatus is a main terminal of a terminal group, the terminal group including at least one main terminal and N auxiliary terminals, where N is greater than or equal to 1, the voice collection module is also configured to:
Receive and aggregate the voice signals collected by the auxiliary terminals to obtain a voice file.
Further, in the process of receiving the voice signals collected by the auxiliary terminals, the voice collection module is also configured to:
Determine the sound source corresponding to the voice signal;
Extract, from the voice file, a sound segment that includes the voice signal;
On the display interface, record the sound segment under a third directory corresponding to the terminal that collected the sound segment, where the third directory is located under the second directory corresponding to the voice file.
Further, after recording the sound segment into the first directory corresponding to the identity identification information, the voice collection module is also configured to:
Receive a first input on the display interface;
In response to the first input, play the sound segment or the voice file.
As can be seen from the technical solutions provided by the above embodiments of the present invention, if the apparatus is a main terminal of a terminal group, the terminal group including at least one main terminal and N auxiliary terminals, where N is greater than or equal to 1, the main terminal receives the sound segments collected by the auxiliary terminals and aggregates and saves them as a voice file. In the embodiments of the present invention, the sound source of each sound segment is identified and the segment is recorded into the corresponding directory, so that the sound segments can be managed efficiently and displayed and played in an organized manner.
The recording apparatus in the embodiments of the present application may be an electronic device, or a component of an electronic device, such as an integrated circuit or a chip. The electronic device may be a terminal or a device other than a terminal. By way of example, the electronic device may be a mobile phone, tablet computer, notebook computer, handheld computer, in-vehicle electronic device, mobile Internet device (MID), augmented reality (AR)/virtual reality (VR) device, robot, wearable device, ultra-mobile personal computer (UMPC), netbook, or personal digital assistant (PDA); it may also be a server, network attached storage (NAS), personal computer (PC), television (TV), teller machine, or self-service machine. The embodiments of the present application impose no specific limitation.
The recording apparatus in the embodiments of the present application may be an apparatus with an operating system. The operating system may be the Android operating system, the iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.
The recording apparatus provided in the embodiments of the present application can implement each process implemented by the method embodiments of FIG. 1 to FIG. 10; to avoid repetition, details are not repeated here.
Optionally, as shown in FIG. 12, the embodiments of the present application further provide an electronic device 1200, including a processor 1201 and a memory 1202. The memory 1202 stores a program or instructions executable on the processor 1201; when executed by the processor 1201, the program or instructions implement each step of the above recording method embodiments and achieve the same technical effects. To avoid repetition, details are not repeated here.
It should be noted that the electronic devices in the embodiments of the present application include the mobile electronic devices and non-mobile electronic devices described above.
FIG. 13 is a schematic diagram of the hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 1300 includes, but is not limited to, a radio frequency unit 1301, a network module 1302, an audio output unit 1303, an input unit 1304, a sensor 1305, a display unit 1306, a user input unit 1307, an interface unit 1308, a memory 1309, and a processor 1310.
Those skilled in the art will understand that the electronic device 1300 may also include a power supply (such as a battery) that powers the components. The power supply may be logically connected to the processor 1310 through a power management system, so that charging, discharging, and power consumption are managed through the power management system. The structure shown in FIG. 13 does not limit the electronic device; the electronic device may include more or fewer components than shown, combine certain components, or arrange the components differently, and details are not repeated here.
The processor 1310 is configured to determine the position of a target sound source through the input unit 1304, where the target sound source is a sound source that generates a voice signal whose sound intensity meets a first condition, and the input unit 1304 includes at least three microphones that are not arranged in a straight line;
The input unit 1304 is configured to collect voice signals in a directional collection mode according to the position of the target sound source to obtain a voice file.
Further, the first condition includes:
The sound intensity of the voice signal exceeds a first threshold.
Further, the input unit 1304 is configured to:
Set a collection area centered on the target sound source;
Collect voice signals from within the collection area.
Further, the collection area is a fan-shaped area whose vertex is the microphone array and whose centerline is the line connecting the microphone array to the target sound source.
Further, the input unit 1304 is also configured to:
Shield or suppress voice signals from outside the collection area.
In the embodiments of the present invention, the determined target sound source is first located and then collected directionally, so that the voice signal of the target sound source can be effectively separated from a noisy environment and collected, yielding clearer sound segments.
Based on the above embodiment, further, in the process of collecting voice signals in the directional collection mode, the processor 1310 is also configured to determine the sound source corresponding to the voice signal according to the voiceprint features of the voice signal.
Further, after determining the sound source corresponding to the voice signal, the processor 1310 is also configured to:
Extract, according to the sound intensity, a sound segment that includes the voice signal;
On the display interface, record the sound segment into a first directory corresponding to the sound source, the first directory being located under a second directory corresponding to the voice file.
Further, the user input unit 1307 is configured to receive a first input on the display interface;
The audio output unit 1303 is configured to play the sound segment or the voice file in response to the first input.
In the embodiments of the present invention, the sound source corresponding to the voice signal is identified, the corresponding sound segments are extracted and then recorded into the corresponding directory, so that the sound segments can be managed efficiently and displayed and played in an organized manner.
Further, where the recording apparatus is a main terminal of a terminal group, the terminal group including at least one main terminal and N auxiliary terminals, where N is greater than or equal to 1, the radio frequency unit 1301 is also configured to receive and aggregate the voice signals collected by the auxiliary terminals to obtain a voice file.
Further, in the process of receiving the voice signals collected by the auxiliary terminals, the processor 1310 is also configured to:
Determine the sound source corresponding to the voice signal;
Extract, from the voice file, a sound segment that includes the voice signal;
On the display interface, record the sound segment under a third directory corresponding to the terminal that collected the sound segment, where the third directory is located under the second directory corresponding to the voice file.
In the embodiments of the present invention, the sound source of each sound segment is identified and the segment is recorded into the corresponding directory, so that the sound segments can be managed efficiently and displayed and played in an organized manner.
It should be understood that, in the embodiments of the present application, the input unit 1304 may include a graphics processing unit (GPU) 13041 and a microphone 13042. The graphics processing unit 13041 processes image data of still pictures or video obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode. The display unit 1306 may include a display panel 13061, which may be configured as a liquid crystal display, an organic light-emitting diode display, or the like. The user input unit 1307 includes at least one of a touch panel 13071 and other input devices 13072. The touch panel 13071, also called a touch screen, may include two parts: a touch detection device and a touch controller. The other input devices 13072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and a power key), a trackball, a mouse, and a joystick, and details are not repeated here.
The memory 1309 may be used to store software programs and various data. The memory 1309 may mainly include a first storage area that stores programs or instructions and a second storage area that stores data, where the first storage area may store an operating system and the application programs or instructions required by at least one function (such as a sound playback function or an image playback function). In addition, the memory 1309 may include volatile memory or non-volatile memory, or both. The non-volatile memory may be read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), or direct Rambus RAM (DRRAM). The memory 1309 in the embodiments of the present application includes, but is not limited to, these and any other suitable types of memory.
处理器1310可包括一个或多个处理单元;可选的,处理器1310集成应用处理器和调制解调处理器,其中,应用处理器主要处理涉及操作系统、用户界面和应用程序等的操作,调制解调处理器主要处理无线通信信号,如基带处理器。可以理解的是,上述调制解调处理器也可以不集成到处理器1310中。The processor 1310 may include one or more processing units; optionally, the processor 1310 integrates an application processor and a modem processor, wherein the application processor mainly processes operations related to the operating system, user interface, and application programs, and the modem processor mainly processes wireless communication signals, such as a baseband processor. It can be understood that the foregoing modem processor may not be integrated into the processor 1310 .
An embodiment of the present application further provides a readable storage medium. The readable storage medium stores a program or instructions that, when executed by a processor, implement each process of the above recording method embodiment and achieve the same technical effect. To avoid repetition, details are not repeated here.
The processor is the processor in the electronic device described in the above embodiments. The readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
An embodiment of the present application further provides a chip. The chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or instructions to implement each process of the above recording method embodiment and achieve the same technical effect. To avoid repetition, details are not repeated here.
It should be understood that the chip mentioned in the embodiments of the present application may also be called a system-level chip, a system chip, a chip system, or a system-on-chip.
An embodiment of the present application provides a computer program product. The program product is stored in a storage medium and is executed by at least one processor to implement each process of the above recording method embodiment and achieve the same technical effect. To avoid repetition, details are not repeated here.
It should be noted that, in this document, the terms "comprise", "include", and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element. In addition, the scope of the methods and apparatuses in the embodiments of the present application is not limited to performing functions in the order shown or discussed; it may also include performing functions in a substantially simultaneous manner or in the reverse order, depending on the functions involved. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Features described with reference to certain examples may also be combined in other examples.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a computer software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the specific embodiments described above, which are merely illustrative and not restrictive. Inspired by the present application, those of ordinary skill in the art may devise many other forms without departing from the purpose of the present application and the scope protected by the claims, all of which fall within the protection of the present application.

Claims (13)

  1. A recording method, comprising:
    determining, by a terminal, a position of a target sound source through a microphone array, wherein the target sound source is a sound source that generates a voice signal whose sound intensity satisfies a first condition, and the microphone array comprises at least three microphones that are not arranged in a straight line; and
    collecting a voice signal in a directional collection mode according to the position of the target sound source, to obtain a voice file.
  2. The method according to claim 1, wherein collecting the voice signal in the directional collection mode according to the position of the target sound source comprises:
    setting a collection area centered on the target sound source; and
    collecting the voice signal from within the collection area.
  3. The method according to claim 2, wherein the collection area is a fan-shaped area whose vertex is the microphone array and whose center line is the line connecting the microphone array to the target sound source.
  4. The method according to claim 2, wherein the method further comprises:
    shielding or suppressing voice signals outside the collection area.
  5. The method according to claim 1, wherein, in the process of collecting the voice signal in the directional collection mode, the method further comprises:
    determining, according to a voiceprint feature of the voice signal, the sound source corresponding to the voice signal.
  6. The method according to claim 5, wherein, after the sound source corresponding to the voice signal is determined, the method further comprises:
    extracting, according to sound intensity, a sound segment that includes the voice signal; and
    recording, on a display interface, the sound segment into a first directory corresponding to the sound source, wherein the first directory is located under a second directory corresponding to the voice file.
  7. The method according to claim 2, wherein, in a case where the terminal is a main terminal of a terminal group, the terminal group comprising at least one main terminal and N auxiliary terminals, N being greater than or equal to 1, the method further comprises:
    receiving and aggregating the voice signals collected by the auxiliary terminals, to obtain the voice file.
  8. The method according to claim 7, wherein, in the process of receiving the voice signals collected by the auxiliary terminals, the method further comprises:
    determining the sound source corresponding to the voice signal;
    extracting, from the voice file, a sound segment that includes the voice signal; and
    recording, on a display interface, the sound segment into a third directory corresponding to the terminal that collected the sound segment, wherein the third directory is located under a second directory corresponding to the voice file.
  9. The method according to claim 6 or 8, wherein, after the sound segment is recorded into the first directory corresponding to the respective identification information, the method further comprises:
    receiving a first input on the display interface; and
    playing the sound segment or the voice file in response to the first input.
  10. The method according to claim 1, wherein the first condition comprises:
    the sound intensity of the voice signal exceeds a first threshold.
  11. A recording apparatus, comprising:
    a sound source localization module, configured to determine a position of a target sound source through a microphone array, wherein the target sound source is a sound source that generates a voice signal whose sound intensity satisfies a first condition, and the microphone array comprises at least three microphones that are not arranged in a straight line; and
    a voice collection module, configured to collect a voice signal in a directional collection mode according to the position of the target sound source, to obtain a voice file.
  12. An electronic device, comprising a processor and a memory, wherein the memory stores a program or instructions executable on the processor, and the program or instructions, when executed by the processor, implement the steps of the recording method according to any one of claims 1 to 10.
  13. A readable storage medium, storing a program or instructions that, when executed by a processor, implement the steps of the recording method according to any one of claims 1 to 10.
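
As a rough illustration of claims 1 to 4 and 10, the following Python sketch shows one way the first condition (intensity above a first threshold) and the fan-shaped collection area could be expressed. It is only a sketch under stated assumptions, not the applicant's implementation: the claims do not specify how the microphone array estimates positions, what the sector angle is, or what the threshold value is. The names `SoundSource`, `select_targets`, `in_sector`, and `capture_gain`, the 2-D coordinate frame with the array at the origin, the 30-degree half-angle, and the 50 dB threshold are all hypothetical, and the positions are assumed to have been estimated already.

```python
import math
from dataclasses import dataclass


@dataclass
class SoundSource:
    """A localized source; coordinates are relative to the microphone array
    (hypothetical 2-D frame with the array at the origin)."""
    x: float
    y: float
    intensity_db: float


def select_targets(sources, first_threshold_db):
    """Keep sources whose sound intensity exceeds the first threshold (claim 10)."""
    return [s for s in sources if s.intensity_db > first_threshold_db]


def in_sector(point, target, half_angle_deg):
    """Return True if `point` lies in the fan-shaped area whose vertex is the
    array (origin) and whose center line runs from the array to `target`.
    Only the angle is checked; no maximum range is assumed."""
    target_bearing = math.atan2(target[1], target[0])
    point_bearing = math.atan2(point[1], point[0])
    # Wrap the bearing difference into [-pi, pi) before comparing.
    diff = (point_bearing - target_bearing + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= math.radians(half_angle_deg)


def capture_gain(point, target, half_angle_deg, outside_gain=0.0):
    """Gain for a signal arriving from `point`: 1.0 inside the collection
    area, `outside_gain` outside it (0.0 shields, a value below 1.0 suppresses)."""
    return 1.0 if in_sector(point, target, half_angle_deg) else outside_gain


# Hypothetical scene: one loud talker on the x-axis, one quiet source on the y-axis.
sources = [SoundSource(1.0, 0.0, 65.0), SoundSource(0.0, 1.0, 40.0)]
targets = select_targets(sources, first_threshold_db=50.0)
target_pos = (targets[0].x, targets[0].y)
```

In this sketch, shielding and suppression (claim 4) differ only in the value of `outside_gain`; a real device would more likely apply beamforming across the microphone channels than a per-source gain.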
PCT/CN2023/072987 2022-01-24 2023-01-18 Voice recording method and apparatus, and electronic device WO2023138632A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210082304.1 2022-01-24
CN202210082304.1A CN114390133A (en) 2022-01-24 2022-01-24 Recording method and device and electronic equipment

Publications (1)

Publication Number Publication Date
WO2023138632A1 (en) 2023-07-27

Family

ID=81203197

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/072987 WO2023138632A1 (en) 2022-01-24 2023-01-18 Voice recording method and apparatus, and electronic device

Country Status (2)

Country Link
CN (1) CN114390133A (en)
WO (1) WO2023138632A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114390133A (en) * 2022-01-24 2022-04-22 维沃移动通信有限公司 Recording method and device and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108538320A (en) * 2018-03-30 2018-09-14 广东欧珀移动通信有限公司 Recording control method and device, readable storage medium storing program for executing, terminal
US20190214011A1 (en) * 2016-10-14 2019-07-11 Samsung Electronics Co., Ltd. Electronic device and method for processing audio signal by electronic device
CN111060874A (en) * 2019-12-10 2020-04-24 深圳市优必选科技股份有限公司 Sound source positioning method and device, storage medium and terminal equipment
CN113053368A (en) * 2021-03-09 2021-06-29 锐迪科微电子(上海)有限公司 Speech enhancement method, electronic device, and storage medium
CN114390133A (en) * 2022-01-24 2022-04-22 维沃移动通信有限公司 Recording method and device and electronic equipment

Also Published As

Publication number Publication date
CN114390133A (en) 2022-04-22

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23742947

Country of ref document: EP

Kind code of ref document: A1