CN113724704A - Voice acquisition method, device, terminal and storage medium - Google Patents

Voice acquisition method, device, terminal and storage medium

Info

Publication number
CN113724704A
Authority
CN
China
Prior art keywords
voice
user
determining
amplitude value
awakening
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111005995.7A
Other languages
Chinese (zh)
Other versions
CN113724704B (en)
Inventor
徐正新
姚文兴
陈敏
卢铁军
黄健
胡超
洪伟忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Skyworth RGB Electronics Co Ltd
Original Assignee
Shenzhen Skyworth RGB Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Skyworth RGB Electronics Co Ltd filed Critical Shenzhen Skyworth RGB Electronics Co Ltd
Priority to CN202111005995.7A priority Critical patent/CN113724704B/en
Publication of CN113724704A publication Critical patent/CN113724704A/en
Application granted granted Critical
Publication of CN113724704B publication Critical patent/CN113724704B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S 5/00 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S 5/18 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Telephone Function (AREA)

Abstract

The invention discloses a voice acquisition method, a voice acquisition device, a terminal and a storage medium. The method comprises the following steps: acquiring a wake-up voice command through a voice collecting device, and determining the utterance direction of the user according to the wake-up voice command; adjusting the voice collecting device according to the utterance direction of the user; and acquiring the user's voice information through the adjusted voice collecting device. Because the voice collecting device can be reoriented toward the direction of the user's voice, the invention addresses the problems of existing intelligent voice devices, in which the voice collecting device is fixed in position and the user cannot always speak toward it, so that user speech is recognized with low accuracy and voice commands are difficult to respond to efficiently.

Description

Voice acquisition method, device, terminal and storage medium
Technical Field
The present invention relates to the field of intelligent voice devices, and in particular, to a voice acquisition method, apparatus, terminal, and storage medium.
Background
Artificial intelligence has become one of the most talked-about technologies of our time, and products built on it have grown explosively. Modern smart homes contain more and more intelligent voice devices, and those equipped with speech recognition and intelligent interaction technology can interact with users by voice, which has made them widely popular.
Existing intelligent voice devices need to collect the user's speech in order to recognize it, so a voice collecting device is usually arranged in the device. The quality of the received speech is best when the user speaks directly toward the voice collecting device. However, because the voice collecting device is fixed in position and cannot rotate toward the sound source, the user cannot always be speaking toward it. As a result, existing intelligent voice devices recognize user speech with low accuracy and have difficulty responding efficiently to the user's voice commands.
Thus, there is still a need for improvement and development of the prior art.
Disclosure of Invention
The technical problem to be solved by the present invention, in view of the above defects in the prior art, is to provide a voice acquisition method, device, terminal and storage medium that address the problems that the voice collecting device in existing intelligent voice devices is fixed in position and the user cannot always face it when speaking, so that user speech is recognized with low accuracy and the user's voice commands are difficult to respond to efficiently.
The technical solution adopted by the present invention to solve the above problems is as follows:
in a first aspect, an embodiment of the present invention provides a speech acquisition method, where the method is applied to an intelligent speech device, and the method includes:
acquiring a voice awakening command through a voice acquisition device, and determining a user sounding direction according to the voice awakening command;
adjusting the voice acquisition device according to the sounding direction of the user;
and acquiring the voice information of the user through the adjusted voice acquisition device.
In one embodiment, the voice collecting device includes a plurality of voice sensors, the wake-up voice command includes a plurality of wake-up voice data collected by the plurality of voice sensors, and determining the utterance direction of the user according to the wake-up voice command includes:
acquiring audio amplitude values corresponding to the plurality of awakening voice data respectively;
determining first awakening voice data and second awakening voice data from the plurality of awakening voice data according to the audio amplitude value, wherein the first audio amplitude value corresponding to the first awakening voice data is the largest, and the second audio amplitude value corresponding to the second awakening voice data is closest to the first audio amplitude value;
and determining the sound production direction of the user according to the first awakening voice data, the first audio amplitude value, the second awakening voice data and the second audio amplitude value.
In one embodiment, the determining the user utterance bearing according to the first wake-up speech data, the first audio amplitude value, the second wake-up speech data, and the second audio amplitude value includes:
acquiring position information of a voice sensor which acquires the first awakening voice data to obtain first position information;
acquiring position information of the voice sensor which acquires the second awakening voice data to obtain second position information;
and determining the sound production direction of the user according to the first position information, the second position information, the first audio amplitude value and the second audio amplitude value.
In one embodiment, the determining the user utterance location based on the first location information, the second location information, the first audio amplitude value, and the second audio amplitude value includes:
determining target range data according to the first position information and the second position information;
determining target angle data according to the first audio amplitude value and the second audio amplitude value;
and determining the sound production direction of the user according to the target range data and the target angle data.
In one embodiment, the adjusting the voice capture device according to the utterance direction of the user includes:
determining a target acquisition direction according to the utterance direction of the user, wherein the target acquisition direction directly faces the utterance direction of the user;
and adjusting the voice acquisition device according to the target acquisition direction.
In one embodiment, the voice collecting device further includes a rotating mechanism, the plurality of voice sensors are fixed on the rotating mechanism, and the adjusting the voice collecting device according to the target collecting position includes:
taking one of the voice sensors as a main voice sensor;
and rotating the rotating mechanism according to the target acquisition orientation until the main voice sensor is positioned on the target acquisition orientation.
In an embodiment, the voice sensors include a noise reduction voice sensor, where the noise reduction voice sensor is located in a direction opposite to the main voice sensor, and the obtaining of the user voice information through the adjusted voice collecting device includes:
acquiring the sound respectively acquired by the voice sensors on the adjusted voice acquisition device except the noise reduction voice sensor to obtain a plurality of voice information;
determining initial voice information according to a plurality of voice information;
acquiring sound collected by the noise reduction voice sensor to obtain noise information;
and carrying out noise reduction processing on the initial voice information according to the noise information to obtain the user voice information.
In a second aspect, an embodiment of the present invention further provides a speech acquisition apparatus, where the apparatus includes:
the determining module is used for acquiring a wake-up voice instruction through the voice acquisition device and determining a target acquisition direction according to the wake-up voice instruction, wherein the target acquisition direction corresponds to a user sounding direction;
the adjusting module is used for adjusting the voice collecting device according to the target collecting direction;
and the acquisition module is used for acquiring the voice information of the user through the adjusted voice acquisition device.
In a third aspect, an embodiment of the present invention further provides a terminal, where the terminal includes a memory and one or more processors; the memory stores one or more programs; the program comprises instructions for performing a speech acquisition method as described in any of the above; the processor is configured to execute the program.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a plurality of instructions are stored, where the instructions are adapted to be loaded and executed by a processor to implement any of the steps of the voice obtaining method described above.
The invention has the beneficial effects that: in the embodiment of the invention, a wake-up voice command is acquired through a voice collecting device, and the utterance direction of the user is determined according to the wake-up voice command; the voice collecting device is adjusted according to the utterance direction of the user; and the user's voice information is acquired through the adjusted voice collecting device. Because the voice collecting device can be reoriented toward the direction of the user's voice, the embodiment solves the problems that the voice collecting device in existing intelligent voice devices is fixed in position and the user cannot always face it when speaking, so that user speech is recognized with low accuracy and voice commands are difficult to respond to efficiently.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flowchart illustrating a speech acquisition method according to an embodiment of the present invention.
Fig. 2 is a block diagram of an internal module of an intelligent speech device according to an embodiment of the present invention.
Fig. 3 is a cross-sectional view of an external appearance of an intelligent speech device according to an embodiment of the present invention.
Fig. 4 is a top view of an external appearance of the smart voice device provided by the embodiment of the invention.
Fig. 5 is a flowchart of the operation of the intelligent speech device according to the embodiment of the present invention.
Fig. 6 is a diagram of the advantages of the speech acquisition method according to an embodiment of the present invention.
Fig. 7 is a connection diagram of internal modules of a speech acquisition apparatus according to an embodiment of the present invention.
Fig. 8 is a functional block diagram of a terminal according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It should be noted that, if directional indications (such as up, down, left, right, front, and back) are involved in the embodiments of the present invention, the directional indications are only used to explain the relative positional relationship between components, their movement, and the like in a specific posture (as shown in the drawings); if the specific posture changes, the directional indications change accordingly.
Artificial intelligence has become one of the most talked-about technologies of our time, and products built on it have grown explosively. Modern smart homes contain more and more intelligent voice devices, and those equipped with speech recognition and intelligent interaction technology can interact with users by voice, which has made them widely popular.
Existing intelligent voice devices need to collect the user's speech in order to recognize it, so a voice collecting device is usually arranged in the device. The quality of the received speech is best when the user speaks directly toward the voice collecting device. However, because the voice collecting device is fixed in position and cannot rotate toward the sound source, the user cannot always be speaking toward it. As a result, existing intelligent voice devices recognize user speech with low accuracy and have difficulty responding efficiently to the user's voice commands.
In view of the above drawbacks of the prior art, the present invention provides a voice acquisition method comprising: acquiring a wake-up voice command through a voice collecting device, and determining the utterance direction of the user according to the wake-up voice command; adjusting the voice collecting device according to the utterance direction of the user; and acquiring the user's voice information through the adjusted voice collecting device. Because the voice collecting device can be reoriented toward the direction of the user's voice, the method solves the problems that the voice collecting device in existing intelligent voice devices is fixed in position and the user cannot always face it when speaking, so that user speech is recognized with low accuracy and voice commands are difficult to respond to efficiently.
As shown in fig. 1, the method comprises the steps of:
Step S100, acquiring a wake-up voice command through the voice collecting device, and determining the utterance direction of the user according to the wake-up voice command.
Specifically, in this embodiment, in order to implement the voice control function of the intelligent voice device, a voice collecting device is provided on the intelligent voice device in advance for receiving the user's voice information. To reduce power consumption and avoid running at high power when nobody is using the device, this embodiment also gives the intelligent voice device a standby state in which all non-essential hardware is shut down to save power. After the device enters the standby state, it resumes normal operation only when the user issues a preset wake-up command.
When the user issues a wake-up command, the voice collecting device captures it. Wake-up voice commands collected from different directions have different acoustic characteristics, so the wake-up voice command reflects the direction of the user to a certain extent, and the utterance direction of the user can therefore be determined from it.
In one implementation, the voice collecting device includes a plurality of voice sensors, the wake-up voice command includes a plurality of wake-up voice data collected by the voice sensors, and the voice emitting direction of the user is determined according to the wake-up voice command, which specifically includes the following steps:
s101, acquiring audio amplitude values corresponding to a plurality of awakening voice data respectively;
step S102, according to the audio amplitude value, determining first awakening voice data and second awakening voice data from a plurality of awakening voice data, wherein the first audio amplitude value corresponding to the first awakening voice data is the largest, and the second audio amplitude value corresponding to the second awakening voice data is closest to the first audio amplitude value;
step S103, determining the sound production direction of the user according to the first awakening voice data, the first audio amplitude value, the second awakening voice data and the second audio amplitude value.
Specifically, the voice collecting device installed on the intelligent voice device in this embodiment is composed of a plurality of voice sensors. When the user issues the wake-up command, every voice sensor receives it. Because each voice sensor has a different position relative to the utterance direction of the user, the audio amplitude values of the wake-up voice data they collect differ: a voice sensor close to the user's utterance direction obtains wake-up voice data with a larger audio amplitude value, while a voice sensor far from the user's utterance direction obtains wake-up voice data with a smaller audio amplitude value.
In this embodiment, the two largest audio amplitude values, namely the first audio amplitude value and the second audio amplitude value, are screened out from the audio amplitude values of the wake-up voice data collected by the respective voice sensors. The first wake-up voice data and the second wake-up voice data therefore come from the two voice sensors closest to the user's utterance direction, and the first and second audio amplitude values further reflect the distance and angle of those two sensors relative to the user's utterance direction, so the utterance direction of the user can be determined according to the first wake-up voice data, the first audio amplitude value, the second wake-up voice data and the second audio amplitude value.
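To make this screening step concrete, the following minimal Python sketch (illustrative only and not part of the patent; the function name and data layout are hypothetical) ranks the sensors' wake-up recordings by peak amplitude and returns the two loudest:

import numpy as np

def select_top_two(wake_data):
    # wake_data maps a sensor name to its wake-up recording as a 1-D sample array.
    # Peak absolute amplitude of each sensor's wake-up recording.
    amps = {name: float(np.max(np.abs(samples))) for name, samples in wake_data.items()}
    # Sort by amplitude, largest first; the top two give the first and second wake-up voice data.
    ranked = sorted(amps.items(), key=lambda kv: kv[1], reverse=True)
    (first, first_amp), (second, second_amp) = ranked[0], ranked[1]
    return first, first_amp, second, second_amp

# Example: MIC2 is loudest and MIC1 is second loudest.
print(select_top_two({"MIC1": np.array([0.1, 0.6]), "MIC2": np.array([0.9, 0.2]),
                      "MIC3": np.array([0.3, 0.1]), "MIC4": np.array([0.05, 0.02])}))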
In one implementation, the step S103 specifically includes the following steps:
step S1031, obtaining position information of the voice sensor which collects the first awakening voice data to obtain first position information;
step S1032, acquiring the position information of the voice sensor which acquires the second awakening voice data to obtain second position information;
step S1033, determining the user utterance direction according to the first position information, the second position information, the first audio amplitude value, and the second audio amplitude value.
Specifically, this embodiment defines the position information of the voice sensor that collected the first wake-up voice data as the first position information, and the position information of the voice sensor that collected the second wake-up voice data as the second position information. Since the first and second position information represent the positions closest and second closest to the user's utterance direction, and the first and second audio amplitude values further reflect the distance and angle of those two sensors relative to the user's utterance direction, the utterance direction of the user can be determined according to the first position information, the second position information, the first audio amplitude value and the second audio amplitude value.
In one implementation, the step S1033 specifically includes the following steps:
step S10331, determining target range data according to the first position information and the second position information;
step S10332, determining target angle data according to the first audio amplitude value and the second audio amplitude value;
step S10333, determining the sound production direction of the user according to the target range data and the target angle data.
Specifically, since the first position information and the second position information represent the positions closest and second closest to the user's utterance direction, it can be roughly determined that the user's utterance direction lies between these two positions, which gives the target range data. The angle of the user's utterance direction between those two positions is then further determined based on the first audio amplitude value and the second audio amplitude value, which gives the target angle data. The utterance direction of the user can then be determined accurately according to the target range data and the target angle data.
For example, as shown in fig. 4, for the four MIC input audio signals Data1, Data2, Data3 and Data4, the MIC audio D_x with the largest wake-up-word amplitude and the MIC audio D_y with the second largest amplitude are found. The user is then known to be located between D_x and D_y, biased toward D_x, and the approximate angle of the user's direction between D_x and D_y is estimated by combining the designed positions of the MICs with the actually measured reception intensity of each MIC:
D_x = max(Data1, Data2, Data3, Data4)
D_y = second(Data1, Data2, Data3, Data4).
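To show how such an estimate could be computed, here is a hedged Python sketch that interpolates an angle between the two loudest MICs in proportion to their amplitudes; the linear weighting rule and the mounting angles in MIC_ANGLES are illustrative assumptions, not a formula stated in the patent:

# Assumed mounting angles (degrees) of the four MICs around the disc.
MIC_ANGLES = {"MIC1": 0.0, "MIC2": 90.0, "MIC3": 180.0, "MIC4": 270.0}

def estimate_user_angle(amplitudes):
    # amplitudes maps each MIC name to its wake-up-word amplitude.
    ranked = sorted(amplitudes.items(), key=lambda kv: kv[1], reverse=True)
    (mic_x, a_x), (mic_y, a_y) = ranked[0], ranked[1]  # D_x and D_y
    ang_x, ang_y = MIC_ANGLES[mic_x], MIC_ANGLES[mic_y]
    # Shortest signed arc from the loudest MIC toward the second loudest (handles the 270 -> 0 wrap).
    diff = (ang_y - ang_x + 180.0) % 360.0 - 180.0
    # Weight toward the louder MIC: equal amplitudes land midway, a dominant D_x pulls the estimate toward it.
    w = a_y / (a_x + a_y)
    return (ang_x + w * diff) % 360.0

# Example: MIC1 loudest, MIC2 second -> roughly 36 degrees, between MIC1 and MIC2 and biased toward MIC1.
print(estimate_user_angle({"MIC1": 0.9, "MIC2": 0.6, "MIC3": 0.2, "MIC4": 0.1}))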
as shown in fig. 1, the method further comprises the steps of:
Step S200, adjusting the voice collecting device according to the utterance direction of the user.
Specifically, after waking up the intelligent voice device, the user will usually continue to control it by voice, and the user's position generally does not change much during that time. Therefore, to improve the accuracy of subsequent speech recognition, this embodiment adjusts the voice collecting device on the intelligent voice device according to the determined user utterance direction so that the device faces the user's utterance direction. When the voice collecting device directly faces the user's utterance direction, the collected speech is speech that the user produced while facing the device, so higher-quality user speech can be collected in this state, which effectively improves the accuracy of user speech recognition.
In one implementation, the step S200 specifically includes the following steps:
step S201, determining a target acquisition direction according to the utterance direction of the user, wherein the target acquisition direction directly faces the utterance direction of the user;
and S202, adjusting the voice acquisition device according to the target acquisition direction.
Specifically, after the utterance direction of the user is determined, the target acquisition direction needs to be determined from it. In order to subsequently collect higher-quality user speech, this embodiment takes the direction that directly faces the user's utterance direction as the target acquisition direction. After the voice collecting device is adjusted according to the target acquisition direction, it can collect the speech emitted from directly in front of the user.
In one implementation, the intelligent voice device includes a rotating mechanism, the plurality of voice sensors are fixed to the rotating mechanism, and the step S202 specifically includes the following steps:
step S2021, taking one of the voice sensors as a main voice sensor;
step S2022, rotating the rotating mechanism according to the target acquisition orientation until the main voice sensor is located at the target acquisition orientation.
Specifically, as shown in fig. 4, the intelligent voice device in this embodiment includes a rotating mechanism on which the voice sensors are mounted; the voice sensors are fixed around the periphery of the rotating mechanism. In this embodiment, one of the voice sensors is selected in advance as the main voice sensor. After the target acquisition orientation is determined, the rotating mechanism is rotated until the main voice sensor is located at the target acquisition orientation, which completes the adjustment of the voice collecting device.
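As an illustrative sketch only (the patent does not give motor-control code), the following Python snippet turns the target acquisition orientation into a relative rotation command for the rotating mechanism; rotate_motor is a hypothetical driver callback for the rotating motor:

def rotation_needed(main_sensor_angle, target_angle):
    # Signed shortest rotation (degrees) that brings the main voice sensor to the target orientation.
    return (target_angle - main_sensor_angle + 180.0) % 360.0 - 180.0

def adjust_collector(main_sensor_angle, target_angle, rotate_motor):
    # Rotate the mechanism and return the main sensor's new angle.
    delta = rotation_needed(main_sensor_angle, target_angle)
    rotate_motor(delta)
    return (main_sensor_angle + delta) % 360.0

# Example: main sensor at 300 degrees, user estimated at 36 degrees -> rotate +96 degrees.
print(rotation_needed(300.0, 36.0))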
In one implementation, the main voice sensor is located in the middle position among the plurality of voice sensors.
In one implementation, the rotating mechanism is composed of a disc structure with a hollow center and a central shaft fitted through the middle of the disc structure; the disc structure and the central shaft are movably connected, and the voice sensors are fixed around the periphery of the disc structure.
In one implementation, the device further includes a lower-layer structure for housing components. As shown in fig. 2, the lower-layer structure contains a power supply, a DSP audio processing chip, a processor, a rotating motor, a loudspeaker, a wireless module and other components. The power supply powers all parts of the device so that it can perform its normal functions. The DSP audio processing chip is mainly responsible for processing and analyzing the voice signals collected by the voice sensors and the noise reduction voice sensor, that is, it carries out the device's voice signal acquisition and analysis. The processor receives the voice information processed by the DSP, controls the motor that rotates the rotating mechanism, controls the loudspeaker output, responds to user instructions, and controls slave devices through the wireless module. The loudspeaker provides responses and feedback to the user. The wireless module is mainly responsible for communicating with slave devices, extending the functions of the intelligent voice device.
For example, as shown in fig. 3, the intelligent voice device in this embodiment is divided into three main parts. The upper layer is the disc structure that holds the voice sensors. Between the upper and lower layers is the central shaft, which is rotatably connected to the disc structure so that the angle of the upper structure can be adjusted by rotation about the shaft. The lower layer is the base structure that houses the main body of the intelligent voice device, containing the power supply, the signal processing circuitry, the loudspeaker and the motor module, among others. That is, the analysis and processing that follow pickup by the voice sensors are all performed in the lower-layer structure, and this part can be adapted to the actual design of the intelligent voice device.
As shown in fig. 1, the method further comprises the steps of:
Step S300, acquiring the user voice information through the adjusted voice collecting device.
Because the voice collecting device has been adjusted according to the determined target acquisition direction, the adjusted device collects speech that the user produces while facing it, which reduces the influence of environmental noise and yields high-quality user voice information.
In one implementation, the plurality of voice sensors include a noise reduction voice sensor, where the noise reduction voice sensor is located in a direction opposite to the main voice sensor, and the step S300 specifically includes the following steps:
step S301, obtaining the sound respectively collected by the voice sensors on the adjusted voice collecting device except the noise reduction voice sensor to obtain a plurality of voice information;
step S302, determining initial voice information according to a plurality of voice information;
step S303, acquiring the sound collected by the noise reduction voice sensor to obtain noise information;
and S304, performing noise reduction processing on the initial voice information according to the noise information to obtain the user voice information.
Since the environment around the user usually contains noise in addition to the user's voice, and this noise strongly affects the accuracy with which the intelligent voice device recognizes the user's speech, this embodiment uses the voice sensor on the back of the main voice sensor as the noise reduction voice sensor, as shown in fig. 4. Because the main voice sensor of the adjusted voice collecting device faces the user directly, the noise reduction voice sensor on its back mainly picks up ambient noise, which is used to optimize the voice information collected by the other voice sensors. Specifically, this embodiment first obtains the sounds collected by the voice sensors other than the noise reduction voice sensor, giving a plurality of pieces of voice information, and combines and optimizes them to obtain the initial voice information. At the same time, noise information is obtained through the noise reduction voice sensor. Finally, noise reduction processing is performed on the initial voice information according to the noise information to obtain high-quality user voice information.
For example, as shown in fig. 4, MIC1, MIC2 and MIC3 are the voice sensors that collect the user's voice information, with MIC1 as the main voice sensor; MIC4 is located in the direction opposite to MIC1 and serves as the noise reduction voice sensor. The central rotating shaft can adjust the orientation of the whole assembly to match the working environment.
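The patent does not specify the noise-reduction algorithm itself; as one hedged illustration, the Python sketch below averages the three user-facing MICs into the initial voice information and applies simple magnitude spectral subtraction using the rear MIC4 signal as the noise reference (the averaging and spectral-subtraction choices are assumptions, and all signals are assumed to have the same length and sample rate):

import numpy as np

def denoise(mic1, mic2, mic3, mic4_noise, over_sub=1.0):
    # Combine the user-facing MICs into the initial voice information.
    initial = (mic1 + mic2 + mic3) / 3.0
    spec = np.fft.rfft(initial)
    # Noise information estimated from the rear (noise reduction) sensor.
    noise_mag = np.abs(np.fft.rfft(mic4_noise))
    # Subtract the noise magnitude, clip at zero, and keep the original phase.
    clean_mag = np.maximum(np.abs(spec) - over_sub * noise_mag, 0.0)
    clean_spec = clean_mag * np.exp(1j * np.angle(spec))
    return np.fft.irfft(clean_spec, n=len(initial))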
In one implementation, as shown in fig. 5, after the user voice information is collected it is transmitted through the wireless module to the controlled device, which executes the operation corresponding to the user voice information. If no new wake-up voice command appears, the current orientation of the voice collecting device is left unchanged and the device continues to recognize the speech the user produces, enabling multi-turn voice conversation. When the intelligent voice device performs no operation within a preset duration, it is put into the standby state and is restarted only after a wake-up voice command is received again.
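As a rough sketch of that control flow (the device interfaces, method names and timeout value are illustrative assumptions, not taken from the patent), a simple main loop might look like this:

import time

STANDBY_TIMEOUT_S = 60.0  # assumed idle time before returning to standby

def run_device(mic, wireless, rotator):
    # mic, wireless and rotator are hypothetical driver objects.
    while True:
        wake = mic.wait_for_wake_word()            # standby until a wake-up voice command arrives
        rotator.aim_at(wake.estimated_direction)   # adjust the voice collecting device once
        last_activity = time.monotonic()
        while time.monotonic() - last_activity < STANDBY_TIMEOUT_S:
            utterance = mic.listen(timeout_s=1.0)  # keep the orientation for multi-turn dialog
            if utterance is not None:
                wireless.send(utterance)           # the controlled device executes the command
                last_activity = time.monotonic()
        # no operation within the preset duration: fall back to standby and wait for the next wake-up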
As shown in fig. 6, the advantages of the speech acquisition method in this embodiment are:
1. When a voice command is input, the utterance direction of the user is determined, so the device can keep tracking the direction of the user's voice; the rotating mechanism follows changes in that direction and continuously adjusts the collecting orientation of the voice collecting device, which reduces the probability of false triggering during multi-turn voice conversations and improves the user experience of multi-turn voice interaction.
2. By adding the noise reduction voice sensor, noise reduction is achieved in varied environments, which greatly improves the accuracy of speech recognition.
3. With the wireless module, the intelligent voice device is no longer limited to simple speech recognition; serving as a hub for household voice control, it makes intelligent voice control more convenient for family life.
Based on the above embodiment, the present invention further provides a speech acquisition apparatus, as shown in fig. 7, the apparatus includes:
the determining module 01 is used for acquiring a wake-up voice instruction through a voice acquisition device and determining a target acquisition direction according to the wake-up voice instruction, wherein the target acquisition direction corresponds to a user sounding direction;
the adjusting module 02 is used for adjusting the voice collecting device according to the target collecting direction;
and the obtaining module 03 is configured to obtain the user voice information through the adjusted voice collecting device.
Based on the above embodiments, the present invention further provides a terminal, and a schematic block diagram thereof may be as shown in fig. 8. The terminal comprises a processor, a memory, a network interface and a display screen which are connected through a system bus. Wherein the processor of the terminal is configured to provide computing and control capabilities. The memory of the terminal comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the terminal is used for connecting and communicating with an external terminal through a network. The computer program is executed by a processor to implement a speech acquisition method. The display screen of the terminal can be a liquid crystal display screen or an electronic ink display screen.
It will be understood by those skilled in the art that the block diagram of fig. 8 is a block diagram of only a portion of the structure associated with the inventive arrangements and is not intended to limit the terminals to which the inventive arrangements may be applied, and that a particular terminal may include more or less components than those shown, or may have some components combined, or may have a different arrangement of components.
In one implementation, one or more programs are stored in the memory of the terminal and are configured to be executed by the one or more processors; the one or more programs include instructions for performing the voice acquisition method.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, databases, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
In summary, the present invention discloses a voice acquisition method, device, terminal and storage medium. The method comprises: acquiring a wake-up voice command through a voice collecting device, and determining the utterance direction of the user according to the wake-up voice command; adjusting the voice collecting device according to the utterance direction of the user; and acquiring the user's voice information through the adjusted voice collecting device. Because the voice collecting device can be reoriented toward the direction of the user's voice, the invention solves the problems that the voice collecting device in existing intelligent voice devices is fixed in position and the user cannot always face it when speaking, so that user speech is recognized with low accuracy and voice commands are difficult to respond to efficiently.
It is to be understood that the invention is not limited to the examples described above, but that modifications and variations may be effected thereto by those of ordinary skill in the art in light of the foregoing description, and that all such modifications and variations are intended to be within the scope of the invention as defined by the appended claims.

Claims (10)

1. A voice obtaining method is applied to intelligent voice equipment and comprises the following steps:
acquiring a voice awakening command through a voice acquisition device, and determining a user sounding direction according to the voice awakening command;
adjusting the voice acquisition device according to the sounding direction of the user;
and acquiring the voice information of the user through the adjusted voice acquisition device.
2. The voice obtaining method according to claim 1, wherein the voice collecting device includes a plurality of voice sensors, the wake-up voice command includes a plurality of wake-up voice data collected by the plurality of voice sensors, and the determining the utterance direction of the user according to the wake-up voice command includes:
acquiring audio amplitude values corresponding to the plurality of awakening voice data respectively;
determining first awakening voice data and second awakening voice data from the plurality of awakening voice data according to the audio amplitude value, wherein the first audio amplitude value corresponding to the first awakening voice data is the largest, and the second audio amplitude value corresponding to the second awakening voice data is closest to the first audio amplitude value;
and determining the sound production direction of the user according to the first awakening voice data, the first audio amplitude value, the second awakening voice data and the second audio amplitude value.
3. The voice obtaining method according to claim 2, wherein the determining the user utterance direction according to the first wake-up voice data, the first audio amplitude value, the second wake-up voice data, and the second audio amplitude value comprises:
acquiring position information of a voice sensor which acquires the first awakening voice data to obtain first position information;
acquiring position information of the voice sensor which acquires the second awakening voice data to obtain second position information;
and determining the sound production direction of the user according to the first position information, the second position information, the first audio amplitude value and the second audio amplitude value.
4. The speech acquisition method of claim 3 wherein the determining the user utterance bearing according to the first position information, the second position information, the first audio amplitude value, and the second audio amplitude value comprises:
determining target range data according to the first position information and the second position information;
determining target angle data according to the first audio amplitude value and the second audio amplitude value;
and determining the sound production direction of the user according to the target range data and the target angle data.
5. The voice obtaining method according to claim 2, wherein the adjusting the voice collecting device according to the utterance direction of the user comprises:
determining a target acquisition direction according to the utterance direction of the user, wherein the target acquisition direction directly faces the utterance direction of the user;
and adjusting the voice acquisition device according to the target acquisition direction.
6. The method according to claim 5, wherein the voice collecting device further includes a rotating mechanism, the plurality of voice sensors are fixed on the rotating mechanism, and the adjusting the voice collecting device according to the target collecting orientation includes:
taking one of the voice sensors as a main voice sensor;
and rotating the rotating mechanism according to the target acquisition orientation until the main voice sensor is positioned on the target acquisition orientation.
7. The method according to claim 6, wherein the plurality of speech sensors include a noise-reducing speech sensor, wherein the noise-reducing speech sensor is located in a direction opposite to the main speech sensor, and the obtaining of the user speech information by the adjusted speech acquisition device includes:
acquiring the sound respectively acquired by the voice sensors on the adjusted voice acquisition device except the noise reduction voice sensor to obtain a plurality of voice information;
determining initial voice information according to a plurality of voice information;
acquiring sound collected by the noise reduction voice sensor to obtain noise information;
and carrying out noise reduction processing on the initial voice information according to the noise information to obtain the user voice information.
8. A speech acquisition apparatus, characterized in that the apparatus comprises:
the determining module is used for acquiring a wake-up voice instruction through the voice acquisition device and determining a target acquisition direction according to the wake-up voice instruction, wherein the target acquisition direction corresponds to a user sounding direction;
the adjusting module is used for adjusting the voice collecting device according to the target collecting direction;
and the acquisition module is used for acquiring the voice information of the user through the adjusted voice acquisition device.
9. A terminal, comprising a memory and one or more processors; the memory stores one or more programs; the program comprises instructions for performing the speech acquisition method of any of claims 1-7; the processor is configured to execute the program.
10. A computer-readable storage medium having stored thereon instructions adapted to be loaded and executed by a processor to perform the steps of the speech acquisition method according to any of claims 1-7.
CN202111005995.7A 2021-08-30 2021-08-30 Voice acquisition method, device, terminal and storage medium Active CN113724704B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111005995.7A CN113724704B (en) 2021-08-30 2021-08-30 Voice acquisition method, device, terminal and storage medium

Publications (2)

Publication Number Publication Date
CN113724704A true CN113724704A (en) 2021-11-30
CN113724704B CN113724704B (en) 2024-07-02

Family

ID=78679167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111005995.7A Active CN113724704B (en) 2021-08-30 2021-08-30 Voice acquisition method, device, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN113724704B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU3083201A (en) * 1999-11-22 2001-06-04 Microsoft Corporation Distributed speech recognition for mobile communication devices
CN108391057A (en) * 2018-04-04 2018-08-10 深圳市冠旭电子股份有限公司 Camera filming control method, device, smart machine and computer storage media
CN108735218A (en) * 2018-06-25 2018-11-02 北京小米移动软件有限公司 voice awakening method, device, terminal and storage medium
WO2021008000A1 (en) * 2019-07-12 2021-01-21 大象声科(深圳)科技有限公司 Voice wakeup method and apparatus, electronic device and storage medium
WO2021013137A1 (en) * 2019-07-25 2021-01-28 华为技术有限公司 Voice wake-up method and electronic device
CN213025388U (en) * 2020-10-13 2021-04-20 温州市迈卡威电器有限公司 Voice-controlled intelligent switch panel

Also Published As

Publication number Publication date
CN113724704B (en) 2024-07-02

Similar Documents

Publication Publication Date Title
CN111223497B (en) Nearby wake-up method and device for terminal, computing equipment and storage medium
US11217240B2 (en) Context-aware control for smart devices
US11551682B2 (en) Method of performing function of electronic device and electronic device using same
US20200365152A1 (en) METHOD AND APPARATUS FOR PERFORMING SPEECH RECOGNITION WITH WAKE ON VOICE (WoV)
CN106328132A (en) Voice interaction control method and device for intelligent equipment
CN110310623A (en) Sample generating method, model training method, device, medium and electronic equipment
KR20220034571A (en) Electronic device for identifying a command included in voice and method of opearating the same
CN111161714A (en) Voice information processing method, electronic equipment and storage medium
KR102512614B1 (en) Electronic device audio enhancement and method thereof
KR20190109916A (en) A electronic apparatus and a server for processing received data from the apparatus
US20210383806A1 (en) User input processing method and electronic device supporting same
US20220230657A1 (en) Voice control method and apparatus, chip, earphones, and system
CN110191397B (en) Noise reduction method and Bluetooth headset
CN113724704B (en) Voice acquisition method, device, terminal and storage medium
WO2023207185A1 (en) Voiceprint recognition method, graphical interface, and electronic device
US12033628B2 (en) Method for controlling ambient sound and electronic device therefor
US20220301542A1 (en) Electronic device and personalized text-to-speech model generation method of the electronic device
CN109922397A (en) Audio intelligent processing method, storage medium, intelligent terminal and smart bluetooth earphone
CN110415718B (en) Signal generation method, and voice recognition method and device based on artificial intelligence
CN113889116A (en) Voice information processing method and device, storage medium and electronic device
CN114255763A (en) Voice processing method, medium, electronic device and system based on multiple devices
CN110083392A (en) Audio wakes up method, storage medium, terminal and its bluetooth headset pre-recorded
CN115331672B (en) Device control method, device, electronic device and storage medium
US11562741B2 (en) Electronic device and controlling method using non-speech audio signal in the electronic device
US11211910B1 (en) Audio gain selection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant