CN113963699A - Intelligent voice interaction method for financial equipment - Google Patents
- Publication number: CN113963699A
- Application number: CN202111283365.6A
- Authority
- CN
- China
- Prior art keywords
- loudness
- signal
- audio signal
- equipment
- sound
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/225—Feedback of the input speech
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
The invention discloses an intelligent voice interaction method for financial equipment, which comprises the following steps. Signal acquisition and separation: an audio signal is collected and separated into a voice signal and a noise signal by a separation algorithm. Speech signal synthesis: voice recognition and semantic understanding are performed on the voice signal, the best answer text is found, and the answer text is synthesized into an answer voice signal. The playback audio signal is determined according to equation 1, wherein f(n) is the playback audio signal; s3(n) is the sound signal the user is predicted to hear, identical to the answer voice signal in every parameter except amplitude; d1(n) is the noise signal; and n is the number of samples used for discrete analysis of the audio signal. The playback sound loudness is determined as the sum of the basic sound loudness and the loudness attenuation, and the equipment is set according to this loudness, realizing volume adjustment.
Description
Technical Field
The invention relates to the technical field of financial self-service terminals, in particular to an intelligent voice interaction method for financial equipment.
Background
Intelligent voice interaction is a new-generation interaction mode based on voice input: the user speaks and receives a spoken result in return. The biggest problem with voice interaction is that it is not accurate enough. First, environmental influences lower the accuracy of speech recognition; second, an intention can be expressed in many different ways, and no system covers them all; finally, voice interaction is an open domain, so many unexpected situations must be handled. There are also scenarios unsuitable for voice interaction, such as meetings or times when family members are asleep.
With the wide application of financial self-service equipment and customer-service robots, a shortcoming of existing equipment has become apparent: its volume is constant throughout an interaction, and in a complex environment ambient sound can impair what the user hears, which to some extent reduces customer satisfaction.
Disclosure of Invention
The invention aims to provide an intelligent voice interaction method for financial equipment that addresses the technical defect of constant playback volume in the prior art.
The technical scheme adopted for realizing the purpose of the invention is as follows:
an intelligent voice interaction method for financial equipment is characterized by comprising the following steps:
(1) acquisition of a playing audio signal:
signal acquisition and separation: collecting audio signals, and separating the audio signals into voice signals and noise signals by adopting a separation algorithm;
synthesis of a speech signal: carrying out voice recognition on the voice signals, carrying out semantic understanding, finding out the best answer text, and synthesizing the answer text into answer voice signals;
determining the playback audio signal according to formula 1;
wherein f(n) is the playback audio signal; s3(n) is the sound signal the user is predicted to hear, identical to the answer voice signal in every parameter except amplitude; d1(n) is the noise signal; n is the number of samples used for discrete analysis of the audio signal; and m is an integer from 0 to n;
according to the noise signal d1(n) and the predicted heard signal s3(n), the playback audio signal f(n) is obtained through deconvolution; superposing the noise signal in this way achieves a noise reduction function;
(2) acquisition of loudness of played sound
Determining the loudness of the played sound as the sum of the loudness of the basic sound and the loudness attenuation quantity;
(3) the content played by the equipment is determined by the playback audio signal, and the volume at which the equipment plays is determined by the playback sound loudness, realizing intelligent voice interaction.
Preferably, the audio signal is separated by an ICA blind source separation algorithm.
Preferably, the step of determining the distance r from the speaker to the user is as follows:
judging through an infrared sensor whether a living body is present in front of the equipment, and if so, measuring the distance between the user and the equipment through an ultrasonic sensor;
collecting audio signals through a microphone array to obtain a relative angle between a user and equipment;
and obtaining the distance r from the loudspeaker to the user according to the distance from the ultrasonic sensor to the user, the relative angle between the user and the equipment and the relative distances among the ultrasonic sensor, the microphone array and the loudspeaker.
Preferably, the equipment starts audio signal acquisition after being awakened; the audio signal is framed, a pause is judged to have occurred when the silence duration exceeds a set time threshold, and the audio signal is then separated; the equipment is awakened either by a wake word or by an infrared trigger.
Preferably, after the equipment is awakened, a first audio signal is acquired, and the noise signal obtained by separating it is used each time the playback audio signal is determined within that voice interaction; within one voice interaction, when the user's position is detected to change by more than a set distance threshold, or the ambient noise loudness exceeds a set loudness threshold, the newly acquired audio signal is separated and a fresh noise signal is obtained, to be used the next time the playback audio signal is determined in that interaction.
Preferably, the basic sound loudness is a fixed known value, and the loudness attenuation is calculated as follows:
Where r is the distance from the loudspeaker to the user.
Preferably, whenever the user's position is detected to change by more than the set distance threshold, the loudness attenuation is recalculated and the equipment is set to the new playback sound loudness, realizing real-time volume adjustment.
Preferably, the maximum value of the playback sound loudness is twice the basic sound loudness.
The invention has the beneficial effects that:
1. the invention provides a method for automatically adjusting the playing volume of equipment with different noises and different user positions, which improves the satisfaction degree of a client in the voice exchange process of intelligent equipment.
2. The audio signal collected each time is separated into a voice signal and a noise signal; since every collected audio signal differs, the voice and noise signals differ as well. This achieves the purpose of adjusting the audio signal separately for different users, so that every user hears the most comfortable, most suitable sound.
3. When the customer is not communicating with the equipment, the signal measured directly by the equipment's microphone array is the noise signal. When the customer communicates with the equipment, the microphone array captures an audio signal mixed with noise: y1(n) = s1(n) + d1(n), where y1(n) is the captured audio signal, s1(n) is the speech signal, and d1(n) is the noise signal. The mixed audio signal is first denoised: an ICA blind source separation algorithm separates it into the voice signal s1(n) and the noise signal d1(n), and this noise reduction improves the accuracy of converting the voice signal into text information.
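The capture model y1(n) = s1(n) + d1(n) and the ICA separation step can be sketched as follows. This is a minimal illustration using scikit-learn's FastICA on a synthetic two-microphone mixture; the tone, noise level, and mixing matrix are invented stand-ins, not values from the patent:

```python
import numpy as np
from sklearn.decomposition import FastICA

# Synthetic stand-ins for s1(n) (speech) and d1(n) (noise).
rng = np.random.default_rng(0)
n = 8000
t = np.arange(n) / 8000.0
speech = np.sin(2 * np.pi * 220 * t)      # stand-in for s1(n)
noise = 0.5 * rng.standard_normal(n)      # stand-in for d1(n)
sources = np.c_[speech, noise]            # (n_samples, 2)

# Two microphones pick up different linear mixtures: y(n) = A s(n).
mixing = np.array([[1.0, 0.6],
                   [0.4, 1.0]])
mixtures = sources @ mixing.T             # (n_samples, 2)

# ICA blind source separation recovers the independent components,
# up to permutation and scaling.
ica = FastICA(n_components=2, random_state=0, whiten="unit-variance")
estimated = ica.fit_transform(mixtures)   # (n_samples, 2)
```

Because ICA returns components in arbitrary order and scale, a real system would still need to decide which recovered component is the speech, e.g. by spectral shape or energy.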
Drawings
FIG. 1 is a schematic flow diagram of the present invention.
Detailed Description
The present invention will be described in further detail with reference to specific examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
An intelligent voice interaction method for financial equipment comprises the following steps:
(1) acquisition of a playing audio signal:
signal acquisition and separation: collecting audio signals, and separating the audio signals into voice signals and noise signals by adopting a separation algorithm;
synthesis of a speech signal: carrying out voice recognition on the voice signals, carrying out semantic understanding, finding out the best answer text, and synthesizing the answer text into answer voice signals;
the audio signal is a regular sound wave frequency and amplitude variation information carrier with voice, music and sound effects. The answer speech signal synthesized by the device from the answer text is also an audio signal, also called a sound wave, which has three important parameters: frequency, amplitude and phase, which also determine the characteristics of the audio signal. In the prior art, a device directly synthesizes answer texts into answer voice signals, and the answer voice signals are played through a loudspeaker, wherein the frequency, the amplitude and the phase of the answer voice signals are fixed and are values initially set by the device, so that the sound waves are the same no matter what environment a user is on site, and in different noise environments, although the audio signals played by the device are the same, the audio signals heard by the user are different. In signal processing, useful call signals and useless call noise are utilized, and the noise signals are also utilized by the application.
In view of this, in the technical scheme of the invention, the collected and separated noise signal is used as a known quantity in the calculation. The words uttered by the equipment are produced by speech synthesis, giving a synthesized voice signal s2(n). The predicted signal s3(n) has the same frequency and phase angle as s2(n); its amplitude is calculated from formula 2 and the basic loudness, yielding s3(n), the sound signal the user is predicted to hear. Since s3(n) and d1(n) in formula 1 are known quantities, f(n) can be determined by deconvolution with the noise signal superposed. Amplitude is related to loudness, which is obtained below in the acquisition of the playback sound loudness.
The noise signal is the known quantity used in the formula 1 convolution for the current interaction; when the equipment is awakened again, the noise signal is refreshed, ensuring that the noise signal used in each interaction is obtained from the actual conditions at that time.
Determining the playback audio signal according to equation 1:
where f(n) is the playback audio signal; s3(n) is the sound signal the user is predicted to hear, identical to the answer voice signal in every parameter except amplitude; d1(n) is the noise signal; n is the number of samples used for discrete analysis of the audio signal; and m is an integer from 0 to n.
The conventional method finds the best answer text, synthesizes it into a voice signal, and plays it directly to the user. In this scheme, after the answer text is synthesized into an answer voice signal, noise reduction is applied and f(n) is solved in reverse from formula 1, achieving a noise reduction effect, as explained above.
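Formula 1 itself appears only as an image in the original, so the sketch below is a guess at its shape from the surrounding text: the description says f(n) is solved in reverse so that, once the ambient noise d1(n) superposes onto the played signal, the user hears s3(n). The simplest time-domain reading of that is f(n) = s3(n) − d1(n). The tone, noise level, 0.8 target amplitude, and signal lengths are all hypothetical:

```python
import numpy as np

def predicted_heard_signal(s2, target_amplitude):
    """Scale the synthesized answer s2(n) to the target amplitude derived
    from the playback loudness (formula 2 in the patent, not reproduced
    here), keeping frequency and phase unchanged -> s3(n)."""
    peak = np.abs(s2).max()
    return s2 * (target_amplitude / peak) if peak > 0 else s2

def playback_signal(s3, d1):
    """One possible reading of formula 1: emit f(n) = s3(n) - d1(n), so
    that the noise d1(n), superposed on f(n) in the air, leaves the user
    hearing approximately s3(n)."""
    return s3 - d1

rng = np.random.default_rng(1)
n = 1000
t = np.arange(n) / 8000.0
s2 = np.sin(2 * np.pi * 440 * t)        # synthesized answer voice s2(n)
d1 = 0.05 * rng.standard_normal(n)      # separated noise signal d1(n)
s3 = predicted_heard_signal(s2, 0.8)    # predicted heard signal s3(n)
f = playback_signal(s3, d1)

# Sanity check: noise superposed on f(n) reconstructs s3(n).
assert np.allclose(f + d1, s3)
```

This subtraction reading assumes the noise field at the user's ear matches the measured d1(n), which real acoustics would only approximate.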
(2) Acquisition of loudness of played sound
And determining the loudness of the playing sound as the sum of the loudness of the basic sound and the loudness attenuation amount.
Basic sound loudness is the loudness at which a user hears sound comfortably, and it is a fixed known value. Sound pressure is measured in decibels: about 1 dB is at the threshold of human hearing; below 20 dB is generally considered quiet, and below 15 dB dead silence; about 20-40 dB corresponds to a whisper; and 40-60 dB corresponds to normal conversational speech. Since financial equipment is generally installed in a bank hall, the loudness at which the user hears sound comfortably is set to 50 dB.
The loudness attenuation is calculated as follows:
Where r is the distance from the loudspeaker to the user.
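The attenuation formula is likewise an image not reproduced in this text. The sketch below therefore uses a standard free-field spreading-loss model, 20·log10(r/r_ref), purely as a stand-in assumption (the patent does not confirm it), combined with the 50 dB base loudness stated above and the twice-base cap stated in the description:

```python
import math

BASE_LOUDNESS_DB = 50.0                   # comfortable base loudness (description)
MAX_LOUDNESS_DB = 2.0 * BASE_LOUDNESS_DB  # cap: twice the base loudness
REF_DISTANCE_M = 1.0                      # hypothetical reference distance

def playback_loudness(r):
    """Playback loudness = base loudness + distance-dependent
    attenuation, capped at twice the base loudness.  The attenuation
    term here is free-field spreading loss, an assumed stand-in for
    the patent's own (unreproduced) formula."""
    attenuation = 20.0 * math.log10(max(r, REF_DISTANCE_M) / REF_DISTANCE_M)
    return min(BASE_LOUDNESS_DB + attenuation, MAX_LOUDNESS_DB)
```

Under this model a user at the reference distance gets the base 50 dB, a user ten times farther gets 70 dB, and very distant users are held at the 100 dB cap.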
(3) The content played by the equipment is determined by the playback audio signal, and the volume at which the equipment plays is determined by the playback sound loudness, realizing intelligent voice interaction.
In this embodiment, an FFT noise reduction algorithm and an ICA blind source separation algorithm are adopted to separate the audio signals.
The distance r from the loudspeaker to the user is determined as follows:
collecting audio signals through a microphone array to obtain a relative angle between a user and equipment;
judging through an infrared sensor whether a living body is present in front of the equipment, and if so, measuring the distance between the user and the equipment through an ultrasonic sensor;
and obtaining the distance r from the loudspeaker to the user from the distance between the ultrasonic sensor and the user, the relative angle between the user and the equipment, and the relative distances among the sensor, the microphone array, and the loudspeaker. The relative distance from the sensor to the loudspeaker is a fixed known value.
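The geometry above can be sketched with the law of cosines: given the ultrasonic sensor's measured distance d to the user, the user's bearing from the microphone array, and the fixed sensor-to-loudspeaker offset, the loudspeaker-to-user distance follows. The 0.30 m offset is a hypothetical placement, and the sensor and microphone array are assumed roughly co-located so the measured angle applies at the sensor:

```python
import math

SENSOR_TO_SPEAKER_M = 0.30  # hypothetical fixed offset inside the device

def speaker_to_user_distance(d_sensor, angle_deg, offset=SENSOR_TO_SPEAKER_M):
    """Law of cosines: r^2 = d^2 + L^2 - 2*d*L*cos(theta), where d is the
    sensor-to-user distance, L the sensor-to-loudspeaker offset, and
    theta the angle between the user direction and the sensor-to-
    loudspeaker baseline."""
    theta = math.radians(angle_deg)
    return math.sqrt(d_sensor ** 2 + offset ** 2
                     - 2.0 * d_sensor * offset * math.cos(theta))
```

For a user 1 m away directly along the baseline (angle 0°), r collapses to the simple difference 1.0 − 0.3 = 0.7 m, a quick consistency check on the formula.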
After the equipment is awakened, audio signal acquisition begins; the audio signal is framed, a pause is judged to have occurred when the silence duration exceeds a set time threshold, and the audio signal is then separated. The equipment is awakened either by a wake word or by an infrared trigger.
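The framing-and-pause step can be sketched with a simple short-time-energy endpoint detector. The frame length, hop, energy threshold, and ~300 ms pause threshold below are hypothetical choices, not values from the patent:

```python
import numpy as np

FRAME_LEN = 400       # 25 ms at 16 kHz (hypothetical)
HOP = 160             # 10 ms hop (hypothetical)
ENERGY_THRESH = 1e-3  # hypothetical silence-energy threshold
PAUSE_FRAMES = 30     # ~300 ms of silence ends the utterance

def frames(signal):
    """Split the signal into overlapping frames of FRAME_LEN samples."""
    n = 1 + max(0, (len(signal) - FRAME_LEN)) // HOP
    return np.stack([signal[i * HOP:i * HOP + FRAME_LEN] for i in range(n)])

def utterance_end(signal):
    """Return the index of the frame where a pause longer than the
    threshold begins, or None if no such pause occurs."""
    energy = (frames(signal) ** 2).mean(axis=1)
    run = 0
    for i, e in enumerate(energy):
        run = run + 1 if e < ENERGY_THRESH else 0
        if run >= PAUSE_FRAMES:
            return i - PAUSE_FRAMES + 1
    return None

# Demo: one second of tone followed by one second of silence at 16 kHz.
sig = np.concatenate([np.sin(2 * np.pi * 300 * np.arange(16000) / 16000.0),
                      np.zeros(16000)])
end = utterance_end(sig)
```

Once the pause frame is found, everything before it would be handed to the separation algorithm as the utterance.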
Generally, after the equipment is awakened, a first audio signal is acquired, and the noise signal obtained by separating it is used each time the playback audio signal is determined within that voice interaction.
In practical application, however, the user's environment can change, producing a different noise signal. The noise signal must then be determined anew, after which the clean playback audio signal is obtained by an ordinary subtraction method, ensuring that the audio signal the user hears is clean. Therefore, within one voice interaction, when the user's position is detected to change by more than the set distance threshold, or the ambient noise loudness exceeds the set loudness threshold, the newly acquired audio signal is separated and a fresh noise signal is obtained, to be used the next time the playback audio signal is determined in that interaction.
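The "ordinary subtraction method" is not named precisely in the text; the sketch below assumes magnitude-domain spectral subtraction over a single whole-signal FFT — subtract the freshly estimated noise magnitude spectrum from the mixture's spectrum, floor negative magnitudes at zero, and keep the mixture's phase. The sine frequency, noise level, and single-FFT simplification are all invented for illustration:

```python
import numpy as np

def spectral_subtract(mixed, noise):
    """Subtract the estimated noise magnitude spectrum from the mixed
    signal's spectrum, keep the mixed signal's phase, floor negative
    magnitudes at zero, and transform back to the time domain."""
    M = np.fft.rfft(mixed)
    N = np.fft.rfft(noise)
    mag = np.maximum(np.abs(M) - np.abs(N), 0.0)
    return np.fft.irfft(mag * np.exp(1j * np.angle(M)), n=len(mixed))

rng = np.random.default_rng(2)
n = 2048
clean = np.sin(2 * np.pi * 64 * np.arange(n) / n)  # tone on an exact FFT bin
noise = 0.3 * rng.standard_normal(n)               # fresh noise estimate
denoised = spectral_subtract(clean + noise, noise)
```

A deployed system would apply this frame by frame with the noise spectrum averaged over noise-only frames, rather than using the exact noise realization as here.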
Furthermore, to ensure that the volume is most appropriate at every moment, whenever the user's position is detected to change by more than the set distance threshold, the loudness attenuation is recalculated and the equipment is set to the new playback sound loudness, realizing real-time volume adjustment.
The maximum value of the loudness of the played sound is set to be twice the loudness of the basic sound.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the invention, and such modifications and improvements also fall within the protection scope of the invention.
Claims (8)
1. An intelligent voice interaction method for financial equipment is characterized by comprising the following steps:
(1) acquisition of a playing audio signal:
signal acquisition and separation: collecting audio signals, and separating the audio signals into voice signals and noise signals by adopting a separation algorithm;
synthesis of a speech signal: carrying out voice recognition on the voice signals, carrying out semantic understanding, finding out the best answer text, and synthesizing the answer text into answer voice signals;
determining the playback audio signal according to equation 1:
Wherein f (n) is a playing audio signal, s3(n) is a sound signal heard by the pre-estimated user, except amplitude, other parameters are the same as the answering voice signal, d1(n) is a noise signal, n is a sampling frequency for performing discrete analysis on the audio signal, and m is 0-n and is an integer;
according to the noise signal d1(n) and the signal s3(n) which is predicted to be heard by the user, the playing audio signal is obtained through deconvolutionThe noise reduction function is achieved by superposing the noise signals;
(2) acquisition of loudness of played sound
Determining the loudness of the played sound as the sum of the loudness of the basic sound and the loudness attenuation quantity;
(3) the content played by the equipment is determined by the playback audio signal, and the volume at which the equipment plays is determined by the playback sound loudness, realizing intelligent voice interaction.
2. The intelligent voice interaction method for financial equipment as claimed in claim 1, wherein the audio signal is separated by ICA blind source separation algorithm.
3. The intelligent voice interaction method for financial equipment as claimed in claim 1, wherein the distance r from the speaker to the user is determined as follows:
judging through an infrared sensor whether a living body is present in front of the equipment, and if so, measuring the distance between the user and the equipment through an ultrasonic sensor;
collecting audio signals through a microphone array to obtain a relative angle between a user and equipment;
and obtaining the distance r from the loudspeaker to the user according to the distance from the ultrasonic sensor to the user, the relative angle between the user and the equipment and the relative distances among the ultrasonic sensor, the microphone array and the loudspeaker.
4. The intelligent voice interaction method for financial equipment as claimed in claim 1, wherein the equipment starts audio signal acquisition after being awakened; the audio signal is framed, a pause is judged to have occurred when the silence duration exceeds a set time threshold, and the audio signal is then separated; and wherein the equipment is awakened either by a wake word or by an infrared trigger.
5. The intelligent voice interaction method for financial equipment as claimed in claim 4,
after the equipment is awakened, a first audio signal is acquired, and the noise signal obtained by separating it is used each time the playback audio signal is determined within that voice interaction; within one voice interaction, when the user's position is detected to change by more than a set distance threshold, or the ambient noise loudness exceeds a set loudness threshold, the newly acquired audio signal is separated and a fresh noise signal is obtained, to be used the next time the playback audio signal is determined in that interaction.
7. The intelligent voice interaction method for financial devices as claimed in claim 6, wherein the loudness attenuation is recalculated whenever a change in the user position is detected to exceed a set distance threshold, and the device is set according to the loudness of a new playing sound, so as to achieve real-time adjustment of the volume.
8. The financial device intelligent voice interaction method of claim 1, wherein the maximum value of the loudness of the played sound is set to twice the loudness of the basic sound.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111283365.6A CN113963699A (en) | 2021-11-01 | 2021-11-01 | Intelligent voice interaction method for financial equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113963699A true CN113963699A (en) | 2022-01-21 |
Family
ID=79468672
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111283365.6A Withdrawn CN113963699A (en) | 2021-11-01 | 2021-11-01 | Intelligent voice interaction method for financial equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113963699A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117294985A (en) * | 2023-10-27 | 2023-12-26 | 深圳市迪斯声学有限公司 | TWS Bluetooth headset control method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WW01 | Invention patent application withdrawn after publication | Application publication date: 20220121 |