CN113963699A - Intelligent voice interaction method for financial equipment - Google Patents


Publication number: CN113963699A
Authority: CN (China)
Prior art keywords: loudness, signal, audio signal, equipment, sound
Legal status: Withdrawn
Application number: CN202111283365.6A
Other languages: Chinese (zh)
Inventors: 田立刚, 张云峰, 张海华, 魏巍, 杨孟超
Assignee: Cashway Technology Co Ltd
Application filed by Cashway Technology Co Ltd
Priority to CN202111283365.6A

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L2015/223: Execution procedure of a spoken command
    • G10L2015/225: Feedback of the input speech


Abstract

The invention discloses an intelligent voice interaction method for financial equipment, which comprises the following steps. Signal acquisition and separation: an audio signal is collected and separated into a voice signal and a noise signal by a separation algorithm. Synthesis of a voice signal: voice recognition and semantic understanding are performed on the voice signal, the best answer text is found, and the answer text is synthesized into an answer voice signal. The playing audio signal is determined according to formula 1, wherein f(n) is the playing audio signal; s3(n) is the sound signal the user is predicted to hear, whose parameters other than amplitude are the same as those of the answer voice signal; d1(n) is the noise signal; and n is the sample index of the discretely analyzed audio signal. The loudness of the played sound is determined as the sum of the basic sound loudness and the loudness attenuation amount, and the equipment is set according to the loudness of the played sound to realize volume adjustment.

Description

Intelligent voice interaction method for financial equipment
Technical Field
The invention relates to the technical field of financial self-service terminals, in particular to an intelligent voice interaction method for financial equipment.
Background
Intelligent voice interaction is a new generation of interaction mode based on voice input: the user obtains a feedback result simply by speaking. The biggest problem of voice interaction is that it is not accurate enough. First, environmental influences lower the accuracy of voice recognition; second, an intention can be expressed in many ways, which cannot all be covered; finally, voice interaction is an open domain, and many unexpected situations need to be handled. There are also scenarios for which voice interaction is considered unsuitable, such as meetings or when family members are sleeping.
With the wide application of financial self-service equipment and customer service robots, a shortcoming has emerged: the volume of existing equipment is constant during the interaction process, and in a complex environment the ambient sound can impair what the user hears, which affects customer satisfaction to a certain extent.
Disclosure of Invention
The invention aims to provide an intelligent voice interaction method for financial equipment that addresses the technical defect of the prior art that the playing sound is constant.
The technical scheme adopted for realizing the purpose of the invention is as follows:
an intelligent voice interaction method for financial equipment is characterized by comprising the following steps:
(1) acquisition of a playing audio signal:
signal acquisition and separation: collecting audio signals, and separating the audio signals into voice signals and noise signals by adopting a separation algorithm;
synthesis of a speech signal: carrying out voice recognition on the voice signals, carrying out semantic understanding, finding out the best answer text, and synthesizing the answer text into answer voice signals;
determining the playing audio signal according to formula 1:

s3(n) = Σ (m = 0 to n) f(m) · d1(n − m)    (formula 1, a discrete convolution)

wherein f(n) is the playing audio signal; s3(n) is the sound signal the user is predicted to hear, whose parameters other than amplitude are the same as those of the answer voice signal; d1(n) is the noise signal; n is the sample index of the discretely analyzed audio signal; and m is an integer from 0 to n;
according to the noise signal d1(n) and the sound signal s3(n) that the user is predicted to hear, the playing audio signal f(n) is obtained by deconvolution; because the environmental noise is superimposed on the played signal to reproduce s3(n), a noise reduction function is achieved;
(2) acquisition of loudness of played sound
Determining the loudness of the played sound as the sum of the loudness of the basic sound and the loudness attenuation quantity;
(3) the information content played by the equipment is determined by playing the audio signal, and the volume played by the equipment is determined by playing the sound loudness, so that intelligent voice interaction is realized.
Preferably, the audio signal is separated by an ICA blind source separation algorithm.
Preferably, the step of determining the distance r from the speaker to the user is as follows:
judging whether a living body is present in front of the equipment through an infrared sensor, and if so, measuring the distance between the user and the equipment through an ultrasonic sensor;
collecting audio signals through a microphone array to obtain a relative angle between a user and equipment;
and obtaining the distance r from the loudspeaker to the user according to the distance from the ultrasonic sensor to the user, the relative angle between the user and the equipment and the relative distances among the ultrasonic sensor, the microphone array and the loudspeaker.
Preferably, the equipment starts audio signal acquisition after being awakened; the audio signal is divided into frames, and when silence lasting longer than a set time threshold is detected, a pause is judged to have occurred and the audio signal is separated; the awakening mode comprises awakening by a wake-up word or awakening triggered by the infrared sensor.
Preferably, after the device is awakened, the device acquires a first audio signal, and the noise signal obtained by separation is the noise signal used when the audio signal is determined to be played each time in the voice interaction; in one voice interaction, when the situation that the position change of a user exceeds a set distance threshold or the noise loudness of a service environment exceeds a set loudness threshold is detected, separating the newly obtained audio signals, and obtaining a noise signal again to serve as the noise signal used when the audio signal is determined to be played next time in the voice interaction.
Preferably, the basic sound loudness is a fixed known value, and the loudness attenuation amount is calculated by formula 2:

[formula 2: equation image not reproduced in the source]

where r is the distance from the loudspeaker to the user.
Preferably, every time the change of the position of the user is detected to exceed the set distance threshold, the loudness attenuation amount is recalculated, the equipment is set according to the loudness of the new playing sound, and the real-time adjustment of the volume is realized.
Preferably, the maximum value of the loudness of the played sound is twice the basic sound loudness.
The invention has the beneficial effects that:
1. The invention provides a method by which equipment automatically adjusts its playing volume for different noise conditions and different user positions, improving customer satisfaction during voice exchanges with intelligent equipment.
2. The audio signal collected each time is separated into a voice signal and a noise signal; since each collected audio signal differs, the voice and noise signals differ as well. This realizes per-user adjustment of the audio signal, so that every user hears the most comfortable, most suitable sound.
3. When the customer is not communicating with the device, the signal measured directly by the device's microphone array is a noise signal. When the customer communicates with the device, the microphone array collects an audio signal mixed with noise: y1(n) = s1(n) + d1(n), where y1(n) is the collected audio signal, s1(n) is the voice signal, and d1(n) is the noise signal. The mixed audio signal is first denoised: the ICA blind source separation algorithm separates it into the voice signal s1(n) and the noise signal d1(n), and this noise reduction improves the accuracy of converting the voice signal into text information.
Drawings
FIG. 1 is a schematic flow diagram of the present invention.
Detailed Description
The present invention will be described in further detail with reference to specific examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
An intelligent voice interaction method for financial equipment comprises the following steps:
(1) acquisition of a playing audio signal:
signal acquisition and separation: collecting audio signals, and separating the audio signals into voice signals and noise signals by adopting a separation algorithm;
synthesis of a speech signal: carrying out voice recognition on the voice signals, carrying out semantic understanding, finding out the best answer text, and synthesizing the answer text into answer voice signals;
the audio signal is a regular sound wave frequency and amplitude variation information carrier with voice, music and sound effects. The answer speech signal synthesized by the device from the answer text is also an audio signal, also called a sound wave, which has three important parameters: frequency, amplitude and phase, which also determine the characteristics of the audio signal. In the prior art, a device directly synthesizes answer texts into answer voice signals, and the answer voice signals are played through a loudspeaker, wherein the frequency, the amplitude and the phase of the answer voice signals are fixed and are values initially set by the device, so that the sound waves are the same no matter what environment a user is on site, and in different noise environments, although the audio signals played by the device are the same, the audio signals heard by the user are different. In signal processing, useful call signals and useless call noise are utilized, and the noise signals are also utilized by the application.
In view of this, in the technical scheme designed by the invention, the collected and separated noise signal is used as a known quantity in the calculation. The words to be spoken by the equipment are produced by speech synthesis, giving the synthesized voice signal s2(n). The signal s3(n) has the same frequency and phase as s2(n), and its amplitude is calculated from formula 2 and the basic loudness; s3(n) is the sound signal the user is predicted to hear. In formula 1, s3(n) and d1(n) are known quantities, so f(n) can be determined by deconvolution; the superposition of the noise on f(n) then reproduces s3(n). The amplitude is related to loudness, which is obtained below in the acquisition of the loudness of the played sound.
The noise signal is the known quantity used in the formula 1 convolution for the current interaction; when the equipment is awakened again, the noise signal is collected anew, ensuring that the noise signal used as the known quantity reflects the actual conditions at that time.
Determining the playback audio signal according to equation 1:
s3(n) = Σ (m = 0 to n) f(m) · d1(n − m)    (formula 1)

Where f(n) is the playing audio signal; s3(n) is the sound signal the user is predicted to hear, whose parameters other than amplitude are the same as those of the answer voice signal; d1(n) is the noise signal; n is the sample index of the discretely analyzed audio signal; and m is an integer from 0 to n.
The conventional method finds the best answer text, synthesizes it into a speech signal, and has the device play it to the user directly. In this scheme, after the answer text is synthesized into the answer voice signal, formula 1 is solved in reverse for f(n), which achieves the noise reduction effect. The expansion of formula 1 is detailed below:

s3(0) = f(0) · d1(0)
s3(1) = f(0) · d1(1) + f(1) · d1(0)
s3(2) = f(0) · d1(2) + f(1) · d1(1) + f(2) · d1(0)

By parity of reasoning, we obtain

f(n) = [s3(n) − Σ (m = 0 to n−1) f(m) · d1(n − m)] / d1(0)
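Treating formula 1 as the discrete convolution s3(n) = Σ f(m)·d1(n−m), the back-substitution above can be sketched in Python with NumPy. This is an illustrative reconstruction, not the patent's own code; the function name and the test values are invented for the example.

```python
import numpy as np

def deconvolve_playback(s3, d1):
    """Solve s3(n) = sum_{m=0}^{n} f(m) * d1(n - m) for f by
    back-substitution: f(0) = s3(0)/d1(0), and each later f(n)
    subtracts the already-known terms of the convolution sum.
    Requires d1[0] != 0."""
    f = np.zeros(len(s3))
    for n in range(len(s3)):
        known = sum(f[m] * d1[n - m] for m in range(n))
        f[n] = (s3[n] - known) / d1[0]
    return f

# Round trip: convolve a known playing signal with the noise,
# then recover it from the first len(f_true) output samples.
f_true = np.array([1.0, 2.0, -1.0, 0.5])
d1 = np.array([1.0, 0.3, -0.2, 0.1])
s3 = np.convolve(f_true, d1)[:4]
print(deconvolve_playback(s3, d1))  # recovers [1.0, 2.0, -1.0, 0.5]
```

Note that this direct recursion amplifies error when d1(0) is small, which is one reason practical systems prefer frequency-domain deconvolution.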
(2) Acquisition of loudness of played sound
And determining the loudness of the playing sound as the sum of the loudness of the basic sound and the loudness attenuation amount.
The basic sound loudness is the loudness at which the user hears the sound comfortably, and is a fixed known value. Sound pressure is measured in decibels: 1 dB is about the faintest sound the human ear can detect; below 20 dB is generally considered quiet, and below 15 dB nearly silent; 20-40 dB is roughly the level of a whisper; 40-60 dB corresponds to normal conversational speech. Since financial equipment is generally installed in a bank hall, the loudness at which the user hears the sound comfortably is set to 50 dB.
The loudness attenuation is calculated as follows:
[formula 2: equation image not reproduced in the source]

Where r is the distance from the loudspeaker to the user.
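Since the image of formula 2 is not reproduced, the following sketch assumes a standard free-field spherical-spreading attenuation of 20·log10(r/r0) dB purely for illustration; the reference distance r0 and the function name are invented, not from the patent. The played loudness is the 50 dB basic loudness plus this attenuation, capped at twice the basic loudness as the text specifies.

```python
import math

BASE_LOUDNESS_DB = 50.0      # comfortable basic loudness for a bank hall (from the text)
REFERENCE_DISTANCE_M = 1.0   # hypothetical reference distance r0 (not given in the source)

def playback_loudness(r, base_db=BASE_LOUDNESS_DB, r0=REFERENCE_DISTANCE_M):
    """Played loudness = basic loudness + loudness attenuation amount.
    The attenuation model 20*log10(r/r0) is an assumed stand-in for the
    patent's formula 2; the result is capped at twice the basic loudness."""
    attenuation = 20.0 * math.log10(max(r, r0) / r0)
    return min(base_db + attenuation, 2.0 * base_db)

print(playback_loudness(1.0))   # 50.0 at the reference distance
print(playback_loudness(10.0))  # 70.0 ten times farther away
```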
(3) The information content played by the equipment is determined by playing the audio signal, and the volume played by the equipment is determined by playing the sound loudness, so that intelligent voice interaction is realized.
In this embodiment, an FFT noise reduction algorithm and an ICA blind source separation algorithm are adopted to separate the audio signals.
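ICA blind source separation needs multiple microphone channels, but the single-channel FFT noise reduction mentioned here can be sketched as magnitude spectral subtraction. The frame length and function name are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def spectral_subtraction(y, noise_est, n_fft=256):
    """FFT-based noise reduction: subtract the estimated noise magnitude
    spectrum from each frame of the noisy signal y, keep the noisy phase,
    and resynthesize. Negative magnitudes are floored at zero."""
    noise_mag = np.abs(np.fft.rfft(noise_est[:n_fft]))
    out = np.zeros(len(y))
    for start in range(0, len(y) - n_fft + 1, n_fft):
        spec = np.fft.rfft(y[start:start + n_fft])
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
        out[start:start + n_fft] = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=n_fft)
    return out

# If the input is exactly the estimated noise, subtraction removes it entirely.
noise = np.cos(np.arange(256) * 0.37)
cleaned = spectral_subtraction(np.tile(noise, 4), noise)
print(np.max(np.abs(cleaned)))  # ~0.0
```

Real deployments use overlapping windows and a smoothed noise estimate; this non-overlapping version keeps the idea visible in a few lines.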
The distance r from the loudspeaker to the user is determined as follows:
collecting audio signals through a microphone array to obtain a relative angle between a user and equipment;
judging whether a living body is present in front of the equipment through an infrared sensor, and if so, measuring the distance between the user and the equipment through an ultrasonic sensor;
And obtaining the distance r from the loudspeaker to the user according to the distance from the ultrasonic sensor to the user, the relative angle between the user and the equipment, and the relative positions of the ultrasonic sensor, the microphone array and the loudspeaker. The relative distance from the sensor to the loudspeaker is a fixed known value.
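The geometry can be sketched as follows, assuming a planar layout with the ultrasonic sensor at the origin and the device front along the y axis; this coordinate convention and the function name are assumptions for illustration, since the text only states which quantities r is derived from.

```python
import math

def speaker_to_user_distance(d_ultra, angle_deg, speaker_offset):
    """Estimate the loudspeaker-to-user distance r from:
    d_ultra        -- ultrasonic-sensor-to-user distance,
    angle_deg      -- bearing of the user from the microphone array
                      (0 degrees = straight ahead of the device),
    speaker_offset -- fixed, known (x, y) mounting offset of the
                      loudspeaker relative to the ultrasonic sensor."""
    theta = math.radians(angle_deg)
    # User position in the sensor's coordinate frame.
    ux, uy = d_ultra * math.sin(theta), d_ultra * math.cos(theta)
    sx, sy = speaker_offset
    return math.hypot(ux - sx, uy - sy)

# User 2 m straight ahead; loudspeaker mounted 0.5 m in front of the sensor.
print(speaker_to_user_distance(2.0, 0.0, (0.0, 0.5)))  # 1.5
```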
After the equipment is awakened, audio signal acquisition is started; the audio signal is divided into frames, and when silence lasting longer than a set time threshold is detected, a pause is judged to have occurred and the audio signal is separated. The awakening mode comprises awakening by a wake-up word or awakening triggered by the infrared sensor.
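The framing and pause test can be sketched with a simple frame-energy rule. The 20 ms frame, energy threshold, and 500 ms pause threshold are illustrative values; the text only specifies that a pause is declared when silence exceeds a set time threshold.

```python
import numpy as np

def detect_pause(samples, rate, frame_ms=20, energy_thresh=1e-4, pause_ms=500):
    """Split the signal into frames and report whether the trailing
    run of low-energy frames exceeds the pause threshold, which is
    the cue to stop capturing and hand the buffer to separation."""
    frame_len = int(rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    energies = [float(np.mean(samples[i * frame_len:(i + 1) * frame_len] ** 2))
                for i in range(n_frames)]
    trailing = 0  # consecutive silent frames at the end of the buffer
    for e in reversed(energies):
        if e >= energy_thresh:
            break
        trailing += 1
    return trailing * frame_ms >= pause_ms

speech = np.full(16000, 0.1)   # 1 s of "speech" at 16 kHz
silence = np.zeros(16000)      # followed by 1 s of silence
print(detect_pause(np.concatenate([speech, silence]), 16000))  # True
```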
Generally, after the device is awakened, a first audio signal is acquired, and the noise signal obtained by separation is the noise signal used when the audio signal is determined to be played each time in the voice interaction.
However, in practical applications the environment around the user can change, which changes the noise signal. The noise signal therefore needs to be determined again, after which the noise-free audio signal to be played back is obtained by the usual subtraction method, ensuring that the audio signal the person hears is clean. Accordingly, in one voice interaction, when the change in the user's position exceeds the set distance threshold or the noise loudness of the service environment exceeds the set loudness threshold, the newly collected audio signal is separated and a noise signal is obtained anew, to serve as the noise signal used the next time the playing audio signal is determined in this voice interaction.
Furthermore, in order to ensure that the volume at each moment is most appropriate, every time the change of the position of the user is detected to exceed the set distance threshold, the loudness attenuation amount is recalculated, and the device is set according to the loudness of the new playing sound, so that the volume adjustment is realized.
The maximum value of the loudness of the played sound is set to be twice the loudness of the basic sound.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims (8)

1. An intelligent voice interaction method for financial equipment is characterized by comprising the following steps:
(1) acquisition of a playing audio signal:
signal acquisition and separation: collecting audio signals, and separating the audio signals into voice signals and noise signals by adopting a separation algorithm;
synthesis of a speech signal: carrying out voice recognition on the voice signals, carrying out semantic understanding, finding out the best answer text, and synthesizing the answer text into answer voice signals;
determining the playback audio signal according to equation 1:
s3(n) = Σ (m = 0 to n) f(m) · d1(n − m)    (formula 1, a discrete convolution)

wherein f(n) is the playing audio signal; s3(n) is the sound signal the user is predicted to hear, whose parameters other than amplitude are the same as those of the answer voice signal; d1(n) is the noise signal; n is the sample index of the discretely analyzed audio signal; and m is an integer from 0 to n;
according to the noise signal d1(n) and the sound signal s3(n) that the user is predicted to hear, the playing audio signal f(n) is obtained by deconvolution; because the environmental noise is superimposed on the played signal to reproduce s3(n), a noise reduction function is achieved;
(2) acquisition of loudness of played sound
Determining the loudness of the played sound as the sum of the loudness of the basic sound and the loudness attenuation quantity;
(3) the information content played by the equipment is determined by playing the audio signal, and the volume played by the equipment is determined by playing the sound loudness, so that intelligent voice interaction is realized.
2. The intelligent voice interaction method for financial equipment as claimed in claim 1, wherein the audio signal is separated by ICA blind source separation algorithm.
3. The intelligent voice interaction method for financial equipment as claimed in claim 1, wherein the distance r from the speaker to the user is determined as follows:
judging whether a living body is present in front of the equipment through an infrared sensor, and if so, measuring the distance between the user and the equipment through an ultrasonic sensor;
collecting audio signals through a microphone array to obtain a relative angle between a user and equipment;
and obtaining the distance r from the loudspeaker to the user according to the distance from the ultrasonic sensor to the user, the relative angle between the user and the equipment and the relative distances among the ultrasonic sensor, the microphone array and the loudspeaker.
4. The intelligent voice interaction method for financial equipment as claimed in claim 1, wherein the equipment starts audio signal acquisition after being awakened; the audio signal is divided into frames, and when silence lasting longer than a set time threshold is detected, a pause is judged to have occurred and the audio signal is separated; the awakening mode comprises awakening by a wake-up word or awakening triggered by the infrared sensor.
5. The intelligent voice interaction method for financial equipment as claimed in claim 4,
after the equipment is awakened, acquiring a first audio signal, wherein a noise signal obtained by separation is a noise signal used when the audio signal is determined to be played each time in the voice interaction; in one voice interaction, when the situation that the position change of a user exceeds a set distance threshold or the noise loudness of a service environment exceeds a set loudness threshold is detected, separating the newly obtained audio signals, and obtaining a noise signal again to serve as the noise signal used when the audio signal is determined to be played next time in the voice interaction.
6. The intelligent voice interaction method for financial equipment according to claim 1, wherein the loudness of the fundamental sound is a fixed known value, and the loudness attenuation is calculated as follows:
[formula 2: equation image not reproduced in the source]

where r is the distance from the loudspeaker to the user.
7. The intelligent voice interaction method for financial devices as claimed in claim 6, wherein the loudness attenuation is recalculated whenever a change in the user position is detected to exceed a set distance threshold, and the device is set according to the loudness of a new playing sound, so as to achieve real-time adjustment of the volume.
8. The financial device intelligent voice interaction method of claim 1, wherein the maximum value of the loudness of the played sound is set to twice the loudness of the basic sound.
CN202111283365.6A 2021-11-01 2021-11-01 Intelligent voice interaction method for financial equipment Withdrawn CN113963699A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111283365.6A CN113963699A (en) 2021-11-01 2021-11-01 Intelligent voice interaction method for financial equipment


Publications (1)

Publication Number Publication Date
CN113963699A true CN113963699A (en) 2022-01-21

Family

ID=79468672

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111283365.6A Withdrawn CN113963699A (en) 2021-11-01 2021-11-01 Intelligent voice interaction method for financial equipment

Country Status (1)

Country Link
CN (1) CN113963699A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117294985A (en) * 2023-10-27 2023-12-26 深圳市迪斯声学有限公司 TWS Bluetooth headset control method



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220121