CN112185393A - Voice recognition processing method for power supply intelligent client - Google Patents
- Publication number
- CN112185393A (application CN202011059062.1A)
- Authority
- CN
- China
- Prior art keywords
- voice
- voice signal
- mode
- voice recognition
- client
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L25/87—Detection of discrete points within a voice signal
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/50—Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers ; Centralised arrangements for recording messages
- H04M3/51—Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing
Abstract
The invention provides a voice recognition processing method for a power supply intelligent client. A client's voice signal is answered through an intelligent power supply seat; the voice signal is preprocessed; feature extraction is performed on the preprocessed signal to obtain its voice features; and the extracted features are recognized through a pre-constructed voice recognition model to obtain and output a voice recognition result in text format. By implementing the invention, effective preprocessing of the voice signal improves recognition accuracy; the intelligent interrupt processing mechanism raises the degree of intelligence and enables intelligent interrupt responses, improving the client's experience; and training the voice recognition model further improves recognition accuracy.
Description
Technical Field
The invention relates to the technical field of power supply intelligent clients, in particular to a voice recognition processing method for a power supply intelligent client.
Background
For customer service work, intelligent voice is one of the trends of future development. Although many power supply enterprises are actively building intelligent customer service systems, most existing voice navigation systems still have shortcomings: a low degree of intelligence, limited voice recognition performance, complex service flows, poor integrity, and poor serviceability.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a voice recognition processing method for a power supply intelligent client that improves the voice recognition effect.
To solve the above technical problem, one aspect of the present invention provides a voice recognition processing method for a power supply intelligent client, which comprises the following steps:
step S10, answering the voice signal of the client through the intelligent power supply seat;
step S11, preprocessing the voice signal of the client, wherein the preprocessing comprises endpoint detection, noise elimination and intelligent interruption processing;
step S12, extracting the characteristics of the preprocessed voice signal to obtain the voice characteristics in the voice signal;
and step S13, recognizing the extracted voice features through a pre-constructed voice recognition model, obtaining a voice recognition result in a text format, and outputting the voice recognition result.
Preferably, the step S11 further includes:
step S110, converting the voice data into a data format suitable for acoustic model processing; the format is WAV or PCM audio, mono, 16-bit, at a 16000 Hz sampling rate;
step S111, identifying end points in the voice signal to determine the speaking start point and the speaking end point of the user;
step S112, eliminating background noise, carrying out scene analysis, firstly carrying out scene type judgment on the voice signal, and after determining the scene type, carrying out noise elimination according to the preset noise elimination modes corresponding to different scenes;
and step S113, responding to the latest voice signal of the user in real time, and interrupting the current task or continuing the current task according to the set interrupt processing mode.
Preferably, the step S111 further includes:
when a received voice signal triggers voice recognition, judging from the user's login information whether that user's speaking habits are stored; if so, adjusting the detection parameters according to those habits and then performing endpoint detection; if not, performing endpoint detection with the general-purpose detection parameters.
Preferably, in the step S111, the endpoint in the speech signal is identified by a time domain feature method and a frequency domain feature method.
Preferably, the step S113 specifically includes:
presetting different interrupt processing mode types for different users, wherein the interrupt processing mode types comprise: an interruptible mode and a non-interruptible mode;
when voice recognition is carried out in real time, if a new voice signal is received, judging the type of the current interrupt processing mode;
if the current mode is the interruptible mode, interrupting the response of the current voice recognition when a new voice instruction is received;
and if the current mode is the non-interruptible mode, after the user triggers the voice recognition processing, a new voice signal instruction is accepted only once execution of the current instruction has finished.
Preferably, the step S12 further includes:
and performing feature extraction on the preprocessed voice signal by adopting a linear predictive coding technology to obtain the voice feature in the voice signal.
Preferably, further comprising: the method comprises the steps of pre-constructing a speech recognition model, wherein the speech recognition model comprises an acoustic model, a dictionary and a language model, and specifically comprises the following steps:
training voice characteristics corresponding to voice sample data in a voice database to obtain mapping from the voice characteristics to phonemes to form an acoustic model;
training texts in a text database to obtain a language model, and obtaining mapping between words and sentences to form the language model;
and constructing a mapping relation between the voice and the characters according to the acoustic model and the language model to form a dictionary.
The embodiment of the invention has the following beneficial effects:
the embodiment of the invention provides a voice recognition processing method for a power supply intelligent client: first, the client's voice signal is answered through the intelligent power supply seat; the voice signal is then preprocessed, the preprocessing including endpoint detection, noise elimination and intelligent interruption processing; feature extraction is performed on the preprocessed signal to obtain its voice features; and the extracted features are recognized through a pre-constructed voice recognition model to obtain and output a voice recognition result in text format. Because the voice signal is effectively preprocessed, recognition accuracy is improved; the intelligent interrupt processing mechanism raises the degree of intelligence and enables intelligent interrupt responses, improving the client's experience;
in addition, by training the speech recognition model, the accuracy of speech recognition can be improved.
Drawings
FIG. 1 is a schematic flowchart of an embodiment of a speech recognition processing method for a power supply intelligent client according to the present invention;
FIG. 2 is a more detailed flowchart of step S11 in FIG. 1;
FIG. 3 is a schematic diagram of an embodiment of a speech recognition processing system for a power supply intelligent client according to the present invention;
FIG. 4 is a schematic structural diagram of the preprocessing unit in FIG. 3;
FIG. 5 is a schematic structural diagram of the intelligent interruption unit in FIG. 4;
FIG. 6 is a schematic structural diagram of the speech recognition model construction unit in FIG. 3.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
To enable those skilled in the art to more clearly understand the objects, technical solutions and advantages of the present invention, the following further description is provided in conjunction with the accompanying drawings and embodiments.
Referring to fig. 1, a schematic flow chart of an embodiment of a speech recognition processing method for a power supply smart client according to the present invention is shown; referring to fig. 2 together, in this embodiment, the speech recognition processing method for a power supply smart client includes the following steps:
step S10, answering the voice signal of the client through the intelligent power supply seat;
step S11, preprocessing the voice signal of the client, wherein the preprocessing comprises endpoint detection, noise elimination and intelligent interruption processing;
in a specific example, the step S11 further includes:
step S110, converting the voice data into a data format suitable for acoustic model processing; the format is WAV or PCM audio, mono, 16-bit, at a 16000 Hz sampling rate;
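As an illustration of this conversion step, the sketch below downmixes two float channels in [-1, 1] to mono 16-bit PCM bytes. The function name and the assumption of float input are invented for illustration; a real pipeline would also resample the audio to 16 kHz, which is omitted here.

```python
import struct

TARGET_RATE = 16000  # 16 kHz mono 16-bit, as described in step S110

def to_mono_pcm16(samples_left, samples_right):
    """Downmix two float channels in [-1, 1] to mono 16-bit PCM bytes."""
    pcm = bytearray()
    for l, r in zip(samples_left, samples_right):
        mono = (l + r) / 2.0
        # clamp to [-1, 1] and scale to the signed 16-bit range
        val = max(-1.0, min(1.0, mono))
        pcm += struct.pack("<h", int(val * 32767))
    return bytes(pcm)
```

Each output sample occupies two little-endian bytes, matching the 16-bit PCM layout the acoustic model expects.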
step S111, identifying endpoints in the voice signal to determine the user's speaking start point and end point; once the user starts speaking, the voice flows to the downstream recognition engine (i.e., the voice recognition processing unit) until the end of the user's speech is detected. Silence detection within the voice recognition determines whether the user has finished speaking, so the recognition engine can begin recognizing while the user is still talking.
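The end-of-speech behaviour just described — speech is considered started once audio energy appears, and finished after sustained silence — can be sketched with a simple energy-based detector. The class name, threshold, and silence count below are illustrative assumptions, not the patent's prescribed method.

```python
class SilenceEndpointer:
    """Minimal energy-based end-of-speech detector (illustrative sketch).

    Speech is considered started once a frame's mean energy exceeds the
    threshold, and finished after `max_silence` consecutive quiet frames.
    """
    def __init__(self, threshold=0.01, max_silence=20):
        self.threshold = threshold
        self.max_silence = max_silence
        self.started = False
        self.quiet = 0

    def feed(self, frame):
        """Feed one frame of float samples; return True when the endpoint is reached."""
        energy = sum(s * s for s in frame) / len(frame)
        if energy > self.threshold:
            self.started = True
            self.quiet = 0
        elif self.started:
            self.quiet += 1
        return self.started and self.quiet >= self.max_silence
```

In a streaming setup, frames would be forwarded to the recognition engine as soon as `started` becomes true, well before `feed` signals the endpoint.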
Step S112, eliminating background noise based on scene analysis: first judging the scene type of the voice signal and, once the scene type is determined, performing noise elimination according to the preset noise elimination mode corresponding to that scene. Setting a dedicated noise elimination strategy for each scene type improves the noise elimination effect and safeguards the accuracy of subsequent voice recognition. A different noise reduction mode can be learned for each scene: various noise reduction algorithms are applied to the scene, the best-suited mode is identified, and thereafter each scene is denoised with its corresponding mode.
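One possible shape for this scene-dependent dispatch is a table mapping scene type to a denoising routine. The scene labels are hypothetical (the patent leaves the actual set unspecified), and the crude amplitude gate stands in for a real spectral-subtraction or Wiener-filter algorithm.

```python
def spectral_gate(frame, floor=0.02):
    # crude gate: zero out samples below the floor — a stand-in for a
    # real per-scene noise reduction algorithm
    return [s if abs(s) >= floor else 0.0 for s in frame]

def aggressive_gate(frame):
    # noisier scenes get a higher floor
    return spectral_gate(frame, floor=0.1)

# hypothetical scene labels mapped to their learned denoising mode
SCENE_DENOISERS = {
    "quiet_office": spectral_gate,
    "street": aggressive_gate,
}

def denoise(frame, scene):
    """Dispatch to the noise elimination mode preset for the given scene."""
    return SCENE_DENOISERS.get(scene, spectral_gate)(frame)
```

Unknown scenes fall back to the default routine, mirroring the idea that every scene type has a preset elimination mode.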
And step S113, responding in real time to the user's latest voice signal and, according to the configured interrupt processing mode, either interrupting the current task or continuing it. Intelligent interruption means the user can state a request at any time while the self-service voice prompt is still playing, without waiting for playback to finish; the system automatically detects this, immediately stops playing the prompt, and responds to the user's voice instruction. This makes human-computer interaction more efficient, faster and more natural, and enhances the customer experience.
Preferably, the step S111 further includes:
when a received voice signal triggers voice recognition, judging from the user's login information whether that user's speaking habits are stored; if so, adjusting the detection parameters according to those habits and then performing endpoint detection; if not, performing endpoint detection with the general-purpose detection parameters. Configuring the endpoint detection strategy around each user's speaking habits effectively improves the accuracy and efficiency of endpoint detection.
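A minimal sketch of this per-user parameter selection follows; the parameter names and the stored habit record are hypothetical, invented only to show the fallback logic.

```python
# general-purpose detection parameters used when no habits are stored
DEFAULT_PARAMS = {"energy_thr": 0.01, "trailing_silence_ms": 600}

# hypothetical stored speaking habits, keyed by login id
USER_HABITS = {
    "user42": {"trailing_silence_ms": 900},  # a slow, pause-heavy speaker
}

def detection_params(user_id):
    """Return endpoint-detection parameters adjusted for the user, if known."""
    params = dict(DEFAULT_PARAMS)
    params.update(USER_HABITS.get(user_id, {}))
    return params
```

A logged-in user with stored habits gets adjusted parameters; anyone else gets the general-purpose defaults.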
Preferably, in the step S111, the endpoints in the speech signal are identified by a time-domain feature method or a frequency-domain feature method. Endpoint detection determines the beginning and end of speech within a signal segment that contains speech. Effective endpoint detection not only reduces processing time but also removes noise interference from the silent portions. Two families of methods are currently in use: time-domain feature methods and frequency-domain feature methods. The time-domain method detects endpoints from the speech volume and the zero-crossing rate; its computation is light, but it may misjudge breathy (unvoiced) sounds, and different volume calculations can yield different detection results. The frequency-domain method detects speech from spectral variation and entropy, at a higher computational cost.
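The time-domain method's two cues — short-time energy (volume) and zero-crossing rate — can be computed per frame as below. The thresholds are illustrative; the dual test reflects that voiced speech has high energy while unvoiced speech has low energy but a high zero-crossing rate.

```python
def frame_features(frame):
    """Short-time energy and zero-crossing rate for one frame of float samples."""
    energy = sum(s * s for s in frame) / len(frame)
    zcr = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    ) / (len(frame) - 1)
    return energy, zcr

def is_speech(frame, energy_thr=0.01, zcr_thr=0.4):
    # voiced speech: high energy; unvoiced speech: high zero-crossing rate
    energy, zcr = frame_features(frame)
    return energy > energy_thr or zcr > zcr_thr
```

Frames classified as speech delimit the start and end points; as the text notes, the choice of energy measure directly shifts where those boundaries fall.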
Preferably, the step S113 specifically includes:
presetting different interrupt processing mode types for different users, wherein the interrupt processing mode types comprise: an interruptible mode and a non-interruptible mode;
when voice recognition is carried out in real time, if a new voice signal is received, judging the type of the current interrupt processing mode;
if the current mode is the interruptible mode, interrupting the response of the current voice recognition when a new voice instruction is received;
and if the current mode is the non-interruptible mode, after the user triggers the voice recognition processing, a new voice signal instruction is accepted only once execution of the current instruction has finished.
It can be understood that when voice recognition runs in real time and a new voice signal arrives — the user interrupting with a fresh instruction — the platform responds promptly: the user can state a request at any time during prompt playback without waiting for it to finish, and the system automatically stops the prompt and responds to the instruction, making human-computer interaction more efficient, faster and more natural and enhancing the customer experience. Concretely, an interruptible mode and a non-interruptible mode can be provided. In the interruptible mode, receiving a new voice instruction interrupts the current voice recognition response; in the non-interruptible mode, after the user triggers voice recognition processing, a new voice signal instruction is accepted only once the current instruction has finished executing. The mode can be configured flexibly by each user according to personal habits.
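The per-user barge-in policy described above can be sketched as a small state machine; the class, mode names, and returned action strings are invented for illustration.

```python
from enum import Enum

class InterruptMode(Enum):
    INTERRUPTIBLE = "interruptible"
    NON_INTERRUPTIBLE = "non_interruptible"

class VoiceSession:
    """Sketch of the interruptible / non-interruptible dispatch."""
    def __init__(self, mode):
        self.mode = mode
        self.busy = False  # a prompt or instruction is currently playing

    def on_new_voice(self):
        """Return the action taken for a freshly received voice signal."""
        if not self.busy:
            self.busy = True
            return "accepted"
        if self.mode is InterruptMode.INTERRUPTIBLE:
            return "interrupt-current-task"   # stop the prompt, handle new signal
        return "deferred"                     # queue until the current task ends
```

An interruptible session cuts the playing prompt short; a non-interruptible one defers the new signal until the running instruction finishes.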
It is understood that in various embodiments, the preprocessing may also include filtering, A/D conversion, pre-emphasis, and the like.
Step S12, extracting the characteristics of the preprocessed voice signal to obtain the voice characteristics in the voice signal;
preferably, the step S12 further includes:
and performing feature extraction on the preprocessed voice signal using linear predictive coding (LPC) to obtain the voice features. The basic idea of linear predictive coding is that adjacent samples of a speech signal are correlated, so current and future sample values can be predicted by a linear combination of past samples; the linear prediction coefficients are uniquely determined by minimizing the mean-square error between the predicted signal and the actual signal.
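A self-contained sketch of LPC coefficient estimation via the autocorrelation method and the Levinson-Durbin recursion follows; this is one standard way to obtain the coefficients, though the patent does not prescribe a specific algorithm. It assumes the frame has nonzero energy.

```python
def lpc_coefficients(signal, order):
    """LPC coefficients via autocorrelation + Levinson-Durbin recursion."""
    n = len(signal)
    # autocorrelation lags 0..order
    r = [sum(signal[i] * signal[i + k] for i in range(n - k))
         for k in range(order + 1)]
    a = [0.0] * (order + 1)
    a[0] = 1.0
    err = r[0]  # prediction error energy (must be nonzero)
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err          # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1 - k * k)
    # a[0] = 1; the predictor is x[n] ≈ -sum(a[j] * x[n - j], j = 1..order)
    return a
```

For a purely geometric signal x[n] = 0.9^n, a first-order predictor recovers a coefficient of about -0.9, i.e. x[n] ≈ 0.9·x[n-1].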
Preferably, further comprising: the method comprises the steps of pre-constructing a speech recognition model, wherein the speech recognition model comprises an acoustic model, a dictionary and a language model, and specifically comprises the following steps:
training voice characteristics corresponding to voice sample data in a voice database to obtain mapping from the voice characteristics to phonemes to form an acoustic model; wherein the acoustic model is a knowledge representation of differences in acoustics, phonetics, environmental variables, speaker gender, accents, etc.;
training the texts in a text database to obtain the mapping between words and sentences, forming the language model; the language model is a knowledge representation of a set of word sequences and can be obtained by language-model training with the SRILM toolkit;
and constructing a mapping relation between the voice and the characters according to the acoustic model and the language model to form a dictionary.
In general, the acoustic model decodes the acoustic features of speech into phonemes or words, and the language model then decodes those words into a complete sentence.
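That decoding chain — acoustic model to phonemes, dictionary to words, language model scoring the word sequence — can be illustrated with toy lookup tables. Every table entry below is invented purely for illustration; real systems use statistical models, not exact lookups.

```python
# toy tables: feature frame -> phoneme, phoneme string -> word,
# and a bigram language model over words (all values invented)
ACOUSTIC = {(0.1,): "n", (0.2,): "i", (0.3,): "h", (0.4,): "ao"}
DICTIONARY = {"n i": "你", "h ao": "好"}
BIGRAM = {("<s>", "你"): 0.5, ("你", "好"): 0.9}

def decode(frames):
    """Frames -> phonemes -> words (greedy longest match) -> scored sentence."""
    phonemes = [ACOUSTIC[f] for f in frames]
    words, i = [], 0
    while i < len(phonemes):
        for j in range(len(phonemes), i, -1):   # try the longest span first
            key = " ".join(phonemes[i:j])
            if key in DICTIONARY:
                words.append(DICTIONARY[key])
                i = j
                break
        else:
            i += 1                              # skip an unmatchable phoneme
    score, prev = 1.0, "<s>"
    for w in words:
        score *= BIGRAM.get((prev, w), 0.01)    # small floor for unseen bigrams
        prev = w
    return "".join(words), score
```

In a real recognizer the three mappings are probabilistic and searched jointly, but the division of labour is the same as in this sketch.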
And step S13, recognizing the extracted voice features through a pre-constructed voice recognition model, obtaining a voice recognition result in a text format, and outputting the voice recognition result. And matching and comparing the input voice characteristics with the acoustic model during recognition to obtain the optimal recognition result.
Referring to FIG. 3, a schematic diagram of an embodiment of a speech recognition processing system for a power supply intelligent client according to the method of the present invention is shown; refer also to FIGS. 4 and 5. In this embodiment, the speech recognition processing system 1 for a power supply intelligent client comprises:
the voice input unit 10 is used for receiving voice signals of a customer through a power supply intelligent seat;
a preprocessing unit 11, configured to preprocess the client's voice signal, wherein the preprocessing comprises endpoint detection, noise elimination and intelligent interruption processing;
a feature extraction unit 12, configured to perform feature extraction on the preprocessed voice signal to obtain a voice feature in the voice signal;
a speech recognition model construction unit 13 configured to construct a speech recognition model in advance, where the speech recognition model includes an acoustic model, a dictionary, and a language model;
and the voice recognition processing unit 14 is used for recognizing the extracted voice features through a pre-constructed voice recognition model, obtaining a voice recognition result in a text format and outputting the voice recognition result.
In a specific example, the preprocessing unit 11 further includes:
a channel conversion unit 110, configured to convert the voice data into a data format suitable for acoustic model processing; the format is WAV or PCM audio, mono, 16-bit, at a 16000 Hz sampling rate;
an endpoint detection unit 111, configured to identify an endpoint in the speech signal to determine a user speaking start point and a user speaking end point;
the noise reduction unit 112 is configured to eliminate background noise, perform scene analysis, perform scene type judgment on the speech signal, and perform noise elimination according to a preset noise elimination manner corresponding to different scenes after determining the scene type;
and the intelligent interrupt unit 113 is used for responding to the latest voice signal of the user in real time and interrupting the current task or continuing the current task according to the set interrupt processing mode.
Specifically, in one example, the endpoint detection unit 111 further includes:
when a voice signal is received to trigger voice recognition, judging whether a speaking habit of a corresponding user is stored or not according to user login information, if so, performing endpoint detection on detection parameters after corresponding adjustment according to the speaking habit of the user; and if not, carrying out endpoint detection based on the universal detection parameters.
Preferably, the endpoint detection unit 111 further identifies the endpoints in the speech signal by a time-domain feature method or a frequency-domain feature method.
In a specific example, the intelligent interruption unit 113 specifically includes:
an interrupt mode type setting unit 1130, configured to preset different interrupt processing mode types for different users, where the interrupt processing mode types include: an interruptible mode and a non-interruptible mode;
an interrupt mode processing unit 1131, configured to determine a current interrupt processing mode type if a new voice signal is received when performing voice recognition in real time;
if the current mode is the interruptible mode, interrupting the response of the current voice recognition when a new voice instruction is received;
and if the current mode is the non-interruptible mode, after the user triggers the voice recognition processing, a new voice signal instruction is accepted only once execution of the current instruction has finished.
Preferably, the feature extraction unit is further configured to perform feature extraction on the preprocessed voice signal by using a linear predictive coding technique, so as to obtain a voice feature in the voice signal.
Specifically, in one example, in the speech recognition model construction unit 13, the acoustic model is used for establishing a pronunciation template for each pronunciation, and the acoustic model is a knowledge representation of the difference between acoustics, phonetics, environmental variables, speaker gender and accent; the language model is a knowledge representation of a set of word sequences, which is a mapping of words to words, words to sentences; the dictionary is constructed with a mapping relationship between speech and text.
Preferably, the speech recognition model construction unit 13 includes:
a voice database 130 for storing a plurality of voice sample data;
the feature pre-extraction unit 131 is configured to perform feature extraction on voice sample data in the voice database to obtain voice features corresponding to the voice sample data;
and an acoustic model training unit 132, configured to train the speech features, obtain a mapping from the speech features to phonemes, and form an acoustic model.
Preferably, the speech recognition model construction unit 13 includes:
a text database 133 storing a plurality of text sample data;
and the language model training unit 134 is configured to train the texts in the text database to obtain mappings between words and sentences, so as to form a language model.
Preferably, the speech recognition model building unit 13 further comprises:
the dictionary building unit 135 is configured to build a mapping relationship between the speech and the text according to the acoustic model and the language model to form a dictionary.
For more details, reference may be made to the foregoing description of fig. 1 and fig. 2, which is not repeated herein.
The embodiment of the invention has the following beneficial effects:
the embodiment of the invention provides a voice recognition processing method for a power supply intelligent client: first, the client's voice signal is answered through the intelligent power supply seat; the voice signal is then preprocessed, the preprocessing including endpoint detection, noise elimination and intelligent interruption processing; feature extraction is performed on the preprocessed signal to obtain its voice features; and the extracted features are recognized through a pre-constructed voice recognition model to obtain and output a voice recognition result in text format. Because the voice signal is effectively preprocessed, recognition accuracy is improved; the intelligent interrupt processing mechanism raises the degree of intelligence and enables intelligent interrupt responses, improving the client's experience;
in addition, by training the speech recognition model, the accuracy of speech recognition can be improved.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not to be limited to the disclosed embodiment, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
Claims (7)
1. A speech recognition processing method for a power supply intelligent client is characterized by comprising the following steps:
step S10, answering the voice signal of the client through the power supply intelligent agent;
step S11, preprocessing the voice signal of the client, wherein the preprocessing comprises endpoint detection, noise elimination and intelligent interruption processing;
step S12, extracting the characteristics of the preprocessed voice signal to obtain the voice characteristics in the voice signal;
and step S13, recognizing the extracted voice features through a pre-constructed voice recognition model, obtaining a voice recognition result in a text format, and outputting the voice recognition result.
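The four steps of claim 1 can be sketched as a small end-to-end pipeline. Everything below is a stand-in for illustration: the silence-trimming "preprocessing", the per-frame energy "feature", and the nearest-energy "model" are assumptions, not the patented components.

```python
def preprocess(signal):
    """Step S11 stand-in: trim leading/trailing silence (crude endpoint detection)."""
    voiced = [i for i, s in enumerate(signal) if abs(s) > 0.01]
    return signal[voiced[0]:voiced[-1] + 1] if voiced else []

def extract_features(signal, frame_len=4):
    """Step S12 stand-in: frame the signal and compute one energy value per frame."""
    return [sum(s * s for s in signal[i:i + frame_len])
            for i in range(0, len(signal), frame_len)]

def recognize(features, model):
    """Step S13 stand-in: map each frame feature to the closest word in a toy model."""
    return " ".join(min(model, key=lambda w: abs(model[w] - f)) for f in features)

def pipeline(signal, model):
    """Claim 1 end to end: preprocess, extract features, recognize, output text."""
    return recognize(extract_features(preprocess(signal)), model)
```

With a toy model `{"hi": 0.0, "loud": 1.0}`, a trimmed four-sample frame of amplitude 0.5 has energy 1.0 and maps to `"loud"`.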
2. The method of claim 1, wherein the step S11 further comprises:
step S110, converting the voice data into a data format suitable for acoustic model processing, the data format being mono, 16-bit, 16000 Hz sampling-rate WAV or PCM audio;
step S111, identifying end points in the voice signal to determine the speaking start point and the speaking end point of the user;
step S112, eliminating background noise through scene analysis: first judging the scene type of the voice signal, and, once the scene type is determined, performing noise elimination according to the preset noise elimination mode corresponding to that scene;
and step S113, responding to the latest voice signal of the user in real time, and interrupting the current task or continuing the current task according to the set interrupt processing mode.
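Step S110's target format (mono, 16-bit, 16000 Hz WAV/PCM) can be reached with three small conversions. This is a minimal sketch using only the Python standard library; a real system would apply a low-pass filter before resampling and use a proper resampler rather than nearest-neighbour picking.

```python
import struct

TARGET_RATE = 16000  # Hz, the sampling rate required by step S110

def to_mono(stereo_pairs):
    """Down-mix stereo to mono by averaging each left/right sample pair."""
    return [(left + right) / 2.0 for left, right in stereo_pairs]

def resample(samples, src_rate, dst_rate=TARGET_RATE):
    """Nearest-neighbour resampling; a real system would low-pass filter first."""
    n_out = int(len(samples) * dst_rate / src_rate)
    return [samples[int(i * src_rate / dst_rate)] for i in range(n_out)]

def to_pcm16(samples):
    """Quantize floats in [-1.0, 1.0] to little-endian signed 16-bit PCM bytes."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    return struct.pack("<%dh" % len(clipped), *(int(s * 32767) for s in clipped))
```

The resulting byte string is raw PCM; wrapping it in a WAV container only adds a fixed header carrying the same channel, depth and rate parameters.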
3. The method of claim 2, wherein the step S111 further comprises:
when a received voice signal triggers voice recognition, judging from the user login information whether a speaking habit of the corresponding user is stored; if so, performing endpoint detection with the detection parameters adjusted according to the user's speaking habit; if not, performing endpoint detection with the universal detection parameters.
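The per-user parameter selection of claim 3 amounts to a lookup with a universal fallback. In this sketch the habit fields, parameter names, and adjustment factors are all assumptions made for illustration.

```python
# Universal fallback parameters (claim 3); the names and values are assumptions.
DEFAULT_PARAMS = {"energy_threshold": 0.02, "min_silence_ms": 500}

# Hypothetical per-user habit store, keyed by login identity.
USER_HABITS = {
    "user_42": {"speaks_softly": True, "long_pauses": True},
}

def detection_params(user_id):
    """Return endpoint-detection parameters, adjusted when a speaking habit is stored."""
    habit = USER_HABITS.get(user_id)
    params = dict(DEFAULT_PARAMS)
    if habit is None:                      # no stored habit: universal parameters
        return params
    if habit.get("speaks_softly"):
        params["energy_threshold"] /= 2    # quieter speech needs a lower threshold
    if habit.get("long_pauses"):
        params["min_silence_ms"] *= 2      # tolerate longer pauses before ending
    return params
```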
4. The method of claim 3, wherein in step S111, the endpoints in the speech signal are identified by a time domain feature method and a frequency domain feature method.
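A common concrete choice for claim 4 is short-time energy as the time-domain feature and per-frame spectral energy above a cutoff (via an FFT) as the frequency-domain feature. The thresholds and the 1 kHz cutoff below are illustrative assumptions, not values from the patent.

```python
import numpy as np

def detect_endpoints(signal, rate=16000, frame_ms=20,
                     energy_thresh=1e-3, hf_thresh=1e-4):
    """Return (start_sample, end_sample) of the speech region, or None.

    Time-domain feature: short-time energy per frame.
    Frequency-domain feature: mean spectral energy above 1 kHz per frame.
    A frame counts as speech if either feature exceeds its threshold.
    """
    frame_len = rate * frame_ms // 1000
    speech = []
    for i in range(len(signal) // frame_len):
        frame = np.asarray(signal[i * frame_len:(i + 1) * frame_len], dtype=float)
        energy = float(np.mean(frame ** 2))                 # time domain
        spectrum = np.abs(np.fft.rfft(frame)) ** 2
        freqs = np.fft.rfftfreq(frame_len, d=1.0 / rate)
        hf_energy = float(spectrum[freqs > 1000.0].mean())  # frequency domain
        speech.append(energy > energy_thresh or hf_energy > hf_thresh)
    if not any(speech):
        return None
    start = speech.index(True)
    end = len(speech) - 1 - speech[::-1].index(True)
    return start * frame_len, (end + 1) * frame_len
```

Combining both features makes the detector more robust than either alone: low-energy fricatives still show high-frequency spectral energy, while low-frequency hum fails the frequency-domain test.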
5. The method according to claim 2, wherein the step S113 specifically includes:
presetting different interrupt processing mode types for different users, wherein the interrupt processing mode types comprise: an interruptible mode and a non-interruptible mode;
when voice recognition is carried out in real time, if a new voice signal is received, judging the type of the current interrupt processing mode;
if the current mode is the interruptible mode, interrupting the response of the current voice recognition when a new voice instruction is received;
and if the current mode is the non-interruptible mode, after the user triggers the voice recognition processing, receiving a new voice signal instruction only after execution of the current instruction is finished.
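The branching of claim 5 can be modeled as a small state machine. The class shape and the queueing of deferred signals in the non-interruptible mode are assumptions for illustration; the patent only specifies the two modes and their behavior on a new signal.

```python
INTERRUPTIBLE, NON_INTERRUPTIBLE = "interruptible", "non-interruptible"

class VoiceSession:
    """Minimal sketch of the two interrupt processing modes of claim 5."""

    def __init__(self, mode):
        self.mode = mode
        self.current_task = None   # the voice instruction being responded to
        self.pending = []          # signals deferred in non-interruptible mode

    def on_voice_signal(self, signal):
        if self.current_task is None:
            self.current_task = signal        # idle: start handling immediately
        elif self.mode == INTERRUPTIBLE:
            self.current_task = signal        # interrupt the current response
        else:
            self.pending.append(signal)       # defer until execution finishes

    def finish_task(self):
        """Called when the current instruction's execution is finished."""
        self.current_task = self.pending.pop(0) if self.pending else None
```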
6. The method of claim 5, wherein the step S12 further comprises:
and performing feature extraction on the preprocessed voice signal by adopting a linear predictive coding technology to obtain the voice feature in the voice signal.
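Claim 6's linear predictive coding can be realized with the autocorrelation method solved by the Levinson-Durbin recursion, a standard LPC formulation; the patent does not specify the variant or the model order, so both are assumptions here.

```python
import numpy as np

def lpc(frame, order):
    """LPC coefficients a[1..order] of the model x[n] ~ -sum_j a[j] * x[n-j],
    computed by the autocorrelation method with the Levinson-Durbin recursion."""
    x = np.asarray(frame, dtype=float)
    # Autocorrelation sequence r[0..order].
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]  # prediction error energy
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err              # reflection coefficient
        new_a = a.copy()
        for j in range(1, i):       # update previous coefficients
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)        # shrink the residual energy
    return a[1:]
```

For a purely exponential signal x[n] = 0.9^n (an AR(1) process with no noise), the order-1 coefficient comes out close to -0.9, i.e. the recursion recovers the generating filter.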
7. The method of any of claims 1 to 6, further comprising: the method comprises the steps of pre-constructing a speech recognition model, wherein the speech recognition model comprises an acoustic model, a dictionary and a language model, and specifically comprises the following steps:
training on the voice features corresponding to the voice sample data in a voice database to obtain a mapping from voice features to phonemes, forming the acoustic model;
training on the texts in a text database to obtain the mappings between words and between words and sentences, forming the language model;
and constructing a mapping relation between the voice and the characters according to the acoustic model and the language model to form a dictionary.
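The language-model half of claim 7 can be illustrated with a toy bigram model over a handful of sentences; a production system would train on a large text corpus with smoothing, and the dictionary would then map each word to its phoneme sequence so the acoustic model's phoneme outputs can be joined to the language model's words. The sentence examples below are invented for illustration.

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Toy bigram language model: P(w2 | w1) by relative frequency, with
    sentence-boundary markers. Real systems use large corpora and smoothing."""
    bigrams, contexts = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for w1, w2 in zip(words, words[1:]):
            bigrams[(w1, w2)] += 1  # count each adjacent word pair
            contexts[w1] += 1       # count each left context
    def prob(w1, w2):
        return bigrams[(w1, w2)] / contexts[w1] if contexts[w1] else 0.0
    return prob
```

Trained on `["check my bill", "check my meter"]`, the model assigns P(my | check) = 1.0 and P(bill | my) = 0.5, which is exactly the word-to-word mapping the claim describes at toy scale.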
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011059062.1A CN112185393A (en) | 2020-09-30 | 2020-09-30 | Voice recognition processing method for power supply intelligent client |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112185393A true CN112185393A (en) | 2021-01-05 |
Family
ID=73947098
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011059062.1A Pending CN112185393A (en) | 2020-09-30 | 2020-09-30 | Voice recognition processing method for power supply intelligent client |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112185393A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109509471A (en) * | 2018-12-28 | 2019-03-22 | 浙江百应科技有限公司 | A method of the dialogue of intelligent sound robot is interrupted based on vad algorithm |
CN109859774A (en) * | 2019-01-02 | 2019-06-07 | 珠海格力电器股份有限公司 | Speech ciphering equipment and its end-point detection sensitivity adjustment method, device and storage medium |
CN110299152A (en) * | 2019-06-28 | 2019-10-01 | 北京猎户星空科技有限公司 | Interactive output control method, device, electronic equipment and storage medium |
CN110517697A (en) * | 2019-08-20 | 2019-11-29 | 中信银行股份有限公司 | Prompt tone intelligence cutting-off device for interactive voice response |
CN111540349A (en) * | 2020-03-27 | 2020-08-14 | 北京捷通华声科技股份有限公司 | Voice interruption method and device |
Similar Documents
Publication | Title
---|---
CN107437415B | Intelligent voice interaction method and system
WO2021128741A1 | Voice emotion fluctuation analysis method and apparatus, and computer device and storage medium
US9916826B1 | Targeted detection of regions in speech processing data streams
WO2017084360A1 | Method and system for speech recognition
CN111429899A | Speech response processing method, device, equipment and medium based on artificial intelligence
CN109545197B | Voice instruction identification method and device and intelligent terminal
US7177810B2 | Method and apparatus for performing prosody-based endpointing of a speech signal
CN110364178B | Voice processing method and device, storage medium and electronic equipment
CN112185392A | Voice recognition processing system for power supply intelligent client
CN112614514B | Effective voice fragment detection method, related equipment and readable storage medium
CN112825248A | Voice processing method, model training method, interface display method and equipment
CN109215634A | A kind of method and its system of more word voice control on-off systems
CN112185385A | Intelligent client processing method and system for power supply field
CN110767240B | Equipment control method, equipment, storage medium and device for identifying child accent
CN112071310A | Speech recognition method and apparatus, electronic device, and storage medium
CN114385800A | Voice conversation method and device
CN114708856A | Voice processing method and related equipment thereof
CN110853669B | Audio identification method, device and equipment
CN115512687B | Voice sentence-breaking method and device, storage medium and electronic equipment
JP3721948B2 | Voice start edge detection method, voice section detection method in voice recognition apparatus, and voice recognition apparatus
Hirschberg et al. | Generalizing prosodic prediction of speech recognition errors
CN112185393A | Voice recognition processing method for power supply intelligent client
KR20050049207A | Dialogue-type continuous speech recognition system and using it endpoint detection method of speech
CN112185365A | Power supply intelligent client processing method and system
CN115331670A | Off-line voice remote controller for household appliances
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||