CN112185393A - Voice recognition processing method for power supply intelligent client

Voice recognition processing method for power supply intelligent client

Info

Publication number
CN112185393A
CN112185393A (application CN202011059062.1A)
Authority
CN
China
Prior art keywords
voice
voice signal
mode
voice recognition
client
Prior art date: 2020-09-30
Legal status
Pending
Application number
CN202011059062.1A
Other languages
Chinese (zh)
Inventor
练芯妤
陈琳
林磊
罗陆宁
黄媚
刘家学
李艳
王婷婷
税洁
谢钰莹
徐艳如
陈诚
罗建国
黎怡均
罗益会
赵峻
莫屾
付婷婷
陈辉
黄公跃
林思远
方力谦
严玉婷
孙梦龙
杨蕴琳
Current Assignee
Shenzhen Power Supply Bureau Co Ltd
Original Assignee
Shenzhen Power Supply Bureau Co Ltd
Priority date: 2020-09-30
Filing date: 2020-09-30
Publication date: 2021-01-05
Application filed by Shenzhen Power Supply Bureau Co Ltd
Priority to CN202011059062.1A
Publication of CN112185393A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • G10L 15/08 - Speech classification or search
    • G10L 15/18 - Speech classification or search using natural language modelling
    • G10L 15/183 - Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L 15/20 - Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G10L 15/26 - Speech to text systems
    • G10L 25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/78 - Detection of presence or absence of voice signals
    • G10L 25/87 - Detection of discrete points within a voice signal
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04M - TELEPHONIC COMMUNICATION
    • H04M 3/00 - Automatic or semi-automatic exchanges
    • H04M 3/42 - Systems providing special services or facilities to subscribers
    • H04M 3/50 - Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers
    • H04M 3/51 - Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention provides a voice recognition processing method for a power supply intelligent client. A power supply intelligent agent answers the customer's voice signal; the voice signal is preprocessed; feature extraction is performed on the preprocessed voice signal to obtain the voice features it contains; and the extracted voice features are recognized by a pre-constructed voice recognition model to obtain a voice recognition result in text format, which is then output. By implementing the invention, effective preprocessing of the voice signal improves voice recognition accuracy; the intelligent interruption (barge-in) mechanism raises the degree of intelligence and enables interrupt responses, improving the customer experience; and training the voice recognition model further improves recognition accuracy.

Description

Voice recognition processing method for power supply intelligent client
Technical Field
The invention relates to the technical field of power supply intelligent clients, in particular to a voice recognition processing method for a power supply intelligent client.
Background
Intelligent voice is one of the trends in the future development of customer service. Although many power supply enterprises are actively building intelligent customer service systems, most existing voice navigation systems have shortcomings, chiefly a low degree of intelligence, limited voice recognition accuracy, complex service flows, poor integration and poor serviceability.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a voice recognition processing method for a power supply intelligent client that can improve the voice recognition effect.
In order to solve the above technical problem, an aspect of the present invention provides a speech recognition processing method for a power supply intelligent client, which includes the following steps:
step S10, answering the customer's voice signal through the power supply intelligent agent;
step S11, preprocessing the voice signal of the client, wherein the preprocessing comprises endpoint detection, noise elimination and intelligent interruption processing;
step S12, extracting the characteristics of the preprocessed voice signal to obtain the voice characteristics in the voice signal;
and step S13, recognizing the extracted voice features through a pre-constructed voice recognition model, obtaining a voice recognition result in a text format, and outputting the voice recognition result.
Preferably, the step S11 further includes:
step S110, converting the voice data into a data format suitable for acoustic model processing, namely mono, 16-bit, 16000 Hz sampling rate WAV or PCM audio;
step S111, identifying end points in the voice signal to determine the speaking start point and the speaking end point of the user;
step S112, eliminating background noise through scene analysis: first judging the scene type of the voice signal and, once the scene type is determined, performing noise elimination according to the preset noise elimination mode corresponding to that scene;
and step S113, responding to the user's latest voice signal in real time, and interrupting or continuing the current task according to the set interrupt processing mode.
Preferably, the step S111 further includes:
when a received voice signal triggers voice recognition, judging from the user login information whether the speaking habits of the corresponding user are stored; if so, performing endpoint detection with detection parameters adjusted to the user's speaking habits; if not, performing endpoint detection with the universal detection parameters.
Preferably, in the step S111, the endpoint in the speech signal is identified by a time domain feature method and a frequency domain feature method.
Preferably, the step S112 specifically includes:
presetting different interrupt processing mode types for different users, wherein the interrupt processing mode types comprise an interruptible mode and a non-interruptible mode;
when voice recognition is performed in real time and a new voice signal is received, judging the current interrupt processing mode type;
if the current mode is the interruptible mode, interrupting the response of the current voice recognition when the new voice instruction is received;
and if the current mode is the non-interruptible mode, accepting a new voice signal instruction only after execution of the current instruction is finished.
Preferably, the step S12 further includes:
and performing feature extraction on the preprocessed voice signal using linear predictive coding technology to obtain the voice features in the voice signal.
Preferably, the method further comprises pre-constructing a speech recognition model, wherein the speech recognition model comprises an acoustic model, a dictionary and a language model; specifically:
training voice characteristics corresponding to voice sample data in a voice database to obtain mapping from the voice characteristics to phonemes to form an acoustic model;
training the texts in a text database to obtain the mappings between words and sentences, forming the language model;
and constructing a mapping relation between the voice and the characters according to the acoustic model and the language model to form a dictionary.
The embodiment of the invention has the following beneficial effects:
the embodiment of the invention provides a voice recognition processing method for a power supply intelligent client, which comprises the steps of firstly answering a voice signal of a client through a power supply intelligent seat; then, preprocessing the voice signals of the client, wherein the preprocessing comprises endpoint detection, noise elimination and intelligent interruption processing; performing feature extraction on the preprocessed voice signal to obtain voice features in the voice signal; and recognizing the extracted voice features through a pre-constructed voice recognition model to obtain a voice recognition result in a text format, and outputting the voice recognition result. Because the voice signals are effectively preprocessed, the voice recognition accuracy can be improved; meanwhile, due to the adoption of an intelligent interrupt processing mechanism, the intelligent degree can be improved, and intelligent interrupt response can be realized, so that the use experience of a client can be improved;
in addition, by training the speech recognition model, the accuracy of speech recognition can be improved.
Drawings
FIG. 1 is a schematic flowchart of an embodiment of a speech recognition processing method for a power supply intelligent client according to the present invention;
FIG. 2 is a more detailed flowchart of step S11 in FIG. 1;
FIG. 3 is a schematic structural diagram of an embodiment of a speech recognition processing system for a power supply intelligent client according to the present invention;
FIG. 4 is a schematic structural diagram of the preprocessing unit in FIG. 3;
FIG. 5 is a schematic structural diagram of the intelligent interruption unit in FIG. 4;
fig. 6 is a schematic structural diagram of the speech recognition model building unit in fig. 3.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings; the described embodiments are only some, not all, of the embodiments of the invention.
To make the objects, technical solutions and advantages of the present invention clearer to those skilled in the art, the following further description is provided in conjunction with the accompanying drawings and embodiments.
Referring to fig. 1, a schematic flow chart of an embodiment of a speech recognition processing method for a power supply smart client according to the present invention is shown; referring to fig. 2 together, in this embodiment, the speech recognition processing method for a power supply smart client includes the following steps:
step S10, answering the customer's voice signal through the power supply intelligent agent;
step S11, preprocessing the voice signal of the client, wherein the preprocessing comprises endpoint detection, noise elimination and intelligent interruption processing;
in a specific example, the step S11 further includes:
step S110, converting the voice data into a data format suitable for acoustic model processing, namely mono, 16-bit, 16000 Hz sampling rate WAV or PCM audio (an illustrative conversion sketch is given after step S113 below);
step S111, identifying end points in the voice signal to determine the user's speaking start point and end point. Once the user starts speaking, the voice streams to the downstream recognition engine (namely the voice recognition processing unit) until the end of speech is detected; silence detection within voice recognition determines whether the user has finished speaking. In this way the recognition engine begins the recognition process while the user is still speaking.
Step S112, eliminating background noise through scene analysis: first the scene type of the voice signal is judged, and once the scene type is determined, noise elimination is performed according to the preset noise elimination mode corresponding to that scene. Setting a corresponding noise elimination strategy for each scene type improves the noise elimination effect and safeguards the accuracy of subsequent voice recognition. Different noise reduction modes can be learned for different scenes: for each scene, noise reduction is attempted with various algorithms until a mode adapted to that scene is found, and once the noise reduction mode corresponding to each scene is determined, that mode is used whenever the scene occurs (see the illustrative sketch after step S113 below).
And step S113, responding to the user's latest voice signal in real time, and interrupting or continuing the current task according to the set interrupt processing mode. Intelligent interruption means that, while a prompt of the self-service voice service is playing, the user can state a requirement at any time without waiting for playback to finish; the system judges automatically, immediately stops playing the prompt, and responds to the user's voice instruction. This makes human-computer interaction more efficient, faster and more natural, and helps enhance the customer experience.
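For illustration only, the format conversion of step S110 might be sketched as follows in Python, assuming the third-party librosa and soundfile packages are available; the function and file names are examples, not part of the invention.

```python
# Illustrative sketch of step S110: resample arbitrary input audio to
# mono, 16-bit, 16 kHz WAV. Assumes the librosa and soundfile packages;
# function and file names are examples only.
import librosa
import soundfile as sf

def convert_to_16k_mono(src_path: str, dst_path: str) -> None:
    # librosa.load downmixes to mono and resamples to the requested rate
    audio, sr = librosa.load(src_path, sr=16000, mono=True)
    # subtype="PCM_16" writes 16-bit linear PCM samples
    sf.write(dst_path, audio, sr, subtype="PCM_16")

convert_to_16k_mono("customer_call.mp3", "customer_call_16k.wav")
```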
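Likewise, the scene-dependent noise elimination of step S112 can be sketched as below; the scene labels, the per-scene parameters and the choice of spectral subtraction are assumptions made for the example, since the patent does not fix a concrete algorithm.

```python
# Illustrative scene-dependent noise elimination (step S112). Scene labels,
# per-scene parameters and the spectral-subtraction approach are assumptions.
import numpy as np

# Assumed preset: noisier scenes get a larger over-subtraction factor.
SCENE_PARAMS = {"quiet_office": 1.0, "business_hall": 1.5, "street": 2.0}

def spectral_subtract(frames: np.ndarray, noise_mag: np.ndarray,
                      alpha: float) -> np.ndarray:
    """Subtract a scaled noise magnitude spectrum from each frame.

    noise_mag is the magnitude spectrum of an estimated noise frame,
    of length frames.shape[-1] // 2 + 1 (the rfft bin count).
    """
    spec = np.fft.rfft(frames, axis=-1)
    mag, phase = np.abs(spec), np.angle(spec)
    # Keep a small spectral floor so the result never collapses to zero
    clean = np.maximum(mag - alpha * noise_mag, 0.05 * mag)
    return np.fft.irfft(clean * np.exp(1j * phase),
                        n=frames.shape[-1], axis=-1)

def denoise_for_scene(frames: np.ndarray, noise_mag: np.ndarray,
                      scene: str) -> np.ndarray:
    # Look up the preset noise-elimination mode for the judged scene type
    return spectral_subtract(frames, noise_mag, SCENE_PARAMS.get(scene, 1.0))
```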
Preferably, the step S111 further includes:
when a received voice signal triggers voice recognition, judging from the user login information whether the speaking habits of the corresponding user are stored; if so, performing endpoint detection with detection parameters adjusted to the user's speaking habits; if not, performing endpoint detection with the universal detection parameters. Configuring endpoint detection strategies around the speaking habits of individual users effectively improves the accuracy and efficiency of endpoint detection.
Preferably, in step S111, the endpoints in the speech signal are identified by a time-domain feature method and a frequency-domain feature method. Endpoint detection determines the beginning and end of speech within a signal that contains speech. Effective endpoint detection not only reduces processing time but also eliminates noise interference from the silent segments. Two main classes of methods exist at present. The time-domain feature method detects endpoints from volume (short-time energy) and zero-crossing rate; it is computationally cheap, but it may misjudge breathy, unvoiced sounds, and different volume calculations yield different detection results. The frequency-domain feature method detects speech from spectral variation and entropy, at a higher computational cost.
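For illustration, a minimal sketch of the time-domain method follows, using short-time energy and zero-crossing rate; the frame length and thresholds stand in for the "universal detection parameters" mentioned above and are assumptions, to be adjusted per user in practice.

```python
# Minimal time-domain endpoint detector using short-time energy and
# zero-crossing rate. Frame length and thresholds are illustrative
# "universal detection parameters", not values from the patent.
import numpy as np

def detect_endpoints(signal: np.ndarray, sr: int = 16000,
                     frame_ms: int = 25, energy_thr: float = 0.01,
                     zcr_thr: float = 0.1):
    """Return (start_sample, end_sample) of the speech segment, or None."""
    frame_len = sr * frame_ms // 1000
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)

    energy = np.mean(frames ** 2, axis=1)          # short-time energy per frame
    sign_changes = np.abs(np.diff(np.sign(frames), axis=1)) > 0
    zcr = np.mean(sign_changes, axis=1)            # zero-crossing rate per frame

    # A frame is speech if it is loud, or quiet but "busy"
    # (high ZCR catches unvoiced, breathy consonants).
    is_speech = (energy > energy_thr) | (zcr > zcr_thr)
    idx = np.flatnonzero(is_speech)
    if idx.size == 0:
        return None
    return idx[0] * frame_len, (idx[-1] + 1) * frame_len
```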
Preferably, the step S112 specifically includes:
presetting different interrupt processing mode types for different users, wherein the interrupt processing mode types comprise an interruptible mode and a non-interruptible mode;
when voice recognition is performed in real time and a new voice signal is received, judging the current interrupt processing mode type;
if the current mode is the interruptible mode, interrupting the response of the current voice recognition when the new voice instruction is received;
and if the current mode is the non-interruptible mode, accepting a new voice signal instruction only after execution of the current instruction is finished.
It can be understood that, when voice recognition is performed in real time and a new voice signal is received (that is, the user interrupts and issues a fresh instruction), the platform responds promptly: the user can state a requirement at any time while a self-service prompt is playing, without waiting for playback to finish; the system judges automatically, immediately stops the prompt, and responds to the user's voice instruction. This makes human-computer interaction more efficient, faster and more natural, and helps enhance the customer experience. Specifically, modes such as an interruptible mode and a non-interruptible mode can be set: in the interruptible mode, the response of the current voice recognition is interrupted when a new voice instruction arrives; in the non-interruptible mode, once the user has triggered voice recognition processing, a new voice signal instruction is accepted only after the current instruction finishes executing. The mode setting can be configured flexibly by each user according to personal habits, as the sketch below illustrates.
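As an illustration, the interruptible and non-interruptible modes can be pictured with the following sketch; the enum, class and method names are assumptions made for the example, not identifiers from the patent.

```python
# Illustrative sketch of the interruptible / non-interruptible modes.
# Enum, class and method names are assumptions for this example.
from enum import Enum

class InterruptMode(Enum):
    INTERRUPTIBLE = "interruptible"
    NON_INTERRUPTIBLE = "non_interruptible"

class VoiceSession:
    def __init__(self, mode: InterruptMode):
        self.mode = mode
        self.busy = False  # True while a prompt plays or an instruction runs

    def on_new_voice_signal(self, instruction: str) -> None:
        if self.busy:
            if self.mode is InterruptMode.NON_INTERRUPTIBLE:
                # Barge-in ignored: a new instruction is accepted only
                # after the current one has finished executing.
                return
            self.stop_prompt_playback()  # interruptible: cut the prompt short
        self.busy = True
        self.handle(instruction)
        self.busy = False

    def stop_prompt_playback(self) -> None:
        print("prompt playback interrupted")

    def handle(self, instruction: str) -> None:
        print(f"responding to: {instruction}")

session = VoiceSession(InterruptMode.INTERRUPTIBLE)
session.on_new_voice_signal("query my electricity bill")
```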
It is understood that in various embodiments the preprocessing may also include filtering, A/D conversion, pre-emphasis, and the like.
Step S12, extracting the characteristics of the preprocessed voice signal to obtain the voice characteristics in the voice signal;
preferably, the step S12 further includes:
and performing feature extraction on the preprocessed voice signal using linear predictive coding (LPC) to obtain the voice features in the voice signal. The basic idea of linear predictive coding is that speech samples are correlated, so the current and future sample values can be predicted from a linear combination of past samples. The linear prediction coefficients are uniquely determined by minimizing the mean square error between the predicted signal and the actual signal.
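For illustration, a compact sketch of LPC analysis using the autocorrelation method and the Levinson-Durbin recursion follows; the prediction order of 12 and the framing are assumptions, not values from the patent.

```python
# Compact LPC analysis via the autocorrelation method and the
# Levinson-Durbin recursion. The order (12) is an illustrative choice.
import numpy as np

def lpc(frame: np.ndarray, order: int = 12) -> np.ndarray:
    """Return error-filter coefficients [1, a1, ..., a_order]; the current
    sample is predicted as -sum(a[1:] * the `order` previous samples)."""
    full = np.correlate(frame, frame, mode="full")
    r = full[len(frame) - 1 : len(frame) + order]  # autocorrelation r[0..order]

    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0] + 1e-9  # tiny bias avoids division by zero on silent frames
    for i in range(1, order + 1):
        # Reflection coefficient minimizing the prediction mean square error
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a.copy()
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a
```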
Preferably, the method further comprises pre-constructing a speech recognition model, wherein the speech recognition model comprises an acoustic model, a dictionary and a language model; specifically:
training voice characteristics corresponding to voice sample data in a voice database to obtain mapping from the voice characteristics to phonemes to form an acoustic model; wherein the acoustic model is a knowledge representation of differences in acoustics, phonetics, environmental variables, speaker gender, accents, etc.;
training the texts in a text database to obtain the mappings between words and sentences, forming the language model; the language model is a knowledge representation of a set of word sequences and can be obtained, for example, by LM training with the SRILM toolkit (a toy sketch follows the summary paragraph below);
and constructing a mapping relation between the voice and the characters according to the acoustic model and the language model to form a dictionary.
In general, the acoustic model classifies (decodes) the acoustic features of speech into phonemes or words, and the language model then decodes the word sequence into a complete sentence.
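As a toy illustration of the word-sequence mappings the language model captures, the following unsmoothed bigram estimator mirrors, at miniature scale, what a toolkit such as SRILM computes; the function name and sample sentences are assumptions.

```python
# Toy unsmoothed bigram language model; train_bigram_lm and the sample
# sentences are illustrative assumptions.
from collections import Counter, defaultdict

def train_bigram_lm(sentences):
    """Estimate P(w2 | w1) from tokenized training sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens[:-1])           # contexts
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    probs = defaultdict(dict)
    for (w1, w2), count in bigrams.items():
        probs[w1][w2] = count / unigrams[w1]   # maximum-likelihood estimate
    return probs

lm = train_bigram_lm([["check", "my", "bill"],
                      ["check", "power", "outage"]])
print(lm["check"])  # {'my': 0.5, 'power': 0.5}
```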
And step S13, recognizing the extracted voice features through the pre-constructed voice recognition model, obtaining a voice recognition result in text format, and outputting it. During recognition, the input voice features are matched and compared against the acoustic model, and the optimal recognition result is obtained.
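Purely as a schematic of this matching step, the following brute-force scorer combines per-segment acoustic likelihoods with bigram language-model probabilities (such as those from the sketch above) to pick the best word sequence; real recognizers use Viterbi or beam search over a decoding graph, and all names here are illustrative.

```python
# Schematic brute-force decoder: combines per-segment acoustic likelihoods
# with bigram LM probabilities. All names are illustrative; real systems
# use Viterbi/beam search over a decoding graph.
import math
from itertools import product

def decode(acoustic_scores, lm, lm_weight=1.0):
    """acoustic_scores[i] maps candidate word -> P(features_i | word)."""
    best, best_logprob = None, -math.inf
    for words in product(*[list(seg.keys()) for seg in acoustic_scores]):
        logprob = sum(math.log(acoustic_scores[i][w])
                      for i, w in enumerate(words))
        prev = "<s>"
        for w in words:  # add weighted log bigram probabilities
            logprob += lm_weight * math.log(lm.get(prev, {}).get(w, 1e-6))
            prev = w
        if logprob > best_logprob:
            best, best_logprob = list(words), logprob
    return best

# e.g. decode([{"check": 0.6, "czech": 0.4}, {"my": 0.5, "by": 0.5}], lm)
```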
Referring to FIG. 3, there is shown a schematic structural diagram of an embodiment of a speech recognition processing system for a power supply intelligent client according to the method of the present invention; see also FIGS. 4 and 5. In this embodiment, the speech recognition processing system 1 for a power supply intelligent client includes:
the voice input unit 10, configured to receive the customer's voice signal through the power supply intelligent agent;
the preprocessing unit 11, configured to preprocess the customer's voice signal, wherein the preprocessing includes endpoint detection, noise elimination and intelligent interruption processing;
a feature extraction unit 12, configured to perform feature extraction on the preprocessed voice signal to obtain a voice feature in the voice signal;
a speech recognition model construction unit 13 configured to construct a speech recognition model in advance, where the speech recognition model includes an acoustic model, a dictionary, and a language model;
and the voice recognition processing unit 14 is used for recognizing the extracted voice features through a pre-constructed voice recognition model, obtaining a voice recognition result in a text format and outputting the voice recognition result.
In a specific example, the preprocessing unit 11 further includes:
a channel conversion unit 110, configured to convert the voice data into a data format suitable for acoustic model processing, namely mono, 16-bit, 16000 Hz sampling rate WAV or PCM audio;
an endpoint detection unit 111, configured to identify an endpoint in the speech signal to determine a user speaking start point and a user speaking end point;
the noise reduction unit 112 is configured to eliminate background noise, perform scene analysis, perform scene type judgment on the speech signal, and perform noise elimination according to a preset noise elimination manner corresponding to different scenes after determining the scene type;
and the intelligent interrupt unit 113 is used for responding to the latest voice signal of the user in real time and interrupting the current task or continuing the current task according to the set interrupt processing mode.
Specifically, in one example, the endpoint detection unit 111 further includes:
when a received voice signal triggers voice recognition, judging from the user login information whether the speaking habits of the corresponding user are stored; if so, performing endpoint detection with detection parameters adjusted to the user's speaking habits; if not, performing endpoint detection with the universal detection parameters.
Preferably, the endpoint detecting unit 111 further identifies an endpoint in the speech signal by a time domain feature method and a frequency domain feature method.
In a specific example, the intelligent interruption unit 113 specifically includes:
an interrupt mode type setting unit 1130, configured to preset different interrupt processing mode types for different users, the interrupt processing mode types comprising an interruptible mode and a non-interruptible mode;
an interrupt mode processing unit 1131, configured to judge the current interrupt processing mode type when a new voice signal is received during real-time voice recognition;
if the current mode is the interruptible mode, the response of the current voice recognition is interrupted when the new voice instruction is received;
and if the current mode is the non-interruptible mode, a new voice signal instruction is accepted only after execution of the current instruction is finished.
Preferably, the feature extraction unit is further configured to perform feature extraction on the preprocessed voice signal by using a linear predictive coding technique, so as to obtain a voice feature in the voice signal.
Specifically, in one example, in the speech recognition model construction unit 13, the acoustic model establishes a pronunciation template for each pronunciation and is a knowledge representation of differences in acoustics, phonetics, environmental variables, speaker gender and accent; the language model is a knowledge representation of a set of word sequences, i.e. the mappings from words to words and from words to sentences; and the dictionary holds the constructed mapping relationship between speech and text.
Preferably, the speech recognition model construction unit 13 includes:
a voice database 130 for storing a plurality of voice sample data;
the feature pre-extraction unit 131 is configured to perform feature extraction on voice sample data in the voice database to obtain voice features corresponding to the voice sample data;
and an acoustic model training unit 132, configured to train the speech features, obtain a mapping from the speech features to phonemes, and form an acoustic model.
Preferably, the speech recognition model construction unit 13 includes:
a text database 133 storing a plurality of text sample data;
and the language model training unit 134 is configured to train the texts in the text database to obtain mappings between words and sentences, so as to form a language model.
Preferably, the speech recognition model building unit 13 further comprises:
the dictionary building unit 135 is configured to build a mapping relationship between the speech and the text according to the acoustic model and the language model to form a dictionary.
For more details, reference may be made to the foregoing description of fig. 1 and fig. 2, which is not repeated herein.
The embodiment of the invention has the following beneficial effects:
the embodiment of the invention provides a voice recognition processing method for a power supply intelligent client, which comprises the steps of firstly answering a voice signal of a client through a power supply intelligent seat; then, preprocessing the voice signals of the client, wherein the preprocessing comprises endpoint detection, noise elimination and intelligent interruption processing; performing feature extraction on the preprocessed voice signal to obtain voice features in the voice signal; and recognizing the extracted voice features through a pre-constructed voice recognition model to obtain a voice recognition result in a text format, and outputting the voice recognition result. Because the voice signals are effectively preprocessed, the voice recognition accuracy can be improved; meanwhile, due to the adoption of an intelligent interrupt processing mechanism, the intelligent degree can be improved, and intelligent interrupt response can be realized, so that the use experience of a client can be improved;
in addition, by training the speech recognition model, the accuracy of speech recognition can be improved.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not to be limited to the disclosed embodiment, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (7)

1. A speech recognition processing method for a power supply intelligent client is characterized by comprising the following steps:
step S10, answering the customer's voice signal through the power supply intelligent agent;
step S11, preprocessing the voice signal of the client, wherein the preprocessing comprises endpoint detection, noise elimination and intelligent interruption processing;
step S12, extracting the characteristics of the preprocessed voice signal to obtain the voice characteristics in the voice signal;
and step S13, recognizing the extracted voice features through a pre-constructed voice recognition model, obtaining a voice recognition result in a text format, and outputting the voice recognition result.
2. The method of claim 1, wherein the step S11 further comprises:
step S110, converting the voice data into a data format suitable for acoustic model processing, namely mono, 16-bit, 16000 Hz sampling rate WAV or PCM audio;
step S111, identifying end points in the voice signal to determine the speaking start point and the speaking end point of the user;
step S112, eliminating background noise through scene analysis: first judging the scene type of the voice signal and, once the scene type is determined, performing noise elimination according to the preset noise elimination mode corresponding to that scene;
and step S113, responding to the user's latest voice signal in real time, and interrupting or continuing the current task according to the set interrupt processing mode.
3. The method of claim 2, wherein the step S111 further comprises:
when a received voice signal triggers voice recognition, judging from the user login information whether the speaking habits of the corresponding user are stored; if so, performing endpoint detection with detection parameters adjusted to the user's speaking habits; if not, performing endpoint detection with the universal detection parameters.
4. The method of claim 3, wherein in step S111, the endpoints in the speech signal are identified by a time domain feature method and a frequency domain feature method.
5. The method according to claim 2, wherein the step S112 specifically includes:
presetting different interrupt processing mode types for different users, wherein the interrupt processing mode types comprise an interruptible mode and a non-interruptible mode;
when voice recognition is performed in real time and a new voice signal is received, judging the current interrupt processing mode type;
if the current mode is the interruptible mode, interrupting the response of the current voice recognition when the new voice instruction is received;
and if the current mode is the non-interruptible mode, accepting a new voice signal instruction only after execution of the current instruction is finished.
6. The method of claim 5, wherein the step S12 further comprises:
and performing feature extraction on the preprocessed voice signal by adopting a linear predictive coding technology to obtain the voice feature in the voice signal.
7. The method of any of claims 1 to 6, further comprising pre-constructing a speech recognition model, wherein the speech recognition model comprises an acoustic model, a dictionary and a language model; specifically:
training voice characteristics corresponding to voice sample data in a voice database to obtain mapping from the voice characteristics to phonemes to form an acoustic model;
training the texts in a text database to obtain the mappings between words and sentences, forming the language model;
and constructing a mapping relation between the voice and the characters according to the acoustic model and the language model to form a dictionary.
CN202011059062.1A, filed 2020-09-30 (priority date 2020-09-30), published as CN112185393A (pending): Voice recognition processing method for power supply intelligent client

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011059062.1A CN112185393A (en) 2020-09-30 2020-09-30 Voice recognition processing method for power supply intelligent client

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011059062.1A CN112185393A (en) 2020-09-30 2020-09-30 Voice recognition processing method for power supply intelligent client

Publications (1)

Publication Number Publication Date
CN112185393A (en) 2021-01-05

Family

ID=73947098

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011059062.1A Pending CN112185393A (en) 2020-09-30 2020-09-30 Voice recognition processing method for power supply intelligent client

Country Status (1)

Country Link
CN (1) CN112185393A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109509471A * 2018-12-28 2019-03-22 浙江百应科技有限公司 Method for interrupting an intelligent voice robot dialogue based on a VAD algorithm
CN109859774A * 2019-01-02 2019-06-07 珠海格力电器股份有限公司 Voice device and its endpoint detection sensitivity adjustment method, apparatus and storage medium
CN110299152A (en) * 2019-06-28 2019-10-01 北京猎户星空科技有限公司 Interactive output control method, device, electronic equipment and storage medium
CN110517697A * 2019-08-20 2019-11-29 中信银行股份有限公司 Intelligent prompt-tone interruption device for interactive voice response
CN111540349A (en) * 2020-03-27 2020-08-14 北京捷通华声科技股份有限公司 Voice interruption method and device

Similar Documents

Publication Publication Date Title
CN107437415B (en) Intelligent voice interaction method and system
WO2021128741A1 (en) Voice emotion fluctuation analysis method and apparatus, and computer device and storage medium
US9916826B1 (en) Targeted detection of regions in speech processing data streams
WO2017084360A1 (en) Method and system for speech recognition
CN111429899A (en) Speech response processing method, device, equipment and medium based on artificial intelligence
CN109545197B (en) Voice instruction identification method and device and intelligent terminal
US7177810B2 (en) Method and apparatus for performing prosody-based endpointing of a speech signal
CN110364178B (en) Voice processing method and device, storage medium and electronic equipment
CN112185392A (en) Voice recognition processing system for power supply intelligent client
CN112614514B (en) Effective voice fragment detection method, related equipment and readable storage medium
CN112825248A (en) Voice processing method, model training method, interface display method and equipment
CN109215634A (en) A kind of method and its system of more word voice control on-off systems
CN112185385A (en) Intelligent client processing method and system for power supply field
CN110767240B (en) Equipment control method, equipment, storage medium and device for identifying child accent
CN112071310A (en) Speech recognition method and apparatus, electronic device, and storage medium
CN114385800A (en) Voice conversation method and device
CN114708856A (en) Voice processing method and related equipment thereof
CN110853669B (en) Audio identification method, device and equipment
CN115512687B (en) Voice sentence-breaking method and device, storage medium and electronic equipment
JP3721948B2 (en) Voice start edge detection method, voice section detection method in voice recognition apparatus, and voice recognition apparatus
Hirschberg et al. Generalizing prosodic prediction of speech recognition errors
CN112185393A (en) Voice recognition processing method for power supply intelligent client
KR20050049207A (en) Dialogue-type continuous speech recognition system and using it endpoint detection method of speech
CN112185365A (en) Power supply intelligent client processing method and system
CN115331670A (en) Off-line voice remote controller for household appliances

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination