CN113593584A - Electronic product voice control system capable of effectively restraining response time delay - Google Patents

Publication number
CN113593584A
Authority
CN
China
Prior art keywords
voice
module
user
instruction
response time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111132401.9A
Other languages
Chinese (zh)
Other versions
CN113593584B (en
Inventor
高媛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Yuyi Digital Technology Co ltd
Original Assignee
Shenzhen Yuyi Digital Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Yuyi Digital Technology Co ltd
Priority to CN202111132401.9A
Publication of CN113593584A
Application granted
Publication of CN113593584B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a voice control system for an electronic product that effectively suppresses response time delay. It aims to solve the following technical problems of the prior art: the response delay is long, operating the electronic product by voice takes a long time, the practicability of voice control is reduced, use is inconvenient, the accuracy of voiceprint recognition is low, and instructions that do not match the user's intention are easily issued. The voice control system comprises a sample acquisition module, a voice acquisition module, a digital-to-analog conversion module, a storage module, a voice recognition module, a control module and a communication module. The voice control system makes it convenient for a plurality of users to control the electronic product by voice. It processes and judges voice data through the voice recognition module, which improves the precision of voiceprint comparison and makes the output instruction match the user's target instruction more closely. It also effectively suppresses the lengthening of response time through the response time preset in the control module, which shortens the time consumed by operating the electronic product by voice.

Description

Electronic product voice control system capable of effectively restraining response time delay
Technical Field
The invention belongs to the technical field of voice control, and particularly relates to a voice control system of an electronic product for effectively inhibiting response time delay.
Background
With the continuous development of computer technology, electronic products are widely used in daily life, and by combining with a voice recognition technology, the electronic products can be controlled by voice to perform corresponding actions, so that the use of the electronic products is further facilitated.
At present, patent No. CN201510989140.0 discloses a voice control device, including: a voice acquisition module for receiving a voice signal; a voice recognition module for generating voice features from the voice signal, judging the voice features according to the current working mode of the voice control device, and generating a voice command when the voice features match a voice template corresponding to the current working mode; a first communication module for wireless communication with an intelligent terminal; and a control module for generating a control instruction according to the voice command and sending the control instruction to the intelligent terminal through the first communication module, so that the intelligent terminal works according to the control instruction. That device judges the voice features according to the working mode, but it has a long response delay, operating an electronic product by voice takes a long time, the practicability of voice control is reduced, use is inconvenient, the precision of voiceprint recognition is low, and instructions that do not match the user's intention are easily issued.
Therefore, the long response delay and the low voiceprint recognition accuracy of voice control systems for electronic products need to be solved, so as to improve the usability of electronic products.
Disclosure of Invention
(1) Technical problem to be solved
In view of the deficiencies of the prior art, the present invention provides a voice control system for an electronic product capable of effectively suppressing response time delay, which aims to solve the technical problems of the prior art that the response delay time is long, the time consumed for operating the electronic product through voice is long, the practicability of voice control is reduced, the use is inconvenient, the accuracy of voiceprint recognition is low, and an instruction inconsistent with the intention of a user is easy to issue.
(2) Technical scheme
In order to solve the above technical problems, the present invention provides a voice control system for electronic products, which effectively suppresses response time delay, the voice control system comprising a sample acquisition module, a voice acquisition module, a digital-to-analog conversion module, a storage module, a voice recognition module, a control module and a communication module,
the sample acquisition module comprises sample creation, voice recording, feature extraction and model training; through sample creation, the sample acquisition module can set up a plurality of users, so that different users can conveniently send instructions to the electronic product; voice recording is used for acquiring the voice data of a user, and the recorded voice content comprises wake-up words and keywords; feature extraction extracts voice features from the acquired voice data according to the particularity and the stability of the voice;
the voice acquisition module is used for acquiring voice data sent out in the surrounding environment;
the digital-to-analog conversion module is used for converting the acquired analog signals into digital signals that are convenient to process, reducing or weakening the influence of noise and improving the accuracy of the acquired voice data, and a conversion algorithm is preset in the digital-to-analog conversion module: a continuously varying signal x(t) is converted into a time-discrete sampled signal x(n) at a sampling rate fs = 2.5 fmax, the instantaneous analog value output by sampling is held for a period of time, the continuous-amplitude sampled signal is converted into a digital signal that is discrete in both time and amplitude, which produces a quantization error, and the quantized signal is encoded into a binary code for output;
the storage module comprises an instruction library, a model library and a text library, wherein each preset instruction for controlling the electronic product to complete corresponding operation is arranged in the instruction library and consists of an operation code and an address code, the model library contains personal voiceprint templates of all users, and the text library contains preset words or sentences;
instruction comparison rules are preset in the voice recognition module: the obtained input instruction I is compared in sequence with the instructions in the instruction library k (I1, I2, I3, ..., In); in the first comparison, I is compared with I1, and if the matching degree P1 of I and I1 is greater than or equal to 70%, I1 is kept in the result, otherwise the result is 0; in the second comparison, I is compared with I2, and if the matching degree P2 of I and I2 is less than 70%, the previous result is kept unchanged; if P2 is greater than or equal to 70%, the previous result is taken into account: if the previous result is I1, I2 is kept in the result when P1 is less than or equal to P2 and I1 is kept otherwise, and if the previous result is 0, I2 is kept in the result; the comparison continues in this way until I has been compared with all instructions in the instruction library k, the final result Ix is taken as the final output instruction, and if the final result is 0, the instruction is invalid;
the control module is used for commanding each module to complete sample collection, voice collection, digital-to-analog conversion, storage, voice recognition and communication work within a specified time according to requirements;
the communication module is used for sending the final instruction to the electronic product, so that the electronic product can make corresponding operation according to the voice of a user, and the control module is preset with response time.
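The conversion algorithm listed above (sampling at fs = 2.5 fmax, holding, quantizing, binary encoding) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `sample_and_quantize`, the 8-bit depth and the input range of [-1, 1] are hypothetical and are not given in the patent.

```python
import math


def sample_and_quantize(x, f_max, duration, bits=8):
    """Sample x(t) at fs = 2.5 * f_max, quantize uniformly, encode as binary strings.

    x        : callable giving the amplitude at time t, assumed to lie in [-1, 1]
    f_max    : highest frequency present in the signal, in Hz
    duration : length of signal to convert, in seconds
    bits     : assumed bit depth (the source does not specify one)
    """
    fs = 2.5 * f_max                        # sampling rate stated in the source
    n_samples = int(duration * fs)
    step = 2.0 / (2 ** bits - 1)            # uniform step over the range [-1, 1]
    codes, errors = [], []
    for n in range(n_samples):
        held = x(n / fs)                    # x(n): value of x(t) held at t = n / fs
        held = max(-1.0, min(1.0, held))    # clip to the assumed input range
        index = round((held + 1.0) / step)  # nearest quantization level
        quantized = index * step - 1.0
        errors.append(held - quantized)     # quantization error of this sample
        codes.append(format(index, f"0{bits}b"))  # binary code word
    return fs, codes, errors


# Example: a 100 Hz tone, so fs = 250 Hz.
fs, codes, errors = sample_and_quantize(
    lambda t: math.sin(2 * math.pi * 100 * t), f_max=100, duration=0.02
)
```

With uniform quantization the error of each sample stays within half a quantization step, which is why a larger bit depth reduces the error term mentioned in the text.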
When the voice control system of this technical scheme is used, the voice acquisition module first collects the user's voice command. The digital-to-analog conversion module converts the continuously varying signal x(t) into a time-discrete sampled signal x(n) at a sampling rate fs = 2.5 fmax, holds the instantaneous analog value output by sampling for a period of time, converts the continuous-amplitude sampled signal into a digital signal that is discrete in both time and amplitude, which produces a quantization error, and encodes the quantized signal into a binary code for output. The voice recognition module then compares the obtained input instruction I in sequence with the instructions in the instruction library k (I1, I2, I3, ..., In). In the first comparison, I is compared with I1; if the matching degree P1 of I and I1 is greater than or equal to 70%, I1 is kept in the result, otherwise the result is 0. In the second comparison, I is compared with I2; if the matching degree P2 is less than 70%, the previous result is kept unchanged. If P2 is greater than or equal to 70%, the previous result is taken into account: if the previous result is I1, I2 is kept in the result when P1 is less than or equal to P2 and I1 is kept otherwise; if the previous result is 0, I2 is kept in the result. The comparison continues in this way until I has been compared with all instructions in the instruction library k. The final result Ix is taken as the final output instruction, and if the final result is 0, the instruction is invalid. Finally, the control module sends the instruction to be output to the electronic product through the communication module, so that the electronic product performs the corresponding operation.
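The instruction comparison rule amounts to a single pass over the instruction library that keeps the best candidate at or above 70%, with ties going to the later instruction. A minimal sketch follows. The patent does not define how the matching degree P is computed, so `matching_degree` uses Python's `difflib.SequenceMatcher` purely as a stand-in, and the names `select_instruction` and `library` are hypothetical.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.70  # minimum matching degree stated in the source


def matching_degree(a, b):
    # Stand-in similarity measure (assumption); the source does not define P.
    return SequenceMatcher(None, a, b).ratio()


def select_instruction(target, library, degree=matching_degree):
    """Return the best-matching library instruction, or None for the "0" result."""
    result, best = None, 0.0
    for candidate in library:
        p = degree(target, candidate)
        if p < THRESHOLD:
            continue                      # below 70%: keep the previous result
        if result is None or best <= p:
            result, best = candidate, p   # a tie goes to the later instruction
    return result


library = ["turn on light", "turn off light", "volume up"]
print(select_instruction("turn on the light", library))
print(select_instruction("play music", library))
```

Passing the similarity function in as `degree` keeps the selection rule separate from whichever matching measure an implementation actually uses.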
Preferably, the particularity of the voice includes tone quality, duration, intensity and pitch; the model training models the speaker according to these sound characteristics and establishes a personal voiceprint template dedicated to the user; and the sample acquisition module can set up at most 3 users, namely user 1, user 2 and user 3.
Preferably, a voice recording end judgment rule is preset in the voice acquisition module: when no voice information is acquired within 1 s, the voice acquisition module judges that the voice input is finished, and when the total acquisition time exceeds 15 s, the voice acquisition module automatically stops the voice recording operation.
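The end-of-recording rule (1 s without voice, as stated in Example 1, or a total acquisition time above 15 s) can be sketched as a loop over fixed-length frames. The frame length of 100 ms, the per-frame voice flag and the name `recording_end_ms` are assumptions made for illustration only.

```python
SILENCE_LIMIT_MS = 1000   # no voice for 1 s: input is judged to be finished
MAX_TOTAL_MS = 15000      # recording stops once the total time exceeds 15 s


def recording_end_ms(frames, frame_ms=100):
    """Return the elapsed time in ms at which recording stops.

    frames   : iterable of booleans, True when voice is detected in a frame
    frame_ms : assumed frame length in milliseconds (not given in the source)
    """
    elapsed = 0
    silent = 0
    for has_voice in frames:
        elapsed += frame_ms
        silent = 0 if has_voice else silent + frame_ms
        if silent >= SILENCE_LIMIT_MS or elapsed > MAX_TOTAL_MS:
            break
    return elapsed


# 2 s of speech followed by silence: recording ends after 1 s of silence.
print(recording_end_ms([True] * 20 + [False] * 30))
```

Integer milliseconds are used instead of floating-point seconds so that the 1 s comparison is exact.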
Preferably, the voice recognition module includes a voiceprint recognition processing unit, a text conversion processing unit, a semantic parsing unit, and an instruction comparison unit, the voiceprint recognition processing unit compares the voice data collected by the voice collection module with the personal voiceprint template in the model library, the text conversion processing unit converts the voice data into text information, the semantic parsing unit performs semantic check and processing on the text information to generate a corresponding target instruction, the instruction comparison unit compares the target instruction generated by the semantic parsing unit with an instruction in the instruction library to determine whether the target instruction generated by the semantic parsing unit needs to be output.
Preferably, a recognition algorithm is preset in the voiceprint recognition processing unit. First, it is judged whether the wake-up word is correct; if it is not, the voice data is invalid. If the wake-up word is correct, the personal voiceprint template of user 1 in the model library is called and compared with the wake-up word data collected by the voice acquisition module in terms of tone quality, duration, intensity and pitch. If the similarity value exceeds 95%, the wake-up word data is judged to belong to user 1; if the similarity value is lower than 95%, the personal voiceprint template of user 2 in the model library is called for comparison. If that similarity value exceeds 95%, the wake-up word data is judged to belong to user 2; if it is lower than 95%, the personal voiceprint template of user 3 in the model library is called for comparison. If that similarity value exceeds 95%, the wake-up word data is judged to belong to user 3; if it is lower than 95%, the wake-up word data is judged to be invalid. If the wake-up word data is valid, the personal voiceprint template of user 1 in the model library is called and compared with the instruction voice data collected by the voice acquisition module in terms of tone quality, duration, intensity and pitch. If the similarity value exceeds 95%, the instruction voice data is judged to belong to user 1; if it is lower than 95%, the personal voiceprint template of user 2 in the model library is called for comparison. If that similarity value exceeds 95%, the instruction voice data is judged to belong to user 2; if it is lower than 95%, the personal voiceprint template of user 3 in the model library is called for comparison. If that similarity value exceeds 95%, the instruction voice data is judged to belong to user 3; if it is lower than 95%, the instruction voice data is judged to be invalid.
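The voiceprint check is a cascade: the templates of user 1, user 2 and user 3 are tried in order, first on the wake-up word data and then on the instruction voice data. The sketch below illustrates that control flow only. The patent gives no similarity formula, so `similarity` is a hypothetical per-feature closeness score, the feature names are invented for the example, and a value of exactly 95% is treated as a non-match by assumption.

```python
SIMILARITY_THRESHOLD = 0.95
FEATURES = ("tone_quality", "duration", "intensity", "pitch")


def similarity(template, sample):
    # Assumed measure: mean per-feature closeness in [0, 1]. Not from the source.
    scores = []
    for name in FEATURES:
        t, s = template[name], sample[name]
        scale = max(abs(t), abs(s), 1e-9)
        scores.append(1.0 - abs(t - s) / scale)
    return sum(scores) / len(scores)


def identify_speaker(sample, templates):
    """Try each user's template in order; return the first user above 95%, else None."""
    for user, template in templates.items():
        if similarity(template, sample) > SIMILARITY_THRESHOLD:
            return user
    return None


def verify_utterance(wake_word_correct, wake_sample, command_sample, templates):
    """Return the user the instruction voice data belongs to, or None if invalid."""
    if not wake_word_correct:
        return None                       # wrong wake-up word: voice data invalid
    if identify_speaker(wake_sample, templates) is None:
        return None                       # wake-up word matches no registered user
    return identify_speaker(command_sample, templates)


templates = {
    "user 1": dict.fromkeys(FEATURES, 1.0),
    "user 2": dict.fromkeys(FEATURES, 2.0),
    "user 3": dict.fromkeys(FEATURES, 4.0),
}
sample = dict.fromkeys(FEATURES, 2.0)
print(verify_utterance(True, sample, sample, templates))
```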
Preferably, the text conversion processing unit cuts the audio into frames according to the text library in the storage module, matches the frames with the phrases in the text library, and then converts them into text data.
Preferably, the semantic parsing unit includes text preprocessing, text feature extraction and classification model construction. A dictionary table is preset in the text preprocessing: a sentence is split into several parts, and each part is looked up in the dictionary table; if the part is found in the dictionary table, the word segmentation succeeds, otherwise splitting and matching continue until the segmentation succeeds.
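One common way to realize dictionary-based splitting of this kind is forward maximum matching, where the longest dictionary word is taken at each position. The patent does not name a specific algorithm, so the sketch below is an assumption, as are the function name `segment`, the `max_len` limit and the single-character fallback.

```python
def segment(sentence, dictionary, max_len=5):
    """Forward maximum matching: take the longest dictionary word at each position."""
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, then shrink until a match is found.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if piece in dictionary or size == 1:
                words.append(piece)   # a single character is accepted as a fallback
                i += size
                break
    return words


dictionary = {"turn", "on", "light"}
print(segment("turnonlight", dictionary))  # prints ['turn', 'on', 'light']
```

The fallback to single characters guarantees that the loop always advances, which corresponds to continuing to split and match until segmentation succeeds.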
Preferably, the response time preset in the control module includes a wake-up word response time and a voice dialog response time, wherein the wake-up word response time is 200 ms to 500 ms and the voice dialog response time is 650 ms to 1050 ms.
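The patent states the preset response time windows but not how the control module enforces them. One plausible reading, sketched below as an assumption, is to treat the upper bound of each window as a deadline and to check each operation against it. The names `RESPONSE_WINDOWS_MS` and `timed_call` are hypothetical.

```python
import time

# Response time windows from the source, in milliseconds: (lower, upper).
RESPONSE_WINDOWS_MS = {
    "wake_word": (200, 500),
    "dialog": (650, 1050),
}


def timed_call(kind, func, *args):
    """Run func and report whether it finished within the preset upper bound.

    Returns (result, elapsed_ms, on_time).
    """
    start = time.monotonic()
    result = func(*args)
    elapsed_ms = (time.monotonic() - start) * 1000.0
    upper = RESPONSE_WINDOWS_MS[kind][1]
    return result, elapsed_ms, elapsed_ms <= upper


result, elapsed_ms, on_time = timed_call("wake_word", lambda: "awake")
```

A caller could use the `on_time` flag to abandon or shorten later processing stages when a deadline is missed.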
(3) Advantageous effects
Compared with the prior art, the invention has the following beneficial effects: the voice control system can record voice samples of a plurality of users through the sample acquisition module, which makes it convenient for a plurality of users to control the electronic product by voice; it processes and judges voice data through the voice recognition module, which improves the precision of voiceprint comparison and makes the output instruction match the user's target instruction more closely; and it effectively suppresses the lengthening of response time through the response time preset in the control module, which shortens the time consumed by operating the electronic product by voice and improves the practicability of voice control.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the technical solutions in the prior art will be briefly described below, it is obvious that the drawings in the following description are only one embodiment of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a schematic diagram of an overall framework architecture of an embodiment of a voice control system of the present invention;
FIG. 2 is a schematic diagram of a sample acquisition module frame according to one embodiment of the voice control system of the present invention;
FIG. 3 is a block diagram of a frame structure of a speech recognition module according to an embodiment of the speech control system of the present invention;
FIG. 4 is a schematic diagram of the frame structure of the storage module according to an embodiment of the voice control system of the present invention.
Detailed Description
In order to make the technical means, the original characteristics, the achieved purposes and the effects of the invention easily understood and obvious, the technical solutions in the embodiments of the present invention are clearly and completely described below to further illustrate the invention, and obviously, the described embodiments are only a part of the embodiments of the present invention, but not all the embodiments.
Example 1
The voice control system of the electronic product for effectively inhibiting the response time delay comprises a sample acquisition module, a voice acquisition module, a digital-to-analog conversion module, a storage module, a voice recognition module, a control module and a communication module,
the sample acquisition module comprises sample creation, voice recording, feature extraction and model training; through sample creation, the sample acquisition module can set up a plurality of users, so that different users can conveniently send instructions to the electronic product; voice recording is used for acquiring the voice data of a user, and the recorded voice content comprises wake-up words and keywords; feature extraction extracts voice features from the acquired voice data according to the particularity and the stability of the voice;
the voice acquisition module is used for acquiring voice data sent out in the surrounding environment;
the digital-to-analog conversion module is used for converting the acquired analog signals into digital signals that are convenient to process, reducing or weakening the influence of noise and improving the accuracy of the acquired voice data, and a conversion algorithm is preset in the digital-to-analog conversion module: a continuously varying signal x(t) is converted into a time-discrete sampled signal x(n) at a sampling rate fs = 2.5 fmax, the instantaneous analog value output by sampling is held for a period of time, the continuous-amplitude sampled signal is converted into a digital signal that is discrete in both time and amplitude, which produces a quantization error, and the quantized signal is encoded into a binary code for output;
the storage module comprises an instruction library, a model library and a text library, wherein each preset instruction for controlling the electronic product to complete corresponding operation is arranged in the instruction library and consists of an operation code and an address code, the model library contains personal voiceprint templates of all users, and the text library contains preset words or sentences;
instruction comparison rules are preset in the voice recognition module: the obtained input instruction I is compared in sequence with the instructions in the instruction library k (I1, I2, I3, ..., In); in the first comparison, I is compared with I1, and if the matching degree P1 of I and I1 is greater than or equal to 70%, I1 is kept in the result, otherwise the result is 0; in the second comparison, I is compared with I2, and if the matching degree P2 of I and I2 is less than 70%, the previous result is kept unchanged; if P2 is greater than or equal to 70%, the previous result is taken into account: if the previous result is I1, I2 is kept in the result when P1 is less than or equal to P2 and I1 is kept otherwise, and if the previous result is 0, I2 is kept in the result; the comparison continues in this way until I has been compared with all instructions in the instruction library k, the final result Ix is taken as the final output instruction, and if the final result is 0, the instruction is invalid;
the control module is used for commanding each module to complete sample collection, voice collection, digital-to-analog conversion, storage, voice recognition and communication work within a specified time according to requirements;
the communication module is used for sending the final instruction to the electronic product, so that the electronic product can make corresponding operation according to the voice of a user, and the response time is preset in the control module.
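Each preset instruction in the instruction library is described as an operation code plus an address code. A minimal sketch of such a record is shown below. The field widths (4 bits each), the example entries and the names `Instruction`, `decode` and `INSTRUCTION_LIBRARY` are all hypothetical, since the patent specifies only that the two parts exist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    name: str
    opcode: int    # operation code: what to do (assumed 4-bit field)
    address: int   # address code: which function or device to act on (assumed 4-bit field)

    def encode(self) -> int:
        """Pack the operation code and address code into one 8-bit word."""
        if not (0 <= self.opcode < 16 and 0 <= self.address < 16):
            raise ValueError("opcode and address must each fit in 4 bits")
        return (self.opcode << 4) | self.address


def decode(word: int):
    """Split an 8-bit word back into (opcode, address)."""
    return word >> 4, word & 0x0F


# Hypothetical library entries, keyed by the spoken command text.
INSTRUCTION_LIBRARY = {
    "turn on light": Instruction("turn on light", opcode=0x1, address=0x2),
    "turn off light": Instruction("turn off light", opcode=0x0, address=0x2),
}

word = INSTRUCTION_LIBRARY["turn on light"].encode()
```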
Here, the particularity of the voice includes tone quality, duration, intensity and pitch; the model training models the speaker according to these sound characteristics and establishes a personal voiceprint template dedicated to the user. The sample acquisition module can set up at most 3 users, namely user 1, user 2 and user 3. A voice recording end judgment rule is preset in the voice acquisition module: when no voice information is acquired within 1 s, the voice acquisition module judges that the voice input is finished, and when the total acquisition time exceeds 15 s, the voice acquisition module automatically stops the voice recording operation. The voice recognition module comprises a voiceprint recognition processing unit, a text conversion processing unit, a semantic parsing unit and an instruction comparison unit. The voiceprint recognition processing unit compares the voice data acquired by the voice acquisition module with the personal voiceprint templates in the model library. The text conversion processing unit converts the voice data into text information. The semantic parsing unit performs semantic checking and processing on the text information to generate a corresponding target instruction. The instruction comparison unit compares the target instruction generated by the semantic parsing unit with the instructions in the instruction library to judge whether the target instruction needs to be output.
Meanwhile, a recognition algorithm is preset in the voiceprint recognition processing unit. First, it is judged whether the wake-up word is correct; if it is not, the voice data is invalid. If the wake-up word is correct, the personal voiceprint template of user 1 in the model library is called and compared with the wake-up word data collected by the voice acquisition module in terms of tone quality, duration, intensity and pitch. If the similarity value exceeds 95%, the wake-up word data is judged to belong to user 1; if the similarity value is lower than 95%, the personal voiceprint template of user 2 in the model library is called for comparison. If that similarity value exceeds 95%, the wake-up word data is judged to belong to user 2; if it is lower than 95%, the personal voiceprint template of user 3 in the model library is called for comparison. If that similarity value exceeds 95%, the wake-up word data is judged to belong to user 3; if it is lower than 95%, the wake-up word data is judged to be invalid. If the wake-up word data is valid, the personal voiceprint template of user 1 in the model library is called and compared with the instruction voice data collected by the voice acquisition module in terms of tone quality, duration, intensity and pitch. If the similarity value exceeds 95%, the instruction voice data is judged to belong to user 1; if it is lower than 95%, the personal voiceprint template of user 2 in the model library is called for comparison. If that similarity value exceeds 95%, the instruction voice data is judged to belong to user 2; if it is lower than 95%, the personal voiceprint template of user 3 in the model library is called for comparison. If that similarity value exceeds 95%, the instruction voice data is judged to belong to user 3; if it is lower than 95%, the instruction voice data is judged to be invalid.
In addition, the text conversion processing unit cuts the audio into frames according to the text library in the storage module, matches the frames with the phrases in the text library, and then converts them into text data.
In addition, the semantic parsing unit comprises text preprocessing, text feature extraction and classification model construction. A dictionary table is preset in the text preprocessing: a sentence is split into several parts, and each part is looked up in the dictionary table; if the part is found in the dictionary table, the word segmentation succeeds, otherwise splitting and matching continue until the segmentation succeeds. The response time preset in the control module comprises a wake-up word response time and a voice dialog response time, wherein the wake-up word response time is 200 ms to 500 ms and the voice dialog response time is 650 ms to 1050 ms.
A schematic diagram of a frame structure of a sample collection module of the speech control system is shown in fig. 2, a schematic diagram of a frame structure of a speech recognition module thereof is shown in fig. 3, and a schematic diagram of a frame structure of a storage module thereof is shown in fig. 4.
When the voice control system of this technical scheme is used, the voice acquisition module first collects the user's voice command. The digital-to-analog conversion module converts the continuously varying signal x(t) into a time-discrete sampled signal x(n) at a sampling rate fs = 2.5 fmax, holds the instantaneous analog value output by sampling for a period of time, converts the continuous-amplitude sampled signal into a digital signal that is discrete in both time and amplitude, which produces a quantization error, and encodes the quantized signal into a binary code for output. The voice recognition module then compares the obtained input instruction I in sequence with the instructions in the instruction library k (I1, I2, I3, ..., In). In the first comparison, I is compared with I1; if the matching degree P1 of I and I1 is greater than or equal to 70%, I1 is kept in the result, otherwise the result is 0. In the second comparison, I is compared with I2; if the matching degree P2 is less than 70%, the previous result is kept unchanged. If P2 is greater than or equal to 70%, the previous result is taken into account: if the previous result is I1, I2 is kept in the result when P1 is less than or equal to P2 and I1 is kept otherwise; if the previous result is 0, I2 is kept in the result. The comparison continues in this way until I has been compared with all instructions in the instruction library k. The final result Ix is taken as the final output instruction, and if the final result is 0, the instruction is invalid. Finally, the control module sends the instruction to be output to the electronic product through the communication module, so that the electronic product performs the corresponding operation.
Having thus described the principal technical features and basic principles of the invention, and the advantages associated therewith, it will be apparent to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, but is capable of other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
Furthermore, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution. The specification is written in this way for clarity only; those skilled in the art should treat the specification as a whole, and the technical solutions in the embodiments can be suitably combined to form other embodiments that those skilled in the art can understand.

Claims (8)

1. An electronic product voice control system for effectively suppressing response time delay is characterized in that: the voice control system comprises a sample acquisition module, a voice acquisition module, a digital-to-analog conversion module, a storage module, a voice recognition module, a control module and a communication module,
the sample acquisition module comprises sample creation, voice recording, feature extraction and model training; through sample creation, the sample acquisition module can set up a plurality of users, so that different users can conveniently send instructions to the electronic product; voice recording is used for acquiring the voice data of a user, and the recorded voice content comprises wake-up words and keywords; feature extraction extracts voice features from the acquired voice data according to the particularity and the stability of the voice;
the voice acquisition module is used for acquiring voice data sent out in the surrounding environment;
the digital-to-analog conversion module is used for converting the acquired analog signals into digital signals convenient to process, reducing or weakening noise influence and improving the accuracy of the acquired voice data, and a conversion algorithm is preset in the digital-to-analog conversion module: converting a continuously variable signal x (t) into a time-discrete sampling signal x (n) with a sampling rate fs =2.5fmax, holding a sampling output instantaneous analog signal for a period of time, converting a continuous amplitude sampling signal into a discrete time and discrete amplitude digital signal, quantizing an error, and encoding the quantized signal into a binary code for output;
the storage module comprises an instruction library, a model library and a text library, wherein each preset instruction for controlling the electronic product to complete corresponding operation is arranged in the instruction library and consists of an operation code and an address code, the model library contains personal voiceprint templates of all users, and the text library contains preset words or sentences;
instruction comparison rules are preset in the voice recognition module: comparing the obtained input command I with commands In a command library k (I1, I2 and I3 … … In) In sequence, firstly, carrying out first comparison, comparing the I with I1, if the matching degree P1 of the I with I1 is more than or equal to 70%, keeping I1 In the result, if not, keeping the result to be 0, then carrying out second comparison, comparing the I with I2, if P2 is less than 70%, keeping the last result unchanged, if the matching degree P2 of the I with I2 is more than or equal to 70%, combining the last result, if the last result keeps I1, if P1 is less than or equal to P2, keeping I2 In the result, otherwise keeping I1, if the last comparison result is 0, keeping I2 In the result until the I is compared with all commands In the command library k, taking the final result Ix as the output of the final command, and if the final result is 0, invalidating the command;
the control module is used for commanding each module to complete sample collection, voice collection, digital-to-analog conversion, storage, voice recognition and communication work within a specified time according to requirements;
the communication module is used for sending the final instruction to the electronic product, so that the electronic product can make corresponding operation according to the voice of a user, and the control module is preset with response time.
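As a non-limiting illustration, the instruction comparison rule of claim 1 can be sketched in Python as follows. The claim does not say how the matching degree P is computed, so `matching_degree` (based on the standard-library `difflib.SequenceMatcher`) is an assumption, as are the names `select_instruction` and `THRESHOLD`; the result "0" of the claim is represented by `None`.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional, Sequence

THRESHOLD = 0.70  # the 70% matching degree stated in claim 1


def matching_degree(input_instruction: str, candidate: str) -> float:
    """Assumed similarity measure in [0, 1]; the claim does not specify one."""
    return SequenceMatcher(None, input_instruction, candidate).ratio()


def select_instruction(
    input_instruction: str,
    library: Sequence[str],
    threshold: float = THRESHOLD,
    match: Callable[[str, str], float] = matching_degree,
) -> Optional[str]:
    """Compare the input with every library instruction in order.

    Returns the retained instruction Ix, or None (the "0" result) when no
    library instruction reaches the threshold.
    """
    result: Optional[str] = None  # corresponds to the result "0"
    result_p = 0.0
    for candidate in library:
        p = match(input_instruction, candidate)
        if p < threshold:
            continue  # previous result is kept unchanged
        # A later instruction replaces the retained one when its matching
        # degree is greater than or equal to the retained one (P1 <= P2).
        if result is None or result_p <= p:
            result, result_p = candidate, p
    return result
```

Because every instruction in the library is visited exactly once, the running time of this sketch grows linearly with the size of the instruction library, and a tie between two instructions is resolved in favour of the later one, as the rule states.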
2. The system as claimed in claim 1, wherein the specificity of the voice comprises tone quality, duration, intensity and pitch; the model training models the speaker according to the voice features and establishes a personal voiceprint template specific to the user; and the sample acquisition module can establish at most 3 users, namely user 1, user 2 and user 3.
3. The electronic product voice control system for effectively suppressing response time delay according to claim 1, wherein a voice recording end judgment rule is preset in the voice acquisition module: when the total acquisition time exceeds 15 s, the voice acquisition module automatically stops the voice recording operation.
4. The electronic product voice control system for effectively suppressing response time delay as claimed in claim 1, wherein the voice recognition module comprises a voiceprint recognition processing unit, a text conversion processing unit, a semantic parsing unit and an instruction comparison unit; the voiceprint recognition processing unit compares the voice data collected by the voice acquisition module with the personal voiceprint templates in the model library; the text conversion processing unit converts the voice data into text information; the semantic parsing unit performs semantic checking and processing on the text information to generate a corresponding target instruction; and the instruction comparison unit compares the target instruction generated by the semantic parsing unit with the instructions in the instruction library to determine whether the target instruction needs to be output.
5. The electronic product voice control system for effectively suppressing response time delay as claimed in claim 4, wherein a recognition algorithm is preset in the voiceprint recognition processing unit: first, whether the wake-up word is correct is judged; if it is not correct, the voice data is invalid; if the wake-up word is correct, the personal voiceprint template of user 1 in the model library is called and compared with the wake-up word data collected by the voice acquisition module in the four aspects of tone quality, duration, intensity and pitch; if the similarity value exceeds 95%, the wake-up word data is judged to belong to user 1; if the similarity value is lower than 95%, the personal voiceprint template of user 2 in the model library is called for comparison, and if the similarity value exceeds 95%, the wake-up word data is judged to belong to user 2; if the similarity value is lower than 95%, the personal voiceprint template of user 3 in the model library is called for comparison, and if the similarity value exceeds 95%, the wake-up word data is judged to belong to user 3; if the similarity value is lower than 95%, the wake-up word data is judged to be invalid; if the wake-up word data is valid, the personal voiceprint template of user 1 in the model library is called and compared with the instruction voice data collected by the voice acquisition module in the four aspects of tone quality, duration, intensity and pitch; if the similarity value exceeds 95%, the instruction voice data is judged to belong to user 1; if the similarity value is lower than 95%, the personal voiceprint template of user 2 in the model library is called for comparison, and if the similarity value exceeds 95%, the instruction voice data is judged to belong to user 2; if the similarity value is lower than 95%, the personal voiceprint template of user 3 in the model library is called for comparison, and if the similarity value exceeds 95%, the instruction voice data is judged to belong to user 3; and if the similarity value is lower than 95%, the instruction voice data is judged to be invalid.
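As a non-limiting illustration, the cascaded comparison of claim 5 can be sketched in Python as follows. The claim does not state how the four aspects are measured or combined into one similarity value, so the per-feature measure `feature_similarity` and the simple mean in `voiceprint_similarity` are assumptions, as are all function and field names.

```python
from typing import Dict, Optional, Sequence, Tuple

FEATURES = ("tone_quality", "duration", "intensity", "pitch")
SIMILARITY_THRESHOLD = 0.95  # the 95% similarity value stated in claim 5

Voiceprint = Dict[str, float]


def feature_similarity(a: float, b: float) -> float:
    """Assumed per-feature similarity in [0, 1] (relative difference)."""
    denom = max(abs(a), abs(b))
    return 1.0 if denom == 0 else max(0.0, 1.0 - abs(a - b) / denom)


def voiceprint_similarity(sample: Voiceprint, template: Voiceprint) -> float:
    """Assumed aggregation: mean similarity over the four aspects."""
    total = sum(feature_similarity(sample[f], template[f]) for f in FEATURES)
    return total / len(FEATURES)


def identify_speaker(
    sample: Voiceprint, templates: Sequence[Tuple[str, Voiceprint]]
) -> Optional[str]:
    """Try user 1, then user 2, then user 3; None means the data is invalid."""
    for user, template in templates:
        if voiceprint_similarity(sample, template) > SIMILARITY_THRESHOLD:
            return user
    return None


def recognise(
    wake_word_correct: bool,
    wake_sample: Voiceprint,
    instruction_sample: Voiceprint,
    templates: Sequence[Tuple[str, Voiceprint]],
) -> Optional[str]:
    """Return the user the instruction voice data belongs to, or None."""
    if not wake_word_correct:
        return None  # wrong wake-up word: the voice data is invalid
    if identify_speaker(wake_sample, templates) is None:
        return None  # wake-up word matches no enrolled voiceprint
    return identify_speaker(instruction_sample, templates)
```

The templates are passed as an ordered sequence so that the comparison order (user 1, user 2, user 3) is explicit, and the search stops at the first template whose similarity exceeds the threshold.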
6. The system as claimed in claim 4, wherein the text conversion processing unit cuts the audio into frames, matches the frames against the words in the text library in the storage module, and then converts them into text data.
7. The system as claimed in claim 4, wherein the semantic parsing unit comprises text preprocessing, text feature extraction and classification model construction; a dictionary table is preset in the text preprocessing; a sentence is split into a plurality of parts, and each part is matched against the dictionary table; if the part is a word in the dictionary table, the word segmentation succeeds, otherwise splitting and matching continue until they succeed.
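As a non-limiting illustration, the dictionary-based splitting of claim 7 can be sketched in Python as follows. The claim does not name a particular splitting strategy; forward maximum matching is assumed here, and the function name `segment`, the single-character fallback and the sample dictionary are all hypothetical.

```python
from typing import List, Optional, Set


def segment(
    sentence: str, dictionary: Set[str], max_word_len: Optional[int] = None
) -> List[str]:
    """Split an unspaced sentence into words found in the dictionary table.

    At each position the longest candidate is tried first; if it is not in
    the dictionary, the candidate is shortened and matched again. A single
    character that matches nothing is emitted on its own (an assumption, so
    that the loop always makes progress).
    """
    if max_word_len is None:
        max_word_len = max((len(word) for word in dictionary), default=1)
    max_word_len = max(1, max_word_len)

    words: List[str] = []
    i = 0
    while i < len(sentence):
        longest = min(max_word_len, len(sentence) - i)
        for size in range(longest, 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


if __name__ == "__main__":
    table = {"turn", "on", "light"}
    print(segment("turnonlight", table))
```

Trying the longest candidate first means that a longer dictionary entry is preferred over a shorter entry that is its prefix.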
8. The system as claimed in claim 1, wherein the preset response time in the control module includes a wake-up word response time and a voice dialog response time, wherein the wake-up word response time is 200 ms to 500 ms, and the voice dialog response time is 650 ms to 1050 ms.
CN202111132401.9A 2021-09-27 2021-09-27 Electronic product voice control system for restraining response time delay Active CN113593584B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111132401.9A CN113593584B (en) 2021-09-27 2021-09-27 Electronic product voice control system for restraining response time delay

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111132401.9A CN113593584B (en) 2021-09-27 2021-09-27 Electronic product voice control system for restraining response time delay

Publications (2)

Publication Number Publication Date
CN113593584A (en) 2021-11-02
CN113593584B (en) 2022-01-18

Family

ID=78242411

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111132401.9A Active CN113593584B (en) 2021-09-27 2021-09-27 Electronic product voice control system for restraining response time delay

Country Status (1)

Country Link
CN (1) CN113593584B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104575504A (en) * 2014-12-24 2015-04-29 上海师范大学 Method for personalized television voice wake-up by voiceprint and voice identification
CN105575395A (en) * 2014-10-14 2016-05-11 中兴通讯股份有限公司 Voice wake-up method and apparatus, terminal, and processing method thereof
CN108766441A (en) * 2018-05-29 2018-11-06 广东声将军科技有限公司 A kind of sound control method and device based on offline Application on Voiceprint Recognition and speech recognition
CN110265040A (en) * 2019-06-20 2019-09-20 Oppo广东移动通信有限公司 Training method, device, storage medium and the electronic equipment of sound-groove model
CN110838297A (en) * 2019-11-07 2020-02-25 广东西欧克实业有限公司 Portable intelligent household voice control system
CN113039601A (en) * 2019-09-20 2021-06-25 深圳市汇顶科技股份有限公司 Voice control method, device, chip, earphone and system

Also Published As

Publication number Publication date
CN113593584B (en) 2022-01-18

Similar Documents

Publication Publication Date Title
CN111477216B (en) Training method and system for voice and meaning understanding model of conversation robot
TWI576825B (en) A voice recognition system of a robot system and method thereof
WO2004036939A1 (en) Portable digital mobile communication apparatus, method for controlling speech and system
US5689617A (en) Speech recognition system which returns recognition results as a reconstructed language model with attached data values
CN109377981B (en) Phoneme alignment method and device
US9026430B2 (en) Electronic device and natural language analysis method thereof
CN114120985A (en) Pacifying interaction method, system and equipment of intelligent voice terminal and storage medium
CN113327574A (en) Speech synthesis method, device, computer equipment and storage medium
CN111430044A (en) Natural language processing system and method of nursing robot
CN110852075A (en) Voice transcription method and device for automatically adding punctuation marks and readable storage medium
CN110767233A (en) Voice conversion system and method
WO2019184942A1 (en) Audio exchanging method and system employing linguistic semantics, and coding graph
CN113593584B (en) Electronic product voice control system for restraining response time delay
CN116665674A (en) Internet intelligent recruitment publishing method based on voice and pre-training model
Li An improved machine learning algorithm for text-voice conversion of English letters into phonemes
CN113012683A (en) Speech recognition method and device, equipment and computer readable storage medium
CN115019787A (en) Interactive homophonic and heteronym word disambiguation method, system, electronic equipment and storage medium
CN115050351A (en) Method and device for generating timestamp and computer equipment
CN109359307B (en) Translation method, device and equipment for automatically identifying languages
KR102107447B1 (en) Text to speech conversion apparatus for providing a translation function based on application of an optional speech model and operating method thereof
Ma et al. Russian speech recognition system design based on HMM
Jing et al. Acquisition of english corpus machine translation based on speech recognition technology
CN110782895A (en) Man-machine voice system based on artificial intelligence
CN211828113U (en) Voice coding and decoding system and device
CN110660394A (en) Text editing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant