CN117854506B - Robot voice intelligent interaction system - Google Patents

Robot voice intelligent interaction system

Info

Publication number
CN117854506B
Authority
CN
China
Prior art keywords
voice
preset
module
robot
sets
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410259405.0A
Other languages
Chinese (zh)
Other versions
CN117854506A (en
Inventor
李凌云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ludong University
Original Assignee
Ludong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ludong University filed Critical Ludong University
Priority to CN202410259405.0A priority Critical patent/CN117854506B/en
Publication of CN117854506A publication Critical patent/CN117854506A/en
Application granted granted Critical
Publication of CN117854506B publication Critical patent/CN117854506B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Manipulator (AREA)

Abstract

The invention relates to a robot voice intelligent interaction system, in particular to the technical field of voice recognition. The system comprises a first acquisition module, a voice filtering module, a second acquisition module, a voice analysis module, an adjustment module, an interaction control module, a correction module and an optimization module. The first acquisition module acquires voice information of a user. The voice filtering module filters the acquired user voice and generates phonemes from the filtered voice. The second acquisition module acquires phoneme information and framing information. The voice analysis module analyzes the validity of each phoneme, recognizes characters from the valid phonemes and generates text information. The adjustment module adjusts the analysis process of the validity of each phoneme. The interaction control module controls the state of the robot and analyzes and outputs the output content. The correction module corrects the control process of the state of the robot, and the optimization module optimizes the control process of the state of the robot in the next control period.

Description

Robot voice intelligent interaction system
Technical Field
The invention relates to the technical field of voice interaction, in particular to a robot voice intelligent interaction system.
Background
The intelligent robot voice interaction system filters the acquired user voice, analyzes the effectiveness of the acquired user voice, generates text information according to an analysis result, controls the state and output content of the robot according to the text information, and improves the voice recognition efficiency of the robot.
Chinese patent publication No. CN108133701B discloses a system and method for robot voice interaction, comprising: when a voice recognition request sent by an upper layer application is received, performing voice recognition on the collected audio signals to obtain a recognized text; reporting the recognized text for interface display by the upper layer application; obtaining a first voice operation instruction according to the recognized text; when the first voice operation instruction is a semantic understanding request, performing semantic understanding on the recognized text to obtain a corresponding voice instruction; reporting the voice instruction for interface display by the upper layer application; obtaining a second voice operation instruction according to the voice instruction; and when the second voice operation instruction is a voice synthesis request, performing voice synthesis on the voice instruction and playing it. This scheme only performs recognition processing on the voice and considers neither voice preprocessing nor the accuracy of phoneme recognition, so the voice recognition efficiency of the robot is low.
Disclosure of Invention
Therefore, the invention provides a robot voice intelligent interaction system which is used for solving the problem of low voice recognition efficiency of robots in the prior art.
In order to achieve the above object, the present invention provides a robot voice intelligent interaction system, the system comprising,
The first acquisition module is used for acquiring voice information of a user;
the voice filtering module is used for filtering the acquired voice of the user according to the decibel of the acquired voice frame voice and the voice duration of the user, and generating a phoneme from the filtered voice of the user;
The second acquisition module is used for acquiring the phoneme information and the framing information;
The voice analysis module is used for analyzing the effectiveness of each phoneme according to the acquired frame length of each phoneme, recognizing the characters according to the effective phonemes and generating text information;
The adjusting module is used for adjusting the analysis process of the effectiveness of each phoneme according to the acquired voice duration, the number of phonemes, the number of invalid voice frames and the number of voice frames of the user;
The interaction control module is used for controlling the state of the robot according to the generated text information and the matching result of the preset keywords, and analyzing the output content to output the output content;
the correction module is used for correcting the control process of the robot state according to the number of invalid phonemes, the number of phonemes and the frame shift duration;
and the optimizing module is used for optimizing the control process of the state of the robot in the next control period according to the number of times that the user does not operate after the robot is started in the control period.
Further, the voice filtering module compares the acquired decibel Ka of the voice frame voice with each preset decibel, and filters the acquired voice of the user according to the comparison result, wherein:
When Ka < b1 or Ka > b2, the voice filtering module judges that the voice frame is invalid and filters it out;
When b1 ≤ Ka ≤ b2, the voice filtering module judges that the voice frame is valid and generates phonemes from the valid voice frame;
wherein b1 is the preset minimum decibel level, b2 is the preset maximum decibel level, 0 < a ≤ t0/t1, t0 is the acquired voice duration of the user, t1 is the framing duration, and Ka is the decibel level of the a-th voice frame.
Further, the voice analysis module is provided with a phoneme analysis unit, the phoneme analysis unit compares the frame length Yy of each phoneme with each preset frame length, and analyzes the validity of each phoneme according to the comparison result, wherein:
When Yy < C1 or Yy > C2, the phoneme analysis unit determines that the phoneme is invalid;
when C1 ≤ Yy ≤ C2, the phoneme analysis unit judges that the phoneme is valid;
Wherein, C1 is the preset minimum frame length, and C2 is the preset maximum frame length.
Further, the adjusting module is provided with an adjusting unit, the adjusting unit calculates the ratio of the acquired voice duration of the user to the number m of phonemes of the current voice, compares the calculation result with a preset speech rate coefficient, and adjusts the analysis process of the validity of each phoneme according to the comparison result, wherein:
When t0/m ≤ d1, the adjusting unit judges that the user's speech rate is fast, sets an adjustment coefficient α1 to adjust the preset frame length Cc, and sets α1 = 1 - (d1 - t0/m)/(d1 + t0/m);
When d1 < t0/m < d2, the adjusting unit judges that the user's speech rate is normal and makes no adjustment;
When t0/m ≥ d2, the adjusting unit judges that the user's speech rate is slow, sets an adjustment coefficient α2 to adjust the preset frame length Cc, and sets α2 = 1 + (t0/m - d2)/(d2 + t0/m).
Further, the adjustment module is further provided with a correction unit; the correction unit calculates the ratio of the number n0 of invalid voice frames to the number n1 of voice frames in the acquired user voice, compares the calculation result with a preset abnormal threshold e1, and corrects the adjustment process of the analysis process of the validity of each phoneme according to the comparison result, wherein:
When n0/n1 < e1, the correction unit judges that the user voice contains few abnormal frames and makes no correction;
When n0/n1 ≥ e1, the correction unit judges that the user voice contains many abnormal frames, sets a correction coefficient β to correct the adjustment coefficient αx, and sets β = 1 - (n0/n1 - e1)/(n0/n1 + e1); the corrected adjustment coefficient is denoted αx', and αx' = αx × β, wherein n1 = t0/t1.
Further, the interaction control module is provided with a state control unit, the state control unit matches the text information with a preset starting keyword, and controls the state of the robot according to a matching result and the byte number z0 of the text information, wherein:
when the text information matches a preset starting keyword, the state control unit controls the robot to start if z0 ≤ z1, and does not control the robot to start if z0 > z1;
When the text information does not match any preset starting keyword, the state control unit does not control the robot to start; wherein z0 is the byte number of the text information, and z1 is the preset byte number.
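The start decision described above can be sketched in Python. This is only an illustrative sketch: the function name `should_start`, the use of substring matching, and the choice of UTF-8 for counting bytes are assumptions that the patent text does not specify.

```python
def should_start(text: str, start_keywords: list[str], z1: int) -> bool:
    """Decide whether the robot should start.

    Assumptions (not from the source): keywords are matched as substrings,
    and the byte number z0 is the length of the UTF-8 encoding of the text.
    """
    z0 = len(text.encode("utf-8"))  # byte number of the text information
    matched = any(keyword in text for keyword in start_keywords)
    # Start only when a starting keyword matches AND the text is short enough.
    return matched and z0 <= z1


# Hypothetical keyword and threshold, chosen only for illustration.
print(should_start("hello robot", ["hello"], 20))
print(should_start("hello robot please do many things", ["hello"], 20))
```

The byte limit acts as a guard against long utterances that merely happen to contain the starting keyword.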
Further, the interaction control module is further provided with an output analysis unit; when the robot is started, the output analysis unit matches the text information generated by the voice analysis module with the preset output keywords, and analyzes the output content of the robot according to the number f of types of preset output keywords found in the text information and the history matching times Gf of each preset output keyword, wherein:
when f = 0, the output analysis unit does not analyze the output content of the robot;
When f = 1, the output analysis unit outputs the content corresponding to the preset output keyword;
When f > 1, the output analysis unit sorts the history matching times Gf of the matched preset output keywords in descending order by bubble sort, sets the largest history matching times as G0, and outputs the content corresponding to the preset output keyword whose history matching times is G0.
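The three cases for f can be sketched as follows. The source sorts the history matching times with bubble sort and takes the largest; the sketch below swaps in Python's built-in `max`, which selects the same keyword without a full sort. The names `select_output`, `keyword_content` and `history_counts`, and the substring matching, are hypothetical.

```python
from typing import Optional


def select_output(text: str,
                  keyword_content: dict[str, str],
                  history_counts: dict[str, int]) -> Optional[str]:
    """Pick the content to output for the matched preset output keywords.

    keyword_content maps each preset output keyword to its content;
    history_counts maps each keyword to its history matching times Gf.
    Both structures are illustrative assumptions.
    """
    matched = [kw for kw in keyword_content if kw in text]
    if not matched:            # f = 0: nothing to analyze
        return None
    if len(matched) == 1:      # f = 1: output that keyword's content
        return keyword_content[matched[0]]
    # f > 1: choose the keyword with the largest history matching times (G0).
    # On ties, max() keeps the first keyword in dictionary order.
    best = max(matched, key=lambda kw: history_counts.get(kw, 0))
    return keyword_content[best]


content = {"weather": "weather report", "time": "current time"}
counts = {"weather": 3, "time": 7}
print(select_output("what time is it and how is the weather", content, counts))
```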
Further, the correction module is provided with a correction unit; the correction unit calculates the ratio of the number h0 of invalid phonemes to the number m of phonemes in the acquired user voice, compares the calculation result with a preset phoneme anomaly coefficient p, and corrects the control process of the state of the robot according to the comparison result, wherein:
when h0/m ≤ p, the correction unit judges that the number of abnormal phonemes in the user voice is normal and makes no correction;
when h0/m > p, the correction unit judges that the number of abnormal phonemes in the user voice is large, sets a correction coefficient P to correct the preset byte number z1, and sets P = 1 + (h0/m - p)/(h0/m + p); the corrected preset byte number is denoted z1', and z1' = z1 × P.
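A short Python sketch of this correction step follows. It assumes the correction coefficient takes the form P = 1 + (h0/m - p)/(h0/m + p), that is, with a sum in the denominator by analogy with the system's other coefficients; the function name and the example numbers are hypothetical.

```python
def corrected_byte_limit(h0: int, m: int, p: float, z1: float) -> float:
    """Return the corrected preset byte number z1'.

    Assumption: P = 1 + (h0/m - p) / (h0/m + p), with a sum in the
    denominator as in the other coefficients of the system.
    """
    ratio = h0 / m                      # share of invalid phonemes
    if ratio <= p:                      # normal: no correction
        return z1
    P = 1 + (ratio - p) / (ratio + p)   # correction coefficient, 1 < P < 2
    return z1 * P


# Hypothetical values: 6 of 10 phonemes invalid, p = 0.2, z1 = 20 bytes.
print(corrected_byte_limit(6, 10, 0.2, 20))
```

Under this assumption P always lies between 1 and 2, so the correction can at most double z1.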
Further, the correction module is further provided with a compensation unit; the compensation unit compares the acquired frame shift j0 with each preset frame shift, and compensates the correction process of the control process of the state of the robot according to the comparison result, wherein:
when j0 ≤ j1, the compensation unit judges that the frame shift is small, sets a compensation coefficient Q1 to compensate the correction coefficient P, and sets Q1 = 1 - (j1 - j0)/(j1 + j0);
when j1 < j0 < j2, the compensation unit judges that the frame shift is normal and makes no compensation;
When j0 ≥ j2, the compensation unit judges that the frame shift is large, sets a compensation coefficient Q2 to compensate the correction coefficient P, and sets Q2 = 1 + (j0 - j2)/(j0 + j2);
The compensation unit compensates the correction coefficient P according to the compensation coefficient Qq; the compensated correction coefficient is denoted P', and P' = P × Qq, q = 1, 2;
Wherein j1 is the preset minimum frame shift, and j2 is the preset maximum frame shift.
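The compensation rule can be illustrated with a small Python function. The function name and the example frame shift values are hypothetical; the source gives no preferred values for j1 and j2, and the unit of the frame shift is assumed to be the same for all three arguments.

```python
def compensate(P: float, j0: float, j1: float, j2: float) -> float:
    """Return the compensated correction coefficient P' = P * Qq.

    j0 is the acquired frame shift, j1 and j2 the preset minimum and
    maximum frame shifts (all assumed positive and in the same unit).
    """
    if j0 <= j1:                          # small frame shift: Q1 < 1
        Q = 1 - (j1 - j0) / (j1 + j0)
    elif j0 >= j2:                        # large frame shift: Q2 > 1
        Q = 1 + (j0 - j2) / (j0 + j2)
    else:                                 # normal frame shift: unchanged
        return P
    return P * Q


# Hypothetical frame shifts in milliseconds.
print(compensate(1.5, 5, 10, 20))
print(compensate(1.5, 15, 10, 20))
print(compensate(1.5, 60, 10, 20))
```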
Further, the optimization module compares the number v0 of times that the user did not operate the robot after it was started in the control period with each preset abnormal number, and optimizes the control process of the state of the robot in the next control period according to the comparison result, wherein:
When v0 ≤ v1, the optimization module judges that the number of times the user did not operate is small, sets an optimization coefficient γ1 to optimize the preset byte number z1 of the next control period, and sets γ1 = 1 + (v1 - v0)/(v1 + v0);
when v1 < v0 < v2, the optimization module judges that the number of times the user did not operate is normal and performs no optimization;
When v0 ≥ v2, the optimization module judges that the number of times the user did not operate is large, sets an optimization coefficient γ2 to optimize the preset byte number z1 of the next control period, and sets γ2 = 1 - (v0 - v2)/(v2 + v0);
The optimization module optimizes the preset byte number z1 of the next control period according to the optimization coefficient γi; the optimized preset byte number of the next control period is denoted z1", and z1" = z1 × γi, i = 1, 2.
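As a rough illustration, the optimization of the preset threshold z1 for the next control period might be written as below. The function name and the numeric values are hypothetical, and the sketch assumes v1 > 0 so that the denominator v1 + v0 cannot be zero.

```python
def optimize_z1(z1: float, v0: int, v1: int, v2: int) -> float:
    """Return z1'' = z1 * gamma_i for the next control period.

    v0 is the number of times the user did not operate the robot after it
    started; v1 and v2 are the preset abnormal numbers (assumed 0 < v1 < v2).
    """
    if v0 <= v1:                              # few idle starts: relax z1
        gamma = 1 + (v1 - v0) / (v1 + v0)
    elif v0 >= v2:                            # many idle starts: tighten z1
        gamma = 1 - (v0 - v2) / (v2 + v0)
    else:                                     # normal: leave z1 unchanged
        return z1
    return z1 * gamma


# Hypothetical thresholds v1 = 3 and v2 = 9, with z1 = 20.
print(optimize_z1(20, 1, 3, 9))
print(optimize_z1(20, 5, 3, 9))
print(optimize_z1(20, 27, 3, 9))
```

A large number of starts without any user operation suggests false wake-ups, so the rule shrinks z1 and makes the start condition stricter in the following period.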
Compared with the prior art, the invention has the following advantages. The voice filtering module sets preset decibels to improve the accuracy of voice frame validity judgment, which improves the accuracy of the generated text information and finally the voice recognition efficiency of the robot. The phoneme analysis unit sets preset frame lengths to improve the accuracy of phoneme validity judgment, which likewise improves the accuracy of the generated text information. The adjusting unit sets preset speech rate coefficients to improve the accuracy of the adjustment coefficient, which reduces the influence of speech rate on phoneme validity judgment. The correction unit of the adjustment module sets a preset abnormal threshold to improve the accuracy of the correction coefficient and thereby of the adjustment coefficient. The state control unit and the correction module rely on preset values to improve the accuracy of robot state control, which reduces the number of times the robot is started by mistake. The optimization module sets preset abnormal numbers to improve the accuracy of the optimization coefficient, which further improves the accuracy of robot state control, reduces the number of times the robot is started by mistake, and finally improves the voice recognition efficiency of the robot.
Drawings
Fig. 1 is a schematic structural diagram of the robot voice intelligent interaction system according to the present embodiment;
Fig. 2 is a schematic structural diagram of the voice analysis module according to the present embodiment;
Fig. 3 is a schematic structural diagram of the adjustment module according to the present embodiment;
Fig. 4 is a schematic structural diagram of the interaction control module according to the present embodiment;
Fig. 5 is a schematic structural diagram of the correction module according to the present embodiment.
Detailed Description
In order that the objects and advantages of the invention will become more apparent, the invention will be further described with reference to the following examples; it should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are merely for explaining the technical principles of the present invention, and are not intended to limit the scope of the present invention.
It should be noted that, in the description of the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present invention can be understood by those skilled in the art according to the specific circumstances.
Referring to fig. 1, a schematic structural diagram of a robot voice intelligent interaction system according to the present embodiment is shown, the system includes,
The first acquisition module is used for acquiring voice information of a user, wherein the voice information comprises the decibels of each voice frame and the voice duration of the user; the voice frames are obtained by framing, that is, the robot divides the acquired user voice into a plurality of voice frames, each containing a single sound; the decibels of each voice frame and the voice duration of the user can both be acquired through a built-in sensor of the robot; in this embodiment, the method for acquiring the voice information is not specifically limited and can be freely set by a person skilled in the art, provided that the requirement for acquiring the voice information is met;
the voice filtering module is used for filtering the acquired voice of the user according to the decibel of the acquired voice frame voice and the voice duration of the user, generating a phoneme from the filtered voice of the user, and connecting the voice filtering module with the first acquisition module;
The second acquisition module is used for acquiring phoneme information and frame dividing information, wherein the phoneme information comprises frame lengths of phonemes and number of phonemes, the frame dividing information comprises the number of invalid voice frames, the number of voice frames and frame shifts, the frame shifts are time lengths for shifting one voice frame to another adjacent voice frame, and the second acquisition module is connected with the voice filtering module;
the voice analysis module is used for analyzing the effectiveness of each phoneme according to the acquired frame length of each phoneme, recognizing the characters according to the effective phonemes and generating text information, and is connected with the second acquisition module;
The adjusting module is used for adjusting the analysis process of the effectiveness of each phoneme according to the acquired voice duration, the number of phonemes, the number of invalid voice frames and the number of voice frames of the user, and is connected with the voice analysis module;
The interaction control module is used for controlling the state of the robot according to the generated text information and the matching result of the preset keywords, analyzing the output content and outputting the output content, and is connected with the adjustment module; in this embodiment, a plurality of preset keywords are provided, wherein the preset keywords include preset starting keywords and preset output keywords, the setting of the preset keywords is not specifically limited, and the preset keywords can be freely set by a person skilled in the art only by meeting the setting requirement of the preset keywords, wherein the preset starting keywords can be set as 'hello, small skill' and the like, and the preset output keywords can be set according to practical purposes;
the correction module is used for correcting the control process of the robot state according to the number of invalid phonemes, the number of phonemes and the frame shift duration, and is connected with the interaction control module;
The optimizing module is used for optimizing the control process of the state of the robot in the next control period according to the number of times that the user does not operate after the robot is started in the control period, and is connected with the correcting module; in this embodiment, the setting of the control period is not specifically limited, and a person skilled in the art can freely set the control period only by meeting the setting requirement of the control period, wherein the control period can be set to 7 days, 10 days, 15 days, and the like.
Referring to fig. 2, a schematic structure diagram of a voice analysis module according to the present embodiment is shown, where the voice analysis module includes,
A phoneme analysis unit for analyzing the validity of the phonemes according to the obtained frame lengths of the phonemes;
the text generation unit is used for generating effective text according to the effective phonemes and generating text information according to the effective text, and is connected with the phoneme analysis unit;
Fig. 3 is a schematic structural diagram of an adjustment module according to the present embodiment, where the adjustment module includes,
The adjusting unit is used for adjusting the analysis process of the validity of each phoneme according to the acquired voice duration and the number of the phonemes of the user;
The correction unit is used for correcting the adjusting process of the analysis process of the validity of each phoneme according to the number of invalid voice frames and the number of voice frames, and is connected with the adjusting unit;
referring to fig. 4, a schematic structural diagram of an interactive control module according to the present embodiment is shown, where the interactive control module includes,
The state control unit is used for matching the text information generated by the voice analysis module with a preset starting keyword and controlling the state of the robot according to the byte number of the text information;
The output analysis unit is used for matching the text information generated by the voice analysis module with a preset keyword, analyzing the output content of the robot according to a matching result and is connected with the state control unit;
referring to fig. 5, a schematic structural diagram of the correction module according to the present embodiment is shown, where the correction module includes,
A correction unit for correcting the robot state control process according to the number of invalid phonemes and the number of phonemes;
And the compensation unit is used for compensating the correction process of the robot state control process according to the acquired frame shift time length.
Specifically, the method and the device are applied to intelligent voice interaction of the robot, the acquired user voice is filtered, the effectiveness of the user voice is analyzed, text information is generated according to the analysis result, the state and the output content of the robot are controlled according to the text information, and the voice recognition efficiency of the robot is improved.
Specifically, the voice filtering module sets preset decibels to improve the accuracy of voice frame validity judgment, and the phoneme analysis unit sets preset frame lengths to improve the accuracy of phoneme validity judgment; the adjusting unit sets preset speech rate coefficients to improve the accuracy of the adjustment coefficient and reduce the influence of speech rate on phoneme validity judgment; the correction unit of the adjustment module sets a preset abnormal threshold to improve the accuracy of the correction coefficient and thereby of the adjustment coefficient; together these measures improve the accuracy of the generated text information. The state control unit, the correction module and the optimization module set preset values, including the preset abnormal numbers of times with which the optimization module improves the accuracy of the optimization coefficient, to improve the accuracy of robot state control and reduce the number of times the robot is started by mistake, which finally improves the voice recognition efficiency of the robot.
Specifically, the voice filtering module compares the acquired decibel Ka of the voice frame voice with each preset decibel, and filters the acquired voice of the user according to the comparison result, wherein:
When Ka < b1 or Ka > b2, the voice filtering module judges that the voice frame is invalid and filters it out;
When b1 ≤ Ka ≤ b2, the voice filtering module judges that the voice frame is valid and generates phonemes from the valid voice frame;
wherein b1 is the preset minimum decibel level, b2 is the preset maximum decibel level, 0 < a ≤ t0/t1, t0 is the acquired voice duration of the user, t1 is the framing duration, and Ka is the decibel level of the a-th voice frame;
The speech filtering module quantizes the valid speech frame features into digital data that can be processed by a computer according to mel-frequency cepstrum coefficients and matches the features according to an acoustic model to generate phonemes.
Specifically, the voice filtering module improves the accuracy of voice frame validity judgment by setting preset decibels, so that the accuracy of phoneme validity judgment is improved, the accuracy of text information generation is improved, and finally the voice recognition efficiency of the robot is improved; in this embodiment, the setting of the preset decibels is not specifically limited, and a person skilled in the art can freely set the setting of the preset decibels only by meeting the setting requirement of the preset decibels, wherein the optimal value of b1 is 50 decibels, and the optimal value of b2 is 75 decibels.
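The frame filtering rule can be sketched in a few lines of Python, using the preferred values b1 = 50 and b2 = 75 decibels as defaults. The function name and the representation of the frames as a plain list of decibel readings are assumptions made for illustration; phoneme generation from the valid frames is not shown.

```python
def filter_frames(frame_db: list[float],
                  b1: float = 50.0,
                  b2: float = 75.0) -> list[tuple[int, float]]:
    """Keep the voice frames whose decibel level Ka satisfies b1 <= Ka <= b2.

    frame_db[a - 1] holds the decibel level of the a-th voice frame.
    Returns (a, Ka) pairs for the valid frames, with a counted from 1.
    """
    return [(a, ka) for a, ka in enumerate(frame_db, start=1)
            if b1 <= ka <= b2]


# Hypothetical decibel readings for five consecutive frames.
print(filter_frames([40, 50, 62.5, 75, 80]))
```

Frames that are too quiet (below b1) or too loud (above b2) are dropped before any phoneme is generated.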
Specifically, the phoneme analysis unit compares the frame length Yy of each phoneme with each preset frame length, and analyzes the validity of each phoneme according to the comparison result, wherein:
When Yy < C1 or Yy > C2, the phoneme analysis unit determines that the phoneme is invalid;
when C1 ≤ Yy ≤ C2, the phoneme analysis unit judges that the phoneme is valid;
Wherein, C1 is the preset minimum frame length, and C2 is the preset maximum frame length.
Specifically, the phoneme analysis unit improves the accuracy of judging the validity of the phonemes by setting a preset frame length, so that the accuracy of generating text information is improved, and finally the voice recognition efficiency of the robot is improved; in this embodiment, the setting of the preset frame length is not specifically limited, and a person skilled in the art can freely set the preset frame length only by meeting the setting requirement of the preset frame length, wherein the optimal value of C1 is 30ms, and the optimal value of C2 is 120ms.
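A minimal Python sketch of the phoneme validity check, with the preferred values C1 = 30 ms and C2 = 120 ms as defaults. Representing each phoneme as a (label, frame length in ms) pair is an assumption made here for illustration.

```python
def valid_phonemes(phonemes: list[tuple[str, float]],
                   c1: float = 30.0,
                   c2: float = 120.0) -> list[tuple[str, float]]:
    """Keep the phonemes whose frame length Yy satisfies C1 <= Yy <= C2.

    Each phoneme is a hypothetical (label, frame_length_ms) pair.
    """
    return [(label, yy) for label, yy in phonemes if c1 <= yy <= c2]


# Hypothetical phonemes with frame lengths in milliseconds.
print(valid_phonemes([("w", 25), ("e", 30), ("n", 90), ("b", 130)]))
```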
Specifically, the text generation unit combines the valid phonemes in sequence, generates characters from the combination result, and combines the generated characters in sequence into text information, for example text information generated from the phoneme sequence w é n b ě n.
Specifically, the adjusting unit calculates a ratio of the acquired voice duration of the user to the number m of phonemes of the current voice, compares a calculation result with a preset speech rate coefficient, and adjusts an analysis process of validity of each phoneme according to the comparison result, wherein:
When t0/m ≤ d1, the adjusting unit judges that the user's speech rate is fast, sets an adjustment coefficient α1 to adjust the preset frame length Cc, and sets α1 = 1 - (d1 - t0/m)/(d1 + t0/m);
When d1 < t0/m < d2, the adjusting unit judges that the user's speech rate is normal and makes no adjustment;
when t0/m ≥ d2, the adjusting unit judges that the user's speech rate is slow, sets an adjustment coefficient α2 to adjust the preset frame length Cc, and sets α2 = 1 + (t0/m - d2)/(d2 + t0/m), wherein d1 is the first preset speech rate coefficient, d2 is the second preset speech rate coefficient, and d1 < d2;
The adjusting unit adjusts the preset frame length Cc according to the adjustment coefficient αx; the adjusted preset frame length is denoted Cc', and Cc' = Cc × αx, where c = 1, 2 and x = 1, 2.
Specifically, by setting the preset speech rate coefficients, the adjusting unit improves the accuracy of the adjusting coefficient and reduces the influence of speech rate on phoneme validity, which improves the accuracy of judging phoneme validity and of the generated text information, and ultimately the voice recognition efficiency of the robot. In this embodiment, the preset speech rate coefficients are not specifically limited, and a person skilled in the art may set them freely as long as the setting requirement is met; the optimal value of d1 is 0.08 and the optimal value of d2 is 0.18.
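The speech rate adjustment described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the voice duration t0 is taken to be in seconds (the unit is not given in the text), and the function names are hypothetical.

```python
# Illustrative sketch only; the unit of t0 (seconds) and all names are assumptions.
D1 = 0.08  # first preset speech rate coefficient d1 (optimal value in this embodiment)
D2 = 0.18  # second preset speech rate coefficient d2 (optimal value in this embodiment)


def speech_rate_coefficient(t0, m, d1=D1, d2=D2):
    """Return the adjusting coefficient alpha for voice duration t0 and m phonemes."""
    if m <= 0:
        raise ValueError("the number of phonemes m must be positive")
    r = t0 / m
    if r <= d1:
        # Fast speech: alpha1 = 1 - (d1 - t0/m) / (d1 + t0/m)
        return 1 - (d1 - r) / (d1 + r)
    if r >= d2:
        # Slow speech: alpha2 = 1 + (t0/m - d2) / (d2 + t0/m)
        return 1 + (r - d2) / (d2 + r)
    # Normal speech rate: no adjustment
    return 1.0


def adjust_frame_lengths(c1, c2, alpha):
    """Apply Cc' = Cc * alpha to both preset frame lengths C1 and C2."""
    return c1 * alpha, c2 * alpha
```

For example, 10 phonemes in 0.4 s give t0/m = 0.04, which is below d1, so the coefficient is less than 1 and both frame length bounds shrink for fast speech.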
Specifically, the correction unit calculates the ratio of the number n0 of invalid voice frames to the number n1 of voice frames in the acquired user voice, compares the result with a preset abnormal threshold, and corrects the adjustment of the phoneme validity analysis according to the comparison result, wherein:
When n0/n1 < e1, the correction unit judges that the user voice contains few abnormal frames and makes no correction;
When n0/n1 ≥ e1, the correction unit judges that the user voice contains many abnormal frames, sets a correction coefficient β to correct the adjusting coefficient αx, and sets β = 1 - (n0/n1 - e1)/(n0/n1 + e1); the corrected adjusting coefficient is denoted αx', and αx' = αx × β, wherein n1 = t0/t1.
Specifically, by setting the preset abnormal threshold, the correction unit improves the accuracy of the correction coefficient and therefore of the adjusting coefficient, which improves the accuracy of judging phoneme validity and of the generated text information, and ultimately the voice recognition efficiency of the robot. In this embodiment, the preset abnormal threshold is not specifically limited, and a person skilled in the art may set it freely as long as the setting requirement is met; the optimal value of e1 is 0.23.
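A minimal Python sketch of this correction follows. It assumes n1 = t0/t1 as stated above, with t0 and t1 expressed in the same time unit; the function names are hypothetical.

```python
# Illustrative sketch only; names are assumptions.
E1 = 0.23  # preset abnormal threshold e1 (optimal value in this embodiment)


def frame_correction_coefficient(n0, t0, t1, e1=E1):
    """Return beta from n0 invalid frames, voice duration t0 and framing duration t1."""
    n1 = t0 / t1      # number of voice frames, n1 = t0 / t1
    ratio = n0 / n1   # share of invalid voice frames
    if ratio < e1:
        # Few abnormal frames: no correction
        return 1.0
    # Many abnormal frames: beta = 1 - (n0/n1 - e1) / (n0/n1 + e1)
    return 1 - (ratio - e1) / (ratio + e1)


def corrected_alpha(alpha, beta):
    """Apply alpha_x' = alpha_x * beta."""
    return alpha * beta
```

With 100 frames of which 46 are invalid, the ratio 0.46 exceeds e1 and β is roughly 0.67, so the adjusting coefficient is reduced by about one third.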
Specifically, the state control unit matches the text information with a preset starting keyword, and controls the state of the robot according to a matching result and the byte number z0 of the text information, wherein:
when the text information is matched with a preset starting keyword, if z0 is less than or equal to z1, the state control unit controls the robot to start, and if z0 is more than z1, the state control unit does not control the robot to start;
When the text information is not matched with the preset starting key words, the state control unit does not control the robot to start, wherein z0 is the byte number of the text information, and z1 is the preset byte number.
Specifically, by setting the preset byte number, the state control unit improves the accuracy of robot state control, which reduces the number of times the robot is started by mistake and thus improves the voice recognition efficiency of the robot. In this embodiment, the method for obtaining the byte number of the text information is not specifically limited, and a person skilled in the art may choose any method that meets the requirement; for example, the byte number of the text information can be obtained through Python. Likewise, the value of the preset byte number is not specifically limited and may be set freely as long as the value requirement is met; the optimal value of the preset byte number is 40.
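Since the embodiment mentions obtaining the byte number through Python, the start decision can be sketched as below. The UTF-8 encoding, the substring-based keyword matching, and all names are assumptions made for illustration; the patent does not specify them.

```python
# Illustrative sketch only; encoding, matching rule and names are assumptions.
Z1 = 40  # preset byte number z1 (optimal value in this embodiment)


def byte_count(text, encoding="utf-8"):
    """Return the byte number z0 of the text information in the given encoding."""
    return len(text.encode(encoding))


def should_start(text, start_keywords, z1=Z1):
    """Start the robot only if a start keyword matches and z0 <= z1."""
    matched = any(keyword in text for keyword in start_keywords)
    return matched and byte_count(text) <= z1
```

The byte number depends on the encoding: in UTF-8 each common Chinese character occupies 3 bytes, so a limit of 40 bytes admits about 13 such characters.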
Specifically, after the robot is turned on, the output analysis unit matches the text information generated by the voice analysis module with the preset output keywords, and analyzes the output content of the robot according to the number f of kinds of preset output keywords found in the text information and the history matching count Gf of each preset output keyword, wherein:
When f = 0, the output analysis unit does not analyze the output content of the robot;
When f = 1, the output analysis unit outputs the content corresponding to the preset output keyword;
When f > 1, the output analysis unit sorts the history matching counts Gf of the preset output keywords in descending order using bubble sort, denotes the largest history matching count as G0, and outputs the content corresponding to the preset output keyword whose history matching count is G0.
Specifically, in this embodiment, the method for sorting the history matching counts of the preset output keywords is not specifically limited, and a person skilled in the art may choose any method that meets the sorting requirement. Each preset output keyword is provided with corresponding output content, which a person skilled in the art can set according to the actual use of the robot.
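The output selection can be sketched in Python as follows. Because the sorting method is not limited, this sketch takes the maximum directly instead of running a bubble sort; the dictionaries, the substring matching, and the function name are hypothetical.

```python
# Illustrative sketch only; data layout, matching rule and names are assumptions.
def select_output(text, keyword_contents, history_counts):
    """Return the output content for the text information, or None when f = 0.

    keyword_contents maps each preset output keyword to its output content;
    history_counts maps each preset output keyword to its history matching count.
    """
    matched = [keyword for keyword in keyword_contents if keyword in text]
    if not matched:
        # f = 0: the output content is not analyzed
        return None
    if len(matched) == 1:
        # f = 1: output the content of the single matched keyword
        return keyword_contents[matched[0]]
    # f > 1: choose the keyword with the largest history matching count G0
    best = max(matched, key=lambda keyword: history_counts.get(keyword, 0))
    return keyword_contents[best]
```

When two matched keywords share the same history count, this sketch returns the one that appears first in keyword_contents; the patent does not say how ties are resolved.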
Specifically, the correction unit calculates the ratio of the number h0 of invalid phonemes to the number m of phonemes in the acquired user voice, compares the result with a preset phoneme abnormal coefficient p, and corrects the control process of the state of the robot according to the comparison result, wherein:
When h0/m ≤ p, the correction unit judges that the number of abnormal phonemes in the user voice is normal and makes no correction;
When h0/m > p, the correction unit judges that the number of abnormal phonemes in the user voice is large, sets a correction coefficient P to correct the preset byte number z1, and sets P = 1 + (h0/m - p)/(h0/m + p); the corrected preset byte number is denoted z1', and z1' = z1 × P.
Specifically, by setting the preset phoneme abnormal coefficient, the correction unit improves the accuracy of the correction coefficient and therefore of robot state control, which reduces the number of times the robot is started by mistake and ultimately improves the voice recognition efficiency of the robot. In this embodiment, the preset phoneme abnormal coefficient is not specifically limited, and a person skilled in the art may set it freely as long as the setting requirement is met; the optimal value of p is 0.2.
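The byte number correction can be sketched in Python as below. The form P = 1 + (h0/m - p)/(h0/m + p), in particular the denominator h0/m + p, is an assumption made by analogy with the other coefficients in this description; the function names are hypothetical.

```python
# Illustrative sketch only; the formula's denominator and all names are assumptions.
P_THRESHOLD = 0.2  # preset phoneme abnormal coefficient p (optimal value in this embodiment)


def byte_limit_correction(h0, m, p=P_THRESHOLD):
    """Return the correction coefficient P for h0 invalid phonemes out of m."""
    ratio = h0 / m
    if ratio <= p:
        # Number of abnormal phonemes is normal: no correction
        return 1.0
    # Assumed form: P = 1 + (h0/m - p) / (h0/m + p)
    return 1 + (ratio - p) / (ratio + p)


def corrected_byte_limit(z1, h0, m, p=P_THRESHOLD):
    """Apply z1' = z1 * P."""
    return z1 * byte_limit_correction(h0, m, p)
```

Under this assumed form, 6 invalid phonemes out of 10 give P = 1.5, which raises a preset byte number of 40 to 60.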
Specifically, the compensation unit compares the acquired frame shift j0 with each preset frame shift, and compensates the correction of the robot state control process according to the comparison result, wherein:
When j0 ≤ j1, the compensation unit judges that the frame shift is small, sets a compensation coefficient Q1 to compensate the correction coefficient P, and sets Q1 = 1 - (j1 - j0)/(j1 + j0);
When j1 < j0 < j2, the compensation unit judges that the frame shift is normal and makes no compensation;
When j0 ≥ j2, the compensation unit judges that the frame shift is large, sets a compensation coefficient Q2 to compensate the correction coefficient P, and sets Q2 = 1 + (j0 - j2)/(j0 + j2);
The compensation unit compensates the correction coefficient P according to the compensation coefficient Qq, denotes the compensated correction coefficient as P', and sets P' = P × Qq, where q = 1, 2;
Wherein j1 is the preset minimum frame shift and j2 is the preset maximum frame shift.
Specifically, by setting the preset frame shifts, the compensation unit improves the accuracy of the compensation coefficient and therefore of the correction coefficient, which improves the accuracy of robot state control, reduces the number of times the robot is started by mistake, and ultimately improves the voice recognition efficiency of the robot. In this embodiment, the preset frame shifts are not specifically limited, and a person skilled in the art may set them freely as long as the setting requirement is met; the optimal value of j1 is 8 ms and the optimal value of j2 is 15 ms.
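The frame shift compensation can be sketched in Python as follows, using the optimal values of j1 and j2 given in this embodiment. The function names are hypothetical, and frame shifts are assumed to be given in milliseconds.

```python
# Illustrative sketch only; names are assumptions.
J1_MS = 8.0   # preset minimum frame shift j1 (optimal value in this embodiment)
J2_MS = 15.0  # preset maximum frame shift j2 (optimal value in this embodiment)


def frame_shift_compensation(j0, j1=J1_MS, j2=J2_MS):
    """Return the compensation coefficient Q for the acquired frame shift j0."""
    if j0 <= j1:
        # Small frame shift: Q1 = 1 - (j1 - j0) / (j1 + j0)
        return 1 - (j1 - j0) / (j1 + j0)
    if j0 >= j2:
        # Large frame shift: Q2 = 1 + (j0 - j2) / (j0 + j2)
        return 1 + (j0 - j2) / (j0 + j2)
    # Normal frame shift: no compensation
    return 1.0


def compensated_correction(p_coeff, q_coeff):
    """Apply P' = P * Q."""
    return p_coeff * q_coeff
```

A frame shift of 10 ms lies between j1 and j2, so the correction coefficient P is left unchanged; a frame shift of 4 ms scales it by about 0.67.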
Specifically, the optimization module compares the number v0 of times that the user does not operate the robot after it is started within the control period with each preset abnormal count, and optimizes the control process of the state of the robot in the next control period according to the comparison result, wherein:
When v0 ≤ v1, the optimization module judges that the number of non-operation events is small, sets an optimization coefficient γ1 to optimize the preset byte number z1 of the next control period, and sets γ1 = 1 + (v1 - v0)/(v1 + v0);
When v1 < v0 < v2, the optimization module judges that the number of non-operation events is normal and performs no optimization;
When v0 ≥ v2, the optimization module judges that the number of non-operation events is large, sets an optimization coefficient γ2 to optimize the preset byte number z1 of the next control period, and sets γ2 = 1 - (v0 - v2)/(v2 + v0);
The optimization module optimizes the preset byte number z1 of the next control period according to the optimization coefficient γi, denotes the optimized preset byte number of the next control period as z1', and sets z1' = z1 × γi, where i = 1, 2;
Wherein v1 is the preset minimum abnormal count and v2 is the preset maximum abnormal count.
Specifically, by setting the preset abnormal counts, the optimization module improves the accuracy of the optimization coefficient and therefore of robot state control, which reduces the number of times the robot is started by mistake and ultimately improves the voice recognition efficiency of the robot. In this embodiment, the preset abnormal counts are not specifically limited, and a person skilled in the art may set them freely as long as the setting requirement is met; when the control period is 10 days, the optimal value of v1 is 30 and the optimal value of v2 is 80.
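The per-period optimization can be sketched in Python as below, using the optimal values of v1 and v2 for a 10-day control period. It assumes that the quantity being optimized is the preset byte number z1 used by the state control unit; the function names are hypothetical.

```python
# Illustrative sketch only; names are assumptions.
V1 = 30  # preset minimum abnormal count v1 (optimal value for a 10-day control period)
V2 = 80  # preset maximum abnormal count v2 (optimal value for a 10-day control period)


def optimization_coefficient(v0, v1=V1, v2=V2):
    """Return gamma for v0 non-operation events after start-up in the control period."""
    if v0 <= v1:
        # Few non-operation events: gamma1 = 1 + (v1 - v0) / (v1 + v0)
        return 1 + (v1 - v0) / (v1 + v0)
    if v0 >= v2:
        # Many non-operation events: gamma2 = 1 - (v0 - v2) / (v2 + v0)
        return 1 - (v0 - v2) / (v2 + v0)
    # Normal number of non-operation events: no optimization
    return 1.0


def next_period_byte_limit(z1, v0, v1=V1, v2=V2):
    """Apply z1' = z1 * gamma for the next control period."""
    return z1 * optimization_coefficient(v0, v1, v2)
```

With 120 non-operation events in the period, γ is 0.8 and a byte number of 40 drops to 32 for the next period, which makes start-up stricter after many apparent false starts.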
Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will be within the scope of the present invention.

Claims (6)

1. A robot voice intelligent interaction system is characterized by comprising,
The first acquisition module is used for acquiring voice information of a user;
The voice filtering module is used for filtering the acquired voice of the user according to the decibel level of each acquired voice frame and the voice duration of the user, and generating phonemes from the filtered voice of the user;
The second acquisition module is used for acquiring the phoneme information and the framing information;
The voice analysis module is used for analyzing the effectiveness of each phoneme according to the acquired frame length of each phoneme, recognizing the characters according to the effective phonemes and generating text information;
The adjusting module is used for adjusting the analysis process of the effectiveness of each phoneme according to the acquired voice duration, the number of phonemes, the number of invalid voice frames and the number of voice frames of the user;
The interaction control module is used for controlling the state of the robot according to the generated text information and the matching result of the preset keywords, and analyzing the output content to output the output content;
the correction module is used for correcting the control process of the robot state according to the number of invalid phonemes, the number of phonemes and the frame shift duration;
The optimizing module is used for optimizing the control process of the state of the robot in the next control period according to the number of times that the user does not operate after the robot is started in the control period;
The adjusting module is provided with an adjusting unit, the adjusting unit calculates the ratio of the acquired voice duration t0 of the user to the number m of phonemes of the current voice, compares the result with the preset speech rate coefficients, and adjusts the analysis process of the validity of each phoneme according to the comparison result, wherein:
When t0/m ≤ d1, the adjusting unit judges that the user's speech rate is fast, sets an adjusting coefficient α1 to adjust the preset frame length Cc, and sets α1 = 1 - (d1 - t0/m)/(d1 + t0/m);
When d1 < t0/m < d2, the adjusting unit judges that the user's speech rate is normal and makes no adjustment;
When t0/m ≥ d2, the adjusting unit judges that the user's speech rate is slow, sets an adjusting coefficient α2 to adjust the preset frame length Cc, and sets α2 = 1 + (t0/m - d2)/(d2 + t0/m);
Wherein d1 is the first preset speech rate coefficient, and d2 is the second preset speech rate coefficient;
The adjusting module is also provided with a correction unit, the correction unit calculates the ratio of the number n0 of invalid voice frames to the number n1 of voice frames in the acquired user voice, compares the result with a preset abnormal threshold, and corrects the adjustment of the phoneme validity analysis according to the comparison result, wherein:
When n0/n1 < e1, the correction unit judges that the user voice contains few abnormal frames and makes no correction;
When n0/n1 ≥ e1, the correction unit judges that the user voice contains many abnormal frames, sets a correction coefficient β to correct the adjusting coefficient αx, and sets β = 1 - (n0/n1 - e1)/(n0/n1 + e1); the corrected adjusting coefficient is denoted αx', and αx' = αx × β, wherein n1 = t0/t1;
The correction module is provided with a correction unit, the correction unit calculates the ratio of the number h0 of invalid phonemes to the number m of phonemes in the acquired user voice, compares the result with a preset phoneme abnormal coefficient p, and corrects the control process of the state of the robot according to the comparison result, wherein:
When h0/m ≤ p, the correction unit judges that the number of abnormal phonemes in the user voice is normal and makes no correction;
When h0/m > p, the correction unit judges that the number of abnormal phonemes in the user voice is large, sets a correction coefficient P to correct the preset byte number z1, and sets P = 1 + (h0/m - p)/(h0/m + p); the corrected preset byte number is denoted z1', and z1' = z1 × P;
The correction module is also provided with a compensation unit, the compensation unit compares the acquired frame shift j0 with each preset frame shift and compensates the correction of the robot state control process according to the comparison result, wherein:
When j0 ≤ j1, the compensation unit judges that the frame shift is small, sets a compensation coefficient Q1 to compensate the correction coefficient P, and sets Q1 = 1 - (j1 - j0)/(j1 + j0);
When j1 < j0 < j2, the compensation unit judges that the frame shift is normal and makes no compensation;
When j0 ≥ j2, the compensation unit judges that the frame shift is large, sets a compensation coefficient Q2 to compensate the correction coefficient P, and sets Q2 = 1 + (j0 - j2)/(j0 + j2);
The compensation unit compensates the correction coefficient P according to the compensation coefficient Qq, denotes the compensated correction coefficient as P', and sets P' = P × Qq, where q = 1, 2;
Wherein j1 is the preset minimum frame shift, and j2 is the preset maximum frame shift.
2. The robot voice intelligent interaction system according to claim 1, wherein the voice filtering module compares the acquired decibel level Ka of each voice frame with each preset decibel level, and filters the acquired voice of the user according to the comparison result, wherein:
When Ka < b1 or Ka > b2, the voice filtering module judges that the voice frame is invalid and filters out the voice frame;
When b1 ≤ Ka ≤ b2, the voice filtering module judges that the voice frame is valid and generates phonemes from the valid voice frames;
Wherein b1 is the preset minimum decibel level, b2 is the preset maximum decibel level, 0 < a ≤ t0/t1, t0 is the acquired voice duration of the user, t1 is the framing duration, and Ka is the decibel level of the a-th voice frame.
3. The robot voice intelligent interaction system according to claim 2, wherein the voice analysis module is provided with a phoneme analysis unit which compares the frame length Yy of each phoneme with each preset frame length and analyzes the validity of each phoneme according to the comparison result, wherein:
When Yy < C1 or Yy > C2, the phoneme analysis unit judges that the phoneme is invalid;
When C1 ≤ Yy ≤ C2, the phoneme analysis unit judges that the phoneme is valid;
Wherein C1 is the preset minimum frame length, and C2 is the preset maximum frame length.
4. The robot voice intelligent interaction system according to claim 1, wherein the interaction control module is provided with a state control unit, the state control unit matches the text information with a preset starting keyword, and controls the state of the robot according to the matching result and the byte number z0 of the text information, wherein:
When the text information matches the preset starting keyword, the state control unit controls the robot to start if z0 ≤ z1, and does not control the robot to start if z0 > z1;
When the text information does not match the preset starting keyword, the state control unit does not control the robot to start, wherein z0 is the byte number of the text information, and z1 is the preset byte number.
5. The robot voice intelligent interaction system according to claim 4, wherein the interaction control module is further provided with an output analysis unit, and after the robot is turned on, the output analysis unit matches the text information generated by the voice analysis module with the preset output keywords, and analyzes the output content of the robot according to the number f of kinds of preset output keywords found in the text information and the history matching count Gf of each preset output keyword, wherein:
When f = 0, the output analysis unit does not analyze the output content of the robot;
When f = 1, the output analysis unit outputs the content corresponding to the preset output keyword;
When f > 1, the output analysis unit sorts the history matching counts Gf of the preset output keywords in descending order using bubble sort, denotes the largest history matching count as G0, and outputs the content corresponding to the preset output keyword whose history matching count is G0.
6. The robot voice intelligent interaction system according to claim 1, wherein the optimization module compares the number v0 of times that the user does not operate the robot after it is started within the control period with each preset abnormal count, and optimizes the control process of the state of the robot in the next control period according to the comparison result, wherein:
When v0 ≤ v1, the optimization module judges that the number of non-operation events is small, sets an optimization coefficient γ1 to optimize the preset byte number z1 of the next control period, and sets γ1 = 1 + (v1 - v0)/(v1 + v0);
When v1 < v0 < v2, the optimization module judges that the number of non-operation events is normal and performs no optimization;
When v0 ≥ v2, the optimization module judges that the number of non-operation events is large, sets an optimization coefficient γ2 to optimize the preset byte number z1 of the next control period, and sets γ2 = 1 - (v0 - v2)/(v2 + v0);
The optimization module optimizes the preset byte number z1 of the next control period according to the optimization coefficient γi, denotes the optimized preset byte number of the next control period as z1', and sets z1' = z1 × γi, where i = 1, 2;
Wherein v1 is the preset minimum abnormal count and v2 is the preset maximum abnormal count.
CN202410259405.0A 2024-03-07 2024-03-07 Robot voice intelligent interaction system Active CN117854506B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410259405.0A CN117854506B (en) 2024-03-07 2024-03-07 Robot voice intelligent interaction system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410259405.0A CN117854506B (en) 2024-03-07 2024-03-07 Robot voice intelligent interaction system

Publications (2)

Publication Number Publication Date
CN117854506A CN117854506A (en) 2024-04-09
CN117854506B true CN117854506B (en) 2024-05-14

Family

ID=90540470

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410259405.0A Active CN117854506B (en) 2024-03-07 2024-03-07 Robot voice intelligent interaction system

Country Status (1)

Country Link
CN (1) CN117854506B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019051668A1 (en) * 2017-09-13 2019-03-21 深圳传音通讯有限公司 Start control method and start control system for smart terminal
CN112331219A (en) * 2020-11-05 2021-02-05 北京爱数智慧科技有限公司 Voice processing method and device
CN112530434A (en) * 2020-12-21 2021-03-19 云南电网有限责任公司玉溪供电局 Automatic intelligent robot on duty scheduling system of power station
CN113782010A (en) * 2021-11-10 2021-12-10 北京沃丰时代数据科技有限公司 Robot response method, device, electronic equipment and storage medium
CN115238867A (en) * 2022-07-28 2022-10-25 广东电力信息科技有限公司 Power failure positioning method based on intelligent identification of customer service unstructured data
CN115334063A (en) * 2022-07-08 2022-11-11 广东机电职业技术学院 Question-answering type virtual human control method, system, equipment and storage medium
CN116434736A (en) * 2023-04-19 2023-07-14 中国工商银行股份有限公司 Voice recognition method, interaction method, system and equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2458461A (en) * 2008-03-17 2009-09-23 Kai Yu Spoken language learning system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
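Speech-based interaction in an AAL context;Vacher, Michel, et al.;《ISARC》;20120630;full text *
Speaker recognition method based on characteristic phonemes (基于特征音素的说话人识别方法);王昌龙;周福才;凌裕平;於锋;;仪器仪表学报;20071015(No. 10);full text *
Research on human-machine interaction of intelligent translation robots under the concept of ecological translation (生态翻译理念下智能翻译机器人机交互研究);李指南;《自动化与仪器仪表》;20230930(No. 9);full text *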

Also Published As

Publication number Publication date
CN117854506A (en) 2024-04-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant