CN110619888B - AI voice rate adjusting method and device and electronic equipment - Google Patents

Info

Publication number
CN110619888B
Authority
CN
China
Prior art keywords
voice
parameter
adjustment
speed
parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910939380.8A
Other languages
Chinese (zh)
Other versions
CN110619888A (en
Inventor
石文超
戴会杰
常富洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qiyu Information Technology Co Ltd
Original Assignee
Beijing Qiyu Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qiyu Information Technology Co Ltd
Priority to CN201910939380.8A
Publication of CN110619888A
Application granted
Publication of CN110619888B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/04 Time compression or expansion
    • G10L21/043 Time compression or expansion by changing speed
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Telephone Function (AREA)

Abstract

The invention discloses an AI voice rate adjusting method, an AI voice rate adjusting device, and electronic equipment. The method comprises the following steps: creating AI voice parameter sets matched with different AI voice styles; searching the voice parameter set for the AI voice parameters matched with the AI voice style selected by the user, playing voice according to those AI voice parameters, and processing dialogue responses; analyzing, in real time, the difference value between the voice parameters of the call object and the AI voice parameters; comparing the difference value with a preset threshold; and, if the difference value exceeds the preset threshold, adjusting the voice parameters. Because the difference value is compared in real time and the AI voice parameters are adjusted whenever it exceeds the preset threshold, the speech rate of the robot stays matched to the speech rate of the call object.

Description

AI voice rate adjusting method and device and electronic equipment
Technical Field
The present invention relates to the technical field of AI speech, and in particular, to a method and apparatus for adjusting AI speech rate, an electronic device, and a computer readable medium.
Background
AI (artificial intelligence) voice technology has developed rapidly in recent years, and major smart-device makers now promote intelligent voice AI as a selling point. People increasingly expect to communicate with machines naturally and conveniently. As voice interaction has moved from the traditional single question-and-answer system to today's popular multi-round question-and-answer systems, AI interaction has come to resemble human interaction more closely, and user experience has improved greatly.
However, the playback rate of AI voice synthesized by the various technical means is usually fixed. In actual application scenarios, different people respond to AI voice at different speeds and do not perceive its speech rate in the same way. As a result, during a man-machine conversation some users feel that the robot speaks too fast, that the interval between sentences is too short, or that the robot responds too quickly, while other users feel that the robot speaks too slowly, that the interval between sentences is too long, or that the robot responds too slowly. In extreme cases, the user cannot get a word in or the robot cannot respond normally, so the conversation cannot proceed normally and user experience suffers.
Disclosure of Invention
The invention aims to solve the technical problems that the AI voice playing speed is fixed and the speech speed of different users cannot be adapted in the prior art.
In order to solve the above technical problem, a first aspect of the present invention provides an AI speech rate adjustment method, which includes:
creating AI voice parameter sets matched with different AI voice styles;
searching AI voice parameters matched with the AI voice style selected by the user in the voice parameter set, playing voice according to the AI voice parameters, and processing dialogue response;
analyzing the difference value of the voice parameter of the call object and the AI voice parameter in real time;
comparing the difference value with a preset threshold value;
and if the difference value exceeds the preset threshold value, adjusting the voice parameter.
According to a preferred embodiment of the invention, the method further comprises:
and creating a voice adjustment model, and adjusting the voice parameters according to the voice adjustment model if the difference value exceeds the preset threshold value.
According to a preferred embodiment of the invention, the method further comprises:
comparing the voice parameters of the current call object with the current AI voice parameters to obtain a first result, and comparing the voice parameters of the call object before the voice parameters were adjusted with the voice parameters of the call object after the adjustment to obtain a second result;
determining the parameter adjustment direction of the current round according to the first result and the second result;
and determining a next-round parameter adjustment scheme according to the parameter adjustment direction of the current round.
According to a preferred embodiment of the present invention, if the speech rate of the call object is greater than a first preset rate or less than a second preset rate, a common spoken filler word is inserted into the dialogue.
According to a preferred embodiment of the present invention, the voice parameters include: voice interval time, response speed, and voice speed.
According to a preferred embodiment of the present invention, the adjustment scheme includes: making no adjustment, continuing to adjust in the same direction as the parameter adjustment direction of the current round, or continuing to adjust in the opposite direction.
In order to solve the above technical problem, a second aspect of the present invention provides an AI speech rate adjustment apparatus, including:
the first creating module is used for creating an AI voice parameter set matched with different AI voice styles;
the searching and playing module is used for searching the AI voice parameters matched with the AI voice styles selected by the user in the voice parameter set, playing the voice according to the AI voice parameters and processing dialogue response;
the analysis module is used for analyzing the difference value of the voice parameter of the call object and the AI voice parameter in real time;
the comparison module is used for comparing the difference value with a preset threshold value;
and the adjusting module is used for adjusting the voice parameters if the difference value exceeds the preset threshold value.
According to a preferred embodiment of the invention, the device further comprises:
the second creation module is used for creating a voice adjustment model;
and if the difference value exceeds the preset threshold value, the adjusting module adjusts the voice parameters according to the voice adjusting model.
According to a preferred embodiment of the invention, the device further comprises:
the comparison module is used for comparing the voice parameters of the current call object with the current AI voice parameters to obtain a first result, and comparing the voice parameters of the call object before the voice parameters were adjusted with the voice parameters of the call object after the adjustment to obtain a second result;
the first determining module is used for determining the parameter adjustment direction of the current round according to the first result and the second result;
and the second determining module is used for determining a next-round parameter adjustment scheme according to the parameter adjustment direction of the current round.
According to a preferred embodiment of the invention, the device further comprises:
and the inserting module is used for inserting a common spoken filler word into the dialogue if the speech rate of the call object is greater than a first preset rate or less than a second preset rate.
According to a preferred embodiment of the present invention, the voice parameters include: voice interval time, response speed, and voice speed.
According to a preferred embodiment of the present invention, the adjustment scheme includes: making no adjustment, continuing to adjust in the same direction as the parameter adjustment direction of the current round, or continuing to adjust in the opposite direction.
To solve the above technical problem, a third aspect of the present invention provides an electronic device, including:
a processor; and
a memory storing computer executable instructions that, when executed, cause the processor to perform the method described above.
In order to solve the above technical problem, a fourth aspect of the present invention proposes a computer-readable storage medium storing one or more programs that when executed by a processor, implement the above method.
The invention creates matched AI voice parameters for different AI voice styles. Voice is first played according to the AI voice parameters matched with the AI voice style selected by the user, and dialogue responses are processed. During the conversation between the user and the robot, the difference value between the voice parameters of the call object and the AI voice parameters is analyzed in real time and compared with a preset threshold. If the difference value exceeds the preset threshold, the current AI voice parameters differ too much from the voice parameters of the call object, meaning the robot is speaking too fast or too slowly and normal conversation is affected. The voice parameters are then adjusted so that the difference value stays within the preset threshold range and normal conversation is ensured. By comparing the difference value in real time and adjusting the AI voice parameters whenever it exceeds the preset threshold, the invention keeps the speech rate of the robot matched to that of the call object. This avoids situations in which the robot speaks so fast that the user cannot get a word in, or so slowly that it cannot respond normally, and thereby improves user experience.
Drawings
In order to make the technical problems solved by the present invention, the technical means adopted and the technical effects achieved more clear, specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be noted, however, that the drawings described below are merely illustrative of exemplary embodiments of the present invention and that other embodiments of the drawings may be derived from these drawings by those skilled in the art without undue effort.
FIG. 1 is a flow chart of an AI voice rate adjustment method of the invention;
FIG. 2 is a schematic diagram of the present invention for creating a speech parameter set;
FIG. 3 is a schematic diagram of the present invention for finding AI speech parameters in a speech parameter set that match a user-selected AI speech style;
FIG. 4 is a schematic diagram of a structure of an AI voice rate adjusting apparatus according to the invention;
FIG. 5 is a block diagram of an exemplary embodiment of an electronic device in accordance with the present invention;
FIG. 6 is a schematic diagram of one embodiment of a computer readable medium of the present invention.
Detailed Description
Exemplary embodiments of the present invention will now be described more fully hereinafter with reference to the accompanying drawings, in which exemplary embodiments are shown, although the exemplary embodiments may be practiced in various specific ways. Rather, these exemplary embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the invention to those skilled in the art.
The structures, capabilities, effects, or other features described in a particular embodiment may be incorporated in one or more other embodiments in any suitable manner without departing from the spirit of the present invention.
In describing particular embodiments, specific details of construction, performance, effects, or other features are set forth in order to provide a thorough understanding of the embodiments by those skilled in the art. It is not excluded, however, that one skilled in the art may implement the present invention in a particular situation in a solution that does not include the structures, properties, effects, or other characteristics described above.
The flow diagrams in the figures are merely exemplary flow illustrations and do not represent that all of the elements, operations, and steps in the flow diagrams must be included in the aspects of the invention, nor that the steps must be performed in the order shown in the figures. For example, some operations/steps in the flowcharts may be decomposed, some operations/steps may be combined or partially combined, etc., and the order of execution shown in the flowcharts may be changed according to actual situations without departing from the gist of the present invention.
The block diagrams in the figures generally represent functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The same reference numerals in the drawings denote the same or similar elements, components or portions, and thus repeated descriptions of the same or similar elements, components or portions may be omitted hereinafter. It will be further understood that, although the terms first, second, third, etc. may be used herein to describe various devices, elements, components or portions, these devices, elements, components or portions should not be limited by these terms. That is, these terms are merely intended to distinguish one from another. For example, a first device may also be referred to as a second device without departing from the spirit of the invention. Furthermore, the term "and/or" is meant to include all combinations of any one or more of the items listed.
Fig. 1 is a flowchart of an AI speech rate adjustment method provided by the present invention, and as shown in fig. 1, the method includes:
S1, creating an AI voice parameter set matched with different AI voice styles;
In the present invention, the voice parameters include: voice interval time, response speed, and voice speed. The voice interval time is the pause between two sentences. The response speed is the speed of answering the speech of the call object, that is, the time taken to answer after the call object finishes speaking a sentence during the call. The voice speed is the speaking rate.
In one example, as shown in FIG. 2, the created speech parameter set R contains four AI speech parameters, each matched with an AI speech style: the first AI speech style is matched with the first AI speech parameter, the second AI speech style with the second AI speech parameter, the third AI speech style with the third AI speech parameter, and the fourth AI speech style with the fourth AI speech parameter. The first AI speech style has a voice interval time of 30 s, a response speed of 10 s, and a voice speed of 100 words/min; the second has a voice interval time of 20 s, a response speed of 7 s, and a voice speed of 150 words/min; the third has a voice interval time of 50 s, a response speed of 3 s, and a voice speed of 170 words/min; the fourth has a voice interval time of 50 s, a response speed of 20 s, and a voice speed of 80 words/min.
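The parameter set R in this example could be held in a simple lookup table. The following is a minimal sketch, not the patented implementation: the class name `VoiceParams`, its field names, and the style keys are hypothetical, and the fourth style's interval is assumed to be in seconds like the others.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceParams:
    """One AI voice parameter entry (field names are illustrative)."""
    interval_s: float   # pause between two sentences, in seconds
    response_s: float   # time taken to answer the call object, in seconds
    rate_wpm: float     # voice speed, in words per minute


# Parameter set R from the example; the style keys are hypothetical labels.
PARAM_SET_R = {
    "style_1": VoiceParams(interval_s=30, response_s=10, rate_wpm=100),
    "style_2": VoiceParams(interval_s=20, response_s=7, rate_wpm=150),
    "style_3": VoiceParams(interval_s=50, response_s=3, rate_wpm=170),
    # Assumption: the source gives "50" without a unit; seconds are assumed.
    "style_4": VoiceParams(interval_s=50, response_s=20, rate_wpm=80),
}
```

A frozen dataclass is used here so that the stored defaults cannot be modified by accident when parameters are adjusted later in a call.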
S2, searching AI voice parameters matched with the AI voice style selected by the user in the voice parameter set, playing voice according to the AI voice parameters, and processing dialogue response;
Specifically, before this step, the different AI speech styles may first be displayed, and the AI speech style selected by the user is determined from the user's selection operation, which may be clicking or dragging the target AI speech style.
As shown in fig. 3, the display unit displays a first AI speech style, a second AI speech style, a third AI speech style and a fourth AI speech style. If the user clicks the second AI speech style, the speech parameter set R is searched for the second AI speech parameter matched with the second AI speech style: a voice interval time of 20 s, a response speed of 7 s, and a voice speed of 150 words/min. Voice is then played in the man-machine conversation according to these voice parameters, and dialogue responses are processed.
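The lookup in step S2 amounts to a keyed search in the parameter set. A minimal sketch follows, assuming the set is a dictionary keyed by a style label; the function name `find_params`, the key names, and the error handling are illustrative choices that the source does not specify.

```python
def find_params(param_set, selected_style):
    """Return the AI voice parameters matched with the selected style."""
    try:
        return param_set[selected_style]
    except KeyError:
        # Assumption: an unknown style is treated as a caller error.
        raise ValueError(f"unknown AI voice style: {selected_style!r}") from None


# Hypothetical two-entry set using the values of the second and third styles.
example_set = {
    "style_2": {"interval_s": 20, "response_s": 7, "rate_wpm": 150},
    "style_3": {"interval_s": 50, "response_s": 3, "rate_wpm": 170},
}
chosen = find_params(example_set, "style_2")
```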
S3, analyzing the difference value of the voice parameter of the call object and the AI voice parameter in real time;
Specifically, in this step, the speech of the call object is collected in real time and the effective speech is extracted. Effective speech is speech that carries actual conversational meaning, that is, the dialogue speech excluding words with no actual meaning, such as interjections and modal particles. The voice parameters of the call object, namely its voice interval time, response speed, and voice speed, are then calculated from the effective speech. Finally, the difference value between the voice parameters of the call object and the AI voice parameters in use (the second AI voice parameter in the example above) is calculated. The difference value comprises: a voice interval time difference value, a response speed difference value, and a voice speed difference value.
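One way to picture step S3 is sketched below. It assumes the call object's speech has already been transcribed into timestamped utterances, which the source does not describe; the filler list, the `Utterance` type, and all function names are hypothetical. Utterances that contain only filler words are excluded, and the difference is taken as the call object's value minus the AI value for each parameter.

```python
from dataclasses import dataclass

# Hypothetical list of words treated as carrying no conversational meaning.
FILLERS = {"uh", "um", "hmm", "oh", "ah"}


@dataclass
class Utterance:
    text: str
    start_s: float
    end_s: float


def effective_rate_wpm(utterances):
    """Words per minute over effective speech only (a rough approximation)."""
    words, seconds = 0, 0.0
    for u in utterances:
        kept = [w for w in u.text.lower().split() if w not in FILLERS]
        if not kept:
            continue  # the utterance was only filler, so it is not effective speech
        words += len(kept)
        seconds += u.end_s - u.start_s
    return 60.0 * words / seconds if seconds > 0 else 0.0


def difference_values(user, ai):
    """Signed sub-difference per parameter: call object minus AI."""
    return {name: user[name] - ai[name] for name in ai}


utterances = [Utterance("um I want to check my bill", 0.0, 3.0),
              Utterance("oh", 4.0, 4.5)]
user = {"interval_s": 25.0, "response_s": 9.0,
        "rate_wpm": effective_rate_wpm(utterances)}
ai = {"interval_s": 20.0, "response_s": 7.0, "rate_wpm": 150.0}
diffs = difference_values(user, ai)
```

Keeping the sign of each sub-difference is a deliberate choice in this sketch, because the later adjustment steps depend on whether the call object's value is above or below the AI value.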
S4, comparing the difference value with a preset threshold value;
In this step, a threshold is set in advance. When the difference value between the voice parameter of the call object and the AI voice parameter is within the preset threshold, the speaking pace of the user is considered to match that of the robot, and normal conversation can proceed. When the difference value is outside the preset threshold, the speaking paces of the user and the robot do not match, and normal conversation is affected.
Corresponding to the difference value, the preset threshold value comprises: a voice interval time preset threshold, a response speed preset threshold and a voice speed preset threshold.
S5, if the difference value exceeds the preset threshold value, adjusting the voice parameter.
As described above, the difference value consists of three sub-difference values (a voice interval time difference value, a response speed difference value, and a voice speed difference value), and the preset threshold consists of three corresponding sub-thresholds (a voice interval time preset threshold, a response speed preset threshold, and a voice speed preset threshold).
In one example, if at least one sub-difference value of the difference values exceeds a corresponding preset threshold value, the difference value is confirmed to exceed the preset threshold value. In another example, if at least two sub-difference values in the difference values exceed corresponding preset thresholds, it is confirmed that the difference value exceeds the preset threshold. In another example, if all sub-difference values in the difference values exceed corresponding preset thresholds, the difference value is confirmed to exceed the preset thresholds.
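The three examples above differ only in how many sub-difference values must exceed their thresholds. A minimal sketch, assuming signed sub-differences compared by absolute value; the function name `exceeds`, the `min_count` parameter, and the numeric values are hypothetical.

```python
def exceeds(diffs, thresholds, min_count=1):
    """Return (decision, names of the sub-differences over their thresholds).

    min_count=1 corresponds to "at least one", min_count=2 to "at least two",
    and min_count=len(diffs) to "all" sub-difference values.
    """
    over = [name for name, d in diffs.items() if abs(d) > thresholds[name]]
    return len(over) >= min_count, over


# Illustrative values only; the source gives no concrete thresholds.
diffs = {"interval_s": 5.0, "response_s": -4.0, "rate_wpm": 10.0}
thresholds = {"interval_s": 3.0, "response_s": 5.0, "rate_wpm": 20.0}
any_over, which = exceeds(diffs, thresholds, min_count=1)
two_over, _ = exceeds(diffs, thresholds, min_count=2)
all_over, _ = exceeds(diffs, thresholds, min_count=len(diffs))
```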
In one example, only the voice parameters whose sub-difference values exceed the corresponding preset thresholds are adjusted, and each is adjusted according to the difference direction of its sub-difference value. The difference direction is either a first direction, in which the voice parameter of the call object is greater than the AI voice parameter, or a second direction, in which the voice parameter of the call object is less than the AI voice parameter. When the difference direction is the first direction, the voice parameter corresponding to the sub-difference value is increased; when it is the second direction, the voice parameter corresponding to the sub-difference value is decreased.
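The per-parameter adjustment in this example can be sketched as follows. This assumes that the parameter being changed is the AI voice parameter, as the summary of the invention states, and that each adjustment moves by a fixed step; the step sizes, names, and values are hypothetical, since the source does not give an adjustment amplitude.

```python
def adjust(ai, user, thresholds, step):
    """Adjust only the AI parameters whose sub-difference exceeds its threshold."""
    new = dict(ai)
    for name in ai:
        d = user[name] - ai[name]
        if abs(d) <= thresholds[name]:
            continue  # within threshold: leave this parameter unchanged
        # First direction (call object > AI): increase; second direction: decrease.
        new[name] += step[name] if d > 0 else -step[name]
    return new


ai = {"interval_s": 20.0, "response_s": 7.0, "rate_wpm": 150.0}
user = {"interval_s": 26.0, "response_s": 8.0, "rate_wpm": 110.0}
thresholds = {"interval_s": 3.0, "response_s": 2.0, "rate_wpm": 20.0}
step = {"interval_s": 2.0, "response_s": 1.0, "rate_wpm": 10.0}  # assumed step sizes
adjusted = adjust(ai, user, thresholds, step)
```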
In another example, a voice adjustment model may be created and trained, and if the difference value exceeds the preset threshold, the voice parameters are adjusted according to the voice adjustment model.
Steps S1 to S5 complete the first adjustment of the voice parameters in the man-machine conversation. The adjustment may overshoot or fall short, so it must be further determined whether the adjustment was appropriate and whether it needs to continue. Therefore, after step S5, the AI voice rate adjustment method of the present invention further includes:
S51, comparing the voice parameter of the current call object with the current AI voice parameter to obtain a first result, and comparing the voice parameter of the call object before the voice parameter was adjusted with the voice parameter of the call object after the adjustment to obtain a second result;
S52, determining the parameter adjustment direction of the current round according to the first result and the second result;
The first result is one of the following: the voice parameter of the current call object is greater than the AI voice parameter, indicating that the robot currently speaks more slowly than the person; the voice parameter of the current call object is equal to the AI voice parameter, indicating that the robot currently speaks at the same speed as the person; or the voice parameter of the current call object is less than the AI voice parameter, indicating that the robot currently speaks faster than the person. The second result is one of the following: the voice parameter of the call object before adjustment is greater than that after adjustment, indicating that the voice speed slowed down after the adjustment; the voice parameter of the call object before adjustment is equal to that after adjustment, indicating that the voice speed is unchanged after the adjustment; or the voice parameter of the call object before adjustment is less than that after adjustment, indicating that the voice speed sped up after the adjustment.
In one example, the parameter adjustment direction of the current round is determined to be forward in either of two cases: (1) the voice parameter of the current call object is greater than the current AI voice parameter and the voice parameter of the call object before adjustment is greater than that after adjustment, that is, the robot currently speaks more slowly than the person and the person's speech became slower after the robot's adjustment; or (2) the voice parameter of the current call object is less than the current AI voice parameter and the voice parameter of the call object before adjustment is less than that after adjustment, that is, the robot currently speaks faster than the person and the person's speech became faster after the robot's adjustment. A forward direction indicates that the direction of the current round's adjustment (increasing or decreasing the corresponding voice parameter) is correct but its amplitude is insufficient, so further adjustment in the same direction is needed.
If the voice parameter of the current call object is equal to the current AI voice parameter, the parameter adjustment direction of the current round is determined to be moderate, which indicates that the current round's adjustment is in place and no further adjustment is needed.
The parameter adjustment direction of the current round is determined to be negative in either of two cases: (1) the voice parameter of the current call object is less than the current AI voice parameter and the voice parameter of the call object before adjustment is greater than that after adjustment, that is, the robot currently speaks faster than the person and the person's speech became slower after the robot's adjustment; or (2) the voice parameter of the current call object is greater than the current AI voice parameter and the voice parameter of the call object before adjustment is less than that after adjustment, that is, the robot currently speaks more slowly than the person and the person's speech became faster after the robot's adjustment. A negative direction indicates that the direction of the current round's adjustment (increasing or decreasing the corresponding voice parameter) is incorrect, so the adjustment direction must be corrected.
Further, the parameter adjustment direction of the current round may be determined in combination with voice meaning, where voice meaning refers to the meaning of sentences that can influence the voice parameters of the call object. For example, when a call has just begun, the call object speaks more slowly than in normal conversation: the voice interval time is longer, the response time is longer, and the voice speed is lower. When the call object is unwilling to continue the call, for example on finding that the robot is promoting a product of no interest, the call object speaks faster than in normal conversation: the voice interval time is shorter, the response time is shorter, and the voice speed is higher.
S53, determining a next-round parameter adjustment scheme according to the parameter adjustment direction of the current round.
The adjustment scheme is as follows: if the parameter adjustment direction of the current round is moderate, no adjustment is made in the next round; if it is forward, the next round continues to adjust in the same direction as the current round; and if it is negative, the next round adjusts in the direction opposite to the current round.
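Steps S51 to S53 can be read as a small decision table over the signs of two comparisons. The sketch below follows the forward, moderate, and negative cases as described; the function names and string labels are hypothetical. The source does not say what to do when the call object's parameter is unchanged by the adjustment while still differing from the AI parameter, so that case is labeled "undetermined" here and mapped to no adjustment purely as an assumption.

```python
def _sign(x):
    return (x > 0) - (x < 0)


def round_direction(user_now, ai_now, user_before, user_after):
    """Classify the current round from the first and second results."""
    first = _sign(user_now - ai_now)          # first result (S51)
    second = _sign(user_before - user_after)  # second result (S51)
    if first == 0:
        return "moderate"
    if second == 0:
        # Assumption: this combination is not covered by the source text.
        return "undetermined"
    return "forward" if first == second else "negative"


# Next-round scheme (S53); the "undetermined" entry is an assumption.
NEXT_ROUND = {
    "moderate": "none",
    "forward": "same",
    "negative": "opposite",
    "undetermined": "none",
}


def next_round_scheme(direction):
    return NEXT_ROUND[direction]
```

For instance, with a call object rate of 160 against an AI rate of 150, and a call object rate that moved from 170 to 160 across the adjustment, both comparisons are positive, so this sketch classifies the round as forward and continues in the same direction.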
Furthermore, when the person speaks too fast or too slowly during the man-machine conversation, common spoken filler words can be inserted into the AI dialogue, so as to avoid abnormal situations in which the AI does not respond for a long time or interrupts the user by mistake. Therefore, the voice rate adjusting method of the present invention further includes:
S61, if the voice rate of the call object is greater than a first preset rate or less than a second preset rate, inserting a common spoken filler word into the dialogue.
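Step S61 reduces to a range check on the call object's voice rate. A minimal sketch follows; the two preset rates, the filler text, and the choice to prepend the filler to the robot's reply are all hypothetical, since the source specifies none of them.

```python
def maybe_insert_filler(reply, user_rate_wpm,
                        first_preset_wpm=200.0,   # assumed upper rate
                        second_preset_wpm=80.0,   # assumed lower rate
                        filler="Mm-hm,"):
    """Prepend a filler word when the call object is very fast or very slow."""
    if user_rate_wpm > first_preset_wpm or user_rate_wpm < second_preset_wpm:
        return f"{filler} {reply}"
    return reply


fast = maybe_insert_filler("your balance is 20 yuan.", 230.0)
normal = maybe_insert_filler("your balance is 20 yuan.", 140.0)
slow = maybe_insert_filler("your balance is 20 yuan.", 60.0)
```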
Fig. 4 is a schematic structural diagram of an AI speech rate adjusting apparatus according to the present invention, as shown in fig. 4, the apparatus includes:
a first creating module 41, configured to create AI speech parameter sets that match different AI speech styles;
a searching and playing module 42, configured to search the voice parameter set for an AI voice parameter matching the AI voice style selected by the user, and play voice according to the AI voice parameter, so as to process a dialogue response;
the analysis module 43 is configured to analyze in real time a difference value between a voice parameter of a call object and the AI voice parameter;
a comparison module 44, configured to compare the difference value with a preset threshold;
the adjusting module 45 is configured to adjust the voice parameter if the difference value exceeds the preset threshold.
the comparison module 46, configured to compare the voice parameter of the current call object with the current AI voice parameter to obtain a first result, and to compare the voice parameter of the call object before the voice parameter adjustment with the voice parameter of the call object after the voice parameter adjustment to obtain a second result;
the first determining module 47, configured to determine the parameter adjustment direction of the current round according to the first result and the second result;
the second determining module 48, configured to determine the next-round parameter adjustment scheme according to the parameter adjustment direction of the current round. The adjustment scheme comprises making no adjustment, continuing to adjust in the same direction as the parameter adjustment direction of the current round, or adjusting in the direction opposite to the parameter adjustment direction of the current round;
the inserting module 49, configured to insert a common spoken filler word into the dialogue if the voice rate of the call object is greater than the first preset rate or less than the second preset rate.
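The two comparisons performed by the comparison module 46 can be sketched as three-way comparisons. The function names and the tolerance used to decide equality are assumptions. How the two results are combined into an adjustment direction is not restated in this part of the description, so the sketch stops at producing the two results.

```python
from typing import Tuple


def compare(a: float, b: float, tol: float = 1e-9) -> str:
    """Three-way comparison of two voice parameter values.

    tol is a hypothetical tolerance below which values count as equal.
    """
    if abs(a - b) <= tol:
        return "equal"
    return "greater" if a > b else "less"


def round_results(caller_now: float, ai_now: float,
                  caller_before: float, caller_after: float) -> Tuple[str, str]:
    """Return (first result, second result) for one adjustment round."""
    # First result: current call object parameter vs current AI parameter.
    first = compare(caller_now, ai_now)
    # Second result: call object parameter before vs after the adjustment.
    second = compare(caller_before, caller_after)
    return first, second
```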
In a preferred embodiment, the apparatus further comprises a second creating module, configured to create a voice adjustment model;
if the difference value exceeds the preset threshold, the adjusting module 45 adjusts the voice parameter according to the voice adjustment model.
Wherein the speech parameters include: voice interval time, response speed, and voice speed.
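The threshold check and adjustment over the three voice parameters can be sketched as follows. The patent does not disclose the form of the voice adjustment model, so the rule used here (moving the AI parameter a fixed fraction of the way toward the call object's value) is purely an assumption, as are the names `SpeechParams` and `adjust` and the use of a single threshold for all three parameters.

```python
from dataclasses import dataclass, fields


@dataclass
class SpeechParams:
    interval_s: float   # voice interval time (assumed unit: seconds)
    response_s: float   # response speed, modelled as a delay (assumed: seconds)
    rate: float         # voice speed (assumed unit: syllables per second)


def adjust(ai: SpeechParams, caller: SpeechParams,
           threshold: float, step: float = 0.5) -> SpeechParams:
    """Return new AI parameters, adjusted where the difference is too large.

    Assumed stand-in for the voice adjustment model: when the difference
    exceeds the threshold, move the AI value by `step` times the difference
    toward the call object's value; otherwise leave it unchanged.
    """
    updated = {}
    for f in fields(SpeechParams):
        ai_value = getattr(ai, f.name)
        diff = getattr(caller, f.name) - ai_value
        if abs(diff) > threshold:
            updated[f.name] = ai_value + step * diff
        else:
            updated[f.name] = ai_value
    return SpeechParams(**updated)
```

In practice each parameter would likely need its own threshold because the units differ; a single threshold is used here only to keep the sketch short.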
It will be appreciated by those skilled in the art that the modules in the embodiments of the apparatus described above may be distributed in an apparatus as described, or may be distributed in one or more apparatuses different from the embodiments described above with corresponding changes. The modules of the above embodiments may be combined into one module, or may be further split into a plurality of sub-modules.
The following describes an embodiment of an electronic device of the present invention, which may be regarded as a physical form of implementation for the above-described embodiment of the method and apparatus of the present invention. Details described in relation to the embodiments of the electronic device of the present invention should be considered as additions to the embodiments of the method or apparatus described above; for details not disclosed in the embodiments of the electronic device of the present invention, reference may be made to the above-described method or apparatus embodiments.
Fig. 5 is a block diagram of an exemplary embodiment of an electronic device according to the present invention. The electronic device shown in fig. 5 is only an example and should not be construed as limiting the functionality and scope of use of the embodiments of the present invention.
As shown in fig. 5, the electronic device 500 of the exemplary embodiment is embodied in the form of a general-purpose data processing device. The components of electronic device 500 may include, but are not limited to: at least one processing unit 510, at least one storage unit 520, a bus 530 connecting the different system components (including the storage unit 520 and the processing unit 510), a display unit 540, etc.
The storage unit 520 stores a computer readable program, which may be a source program or code of a read only program. The program may be executed by the processing unit 510 such that the processing unit 510 performs the steps of various embodiments of the present invention. For example, the processing unit 510 may perform the steps shown in fig. 1.
The storage unit 520 may include readable media in the form of volatile memory units, such as Random Access Memory (RAM) 5201 and/or cache memory unit 5202, and may further include Read Only Memory (ROM) 5203. The storage unit 520 may also include a program/utility 5204 having a set (at least one) of program modules 5205, such program modules 5205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Bus 530 may be one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 500 may also communicate with one or more external devices 300 (e.g., keyboard, display, network device, bluetooth device, etc.), such that a user can interact with the electronic device 500 via the external devices 300, and/or such that the electronic device 500 can communicate with one or more other data processing devices (e.g., routers, modems, etc.). Such communication may occur through an input/output (I/O) interface 550, and may also occur through a network adapter 560 to one or more networks, such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet. The network adapter 560 may communicate with other modules of the electronic device 500 via the bus 530. It should be appreciated that although not shown in fig. 5, other hardware and/or software modules may be used in electronic device 500, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
FIG. 6 is a schematic diagram of one embodiment of a computer readable medium of the present invention. As shown in FIG. 6, the computer program may be stored on one or more computer readable media. The computer readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. The computer program, when executed by one or more data processing devices, causes the above-described method of the present invention to be carried out, namely: creating AI voice parameter sets matched with different AI voice styles; searching the voice parameter set for AI voice parameters matched with the AI voice style selected by the user, playing voice according to the AI voice parameters, and processing the dialogue response; analyzing in real time the difference value between the voice parameter of the call object and the AI voice parameter; comparing the difference value with a preset threshold value; and if the difference value exceeds the preset threshold value, adjusting the voice parameter.
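The first two steps of the method (creating parameter sets per AI voice style and looking up the set for the style the user selects) can be sketched as a simple table lookup. The style names, the numeric values, and the names `VOICE_STYLES` and `lookup_style` are hypothetical and are not taken from the patent.

```python
from typing import Dict

# Hypothetical AI voice parameter sets keyed by AI voice style.
# Each set holds voice interval time, response speed, and voice speed.
VOICE_STYLES: Dict[str, Dict[str, float]] = {
    "calm": {"interval_s": 0.8, "response_s": 1.0, "rate": 3.5},
    "brisk": {"interval_s": 0.4, "response_s": 0.5, "rate": 5.0},
}


def lookup_style(style: str) -> Dict[str, float]:
    """Return a copy of the AI voice parameters for the selected style."""
    try:
        return dict(VOICE_STYLES[style])  # copy so callers cannot mutate the table
    except KeyError:
        raise ValueError(f"unknown AI voice style: {style!r}") from None
```

The returned parameters would then be used for playback and as the starting point for the real-time difference analysis.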
From the above description of embodiments, those skilled in the art will readily appreciate that the exemplary embodiments described herein may be implemented in software, or may be implemented in software in combination with necessary hardware. Thus, the technical solution according to the embodiments of the present invention may be embodied in the form of a software product, which may be stored in a computer readable storage medium (which may be a CD-ROM, a USB disk, a mobile hard disk, etc.) or on a network, and which comprises several instructions to cause a data processing device (which may be a personal computer, a server, or a network device, etc.) to perform the above-described method according to the present invention.
The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's computing device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
In summary, the present invention may be implemented as a method, an apparatus, an electronic device, or a computer readable medium that executes a computer program. Some or all of the functions of the present invention may be implemented in practice using a general purpose data processing device such as a microprocessor or Digital Signal Processor (DSP).
The above-described specific embodiments further describe the objects, technical solutions and advantageous effects of the present invention in detail, and it should be understood that the present invention is not inherently related to any particular computer, virtual device or electronic apparatus, and various general-purpose devices may also implement the present invention. The foregoing description of the embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (12)

1. An AI speech rate adjustment method, the method comprising:
creating AI voice parameter sets matched with different AI voice styles;
searching AI voice parameters matched with the AI voice style selected by the user in the voice parameter set, playing voice according to the AI voice parameters, and processing dialogue response;
analyzing the difference value of the voice parameter of the call object and the AI voice parameter in real time;
comparing the difference value with a preset threshold value;
if the difference value exceeds the preset threshold value, adjusting the voice parameter;
comparing the voice parameters of the current call object with the current AI voice parameters to obtain a first result, and comparing the voice parameters of the call object before adjusting the voice parameters with the voice parameters of the call object after adjusting the voice parameters to obtain a second result;
determining the parameter adjustment direction of the current round according to the first result and the second result;
if the parameter adjustment direction of the current round is moderate, the next round is not adjusted; if the parameter adjustment direction of the current round is positive, the next round continues adjusting in the same direction as the parameter adjustment direction of the current round; and if the parameter adjustment direction of the current round is negative, the next round adjusts in the direction opposite to the parameter adjustment direction of the current round;
wherein: the first result includes: the voice parameter of the current call object is larger than the AI voice parameter, which indicates that the voice speed of the current robot is slower than that of the person, the voice parameter of the current call object is equal to the AI voice parameter, which indicates that the voice speed of the current robot is the same as that of the person, the voice parameter of the current call object is smaller than the AI voice parameter, which indicates that the voice speed of the current robot is faster than that of the person; the second result includes: the voice parameter of the conversation object before adjustment is larger than the voice parameter of the conversation object after adjustment, which indicates that the voice speed of the robot is slowed down after the voice speed adjustment, the voice parameter of the conversation object before adjustment is equal to the voice parameter of the conversation object after adjustment, which indicates that the voice speed of the robot is unchanged after the voice speed adjustment, and the voice parameter of the conversation object before adjustment is smaller than the voice parameter of the conversation object after adjustment, which indicates that the voice speed of the robot is speeded up after the voice speed adjustment.
2. The method according to claim 1, wherein the method further comprises: and creating a voice adjustment model, and adjusting the voice parameters according to the voice adjustment model if the difference value exceeds the preset threshold value.
3. The method of claim 1, wherein if the speech rate of the call object is greater than a first predetermined rate or the speech rate of the call object is less than a second predetermined rate, inserting a common spoken word in the conversation.
4. The method of claim 1, wherein the speech parameters comprise: voice interval time, response speed, and voice speed.
5. The method of claim 1, wherein the adjustment direction comprises no adjustment, continuing the adjustment in the same direction as the parameter adjustment direction of the current round, or continuing the adjustment in the direction opposite to the parameter adjustment direction of the current round.
6. An AI speech rate adjustment apparatus, the apparatus comprising:
the first creating module is used for creating an AI voice parameter set matched with different AI voice styles;
the searching and playing module is used for searching the AI voice parameters matched with the AI voice styles selected by the user in the voice parameter set, playing the voice according to the AI voice parameters and processing dialogue response;
the analysis module is used for analyzing the difference value of the voice parameter of the call object and the AI voice parameter in real time;
the comparison module is used for comparing the difference value with a preset threshold value;
the adjusting module is used for adjusting the voice parameters if the difference value exceeds the preset threshold value;
the comparison module is used for comparing the voice parameters of the current call object with the voice parameters of the current AI to obtain a first result, and comparing the voice parameters of the call object before adjusting the voice parameters with the voice parameters of the call object after adjusting the voice parameters to obtain a second result;
the first determining module is used for determining the parameter adjustment direction of the current round according to the first result and the second result;
the second determining module is used for not adjusting the next round if the parameter adjustment direction of the current round is moderate, continuing to adjust the next round in the same direction as the parameter adjustment direction of the current round if the parameter adjustment direction of the current round is positive, and adjusting the next round in the opposite direction if the parameter adjustment direction of the current round is negative;
wherein: the first result includes: the voice parameter of the current call object is larger than the AI voice parameter, which indicates that the voice speed of the current robot is slower than that of the person, the voice parameter of the current call object is equal to the AI voice parameter, which indicates that the voice speed of the current robot is the same as that of the person, the voice parameter of the current call object is smaller than the AI voice parameter, which indicates that the voice speed of the current robot is faster than that of the person; the second result includes: the voice parameter of the conversation object before adjustment is larger than the voice parameter of the conversation object after adjustment, which indicates that the voice speed of the robot is slowed down after the voice speed adjustment, the voice parameter of the conversation object before adjustment is equal to the voice parameter of the conversation object after adjustment, which indicates that the voice speed of the robot is unchanged after the voice speed adjustment, and the voice parameter of the conversation object before adjustment is smaller than the voice parameter of the conversation object after adjustment, which indicates that the voice speed of the robot is speeded up after the voice speed adjustment.
7. The apparatus of claim 6, wherein the apparatus further comprises:
the second creation module is used for creating a voice adjustment model;
and if the difference value exceeds the preset threshold value, the adjusting module adjusts the voice parameters according to the voice adjusting model.
8. The apparatus of claim 6, wherein the apparatus further comprises:
and the inserting module is used for inserting a common spoken filler word into the dialogue if the voice rate of the call object is greater than the first preset rate or less than the second preset rate.
9. The apparatus of claim 6, wherein the speech parameters comprise: voice interval time, response speed, and voice speed.
10. The apparatus of claim 6, wherein the adjustment direction comprises no adjustment, continuing the adjustment in the same direction as the parameter adjustment direction of the current round, or continuing the adjustment in the direction opposite to the parameter adjustment direction of the current round.
11. An electronic device, comprising:
a processor; and
a memory storing computer-executable instructions that, when executed, cause the processor to perform the method of any of claims 1-5.
12. A computer readable storage medium storing one or more programs, which when executed by a processor, implement the method of any of claims 1-5.
CN201910939380.8A 2019-09-30 2019-09-30 AI voice rate adjusting method and device and electronic equipment Active CN110619888B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910939380.8A CN110619888B (en) 2019-09-30 2019-09-30 AI voice rate adjusting method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910939380.8A CN110619888B (en) 2019-09-30 2019-09-30 AI voice rate adjusting method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN110619888A CN110619888A (en) 2019-12-27
CN110619888B true CN110619888B (en) 2023-06-27

Family

ID=68924953

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910939380.8A Active CN110619888B (en) 2019-09-30 2019-09-30 AI voice rate adjusting method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN110619888B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113763921B (en) * 2020-07-24 2024-06-18 北京沃东天骏信息技术有限公司 Method and device for correcting text
CN112599151B (en) * 2020-12-07 2023-07-21 携程旅游信息技术(上海)有限公司 Language speed evaluation method, system, equipment and storage medium
CN112599148A (en) * 2020-12-31 2021-04-02 北京声智科技有限公司 Voice recognition method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106572067A (en) * 2015-10-12 2017-04-19 阿里巴巴集团控股有限公司 Voice flow transmission method and system
CN109582275A (en) * 2018-12-03 2019-04-05 珠海格力电器股份有限公司 Voice adjusting method and device, storage medium and electronic device
CN109767792A (en) * 2019-03-18 2019-05-17 百度国际科技(深圳)有限公司 Sound end detecting method, device, terminal and storage medium
KR20190084202A (en) * 2017-12-18 2019-07-16 네이버 주식회사 Method and system for controlling artificial intelligence device using plurality wake up word


Also Published As

Publication number Publication date
CN110619888A (en) 2019-12-27


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant