CN110619888A - AI voice rate adjusting method and device and electronic equipment - Google Patents

AI voice rate adjusting method and device and electronic equipment

Info

Publication number
CN110619888A
CN110619888A (application CN201910939380.8A)
Authority
CN
China
Prior art keywords
voice
parameter
adjustment
difference value
voice parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910939380.8A
Other languages
Chinese (zh)
Other versions
CN110619888B (en)
Inventor
石文超
戴会杰
常富洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qilu Information Technology Co Ltd
Original Assignee
Beijing Qilu Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qilu Information Technology Co Ltd filed Critical Beijing Qilu Information Technology Co Ltd
Priority to CN201910939380.8A
Publication of CN110619888A
Application granted
Publication of CN110619888B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/04 Time compression or expansion
    • G10L 21/043 Time compression or expansion by changing speed
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Telephone Function (AREA)

Abstract

The invention discloses an AI voice rate adjusting method, a device and electronic equipment, wherein the method comprises the following steps: creating an AI voice parameter set matched with different AI voice styles; searching the voice parameter set for the AI voice parameter matched with the AI voice style selected by the user, playing voice according to the AI voice parameter and processing the dialog response; analyzing, in real time, the difference value between the voice parameter of the call object and the AI voice parameter; comparing the difference value with a preset threshold value; and if the difference value exceeds the preset threshold value, adjusting the voice parameter. The AI voice rate adjusting method of the invention compares the difference value between the voice parameter of the call object and the AI voice parameter in real time, and adjusts the AI voice parameter when the difference value exceeds the preset threshold value, thereby ensuring that the speech rate of the call object matches the speech rate of the robot.

Description

AI voice rate adjusting method and device and electronic equipment
Technical Field
The invention relates to the technical field of AI voice, and in particular to an AI voice rate adjusting method, an AI voice rate adjusting device, electronic equipment and a computer readable medium.
Background
AI (Artificial Intelligence) voice technology has developed rapidly in recent years, and major intelligent devices treat the intelligent voice AI technology they carry as a selling point. People increasingly desire to communicate with machines naturally and conveniently. From the traditional one-question-one-answer voice interaction system to the currently popular multi-turn question-and-answer voice interaction system, AI interaction has become closer to human interaction, and user experience has been greatly improved.
However, the playing speed of AI speech synthesized by various technical means is usually fixed, while in practical application scenarios different people respond to AI speech at different speeds, pause for different lengths of time, and perceive speech rate differently. As a result, during a man-machine conversation some users feel that the robot speaks too fast, pauses too briefly between sentences, or responds too quickly, while other users feel that the robot speaks too slowly, pauses too long between sentences, or responds too slowly. In extreme cases, the user cannot get a word in, or the robot fails to respond normally, so the conversation cannot proceed and the user experience suffers.
Disclosure of Invention
The invention aims to solve the technical problem in the prior art that the AI voice playing speed is fixed and cannot adapt to the conversation speeds of different users.
In order to solve the above technical problem, a first aspect of the present invention provides an AI speech rate adjustment method, including:
creating an AI voice parameter set matched with different AI voice styles;
searching an AI voice parameter matched with the AI voice style selected by the user in the voice parameter set, playing voice according to the AI voice parameter and processing a dialog response;
analyzing the difference value between the voice parameter of the call object and the AI voice parameter in real time;
comparing the difference value with a preset threshold value;
and if the difference value exceeds the preset threshold value, adjusting the voice parameters.
According to a preferred embodiment of the present invention, the method further comprises:
and establishing a voice adjusting model, and adjusting the voice parameters according to the voice adjusting model if the difference value exceeds the preset threshold value.
According to a preferred embodiment of the present invention, the method further comprises:
and comparing the voice parameter of the current call object with the current AI voice parameter to obtain a first result, and comparing the voice parameter of the call object before the voice parameter is adjusted with the voice parameter of the call object after the voice parameter is adjusted to obtain a second result.
Determining the parameter adjustment direction of the current round according to the first result and the second result;
and determining a next-round parameter adjustment scheme according to the current-round parameter adjustment direction.
According to a preferred embodiment of the present invention, if the speech rate of the call object is greater than a first preset rate, or the speech rate of the call object is less than a second preset rate, a common spoken auxiliary word (filler word) is inserted into the conversation.
According to a preferred embodiment of the present invention, the speech parameters include: speech interval time, response speed, and speech speed.
According to a preferred embodiment of the present invention, the adjustment scheme includes no adjustment, continuing adjustment in the same direction as the current round of parameter adjustment, or adjusting in the direction opposite to the current round of parameter adjustment.
In order to solve the above technical problem, a second aspect of the present invention provides an AI speech rate adjustment apparatus, including:
the first establishing module is used for establishing an AI voice parameter set matched with different AI voice styles;
the searching and playing module is used for searching the AI voice parameters matched with the AI voice style selected by the user in the voice parameter set, playing the voice according to the AI voice parameters and processing the dialog response;
the analysis module is used for analyzing the difference value between the voice parameter of the call object and the AI voice parameter in real time;
the comparison module is used for comparing the difference value with a preset threshold value;
and the adjusting module is used for adjusting the voice parameters if the difference value exceeds the preset threshold value.
According to a preferred embodiment of the present invention, the apparatus further comprises:
the second establishing module is used for establishing a voice adjusting model;
if the difference value exceeds the preset threshold value, the adjusting module adjusts the voice parameters according to the voice adjusting model.
According to a preferred embodiment of the present invention, the apparatus further comprises:
and the comparison module is used for comparing the voice parameter of the current call object with the current AI voice parameter to obtain a first result, and comparing the voice parameter of the call object before the voice parameter is adjusted with the voice parameter of the call object after the voice parameter is adjusted to obtain a second result.
The first determining module is used for determining the parameter adjusting direction of the current round according to the first result and the second result;
and the second determining module is used for determining a next-round parameter adjustment scheme according to the current-round parameter adjustment direction.
According to a preferred embodiment of the present invention, the apparatus further comprises:
and the inserting module is used for inserting the common spoken auxiliary words into the conversation if the voice rate of the call object is greater than a first preset rate or the voice rate of the call object is less than a second preset rate.
According to a preferred embodiment of the present invention, the speech parameters include: speech interval time, response speed, and speech speed.
According to a preferred embodiment of the present invention, the adjustment scheme includes no adjustment, continuing adjustment in the same direction as the current round of parameter adjustment, or adjusting in the direction opposite to the current round of parameter adjustment.
To solve the above technical problem, a third aspect of the present invention provides an electronic device, comprising:
a processor; and
a memory storing computer executable instructions that, when executed, cause the processor to perform the method described above.
In order to solve the above technical problem, a fourth aspect of the present invention proposes a computer-readable storage medium, wherein the computer-readable storage medium stores one or more programs that, when executed by a processor, implement the above method.
The invention creates matched AI voice parameters for different AI voice styles. Voice is first played according to the AI voice parameters matched with the AI voice style selected by the user, and the dialog response is processed. During the conversation between the user and the robot, the difference value between the voice parameter of the call object and the AI voice parameter is analyzed in real time and compared with a preset threshold value. If the difference value exceeds the preset threshold value, the current AI voice parameter differs too much from the voice parameter of the call object, which indicates that the robot speaks too fast or too slowly and the normal conversation is affected; the voice parameter is then adjusted so that the difference between the voice parameter of the call object and the AI voice parameter falls within the preset threshold range and the conversation can proceed normally. By comparing the difference value between the voice parameter of the call object and the AI voice parameter in real time, and adjusting the AI voice parameter when the difference value exceeds the preset threshold value, the invention ensures that the speech rate of the call object matches the speech rate of the robot, avoids situations where the robot speaks so fast that the user cannot get a word in, or speaks so slowly that it fails to respond normally, and improves the user experience.
Drawings
In order to make the technical problems solved by the present invention, the technical means adopted and the technical effects obtained clearer, embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be noted, however, that the drawings described below illustrate only exemplary embodiments of the invention, from which those skilled in the art can derive other embodiments without inventive effort.
FIG. 1 is a flow chart of an AI speech rate adjustment method of the present invention;
FIG. 2 is a schematic diagram of the present invention for creating a set of speech parameters;
FIG. 3 is a schematic diagram of the present invention finding AI speech parameters in a speech parameter set that match the user-selected AI speech style;
FIG. 4 is a schematic diagram of a structural framework of an AI speech rate adjustment apparatus according to the present invention;
FIG. 5 is a block diagram of an exemplary embodiment of an electronic device in accordance with the present invention;
FIG. 6 is a diagrammatic representation of one embodiment of a computer-readable medium of the present invention.
Detailed Description
Exemplary embodiments of the present invention will now be described more fully with reference to the accompanying drawings. The invention may, however, be embodied in many specific forms and should not be construed as limited to the embodiments set forth herein. Rather, these exemplary embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the invention to those skilled in the art.
The structures, properties, effects or other characteristics described in a certain embodiment may be combined in any suitable manner in one or more other embodiments, while still complying with the technical idea of the invention.
In describing particular embodiments, specific details of structures, properties, effects, or other features are set forth in order to provide a thorough understanding of the embodiments by one skilled in the art. However, it is not excluded that a person skilled in the art may implement the invention in a specific case without the above-described structures, performances, effects or other features.
The flow chart in the drawings is only an exemplary flow demonstration, and does not represent that all the contents, operations and steps in the flow chart are necessarily included in the scheme of the invention, nor does it represent that the execution is necessarily performed in the order shown in the drawings. For example, some operations/steps in the flowcharts may be divided, some operations/steps may be combined or partially combined, and the like, and the execution order shown in the flowcharts may be changed according to actual situations without departing from the gist of the present invention.
The block diagrams in the figures generally represent functional entities and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The same reference numerals denote the same or similar elements, components, or parts throughout the drawings, and thus a repetitive description thereof may be omitted hereinafter. It will be further understood that, although the terms first, second, third, etc. may be used herein to describe various elements, components, or sections, these elements, components, or sections should not be limited by these terms; the terms are used only to distinguish one from another. For example, a first device may also be referred to as a second device without departing from the spirit of the present invention. Furthermore, the term "and/or" is intended to include all combinations of any one or more of the listed items.
Fig. 1 is a flowchart of an AI speech rate adjustment method provided by the present invention, and as shown in fig. 1, the method includes:
s1, establishing an AI voice parameter set matched with different AI voice styles;
in the present invention, the speech parameters include: speech interval time, response speed, and speech speed. The voice interval time refers to the pause time between two sentences, the response speed refers to the speed of answering the voice of the talking object, namely, the response time of answering the sentence after the talking object finishes one sentence in the talking process; the speech speed is the speaking speed.
In one example, as shown in fig. 2, the created speech parameter set R contains AI speech parameters matching 4 AI speech styles: the first AI voice style is matched with the first AI voice parameter, the second with the second, the third with the third, and the fourth with the fourth. The first AI voice style has a voice interval time of 30s, a response speed of 10s, and a speech speed of 100 words/minute; the second AI voice style has a voice interval time of 20s, a response speed of 7s, and a speech speed of 150 words/minute; the third AI voice style has a voice interval time of 50s, a response speed of 3s, and a speech speed of 170 words/minute; the fourth AI voice style has a voice interval time of 50s, a response speed of 20s, and a speech speed of 80 words/minute.
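For concreteness, the following is a minimal sketch of how such a parameter set might be represented in code. It is an illustration only; the class name VoiceParams, its field names, and the function create_parameter_set are assumptions and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class VoiceParams:
    """AI voice parameters for one voice style (illustrative field names)."""
    interval_s: float   # voice interval time: pause between two sentences, in seconds
    response_s: float   # response speed: time before answering, in seconds
    rate_wpm: float     # speech speed, in words per minute


def create_parameter_set() -> dict:
    """Create the parameter set R with the four example styles from fig. 2."""
    return {
        "style_1": VoiceParams(interval_s=30, response_s=10, rate_wpm=100),
        "style_2": VoiceParams(interval_s=20, response_s=7, rate_wpm=150),
        "style_3": VoiceParams(interval_s=50, response_s=3, rate_wpm=170),
        "style_4": VoiceParams(interval_s=50, response_s=20, rate_wpm=80),
    }


# Step S2 would then look up the style the user selected, e.g.:
ai_params = create_parameter_set()["style_2"]
```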
S2, finding AI voice parameters matched with the AI voice style selected by the user in the voice parameter set, playing voice according to the AI voice parameters, and processing a dialog response;
specifically, before this step, the different AI voice styles may be displayed, and the AI voice style selected by the user is determined according to the selection operation of the user on the AI voice style, where the selection operation of the user on the AI voice style may specifically be a click or a drag on the target AI voice style.
As shown in fig. 3, the display unit displays the first, second, third and fourth AI voice styles. When the user clicks the second AI voice style, the second AI voice parameter matched with the second AI voice style is found in the voice parameter set R: a voice interval time of 20s, a response speed of 7s, and a speech speed of 150 words/minute. The speech is played in the man-machine conversation according to these parameters and the dialog response is processed.
S3, analyzing the difference value between the voice parameter of the call object and the AI voice parameter in real time;
specifically, in this step, the speech parameters of the call object are collected in real time, and the effective speech of the call object is extracted, where the effective speech refers to speech with actual call meaning, that is, dialogue speech without words with actual meaning, such as exclamation words and word-atmosphere words. And calculating the voice parameters of the analysis call object from the effective voice parameters, namely calculating the voice interval time, the response speed and the voice speed of the analysis call object. Calculating a difference value between the voice parameter of the call object and the second AI voice parameter, wherein the difference value corresponds to the voice parameter and comprises: a speech interval time difference value, a response speed difference value and a speech speed difference value.
S4, comparing the difference value with a preset threshold value;
in this step, a threshold value may be preset, and when the difference value between the voice parameter of the call object and the AI voice parameter is within the preset threshold value, it is considered that the conversation speed of the user matches with the conversation speed of the robot, and normal conversation may be performed; when the difference value between the voice parameter of the call object and the AI voice parameter is beyond the preset threshold value, the normal conversation is influenced if the conversation preset between the user and the robot is not matched.
Corresponding to the difference value, the preset threshold value comprises: voice interval time preset threshold, response speed preset threshold and voice speed preset threshold.
And S5, if the difference value exceeds the preset threshold value, adjusting the voice parameters.
Because the difference value includes three sub-difference values, correspondingly, the preset threshold includes three sub-preset thresholds, that is, the difference value includes: a speech interval time difference value, a response speed difference value and a speech speed difference value. The preset threshold includes: voice interval time preset threshold, response speed preset threshold and voice speed preset threshold.
In an example, if at least one sub-difference value of the difference values exceeds a corresponding preset threshold, it is determined that the difference value exceeds the preset threshold. In another example, if at least two sub-differences of the differences exceed corresponding preset thresholds, it is determined that the differences exceed the preset thresholds. In another example, if all sub-difference values in the difference values exceed corresponding preset thresholds, it is determined that the difference values exceed the preset thresholds.
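The three examples above differ only in how many sub-differences must exceed their thresholds. A sketch with that count as a parameter follows; the threshold values are placeholders, not values from the patent:

```python
def exceeds_threshold(differences: dict, thresholds: dict, required: int = 1) -> bool:
    """Return True if at least `required` sub-differences exceed their
    corresponding preset thresholds (compared by absolute value)."""
    exceeded = sum(1 for key, diff in differences.items() if abs(diff) > thresholds[key])
    return exceeded >= required


# Placeholder per-parameter thresholds.
thresholds = {"interval_s": 10, "response_s": 4, "rate_wpm": 30}
diffs = {"interval_s": 15, "response_s": 5, "rate_wpm": -20}
print(exceeds_threshold(diffs, thresholds, required=1))  # True: two sub-differences exceed
print(exceeds_threshold(diffs, thresholds, required=3))  # False: the speech-speed difference does not
```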
In an example, adjusting the voice parameters may adjust only the voice parameter corresponding to the sub-difference value that exceeds its preset threshold, and specifically may adjust it according to the difference direction of that sub-difference value. The difference direction includes a first direction and a second direction: the first direction indicates that the voice parameter of the call object is greater than the AI voice parameter, and the second direction indicates that the voice parameter of the call object is smaller than the AI voice parameter. The corresponding voice parameter is increased when the difference direction is the first direction, and decreased when the difference direction is the second direction.
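A sketch of this direction rule; the fixed step sizes are an assumption made for illustration, since the patent does not specify how large each adjustment is:

```python
# Illustrative adjustment step per parameter (not specified in the patent).
ADJUST_STEP = {"interval_s": 5.0, "response_s": 2.0, "rate_wpm": 20.0}


def adjust_ai_parameters(ai: dict, differences: dict, thresholds: dict) -> dict:
    """Adjust only the AI parameters whose sub-difference exceeds its threshold:
    increase the parameter in the first direction (caller > AI) and decrease it
    in the second direction (caller < AI)."""
    adjusted = dict(ai)
    for key, diff in differences.items():
        if abs(diff) <= thresholds[key]:
            continue                      # within range: leave this parameter alone
        if diff > 0:                      # first direction: caller's value is larger
            adjusted[key] += ADJUST_STEP[key]
        else:                             # second direction: caller's value is smaller
            adjusted[key] -= ADJUST_STEP[key]
    return adjusted
```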
in another example, a speech adjustment model may be created and trained, and if the difference value exceeds the preset threshold, the speech parameter may be adjusted according to the speech adjustment model.
Through the above steps S1 to S5, the first round of speech-parameter adjustment in the man-machine conversation is completed. Over-adjustment or under-adjustment may occur, so it is necessary to further determine whether this round of parameter adjustment was appropriate and whether adjustment needs to continue. Therefore, after step S5, the AI speech rate adjusting method of the present invention further includes:
s51, comparing the voice parameter of the current call object with the current AI voice parameter to obtain a first result, and comparing the voice parameter of the call object before the voice parameter is adjusted with the voice parameter of the call object after the voice parameter is adjusted to obtain a second result.
S52, determining the parameter adjustment direction of the current round according to the first result and the second result;
wherein the first result comprises: the voice parameter of the current call object is larger than the AI voice parameter, indicating that the robot currently speaks more slowly than the human; the voice parameter of the current call object is equal to the AI voice parameter, indicating that the robot currently speaks at the same rate as the human; and the voice parameter of the current call object is smaller than the AI voice parameter, indicating that the robot currently speaks faster than the human. The second result comprises: the voice parameter of the call object before the adjustment is larger than that after the adjustment, indicating that the human slowed down after the robot's speech rate was adjusted; the voice parameter before the adjustment is equal to that after the adjustment, indicating that the human's speech rate did not change after the adjustment; and the voice parameter before the adjustment is smaller than that after the adjustment, indicating that the human sped up after the adjustment.
In an example, the voice parameter of the current call object is greater than the current AI voice parameter and the voice parameter of the call object before the adjustment is greater than that after the adjustment (that is, the robot currently speaks more slowly than the human, and the human slowed down after the robot's adjustment), or the voice parameter of the call object before the adjustment is smaller than that after the adjustment (that is, the robot currently speaks faster than the human, and the human sped up after the robot's adjustment). In either case the current-round parameter adjustment direction is determined to be positive, which indicates that the direction of this round's adjustment (increasing or decreasing the corresponding voice parameter) is correct but the adjustment range is insufficient, and further adjustment in the same direction is required.
If the voice parameter of the current call object is equal to the AI voice parameter, the current-round adjustment direction is determined to be moderate, which indicates that this round's parameter adjustment is in place and no further adjustment is needed.
If the voice parameter of the current call object is smaller than the current AI voice parameter and the voice parameter of the call object before the adjustment is greater than that after the adjustment (that is, the robot currently speaks faster than the human, and the human slowed down after the robot's adjustment), or the voice parameter of the call object before the adjustment is smaller than that after the adjustment (that is, the robot currently speaks more slowly than the human, and the human sped up after the robot's adjustment), the current-round parameter adjustment direction is determined to be negative, which indicates that the direction of this round's adjustment (increasing or decreasing the corresponding voice parameter) is incorrect and needs to be corrected in the next round.
Further, the current-round parameter adjustment direction may be determined in combination with the semantics of the voice, where the semantics refer to the meaning of statements that can affect the voice parameters of the call object. For example, when a call has just started, the call object usually speaks more slowly than in a normal conversation: the voice interval time is longer, the response time is longer and the speech speed is lower than during normal conversation. When the call object does not want to continue the conversation, for example because it finds that the robot is marketing a product it is not interested in, the call object usually speaks faster than in a normal conversation: the voice interval time is shorter, the response time is shorter and the speech speed is higher than during normal conversation.
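One way to read steps S51 and S52 in code follows. This is an interpretation, not the patent's literal algorithm: the round is taken as positive when the two comparison results point the same way, moderate when the caller's parameter now equals the AI's, and negative otherwise; semantic cues are not modeled.

```python
def current_round_direction(caller_now: float, ai_now: float,
                            caller_before: float, caller_after: float) -> str:
    """Classify the current round's adjustment direction for one voice parameter."""
    first = caller_now - ai_now            # first result: caller vs. current AI parameter
    second = caller_before - caller_after  # second result: caller before vs. after adjustment
    if first == 0:
        return "moderate"                  # adjusted into place, no further adjustment needed
    if (first > 0) == (second > 0):
        return "positive"                  # right direction, but not adjusted far enough
    return "negative"                      # wrong direction, correct it next round
```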
And S53, determining a next round parameter adjusting scheme according to the current round parameter adjusting direction.
The adjustment scheme is as follows: if the current-round parameter adjustment direction is moderate, no adjustment is made in the next round; if the current-round parameter adjustment direction is positive, the next round continues adjusting in the same direction as the current round; and if the current-round parameter adjustment direction is negative, the next round adjusts in the direction opposite to the current round.
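Expressed as a lookup, using the same illustrative labels as the sketch above:

```python
NEXT_ROUND_SCHEME = {
    "moderate": "no adjustment",
    "positive": "continue adjusting in the same direction as the current round",
    "negative": "adjust in the direction opposite to the current round",
}

print(NEXT_ROUND_SCHEME["positive"])  # continue adjusting in the same direction as the current round
```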
Furthermore, during the man-machine conversation, when the human speaks too fast or too slowly, some common spoken auxiliary words (filler words) can be inserted into the AI's side of the conversation, so as to avoid abnormal situations in which the AI does not respond for a long time or the user's speech is interrupted by mistake. Therefore, the speech rate adjustment method of the present invention further comprises:
and S61, if the voice rate of the call object is greater than a first preset rate, or the voice rate of the call object is less than a second preset rate, inserting common spoken auxiliary words into the conversation.
Fig. 4 is a schematic structural diagram of an AI speech rate adjusting apparatus according to the present invention, as shown in fig. 4, the apparatus includes:
a first creating module 41, configured to create an AI voice parameter set that matches different AI voice styles;
the searching and playing module 42 is configured to search, in the voice parameter set, an AI voice parameter matched with an AI voice style selected by a user, play a voice according to the AI voice parameter, and process a dialog response;
the analysis module 43 is configured to analyze a difference value between the voice parameter of the call object and the AI voice parameter in real time;
a comparison module 44, configured to compare the difference value with a preset threshold;
and an adjusting module 45, configured to adjust the voice parameter if the difference value exceeds the preset threshold.
The comparison module 46 is configured to obtain a first result by comparing the voice parameter of the current call object with the current AI voice parameter, and a second result by comparing the voice parameter of the call object before the voice parameter is adjusted with that of the call object after the adjustment.
A first determining module 47, configured to determine a current parameter adjustment direction according to the first result and the second result;
and a second determining module 48, configured to determine a next-round parameter adjustment scheme according to the current-round parameter adjustment direction. The adjustment scheme comprises no adjustment, continuing adjustment in the same direction as the current round of parameter adjustment, or adjusting in the direction opposite to the current round of parameter adjustment.
And the inserting module 49 is configured to insert a common spoken auxiliary word (filler word) into the conversation if the speech rate of the call object is greater than a first preset rate, or the speech rate of the call object is less than a second preset rate.
In a preferred embodiment, the apparatus further comprises: the second establishing module is used for establishing a voice adjusting model;
if the difference value exceeds the preset threshold, the adjusting module 45 adjusts the voice parameter according to the voice adjusting model.
Wherein the speech parameters include: speech interval time, response speed, and speech speed.
Those skilled in the art will appreciate that the modules in the above-described embodiments of the apparatus may be distributed as described in the apparatus, and may be correspondingly modified and distributed in one or more apparatuses other than the above-described embodiments. The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.
In the following, embodiments of the electronic device of the present invention are described, which may be regarded as an implementation in physical form for the above-described embodiments of the method and apparatus of the present invention. Details described in the embodiments of the electronic device of the invention should be considered supplementary to the embodiments of the method or apparatus described above; for details which are not disclosed in embodiments of the electronic device of the invention, reference may be made to the above-described embodiments of the method or the apparatus.
Fig. 5 is a block diagram of an exemplary embodiment of an electronic device according to the present invention. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 5, the electronic device 500 of the exemplary embodiment is represented in the form of a general-purpose data processing device. The components of the electronic device 500 may include, but are not limited to: at least one processing unit 510, at least one memory unit 520, a bus 530 connecting different electronic device components (including the memory unit 520 and the processing unit 510), a display unit 540, and the like.
The storage unit 520 stores a computer readable program, which may be a code of a source program or a read-only program. The program may be executed by the processing unit 510 such that the processing unit 510 performs the steps of various embodiments of the present invention. For example, the processing unit 510 may perform the steps as shown in fig. 1.
The memory unit 520 may include a readable medium in the form of a volatile memory unit, such as a random access memory unit (RAM) 5201 and/or a cache memory unit 5202, and may further include a read-only memory unit (ROM) 5203. The memory unit 520 may also include a program/utility 5204 having a set (at least one) of program modules 5205, such program modules 5205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data; each of these, or some combination thereof, may include an implementation of a network environment.
Bus 530 may be one or more of any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 500 may also communicate with one or more external devices 300 (e.g., keyboard, display, network device, bluetooth device, etc.), enabling a user to interact with the electronic device 500 via these external devices, and/or enabling the electronic device 500 to communicate with one or more other data processing devices (e.g., router, modem, etc.). Such communication can occur via input/output (I/O) interfaces 550, and can also occur via network adapter 560 to one or more networks, such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet. The network adapter 560 may communicate with other modules of the electronic device 500 via the bus 530. It should be appreciated that although not shown in FIG. 5, other hardware and/or software modules may be used with the electronic device 500, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
FIG. 6 is a schematic diagram of one computer-readable medium embodiment of the present invention. As shown in fig. 6, the computer program may be stored on one or more computer readable media. The computer readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. The computer program, when executed by one or more data processing devices, enables the computer-readable medium to implement the above-described method of the invention, namely: creating an AI voice parameter set matched with different AI voice styles; searching an AI voice parameter matched with the AI voice style selected by the user in the voice parameter set, playing voice according to the AI voice parameter and processing a dialog response; analyzing the difference value between the voice parameter of the call object and the AI voice parameter in real time; comparing the difference value with a preset threshold value; and if the difference value exceeds the preset threshold value, adjusting the voice parameters.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments of the present invention described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present invention can be embodied in the form of a software product, which can be stored in a computer-readable storage medium (which can be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions to make a data processing device (which can be a personal computer, a server, or a network device, etc.) execute the above-mentioned method according to the present invention.
The computer readable storage medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
In summary, the present invention can be implemented as a method, an apparatus, an electronic device, or a computer-readable medium executing a computer program. Some or all of the functions of the present invention may be implemented in practice using a general purpose data processing device such as a microprocessor or a Digital Signal Processor (DSP).
While the foregoing embodiments have described the objects, aspects and advantages of the present invention in further detail, it should be understood that the present invention is not inherently related to any particular computer, virtual machine or electronic device, and various general-purpose machines may be used to implement the present invention. The invention is not limited to the specific embodiments described above; rather, all modifications, changes and equivalents that come within the spirit and scope of the invention are intended to be covered.

Claims (9)

1. An AI speech rate adjustment method, the method comprising:
creating an AI voice parameter set matched with different AI voice styles;
searching an AI voice parameter matched with the AI voice style selected by the user in the voice parameter set, playing voice according to the AI voice parameter and processing a dialog response;
analyzing the difference value between the voice parameter of the call object and the AI voice parameter in real time;
comparing the difference value with a preset threshold value;
and if the difference value exceeds the preset threshold value, adjusting the voice parameters.
2. The method of claim 1, further comprising: and establishing a voice adjusting model, and adjusting the voice parameters according to the voice adjusting model if the difference value exceeds the preset threshold value.
3. The method according to any one of claims 1-2, further comprising:
and comparing the voice parameter of the current call object with the current AI voice parameter to obtain a first result, and comparing the voice parameter of the call object before the voice parameter is adjusted with the voice parameter of the call object after the voice parameter is adjusted to obtain a second result.
Determining the parameter adjustment direction of the current round according to the first result and the second result;
and determining a next-round parameter adjustment scheme according to the current-round parameter adjustment direction.
4. The method according to any of claims 1-3, wherein a common spoken auxiliary word (filler word) is inserted into the conversation if the speech rate of the call object is greater than a first preset rate or if the speech rate of the call object is less than a second preset rate.
5. The method according to any of claims 1-4, wherein the speech parameters comprise: speech interval time, response speed, and speech speed.
6. The method according to any one of claims 1-5, wherein the adjustment scheme comprises no adjustment, continuing adjustment in the same direction as the current round of parameter adjustment, or continuing adjustment in the opposite direction to the current round of parameter adjustment.
7. An AI speech rate adjustment apparatus, comprising:
the first establishing module is used for establishing an AI voice parameter set matched with different AI voice styles;
the searching and playing module is used for searching the AI voice parameters matched with the AI voice style selected by the user in the voice parameter set, playing the voice according to the AI voice parameters and processing the dialog response;
the analysis module is used for analyzing the difference value between the voice parameter of the call object and the AI voice parameter in real time;
the comparison module is used for comparing the difference value with a preset threshold value;
and the adjusting module is used for adjusting the voice parameters if the difference value exceeds the preset threshold value.
8. An electronic device, comprising:
a processor; and
a memory storing computer-executable instructions that, when executed, cause the processor to perform the method of any of claims 1-6.
9. A computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the method of any of claims 1-6.
CN201910939380.8A 2019-09-30 2019-09-30 AI voice rate adjusting method and device and electronic equipment Active CN110619888B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910939380.8A CN110619888B (en) 2019-09-30 2019-09-30 AI voice rate adjusting method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910939380.8A CN110619888B (en) 2019-09-30 2019-09-30 AI voice rate adjusting method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN110619888A true CN110619888A (en) 2019-12-27
CN110619888B CN110619888B (en) 2023-06-27

Family

ID=68924953

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910939380.8A Active CN110619888B (en) 2019-09-30 2019-09-30 AI voice rate adjusting method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN110619888B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106572067A (en) * 2015-10-12 2017-04-19 阿里巴巴集团控股有限公司 Voice flow transmission method and system
KR20190084202A (en) * 2017-12-18 2019-07-16 네이버 주식회사 Method and system for controlling artificial intelligence device using plurality wake up word
CN109582275A (en) * 2018-12-03 2019-04-05 珠海格力电器股份有限公司 Voice regulation method, device, storage medium and electronic device
CN109767792A (en) * 2019-03-18 2019-05-17 百度国际科技(深圳)有限公司 Sound end detecting method, device, terminal and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113763921A (en) * 2020-07-24 2021-12-07 北京沃东天骏信息技术有限公司 Method and apparatus for correcting text
CN112599151A (en) * 2020-12-07 2021-04-02 携程旅游信息技术(上海)有限公司 Speech rate evaluation method, system, device and storage medium
CN112599151B (en) * 2020-12-07 2023-07-21 携程旅游信息技术(上海)有限公司 Language speed evaluation method, system, equipment and storage medium
CN112599148A (en) * 2020-12-31 2021-04-02 北京声智科技有限公司 Voice recognition method and device

Also Published As

Publication number Publication date
CN110619888B (en) 2023-06-27

Similar Documents

Publication Publication Date Title
US20180322865A1 (en) Artificial intelligence-based acoustic model training method and apparatus, device and storage medium
JP2019102063A (en) Method and apparatus for controlling page
CN110619888A (en) AI voice rate adjusting method and device and electronic equipment
KR20190024762A (en) Music Recommendation Method, Apparatus, Device and Storage Media
US20170371863A1 (en) Intention inference system and intention inference method
JP2021168139A (en) Method, device, apparatus and medium for man-machine interactions
KR20200056261A (en) Electronic apparatus and method for controlling thereof
US20190164555A1 (en) Apparatus, method, and non-transitory computer readable storage medium thereof for generatiing control instructions based on text
JP6884947B2 (en) Dialogue system and computer programs for it
KR102615154B1 (en) Electronic apparatus and method for controlling thereof
CN107003825A (en) System and method with dynamic character are instructed by natural language output control film
CN112309365A (en) Training method and device of speech synthesis model, storage medium and electronic equipment
CN110503944B (en) Method and device for training and using voice awakening model
CN111193834A (en) Man-machine interaction method and device based on user sound characteristic analysis and electronic equipment
WO2020052061A1 (en) Method and device for processing information
CN109460548B (en) Intelligent robot-oriented story data processing method and system
CN111128120B (en) Text-to-speech method and device
CN115731915A (en) Active dialogue method and device for dialogue robot, electronic device and storage medium
JP2022088586A (en) Voice recognition method, voice recognition device, electronic apparatus, storage medium computer program product and computer program
CN111966803B (en) Dialogue simulation method and device, storage medium and electronic equipment
CN112002325B (en) Multi-language voice interaction method and device
CN112532794B (en) Voice outbound method, system, equipment and storage medium
CN112017668B (en) Intelligent voice conversation method, device and system based on real-time emotion detection
CN113851106A (en) Audio playing method and device, electronic equipment and readable storage medium
CN112037780A (en) Semantic recognition method and device for intelligent voice robot and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant