CN109582271B - Method, device and equipment for dynamically setting TTS (text to speech) playing parameters - Google Patents


Publication number
CN109582271B
Authority
CN
China
Prior art keywords
parameters
playing
bit
attribute
scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811261770.6A
Other languages
Chinese (zh)
Other versions
CN109582271A (en
Inventor
戴帅湘
袁志伟
李龙飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Gansu Longdian Yunchuang Technology Consulting Co.,Ltd.
Original Assignee
Beijing Moran Cognitive Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Moran Cognitive Technology Co Ltd filed Critical Beijing Moran Cognitive Technology Co Ltd
Priority to CN201811261770.6A priority Critical patent/CN109582271B/en
Publication of CN109582271A publication Critical patent/CN109582271A/en
Application granted granted Critical
Publication of CN109582271B publication Critical patent/CN109582271B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/162Interface to dedicated audio devices, e.g. audio drivers, interface to CODECs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/167Audio in a user interface, e.g. using voice commands for navigating, audio feedback

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The embodiments of the invention disclose a method, a device and equipment for dynamically setting TTS (text to speech) playing parameters. The method extracts parameter information, acquired from external systems, about the playing object, the environment and the user, assigns different weights to these parameters, and uses them to dynamically set the playing acceleration flag bit, the volume flag bit and the tone flag bit, thereby realizing intelligent TTS playing.

Description

Method, device and equipment for dynamically setting TTS (text to speech) playing parameters
Technical Field
The embodiment of the invention relates to the field of artificial intelligence, in particular to TTS playing setting.
Background
TTS (Text To Speech) converts text into speech and is an important human-machine interaction technology in artificial intelligence. Improvements to TTS technology typically involve changing the TTS playback speed, for example controlling the playback speed according to how urgent the broadcast object is. However, TTS faces complex environments, and different users have different broadcasting requirements. How to intelligently perceive and analyze the environment and these requirements, so that the broadcast characteristics adapt accordingly, has become a new problem.
Disclosure of Invention
The embodiment of the invention provides a method for dynamically setting TTS playing parameters, which comprises the following steps: setting an acceleration mark bit in a playing object, wherein the acceleration mark bit is used for controlling the playing speed of TTS; the acceleration flag bit comprises an attribute bit and a numerical bit; the acceleration mark bit is automatically generated according to the scene and is dynamically adjustable; the scene is defined by a group of parameters with different weights, and the parameters comprise at least one of attribute parameters, environment parameters, user requirement parameters and user state parameters of the playing object; the parameters are obtained automatically.
The value of the attribute bit represents the increase or decrease of the playing speed relative to the reference speed.
The attribute bit takes the value 0 or 1: a value of 1 indicates that the playing rate is increased relative to the reference rate, and a value of 0 indicates that it is reduced relative to the reference rate.
The value of the numerical bit indicates the degree to which the playback rate is increased or decreased relative to the reference rate.
The scene is defined by a set of parameters with different weights; the weights are set by the user or generated automatically by the system.
The attribute of the playing object comprises at least one of the length, the size and the content attribute of the playing object.
The content attribute of the playing object comprises at least one of an emergency degree index, an aging characteristic index and a function attribute index of the playing object.
The emergency degree index of the playing object represents whether the playing object is an emergency message or not, the aging characteristic index of the playing object refers to the validity period of the playing object, the function attribute index represents the function of the playing object, and the function comprises at least one of reminding, warning and entertainment.
The environmental parameter includes at least one of a speed index, a noise index, and a location index.
The user state parameter comprises at least one of an emotion index, a fatigue index and a health index.
The parameters and indexes are dynamically sensed and obtained from an external system.
The scenario is updated in real time according to the relevant parameters and indicators.
The scene is periodically updated and remains unchanged for a certain period of time.
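The two update modes above (real-time, or periodic with the scene held unchanged in between) can be sketched as a small cache. This is only an illustration under stated assumptions: the class name `SceneCache`, the `period` parameter and the injected `now` timestamp are hypothetical and do not come from the patent.

```python
class SceneCache:
    """Holds the current scene and refreshes it at most once per `period` seconds.

    A period of 0 models real-time updating; a positive period models the
    periodic mode in which the scene stays unchanged between refreshes.
    All names here are illustrative assumptions.
    """

    def __init__(self, period: float):
        self.period = period
        self.scene = None
        self.updated_at = None

    def get(self, now: float, compute):
        """Return the cached scene, recomputing it once the hold period has elapsed."""
        if self.updated_at is None or now - self.updated_at >= self.period:
            self.scene = compute()
            self.updated_at = now
        return self.scene
```

Passing the current time in as an argument, rather than reading a clock inside the class, keeps the sketch deterministic and easy to test.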
A volume flag bit is also set in the playing object and is used to control the playing volume of the TTS; the volume flag bit is automatically generated according to the scene and is dynamically adjustable; the scene is defined by a group of parameters with different weights, and the parameters comprise at least one of attribute parameters, environment parameters and user state parameters of the playing object; the parameters are obtained automatically.
A tone flag bit is also set in the playing object and is used to control the tone of TTS playing; the tone flag bit is automatically generated according to the scene and is dynamically adjustable; the scene is defined by a group of parameters with different weights, and the parameters comprise at least one of attribute parameters, environment parameters, user state parameters and user requirement parameters of the playing object; the parameters are obtained automatically.
The method further comprises a flag priority setting, used to set the priority of the acceleration flag bit, the volume flag bit or the tone flag bit, or to switch a flag bit on or off.
The embodiment of the invention also provides a device for dynamically setting TTS playing parameters, which comprises: a playing unit, used to call the relevant control parameters and play the TTS content according to the flag bits that have been set; a flag bit setting unit, connected to the playing unit, used to set at least one of an acceleration flag bit, a volume flag bit and a tone flag bit and to output the result to the playing unit; a scene setting unit, connected to the flag bit setting unit, used to compute the scene parameters and indexes that control generation of the flag bits; a parameter extraction unit, used to extract scene-related parameter information from the information received by the receiving unit and forward it to the scene setting unit; and a receiving unit, used to receive information from external systems in real time or periodically and forward it to the parameter extraction unit;
wherein, the acceleration mark bit is used for controlling the playing speed of the TTS; the acceleration flag bit comprises an attribute bit and a numerical bit; the acceleration mark bit is automatically generated according to the scene and is dynamically adjustable;
the scenario set by the scenario setting unit is defined by a set of parameters with different weights, and the parameters comprise at least one of attribute parameters, environment parameters, user requirement parameters and user state parameters of a playing object.
The value of the attribute bit represents the increase or decrease of the playing speed relative to the reference speed.
The attribute bit takes the value 0 or 1: a value of 1 indicates that the playing rate is increased relative to the reference rate, and a value of 0 indicates that it is reduced relative to the reference rate.
The value of the numerical bit indicates the degree to which the playback rate is increased or decreased relative to the reference rate.
The scene is defined by a set of parameters with different weights; the weights are set by the user or generated automatically by the system.
The attribute of the playing object comprises at least one of the length, the size and the content attribute of the playing object.
The content attribute of the playing object comprises at least one of an emergency degree index, an aging characteristic index and a function attribute index of the playing object.
The emergency degree index of the playing object represents whether the playing object is an emergency message or not, the aging characteristic index of the playing object refers to the validity period of the playing object, the function attribute index represents the function of the playing object, and the function comprises at least one of reminding, warning and entertainment.
The environmental parameter includes at least one of a speed index, a noise index, and a location index.
The user state parameter comprises at least one of an emotion index, a fatigue index and a health index.
The indexes are dynamically sensed and obtained by the system.
The scenario is updated in real time according to the relevant parameters and indicators.
The scene is periodically updated and remains unchanged for a certain period of time.
The volume flag bit is used to control the playing volume of the TTS; it is automatically generated according to the scene and is dynamically adjustable; the scene is defined by a group of parameters with different weights, and the parameters comprise at least one of attribute parameters, environment parameters and user requirement parameters of the playing object; the parameters are obtained automatically.
The tone flag bit is used to control the tone of TTS playing; it is automatically generated according to the scene and is dynamically adjustable; the scene is defined by a group of parameters with different weights, and the parameters comprise at least one of attribute parameters, environment parameters and user requirement parameters of the playing object; the parameters are obtained automatically.
The flag bit setting unit comprises a flag priority setting module, used to set the priority of the acceleration flag bit, the volume flag bit or the tone flag bit, or to switch a flag bit on or off.
The invention also discloses an intelligent terminal which is characterized by comprising the device for dynamically setting the TTS playing parameters.
The invention also discloses a computer device, which is characterized by comprising a processor and a memory, wherein the processor is used for executing the instructions stored in the memory, and the instructions are used for executing the method.
The invention also discloses a computer readable storage medium which is characterized by storing instructions for executing the method.
Drawings
FIG. 1 illustrates a method for dynamically setting TTS playing parameters according to the present invention;
FIG. 2 is a block diagram of the apparatus for dynamically setting TTS playing parameters according to the present invention;
FIG. 3 is a block diagram of a flag bit setting unit in the apparatus for dynamically setting TTS playing parameters according to the present invention;
FIG. 4 is a block diagram of a scene setting unit in the apparatus for dynamically setting TTS playing parameters of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Example one
Referring to fig. 1, the present invention discloses a method for dynamically setting TTS playing parameters, which controls the playing speed, volume and tone attribute by setting a flag bit that dynamically changes with the external environment, playing object and playing audience state, so as to implement intelligent playing.
In the invention, after the TTS playing object is read, an acceleration flag bit is added to the playing object, and the playing speed is controlled according to it. A reference rate is preset; it is the playing rate used when no acceleration flag bit is present. For example, the reference rate is approximately 250 Chinese characters/minute. The acceleration flag bit is 4 bits long. The 1st bit is the attribute bit and takes the value 0 or 1, where 0 means that the current playing rate is reduced relative to the reference rate and 1 means that it is increased relative to the reference rate. Bits 2 to 4 are the numerical bits and indicate how far the playing rate departs from the reference rate. Table 1 below shows the relationship between the value of the acceleration flag bit and the playing speed.
Table 1: relation between acceleration mark bit value and playing speed
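[Table 1 appears only as images in the original publication (figures BDA0001844026950000041 and BDA0001844026950000051); its contents are not available in this text.]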
The above correspondence only illustrates how the relationship between the acceleration flag bit and the playing speed can be set, and should not be regarded as the only option. Users can set other modes or values according to their own needs or preferences. For example, the step size may be set to 10 words/min when the rate is reduced and to 20 words/min when the rate is increased. The upper and lower rate limits should match the user's habits and stay within what the user can accept; for example, the upper limit of the speech rate should ensure that the user can still hear the content clearly.
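The 4-bit layout described above (one attribute bit followed by three numerical bits) can be decoded as in the following minimal sketch. Because Table 1 is not available in this text, the linear mapping is an assumption: it uses the reference rate of 250 characters/minute and the example step sizes of 10 and 20 words/min mentioned in the description. The function and constant names are hypothetical.

```python
BASE_RATE = 250   # reference rate from the description, characters per minute
STEP_DOWN = 10    # example step when slowing down (from the description)
STEP_UP = 20      # example step when speeding up (from the description)


def decode_acceleration_flag(flag: str, base_rate: int = BASE_RATE) -> int:
    """Return the playing rate encoded by a 4-bit acceleration flag such as '1011'.

    Assumption: the rate changes linearly with the numerical bits; the real
    Table 1 of the patent may define a different mapping.
    """
    if len(flag) != 4 or any(c not in "01" for c in flag):
        raise ValueError(f"expected 4 binary digits, got {flag!r}")
    faster = flag[0] == "1"        # bit 1: attribute bit (1 = faster, 0 = slower)
    magnitude = int(flag[1:], 2)   # bits 2-4: numerical bits, 0..7
    if faster:
        return base_rate + magnitude * STEP_UP
    return base_rate - magnitude * STEP_DOWN
```

Under these assumptions both `0000` and `1000` decode to the reference rate, `1111` decodes to 390 characters/minute and `0111` to 180 characters/minute.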
The acceleration flag bit is automatically generated according to the scene and is dynamically adjustable; the scene is defined by a group of parameters with different weights, and the parameters comprise at least one of attribute parameters, environment parameters, user requirement parameters and user state parameters of the playing object; the parameters are obtained automatically. The attribute parameters of the playing object include its content attribute. The content attribute includes an urgency index of the playing object: for example, in an in-vehicle playing system, when a message about an accident close ahead is received through an interface with the navigation software, the playing object is extremely urgent, and when this is the only parameter present the corresponding acceleration flag bit is set to 1111. The content attribute further includes a function attribute index of the playing object, for example whether the playing object is current news, entertainment, work content and so on, each corresponding to a different acceleration flag bit value: the acceleration flag bit for current news may be set to 1110, and for work content to 1000 or 0000, that is, reference-rate playing.
The environment parameters may include, for example, an in-vehicle environment, a night environment, a private environment and so on; for example, the acceleration flag bit in a private environment may be set to 0001.
The user requirement parameter is set according to user preference. For example, a user who usually speaks quickly may also prefer a faster playing speed, so the overall playing speed can be raised, for example by increasing the step size to 40 words/minute or by setting a faster reference rate such as 290 words/minute.
The user status parameters are obtained according to the status of the user himself. For example: judging the fatigue degree of the user by detecting the blinking frequency of the user; the health state or emotional state of the user is judged by remotely monitoring indexes such as blood pressure and heart rate of the user.
When several parameters jointly affect the setting of the acceleration flag bit, they may conflict; for example, the attribute parameter of the playing object calls for a higher playing speed while the environment parameter calls for a lower one. To avoid such conflicts and obtain a more intelligent result, different parameters may be given different priorities. For example, if the attribute parameter of the playing object has first priority and the environment parameter has second priority, then in the situation above the playing speed is increased, that is, the attribute bit of the acceleration flag bit is set to 1. Alternatively, different parameters may be given different weights. For example, if the weight of the playing object attribute is 8 and the weight of the environment parameter is 2, then when the playing object attribute is urgent and the environment parameter is night, the second-highest playing speed may be set, which is more intelligent and user-friendly than setting the highest playing speed from the playing object attribute alone.
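The two conflict-resolution strategies described above, strict priority and weighting, can be compared in a short sketch. The weights 8 and 2 come from the description; everything else is an assumption made for illustration: the signed speed level from -7 to +7, the weighted-average formula, the level of -3 proposed for the night environment, and all names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Proposal:
    source: str    # which parameter made the proposal
    level: int     # signed speed level, -7 (slowest) .. +7 (fastest), 0 = reference
    priority: int  # 1 = highest priority
    weight: int    # relative weight, e.g. 8 or 2 as in the description


def resolve_by_priority(proposals: list[Proposal]) -> int:
    """Strict priority: the highest-priority proposal wins outright."""
    return min(proposals, key=lambda p: p.priority).level


def resolve_by_weight(proposals: list[Proposal]) -> int:
    """Weighted average of all proposals, rounded to the nearest level (an assumed formula)."""
    total = sum(p.weight for p in proposals)
    return round(sum(p.level * p.weight for p in proposals) / total)


def encode_level(level: int) -> str:
    """Encode a signed level as one attribute bit plus three numerical bits."""
    level = max(-7, min(7, level))
    return ("1" if level >= 0 else "0") + format(abs(level), "03b")


proposals = [
    Proposal("playing object (urgent)", level=7, priority=1, weight=8),
    Proposal("environment (night)", level=-3, priority=2, weight=2),
]
by_priority = encode_level(resolve_by_priority(proposals))  # "1111"
by_weight = encode_level(resolve_by_weight(proposals))      # "1101"
```

With these illustrative numbers, strict priority yields the maximum speed, while weighting yields a level that is still fast but below the maximum, which matches the intent of the weighted scheme in the description.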
Example two
Referring to fig. 1, the present invention discloses a method for dynamically setting TTS playing parameters, which controls the playing speed, volume and timbre attributes by setting a flag bit that dynamically changes with the external environment, the playing object, the playing audience demand and the playing audience state change, so as to implement intelligent playing.
Besides the playing speed, the playing volume is also an important parameter affecting the listener's perception, so a volume flag bit is needed to adjust the volume intelligently. The playing volume is adjusted by setting a volume flag bit in the playing object. For example, the volume flag bit may be 4 bits long: the 1st bit indicates whether the volume is increased or decreased relative to the reference volume, and bits 2 to 4 indicate the degree of the increase or decrease. The volume flag bit is also adjusted automatically according to parameter information that the playing system receives through its interfaces with external systems. This information may include, for example, accident information received from a car navigation system, location information received from a positioning system, and user state attributes received from a biometric system, such as the user's emotion, health indexes such as blood pressure, and drowsiness. For example, if accident information is received from the vehicle-mounted system, its attribute parameter is urgent and the volume flag bit is set to its maximum value, so that the message is broadcast at maximum volume to warn the driver. If the location received from the positioning system is the user's home, there is no risk of disturbing other people and the message can be broadcast at a higher volume. If the location received from the positioning system is the user's home and the biometric system detects from the user's blinking that the user is about to fall asleep, the volume flag bit is set to its minimum value and the message is broadcast at the softest volume.
If the location received from the positioning system is an in-vehicle environment and the biometric system detects from the user's blinking that the user is about to fall asleep, the volume flag bit is set to its maximum value and the message is broadcast at the loudest volume as a warning.
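The volume scenarios above can be written as a small rule table. The sketch below is an assumption-laden illustration: the description does not say which value of the first bit means "increase", so the sketch reuses the acceleration flag convention (1 = increase), and the level chosen for "a higher volume" at home is arbitrary. The function name and argument names are hypothetical.

```python
def volume_flag(urgent: bool, location: str, drowsy: bool) -> str:
    """Pick a 4-bit volume flag for the scenarios in the description.

    Assumed encoding: first bit 1 = louder than the reference volume,
    0 = quieter; the remaining three bits give the degree (0..7).
    """
    if urgent:
        return "1111"  # accident message: maximum volume to warn the driver
    if drowsy and location == "vehicle":
        return "1111"  # drowsy driver: loudest volume as a warning
    if drowsy and location == "home":
        return "0111"  # about to sleep at home: softest volume
    if location == "home":
        return "1100"  # nobody to disturb: higher volume (level is an assumption)
    return "0000"      # otherwise keep the reference volume
```

The order of the rules matters: urgency is checked first, so an urgent message stays at maximum volume even when the user is drowsy at home.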
Similarly, the parameters that affect the volume may conflict; for example, some parameters call for a higher volume and others for a lower one. Setting priorities for the different parameters solves this problem. For example, if the attribute parameter of the playing object is given the highest priority, the volume is turned up to the highest level when the attribute of the playing object is urgent. Alternatively, different parameters may be given different weights; for example, if the weight of the attribute parameter of the playing object is 8 and the weight of the user state parameter is 2, then when the attribute is urgent and the user is drowsy, the message is broadcast at a volume below the maximum so that the user is not startled.
Example three
Referring to fig. 1, the present invention discloses a method for dynamically setting TTS playing parameters, which controls the playing speed, volume and tone attribute by setting a flag bit that dynamically changes with the external environment, playing object and playing audience state, so as to implement intelligent playing.
The tone (timbre) of a TTS broadcast affects user experience; a comfortable tone improves the experience and in turn user stickiness. Therefore, the playing setting system of the invention is also provided with a tone flag bit for adjusting the played tone. The width of the tone flag bit depends on the number of tones recorded in the system; for example, if 8 tones are recorded in advance, the tone flag bit is 3 bits long and each value corresponds to one tone. The played tone is adjusted by setting the tone flag bit in the playing object. The tone flag bit is also adjusted automatically according to parameter information that the playing system receives through its interfaces with external systems, which may include, for example, attributes of the playing object, source attributes carried by information obtained from different APPs, and user state attributes received from the biometric system, such as the user's emotion, health indexes such as blood pressure, and drowsiness. For example, if accident information is received from the vehicle-mounted system, its attribute parameter is urgent and the tone is set to an objective, calm male voice; if the playing object comes from a family chat in WeChat, the tone can be set to a soft female voice.
Also, conflicts may arise between various attributes or parameters affecting the tone mark bits, which may be coordinated by setting different priorities or different weights.
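The sizing rule for the tone flag bit (8 recorded tones need 3 bits) generalizes to the smallest bit width that can index every recorded tone. The sketch below shows that calculation together with a toy selection rule based on the two examples in the description. The function names, the voice indices and the fallback voice are assumptions for illustration.

```python
def tone_flag_width(voice_count: int) -> int:
    """Smallest number of bits that can index `voice_count` recorded tones."""
    if voice_count < 1:
        raise ValueError("at least one tone is required")
    return max(1, (voice_count - 1).bit_length())


def encode_tone(index: int, voice_count: int) -> str:
    """Encode a tone index as a fixed-width binary tone flag."""
    if not 0 <= index < voice_count:
        raise ValueError(f"tone index {index} out of range")
    return format(index, f"0{tone_flag_width(voice_count)}b")


def select_tone(urgent: bool, source: str) -> int:
    """Toy selection rule; the indices are hypothetical."""
    if urgent:
        return 0   # objective, calm male voice for accident messages
    if source == "family chat":
        return 1   # soft female voice for family chat messages
    return 2       # assumed default voice for everything else
```

For 8 recorded tones, `encode_tone(select_tone(False, "family chat"), 8)` gives `001`.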
The playing acceleration flag bit, the volume flag bit and the tone flag bit can be determined and assigned based on a series of same parameters, indexes and same weights and priorities, that is, the same index can influence the setting of the three flag bits; different respective parameters, metrics, and different priority and weight information may also be selected for determination and assignment.
Example four
Referring to FIG. 2, the present invention discloses a device for dynamically setting TTS playing parameters, which includes: a playing unit, used to call the relevant control parameters and play the TTS content according to the flag bits that have been set; a flag bit setting unit, connected to the playing unit, used to set at least one of an acceleration flag bit, a volume flag bit and a tone flag bit and to output the result to the playing unit; a scene setting unit, connected to the flag bit setting unit, used to compute the scene parameters and indexes that control generation of the flag bits; a parameter extraction unit, used to extract scene-related parameter information from the information received by the receiving unit and forward it to the scene setting unit; and a receiving unit, used to receive information from external systems in real time or periodically and forward it to the parameter extraction unit;
the flag bit setting unit comprises an acceleration flag bit setting module, and the acceleration flag bit is used for controlling the playing speed of the TTS; the acceleration flag bit comprises an attribute bit and a numerical bit; the acceleration mark bit is automatically generated according to the scene and is dynamically adjustable;
preferably, the flag bit setting unit may further include a volume flag bit setting module, where the volume flag bit is used to control the playing volume of the TTS, and the volume is automatically generated according to the scene and is dynamically adjustable; the scene is defined by a group of parameters with different weights, and the parameters comprise at least one of attribute parameters, environment parameters, user requirement parameters and user state parameters of the playing object; the parameters are obtained automatically.
Preferably, the flag bit setting unit may further include a tone flag bit setting module, where the tone flag bit is used to control the tone of TTS playing; the tone flag bit is automatically generated according to the scene and is dynamically adjustable; the scene is defined by a group of parameters with different weights, and the parameters comprise at least one of attribute parameters, environment parameters, user requirement parameters and user state parameters of the playing object; the parameters are obtained automatically.
As shown in FIG. 3, a flag priority setting module is provided in the flag bit setting unit and is used to set the priorities of the acceleration flag bit, the volume flag bit and the tone flag bit, or to switch the corresponding flag bit on or off.
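The data flow between the units of this embodiment (receiving, parameter extraction, flag bit setting with per-flag on/off switches, playing) can be sketched as plain functions. This is a deliberately reduced illustration, not the device itself: the scene setting unit is collapsed into the extracted dictionary, the flag values are placeholders, and every name and field is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FlagSettings:
    """Output of the flag bit setting unit; None means the flag is switched off."""
    acceleration: str | None = None
    volume: str | None = None
    tone: str | None = None


def extract_parameters(message: dict) -> dict:
    """Parameter extraction unit: keep only the scene-related fields."""
    wanted = ("urgent", "location", "drowsy")
    return {key: message[key] for key in wanted if key in message}


def set_flags(scene: dict, enabled: dict[str, bool]) -> FlagSettings:
    """Flag bit setting unit with a simple on/off switch per flag."""
    flags = FlagSettings()
    urgent = bool(scene.get("urgent"))
    if enabled.get("acceleration", True):
        flags.acceleration = "1111" if urgent else "0000"
    if enabled.get("volume", True):
        flags.volume = "1111" if urgent else "0000"
    if enabled.get("tone", True):
        flags.tone = "000"
    return flags


def play(text: str, flags: FlagSettings) -> str:
    """Playing unit stand-in: describe the playback instead of producing audio."""
    return f"{text!r} speed={flags.acceleration} volume={flags.volume} tone={flags.tone}"


# A message as the receiving unit might hand it over (fields are assumptions).
message = {"urgent": True, "location": "vehicle", "source_app": "navigation"}
scene = extract_parameters(message)
flags = set_flags(scene, enabled={"tone": False})
summary = play("Accident ahead", flags)
```

Switching a flag off leaves its field as `None`, so the playing unit can fall back to its reference setting for that parameter.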
Example five
Referring to FIG. 4, the scene set by the scene setting unit is defined by a set of parameters with different weights, where the parameters include at least one of an attribute parameter of the playing object, an environment parameter, a user requirement parameter and a user state parameter. The scene setting unit comprises a playing object attribute parameter setting module, an environment parameter setting module, a user requirement parameter setting module and a user state parameter setting module. The scene setting unit is connected to the parameter extraction unit, which in turn is connected to the receiving unit; the parameter extraction unit extracts the scene-related parameter information from the information received by the receiving unit and forwards it to the scene setting unit. The receiving unit receives information from external systems in real time or periodically and forwards it to the parameter extraction unit. It is connected to different external systems through different interfaces, including but not limited to navigation systems, biometric sensing systems and various APPs, and its set of interfaces can be extended to support more external systems. The scene setting unit is also provided with a parameter weight setting module, which assigns different weights to the different parameter setting modules; the weights can be preset, set automatically, or set from user input obtained through an external interface. The scene setting unit finally defines the scene from the weighted parameters set by the different modules. The scene setting unit also comprises a checking module, which checks each parameter to ensure that a correct result is output.
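The weight setting module and the checking module of the scene setting unit can be sketched as follows. The weights 8 and 2 echo the example in the first embodiment; the other default weights, the signed value range of -7 to +7, and all names are assumptions made for illustration.

```python
from __future__ import annotations

PARAMETER_GROUPS = ("object", "environment", "user_requirement", "user_state")

# Preset weights; 8 and 2 follow the description, the others are assumed.
DEFAULT_WEIGHTS = {"object": 8, "environment": 2, "user_requirement": 1, "user_state": 1}


def check_parameters(params: dict[str, int]) -> None:
    """Checking module: reject unknown groups and out-of-range values."""
    for name, value in params.items():
        if name not in PARAMETER_GROUPS:
            raise ValueError(f"unknown parameter group: {name}")
        if not -7 <= value <= 7:
            raise ValueError(f"{name} out of range: {value}")


def define_scene(
    params: dict[str, int],
    user_weights: dict[str, int] | None = None,
) -> dict[str, tuple[int, int]]:
    """Attach a weight to each parameter; user input overrides the presets."""
    check_parameters(params)
    weights = {**DEFAULT_WEIGHTS, **(user_weights or {})}
    return {name: (value, weights[name]) for name, value in params.items()}
```

Calling `define_scene({"object": 7, "environment": -3})` returns each value paired with its preset weight, and passing `user_weights={"environment": 5}` models a weight obtained from user input.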
Example six
The play setting system defined in the fourth and fifth embodiments may be integrated into different hardware or operating systems such as a mobile terminal, a vehicle-mounted system, a player, a computer, and the like.
Example seven
The invention also covers a computer program medium storing computer instructions that implement the method for dynamically setting TTS playing parameters of the invention.
The invention also covers a computer device comprising a memory and a processor. The processor can call the instructions stored in the memory and, by executing them, carry out the method for dynamically setting TTS playing parameters.

Claims (23)

1. A method for dynamically setting TTS playing parameters, the method comprising: setting an acceleration flag bit in a playing object, wherein the acceleration flag bit is used for controlling the playing rate of TTS; presetting a reference rate, wherein the reference rate is the playing rate used when no acceleration flag bit is present; the acceleration flag bit comprises an attribute bit and a numerical bit, and the value of the attribute bit indicates whether the playing rate is increased or decreased relative to the reference rate; the value of the attribute bit is 0 or 1, wherein a value of 1 indicates that the playing rate is increased relative to the reference rate, and a value of 0 indicates that the playing rate is decreased relative to the reference rate; the value of the numerical bit indicates the degree of the increase or decrease of the playing rate relative to the reference rate;
the acceleration flag bit is automatically generated according to a scene and is dynamically adjustable; the scene is defined by a group of parameters with different weights, the parameters comprising attribute parameters of the playing object, environment parameters and user requirement parameters; the parameters are obtained automatically; the weights are set by a user, and the scene is updated in real time according to the related parameters and indexes;
the method further comprises a flag priority setting, used for setting the priority of the acceleration flag bit or for turning the function of the flag bit on or off.
2. The method of claim 1, wherein the attributes of the playing object comprise at least one of a length, a size, and a content attribute of the playing object.
3. The method according to claim 2, wherein the content attribute of the playing object comprises at least one of an urgency index, an aging characteristic index and a function attribute index of the playing object.
4. The method according to claim 3, wherein the urgency index of the playing object indicates whether the playing object is an emergency message, the aging characteristic index of the playing object indicates the validity period of the playing object, and the function attribute index indicates the function of the playing object, the function comprising at least one of reminding, warning and entertainment.
5. The method of claim 1, wherein the environment parameters comprise at least one of a speed index, a noise index and a location index.
6. The method of claim 1, wherein the parameters further comprise user state parameters comprising at least one of an emotion index, a fatigue index and a health index.
7. The method according to any one of claims 3 to 6, wherein the parameters and indexes are dynamically sensed and obtained from an external system.
8. The method of claim 1, further comprising a volume flag bit for controlling the playing volume of the TTS, wherein the volume is automatically generated according to the scene and is dynamically adjustable; the scene is defined by a group of parameters with different weights, and the parameters comprise at least one of attribute parameters, environment parameters and user state parameters of the playing object; the parameters are obtained automatically.
9. The method of claim 1, further comprising a tone flag bit for controlling the tone of TTS playing, wherein the tone is automatically generated according to the scene and is dynamically adjustable; the scene is defined by a group of parameters with different weights, and the parameters comprise at least one of attribute parameters, environment parameters, user state parameters and user requirement parameters of the playing object; the parameters are obtained automatically.
10. The method of claim 1, further comprising a flag priority setting for setting the priority of the volume flag bit or the tone flag bit, or for turning the function of the flag bit on or off.
11. An apparatus for dynamically setting TTS playing parameters, the apparatus comprising: a playing unit, used for calling the related control parameters to play the TTS content according to the set flag bit; a flag bit setting unit, connected with the playing unit and used for setting at least one of an acceleration flag bit, a volume flag bit and a tone flag bit, controlling the turning on and off of the flag bit, and outputting the setting result to the playing unit; a scene setting unit, connected with the flag bit setting unit and used for calculating scene parameters and indexes to control the generation of the flag bits; a parameter extraction unit, used for extracting the scene-related parameter information from the information received by a receiving unit and forwarding the parameter information to the scene setting unit; the receiving unit, used for receiving information from an external system in real time or periodically and forwarding the information to the parameter extraction unit;
a reference rate is preset, the reference rate being the playing rate used when no acceleration flag bit is present; the flag bit setting unit is used for setting an acceleration flag bit in a playing object, and the acceleration flag bit is used for controlling the playing rate of TTS; the acceleration flag bit comprises an attribute bit and a numerical bit; the value of the attribute bit indicates whether the playing rate is increased or decreased relative to the reference rate; the value of the attribute bit is 0 or 1, wherein a value of 1 indicates that the playing rate is increased relative to the reference rate, and a value of 0 indicates that the playing rate is decreased relative to the reference rate; the value of the numerical bit indicates the degree of the increase or decrease of the playing rate relative to the reference rate; the acceleration flag bit is automatically generated according to the scene and is dynamically adjustable; the scene is updated in real time according to the related parameters and indexes;
the scene set by the scene setting unit is defined by a group of parameters with different weights, the weights are set by a user, and the parameters comprise attribute parameters of the playing object, environment parameters and user requirement parameters; the apparatus further comprises a flag priority setting, used for setting the priority of the acceleration flag bit or for turning the function of the flag bit on or off.
12. The apparatus of claim 11, wherein the attribute of the playing object comprises at least one of a length, a size, and a content attribute of the playing object.
13. The apparatus according to claim 12, wherein the content attribute of the playback object comprises at least one of an urgency index, an aging characteristic index, and a function attribute index of the playback object.
14. The apparatus according to claim 13, wherein the urgency index of the playing object indicates whether the playing object is an emergency message, the aging characteristic index of the playing object indicates the validity period of the playing object, and the function attribute index indicates the function of the playing object, the function comprising at least one of reminding, warning and entertainment.
15. The apparatus of claim 11, wherein the environment parameters comprise at least one of a speed index, a noise index and a location index.
16. The apparatus of claim 11, wherein the parameters further comprise user state parameters comprising at least one of an emotion index, a fatigue index and a health index.
17. The apparatus according to any one of claims 12 to 16, wherein the indexes are each dynamically sensed and obtained by the system.
18. The apparatus of claim 11, further comprising a volume flag bit for controlling the playing volume of the TTS, wherein the volume is automatically generated according to the scene and is dynamically adjustable; the scene is defined by a group of parameters with different weights, and the parameters comprise at least one of attribute parameters, environment parameters, user requirement parameters and user state parameters of the playing object; the parameters are obtained automatically.
19. The apparatus of claim 11, further comprising a tone flag bit for controlling the tone of TTS playing, wherein the tone is automatically generated according to the scene and is dynamically adjustable; the scene is defined by a group of parameters with different weights, and the parameters comprise at least one of attribute parameters, environment parameters, user requirement parameters and user state parameters of the playing object; the parameters are obtained automatically.
20. The apparatus of claim 11, further comprising a flag priority setting for setting a priority of a volume flag bit or a tone flag bit, or a function of turning the flag bit on or off.
21. An intelligent terminal, characterized in that it comprises the apparatus for dynamically setting TTS playing parameters according to any one of claims 11 to 20.
22. A computer device, comprising a processor and a memory, the processor configured to execute instructions stored in the memory, the instructions configured to perform the method of any one of claims 1 to 10.
23. A computer-readable storage medium having stored thereon instructions for performing the method of any one of claims 1 to 10.
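The claims describe a scene defined by weighted parameters from which the flag bit is generated, but they do not give a formula. The following is a minimal sketch of one possible reading: each index is assumed to be normalised to the range [-1, 1], the scene score is a weighted average using the user-set weights, and the sign and magnitude of the score become the attribute bit and the numerical bit. The function names, the parameter keys `urgency`, `noise` and `user_preference`, and `max_level` are hypothetical and chosen only for this illustration.

```python
# Hypothetical normalised indexes in [-1.0, 1.0]; positive values argue for
# faster playback, negative values for slower playback (an assumption).
SCENE_KEYS = ("urgency", "noise", "user_preference")


def extract_parameters(message):
    """Parameter extraction step: keep only the scene-related fields."""
    return {key: float(message[key]) for key in SCENE_KEYS if key in message}


def scene_score(params, weights):
    """Scene setting step: weighted average using the user-set weights."""
    used = [name for name in params if weights.get(name, 0.0) > 0.0]
    total = sum(weights[name] for name in used)
    if total == 0.0:
        return 0.0
    return sum(params[name] * weights[name] for name in used) / total


def generate_flag(score, max_level=5):
    """Flag bit setting step: map a score to (attribute_bit, numerical_bit)."""
    score = max(-1.0, min(1.0, score))
    attribute_bit = 1 if score >= 0 else 0
    numerical_bit = round(abs(score) * max_level)
    return attribute_bit, numerical_bit


def flag_for_message(message, weights):
    """Recompute the flag each time new information is received."""
    return generate_flag(scene_score(extract_parameters(message), weights))
```

With weights `{"urgency": 2.0, "noise": 1.0, "user_preference": 1.0}` and a message carrying `urgency=1.0`, `noise=-0.5` and `user_preference=0.0`, the weighted average is 0.375, so `flag_for_message` returns `(1, 2)`. Calling it again on each newly received message mirrors the real-time scene update in the claims.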
CN201811261770.6A 2018-10-26 2018-10-26 Method, device and equipment for dynamically setting TTS (text to speech) playing parameters Active CN109582271B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811261770.6A CN109582271B (en) 2018-10-26 2018-10-26 Method, device and equipment for dynamically setting TTS (text to speech) playing parameters

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811261770.6A CN109582271B (en) 2018-10-26 2018-10-26 Method, device and equipment for dynamically setting TTS (text to speech) playing parameters

Publications (2)

Publication Number Publication Date
CN109582271A (en) 2019-04-05
CN109582271B (en) 2020-04-03

Family

ID=65921084

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811261770.6A Active CN109582271B (en) 2018-10-26 2018-10-26 Method, device and equipment for dynamically setting TTS (text to speech) playing parameters

Country Status (1)

Country Link
CN (1) CN109582271B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110277092A (en) * 2019-06-21 2019-09-24 北京猎户星空科技有限公司 A kind of voice broadcast method, device, electronic equipment and readable storage medium storing program for executing
CN111161721B (en) * 2019-11-28 2022-12-20 广州赛特智能科技有限公司 Method for adjusting voice broadcast speed indoors according to moving distance

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101004806A (en) * 2005-11-03 2007-07-25 国际商业机器公司 Method and system for voice rendering synthetic data
CN104616660A (en) * 2014-12-23 2015-05-13 上海语知义信息技术有限公司 Intelligent voice broadcasting system and method based on environmental noise detection
CN107437413A (en) * 2017-07-05 2017-12-05 百度在线网络技术(北京)有限公司 voice broadcast method and device
CN108369805A (en) * 2017-12-27 2018-08-03 深圳前海达闼云端智能科技有限公司 Voice interaction method and device and intelligent terminal

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9704476B1 (en) * 2013-06-27 2017-07-11 Amazon Technologies, Inc. Adjustable TTS devices
CN104883642B (en) * 2015-03-27 2018-09-25 成都上生活网络科技有限公司 A kind of effect adjusting method
CN105957528A (en) * 2016-06-13 2016-09-21 北京云知声信息技术有限公司 Audio processing method and apparatus
US10586079B2 (en) * 2016-12-23 2020-03-10 Soundhound, Inc. Parametric adaptation of voice synthesis
CN106681686B (en) * 2017-01-04 2020-07-28 广东美的制冷设备有限公司 Broadcast control method, broadcast control device and air conditioner
CN106973168A (en) * 2017-05-04 2017-07-21 广东欧珀移动通信有限公司 Speech playing method, device and computer equipment
CN107731219B (en) * 2017-09-06 2021-07-20 百度在线网络技术(北京)有限公司 Speech synthesis processing method, device and equipment
CN108105958A (en) * 2017-12-13 2018-06-01 广东美的制冷设备有限公司 Conditioner and its voice broadcast method, terminal and storage medium

Also Published As

Publication number Publication date
CN109582271A (en) 2019-04-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 2024-01-10
Address after: 730000, No. 803-48, 8th Floor, No. 18 Gaoxin Yannan Road, Chengguan District, Lanzhou City, Gansu Province
Patentee after: Gansu Longdian Yunchuang Technology Consulting Co.,Ltd.
Address before: Room 401, gate 2, block a, Zhongguancun 768 Creative Industry Park, 5 Xueyuan Road, Haidian District, Beijing 100083
Patentee before: BEIJING MORAN COGNITIVE TECHNOLOGY Co.,Ltd.