CN106098057B

CN106098057B - Playing speech rate management method and device

Info

Publication number: CN106098057B
Application number: CN201610412991.3A
Authority: CN
Inventors: 周海
Original assignee: Beijing Yunzhisheng Information Technology Co Ltd
Current assignee: Unisound Intelligent Technology Co Ltd
Priority date: 2016-06-13
Filing date: 2016-06-13
Publication date: 2020-02-07
Anticipated expiration: 2036-06-13
Also published as: CN106098057A

Abstract

The invention relates to a playing speech rate management method and device. The method comprises: acquiring a TTS playing text to be played; determining a target field to which the TTS playing text belongs; determining a target playing speech rate corresponding to the target field; and playing the TTS playing text at the target playing speech rate. The playing speech rate is chosen according to the field to which each TTS playing text belongs. Texts in different fields are therefore played at different speeds instead of all at one speed, so playback is targeted. The user can clearly hear the played content in any scene, and the user experience is improved.

Description

Playing speech rate management method and device

Technical Field

The invention relates to the technical field of voice processing, and in particular to a playing speech rate management method and device.

Background

TTS is an abbreviation of Text To Speech, i.e., converting text to speech. It is part of human-machine conversation and enables a machine to speak.

TTS draws on achievements in both linguistics and psychology. Supported by a built-in chip and a neural-network design, it intelligently converts text into a natural speech stream. TTS converts text files in real time, and conversion times are measured in seconds. A dedicated intelligent voice controller keeps the rhythm of the output speech smooth. The listener therefore finds the information natural to listen to and does not perceive the coldness and harshness of machine speech output.

TTS is a type of speech synthesis application that converts files stored in a computer, such as help files or web pages, into natural speech output. TTS not only helps visually impaired people read information on a computer, but also increases the readability of text documents. Today's TTS applications include voice-driven mail and voice-sensitive systems, and TTS is often used together with speech recognition programs.

Disclosure of Invention

The embodiments of the invention provide a playing speech rate management method and device, which are used to intelligently manage the speech rate of TTS voice feedback.

According to a first aspect of the embodiments of the present invention, a playing speech rate management method is provided, including:

acquiring a TTS playing text to be played;

determining a target field to which the TTS playing text belongs;

determining a target playing speech rate corresponding to the target field;

and playing the TTS playing text at the target playing speech rate.

In this embodiment, the TTS playing text to be played is acquired, and a target playing speech rate is determined for it according to the target field to which it belongs. The text is then played at that rate. Texts in different fields are thus played at different speeds rather than all at one speed, so playback is targeted. The user can clearly hear the played content in any scene, and the user experience is improved.
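The four steps of the first aspect can be sketched as a small Python function. This is a hypothetical illustration, not the patent's implementation. The field classifier, the rate table and the `play` callback are injected, and all of their names are assumptions. The rates in the usage lines (150, 210, 270 and 330 words per minute) are assumed midpoints of the example ranges given in this description.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical sketch of the four method steps; none of these names come from the patent.
def play_with_field_rate(
    text: str,                           # acquired TTS playing text
    judge_field: Callable[[str], str],   # determines the target field
    rate_table: Dict[str, int],          # preset field -> words per minute
    play: Callable[[str, int], None],    # TTS engine hook (assumed)
    default_field: str = "general",
) -> int:
    field = judge_field(text)
    # Fields without a preset rate fall back to the general (default) rate.
    rate = rate_table.get(field, rate_table[default_field])
    play(text, rate)
    return rate

# Usage with a stub engine that only records what would be spoken.
rates = {"safety": 150, "general": 210, "reminder": 270, "advertisement": 330}
spoken: List[Tuple[str, int]] = []
play_with_field_rate("Sharp curve ahead.", lambda t: "safety", rates,
                     lambda t, wpm: spoken.append((t, wpm)))
print(spoken)  # [('Sharp curve ahead.', 150)]
```

Injecting the classifier and the engine as callables keeps the rate-selection logic independent of any particular TTS engine.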

For example, when a user driving a car receives a [safety] voice prompt, the TTS speech rate of the [safety] field is used, which is slower.

When a user listening to music receives a [reminder] voice prompt, the TTS speech rate of the [reminder] field is used, which is faster.

When the user receives an [advertisement] voice prompt in a voice broadcast, the TTS speech rate of the [advertisement] field is used, which is particularly fast.

When the user receives an ordinary operation voice prompt, the default TTS speech rate is used, i.e., the standard rate of the [general] field.
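The effect of these rate differences on listening time can be estimated with simple arithmetic. A prompt of n words played at r words per minute lasts n × 60 / r seconds. The sketch below assumes a 60-word prompt and uses the midpoints of the example ranges given later in this description. Both the prompt length and the choice of midpoints are illustrative assumptions.

```python
# Estimated playback time: duration_seconds = word_count * 60 / words_per_minute
def playback_seconds(word_count: int, words_per_minute: float) -> float:
    if words_per_minute <= 0:
        raise ValueError("speech rate must be positive")
    return word_count * 60.0 / words_per_minute

# Assumed midpoints of the example ranges (words per minute).
MIDPOINTS = {"safety": 150, "general": 210, "reminder": 270, "advertisement": 330}

for name, wpm in MIDPOINTS.items():
    print(f"{name:>13}: {playback_seconds(60, wpm):.1f} s")
# safety 24.0 s, general 17.1 s, reminder 13.3 s, advertisement 10.9 s
```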

In one embodiment, determining the target field to which the TTS playing text belongs includes:

acquiring a keyword tag contained in the TTS playing text;

and determining the target field of the TTS playing text according to the keyword tag.

In this embodiment, a keyword tag may be preset in each TTS playing text to indicate the field to which it belongs. This field can then be conveniently determined from the keyword tag.

For example, if a TTS playing text belongs to the safety field, the keyword tag [safety] can be preset for it, so that its field can be determined quickly and conveniently.
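Extracting such a tag can be sketched with a regular expression. The bracketed prefix syntax `[field]` is an assumption modelled on the examples above, since the patent does not define how tags are encoded. The function name `split_keyword_tag` is hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed tag syntax: an optional leading "[field]" marker, e.g. "[safety] ...".
_TAG = re.compile(r"^\s*\[\s*([A-Za-z_]+)\s*\]\s*")

def split_keyword_tag(text: str) -> Tuple[Optional[str], str]:
    """Return (field, spoken_text); field is None when no tag is present."""
    m = _TAG.match(text)
    if m is None:
        return None, text
    # Lower-case the field so "[Safety]" and "[safety]" map to the same entry,
    # and drop the tag so it is not read aloud.
    return m.group(1).lower(), text[m.end():]

print(split_keyword_tag("[ safety ] Keep a safe distance."))
# ('safety', 'Keep a safe distance.')
```

Stripping the tag before synthesis matters because the tag is metadata for field selection and is not meant to be spoken.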

In one embodiment, determining the target playing speech rate corresponding to the target field includes:

determining the target playing speech rate corresponding to the target field according to a preset correspondence between fields and playing speech rates.

In this embodiment, a user or a manufacturer may preset a plurality of fields and a playing speech rate for each field. The preset fields include safety, reminder, advertisement and general. The speech rate is 120-180 words per minute for the safety field, 180-240 for the general field, 240-300 for the reminder field, and 300-360 for the advertisement field. The target playing speech rate corresponding to the target field can thus be determined from this preset correspondence between fields and playing speech rates.
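The preset correspondence can be held as a table of ranges. The description gives only a range for each field, so the sketch below makes two assumptions: the midpoint of the range is used as the target rate, and fields without a preset entry fall back to the general field. Both choices, and the name `target_rate`, are illustrative.

```python
from typing import Dict, Tuple

# Example ranges from this description, in words per minute.
PRESET_RATES: Dict[str, Tuple[int, int]] = {
    "safety": (120, 180),
    "general": (180, 240),
    "reminder": (240, 300),
    "advertisement": (300, 360),
}

def target_rate(field: str,
                table: Dict[str, Tuple[int, int]] = PRESET_RATES,
                default_field: str = "general") -> int:
    """Return one target rate for the field (assumed: midpoint of its range)."""
    low, high = table.get(field, table[default_field])
    return (low + high) // 2

print(target_rate("safety"), target_rate("advertisement"), target_rate("weather"))
# 150 330 210
```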

In one embodiment, the method further comprises:

receiving an input setting command;

and setting the preset fields and the playing speech rate corresponding to each field according to the setting command.

In this embodiment, a user or a manufacturer may preset a plurality of fields and a playing speech rate for each field. The preset fields include safety, reminder, advertisement and general. The speech rate is 120-180 words per minute for the safety field, 180-240 for the general field, 240-300 for the reminder field, and 300-360 for the advertisement field.
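The form of the setting command is not specified. The sketch below assumes a hypothetical text command such as `set reminder 240-300` and shows how such a command might update the preset table, with basic validation. The command syntax and the name `apply_setting_command` are illustrative only.

```python
import re
from typing import Dict, Tuple

# Hypothetical command syntax: "set <field> <low>-<high>", rates in words per minute.
_CMD = re.compile(r"^set\s+([a-z_]+)\s+(\d+)\s*-\s*(\d+)$")

def apply_setting_command(cmd: str, table: Dict[str, Tuple[int, int]]) -> None:
    m = _CMD.match(cmd.strip().lower())
    if m is None:
        raise ValueError(f"unrecognized setting command: {cmd!r}")
    field, low, high = m.group(1), int(m.group(2)), int(m.group(3))
    if not 0 < low <= high:
        raise ValueError("rate range must satisfy 0 < low <= high")
    # Adds a new field or overwrites the range of an existing one.
    table[field] = (low, high)

rates = {"general": (180, 240)}
apply_setting_command("set weather 200-260", rates)
print(rates)  # {'general': (180, 240), 'weather': (200, 260)}
```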

In one embodiment, the method further comprises:

when a voice command input by a user is received, determining a TTS playing text to be played corresponding to the voice command.

In this embodiment, the TTS playing text to be fed back can be determined according to the voice command input by the user, which meets the user's playing requirements.

According to a second aspect of the embodiments of the present invention, a playing speech rate management apparatus is provided, including:
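A minimal sketch of this step rests on two assumptions. The recognized voice command arrives as a text string, and replies are looked up in a hypothetical table whose entries already carry a keyword tag. A real device would more likely obtain the reply from a dialogue or service backend.

```python
from typing import Dict, Optional

# Hypothetical mapping from recognized voice commands to tagged TTS playing texts.
REPLIES: Dict[str, str] = {
    "navigate home": "[general] Starting navigation to home.",
    "set an alarm for seven": "[reminder] Alarm set for 7:00.",
}

def reply_for_command(command: str,
                      replies: Dict[str, str] = REPLIES) -> Optional[str]:
    # Normalize case and surrounding whitespace before the lookup;
    # returns None when no TTS playing text is defined for the command.
    return replies.get(command.strip().lower())

print(reply_for_command("  Navigate Home "))
# [general] Starting navigation to home.
```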

the obtaining module is used for obtaining a TTS playing text to be played;

the judging module is used for determining the target field to which the TTS playing text belongs;

the first determining module is used for determining a target playing speech rate corresponding to the target field;

and the playing module is used for playing the TTS playing text at the target playing speech rate.

In one embodiment, the judging module comprises:

the obtaining submodule is used for obtaining a keyword tag contained in the TTS playing text;

and the field determining submodule is used for determining the target field to which the TTS playing text belongs according to the keyword tag.

In one embodiment, the first determining module comprises:

and the speech rate determining submodule is used for determining the target playing speech rate corresponding to the target field according to the preset correspondence between fields and playing speech rates.

In one embodiment, the apparatus further comprises:

the receiving module is used for receiving an input setting command;

and the setting module is used for setting the preset fields and the playing speech rate corresponding to each field according to the setting command.

In one embodiment, the apparatus further comprises:

and the second determining module is used for determining the TTS playing text to be played corresponding to the voice command when the voice command input by the user is received.
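The module structure above can be sketched as a Python class in which each method stands for one module. Several details are assumptions for illustration: the class and method names, the `[field]` tag syntax, and the injected `play_fn` callback. The second determining module (voice command handling) is omitted for brevity.

```python
import re
from typing import Callable, Dict, List, Tuple

class SpeechRateManager:
    """Hypothetical sketch: each method stands for one module of the apparatus."""

    def __init__(self, rates: Dict[str, int],
                 play_fn: Callable[[str, int], None],
                 default_field: str = "general") -> None:
        if default_field not in rates:
            raise ValueError("rates must include the default field")
        self.rates = dict(rates)        # preset field -> words per minute
        self.play_fn = play_fn          # assumed TTS engine hook
        self.default_field = default_field

    def obtain(self, raw: str) -> str:                     # obtaining module
        return raw.strip()

    def judge(self, text: str) -> Tuple[str, str]:         # judging module
        # Obtaining submodule: read an assumed "[field]" tag at the start.
        m = re.match(r"\[(\w+)\]\s*", text)
        if m is None:
            return self.default_field, text
        # Field determining submodule: the tag names the field.
        return m.group(1).lower(), text[m.end():]

    def determine_rate(self, field: str) -> int:           # first determining module
        return self.rates.get(field, self.rates[self.default_field])

    def apply_setting(self, field: str, wpm: int) -> None:  # receiving + setting modules
        self.rates[field] = wpm

    def handle(self, raw: str) -> Tuple[str, int]:
        field, spoken = self.judge(self.obtain(raw))
        wpm = self.determine_rate(field)
        self.play_fn(spoken, wpm)                          # playing module
        return spoken, wpm

played: List[Tuple[str, int]] = []
manager = SpeechRateManager({"general": 210, "safety": 150},
                            lambda t, w: played.append((t, w)))
print(manager.handle("  [safety] Sharp curve ahead. "))  # ('Sharp curve ahead.', 150)
```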

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.

Fig. 1 is a flowchart illustrating a playing speech rate management method according to an exemplary embodiment.

Fig. 2 is a flowchart illustrating step S102 of a playing speech rate management method according to an exemplary embodiment.

Fig. 3 is a flowchart illustrating step S103 of a playing speech rate management method according to an exemplary embodiment.

Fig. 4 is a flowchart illustrating another playing speech rate management method according to an exemplary embodiment.

Fig. 5 is a flowchart illustrating yet another playing speech rate management method according to an exemplary embodiment.

Fig. 6 is a block diagram illustrating a playing speech rate management apparatus according to an exemplary embodiment.

Fig. 7 is a block diagram illustrating the judging module of a playing speech rate management apparatus according to an exemplary embodiment.

Fig. 8 is a block diagram illustrating the first determining module of a playing speech rate management apparatus according to an exemplary embodiment.

Fig. 9 is a block diagram illustrating another playing speech rate management apparatus according to an exemplary embodiment.

Fig. 10 is a block diagram illustrating yet another playing speech rate management apparatus according to an exemplary embodiment.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.

Fig. 1 is a flowchart illustrating a playing speech rate management method according to an exemplary embodiment. The method can be applied to a playing device, which may be any device with a speech playing function. Examples include a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant or a vehicle-mounted player. As shown in Fig. 1, the method includes steps S101-S104:

in step S101, a TTS play text to be played is acquired;

in step S102, the target field to which the TTS playing text belongs is determined;

in step S103, the target playing speech rate corresponding to the target field is determined;

in step S104, the TTS playing text is played at the target playing speech rate.

As shown in Fig. 2, in one embodiment, step S102 may include steps S201-S202:

in step S201, a keyword tag included in the TTS playing text is acquired;

in step S202, a target field to which the TTS playing text belongs is determined according to the keyword tag.

As shown in Fig. 3, in one embodiment, step S103 may include step S301:

in step S301, the target playing speech rate corresponding to the target field is determined according to the preset correspondence between fields and playing speech rates.

As shown in Fig. 4, in one embodiment, the method further includes steps S401-S402:

in step S401, an input setting command is received;

in step S402, the preset fields and the playing speech rate corresponding to each field are set according to the setting command.

In this embodiment, a user or a manufacturer may preset a plurality of fields and a playing speech rate for each field. For example, as shown in Table 1, the preset fields include "safety", "reminder", "advertisement" and "general". The speech rate is 120-180 words/minute for the "safety" field, 180-240 words/minute for "general", 240-300 words/minute for "reminder", and 300-360 words/minute for "advertisement".

TABLE 1

Field	Playing speech rate
Safety	120-180 words/minute
General	180-240 words/minute
Reminder	240-300 words/minute
Advertisement	300-360 words/minute
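Many TTS engines accept SSML. In SSML 1.1, the `rate` attribute of the `prosody` element may be given as a non-negative percentage of the default speaking rate. The sketch below turns Table 1 into SSML under two illustrative assumptions: the engine's default rate corresponds to the midpoint of the general range (210 words/minute), and each field uses the midpoint of its range. The names `prosody_rate_percent` and `to_ssml` are hypothetical.

```python
from xml.sax.saxutils import escape

# Assumption: the engine's default rate equals the "general" midpoint.
BASELINE_WPM = 210

# Ranges from Table 1, in words per minute.
TABLE_1 = {
    "safety": (120, 180),
    "general": (180, 240),
    "reminder": (240, 300),
    "advertisement": (300, 360),
}

def prosody_rate_percent(wpm: float, baseline_wpm: float = BASELINE_WPM) -> str:
    # SSML 1.1 allows a non-negative percentage relative to the default rate.
    return f"{round(100 * wpm / baseline_wpm)}%"

def to_ssml(text: str, field: str) -> str:
    low, high = TABLE_1.get(field, TABLE_1["general"])
    rate = prosody_rate_percent((low + high) / 2)
    return ('<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" '
            'xml:lang="en-US">'
            f'<prosody rate="{rate}">{escape(text)}</prosody></speak>')

for name in TABLE_1:
    print(name, prosody_rate_percent(sum(TABLE_1[name]) / 2))
# safety 71%, general 100%, reminder 129%, advertisement 157%
```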

As shown in Fig. 5, in one embodiment, before step S101, the method further includes step S501:

in step S501, when a voice command input by a user is received, a TTS play text to be played corresponding to the voice command is determined.

The following are embodiments of the apparatus of the present invention that may be used to perform embodiments of the method of the present invention.

Fig. 6 is a block diagram illustrating a playing speech rate management apparatus according to an exemplary embodiment. The apparatus may be implemented as part or all of a playing device through software, hardware, or a combination of both. As shown in Fig. 6, the playing speech rate management apparatus includes:

an obtaining module 61, configured to obtain a TTS playing text to be played;

a judging module 62, configured to determine the target field to which the TTS playing text belongs;

a first determining module 63, configured to determine a target playing speech rate corresponding to the target field;

and a playing module 64, configured to play the TTS playing text at the target playing speech rate.

As shown in Fig. 7, in one embodiment, the judging module 62 includes:

an obtaining submodule 71, configured to obtain a keyword tag contained in the TTS playing text;

and a field determining submodule 72, configured to determine, according to the keyword tag, the target field to which the TTS playing text belongs.

As shown in fig. 8, in one embodiment, the first determining module 63 includes:

and a speech rate determining submodule 81, configured to determine the target playing speech rate corresponding to the target field according to the preset correspondence between fields and playing speech rates.

As shown in fig. 9, in one embodiment, the apparatus further comprises:

a receiving module 91, configured to receive an input setting command;

and a setting module 92, configured to set the preset fields and the playing speech rate corresponding to each field according to the setting command.

As shown in fig. 10, in one embodiment, the apparatus further comprises:

a second determining module 1001, configured to determine, when a voice command input by a user is received, a to-be-played TTS playing text corresponding to the voice command.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A playing speech rate management method, characterized by comprising the following steps:

acquiring a TTS playing text to be played;

determining a target field to which the TTS playing text belongs;

determining a target playing speech rate corresponding to the target field;

playing the TTS playing text at the target playing speech rate;

wherein determining the target field to which the TTS playing text belongs comprises the following steps:

acquiring a keyword tag contained in the TTS playing text;

2. The method according to claim 1, wherein determining the target playing speech rate corresponding to the target field comprises:

3. The method of claim 2, further comprising:

receiving an input setting command;

4. The method of claim 1, further comprising:

5. A playing speech rate management apparatus, comprising:

the obtaining module is used for obtaining a TTS playing text to be played;

the playing module is used for playing the TTS playing text at the target playing speech rate;

the judging module comprises:

6. The apparatus of claim 5, wherein the first determining module comprises:

7. The apparatus of claim 6, further comprising:

the receiving module is used for receiving an input setting command;

8. The apparatus of claim 5, further comprising: