CN114999446A

CN114999446A - Speech synthesis system for intelligent broadcasting

Info

Publication number: CN114999446A
Application number: CN202210838240.3A
Authority: CN
Inventors: 明德; 石金川; 张常华; 朱正辉; 赵定金
Original assignee: Guangzhou Baolun Electronics Co Ltd
Current assignee: Guangdong Baolun Electronics Co ltd
Priority date: 2022-07-18
Filing date: 2022-07-18
Publication date: 2022-09-02
Anticipated expiration: 2042-07-18
Also published as: CN114999446B

Abstract

The invention relates to the technical field of speech synthesis, and in particular to a speech synthesis system for intelligent broadcasting. The system comprises a broadcast information input end, a text analysis module, a voice synthesis module, a broadcast control module and a broadcast voice output end. The broadcast control module judges whether the speech synthesis of a broadcast text needs to be accelerated according to the broadcast information type of the broadcast text and the predicted speech synthesis time, both identified by the text analysis module, and selects a corresponding playing volume and playing speed for the synthesized voice audio according to the identified broadcast information type. This ensures that the speech synthesis system for intelligent broadcasting can perform speech synthesis and audio playing in a manner targeted to the type of the corresponding broadcast information text.

Description

Speech synthesis system for intelligent broadcasting

Technical Field

The invention relates to the technical field of voice synthesis, in particular to a voice synthesis system for intelligent broadcasting.

Background

Speech synthesis, also known as Text to Speech (TTS) technology, can convert arbitrary text information into standard, fluent speech and read it aloud in real time. It is a leading-edge technology in the field of Chinese information processing, and the main problem it solves is how to convert text information into audible sound information.

Using a speech synthesis system to synthesize and play broadcast information turns tedious, time-consuming broadcast work from a manual task into a machine task. On the one hand, this alleviates the shortage of broadcast personnel; on the other hand, because machine voice broadcasting is not restricted by time or region, it can deliver announcements more promptly than manual broadcasting, an advantage that is especially pronounced in emergency broadcasting.

Chinese patent publication No. CN112349268A discloses an emergency broadcast audio processing system and an operation method thereof. The system comprises an emergency speech synthesis end, an emergency audio processing end and an emergency broadcast platform connected in sequence. The emergency speech synthesis end receives an emergency information manuscript, synthesizes it into emergency dubbing, and sends the emergency dubbing to the emergency audio processing end. The emergency audio processing end receives the emergency dubbing, performs audio processing on it to form emergency broadcast information, and sends the emergency broadcast information to the emergency broadcast platform, which receives it and plays it to users. In that technical scheme, a virtual anchor trained with speech synthesis technology can replace professionals and dub anytime and anywhere without the limitations of professional equipment and environment. However, it performs speech synthesis only on manuscripts in the emergency field; the manuscripts are of a single type, so the scheme is not suitable for broadcasting the different information types found in community broadcasting and business center broadcasting.

Chinese patent publication No. CN111179901A discloses a broadcasting system with a voice synthesis function, which comprises a voice synthesis chip XFS5152, a voltage monitoring chip IMP811REUS-T and a Cortex-M3 processor STM32F103ZET6. The main communication ports RXD and TXD of the voice synthesis chip XFS5152 are connected to the Cortex-M3 processor STM32F103ZET6 through pull-up resistors R13 and R14 respectively, and the other control ports of the voice synthesis chip XFS5152 are connected to the processor STM32F103ZET6 through pull-up resistors R11 and R12 respectively. That technical scheme can synthesize text into voice audio digitally; it supports Chinese speech synthesis, adjustment of tone and speaking speed, pauses at punctuation marks, and pauses in common formats such as telephone numbers, dates and times. Its drawback is that it cannot identify different types of broadcast information text and therefore cannot apply a speech synthesis mode targeted to the type of the corresponding broadcast information text.

Disclosure of Invention

Therefore, the invention provides a voice synthesis system for intelligent broadcasting, which is used for solving the problem that the prior art can not adopt a targeted voice synthesis mode to carry out voice synthesis and audio playing according to the type of the corresponding broadcast information text.

To achieve the above object, the present invention provides a speech synthesis system for smart broadcasting, comprising:

a broadcast information input end, which is connected to a local network and to the cloud internet respectively and is used for receiving broadcast text information or broadcast voice information to be broadcasted and converting the received broadcast text information or broadcast voice information into broadcast text in a preset format;

the text analysis module, which is connected with the broadcast information input end and is used for analyzing the semantics of the broadcast text to determine the broadcast information type of the broadcast text and for predicting the speech synthesis time of the broadcast text; the text analysis module can perform sentence error correction on the broadcast text by identifying its grammar, and can compress the broadcast text through semantic identification to form a corresponding broadcast key word text or broadcast key text;

the voice synthesis module is respectively connected with the broadcast information input end and the text analysis module and is used for carrying out voice synthesis on the received broadcast text processed by the text analysis module to form a voice audio file;

the broadcast control module is respectively connected with the broadcast information input end, the text analysis module and the voice synthesis module and is used for judging whether the voice synthesis of the broadcast text needs to be accelerated according to the broadcast information type of the broadcast text identified by the text analysis module and the predicted voice synthesis time, and for selecting the corresponding play volume and play speed for the synthesized voice audio according to the identified broadcast information type of the broadcast text;

and the broadcast voice output end is respectively connected with the voice synthesis module, the text analysis module and the broadcast control module and is used for playing the voice broadcast data generated by the voice synthesis module under the control of the broadcast control module.

Further, the text analysis module comprises a semantic analysis module and a word processing module,

the semantic analysis module is connected with the broadcast information input end and is used for identifying the broadcast information type of the broadcast text and determining the important sentences and key sentences in the broadcast text by analyzing the semantics of the broadcast text transmitted by the broadcast information input end;

the word processing module is respectively connected with the semantic analysis module and the voice synthesis module and is used for performing sentence error correction on the sentences of the broadcast text through syntactic analysis so that they conform to language expression norms; the word processing module can determine the predicted speech synthesis time of the broadcast text by analyzing its passages, and can generate the corresponding broadcast key word text and broadcast key text from the important sentences and key sentences determined by the semantic analysis module.

Further, the broadcast information types, classified by broadcast urgency, include a general notification, a time-efficient notification, an emergency notification and an immediate notification.

Further, the broadcast control module is provided with a first speech synthesis time standard T1, a second speech synthesis time standard T2, a third speech synthesis time standard T3 and a fourth speech synthesis time standard T4, wherein 120 min > T1 > 60 min > T2 > 30 min > T3 > 3 min > T4 > 0.5 min. When the broadcast control module identifies that the broadcast information input end has received broadcast text information or broadcast voice information to be broadcasted, it controls the broadcast information input end to convert the received information into broadcast text in a predetermined format and to transmit that broadcast text to the text analysis module. The broadcast control module then determines the speech synthesis time standard corresponding to the broadcast text according to the broadcast information type identified by the text analysis module, so as to judge how the broadcast text is to be processed:

when the broadcast information type of the broadcast text is a general notification, the broadcast control module judges that a first speech synthesis time standard T1 is adopted as the speech synthesis time standard of the broadcast text;

when the broadcast information type of the broadcast text is the time-efficient notification, the broadcast control module judges that a second speech synthesis time standard T2 is adopted as the speech synthesis time standard of the broadcast text;

when the broadcast information type of the broadcast text is the emergency notification, the broadcast control module judges that a third speech synthesis time standard T3 is adopted as the speech synthesis time standard of the broadcast text;

when the broadcast information type of the broadcast text is the immediate notification, the broadcast control module judges that a fourth speech synthesis time standard T4 is adopted as the speech synthesis time standard of the broadcast text.
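The type-to-standard mapping above can be sketched as a simple lookup. This is a minimal illustration only: the names `BroadcastType`, `TIME_STANDARD_MIN` and `time_standard` are hypothetical, and the concrete values of T1 to T4 are assumptions chosen merely to fall inside the stated ranges.

```python
from enum import Enum


class BroadcastType(Enum):
    """The four broadcast information types, ordered by increasing urgency."""
    GENERAL = "general notification"
    TIME_EFFICIENT = "time-efficient notification"
    EMERGENCY = "emergency notification"
    IMMEDIATE = "immediate notification"


# Assumed example values in minutes; the text only requires
# 120 > T1 > 60 > T2 > 30 > T3 > 3 > T4 > 0.5.
TIME_STANDARD_MIN = {
    BroadcastType.GENERAL: 90.0,         # T1
    BroadcastType.TIME_EFFICIENT: 45.0,  # T2
    BroadcastType.EMERGENCY: 10.0,       # T3
    BroadcastType.IMMEDIATE: 1.0,        # T4
}


def time_standard(broadcast_type: BroadcastType) -> float:
    """Return the speech synthesis time standard Ti (minutes) for a type."""
    return TIME_STANDARD_MIN[broadcast_type]
```

A table lookup keeps the four cases in one place, so changing a standard does not require touching any decision logic.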

Further, the broadcast control module is provided with a speech synthesis mode determination logic, the speech synthesis mode determination logic determines a speech synthesis mode of the broadcast text by comparing a broadcast information type of the broadcast text with a corresponding speech synthesis time standard, and the speech synthesis mode determination logic includes:

when the broadcast control module determines that the ith speech synthesis time standard Ti is adopted as the speech synthesis time standard of a certain broadcast text, wherein i =1, 2, 3, the broadcast control module controls the text analysis module to calculate the predicted speech synthesis time t1 of the broadcast text and compares t1 with Ti to determine the speech synthesis mode of the broadcast text,

when t1 is less than or equal to Ti, the broadcast control module judges that the predicted speech synthesis duration meets the standard, and performs speech synthesis on all contents of the broadcast text by adopting a standard speech synthesis mode and then plays the broadcast text;

and when t1 is greater than Ti, the broadcast control module judges that the predicted speech synthesis duration does not meet the standard, and performs speech synthesis on the key content of the broadcast text by adopting an accelerated speech synthesis mode and then plays the key content.
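The comparison of t1 with Ti can be sketched as follows. The function name `choose_synthesis_mode` and the string labels are hypothetical; only the two-way comparison itself comes from the text, and it applies to the first three standards (i = 1, 2, 3).

```python
def choose_synthesis_mode(t1_min: float, ti_min: float) -> str:
    """Pick the synthesis mode from predicted time t1 and standard Ti.

    Both arguments are in minutes. Returns "standard" when t1 <= Ti
    (synthesize all contents) and "accelerated" when t1 > Ti
    (synthesize only the key content).
    """
    if t1_min <= ti_min:
        return "standard"
    return "accelerated"
```

For example, with an assumed Ti of 10 minutes, a predicted time of 8 minutes would select the standard mode and a predicted time of 12 minutes the accelerated mode.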

Further, the broadcast control module is provided with a first timeout percentage standard A1, a second timeout percentage standard A2, a first text compression coefficient α1, a second text compression coefficient α2 and a third text compression coefficient α3, wherein 100% < A1 < 200% < A2 < 300% and 0.3 < α3 < 0.5 < α2 < 0.8 < α1 < 1. When the broadcast control module determines to perform speech synthesis on the important content of the broadcast text in the accelerated speech synthesis mode, it determines the compression amount of the broadcast text content according to the ratio a of t1 to the corresponding speech synthesis time standard Ti, so as to obtain the important content of the broadcast text, where a = t1/Ti,

when a ≤ A1, the broadcast control module judges that the speech synthesis time of the broadcast text is below the timeout standard and adjusts the compression amount of the broadcast text content using the first text compression coefficient α1;

when A1 < a < A2, the broadcast control module judges that the speech synthesis time of the broadcast text meets the timeout standard and adjusts the compression amount of the broadcast text content using the second text compression coefficient α2;

when a ≥ A2, the broadcast control module judges that the speech synthesis time of the broadcast text exceeds the timeout standard and adjusts the compression amount of the broadcast text content using the third text compression coefficient α3;

when the broadcast control module judges that the compression amount of the broadcast text content is adjusted using the j-th text compression coefficient αj, where j = 1, 2, 3, the broadcast control module controls the text analysis module to extract the important content of the broadcast text according to the compression amount requirement, thereby compressing the broadcast text to generate the broadcast key word text. The total number of sentences in the broadcast key word text is denoted M1, with M1 = M0 × αj rounded down to a positive integer, where M0 is the total number of sentences in the broadcast text before compression.
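The selection of a compression coefficient and the resulting sentence count can be sketched as below. The values of A1, A2 and the three coefficients are assumptions picked from inside the stated ranges, the function names are hypothetical, and the lower bound of one sentence is an added assumption used to keep M1 a positive integer for very short texts.

```python
import math

# Assumed example thresholds, expressed as ratios (150 % and 250 %).
A1 = 1.5   # 100% < A1 < 200%
A2 = 2.5   # 200% < A2 < 300%

# Assumed example coefficients: 0.3 < a3 < 0.5 < a2 < 0.8 < a1 < 1.
ALPHA_1, ALPHA_2, ALPHA_3 = 0.9, 0.65, 0.4


def compression_coefficient(t1: float, ti: float) -> float:
    """Select the text compression coefficient from the ratio a = t1 / Ti."""
    a = t1 / ti
    if a <= A1:
        return ALPHA_1      # below the timeout standard: light compression
    if a < A2:
        return ALPHA_2      # meets the timeout standard: medium compression
    return ALPHA_3          # exceeds the timeout standard: heavy compression


def compressed_sentence_count(m0: int, alpha: float) -> int:
    """Return M1 = floor(M0 * alpha), kept at a minimum of one sentence."""
    return max(1, math.floor(m0 * alpha))
```

With these assumed values, a ten-sentence text whose predicted time is twice its standard would be reduced to six sentences.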

Further, the broadcast control module is provided with an accelerated loss time calculation coefficient μ, where 0.6 < μ < 0.9. When the broadcast control module determines to control the text analysis module to generate the broadcast key word text in the accelerated speech synthesis mode, it controls the text analysis module to calculate the predicted speech synthesis time t2 of the generated broadcast key word text and to compare t2 with the accelerated synthesis time standard Ti' corresponding to the broadcast key word text, so as to determine whether the generated broadcast key word text meets the acceleration standard, where Ti' = Ti × μ,

when t2 is less than or equal to Ti', the broadcast control module judges that the generated broadcast key word text meets the acceleration standard and the acceleration is effective, and the broadcast control module controls the text analysis module to transmit the generated broadcast key word text to the voice synthesis module for voice synthesis and then play;

when t2 is greater than Ti', the broadcast control module judges that the generated broadcast key word text does not meet the acceleration standard and that the acceleration is invalid; the broadcast control module then determines that the broadcast text is to be compressed again to form the broadcast key text, and controls the text analysis module to transmit the generated broadcast key text to the voice synthesis module for voice synthesis and subsequent playing.
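The acceleration check can be sketched as follows. The value of μ is an assumption taken from inside the stated range, and the function names and returned labels are hypothetical.

```python
MU = 0.8  # assumed example value; the text only requires 0.6 < mu < 0.9


def accelerated_standard(ti: float, mu: float = MU) -> float:
    """Return the accelerated synthesis time standard Ti' = Ti * mu."""
    return ti * mu


def next_step(t2: float, ti: float, mu: float = MU) -> str:
    """Decide what to do with the compressed text.

    t2 is the predicted synthesis time of the compressed text. If it fits
    within Ti', the acceleration is effective and the text goes straight to
    synthesis; otherwise the original text is compressed again.
    """
    if t2 <= accelerated_standard(ti, mu):
        return "synthesize"
    return "recompress"
```

Scaling Ti by μ leaves a margin for the time already spent on compressing the text, which is why the compressed text is checked against Ti' rather than Ti.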

Further, the semantic analysis module is provided with a key information extraction method for extracting key information of the broadcast text, and when the broadcast control module determines to re-compress the broadcast text to form the broadcast key text, the broadcast control module controls the semantic analysis module to extract the key information of the broadcast text by using the key information extraction method to generate the broadcast key text;

the key information extraction method is a 5W information extraction method, which extracts the time information, place information, person information, event information and reason information from the text information and integrates them into a complete, fluent sentence.
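The integration step of the 5W method can be sketched as a small record type. This sketch covers only the final assembly of the five extracted items into one sentence; the extraction itself is not shown, and the class name `FiveW`, its fields and the English sentence template are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FiveW:
    """The five items extracted from a broadcast text (already extracted)."""
    time: str    # when
    place: str   # where
    person: str  # who
    event: str   # what
    reason: str  # why

    def to_sentence(self) -> str:
        """Join the five items into one sentence using a fixed template."""
        return (
            f"{self.time}, {self.person} {self.event} "
            f"at {self.place} because {self.reason}."
        )


notice = FiveW(
    time="At 15:00 today",
    place="the east gate",
    person="residents",
    event="should avoid parking",
    reason="of road repairs",
)
print(notice.to_sentence())
```

A fixed template is the simplest possible integration; a real system would likely need language-specific generation to keep the sentence fluent.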

Further, the speech synthesis mode determination logic further includes:

when the broadcast control module judges that the fourth speech synthesis time standard T4 is adopted as the speech synthesis time standard of a certain broadcast text, the broadcast control module controls the broadcast voice output end to play a preset emergency notification audio to raise the attention of personnel, and at the same time controls the text analysis module to identify the word count Q of the broadcast text and determines the playing mode for the broadcast text according to Q. The broadcast control module is provided with an emergency play word count standard Q0, wherein 10 < Q0 < 50,

when Q is less than or equal to Q0, the broadcast control module judges that the number of words of the broadcast text is less, and the broadcast control module controls the voice synthesis module to perform voice synthesis on all contents of the broadcast text;

when Q is greater than Q0, the broadcast control module determines that the number of words of the broadcast text is large, and the broadcast control module controls the text analysis module to generate a broadcast key text of the broadcast text and controls the voice synthesis module to perform voice synthesis on the broadcast key text.
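The handling of an immediate notification can be sketched as below. The value of Q0 is an assumption from inside the stated range, the step labels are hypothetical, and counting whitespace-separated words is an assumption made for this English example; the text does not say how Q is counted.

```python
from typing import List

Q0 = 30  # assumed example value; the text only requires 10 < Q0 < 50


def immediate_plan(text: str, q0: int = Q0) -> List[str]:
    """Return the ordered steps for an immediate notification.

    The preset emergency notification audio is always played first. The
    word count Q then decides whether the full text or only the key text
    is synthesized.
    """
    q = len(text.split())  # assumption: Q counts whitespace-separated words
    steps = ["play_emergency_notification_audio"]
    if q <= q0:
        steps.append("synthesize_full_text")
    else:
        steps.append("synthesize_key_text")
    return steps
```

Counting words is far cheaper than predicting a synthesis time, which matches the stated aim of keeping this branch as fast as possible.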

Further, the broadcast control module is provided with a first broadcast type play adjustment coefficient β1, a second broadcast type play adjustment coefficient β2, a third broadcast type play adjustment coefficient β3 and a fourth broadcast type play adjustment coefficient β4, wherein 0.8 < β1 < 1 and 1.1 < β2 < β3 < 1.3 < β4 < 1.5. When the voice synthesis module completes the voice synthesis of the received text and generates the voice audio corresponding to a certain broadcast text, the broadcast control module determines the play mode of the current broadcast according to the broadcast information type of the broadcast text corresponding to the voice audio,

when the broadcast information type of the broadcast text corresponding to the voice audio is a general notification, the broadcast control module judges that the first broadcast type play adjustment coefficient β1 is adopted to adjust the play volume and the play speed of the voice audio;

when the broadcast information type of the broadcast text corresponding to the voice audio is a time-efficient notification, the broadcast control module judges that the second broadcast type play adjustment coefficient β2 is adopted to adjust the play volume and the play speed of the voice audio;

when the broadcast information type of the broadcast text corresponding to the voice audio is an emergency notification, the broadcast control module judges that the third broadcast type play adjustment coefficient β3 is adopted to adjust the play volume and the play speed of the voice audio;

when the broadcast information type of the broadcast text corresponding to the voice audio is an immediate notification, the broadcast control module judges that the fourth broadcast type play adjustment coefficient β4 is adopted to adjust the play volume and the play speed of the voice audio;

when the broadcast control module determines that the play volume and the play speed of the voice audio are adjusted using the k-th broadcast type play adjustment coefficient βk, where k = 1, 2, 3, 4, the broadcast control module records the adjusted play volume of the voice audio as B' and the adjusted play speed of the voice audio as H', and sets B' = B0 × βk and H' = H0 × βk, where B0 is a preset initial broadcast volume and H0 is a preset initial voice play speed.
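The playback adjustment can be sketched as a single scaling step. The coefficient values are assumptions chosen for illustration, and the dictionary keys, function name and units are hypothetical; only the relations B' = B0 × βk and H' = H0 × βk come from the text.

```python
from typing import Tuple

# Assumed example coefficients, one per broadcast information type.
BETA = {
    "general": 0.9,          # beta1
    "time_efficient": 1.15,  # beta2
    "emergency": 1.25,       # beta3
    "immediate": 1.4,        # beta4
}


def adjust_playback(kind: str, b0: float, h0: float) -> Tuple[float, float]:
    """Return (B', H') = (B0 * beta_k, H0 * beta_k) for the given type.

    b0 is the preset initial volume and h0 the preset initial play speed;
    the same coefficient scales both.
    """
    beta = BETA[kind]
    return b0 * beta, h0 * beta
```

For instance, with an assumed initial volume of 60 and an initial speed of 1.0, an emergency notification would be played at volume 75 and speed 1.25 under these example coefficients.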

Further, the voice synthesis system for smart broadcasting further includes a voice synthesis database module and a voice synthesis control module, wherein,

the voice synthesis database module is connected with the voice synthesis module and used for storing a plurality of language synthesis rules, a plurality of language synthesis methods and a plurality of types of anchor tone color information required by the voice synthesis module so as to provide data support for voice synthesis of the voice synthesis module;

and the voice synthesis control module is respectively connected with the voice synthesis module and the voice synthesis database module and is used for controlling the voice synthesis module to carry out voice synthesis on the text of characters needing voice synthesis according to the voice synthesis language category and the voice synthesis anchor type set by the user so as to generate a voice audio file.

Further, the speech synthesis language category includes Chinese synthesis, English synthesis, idiom special synthesis and dialect synthesis, and the speech synthesis anchor type includes a news broadcast type, a sales promotion type and a soothing type;

the voice synthesis control module can set default voice synthesis language type and default voice synthesis anchor type, and can determine the voice synthesis mode of single broadcast by selecting the voice synthesis language type and/or the voice synthesis anchor type on the human-computer interaction interface of the broadcast control module.

Further, the broadcast information input terminal includes an information receiving unit and a format conversion unit, wherein,

the information receiving unit is connected with an external network and an information receiving port and is used for receiving broadcast text information or broadcast voice information to be broadcasted;

the format conversion unit is respectively connected with the information receiving unit and the semantic analysis module and is used for converting the received broadcast text information or broadcast voice information into a broadcast text with a preset format, and the format conversion unit is provided with a voice conversion device for converting the received broadcast voice information into text information with a corresponding language type;

the language types of the broadcast text information or broadcast voice information received by the broadcast information input end include Chinese, English and a number of less commonly used languages.

Compared with the prior art, the intelligent broadcasting voice synthesis system has the advantages that the text analysis module and the broadcasting control module are arranged, the voice synthesis mode and the broadcasting mode of the broadcasting can be set in a targeted mode according to the broadcast text information or the semantic content of the broadcast voice information, and the voice synthesis system for intelligent broadcasting can effectively ensure that the voice synthesis and the audio playing can be carried out in the targeted voice synthesis mode according to the type of the corresponding broadcast information text.

Furthermore, the invention carries out semantic and grammar analysis on the received broadcast text by arranging the text analysis module to determine the broadcast information type of the broadcast text and determine the corresponding voice synthesis mode according to the identified broadcast text broadcast information type, judges the broadcast information type according to the semantic of the broadcast text at first, and ensures that the broadcast after single voice synthesis meets the voice synthesis time requirement by prejudging the voice synthesis time when the semantic of the broadcast text is identified as the broadcast information type with timeliness, thereby effectively ensuring that the invention can carry out voice synthesis and broadcast playing according to the broadcast information type of the identified broadcast text.

Furthermore, by setting a speech synthesis time standard corresponding to each broadcast information type, the invention judges how the broadcast text is to be processed. Setting time standards that match the broadcast information type supports the subsequent judgment of the speech synthesis mode of the broadcast text, so that the judgment is made against the time standard corresponding to the broadcast information type of the broadcast text and is targeted to the different broadcast information types.

Furthermore, by setting the timeout percentage standards and the text compression coefficients, the invention determines the compression amount of the broadcast text content when the key content of the broadcast text needs to be synthesized in the accelerated speech synthesis mode. The timeout percentage standards are used to judge how far the speech synthesis time of the broadcast text exceeds its standard, and the text compression coefficient corresponding to each timeout percentage standard is used to adjust the compression amount. The compression amount is thus adjusted according to the actual timeout of the broadcast text, so that the adjusted content meets the time standard while retaining a high degree of agreement with the original text. As a result, when the accelerated speech synthesis mode is adopted, the speech synthesis time requirement is met and the content of the synthesized voice audio file remains faithful to the content originally to be broadcast.

Furthermore, by setting the accelerated loss time calculation coefficient, the invention adjusts the speech synthesis time standard of a broadcast key word text generated in the accelerated speech synthesis mode to produce an accelerated synthesis time standard, and determines whether the generated broadcast key word text meets the acceleration standard by comparing its predicted speech synthesis time with the accelerated synthesis time standard. In this way, after the system compresses the broadcast text in the accelerated speech synthesis mode, the time consumed by the compression itself is taken into account through the accelerated loss time calculation coefficient when setting the time standard for the subsequent judgment, and whether the broadcast text needs further compression is decided by judging whether the generated broadcast key word text meets the accelerated synthesis time standard.

Further, by providing the speech synthesis mode determination logic, the invention adopts different speech synthesis modes for broadcast texts of different broadcast information types. When the fourth speech synthesis time standard T4 is adopted as the speech synthesis time standard for a certain broadcast text, the broadcast control module controls the broadcast voice output end to play a preset emergency notification audio to raise the attention of personnel, and at the same time controls the text analysis module to identify the word count of the broadcast text and determines the playing mode for the broadcast text accordingly. Judging the word count directly when the broadcast information type is an immediate notification avoids the long processing time that predicting the speech synthesis time would incur. By simplifying the determination steps for the speech synthesis and playing of immediate notifications, the invention ensures that immediate notifications reach the audience in the fastest broadcast mode, and that the speech synthesis system for intelligent broadcasting can apply a speech synthesis mode targeted to the broadcast information type of the identified broadcast information text.

Furthermore, the voice synthesis system for intelligent broadcasting disclosed by the invention has the advantages that by setting the broadcast volume and the broadcast speed adjusting coefficient of the broadcast audio corresponding to the broadcast information type, when an emergency notification or an instant notification is identified, the broadcast is carried out on audience people by adjusting the broadcast volume and the audio play speed to be higher, so that the broadcast can achieve a better notification effect, and the voice synthesis system for intelligent broadcasting disclosed by the invention can be further effectively ensured to carry out targeted broadcast playing according to the broadcast information type of the identified broadcast information text.

Drawings

FIG. 1 is a block diagram showing the structure of a speech synthesis system for smart broadcasting according to the present invention;

FIG. 2 is a schematic diagram of the operation of the text analysis module of the present invention;

FIG. 3 is a logic diagram of determining a speech synthesis method according to the present invention.

Detailed Description

In order that the objects and advantages of the invention will be more clearly understood, the invention is further described below with reference to examples; it should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and do not limit the scope of the present invention.

It should be noted that in the description of the present invention, the terms of direction or positional relationship indicated by the terms "upper", "lower", "left", "right", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, which are only for convenience of description, and do not indicate or imply that the device or element must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention.

Furthermore, it should be noted that, in the description of the present invention, unless otherwise explicitly specified or limited, the terms "mounted," "coupled," and "connected" are to be construed broadly: a connection may be, for example, fixed, detachable, or integral; it may be mechanical or electrical; and it may be direct, indirect through an intervening medium, or an internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation.

Referring to fig. 1, which is a block diagram illustrating a speech synthesis system for smart broadcasting according to the present invention, the speech synthesis system for smart broadcasting according to the present invention includes:

a broadcast information input end, which is connected to a local network and to the cloud internet respectively, and which is used for receiving broadcast text information or broadcast voice information to be broadcast and converting the received information into broadcast text in a preset format;

a text analysis module, which is connected with the broadcast information input end and is used for analyzing the semantics of the broadcast text to determine its broadcast information type and for predicting the speech synthesis time of the broadcast text; the text analysis module can correct sentence errors in the broadcast text by identifying its grammar, and can compress the broadcast text through semantic recognition to form a corresponding broadcast highlight text (retaining the important content) or broadcast key text (retaining only the key information);

a voice synthesis module, which is connected with the broadcast information input end and the text analysis module respectively, and which is used for performing speech synthesis on the received broadcast text processed by the text analysis module to form a voice audio file;

By providing the text analysis module and the broadcast control module, the invention can set the speech synthesis mode and the playback mode of a broadcast in a targeted manner according to the semantic content of the broadcast text information or broadcast voice information. This ensures that the speech synthesis system for intelligent broadcasting performs speech synthesis and audio playback in a mode suited to the type of the corresponding broadcast text.

Referring to FIG. 2, which is a schematic diagram of the operation of the text analysis module of the present invention, the text analysis module includes a semantic analysis module and a word processing module,

Specifically, the broadcast information types, classified by broadcast urgency, include general notification, time-sensitive notification, emergency notification, and immediate notification.

By providing the text analysis module, the invention performs semantic and grammatical analysis on the received broadcast text to determine its broadcast information type, and determines the corresponding speech synthesis mode from the identified type. The broadcast information type is first judged from the semantics of the broadcast text; when the type is one with a timeliness requirement, the speech synthesis time is predicted in advance so that a single pass of speech synthesis meets the speech synthesis time requirement. This ensures that the invention performs speech synthesis and broadcast playback according to the broadcast information type identified for the broadcast text.

Specifically, the broadcast control module is provided with a first speech synthesis time standard T1, a second speech synthesis time standard T2, a third speech synthesis time standard T3 and a fourth speech synthesis time standard T4, wherein 120 min > T1 > 60 min > T2 > 30 min > T3 > 3 min > T4 > 0.5 min. When the broadcast control module identifies that the broadcast information input end has received broadcast text information or broadcast voice information to be broadcast, it controls the broadcast information input end to convert the received information into broadcast text in a preset format and to transmit that text to the text analysis module. The broadcast control module then determines the speech synthesis time standard corresponding to the broadcast text according to the broadcast information type identified by the text analysis module, so as to judge the processing condition of the broadcast text,

Setting a speech synthesis time standard for each broadcast information type allows the processing condition of a broadcast text to be judged against the standard that matches its type. This supports the subsequent determination of the speech synthesis mode and allows that determination to be targeted to the different broadcast information types.
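The selection of a time standard by broadcast information type can be sketched as a simple lookup. This is a minimal illustration, not the patented implementation: the numeric values are arbitrary picks inside the stated ranges, and only the pairing of general notification with T1 is stated in the claims; the other three pairings are an assumption that follows the urgency order. All identifiers are hypothetical.

```python
from enum import Enum


class BroadcastType(Enum):
    GENERAL = "general notification"
    TIME_SENSITIVE = "time-sensitive notification"
    EMERGENCY = "emergency notification"
    IMMEDIATE = "immediate notification"


# Example values in minutes (assumed), chosen only to satisfy
# 120 > T1 > 60 > T2 > 30 > T3 > 3 > T4 > 0.5.
T1, T2, T3, T4 = 90.0, 45.0, 10.0, 1.0


def validate_time_standards(t1: float, t2: float, t3: float, t4: float) -> bool:
    """Check the ordering constraint given for the four time standards."""
    return 120 > t1 > 60 > t2 > 30 > t3 > 3 > t4 > 0.5


# GENERAL -> T1 is stated in the claims; the remaining pairings are assumed.
TIME_STANDARD = {
    BroadcastType.GENERAL: T1,
    BroadcastType.TIME_SENSITIVE: T2,  # assumption
    BroadcastType.EMERGENCY: T3,       # assumption
    BroadcastType.IMMEDIATE: T4,       # assumption
}


def time_standard_for(broadcast_type: BroadcastType) -> float:
    """Return the speech synthesis time standard (minutes) for a type."""
    return TIME_STANDARD[broadcast_type]
```

Keeping the constraint check separate from the lookup lets a deployment reject a misconfigured set of standards before any broadcast is processed.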

Referring to FIG. 3, which is a diagram of the speech synthesis mode determination logic of the present invention, the broadcast control module is provided with speech synthesis mode determination logic, which determines the speech synthesis mode of a broadcast text from the broadcast information type of the broadcast text and the corresponding speech synthesis time standard. The speech synthesis mode determination logic includes:

when the broadcast control module determines to adopt the i-th speech synthesis time standard Ti as the speech synthesis time standard of a broadcast text, where i = 1, 2, 3, the broadcast control module controls the text analysis module to calculate the predicted speech synthesis time t1 of the broadcast text and compares t1 with Ti to determine the speech synthesis mode of the broadcast text,

Specifically, the broadcast control module is provided with a first timeout percentage standard A1, a second timeout percentage standard A2, a first text compression coefficient α1, a second text compression coefficient α2 and a third text compression coefficient α3, wherein 100% < A1 < 200% < A2 < 300% and 0.3 < α3 < 0.5 < α2 < 0.8 < α1 < 1. When the broadcast control module determines to synthesize the important content of the broadcast text in the accelerated speech synthesis mode, it determines the compression amount of the broadcast text content, and thereby obtains the important content of the broadcast text, according to the ratio a of t1 to the speech synthesis time standard Ti, where a = t1/Ti,

when a ≤ A1, the broadcast control module judges that the speech synthesis time of the broadcast text is below the timeout standard and adjusts the compression amount of the broadcast text content using the first text compression coefficient α1;

By setting the timeout percentage standards and the text compression coefficients, the invention determines how much of the broadcast text content to compress when the important content of the broadcast text is to be synthesized in the accelerated speech synthesis mode. The timeout percentage standards grade how far the predicted speech synthesis time exceeds the time standard, and each grade has a corresponding text compression coefficient, so the compression amount follows the actual degree of overrun. The adjusted broadcast text thus meets the time standard while staying as faithful to the original text as possible, so that in the accelerated speech synthesis mode the system satisfies the speech synthesis time requirement and the content of the synthesized voice audio file remains close to the content originally to be broadcast.
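The coefficient selection can be sketched as follows. Only the first branch (a ≤ A1 selects α1) appears in the text above; the other two branches, the default values, and the reading of the coefficient as the fraction of the text to retain are assumptions made for illustration. The function and parameter names are hypothetical.

```python
def select_compression_coefficient(
    t1: float,
    ti: float,
    a1: float = 1.5,      # assumed value with 100% < A1 < 200%
    a2: float = 2.5,      # assumed value with 200% < A2 < 300%
    alpha1: float = 0.9,  # assumed value with 0.8 < alpha1 < 1
    alpha2: float = 0.6,  # assumed value with 0.5 < alpha2 < 0.8
    alpha3: float = 0.4,  # assumed value with 0.3 < alpha3 < 0.5
) -> float:
    """Pick a text compression coefficient from the ratio a = t1 / Ti."""
    a = t1 / ti
    if a <= a1:
        return alpha1  # stated in the description
    if a <= a2:
        return alpha2  # assumption: middle band uses the second coefficient
    return alpha3      # assumption: largest overrun uses the third coefficient


def target_length(word_count: int, alpha: float) -> int:
    """Assumed reading: alpha is the fraction of the original text retained."""
    return round(word_count * alpha)
```

For example, a text predicted to take 12 minutes against a 10 minute standard gives a = 1.2, which falls in the first band and retains most of the original wording.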

Specifically, the broadcast control module is provided with an accelerated loss time calculation coefficient μ, where 0.6 < μ < 0.9. When the broadcast control module determines to control the text analysis module to generate the broadcast highlight text (the compressed text retaining the important content) in the accelerated speech synthesis mode, the broadcast control module controls the text analysis module to calculate the expected speech synthesis time t2 of the generated broadcast highlight text and to compare t2 with the accelerated synthesis time standard Ti' corresponding to that text, so as to determine whether the generated text meets the acceleration standard, where Ti' = Ti × μ,

when t2 > Ti', the broadcast control module judges that the generated broadcast highlight text does not meet the acceleration standard and that the acceleration is invalid; the broadcast control module determines to compress the broadcast text again to form a broadcast key text, and controls the text analysis module to transmit the generated broadcast key text to the voice synthesis module for speech synthesis and subsequent playback.

By setting the accelerated loss time calculation coefficient, the invention adjusts the speech synthesis time standard applied to text generated in the accelerated speech synthesis mode, producing the accelerated synthesis time standard. Because compressing the text itself takes time, the coefficient reduces the time standard used in the subsequent judgment. Whether the compressed text generated in the accelerated mode meets the acceleration standard is determined by comparing its expected speech synthesis time with the accelerated synthesis time standard, and this judgment decides whether the broadcast text must be processed by further text compression.
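The acceleration check reduces to one multiplication and one comparison. The sketch below assumes times in minutes and uses hypothetical function names; it treats "meets the acceleration standard" as the logical complement of the stated failure condition t2 > Ti'.

```python
def accelerated_time_standard(ti: float, mu: float) -> float:
    """Return Ti' = Ti * mu, with mu restricted to the stated range."""
    if not 0.6 < mu < 0.9:
        raise ValueError("mu must satisfy 0.6 < mu < 0.9")
    return ti * mu


def meets_acceleration_standard(t2: float, ti: float, mu: float) -> bool:
    """True unless t2 > Ti', the condition under which acceleration is invalid."""
    return t2 <= accelerated_time_standard(ti, mu)
```

With Ti = 10 and μ = 0.8, the accelerated standard is 8 minutes, so a compressed text predicted at 9 minutes would be sent back for further compression, while one predicted at 7 minutes would pass.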

Specifically, the semantic analysis module is provided with a key information extraction method for extracting key information of a broadcast text, and when the broadcast control module determines to re-compress the broadcast text to form the broadcast key text, the broadcast control module controls the semantic analysis module to extract the key information of the broadcast text by using the key information extraction method to generate the broadcast key text;

As shown in fig. 3, the speech synthesis mode determining logic further includes:

and when Q is greater than Q0, the broadcast control module judges that the number of words of the broadcast text is large, and controls the text analysis module to generate a broadcast key text of the broadcast text and controls the voice synthesis module to perform voice synthesis on the broadcast key text.

By providing the speech synthesis mode determination logic, the invention adopts different speech synthesis modes for broadcast texts of different broadcast information types. When the fourth speech synthesis time standard T4 is adopted as the speech synthesis time standard of a broadcast text, the broadcast control module controls the broadcast voice output end to play a prefabricated emergency notification audio clip to raise the attention of the audience. At the same time, when the broadcast information type of the broadcast text is judged to be immediate notification, the broadcast control module controls the text analysis module to identify the word count of the broadcast text and judges that word count directly, which avoids the long processing time caused by predicting the speech synthesis time. Simplifying the judgment steps in this way allows immediate-notification broadcasts to reach the audience in the fastest broadcast mode, and ensures that the system applies a targeted speech synthesis mode according to the broadcast information type identified for the broadcast text.
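The word-count shortcut for immediate notifications can be sketched as below. The threshold Q0 is a preset whose value is not given, so it is passed in as a parameter; the whitespace-based word count and the `extract_key_text` callable are stand-ins (assumptions) for the text analysis module, and for Chinese text a character count would likely be the closer analogue.

```python
from typing import Callable


def text_for_immediate_notification(
    text: str,
    q0: int,
    extract_key_text: Callable[[str], str],
) -> str:
    """Choose what to synthesize for an immediate notification.

    Q > Q0: synthesize the broadcast key text.
    Q <= Q0: synthesize the full broadcast text.
    """
    q = len(text.split())  # assumed word-count measure
    if q > q0:
        return extract_key_text(text)
    return text
```

Because the decision depends only on a count, no speech synthesis time prediction is needed on this path, which matches the stated aim of shortening processing for immediate notifications.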

Specifically, the broadcast control module is provided with a first broadcast type play adjustment coefficient β1, a second broadcast type play adjustment coefficient β2, a third broadcast type play adjustment coefficient β3 and a fourth broadcast type play adjustment coefficient β4, wherein 0.8 < β1 < 1 < β2 < 1.1 < β3 < 1.3 < β4 < 1.5. When the voice synthesis module completes speech synthesis of the received text and generates the voice audio corresponding to a broadcast text, the broadcast control module determines the playback mode of the current broadcast according to the broadcast information type of the broadcast text corresponding to the voice audio,

By setting a playback volume and playback speed adjustment coefficient for the broadcast audio according to the broadcast information type, the invention plays the audio to the audience at a higher volume and playback speed when an emergency notification or immediate notification is identified, so that the broadcast achieves a better notification effect. This further ensures that the system performs targeted broadcast playback according to the broadcast information type identified for the broadcast text.
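Claim 8 gives the adjustment as B' = B0 × βk and H' = H0 × βk, where B0 and H0 are the preset initial volume and playback speed. A minimal sketch of that scaling follows; the coefficient values are illustrative picks between 0.8 and 1.5 and the units of B0 and H0 are left open, since neither is specified. Names are hypothetical.

```python
from typing import Tuple

# Illustrative coefficients only (assumed), indexed by k = 1..4.
EXAMPLE_BETA = {1: 0.9, 2: 1.05, 3: 1.2, 4: 1.4}


def adjust_playback(b0: float, h0: float, beta_k: float) -> Tuple[float, float]:
    """Return (B', H') where B' = B0 * beta_k and H' = H0 * beta_k."""
    return b0 * beta_k, h0 * beta_k
```

Using one coefficient for both volume and speed keeps the configuration to a single value per broadcast type, at the cost of not being able to tune loudness and pace independently.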

Specifically, the voice synthesis system for smart broadcasting further includes a voice synthesis database module and a voice synthesis control module, wherein,

Specifically, the speech synthesis language categories include Chinese synthesis, English synthesis, idiom special synthesis, and dialect synthesis, and the speech synthesis anchor types include a news broadcast type, a sales promotion type, and a soothing and comforting type;

Specifically, the broadcast information input terminal includes an information receiving unit and a format conversion unit, wherein,

So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention; various modifications and alterations to this invention will become apparent to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A speech synthesis system for smart broadcasting, comprising:

2. The speech synthesis system for smart broadcasting of claim 1, wherein the text analysis module includes a semantic analysis module and a word processing module,

the word processing module is respectively connected with the semantic analysis module and the voice synthesis module and is used for correcting sentence errors in the broadcast text through syntactic analysis so that its sentences conform to language expression conventions; the word processing module can determine the predicted speech synthesis time of the broadcast text by analyzing the passages of the broadcast text, and can generate the corresponding broadcast highlight text and broadcast key text from the important sentences and key sentences of the broadcast text determined by the semantic analysis module;

the broadcast information types, classified by broadcast urgency, include general notification, time-sensitive notification, emergency notification, and immediate notification.

3. The speech synthesis system for intelligent broadcasting of claim 2, wherein the broadcast control module is provided with a first speech synthesis time standard T1, a second speech synthesis time standard T2, a third speech synthesis time standard T3 and a fourth speech synthesis time standard T4, wherein 120 min > T1 > 60 min > T2 > 30 min > T3 > 3 min > T4 > 0.5 min, when the broadcast control module recognizes that the broadcast information input end receives broadcast text information or broadcast voice information to be broadcast, the broadcast control module controls the broadcast information input end to convert the received broadcast text information or broadcast voice information into broadcast text in a preset format and to transmit the broadcast text in the preset format to the text analysis module, and the broadcast control module determines the corresponding speech synthesis time standard according to the broadcast information type of the broadcast text recognized by the text analysis module, so as to judge the processing condition of the broadcast text,

when the broadcast information type of the broadcast text is general notification, the broadcast control module judges that a first speech synthesis time standard T1 is adopted as the speech synthesis time standard of the broadcast text;

4. The system of claim 3, wherein the broadcast control module is configured with a speech synthesis method determination logic, the speech synthesis method determination logic determines the speech synthesis method of the broadcast text by comparing the broadcast information type of the broadcast text with a corresponding speech synthesis time standard, and the speech synthesis method determination logic comprises:

5. The speech synthesis system for intelligent broadcasting according to claim 4, wherein the broadcast control module is provided with a first timeout percentage criterion A1, a second timeout percentage criterion A2, a first text compression factor α 1, a second text compression factor α 2, and a third text compression factor α 3, wherein 100% < A1 < 200% < A2 < 300%, 0.3 < α 3 < 0.5 < α 2 < 0.8 < α 1 < 1, and when the broadcast control module determines to speech-synthesize the highlight of the broadcast text in an accelerated speech synthesis manner, the broadcast control module determines the compression amount of the broadcast text content according to the ratio a of t1 to its speech synthesis time criterion Ti to obtain the highlight of the broadcast text, and sets a = t1/Ti,

6. The speech synthesis system for intelligent broadcasting according to claim 5, wherein the broadcast control module is provided with an accelerated loss time calculation coefficient μ, where 0.6 < μ < 0.9, and when the broadcast control module determines to control the text analysis module to generate the broadcast highlight text in the accelerated speech synthesis mode, the broadcast control module controls the text analysis module to calculate an expected speech synthesis time t2 of the generated broadcast highlight text and to compare t2 with the accelerated synthesis time standard Ti' corresponding to that text, so as to determine whether the generated broadcast highlight text meets the acceleration standard, where Ti' = Ti × μ,

7. The system of claim 4, wherein the speech synthesis mode decision logic further comprises:

when Q is less than or equal to Q0, the broadcast control module judges that the number of words of the broadcast text is small, and the broadcast control module controls the voice synthesis module to perform voice synthesis on all contents of the broadcast text;

8. The speech synthesis system for intelligent broadcasting according to claim 6 or 7, wherein the broadcast control module is provided with a first broadcast type play adjustment coefficient β1, a second broadcast type play adjustment coefficient β2, a third broadcast type play adjustment coefficient β3, and a fourth broadcast type play adjustment coefficient β4, where 0.8 < β1 < 1 < β2 < 1.1 < β3 < 1.3 < β4 < 1.5, and when the speech synthesis module completes speech synthesis of a received broadcast text and generates the voice audio corresponding to that broadcast text, the broadcast control module determines the playback mode of the current broadcast according to the broadcast information type of the broadcast text corresponding to the voice audio,

when the broadcast control module determines that the playback volume and the playback speed of the voice audio are adjusted by using the kth broadcast type playback adjustment coefficient β k, the broadcast control module records the adjusted playback volume of the voice audio as B ', records the adjusted playback speed of the voice audio as H', and sets B '= B0 × β k and H' = H0 × β k, where B0 is a preset broadcast initial volume and H0 is a preset broadcast initial voice playback speed.

9. The voice synthesis system for smart broadcasting of claim 8, further comprising a voice synthesis database module and a voice synthesis control module, wherein,

10. The speech synthesis system for smart broadcasting of claim 9, wherein the broadcast information input terminal includes an information receiving unit and a format conversion unit, wherein,