CN104050962B

CN104050962B - Multifunctional reader based on speech synthesis technique

Info

Publication number: CN104050962B
Application number: CN201310083450.7A
Authority: CN
Inventors: 李军; 李启超; 窦超强; 袁文斌
Original assignee: Guangdong Heng Electrical Information Polytron Technologies Inc
Current assignee: Guangdong Heng electrical information Polytron Technologies Inc
Priority date: 2013-03-16
Filing date: 2013-03-16
Publication date: 2019-02-12
Anticipated expiration: 2033-03-16
Also published as: CN104050962A

Abstract

The invention discloses a multifunctional reader based on speech synthesis technology, characterized in that it comprises an electronic reader module connected to a user client, a speech synthesis server module, and a microprocessor running an embedded Linux system. The electronic reader module is connected to the speech synthesis server module, and the electronic reader module and the speech synthesis server module are each connected to the microprocessor. The invention has a simple structure, is easy to use and operate, consumes little power, and saves cost.

Description

Multifunctional reader based on speech synthesis technique

Technical field

The present invention relates to the technical field of intelligent speech synthesis, and more particularly to a reader based on speech synthesis technology.

Background art

With the continued development and spread of embedded products at home and abroad, embedded technology is becoming ever more closely tied to people's lives. In embedded devices such as e-books, mobile phones, intelligent toys, information appliances and in-vehicle GPS units, speech synthesis (TTS, Text-To-Speech) technology is being applied more and more widely.

Using the Global Positioning System (GPS) to provide road-condition and location information is a major trend in the transport industry, and GPS has become almost standard vehicle equipment. With TTS added to an in-vehicle GPS unit, a driver whose eyes and hands are occupied can listen by voice to real-time traffic information, notices and announcements, and can receive owner-customized information in time, which upgrades flat, display-only navigation to three-dimensional voice navigation.

Blind people are a disadvantaged group in our society. They cannot read ordinary books and newspapers and can only learn by touch-reading thick Braille books, but Braille books are bulky, costly and carry little information, which greatly reduces the efficiency with which blind people acquire knowledge and information. Integrating TTS into an e-book lets the e-book be listened to as well as viewed, which makes reading convenient for blind people.

With the development of the mobile communications industry, mobile phones are increasingly widespread and demand is multiplying. As mobile communication terminals, phones are developing toward miniaturization, multifunctionality and personalization. A phone with TTS can announce the number of an incoming call, summarize e-mail content, give calendar reminders, and read out network information. Embedded TTS can also be widely used in many other fields such as intelligent voice toys and measuring instruments. Judging from this trend, the broad application of TTS on embedded devices has become irreversible.

Although most e-book products currently on the market incorporate TTS, they can only read aloud files in txt format, and e-books in txt format are few, which greatly limits the usefulness of TTS. Moreover, every book must be read aloud from the first line of the first page each time; the user has no choice and cannot specify arbitrary content to be read aloud, so the design is not user-friendly.

On the other hand, after many years of development, speech synthesis technology has matured and already has considerable practical value. But as its fields of application keep expanding, higher demands are placed on TTS technology itself. Some special fields need different speaking styles, and some settings may also need local dialects or speech synthesized with a certain emotion. As far as current TTS output is concerned, results are fairly good for relatively regular text such as general prompts and news, but unsatisfactory for text that carries rich prosodic and emotional information, such as rising and falling rhythm and modulated intonation (poetry, for example). In other words, although present TTS systems basically meet requirements for intelligibility, their naturalness still falls far short of what people expect, and the synthesized speech has an obvious machine flavor. A TTS system that can truly replace a human reader has not yet appeared, which also restricts wider use of TTS systems.

Summary of the invention

The object of the invention is to overcome the deficiencies of the prior art by providing a multifunctional reader based on speech synthesis technology that is simple in structure, easy to use, and able to widen the range of uses of speech synthesis in e-books.

To achieve the above object, the present invention adopts the following technical solution: a multifunctional reader based on speech synthesis technology, characterized in that it comprises an electronic reader module connected to a user client, a speech synthesis server module, and a microprocessor running an embedded Linux system. The electronic reader module is connected to the speech synthesis server module, and the electronic reader module and the speech synthesis server module are each connected to the microprocessor.

The electronic reader module comprises a display screen, a communication module, a UI module, an interaction module and a signal processing module. The communication module establishes the connection with the speech synthesis server module, sends requests and receives voice data; the UI module handles file display and the layout of window components; the interaction module carries out the user's operations; and the signal processing module installs the signal handler functions and thereby implements signal handling.

The speech synthesis server module comprises an initialization module, a concurrent service module, a processing module, an I/O module and a speech synthesis module. The initialization module initializes the server process as a daemon, creates a Unix domain socket, reads the protocol address from a configuration file to initialize the socket, and then installs the signal handler functions to complete initialization. In the concurrent service module, once initialization is complete the server enters an endless loop and calls the accept function inside the loop body, which puts it to sleep; when a client connection request arrives, accept returns, and for each client the main process calls fork to spawn a child process that serves that client. The processing module parses request packets, performs the processing appropriate to each request, and packs the result into a packet. The I/O module reads data from the client and sends data to the client. The speech synthesis module consists of three parts (text analysis, prosody processing and speech synthesis) and performs the text-to-speech conversion.

As a further refinement of the above scheme, the microprocessor is provided with a USB interface and is configured with USB peripheral drivers.

The microprocessor uses Linux as the host operating system, on which a cross-compilation system is built.

The beneficial effects that the present invention can achieve with the above technical solution are:

1. Based on speech synthesis technology, the present invention develops an embedded e-book voice reader that reads content aloud by TTS, so that users can obtain the information they want. An RSS reader developed for the Windows platform can read subscribed content aloud, and can also synthesize the subscribed content into speech and export it as a file in wav format for the user's later needs.

2. The e-book voice reader is small, lightweight, able to store the books required, easy to carry, low in power consumption, cost-saving and reusable.

3. Using the embedded Qt graphics framework and a self-implemented speech synthesis algorithm, the present invention develops a portable e-book voice reader with integrated software and hardware. It can browse and read aloud files in txt, pdf and html formats and supports Mandarin, Cantonese and English. Compared with e-book products on the market, its greatest advantages are that it reads aloud files in multiple formats and reads aloud selected text, which widens the range of uses of speech synthesis in e-books, and that it supports reading in Cantonese.

4. A major feature of the invention is that TTS is built into the RSS reader. Mainstream RSS readers currently do not provide TTS, yet TTS is very well suited to an RSS reader: RSS content is concise and information-rich, carries no distracting material such as advertisements, and is subscribed to only because the user wants it. With TTS reading it aloud, users hear the information they really want. The reader also provides the user-friendly function of synthesizing subscribed content into speech and exporting it to a wav file, so that users can "listen to" their subscriptions anytime and anywhere by playing the exported file.

Brief description of the drawings

Fig. 1 is a schematic diagram of the overall structure of the invention;

Fig. 2 is a schematic diagram of the electronic reader module;

Fig. 3 is a functional block diagram of the speech synthesis server;

Fig. 4 is a flow chart of speech synthesis;

Fig. 5 is a schematic diagram of the interaction between the server and the client of the invention.

Description of reference numerals: 1, electronic reader module; 1-1, communication module; 1-2, UI module; 1-3, interaction module; 1-4, signal processing module; 2, speech synthesis server module; 2-1, initialization module; 2-2, concurrent service module; 2-3, processing module; 2-4, I/O module; 2-5, speech synthesis module; 3, microprocessor; 4, user client.

Detailed description of the embodiments

As shown in Figures 1 to 5, the multifunctional reader based on speech synthesis technology of the present invention comprises an electronic reader module 1 connected to a user client 4, a speech synthesis server module 2, and a microprocessor 3 running an embedded Linux system. The electronic reader module 1 is connected to the speech synthesis server module 2, and the electronic reader module 1 and the speech synthesis server module 2 are each connected to the microprocessor 3, so that files in multiple formats can be read aloud and selected text can be read aloud. The speech synthesis server module can browse and read aloud files in txt, pdf and html formats, and can automatically synthesize Mandarin, Cantonese and English.

The electronic reader module comprises a display screen, a communication module 1-1, a UI module 1-2, an interaction module 1-3 and a signal processing module 1-4. The communication module establishes the connection with the speech synthesis server module, sends requests and receives voice data; the UI module handles file display and the layout of window components; the interaction module carries out the user's operations; and the signal processing module installs the signal handler functions and thereby implements signal handling.

The speech synthesis server module 2 comprises an initialization module 2-1, a concurrent service module 2-2, a processing module 2-3, an I/O module 2-4 and a speech synthesis module 2-5. The initialization module initializes the server process as a daemon, creates a Unix domain socket, reads the protocol address from a configuration file to initialize the socket, and then installs the signal handler functions to complete initialization. In the concurrent service module, once initialization is complete the server enters an endless loop and calls the accept function inside the loop body, which puts it to sleep; when a client connection request arrives, accept returns, and for each client the main process calls fork to spawn a child process that serves that client. The processing module parses request packets, performs the processing appropriate to each request, and packs the result into a packet. The I/O module reads data from the client and sends data to the client.

In this embodiment, the UI module uses Qt, a cross-platform C++ graphical user interface application framework. The fonts bundled with Qt do not display Chinese well in an embedded environment, so the reader uses a third-party font to support Chinese display. Qt supports fonts in four formats: TTF, BDF, PFA/PFB and QPF. If PFA/PFB is used directly on the embedded development board, the application computes the glyph bitmaps only at display time, and the final result is unsatisfactory, with characters of inconsistent size. Such fonts also occupy a great deal of flash and memory and load slowly. BDF fonts have the same problems. QPF fonts load quickly but do not support scaling. TTF fonts support scaling and also load quickly, so the reader uses fonts in TTF format.

As shown in Figure 3 and Figure 4, the speech synthesis module 2-5 consists of three parts, namely text analysis, prosody processing and speech synthesis, and performs the text-to-speech conversion. The main work of this process is to break the input text down into phonemes, character by character or word by word, to analyze symbols in the text that need special handling, such as numbers, currency units, word inflections and punctuation, and to generate digital audio from the phonemes, which is then played through a loudspeaker or saved as an audio file to be played later with multimedia software. The main tasks of text analysis are:

1. Preprocess the input text to normalize it. Specifically, redundant spaces and line breaks are removed, and punctuation marks are converted into special marks; for example, punctuation marks that call for the same pause length and the same tone change are all converted to one common mark. After normalization, word segmentation becomes much easier.

2. Word segmentation. The normalized text is cut into individual characters and words. Segmentation is an important part of text analysis, and the simplest method is dictionary lookup. With this method a sentence is scanned once from left to right: a word found in the dictionary is recognized, a compound word is matched against the longest dictionary entry, and an unrecognized string is split into single-character words. This completes a simple segmentation.

3. Convert the segmented text into phonetic-symbol form and add control characters. For each character and word produced by segmentation, the corresponding phonetic symbols are looked up and then joined together, which completes the conversion from text to phonetic symbols.
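The normalization and dictionary-lookup segmentation steps described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the dictionary contents, the punctuation classes, and the names `PUNCT_CLASS`, `DICTIONARY`, `normalize` and `segment` are all assumptions made for the example.

```python
import re

# Hypothetical punctuation classes: marks assumed to share the same pause
# are mapped to one common mark ("," = short pause, "." = long pause).
PUNCT_CLASS = {"，": ",", "、": ",", ",": ",",
               "。": ".", "！": ".", "？": ".", "!": ".", "?": "."}

# Toy dictionary for illustration; a real system would load a large lexicon.
DICTIONARY = {"语音", "合成", "语音合成", "技术", "阅读器"}
MAX_WORD_LEN = max(len(word) for word in DICTIONARY)


def normalize(text):
    """Drop redundant whitespace and unify punctuation into common marks."""
    text = re.sub(r"\s+", "", text)  # Chinese text needs no inter-word spaces
    return "".join(PUNCT_CLASS.get(ch, ch) for ch in text)


def segment(sentence):
    """Scan left to right, always taking the longest dictionary match.

    A string that matches nothing in the dictionary falls back to
    single-character words.
    """
    words = []
    i = 0
    while i < len(sentence):
        longest = min(MAX_WORD_LEN, len(sentence) - i)
        for size in range(longest, 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in DICTIONARY:
                words.append(candidate)
                i += size
                break
    return words
```

For example, `segment(normalize("语音合成 技术，\n很好。"))` keeps the compound word "语音合成" whole rather than splitting it into "语音" and "合成", and splits the unknown string "很好" into single characters.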

The result of text analysis serves as the input to prosody processing. Prosody processing mainly deals with tone, rhythm and the like. For example, when two third tones occur in succession, a tone change (tone sandhi) is applied so that the synthesized speech conveys the meaning correctly and sounds more natural.
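The third-tone rule mentioned above can be sketched as follows. The sketch assumes syllables are written as pinyin with a trailing tone digit (a notation the source does not specify) and applies the standard Mandarin rule that the first of two consecutive third tones is pronounced as a second tone. The function name is hypothetical, and longer runs of third tones, whose treatment depends on phrasing, are handled only in this simple pairwise way.

```python
def apply_third_tone_sandhi(syllables):
    """Change a third tone to a second tone when another third tone follows.

    Each syllable is assumed to be pinyin plus a tone digit, e.g. "ni3".
    The decision is based on the original tones, not on already-changed ones.
    """
    result = list(syllables)
    for i in range(len(syllables) - 1):
        if syllables[i].endswith("3") and syllables[i + 1].endswith("3"):
            result[i] = syllables[i][:-1] + "2"
    return result
```

With this sketch, `apply_third_tone_sandhi(["ni3", "hao3"])` returns `["ni2", "hao3"]`.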

In the final speech synthesis stage, the output of the prosody processing module serves as the input to the speech synthesis module. For each phonetic symbol, the speech synthesis module looks up the corresponding sample in the voice bank and then joins together the samples for all the phonetic symbols of a sentence. During joining, the duration, prosodic features and pause times of the voice samples are adjusted according to the meaning of the control characters, and finally the complete voice data stream for the sentence is output.
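The lookup-and-join step can be sketched as below. This is a simplified illustration under stated assumptions: the voice bank is modelled as an in-memory dictionary of short sample lists rather than wav files, only pause marks are handled among the control characters (duration and prosody adjustment are omitted), and the names `VOICE_BANK`, `PAUSE_SAMPLES` and `synthesize` are hypothetical.

```python
# Toy voice bank: phonetic symbol -> audio samples. In the described system
# these would be recordings stored as wav files; the numbers are placeholders.
VOICE_BANK = {"ni2": [10, 20, 30], "hao3": [40, 50]}

# Hypothetical control marks and the number of silent samples each inserts.
PAUSE_SAMPLES = {",": 2, ".": 4}


def synthesize(symbols):
    """Join the voice-bank samples for each symbol into one data stream."""
    stream = []
    for symbol in symbols:
        if symbol in PAUSE_SAMPLES:
            # A pause mark becomes a stretch of silence (zero samples).
            stream.extend([0] * PAUSE_SAMPLES[symbol])
        elif symbol in VOICE_BANK:
            stream.extend(VOICE_BANK[symbol])
        else:
            raise KeyError(f"no recording for phonetic symbol {symbol!r}")
    return stream
```

Because synthesis here is only table lookup and concatenation, the cost per sentence is linear in the number of symbols, which fits the low-compute embedded setting the description targets.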

Speech synthesis implemented by the above method requires little computation, synthesizes quickly, and produces fairly natural speech, so it is clearly well suited to embedded systems with weaker chip performance. Implementing speech synthesis also requires several preparatory steps: 1. create a mapping table from the Unicode codes of commonly used Chinese characters to phonetic symbols; 2. create a mapping table from the Unicode codes of common phrases to phonetic symbols; 3. create a Chinese voice bank. The voice bank records the pronunciation of all phonetic symbols in Mandarin and Cantonese and consists of wav files, one per character or word.
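One way the two mapping tables could work together is sketched below. The table contents, the pinyin-with-tone-digit notation and the function name are assumptions for illustration. Consulting the phrase table before the character table is also an assumption (the source only says that both tables are created), chosen because a phrase entry can fix the reading of a character that has more than one pronunciation.

```python
# Character table keyed by Unicode code point (illustrative entries only).
CHAR_TO_PHONETIC = {ord("银"): "yin2", ord("行"): "xing2", ord("人"): "ren2"}

# Phrase table: a whole phrase maps to the phonetic symbols of its characters.
PHRASE_TO_PHONETIC = {"银行": ["yin2", "hang2"]}


def to_phonetic(words):
    """Convert segmented words to a flat list of phonetic symbols."""
    symbols = []
    for word in words:
        if word in PHRASE_TO_PHONETIC:
            # The phrase entry wins, so "行" in "银行" is read "hang2".
            symbols.extend(PHRASE_TO_PHONETIC[word])
        else:
            # Fall back to per-character lookup; unknown characters pass through.
            symbols.extend(CHAR_TO_PHONETIC.get(ord(ch), ch) for ch in word)
    return symbols
```

For example, `to_phonetic(["银行", "行", "人"])` yields `["yin2", "hang2", "xing2", "ren2"]`: the same character "行" receives a different reading inside the phrase than on its own.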

The user client and the server communicate using the Unix domain protocol. Compared with TCP between a client and a server on the same host, the advantage of a Unix domain byte-stream socket is better performance.
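The accept-and-fork server loop over a Unix domain socket can be sketched as follows. This is a minimal sketch, not the patent's server: the socket path is hard-coded instead of being read from a configuration file, daemonization is omitted, and the request and reply format in `handle_request` is a placeholder invented for the example.

```python
import os
import signal
import socket

SOCKET_PATH = "/tmp/tts_demo.sock"  # hypothetical; the source reads it from a config file


def handle_request(payload):
    """Placeholder for the processing module: parse a request, build a reply.

    A real server would return synthesized voice data; this sketch only
    acknowledges the length of the received text.
    """
    text = payload.decode("utf-8", errors="replace").strip()
    return f"OK {len(text)}\n".encode("utf-8")


def serve(path=SOCKET_PATH):
    """Concurrent server: one forked child process per client connection."""
    # On Linux, ignoring SIGCHLD lets the kernel reap exited children.
    signal.signal(signal.SIGCHLD, signal.SIG_IGN)
    if os.path.exists(path):
        os.unlink(path)  # remove a stale socket file from an earlier run
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(path)
    listener.listen(5)
    while True:  # endless loop
        conn, _ = listener.accept()  # sleeps until a client connects
        if os.fork() == 0:
            # Child process: serve this one client, then exit.
            listener.close()
            conn.sendall(handle_request(conn.recv(4096)))
            conn.close()
            os._exit(0)
        conn.close()  # parent closes its copy and goes back to accept


if __name__ == "__main__":
    serve()
```

A client would connect with `socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)` and the same path. Forking per client keeps the parent loop simple at the cost of one process per connection, which is acceptable when the number of simultaneous clients is small.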

The electronic reader module implements the user interface in QML, with JavaScript code embedded in the QML implementing the page logic, and speech synthesis is implemented through the speech interface provided by Microsoft Speech SDK 5.1. Microsoft Speech SDK provides SAPI, a set of application programming interfaces for speech processing. SAPI supplies the basic functions needed to implement a TTS program and provides a high-level interface between the application and the speech engine. SAPI implements all the low-level details required for real-time control and management of the various speech engines. The application can control text-to-speech synthesis through the Component Object Model (COM) interface of ISpVoice.

The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and all of these fall within the protection scope of the present invention.

Claims

1. A multifunctional reader based on speech synthesis technology, characterized in that it comprises an electronic reader module connected to a user client, a speech synthesis server module, and a microprocessor running an embedded Linux system; the electronic reader module is connected to the speech synthesis server module; the electronic reader module and the speech synthesis server module are each connected to the microprocessor, so as to read aloud files in multiple formats and to read aloud selected text; the speech synthesis server module can browse and read aloud files in txt, pdf and html formats, and can automatically synthesize Mandarin, Cantonese and English;

the electronic reader module comprises a display screen, a communication module, a UI module, an interaction module and a signal processing module; the communication module establishes the connection with the speech synthesis server module, sends requests and receives voice data; the UI module handles file display and the layout of window components; the interaction module carries out the user's operations; the signal processing module installs the signal handler functions and thereby implements signal handling;

the speech synthesis server module comprises an initialization module, a concurrent service module, a processing module, an I/O module and a speech synthesis module; the initialization module initializes the server process as a daemon, then creates a Unix domain socket, reads the protocol address from a configuration file to initialize the socket, and then installs the signal handler functions to complete initialization; in the concurrent service module, once initialization is complete the server enters an endless loop and calls the accept function inside the loop body, which puts it to sleep; when a client connection request arrives, the accept function returns, and for each client the main process calls fork to spawn a child process that serves that client; the processing module parses request packets, performs the processing appropriate to each request, and packs the result into a packet; the I/O module reads data from the client and sends data to the client; the UI module uses Qt, a cross-platform C++ graphical user interface application framework; Qt supports fonts in the four formats TTF, BDF, PFA/PFB and QPF; the reader uses fonts in TTF format;

the speech synthesis module consists of three parts, namely text analysis, prosody processing and speech synthesis, and performs the text-to-speech conversion; the work of this process is to break the input text down into phonemes, character by character or word by word, to analyze symbols in the text that need special handling, namely numbers, currency units, word inflections and punctuation, and to generate digital audio from the phonemes, which is then played through a loudspeaker or saved as an audio file to be played later with multimedia software; the main tasks of text analysis are: first, preprocessing the input text to normalize it, which specifically includes removing redundant spaces and line breaks and converting punctuation marks into special marks, including converting punctuation marks that call for the same pause length and the same tone change into one common mark; second, word segmentation, in which the normalized text is cut into individual characters and words; segmentation is implemented by dictionary lookup, in which a sentence is scanned once from left to right, a word found in the dictionary is recognized, a compound word is matched against the longest dictionary entry, and an unrecognized string is split into single-character words; third, converting the segmented text into phonetic-symbol form and adding control characters, in which the phonetic symbols corresponding to each character and word produced by segmentation are looked up and then joined together, completing the conversion from text to phonetic symbols; the result of text analysis serves as the input to prosody processing; prosody processing mainly deals with tone and rhythm, including applying a tone change when two third tones occur in succession so that the synthesized speech conveys the meaning correctly;

in the final speech synthesis stage, the output of the prosody processing module serves as the input to the speech synthesis module; for each phonetic symbol, the speech synthesis module looks up the corresponding sample in the voice bank and then joins together the samples for all the phonetic symbols of a sentence; during joining, the duration, prosodic features and pause times of the voice samples are adjusted according to the meaning of the control characters, and finally the complete voice data stream for the sentence is output;

implementing speech synthesis further requires several preparatory steps: creating a mapping table from the Unicode codes of commonly used Chinese characters to phonetic symbols; creating a mapping table from the Unicode codes of common phrases to phonetic symbols; and creating a Chinese voice bank; the voice bank records the pronunciation of all phonetic symbols in Mandarin and Cantonese and consists of wav files, one per character or word;

the user client and the server communicate using the Unix domain protocol;

the electronic reader module implements the user interface in QML, with JavaScript code embedded in the QML implementing the page logic, and speech synthesis is implemented through the speech interface provided by Microsoft Speech SDK 5.1; Microsoft Speech SDK provides SAPI, a set of application programming interfaces for speech processing; SAPI supplies the basic functions needed to implement a TTS program and provides a high-level interface between the application and the speech engine; SAPI implements all the low-level details required for real-time control and management of the various speech engines, and the application can control text-to-speech synthesis through the Component Object Model (COM) interface of ISpVoice.