Multifunctional reader based on speech synthesis technology
Technical field
The present invention relates to the technical field of intelligent speech synthesis, and more particularly to a reader based on speech synthesis technology.
Background art
With the continued development and popularization of embedded products at home and abroad, embedded technology has become ever more closely tied to daily life. In embedded devices such as e-books, mobile phones, intelligent toys, information appliances and in-vehicle GPS units, speech synthesis (TTS, Text-To-Speech) technology is finding increasingly wide application.
Using the Global Positioning System (GPS) to provide road conditions and location information is a major trend in the transportation industry, and a GPS unit has become almost standard equipment in vehicles. With TTS added to an in-vehicle GPS unit, a driver whose eyes and hands are busy can listen by voice to real-time traffic updates, notices and announcements, and can receive customized information in a timely manner, so that flat, on-screen navigation is upgraded to multi-dimensional voice navigation.
Blind people are a disadvantaged group in society. They cannot read ordinary books and newspapers and can only study from thick Braille books read by touch. Braille books, however, are bulky, costly and low in information content, which greatly reduces the efficiency with which blind people acquire knowledge and information. Integrating TTS into an e-book allows the e-book to be listened to as well as viewed, which makes reading more convenient for blind users.
With the development of the mobile communications industry, mobile phones are becoming increasingly popular and demand is growing rapidly. As mobile communication terminals, mobile phones are developing toward miniaturization, multifunctionality and personalization. A mobile phone with a TTS function can announce incoming caller numbers, summarize e-mail content, give calendar reminders, read out network information, and so on. Embedded TTS technology can also be widely used in many other fields, such as intelligent voice toys and measuring instruments. Judging from the development trend, the widespread application of TTS technology on embedded devices has become irreversible.
Although most e-book products currently on the market incorporate a TTS function, they can only read aloud files in txt format, and few e-books are available in txt format, which greatly limits the usefulness of TTS. Moreover, every book must be read aloud from the first line of the first page each time; the user has no choice and cannot specify an arbitrary passage to be read aloud, so the design is not user-friendly.
On the other hand, after many years of development, speech synthesis technology has matured and has considerable practical value. As its fields of application expand, however, higher requirements are being placed on TTS technology itself. Some specialized fields require different pronunciation styles, and some settings may also require local dialects, speech synthesis with a certain emotion, and so on. With current TTS, the synthesis of fairly regular text such as general prompts and news is relatively satisfactory, whereas text that carries prosodic information such as cadence and emotional intonation (poetry, for example) is synthesized poorly. In other words, although present TTS systems basically meet requirements for intelligibility, they fall far short in naturalness, and the synthesized speech has an obviously mechanical quality. A TTS system that can truly replace a human reader has not yet appeared, which also restricts the wider use of TTS systems.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art by providing a multifunctional reader based on speech synthesis technology that is simple in structure, easy to use, and able to broaden the range of use of speech synthesis in e-books.
To achieve the above object, the present invention adopts the following technical solution: a multifunctional reader based on speech synthesis technology, characterized in that it comprises an electronic reader module connected to a user client, a speech synthesis server module, and a microprocessor with an embedded Linux system. The electronic reader module is connected to the speech synthesis server module, and the electronic reader module and the speech synthesis server module are each connected to the microprocessor.
The electronic reader module comprises a display screen, a communication module, a UI module, an interactive module and a signal processing module. The communication module is responsible for establishing a connection with the speech synthesis server module, sending requests and receiving voice data. The UI module is responsible for displaying files and laying out window components. The interactive module carries out the user's operations. The signal processing module installs the signal handling functions and thereby implements signal handling.
The speech synthesis server module comprises an initialization module, a concurrent service module, a processing module, an I/O module and a speech synthesis module. The initialization module initializes the server process as a daemon, then creates a Unix domain socket, reads the protocol address from a configuration file and initializes the socket, and then installs the signal handling functions to complete initialization. In the concurrent service module, once initialization is complete, the server enters an endless loop; inside the loop body it calls the accept function and sleeps, and when a client connection request arrives the accept function returns and the main process calls fork for each client to spawn a child process that serves that client. The processing module parses the request packet, performs the processing that corresponds to the request, and packs the result into a data packet. The I/O module is responsible for reading data from the client and sending data to the client. The speech synthesis module consists of three parts, namely text analysis, prosody processing and speech synthesis, and performs the conversion from text to speech.
As a further illustration of the above scheme, the microprocessor is provided with a USB interface and is configured with drivers for USB peripherals.
Software for the microprocessor is developed with Linux as the host operating system, on which a cross-compilation system is built.
By adopting the above technical solution, the present invention achieves the following beneficial effects:
1. The present invention is based on speech synthesis technology. The embedded e-book voice reader that has been developed reads content aloud by TTS, so that users can obtain the information they want. In addition, an RSS reader developed for the Windows platform can read subscribed content aloud, and can also synthesize the subscribed content into speech and export it to a file in wav format for later use.
2. The e-book voice reader is small, lightweight and easy to carry, can store the books a user needs, consumes little power, saves cost, and can be reused.
3. Using the embedded Qt graphics framework and a self-implemented speech synthesis algorithm, the present invention provides a portable e-book voice reader with integrated software and hardware. It can browse and read aloud files in txt, pdf and html formats and supports Mandarin, Cantonese and English. Compared with e-book products on the market, its greatest advantage is that it provides reading aloud of files in multiple formats and reading aloud of selected text, which broadens the range of use of speech synthesis in e-books, and it also supports reading aloud in Cantonese.
4. A notable feature of the present invention is that the TTS function is incorporated into an RSS reader. Current mainstream RSS readers do not provide TTS, yet TTS is well suited to RSS readers: RSS content is concise and information-rich, is free of distractions such as advertisements, and is content the user chose to subscribe to. By having it read aloud by TTS, users hear the information they actually want. The reader also provides a user-friendly function that synthesizes subscribed content into speech and exports it to a wav file, so that users can "listen to" their subscriptions anytime and anywhere by playing the exported file.
Brief description of the drawings
Fig. 1 is a schematic diagram of the overall structure of the present invention;
Fig. 2 is a schematic diagram of the electronic reader module;
Fig. 3 is a functional block diagram of the speech synthesis server;
Fig. 4 is a flow chart of speech synthesis;
Fig. 5 is a schematic diagram of the interaction between the server and the client of the present invention.
Description of reference numerals: 1, electronic reader module; 1-1, communication module; 1-2, UI module; 1-3, interactive module; 1-4, signal processing module; 2, speech synthesis server module; 2-1, initialization module; 2-2, concurrent service module; 2-3, processing module; 2-4, I/O module; 2-5, speech synthesis module; 3, microprocessor; 4, user client.
Detailed description of the embodiments
As shown in Figs. 1 to 5, the present invention is a multifunctional reader based on speech synthesis technology. It comprises an electronic reader module 1 connected to a user client 4, a speech synthesis server module 2, and a microprocessor 3 with an embedded Linux system. The electronic reader module 1 is connected to the speech synthesis server module 2, and the electronic reader module 1 and the speech synthesis server module 2 are each connected to the microprocessor 3, so as to provide reading aloud of files in multiple formats and reading aloud of selected text. The speech synthesis server module supports browsing and reading aloud files in txt, pdf and html formats and can automatically synthesize Mandarin, Cantonese and English.
The electronic reader module comprises a display screen, a communication module 1-1, a UI module 1-2, an interactive module 1-3 and a signal processing module 1-4. The communication module is responsible for establishing a connection with the speech synthesis server module, sending requests and receiving voice data. The UI module is responsible for displaying files and laying out window components. The interactive module carries out the user's operations. The signal processing module installs the signal handling functions and thereby implements signal handling.
The speech synthesis server module 2 comprises an initialization module 2-1, a concurrent service module 2-2, a processing module 2-3, an I/O module 2-4 and a speech synthesis module 2-5. The initialization module initializes the server process as a daemon, then creates a Unix domain socket, reads the protocol address from a configuration file and initializes the socket, and then installs the signal handling functions to complete initialization. In the concurrent service module, once initialization is complete, the server enters an endless loop; inside the loop body it calls the accept function and sleeps, and when a client connection request arrives the accept function returns and the main process calls fork for each client to spawn a child process that serves that client. The processing module parses the request packet, performs the processing that corresponds to the request, and packs the result into a data packet. The I/O module is responsible for reading data from the client and sending data to the client.
In this embodiment, the UI module uses Qt, a cross-platform C++ graphical user interface application framework. The fonts bundled with Qt render Chinese poorly in an embedded environment, so the reader uses a third-party font to display Chinese. Qt supports fonts in four formats: TTF, BDF, PFA/PFB and QPF. If PFA/PFB fonts are used directly on the embedded development board, the application has to compute the glyph bitmaps at display time, and the final result is unsatisfactory, with glyphs of inconsistent size. Such fonts also occupy a great deal of flash and memory and load relatively slowly. BDF fonts have the same problems. QPF fonts load quickly but do not support font scaling. TTF fonts support scaling and also load quickly, and the reader therefore uses fonts in TTF format.
As shown in Fig. 3 and Fig. 4, the speech synthesis module 2-5 consists of three parts, namely text analysis, prosody processing and speech synthesis, and performs the conversion from text to speech. This process mainly decomposes the input text, character by character or word by word, into phonemes, analyzes the symbols in the text that need special handling, such as numbers, currency units, word inflections and punctuation, and generates digital audio from the phonemes, which is then played through a loudspeaker or saved as an audio file to be played with multimedia software. The main tasks of text analysis are as follows:
1. Preprocessing the input text to normalize it. Specifically, redundant spaces and line breaks are removed and punctuation marks are converted into special marks; for example, punctuation marks that share the same pause duration and the same tonal change are uniformly converted into one special mark. Normalization makes the subsequent word segmentation much easier.
2. Word segmentation. The normalized text is cut into characters and words. Segmentation is an important part of text analysis, and the simplest method is dictionary lookup: the sentence is scanned once from left to right, any word found in the dictionary is identified, where a compound word is encountered the longest matching word is taken, and a string that is not recognized is split into single characters. This completes a simple segmentation.
3. Converting the segmented text into phonetic-symbol form and adding control characters. The phonetic symbols of each character and word obtained by segmentation are looked up and then concatenated, which completes the conversion from text to phonetic symbols.
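The three text-analysis tasks above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the tiny lexicon, the pinyin-with-tone-digit notation, the pause tags `<p1>` and `<p2>`, and the function names are all hypothetical and are not taken from the invention. The longest-match scan shown here is commonly called forward maximum matching.

```python
import re

# Hypothetical toy lexicon (word -> pinyin with tone digits). A real system
# would load the character and phrase mapping tables instead.
LEXICON = {
    "语音": ["yu3", "yin1"],
    "合成": ["he2", "cheng2"],
    "语音合成": ["yu3", "yin1", "he2", "cheng2"],
    "技术": ["ji4", "shu4"],
    "你": ["ni3"],
    "好": ["hao3"],
}
MAX_WORD_LEN = max(len(word) for word in LEXICON)

# Assumed pause tags: marks with a similar pause are mapped to the same tag.
PAUSE_MARKS = {"，": "<p1>", "、": "<p1>", "。": "<p2>", "！": "<p2>", "？": "<p2>"}
TAG = re.compile(r"<p\d>")


def normalize(text):
    """Step 1: drop spaces and line breaks, unify punctuation into pause tags.

    Removing all whitespace is only reasonable for Chinese text.
    """
    text = re.sub(r"\s+", "", text)
    return "".join(PAUSE_MARKS.get(ch, ch) for ch in text)


def segment(text):
    """Step 2: scan left to right, always taking the longest lexicon match."""
    tokens, i = [], 0
    while i < len(text):
        tag = TAG.match(text, i)
        if tag:  # pause tags pass through as their own tokens
            tokens.append(tag.group())
            i = tag.end()
            continue
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            # An unknown string falls back to a single character (size == 1).
            if size == 1 or text[i:i + size] in LEXICON:
                tokens.append(text[i:i + size])
                i += size
                break
    return tokens


def to_phonetic(tokens):
    """Step 3: look up each token's phonetic symbols and concatenate them."""
    symbols = []
    for token in tokens:
        if TAG.fullmatch(token):
            symbols.append(token)  # keep pause tags as control characters
        else:
            symbols.extend(LEXICON.get(token, ["<unk>"]))
    return symbols
```

With this toy lexicon, the string "语音合成" is kept as one four-character word rather than being split into "语音" and "合成", which is the longest-match behaviour the text describes.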
The result of text analysis serves as the input to prosody processing. Prosody processing mainly handles tone, rhythm and the like. For example, when two third tones occur in succession, tone sandhi is applied to them, so that the synthesized speech conveys the meaning correctly and sounds more natural.
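As a hedged illustration of the tone-change example, the following Python sketch applies the well-known Mandarin rule that a third tone immediately followed by another third tone is pronounced as a second tone. The pinyin-with-tone-digit notation and the function name are assumptions made for this sketch; real prosody processing also takes phrase structure into account, which this sketch ignores.

```python
def apply_third_tone_sandhi(syllables):
    """Return a copy in which a third tone before another third tone becomes a second tone.

    Syllables are assumed to be pinyin strings ending in a tone digit, e.g. "ni3".
    Other symbols (such as pause tags) are left untouched.
    """
    result = list(syllables)
    for i in range(len(syllables) - 1):
        if syllables[i].endswith("3") and syllables[i + 1].endswith("3"):
            result[i] = syllables[i][:-1] + "2"
    return result
```

For instance, under these assumptions the greeting "ni3 hao3" becomes "ni2 hao3", while a sequence without adjacent third tones is returned unchanged.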
In the final speech synthesis stage, the output of prosody processing serves as the input to speech synthesis. For each phonetic symbol, the speech synthesis module looks up the corresponding sample in the sound bank and then splices together the samples for all phonetic symbols of the sentence. During splicing, the duration, prosodic features, pause times and so on of the samples are adjusted according to the semantics of the control characters, and finally the complete speech data stream of the sentence is output.
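The splicing step can be sketched in Python with the standard `wave` module. This is only a sketch under assumptions: the sound bank here is a hypothetical in-memory table filled with generated sine tones instead of recorded wav files, the pause tags and their durations are invented for illustration, and only pause insertion is shown; the adjustment of duration and prosodic features described above is not implemented.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 16000  # assumed format: 16 kHz, 16-bit, mono


def make_tone(freq_hz, millis):
    """Stand-in for a recorded unit: a sine tone as 16-bit little-endian PCM."""
    count = SAMPLE_RATE * millis // 1000
    return b"".join(
        struct.pack("<h", int(12000 * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)))
        for n in range(count)
    )


def silence(millis):
    """Zero-valued samples used to realize a pause."""
    return b"\x00\x00" * (SAMPLE_RATE * millis // 1000)


# Hypothetical sound bank (phonetic symbol -> PCM data) and pause table (ms).
SOUND_BANK = {"ni2": make_tone(220, 200), "hao3": make_tone(180, 250)}
PAUSES_MS = {"<p1>": 150, "<p2>": 400}


def synthesize(symbols):
    """Concatenate unit samples and pauses into one in-memory WAV byte string."""
    pcm = bytearray()
    for symbol in symbols:
        if symbol in PAUSES_MS:
            pcm += silence(PAUSES_MS[symbol])
        elif symbol in SOUND_BANK:
            pcm += SOUND_BANK[symbol]
        # Symbols missing from the bank are skipped in this sketch.
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(bytes(pcm))
    return buffer.getvalue()
```

The returned bytes form a complete wav file, so they could be written to disk or streamed to the client as the speech data for one sentence.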
Implementing speech synthesis with the above method requires little computation, synthesizes quickly and yields relatively natural speech, so it is well suited to embedded systems with weaker chip performance. To implement speech synthesis, several preparatory tasks must also be completed: 1. creating a mapping table from the Unicode codes of common Chinese characters to phonetic symbols; 2. creating a mapping table from the Unicode codes of common phrases to phonetic symbols; 3. creating a Chinese sound bank. The sound bank records the pronunciations of all phonetic symbols in Mandarin and Cantonese and consists of wav-format files in units of characters and words.
The user client and the server communicate using the Unix domain protocol. Compared with TCP between a client and a server on the same host, the advantage of a Unix domain stream socket lies in its higher performance.
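The following Python sketch shows one way the concurrent service over a Unix domain stream socket might look: the server sleeps in accept and forks one child process per client. It is a sketch under assumptions, not the implementation of the invention: the socket path, the single-read request format and the placeholder `handle_request` function are hypothetical, and daemonization and reading the address from a configuration file are omitted. It relies on `os.fork` and therefore runs only on Unix-like systems.

```python
import os
import signal
import socket

# Assumed path; the invention reads the protocol address from a configuration file.
SOCKET_PATH = "/tmp/tts_sketch.sock"


def handle_request(payload):
    """Placeholder for the processing module: here it simply upper-cases the request."""
    return payload.upper()


def serve(path=SOCKET_PATH):
    """Accept connections forever, forking one child process per client."""
    if os.path.exists(path):
        os.unlink(path)  # remove a stale socket file left by an earlier run
    # Ignoring SIGCHLD lets the kernel reap exited children on Linux.
    signal.signal(signal.SIGCHLD, signal.SIG_IGN)
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as listener:
        listener.bind(path)
        listener.listen(5)
        while True:
            conn, _ = listener.accept()  # sleeps until a client connects
            if os.fork() == 0:
                # Child process: serve this one client, then exit.
                listener.close()
                with conn:
                    request = conn.recv(4096)
                    conn.sendall(handle_request(request))
                os._exit(0)
            conn.close()  # parent: drop its copy and go back to accept
```

Calling `serve()` blocks in the accept loop. A client would create an `AF_UNIX` stream socket, connect to the same path, send its request and read the reply.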
The electronic reader module implements its user interface in QML, with JavaScript code embedded in the QML to implement the page logic, and speech synthesis is performed through the speech interface provided by Microsoft Speech SDK 5.1. The Microsoft Speech SDK provides SAPI, a set of application programming interfaces for speech processing. SAPI provides the basic functions needed to implement a TTS program and offers a high-level interface between the application and the speech engine, and it implements all the low-level details required for real-time control and management of the various speech engines. An application can control text-to-speech synthesis through the Component Object Model (COM) interface ISpVoice.
The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make various modifications and improvements without departing from the inventive concept of the present invention, and all of these fall within the scope of protection of the present invention.