CN1956056A

CN1956056A - Speech synthesis device, speech synthesis method and GPS speech guide system

Info

Publication number: CN1956056A
Application number: CNA2006101171883A
Authority: CN
Inventors: 蒋昌俊; 曾国荪; 陈闳中; 苗夺谦; 阎春钢; 付瑛; 方钰; 何良华
Original assignee: Tongji University
Current assignee: Tongji University
Priority date: 2006-10-16
Filing date: 2006-10-16
Publication date: 2007-05-02
Anticipated expiration: 2026-10-16
Also published as: CN1956056B

Abstract

A speech synthesis method in which, before a navigation statement is played, the statement is split into multiple items of voice-format road information; a preset keyword is used to retrieve the corresponding text-format road information from a basic database; the retrieved text-format road information is parsed to obtain the corresponding voice-format road information; and finally the voice-format road information relating to the navigation statement is recombined, with the word segment as the unit, and announced by voice. A speech synthesis device for carrying out the method and a GPS speech guide system using that device are also disclosed.

Description

Speech synthesis device, speech synthesis method and GPS speech guide system

[Technical field]

The present invention relates to speech synthesis technology, and in particular to a speech synthesis device and method applied to a GPS speech guide system.

[Background art]

GPS navigation uses IT equipment to provide the driver with geographic information and route guidance. In practice the driver must pay attention to the complex traffic conditions around the vehicle and cannot keep watching the electronic map, so voice navigation has become one of the key functions of a GPS navigation system.

In the field of in-vehicle navigation, there are two kinds of speech guide system: recording-based navigation systems and speech-synthesis navigation systems. Current recording-based systems can only deliver simple voice prompts such as "please turn left 100 meters ahead". They cannot handle a prompt such as "please turn left 100 meters ahead and enter Chang'an Street", because road names vary endlessly and recording every such prompt is impractical.

With the development of voice technology, intelligent voice technologies represented by speech synthesis, speech recognition, and speech encoding and decoding are being applied in the automotive field. In-vehicle voice information services and in-vehicle voice control have changed the existing mode of human-machine interaction in automobiles, making automobiles more user-friendly and personalized and improving driving safety. This technology has attracted wide attention from the automobile industry at home and abroad, especially in regions with developed automobile industries such as the United States, Europe and Japan, where investment in research and industrialization continues to grow.

In addition, GPS voice navigation devices in the prior art are generally based on the WinCE platform, and the speech synthesis techniques they adopt are all based on character libraries. The synthesized speech has no intonation, its pronunciation is flat, and it differs considerably from a real human voice.

[Summary of the invention]

The main object of the present invention is to provide a GPS speech guide system capable of delivering complex voice prompts.

To achieve the above object, the present invention provides a speech synthesis device for use in a GPS speech guide system, the GPS speech guide system further comprising a GPS navigation device connected to the speech synthesis device. The speech synthesis device comprises a basic database and a voice playback execution module. The basic database further comprises a voice storage unit, which stores the voice-format road information, in units of word segments, that is used during road guidance, and an index storage unit, which stores text-format road information comprising at least the text description, offset and data length of the voice-format road information. The text-format road information and the voice-format road information are in one-to-one correspondence. The voice playback execution module further comprises a parsing unit, which, before a navigation statement is played, first splits the navigation statement into a plurality of items of voice-format road information, retrieves the text-format road information in the basic database using a predetermined keyword, and then parses the text-format road information to obtain the corresponding voice-format road information; and a playback unit, which first recombines all the voice-format road information, in units of word segments, to form the navigation statement and then announces it.

The present invention also provides a speech synthesis method for use in a GPS speech guide system, the GPS speech guide system comprising a basic database, a speech synthesis device and a GPS navigation device. The speech synthesis method comprises the following steps: first, storing the voice-format road information, in units of word segments, that is used during road guidance; generating from this voice-format road information the corresponding text-format road information, which comprises at least the text description, offset and data length of the voice-format road information; before a navigation statement is played, splitting the navigation statement into a plurality of items of voice-format road information and retrieving the text-format road information in the basic database using a predetermined keyword; parsing the retrieved text-format road information to obtain the corresponding voice-format road information; and finally, recombining the voice-format road information, in units of word segments, that the navigation statement involves, and then announcing it by voice.

The present invention also provides a GPS speech guide system, which uses the aforementioned speech synthesis device to synthesize navigation statements and then plays them.

Because the speech synthesis device, speech synthesis method and GPS speech guide system of the present invention use the word segment as the basic voice unit, and a word segment can be information such as a road-section name, an intersection, a road name, turning information, travel speed or travel distance, more complex and more accurate navigation services can be realized. In addition, because the present invention combines text-format road information with voice-format road information, it reduces the voice-prompt delay that prior-art navigation service systems incur from internal system operations during use, which speeds up voice prompts and thus improves service quality.

[Description of the drawings]

Fig. 1 is a block diagram of the speech synthesis device of a preferred embodiment of the present invention;

Fig. 2 shows the correspondence between text-format road information and voice-format road information in the speech synthesis device of the preferred embodiment of the present invention; and

Fig. 3 is a workflow diagram of the speech synthesis method of the present invention.

[Detailed description of the embodiments]

To illustrate the technical solution and technical effects of the present invention more clearly, the speech synthesis device, the speech synthesis method, and a preferred embodiment of the GPS speech guide system having this speech synthesis device are described below with reference to the drawings.

Referring to Fig. 1, a block diagram of the speech synthesis device of a preferred embodiment of the present invention is shown. As shown in the figure, the speech synthesis device 10 of the present invention is used in a GPS speech guide system 1, which further comprises a GPS navigation device 20 connected to the speech synthesis device 10. The speech synthesis device 10 comprises a basic database 100 and a voice playback execution module 110. In this embodiment, the GPS speech guide system 1 adopts a dictionary-based text-to-speech synthesis technique.

The basic database 100 further comprises a voice storage unit 1001 and an index storage unit 1002.

The voice storage unit 1001 stores the voice-format road information, in units of word segments, that is used during road guidance. In this embodiment, the voice-format road information consists of wav-format recordings of a real human voice, covering the road name information, turns, distances, travel speeds and so on that are commonly used during road guidance. All of this voice-format road information is stored in the data block of the data area of the voice storage unit 1001.

The index storage unit 1002 stores text-format road information comprising at least the text description, offset and data length of the voice-format road information. In this embodiment, the text-format road information is stored record by record, and each record comprises the text description of the voice content, the offset, and the data length (in bytes). The offset is the position of that piece of voice content within the voice storage unit, and the data length is the length of the voice data. The text description of the voice content serves as the keyword for indexing, while the offset and data length are used for locating the data. Fig. 2 shows the correspondence between the text-format road information and the voice-format road information.

In more detail, to limit the storage space required, this embodiment records the road information using a 22050 Hz, mono sampling format and saves it in the data block of the data area of the voice storage unit 1001, with adjacent pieces of voice data separated by 4 all-zero bytes. The voice storage unit 1001 as a whole consists of a file header and a data area; the content of the file header is given in Table 1 below.

Bytes	Description	Content
0-3	Resource file identifier	'RIFF'
4-7	Wav file size	Speech data length + 40
8-11	Voice file identifier	'WAVE'
12-15	File header identification code	'fmt'
16-19	File header data size	16
20-21	Audio format	1 (PCM encoding)
22-23	Number of channels	1 (mono)
24-27	Sampling frequency	22050Hz
28-31	Waveform transfer rate	44100Hz
32-33	Data block alignment	2
34-35	Sampling precision	16bit
36-39	Data area identification code	'data'
40-43	Data size of the data area	Speech data length
>=44	Data of the data area	Speech data

Table 1
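The header layout of Table 1 can be sketched with Python's `struct` module. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `build_header` and the constant names are hypothetical, the format tag is written as `fmt ` with a trailing space so that it fills bytes 12-15, and the size field follows Table 1 as written (speech data length + 40), whereas many WAV writers store data length + 36 in that field.

```python
import struct

SAMPLE_RATE = 22050                              # Hz, bytes 24-27 of Table 1
CHANNELS = 1                                     # mono, bytes 22-23
BITS_PER_SAMPLE = 16                             # bytes 34-35
BLOCK_ALIGN = CHANNELS * BITS_PER_SAMPLE // 8    # 2, bytes 32-33
BYTE_RATE = SAMPLE_RATE * BLOCK_ALIGN            # 44100, bytes 28-31
HEADER_FORMAT = "<4sI4s4sIHHIIHH4sI"             # little-endian, 44 bytes in total


def build_header(data_length: int) -> bytes:
    """Pack a 44-byte header for `data_length` bytes of speech data."""
    return struct.pack(
        HEADER_FORMAT,
        b"RIFF",
        data_length + 40,    # size field as stated in Table 1 (assumption: taken literally)
        b"WAVE",
        b"fmt ",             # trailing space pads the tag to four bytes
        16,                  # size of the format block
        1,                   # audio format 1 = PCM
        CHANNELS,
        SAMPLE_RATE,
        BYTE_RATE,
        BLOCK_ALIGN,
        BITS_PER_SAMPLE,
        b"data",
        data_length,
    )
```

The value at bytes 28-31 equals the sampling frequency multiplied by the block alignment, which is why 22050 and 2 give 44100.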

The speech data is stored left channel first, then right channel (0: left channel, 1: right channel), and low byte first, then high byte. The voice storage unit as finally recorded has a capacity of about 200M.
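As a rough worked example of what these figures imply, the sketch below packs 16-bit samples low byte first and estimates how much speech fits in the stated capacity. It assumes that "200M" means 200 x 1024 x 1024 bytes and ignores the header and the 4-byte separators; the names `pack_samples` and `seconds_of_speech` are hypothetical.

```python
import struct

SAMPLE_RATE_HZ = 22050
CHANNELS = 1
BYTES_PER_SAMPLE = 2
BYTES_PER_SECOND = SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE   # 44100


def pack_samples(samples):
    """Pack signed 16-bit samples low byte first (little-endian)."""
    return struct.pack("<%dh" % len(samples), *samples)


def seconds_of_speech(capacity_bytes):
    """Playing time that fits in `capacity_bytes` of raw 16-bit mono data."""
    return capacity_bytes / BYTES_PER_SECOND


ASSUMED_CAPACITY = 200 * 1024 * 1024     # assumption: "200M" read as 200 MiB
minutes = seconds_of_speech(ASSUMED_CAPACITY) / 60
```

Under these assumptions the store holds roughly 79 minutes of recorded word segments.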

The voice playback execution module 110 further comprises a parsing unit 1101 and a playback unit 1102.

Before a navigation statement is played, the parsing unit 1101 first splits the target navigation statement into a plurality of items of voice-format road information, retrieves the text-format road information in the basic database 100 using a predetermined keyword, and then parses the text-format road information to obtain the corresponding voice-format road information. In this embodiment, when the target navigation statement needs to be played, the parsing unit 1101 is called to parse the file header information of the voice-format road file. In addition, when the navigation area needs to be expanded or updated, new voice-format road information can be recorded; during recording, the parsing unit 1101 is called to generate the file header information of the new voice-format road file. All parameters relating to the voice-format road information can be obtained by referring to the file header content shown in Table 1.

The playback unit 1102 first recombines all the voice-format road information, in units of word segments, to form the navigation statement and then announces it. In this embodiment, the actual playback operation is carried out by the playback unit 1102. Its main tasks are: open the audio device and the voice storage unit 1001 of the basic database 100; parse the file header; set the parameters of the audio device (including the number of channels, sampling frequency, sampling precision, etc.) according to the file header; seek to the data area of the voice storage unit 1001; repeatedly read data into memory and write it to the audio device; and, once reading and writing are finished, close the audio device and the voice storage unit 1001 of the basic database 100. When a recording operation is needed, the main tasks are: open the audio device and the voice storage unit 1001 of the basic database 100; read the parameter settings of the audio device and generate the file header of the voice storage unit 1001 from them; read audio data from the audio device into memory and write it into the data area of the voice storage unit 1001 until the recording ends; and close the audio device and the voice storage unit 1001 of the basic database 100.
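The playback sequence described above (parse the header, configure the device, then loop reading data and writing it out) can be sketched as follows. This is an illustration only: `write_to_device` is a hypothetical callback standing in for a real audio device, the stream is any file-like object, and the header layout is assumed to be exactly the 44 bytes of Table 1.

```python
import io
import struct

HEADER_FORMAT = "<4sI4s4sIHHIIHH4sI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)     # 44 bytes


def parse_header(header: bytes) -> dict:
    """Extract the device parameters from a Table 1 style header."""
    (riff, _size, wave, fmt, _fmt_size, audio_format, channels, sample_rate,
     _byte_rate, _block_align, bits, data_tag, data_length) = struct.unpack(
        HEADER_FORMAT, header)
    if (riff, wave, fmt, data_tag) != (b"RIFF", b"WAVE", b"fmt ", b"data"):
        raise ValueError("header does not match the Table 1 layout")
    return {"audio_format": audio_format, "channels": channels,
            "sample_rate": sample_rate, "bits_per_sample": bits,
            "data_length": data_length}


def play(stream, write_to_device, chunk_size=4096):
    """Read the data area in chunks and hand each chunk to the device."""
    params = parse_header(stream.read(HEADER_SIZE))
    # A real player would configure the audio device from `params` here.
    remaining = params["data_length"]
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:            # stream ended before the declared length
            break
        write_to_device(chunk)
        remaining -= len(chunk)
    return params
```

For a quick check without audio hardware, `play(io.BytesIO(wav_bytes), chunks.append)` collects the chunks in a list instead of sending them to a device.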

Referring to Fig. 3, a workflow diagram of the speech synthesis method of the present invention is shown. The method is described below using a specific example. Suppose that, during GPS navigation in the GPS speech guide system 1, the vehicle drives to the intersection of Chifeng Road and Siping Road, and the navigation statement to be played is "turn left ahead to reach Siping Road". In step S10, the navigation statement is divided into the individual word segments "ahead", "turn left", "reach" and "Siping Road"; that is, the parameters actually passed to the speech synthesis device 10 are the word segments that are spliced into the navigation statement.

Step S11: using the text description of the voice-format road information corresponding to the word segment as the keyword, search the index storage unit for the record of that word segment.

Step S12: determine whether the most recently fetched word segment is the last word segment of the navigation statement. If not, proceed to step S13; otherwise proceed to step S17.

Step S13: determine whether the most recently fetched word segment is the first word segment of the navigation statement. If so, proceed to step S14; otherwise proceed directly to step S15.

Step S14: create a temporary voice file in memory, which represents the navigation statement that currently needs to be announced, and generate the file header information of this file at the same time. The temporary voice file and its file header may also be stored in the voice storage unit, and a temporary text file corresponding to the temporary voice file may be generated and stored in the index storage unit.

Step S15: read the word-segment record from the index storage unit, using the "text description" of the word-segment record as the keyword.

Step S16: according to the "offset" of the word-segment record, read voice data of "data size" length from the voice storage unit into memory.

Step S17: after the last word segment of the navigation statement has been read, close the temporary voice file.

Step S18: play the complete navigation statement that has been generated.
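The flow of steps S10 to S18 can be sketched in a few lines of Python. This is a simplified illustration rather than the patented flow itself: the control flow is collapsed into a single loop, the index is a plain dictionary keyed by text description, offsets are assumed to count from the start of the storage file, and the names `build_store`, `synthesize` and the sample byte strings are hypothetical.

```python
import struct

HEADER_FORMAT = "<4sI4s4sIHHIIHH4sI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)     # 44 bytes
SEPARATOR = b"\x00" * 4                          # 4 zero bytes between segments


def wav_header(data_length):
    """Header laid out as in Table 1 (size field taken literally as length + 40)."""
    return struct.pack(HEADER_FORMAT, b"RIFF", data_length + 40, b"WAVE",
                       b"fmt ", 16, 1, 1, 22050, 44100, 2, 16,
                       b"data", data_length)


def build_store(segments):
    """Build a toy voice storage unit and its index {text: (offset, length)}."""
    data = bytearray()
    index = {}
    for text, pcm in segments:
        index[text] = (HEADER_SIZE + len(data), len(pcm))
        data += pcm + SEPARATOR
    return wav_header(len(data)) + bytes(data), index


def synthesize(word_segments, store, index):
    """Look up each word segment, copy its bytes, and prepend one header."""
    body = bytearray()
    for text in word_segments:
        offset, length = index[text]             # S11/S15: keyword lookup
        body += store[offset:offset + length]    # S16: read `length` bytes at `offset`
    return wav_header(len(body)) + bytes(body)   # S14/S17: temporary voice file


# Placeholder byte strings stand in for recorded 16-bit samples.
store, index = build_store([
    ("ahead", b"\x01\x00" * 4),
    ("turn left", b"\x02\x00" * 6),
    ("reach", b"\x03\x00" * 3),
    ("Siping Road", b"\x04\x00" * 8),
])
sentence = synthesize(["ahead", "turn left", "reach", "Siping Road"], store, index)
```

Because lookup is by text description only, adding a road name means appending one recording and one index record; a word segment missing from the index raises `KeyError` in this sketch.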

In this embodiment, the above speech synthesis device is arranged in a GPS navigation system 1 that integrates visual and voice output. The GPS navigation system 1 also includes the GPS navigation device 20, which mainly receives satellite signals through a GPS receiver, determines the current vehicle position from the computed longitude and latitude, enlarges the navigation map centered on that position to fill the whole map display interface, and refreshes the navigation map in real time to show the surroundings of the vehicle. When valid GPS data is obtained, the voice guidance device is started to give voice prompts of navigation statements, and at the same time a text description is shown on the navigation map, including the name of the road section currently being traveled, road information and the current speed. When the vehicle approaches an intersection, the system announces the turn and the name of the next intersection by voice. When the vehicle is speeding, that is, when its travel speed exceeds the limit set by traffic rules, the system also reminds the driver by voice to drive safely.
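The prompting rules in the paragraph above can be illustrated with a small decision function that returns lists of word segments to be passed to the speech synthesis device. Everything specific here is an assumption made for illustration: the function name `select_prompts`, the 100-meter announcement distance, and the wording of the word segments do not come from the original text.

```python
def select_prompts(speed_kmh, speed_limit_kmh, distance_to_intersection_m,
                   turn, next_intersection, announce_within_m=100):
    """Return the navigation statements to announce, each as a list of word segments."""
    prompts = []
    # Approaching an intersection: announce the turn and the next intersection name.
    if distance_to_intersection_m <= announce_within_m:
        prompts.append(["ahead", turn, "reach", next_intersection])
    # Speed above the limit: remind the driver to drive safely.
    if speed_kmh > speed_limit_kmh:
        prompts.append(["speeding", "please drive safely"])
    return prompts
```

Each returned list corresponds to one navigation statement already split into word segments, so it can be handed directly to the synthesis flow of Fig. 3.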

For the speech synthesis device 10 and the GPS navigation device 20, speech synthesis is carried out by a subprocess that the main program creates specifically for this purpose. When a voice navigation prompt is needed, the main program passes the navigation statement to be played to the subprocess in units of word segments, and the subprocess then completes the whole voice playback process. The subprocess follows the flow shown in Fig. 3.

The hardware structure and processing flow of the GPS speech guide system of the present invention for GPS data processing and image display are essentially the same as in the prior art and are not described in detail here.

Claims

1. A speech synthesis device for use in a GPS speech guide system, the GPS speech guide system further comprising a GPS navigation device connected to the speech synthesis device, and the speech synthesis device comprising a basic database and a voice playback execution module, characterized in that:

the basic database further comprises (1) a voice storage unit for storing the voice-format road information, in units of word segments, that is used during road guidance, and (2) an index storage unit storing text-format road information comprising at least the text description, offset and data length of the voice-format road information, the text-format road information and the voice-format road information being in one-to-one correspondence;

the voice playback execution module further comprises (1) a parsing unit which, before a navigation statement is played, first splits the navigation statement into a plurality of items of voice-format road information, retrieves the text-format road information in the basic database using a predetermined keyword, and then parses the text-format road information to obtain the corresponding voice-format road information; and (2) a playback unit which first recombines all the voice-format road information, in units of word segments, to form the navigation statement and then announces it.

2. The speech synthesis device according to claim 1, characterized in that the voice-format road information is a wav file.

3. The speech synthesis device according to claim 1, characterized in that the word segment is a road-section name, an intersection, a road name, turning information, travel speed or travel distance.

4. The speech synthesis device according to claim 1, characterized in that the text-format road information is a txt file.

5. The speech synthesis device according to claim 1, characterized in that the offset is the storage position offset of the voice-format road information within the voice storage unit.

6. The speech synthesis device according to claim 1, characterized in that the data length is the length, in bytes, of the voice data of the voice-format road information.

7. The speech synthesis device according to claim 1, characterized in that the predetermined keyword is the text description of the voice-format road information.

8. The speech synthesis device according to claim 1, characterized in that the playback unit can also record new voice-format road information when the navigation area is expanded or changed.

9. The speech synthesis device according to claim 8, characterized in that, when the playback unit records and generates new voice-format road information, the corresponding text-format road information is generated at the same time.

10. A GPS speech guide system, characterized in that the GPS speech guide system uses the speech synthesis device of claim 1 to synthesize navigation statements and then plays them.

11. A speech synthesis method for use in a GPS speech guide system, the GPS speech guide system comprising a basic database, a speech synthesis device, and a GPS navigation device interconnected with the speech synthesis device, characterized in that the speech synthesis method comprises:

storing the voice-format road information, in units of word segments, that is used during road guidance;

generating from the voice-format road information the corresponding text-format road information, which comprises at least the text description, offset and data length of the voice-format road information;

before a navigation statement is played, first splitting the navigation statement into a plurality of items of voice-format road information, and retrieving the text-format road information in the basic database using a predetermined keyword;

parsing the retrieved text-format road information to obtain the corresponding voice-format road information; and

recombining the voice-format road information, in units of word segments, that the navigation statement involves, and then announcing it by voice.