CN102436808A

CN102436808A - Digital bidirectional intelligent voice explanation system and method thereof

Info

Publication number: CN102436808A
Application number: CN2011103531723A
Authority: CN
Inventors: 陆德宝; 吕杰; 吴海涛
Original assignee: ANTONG TECHNOLOGY BUSINESS DEVELOPMENT Co Ltd
Current assignee: ANTONG TECHNOLOGY BUSINESS DEVELOPMENT Co Ltd
Priority date: 2011-11-09
Filing date: 2011-11-09
Publication date: 2012-05-02
Anticipated expiration: 2031-11-09
Also published as: CN102436808B

Abstract

The invention provides a TTS-based digital bidirectional intelligent voice explanation system comprising a PC server and terminal devices. The PC server is used only to manage and configure the corresponding files, and every terminal device can work independently, so the system has good stability. Existing explanation systems can only speak from a prestored pronunciation text. In the provided system, by contrast, each terminal device can either speak from a prestored pronunciation text or directly amplify the live voice of a narrator, in which case the explanation is not affected by the machine; this covers the widest range of needs and serves as an emergency measure when the equipment fails. Pronunciation files are stored and managed as text, so they occupy few resources, are flexible to apply and easy to maintain, and the spoken content can be changed by modifying the text. Speech recognition keywords are likewise stored and managed as text and can be reconfigured to suit the needs of different venues; even when the exhibits in a venue change, the server can reconfigure the terminals quickly, so the system is highly flexible.

Description

Digital bidirectional intelligent voice explanation system and method thereof

Technical field

The present invention relates to a voice explanation system, mainly used in various venues for intelligent voice explanation of products and exhibits and for intelligent human-machine voice interaction.

Background technology

With the continuous development of electronic and information technology, and in particular of speech recognition and speech synthesis algorithms, such technology can take over repetitive work from human labor while still offering user-friendly operation. Similar explanation systems already exist on the market, but most of them are based on storing and recalling recorded speech. In that architecture, pronunciation consists of playing back audio data files stored in memory in advance, so the speech is rigid and the system is poor in user-friendliness and maintainability. TTS-based systems also exist, but their algorithms must be run on a PC; in practical applications the system cost is high, particularly for systems above a certain scale, and stability cannot be guaranteed.

Summary of the invention

The technical problem to be solved by the present invention is to provide a digital bidirectional intelligent voice explanation system in which each terminal device can work independently.

The technical solution adopted by the present invention to solve the above technical problem is a digital bidirectional intelligent voice explanation system comprising a PC server and terminal devices, characterized in that the PC server comprises: a network interface unit, used for communication between the PC server and each terminal device; a pronunciation text library, used to store in advance the texts to be played and their corresponding terminal numbers; a speech recognition keyword library, used to store in advance the speech recognition keywords of each terminal device and their corresponding terminal numbers; a terminal monitoring and management unit, used to monitor and manage the online status and running state of each terminal device; and a terminal device database, used to store each terminal number, its status information, the current pronunciation text number, and the current recognition keyword number.

The terminal device comprises: a network communication unit, used for communication with the PC server; a storage unit, used to store the pronunciation texts and speech recognition keywords delivered by the network communication unit; a TTS pronunciation unit, used to perform speech synthesis on a pronunciation text and output a digital audio signal; a sound pickup unit, used to wait for voice commands issued by the user and to capture local speech; a speech recognition unit, used to model and recognize the speech captured by the sound pickup unit, compare the recognition result with the recognition entries, and then trigger the TTS pronunciation unit to perform speech synthesis; a trigger receiving unit, used to control the audio switching unit through a local trigger, so that either of the two modes, machine explanation or manual explanation, can be selected freely; a D/A conversion unit, used to convert the digital audio signal output by the TTS pronunciation unit into an analog speech signal; an audio switching unit, used to switch between the machine-synthesized analog speech signal and the analog speech signal from the narrator's microphone; and a power amplifier unit, used to amplify the power of the analog speech signal output by the audio switching unit and send it to a loudspeaker.

The running state of each terminal device comprises an error state, an explanation state, a recognition state, and a standby state.
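As an illustrative sketch only, and not part of the described system, the running states and one record of the terminal device database could be modeled as follows. The field names, the Python types, and the reading of the first listed state as an error state are assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RunningState(Enum):
    """The four running states of a terminal device (first one assumed to mean 'error')."""
    ERROR = "error"
    EXPLAINING = "explaining"
    RECOGNIZING = "recognizing"
    STANDBY = "standby"


@dataclass
class TerminalRecord:
    """One row of the terminal device database; all field names are hypothetical."""
    terminal_number: int
    online: bool = False
    state: RunningState = RunningState.STANDBY
    current_text_id: Optional[int] = None
    current_keyword_id: Optional[int] = None


# Example: terminal 3 comes online and starts explaining text number 12.
record = TerminalRecord(terminal_number=3)
record.online = True
record.state = RunningState.EXPLAINING
record.current_text_id = 12
```

Keeping the state as an enumeration rather than a free-form string makes it harder for the server and the terminals to disagree about the set of valid states.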

A digital bidirectional intelligent voice explanation method, characterized in that it comprises the following steps:

Step 1: the pronunciation texts and speech recognition keywords to be used are selected from the configured pronunciation text library and speech recognition keyword library and bound to the corresponding terminal number;

Step 2: the network communication unit of the terminal device communicates with the network interface unit of the PC server, downloads from the PC server the speech recognition keywords and pronunciation texts for its terminal number, and stores them in the storage unit;

Step 3: the sound pickup unit receives the narrator's voice control command;

Step 4: the speech recognition unit recognizes the received voice control command, finds the corresponding pronunciation text in the storage unit according to the recognition result, and passes the storage address of the pronunciation text to the TTS pronunciation unit;

Step 5: the TTS pronunciation unit converts the corresponding pronunciation text into a digital audio signal and passes it to the D/A conversion unit, which converts it into an analog speech signal;

Step 6: it is determined whether the trigger receiving unit has received a trigger signal; if so, the audio switching unit sends the voice received by the sound pickup unit directly to the power amplifier unit for local public address; if not, the analog speech signal produced by the D/A conversion unit is passed to the power amplifier unit for power amplification and sent to the loudspeaker for playback;

Step 7: the terminal monitoring unit of the PC server monitors all terminal devices continuously, periodically sending a query command to each terminal device, waiting for the reply of the corresponding terminal device, and saving the online status and running state of that terminal device in the terminal device database.
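The lookup in step 4 can be illustrated with a minimal sketch, assuming the speech recognition unit delivers its result as a text string and the storage unit holds a mapping from each keyword to one pronunciation text. The function name `find_pronunciation_text`, the case-insensitive substring rule, and the sample entries are hypothetical; the sketch also returns the text itself, whereas the step above passes a storage address to the TTS pronunciation unit.

```python
from typing import Dict, Optional


def find_pronunciation_text(recognized: str, keyword_to_text: Dict[str, str]) -> Optional[str]:
    """Return the text bound to the first stored keyword found in the recognized phrase."""
    phrase = recognized.lower()
    for keyword, text in keyword_to_text.items():
        if keyword.lower() in phrase:
            return text
    # No stored keyword matched: nothing is handed to the TTS pronunciation unit.
    return None


# Hypothetical content of the terminal's storage unit after the download in step 2.
local_store = {
    "exhibit a": "Pronunciation text for exhibit A.",
    "exhibit b": "Pronunciation text for exhibit B.",
}

matched = find_pronunciation_text("Please introduce Exhibit B", local_store)
unmatched = find_pronunciation_text("What time is it", local_store)
```

Because the lookup uses only data already held in the storage unit, it does not need the PC server at run time, which is consistent with each terminal device working independently.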

The beneficial effects of the present invention are:

1. The system is based on TTS; the PC server serves only to manage and configure the corresponding files, and each terminal device can work independently, which gives better stability.

2. Unlike existing explanation systems, which can only speak from a prestored pronunciation text, each terminal device of this system can either speak from the prestored pronunciation text or directly amplify the narrator's live voice without interference from the machine, which covers the widest range of needs and provides an emergency measure when the equipment fails.

3. Pronunciation files are stored and managed as text, which takes up few resources and makes the system flexible to apply and easy to maintain; the spoken content can be changed by editing the text.

4. Speech recognition keywords are also stored and managed as text and can be reconfigured to suit the needs of different venues; even when the exhibits in a venue change, they can be reconfigured quickly through the server, which gives great flexibility.
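The resource saving from storing text instead of recorded speech can be illustrated with a rough calculation. All figures here are assumptions chosen for the example and do not come from the patent: a 200-character Chinese script encoded as UTF-8, and an uncompressed recording of about one minute at 16 kHz, 16-bit mono.

```python
# Assumed audio format for a stored recording (not specified in the source).
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2      # 16-bit mono PCM
DURATION_S = 60           # assumed length of one explanation

audio_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * DURATION_S

# Assumed script: 200 CJK characters, each taking 3 bytes in UTF-8.
script = "展" * 200
text_bytes = len(script.encode("utf-8"))

ratio = audio_bytes // text_bytes
print(audio_bytes, text_bytes, ratio)  # prints "1920000 600 3200"
```

Under these assumptions the text form is smaller by a factor of several thousand, and replacing an explanation only requires transferring a few hundred bytes to the terminal.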

Description of drawings

Fig. 1 is a structural block diagram of the terminal device.

Fig. 2 is a structural block diagram of the PC server.

Fig. 3 is a flowchart of the PC server system.

Fig. 4 is a flowchart of the terminal device system.

Embodiment

This embodiment comprises the PC server shown in Fig. 2 and the terminal device shown in Fig. 1; as many terminal devices as needed can be provided, and each terminal device is assigned a terminal number.

The PC server comprises: a network interface unit, used for communication between the PC server and each terminal device; a pronunciation text library, used to store in advance the texts to be played and their corresponding terminal numbers; a speech recognition keyword library, used to store in advance the speech recognition keywords of each terminal device and their corresponding terminal numbers; a terminal monitoring and management unit, used to monitor and manage the online status and running state of each terminal device; and a terminal device database, used to store each terminal number, its status information, the current pronunciation text number, and the current recognition keyword number.
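How the two libraries might be bound to a terminal number and prepared for download can be sketched as follows. This is a minimal illustration under stated assumptions: the entry types, the `text_id` link from a keyword to the text it triggers, and the function `package_for_terminal` are all hypothetical and are not defined in the source.

```python
from typing import Dict, List, NamedTuple


class TextEntry(NamedTuple):
    """One entry of the pronunciation text library (hypothetical layout)."""
    text_id: int
    terminal_number: int
    text: str


class KeywordEntry(NamedTuple):
    """One entry of the speech recognition keyword library (hypothetical layout)."""
    keyword_id: int
    terminal_number: int
    keyword: str
    text_id: int  # assumed link to the text this keyword should trigger


def package_for_terminal(terminal_number: int,
                         texts: List[TextEntry],
                         keywords: List[KeywordEntry]) -> Dict[str, object]:
    """Collect the texts and keywords bound to one terminal number."""
    own_texts = {t.text_id: t.text for t in texts if t.terminal_number == terminal_number}
    own_keywords = {
        k.keyword: k.text_id
        for k in keywords
        if k.terminal_number == terminal_number and k.text_id in own_texts
    }
    return {"terminal_number": terminal_number, "texts": own_texts, "keywords": own_keywords}


texts = [TextEntry(1, 1, "Text for exhibit A"), TextEntry(2, 2, "Text for exhibit B")]
keywords = [KeywordEntry(1, 1, "exhibit a", 1), KeywordEntry(2, 2, "exhibit b", 2)]
package = package_for_terminal(1, texts, keywords)
```

Filtering by terminal number on the server means each terminal stores only its own texts and keywords, so changing an exhibit only requires editing the library entries and letting the affected terminal download them again.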

The terminal device comprises: a network communication unit, used for communication with the PC server; a storage unit, used to store the pronunciation texts and speech recognition keywords delivered by the network communication unit; a TTS pronunciation unit, used to perform speech synthesis on a pronunciation text and output a digital audio signal; a sound pickup unit, used to wait for voice commands issued by the user and to capture local speech; a speech recognition unit, used to model and recognize the speech captured by the sound pickup unit, compare the recognition result with the recognition entries, and then trigger the TTS pronunciation unit to perform speech synthesis; a trigger receiving unit, used to control the audio switching unit through a local trigger, so that either of the two modes, machine explanation or manual explanation, can be selected freely; a D/A conversion unit, used to convert the digital audio signal output by the TTS pronunciation unit into an analog speech signal; an audio switching unit, used to switch between the machine-synthesized analog speech signal and the analog speech signal from the narrator's microphone; and a power amplifier unit, used to amplify the power of the analog speech signal output by the audio switching unit and send it to a loudspeaker.
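The role of the trigger receiving unit and the audio switching unit can be sketched in software, although in the device described here the switching acts on analog signals. The names `AudioSource`, `select_source`, and `route_to_amplifier`, and the use of sample lists to stand in for signals, are assumptions made for illustration.

```python
from enum import Enum
from typing import List


class AudioSource(Enum):
    MACHINE = "machine explanation (TTS output)"
    MANUAL = "manual explanation (narrator's microphone)"


def select_source(trigger_received: bool) -> AudioSource:
    """A local trigger selects manual explanation; otherwise the machine speaks."""
    return AudioSource.MANUAL if trigger_received else AudioSource.MACHINE


def route_to_amplifier(trigger_received: bool,
                       tts_signal: List[float],
                       mic_signal: List[float]) -> List[float]:
    """Return the signal that the audio switching unit passes to the power amplifier."""
    if select_source(trigger_received) is AudioSource.MANUAL:
        return mic_signal
    return tts_signal


tts_signal = [0.1, 0.2, 0.3]   # placeholder samples for the synthesized speech
mic_signal = [0.5, 0.4]        # placeholder samples for the narrator's voice
output = route_to_amplifier(True, tts_signal, mic_signal)
```

Because the manual path bypasses speech recognition and synthesis entirely, it keeps working when those parts fail, which matches the emergency use described for manual explanation.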

Fig. 3 is a flowchart of the PC server system and Fig. 4 is a flowchart of the terminal device system; the two flows together constitute the digital bidirectional intelligent voice explanation method, which comprises the following steps:

Step 3: the sound pickup unit receives the narrator's voice control command;

Step 7: the terminal monitoring unit of the PC server monitors all terminal devices continuously, periodically sending a query command to each terminal device, waiting for the reply of the corresponding terminal device, and saving the online status and running state of that terminal device in the terminal device database;
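One polling round of the terminal monitoring unit can be sketched as below. The sketch assumes that a query either returns the terminal's running state as a string or returns `None` when no reply arrives in time; the function `poll_once`, the callback signature, the dictionary used as the database, and the choice to clear the state of an unreachable terminal are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Optional


def poll_once(terminal_numbers: Iterable[int],
              query: Callable[[int], Optional[str]],
              database: Dict[int, Dict[str, object]]) -> None:
    """Query each terminal once and record its online status and running state."""
    for number in terminal_numbers:
        reply = query(number)
        if reply is None:
            # No reply within the waiting period: treat the terminal as offline.
            database[number] = {"online": False, "state": None}
        else:
            database[number] = {"online": True, "state": reply}


def fake_query(number: int) -> Optional[str]:
    """Stand-in for the network exchange; terminal 3 does not answer."""
    replies = {1: "standby", 2: "explaining"}
    return replies.get(number)


database: Dict[int, Dict[str, object]] = {}
poll_once([1, 2, 3], fake_query, database)
```

In a real deployment this round would be repeated on a timer, and `fake_query` would be replaced by the actual exchange between the network interface unit and each terminal's network communication unit.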

The above is only a preferred embodiment of the present invention and is not intended to limit the present invention. Although the present invention has been described in detail with reference to the foregoing embodiment, a person skilled in the art can still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of their technical features. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A digital bidirectional intelligent voice explanation system comprising a PC server and terminal devices, characterized in that the PC server comprises: a network interface unit, used for communication between the PC server and each terminal device; a pronunciation text library, used to store in advance the texts to be played and their corresponding terminal numbers; a speech recognition keyword library, used to store in advance the speech recognition keywords of each terminal device and their corresponding terminal numbers; a terminal monitoring and management unit, used to monitor and manage the online status and running state of each terminal device; a terminal device database, used to store each terminal number, its status information, the current pronunciation text number, and the current recognition keyword number;

2. The digital bidirectional intelligent voice explanation system according to claim 1, characterized in that the running state of each terminal device comprises an error state, an explanation state, a recognition state, and a standby state.

3. A digital bidirectional intelligent voice explanation method, characterized in that it comprises the following steps:

Step 3: the sound pickup unit receives the narrator's voice control command;

Step 7: the terminal monitoring unit of the PC server monitors all terminal devices continuously, periodically sending a query command to each terminal device, waiting for the reply of the corresponding terminal device, and saving the online status and running state of that terminal device in the terminal device database.