CN100559368C

CN100559368C - Method for automatic production and playback of audible text

Info

Publication number: CN100559368C
Application number: CNB2004100280634A
Authority: CN
Inventors: 韦岗; 张军
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2004-07-14
Filing date: 2004-07-14
Publication date: 2009-11-11
Anticipated expiration: 2024-07-14
Also published as: CN1595397A

Abstract

The invention relates to a method for automatically producing audible text, comprising: text segmentation, in which the text is cut into several segments of suitable size according to features of the text, with every cut point falling at the end of a sentence; speech synchronization point detection, in which the speech synchronization point corresponding to each text cut point is found; speech segmentation, in which the whole speech recording is cut at the speech synchronization points into segments corresponding to the text segments; and segmented compression and storage of speech and text, in which, starting from the first synchronized text segment and speech segment, each synchronized text-speech pair is compressed in order with a text compression algorithm and a speech compression algorithm respectively, arranged according to the audible text format, and the file header is filled in. The invention allows a machine, without human intervention, to automatically display text and play natural speech in synchrony at any sentence, paragraph, or chapter.

Description

Method for automatic production and playback of audible text

Technical field

The invention relates to an electronic medium and to methods for producing and playing it, and in particular to a method for automatically producing and playing audible text.

Background technology

Text is one of the main means by which people obtain information. To this day, books, newspapers, and magazines, which convey information mainly through text, remain the most common tools for acquiring knowledge and information. As living standards rise, however, plain text alone no longer fully meets people's needs. In many situations people want not only to see the text but also to "listen" to its content, for example when studying a foreign language, when tired of reading, or when both hands are busy with something else. What is needed then is a device that can display text and can also begin playing the content of the text from any position.

The difficulty in building such a device lies in displaying the text and playing the speech in synchrony, efficiently, from an arbitrary position. Many existing electronic devices can play the content of a text, such as some e-book readers and PDAs, but they generally handle text-speech synchronization crudely. The most common approach is to record a passage of natural speech specifically for the text and, when the content is to be played, simply play the whole corresponding recording. This is very simple to produce but clearly inconvenient to use: to hear the speech for one part of the text, the user must play the recording from the beginning until the desired part is reached. In more demanding applications, the natural speech is sometimes indexed manually: the synchronization points between speech and text are located by hand and their positions recorded, and when a given passage is to be played, the index is used to find the synchronization point in the speech and playback starts there. Although this achieves fairly fine synchronization, producing the index takes a great deal of labor and time, so it is inefficient and slow; moreover, the space needed to store the index grows sharply as synchronization precision increases, so the approach is hard to deploy widely. Some newer techniques use speech synthesis to generate the speech for a text by machine. Because the speech is generated from the text, synchronized playback poses no problem, but machine-generated speech lacks naturalness and has none of the cadence and emotion of human language, which greatly reduces the pleasure of "listening", so it too is unlikely to be widely accepted.

Summary of the invention

In view of the shortcomings of the prior art in synchronizing speech and text, the invention provides a new electronic medium, audible text, together with methods for automatically producing and playing it. Audible text is a passage made up of plain text and the corresponding natural-speech narration. With the production and playback methods provided by the invention, a machine can, without human intervention, automatically display text and play natural speech in synchrony at any sentence, paragraph, or chapter.

The audible text of the invention has a fixed format, shown in Fig. 1. The file is divided into two parts, a header and a data part. The header contains information such as the audible text identifier, the file length, the number of segments, and the segment address table for text and speech. In the data part, the text and speech are divided into synchronized segments that are compressed and stored in sequential order; each segment contains the compressed text data and the corresponding compressed speech data.
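The layout described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the patent does not give field widths, byte order, an identifier value, or specific compression algorithms, so the `ATXT` identifier, the 32-bit little-endian fields, the per-segment address entry (offset, text length, speech length), and the use of zlib for both text and speech are all hypothetical choices.

```python
import struct
import zlib

MAGIC = b"ATXT"  # hypothetical audible-text identifier (not from the patent)


def build_audible_text(pairs):
    """Build a file image from synchronized (text, speech_bytes) pairs."""
    blobs = [(zlib.compress(t.encode("utf-8")), zlib.compress(s)) for t, s in pairs]
    # Header: identifier, file length, segment count, one address entry per segment.
    header_size = 4 + 4 + 4 + len(blobs) * 12
    table, data, offset = [], b"", header_size
    for text_blob, speech_blob in blobs:
        table.append((offset, len(text_blob), len(speech_blob)))
        data += text_blob + speech_blob  # segments stored in sequential order
        offset += len(text_blob) + len(speech_blob)
    # After the loop, offset equals the total file length.
    header = MAGIC + struct.pack("<II", offset, len(blobs))
    for entry in table:
        header += struct.pack("<III", *entry)
    return header + data


def read_segment(blob, index):
    """Decompress only the requested segment, using the address table."""
    if blob[:4] != MAGIC:
        raise ValueError("not an audible-text file")
    _file_len, count = struct.unpack_from("<II", blob, 4)
    if not 0 <= index < count:
        raise IndexError(index)
    off, text_len, speech_len = struct.unpack_from("<III", blob, 12 + index * 12)
    text = zlib.decompress(blob[off:off + text_len]).decode("utf-8")
    speech = zlib.decompress(blob[off + text_len:off + text_len + speech_len])
    return text, speech
```

Because every segment has its own entry in the address table, a player can jump straight to one segment and decompress it without touching the rest of the file, which is the property the segment-synchronized storage is meant to provide.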

Given a text and the natural speech corresponding to it, the method of the invention for automatically producing audible text comprises the following steps:

Step 1: Text segmentation. A long text is cut into several segments of suitable size according to features of the text (such as chapters and paragraphs), with every cut point falling at the end of a sentence. If the text has no obvious features, or a paragraph is too long, the ends of selected sentences may be chosen as cut points to divide it into parts of suitable length. A short text need not be segmented.

Step 2: Speech synchronization point detection. After the text is segmented, the speech synchronization point corresponding to each text cut point is found. Detection may be manual or automatic. The automatic detection process provided by the inventors consists of three stages: endpoint detection, keyword recognition, and synchronization point determination. Endpoint detection finds the endpoints of each sentence in the speech and roughly estimates the approximate position of the synchronization point, which narrows the search range of the keyword recognition stage and speeds up synchronization. Keyword recognition identifies certain specific words in the speech, providing more precise information for locating the synchronization point. Synchronization point determination uses information such as the sentence endpoints, the distribution of keywords, and sentence durations to find the speech endpoint that best matches the synchronization point in the text.

Step 3: Speech segmentation. The whole speech recording is cut at the speech synchronization points into segments corresponding to the text segments.

Step 4: Segmented compression and storage of speech and text. Starting from the first synchronized text segment and speech segment, each synchronized text-speech pair is compressed in order, using a text compression algorithm and a speech compression algorithm respectively, and arranged according to the audible text format described above; the file header is then filled in.
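Steps 1 and 3 can be illustrated with a minimal sketch. It assumes that paragraphs are separated by blank lines, that sentence ends are marked by the punctuation listed in the pattern, and that speech is a plain sequence of samples; the function names and the `max_chars` limit are hypothetical and do not come from the patent.

```python
import re

# Assumed sentence terminators (Western and Chinese); trailing spaces stay with the sentence.
SENTENCE_END = re.compile(r"[.!?。！？]+\s*")


def split_sentences(text):
    """Split text into sentences, each kept with its closing punctuation."""
    sentences, start = [], 0
    for match in SENTENCE_END.finditer(text):
        sentences.append(text[start:match.end()])
        start = match.end()
    if start < len(text):
        sentences.append(text[start:])  # trailing text without a terminator
    return sentences


def segment_text(text, max_chars=2000):
    """Step 1: cut at paragraph boundaries; split long paragraphs at sentence ends."""
    segments = []
    for paragraph in text.split("\n\n"):
        if not paragraph.strip():
            continue
        current = ""
        for sentence in split_sentences(paragraph):
            if current and len(current) + len(sentence) > max_chars:
                segments.append(current)
                current = ""
            current += sentence
        if current:
            segments.append(current)
    return segments


def split_speech(samples, sample_rate, sync_points):
    """Step 3: cut the sample sequence at the sync points (given in seconds)."""
    bounds = [0] + [int(round(t * sample_rate)) for t in sync_points] + [len(samples)]
    return [samples[a:b] for a, b in zip(bounds, bounds[1:])]
```

Every cut made by `segment_text` falls after a sentence terminator, so a segment never ends mid-sentence. The simple pattern would also split at a decimal point such as "3.14", so a real implementation would need a more careful sentence detector.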

The method of the invention for playing audible text comprises the following steps:

1. Decompression and display of text. When an audible text is opened, the required text segment is decompressed and one screen of text is displayed. When the user pages into another compressed text segment, that segment is decompressed and the required text is displayed.

2. Synchronized playback of speech. When an audible text is opened, the speech is not decompressed and played immediately. After a command to play speech is received, processing proceeds as follows:

---Decompress the speech corresponding to the currently displayed text.

---Determine which sentence of the text the cursor is currently on, take the start of that sentence and the start of the first sentence of the next screen as the required synchronization start and end points, and perform speech synchronization point detection in the current or next speech segment. The detection process comprises: (1) endpoint detection, which finds the endpoints of each sentence in the speech and roughly estimates the approximate position of the synchronization point in the speech; (2) keyword recognition, which identifies certain specific words in the speech; and (3) synchronization point determination, which uses information such as the sentence endpoints, the distribution of keywords, and sentence durations to find the speech endpoint that best matches the synchronization point in the text.

---Play the speech starting from the synchronization start point. If no stop command has been received by the time playback reaches the synchronization end point, display the next screen of text and at the same time look for a new synchronization end point.
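The choice of synchronization start and end points in the playback steps above might be sketched as follows. The sketch assumes the player keeps a sorted list of the character offsets at which sentences start within the current text segment, beginning with 0; the function names are hypothetical. When the next screen begins in the middle of a sentence, the sketch simply picks the sentence containing the first character of that screen, which is an assumption rather than something the patent specifies.

```python
import bisect


def sentence_index_at(sentence_starts, position):
    """Index of the sentence that contains the given character offset."""
    return bisect.bisect_right(sentence_starts, position) - 1


def playback_range(sentence_starts, cursor, next_screen_start):
    """Return (start_sentence, end_sentence) indices to be synchronized.

    start_sentence: the sentence under the cursor.
    end_sentence:   the sentence containing the first character of the next screen.
    """
    first = sentence_index_at(sentence_starts, cursor)
    last = sentence_index_at(sentence_starts, next_screen_start)
    return first, last
```

The two indices would then be passed to speech synchronization point detection, which only needs to search the current or next speech segment for the matching speech endpoints.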

Compared with the prior art, the advantages of the invention include:

1. With the audible text provided by the invention and its production and playback methods, synchronized display of text and playback of natural speech at any sentence, paragraph, or chapter can be carried out entirely automatically by machine, with no human involvement. This greatly improves the efficiency and speed of synchronization and makes the approach suitable for large-scale use.

2. The text and speech in an audible text are stored in segment-synchronized form, which both reduces the space needed to store the index and ensures sufficiently fast synchronization during playback.

3. With the production method provided by the invention, the narration accompanying a text can easily be replaced, better accommodating users' personal preferences and habits.

Description of drawings

Fig. 1 is a schematic diagram of the audible text format of the invention;

Fig. 2 is a schematic diagram of the playback circuit implementing the audible text playback method of the invention;

Fig. 3 is a flow chart of the automatic audible text production method of the invention;

Fig. 4 is a block diagram of the audible text playback program of the invention;

Fig. 5 is a flow chart of the speech synchronization point detection used in the automatic production method and the playback method of the invention.

Embodiment

The automatic production and playback methods of the invention can be implemented effectively on any electronic device that has a display, an audio playback device, input/output interfaces, and sufficient computing power and storage, such as a computer or an e-book reader or PDA with these features. One embodiment of the invention is described below, taking a typical playback device with these features as an example.

As shown in Fig. 2, the playback circuit implementing the audible text playback method consists of a microcontroller circuit, a keyboard interface circuit, a computer interface circuit, a liquid crystal display circuit, a memory circuit, a speech playback circuit, and a decoding circuit. IC1 is the processor chip, IC2 the keyboard interface chip, IC3 the computer interface chip, IC4 the control chip of the LCD module, IC5 the decoder chip, IC6 a large-capacity flash memory, and IC7 a D/A converter. The microprocessor is the core processing unit of the whole system and mainly performs the following functions: 1) displaying the operating interface and text; 2) decoding keyboard input and handling it accordingly; 3) communicating and interacting with the computer; 4) controlling the speech playback circuit to play speech; 5) compressing and decompressing text and speech; 6) synchronizing text and speech; 7) coordinating the modules so that they work in step. The keyboard interface circuit, computer interface circuit, liquid crystal display circuit, and memory circuit handle communication and control between the microprocessor and the keyboard, the computer, the LCD screen, and the memory, respectively. The speech playback circuit plays the speech signal output by the microprocessor. The decoding circuit provides chip-select signals for the peripheral chips.

The player has two operating modes, online and offline. Online mode is the mode used when the player is connected to a computer; in this mode the companion computer software is mainly used to produce audible text and to load audible text, text, and speech onto the player. Offline mode is the normal operating mode, in which the player displays text or plays speech on its own, or uses audible text to display text and play speech in synchrony. Corresponding to these modes, the software is divided into two parts. One part is the computer software that accompanies the player; its main program block diagram is shown in Fig. 3, where the dashed box contains the flow chart of the automatic audible text production method of the invention. The other part is the software that runs on the player; its main program block diagram is shown in Fig. 4, where the dashed box contains the block diagram of the audible text playback program of the invention.

As shown in Fig. 3, the companion computer software works as follows. After the program starts, it first performs initialization and then waits for user instructions. The main operations available to the user are uploading data (transferring data to the player) and producing audible text. When the user requests an upload, the software first checks whether the player is connected to the computer; if it is, the text, speech, or audible text is transferred normally, and otherwise a message is shown indicating that the computer is not connected to the player. When the user requests audible text production, the computer prompts the user to supply the text and the corresponding natural-speech narration file, and then produces the audible text according to the method provided by the invention.

As shown in Fig. 4, the player software works as follows. After power-on, the player first enters offline mode, waits for key commands, and monitors the connection state. The display shows a directory of the stored files, with text, speech, and audible text files marked by different symbols. When the player detects that it is connected to a computer, it enters online mode. In offline mode, the main operations available to the user are displaying text, playing speech, and using audible text to display text and play speech in synchrony, along with operations that control display and playback, such as moving up, down, left, and right, page up, page down, play speech, stop playing speech, confirm, and exit. If the user simply wants to display a text file or play a speech file, the corresponding text or speech is decompressed and then displayed or played. If the user opens an audible text, it is played according to the method provided by the invention. In online mode, the offline operations are unavailable, and the player mainly loads text, speech, or audible text through the companion computer software.

Speech synchronization point detection is the key technique used in both the production and the playback of audible text, and is the core of the invention. The two processes use essentially the same method. The main difference is that during production the whole speech recording is searched, which is slower and takes a long time, whereas during playback only the current or next speech segment is searched, which is fast and takes very little time. In this embodiment, the flow chart of the speech synchronization point detection program is shown in Fig. 5, and the specific steps are as follows.

(1) Endpoint detection: a) Analyze the text for places where silence is likely to occur, such as commas, pause marks, and periods. b) Detect silence in the corresponding speech and search for sentence endpoints, mainly using endpoint detection techniques from speech recognition. c) Compare the silence positions estimated from the text with those detected in the speech to roughly determine the position of the current sentence in the speech, keeping a search range wide enough to guarantee that the required sentence lies within it, so that the next step can find the correct result.

(2) Keyword recognition: a) Find the sequence of keywords in the text near the sentence to be synchronized. This step uses a predefined keyword dictionary: the keywords that occur in the current text are looked up in the dictionary to determine which keywords appear near the sentence to be synchronized and in what order. b) Determine the set of speech models corresponding to the keywords that occur. This is done mainly by looking up a library of pre-trained speech models corresponding to the keyword dictionary. c) Find the keywords within the search range, mainly using keyword recognition techniques from speech recognition.

(3) Synchronization point determination: a) Analyze the characteristics of the sentence to be synchronized and the nearby text, mainly sentence length, the distribution of keywords, and the lengths of the intervals between them. b) Using information such as the sentence endpoints, the distribution of keywords, and sentence durations, find the speech endpoint that best matches the synchronization point in the text.
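The endpoint detection stage and a much-simplified form of synchronization point determination can be sketched as follows. This is not the inventors' algorithm: it swaps in a plain short-time-energy silence detector and a constant-speaking-rate estimate, and it ignores the keyword evidence entirely. Samples are assumed to be floats in [-1, 1], and the function names, threshold, frame size, and search window are hypothetical values chosen for illustration.

```python
def find_silences(samples, sample_rate, frame_ms=20, threshold=0.01, min_silence_ms=200):
    """Return (start_sec, end_sec) spans whose mean frame energy stays below threshold."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    min_frames = max(1, min_silence_ms // frame_ms)
    n_frames = len(samples) // frame_len
    silences, run_start = [], None
    for i in range(n_frames + 1):
        if i < n_frames:
            frame = samples[i * frame_len:(i + 1) * frame_len]
            quiet = sum(x * x for x in frame) / frame_len < threshold
        else:
            quiet = False  # sentinel frame that closes any open silent run
        if quiet and run_start is None:
            run_start = i
        elif not quiet and run_start is not None:
            if i - run_start >= min_frames:
                silences.append((run_start * frame_len / sample_rate,
                                 i * frame_len / sample_rate))
            run_start = None
    return silences


def pick_sync_points(cut_offsets, total_chars, total_seconds, endpoints, window=5.0):
    """For each text cut point, pick the detected endpoint nearest its expected time.

    The expected time assumes a roughly constant speaking rate, so that position
    in the text is proportional to position in the speech.
    """
    sync_points = []
    for offset in cut_offsets:
        expected = total_seconds * offset / total_chars
        candidates = [t for t in endpoints if abs(t - expected) <= window]
        if candidates:
            sync_points.append(min(candidates, key=lambda t: abs(t - expected)))
        else:
            sync_points.append(None)  # nothing in range; the search would need widening
    return sync_points
```

In a fuller implementation the candidate endpoints inside the window would be ranked using the recognized keywords and sentence durations, as the steps above describe, rather than by distance from the expected time alone.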

Claims

1. A method for automatically producing audible text, characterized by comprising the following steps:

Step 1: text segmentation, in which the text is cut into several segments of suitable size according to features of the text, with every cut point falling at the end of a sentence;

Step 2: speech synchronization point detection, in which the speech synchronization point corresponding to each text cut point is found;

Step 3: speech segmentation, in which the whole speech recording is cut at the speech synchronization points into segments corresponding to the text segments;

Step 4: segmented compression and storage of speech and text, in which, starting from the first synchronized text segment and speech segment, each synchronized text-speech pair is compressed in order, using a text compression algorithm and a speech compression algorithm respectively, and arranged according to the audible text format, and the file header is filled in;

wherein the format of the audible text is as follows: the file is divided into two parts, a header and a data part; the header contains information such as the audible text identifier, the file length, the number of segments, and the segment address table for text and speech; in the data part, the text and speech are divided into synchronized segments that are compressed and stored in sequential order, each segment containing the compressed text data and the corresponding compressed speech data;

and wherein the speech synchronization point detection of Step 2 comprises: (1) endpoint detection, which finds the endpoints of each sentence in the speech and roughly estimates the approximate position of the synchronization point in the speech; (2) keyword recognition, which identifies certain specific words in the speech; and (3) synchronization point determination, which uses information such as the sentence endpoints, the distribution of keywords, and sentence durations to find the speech endpoint that best matches the synchronization point in the text.

2. A method for playing audible text, characterized by comprising the following steps:

Step 1: decompression and display of text, in which, when an audible text is opened, the required text segment is decompressed and one screen of text is displayed; and when the user pages into another compressed text segment, that segment is decompressed and the required text is displayed;

Step 2: synchronized playback of speech, in which, when an audible text is opened, the speech is not decompressed and played immediately, and after a command to play speech is received, processing proceeds as follows:

---decompressing the speech corresponding to the currently displayed text;

---determining which sentence of the text the cursor is currently on, taking the start of that sentence and the start of the first sentence of the next screen as the required synchronization start and end points, and performing speech synchronization point detection in the current or next speech segment;

---playing the speech starting from the synchronization start point and, if no stop command has been received by the time playback reaches the synchronization end point, displaying the next screen of text while looking for a new synchronization end point;

wherein the speech synchronization point detection comprises: (1) endpoint detection, which finds the endpoints of each sentence in the speech and estimates the approximate position of the synchronization point in the speech; (2) keyword recognition, which identifies certain specific words in the speech; and (3) synchronization point determination, which uses information such as the sentence endpoints, the distribution of keywords, and sentence durations to find the speech endpoint that best matches the synchronization point in the text.