CN100559368C - Method for the automatic production and playback of audible text - Google Patents


Info

Publication number
CN100559368C
Authority
CN
China
Prior art keywords
text
voice
synchronous
points
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2004100280634A
Other languages
Chinese (zh)
Other versions
CN1595397A (en)
Inventor
韦岗
张军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CNB2004100280634A priority Critical patent/CN100559368C/en
Publication of CN1595397A publication Critical patent/CN1595397A/en
Application granted granted Critical
Publication of CN100559368C publication Critical patent/CN100559368C/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Electrically Operated Instructional Devices (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The present invention relates to a method for automatically producing audible text, comprising: text segmentation, in which the text is divided into several segments of suitable size according to its features, with every cut point falling at the end of a sentence; speech synchronization point detection, in which the speech synchronization point corresponding to each text cut point is found; speech segmentation, in which the whole speech recording is cut at the synchronization points into segments corresponding to the text segments; and segmented compressed storage of speech and text, in which, starting from the first synchronized pair, each synchronized text-speech pair is compressed in order with a text compression algorithm and a speech compression algorithm respectively and arranged according to the audible text format, after which the file header is filled in. Without human intervention, the invention lets a machine automatically display text and play natural speech in synchrony at any sentence, paragraph or chapter.

Description

Method for the automatic production and playback of audible text
Technical field
The present invention relates to an electronic medium and to methods for producing and playing it, and in particular to a method for the automatic production and playback of audible text.
Background
Text is one of the main means by which people obtain information. To this day, books, newspapers and magazines, which convey information mainly through text, remain the most common tools for acquiring knowledge and information. As living standards rise, however, plain text no longer fully meets people's needs. In many situations people want not only to read a text but also to listen to its content, for example when studying a foreign language, when tired of reading, or when both hands are busy with something else. What is needed then is a device that can display text and also start playing the spoken content from any position.
The difficulty in building such a device lies in effectively displaying text and playing speech in synchrony at an arbitrary position. Many existing electronic devices can play the content of a text, such as some e-book readers and PDAs, but they usually handle text-speech synchronization crudely. The most common approach is to record a passage of natural speech for the text and, when playback is requested, play the whole corresponding recording. This is simple to produce but clearly inconvenient to use: to hear the speech for one part of the text, the user has to play the whole recording from the beginning until that part is reached. In more demanding applications, natural speech is sometimes indexed manually: the synchronization points between speech and text are located by hand and their positions recorded, and when a passage is to be played, the index is used to find the synchronization point in the speech and start playback there. Although this achieves fairly fine synchronization, producing it takes a great deal of labor and time, so it is inefficient and slow; moreover, as synchronization precision increases, the space needed to store the index grows sharply, so the approach is hard to popularize. Some newer techniques use speech synthesis to generate the speech for a text by machine. Because the speech is generated from the text, synchronized playback poses no problem, but machine-generated speech lacks naturalness and has none of the cadence and emotion of human language, which greatly reduces the pleasure of listening, so it too is hard for people to accept widely.
Summary of the invention
In view of the shortcomings of the prior art in synchronizing speech and text, the invention provides a new electronic medium, audible text, together with methods for its automatic production and playback. Audible text is a medium composed of plain text and its corresponding natural-speech soundtrack. With the production and playback methods provided by the invention, a machine can, without human intervention, automatically display text and play natural speech in synchrony at any sentence, paragraph or chapter.
The audible text of the invention has a fixed format, shown in Fig. 1. The file is divided into two parts, a header and data. The header contains information such as the audible text identifier, the file length, the number of segments, and the segment address table for text and speech. In the data part, text and speech are divided into synchronized segments and stored compressed in sequential order; each segment contains the corresponding compressed text data and compressed speech data.
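The header-plus-data layout described above can be illustrated with a minimal Python sketch. The patent does not specify field widths, byte order, the identifier value or the compression algorithms, so everything concrete here is an assumption made for illustration: the 4-byte marker `AUDT`, the little-endian 32-bit fields, the four-field address table rows, and the use of `zlib` as a stand-in for both the text and the speech codec.

```python
import struct
import zlib

MAGIC = b"AUDT"  # hypothetical audible-text identifier (not from the patent)

def pack_audible_text(segments):
    """Build a file image from a list of synchronized (text, speech_bytes) pairs."""
    # zlib stands in for the unspecified text and speech compression algorithms.
    blobs = [(zlib.compress(t.encode("utf-8")), zlib.compress(s)) for t, s in segments]
    # Header: identifier, file length, segment count, then one table row per segment.
    header_size = 4 + 4 + 4 + 16 * len(blobs)
    table, data, offset = b"", b"", header_size
    for text_blob, speech_blob in blobs:
        # Row: text offset, text length, speech offset, speech length.
        table += struct.pack("<IIII", offset, len(text_blob),
                             offset + len(text_blob), len(speech_blob))
        data += text_blob + speech_blob
        offset += len(text_blob) + len(speech_blob)
    total = header_size + len(data)
    return MAGIC + struct.pack("<II", total, len(blobs)) + table + data

def read_segment(blob, index):
    """Decompress only segment `index`, located through the address table."""
    if blob[:4] != MAGIC:
        raise ValueError("not an audible-text file")
    _total, count = struct.unpack_from("<II", blob, 4)
    if not 0 <= index < count:
        raise IndexError(index)
    t_off, t_len, s_off, s_len = struct.unpack_from("<IIII", blob, 12 + 16 * index)
    text = zlib.decompress(blob[t_off:t_off + t_len]).decode("utf-8")
    speech = zlib.decompress(blob[s_off:s_off + s_len])
    return text, speech
```

Because each segment is addressed through the table, a player built along these lines would only need to decompress the segment being displayed or played rather than the whole file.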
Given a text and its corresponding natural speech, the automatic production method of the invention comprises the following steps:
Step 1: Text segmentation. A long text is divided into several segments of suitable size according to features of the text (such as chapters and paragraphs), with each cut point at the end of a sentence. If the text has no obvious features, or a paragraph is too long, the ends of selected sentences can be used as cut points to divide it into parts of moderate length. A short text need not be segmented.
Step 2: Speech synchronization point detection. After the text is segmented, the speech synchronization point corresponding to each text cut point is found. Detection can be done manually or automatically. The automatic detection process provided by the inventors has three stages: endpoint detection, keyword recognition and synchronization point determination. Endpoint detection finds the endpoints of each sentence in the speech and roughly estimates the position of the synchronization point, which narrows the search range of the keyword recognition stage and speeds up synchronization. Keyword recognition identifies certain specific words in the speech, providing more accurate information for locating the synchronization point. Synchronization point determination uses information such as sentence endpoints, keyword distribution and sentence duration to find the speech endpoint that best matches the synchronization point in the text.
Step 3: Speech segmentation. The whole speech recording is cut at the speech synchronization points into segments corresponding to the text segments.
Step 4: Segmented compressed storage of speech and text. Starting from the first synchronized pair, each synchronized text-speech pair is compressed in order, using a text compression algorithm and a speech compression algorithm respectively, and arranged according to the audible text format described above; the file header is then filled in.
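Step 1 of the method above can be sketched in Python as follows. This is only an illustration under stated assumptions: blank lines are taken as the paragraph feature, the sentence-ending punctuation set and the `max_chars` size limit are hypothetical choices, and the patent itself does not prescribe any particular splitting rule beyond cutting at sentence ends.

```python
import re

# A sentence is a run of text up to and including its closing punctuation
# (Western or Chinese) plus trailing whitespace, or a final unterminated run.
SENTENCE = re.compile(r"[^.!?。！？]*[.!?。！？]+\s*|[^.!?。！？]+")

def split_text(text, max_chars=400):
    """Split text into segments whose boundaries always fall at sentence ends."""
    segments = []
    # Paragraph breaks (blank lines) are treated as natural cut points.
    for paragraph in re.split(r"\n\s*\n", text):
        if not paragraph.strip():
            continue
        current = ""
        for sentence in SENTENCE.findall(paragraph):
            # Close the current segment before it would exceed the size limit.
            if current and len(current) + len(sentence) > max_chars:
                segments.append(current)
                current = ""
            current += sentence
        if current:
            segments.append(current)
    return segments
```

A short text that fits within `max_chars` comes back as a single segment, which matches the remark that short texts need not be segmented. A single sentence longer than the limit is kept whole rather than cut in the middle.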
The playback method for audible text of the invention comprises the following steps:
1. Decompressing and displaying text. When an audible text is opened, the required text segment is decompressed and one screen of text is displayed. When the user pages into another compressed text segment, that segment is decompressed and the required text is displayed.
2. Synchronized playback of speech. When an audible text is opened, the speech is not decompressed and played immediately. After a command to play speech is received, processing proceeds as follows:
---Decompress the speech corresponding to the currently displayed text.
---Determine which sentence of the text the cursor is on. The start of that sentence and the start of the first sentence on the next screen are taken as the required synchronization start and end points, and speech synchronization point detection is carried out in the current or next speech segment. Detection comprises (1) endpoint detection, which finds the endpoints of each sentence in the speech and roughly estimates the position of the synchronization point; (2) keyword recognition, which identifies certain specific words in the speech; and (3) synchronization point determination, which uses information such as sentence endpoints, keyword distribution and sentence duration to find the speech endpoint that best matches the synchronization point in the text.
---Play the speech from the synchronization start point. If no stop command has been received by the time playback reaches the synchronization end point, display the next screen of text and at the same time look for a new synchronization end point.
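The choice of synchronization start and end described in the steps above can be sketched in Python. The representation is an assumption: sentences are identified by the character offsets at which they begin, and "the first sentence of the next screen" is interpreted as the first sentence that starts at or after the next screen's first character. The function and parameter names are hypothetical.

```python
from bisect import bisect_left, bisect_right

def sync_range(sentence_starts, cursor, next_screen_start):
    """Return (start, end) sentence indices bounding the speech to play.

    sentence_starts   -- sorted character offsets where each sentence begins
    cursor            -- character offset of the cursor
    next_screen_start -- character offset of the first character on the next screen
    """
    # The sentence containing the cursor is the last one starting at or before it.
    start = max(bisect_right(sentence_starts, cursor) - 1, 0)
    # The end is the first sentence beginning on the next screen; an index equal
    # to len(sentence_starts) means playback runs to the end of the text.
    end = bisect_left(sentence_starts, next_screen_start)
    return start, max(end, start + 1)
```

The two sentence indices would then be handed to speech synchronization point detection, which maps them to positions in the current or next speech segment.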
Compared with the prior art, the advantages of the invention include:
1. With the audible text and the production and playback methods provided by the invention, synchronized display of text and playback of natural speech at any sentence, paragraph or chapter can be achieved fully automatically by machine, without manual involvement. This greatly improves synchronization efficiency and reduces the time required, making the approach suitable for large-scale use.
2. Text and speech in an audible text are stored as synchronized segments, which both reduces the space needed for the index and keeps synchronization fast enough during playback.
3. With the production method provided by the invention, the soundtrack of a text can be changed easily, which better accommodates users' personal preferences and habits.
Description of drawings
Fig. 1 is a schematic diagram of the audible text format of the invention;
Fig. 2 is a schematic diagram of the playback circuit that implements the playback method for audible text;
Fig. 3 is a flowchart of the automatic production method for audible text;
Fig. 4 is a block diagram of the playback program for audible text;
Fig. 5 is a flowchart of the speech synchronization point detection used in the automatic production method and the playback method.
Embodiment
The automatic production and playback methods of the invention can be implemented effectively on any electronic device that has a display, an audio playback device, input/output interfaces and sufficient computing power and storage, such as a computer or an e-book reader or PDA with these features. One embodiment of the invention is described below, taking a typical playback device with these features as an example.
As shown in Fig. 2, the playback circuit that implements the playback method consists of a microcontroller circuit, a keyboard interface circuit, a computer interface circuit, a liquid crystal display circuit, a memory circuit, a speech playback circuit and a decoding circuit. IC1 is the processor chip, IC2 the keyboard interface chip, IC3 the computer interface chip, IC4 the control chip of the LCD module, IC5 the decoder chip, IC6 a large-capacity flash memory and IC7 a D/A converter. The microprocessor is the core processing unit of the whole system and performs the following functions: 1) displaying the operating interface and text; 2) decoding keyboard input and handling it accordingly; 3) communicating and interacting with the computer; 4) controlling the speech playback circuit to play speech; 5) compressing and decompressing text and speech; 6) synchronizing text and speech; 7) coordinating the work of all modules. The keyboard interface circuit, computer interface circuit, liquid crystal display circuit and memory circuit handle communication and control between the microprocessor and the keyboard, computer, LCD screen and memory respectively. The speech playback circuit plays the speech signal output by the microprocessor. The decoding circuit provides chip-select signals for the peripheral chips.
The player has two working modes, online and offline. Online mode is the mode in which the player is connected to a computer; in this mode the companion computer software is mainly used to produce audible text and to load audible text, text and speech onto the player. Offline mode is the normal working mode, in which the player displays text or plays speech on its own, or uses audible text to display text and play speech in synchrony. Corresponding to the two modes, the software is divided into two parts. One part is the computer software that accompanies the player; its main program block diagram is shown in Fig. 3, where the dashed box contains the flowchart of the automatic production method. The other part is the software that runs on the player; its main program block diagram is shown in Fig. 4, where the dashed box contains the block diagram of the playback program.
As shown in Fig. 3, the companion computer software works as follows. After the program starts, it first performs initialization and then waits for user instructions. The main operations available to the user are uploading data (transferring data to the player) and producing audible text. When the user requests an upload, the program first checks whether the player is connected to the computer. If it is, the text, speech or audible text is transferred normally; otherwise the program reports that the computer is not connected to the player. When the user requests audible text production, the computer prompts the user to supply the text and the corresponding natural-speech soundtrack file, and then produces the audible text according to the method provided by the invention.
As shown in Fig. 4, the player software works as follows. After power-on, the player first enters offline mode, waits for key commands and monitors the connection state. The display shows a directory of the stored files, with text, speech and audible text marked by different symbols. When the player detects that it has been connected to a computer, it enters online mode. In offline mode, the main operations available to the user are displaying text, playing speech, and using audible text to display text and play speech in synchrony. There are also operations that control display and playback, such as moving up, down, left and right, page up, page down, play speech, stop playing, confirm and exit. If the user only wants to display a text file or play a speech file, the corresponding text or speech is decompressed and then displayed or played. If the user opens an audible text, it is played according to the method provided by the invention. In online mode the offline operations are unavailable, and the player mainly loads text, speech or audible text through the companion computer software.
Speech synchronization point detection is the key technology involved in both producing and playing audible text, and is the core of the invention. The two processes use essentially the same method. The main difference is that production searches the entire speech recording, which is slower and takes longer, whereas playback searches only the current or next speech segment, which is fast and takes very little time. In this embodiment the program flow of speech synchronization point detection is shown in Fig. 5, and the steps are as follows.
(1) Endpoint detection: 1. Analyze where silence is likely to occur in the text, such as at commas, enumeration commas and periods. 2. Detect the silences in the corresponding speech and search for sentence endpoints, mainly using endpoint detection techniques from speech recognition. 3. Compare the silence positions estimated from the text with those detected in the speech, roughly determine the position of the current sentence in the speech, and keep a search range wide enough to guarantee that the required sentence lies within it, so that the next step can find the correct result.
(2) Keyword recognition: 1. Find the sequence of keywords in the text near the sentence to be synchronized. This step uses a predefined keyword dictionary: the keywords appearing in the current text are looked up in the dictionary to determine which keywords occur near the sentence to be synchronized and in what order. 2. Determine the set of speech models corresponding to the keywords that occur, mainly by lookup in a pre-trained speech model library that corresponds to the keyword dictionary. 3. Find the keywords within the search range, mainly using keyword recognition techniques from speech recognition.
(3) Synchronization point determination: 1. Analyze the characteristics of the sentence to be synchronized and of the text near it, mainly sentence length, the distribution of keywords and the length of the intervals between them. 2. Using information such as sentence endpoints, keyword distribution and sentence duration, find the speech endpoint that best matches the synchronization point in the text.
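The endpoint detection stage and a much-simplified form of synchronization point determination can be sketched in Python. This is a rough illustration, not the patented procedure: silence is detected with a plain short-time energy threshold, the expected position of the cut point is estimated from text proportions only, and keyword recognition is omitted entirely. The function names, the frame length, the thresholds and the search window are all hypothetical values chosen for the example.

```python
def find_silences(samples, rate, frame_ms=20, threshold=0.02, min_silence_ms=200):
    """Return (start_sec, end_sec) spans where short-time RMS energy stays low."""
    frame = max(1, rate * frame_ms // 1000)          # samples per frame
    min_frames = max(1, min_silence_ms // frame_ms)  # shortest pause to report
    n_frames = len(samples) // frame
    silences, run_start = [], None
    # One extra iteration acts as a loud sentinel that flushes a trailing pause.
    for i in range(n_frames + 1):
        quiet = False
        if i < n_frames:
            chunk = samples[i * frame:(i + 1) * frame]
            rms = (sum(x * x for x in chunk) / frame) ** 0.5
            quiet = rms < threshold
        if quiet and run_start is None:
            run_start = i
        elif not quiet and run_start is not None:
            if i - run_start >= min_frames:
                silences.append((run_start * frame / rate, i * frame / rate))
            run_start = None
    return silences

def pick_sync_point(silences, text_pos, text_len, speech_len, window=0.15):
    """Pick the pause nearest to where the text cut point is expected to fall.

    The expected time assumes speech duration is proportional to text length;
    only pauses within +/- window * speech_len of that estimate are considered.
    """
    expected = speech_len * text_pos / text_len
    lo, hi = expected - window * speech_len, expected + window * speech_len
    candidates = [(s + e) / 2 for s, e in silences if lo <= (s + e) / 2 <= hi]
    if not candidates:
        return None  # no plausible pause inside the search range
    return min(candidates, key=lambda t: abs(t - expected))
```

In a fuller implementation along the lines of the description, the positions of recognized keywords and the measured sentence durations would be used to score each candidate pause instead of relying on the proportional estimate alone.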

Claims (2)

1. A method for automatically producing audible text, characterized by comprising the following steps:
Step 1: text segmentation, in which the text is divided into several segments of suitable size according to features of the text, each cut point being the end of a sentence;
Step 2: speech synchronization point detection, in which the speech synchronization point corresponding to each text cut point is found;
Step 3: speech segmentation, in which the whole speech recording is cut at the speech synchronization points into segments corresponding to the text segments;
Step 4: segmented compressed storage of speech and text, in which, starting from the first synchronized pair, each synchronized text-speech pair is compressed in order with a text compression algorithm and a speech compression algorithm respectively and arranged according to the audible text format, and the file header is filled in;
The audible text format is as follows: the file is divided into two parts, a header and data; the header contains information such as the audible text identifier, the file length, the number of segments, and the segment address table for text and speech; in the data part, text and speech are divided into synchronized segments and stored compressed in sequential order, each segment containing the corresponding compressed text data and compressed speech data;
The speech synchronization point detection of Step 2 comprises (1) endpoint detection, which finds the endpoints of each sentence in the speech and roughly estimates the position of the synchronization point in the speech; (2) keyword recognition, which identifies certain specific words in the speech; and (3) synchronization point determination, which uses information such as sentence endpoints, keyword distribution and sentence duration to find the speech endpoint that best matches the synchronization point in the text.
2. A method for playing audible text, characterized by comprising the following steps:
Step 1: decompressing and displaying text, in which, when an audible text is opened, the required text segment is decompressed and one screen of text is displayed; when the user pages into another compressed text segment, that segment is decompressed and the required text is displayed;
Step 2: synchronized playback of speech, in which, when an audible text is opened, the speech is not decompressed and played immediately, and after a command to play speech is received, processing proceeds as follows:
---decompress the speech corresponding to the currently displayed text;
---determine which sentence of the text the cursor is on, take the start of that sentence and the start of the first sentence on the next screen as the required synchronization start and end points, and carry out speech synchronization point detection in the current or next speech segment;
---play the speech from the synchronization start point, and if no stop command has been received by the time playback reaches the synchronization end point, display the next screen of text and at the same time look for a new synchronization end point;
The speech synchronization point detection comprises (1) endpoint detection, which finds the endpoints of each sentence in the speech and estimates the position of the synchronization point in the speech; (2) keyword recognition, which identifies certain specific words in the speech; and (3) synchronization point determination, which uses information such as sentence endpoints, keyword distribution and sentence duration to find the speech endpoint that best matches the synchronization point in the text.
CNB2004100280634A 2004-07-14 2004-07-14 The automatic making of audible text and the method for broadcast Expired - Fee Related CN100559368C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2004100280634A CN100559368C (en) 2004-07-14 2004-07-14 The automatic making of audible text and the method for broadcast

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2004100280634A CN100559368C (en) 2004-07-14 2004-07-14 The automatic making of audible text and the method for broadcast

Publications (2)

Publication Number Publication Date
CN1595397A CN1595397A (en) 2005-03-16
CN100559368C true CN100559368C (en) 2009-11-11

Family

ID=34664131

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2004100280634A Expired - Fee Related CN100559368C (en) 2004-07-14 2004-07-14 The automatic making of audible text and the method for broadcast

Country Status (1)

Country Link
CN (1) CN100559368C (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20070038685A (en) * 2005-10-06 2007-04-11 엘지전자 주식회사 Mobile terminal and method for detecting playing location sound file thereof method for display text information sound file thereof
KR100658151B1 (en) * 2006-02-13 2006-12-15 삼성전자주식회사 Method and apparatus for position setting of mp3 player's in mobile phone
CN101640058B (en) * 2009-07-24 2012-05-23 王祐凡 Multimedia synchronization method, player and multimedia data making device
CN102291773B (en) * 2011-07-18 2014-12-10 电信科学技术研究院 Data compression method and equipment
JP6586514B2 (en) * 2015-05-25 2019-10-02 ▲広▼州酷狗▲計▼算机科技有限公司 Audio processing method, apparatus and terminal
CN112466287B (en) * 2020-11-25 2023-06-27 出门问问(苏州)信息科技有限公司 Voice segmentation method, device and computer readable storage medium
CN112397104B (en) * 2020-11-26 2022-03-29 北京字节跳动网络技术有限公司 Audio and text synchronization method and device, readable medium and electronic equipment

Also Published As

Publication number Publication date
CN1595397A (en) 2005-03-16

Similar Documents

Publication Publication Date Title
US10580394B2 (en) Method, client and computer storage medium for processing information
KR102331660B1 (en) Methods and apparatuses for controlling voice of electronic devices, computer device and storage media
CN102142247A (en) Multifunctional electronic score
US20140013192A1 (en) Techniques for touch-based digital document audio and user interface enhancement
CN111653265B (en) Speech synthesis method, device, storage medium and electronic equipment
EP3425630A1 (en) Electronic device-awakening method and apparatus, device and computer-readable storage medium
CN103366784A (en) Multimedia playing method and device with function of voice controlling and humming searching
CN104898821B (en) The method and electronic equipment of a kind of information processing
CN100559368C (en) The automatic making of audible text and the method for broadcast
CN104090883A (en) Playing control processing method and playing control processing device for audio file
CN109471955B (en) Video clip positioning method, computing device and storage medium
CN104219570A (en) Audio signal playing method and device
CN114023301A (en) Audio editing method, electronic device and storage medium
KR101789057B1 (en) Automatic audio book system for blind people and operation method thereof
US11366851B2 (en) Karaoke query processing system
CN1307529C (en) Audio player with lyrics display
CN113901186A (en) Telephone recording marking method, device, equipment and storage medium
CN106033678A (en) Playing content display method and apparatus thereof
CN1822091B (en) Electronic musical apparatus for displaying character
CN1300762C (en) Natural peech vocal partrier device for text and antomatic synchronous method for text and natural voice
CN1916885B (en) Method for synchronous playing image, sound, and text
CN1145913C (en) Device for reproducing information or executing functions
CN113516963A (en) Audio data generation method and device, server and intelligent loudspeaker box
KR102416818B1 (en) Methods and apparatuses for controlling voice of electronic devices, computer device and storage media
CN2842652Y (en) Acoustic-controlled programme-ordering MP3 player

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20091111

Termination date: 20100714