CN105677919A

CN105677919A - Storage and retrieval method of language resource audio data

Info

Publication number: CN105677919A
Application number: CN201610120131.2A
Authority: CN
Inventors: 何建勇
Original assignee: Individual
Current assignee: Individual
Priority date: 2016-03-03
Filing date: 2016-03-03
Publication date: 2016-06-15

Abstract

The invention provides a storage and retrieval method for language resource audio data. In the storage method, the speech, explanatory text, images and animations of multiple words are composited into a single audio-video file and saved as a whole, and a playback tool with retrieval and playback functions is built for that file, so that words can be quickly retrieved and played on multiple media. Because the method combines the speech data with the corresponding explanatory text, images and animations, the explanation of the speech is more accurate. Because the speech data and the corresponding explanatory text, images and animations are composited into a video for storage, the video can be distributed on different media such as computers, mobile phones, networks and televisions, and the data is not easily modified. Once the material is stored as a video, a simple retrieval and playback tool can be made for computers, mobile phones and the like, so that words are retrieved and played on those devices; retrieval is fast and the tool is convenient for learners.

Description

Storage and retrieval method for language resource audio data

Technical field

The present invention relates to a storage and retrieval method for language resource audio data, and belongs to the technical field of data processing.

Background art

China is one of the countries with the richest language resources in the world. China has 55 ethnic minorities; apart from the Hui and the Manchu, who generally use Chinese, the other ethnic groups have their own languages, and different branches within some ethnic groups also use different languages. Passing on and developing the spoken and written languages and cultures of ethnic minorities has therefore become a key task in protecting China's linguistic diversity. In 2008 the State Language Work Committee launched the "Chinese Language Resources Audio Database" project, which requires each region, under unified scientific planning and with the county as the unit, to collect real audio corpora of contemporary Chinese dialects, minority languages and locally flavored Mandarin, and to organize, process and preserve them scientifically so that they can be studied in depth and effectively developed in the future, protecting the cultural heritage of native languages. It is a major national language project of far-reaching significance.

Language resource audio data generally comprises two parts: the speech and the explanatory material for that speech. In current databases the two parts are mostly stored separately: for each language, the pronunciation of each word is saved as an independent audio file in one place, and the explanatory material corresponding to each audio file is stored in another. When a word of a given language is to be studied, software retrieves the corresponding files from the audio library and the explanatory material library and plays them together. This storage approach can be spread over multiple computers and offers large capacity, but when the database is large the software takes longer to retrieve a word, playback of the pronunciation is slow, and the user experience is poor. Such software can generally be used only on computers and not on clients such as mobile phones and televisions, so the range of usable terminals is too limited. In addition, the explanatory material currently chosen is usually text only and is not combined with corresponding explanatory images, animations and the like, so the meaning of some words may not be conveyed precisely. For example, in some dialects the same word is expressed differently for different actions; an accompanying animation of the action makes the explanation clearer and helps the learner understand.

Summary of the invention

The object of the invention is to provide a storage and retrieval method for language resource audio data that differs from conventional methods. The storage method composites the speech, explanatory text, images and animations of multiple words into a single audio-video file that is saved as a whole, and builds for that file a playback tool with retrieval and playback functions, so that words can be quickly retrieved and played on multiple media, which is very convenient for learners.

The storage and retrieval method for language resource audio data of the present invention comprises the following steps:

(1) Collect the speech, explanatory text, image and animation files of all words to be stored, and arrange all the words into a number of pages as follows: first set the maximum number of words per page, then fill the pages in order, moving to the next page only when the current page is full; the last page holds however many words remain.

(2) Play the speech files of the words on each page one after another, in the order in which the words are arranged on the page. While a word's pronunciation is played, the explanatory text, image and animation corresponding to that word are displayed synchronously on one side of the page. When the words of one page have been played, playback jumps to the next page and continues until all words have been played. For playback, a uniform playback duration is set so that every word occupies the same amount of time, and the duration must be long enough for the speech file of every word to be played in full. A page-jump time is also set for the transition between pages.

In this step, playback may use the pronunciation of a single language or of several languages. When a single language is selected, the speech file of each word is played once within the set playback duration; when several languages are selected, the speech file of each word in each selected language is played once in turn, each within the set playback duration.

(3) Use screen-recording software to record the playback and display process of step (2), covering the speech, explanatory text, images and animations of all words, as a single video. The speech, explanatory text, images and animations of the words are thereby combined into one audio-video file and stored as a whole.

In this step, the recorded video may be converted into different video formats so that it can be stored and played on different devices.

(4) Store all words of the recorded video in an array, in order from the first word of page 1 to the last word of the last page, and write a playback tool with retrieval and playback functions in a programming language. From the word playback duration of the recorded video, the page-jump time and the position of each word in the array, the tool can calculate the playback position of any word in the video, and thus accurately retrieve and play the language resource audio data for any word.

In this step, different programming languages may be used to write different playback tools, to meet the playback needs of different media such as computers, mobile phones, networks and televisions.

The speech lengths of the words sometimes differ greatly. Because a uniform playback duration must be set according to the word with the longest speech, words with short speech are followed by a long silence, which makes playback less concise. The words may therefore first be divided into groups according to the length of their pronunciation, with a suitable playback duration set for each group. Each group is recorded as a separate video according to steps (1), (2) and (3) above, and the videos are then merged into one overall video, which achieves whole storage of the language resource audio data. For retrieval and playback, the words of each group are stored in an array in order from the first word of page 1 to the last word of the last page, and the arrays are merged into one total array following the order of the videos in the overall video. A playback tool with retrieval and playback functions is written in a programming language. From the position of each video in the overall video, the number of words in each video, the word playback duration of each video, the page-jump time and the position of each word in the total array, the tool can calculate the playback position of any word in the overall video, and thus accurately retrieve and play the language resource audio data for any word.

The storage and retrieval method for language resource audio data of the present invention has the following advantages:

(1) The invention processes language resource audio data by combining the speech data with the corresponding explanatory text, images and animations, so the explanation of the speech is more accurate. For some particular dialects, speech and a textual explanation alone can only guarantee accurate pronunciation; without an image or animation, the learner may misunderstand the meaning.

For example, in the Yao language spoken in one locality of Du'an Yao Autonomous County, Guangxi, there are two words for eagle. One refers to a relatively large bird that usually lives in caves on high mountains, can seize chickens and sheep, and has now largely disappeared; the local Chinese name is "big eagle". The other refers to a relatively small bird that can only seize chickens and is still seen occasionally; the local Chinese name is "little eagle". If the Yao recordings of "big eagle" and "little eagle" were preserved with speech and text only, later generations learning the Yao language might take "big eagle" to mean a fully grown eagle that can fly and "little eagle" to mean a nestling whose feathers have not yet grown. With pictures added, the learner sees at a glance that these are two different kinds of eagle, which ensures that the meaning conveyed by the preserved Yao speech is accurate.

(2) The invention composites the speech data with the corresponding explanatory text, images and animations into a video for storage. The video can be distributed on different media such as computers, mobile phones, networks and televisions, and the data is not easily modified. In later periods it can be used on new devices simply by converting the format.

(3) Once the speech data and the corresponding explanatory text, images and animations have been composited into a video, a simple retrieval and playback tool can be made for computers, mobile phones and the like, so that words can be retrieved and played on those devices quickly and conveniently.

Brief description of the drawings

Fig. 1 is a page interface of the word playback software.

Fig. 2 is a playback page recorded in the video.

Fig. 3 is the retrieval page interface of the video retrieval and playback tool.

Detailed description of the invention

The present invention is further described below with reference to a specific embodiment and the accompanying drawings.

In this embodiment there are 3000 words in total and each page holds 58 words, so the words fill 52 pages: the first 51 pages are full and page 52 holds 42 words.
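The page layout of this embodiment can be checked with a few lines of Python. This is only an illustrative sketch: the function name `paginate` and the placeholder word list are assumptions, not part of the patent.

```python
def paginate(words, words_per_page):
    """Split the word list into consecutive pages; only the last page may be short."""
    if words_per_page <= 0:
        raise ValueError("words_per_page must be positive")
    return [words[i:i + words_per_page]
            for i in range(0, len(words), words_per_page)]


# Placeholder vocabulary standing in for the 3000 collected words.
words = [f"word {n}" for n in range(1, 3001)]
pages = paginate(words, 58)
print(len(pages), len(pages[-1]))  # 52 42
```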

Fig. 1 shows a page interface of the word playback software built under the above conditions. While a word's pronunciation file is played, the explanatory text, image and animation corresponding to the word are displayed on the left of the page interface. As shown in Fig. 1, when the 28th word of the page, "cock", is pronounced in Mandarin, the corresponding explanatory text (including the characters and the pinyin) and image are displayed on the left. The right side shows the words arranged on this page. At the lower left of the page interface the user can select the pronunciation of one language or of several languages. At the lower right there are a single-pronunciation button, a playback-duration button and a multi-pronunciation button: the single-pronunciation button plays the pronunciation of one language continuously, the multi-pronunciation button plays the pronunciations of several languages continuously, and the playback-duration button sets the playback duration of each word pronunciation. The playback duration set in the figure is 2 seconds. When the single-pronunciation button is pressed, Mandarin speech is played starting from the word "cock": the 28th word "cock" from 0 to 2 seconds, the 29th word "hen" from 2 to 4 seconds, the 30th word "common chicken" from 4 to 6 seconds, the 31st word "chicken" from 6 to 8 seconds, and so on, word by word and page after page; pressing the button again stops playback.

When the multi-pronunciation button is pressed, the speech of the selected languages, such as Mandarin, Yao, Miao, Zhuang and English, is played in turn starting from the word "cock", each language taking 2 seconds. When one word is finished the next word follows, and so on, word by word and page after page; pressing the button again stops playback. With the playback duration set to 2 seconds, every word must be pronounced in full within 2 seconds; when the speech of a word is longer, the playback duration must be increased accordingly. The page-jump time for page turning can be set when the software is made.
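The timing of continuous playback within one page can be sketched in Python as follows. The function name `playback_schedule` and the placeholder words are assumptions made for illustration; times are measured from the moment the button is pressed, and the sketch stops at the end of the page, so the page-jump time is not modeled.

```python
def playback_schedule(page_words, first_word_no, duration, languages=("Mandarin",)):
    """Return (start, end, word, language) tuples from the chosen word to the
    end of the page. first_word_no is the 1-based position on the page."""
    schedule = []
    t = 0
    for word in page_words[first_word_no - 1:]:
        for language in languages:
            schedule.append((t, t + duration, word, language))
            t += duration
    return schedule


# Placeholders for words 1 to 27, followed by the four words named in the text.
page_words = [f"word {n}" for n in range(1, 28)] + ["cock", "hen", "common chicken", "chicken"]

single = playback_schedule(page_words, 28, 2)
print(single[0])  # (0, 2, 'cock', 'Mandarin')
print(single[3])  # (6, 8, 'chicken', 'Mandarin')

# With several languages selected, each word occupies one slot per language.
multi = playback_schedule(page_words, 28, 2, ("Mandarin", "Miao", "English"))
print(multi[3])   # (6, 8, 'hen', 'Mandarin')
```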

Because every word is pronounced in full within the same playback duration, every page holds the same number of words and every page takes the same time to play, the playback period of any word on any page of the video can be calculated. Fig. 2 shows one playback page of the video recorded in step (3). Assuming that Mandarin single pronunciation was chosen in step (2), that the page shown in Fig. 2 is page 9 and that the word playback duration is 2 seconds, the playback period of the 32nd word on page 9, "Shi Lu", can be calculated as follows:

Time taken by the 58 words of each page: 58 × 2 = 116 seconds

Time taken to jump from one page to the next: assumed to be 2 seconds (can be set when the software is made)

There are 8 page jumps from page 1 to page 9, so the total page-jump time is: 8 × 2 = 16 seconds

Total time from the start of page 1 to the start of page 9: 116 × 8 + 16 = 944 seconds

Time taken on page 9 from the 1st word to the 31st word: 31 × 2 = 62 seconds

Total time from the start of page 1 to the end of the 31st word on page 9: 944 + 62 = 1006 seconds

So the playback period of the 32nd word on page 9, "Shi Lu", is 1006 seconds to 1008 seconds.

Therefore, if the playback period 1006 seconds to 1008 seconds is selected, the video plays the speech of the word "Shi Lu" and displays the corresponding text and image.
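The same arithmetic can be written as a small Python function. This is a sketch under the assumptions of the worked example (single language, 58 words per page, 2 seconds per word, 2 seconds per page jump); the name `word_time_span` and its parameters are hypothetical.

```python
def word_time_span(page, word_no, words_per_page=58, word_duration=2, page_gap=2):
    """Return (start, end) in seconds for the word_no-th word on the given page.
    Both page and word_no are 1-based."""
    page_time = words_per_page * word_duration              # 58 x 2 = 116 s
    start = ((page - 1) * (page_time + page_gap)            # full pages plus jumps before this page
             + (word_no - 1) * word_duration)               # words before this one on the page
    return start, start + word_duration


print(word_time_span(9, 32))  # (1006, 1008)
```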

When making the video playback tool, finding the playback period corresponding to a word is enough to locate and play that word's pronunciation in the video. Fig. 3 shows the retrieval page interface of a simple video retrieval and playback tool written in a programming language. To build the tool, it suffices to determine on which page, and at which position on that page, the entered word lies; the word's time period in the video can then be calculated. The specific method is as follows:

First, store all words of the recorded video in an array, in order from the first word of page 1 to the last word of the last page, and find the position of the entered search word in the array. Then:

Position in array ÷ number of words per page = quotient, with remainder

When the remainder is not 0, the page number is the quotient + 1 and the remainder is the word's position on that page;

When the remainder is 0, the page number is the quotient and the word is the last one on that page.
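The quotient and remainder rule above maps directly onto Python's built-in `divmod`. The function name `locate` is an assumption for illustration; positions, pages and word numbers are all 1-based, as in the text.

```python
def locate(position, words_per_page):
    """Map a 1-based array position to (page, word number on that page)."""
    quotient, remainder = divmod(position, words_per_page)
    if remainder == 0:
        # Exactly fills a page: the word is the last one on page `quotient`.
        return quotient, words_per_page
    return quotient + 1, remainder


print(locate(116, 58))  # (2, 58)
print(locate(117, 58))  # (3, 1)
```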

For example, there are 3000 words, stored in order in an array. If the word "flower" is the 256th in the array and each page holds 58 words, then:

256 ÷ 58 = quotient 4, remainder 24

Therefore, the word "flower" is the 24th word on page 5 of the video. The recorded video shown in Fig. 3 is a multi-language audio-video with pronunciations in five languages: Mandarin, Yao, Miao, Zhuang and English. The speech file of each word in each language is played once in turn with a playback duration of 2 seconds, so each word plays for 10 seconds in total. By the calculation method above:

Time taken by the 58 words of each page: 58 × 2 × 5 = 580 seconds

Time taken to jump from one page to the next: 2 seconds

There are 4 page jumps from page 1 to page 5, so the total page-jump time is: 4 × 2 = 8 seconds

Total time from the start of page 1 to the start of page 5: 580 × 4 + 8 = 2328 seconds

Time taken on page 5 from the 1st word to the 23rd word: 23 × 2 × 5 = 230 seconds

Total time from the start of page 1 to the end of the 23rd word on page 5: 2328 + 230 = 2558 seconds

So the playback period of the 24th word on page 5, "flower", is 2558 seconds to 2568 seconds.

Thus, playing the video from 2558 seconds to 2568 seconds yields the pronunciations of the word "flower" in the five languages: the Mandarin pronunciation from 2558 to 2560 seconds, the Yao pronunciation from 2560 to 2562 seconds, the Miao pronunciation from 2562 to 2564 seconds, the Zhuang pronunciation from 2564 to 2566 seconds and the English pronunciation from 2566 to 2568 seconds. The retrieval and playback function of the tool is thus implemented by mapping each word to its playback period (or frame-number range).
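The multi-language calculation can be sketched in Python as below, going straight from the word's position in the array to one time span per language. The function name `language_spans` and its parameters are hypothetical; the defaults mirror the example (58 words per page, 2 seconds per pronunciation, 2 seconds per page jump), and the spans are returned in the order in which the languages are played.

```python
def language_spans(position, num_languages, words_per_page=58,
                   clip_duration=2, page_gap=2):
    """Return a list of (start, end) spans in seconds, one per language,
    for the word at the given 1-based array position."""
    # Zero-based page and in-page indices avoid a special case for remainder 0.
    page_index, word_index = divmod(position - 1, words_per_page)
    word_time = clip_duration * num_languages       # 2 x 5 = 10 s per word
    page_time = words_per_page * word_time          # 58 x 10 = 580 s per page
    start = page_index * (page_time + page_gap) + word_index * word_time
    return [(start + k * clip_duration, start + (k + 1) * clip_duration)
            for k in range(num_languages)]


spans = language_spans(256, 5)
print(spans[0], spans[-1])  # (2558, 2560) (2566, 2568)
```

A player built this way would seek to the first span's start to play all pronunciations of the word, or to a later span to play a single language; converting seconds to frame numbers would only require multiplying by the video's frame rate.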

Claims

1. A storage and retrieval method for language resource audio data, characterized by comprising the following steps:

(1) collecting the speech, explanatory text, image and animation files of all words to be stored, and arranging all the words into a number of pages as follows: first setting the maximum number of words per page, then filling the pages in order, moving to the next page only when the current page is full, the last page holding however many words remain;

(2) playing the speech files of the words on each page one after another, in the order in which the words are arranged on the page, and, while a word's pronunciation is played, displaying the explanatory text, image and animation corresponding to that word synchronously on one side of the page; after the words of one page have been played, jumping to the next page and continuing until all words have been played; setting a uniform playback duration for the words so that every word occupies the same amount of time, the duration being long enough for the speech file of every word to be played in full; and setting a page-jump time for the transition between pages;

(3) using screen-recording software to record the playback and display process of step (2), covering the speech, explanatory text, images and animations of all words, as a single video, whereby the speech, explanatory text, images and animations of the words are combined into one audio-video file and stored as a whole;

2. The storage and retrieval method according to claim 1, characterized in that, in step (2), when the pronunciation of a single language is selected for playing the words, the speech file of each word is played once within the set playback duration; when the pronunciations of several languages are selected, the speech file of each word in each selected language is played once in turn, each within the set playback duration.

3. The storage and retrieval method according to claim 1, characterized in that, in step (3), the recorded video is converted into different video formats so that it can be stored and played on different devices.

4. The storage and retrieval method according to claim 1, characterized in that, in step (4), different programming languages are used to write different playback tools, to meet the playback needs of different media such as computers, mobile phones, networks and televisions.

5. A storage and retrieval method for language resource audio data, characterized in that the words are first divided into groups according to the length of their pronunciation and a suitable playback duration is set for each group; each group of words is recorded as a separate video according to steps (1), (2) and (3) of claim 1, and the videos are then merged into one overall video, thereby achieving whole storage of the language resource audio data; for retrieval and playback, the words of each group are stored in an array in order from the first word of page 1 to the last word of the last page, and the arrays are merged into one total array following the order of the videos in the overall video; a playback tool with retrieval and playback functions is written in a programming language, and from the position of each video in the overall video, the number of words in each video, the word playback duration of each video, the page-jump time and the position of each word in the total array, the tool calculates the playback position of any word in the overall video, and thus accurately retrieves and plays the language resource audio data for any word.