CN107273377A

CN107273377A - Dialect geography-sound spectrum acquisition technique

Info

Publication number: CN107273377A
Application number: CN201610213770.3A
Authority: CN
Inventors: Wang Xuefei (王雪飞); Liu Jun (刘珺)
Original assignee: Huangshan University
Current assignee: Huangshan University
Priority date: 2016-04-08
Filing date: 2016-04-08
Publication date: 2017-10-20

Abstract

The invention discloses an acoustic-feature recorder for regional dialects, supported by GPS positioning and a geographic information system (GIS). Geographic information about the site where a dialect is sampled is entered automatically along with the dialect recording. During recording, the invention automatically builds a data set of geographic features and dialect features, forming a fast index. Acoustic feature data are extracted from the dialect and stored as key acoustic feature spectra and as dialect acoustic feature pedigrees, and the region and acoustic features of the dialect are displayed intuitively.

Description

Dialect geography-sound spectrum acquisition technique

Technical field

The invention relates to an acoustic-feature recorder for regional dialects, supported by GPS positioning and a geographic information system (GIS), in which the geographic information of the dialect sampling site is entered automatically together with the dialect recording. Geographic features and dialect-feature data sets are formed automatically during dialect recording, producing a fast index. Acoustic feature data are extracted from the dialect and stored as key acoustic feature spectra and as dialect acoustic feature pedigrees, and the region and acoustic features of the dialect are displayed intuitively.

Dialect speech is sampled together with its geographic position obtained from GPS. DSP techniques are used to fingerprint the audio and to manage its storage, producing a geographic-attribute identification for each dialect. The invention is convenient to use, supports complete searches of dialect data, substantially improves the reliability of dialect collection and makes collection easier.

Background

During dialect recording, the speaker's identity and geographic information, texts related to the dialect, and the dialect's acoustic files must all be recorded or labelled for later use. Traditional manual recording of this information is cumbersome, automatic recording leaves a large post-processing workload, and the resulting electronic (audio) files create an efficiency bottleneck for later analysis and processing.

The invention discloses a standardized technique for sampling and storing dialect sound and its associated data. It involves an acoustic feature structure, a dialect data format, a DSP-chip computing module with dedicated functions, a word-sound data storage module, and display and management methods. Dialect sound data can thus be queried, managed and computed quickly, mainly for dialect research and recognition, conversion, mutual translation, and dialect-based regional classification. Storing dialect acoustic signature spectra provides an effective method for dialect research and standardized calibration.

Audio files are formed on tape, magnetic disk or solid-state storage, and the resulting file systems are hard to search and difficult to compute on. Acoustic sampling is a necessary step in dialectology and in dialect application systems, but discovering distinctive dialect features immediately during sampling, and conducting sociological and environmental studies of dialects, remain open problems in dialectology. The technical core of mutual translation between dialects is fast dialect computation (recognition, comparison and query). Existing acoustic research on dialects has completed principle-level experiments but lacks practical technical means. Building on a standard-language corpus, the present invention enables quick discovery and accurate computation of dialect features, and draws a characteristic spectrum of dialect acoustic parameters labelled by geographic position.

Summary of the invention

Principle of dialect geography-sound spectrum acquisition

The invention combines GPS-assisted positioning, imaging of the dialect reader, and a data sound-spectrum technique to quickly collect a complete data set for a dialect, improving the efficiency of dialect collection and application. The principle is shown in Fig. 1.

In Fig. 1, the numbered components are:
(1) Microphone: an ordinary microphone that captures the voice; after pre-processing, a digital signal is formed.
(2) Camera: photographs the dialect sample speaker and images the mouth shape, producing pictures.
(3) Signal-processing DSP chip: a computing unit with FFT and HMM functions that processes the sound and image streams and computes MFCC feature coefficients. The results are compared with the data in the standard word-sound memory (4) to form text content, and an index is constructed and stored.
(4) Known dialect word-sound database: used to transcribe dialect sound content into text. Here "dialect word-sound" means the data set of sound features corresponding to each word (or phrase, or sentence) in the standard pronunciation (or sample) of the dialect; it is generated by the system's word-sound library generation module.
(5) GPS module: supplies the geographic position information.
(6) Text scanner: scans identity documents and related documents to recognize their content, which forms part of the index.
(7) File system: the ARM system combines the files produced by the DSP system (audio files, text content, image files and the corresponding index files) with the GPS information from (5) and stores the resulting dialect data in the database on the file system.
(8) Text display: displays and manages the stored data.

Fig. 2 shows how dialect sound files are formed and stored. To record a dialect, the dialect reader (sample speaker) reads the text shown on the display, producing a voice signal. The DSP processes this signal into a phoneme (sub-phoneme) queue and a dialect audio file, and the audio file is stored directly in memory. If the locality has no standard pronunciation-to-word database, a standard word-sound library (10) must be generated by running the word-sound computation algorithm (9). If a local word-sound library exists, the dialect phoneme (sub-phoneme) queue is recognized against it to produce dialect text; different word-sound libraries yield different recognized text. Once the feature parameters of this dialect text are generated, they become part of the dialect database index. Meanwhile, the GPS signal identifies the geographic position, helps determine whether the local database contains standard-library data, and also forms part of the dialect index. The image information of the dialect sample speaker (including scanned documents and portraits) is synchronized with pictures of the reading process; the DSP system turns this signal into a file of picture frames aligned with the corresponding phoneme queue (a phoneme and picture-frame series), and picture-frame features form part of the index. Once the index is formed, a storage structure for the index, audio files and image files is built and stored, yielding a manageable database. The database content and the whole system are managed through the display interface, which also provides the human-machine interface.
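The storage flow of Fig. 2 can be sketched as follows. This is a minimal, hypothetical illustration, not the patent's implementation: `StorageRecord`, `store_recording`, the dict-based libraries and the list standing in for the database are all assumptions, and the region is assumed to have already been resolved from GPS.

```python
from dataclasses import dataclass, field


@dataclass
class StorageRecord:
    audio_file: str
    region: str                 # region resolved from GPS (assumed done upstream)
    dialect_text: list          # recognized words; may be partial or empty
    phoneme_groups: list        # word-level phoneme groups in time order
    index_keys: list = field(default_factory=list)


def store_recording(audio_file, phoneme_groups, region, libraries, database):
    """Store one recording, recognizing text only if the region has a library.

    libraries: region -> {tuple of phonemes: word}  (hypothetical library form)
    database:  list used as a stand-in for the manageable database
    """
    library = libraries.get(region)
    if library is None:
        # No local standard library: the patent runs the word-sound algorithm (9)
        # to generate one (10). Here we only register an empty library.
        libraries[region] = {}
        text = []
    else:
        # Recognize the phoneme groups that appear in the local library.
        text = [library[tuple(g)] for g in phoneme_groups if tuple(g) in library]
    record = StorageRecord(audio_file, region, text, phoneme_groups)
    # Both the GPS-derived region and the recognized text feed the index.
    record.index_keys = [region] + text
    database.append(record)
    return record
```

Because different libraries map the same phonemes to different words, the recognized text depends on which regional library is selected, as the description notes.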

3.2 Word-sound data module

Fig. 3 shows the framework for forming the dialect phoneme queue. The text display presents the reading content (words and sentences), and the reader reads aloud facing the microphone and camera, producing image and audio files.

The electronic file stream is the sound stored in memory, which serves as the sound source processed by the DSP. After reading the data, the DSP calls the Fast Fourier Transform (FFT) and, using energy as the main parameter together with MFCC (Mel Frequency Cepstrum Coefficient) features, segments the signal into phonemes (sub-phonemes), forming a corresponding phoneme queue that uses time as the ID (Fig. 4).
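The energy-based segmentation step can be illustrated with a small sketch. It is a simplification, not the patent's DSP code: it computes frame energy directly in the time domain (by Parseval's theorem this equals the frame's spectral energy up to a constant factor, so no FFT is needed for the energy itself), and the fixed `threshold` and non-overlapping frames are assumptions. Each detected span's start frame plays the role of the time-based ID.

```python
def frame_energies(samples, frame_len):
    """Energy (sum of squares) of each non-overlapping frame."""
    return [sum(x * x for x in samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def segment_by_energy(samples, frame_len, threshold):
    """Return (start_frame, end_frame) spans whose energy exceeds threshold.

    end_frame is exclusive. Each span is a candidate phoneme segment.
    """
    energies = frame_energies(samples, frame_len)
    segments, start = [], None
    for i, e in enumerate(energies):
        if e > threshold and start is None:
            start = i                      # energy rises: a segment begins
        elif e <= threshold and start is not None:
            segments.append((start, i))    # energy falls: the segment ends
            start = None
    if start is not None:                  # signal ended inside a segment
        segments.append((start, len(energies)))
    return segments
```

A real system would refine these coarse spans with the MFCC and HMM stages described below.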

The word-sound module is a database module that maps standard words to dialect sounds. Its sound data are MFCC coefficient sets: each word (or phrase, or short sentence) corresponds to the phoneme queue obtained by decomposing one recorded sound group, and each phoneme corresponds to one MFCC coefficient set (structure).
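The word-to-phoneme-queue mapping can be expressed as a small data structure. This is a sketch under stated assumptions: the class names are hypothetical, and the fixed size of six coefficients per phoneme follows the six MFCC variables named in the next paragraph.

```python
from dataclasses import dataclass
from typing import List, Tuple

N_COEFFS = 6  # assumption: six MFCC variables per phoneme


@dataclass(frozen=True)
class Phoneme:
    coeffs: Tuple[float, ...]   # one MFCC coefficient set

    def __post_init__(self):
        if len(self.coeffs) != N_COEFFS:
            raise ValueError(f"expected {N_COEFFS} coefficients, got {len(self.coeffs)}")


@dataclass
class WordSoundEntry:
    word: str                   # standard word, phrase or short sentence
    phonemes: List[Phoneme]     # phoneme queue from decomposing the recording

    def as_matrix(self) -> List[Tuple[float, ...]]:
        """Phoneme queue as a list of coefficient rows, in time order."""
        return [p.coeffs for p in self.phonemes]
```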

The DSP computing system is built on the TMS320C5502 chip and uses generic FFT (Fast Fourier Transformation) code and HMM (Hidden Markov Models) code to process the sound data and output MFCC (Mel Frequency Cepstrum Coefficient) values. For the audio file of each word or phrase (sentence) read aloud, the DSP's standard FFT function performs a frequency-energy peak transform to form energy frames, from which the MFCCs are computed; a generic HMM computation function is then applied to form the phoneme queue. The index data structure of each phoneme is generated from standard MFCC coefficients: its six variables correspond to six frequency points where the sound's energy is concentrated, namely the MFCC variables at 170 Hz, 280 Hz, 400 Hz, 870 Hz, 1200 Hz and 1700 Hz. Each dialect audio file is arranged by phoneme to form IDs, and the corresponding phoneme data structure is formed, as shown in Table 1:

Table 1: Data structure of the MFCC-based phoneme representation
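To make the six-variable idea concrete, the sketch below measures the log energy of a frame at exactly the six listed frequencies using the Goertzel algorithm, a standard single-frequency DFT. This is a deliberately simplified stand-in, not a true MFCC: real MFCCs apply a mel filterbank followed by a DCT, and the patent's exact mapping from these frequencies to MFCC variables is not specified. The 8 kHz sample rate is an assumption.

```python
import math

# The six energy-concentration frequencies named in the description (Hz).
FREQS_HZ = (170, 280, 400, 870, 1200, 1700)


def goertzel_power(samples, freq_hz, sample_rate):
    """Squared DTFT magnitude of `samples` at one frequency (Goertzel)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def six_point_features(samples, sample_rate=8000):
    """Log energy at each of the six frequencies (simplified 6-variable vector)."""
    # Clamp tiny negative rounding errors before the log, plus a small floor.
    return [math.log(max(goertzel_power(samples, f, sample_rate), 0.0) + 1e-12)
            for f in FREQS_HZ]
```

With 800 samples at 8 kHz the frequency resolution is 10 Hz, so all six points fall on exact DFT bins; a pure 400 Hz tone then yields its largest feature at the third position.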

The word-sound module uses a reading text of 100 items in total, made up of monosyllabic words, disyllabic words and short sentences. The standard reading items (words, sentences) are shown on the display one at a time. The reader reads each item aloud and it is recorded as an audio file, forming "standard word-dialect sound" correspondence data. The DSP's FFT function filters these data against the MFCC variable array to form MFCC parameter sets, which constitute the sound-data index of each word; together these form the word-sound module. The data generated by the word-sound module are stored in a flash database, whose storage format pairs each phoneme group queue with the standard word it describes and indexes it accordingly.
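The patent does not give the byte layout of the flash records. The sketch below shows one hypothetical way to pair a standard word with its phoneme queue in a compact binary record using Python's `struct` module. The layout (a length-prefixed UTF-8 word, a phoneme count, then six little-endian float32 values per phoneme) is purely an assumption chosen for illustration.

```python
import struct

# Hypothetical flash record layout (not specified in the source):
#   uint16 word length | UTF-8 word bytes | uint16 phoneme count |
#   phoneme count x 6 little-endian float32 coefficients
N_COEFFS = 6


def pack_entry(word, phonemes):
    """Serialize one word and its phoneme queue into bytes."""
    w = word.encode("utf-8")
    out = struct.pack("<H", len(w)) + w + struct.pack("<H", len(phonemes))
    for coeffs in phonemes:
        out += struct.pack(f"<{N_COEFFS}f", *coeffs)
    return out


def unpack_entry(buf, offset=0):
    """Return (word, phonemes, next_offset) read from `buf` at `offset`."""
    (wlen,) = struct.unpack_from("<H", buf, offset)
    offset += 2
    word = buf[offset:offset + wlen].decode("utf-8")
    offset += wlen
    (count,) = struct.unpack_from("<H", buf, offset)
    offset += 2
    phonemes = []
    for _ in range(count):
        phonemes.append(struct.unpack_from(f"<{N_COEFFS}f", buf, offset))
        offset += 4 * N_COEFFS
    return word, phonemes, offset
```

Returning `next_offset` lets several records be stored back to back in one flash region and read sequentially.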

3.3 Dialect word-sound module

The standard library is formed from phoneme queues recorded word by word, phrase by phrase and sentence by sentence, and the correspondence between sound and text must be reliable and unique. The resulting dialect standard pronunciation-to-word library is the core database for dialect recognition, and it is indexed in two ways: by word and by phoneme. The word-sound standard library is part of the DSP system.

The word-sound standard library is a word-to-dialect-sound database for a region identified by GPS or by a recognizable administrative address, and it is the basis of the index through which dialects can be queried. The standard library consists of sound fragments (phonemes, phones) associated with defined single characters, two-character words, multi-character words and sentences (each with a single meaning). It is defined as a limited set of common items, numbering 100 words and 200 words (including sentences). In other words, the reading material for the standard library is universal and uniform.

When a dialect sound file is recognized, text can be produced for only some of its words. This partial text is called the text feature of the dialect sound file and, stored as character-type data, serves as the "text index part" of the index. The phoneme part, with its queue features and Mel parameters, forms the (dialect) "phonetic feature". The format is shown in Table One:
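The two indexing modes (by word and by phoneme) and the uniqueness requirement can be sketched as a pair of mutually inverse dictionaries. The class name and the use of hashable phoneme labels instead of MFCC vectors are assumptions made to keep the example short.

```python
class WordSoundLibrary:
    """Standard library indexed two ways: by word and by phoneme queue.

    Phonemes are represented by hashable labels (an assumption); rejecting
    duplicates enforces the 'reliable and unique' correspondence.
    """

    def __init__(self):
        self.by_word = {}       # word -> phoneme queue (tuple)
        self.by_phonemes = {}   # phoneme queue (tuple) -> word

    def add(self, word, phoneme_queue):
        key = tuple(phoneme_queue)
        if word in self.by_word or key in self.by_phonemes:
            raise ValueError(f"non-unique entry: {word!r} / {key!r}")
        self.by_word[word] = key
        self.by_phonemes[key] = word

    def word_for(self, phoneme_queue):
        """Phoneme index lookup: returns the word or None."""
        return self.by_phonemes.get(tuple(phoneme_queue))

    def phonemes_for(self, word):
        """Word index lookup: returns the phoneme queue or None."""
        return self.by_word.get(word)
```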
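Because each standard library is tied to a GPS-identified region, the system needs some way to choose the local library from a GPS fix. The patent does not say how. The sketch below assumes each library has a hypothetical reference coordinate and picks the nearest one within a cut-off distance, using the haversine great-circle formula. A `None` result corresponds to the case where no local library exists and one must be generated.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_library(gps, libraries, max_km=50.0):
    """Pick the region whose reference point is closest to `gps`.

    libraries: region name -> (lat, lon) reference point (hypothetical).
    Returns None if no library lies within `max_km`.
    """
    best, best_d = None, max_km
    for name, (lat, lon) in libraries.items():
        d = haversine_km(gps[0], gps[1], lat, lon)
        if d <= best_d:
            best, best_d = name, d
    return best
```

The 50 km cut-off is arbitrary. An administrative-boundary lookup would be a more faithful match to "administrative address", but it requires boundary data that the source does not describe.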

Address | Attribute | Character | Geographical position | Reader | Phonetic feature | Standard corpus

Table One: Structure of the standard word-sound database table in flash or ROM

Dialect text is the text produced when the dialect reader's speech is recognized with the dialect standard library for the locality (as determined by GPS), as shown in Fig. 6.

The files formed from the image signal, which yield the dialect speaker's identity information, are combined with the GPS data to form the index file of this dialect collection. The index file is stored together with the audio, the reading text and the images. The index format is shown in Table Two:

ID | Attribute | Dialect text | GPS | Reader | Phonetic feature | Standard corpus | Document location

Table Two: Index format of dialect sound files
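A row of Table Two can be modelled as a record with a simple text-index query. The field types and the `search_by_text` helper are assumptions: the table names the columns but not their types.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DialectIndexRecord:
    """One row of the Table Two index; field types are assumptions."""
    record_id: int
    attribute: str
    dialect_text: str                   # partial text recognized from the audio
    gps: Tuple[float, float]            # (latitude, longitude)
    reader: str                         # identity derived from image/scan files
    phonetic_feature: Tuple[float, ...]
    standard_corpus: str                # which standard library was used
    document_location: Optional[str] = None   # where audio/image files live


def search_by_text(records: List[DialectIndexRecord], query: str) -> List[int]:
    """Text-index lookup: IDs of records whose dialect text contains `query`."""
    return [r.record_id for r in records if query in r.dialect_text]
```

Because only part of each recording is transcribed, a text query finds a recording only if the queried word happened to be recognized. The phonetic-feature column supports matching in the remaining cases.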

Computing the dialect word-sound data spectrum format

The dialect acoustic data spectrum is formed when the dialect reader reads the standard text, producing one complete audio file. An array is then constructed with the following structure: matching rate (1 byte), personal information code (64 bytes), image code (16 bytes), storage location of the audio file and related image files (URL, 128 bytes), and other (8 bytes). This array is called the dialect acoustic data spectrum.

Matching-rate location algorithm for dialect acoustic data features: the regional identification name (address or geographic position) of the reader's dialect is assigned according to the highest matching rate. The name comes from an existing regional (administrative address) database, or from GPS (geographic position).

The matching rate is computed by the matching-rate algorithm, whose pseudo-procedure is as follows:
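The 217-byte array (1 + 64 + 16 + 128 + 8) maps directly onto a `struct` format string. One thing is an assumption: the text says only that the matching rate occupies 1 byte, so this sketch stores a rate in [0, 1] as an integer percentage from 0 to 100.

```python
import struct

# Field sizes from the description: 1 + 64 + 16 + 128 + 8 = 217 bytes.
# '<' selects little-endian with no alignment padding.
SPECTRUM_FMT = "<B64s16s128s8s"


def pack_spectrum(match_rate, personal_code, image_code, url, other=b""):
    """Pack one dialect acoustic data spectrum record.

    Assumption: match_rate in [0.0, 1.0] is stored as a 0-100 percentage.
    """
    rate_byte = round(match_rate * 100)
    if not 0 <= rate_byte <= 100:
        raise ValueError("match_rate must be in [0, 1]")
    # The 's' format pads short byte strings with NUL bytes and truncates
    # long ones to the field size.
    return struct.pack(SPECTRUM_FMT, rate_byte, personal_code, image_code,
                       url.encode("utf-8"), other)
```

Fixed-size records like this one can be located by offset (record number x 217) without a separate table of contents.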

; Known: k geographic-position/phoneme-queue tables, one per dialect type
; Input: one phoneme queue table to be matched
; Traverse the k tables to obtain k matching rates
; Take the maximum (or a rate of 1) as the matched type
; Return its region name or GPS value
; Otherwise (the maximum is not unique, or k is 0):
; if k = null or 0, take the position from the GPS value
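The pseudo-procedure above can be written as runnable code. The patent does not define the per-table matching metric, so `matching_rate` here assumes a simple position-by-position agreement ratio between phoneme queues. Treating a tie for the best rate as "not unique", and therefore falling back to GPS, follows the final two steps.

```python
def matching_rate(query, reference):
    """Fraction of aligned positions where two phoneme queues agree (assumed metric)."""
    longest = max(len(query), len(reference))
    if longest == 0:
        return 0.0
    same = sum(1 for a, b in zip(query, reference) if a == b)
    return same / longest


def locate_dialect(query, tables, gps):
    """Return (location, best_rate) following the pseudo-procedure.

    tables: region name -> phoneme queue (the k known tables).
    Falls back to the GPS value when k is 0 or the best rate is not unique.
    """
    if not tables:                                   # k = null or 0
        return gps, None
    rates = {region: matching_rate(query, ref) for region, ref in tables.items()}
    best = max(rates.values())
    winners = [region for region, rate in rates.items() if rate == best]
    if len(winners) == 1:
        return winners[0], best                      # unique maximum
    return gps, best                                 # maximum not unique
```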

The personal information code consists of sex, age and identity;

The image code consists of the picture number (integer, 8 bytes), the reader's head-portrait file (2 bytes) and the audio file title ID (6 bytes);

The file storage location (URL) is determined by the file system (operating system);

"Other" includes the position of this dialect class in the classification tree and an error code (2 bytes).
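The 16-byte image code splits into 8 + 2 + 6 bytes, as listed above. Here is a minimal packing sketch. The description says only "integer", so treating the picture number as an unsigned 64-bit value is an assumption.

```python
import struct

# Image code: picture number (8-byte integer, assumed unsigned) +
# head-portrait file reference (2 bytes) + audio-file title ID (6 bytes).
# '<' disables padding, so the total is exactly 16 bytes.
IMAGE_CODE_FMT = "<Q2s6s"


def pack_image_code(picture_no, portrait_ref, audio_id):
    return struct.pack(IMAGE_CODE_FMT, picture_no, portrait_ref, audio_id)


def unpack_image_code(buf):
    """Return (picture_no, portrait_ref, audio_id)."""
    return struct.unpack(IMAGE_CODE_FMT, buf)
```

The packed result is exactly the size of the 16-byte image-code field in the dialect acoustic data spectrum.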

Dialect region identification module

Regional dialect word-sound recognition library module: a comparison library that converts a dialect reading of standard words (phrases) back into "dialect words". The region (administrative address or geographic place name) with the highest matching rate (the ratio of "dialect words" to standard words), or a rate of 1, is taken as the dialect's home region.

When the two region results (the administrative-database match and GPS) disagree, both findings are recorded, and the GPS data take priority in computation.
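The disagreement rule can be captured in a few lines. In this sketch both inputs are region names (the GPS fix is assumed to have already been mapped to a region name), and the dictionary keys are hypothetical.

```python
def resolve_region(library_region, gps_region):
    """Record both region findings; GPS takes priority when they differ.

    Either argument may be None if that source is unavailable.
    """
    primary = gps_region if gps_region is not None else library_region
    return {
        "primary": primary,
        "library_region": library_region,
        "gps_region": gps_region,
        "consistent": library_region == gps_region,
    }
```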

Brief description of the drawings:

Fig. 1: Framework of the dialect geography-sound spectrum acquisition principle;

Fig. 2: Storage management architecture of the dialect geography-sound index;

Fig. 3: Framework for forming the dialect phoneme queue;

Fig. 4: Computing system built on the TMS320C5502 chip;

Fig. 5: Framework for forming the phoneme queue and the image-frame queue;

Fig. 6: Framework for forming dialect text.

Claims

1. A technique for forming a geography-dialect acoustic data spectrum based on dialect acoustic features, characterized by comprising the principle and framework of dialect geographic data sound spectra; wherein the dialect geography-sound spectrum acquisition principle includes the geographic and regional information of the dialect reader together with the voice signal and its data features; and the framework of the dialect geography-sound spectrum includes an acoustic feature data extraction and queuing technique, a dialect acoustic data parameter sequence, and the dialect acoustic data spectrum.

2. The geographic information collection technique according to claim 1, characterized by a region-identification technique for geographic information data and dialect features, specifically comprising:

a D GPS positioning module, which locates the module's position using GPS satellite positioning;

Dialect region identification module.

3. The dialect standard word-sound data technique according to claim 1, characterized by an erasable flash data storage chip for dialect standard word-sound data, forming a circuit technique that maps read-aloud pronunciations to words, specifically comprising: the flash storage technique of the word-sound data module.

4. The acoustic feature data extraction and queuing technique according to claim 1, characterized by forming queues of phoneme MFCC variable data from audio files, specifically comprising:

the complete technique of the word-sound data module;

Dialect word-sound module.

5. The dialect acoustic data spectrum according to claim 1, characterized by forming a general, searchable, indexed data structure for querying stored files, embodied in: the computation of the dialect word-sound data spectrum format.

6. A working method for forming a dialect acoustic data spectrum according to claim 1, characterized by comprising the following steps:

1) Turn on the display, which shows the dialect-type identification words; the GPS module starts at the same time;

2) After the start key is pressed, the reader begins reading aloud facing the microphone and camera;

3) Each screen shows only one character, one word or one short sentence; the display is advanced automatically or manually;

4) Each session consists of 100 words;

5) After reading ends, the data spectrum of this dialect is displayed automatically;

6) The interface has a key for generating the standard-language sound-character library; the steps are the same as above, but the results are stored in the flash database.