CN107273377A - Geography of dialect-sound spectrum acquisition technique - Google Patents

Geography of dialect-sound spectrum acquisition technique

Info

Publication number
CN107273377A
Authority
CN
China
Prior art keywords
dialect
data
word
sound
acoustic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610213770.3A
Other languages
Chinese (zh)
Inventor
王雪飞
刘珺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huangshan University
Original Assignee
Huangshan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huangshan University
Priority to CN201610213770.3A
Publication of CN107273377A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/61Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/686Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an acoustic-feature recording instrument for regional dialects, supported by GPS positioning and a geographic information system (GIS). When a dialect is recorded, the geographic information of the sampling site is entered automatically. During recording, the invention automatically forms a data set of geographic features and dialect features, which provides a fast index. Dialect acoustic feature data are extracted and stored as a key-feature acoustic data spectrum and a dialect acoustic feature pedigree, and the region and acoustic features of the dialect are displayed intuitively.

Description

Geography of dialect-sound spectrum acquisition technique
Technical field
The invention discloses an acoustic-feature recording instrument for regional dialects, supported by GPS positioning and a geographic information system (GIS). When a dialect is recorded, the geographic information of the sampling site is entered automatically. During recording, the invention automatically forms a data set of geographic features and dialect features, which provides a fast index. Dialect acoustic feature data are extracted and stored as a key-feature acoustic data spectrum and a dialect acoustic feature pedigree, and the region and acoustic features of the dialect are displayed intuitively.
Dialect speech is sampled acoustically, and GPS supplies the geographic position of each sample. DSP technology is used to fingerprint the audio and to manage its storage, which gives each dialect sample a geographic attribute identifier. The invention is convenient for display and for searching dialect data, greatly improves the reliability of dialect collection, and reduces the difficulty of collection.
Background
When a dialect is recorded, the speaker's identity information and geographic information, together with the related dialect text or acoustic files, must be recorded or annotated for later use. Recording this information by hand is cumbersome, however, and automatic recording leaves a heavy post-processing workload: the resulting electronic (sound) files make later analysis and processing an inefficient bottleneck.
The invention discloses a standardized technique for sampling and storing dialect sound and its data. It covers the acoustic feature structure, the dialect data format, a DSP-chip computing module with dedicated functions, a word-sound data storage module, and display and management methods. Dialect sound data can thus be made queryable, manageable, and computable quickly, and can be used widely for dialect research, recognition, conversion, and inter-translation, as well as for regional classification based on dialect. Stored dialect acoustic feature spectra offer an effective method for dialect research and standardized calibration.
Sound files are produced on three kinds of media: magnetic tape, magnetic disk, and solid-state storage. The resulting file systems are hard to search and complex to compute over. Acoustic sampling is a necessary step in dialectology and in dialect application systems, but finding distinctive dialect features immediately during sampling, and studying the sociology and environment of a dialect, remain urgent open problems in dialectology. The core of inter-dialect translation is fast dialect computation (recognition, comparison, and query). Current acoustic research on dialect speech has completed principle-level experiments but lacks practical technical means. Building on a standard voiceprint library, the invention achieves fast discovery and accurate computation of dialect features, and draws dialect acoustic-parameter feature spectra labelled by geographic position.
Summary of the invention
Geography of dialect-sound spectrum acquisition principle
The invention uses GPS-assisted positioning, image recognition of the dialect reader, and the dialect data-sound spectrum technique to complete the collection of dialect data quickly, improving the efficiency of dialect collection and application. The principle is shown in Fig. 1.
The principle of the technique is shown in Fig. 1. Device (1) is an ordinary microphone that acquires the audio data; after pre-processing, a digital signal is formed. Device (2) photographs the dialect sampler (the speaker) and captures mouth-shape images, forming pictures. Device (3) is the signal-processing computing unit, built from a DSP chip with FFT and HMM functions; it processes the sound stream and image stream, forms MFCC feature coefficients, compares them with the data in the standard word-sound memory (4) to produce text content, and constructs and stores the index. Device (6) is a text scanner that recognizes the content of identity certificates and related documents, which forms one component of the index. Device (4) is the known dialect word-sound database, used to recognize the text of the dialect sound content. Here, dialect word-sound means the data set formed by the sound feature data that correspond to each character (or word or phrase) in the standard pronunciation (or sample) of the dialect; it is produced by the system's word-sound library generation module. The ARM system combines the files processed by the DSP system (sound file, text content, image file, and the corresponding index file) in file system (7) with the GPS (5) information to form valid dialect data, which is stored in the database. The stored data is displayed and managed on the text display (8).
Fig. 2 shows how a dialect sound file is formed and stored. To record a dialect, the dialect reader (the sampled speaker) reads aloud the content shown on the text display, producing a voice signal. DSP processing turns the signal into a phoneme (phone) queue and a dialect sound file, and the sound file is stored directly in memory. If the local dialect has no standard sound-word correspondence database, a standard word-sound library (10) must be generated by starting the word-sound computation algorithm (9). If a local word-sound library exists, the dialect phoneme (phone) queue is recognized against it to form the dialect text; different word-sound libraries yield different recognized text. Once the feature parameters of the dialect text are generated, they become part of the dialect database index. The GPS signal, in turn, identifies the geographic position, helps determine whether the local database holds standard library data, and is also part of the dialect index. The image information comprises the data of the dialect sampler (scanned documents and portrait) and the picture signal synchronized with the reading process; the DSP system processes the signal into image frames that correspond to the phoneme queue (a phoneme and image-frame series file), and the image-frame feature data are part of the index. After the index is formed, the storage structure for the index, sound file, and image file is constructed and stored, forming a manageable database. The database contents and the whole system are managed through the display interface, which also provides the human-machine interface.
3.2 Word-sound data module
The framework for forming the dialect phoneme queue is shown in Fig. 3. The text display provides the reading content (characters, words, and sentences); the reader reads it aloud facing the microphone and camera, forming image and sound files.
The electronic file stream is the sound stored in memory, which is the sound source processed by the DSP. After reading the data, the DSP calls the fast Fourier transform (FFT) and, using energy as the main parameter together with MFCC (Mel-frequency cepstral coefficients), segments the signal into phonemes (phones), forming a dialect phoneme queue that uses time as the ID (Fig. 4).
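The energy-driven segmentation described above can be illustrated with a minimal sketch. It makes several assumptions that are not in the source: frame energy is computed in the time domain rather than through an FFT, frames do not overlap, and a single fixed threshold separates sound from silence. The function names and the toy signal are hypothetical.

```python
from typing import List, Tuple


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each complete, non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def segment_by_energy(
    samples: List[float], sample_rate: int, frame_len: int, threshold: float
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) for each run of frames whose energy exceeds threshold.

    The start time plays the role of the time-based ID of a queue entry.
    """
    frame_s = frame_len / sample_rate
    energies = frame_energies(samples, frame_len)
    segments = []
    start = None
    for idx, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = idx  # a sounding run begins
        elif energy <= threshold and start is not None:
            segments.append((start * frame_s, idx * frame_s))
            start = None
    if start is not None:  # the signal ended while still sounding
        segments.append((start * frame_s, len(energies) * frame_s))
    return segments


# Toy signal (assumed): silence, a loud burst, silence, a quieter burst.
signal = [0.0] * 8 + [1.0, -1.0] * 4 + [0.0] * 8 + [0.5, -0.5] * 4
queue = segment_by_energy(signal, sample_rate=8, frame_len=4, threshold=0.1)
```

A real system would refine these coarse segments with the MFCC and HMM processing the text mentions; the sketch only shows how energy alone yields a time-indexed queue.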
The word-sound module is the database module that maps standard words to dialect sounds. Its sound data consist of MFCC coefficient sets: each character (word, short sentence) corresponds to the phoneme queue obtained by decomposing one group of sounds, and each phoneme corresponds to one MFCC coefficient set (structure).
The selected DSP computing system is built from the TMS320C5502 chip. It uses generic FFT (fast Fourier transform) code and HMM (hidden Markov model) code to process the sound data and outputs MFCC (Mel-frequency cepstral coefficient) values. For the sound file of each character or word (sentence) read aloud, the DSP's standard FFT function performs a frequency-energy peak transform to form energy frames, from which the MFCC values are computed; generic HMM computation functions are then used to form the phoneme queue. The index data structure for each phoneme is generated from the standard MFCC coefficients: six variables correspond to six frequency points where sound energy concentrates, namely 170 Hz, 280 Hz, 400 Hz, 870 Hz, 1200 Hz, and 1700 Hz. Each dialect sound file is arranged by phoneme to form IDs and the corresponding phoneme data structure, as shown in Table 1:
Table 1. Data structure of the MFCC construction of a phoneme
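Because the body of Table 1 is not reproduced in this text, the following is only a hypothetical sketch of a per-phoneme record with an ID and six variables, one per frequency point named above. The mel conversion uses the widely used formula m = 2595 log10(1 + f/700); the source does not say which mel formula the DSP code applies, so that choice and all names below are assumptions.

```python
import math
from dataclasses import dataclass
from typing import Tuple

# The six energy-concentration frequency points named in the text, in Hz.
FREQ_POINTS_HZ = (170, 280, 400, 870, 1200, 1700)


def hz_to_mel(freq_hz: float) -> float:
    """Common mel-scale formula (assumed; the source does not specify one)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


@dataclass(frozen=True)
class PhonemeRecord:
    """Hypothetical layout: a queue ID plus one MFCC variable per frequency point."""

    phoneme_id: int
    mfcc: Tuple[float, ...]

    def __post_init__(self) -> None:
        if len(self.mfcc) != len(FREQ_POINTS_HZ):
            raise ValueError("expected six MFCC variables")


# Mel-scale positions of the six frequency points.
mel_points = [hz_to_mel(f) for f in FREQ_POINTS_HZ]
```

The record is frozen so that an entry in a stored phoneme queue cannot be modified by accident after it has been indexed.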
The word-sound module uses a reading text of 100 items in total, made up of single-syllable characters, two-syllable words, and short sentences. The standard reading items (characters, words, sentences) are shown on the display one at a time; the reader reads each one aloud and it is recorded as a sound file, forming the "standard word-dialect sound" correspondence data. The DSP FFT function filters the data with the array of MFCC variables to form the MFCC parameter set, which constitutes the sound data index for each word; this completes the word-sound module. The data generated by the word-sound module are stored in the flash database, and the flash storage format indexes each standard word description by its paired phoneme queue set.
3.3 Dialect word-sound module
The standard library of phoneme queues is formed by pronouncing character by character, word by word, and sentence by sentence; the correspondence must be reliable and unique. The resulting dialect standard sound-character library is the core database for dialect recognition, and it is indexed in two ways: by word and by phoneme. The word-sound standard library is part of the DSP system.
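The two index directions and the uniqueness requirement can be sketched as a small in-memory structure. This is an illustration under stated assumptions, not the patent's flash layout: a phoneme is modelled as a tuple of MFCC values, lookup is by exact match (a real recognizer would need approximate matching), and the class and method names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Phoneme = Tuple[float, ...]          # one MFCC coefficient set (assumed form)
PhonemeQueue = Tuple[Phoneme, ...]   # ordered phonemes of one reading item


class WordSoundLibrary:
    """Standard library indexed both by word and by phoneme queue."""

    def __init__(self) -> None:
        self._by_word: Dict[str, PhonemeQueue] = {}
        self._by_queue: Dict[PhonemeQueue, str] = {}

    def add(self, word: str, queue: List[Phoneme]) -> None:
        key = tuple(tuple(p) for p in queue)
        # Enforce the one-to-one correspondence the text requires.
        if word in self._by_word or key in self._by_queue:
            raise ValueError("word-sound correspondence must be unique")
        self._by_word[word] = key
        self._by_queue[key] = word

    def queue_for(self, word: str) -> Optional[PhonemeQueue]:
        """Word index: standard word to phoneme queue."""
        return self._by_word.get(word)

    def word_for(self, queue: List[Phoneme]) -> Optional[str]:
        """Phoneme index: phoneme queue back to standard word."""
        return self._by_queue.get(tuple(tuple(p) for p in queue))
```

Keeping both dictionaries in step on every insert is what makes a lookup in either direction a single hash access.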
The word-sound standard library is a word-dialect sound database for a region that carries a GPS tag or a recognizable administrative label (address); it is the index basis that makes dialects queryable. The standard library consists of fixed single characters, two-character items (words), multi-character items (words), and sentences (with a single meaning), together with their sound segments (phonemes, phones). It is defined as a limited library of common characters, with sizes of 100 items and 200 items (including sentences). In other words, the reading text of the standard library is common and uniform.
When a dialect sound file is recognized, text can be formed only for part of the words in the file. This text is called the text feature of the dialect sound file and serves, as character-type data, as the "text index part" of the index. The phoneme part forms the "phonetic feature" (dialect) from the queue characteristics and the Mel parameters. The format is shown in Table One:
Address | Attribute | Character | Geographical position | Reader | Phonetic feature | Standard corpus
Table One. Structure of the standard word-sound database table in flash or ROM
The dialect text is the text formed when the reading of a dialect reader is recognized against the dialect standard library of the reader's location (GPS), as shown in Fig. 6.
The file formed from the image signal is used to generate the identity information of the dialect speaker and, together with the GPS data, forms the index file of this dialect collection. The index file is stored together with the sound, the reading text, and the images; the index format is shown in Table Two:
ID | Attribute | Dialect text | GPS | Reader | Phonetic feature | Standard corpus | Document location
Table Two. Index format of the dialect sound file
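The index columns listed above can be pictured as one record per dialect sound file. The sketch below is hypothetical: the source names the columns but not their types, so the field types, the (latitude, longitude) form of the GPS value, and every example value are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass
class DialectIndexEntry:
    """One index row for a dialect sound file; field types are assumed."""

    entry_id: int
    attribute: str
    dialect_text: str                    # text recognized from the sound file
    gps: Tuple[float, float]             # (latitude, longitude), assumed form
    reader: str                          # identifier of the dialect reader
    phonetic_feature: Tuple[float, ...]  # e.g. MFCC-derived values
    standard_corpus: str                 # which standard library was used
    file_location: str                   # where the sound file is stored


# Made-up example values, for illustration only.
entry = DialectIndexEntry(
    entry_id=1,
    attribute="sample",
    dialect_text="example text",
    gps=(30.0, 118.0),
    reader="reader-001",
    phonetic_feature=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6),
    standard_corpus="corpus-100",
    file_location="file:///data/sample.wav",
)
row = asdict(entry)  # plain dict, convenient for storing or querying
```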
Computation of the dialect word-sound data spectrum format
The dialect acoustic data spectrum is formed as follows. The dialect reader reads the standard text aloud, producing one complete audio file, and an array is constructed for it. The array consists of the matching rate (1 byte), the personal information code (64 bytes), the image code (16 bytes), the storage location of the sound file and related image files (URL, 128 bytes), and other data (8 bytes). This array is called the dialect acoustic data spectrum.
The matching-rate location algorithm for dialect acoustic features assigns a region identifier (address or geographic position) to the reader's dialect with the highest matching rate. The identifier is supplied by an existing region (administrative address) database or by GPS (geographic position);
The matching rate is computed by the matching-rate algorithm, whose pseudocode is as follows:
; Known: k geographic position-phoneme queue tables, representing k dialect types
; Given: one phoneme queue table to be matched
; Traverse the k tables, forming k matching rates
; Take the type whose matching rate is the maximum or equals 1
; Take its region name or GPS value
; Otherwise (the best type is not unique, or the rate is 0)
; k = null or 0; the position takes the GPS value
The personal information code consists of sex, age, and identity;
The image code consists of the picture number (integer, 8 bytes), the reader's head-portrait file (2 bytes), and the sound file name ID (6 bytes);
The file storage location (URL) is determined by the file system (operating system);
"Other" includes the position of this dialect in the classification tree and an error code (2 bytes).
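The field widths given above add up to a fixed 217-byte record (1 + 64 + 16 + 128 + 8), and the image code to 16 bytes (8 + 2 + 6). A minimal sketch with Python's `struct` module shows one way to pack such a record. Several details are assumptions not stated in the source: little-endian byte order, the matching rate stored as an integer percentage, zero padding of short fields, and the "other" field treated as eight opaque bytes. The function names and sample values are hypothetical.

```python
import struct

# "<" = little-endian, no alignment padding (byte order is an assumption).
SPECTRUM_FORMAT = "<B64s16s128s8s"   # rate, personal code, image code, URL, other
IMAGE_CODE_FORMAT = "<Q2s6s"         # picture number, portrait file, sound file ID


def pack_image_code(picture_number: int, portrait_id: bytes, sound_id: bytes) -> bytes:
    return struct.pack(IMAGE_CODE_FORMAT, picture_number, portrait_id, sound_id)


def pack_spectrum(rate_percent: int, personal: bytes, image_code: bytes,
                  url: str, other: bytes = b"") -> bytes:
    # The "s" format zero-pads short values and truncates long ones.
    return struct.pack(SPECTRUM_FORMAT, rate_percent, personal, image_code,
                       url.encode("utf-8"), other)


def unpack_spectrum(record: bytes):
    rate, personal, image_code, url, other = struct.unpack(SPECTRUM_FORMAT, record)
    return (rate, personal.rstrip(b"\x00"), image_code,
            url.rstrip(b"\x00").decode("utf-8"), other)


# Made-up sample values, for illustration only.
image = pack_image_code(1, b"01", b"snd001")
record = pack_spectrum(87, b"F;34;reader-001", image, "file:///data/sample.wav")
```

A fixed-width layout like this keeps every record at the same offset multiple, which suits a flash-resident table that is scanned sequentially.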
Dialect region identification module
The regional dialect word-sound identification library module is a comparison library in which the sound of a dialect reading of a standard word (or phrase) is converted back into a "dialect word". The region name (administrative address or geographic place name) with the highest matching rate (the ratio of "dialect words" to standard words), or with a matching rate of 1, is the home region of the dialect;
When the two kinds of region (administrative and geographic) disagree, both findings are recorded, and the GPS-derived data take precedence in computation.
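The selection rule can be sketched as follows: compute a matching rate against each regional library, accept the best region only if it is unique and non-zero, keep both findings, and treat the GPS region as primary. The sketch rests on assumptions the source does not spell out: phoneme queues are compared by exact equality, the matching rate is the share of library words that the sample reproduces, and the matched region is used only when no GPS region is available. All names and the sample data are hypothetical.

```python
from typing import Dict, NamedTuple, Optional, Tuple

# word -> phoneme labels; a hypothetical stand-in for a phoneme queue table.
QueueTable = Dict[str, Tuple[str, ...]]


class RegionDecision(NamedTuple):
    matched_region: Optional[str]  # best-matching library region, if unique
    gps_region: Optional[str]      # region derived from GPS
    primary: Optional[str]         # value used for further computation
    rate: float                    # best matching rate found
    conflict: bool                 # True when the two findings disagree


def matching_rate(sample: QueueTable, library: QueueTable) -> float:
    """Share of library words whose phoneme queue the sample reproduces exactly."""
    if not library:
        return 0.0
    hits = sum(1 for word, queue in library.items() if sample.get(word) == queue)
    return hits / len(library)


def identify_region(sample: QueueTable, libraries: Dict[str, QueueTable],
                    gps_region: Optional[str]) -> RegionDecision:
    rates = {region: matching_rate(sample, lib) for region, lib in libraries.items()}
    best = max(rates.values(), default=0.0)
    winners = [region for region, rate in rates.items() if rate == best]
    # A tie or a zero rate means no dialect type is assigned.
    matched = winners[0] if best > 0.0 and len(winners) == 1 else None
    conflict = matched is not None and gps_region is not None and matched != gps_region
    primary = gps_region if gps_region is not None else matched
    return RegionDecision(matched, gps_region, primary, best, conflict)


libraries = {
    "region_A": {"w1": ("a", "b"), "w2": ("c",)},
    "region_B": {"w1": ("x",), "w2": ("c",)},
}
decision = identify_region({"w1": ("a", "b"), "w2": ("c",)}, libraries, "region_B")
```

Here the sample matches the `region_A` library fully, while GPS reports `region_B`; both values are retained in the result and the GPS region is the primary one.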
Brief description of the drawings:
Fig. 1: Principle framework of dialect geography-sound spectrum acquisition;
Fig. 2: Storage management framework of the dialect geography-sound index;
Fig. 3: Framework for forming the dialect phoneme queue;
Fig. 4: Computing system built from the TMS320C5502 chip;
Fig. 5: Schematic framework for forming the phoneme queue and the image frame queue;
Fig. 6: Framework for forming the dialect text.

Claims (6)

1. A technique for forming a geography-dialect acoustic data spectrum based on dialect acoustic features, characterized by the principle and framework of the dialect geography data-sound spectrum; wherein the acquisition principle of the dialect geography-sound spectrum covers the geographic information and region information of the dialect reader, the voice signal, and its data features; and the framework of the dialect geography-sound spectrum comprises the acoustic feature data extraction and queuing technique, the dialect acoustic data parameter sequence, and the dialect acoustic data spectrum.
2. The geographic information acquisition technique according to claim 1, characterized by the technique for identifying geographic information data and the region of dialect features, specifically comprising:
a D GPS positioning module, which uses GPS satellite positioning technology to locate the position of the positioning module;
a dialect region identification module.
3. The dialect standard word-sound data technique according to claim 1, characterized by a dialect standard word-sound flash erasable data storage library chip, forming a circuit technique in which reading pronunciations correspond to words, specifically comprising: the flash storage technique of the word-sound data module.
4. The acoustic feature data extraction and queuing technique according to claim 1, characterized by forming the MFCC variable data queue of the phonemes in a sound file, specifically comprising:
the complete technique of the word-sound data module;
the dialect word-sound module.
5. The dialect acoustic data spectrum according to claim 1, characterized by forming a general data structure that allows stored files to be queried and indexed, embodied in: computation of the dialect word-sound data spectrum format.
6. A process for forming the dialect acoustic data spectrum according to claim 1, characterized by comprising the following steps:
1) Turn on the display, which shows the dialect type identification characters; the GPS module starts at the same time;
2) After the start key is pressed, the reader begins reading aloud facing the microphone and camera;
3) Each screen shows only one character, one word, or one short sentence; the display time is either automatic or manual;
4) Each session consists of 100 items;
5) After the reading ends, the spectrum data of this dialect are displayed automatically;
6) The interface has a key for generating the sound-character library of a standard dialect; the steps are the same as above, but the result is stored in the flash database.
CN201610213770.3A 2016-04-08 2016-04-08 Geography of dialect-sound spectrum acquisition technique Pending CN107273377A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610213770.3A CN107273377A (en) 2016-04-08 2016-04-08 Geography of dialect-sound spectrum acquisition technique

Publications (1)

Publication Number Publication Date
CN107273377A true CN107273377A (en) 2017-10-20

Family

ID=60051882

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610213770.3A Pending CN107273377A (en) 2016-04-08 2016-04-08 Geography of dialect-sound spectrum acquisition technique

Country Status (1)

Country Link
CN (1) CN107273377A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110428805A (en) * 2019-09-04 2019-11-08 福建省立医院 Non-generic words and mandarin inter-translation method, device and equipment
CN110765105A (en) * 2019-10-14 2020-02-07 珠海格力电器股份有限公司 Method, device, equipment and medium for establishing wake-up instruction database

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20171020
