CN107273377A - Geography of dialect-sound spectrum acquisition technique - Google Patents
- Publication number
- CN107273377A (application CN201610213770.3A / CN201610213770A)
- Authority
- CN
- China
- Prior art keywords
- dialect
- data
- word
- sound
- acoustic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/61—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/686—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Library & Information Science (AREA)
- Software Systems (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses an acoustic-feature recording instrument for regional dialects, supported by GPS positioning and a geographic information system (GIS): the geographic information of the dialect sampling site is entered automatically together with the dialect recording. During the recording process the invention automatically forms a data set of geographic and dialect features and builds a fast index. Dialect acoustic-feature data are extracted to form a key-feature acoustic-data spectrum store and a dialect acoustic-feature pedigree store, and an intuitive display shows the region and acoustic features of the dialect.
Description
Technical field
The invention discloses an acoustic-feature recording instrument for regional dialects, supported by GPS positioning and a geographic information system (GIS): the geographic information of the dialect sampling site is entered automatically together with the dialect recording. During the recording process the invention automatically forms a data set of geographic and dialect features and builds a fast index. Dialect acoustic-feature data are extracted to form a key-feature acoustic-data spectrum store and a dialect acoustic-feature pedigree store, and an intuitive display shows the region and acoustic features of the dialect.
Dialect speech acoustics are sampled with GPS-derived geographic positions. DSP techniques fingerprint the audio and manage its storage, forming a geographic-attribute identification of the dialect. The invention improves convenience of display and completeness of search, greatly increasing the reliability of dialect collection and reducing its difficulty.
Background art
During dialect recording, the speaker's identity information and geographic information, the related dialect texts, and the acoustic files must all be recorded or tagged for later use. But traditional manual recording of this information is cumbersome, automatic recording leaves a heavy post-processing workload, and the resulting electronic (sound) files are an inefficiency bottleneck for later analysis and processing.
The invention discloses a standardized dialect sound-sampling and data-storage technique. It covers the acoustic-feature structure, the dialect data format, a DSP-chip computing module with dedicated functions, a word-sound data storage module, and display and management methods. Dialect sound data can thus be queried, managed, and computed quickly, serving dialect research and recognition, conversion, and inter-translation, as well as regional classification based on dialect. The storage of dialect acoustic-feature spectra provides effective methods for dialect research and standardized calibration.
Audio files are formed by three storage modes: tape, magnetic disk, and solid-state storage; the resulting file systems are difficult to retrieve and difficult to compute over. Acoustic sampling of dialects is a necessary step in dialectology and dialect applications, but discovering special dialect features immediately during sampling, and carrying out sociological and environmental studies of dialects, are urgent unsolved problems in dialectology. The core of inter-dialect translation is fast dialect computation (recognition, comparison, and query); current acoustic research on dialect speech has completed principle-level experiments but lacks practical technical means. On the basis of a standard corpus, the invention realizes fast discovery and accurate computation of dialect features, producing geographically tagged dialect acoustic-parameter characteristic spectra.
Summary of the invention
Geography of dialect-sound spectrum acquisition principle
The invention uses GPS-assisted positioning, image recognition of the dialect speaker's reading, and data sound-spectrum techniques to quickly complete a full dialect data collection, improving the efficiency of dialect collection applications. The principle is shown in Fig. 1.
The principle of the technique is shown in Fig. 1. Device 1 is an ordinary microphone that captures the voice and, after pre-processing, forms a digital signal. Device 2 photographs the dialect speaker and images the mouth shape, forming pictures. Device 3 is a computing unit built from a signal-processing DSP chip with FFT and HMM functions; it computes over the sound stream and image stream, forms MFCC feature coefficients, and compares them after computation with the data in the standard word-sound memory (device 4), forming text content that is indexed and stored. Device 6 is a text scanner used to recognize the data content of certificates and related documents; it contributes another component of the index. Device 4 is a known dialect word-sound database used to recognize the text of dialect sound content; here "word-sound" refers to the data set formed from the characteristic features of the standard pronunciation (or samples) of the dialect's words (or phrases), generated by the system's word-sound library generation module. The ARM system combines the files processed by the DSP system (audio files, text content, image files, and the corresponding index files) in file system 7 with the GPS information from device 5, and stores valid dialect data into the database. The stored data are displayed and managed on the text display, device 8.
Fig. 2 shows the process of forming and storing dialect sound files. The dialect reader (sampled person) reads the content shown on the text display, producing a voice signal; DSP processing turns the voice signal into a phoneme (phone) queue and a dialect audio file, and the audio file is stored directly in memory. If the local dialect has no standard pronunciation-word correspondence database, a word-sound standard library (10) must be generated by starting the word-sound computation algorithm (9); if a local word-sound library exists, the dialect phoneme (phone) queue is recognized against it to form a dialect text. Different word-sound libraries yield different recognized text content. The characteristic parameters generated from this dialect text become part of the dialect database index. Meanwhile, the GPS signal is used to recognize the geographic position and, as an aid, to determine whether standard-library data exist in the local database; it is also part of the dialect index. The image information comprises the dialect reader's data (scanned documents and portrait) and the picture signal synchronized with the reading process; DSP processing turns this signal into files of picture frames corresponding to the phoneme queue (phoneme-and-frame series), and the frame features form part of the index. Once the index is formed, the storage structure of index, sound files, and image files is constructed and stored, yielding a manageable database. The database content and the whole system are managed through the display interface, which also provides human-machine interaction.
3.2 Word-sound data module
The dialect phoneme queue is formed as shown in Fig. 3. The text display provides the reading content (words and sentences); the reader reads facing the microphone and camera, forming image and audio files.
The electronic file stream is the sound stored into memory, i.e. the sound source processed by the DSP. After reading the data, the DSP invokes a fast Fourier transform (FFT); with energy as the main parameter and combined with Mel-frequency cepstral coefficients (MFCC), it segments the phonemes (phones), forming a corresponding phoneme queue with time as its ID (Fig. 4).
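The segmentation step above (FFT energy as the main parameter, time as the phoneme ID) can be sketched as follows. This is an illustrative reconstruction, not the DSP firmware itself; the frame size, hop, and energy threshold are assumptions.

```python
import numpy as np

def segment_phonemes(samples, rate=8000, frame_ms=25, hop_ms=10, threshold=0.1):
    """Split a waveform into rough phoneme-like units by short-time energy.

    Frames whose spectral energy (via FFT) exceeds `threshold` times the
    peak energy are grouped into contiguous segments; each segment takes
    the start time of its first frame as its ID, mirroring the patent's
    "phoneme queue with time as ID".
    """
    frame = int(rate * frame_ms / 1000)
    hop = int(rate * hop_ms / 1000)
    energies = []
    for start in range(0, len(samples) - frame + 1, hop):
        spectrum = np.fft.rfft(samples[start:start + frame])
        energies.append(float(np.sum(np.abs(spectrum) ** 2)))
    energies = np.array(energies)
    if energies.size == 0 or energies.max() == 0:
        return []
    active = energies > threshold * energies.max()
    queue, seg_start = [], None
    for i, on in enumerate(active):
        if on and seg_start is None:
            seg_start = i
        elif not on and seg_start is not None:
            queue.append({"id": seg_start * hop / rate,   # time-based ID (s)
                          "frames": list(range(seg_start, i))})
            seg_start = None
    if seg_start is not None:                              # flush trailing segment
        queue.append({"id": seg_start * hop / rate,
                      "frames": list(range(seg_start, len(active)))})
    return queue
```

A real pipeline would then compute MFCC coefficients per segment; here only the energy-based queueing is shown.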
The word-sound module is a database module of correspondences between standard words and dialect sounds. Its sound data are represented by MFCC coefficient sets: each word (character, word, or short sentence) corresponds to one phoneme queue after sound decomposition, and each phoneme corresponds to one MFCC coefficient set (structure).
The selected DSP computing system is built from a TMS320C5502 chip and uses generic FFT (fast Fourier transform) code and HMM (hidden Markov model) code to process the sound data and output MFCC (Mel-frequency cepstral coefficient) values. Each word or phrase (sentence) read aloud corresponds to an audio file; the DSP's standard FFT function performs frequency-energy peak conversion to form energy frames, from which the MFCC computation proceeds. The MFCC computation uses generic hidden Markov (HMM) computation functions and forms the phoneme queue. Each phoneme's index data structure is generated from standard MFCC coefficients: six variables select six energy-concentrated frequency points of the sound, with 170 Hz, 280 Hz, 400 Hz, 870 Hz, 1200 Hz, and 1700 Hz corresponding to the MFCC variables. Each dialect sound file is arranged by phoneme, forming IDs, and forms the phoneme's corresponding data structure, as shown in Table 1:
Table 1: Data structure of the MFCC construction of a phoneme
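Table 1 is reproduced in the original only as an image. A plausible rendering of the per-phoneme index record it describes — a time-based ID plus one MFCC variable per listed frequency point — might look like the following sketch; the field names are assumptions:

```python
from dataclasses import dataclass, field
from typing import List

# The six energy-concentrated frequency points named in the text (Hz).
MFCC_FREQ_POINTS = (170, 280, 400, 870, 1200, 1700)

@dataclass
class PhonemeRecord:
    """Per-phoneme index entry: time-based ID plus one MFCC variable
    per frequency point, as described for Table 1."""
    phoneme_id: float                 # time of the phoneme in the file (s)
    mfcc: List[float] = field(default_factory=lambda: [0.0] * 6)

    def __post_init__(self):
        if len(self.mfcc) != len(MFCC_FREQ_POINTS):
            raise ValueError("expected one MFCC variable per frequency point")
```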
The word-sound module's reading text consists of 100 items, counting monosyllabic words, disyllabic words, and short sentences. The standard reading items (characters, words, sentences) are displayed one at a time; the declaimer reads each item aloud while it is recorded, forming an audio file and thus a "standard word - dialect sound" correspondence file. The DSP's FFT function filters the data against the MFCC variable array to form MFCC parameter sets, constituting the word-to-sound-data index and thereby the word-sound module. The data generated by the word-sound module are stored in a flash database; the flash storage format is indexed by the description pairing phoneme queue sets with standard words.
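The pairing of standard words with their phoneme-queue MFCC sets, as stored in the flash library, can be modeled as a simple mapping. This is an illustrative in-memory sketch, not the actual flash layout; the function names are assumptions.

```python
# A toy in-memory model of the "standard word -> dialect sound" index.
# Keys are the displayed standard items (characters, words, short sentences);
# values are recorded readings, each reading a phoneme queue whose phonemes
# are MFCC coefficient sets.
word_sound_index = {}

def register_reading(word, phoneme_mfcc_sets):
    """Store one recorded reading: the standard word paired with its phoneme queue."""
    word_sound_index.setdefault(word, []).append(phoneme_mfcc_sets)

def lookup(word):
    """Return all recorded phoneme queues for a standard word, if any."""
    return word_sound_index.get(word, [])
```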
3.3 Dialect word-sound module
The phoneme queue forms the standard library by pronunciation character by character, word by word, and sentence by sentence, requiring reliable and unique correspondence. The resulting dialect standard pronunciation-character library is the core database of accent recognition; its structure is indexed in two ways, by word and by phoneme. The word-sound standard library is part of the DSP system.
The word-sound standard library refers to a word-dialect-sound database of a region with GPS or recognizable administrative (address) marks; it is the index basis that makes dialects queryable. The standard library consists of fixed single characters, double characters (words), multi-character words, and sentences (single meanings), together with their corresponding sound fragments (phonemes, phones). The standard library is defined as a limited common character library, fixed at 100 characters and 200 words (including sentences); that is, the reading documents of the standard library are generally unified.
When recognizing a dialect sound file, text can be formed for only part of the words in the file; this is called the text feature of the dialect sound file and serves, as character-type data, as the "text index part" of the index. The phoneme part, with its queue features and Mel parameters, forms the "phonetic feature" (of the dialect). The format is shown in Table 1:
| Address | Attribute | Character | Geographical position | Reader | Phonetic feature | Standard corpus |
|---|---|---|---|---|---|---|

Table 1: Flash or ROM standard word-sound database table structure
The dialect text is the text formed from the dialect declaimer's reading under recognition by the local (GPS) dialect standard library, as shown in Fig. 6.
The file formed from the image signal, the identity information of the dialect speaker, and the GPS data together form the index file of this dialect collection. The index file is stored together with the sound, the reading text, and the images; the index format is shown in Table 2:
| ID | Attribute | Dialect text | GPS | Reader | Phonetic feature | Standard corpus | Document location |
|---|---|---|---|---|---|---|---|

Table 2: Dialect sound-file index format
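The row layout of Table 2 can be modeled as a small record type; the field names follow the table, while the types and example values are assumptions:

```python
from dataclasses import dataclass

@dataclass
class DialectIndexEntry:
    """One row of the dialect sound-file index (Table 2); types are assumed."""
    id: str                  # record ID
    attribute: str           # e.g. character / word / sentence
    dialect_text: str        # text recognized from the dialect sound
    gps: str                 # GPS position of the sampling site
    reader: str              # identity code of the speaker
    phonetic_feature: str    # queue features plus Mel parameters
    standard_corpus: str     # standard reading item shown on the display
    document_location: str   # storage location (URL) of the sound/image files
```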
Calculation of the dialect word-sound data spectrum format
The dialect acoustic-data spectrum is formed as follows: the dialect speaker reads the reference standard text, forming one complete audio file, and an array is constructed. The array structure is: matching rate (1 byte), personal-information code (64 bytes), image code (16 bytes), storage location (URL) of the sound file and related image files (128 bytes), and other (8 bytes). This array is called the dialect acoustic-data spectrum.
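The fixed-width array described above (1 + 64 + 16 + 128 + 8 = 217 bytes) can be serialized with Python's `struct` module. This is a sketch under stated assumptions: the field contents, encoding, and scaling of the matching rate into one byte are illustrative, not taken from the patent.

```python
import struct

# Layout from the text: matching rate (1 B), personal-information code (64 B),
# image code (16 B), sound/image file storage location URL (128 B), other (8 B).
SPECTRUM_FMT = "<B64s16s128s8s"   # little-endian, fixed-width record, 217 bytes

def pack_spectrum(matching_rate, person_code, image_code, url, other=b""):
    """Pack one dialect acoustic-data spectrum record (217 bytes)."""
    pct = max(0, min(255, int(matching_rate * 255)))  # scale [0,1] into 1 byte
    return struct.pack(SPECTRUM_FMT, pct,
                       person_code,                    # struct pads with NULs
                       image_code,
                       url.encode("utf-8"),
                       other)
```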
The dialect acoustic-data feature matching-rate location algorithm assigns a regional identification name (address or geographic position) to the speaker's dialect at the highest matching rate; the name is provided either by an existing regional (administrative address) database or by GPS (geographic position).
The matching rate is computed by the matching-rate algorithm; its pseudo-process is as follows:
; given the geographical position-phoneme queue tables k, representing k dialect types
; and a phoneme queue table l to be matched
; traverse the k tables, forming k matching rates
; take the maximum (or a rate of 1) as the matched type
; take its region name or GPS value
; otherwise (the k value is not unique, or is 0)
; if k = null or 0, determine the position from the GPS value
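One possible reading of that pseudo-process in Python: compare the phoneme queue to be matched against each region's table, take the best rate, and fall back to the GPS value when no region matches. The similarity measure (fraction of query phonemes found in a region's table) and the data shapes are assumptions for illustration.

```python
def match_region(query_phonemes, region_tables, gps_value):
    """Locate a dialect region by matching rate, per the pseudo-process above.

    region_tables maps a region name (administrative address) to its known
    phoneme queue; the matching rate is the fraction of query phonemes
    found in that region's table. Falls back to the GPS value when the
    tables are empty or nothing matches (k null or 0).
    """
    if not query_phonemes or not region_tables:
        return gps_value, 0.0
    best_region, best_rate = None, 0.0
    for region, table in region_tables.items():       # traverse the k tables
        known = set(table)
        rate = sum(p in known for p in query_phonemes) / len(query_phonemes)
        if rate > best_rate:
            best_region, best_rate = region, rate
    if best_region is None:                           # no match -> use GPS
        return gps_value, 0.0
    return best_region, best_rate
```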
The personal-information code is composed of sex, age, and identity.
The image code is composed of a picture number (integer, 8 bytes), the reader's head-portrait file (2 bytes), and the audio file name ID (6 bytes).
The file storage location (URL) is determined by the file system (operating system).
"Other" includes the position of this dialect in the classification tree and an error code (2 bytes).
Dialect region identification module
The regional dialect word-sound recognition library module is a comparison library in which the sound of a standard word (or phrase) read aloud in dialect is converted back into a "dialect word"; the region name (administrative address or geographic place name) with the highest matching rate (the ratio of "dialect words" to standard words, at most 1) is taken as the dialect's place of origin.
When the two kinds of regions are inconsistent, both findings are recorded, and the GPS-based result is taken as primary.
Brief description of the drawings:
Fig. 1: Geography of dialect-sound spectrum acquisition principle framework;
Fig. 2: Geography of dialect-sound index storage management architecture;
Fig. 3: dialect phoneme queue formation framework;
Fig. 4: computing system built from the TMS320C5502 chip;
Fig. 5: schematic of phoneme queue and image-frame queue formation;
Fig. 6: dialect text formation framework.
Claims (6)
1. A dialect geography-acoustic-data spectrum formation technique based on dialect acoustic features, characterized by comprising the principle and framework of the dialect geography data sound spectrum; wherein the Geography of dialect-sound spectrum acquisition principle includes the geographic information, regional information, voice signal, and data features of the person reading the dialect aloud, and the framework of the Geography of dialect-sound spectrum includes the acoustic-feature data extraction and queueing technique, the dialect acoustic-data parameter sequence, and the dialect acoustic-data spectrum.
2. The geographic information collection technique according to claim 1, characterized by the regional recognition technique for geographic information data and dialect features, specifically comprising:
a GPS positioning module, which uses GPS satellite positioning to locate the module's position;
a dialect region identification module.
3. The dialect standard word-sound data technique according to claim 1, characterized by a dialect standard word-sound flash rewritable data-storage chip forming a pronunciation-to-word correspondence circuit technique, specifically comprising: the flash storage technique of the word-sound data module.
4. The acoustic-feature data extraction and queueing technique according to claim 1, characterized by forming MFCC variable data queues of the phonemes in an audio file, specifically comprising:
the complete technique of the word-sound data module;
the dialect word-sound module.
5. The dialect acoustic-data spectrum according to claim 1, characterized by forming a general storage-file query data structure with searchable indexes, specifically embodied in: the calculation of the dialect word-sound data spectrum format.
6. A dialect acoustic-data spectrum formation operation as described in claim 1, characterized by comprising the following steps:
1) the display is turned on and shows the dialect-type identification words, while the GPS module starts;
2) after the start key is pressed, the reader begins reading aloud facing the microphone and camera;
3) each screen shows only one character, one word, or one short sentence; the display timing can be automatic or manual;
4) each session consists of 100 items;
5) after reading ends, the spectrum data of this dialect are displayed automatically;
6) the interface has a standard-dialect sound-character-library generation key; its steps are as above, but the result is stored in the flash database.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610213770.3A CN107273377A (en) | 2016-04-08 | 2016-04-08 | Geography of dialect-sound spectrum acquisition technique |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107273377A true CN107273377A (en) | 2017-10-20 |
Family
ID=60051882
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610213770.3A Pending CN107273377A (en) | 2016-04-08 | 2016-04-08 | Geography of dialect-sound spectrum acquisition technique |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107273377A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110428805A (en) * | 2019-09-04 | 2019-11-08 | 福建省立医院 | Non-generic words and mandarin inter-translation method, device and equipment |
CN110765105A (en) * | 2019-10-14 | 2020-02-07 | 珠海格力电器股份有限公司 | Method, device, equipment and medium for establishing wake-up instruction database |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20171020 |