CN108597493A - Audio exchange method and audio exchange system for language semantics, and coded graphic - Google Patents

Publication number
CN108597493A
Authority
CN
China
Prior art keywords
language
voice
semantic
phoneme
minimum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810264460.3A
Other languages
Chinese (zh)
Other versions
CN108597493B (en)
Inventor
孔繁泽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN201810264460.3A (CN108597493B)
Priority to CN201910143693.2A (CN109754780B)
Publication of CN108597493A
Priority to PCT/CN2019/079834 (WO2019184942A1)
Application granted
Publication of CN108597493B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/005 Language recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/04 Segmentation; Word boundary detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L2015/025 Phonemes, fenemes or fenones being the recognition units

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Machine Translation (AREA)

Abstract

The audio exchange method and system for language semantics and the audio coding graphic of the present invention solve the technical problem in the prior art that semantic complexity causes poor data response and poor real-time performance when translating between languages. The method includes forming a voice mapping structure for each language from a minimum phoneme sequence and completing semantic conversion between languages through the voice mapping structures. The minimum phonemes, which form the shortest audio segments of a language, serve as the basic data exchange unit for semantic conversion between languages and as the coding basis for data exchange. This changes the underlying structure of speech recognition and optimizes the coding complexity and accuracy of the audio content of a language, so that the encoding of language audio avoids the complex audio features formed by coupling composite information such as pitch, scale and range in language fragments, which safeguards the speech recognition rate. The mapping structure between voice codes and text codes formed from minimum phonemes improves data exchange efficiency during translation.

Description

Audio exchange method and audio exchange system for language semantics, and coded graphic
Technical field
The present invention relates to the field of information exchange, and in particular to an audio exchange method and an audio exchange system for language semantics, and a coded graphic.
Background art
Current language translation mainly consists of three parts: speech recognition, semantic analysis and sentence synthesis. Speech recognition uses high-sensitivity sensors to extract, from the frequency-domain or time-domain speech signal stream of the source language, the set of audio signals corresponding to the words in a sentence. Semantic analysis uses models such as hidden Markov models (HMM), self-learning models and artificial neural networks (ANN) to identify and quantify the word sequences and semantic meaning in the audio signal set, so as to determine the expressed content as far as possible. Sentence synthesis forms the audio signal set or word sequence set of the target language from the identified and quantified expression content. Because of the complexity of the semantic analysis models, this process requires massive computing resources. Mobile applications therefore need a distributed computing architecture that reaches server-side computing resources over the bandwidth the internet can guarantee, which limits the real-time performance and accuracy of translation.
Patent document CN104637482B discloses a device that uses digital coding to convert voice to text. A phoneme storage unit stores first-language phoneme feature data. A phoneme conversion unit converts the received phoneme signal sequence into first-language phonemes using the first-language phoneme feature data. A digital coding unit assigns unique codes to the first-language phonemes to form a first-language phoneme code sequence, which is used to form the character pronunciation code sequences and vocabulary pronunciation code sequences of the first language. A text storage unit stores the characters, vocabulary or graphics of the first language together with their corresponding code sequences. A text conversion unit generates first-language characters, vocabulary, graphics and/or combinations thereof according to the correspondence of the code sequences. That device establishes a basis for code mapping between text and voice. How to use this code mapping basis to reduce the resource consumption of converting text, graphics and audio with the same semantics between languages still calls for inventive improvement.
Summary of the invention
In view of this, embodiments of the present invention provide an audio exchange method and an audio exchange system for language semantics, to solve the technical problem in the prior art that semantic complexity causes poor data response and poor real-time performance when translating between languages.
The audio exchange method for language semantics of an embodiment of the present invention forms a voice mapping structure for each language from a minimum phoneme sequence, and completes semantic conversion between languages through the voice mapping structures.
The audio exchange system for language semantics of an embodiment of the present invention is characterized by including:
a memory, for storing the program code of the above audio exchange method for language semantics;
a processor, for running the program code.
The audio exchange system for language semantics of an embodiment of the present invention is used to form a voice mapping structure for each language from a minimum phoneme sequence, and to complete semantic conversion between languages through the voice mapping structures.
The basic voice coding graphic of an embodiment of the present invention is used for the graphical display of language phonemes and includes a basic framework. The basic framework includes a first adaptation column and a second adaptation column arranged side by side, and an adaptation rod. An adaptation position group is provided in each of the first adaptation column and the second adaptation column, each adaptation position group includes several adaptation positions, and each end of the adaptation rod connects to one adaptation position of one adaptation column.
The audio exchange method, audio exchange system and coded graphic of the embodiments of the present invention use the minimum phonemes, which form the shortest audio segments of a language, as the basic data exchange unit for semantic conversion between languages and as the coding basis for data exchange. This changes the underlying structure of speech recognition, shortens the code length and improves the coding efficiency of the audio content of a language, so that data exchange efficiency during translation is optimized. It has a positive effect on reducing the latency of real-time remote data responses and on increasing the amount of basic data structures and basic data that a local mobile terminal can store.
Description of the drawings
Fig. 1 is a schematic diagram of the data processing procedure of the audio exchange method for language semantics of an embodiment of the present invention.
Fig. 2 is a schematic diagram of the coding procedure of the audio exchange method for language semantics of an embodiment of the present invention.
Fig. 3 is a schematic diagram of a voice mapping structure of the audio exchange method for language semantics of an embodiment of the present invention, taking Chinese as an example.
Fig. 4 is a schematic diagram of a voice mapping structure of the audio exchange method for language semantics of an embodiment of the present invention, taking English as an example.
Fig. 5 is a schematic diagram of language conversion performed by the audio exchange method for language semantics of an embodiment of the present invention.
Fig. 6 is a schematic diagram of the architecture of the audio exchange system for language semantics of an embodiment of the present invention.
Fig. 7 is a schematic diagram of the graphic structure of a basic voice coding graphic in the audio exchange method for language semantics of an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
The audio exchange method for language semantics of an embodiment of the present invention includes:
forming a voice mapping structure for each language from a minimum phoneme sequence, and completing semantic conversion between languages through the voice mapping structures.
Languages differ fundamentally in how they express the same semantics in text, graphics and pronunciation. Semantic conversion refers to converting between the different textual, graphical and spoken forms of expression of the same semantics.
The pronunciation of the text (a kind of graphic symbol) that expresses semantics in a regional common language is deterministic, and the pronunciation rules of vocabulary and sentences can be summarized as different combinations of syllables. Constructing every syllable from one group of basic minimum phonemes makes it possible to use the low signal load of minimum phonemes to exclude redundant audio signals and interfering information, which provides a simpler coding basis for complex data exchange and reduces code length.
According to statistical comparisons of regional common languages by those skilled in the art, the number of minimum phonemes, as the basic elements of pronunciation, and their audio features can be determined, and the number is less than 1000. The roughly 7000 languages of the world together use about 800 distinct minimum phonemes; each Western language uses about 40 minimum phonemes, and Chinese uses no more than about 150. An index can therefore be built entirely with codes in a hundreds or thousands value range, for example three or four decimal digits, or 10 or 20 binary digits.
The audio exchange method for language semantics of an embodiment of the present invention uses the minimum phonemes, which form the shortest audio segments of a language, as the basic data exchange unit for semantic conversion between languages and as the coding basis for data exchange. This changes the underlying structure of speech recognition and shortens the code length and improves the coding efficiency of the audio content of a language, so that the encoding of language audio avoids the complex audio features formed by coupling composite information such as pitch, scale and range in language fragments, which safeguards the speech recognition rate. The mapping structure between voice codes and text codes formed from minimum phonemes optimizes data exchange efficiency during translation. This has a positive effect on reducing the latency of real-time remote data responses and on increasing the amount of basic data structures and basic data that a local mobile terminal can store.
Fig. 1 is a schematic diagram of the data processing procedure of the audio exchange method for language semantics of an embodiment of the present invention. As shown in Fig. 1, the method includes:
Step 100: Serialize all minimum phonemes.
Serialization may include identifying the syllables, phonemes, scale and intonation in a language; describing the identified syllables, phonemes, scale and intonation quantitatively in mathematical terms, for example as time-domain or frequency-domain audio feature data; and storing the quantitative descriptions in structured form, for example by coding them one by one to form an index.
Step 200: Form text-voice mapping data for each language from a subset of all the minimum phonemes.
The pronunciation basis of each language is determined by a subset of all the minimum phonemes. Combinations of the minimum phonemes in the subset form the voice identifiers of the text pronunciations in a language, and the voice identifiers are then used to form mapping data that structures the correspondence between text and voice identifiers. The mapping data includes the data structure that stores the data. The mapping data may include text-voice mapping data and inter-voice mapping data.
Step 300: Form inter-voice mapping data between languages according to language semantics.
Mapping data between voices of corresponding meaning is established using the objectivity of semantics across languages. The mapping data includes the data structure that stores the data, and may also include text-voice mapping data.
Step 400: Perform semantic language conversion using the corresponding inter-voice mapping data and text-voice mapping data.
Through the text-voice mapping data, the audio exchange method for language semantics of an embodiment of the present invention ensures the continuity and correctness of the correspondence between text and voice within one language. Combining inter-voice mapping data with text-voice mapping data allows diverse conversions between languages, achieving higher efficiency in exchanging basic language data during conversion while safeguarding conversion quality. In addition, varying the mappings in the inter-voice mapping data and the text-voice mapping data can produce a further encryption effect.
Fig. 2 is a schematic diagram of the coding procedure of the audio exchange method for language semantics of an embodiment of the present invention. As shown in Fig. 2, on the basis of the above embodiment, step 100 includes:
Step 110: Collect the minimum phonemes of each common language through speech recognition.
Based on human physiological characteristics and language evolution, the speech of a language can be decomposed structurally: sentence pronunciation into word pronunciation, word pronunciation into syllables, and syllables into their constituent phonemes. Those skilled in the art will appreciate that using computer technology to capture audio and analyze the time-domain or frequency-domain features of audio fragments can determine the audio features of characters, words and phrases, including the features of the minimum phonemes.
Step 120: Form the minimum phonemes into a unified phoneme sequence.
Those skilled in the art will appreciate that speech recognition technology, combined with speech analysis and statistics over a sufficient amount of data, can identify and determine the audio features of the minimum phonemes used in each language. The audio features of each determined minimum phoneme are given a unified identifying code, forming the unified phoneme sequence of all minimum phonemes. The unified phoneme sequence allows the speech of a language to be accurately decomposed into determinate combinations of at least one minimum phoneme, and each determinate combination can obtain its corresponding code sequence through the unified phoneme sequence.
For example: in Chinese, a syllable is formed from an initial and a final; an initial is formed by a single minimum phoneme or several single minimum phonemes, and a final is formed by one or several minimum phonemes. Similarly, in English a syllable is formed from vowels and consonants; a vowel is formed by a single minimum phoneme or several single minimum phonemes, and a consonant is formed by one or several minimum phonemes. Part of the resulting unified phoneme sequence can be as shown in the following table:
Each single minimum phoneme in the table has a unique code in the unified phoneme sequence. For fewer than 1000 minimum phonemes, unique codes can be formed with a length of 10 bits.
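The construction of a unified phoneme sequence and the 10-bit figure can be checked with a minimal Python sketch. The per-language inventories, the placeholder label "ae" and the function names are illustrative assumptions, not part of the patent; sequential integer codes are one possible assignment among many.

```python
def build_unified_sequence(inventories):
    """Assign one sequential code to every distinct phoneme across all languages.

    `inventories` maps a language tag to its list of minimum phonemes
    (hypothetical input format, chosen for illustration).
    """
    unified = {}
    for phonemes in inventories.values():
        for phoneme in phonemes:
            if phoneme not in unified:
                unified[phoneme] = len(unified)  # next unused code
    return unified


def code_width_bits(n_phonemes):
    """Smallest bit width that gives each of n_phonemes a distinct code."""
    return max(1, (n_phonemes - 1).bit_length())


# Tiny illustrative inventories; real inventories would come from step 110.
inventories = {
    "zh": ["m", "a", "n", "d"],
    "en": ["n", "d", "m", "ae"],  # "ae" is a placeholder label for a vowel
}
unified = build_unified_sequence(inventories)

print(len(unified))            # 5 distinct phonemes, shared ones coded once
print(code_width_bits(1000))   # 10 bits suffice for fewer than 1000 phonemes
print(code_width_bits(150))    # 8 bits for an inventory of about 150
```

Phonemes shared by several languages receive a single code, which is what lets the same sequence serve as the exchange unit between languages.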
The audio exchange method for language semantics of an embodiment of the present invention forms the unified phoneme sequence as the basic information carrier for converting text or voice with the same or similar semantics between different languages. This avoids the information interference caused by the excessive redundancy carried by other kinds of composite audio carriers (such as syllables), and helps optimize the accuracy and efficiency of speech recognition. As languages evolve, the unified phoneme sequence can be further updated with new minimum phonemes, keeping it in step with changes in the speech sounds of each language.
As shown in Fig. 2, in the audio exchange method for language semantics of an embodiment of the present invention, step 200 includes:
Step 210: Use a part of the phonemes in the unified phoneme sequence to form a first basic voice coding sequence corresponding to the pronunciation of characters or words in a first language.
This part of the phonemes includes all the minimum phonemes used in the pronunciation of one language; syllables, and from them the pronunciations of the characters or words of that language, can be formed from this part of the phonemes. Based on the codes of the minimum phonemes in the unified phoneme sequence, the basic voice code of each character or word in the first language is formed, and from these the basic voice coding sequence of all (or the main) characters or words.
For example: the Chinese character for "mother" has the pinyin "ma", which consists of the phonemes "m" and "a". In the unified phoneme sequence "m" is coded 120 and "a" is coded 010, so the character "mother" is coded 120010 in the basic voice coding sequence of Chinese.
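The concatenation in this example can be sketched in a few lines of Python. Only the two codes quoted in the text (120 for "m", 010 for "a") come from the patent; the function names and the fixed three-digit width are assumptions made for illustration.

```python
# Codes quoted in the text; any further entries would be hypothetical.
PHONEME_CODES = {"m": "120", "a": "010"}


def encode_pronunciation(phonemes, codes=PHONEME_CODES):
    """Concatenate the fixed-width phoneme codes in pronunciation order."""
    return "".join(codes[p] for p in phonemes)


def decode_pronunciation(code, codes=PHONEME_CODES, width=3):
    """Split a basic voice code back into phonemes, assuming fixed-width codes."""
    reverse = {value: key for key, value in codes.items()}
    return [reverse[code[i:i + width]] for i in range(0, len(code), width)]


print(encode_pronunciation(["m", "a"]))   # 120010
print(decode_pronunciation("120010"))     # ['m', 'a']
```

Because every phoneme code has the same width, the concatenated code can be split back into phonemes without separators.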
Other code compression methods can also be used in embodiments of the present invention, for example adding up the codes of the phonemes contained in the character "mother" to form the code 130, or representing the basic voice code graphically.
Those skilled in the art will appreciate that the coding form used in the example contains redundancy. Given the code length of the minimum phonemes, a basic voice coding sequence stored in standard bytes can apply compression coding techniques to keep codes unique while shortening the code length.
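One way to remove the redundancy of decimal-digit codes is to pack each phoneme code into a fixed number of bits. The sketch below assumes 10 bits per phoneme (enough for fewer than 1000 phonemes); bit packing is an illustrative choice, not the compression scheme prescribed by the patent. It also shows that the additive variant mentioned above loses phoneme order, so it would need extra information to stay unique.

```python
BITS_PER_PHONEME = 10  # assumption: enough for fewer than 1000 phonemes


def pack(codes):
    """Pack integer phoneme codes into the fewest whole bytes, 10 bits each."""
    value = 0
    for code in codes:
        if not 0 <= code < (1 << BITS_PER_PHONEME):
            raise ValueError(f"code {code} does not fit in {BITS_PER_PHONEME} bits")
        value = (value << BITS_PER_PHONEME) | code
    n_bytes = (len(codes) * BITS_PER_PHONEME + 7) // 8
    return value.to_bytes(n_bytes, "big")


def unpack(data, count):
    """Recover `count` phoneme codes from packed bytes."""
    value = int.from_bytes(data, "big")
    mask = (1 << BITS_PER_PHONEME) - 1
    return [(value >> (BITS_PER_PHONEME * (count - 1 - i))) & mask
            for i in range(count)]


def additive_code(codes):
    """The summing variant: shorter, but different orders give the same sum."""
    return sum(codes)


packed = pack([120, 10])                  # "m" = 120, "a" = 010
print(len(packed))                        # 3 bytes instead of 6 decimal digits
print(unpack(packed, 2))                  # [120, 10]
print(additive_code([120, 10]), additive_code([10, 120]))  # 130 130
```

The phoneme count is passed to `unpack` because leading zero bits cannot be distinguished from padding.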
Those skilled in the art will appreciate that different characters or words with the same pronunciation can have the same basic voice code, and that the different pronunciations of a character or word can give the same character or word different basic voice codes.
Step 220: Use the first basic voice coding sequence to form a first voice mapping structure corresponding to the pronunciation of phrases or sentences in the first language.
On the basis of the basic voice coding sequence determined for characters or words, the voice mapping structure of phrases or sentences can be formed by extending the basic voice coding sequence.
The voice mapping structure may use addressable data structures with address features, such as static or dynamic queues, arrays, heaps, stacks, linked lists, trees or graphs, alone or in combination. Static or dynamic pointers can be used to perform address operations across different types of data structure, and the data structures involved in a voice mapping structure may be nested or placed side by side.
In an embodiment of the present invention, the above data structures and pointers can be used to form mapping structures of voice and semantics among semantically related characters, words, phrases and sentences, so that part-of-speech mapping structures are established through semantic meaning.
Fig. 3 is a schematic diagram of a voice mapping structure of the audio exchange method for language semantics of an embodiment of the present invention. As shown in Fig. 3, for Chinese, take the characters 发 ("hair"), 明 ("bright"), 创 ("wound") and 造 ("make") as examples. Each character serves as a minimum semantic unit and has a basic voice code built from the phonemes of its pronunciation; the basic voice codes of the characters are discrete from one another. Storing the characters in a linked list (only as an example) ensures high-speed filtering on the character codes (that is, on the phoneme features). Each semantically meaningful word formed from characters, such as "invention" and "creation", is stored in another linked list. The basic voice code of each word is formed from the basic voice codes of the characters it contains, and the basic voice codes of the words are discrete from one another. Each semantically meaningful phrase formed from characters or words is stored in an array (only as an example), which ensures efficient fast addressing and efficient updating of the data topology; the basic voice codes of the phrases are discrete from one another.
According to the semantic relatedness of characters, words and phrases, the address pointers in the data structures are used to form a mapping structure tree or mapping structure graph of related characters, words and phrases, so that a mapping association is formed between voice and semantics. This mapping association can be static or partly updated dynamically.
In the basic voice coding data structure, the data unit of each character (or word, or phrase) can be extended, for example into a queue that stores the characters (or words, or phrases) that share a pronunciation but differ in semantics, making the voice mapping structure multidimensional.
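The combination of a linked list of code units with a per-unit queue of same-pronunciation entries can be sketched as follows. The class names and the second entry are hypothetical; only the code 120010 for "mother" comes from the text, and a linked list is used solely because the text names it as one example structure.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CodeNode:
    """One data unit: a basic voice code plus a queue of entries that share it."""
    code: str
    entries: deque = field(default_factory=deque)
    next: Optional["CodeNode"] = None


class CodeList:
    """Singly linked list of code units, searched by basic voice code."""

    def __init__(self):
        self.head = None

    def find(self, code):
        node = self.head
        while node is not None:
            if node.code == code:
                return node
            node = node.next
        return None

    def insert(self, code, entry):
        node = self.find(code)
        if node is None:
            # New pronunciation: link a new unit at the head of the list.
            node = CodeNode(code, next=self.head)
            self.head = node
        # Same pronunciation, different semantics: extend the unit's queue.
        node.entries.append(entry)


characters = CodeList()
characters.insert("120010", "mother")
characters.insert("120010", "hypothetical_homophone")  # illustrative entry
print(list(characters.find("120010").entries))
```

A production structure would more likely index the units with a hash table or tree; the linked list here only mirrors the example given in the text.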
By storing the mapping from voice to text in these data structures, the audio exchange method for language semantics of an embodiment of the present invention keeps most of the voice mapping structure static. Structural optimization can be carried out with the computing power of the server or the cloud, while a small amount of dynamic updating and supplementing can be completed on the client with few computing resources. Because the basic voice coding sequence is formed from the phonemes of pronunciation, the complexity and data volume of the voice mapping structure for semantics are greatly reduced, so that data storage and data processing for the voice mapping structure can respond with low latency on both the client and the server.
As shown in Fig. 2, in the audio exchange method for language semantics of an embodiment of the present invention, step 200 further includes:
Step 230: Use another part of the phonemes in the unified phoneme sequence to form a second basic voice coding sequence corresponding to the pronunciation of characters or words in a second language.
Compared with the part of the phonemes used in step 210 above, this other part may include some identical phonemes; identical phonemes may be denoted by different letters or symbols in different languages.
For example: the phonetic transcription of the English word "and" consists of a vowel phoneme followed by the phonemes "n" and "d". These three phonemes are coded 018, 220 and 200 in the unified phoneme sequence, so the word "and" is coded 018220200 in the basic voice coding sequence of English.
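The relationship between the per-language phoneme subsets and the unified phoneme sequence can be sketched as set operations. The five codes below are the ones quoted in the text; the label "ae" for the vowel of "and" and the membership of each subset are assumptions made for illustration only.

```python
# Codes quoted in the text; "ae" is a placeholder label for the vowel of "and".
UNIFIED = {"m": "120", "a": "010", "ae": "018", "n": "220", "d": "200"}

# Assumed, heavily abbreviated per-language subsets of the unified sequence.
ZH_SUBSET = {"m", "a", "n", "d"}
EN_SUBSET = {"ae", "n", "d", "m"}


def encode(phonemes, subset):
    """Encode a pronunciation using only phonemes that belong to the language."""
    missing = [p for p in phonemes if p not in subset]
    if missing:
        raise KeyError(f"phonemes not in this language's subset: {missing}")
    return "".join(UNIFIED[p] for p in phonemes)


shared = ZH_SUBSET & EN_SUBSET
print(sorted(shared))                          # ['d', 'm', 'n']
print(encode(["ae", "n", "d"], EN_SUBSET))     # 018220200
```

Shared phonemes keep the same code in both languages, so the two basic voice coding sequences are directly comparable.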
Those skilled in the art will appreciate that the coding form used in the example contains redundancy, and that compression coding techniques can be used to keep codes unique while shortening the code length.
Those skilled in the art will appreciate that different characters or words with the same pronunciation can have the same basic voice code, and that the different pronunciations of a character or word can give the same character or word different basic voice codes.
Step 240: Use the second basic voice coding sequence to form a second voice mapping structure corresponding to the pronunciation of phrases or sentences in the second language.
Text (or symbols) with the same semantics in different languages may share the same pronunciation. As the voice mapping structures of the two languages are formed, the same pronunciation of different text with the same semantics gives rise to coding differences.
Fig. 4 is a schematic diagram of a voice mapping structure of the audio exchange method for language semantics of an embodiment of the present invention. As shown in Fig. 4, for English, take "invention" and "creation" as examples. Each word serves as a minimum semantic unit and has a basic voice code built from the phonemes of its pronunciation; the basic voice codes of the words are discrete from one another. Storing the words in a database table (only as an example) ensures high-speed filtering on the word codes (that is, on the phoneme features). Each semantically meaningful phrase formed from words is also stored in a database table (only as an example), which ensures efficient fast addressing and efficient updating of the data topology; the basic voice codes of the phrases are discrete from one another.
According to the semantic relatedness of words and phrases, the address pointers in the data structures are used to form a mapping structure tree or mapping structure graph of related words and phrases, so that a mapping association is formed between voice and semantics. This mapping association can be static or partly updated dynamically.
In the basic voice coding data structure, the data unit of each word or phrase can be extended into a queue that stores the words or phrases that share a pronunciation but differ in semantics, making the voice mapping structure multidimensional.
By storing the mapping from voice to text in these data structures, the audio exchange method for language semantics of an embodiment of the present invention keeps most of the voice mapping structure static. Structural optimization can be carried out with the computing power of the server or the cloud, while a small amount of dynamic updating and supplementing can be completed on the client with few computing resources. Because the basic voice coding sequence is formed from the phonemes of pronunciation, the complexity and data volume of the voice mapping structure for semantics are greatly reduced, so that data storage and data processing for the voice mapping structure can respond with low latency on both the client and the server.
As shown in Fig. 2, in the audio exchange method for language semantics of an embodiment of the present invention, step 300 includes:
Step 310: Using the same or similar semantic information, form a primary voice conversion structure between the corresponding languages through the respective (first and second) voice mapping structures of the first language and the second language.
Using the voice mapping structures of the two languages to be translated, and based on the same or similar semantic information, a primary voice conversion structure is formed between characters or words of the same or similar meaning. It stores the basic voice codes of the characters, words, phrases or sentences of the two languages. The primary voice conversion structure may be stored as "key: key value" pairs, to maintain filtering efficiency under a large number of concurrent requests.
For example:
Semantics : English basic voice code : Chinese basic voice code
Innovation and creation : 092072069 : 710169555614
The English basic voice code and the Chinese basic voice code can each serve as the key with the other as the key value, which supports translation in both directions.
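A bidirectional key-value store of this kind can be sketched with two dictionaries that are filled from the same rows. The single row uses the codes quoted in the text; the function and variable names are illustrative assumptions.

```python
# One row from the text: semantics, English code, Chinese code.
ROWS = [("innovation and creation", "092072069", "710169555614")]


def build_primary_structure(rows):
    """Build two lookup tables so that either code can act as the key."""
    en_to_zh, zh_to_en = {}, {}
    for semantics, en_code, zh_code in rows:
        en_to_zh[en_code] = (zh_code, semantics)
        zh_to_en[zh_code] = (en_code, semantics)
    return en_to_zh, zh_to_en


en_to_zh, zh_to_en = build_primary_structure(ROWS)

print(en_to_zh["092072069"][0])      # 710169555614
print(zh_to_en["710169555614"][0])   # 092072069
```

Dictionary lookups take constant time on average, which is in line with the stated goal of fast filtering under many concurrent requests.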
As shown in Fig. 2, in the audio exchange method of language semantics according to an embodiment of the present invention, step 300 further includes:
Step 320: Using the grammar rules of the first language and the second language, form an advanced voice conversion structure between the corresponding (i.e. first and second) voice mapping structures.
The grammar rules of each language include a conversion structure between characters or words that is established according to their roots and parts of speech. Like the primary voice conversion structure, the advanced voice conversion structure may be stored as "key: key value" pairs to improve filtering efficiency under a large number of concurrent requests.
For example:
Semantics: grammar: English basic voice code
English "creating (noun)" 0001: 092072069;
English "creating (verb)" 0002: 092072069;
English "creating (adverb)" 0003: 092072069;
Semantics: grammar: Chinese basic voice code
Chinese "creating (noun)" 0001: 710169555614;
Chinese "creating (verb)" 0002: 710169555614;
Chinese "creating (adverb)" 0003: 710169555614;
Grouping by grammar brings together the basic voice codes of characters, words or vocabulary with similar semantics in the two languages. This raises the correlation between codes and improves filtering efficiency and the efficiency of machine translation algorithms during translation.
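The grammar-tagged entries listed above can be sketched as composite keys. The grammar codes and basic voice codes come from the example in the text; the key format "grammar:code" and all names below are assumptions made for illustration.

```python
# Grammar codes and basic voice codes as listed in the example above.
GRAMMAR_CODES = {"0001": "noun", "0002": "verb", "0003": "adverb"}
EN_CODE = "092072069"
ZH_CODE = "710169555614"


def make_key(grammar_code, basic_code):
    """Join a grammar code and a basic voice code into one lookup key (assumed format)."""
    return f"{grammar_code}:{basic_code}"


# Advanced conversion table: the same grammar code is kept on both sides,
# so entries with the same part of speech stay grouped together.
EN_TO_ZH = {make_key(g, EN_CODE): make_key(g, ZH_CODE) for g in GRAMMAR_CODES}
ZH_TO_EN = {zh: en for en, zh in EN_TO_ZH.items()}
```

A lookup such as `EN_TO_ZH[make_key("0002", EN_CODE)]` then returns the Chinese entry tagged as a verb.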
Fig. 5 is a schematic diagram of language conversion performed by the audio exchange method of language semantics according to an embodiment of the present invention. As shown in Fig. 5, step 400 includes:
Step 410: Obtain the sequential phoneme set of an audio input segment of the first language using speech recognition;
Step 420: Determine the first basic voice code of the sequential phoneme set using the first basic voice coding sequence of the first language;
Step 430: Determine the continuous voice code of the sequential phoneme set using the first voice mapping structure and the first basic voice coding sequence of the first language;
Step 440: Obtain the second basic voice code of the second language using the primary voice conversion structure between the corresponding languages;
Step 450: Obtain the continuous voice code of the second language using the advanced voice conversion structure between the corresponding languages and the second basic voice coding sequence;
Step 460: Form the voice pronunciation according to the continuous voice code of the second language.
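Steps 420 to 460 can be sketched as a chain of table lookups. This is a toy illustration under stated assumptions, not the method of the embodiment: speech recognition (step 410) is assumed to have already produced the phoneme groups, the advanced conversion structure is reduced to a word-order rule, and every phoneme label, code, table and pronunciation string below is invented.

```python
# Hypothetical toy tables; none of these values come from the patent.
FIRST_BASIC = {                       # phoneme group -> first basic voice code
    ("g", "uh", "d"): "101",
    ("m", "ao", "r", "n", "ih", "ng"): "102",
}
PRIMARY = {"101": "201", "102": "202"}            # first code -> second code
ADVANCED_ORDER = {"101-102": [1, 0]}              # continuous code -> output order
SECOND_PRONUNCIATION = {"201": "hao3", "202": "zao3shang4"}


def translate(phoneme_groups):
    """Convert recognized phoneme groups of the first language to a pronunciation string."""
    # Step 420: basic voice code of each phoneme group.
    first_codes = [FIRST_BASIC[group] for group in phoneme_groups]
    # Step 430: continuous voice code of the whole segment.
    first_continuous = "-".join(first_codes)
    # Step 440: second-language basic codes via the primary conversion structure.
    second_codes = [PRIMARY[code] for code in first_codes]
    # Step 450: apply the grammar-based ordering; default keeps the original order.
    order = ADVANCED_ORDER.get(first_continuous, range(len(second_codes)))
    second_continuous = [second_codes[i] for i in order]
    # Step 460: form the pronunciation from the continuous code.
    return " ".join(SECOND_PRONUNCIATION[code] for code in second_continuous)
```

For the two-group input the ordering rule swaps the words; a single group falls back to the default order.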
When performing language conversion, the audio exchange method of language semantics according to the embodiment of the present invention uses the phoneme sequence, the basic voice coding sequence, the voice mapping structure and the inter-language conversion structures to complete a reversible conversion between voice and text across the two languages, which helps voice conversion obtain the corresponding candidate text accurately or relatively accurately. The data and data structures require limited storage and are easy to retrieve, so they are suitable for local storage and processing. The whole process places low demands on the real-time performance and bandwidth of server-side data requests.
Fig. 6 is a schematic architecture diagram of the audio exchange system of language semantics according to an embodiment of the present invention. As shown in Fig. 6, the audio exchange system of the embodiment of the present invention is configured to form the voice mapping structure of each language using a minimum phoneme sequence and to complete semantic conversion between languages through the voice mapping structures.
As shown in Fig. 6, the audio exchange system of the embodiment of the present invention includes:
A serialization device 1100, configured to serialize all minimum phonemes.
An intra-language phoneme mapping forming device 1200, configured to form the text-to-voice mapping data of each language from a subset of all the minimum phonemes.
An inter-language phoneme mapping forming device 1300, configured to form the inter-voice mapping data of the languages according to language semantics.
A language conversion device 1400, configured to perform semantic language conversion using the corresponding inter-voice mapping data and text-to-voice mapping data.
As shown in Fig. 6, in the audio exchange system of the embodiment of the present invention, the serialization device 1100 includes:
A phoneme recognition module 1110, configured to acquire the minimum phonemes of each common language through speech recognition.
A phoneme encoding module 1120, configured to form the minimum phonemes into a unified phoneme sequence.
As shown in Fig. 6, in the audio exchange system of the embodiment of the present invention, the intra-language phoneme mapping forming device 1200 includes:
A first voice coding establishing module 1210, configured to use a part of the phonemes in the unified phoneme sequence to form a first basic voice coding sequence corresponding to the pronunciation of characters or words in the first language.
A first voice mapping establishing module 1220, configured to use the first basic voice coding sequence to form a first voice mapping structure corresponding to the pronunciation of phrases or sentences in the first language.
A second voice coding establishing module 1230, configured to use another part of the phonemes in the unified phoneme sequence to form a second basic voice coding sequence corresponding to the pronunciation of characters or words in the second language.
A second voice mapping establishing module 1240, configured to use the second basic voice coding sequence to form a second voice mapping structure corresponding to the pronunciation of phrases or sentences in the second language.
As shown in Fig. 6, in the audio exchange system of the embodiment of the present invention, the inter-language phoneme mapping forming device 1300 includes:
A primary language structure conversion module 1310, configured to use the same or similar semantic information to form a primary voice conversion structure between the corresponding languages from the respective (i.e. first and second) voice mapping structures of the first language and the second language.
An advanced language structure conversion module 1320, configured to use the grammar rules of the first language and the second language to form an advanced voice conversion structure between the corresponding (i.e. first and second) voice mapping structures.
As shown in Fig. 6, in the audio exchange system of the embodiment of the present invention, the language conversion device 1400 includes:
A phoneme recognition module 1410, configured to obtain the sequential phoneme set of an audio input segment of the first language using speech recognition;
A first basic code recognition module 1420, configured to determine the first basic voice code of the sequential phoneme set using the first basic voice coding sequence of the first language;
A first continuous voice coding module 1430, configured to determine the continuous voice code of the sequential phoneme set using the first voice mapping structure and the first basic voice coding sequence of the first language;
A second basic code recognition module 1440, configured to obtain the second basic voice code of the second language using the primary voice conversion structure between the corresponding languages;
A second continuous voice coding module 1450, configured to obtain the continuous voice code of the second language using the advanced voice conversion structure between the corresponding languages and the second basic voice coding sequence;
A continuous code conversion module 1460, configured to form the voice pronunciation according to the continuous voice code of the second language.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled professionals may use different methods to implement the described functions for each specific application, but such implementation should not be considered to go beyond the scope of the present invention.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
In the audio exchange method of language semantics according to an embodiment of the present invention, part of the minimum phonemes in the unified phoneme sequence is used to form the basic voice coding sequence of the pronunciation of characters or words in a language. A basic voice code in this sequence may additionally be given a graphic symbol corresponding to the character or word with that pronunciation. Rendering the basic voice codes as graphics converts the pronunciation of characters or words formed from phonemes into something that can be recognized visually. This helps connect computer vision recognition with machine speech recognition, so that voice conversion of the same semantics between languages can be based on computer vision recognition.
Fig. 7 is a schematic diagram of the graphic structure of a basic voice coding graphic in the audio exchange method of language semantics according to an embodiment of the present invention. As shown in part a of Fig. 7, the graphic structure includes an H-shaped basic frame 01. The basic frame includes a first adaptation column 10 (bar pattern) and a second adaptation column 20 (bar pattern) that are arranged side by side and vertically parallel, and an adaptation rod 30 (bar pattern) whose two ends are connected to the first adaptation column and the second adaptation column respectively.
A first adaptation position group 11 is provided on the first adaptation column (on the left in the figure), a second adaptation position group 21 is provided on the second adaptation column (on the right in the figure), and a third adaptation position group 31 is provided on the adaptation rod 30. Each end of the adaptation rod 30 is connected to an adaptation position of the adaptation column on the corresponding side. The adaptation position group of each adaptation column includes at least three adaptation positions (five are shown in the drawing).
Adjacent adaptation positions in the same adaptation position group are used to adjust the length of the adaptation column: overlapping adaptation positions changes the length of the corresponding adaptation column. At least two adaptation positions can be overlapped. An end of the adaptation rod 30 may be connected to an overlapped adaptation position of the adaptation column on the corresponding side.
In practical applications, the phoneme codes that form the pronunciation syllable of a character or word, or the syllable code formed from the phonemes, can be expressed through changes in how the first adaptation column, the second adaptation column and the adaptation rod are connected. The fixed positions of the adaptation positions and the variations produced by overlapping them yield enough permutations and combinations to express the coded content of a syllable.
As shown in parts b and c of Fig. 7, an embodiment of the present invention may further include auxiliary adaptation symbols 40 connected to the adaptation positions. The auxiliary adaptation symbols 40 include vector line segments 41, which have a vector direction, and standard symbols 42, which have no vector direction. A vector line segment 41 may be a straight line segment or a minor arc, and a standard symbol 42 may be a circle or a ring. There may be one or more vector line segments and one or more standard symbols.
In practical applications, after the additional vector line segments and standard symbols are connected to the adaptation positions, additional audio features related to the syllable, such as intonation and tone, can be combined with the syllable code, which increases the amount of information carried by the syllable code.
In practical applications, taking Chinese as an example, as shown in parts b and c of Fig. 7, part b shows the graphics corresponding to the voice codes of the characters "rear" and "time", and part c shows the graphics corresponding to the voice codes of the characters "mouth" and "bandit". For each of these characters, the initial of the pronunciation syllable is expressed by the combination of the length variation of the first adaptation column on the left of the basic frame and the vector line segments 41, and the final is expressed by the combination of the length variation of the second adaptation column on the right of the basic frame, the vector line segments 41 and the standard symbols 42. Smoothing the basic frame and the auxiliary adaptation symbols both makes the graphic visually pleasing and ensures the quality of computer vision recognition.
As shown in part d of Fig. 7, by using overlapped adaptation positions and the position at which the adaptation rod 30 connects to the adaptation positions, the basic frame 01 can be converted from an H shape to an n shape. As shown in part e of Fig. 7, the basic frame 01 can be converted from an H shape to a U shape in the same way.
As shown in part d of Fig. 7, the codes of the minimum phonemes are marked directly around the first and second adaptation columns of the basic frame (H-shaped, n-shaped or U-shaped), and the code digits correspond to the adaptation positions of the respective adaptation column. Displaying the codes of the minimum phonemes of a syllable directly gives a visual expression of the chain from phonetic alphabet to phoneme code to voice for a language. The basic voice coding graphics of two languages can therefore be converted by computer vision, and while voice is being converted, computer graphic recognition helps ensure the recognition rate of language recognition.
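The frame conversions described for parts d and e of Fig. 7 can be sketched as a small classifier. This sketch rests on an assumption that the text does not state explicitly: that the frame reads as an n shape when the adaptation rod meets both columns at their top position, and as a U shape when it meets both at their bottom position. The function name and the position numbering (0 at the top) are hypothetical, and `positions` stands for the number of positions that remain on each column after any overlapping.

```python
def frame_shape(rod_left, rod_right, positions=5):
    """Classify the basic frame from where the rod meets each column (0 = top)."""
    for pos in (rod_left, rod_right):
        if not 0 <= pos < positions:
            raise ValueError("rod must connect to an existing adaptation position")
    bottom = positions - 1
    if rod_left == 0 and rod_right == 0:
        return "n"      # assumed: rod across the top of both columns
    if rod_left == bottom and rod_right == bottom:
        return "U"      # assumed: rod across the bottom of both columns
    return "H"          # any other connection leaves the rod between the ends
```

With five positions per column, as in the drawing, `frame_shape(2, 2)` gives the H shape of part a.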
The above is merely a specific embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any change or replacement that a person familiar with the technical field could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention.

Claims (11)

1. An audio exchange method of language semantics, characterized in that a voice mapping structure of each language is formed using a minimum phoneme sequence, and semantic conversion between languages is completed through the voice mapping structures.
2. The audio exchange method of language semantics according to claim 1, characterized in that forming the voice mapping structure of each language using the minimum phoneme sequence comprises:
serializing all minimum phonemes;
forming text-to-voice mapping data of each language from a subset of all the minimum phonemes;
forming inter-voice mapping data of the languages according to language semantics.
3. The audio exchange method of language semantics according to claim 2, characterized in that completing semantic conversion between languages through the voice mapping structures comprises:
performing semantic language conversion using the corresponding inter-voice mapping data and the text-to-voice mapping data.
4. The audio exchange method of language semantics according to claim 2 or 3, characterized in that serializing all minimum phonemes comprises:
acquiring the minimum phonemes of each common language through speech recognition;
forming the minimum phonemes into a unified phoneme sequence.
5. The audio exchange method of language semantics according to claim 4, characterized in that forming the text-to-voice mapping data of each language from a subset of all the minimum phonemes comprises:
using a part of the phonemes in the unified phoneme sequence to form a first basic voice coding sequence corresponding to the pronunciation of characters or words in a first language;
using the first basic voice coding sequence to form a first voice mapping structure corresponding to the pronunciation of phrases or sentences in the first language;
using another part of the phonemes in the unified phoneme sequence to form a second basic voice coding sequence corresponding to the pronunciation of characters or words in a second language;
using the second basic voice coding sequence to form a second voice mapping structure corresponding to the pronunciation of phrases or sentences in the second language.
6. The audio exchange method of language semantics according to claim 5, characterized in that forming the inter-voice mapping data of the languages according to language semantics comprises:
using the same or similar semantic information to form a primary voice conversion structure between the corresponding languages from the voice mapping structures of the first language and the second language;
using the grammar rules of each language to form an advanced voice conversion structure between the voice mapping structures of the first language and the second language.
7. The audio exchange method of language semantics according to claim 3, characterized in that performing semantic language conversion using the corresponding inter-voice mapping data and the text-to-voice mapping data comprises:
obtaining the sequential phoneme set of an audio input segment of a first language using speech recognition;
determining the first basic voice code of the sequential phoneme set using the first basic voice coding sequence of the first language;
determining the continuous voice code of the sequential phoneme set using the first voice mapping structure and the first basic voice coding sequence of the first language;
obtaining the second basic voice code of a second language using the primary voice conversion structure between the corresponding languages;
obtaining the continuous voice code of the second language using the advanced voice conversion structure between the corresponding languages and the second basic voice coding sequence;
forming the voice pronunciation according to the continuous voice code of the second language.
8. The audio exchange method of language semantics according to claim 1, characterized in that the minimum phoneme sequence is indexed using block codes in a hundreds or thousands numerical range.
9. An audio exchange system of language semantics, characterized by comprising:
a memory, configured to store program code of the audio exchange method of language semantics according to any one of claims 1 to 8;
a processor, configured to run the program code.
10. An audio exchange system of language semantics, configured to form a voice mapping structure of each language using a minimum phoneme sequence and to complete semantic conversion between languages through the voice mapping structures.
11. A basic voice coding graphic for the graphical display of language phonemes, characterized by comprising a basic frame, the basic frame comprising a first adaptation column and a second adaptation column arranged side by side and an adaptation rod, wherein an adaptation position group is provided on each of the first adaptation column and the second adaptation column, each adaptation position group comprises several adaptation positions, and each of the two ends of the adaptation rod is connected to one adaptation position of one adaptation column.
CN201810264460.3A 2018-03-28 2018-03-28 The audio exchange method and audio exchange system of language semantic Active CN108597493B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201810264460.3A CN108597493B (en) 2018-03-28 2018-03-28 The audio exchange method and audio exchange system of language semantic
CN201910143693.2A CN109754780B (en) 2018-03-28 2018-03-28 Basic speech coding graphics and audio exchange method
PCT/CN2019/079834 WO2019184942A1 (en) 2018-03-28 2019-03-27 Audio exchanging method and system employing linguistic semantics, and coding graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810264460.3A CN108597493B (en) 2018-03-28 2018-03-28 The audio exchange method and audio exchange system of language semantic

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN201910143693.2A Division CN109754780B (en) 2018-03-28 2018-03-28 Basic speech coding graphics and audio exchange method

Publications (2)

Publication Number Publication Date
CN108597493A true CN108597493A (en) 2018-09-28
CN108597493B CN108597493B (en) 2019-04-12

Family

ID=63624812

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201910143693.2A Active CN109754780B (en) 2018-03-28 2018-03-28 Basic speech coding graphics and audio exchange method
CN201810264460.3A Active CN108597493B (en) 2018-03-28 2018-03-28 The audio exchange method and audio exchange system of language semantic

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201910143693.2A Active CN109754780B (en) 2018-03-28 2018-03-28 Basic speech coding graphics and audio exchange method

Country Status (2)

Country Link
CN (2) CN109754780B (en)
WO (1) WO2019184942A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019184942A1 (en) * 2018-03-28 2019-10-03 孔繁泽 Audio exchanging method and system employing linguistic semantics, and coding graph
CN110991148A (en) * 2019-12-03 2020-04-10 孔繁泽 Information processing method and device, and information interaction method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060229864A1 (en) * 2005-04-07 2006-10-12 Nokia Corporation Method, device, and computer program product for multi-lingual speech recognition
US20070083369A1 (en) * 2005-10-06 2007-04-12 Mcculler Patrick Generating words and names using N-grams of phonemes
CN102063899A (en) * 2010-10-27 2011-05-18 南京邮电大学 Method for voice conversion under unparallel text condition
CN104637482A (en) * 2015-01-19 2015-05-20 孔繁泽 Voice recognition method, device, system and language switching system
US20180061417A1 (en) * 2016-08-30 2018-03-01 Tata Consultancy Services Limited System and method for transcription of spoken words using multilingual mismatched crowd

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8219391B2 (en) * 2005-02-15 2012-07-10 Raytheon Bbn Technologies Corp. Speech analyzing system with speech codebook
CN101131689B (en) * 2006-08-22 2010-08-18 苗玉水 Bidirectional mechanical translation method for sentence pattern conversion between Chinese language and foreign language
KR20080046552A (en) * 2006-11-22 2008-05-27 가구모토 주니치 Print having speech code, method and device for reappearing record, and commerce mode
CN103250148A (en) * 2010-11-04 2013-08-14 莱根达姆普罗维塔有限责任公司 Methods and systems for transcribing or transliterating to an iconophonological orthography
CN109754780B (en) * 2018-03-28 2020-08-04 孔繁泽 Basic speech coding graphics and audio exchange method

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019184942A1 (en) * 2018-03-28 2019-10-03 孔繁泽 Audio exchanging method and system employing linguistic semantics, and coding graph
CN110991148A (en) * 2019-12-03 2020-04-10 孔繁泽 Information processing method and device, and information interaction method and device
CN110991148B (en) * 2019-12-03 2024-02-09 孔繁泽 Information processing method and device, information interaction method and device

Also Published As

Publication number Publication date
CN109754780A (en) 2019-05-14
WO2019184942A1 (en) 2019-10-03
CN109754780B (en) 2020-08-04
CN108597493B (en) 2019-04-12

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1261697

Country of ref document: HK