CN111192570B - Language model training method, system, mobile terminal and storage medium - Google Patents


Info

Publication number: CN111192570B
Application number: CN202010011026.1A
Authority: CN (China)
Prior art keywords: language, module, training, language model, text
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN111192570A
Inventors: 张广学, 肖龙源, 蔡振华, 李稀敏, 刘晓葳
Current Assignee: Xiamen Kuaishangtong Technology Co Ltd
Original Assignee: Xiamen Kuaishangtong Technology Co Ltd
Application filed by Xiamen Kuaishangtong Technology Co Ltd; priority to CN202010011026.1A; published as CN111192570A, granted and published as CN111192570B

Classifications

    • G: PHYSICS
        • G06: COMPUTING; CALCULATING OR COUNTING
            • G06F: ELECTRIC DIGITAL DATA PROCESSING
                • G06F18/00: Pattern recognition
                    • G06F18/20: Analysing
                        • G06F18/24: Classification techniques
                            • G06F18/243: Classification techniques relating to the number of classes
                                • G06F18/24323: Tree-organised classifiers
        • G10: MUSICAL INSTRUMENTS; ACOUSTICS
            • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
                • G10L15/00: Speech recognition
                    • G10L15/02: Feature extraction for speech recognition; Selection of recognition unit
                        • G10L2015/025: Phonemes, fenemes or fenones being the recognition units
                    • G10L15/06: Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
                        • G10L15/063: Training
                            • G10L2015/0631: Creating reference templates; Clustering
                                • G10L2015/0633: Creating reference templates; Clustering using lexical or orthographic knowledge sources

Abstract

The invention provides a language model training method, system, mobile terminal and storage medium. The method comprises the following steps: acquiring a training text and a training vocabulary, classifying the training text to obtain a plurality of language modules, and constructing a language dictionary for each language module according to the training vocabulary; training a module language model for each language module according to its language dictionary, and training a text language model on the training text; acquiring speech to be recognized and performing phoneme recognition to obtain a phoneme string, and matching the phoneme string with the module language models to obtain a phoneme matching result; and performing probability calculation on the phoneme matching result with the text language model, and outputting the sentence corresponding to the maximum probability value. By classifying the training text and constructing the language dictionaries, the invention improves the training efficiency and accuracy of the language model; by training the module language models together with the training text, it allows the language model to be expanded effectively.

Description

Language model training method, system, mobile terminal and storage medium
Technical Field
The invention belongs to the technical field of speech recognition, and particularly relates to a language model training method, a language model training system, a mobile terminal and a storage medium.
Background
Speech recognition has been studied for decades. Speech recognition technology mainly comprises four parts: acoustic model modeling, language model modeling, pronunciation dictionary construction, and decoding, each of which can be an independent research direction. Because speech data are much harder to collect and label than images or text, building a complete speech model training system is time-consuming and difficult work, which has greatly hindered the development of speech recognition technology.
In the existing language model training process, a language model can only be trained on the vocabulary and sentence patterns pre-stored in a database; new vocabulary and sentence patterns cannot be added during training, so both the efficiency and the expansibility of language model training are low.
Disclosure of Invention
The embodiment of the invention aims to provide a language model training method, a language model training system, a mobile terminal and a storage medium, so as to solve the problems of low efficiency and poor expansibility in existing language model training.
The embodiment of the invention is realized in such a way that a language model training method comprises the following steps:
acquiring a training text and training vocabularies, classifying the training text to obtain a plurality of language modules, and constructing a language dictionary corresponding to the language modules according to the training vocabularies;
performing model training on a module language model in the language module according to the language dictionary, and training the training text to obtain a text language model;
acquiring speech to be recognized and performing phoneme recognition to obtain a phoneme string, and matching the phoneme string with the module language models to obtain a phoneme matching result;
and performing probability calculation on the phoneme matching result through the text language model, and outputting a sentence corresponding to the maximum probability value.
Further, the step of performing model training on the module language model in the language module according to the language dictionary comprises:
extracting a language text corresponding to the language module from the training text according to the language dictionary;
training the module language model by adopting a 3-gram training mode according to the language text;
and acquiring the word frequency of the corresponding word in the language text extracted from the language module, and constructing a Huffman tree model according to the word frequency and the training result of the language model.
Further, the step of matching the phoneme string with the module language model comprises:
matching the phoneme string with sample phonemes in each module language model in sequence;
when the matching number between the phoneme string and the sample phonemes in the module language model is larger than or equal to a preset number, outputting all the successfully matched sample phonemes;
and when the matching number is smaller than the preset number, outputting the result of the language module corresponding to the module language model.
Further, the step of performing probability calculation on the phoneme matching result through the text language model comprises:
combining the sample phonemes output by the language modules to obtain combined information, wherein a plurality of phoneme combined strings are stored in the combined information;
and respectively carrying out probability calculation on the phoneme combined strings according to the text language model to obtain a plurality of probability values.
Further, after the step of sequentially matching the phoneme string with the sample phoneme in each of the module language models, the method further includes:
and when the phoneme string is unsuccessfully matched with the module language model, carrying out error marking on the phoneme string according to the module language model.
Further, after the step of matching the phoneme string with the module language model, the method further includes:
when the phoneme string is successfully matched with the module language model, carrying out vocabulary type marking on the phoneme string;
and performing type matching according to the marking result of the vocabulary type mark on the phoneme string to obtain a sentence type, and performing context marking on the speech to be recognized according to the sentence type.
Another object of an embodiment of the present invention is to provide a language model training system, which includes:
the text classification module is used for acquiring a training text and training vocabularies, classifying the training text to obtain a plurality of language modules, and constructing a language dictionary corresponding to the language modules according to the training vocabularies;
the model training module is used for performing model training on a module language model in the language module according to the language dictionary and training the training text to obtain a text language model;
the phoneme matching module is used for acquiring speech to be recognized and performing phoneme recognition to obtain a phoneme string, and matching the phoneme string with the module language models to obtain a phoneme matching result;
and the probability calculation module is used for performing probability calculation on the phoneme matching result through the text language model and outputting the sentence corresponding to the maximum probability value.
Still further, the model training module is further configured to:
extracting a language text corresponding to the language module from the training text according to the language dictionary;
training the module language model by adopting a 3-gram training mode according to the language text;
and acquiring the word frequency of the corresponding word in the language text extracted from the language module, and constructing a Huffman tree model according to the word frequency and the training result of the language model.
Another object of an embodiment of the present invention is to provide a mobile terminal, including a storage device and a processor, where the storage device is used to store a computer program, and the processor runs the computer program to make the mobile terminal execute the above language model training method.
Another object of an embodiment of the present invention is to provide a storage medium, which stores a computer program used in the above-mentioned mobile terminal, wherein the computer program, when executed by a processor, implements the steps of the above-mentioned language model training method.
According to the embodiment of the invention, classifying the training text and constructing the language dictionaries effectively improves the training efficiency and accuracy of the language model; training the module language models in the language modules together with the training text allows the language model to be expanded effectively; and performing speech recognition based on phoneme recognition effectively improves the recognition efficiency of the speech model.
Drawings
FIG. 1 is a flowchart of a language model training method according to a first embodiment of the present invention;
FIG. 2 is a flowchart of a language model training method according to a second embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a language model training system according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of a mobile terminal according to a fourth embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In order to explain the technical means of the present invention, the following description will be given by way of specific examples.
Example one
Referring to fig. 1, it is a flowchart of a language model training method according to a first embodiment of the present invention, including the steps of:
step S10, acquiring a training text and training vocabularies, classifying the training text to obtain a plurality of language modules, and constructing a language dictionary corresponding to the language modules according to the training vocabularies;
the word language in the training text can be set according to requirements, for example, the word language can be Chinese, english, korean or Japanese, and the like, and both the training vocabulary and the training text can be obtained based on a database, and the training vocabulary includes noun vocabulary, verb vocabulary, adjective vocabulary, adverb vocabulary, and the like;
specifically, in the step, the training text may be classified by using a classifier, the classifier is configured to classify text characters in the training text according to different word attributes to correspondingly obtain a plurality of language modules, and the language modules may be a noun module, a verb module, an adjective module, an adverb module, and the like;
preferably, in the step, the design of the language dictionary is constructed, so that the subsequent stable execution of the language model training is effectively ensured, the accuracy of the language model training is improved, and the design of the language dictionary corresponding to the language module is constructed according to the training vocabulary, so that a noun dictionary, a verb dictionary, an adjective dictionary, an adverb dictionary and the like are correspondingly obtained;
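The classification-plus-dictionary step above can be sketched as follows. This is a minimal illustration, not the patented classifier: the `POS_TABLE` lookup and all the example words are hypothetical stand-ins for a real word-attribute classifier.

```python
from collections import defaultdict

# Hypothetical part-of-speech lookup standing in for the patent's classifier;
# a real system would classify words by their attributes with a trained model.
POS_TABLE = {
    "cat": "noun", "dog": "noun",
    "run": "verb", "eat": "verb",
    "big": "adjective", "quickly": "adverb",
}

def build_language_dictionaries(training_words):
    """Group the training vocabulary into per-module language dictionaries
    (noun dictionary, verb dictionary, ...), one per language module."""
    dictionaries = defaultdict(set)
    for word in training_words:
        module = POS_TABLE.get(word)
        if module is not None:
            dictionaries[module].add(word)
    return dict(dictionaries)

dicts = build_language_dictionaries(["cat", "run", "big", "dog", "quickly"])
```

Each resulting dictionary then backs the module language model of its language module.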
step S20, performing model training on a module language model in the language module according to the language dictionary, and training the training text to obtain a text language model;
each language module is provided with a module language model, which identifies the vocabulary input to the corresponding language module so as to judge whether that vocabulary belongs to the language module, thereby determining the type of the vocabulary;
preferably, in this step, the training modes for the module language models and the training text can be selected as required; in this embodiment, a 3-gram training mode is adopted for model training to obtain trained module language models and a trained text language model;
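The 3-gram training mode named in this embodiment can be sketched with plain maximum-likelihood counts. The padding symbols and helper names below are illustrative assumptions, and a production model would add smoothing.

```python
from collections import Counter

def train_trigram_model(sentences):
    """Minimal 3-gram (trigram) training: collect trigram and bigram counts
    from padded sentences for maximum-likelihood estimates P(w3 | w1, w2)."""
    trigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>", "<s>"] + sent + ["</s>"]
        for i in range(len(tokens) - 2):
            trigrams[tuple(tokens[i:i + 3])] += 1
            bigrams[tuple(tokens[i:i + 2])] += 1
    return trigrams, bigrams

def trigram_prob(trigrams, bigrams, w1, w2, w3):
    """Relative-frequency estimate of P(w3 | w1, w2); 0 for unseen contexts."""
    context = bigrams[(w1, w2)]
    return trigrams[(w1, w2, w3)] / context if context else 0.0

tri, bi = train_trigram_model([["the", "cat", "sat"], ["the", "cat", "ran"]])
```

Here "cat" always follows "&lt;s&gt; the", so its conditional probability is 1, while "sat" and "ran" split the context "the cat" evenly.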
step S30, acquiring a voice to be recognized, performing phoneme recognition to obtain a phoneme string, and matching the phoneme string with the module language model to obtain a phoneme matching result;
the method comprises the steps of inputting the speech to be recognized into a preset acoustic model to output a phoneme string, wherein the phoneme string is composed of a plurality of phonemes, and each phoneme corresponds to a character in the speech to be recognized;
preferably, in this step, matching the phoneme string against the module language models determines the attribute of each phoneme in the phoneme string; for example, when a phoneme successfully matches the module language model in the noun module, the word corresponding to that phoneme is judged to be a noun;
specifically, in this step, the phoneme string is matched in turn against the module language models in the noun module, the verb module, the adjective module and the adverb module, so as to judge the attribute of each phoneme in the phoneme string; this effectively determines whether nouns, verbs, adjectives, adverbs or similar vocabulary exist in the speech to be recognized;
for example, when the phoneme string successfully matches the module language models in the noun module, the verb module, the adjective module and the adverb module, it is judged that a noun, a verb, an adjective and an adverb all exist in the speech to be recognized, and the number of words in the speech to be recognized is determined from the number of successful matches between the phoneme string and the corresponding module language models;
step S40, carrying out probability calculation on the phoneme matching result through the text language model, and outputting a sentence corresponding to the maximum probability value;
the text language model performs probability calculation on the phoneme matching result, computing a probability value for each sentence formed from the output results of the language modules, and the recognition result is judged from these probability values;
for example, suppose the output results of the language modules form sentence A, sentence B and sentence C; the text language model scores each of them, yielding probability A, probability B and probability C; if probability A is greater than probability B and probability B is greater than probability C, sentence A is output as the recognition result for the speech to be recognized;
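Selecting the sentence with the maximum probability value can be sketched as follows; the `toy` scorer stands in for the trained text language model, and all function names are assumptions for illustration.

```python
def score_sentence(trigram_logprob, tokens):
    """Sum log-probabilities of a candidate sentence under the text language
    model; trigram_logprob is any callable (w1, w2, w3) -> log P(w3 | w1, w2)."""
    padded = ["<s>", "<s>"] + tokens + ["</s>"]
    return sum(trigram_logprob(padded[i], padded[i + 1], padded[i + 2])
               for i in range(len(padded) - 2))

def pick_best(candidates, trigram_logprob):
    """Return the candidate sentence with the maximum probability value."""
    return max(candidates, key=lambda c: score_sentence(trigram_logprob, c))

# Toy stand-in model: favours trigrams whose context is ("the", "cat").
toy = lambda w1, w2, w3: 0.0 if (w1, w2) == ("the", "cat") else -1.0
best = pick_best([["the", "cat", "sat"], ["a", "dog", "ran"]], toy)
```

Summing log-probabilities rather than multiplying raw probabilities avoids numerical underflow on longer sentences.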
according to this embodiment, classifying the training text and constructing the language dictionaries effectively improves the training efficiency and accuracy of the language model; training the module language models in the language modules together with the training text allows the language model to be expanded effectively; and performing speech recognition based on phoneme recognition effectively improves the recognition efficiency of the speech model.
Example two
Please refer to fig. 2, which is a flowchart illustrating a language model training method according to a second embodiment of the present invention, including the steps of:
step S11, acquiring a training text and training vocabularies, classifying the training text to obtain a plurality of language modules, and constructing a language dictionary corresponding to the language modules according to the training vocabularies;
preferably, in other embodiments, the language module can be further divided into a state word module and the like according to different text attributes in the training text;
specifically, in this step, the language modules and the language dictionaries correspond one to one, so that constructing a dictionary for each module from the training vocabulary yields a noun dictionary, a verb dictionary, an adjective dictionary, an adverb dictionary, and the like;
step S21, extracting a language text corresponding to each language module from the training text according to the language dictionary, and training the module language model in a 3-gram training mode on that language text;
the way the language text is extracted can be set as required; for example, it can be extracted via preset audio, that is, the preset audio is edited and matched against the language dictionary, and the language text is then extracted from the training text according to the matching result;
specifically, in this step, the texts corresponding to the noun module, the verb module, the adjective module and the adverb module are extracted from the training text based on the language dictionaries, and each module's language model is trained on its language text, which effectively improves the training efficiency and the accuracy of model training;
step S31, acquiring the word frequency of each word in the language text extracted for each language module, and constructing a Huffman tree model according to the word frequencies and the language model training result;
based on the language text extracted in step S21, the word frequency of each word in each language module is calculated, and a Huffman tree model is constructed from the word-frequency results; by combining the Huffman tree model with the 3-gram training mode, this embodiment can add new words, new sentences and similar content during language model training, which further improves the expansibility of language model training;
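A Huffman tree built from word frequencies, as this step describes, can be sketched with a standard heap-based construction; the example frequencies are hypothetical, and how the patent combines the tree with the 3-gram training result is not specified here, so only the tree itself is shown.

```python
import heapq
from itertools import count

def build_huffman_tree(word_freqs):
    """Build a Huffman tree over (word, frequency) pairs; returns a nested
    (left, right) tuple with words at the leaves. The counter breaks ties
    so heap comparisons never reach the (unorderable) tree nodes."""
    tie = count()
    heap = [(freq, next(tie), word) for word, freq in word_freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    return heap[0][2]

def huffman_codes(node, prefix=""):
    """Walk the tree to assign bit strings; rarer words get longer codes."""
    if isinstance(node, str):
        return {node: prefix or "0"}
    left, right = node
    codes = huffman_codes(left, prefix + "0")
    codes.update(huffman_codes(right, prefix + "1"))
    return codes

codes = huffman_codes(build_huffman_tree({"the": 10, "cat": 3, "sat": 2}))
```

Because new (word, frequency) pairs only require rebuilding the tree, this structure supports the expansibility the embodiment claims.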
step S41, training the training text to obtain a text language model;
step S51, acquiring speech to be recognized and performing phoneme recognition to obtain a phoneme string, and matching the phoneme string with the sample phonemes in each module language model in sequence;
the method comprises the steps of inputting the speech to be recognized into a preset acoustic model to output a phoneme string, wherein the phoneme string is composed of a plurality of phonemes, and each phoneme corresponds to a character in the speech to be recognized;
specifically, in this step, the phoneme string is matched in turn against the module language models in the noun module, the verb module, the adjective module and the adverb module, so as to judge the attribute of each phoneme in the phoneme string; this effectively determines whether nouns, verbs, adjectives, adverbs or similar vocabulary exist in the speech to be recognized;
for example, when the phoneme string successfully matches the language models of the noun module, the verb module, the adjective module and the adverb module, it is judged that a noun, a verb, an adjective and an adverb all exist in the speech to be recognized, and the number of words in the speech to be recognized is determined from the number of successful matches between the phoneme string and the corresponding module language models;
step S61, when the matching of the phoneme string and the module language model fails, carrying out error marking on the phoneme string according to the module language model;
when a phoneme in the phoneme string matches none of the phonemes in the module language model of a language module, the phoneme string is judged not to match that module language model, and the phoneme string is error-marked with the name of the module language model or of the corresponding language module;
specifically, the error mark may be a character, a number or an image; for example, when characters are used, the mark is derived from the name of the language module: when matching between the noun module and the phoneme string fails, the phoneme string is marked "missing noun", and when matching between the verb module and the phoneme string fails, it is marked "missing verb";
when the phoneme string is error-marked with an image, the preset image corresponding to the name of the language module is looked up and applied to the phoneme string; the preset images can be set as required, and each language module has a different preset image;
step S71, when the matching number between the phoneme string and the sample phonemes in the module language model is greater than or equal to a preset number, outputting all the successfully matched sample phonemes;
in this embodiment, the preset number may be set according to requirements; here it is 2, that is, when the matching number between the phoneme string and the sample phonemes in a module language model is judged to be greater than or equal to 2, all the matched sample phonemes are output as the output result of that language module;
for example, when the phoneme string is successfully matched with the sample phoneme a, the sample phoneme B and the sample phoneme C in the module language model in the noun module, the sample phoneme a, the sample phoneme B and the sample phoneme C are used as the output result of the noun module;
step S81, when the matching number is smaller than the preset number, outputting the result of the language module corresponding to the module language model to obtain a phoneme matching result;
when the matching number is judged to be less than 2 and greater than 0, namely the matching number is 1, directly outputting the output result of the language module;
for example, when the matching between the module language model in the verb module and the phoneme string is successful only once, directly outputting the result of the verb module;
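The output rule of steps S71 and S81 (threshold of 2 in this embodiment) can be sketched as follows; the module-level result token is a hypothetical placeholder, since the patent does not specify its form.

```python
PRESET_NUMBER = 2  # threshold used in this embodiment

def module_output(matched_sample_phonemes, module_name):
    """Decide a language module's output from its matched sample phonemes:
    at or above the threshold, emit every matched sample phoneme; below it
    (but non-empty), emit the module-level result instead."""
    if len(matched_sample_phonemes) >= PRESET_NUMBER:
        return list(matched_sample_phonemes)
    if matched_sample_phonemes:
        return [f"{module_name}-result"]  # hypothetical module-level token
    return []  # no match at all: handled by the error-marking step

noun_out = module_output(["A", "B", "C"], "noun")   # three matches: emit all
verb_out = module_output(["C"], "verb")             # single match: module result
```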
step S91, combining the sample phonemes output by the language modules to obtain combined information;
in the step, by designing the combination of the sample phonemes output by each language module, the diversity of output results is effectively improved;
for example, suppose matching the noun module against the phoneme string outputs sample phoneme A and sample phoneme B; matching the verb module outputs sample phoneme C; matching the adjective module outputs sample phoneme D and sample phoneme E; and the adverb module does not match the phoneme string; the combined information obtained by combination then includes:
first phoneme combination string: sample phoneme A, sample phoneme C and sample phoneme D;
second phoneme combination string: sample phoneme B, sample phoneme C and sample phoneme D;
third phoneme combination string: sample phoneme A, sample phoneme C and sample phoneme E;
fourth phoneme combination string: sample phoneme B, sample phoneme C and sample phoneme E;
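The cross-combination of module outputs illustrated above can be sketched with a Cartesian product; skipping non-matching modules follows the adverb-module example, and the function name is an assumption.

```python
from itertools import product

def combine_module_outputs(module_outputs):
    """Cross-combine the sample phonemes emitted by each language module
    (skipping modules that matched nothing) into phoneme combination strings."""
    non_empty = [out for out in module_outputs if out]
    return [list(combo) for combo in product(*non_empty)]

combos = combine_module_outputs([
    ["A", "B"],   # noun module output
    ["C"],        # verb module output
    ["D", "E"],   # adjective module output
    [],           # adverb module: no match, skipped
])
```

This yields the four combination strings of the example (2 x 1 x 2 = 4 candidates), each of which is then scored by the text language model.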
step S101, respectively carrying out probability calculation on the phoneme combination strings according to the text language model to obtain a plurality of probability values, and outputting sentences corresponding to the maximum probability values;
preferably, in this embodiment, when the step of matching between the phoneme string and the module language model is completed, the method further includes:
when the phoneme string is successfully matched with the module language model, carrying out vocabulary type marking on the phoneme string;
performing type matching according to a marking result of the vocabulary type mark on the phoneme string to obtain a sentence type, and performing context marking on the voice to be recognized according to the sentence type;
preferably, by performing type matching on the marking result of the vocabulary type marks on the phoneme string, the phoneme string and the corresponding speech to be recognized can be effectively marked with a sentence type, such as a declarative-sentence mark, a question-sentence mark or a sentence structure mark;
specifically, the sentence structures may be set as required, for example a subject + predicate structure or a subject + predicate + object structure, so that each language module can analyze whether the speech to be recognized lacks the corresponding sentence component; for example, the verb module can be used to analyze whether the speech to be recognized has a subject and a predicate.
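The sentence-type matching described above can be sketched as a lookup from vocabulary-type marks to sentence structures; the `STRUCTURES` table and the "missing predicate" label are hypothetical illustrations of the marking scheme.

```python
# Hypothetical sentence-type matcher: maps the sequence of vocabulary-type
# marks on a phoneme string to a sentence structure, flagging missing parts.
STRUCTURES = {
    ("noun", "verb"): "subject + predicate",
    ("noun", "verb", "noun"): "subject + predicate + object",
}

def mark_sentence(type_marks):
    """Return the matched sentence structure, or a mark for what is missing."""
    structure = STRUCTURES.get(tuple(type_marks))
    if structure is not None:
        return structure
    if "verb" not in type_marks:
        return "missing predicate"  # no verb-module match in the marks
    return "unknown structure"

label = mark_sentence(["noun", "verb"])
```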
In this embodiment, classifying the training text and constructing the language dictionaries effectively improves the training efficiency and accuracy of the language model; training the module language models in the language modules together with the training text allows the language model to be expanded effectively; and performing speech recognition based on phoneme recognition effectively improves the recognition efficiency of the speech model.
EXAMPLE III
Please refer to fig. 3, which is a schematic structural diagram of a language model training system 100 according to a third embodiment of the present invention, including: a text classification module 10, a model training module 11, a phoneme matching module 12 and a probability calculation module 13, wherein:
the text classification module 10 is configured to obtain a training text and a training vocabulary, classify the training text to obtain a plurality of language modules, and construct a language dictionary corresponding to the language modules according to the training vocabulary;
and the model training module 11 is configured to perform model training on a module language model in the language module according to the language dictionary, and train the training text to obtain a text language model.
Wherein the model training module 11 is further configured to: extract a language text corresponding to each language module from the training text according to the language dictionary; train the module language model in a 3-gram training mode on the language text; and acquire the word frequency of each word in the language text extracted for the language module, and construct a Huffman tree model according to the word frequencies and the language model training result.
And the phoneme matching module 12 is configured to obtain a speech to be recognized, perform phoneme recognition to obtain a phoneme string, and match the phoneme string with the module language model to obtain a phoneme matching result.
Wherein the phoneme matching module 12 is further configured to: matching the phoneme string with sample phonemes in each module language model in sequence; when the matching number between the phoneme string and the sample phonemes in the module language model is larger than or equal to a preset number, outputting all the successfully matched sample phonemes; and when the matching number is smaller than the preset number, outputting the result of the language module corresponding to the module language model.
And a probability calculation module 13, configured to perform probability calculation on the phoneme matching result through the text language model, and output a sentence corresponding to the maximum probability value.
Wherein, the probability calculation module 13 is further configured to: combining the sample phonemes output by the language modules to obtain combined information, wherein a plurality of phoneme combined strings are stored in the combined information; and respectively carrying out probability calculation on the phoneme combination strings according to the text language model to obtain a plurality of probability values.
Preferably, the language model training system 100 further comprises:
And the type marking module 14 is configured to perform error marking on the phoneme string according to the module language model when the phoneme string fails to match the module language model.
Furthermore, the type marking module 14 is further configured to: when the phoneme string is successfully matched with the module language model, perform vocabulary type marking on the phoneme string; and perform type matching according to the vocabulary type marking result to obtain a sentence type, and perform context marking on the speech to be recognized according to the sentence type.
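One way to read the vocabulary-type-to-sentence-type step is a two-stage table lookup, sketched below. Both lookup tables (`vocab_types`, `sentence_patterns`) and the fallback labels are hypothetical; the patent does not enumerate the vocabulary or sentence types it uses.

```python
def mark_sentence_type(tokens, vocab_types, sentence_patterns):
    """Label each token with a vocabulary type, then map the resulting
    type sequence to a sentence type used for context marking."""
    type_seq = tuple(vocab_types.get(tok, "unknown") for tok in tokens)
    return sentence_patterns.get(type_seq, "unclassified")
```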
According to the embodiment, classifying the training text and constructing the corresponding language dictionaries effectively improves the training efficiency and accuracy of the language model; performing model training on the module language models within the language modules and training the text language model on the training text allows the language model to be expanded effectively; and performing speech recognition on the basis of phoneme recognition effectively improves the recognition efficiency of the speech model.
Example four
Referring to fig. 4, a mobile terminal 101 according to a fourth embodiment of the present invention includes a storage device and a processor, where the storage device is used to store a computer program, and the processor runs the computer program to make the mobile terminal 101 execute the above language model training method.
The present embodiment also provides a storage medium storing the computer program used in the above-mentioned mobile terminal 101; the computer program, when executed, performs the following steps:
acquiring a training text and training vocabularies, classifying the training text to obtain a plurality of language modules, and constructing a language dictionary corresponding to the language modules according to the training vocabularies;
performing model training on a module language model in the language module according to the language dictionary, and training the training text to obtain a text language model;
acquiring a voice to be recognized, performing phoneme recognition to obtain a phoneme string, and matching the phoneme string with the module language model to obtain a phoneme matching result;
and performing probability calculation on the phoneme matching result through the text language model, and outputting the sentence corresponding to the maximum probability value. The storage medium may be, for example, a ROM/RAM, a magnetic disk, or an optical disc.
It should be clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional units and modules is only used for illustration, and in practical applications, the above function distribution may be performed by different functional units or modules as needed, that is, the internal structure of the storage device is divided into different functional units or modules to perform all or part of the above described functions. Each functional unit and module in the embodiments may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit, and the integrated unit may be implemented in a form of hardware, or may be implemented in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application.
Those skilled in the art will appreciate that the component structure shown in FIG. 3 does not limit the language model training system of the present invention, which may include more or fewer components than shown, combine certain components, or arrange the components differently; likewise, the language model training method of FIGS. 1-2 may be implemented with more or fewer components than shown in FIG. 3, with certain components combined, or with a different arrangement of components. The units and modules referred to herein are series of computer programs that can be executed by a processor (not shown) of the target language model training system to perform specific functions; all of these computer programs can be stored in a storage device (not shown) of the target language model training system.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (9)

1. A method for language model training, the method comprising:
acquiring a training text and training vocabularies, classifying the training text to obtain a plurality of language modules, and constructing a language dictionary corresponding to the language modules according to the training vocabularies;
performing model training on a module language model in the language module according to the language dictionary, and training the training text to obtain a text language model;
acquiring a voice to be recognized for phoneme recognition to obtain a phoneme string, and matching the phoneme string with the module language model to obtain a phoneme matching result;
performing probability calculation on the phoneme matching result through the text language model, and outputting a sentence corresponding to the maximum probability value;
the step of matching the phoneme string with the module language model comprises:
matching the phoneme string with sample phonemes in each module language model in sequence;
when the matching number between the phoneme string and the sample phonemes in the module language model is larger than or equal to a preset number, outputting all the successfully matched sample phonemes;
and when the matching number is smaller than the preset number, outputting the result of the language module corresponding to the module language model.
2. The method of claim 1, wherein the step of model training the module language model in the language module according to the language dictionary comprises:
extracting a language text corresponding to the language module from the training text according to the language dictionary;
training the module language model by adopting a 3-gram training mode according to the language text;
and acquiring the word frequency of the corresponding word in the language text extracted from the language module, and constructing a Huffman tree model according to the word frequency and the training result of the language model.
3. The language model training method as claimed in claim 1, wherein the step of performing probability calculation on the phoneme matching result by the text language model comprises:
combining the sample phonemes output by the language modules to obtain combined information, wherein a plurality of phoneme combined strings are stored in the combined information;
and respectively carrying out probability calculation on the phoneme combination strings according to the text language model to obtain a plurality of probability values.
4. The method of language model training as recited in claim 1, wherein after the step of sequentially matching the phone string to the sample phones in each of the modular language models, the method further comprises:
and when the phoneme string is unsuccessfully matched with the module language model, carrying out error marking on the phoneme string according to the module language model.
5. The method of language model training as recited in claim 1, wherein after the step of matching the phoneme string to the module language model, the method further comprises:
when the phoneme string is successfully matched with the module language model, carrying out vocabulary type marking on the phoneme string;
and performing type matching according to the marking result of the vocabulary type mark on the phoneme string to obtain a sentence type, and performing context marking on the speech to be recognized according to the sentence type.
6. A language model training system, the system comprising:
the text classification module is used for acquiring a training text and training vocabularies, classifying the training text to obtain a plurality of language modules, and constructing a language dictionary corresponding to the language modules according to the training vocabularies;
the model training module is used for carrying out model training on a module language model in the language module according to the language dictionary and training the training text to obtain a text language model;
the phoneme matching module is used for acquiring the speech to be recognized for phoneme recognition to obtain a phoneme string and matching the phoneme string with the module language model to obtain a phoneme matching result; the step of matching the phoneme string with the module language model comprises: matching the phoneme string with sample phonemes in each module language model in sequence; when the matching number between the phoneme string and the sample phonemes in the module language model is larger than or equal to a preset number, outputting all the successfully matched sample phonemes; when the matching number is smaller than the preset number, outputting the result of the language module corresponding to the module language model;
and the probability calculation module is used for performing probability calculation on the phoneme matching result through the text language model and outputting a sentence corresponding to the maximum probability value.
7. The language model training system of claim 6, wherein the model training module is further to:
extracting a language text corresponding to the language module from the training text according to the language dictionary;
training the module language model by adopting a 3-gram training mode according to the language text;
and acquiring the word frequency of the corresponding word in the language text extracted from the language module, and constructing a Huffman tree model according to the word frequency and the training result of the language model.
8. A mobile terminal, characterized in that it comprises a storage device for storing a computer program and a processor for executing the computer program to make the mobile terminal execute the language model training method according to any one of claims 1 to 5.
9. A storage medium, characterized in that it stores a computer program for use in a mobile terminal according to claim 8, which computer program, when being executed by a processor, carries out the steps of the language model training method according to any one of claims 1 to 5.
CN202010011026.1A 2020-01-06 2020-01-06 Language model training method, system, mobile terminal and storage medium Active CN111192570B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010011026.1A CN111192570B (en) 2020-01-06 2020-01-06 Language model training method, system, mobile terminal and storage medium


Publications (2)

Publication Number Publication Date
CN111192570A CN111192570A (en) 2020-05-22
CN111192570B true CN111192570B (en) 2022-12-06

Family

ID=70710630

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010011026.1A Active CN111192570B (en) 2020-01-06 2020-01-06 Language model training method, system, mobile terminal and storage medium

Country Status (1)

Country Link
CN (1) CN111192570B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111782779B (en) * 2020-05-28 2022-08-23 厦门快商通科技股份有限公司 Voice question-answering method, system, mobile terminal and storage medium
CN111933116B (en) * 2020-06-22 2023-02-14 厦门快商通科技股份有限公司 Speech recognition model training method, system, mobile terminal and storage medium
CN112489626B (en) * 2020-11-18 2024-01-16 华为技术有限公司 Information identification method, device and storage medium
CN113870848B (en) * 2021-12-02 2022-04-26 深圳市友杰智新科技有限公司 Method and device for constructing voice modeling unit and computer equipment
CN116108466B (en) * 2022-12-28 2023-10-13 南京邮电大学盐城大数据研究院有限公司 Encryption method based on statistical language model

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US8392190B2 (en) * 2008-12-01 2013-03-05 Educational Testing Service Systems and methods for assessment of non-native spontaneous speech
EP2851896A1 (en) * 2013-09-19 2015-03-25 Maluuba Inc. Speech recognition using phoneme matching
CN105869634B (en) * 2016-03-31 2019-11-19 重庆大学 It is a kind of based on field band feedback speech recognition after text error correction method and system
US20180137109A1 (en) * 2016-11-11 2018-05-17 The Charles Stark Draper Laboratory, Inc. Methodology for automatic multilingual speech recognition
CN107665705B (en) * 2017-09-20 2020-04-21 平安科技(深圳)有限公司 Voice keyword recognition method, device, equipment and computer readable storage medium
CN109346064B (en) * 2018-12-13 2021-07-27 思必驰科技股份有限公司 Training method and system for end-to-end speech recognition model

Also Published As

Publication number Publication date
CN111192570A (en) 2020-05-22

Similar Documents

Publication Publication Date Title
CN111192570B (en) Language model training method, system, mobile terminal and storage medium
CN108847241B (en) Method for recognizing conference voice as text, electronic device and storage medium
US7636657B2 (en) Method and apparatus for automatic grammar generation from data entries
US8185376B2 (en) Identifying language origin of words
JP6909832B2 (en) Methods, devices, equipment and media for recognizing important words in audio
Qiu et al. Fudannlp: A toolkit for chinese natural language processing
KR102191425B1 (en) Apparatus and method for learning foreign language based on interactive character
US7996209B2 (en) Method and system of generating and detecting confusing phones of pronunciation
US20110307252A1 (en) Using Utterance Classification in Telephony and Speech Recognition Applications
CN112417102B (en) Voice query method, device, server and readable storage medium
US6763331B2 (en) Sentence recognition apparatus, sentence recognition method, program, and medium
JP2005084681A (en) Method and system for semantic language modeling and reliability measurement
CN110797010A (en) Question-answer scoring method, device, equipment and storage medium based on artificial intelligence
Hakkinen et al. N-gram and decision tree based language identification for written words
CN111401012B (en) Text error correction method, electronic device and computer readable storage medium
KR20170090127A (en) Apparatus for comprehending speech
CN110826301B (en) Punctuation mark adding method, punctuation mark adding system, mobile terminal and storage medium
CN112562640A (en) Multi-language speech recognition method, device, system and computer readable storage medium
CN107704450B (en) Natural language identification device and natural language identification method
CN111933116A (en) Speech recognition model training method, system, mobile terminal and storage medium
CN116483314A (en) Automatic intelligent activity diagram generation method
CN111782779B (en) Voice question-answering method, system, mobile terminal and storage medium
Lee et al. Grammatical error detection for corrective feedback provision in oral conversations
JP2006031278A (en) Voice retrieval system, method, and program
CN110750967A (en) Pronunciation labeling method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant