CN110750967A

CN110750967A - Pronunciation labeling method and device, computer equipment and storage medium

Info

Publication number: CN110750967A
Application number: CN201911001943.5A
Authority: CN
Inventors: 郑杰文; 赖金南
Original assignee: Guangzhou Li Zhi Network Technology Co Ltd
Current assignee: Guangzhou Li Zhi Network Technology Co Ltd
Priority date: 2019-10-21
Filing date: 2019-10-21
Publication date: 2020-02-04
Anticipated expiration: 2039-10-21
Also published as: CN110750967B

Abstract

The invention relates to a pronunciation labeling method and device, computer equipment and a storage medium. The method comprises the following steps: determining a sentence; if the sentence contains a polyphone, performing word segmentation on the sentence to obtain a plurality of words; determining the part of speech of a target word in the sentence, wherein the target word is the word in which the polyphone is located; if the polyphone has a plurality of pronunciations under the part of speech, determining a representative word and co-occurrence words of the polyphone under each pronunciation, wherein the co-occurrence words co-occur with the representative word; and, when labeling the sentence with pronunciations, determining a target pronunciation from the pronunciations according to the target word, the representative word and the co-occurrence words, and labeling the polyphone with the target pronunciation. The operations among the target word, the representative word and the co-occurrence words are simple vector operations that do not require a complex neural network, which reduces resource consumption and computation time.

Description

Pronunciation labeling method and device, computer equipment and storage medium

Technical Field

The embodiment of the invention relates to a natural language processing technology, in particular to a pronunciation labeling method, a pronunciation labeling device, computer equipment and a storage medium.

Background

In Natural Language Processing (NLP), character-to-pronunciation conversion is a common function in Chinese speech synthesis, language education, office document processing and the like. Most characters have a single definite pronunciation, but many characters have two or more pronunciations and are called polyphones. One of the key difficulties of character-to-pronunciation conversion is the discrimination and disambiguation of these polyphones.

It is generally recognized that the pronunciation of polyphonic characters is often closely related to specific contextual information, semantics, and language habits, and that deep learning is often used to label the pronunciation.

For example, the features of a polyphone are input into an LSTM (Long Short Term Memory network) to obtain context information of the polyphone, and the context information is input into a deep neural network corresponding to an initial consonant, a final sound and a tone to obtain a probability of the initial consonant, a probability of the final sound and a probability of the tone corresponding to the pronunciation of the polyphone, so as to combine the probabilities of the pronunciations and select the pronunciations according to the probabilities of the pronunciations.

However, deep learning has high computational complexity, consumes considerable resources, and takes a long time to compute.

Disclosure of Invention

The embodiment of the invention provides a pronunciation labeling method, a pronunciation labeling device, computer equipment and a storage medium, which aim to solve the problem of higher operation complexity of labeling polyphone pronunciations by using deep learning.

In a first aspect, an embodiment of the present invention provides a method for labeling pronunciation, including:

determining a sentence;

if the sentence contains polyphone characters, performing word segmentation processing on the sentence to obtain a plurality of words;

determining the part of speech of a target word in the sentence, wherein the target word is the word where the polyphone is located;

if the polyphone has a plurality of pronunciations under the part of speech, determining a representative word and a co-occurrence word of the polyphone under the pronunciations, wherein the co-occurrence word co-occurs with the representative word;

and when the sentence is marked with pronunciation, determining target pronunciation from the pronunciation according to the target word, the representative word and the co-occurrence word, and marking the target pronunciation on the polyphonic characters.

Optionally, the determining a target pronunciation from the pronunciations according to the target word, the representative word and the co-occurrence word includes:

calculating a total score of the pronunciation based on the target word, the representative word and the co-occurrence word;

a target pronunciation is determined from the pronunciations based on the total score.

Optionally, the calculating a total score of the pronunciation based on the target word, the representative word and the co-occurrence word includes:

determining a word vector of the target word, a word vector of the representative word and a word vector of the co-occurrence word;

performing exponential operation on the word vector of the target word and the word vector of the representative word to obtain a first sub-score;

performing exponential operation on the word vector of the target word and the word vector of the co-occurrence word to obtain a second sub-score;

and calculating the sum of the first sub-score and all the second sub-scores as a total score.

Optionally, the method further comprises:

determining the part of speech of a polyphone and the pronunciation of the polyphone under the part of speech;

if the polyphonic characters have a plurality of pronunciations under the part of speech, determining a representative word of the polyphonic characters under the pronunciations;

traversing preset linguistic data to search co-occurrence words co-occurring with the representative words;

generating a mapping relation among the polyphones, the part of speech, the pronunciation, the representative words and the co-occurrence words;

and converting the representative word and the co-occurrence word into a word vector.

Optionally, traversing the preset corpus to find a co-occurrence word co-occurring with the representative word includes:

in the corpus, taking the representative word as the midpoint of a statistical window, and determining the words that co-occur with the representative word in the statistical window;

counting the number of times of the co-occurrence of the representative word and the word;

selecting a co-occurring word of the representative word from the words based on the number of co-occurrences.

Optionally, the traversing the preset corpus to find a co-occurrence word co-occurring with the representative word further includes:

determining the part of speech of the co-occurrence word;

if the part of speech is a noun or a verb, determining that the co-occurrence word is valid;

and if the part of speech is neither a noun nor a verb, determining that the co-occurrence word is invalid.

Optionally, after the determining the part of speech of the target word in the sentence, further comprising:

and if the polyphone has only one pronunciation under the part of speech, labeling the polyphone with that pronunciation when labeling the sentence with pronunciations.

Optionally, the method further comprises:

and if the polyphone has only one pronunciation under the part of speech, generating a mapping relation among the polyphone, the part of speech and the pronunciation.

Optionally, after the determining the sentence, further comprising:

and if the sentence does not contain polyphone characters, marking pronunciation for the sentence.

In a second aspect, an embodiment of the present invention further provides a pronunciation labeling device, including:

a sentence determination module for determining a sentence;

the word segmentation module is used for carrying out word segmentation on the sentence to obtain a plurality of words if the sentence contains polyphone characters;

a part-of-speech determining module, configured to determine a part-of-speech of a target word in the sentence, where the target word is a word in which the polyphone is located;

a word determining module, configured to determine a representative word and a co-occurrence word of the polyphonic character in the pronunciation if the polyphonic character has multiple pronunciations in the part of speech, where the co-occurrence word and the representative word co-occur;

and the multi-pronunciation labeling module is used for determining a target pronunciation from the pronunciations according to the target word, the representative word and the co-occurrence word when labeling the pronunciations of the sentences, and labeling the target pronunciation for the multi-tone characters.

Optionally, the multi-pronunciation labeling module comprises:

a total score calculating submodule for calculating a total score of the pronunciation based on the target word, the representative word and the co-occurrence word;

a target pronunciation determination sub-module for determining a target pronunciation from the pronunciations based on the total score.

Optionally, the total score calculating sub-module includes:

a word vector determining unit, configured to determine a word vector of the target word, a word vector of the representative word, and a word vector of the co-occurrence word;

the first sub-score calculating unit is used for performing exponential operation on the word vector of the target word and the word vector of the representative word to obtain a first sub-score;

the second sub-score calculating unit is used for performing exponential operation on the word vector of the target word and the word vector of the co-occurrence word to obtain a second sub-score;

and the summation unit is used for calculating the sum of the first sub-score and all the second sub-scores to serve as a total score.

Optionally, the device further comprises:

the word parameter determining module is used for determining the part of speech of the polyphone and the pronunciation of the polyphone under the part of speech;

a representative word determining module, configured to determine a representative word of the polyphonic character in the pronunciations if the polyphonic character has multiple pronunciations in the part of speech;

the co-occurrence word searching module is used for traversing preset linguistic data to search co-occurrence words co-occurring with the representative words;

a multi-mapping relation generation module, configured to generate mapping relations among the polyphones, the part of speech, the pronunciation, the representative word, and the co-occurrence word;

and the word vector conversion module is used for converting the representative word and the co-occurrence word into a word vector.

Optionally, the co-occurrence word searching module includes:

a statistical window traversal submodule, configured to determine, in the corpus, a word that is co-occurring with the word in the statistical window by using the representative word as a midpoint of the statistical window;

the co-occurrence frequency counting submodule is used for counting the co-occurrence frequency of the representative word and the word;

and the co-occurrence number selection sub-module is used for selecting the co-occurrence words of the representative words from the words based on the co-occurrence number.

Optionally, the co-occurrence word searching module further includes:

a co-occurrence part-of-speech determination submodule for determining the part-of-speech of the co-occurrence word;

the validity determination submodule is used for determining that the co-occurrence word is valid if the part of speech is a noun or a verb;

and the invalidity determination submodule is used for determining that the co-occurrence word is invalid if the part of speech is neither a noun nor a verb.

Optionally, the device further comprises:

and the single-pronunciation marking module is used for marking pronunciation for the word when marking pronunciation for the sentence if the polyphonic word has one pronunciation under the part of speech.

Optionally, the device further comprises:

the word information determining module is used for determining the part of speech of the polyphone and the pronunciation of the polyphone under the part of speech;

and the single mapping relation generation module is used for generating the mapping relation among the polyphone, the part of speech and the pronunciation if the polyphone has only one pronunciation under the part of speech.

Optionally, the device further comprises:

and the sentence marking module is used for marking pronunciation for the sentence if the sentence does not contain polyphone characters.

In a third aspect, an embodiment of the present invention further provides a computer device, where the computer device includes:

one or more processors;

a memory for storing one or more programs;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the pronunciation labeling method according to any one of the first aspect.

In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the pronunciation labeling method according to any one of the first aspect.

In this embodiment, for a sentence to be labeled with pronunciations, if the sentence contains a polyphone, word segmentation is performed on the sentence to obtain a plurality of words, and the part of speech of a target word in the sentence is determined, the target word being the word in which the polyphone is located. If the polyphone has a plurality of pronunciations under the part of speech, a representative word and co-occurrence words of the polyphone under each pronunciation are determined, the co-occurrence words co-occurring with the representative word. When the sentence is labeled with pronunciations, a target pronunciation is determined from the pronunciations according to the target word, the representative word and the co-occurrence words, and the polyphone is labeled with the target pronunciation. Labeling with reference to the part of speech, and to the representative word and co-occurrence words as context information, eliminates pronunciation ambiguity and helps ensure that the pronunciation is correct. Moreover, the operations among the target word, the representative word and the co-occurrence words are simple vector operations that do not require a complex neural network, which reduces resource consumption and computation time.

Drawings

Fig. 1 is a flowchart of a pronunciation labeling method according to a first embodiment of the present invention;

Fig. 2 is a flowchart of a pronunciation labeling method according to a second embodiment of the present invention;

Fig. 3 is a flowchart of a pronunciation labeling method according to a third embodiment of the present invention;

Fig. 4 is a flowchart of a pronunciation labeling method according to a fourth embodiment of the present invention;

Fig. 5 is a schematic structural diagram of a pronunciation labeling device according to a fifth embodiment of the present invention;

Fig. 6 is a schematic structural diagram of a computer device according to a sixth embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.

Example one

Fig. 1 is a flowchart of a pronunciation labeling method according to a first embodiment of the present invention. This embodiment is applicable to establishing a mapping relationship among a polyphone, a part of speech, a pronunciation, a representative word and co-occurrence words. The method can be executed by a pronunciation labeling device, which can be implemented by software and/or hardware and can be configured in a computer device, for example, a personal computer, a mobile terminal (e.g., a mobile phone, a tablet), a server, a workstation, and the like. The method specifically includes the following steps:

step 101, determining the part of speech of a polyphone and the pronunciation of the polyphone under the part of speech.

In a specific implementation, polyphones may be collected from a dictionary, such as the Xinhua Dictionary.

Wherein, the polyphone is a character with two or more pronunciations.

The pronunciations of the collected polyphones under different parts of speech are recorded to form a data set in which each entry has the format "polyphone, part of speech, pronunciation".

For example, "地 (ground), noun, di4" and "地, particle, de".

Step 102, if the polyphone has a plurality of pronunciations under the part of speech, determining a representative word of the polyphone under each of the pronunciations.

If, in the data set obtained in step 101, a polyphone has a plurality of pronunciations (i.e., two or more pronunciations) under the same part of speech, one word is selected for each pronunciation as its representative word.

For example, when the polyphone "曲" is a noun, its pronunciation may be "qu1" (i.e., "qū"), in which case the representative word is "paper clip", or "qu3" (i.e., "qǔ"), in which case the representative word is "music score".
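Steps 101 and 102 can be sketched as grouping the data set by (polyphone, part of speech) and flagging the groups that still have more than one pronunciation. This is a minimal illustration only: the record list, the function names and the use of the characters 地 and 曲 for the translated examples are assumptions, not part of the original disclosure.

```python
from collections import defaultdict

# Hypothetical records in the "polyphone, part of speech, pronunciation" format.
RECORDS = [
    ("地", "noun", "di4"),
    ("地", "particle", "de"),
    ("曲", "noun", "qu1"),
    ("曲", "noun", "qu3"),
]

def group_pronunciations(records):
    """Map (polyphone, part of speech) -> list of pronunciations, in input order."""
    grouped = defaultdict(list)
    for char, pos, pron in records:
        if pron not in grouped[(char, pos)]:
            grouped[(char, pos)].append(pron)
    return dict(grouped)

def needs_representative_words(grouped):
    """Return the keys that have two or more pronunciations under one part of speech."""
    return [key for key, prons in grouped.items() if len(prons) >= 2]

grouped = group_pronunciations(RECORDS)
ambiguous = needs_representative_words(grouped)
```

Only the groups returned by `needs_representative_words` would need representative words and co-occurrence words; the remaining groups are already unambiguous once the part of speech is known.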

Step 103, traversing a preset corpus to search for co-occurrence words that co-occur with the representative words.

In practice, a corpus of about 10 million characters or more can be collected in advance, and the corpus is traversed to find the words that co-occur with the representative words (i.e., co-occurrence words), where a co-occurrence word is a context word that appears in a contextual relationship with the representative word.

In a preferred embodiment of the present invention, step 103 may comprise the following sub-steps:

Sub-step S11, in the corpus, taking the representative word as the midpoint of a statistical window, and determining the words that co-occur with the representative word in the statistical window.

In this embodiment, the statistical window is a window that slides on the words, and for the corpus, the length of the statistical window may be set to be between 5 and 10.

For example, for the sentence "I / love / Guangzhou / thin waist / but / I / not yet / go", when the length of the statistical window is 3 (i.e., one word on each side of the center word), every word that appears in the window of a center word (representative word) is a co-occurrence word of that center word.

Sub-step S12, counting the number of times the representative word and each word co-occur.

In this embodiment, the number of co-occurrences between each representative word and each word is counted, and a co-occurrence matrix can be generated: each row corresponds to a center word (representative word), each column corresponds to a word, and each cell holds the number of times that word co-occurs with the center word in the statistical window.

For example, for a sentence: i/love/guangzhou/thin waist/but/i/not yet/go, when the length of the statistical window is 3 (i.e., the left and right lengths are each 1), the co-occurrence matrix is as follows:

Co-occurrence count	I	love	Guangzhou	thin waist	but	not yet	go
I	0	1	0	0	1	1	0
love	1	0	1	0	0	0	0
Guangzhou	0	1	0	1	0	0	0
thin waist	0	0	1	0	1	0	0
but	1	0	0	1	0	0	0
not yet	1	0	0	0	0	0	1
go	0	0	0	0	0	1	0
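The window counting of sub-steps S11 and S12 can be sketched as follows. The function name and the `radius` parameter (the number of words on each side of the center word, so a window of length 3 has radius 1) are assumptions introduced for illustration; the token list is the example sentence above.

```python
from collections import Counter

def cooccurrence_counts(tokens, radius=1):
    """Count (center word, context word) pairs within `radius` words of each center."""
    counts = Counter()
    for i, center in enumerate(tokens):
        lo = max(0, i - radius)
        hi = min(len(tokens), i + radius + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(center, tokens[j])] += 1
    return counts

tokens = ["I", "love", "Guangzhou", "thin waist", "but", "I", "not yet", "go"]
counts = cooccurrence_counts(tokens, radius=1)

# Vocabulary in order of first appearance, used for both rows and columns.
vocab = list(dict.fromkeys(tokens))
matrix = [[counts[(row, col)] for col in vocab] for row in vocab]
```

With a window of length 3 this reproduces the symmetric matrix shown in the table; a larger `radius` would correspond to the window lengths of 5 to 10 mentioned for a real corpus.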

Sub-step S13, selecting the co-occurrence words of the representative word from the words based on the number of co-occurrences.

For each representative word, its co-occurrence words may be selected from the candidate words based on the number of times each candidate co-occurs with the representative word.

In general, n words with the highest number of co-occurrences may be selected as co-occurring words of the representative word, where n is a preset constant, such as 2.

In a preferred embodiment of the present invention, step 103 may further comprise the following sub-steps:

Sub-step S14, determining the part of speech of the co-occurrence word.

Sub-step S15, if the part of speech is a noun or a verb, determining that the co-occurrence word is valid.

Sub-step S16, if the part of speech is neither a noun nor a verb, determining that the co-occurrence word is invalid.

For a selected co-occurrence word, if its part of speech is a noun or a verb, the co-occurrence word is determined to be valid. If its part of speech is neither a noun nor a verb, the co-occurrence word is determined to be invalid, and another word is selected as a co-occurrence word of the representative word, again based on the number of co-occurrences.
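The selection and filtering of sub-steps S13 to S16 can be sketched as a ranked scan that skips invalid parts of speech until n valid words are found. The counts, the part-of-speech table and the function name below are made-up illustrations, not data from the original disclosure.

```python
def select_cooccurrence_words(counts, pos_of, n=2, valid_pos=("noun", "verb")):
    """Pick the n most frequent co-occurring words whose part of speech is valid.

    counts: word -> number of co-occurrences with one representative word.
    pos_of: word -> part of speech (hypothetical lookup table).
    """
    # Highest count first; ties are broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    selected = []
    for word, _ in ranked:
        if pos_of.get(word) in valid_pos:
            selected.append(word)
        if len(selected) == n:
            break
    return selected

# Illustrative counts for the representative word "music score".
counts = {"the": 40, "musical instrument": 12, "beautiful": 10, "playing": 9, "concert": 3}
pos_of = {
    "the": "determiner",
    "musical instrument": "noun",
    "beautiful": "adjective",
    "playing": "verb",
    "concert": "noun",
}
selected = select_cooccurrence_words(counts, pos_of, n=2)
```

Here the most frequent word is rejected because it is neither a noun nor a verb, and the scan continues down the ranking, which mirrors the "select again" behavior described above.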

Step 104, generating a mapping relation among the polyphone, the part of speech, the pronunciation, the representative word and the co-occurrence words.

In this embodiment, a data set in the format of "polyphone, part of speech, pronunciation, representative word, co-occurrence word" may be formed and stored in the database.

For example: "曲, noun, qu1, paper clip, stationery, paper" and "曲, noun, qu3, music score, musical instrument, playing".

Step 105, converting the representative word and the co-occurrence words into word vectors.

In this embodiment, a GloVe (Global Vectors for Word Representation) model may be called to convert the representative words and the co-occurrence words in the co-occurrence matrix into word vectors of a specified length L (for example, 512 dimensions). After training, each representative word and each co-occurrence word corresponds to a vector of length L, which is a word vector containing the context information of that word.

And storing each representative word, co-occurrence word and corresponding word vector as a data set according to the format of 'word, word vector'.

The GloVe model computes word vectors from co-occurrence matrix statistics. It represents words as vectors such that the vectors capture as much semantic and grammatical information as possible.

A word vector maps a word or phrase to a vector of real numbers.

It should be noted that the terms "representative word" and "co-occurrence word" are relative: in some cases a word serves as a representative word while other words serve as its co-occurrence words, and in other cases the same word serves as a co-occurrence word of another representative word.

Example two

Fig. 2 is a flowchart of a pronunciation labeling method according to a second embodiment of the present invention. This embodiment is applicable to labeling the pronunciation of a polyphone by using a part of speech and co-occurrence words. The method may be executed by a pronunciation labeling device, which may be implemented by software and/or hardware and may be configured in a computer device, for example, a personal computer, a mobile terminal (e.g., a mobile phone, a tablet), a server, a workstation, and the like. The method specifically includes the following steps:

step 201, sentences are determined.

In this embodiment, the text information to be converted from Chinese characters to pinyin is split into sentences at the Chinese punctuation marks that indicate the end of a sentence, such as the period "。", the exclamation mark "！" and the question mark "？".
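Sentence splitting on end-of-sentence punctuation can be sketched with a regular expression. The function name and the choice to keep the terminating punctuation attached to each sentence are assumptions made for illustration.

```python
import re

# Chinese end-of-sentence punctuation: period, exclamation mark, question mark.
SENTENCE_END = "。！？"

# One sentence = a run of non-terminators followed by any trailing terminators.
_SENTENCE_PATTERN = re.compile(f"[^{SENTENCE_END}]+[{SENTENCE_END}]*")

def split_sentences(text):
    """Split text into sentences, keeping the terminating punctuation."""
    return [s.strip() for s in _SENTENCE_PATTERN.findall(text) if s.strip()]

sentences = split_sentences("我爱广州。你去过吗？太好了！")
```

Text without any terminating punctuation is returned as a single sentence, so short inputs are still passed on to the later steps.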

Step 202, if the sentence contains polyphone characters, performing word segmentation processing on the sentence to obtain a plurality of words.

In this embodiment, it can be detected whether the sentence contains polyphones.

In a specific implementation, each character in the sentence is compared with the preset polyphones. If any character is the same as a preset polyphone, the sentence is determined to contain a polyphone; if no character is the same as any preset polyphone, the sentence is determined not to contain a polyphone.

If the sentence does not contain polyphone characters, the pinyin conversion is directly carried out, and the pronunciation is marked to the sentence, so that the pronunciation of each character is obtained.

If the sentence contains polyphone characters, the sentence can be subjected to word segmentation processing, so that a plurality of words are obtained, and pronunciation labeling is performed.
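The polyphone check can be sketched as a membership test of each character against a preset set. The set contents and the function name are assumptions; a real set would be collected from a dictionary as described in Example one.

```python
# Hypothetical preset polyphone set; a real one would come from a dictionary.
POLYPHONES = {"地", "曲"}

def find_polyphones(sentence, polyphones=POLYPHONES):
    """Return (position, character) for every preset polyphone in the sentence."""
    return [(i, ch) for i, ch in enumerate(sentence) if ch in polyphones]

hits = find_polyphones("我来过这个地方")
no_hits = find_polyphones("我爱广州")
```

An empty result corresponds to the branch where the sentence is converted to pinyin directly; a non-empty result triggers word segmentation, and the recorded positions can later be used to place the chosen pronunciations.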

The word segmentation processing refers to segmenting a Chinese character sequence into individual words.

The word segmentation processing method for Chinese characters generally includes two categories:

The first category is based on string matching: a string is scanned, and if a substring is found to be identical to a word in the dictionary, it is counted as a match. This kind of word segmentation usually adds heuristic rules such as "forward/backward maximum matching" and "long word first". IK Analyzer, coding, etc. are word segmenters based on string matching.

The second category is based on statistics and machine learning: Chinese text is modeled on manually labeled parts of speech and statistical features, that is, the model parameters are estimated (trained) from observed data (a labeled corpus). In the word segmentation stage, the model computes the probability of each possible segmentation, and the segmentation with the highest probability is taken as the final result. Common sequence labeling models are the HMM (Hidden Markov Model) and the CRF (Conditional Random Field).
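As an illustration of the first, string-matching category, forward maximum matching can be sketched as below. The dictionary, the `max_len` parameter and the Chinese form of the example sentence are assumptions; this is not the segmenter used by the embodiment, which is left open.

```python
def forward_max_match(sentence, dictionary, max_len=4):
    """Greedy left-to-right segmentation that prefers the longest dictionary word.

    Characters not covered by any dictionary word are emitted as single-character words.
    """
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, shrinking down to a single character.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words

# Hypothetical dictionary covering the multi-character words of the example.
dictionary = {"广州", "小蛮腰", "但是", "还没"}
words = forward_max_match("我爱广州小蛮腰但是我还没去", dictionary)
```

The greedy rule is simple and fast but can mis-segment ambiguous strings, which is one reason the statistical category exists.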

Step 203, determining the part of speech of the target word in the sentence.

Wherein, the target word is the word where the polyphone is located.

In this embodiment, a corpus with annotation information can be collected in advance; the corpus size is usually 5 million characters or more.

Each sentence in the corpus is segmented into words, and each word is labeled with its part of speech. For example, "I have been to this place" is labeled as: I (noun) / have been to (verb) / this (determiner) / place (noun).

The corpus is used to train a BiLSTM (Bi-directional Long Short-Term Memory) network and a CRF; after training is completed, the BiLSTM, the CRF and their parameters are saved.

For the sentences after the current word segmentation, the BiLSTM and the CRF load trained parameters, and the sentences are input into the BiLSTM and the CRF for processing, so that the part of speech of each word is output.

Step 204, if the polyphone has a plurality of pronunciations under the part of speech, determining a representative word and co-occurrence words of the polyphone under each of the pronunciations.

If the polyphone has a plurality of pronunciations (i.e., two or more pronunciations) under the same part of speech, the polyphone and the part of speech can be used as keys to look up, in the data set with the format "polyphone, part of speech, pronunciation, representative word, co-occurrence word", the representative word and the co-occurrence words of the polyphone under each pronunciation of that part of speech, where the co-occurrence words co-occur with the representative word within a certain range.
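The keyed lookup described above can be sketched with an in-memory dictionary. The `Entry` type, the `MAPPING` table and the function name are assumptions for illustration; the embodiment stores the data set in a database, and the character 曲 is assumed to be the polyphone behind the translated example.

```python
from typing import Dict, List, NamedTuple, Tuple

class Entry(NamedTuple):
    pronunciation: str
    representative: str
    cooccurrence: List[str]

# Hypothetical data set keyed by (polyphone, part of speech).
MAPPING: Dict[Tuple[str, str], List[Entry]] = {
    ("曲", "noun"): [
        Entry("qu1", "paper clip", ["stationery", "paper"]),
        Entry("qu3", "music score", ["musical instrument", "playing"]),
    ],
}

def lookup_entries(polyphone: str, pos: str) -> List[Entry]:
    """Return one entry per candidate pronunciation, or an empty list if unknown."""
    return MAPPING.get((polyphone, pos), [])

entries = lookup_entries("曲", "noun")
```

Each returned entry carries the words whose vectors are later compared with the target word when a pronunciation is scored.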

Step 205, when a pronunciation is labeled to the sentence, determining a target pronunciation from the pronunciations according to the target word, the representative word and the co-occurrence word, and labeling the target pronunciation to the polyphonic character.

For a polyphone in the sentence, one pronunciation is determined as the target pronunciation from the pronunciations under the part of speech, with reference to the context information provided by the target word, the representative word and the co-occurrence words, and the polyphone is labeled with the target pronunciation. The characters other than the polyphone are converted to pinyin directly to obtain their pronunciations, which are then spliced with the pronunciation of the polyphone according to position to complete the pronunciation conversion of the whole sentence.
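Splicing the pronunciations by position can be sketched as follows. The function name, the `single_pinyin` lookup table and the `resolved` index map are hypothetical; characters missing from the table are passed through unchanged in this sketch.

```python
def annotate_sentence(sentence, single_pinyin, resolved):
    """Build the per-character pronunciation list for a sentence.

    single_pinyin: character -> pronunciation for characters with one reading.
    resolved: position -> target pronunciation chosen for a polyphone.
    """
    output = []
    for i, ch in enumerate(sentence):
        if i in resolved:
            output.append(resolved[i])  # disambiguated polyphone
        else:
            output.append(single_pinyin.get(ch, ch))  # direct conversion
    return output

# "曲谱" (music score): position 0 is the polyphone, resolved to qu3.
pinyin = annotate_sentence("曲谱", {"谱": "pu3"}, {0: "qu3"})
```

Keying the resolved pronunciations by position rather than by character allows the same polyphone to receive different pronunciations at different places in one sentence.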

In a preferred embodiment of the present invention, step 205 may comprise the following sub-steps:

Sub-step S21, calculating the total score of each pronunciation based on the target word, the representative word and the co-occurrence words.

In this embodiment, according to a preset calculation rule, the total score of a pronunciation may be calculated by using a target word, a representative word and a co-occurrence word under a certain pronunciation, and the total score represents the accuracy of the pronunciation.

In one example, a word (target word, representative word, co-occurring word) may be used as a keyword, and a word vector of the target word, a word vector of the representative word, and a word vector of the co-occurring word are determined in a data set formatted as "word, word vector".

On one hand, the word vector of the target word and the word vector of the representative word are subjected to exponential operation to obtain a first sub-score.

And on the other hand, performing exponential operation on the word vector of the target word and the word vector of the co-occurrence word to obtain a second sub-score.

Thereby calculating a sum of the first sub-score and the second sub-score as a total score.

Expressed as:

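$$\mathrm{Score}_i = \exp(V_{\text{target word}} \cdot V_{\text{representative word}}) + \exp(V_{\text{target word}} \cdot V_{\text{co-occurrence word 1}}) + \exp(V_{\text{target word}} \cdot V_{\text{co-occurrence word 2}}) + \cdots$$

where $\mathrm{Score}_i$ denotes the total score of the i-th pronunciation of the polyphone, $\exp$ denotes the exponential operation, $V_{\text{target word}}$ denotes the word vector of the target word, $V_{\text{representative word}}$ denotes the word vector of the representative word, and $V_{\text{co-occurrence word 1}}$, $V_{\text{co-occurrence word 2}}$, ... denote the word vectors of the 1st, 2nd, ... co-occurrence words.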

Of course, the above manner of calculating the total score is only an example. When implementing this embodiment, other manners of calculating the total score may be set according to actual situations or needs, for example, a weighted sum over the target word, the representative word and the co-occurrence words; this embodiment is not limited in this respect.

Sub-step S22, determining a target pronunciation from the pronunciations based on the total score.

If the total score of the pronunciation is positively correlated with the accuracy of the pronunciation, the pronunciation with the highest total score can be used as the target pronunciation.
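Sub-steps S21 and S22 can be sketched as below. The sketch assumes that the product of two word vectors in the score is their inner product, which the text does not state explicitly, and it uses made-up two-dimensional vectors rather than trained word vectors; all function names are hypothetical.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors (assumed meaning of the product)."""
    return sum(a * b for a, b in zip(u, v))

def total_score(target_vec, representative_vec, cooccurrence_vecs):
    """First sub-score plus the sum of all second sub-scores."""
    score = math.exp(dot(target_vec, representative_vec))
    for vec in cooccurrence_vecs:
        score += math.exp(dot(target_vec, vec))
    return score

def pick_pronunciation(target_vec, candidates):
    """candidates: pronunciation -> (representative_vec, list of co-occurrence vecs)."""
    return max(candidates, key=lambda p: total_score(target_vec, *candidates[p]))

# Toy vectors: the target word points in the same direction as the qu3 words.
target = [1.0, 0.0]
candidates = {
    "qu1": ([0.0, 1.0], [[0.0, 1.0], [0.1, 0.9]]),
    "qu3": ([1.0, 0.0], [[0.9, 0.1], [0.8, 0.2]]),
}
best = pick_pronunciation(target, candidates)
```

Because the exponential is monotonic, a pronunciation whose representative and co-occurrence words lie closer to the target word accumulates a larger total, and the computation needs only a handful of inner products per candidate.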

Example three

Fig. 3 is a flowchart of a pronunciation labeling method according to a third embodiment of the present invention. This embodiment is applicable to establishing a mapping relationship among a polyphonic character, a part of speech, and a pronunciation. The method may be executed by a pronunciation labeling device, which may be implemented by software and/or hardware and may be configured in a computer device such as a personal computer, a mobile terminal (e.g., a mobile phone or a tablet), a server, or a workstation. The method specifically includes the following steps:

Step 301, determining the part of speech of the polyphone and the pronunciation of the polyphone under the part of speech.

Wherein, the polyphone is a character with two or more pronunciations.

For example, "地, noun, di4" and "地, auxiliary word, de".

Step 302, if the polyphonic character has only one pronunciation under the part of speech, generating a mapping relation among the polyphonic character, the part of speech and the pronunciation.

If, in the data obtained in step 301, a certain polyphone has only one pronunciation under a given part of speech, an entry in the format "polyphone, part of speech, pronunciation", such as "地, noun, di4", may be formed and stored in the database.
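Steps 301 and 302 can be sketched as below. This is only an illustrative sketch: the function name and the in-memory dictionary standing in for the database are hypothetical, the sample records reuse the "noun, di4" and "de" entries from the example above on the assumption that the character in question is 地, and the placeholder character "X" with two verb readings is invented purely to show an entry being left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[str, str, str]  # (polyphone, part of speech, pronunciation)


def build_single_pronunciation_map(
    records: Iterable[Record],
) -> Dict[Tuple[str, str], str]:
    """Keep only (polyphone, part of speech) pairs that have exactly one pronunciation."""
    readings = defaultdict(set)
    for char, pos, pron in records:
        readings[(char, pos)].add(pron)
    # Pairs with several pronunciations are left out; they need the
    # score-based disambiguation instead of a direct lookup.
    return {
        key: next(iter(prons))
        for key, prons in readings.items()
        if len(prons) == 1
    }


sample_records = [
    ("地", "noun", "di4"),
    ("地", "auxiliary word", "de"),
    ("X", "verb", "reading_a"),  # hypothetical: two readings under one part of speech
    ("X", "verb", "reading_b"),
]
mapping = build_single_pronunciation_map(sample_records)
```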

Example four

Fig. 4 is a flowchart of a pronunciation labeling method according to a fourth embodiment of the present invention. This embodiment is applicable to labeling the pronunciation of a polyphonic character by using its part of speech. The method may be executed by a pronunciation labeling device, which may be implemented by software and/or hardware and may be configured in a computer device such as a personal computer, a mobile terminal (e.g., a mobile phone or a tablet), a server, or a workstation. The method specifically includes the following steps:

Step 401, determining a sentence.

Step 402, if the sentence contains polyphone characters, performing word segmentation processing on the sentence to obtain a plurality of words.

In a specific embodiment, each character in the sentence is compared with the preset polyphonic characters. If a character is the same as one of the preset polyphonic characters, the sentence is determined to contain a polyphonic character; if no character matches, the sentence is determined not to contain a polyphonic character.
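The character-by-character comparison can be sketched as a set membership test. This is a minimal sketch; the preset list `POLYPHONES` and both function names are hypothetical, and a real list would hold every polyphonic character the system supports.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical preset list of polyphonic characters.
POLYPHONES: FrozenSet[str] = frozenset({"地"})


def find_polyphones(sentence: str,
                    polyphones: FrozenSet[str] = POLYPHONES) -> List[Tuple[int, str]]:
    """Return (position, character) for every polyphonic character in the sentence."""
    return [(i, ch) for i, ch in enumerate(sentence) if ch in polyphones]


def contains_polyphone(sentence: str,
                       polyphones: FrozenSet[str] = POLYPHONES) -> bool:
    """True if at least one character of the sentence is in the preset list."""
    return any(ch in polyphones for ch in sentence)
```

Keeping the positions is a design choice of this sketch: they make it straightforward later to splice the chosen pronunciation back into the sentence at the right place.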

The second category is word segmentation based on statistics and machine learning. This approach models Chinese text using manually labeled parts of speech and statistical features; that is, the model parameters are estimated from observed data (a labeled corpus), which constitutes training. In the word segmentation stage, the model calculates the probability of each candidate segmentation, and the segmentation with the highest probability is taken as the final result. Common sequence labeling models are the HMM and the CRF.
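As an illustration of taking the most probable segmentation, the sketch below runs Viterbi decoding over a toy HMM with the usual B/M/E/S character tags (begin, middle, end of a word, single-character word). Everything numeric here is a hypothetical, hand-set assumption; in the approach described above the parameters would be estimated from a labeled corpus, and the embodiment does not prescribe this particular model.

```python
import math
from typing import Dict, List

STATES = "BMES"
NEG_INF = float("-inf")

# Hypothetical toy parameters (not estimated from any corpus).
START: Dict[str, float] = {"B": 0.6, "S": 0.4}
TRANS: Dict[str, Dict[str, float]] = {
    "B": {"M": 0.2, "E": 0.8},
    "M": {"M": 0.3, "E": 0.7},
    "E": {"B": 0.6, "S": 0.4},
    "S": {"B": 0.6, "S": 0.4},
}
EMIT: Dict[str, Dict[str, float]] = {
    "B": {"土": 0.5}, "M": {}, "E": {"地": 0.5}, "S": {"的": 0.5},
}
FLOOR = 1e-6  # emission probability assumed for unseen characters


def log_prob(table: Dict[str, float], key: str, default: float = 0.0) -> float:
    p = table.get(key, default)
    return math.log(p) if p > 0 else NEG_INF


def viterbi(sentence: str) -> List[str]:
    """Return the most probable B/M/E/S tag sequence for the sentence."""
    if not sentence:
        return []
    # Each column maps state -> (best log-probability, previous state).
    columns = [{s: (log_prob(START, s) + log_prob(EMIT[s], sentence[0], FLOOR), None)
                for s in STATES}]
    for ch in sentence[1:]:
        last, column = columns[-1], {}
        for s in STATES:
            prev = max(STATES, key=lambda p: last[p][0] + log_prob(TRANS[p], s))
            column[s] = (last[prev][0] + log_prob(TRANS[prev], s)
                         + log_prob(EMIT[s], ch, FLOOR), prev)
        columns.append(column)
    # A sentence must end on a word boundary, so only E or S may be final.
    state = max("ES", key=lambda s: columns[-1][s][0])
    tags = [state]
    for column in reversed(columns[1:]):
        state = column[state][1]
        tags.append(state)
    return tags[::-1]


def segment(sentence: str) -> List[str]:
    """Cut the sentence after every character tagged E or S."""
    words, current = [], ""
    for ch, tag in zip(sentence, viterbi(sentence)):
        current += ch
        if tag in "ES":
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words
```

With these toy parameters, `segment("土地的")` yields `["土地", "的"]`, since the tag sequence B, E, S has the highest probability.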

And step 403, determining the part of speech of the target word in the sentence.

Wherein, the target word is the word where the polyphone is located.

A corpus is used to train the BiLSTM and the CRF, and the BiLSTM, the CRF, and their parameters are saved after training is finished.

Step 404, if the polyphonic character has only one pronunciation under the part of speech, labeling that pronunciation on the polyphonic character when the sentence is labeled with pronunciation.

If the polyphone has only one pronunciation under the part of speech, the polyphone and the part of speech can be used as keywords to look up the pronunciation of the polyphone under that part of speech in the data set with the format "polyphone, part of speech, pronunciation".

For example, if the part of speech of the polyphone "地" is a noun, the entry "地, noun, di4" is found in the data set, so the pronunciation of "地" is "di4".

For the characters in the sentence other than the polyphonic character, phonetic conversion is carried out directly to obtain their pronunciations. These pronunciations and the pronunciation determined for the polyphonic character are then spliced together by position to complete the pronunciation conversion of the whole sentence.
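The lookup and position-wise splicing can be sketched as follows. This is a minimal sketch under assumptions: both tables and the function name are hypothetical, the input is taken to be the (word, part of speech) pairs produced by the preceding segmentation and tagging steps, and "?" is an arbitrary placeholder for a reading this sketch cannot resolve by direct lookup.

```python
from typing import Dict, List, Tuple

# Hypothetical "polyphone, part of speech, pronunciation" data set.
POLYPHONE_READINGS: Dict[Tuple[str, str], str] = {
    ("地", "noun"): "di4",
    ("地", "auxiliary word"): "de",
}
# Hypothetical table for characters that have a single reading.
PLAIN_READINGS: Dict[str, str] = {"土": "tu3"}


def label_sentence(tagged_words: List[Tuple[str, str]]) -> List[str]:
    """Return one pronunciation per character, in sentence order."""
    polyphones = {char for char, _ in POLYPHONE_READINGS}
    readings: List[str] = []
    for word, pos in tagged_words:
        for ch in word:
            if ch in polyphones:
                # Keyed lookup on (polyphone, part of speech of its word).
                readings.append(POLYPHONE_READINGS.get((ch, pos), "?"))
            else:
                # Direct conversion for every other character.
                readings.append(PLAIN_READINGS.get(ch, "?"))
    return readings
```

Appending in character order is what performs the splicing: each reading lands at the position of the character it belongs to.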

In this embodiment, for a sentence to be labeled with pronunciation, if the sentence contains a polyphone, word segmentation is performed on the sentence to obtain a plurality of words, and the part of speech of the target word in the sentence is determined, where the target word is the word in which the polyphone is located. If the polyphone has only one pronunciation under that part of speech, that pronunciation is labeled on the polyphone when the sentence is labeled. Labeling the pronunciation with reference to the part of speech eliminates ambiguity of the pronunciation and ensures its correctness.

Example five

Fig. 5 is a schematic structural diagram of a pronunciation labeling device according to a fifth embodiment of the present invention, where the device specifically includes the following modules:

a sentence determination module 501, configured to determine a sentence;

a word segmentation module 502, configured to perform word segmentation on the sentence to obtain multiple words if the sentence includes polyphone characters;

a part-of-speech determining module 503, configured to determine a part-of-speech of a target word in the sentence, where the target word is a word in which the polyphone is located;

a word determining module 504, configured to determine a representative word and a co-occurrence word of the polyphonic character in the pronunciation if the polyphonic character has multiple pronunciations in the part of speech, where the co-occurrence word and the representative word co-occur;

and a multi-pronunciation labeling module 505, configured to determine a target pronunciation from the pronunciations according to the target word, the representative word, and the co-occurrence word when labeling a pronunciation for the sentence, and to label the target pronunciation on the polyphonic character.

In a preferred embodiment of the present invention, the multi-pronunciation labeling module 505 comprises:

In a preferred example of the embodiment of the present invention, the total score calculating sub-module includes:

In a preferred embodiment of the present invention, the method further comprises:

In a preferred embodiment of the present invention, the co-occurrence searching module includes:

In a preferred embodiment of the present invention, the co-occurrence searching module further includes:

The pronunciation labeling device provided by the embodiment of the invention can execute the pronunciation labeling method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.

Example six

Fig. 6 is a schematic structural diagram of a computer device according to a sixth embodiment of the present invention. As shown in fig. 6, the computer apparatus includes a processor 600, a memory 601, a communication module 602, an input device 603, and an output device 604; the number of processors 600 in the computer device may be one or more, and one processor 600 is taken as an example in fig. 6; the processor 600, the memory 601, the communication module 602, the input device 603 and the output device 604 in the computer apparatus may be connected by a bus or other means, and the connection by the bus is exemplified in fig. 6.

The memory 601 is a computer-readable storage medium and can be used for storing software programs, computer-executable programs, and modules, such as the modules corresponding to the pronunciation labeling method in this embodiment (for example, the sentence determination module 501, the word segmentation module 502, the part-of-speech determining module 503, the word determining module 504, and the multi-pronunciation labeling module 505 in the pronunciation labeling device shown in fig. 5). The processor 600 executes the various functional applications and data processing of the computer device by running the software programs, instructions and modules stored in the memory 601, thereby implementing the above-mentioned pronunciation labeling method.

The memory 601 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the computer device, and the like. Further, the memory 601 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, memory 601 may further include memory located remotely from processor 600, which may be connected to a computer device through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The communication module 602 is configured to establish a connection with the display screen and implement data interaction with the display screen. The input device 603 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the computer device.

The computer device provided by the embodiment of the present invention can execute the pronunciation labeling method provided by any embodiment of the present invention, and has corresponding functions and advantages.

Example seven

A seventh embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a method for labeling a pronunciation, and the method includes:

determining a sentence;

Of course, the computer program of the computer-readable storage medium provided by the embodiments of the present invention is not limited to the method operations described above, and may also perform related operations in the pronunciation labeling method provided by any embodiments of the present invention.

From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.

It should be noted that, in the embodiment of the above-mentioned pronunciation labeling apparatus, the included units and modules are merely divided according to functional logic, but are not limited to the above division, as long as the corresponding functions can be realized; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A method for labeling pronunciation, comprising:

determining a sentence;

2. The method of claim 1, wherein determining a target pronunciation from the pronunciations based on the target word, the representative word, and the co-occurring word comprises:

3. The method of claim 2, wherein the calculating the total score of the pronunciation based on the target word, the representative word, and the co-occurrence word comprises:

4. The method according to any one of claims 1-3, further comprising:

5. The method according to claim 4, wherein traversing the predetermined corpus to find co-occurring words co-occurring with the representative word comprises:

6. The method of claim 5, wherein traversing the predetermined corpus to find co-occurring words co-occurring with the representative word further comprises:

determining the part of speech of the co-occurrence word;

7. The method of claim 1, wherein after determining the part of speech of the target word in the sentence, further comprising:

8. The method of claim 7, further comprising:

9. The method of claim 1 or 2 or 3 or 5 or 6 or 7, further comprising, after said determining a sentence:

10. A pronunciation labeling device, comprising:

a sentence determination module for determining a sentence;

11. A computer device, characterized in that the computer device comprises:

one or more processors;

a memory for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the pronunciation labeling method according to any one of claims 1-9.

12. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a method for annotating a pronunciation according to any one of claims 1 to 9.