WO2022121166A1 - Method, apparatus and device for predicting heteronym pronunciation, and storage medium - Google Patents

Method, apparatus and device for predicting heteronym pronunciation, and storage medium

Info

Publication number
WO2022121166A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
target
polyphonic
vector
pronunciation
Prior art date
Application number
PCT/CN2021/083522
Other languages
French (fr)
Chinese (zh)
Inventor
李俊杰
张志宇
马骏
王少军
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Priority to JP2021574349A (patent JP7441864B2)
Publication of WO2022121166A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/245 Classification techniques relating to the decision surface
    • G06F18/2451 Classification techniques relating to the decision surface linear, e.g. hyperplane
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Definitions

  • The present application relates to the field of intelligent decision-making in artificial intelligence, and in particular to a method, apparatus, device and storage medium for predicting the pronunciation of polyphonic words.
  • Grapheme-to-phoneme conversion is an important component in Text-to-Speech systems. But unlike other languages, it is very common for a character in Chinese to have different pronunciations in different situations, and even many Chinese characters have more than 3 pronunciations. Therefore, the quality of the polyphonic pronunciation labeling system greatly affects the quality of the Chinese speech synthesis system. If the pronunciation is wrongly labelled, it will lead to obvious errors in the synthesized speech.
  • Methods for predicting the pronunciation of polyphonic words usually rely on labeled data and on a set of vectors that is randomly initialized on that labeled data.
  • The inventor realized that, because this set of vectors is randomly initialized, words that were not labeled when the model was trained cannot be recognized at prediction time, that is, the problem of unregistered (out-of-vocabulary) words arises. As a result, the accuracy of predicting the pronunciation of polyphonic words is low.
  • The present application provides a method, apparatus, device and storage medium for predicting the pronunciation of polyphonic words, which are used to improve the accuracy of that prediction.
  • a first aspect of the present application provides a method for predicting the pronunciation of a polyphonic word, including:
  • the target pinyin probability of the target vector is calculated, and the target pronunciation of the target polyphonic word is determined according to the target pinyin probability.
  • a second aspect of the present application provides a device for predicting the pronunciation of polyphonic words, including:
  • an acquisition module configured to acquire the marked Chinese sentences to be processed, and to acquire a word representation vector set and a polyphonic word representation vector of the to-be-processed Chinese sentences, where the to-be-processed Chinese sentences include target polyphonic words;
  • a conversion module configured to perform word segmentation on the to-be-processed Chinese sentence to obtain a target word segmentation, and convert the word representation vector set into a word-level feature representation vector according to the target word segmentation;
  • a splicing module configured to perform splicing processing based on an attention mechanism on the polyphonic word representation vector and the word-level feature representation vector to obtain a target vector;
  • a determination module configured to calculate the target pinyin probability of the target vector through a preset linear layer, and determine the target pronunciation of the target polyphonic word according to the target pinyin probability.
  • A third aspect of the present application provides a device for predicting the pronunciation of polyphonic words, including a memory and at least one processor, where instructions are stored in the memory, and the at least one processor calls the instructions in the memory to cause the device to execute the following method for predicting the pronunciation of polyphonic words:
  • the target pinyin probability of the target vector is calculated, and the target pronunciation of the target polyphonic word is determined according to the target pinyin probability.
  • a fourth aspect of the present application provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when it runs on a computer, the computer executes the following method for predicting the pronunciation of polyphonic words:
  • the target pinyin probability of the target vector is calculated, and the target pronunciation of the target polyphonic word is determined according to the target pinyin probability.
  • The marked Chinese sentence to be processed is acquired, together with its word representation vector set and polyphonic word representation vector, where the sentence includes a target polyphonic word; the sentence is segmented to obtain the target word segmentation, and the word representation vector set is converted into word-level feature representation vectors according to the target word segmentation; the polyphonic word representation vector and the word-level feature representation vectors are spliced based on the attention mechanism to obtain the target vector; and, through a preset linear layer, the target pinyin probability of the target vector is calculated and the target pronunciation of the target polyphonic word is determined according to the target pinyin probability.
  • Converting the word representation vector set into word-level feature representation vectors according to the target word segmentation turns character-level features into word-level features, which avoids the problem of unregistered words and thereby effectively improves the accuracy of predicting the pronunciation of polyphonic words.
  • Because the target vector is obtained by attention-based splicing and the target pronunciation is determined from the target pinyin probability computed by the preset linear layer, the pronunciation of the target polyphonic word is predicted without any rules or manual feature design. This alleviates the impact of labeling errors in word segmentation and accurately captures the textual semantic information of the Chinese sentence to be processed, which improves the accuracy of predicting the pronunciation of polyphonic words.
  • FIG. 1 is a schematic diagram of an embodiment of the method for predicting the pronunciation of polyphonic words in an embodiment of the present application;
  • FIG. 2 is a schematic diagram of another embodiment of the method for predicting the pronunciation of polyphonic words in an embodiment of the present application;
  • FIG. 3 is a schematic diagram of an embodiment of the apparatus for predicting the pronunciation of polyphonic words in an embodiment of the present application;
  • FIG. 4 is a schematic diagram of another embodiment of the apparatus for predicting the pronunciation of polyphonic words in an embodiment of the present application;
  • FIG. 5 is a schematic diagram of an embodiment of the device for predicting the pronunciation of polyphonic words in an embodiment of the present application.
  • Embodiments of the present application provide a method, apparatus, device and storage medium for predicting the pronunciation of polyphonic words, which improve the accuracy of that prediction.
  • an embodiment of the method for predicting the pronunciation of polyphonic words in the embodiment of the present application includes:
  • The execution subject of the present application may be an apparatus for predicting the pronunciation of polyphonic words, or a terminal or a server, which is not specifically limited here.
  • The embodiments of the present application are described with the server as the execution subject.
  • The server receives the initial Chinese sentence sent through the preset interface, performs data cleaning on it to obtain a candidate Chinese sentence, and obtains the pre-created polyphonic word labels.
  • The polyphonic word labels can be based on a general dictionary, a business domain dictionary and user portrait labels.
  • After the server obtains the marked Chinese sentence to be processed, it calls the pre-trained word vectors and the preset word vector conversion algorithm to perform vector conversion on the characters of the sentence, obtaining the word representation vector set.
  • The server then extracts the representation vector corresponding to the target polyphonic word from the word representation vector set, thereby obtaining the polyphonic word representation vector; alternatively, the server extracts the target polyphonic word from the marked Chinese sentence to be processed and calls the pre-trained word vectors and the preset word vector conversion algorithm to perform vector conversion on the characters of the sentence and on the target polyphonic word separately, obtaining the word representation vector set and the polyphonic word representation vector.
  • There may be one or more target polyphonic words.
  • The server calls a preset word segmentation tool, such as jieba, the HanLP Chinese language processing package, or another word segmentation tool, to segment the sentence in its original order and obtain the initial word segmentation; alternatively, the server calls a preset dictionary-based or statistics-based Chinese word segmentation algorithm to segment the sentence in its original order and obtain the initial word segmentation. The initial word segments are then spliced according to the preset word splicing rules to obtain the target word segmentation. There may be one or more initial word segments and one or more target word segments.
  • The server classifies the word representation vectors in the word representation vector set according to the target word segmentation to obtain the word representation vector group corresponding to each target word segment, and splices the vectors in each group to obtain the word-level feature representation vector.
  • There may be one or more word-level feature representation vectors, and each target word segment corresponds to exactly one word-level feature representation vector.
  • The server can use the preset attention mechanism in either of two ways. In the first, it calculates the polyphonic attention value of the polyphonic word representation vector and multiplies that value by the polyphonic word representation vector to obtain the polyphonic word vector matrix; it calculates the word attention value of the word-level feature representation vector with respect to the polyphonic word representation vector to obtain the word vector matrix; and it performs matrix addition or matrix multiplication on the polyphonic word vector matrix and the word vector matrix to obtain the target vector. In the second, it calculates the first attention value of the polyphonic word representation vector relative to the word-level feature representation vector and the second attention value of the word-level feature representation vector relative to the polyphonic word representation vector; it multiplies the first attention value by the word-level feature representation vector to obtain the first vector and the second attention value by the polyphonic word representation vector to obtain the second vector; and it performs matrix addition or matrix multiplication on the first vector and the second vector to obtain the target vector.
  • There may be multiple preset linear layers, each corresponding to one classifier; that is, the linear layer includes multiple classifiers.
  • The probabilities output by the classifiers are weighted and summed to obtain the initial pinyin probability of the target vector. There may be one or more initial pinyin probabilities.
  • The initial pinyin probability is compared with the preset threshold to obtain the target pinyin probability, and the pinyin corresponding to the target pinyin probability is determined as the target pronunciation of the target polyphonic word. For example, suppose the classifiers are classifier 1, classifier 2 and classifier 3. Classifier 1 performs pinyin classification and probability calculation on the target vector and obtains probability A1 for pinyin 1 and probability A2 for pinyin 2; classifier 2 likewise obtains B1 for pinyin 1 and B2 for pinyin 2; and classifier 3 obtains C1 for pinyin 1 and C2 for pinyin 2. A1, B1 and C1 are weighted and summed to obtain initial pinyin probability 1 of the target vector for pinyin 1, and A2, B2 and C2 are weighted and summed to obtain initial pinyin probability 2 for pinyin 2. If only one of initial pinyin probability 1 and initial pinyin probability 2 is greater than the preset threshold, that one is determined as the target pinyin probability; if both are greater than the preset threshold, the larger of the two is determined as the target pinyin probability; otherwise, the initial pinyin probability is recalculated.
  • After obtaining the target pinyin probability, the server determines the pinyin corresponding to the target pinyin probability as the target pronunciation of the target polyphonic word.
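The three-classifier example can be sketched in a few lines of Python. This is a minimal illustration rather than the applicant's implementation: the function names, the pinyin strings and the weights are hypothetical, and returning `None` stands in for the "recalculate" branch, which the text does not specify further.

```python
from typing import Dict, List, Optional, Tuple


def fuse_classifier_probabilities(
    classifier_probs: List[Dict[str, float]],
    weights: List[float],
) -> Dict[str, float]:
    """Weighted sum, per candidate pinyin, of the probabilities from each classifier."""
    fused: Dict[str, float] = {}
    for probs, weight in zip(classifier_probs, weights):
        for pinyin, p in probs.items():
            fused[pinyin] = fused.get(pinyin, 0.0) + weight * p
    return fused


def select_target_pinyin(
    fused: Dict[str, float],
    threshold: float,
) -> Optional[Tuple[str, float]]:
    """Pick the largest initial pinyin probability that exceeds the threshold.

    Returns None when no candidate exceeds the threshold (assumption: the
    caller would then recalculate the initial pinyin probabilities).
    """
    above = {pinyin: p for pinyin, p in fused.items() if p > threshold}
    if not above:
        return None
    best = max(above, key=above.get)
    return best, above[best]


# Hypothetical outputs (A1, A2), (B1, B2), (C1, C2) of classifiers 1 to 3.
outputs = [
    {"da3": 0.7, "da2": 0.3},
    {"da3": 0.6, "da2": 0.4},
    {"da3": 0.8, "da2": 0.2},
]
initial = fuse_classifier_probabilities(outputs, weights=[0.5, 0.25, 0.25])
print(select_target_pinyin(initial, threshold=0.5))
```

Taking the maximum over the candidates above the threshold covers both the "one exceeds" and "both exceed" cases of the example with a single rule.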
  • The server matches the initial historical polyphonic word information stored in the preset database against the Chinese sentence to be processed and the target polyphonic word to obtain the corresponding target historical polyphonic word information, which includes the target historical Chinese sentence, the historical polyphonic word, and the pronunciation of the historical polyphonic word in the target historical Chinese sentence. The server calculates the similarity between the target pronunciation of the target polyphonic word and the pronunciation of the historical polyphonic word, calculates the difference between the similarity and 1 to obtain a target value, and judges whether the target value is less than the preset similarity value. If so, the target pronunciation of the target polyphonic word is determined as the final target pronunciation; if not, the pronunciation of the historical polyphonic word is determined as the target pronunciation of the target polyphonic word.
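The decision rule for the historical comparison is simple enough to state as code. The sketch below is illustrative only: the text does not define how pronunciation similarity is measured, so the similarity function is passed in by the caller, and `exact_match_similarity` is a hypothetical placeholder. The difference from 1 is taken as an absolute value, which is also an assumption.

```python
from typing import Callable


def choose_final_pronunciation(
    target_pronunciation: str,
    historical_pronunciation: str,
    similarity: Callable[[str, str], float],
    preset_value: float,
) -> str:
    """Keep the predicted pronunciation only if it is close enough to the historical one."""
    # Target value: difference between the similarity and 1 (absolute value assumed).
    target_value = abs(similarity(target_pronunciation, historical_pronunciation) - 1.0)
    if target_value < preset_value:
        return target_pronunciation
    return historical_pronunciation


def exact_match_similarity(a: str, b: str) -> float:
    """Hypothetical similarity: 1.0 for identical pinyin strings, otherwise 0.0."""
    return 1.0 if a == b else 0.0


print(choose_final_pronunciation("da2", "da3", exact_match_similarity, 0.2))
```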
  • Converting the word representation vector set into word-level feature representation vectors according to the target word segmentation turns character-level features into word-level features, which avoids the problem of unregistered words and thereby effectively improves the accuracy of predicting the pronunciation of polyphonic words.
  • Because the target vector is obtained by attention-based splicing of the polyphonic word representation vector and the word-level feature representation vectors, and the target pronunciation is determined from the target pinyin probability computed by the preset linear layer, the pronunciation of the target polyphonic word is predicted without any rules or manual feature design. This alleviates the impact of labeling errors in word segmentation and accurately captures the textual semantic information of the Chinese sentence to be processed, which improves the accuracy of predicting the pronunciation of polyphonic words.
  • Referring to FIG. 2, another embodiment of the method for predicting the pronunciation of polyphonic words in the embodiment of the present application includes:
  • The server obtains the initial Chinese sentence, the target polyphonic word in the initial Chinese sentence, and the polyphonic word position information corresponding to the target polyphonic word; marks the target polyphonic word in the initial Chinese sentence according to the polyphonic word position information to obtain the Chinese sentence to be processed; and sequentially performs word vector encoding and polyphonic word vector extraction on the Chinese sentence to be processed to obtain the word representation vector set and the polyphonic word representation vector.
  • The server receives the initial Chinese sentence sent through the preset interface, calls the pre-created polyphonic word dictionary to perform polyphonic word recognition on the initial Chinese sentence and obtain the target polyphonic word, and extracts the position of the target polyphonic word in the initial Chinese sentence (that is, the polyphonic word position information).
  • The target polyphonic word corresponding to the polyphonic word position information is then marked to obtain the Chinese sentence to be processed. The marked content includes the target polyphonic word and its polyphonic word position information, and may also include the pronunciation of the target polyphonic word in a Chinese sentence corresponding to the initial Chinese sentence, where that corresponding sentence can be matched by calculating a weighted sum of semantic similarity, emotional similarity and sentence similarity.
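Dictionary-based recognition and position marking can be sketched as follows. This is a minimal illustration under stated assumptions: the dictionary contents, the record layout returned by `mark_polyphones` and the pinyin notation are hypothetical, and the Chinese sentence is an assumed reconstruction of the translated example "all products are sold at a discount". Positions are 1-based.

```python
from typing import Dict, List

# Hypothetical polyphonic word dictionary: character -> candidate pinyin.
POLYPHONE_DICT: Dict[str, List[str]] = {
    "打": ["da3", "da2"],
}


def mark_polyphones(sentence: str, polyphone_dict: Dict[str, List[str]]) -> List[dict]:
    """Return one mark per polyphonic character, with its 1-based position."""
    marks = []
    for index, char in enumerate(sentence):
        if char in polyphone_dict:
            marks.append(
                {
                    "char": char,
                    "position": index + 1,
                    "candidates": polyphone_dict[char],
                }
            )
    return marks


# Assumed Chinese original of the translated example sentence.
print(mark_polyphones("所有商品一律打折出售", POLYPHONE_DICT))
```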
  • The server invokes a preset supervised neural network encoder and/or an unsupervised pre-trained network encoder to perform word vector encoding on the Chinese sentence to be processed and obtain the word representation vector set, and extracts the polyphonic word representation vector corresponding to the target polyphonic word from the word representation vector set.
  • The server encodes each word (that is, each character) in the Chinese sentence to be processed through a preset deep neural network encoder to obtain the word representation vector set, in which one word representation vector corresponds to one word; according to the polyphonic word position information, the representation vector corresponding to the target polyphonic word is extracted from the word representation vector set to obtain the polyphonic word representation vector.
  • The server invokes the deep neural network encoder among the preset supervised neural network encoders. The deep neural network encoder may include, but is not limited to, at least one of a long short-term memory (LSTM) model and a bidirectional encoder representations from transformers (BERT) model. Through the deep neural network encoder, each word in the Chinese sentence to be processed is encoded with contextual semantic information, following the sequence order of the words in the sentence, to obtain the representation vector of each word, that is, the word representation vector set. The representation vector corresponding to the polyphonic word position information is then extracted from the word representation vector set to obtain the polyphonic word representation vector. For example, if the Chinese sentence to be processed is "all products are sold at a discount" and the polyphonic word position information indicates the seventh word (character) of that sentence, the seventh word representation vector is extracted from the word representation vector set to obtain the polyphonic word representation vector corresponding to the target polyphonic word.
  • The server performs word segmentation on the Chinese sentence to be processed to obtain the target word segmentation; divides the word representation vector set according to the target word segmentation to obtain the representation vector group of each word; and performs hybrid pooling on the representation vector group of each word through the preset hybrid pooling layer to obtain the word-level feature representation vector.
  • the server calls the preset Chinese word segmentation algorithm, performs word segmentation processing on the Chinese sentence to be processed, obtains the initial word segmentation, performs part-of-speech detection and phrase detection on the initial word segmentation, and determines the initial word segmentation that passes the detection as the target word segmentation.
  • The Chinese word segmentation algorithm integrates the N-gram model and the bi-direction matching (BM) model; that is, the output of the N-gram model can be the input of the BM model, or the output of the BM model can be the input of the N-gram model, or,
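As an illustration of the bi-direction matching (BM) component only, the sketch below runs forward and backward maximum matching against a small vocabulary and keeps the better result. It does not model the N-gram stage. The tie-breaking rule (fewer segments, then fewer single-character segments, then the backward result) is a common heuristic assumed here, not something the text specifies, and the vocabulary and Chinese sentence are hypothetical reconstructions of the translated example.

```python
from typing import List, Set


def forward_max_match(text: str, vocab: Set[str], max_len: int) -> List[str]:
    """Greedy longest-match segmentation scanning left to right."""
    segments, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan makes progress.
            if size == 1 or piece in vocab:
                segments.append(piece)
                i += size
                break
    return segments


def backward_max_match(text: str, vocab: Set[str], max_len: int) -> List[str]:
    """Greedy longest-match segmentation scanning right to left."""
    segments, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                segments.append(piece)
                j -= size
                break
    return segments[::-1]


def bidirectional_match(text: str, vocab: Set[str], max_len: int = 4) -> List[str]:
    """Run both scans and keep the segmentation the heuristic prefers."""
    fwd = forward_max_match(text, vocab, max_len)
    bwd = backward_max_match(text, vocab, max_len)

    def cost(segments: List[str]):
        return (len(segments), sum(len(s) == 1 for s in segments))

    return fwd if cost(fwd) < cost(bwd) else bwd


vocab = {"所有", "商品", "一律", "打折", "出售"}
print(bidirectional_match("所有商品一律打折出售", vocab))
```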
  • The server divides the word representation vector set according to the target word segmentation to obtain the representation vector group of each word. For example, if the Chinese sentence to be processed is "all products are sold at a discount", the corresponding target word segments are "all", "commodity", "all", "discount" and "sale". The representation vector group of the word "discount" then contains the representation vectors of its two characters (rendered in translation as "hit" and "discount"), and the same holds for the other words.
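The division step amounts to slicing the per-character vectors along the word boundaries. A minimal sketch, assuming the vectors are held as plain Python lists and that the segments concatenate back to the sentence; the function name and the toy one-dimensional vectors are hypothetical:

```python
from typing import List

Vector = List[float]


def group_by_segments(char_vectors: List[Vector], segments: List[str]) -> List[List[Vector]]:
    """Split per-character vectors into one group per word segment."""
    groups, start = [], 0
    for segment in segments:
        groups.append(char_vectors[start:start + len(segment)])
        start += len(segment)
    if start != len(char_vectors):
        raise ValueError("segments do not cover the sentence exactly")
    return groups


# Toy vectors for a ten-character sentence split into five two-character words.
char_vectors = [[float(i)] for i in range(10)]
segments = ["所有", "商品", "一律", "打折", "出售"]  # assumed reconstruction of the example
groups = group_by_segments(char_vectors, segments)
print(groups[3])  # vectors of the two characters of the fourth word
```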
  • The preset hybrid pooling layer is a pooling layer that combines max pooling and average pooling.
  • The server calls the preset hybrid pooling layer to perform hybrid pooling on the representation vector group of each word and obtain the word-level feature representation vector. For example, the representation vector of "Dai" and the representation vector of "Zhe" in the word's representation vector group are fused to obtain the word-level feature representation vector of "Discount".
  • The server can perform max pooling on the representation vector group of each word through the max pooling convolution kernel or max pooling layer in the hybrid pooling layer to obtain the first word representation vector group, and then perform average pooling on the first word representation vector group through the average pooling convolution kernel or average pooling layer in the hybrid pooling layer to obtain the word-level feature representation vector. Alternatively, the server performs max pooling on the representation vector group of each word through the max pooling convolution kernel or max pooling layer in the hybrid pooling layer to obtain the first word representation vector group, performs average pooling on the representation vector group of each word to obtain the second word representation vector group, and fuses the first and second word representation vector groups to obtain the word-level feature representation vector. As a further alternative, the server pre-creates a hybrid pooling layer that fuses the max pooling convolution kernel and the average pooling convolution kernel, and uses it to perform pooling and convolution processing on the representation vector group of each word to obtain the word-level feature representation vector.
  • There may be one or more word-level feature representation vectors, and each target word segment corresponds to one word-level feature representation vector.
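The second variant (max pooling and average pooling applied separately, then fused) can be sketched as follows. The text does not say how the two pooled results are fused; the convex combination controlled by `alpha` is an assumption made for illustration, and the function name is hypothetical.

```python
from typing import List

Vector = List[float]


def hybrid_pool(group: List[Vector], alpha: float = 0.5) -> Vector:
    """Fuse element-wise max pooling and average pooling over one word's character vectors.

    alpha weights the max-pooled result and (1 - alpha) the average-pooled result
    (assumed fusion rule).
    """
    columns = list(zip(*group))  # one tuple per vector dimension
    max_pooled = [max(col) for col in columns]
    avg_pooled = [sum(col) / len(col) for col in columns]
    return [alpha * m + (1.0 - alpha) * a for m, a in zip(max_pooled, avg_pooled)]


# Two character vectors belonging to one word.
print(hybrid_pool([[1.0, 4.0], [3.0, 2.0]]))
```

Whatever the number of characters in a word, the pooled output has the same dimension as a single character vector, which is what lets one word-level vector stand in for each target word segment.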
  • The server calculates the attention value between the polyphonic word representation vector and the word-level feature representation vector through the preset feed-forward attention mechanism, and performs a weighted summation according to the attention values to obtain the attention vector. Alternatively, the server calculates the attention value of the polyphonic word representation vector relative to the word-level feature representation vector through the preset feed-forward attention mechanism, multiplies the attention value by the polyphonic word representation vector to obtain the polyphonic word representation vector matrix, and performs matrix addition or matrix multiplication on the polyphonic word representation vector matrix and the word-level feature representation vector to obtain the attention vector.
  • After the server obtains the attention vector, it performs matrix multiplication or matrix addition on the attention vector and the polyphonic word representation vector to obtain the target vector; alternatively, the server performs a weighted summation of the attention vector and the polyphonic word representation vector to obtain the target vector.
  • Obtaining the target vector through the preset feed-forward attention mechanism indicates which words in the Chinese sentence to be processed matter more for the target polyphonic word and should receive greater weight, thereby improving the accuracy of the contextual semantic fusion for the target polyphonic word.
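A minimal sketch of the first variant follows, assuming a simplified feed-forward scorer: each word-level vector h is scored as v · tanh(h + p), where p is the polyphonic word representation vector and v is a scoring vector that would be learned in practice. The scorer form, the softmax normalization, the choice of element-wise addition for the final combination, and all names are assumptions made for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def feed_forward_attention(
    polyphone_vec: Vector,
    word_vecs: List[Vector],
    v: Vector,
) -> Tuple[List[float], Vector]:
    """Score each word-level vector against the polyphone vector and average them by weight."""
    scores = [
        sum(vi * math.tanh(h + p) for vi, h, p in zip(v, word_vec, polyphone_vec))
        for word_vec in word_vecs
    ]
    weights = softmax(scores)
    attention_vec = [
        sum(w * word_vec[i] for w, word_vec in zip(weights, word_vecs))
        for i in range(len(polyphone_vec))
    ]
    return weights, attention_vec


def build_target_vector(polyphone_vec: Vector, attention_vec: Vector) -> Vector:
    """Combine the attention vector and the polyphone vector by element-wise addition."""
    return [a + p for a, p in zip(attention_vec, polyphone_vec)]


weights, attention_vec = feed_forward_attention(
    polyphone_vec=[1.0, 0.0],
    word_vecs=[[1.0, 0.0], [0.0, 1.0]],
    v=[1.0, 1.0],
)
print(weights, build_target_vector([1.0, 0.0], attention_vec))
```

The attention weights make explicit which words of the sentence contribute most to the target vector for a given polyphonic word.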
  • The server calculates the probability of the target vector for each pinyin through the preset linear layer to obtain a set of polyphonic word pinyin probability values; sorts these values and determines the first-ranked polyphonic word pinyin probability value as the target pinyin probability; and determines the pinyin corresponding to the target pinyin probability as the target pronunciation of the target polyphonic word.
  • In this case there is a single linear layer. The server inputs the target vector into the preset linear layer and, through the linear layer, calculates the probability of the target vector for each pinyin, obtaining a set of probability values, namely the polyphonic word pinyin probability values. For example, if polyphonic word pinyin probability value 2 ranks first, it is the target pinyin probability, and the pinyin corresponding to the target pinyin probability is determined as the target pronunciation of the target polyphonic word.
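The single-linear-layer case can be sketched as a matrix-vector product followed by ranking. The text does not say how the layer outputs are turned into probabilities; the softmax used here is an assumption, as are the function names, the toy weights and the pinyin strings. Descending order is also assumed for the sort, so that the first-ranked value is the largest.

```python
import math
from typing import List, Tuple


def linear_layer_probabilities(
    target_vector: List[float],
    weights: List[List[float]],
    bias: List[float],
) -> List[float]:
    """One linear layer (one weight row per candidate pinyin) followed by a softmax."""
    logits = [
        sum(w * x for w, x in zip(row, target_vector)) + b
        for row, b in zip(weights, bias)
    ]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_pinyin(
    target_vector: List[float],
    weights: List[List[float]],
    bias: List[float],
    candidates: List[str],
) -> Tuple[str, float]:
    """Sort the pinyin probabilities in descending order and return the first-ranked one."""
    probs = linear_layer_probabilities(target_vector, weights, bias)
    ranked = sorted(zip(candidates, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[0]


print(
    predict_pinyin(
        target_vector=[1.0, 0.0],
        weights=[[2.0, 0.0], [0.0, 1.0]],
        bias=[0.0, 0.0],
        candidates=["da3", "da2"],
    )
)
```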
  • After the server calculates the target pinyin probability of the target vector through the preset linear layer and determines the target pronunciation of the target polyphonic word according to the target pinyin probability, it obtains the error value of the target pronunciation relative to the marked pronunciation and optimizes the acquisition strategy of the target pronunciation according to the error value. The acquisition strategy includes the execution process, algorithm and network structure used to acquire the target pronunciation.
  • The server obtains the marked pronunciation of the target polyphonic word, which is the pronunciation of the target polyphonic word in the sentence corresponding to the semantics and emotion of the Chinese sentence to be processed; the marked pronunciation can be obtained by labeling or through a labeling model. The server calculates the pronunciation similarity between the target pronunciation of the target polyphonic word and the marked pronunciation, and calculates the difference between the pronunciation similarity and 1 to obtain the error value of the target pronunciation relative to the marked pronunciation. The error value is then used to adjust the execution process for acquiring the target pronunciation and to optimize the network structure used to acquire the target pronunciation.
  • Converting the word representation vector set into word-level feature representation vectors according to the target word segmentation turns character-level features into word-level features, which avoids the problem of unregistered words and thereby effectively improves the accuracy of predicting the pronunciation of polyphonic words.
  • The polyphonic word representation vector and the word-level feature representation vectors are spliced based on the attention mechanism to obtain the target vector, the target pinyin probability of the target vector is calculated through the preset linear layer, and the target pronunciation of the target polyphonic word is determined according to the target pinyin probability. Combining word segmentation with the attention mechanism in this way predicts the pronunciation of the target polyphonic word without any rules or manual feature design, reduces the impact of labeling errors in word segmentation, and accurately captures the textual semantic information of the Chinese sentence to be processed, which improves the accuracy of predicting the pronunciation of polyphonic words.
  • an embodiment of the device for predicting the pronunciation of a polyphonic word in the embodiment of the present application includes:
  • the obtaining module 301 is used to obtain the marked Chinese sentences to be processed, and to obtain a word representation vector set and a polyphonic word representation vector of the to-be-processed Chinese sentences, and the to-be-processed Chinese sentences include target polyphonic words;
  • the conversion module 302 is used to perform word segmentation processing on the Chinese sentence to be processed to obtain a target word segmentation, and convert the word representation vector set into a word-level feature representation vector according to the target word segmentation;
  • the splicing module 303 is used to perform splicing processing based on the attention mechanism on the polyphonic word representation vector and the word-level feature representation vector to obtain a target vector;
  • the determining module 304 is configured to calculate the target pinyin probability of the target vector through a preset linear layer, and determine the target pronunciation of the target polyphonic word according to the target pinyin probability.
  • each module in the above-mentioned apparatus for predicting the pronunciation of a polyphonic word corresponds to each step in the above-mentioned embodiment of the method for predicting the pronunciation of a polyphonic word, and the functions and implementation process thereof will not be repeated here.
  • in the embodiments of the present application, converting the word representation vector set into word-level feature representation vectors according to the target word segmentation turns character-level features into word-level features, which avoids the out-of-vocabulary (unregistered word) problem and effectively improves the accuracy of polyphonic word pronunciation prediction.
  • the polyphonic word representation vector and the word-level feature representation vector are spliced based on the attention mechanism, the target pinyin probability of the target vector is calculated through the preset linear layer, and the target pronunciation of the target polyphonic word is determined according to the target pinyin probability.
  • combining the target word segmentation with the attention mechanism predicts the pronunciation of the target polyphonic word without any rules or manual feature design, and reduces the impact of labeling errors in word segmentation.
  • the textual semantic information of the Chinese sentence to be processed can therefore be captured accurately, which improves the accuracy of predicting the pronunciation of polyphonic words.
  • another embodiment of the device for predicting the pronunciation of polyphonic words in the embodiment of the present application includes:
  • the obtaining module 301 is used to obtain the marked Chinese sentences to be processed, and to obtain a word representation vector set and a polyphonic word representation vector of the to-be-processed Chinese sentences, and the to-be-processed Chinese sentences include target polyphonic words;
  • the conversion module 302 is used to perform word segmentation processing on the Chinese sentence to be processed to obtain a target word segmentation, and convert the word representation vector set into a word-level feature representation vector according to the target word segmentation;
  • the splicing module 303 is used to perform splicing processing based on the attention mechanism on the polyphonic word representation vector and the word-level feature representation vector to obtain a target vector;
  • the splicing module 303 specifically includes:
  • the calculation unit 3031 is used to perform attention calculation on the polyphonic word representation vector and the word-level feature representation vector through the preset feedforward attention mechanism, and obtain the attention vector;
  • the splicing unit 3032 is used for splicing the attention vector and the polyphonic word representation vector to obtain the target vector;
  • the determining module 304 is configured to calculate the target pinyin probability of the target vector through a preset linear layer, and determine the target pronunciation of the target polyphonic word according to the target pinyin probability.
  • the conversion module 302 can also be specifically used for:
  • performing mixed pooling on the representation vector group of each word to obtain the word-level feature representation vector.
  • the determining module 304 can also be specifically used for:
  • determining the pinyin corresponding to the target pinyin probability as the target pronunciation of the target polyphonic word.
  • the obtaining module 301 includes:
  • the obtaining unit 3011 is used to obtain the initial Chinese sentence, the target polyphonic word in the initial Chinese sentence, and the position information corresponding to the target polyphonic word;
  • the labeling unit 3012 is used to label the target polyphonic word in the initial Chinese sentence according to the position information of the polyphonic word to obtain the Chinese sentence to be processed;
  • the encoding and extracting unit 3013 is configured to sequentially perform word vector encoding and polyphonic word vector extraction on the Chinese sentence to be processed to obtain a word representation vector set and a polyphonic word representation vector.
  • the encoding and extracting unit 3013 can also be specifically used for:
  • encoding each word in the Chinese sentence to be processed through the preset deep neural network encoder to obtain a word representation vector set, in which each word representation vector corresponds to one word (character);
  • extracting the representation vector corresponding to the target polyphonic word from the word representation vector set to obtain the polyphonic word representation vector.
  • the device for predicting the pronunciation of polyphonic words further includes:
  • the optimization module 305 is used to obtain the error value of the target pronunciation based on the marked pronunciation, and optimize the acquisition strategy of the target pronunciation according to the error value, and the acquisition strategy includes the execution process, algorithm and network structure of obtaining the target pronunciation.
  • each module and each unit in the above-mentioned polyphonic word pronunciation prediction apparatus corresponds to each step in the above-mentioned polyphonic word pronunciation prediction method embodiment, and the functions and implementation process thereof will not be repeated here.
  • in the embodiments of the present application, converting the word representation vector set into word-level feature representation vectors according to the target word segmentation turns character-level features into word-level features, which avoids the out-of-vocabulary (unregistered word) problem and effectively improves the accuracy of polyphonic word pronunciation prediction.
  • the polyphonic word representation vector and the word-level feature representation vector are spliced based on the attention mechanism, the target pinyin probability of the target vector is calculated through the preset linear layer, and the target pronunciation of the target polyphonic word is determined according to the target pinyin probability.
  • combining the target word segmentation with the attention mechanism predicts the pronunciation of the target polyphonic word without any rules or manual feature design, and reduces the impact of labeling errors in word segmentation.
  • the textual semantic information of the Chinese sentence to be processed can therefore be captured accurately, which improves the accuracy of predicting the pronunciation of polyphonic words.
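The calculation unit 3031 and splicing unit 3032 described in this embodiment (feed-forward attention over the word-level vectors, followed by concatenation with the polyphonic word representation vector) can be sketched roughly as follows. This is a minimal pure-Python illustration under stated assumptions: the scoring function (a single `tanh` unit over the concatenated pair), the parameters `w` and `b`, and all function names are hypothetical, since the text does not give the exact form of the preset feed-forward attention.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def feedforward_attention(poly_vec, word_vecs, w, b=0.0):
    """Score each word-level vector against the polyphonic word vector with a
    one-unit feed-forward scorer (assumed form), then return the
    attention-weighted sum of the word-level vectors."""
    scores = [
        math.tanh(sum(wi * xi for wi, xi in zip(w, poly_vec + h)) + b)
        for h in word_vecs
    ]
    weights = softmax(scores)
    dim = len(word_vecs[0])
    return [sum(a * h[d] for a, h in zip(weights, word_vecs)) for d in range(dim)]


def build_target_vector(poly_vec, word_vecs, w, b=0.0):
    """Concatenate (splice) the attention vector with the polyphonic word vector."""
    attention_vec = feedforward_attention(poly_vec, word_vecs, w, b)
    return attention_vec + poly_vec
```

With all-zero scorer weights the attention is uniform, so the attention vector reduces to the mean of the word-level vectors; trained weights would instead emphasize the words most relevant to the polyphonic word.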
  • Figures 3 and 4 above describe in detail the device for predicting the pronunciation of polyphones in the embodiment of the present application from the perspective of modular functional entities.
  • the following describes the device for predicting the pronunciation of polyphones in the embodiment of the present application in detail from the perspective of hardware processing.
  • the device 500 for predicting the pronunciation of a polyphonic word may vary greatly depending on configuration or performance, and may include one or more processors (central processing units, CPU) 510, a memory 520, and one or more storage media 530 (e.g., one or more mass storage devices) that store application programs 533 or data 532.
  • the memory 520 and the storage medium 530 may be short-term storage or persistent storage.
  • the program stored in the storage medium 530 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations in the apparatus 500 for predicting the pronunciation of polyphonic words.
  • the processor 510 may be configured to communicate with the storage medium 530 to execute a series of instruction operations in the storage medium 530 on the device 500 for predicting the pronunciation of polyphones.
  • the device 500 for predicting the pronunciation of polyphones may also include one or more power sources 540, one or more wired or wireless network interfaces 550, one or more input and output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux, or FreeBSD.
  • the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be a non-volatile computer-readable storage medium.
  • the computer-readable storage medium may also be a volatile computer-readable storage medium. Instructions are stored in the computer-readable storage medium, and when the instructions are run on a computer, they cause the computer to execute the steps of the method for predicting the pronunciation of polyphonic words.
  • the computer-readable storage medium may mainly include a stored program area and a stored data area, wherein the stored program area may store an operating system, an application program required by at least one function, and the like; the stored data area may store data created through use, and the like.
  • the blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • a blockchain, essentially a decentralized database, is a series of data blocks associated with one another using cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of its information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • the integrated unit if implemented as a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium.
  • the technical solutions of the present application, in essence, or the parts that contribute to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods in the various embodiments of the present application.
  • the aforementioned storage medium includes: a U disk (USB flash drive), a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and other media that can store program code.

Abstract

The present invention relates to the technical field of artificial intelligence, and provides a method, apparatus and device for predicting the heteronym pronunciation, and a storage medium, for use in improving the accuracy of predicting the heteronym pronunciation. The method for predicting the heteronym pronunciation comprises: acquiring an annotated Chinese statement to be processed, and acquiring a word representation vector set and a heteronym representation vector of said Chinese statement, said Chinese statement comprising a target heteronym (101); performing word segmentation processing on said Chinese statement to obtain a target word, and converting the word representation vector set into a word-level feature representation vector according to the target word (102); connecting the heteronym representation vector to the word-level feature representation vector on the basis of an attention mechanism to obtain a target vector (103); and calculating a target Pinyin probability of the target vector by means of a preset linear layer, and determining a target pronunciation of the target heteronym according to the target Pinyin probability (104). In addition, the present invention further relates to blockchain technology, and said annotated Chinese statement can be stored in a blockchain.

Description

Method, Apparatus, Device and Storage Medium for Predicting the Pronunciation of Polyphonic Words
This application claims priority to Chinese patent application No. 202011432585.6, filed with the China Patent Office on December 10, 2020 and entitled "Method, Apparatus, Device and Storage Medium for Predicting the Pronunciation of Polyphonic Words", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of intelligent decision-making in artificial intelligence, and in particular to a method, apparatus, device and storage medium for predicting the pronunciation of polyphonic words.
Background
Grapheme-to-phoneme conversion is an important component of text-to-speech systems. Unlike in other languages, however, it is very common in Chinese for a single character to be pronounced differently in different contexts, and many Chinese characters even have more than three pronunciations. The quality of the polyphonic word pronunciation labeling system therefore strongly affects the quality of a Chinese speech synthesis system: a wrongly labeled pronunciation leads to obvious errors in the synthesized speech. At present, the pronunciation of polyphonic words is usually predicted by using labeled data together with a set of vectors randomly initialized for that labeled data.
However, the inventors realized that randomly initializing a set of vectors causes a problem during pronunciation prediction: words that were not labeled when the model was trained cannot be recognized, i.e., the out-of-vocabulary (unregistered word) problem. As a result, the accuracy of predicting the pronunciation of polyphonic words is low.
Summary of the Invention
The present application provides a method, apparatus, device and storage medium for predicting the pronunciation of polyphonic words, which are used to improve the accuracy of predicting the pronunciation of polyphonic words.
A first aspect of the present application provides a method for predicting the pronunciation of polyphonic words, including:
acquiring a labeled Chinese sentence to be processed, and acquiring a word representation vector set and a polyphonic word representation vector of the Chinese sentence to be processed, where the Chinese sentence to be processed includes a target polyphonic word;
performing word segmentation on the Chinese sentence to be processed to obtain a target word segmentation, and converting the word representation vector set into a word-level feature representation vector according to the target word segmentation;
performing attention-based splicing on the polyphonic word representation vector and the word-level feature representation vector to obtain a target vector;
calculating a target pinyin probability of the target vector through a preset linear layer, and determining the target pronunciation of the target polyphonic word according to the target pinyin probability.
A second aspect of the present application provides an apparatus for predicting the pronunciation of polyphonic words, including:
an acquisition module, configured to acquire a labeled Chinese sentence to be processed, and to acquire a word representation vector set and a polyphonic word representation vector of the Chinese sentence to be processed, where the Chinese sentence to be processed includes a target polyphonic word;
a conversion module, configured to perform word segmentation on the Chinese sentence to be processed to obtain a target word segmentation, and to convert the word representation vector set into a word-level feature representation vector according to the target word segmentation;
a splicing module, configured to perform attention-based splicing on the polyphonic word representation vector and the word-level feature representation vector to obtain a target vector;
a determination module, configured to calculate a target pinyin probability of the target vector through a preset linear layer, and to determine the target pronunciation of the target polyphonic word according to the target pinyin probability.
A third aspect of the present application provides a device for predicting the pronunciation of polyphonic words, including a memory and at least one processor, where instructions are stored in the memory, and the at least one processor calls the instructions in the memory so that the device executes the following method for predicting the pronunciation of polyphonic words:
acquiring a labeled Chinese sentence to be processed, and acquiring a word representation vector set and a polyphonic word representation vector of the Chinese sentence to be processed, where the Chinese sentence to be processed includes a target polyphonic word;
performing word segmentation on the Chinese sentence to be processed to obtain a target word segmentation, and converting the word representation vector set into a word-level feature representation vector according to the target word segmentation;
performing attention-based splicing on the polyphonic word representation vector and the word-level feature representation vector to obtain a target vector;
calculating a target pinyin probability of the target vector through a preset linear layer, and determining the target pronunciation of the target polyphonic word according to the target pinyin probability.
A fourth aspect of the present application provides a computer-readable storage medium in which instructions are stored; when the instructions are run on a computer, they cause the computer to execute the following method for predicting the pronunciation of polyphonic words:
acquiring a labeled Chinese sentence to be processed, and acquiring a word representation vector set and a polyphonic word representation vector of the Chinese sentence to be processed, where the Chinese sentence to be processed includes a target polyphonic word;
performing word segmentation on the Chinese sentence to be processed to obtain a target word segmentation, and converting the word representation vector set into a word-level feature representation vector according to the target word segmentation;
performing attention-based splicing on the polyphonic word representation vector and the word-level feature representation vector to obtain a target vector;
calculating a target pinyin probability of the target vector through a preset linear layer, and determining the target pronunciation of the target polyphonic word according to the target pinyin probability.
In the technical solution provided by the present application, a labeled Chinese sentence to be processed is acquired together with its word representation vector set and polyphonic word representation vector, the sentence including a target polyphonic word; word segmentation is performed on the sentence to obtain a target word segmentation, and the word representation vector set is converted into a word-level feature representation vector according to the target word segmentation; attention-based splicing is performed on the polyphonic word representation vector and the word-level feature representation vector to obtain a target vector; and the target pinyin probability of the target vector is calculated through a preset linear layer, and the target pronunciation of the target polyphonic word is determined according to the target pinyin probability.
In the embodiments of the present application, converting the word representation vector set into a word-level feature representation vector according to the target word segmentation turns character-level features into word-level features, which avoids the out-of-vocabulary problem and effectively improves the accuracy of polyphonic word pronunciation prediction. By performing attention-based splicing on the polyphonic word representation vector and the word-level feature representation vector, calculating the target pinyin probability of the target vector through the preset linear layer, and determining the target pronunciation according to that probability, the target word segmentation and the attention mechanism are combined to predict the pronunciation of the target polyphonic word without any rules or manual feature design. This reduces the impact of labeling errors in word segmentation, accurately captures the textual semantic information of the Chinese sentence to be processed, and improves the accuracy of predicting the pronunciation of polyphonic words.
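The final step of the solution (a linear layer that yields pinyin probabilities, from which the target pronunciation is chosen) can be sketched as below. This is an illustrative sketch only: the use of softmax to turn the linear layer's outputs into probabilities, the choice of the highest-probability pinyin, the toy weights, and the names `linear`, `softmax` and `predict_pinyin` are assumptions that the text summarized here does not spell out.

```python
import math


def linear(x, weights, bias):
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_pinyin(target_vec, weights, bias, candidates):
    """Return the candidate pinyin with the highest probability (assumed
    selection rule) together with the full probability list."""
    probs = softmax(linear(target_vec, weights, bias))
    best = max(range(len(candidates)), key=probs.__getitem__)
    return candidates[best], probs


# Toy example: two candidate readings of the character 行.
pinyin, probs = predict_pinyin(
    [1.0, 0.0],
    weights=[[2.0, 0.0], [0.0, 1.0]],
    bias=[0.0, 0.0],
    candidates=["hang2", "xing2"],
)
```

In practice the weights would be learned jointly with the encoder and attention parameters; the hand-set values here only make the example deterministic.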
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an embodiment of the method for predicting the pronunciation of polyphonic words in the embodiments of the present application;
FIG. 2 is a schematic diagram of another embodiment of the method for predicting the pronunciation of polyphonic words in the embodiments of the present application;
FIG. 3 is a schematic diagram of an embodiment of the apparatus for predicting the pronunciation of polyphonic words in the embodiments of the present application;
FIG. 4 is a schematic diagram of another embodiment of the apparatus for predicting the pronunciation of polyphonic words in the embodiments of the present application;
FIG. 5 is a schematic diagram of an embodiment of the device for predicting the pronunciation of polyphonic words in the embodiments of the present application.
Detailed Description
The embodiments of the present application provide a method, apparatus, device and storage medium for predicting the pronunciation of polyphonic words, which improve the accuracy of predicting the pronunciation of polyphonic words.
The terms "first", "second", "third", "fourth", etc. (if any) in the description and claims of the present application and in the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments described herein can be practiced in sequences other than those illustrated or described herein. Furthermore, the terms "comprising" and "having", and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device comprising a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
For ease of understanding, the specific flow of the embodiments of the present application is described below. Referring to FIG. 1, an embodiment of the method for predicting the pronunciation of polyphonic words in the embodiments of the present application includes:
101. Acquire a labeled Chinese sentence to be processed, and acquire a word representation vector set and a polyphonic word representation vector of the Chinese sentence to be processed, where the Chinese sentence to be processed includes a target polyphonic word.
It can be understood that the execution subject of the present application may be an apparatus for predicting the pronunciation of polyphonic words, or may be a terminal or a server, which is not specifically limited here. The embodiments of the present application are described by taking a server as the execution subject.
The server receives an initial Chinese sentence sent from a preset interface, performs data cleaning on the initial Chinese sentence to obtain a candidate Chinese sentence, and obtains a pre-created polyphonic word label. The polyphonic word label may be a label created from the polyphonic words in at least one of a general dictionary, a business domain dictionary and user portrait labels, which improves the generality and accuracy of multi-domain polyphonic word labeling; drawing on the interests and hobbies recorded in the user portrait labels further improves labeling accuracy. The polyphonic word label includes the polyphonic word and its pronunciation based on semantic information. The server identifies the business domain and user information of the candidate Chinese sentence, calls the corresponding polyphonic word label based on the business domain and user information, uses that label to identify the target polyphonic word in the candidate Chinese sentence, and labels the target polyphonic word, thereby obtaining the labeled Chinese sentence to be processed.
After obtaining the labeled Chinese sentence to be processed, the server calls pre-trained word vectors and a preset word vector conversion algorithm to convert the words (characters) of the sentence into vectors, obtaining the word representation vector set; according to the labeled target polyphonic word, it then extracts the corresponding representation vector from the word representation vector set, obtaining the polyphonic word representation vector. Alternatively, the server extracts the target polyphonic word from the labeled Chinese sentence to be processed, and calls the pre-trained word vectors and the preset word vector conversion algorithm to convert the words of the sentence and the target polyphonic word separately, obtaining the word representation vector set and the polyphonic word representation vector. There may be one or more target polyphonic words.
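The first alternative above (convert every character to a vector, then pick out the vectors at the labeled positions) can be sketched as follows. The lookup table stands in for the pre-trained word vectors, and the zero-vector fallback for unseen characters, the function names and the toy two-dimensional vectors are all assumptions made for illustration.

```python
def encode_sentence(sentence, char_vectors, dim):
    """Map each character to its vector; unseen characters fall back to a
    zero vector (an assumed policy, not stated in the source)."""
    unknown = [0.0] * dim
    return [list(char_vectors.get(ch, unknown)) for ch in sentence]


def extract_polyphone_vectors(vector_set, positions):
    """Pick the representation vectors at the labeled polyphonic positions."""
    return [vector_set[i] for i in positions]


# Toy table standing in for pre-trained vectors (hypothetical values).
table = {"银": [1.0, 0.0], "行": [0.0, 1.0]}
vector_set = encode_sentence("银行", table, dim=2)
# The target polyphonic word 行 is labeled at position 1.
polyphone_vectors = extract_polyphone_vectors(vector_set, [1])
```

Passing a list of positions mirrors the statement that a sentence may contain one or more target polyphonic words.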
102. Perform word segmentation on the Chinese sentence to be processed to obtain a target word segmentation, and convert the word representation vector set into a word-level feature representation vector according to the target word segmentation.
The server calls a preset word segmentation tool, such as jieba, HanLP (a Chinese language processing package) or another tool, to segment the Chinese sentence to be processed in the order of the original sentence, obtaining initial word segments; alternatively, the server calls a preset dictionary-based or statistics-based Chinese word segmentation algorithm to segment the sentence in the order of the original sentence, obtaining initial word segments. The initial word segments are then spliced according to preset word splicing rules to obtain the target word segmentation; there may be one or more initial word segments and one or more target word segments. According to the target word segmentation, the server classifies the word representation vectors in the word representation vector set to obtain the word representation vector group corresponding to each target word segment, and splices the word representation vector group of each target word segment to obtain the word-level feature representation vectors. There may be one or more word-level feature representation vectors, and each target word segment corresponds to one word-level feature representation vector.
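The grouping of character vectors by word segment can be sketched as below. Two assumptions are made loudly here: the segmentation is passed in as a ready-made list of words (no segmentation tool is called), and each group is reduced with a blend of max and mean pooling, borrowing the "mixed pooling" mentioned for the conversion module rather than the splicing described in this paragraph. The blend weight `alpha` and the function names are hypothetical.

```python
def group_by_words(char_vectors, words):
    """Split the per-character vectors into one group per segmented word."""
    groups, start = [], 0
    for word in words:
        groups.append(char_vectors[start:start + len(word)])
        start += len(word)
    if start != len(char_vectors):
        raise ValueError("segmentation does not cover the sentence exactly")
    return groups


def mixed_pool(group, alpha=0.5):
    """Blend max pooling and mean pooling per dimension (assumed definition)."""
    dim = len(group[0])
    return [
        alpha * max(v[d] for v in group)
        + (1.0 - alpha) * sum(v[d] for v in group) / len(group)
        for d in range(dim)
    ]


def word_level_features(char_vectors, words, alpha=0.5):
    """One fixed-size feature vector per target word segment."""
    return [mixed_pool(g, alpha) for g in group_by_words(char_vectors, words)]
```

Pooling gives every word a vector of the same size regardless of how many characters it contains, which is convenient for the attention step that follows; plain concatenation would instead produce vectors whose length depends on word length.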
103. Splice the polyphonic character representation vector and the word-level feature representation vectors using an attention mechanism to obtain a target vector.
The server may use a preset attention mechanism to compute a polyphonic-character attention value for the polyphonic character representation vector and multiply that value by the vector to obtain a polyphonic character vector matrix; compute word attention values of the word-level feature representation vectors with respect to the polyphonic character representation vector to obtain a word vector matrix; and then add or multiply the two matrices to obtain the target vector. Alternatively, the server may use the preset attention mechanism to compute a first attention value of the polyphonic character representation vector relative to the word-level feature representation vectors and a second attention value of the word-level feature representation vectors relative to the polyphonic character representation vector; multiply the first attention value by the word-level feature representation vectors to obtain a first vector; multiply the second attention value by the polyphonic character representation vector to obtain a second vector; and add or multiply the first and second vectors as matrices to obtain the target vector.
104. Compute the target pinyin probability of the target vector through a preset linear layer, and determine the target pronunciation of the target polyphonic character according to the target pinyin probability.
The preset linear layer may comprise multiple layers, each corresponding to one classifier; that is, the linear layer includes multiple classifiers. Using these classifiers, the server separately performs pinyin classification and probability calculation on the target vector, obtaining the pinyin probabilities output by each classifier. The probabilities from the classifiers are combined by weighted summation to obtain one or more initial pinyin probabilities for the target vector. The initial pinyin probabilities are then compared with a preset threshold and with one another to obtain the target pinyin probability, and the pinyin corresponding to the target pinyin probability is determined as the target pronunciation of the target polyphonic character.
For example, suppose the classifiers are classifier 1, classifier 2 and classifier 3. Classifier 1 performs pinyin classification and probability calculation on the target vector and obtains probability A1 for pinyin 1 and probability A2 for pinyin 2; classifier 2 likewise obtains B1 for pinyin 1 and B2 for pinyin 2; classifier 3 obtains C1 for pinyin 1 and C2 for pinyin 2. The weighted sum of A1, B1 and C1 gives initial pinyin probability 1 of the target vector for pinyin 1, and the weighted sum of A2, B2 and C2 gives initial pinyin probability 2 for pinyin 2. If exactly one of initial pinyin probability 1 and initial pinyin probability 2 is greater than the preset threshold, that probability is determined as the target pinyin probability. If both are greater than the preset threshold, the larger of the two is determined as the target pinyin probability. If both are less than or equal to the preset threshold, the initial pinyin probabilities are recalculated. After obtaining the target pinyin probability, the server determines the pinyin corresponding to it as the target pronunciation of the target polyphonic character.
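The classifier example above can be illustrated with a minimal Python sketch. The function name, the weights, the threshold and the pinyin labels are all hypothetical; the text does not specify how the weights are chosen or how recalculation is carried out, so the sketch simply returns None in that case.

```python
def combine_classifiers(per_classifier_probs, weights, threshold):
    """Weighted-sum ensemble followed by the threshold rule described above.

    per_classifier_probs: one dict per classifier, mapping pinyin -> probability.
    weights, threshold: hypothetical values, not given in the source text.
    Returns (pinyin, probability), or None when no initial pinyin probability
    exceeds the threshold (the case in which the text recalculates).
    """
    initial = {}
    for probs, weight in zip(per_classifier_probs, weights):
        for pinyin, p in probs.items():
            initial[pinyin] = initial.get(pinyin, 0.0) + weight * p
    # Keep only the initial pinyin probabilities above the preset threshold.
    above = {pinyin: p for pinyin, p in initial.items() if p > threshold}
    if not above:
        return None
    # If several exceed the threshold, the largest one is the target.
    best = max(above, key=above.get)
    return best, above[best]


# Three classifiers scoring two candidate pinyin (illustrative labels).
outputs = [
    {"zhe2": 0.7, "she2": 0.3},
    {"zhe2": 0.6, "she2": 0.4},
    {"zhe2": 0.8, "she2": 0.2},
]
result = combine_classifiers(outputs, weights=[0.5, 0.25, 0.25], threshold=0.5)
```

Here only one candidate exceeds the threshold, so it is selected directly; with a higher threshold the function returns None to signal that recalculation is needed.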
In another embodiment, the server matches the Chinese sentence to be processed and the target polyphonic character against the initial historical polyphonic character information stored in a preset database to obtain the corresponding target historical polyphonic character information, which includes a target historical Chinese sentence, the historical polyphonic character in that sentence, and the pronunciation of the historical polyphonic character. The server computes the similarity between the target pronunciation of the target polyphonic character and the pronunciation of the historical polyphonic character, subtracts the similarity from 1 to obtain a target value, and judges whether the target value is less than a preset similarity value. If so, the target pronunciation of the target polyphonic character is taken as the final target pronunciation; if not, the pronunciation of the historical polyphonic character is determined as the target pronunciation of the target polyphonic character.
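The check against historical pronunciations might look like the following Python sketch. The text does not say how pronunciation similarity is measured; the string similarity from the standard library's difflib is used here purely as a stand-in, and the function name and preset value are hypothetical.

```python
from difflib import SequenceMatcher


def resolve_with_history(predicted, historical, preset_value):
    """Keep the predicted pronunciation only if it is close to the historical one.

    Similarity is an assumption: a character-level ratio between pinyin strings.
    """
    similarity = SequenceMatcher(None, predicted, historical).ratio()
    target_value = 1.0 - similarity  # the difference between 1 and the similarity
    if target_value < preset_value:
        return predicted
    return historical


final = resolve_with_history("zhe2", "zhe2", preset_value=0.2)
```

When the two pronunciations agree, the target value is 0 and the predicted pronunciation is kept; when they differ strongly, the historical pronunciation replaces it.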
In this embodiment of the present application, converting the character representation vector set into word-level feature representation vectors according to the target segmented words turns character features into word-level features, which avoids the out-of-vocabulary (unregistered) word problem and thus effectively improves the accuracy of polyphonic character pronunciation prediction. The polyphonic character representation vector and the word-level feature representation vectors are spliced using an attention mechanism, the target pinyin probability of the target vector is computed through a preset linear layer, and the target pronunciation of the target polyphonic character is determined according to that probability. By combining the target segmented words with the attention mechanism, the pronunciation of the target polyphonic character is predicted without any rules or hand-crafted feature design, the impact of labeling errors in word segmentation is reduced, the textual semantic information of the Chinese sentence to be processed is captured accurately, and the accuracy of polyphonic character pronunciation prediction is improved.
Referring to FIG. 2, another embodiment of the method for predicting the pronunciation of polyphonic characters in the embodiments of the present application includes:
201. Acquire an annotated Chinese sentence to be processed, and acquire a character representation vector set and a polyphonic character representation vector of the sentence, where the Chinese sentence to be processed includes a target polyphonic character.
Specifically, the server acquires an initial Chinese sentence, the target polyphonic character in the initial Chinese sentence, and the polyphonic character position information corresponding to the target polyphonic character; annotates the target polyphonic character in the initial Chinese sentence according to the position information to obtain the Chinese sentence to be processed; and performs character vector encoding followed by polyphonic character vector extraction on the Chinese sentence to be processed to obtain the character representation vector set and the polyphonic character representation vector.
The server receives the initial Chinese sentence sent from a preset interface, calls a pre-built polyphonic character dictionary to identify polyphonic characters in the initial sentence, obtaining the target polyphonic character, and extracts the position of the target polyphonic character in the initial sentence (the polyphonic character position information). It then annotates the target polyphonic character at that position in the initial sentence. The annotation includes the target polyphonic character and its position information, and may also include the pronunciation of the target polyphonic character in a Chinese sentence corresponding to the initial sentence; such a corresponding sentence can be matched by computing a weighted sum of semantic similarity, emotional similarity and sentence-pattern similarity. The result is the Chinese sentence to be processed.
The server calls a preset supervised neural network encoder and/or an unsupervised pre-trained network encoder to perform character vector encoding on the Chinese sentence to be processed, obtaining the character representation vector set, and extracts from that set the polyphonic character representation vector corresponding to the target polyphonic character.
Specifically, the server encodes each character in the Chinese sentence to be processed through a preset deep neural network encoder to obtain the character representation vector set, in which each character representation vector corresponds to one character; according to the polyphonic character position information, it extracts the representation vector corresponding to the target polyphonic character from the set, obtaining the polyphonic character representation vector.
The server calls the deep neural network encoder among the preset supervised neural network encoders. The deep neural network encoder may include, but is not limited to, at least one of a long short-term memory (LSTM) model and a Bidirectional Encoder Representations from Transformers (BERT) model. Following the order of the characters in the Chinese sentence to be processed, the encoder encodes each character using contextual semantic information to obtain a representation vector for each character, that is, the character representation vector set. The representation vector at the position given by the polyphonic character position information is then extracted from the set to obtain the polyphonic character representation vector. For example, if the Chinese sentence to be processed is "所有商品都打折出售" ("All goods are sold at a discount") and the polyphonic character position information indicates the seventh character of the sentence, the seventh character representation vector is extracted from the set, giving the polyphonic character representation vector corresponding to the target polyphonic character.
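The extraction step in this example can be sketched in Python as follows. The encoder itself is not reproduced: the toy vectors below merely stand in for the output of an LSTM or BERT encoder, and the function name is hypothetical.

```python
def extract_polyphone_vector(char_vectors, position):
    """Return the vector of the target polyphonic character.

    position is 1-based, matching the wording "the seventh character".
    """
    return char_vectors[position - 1]


sentence = "所有商品都打折出售"
# Toy stand-in for encoder output: one 2-dimensional vector per character.
char_vectors = [[float(i), float(ord(ch) % 7)] for i, ch in enumerate(sentence)]
# The seventh character of the sentence is "折", the target polyphonic character.
poly_vec = extract_polyphone_vector(char_vectors, 7)
```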
202. Perform word segmentation on the Chinese sentence to be processed to obtain target segmented words, and convert the character representation vector set into word-level feature representation vectors according to the target segmented words.
Specifically, the server segments the Chinese sentence to be processed to obtain the target segmented words; divides the character representation vector set according to the target segmented words to obtain a representation vector group for each word; and applies a preset mixed pooling layer to each word's representation vector group to obtain the word-level feature representation vectors.
The server calls a preset Chinese word segmentation algorithm to segment the Chinese sentence to be processed, obtaining initial segmented words; performs part-of-speech detection and phrase detection on the initial segmented words; and determines those that pass the detection as the target segmented words. The Chinese word segmentation algorithm combines an N-Gram model and a bi-directional maximum matching (bi-direction matching method, BM) model: the output of the N-Gram model may be the input of the BM model, or the output of the BM model may be the input of the N-Gram model, or the two models may be connected in parallel.
The server divides the character representation vector set according to the target segmented words to obtain a representation vector group for each word. For example, if the Chinese sentence to be processed is "所有商品都打折出售", the corresponding target segmented words are "所有" (all), "商品" (goods), "都" (all/both), "打折" (discount) and "出售" (sell). Taking "打折" as an example, its representation vector group consists of the representation vector of "打" and the representation vector of "折"; the same applies to the other words.
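A minimal Python sketch of this division step follows. It assumes the segmentation is already available as a list of words that covers the sentence in order; the function name and the one-dimensional toy vectors are hypothetical.

```python
def group_by_segments(char_vectors, segments):
    """Split per-character vectors into one group per segmented word."""
    groups, start = [], 0
    for word in segments:
        end = start + len(word)
        groups.append(char_vectors[start:end])
        start = end
    if start != len(char_vectors):
        raise ValueError("segments do not cover the sentence")
    return groups


segments = ["所有", "商品", "都", "打折", "出售"]
char_vectors = [[i] for i in range(9)]  # toy vectors, one per character
groups = group_by_segments(char_vectors, segments)
# groups[3] holds the vectors of "打" and "折", the group for "打折".
```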
The preset mixed pooling layer is a pooling layer that combines max pooling and average pooling. The server calls the preset mixed pooling layer to perform mixed pooling on each word's representation vector group, obtaining the word-level feature representation vectors; for example, the representation vector of "打" and the representation vector of "折" in the word's representation vector group are fused to obtain the word-level feature representation vector of "打折". The server may proceed in any of the following ways. First, it may apply the max-pooling kernel or max-pooling layer in the mixed pooling layer to each word's representation vector group to obtain a first word representation vector group, and then apply the average-pooling kernel or average-pooling layer to the first word representation vector group to obtain the word-level feature representation vectors. Second, it may apply the max-pooling kernel or max-pooling layer to each word's representation vector group to obtain a first word representation vector group, apply the average-pooling kernel or average-pooling layer to each word's representation vector group to obtain a second word representation vector group, and fuse the first and second word representation vector groups to obtain the word-level feature representation vectors. Third, the server may create in advance a mixed pooling layer that fuses a max-pooling kernel and an average-pooling kernel, and apply it to each word's representation vector group to obtain the word-level feature representation vectors. There may be one or more word-level feature representation vectors, and each target segmented word corresponds to exactly one of them.
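The second variant above (max pooling and average pooling applied separately, then fused) can be sketched in Python. The fusion rule is not specified in the text; a convex combination with a hypothetical mixing coefficient alpha is assumed here.

```python
def mixed_pool(group, alpha=0.5):
    """Fuse max pooling and average pooling over one word's character vectors.

    alpha is a hypothetical mixing coefficient: 1.0 gives pure max pooling,
    0.0 gives pure average pooling.
    """
    dims = range(len(group[0]))
    max_pooled = [max(vec[d] for vec in group) for d in dims]
    avg_pooled = [sum(vec[d] for vec in group) / len(group) for d in dims]
    return [alpha * m + (1 - alpha) * a for m, a in zip(max_pooled, avg_pooled)]


# Toy vectors standing in for the representations of "打" and "折".
word_vec = mixed_pool([[1.0, 4.0], [3.0, 2.0]])
```

Each word, whatever its length in characters, is reduced to a single vector with the same dimension as the character vectors.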
203. Through a preset feed-forward attention mechanism, perform attention calculation on the polyphonic character representation vector and the word-level feature representation vectors to obtain an attention vector.
Using the preset feed-forward attention mechanism, the server computes attention values between the polyphonic character representation vector and the word-level feature representation vectors, and uses those attention values to form a weighted sum of the polyphonic character representation vector and the word-level feature representation vectors, obtaining the attention vector. Alternatively, using the preset feed-forward attention mechanism, the server computes the attention value of the polyphonic character representation vector relative to the word-level feature representation vectors, multiplies that attention value by the polyphonic character representation vector to obtain a polyphonic character representation vector matrix, and adds or multiplies this matrix with the word-level feature representation vectors to obtain the attention vector.
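One way to picture the attention calculation is the following Python sketch. The scoring function is an assumption (a dot product between the polyphonic character vector and each word vector, passed through tanh); a trained feed-forward attention layer would use learned parameters, which the text does not give. The sketch also weights only the word-level vectors, which is one reading of the weighted sum described above.

```python
import math


def attention_vector(poly_vec, word_vecs):
    """Weight the word-level vectors by their relevance to the polyphonic character."""
    # Assumed scoring: tanh of the dot product with the polyphonic character vector.
    scores = [math.tanh(sum(p * w for p, w in zip(poly_vec, vec))) for vec in word_vecs]
    # Softmax turns scores into attention weights that sum to 1.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(word_vecs[0])
    context = [sum(wt * vec[d] for wt, vec in zip(weights, word_vecs)) for d in range(dim)]
    return context, weights


context, weights = attention_vector([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

The word whose vector aligns best with the polyphonic character vector receives the largest weight.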
204. Splice the attention vector with the polyphonic character representation vector to obtain the target vector.
After obtaining the attention vector, the server multiplies or adds the attention vector and the polyphonic character representation vector as matrices to obtain the target vector; alternatively, the server computes a weighted sum of the attention vector and the polyphonic character representation vector to obtain the target vector. The target vector obtained through the preset feed-forward attention mechanism indicates which word in the Chinese sentence to be processed carries more important information for the target polyphonic character and therefore needs a larger weight, which improves the accuracy of the contextual semantic fusion for the target polyphonic character.
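Two simple ways of combining the vectors are sketched below in Python: plain concatenation, which is the literal sense of splicing, and the weighted sum mentioned in the text. The function names and the weight w are hypothetical, and the weighted sum assumes both vectors have the same dimension.

```python
def splice_concat(attention_vec, poly_vec):
    """Concatenate the two vectors end to end."""
    return list(attention_vec) + list(poly_vec)


def splice_weighted_sum(attention_vec, poly_vec, w=0.5):
    """Element-wise weighted sum; w is a hypothetical weight for the attention vector."""
    return [w * a + (1 - w) * p for a, p in zip(attention_vec, poly_vec)]


target_concat = splice_concat([1.0, 2.0], [3.0, 4.0])
target_sum = splice_weighted_sum([1.0, 2.0], [3.0, 4.0])
```

Concatenation doubles the dimension seen by the linear layer, whereas the weighted sum keeps it unchanged.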
205. Compute the target pinyin probability of the target vector through a preset linear layer, and determine the target pronunciation of the target polyphonic character according to the target pinyin probability.
Specifically, the server computes, through the preset linear layer, the probability of the target vector for each pinyin, obtaining a polyphonic character pinyin probability value set; sorts the pinyin probability values in the set in descending order and determines the first-ranked value as the target pinyin probability; and determines the pinyin corresponding to the target pinyin probability as the target pronunciation of the target polyphonic character.
For example, with a single linear layer, the server inputs the target vector into the preset linear layer, which computes the probability of the target vector for each pinyin, yielding a polyphonic character pinyin probability value set consisting of pinyin probability value 1 and pinyin probability value 2. These values are sorted in descending order, giving the sequence "pinyin probability value 2, pinyin probability value 1". Since pinyin probability value 2 ranks first, it is the target pinyin probability, and the pinyin corresponding to it is determined as the target pronunciation of the target polyphonic character.
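The single-layer case can be sketched in Python as below. The text does not state how the linear layer's outputs become probabilities; a softmax is assumed here, and the weights, biases and pinyin labels are illustrative values rather than trained parameters.

```python
import math


def predict_pinyin(target_vec, weight_rows, biases, pinyin_labels):
    """Linear layer, softmax (an assumption), then descending sort."""
    logits = [
        sum(w * x for w, x in zip(row, target_vec)) + b
        for row, b in zip(weight_rows, biases)
    ]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(pinyin_labels, probs), key=lambda item: item[1], reverse=True)
    return ranked[0], ranked


best, ranked = predict_pinyin(
    target_vec=[1.0, 2.0],
    weight_rows=[[1.0, 0.0], [0.0, 1.0]],
    biases=[0.0, 0.0],
    pinyin_labels=["zhe2", "she2"],
)
```

The first element of the ranked list plays the role of the target pinyin probability, and its label is the predicted pronunciation.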
Specifically, after computing the target pinyin probability of the target vector through the preset linear layer and determining the target pronunciation of the target polyphonic character according to the target pinyin probability, the server obtains the error value of the target pronunciation relative to the annotated pronunciation and optimizes the acquisition strategy for the target pronunciation according to the error value. The acquisition strategy includes the execution process, the algorithms and the network structure used to obtain the target pronunciation.
The server obtains the annotated pronunciation of the target polyphonic character, which is the pronunciation of the target polyphonic character in a sentence corresponding to the semantics and emotion of the Chinese sentence to be processed. The annotated pronunciation may be produced by manual annotation or by a pre-trained polyphonic character annotation model. The server computes the pronunciation similarity between the target pronunciation and the annotated pronunciation of the target polyphonic character, and computes the difference between the pronunciation similarity and 1 to obtain the error value of the target pronunciation relative to the annotated pronunciation. The error value is used to adjust the execution process for obtaining the target pronunciation and to optimize the network structure used to obtain it. The network structure includes the neural network structure and the model parameters, and the corresponding processing functions may include generation of representation vectors, extraction of representation vectors, word segmentation, and pinyin probability calculation in the linear layer. The error value is also used to add or remove algorithms used to obtain the target pronunciation or to adjust their execution order. Optimizing the acquisition strategy for the target pronunciation according to the error value improves the accuracy of polyphonic character pronunciation prediction.
In this embodiment of the present application, converting the character representation vector set into word-level feature representation vectors according to the target segmented words turns character features into word-level features, which avoids the out-of-vocabulary (unregistered) word problem and thus effectively improves the accuracy of polyphonic character pronunciation prediction. The polyphonic character representation vector and the word-level feature representation vectors are spliced using an attention mechanism, the target pinyin probability of the target vector is computed through a preset linear layer, and the target pronunciation of the target polyphonic character is determined according to that probability. By combining the target segmented words with the attention mechanism, the pronunciation of the target polyphonic character is predicted without any rules or hand-crafted feature design, the impact of labeling errors in word segmentation is reduced, the textual semantic information of the Chinese sentence to be processed is captured accurately, and the accuracy of polyphonic character pronunciation prediction is improved.
The method for predicting the pronunciation of polyphonic characters in the embodiments of the present application has been described above; the apparatus for predicting the pronunciation of polyphonic characters is described below. Referring to FIG. 3, one embodiment of the apparatus in the embodiments of the present application includes:
an acquisition module 301, configured to acquire an annotated Chinese sentence to be processed, and to acquire a character representation vector set and a polyphonic character representation vector of the sentence, where the Chinese sentence to be processed includes a target polyphonic character;
a conversion module 302, configured to perform word segmentation on the Chinese sentence to be processed to obtain target segmented words, and to convert the character representation vector set into word-level feature representation vectors according to the target segmented words;
a splicing module 303, configured to splice the polyphonic character representation vector and the word-level feature representation vectors using an attention mechanism to obtain a target vector;
a determination module 304, configured to compute the target pinyin probability of the target vector through a preset linear layer, and to determine the target pronunciation of the target polyphonic character according to the target pinyin probability.
The functions of the modules in the above apparatus for predicting the pronunciation of polyphonic characters correspond to the steps in the above method embodiments; their functions and implementation are not repeated here.
In this embodiment of the present application, converting the character representation vector set into word-level feature representation vectors according to the target segmented words turns character features into word-level features, which avoids the out-of-vocabulary (unregistered) word problem and thus effectively improves the accuracy of polyphonic character pronunciation prediction. The polyphonic character representation vector and the word-level feature representation vectors are spliced using an attention mechanism, the target pinyin probability of the target vector is computed through a preset linear layer, and the target pronunciation of the target polyphonic character is determined according to that probability. By combining the target segmented words with the attention mechanism, the pronunciation of the target polyphonic character is predicted without any rules or hand-crafted feature design, the impact of labeling errors in word segmentation is reduced, the textual semantic information of the Chinese sentence to be processed is captured accurately, and the accuracy of polyphonic character pronunciation prediction is improved.
Referring to FIG. 4, another embodiment of the apparatus for predicting the pronunciation of polyphonic characters in the embodiments of the present application includes:
an acquisition module 301, configured to acquire an annotated Chinese sentence to be processed, and to acquire a character representation vector set and a polyphonic character representation vector of the sentence, where the Chinese sentence to be processed includes a target polyphonic character;
a conversion module 302, configured to perform word segmentation on the Chinese sentence to be processed to obtain target segmented words, and to convert the character representation vector set into word-level feature representation vectors according to the target segmented words;
a splicing module 303, configured to splice the polyphonic character representation vector and the word-level feature representation vectors using an attention mechanism to obtain a target vector;
where the splicing module 303 specifically includes:
a calculation unit 3031, configured to perform attention calculation on the polyphonic character representation vector and the word-level feature representation vectors through a preset feed-forward attention mechanism to obtain an attention vector;
a splicing unit 3032, configured to splice the attention vector with the polyphonic character representation vector to obtain the target vector;
a determination module 304, configured to compute the target pinyin probability of the target vector through a preset linear layer, and to determine the target pronunciation of the target polyphonic character according to the target pinyin probability.
Optionally, the conversion module 302 may be further specifically configured to:
Perform word segmentation on the Chinese sentence to be processed to obtain the target word segmentation;
Divide the character representation vector set according to the target word segmentation to obtain a representation vector group for each word;
Perform mixed pooling on the representation vector group of each word through a preset mixed pooling layer to obtain the word-level feature representation vector.
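The three steps above can be sketched as follows. The text does not define the mixed pooling layer here, so this sketch assumes one common form, a weighted blend of max pooling and average pooling controlled by a hypothetical coefficient `lam`; the function names are likewise illustrative.

```python
def split_by_segmentation(char_vectors, words):
    """Group per-character vectors according to the segmented words.
    Each word is assumed to be a non-empty string of characters."""
    groups, start = [], 0
    for word in words:
        end = start + len(word)
        groups.append(char_vectors[start:end])
        start = end
    if start != len(char_vectors):
        raise ValueError("segmentation does not cover the character vectors")
    return groups

def mixed_pool(group, lam=0.5):
    """Blend max pooling and average pooling over one word's vectors."""
    dim = len(group[0])
    max_part = [max(vec[d] for vec in group) for d in range(dim)]
    mean_part = [sum(vec[d] for vec in group) / len(group) for d in range(dim)]
    return [lam * mx + (1.0 - lam) * mn for mx, mn in zip(max_part, mean_part)]

def word_level_vectors(char_vectors, words, lam=0.5):
    """One pooled vector per segmented word."""
    return [mixed_pool(g, lam) for g in split_by_segmentation(char_vectors, words)]
```

Because every word is pooled from the vectors of its own characters, a word that never appeared in a vocabulary still receives a representation, which is how the character-to-word conversion sidesteps the out-of-vocabulary problem.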
Optionally, the determining module 304 may be further specifically configured to:
Calculate, through the preset linear layer, the probability of the target vector for each pinyin to obtain a polyphonic character pinyin probability value set;
Sort the pinyin probability values in the polyphonic character pinyin probability value set in descending order, and determine the first-ranked pinyin probability value as the target pinyin probability;
Determine the pinyin corresponding to the target pinyin probability as the target pronunciation of the target polyphonic character.
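A minimal sketch of these steps is shown below. It assumes that the linear layer output is normalized with a softmax, which the text does not state explicitly; the weights, biases, pinyin labels and function names are hypothetical.

```python
import math

def pinyin_probabilities(target_vector, weights, biases, pinyin_labels):
    """Linear layer followed by a softmax (the softmax is an assumption)."""
    logits = [sum(w * x for w, x in zip(row, target_vector)) + b
              for row, b in zip(weights, biases)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(pinyin_labels, exps)}

def predict_pronunciation(target_vector, weights, biases, pinyin_labels):
    """Sort the probabilities in descending order and keep the first entry."""
    probs = pinyin_probabilities(target_vector, weights, biases, pinyin_labels)
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return ranked[0]  # (target pronunciation, target pinyin probability)
```

Sorting mirrors the wording of the text; taking the maximum directly would give the same result without ordering the whole set.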
Optionally, the obtaining module 301 includes:
The obtaining unit 3011, configured to obtain an initial Chinese sentence, the target polyphonic character in the initial Chinese sentence, and the polyphonic character position information corresponding to the target polyphonic character;
The labeling unit 3012, configured to label the target polyphonic character in the initial Chinese sentence according to the polyphonic character position information to obtain the Chinese sentence to be processed;
The encoding and extraction unit 3013, configured to perform character vector encoding and then polyphonic character vector extraction on the Chinese sentence to be processed to obtain the character representation vector set and the polyphonic character representation vector.
Optionally, the encoding and extraction unit 3013 may be further specifically configured to:
Encode each character in the Chinese sentence to be processed through a preset deep neural network encoder to obtain the character representation vector set, where each character representation vector corresponds to one character;
Extract, according to the polyphonic character position information, the representation vector corresponding to the target polyphonic character from the character representation vector set to obtain the polyphonic character representation vector.
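The encoding and extraction steps can be illustrated as follows. The `toy_char_encoder` function is a hypothetical, context-free stand-in for the preset deep neural network encoder, used only so that the example runs; the position is assumed to be a zero-based character index.

```python
def toy_char_encoder(sentence, dim=4):
    """Hypothetical stand-in for the deep neural network encoder:
    returns one deterministic vector per character."""
    return [[((ord(ch) * (k + 1)) % 97) / 97.0 for k in range(dim)]
            for ch in sentence]

def extract_polyphone_vector(char_vectors, position):
    """Pick the vector of the target polyphonic character by its position."""
    if not 0 <= position < len(char_vectors):
        raise IndexError("polyphone position is outside the sentence")
    return char_vectors[position]

sentence = "银行行长"
char_vectors = toy_char_encoder(sentence)
polyphone_vector = extract_polyphone_vector(char_vectors, 1)
```

A real contextual encoder would give the two occurrences of 行 in this sentence different vectors, which is what allows the later layers to tell their pronunciations apart; the toy encoder above does not.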
Optionally, the apparatus for predicting the pronunciation of polyphonic characters further includes:
The optimization module 305, configured to obtain an error value of the target pronunciation relative to the annotated pronunciation, and to optimize the acquisition strategy for the target pronunciation according to the error value, where the acquisition strategy includes the execution process, algorithm and network structure used to obtain the target pronunciation.
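One possible error value is sketched below. The text does not specify how the error is computed, so the cross-entropy used here is an assumption, as are the function name and the `eps` guard against taking the logarithm of zero.

```python
import math

def cross_entropy_error(probabilities, annotated_pinyin, eps=1e-12):
    """Negative log-probability assigned to the annotated pronunciation."""
    p = probabilities.get(annotated_pinyin, 0.0)
    return -math.log(max(p, eps))
```

The error is zero when the annotated pinyin receives probability 1 and grows as that probability shrinks, so it can serve as the signal for adjusting the model.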
The functions of the modules and units in the above apparatus for predicting the pronunciation of polyphonic characters correspond to the steps in the above embodiment of the method for predicting the pronunciation of polyphonic characters, and their functions and implementation processes are not repeated here.
In this embodiment of the present application, the character representation vector set is converted into a word-level feature representation vector according to the target word segmentation. Converting character features into word-level features avoids the out-of-vocabulary word problem and thereby effectively improves the accuracy of predicting the pronunciation of polyphonic characters. The polyphonic character representation vector and the word-level feature representation vector are spliced based on an attention mechanism, the target pinyin probability of the target vector is calculated through a preset linear layer, and the target pronunciation of the target polyphonic character is determined according to the target pinyin probability. By combining the target word segmentation with the attention mechanism, the pronunciation of the target polyphonic character is predicted without any rules or manual feature design. This reduces the impact of labeling errors in word segmentation, accurately captures the textual semantic information of the Chinese sentence to be processed, and improves the accuracy of predicting the pronunciation of polyphonic characters.
FIG. 3 and FIG. 4 above describe in detail the apparatus for predicting the pronunciation of polyphonic characters in the embodiments of the present application from the perspective of modular functional entities. The following describes in detail the device for predicting the pronunciation of polyphonic characters in the embodiments of the present application from the perspective of hardware processing.
FIG. 5 is a schematic structural diagram of a device for predicting the pronunciation of polyphonic characters provided by an embodiment of the present application. The device 500 for predicting the pronunciation of polyphonic characters may vary considerably depending on configuration or performance, and may include one or more processors (central processing units, CPU) 510, a memory 520, and one or more storage media 530 (for example, one or more mass storage devices) that store application programs 533 or data 532. The memory 520 and the storage medium 530 may provide transient storage or persistent storage. The program stored in the storage medium 530 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the device 500. Further, the processor 510 may be configured to communicate with the storage medium 530 and to execute, on the device 500, the series of instruction operations in the storage medium 530.
The device 500 for predicting the pronunciation of polyphonic characters may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input/output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux or FreeBSD. Those skilled in the art will understand that the device structure shown in FIG. 5 does not limit the device for predicting the pronunciation of polyphonic characters, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
The present application further provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium or a volatile computer-readable storage medium. Instructions are stored in the computer-readable storage medium and, when run on a computer, cause the computer to perform the steps of the method for predicting the pronunciation of polyphonic characters.
Further, the computer-readable storage medium may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required by at least one function, and the like, and the data storage area may store data created according to the use of blockchain nodes, and the like.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods. Each data block contains information on a batch of network transactions, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and the like.
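The idea of data blocks linked by cryptographic methods can be illustrated with a toy hash chain. This is only a sketch under simplifying assumptions: it has no consensus mechanism, networking or signatures, and the block fields and function names are invented for illustration.

```python
import hashlib
import json

def block_hash(block):
    """SHA-256 digest of a block's canonical JSON serialization."""
    payload = json.dumps(block, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def append_block(chain, records):
    """Create a new block that stores the hash of the previous block."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "prev_hash": prev, "records": records})
    return chain

def chain_is_valid(chain):
    """Each block must reference the actual hash of its predecessor."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))
```

Changing the records of an earlier block changes its hash, so the next block's stored `prev_hash` no longer matches and the tampering is detected.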
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements of some of the technical features therein, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (20)

  1. A method for predicting the pronunciation of polyphonic characters, wherein the method comprises:
    obtaining an annotated Chinese sentence to be processed, and obtaining a character representation vector set and a polyphonic character representation vector of the Chinese sentence to be processed, the Chinese sentence to be processed including a target polyphonic character;
    performing word segmentation on the Chinese sentence to be processed to obtain a target word segmentation, and converting the character representation vector set into a word-level feature representation vector according to the target word segmentation;
    performing attention-based splicing on the polyphonic character representation vector and the word-level feature representation vector to obtain a target vector;
    calculating a target pinyin probability of the target vector through a preset linear layer, and determining the target pronunciation of the target polyphonic character according to the target pinyin probability.
  2. The method for predicting the pronunciation of polyphonic characters according to claim 1, wherein the performing word segmentation on the Chinese sentence to be processed to obtain a target word segmentation, and converting the character representation vector set into a word-level feature representation vector according to the target word segmentation, comprises:
    performing word segmentation on the Chinese sentence to be processed to obtain the target word segmentation;
    dividing the character representation vector set according to the target word segmentation to obtain a representation vector group for each word;
    performing mixed pooling on the representation vector group of each word through a preset mixed pooling layer to obtain the word-level feature representation vector.
  3. The method for predicting the pronunciation of polyphonic characters according to claim 1, wherein the performing attention-based splicing on the polyphonic character representation vector and the word-level feature representation vector to obtain a target vector comprises:
    performing attention calculation on the polyphonic character representation vector and the word-level feature representation vector through a preset feedforward attention mechanism to obtain an attention vector;
    splicing the attention vector with the polyphonic character representation vector to obtain the target vector.
  4. The method for predicting the pronunciation of polyphonic characters according to claim 1, wherein the calculating a target pinyin probability of the target vector through a preset linear layer, and determining the target pronunciation of the target polyphonic character according to the target pinyin probability, comprises:
    calculating, through the preset linear layer, the probability of the target vector for each pinyin to obtain a polyphonic character pinyin probability value set;
    sorting the pinyin probability values in the polyphonic character pinyin probability value set in descending order, and determining the first-ranked pinyin probability value as the target pinyin probability;
    determining the pinyin corresponding to the target pinyin probability as the target pronunciation of the target polyphonic character.
  5. The method for predicting the pronunciation of polyphonic characters according to claim 1, wherein the obtaining an annotated Chinese sentence to be processed, and obtaining a character representation vector set and a polyphonic character representation vector of the Chinese sentence to be processed, the Chinese sentence to be processed including a target polyphonic character, comprises:
    obtaining an initial Chinese sentence, the target polyphonic character in the initial Chinese sentence, and the polyphonic character position information corresponding to the target polyphonic character;
    labeling the target polyphonic character in the initial Chinese sentence according to the polyphonic character position information to obtain the Chinese sentence to be processed;
    performing character vector encoding and then polyphonic character vector extraction on the Chinese sentence to be processed to obtain the character representation vector set and the polyphonic character representation vector.
  6. The method for predicting the pronunciation of polyphonic characters according to claim 5, wherein the performing character vector encoding and then polyphonic character vector extraction on the Chinese sentence to be processed to obtain the character representation vector set and the polyphonic character representation vector comprises:
    encoding each character in the Chinese sentence to be processed through a preset deep neural network encoder to obtain the character representation vector set, each character representation vector corresponding to one character;
    extracting, according to the polyphonic character position information, the representation vector corresponding to the target polyphonic character from the character representation vector set to obtain the polyphonic character representation vector.
  7. The method for predicting the pronunciation of polyphonic characters according to any one of claims 1-6, wherein, after the calculating a target pinyin probability of the target vector through a preset linear layer, and determining the target pronunciation of the target polyphonic character according to the target pinyin probability, the method further comprises:
    obtaining an error value of the target pronunciation relative to the annotated pronunciation, and optimizing the acquisition strategy for the target pronunciation according to the error value, the acquisition strategy including the execution process, algorithm and network structure used to obtain the target pronunciation.
  8. An apparatus for predicting the pronunciation of polyphonic characters, wherein the apparatus comprises:
    an obtaining module, configured to obtain an annotated Chinese sentence to be processed, and to obtain a character representation vector set and a polyphonic character representation vector of the Chinese sentence to be processed, the Chinese sentence to be processed including a target polyphonic character;
    a conversion module, configured to perform word segmentation on the Chinese sentence to be processed to obtain a target word segmentation, and to convert the character representation vector set into a word-level feature representation vector according to the target word segmentation;
    a splicing module, configured to perform attention-based splicing on the polyphonic character representation vector and the word-level feature representation vector to obtain a target vector;
    a determining module, configured to calculate a target pinyin probability of the target vector through a preset linear layer, and to determine the target pronunciation of the target polyphonic character according to the target pinyin probability.
  9. A device for predicting the pronunciation of polyphonic characters, wherein the device comprises a memory and at least one processor, the memory storing instructions;
    the at least one processor invokes the instructions in the memory to cause the device to execute the following method for predicting the pronunciation of polyphonic characters:
    performing word segmentation on the Chinese sentence to be processed to obtain a target word segmentation, and converting the character representation vector set into a word-level feature representation vector according to the target word segmentation;
    performing attention-based splicing on the polyphonic character representation vector and the word-level feature representation vector to obtain a target vector;
    calculating a target pinyin probability of the target vector through a preset linear layer, and determining the target pronunciation of the target polyphonic character according to the target pinyin probability.
  10. The device for predicting the pronunciation of polyphonic characters according to claim 9, wherein, when the processor causes the device to execute the step of performing word segmentation on the Chinese sentence to be processed to obtain a target word segmentation, and converting the character representation vector set into a word-level feature representation vector according to the target word segmentation, the step comprises:
    performing word segmentation on the Chinese sentence to be processed to obtain the target word segmentation;
    dividing the character representation vector set according to the target word segmentation to obtain a representation vector group for each word;
    performing mixed pooling on the representation vector group of each word through a preset mixed pooling layer to obtain the word-level feature representation vector.
  11. The device for predicting the pronunciation of polyphonic characters according to claim 9, wherein, when the processor causes the device to execute the step of performing attention-based splicing on the polyphonic character representation vector and the word-level feature representation vector to obtain a target vector, the step comprises:
    performing attention calculation on the polyphonic character representation vector and the word-level feature representation vector through a preset feedforward attention mechanism to obtain an attention vector;
    splicing the attention vector with the polyphonic character representation vector to obtain the target vector.
  12. The device for predicting the pronunciation of polyphonic characters according to claim 9, wherein, when the processor causes the device to execute the step of calculating a target pinyin probability of the target vector through a preset linear layer, and determining the target pronunciation of the target polyphonic character according to the target pinyin probability, the step comprises:
    calculating, through the preset linear layer, the probability of the target vector for each pinyin to obtain a polyphonic character pinyin probability value set;
    sorting the pinyin probability values in the polyphonic character pinyin probability value set in descending order, and determining the first-ranked pinyin probability value as the target pinyin probability;
    determining the pinyin corresponding to the target pinyin probability as the target pronunciation of the target polyphonic character.
  13. The device for predicting the pronunciation of polyphonic characters according to claim 9, wherein, when the processor causes the device to execute the step of obtaining an annotated Chinese sentence to be processed, and obtaining a character representation vector set and a polyphonic character representation vector of the Chinese sentence to be processed, the Chinese sentence to be processed including a target polyphonic character, the step comprises:
    obtaining an initial Chinese sentence, the target polyphonic character in the initial Chinese sentence, and the polyphonic character position information corresponding to the target polyphonic character;
    labeling the target polyphonic character in the initial Chinese sentence according to the polyphonic character position information to obtain the Chinese sentence to be processed;
    performing character vector encoding and then polyphonic character vector extraction on the Chinese sentence to be processed to obtain the character representation vector set and the polyphonic character representation vector.
  14. The device for predicting the pronunciation of polyphonic characters according to claim 13, wherein, when the processor causes the device to execute the step of performing character vector encoding and then polyphonic character vector extraction on the Chinese sentence to be processed to obtain the character representation vector set and the polyphonic character representation vector, the step comprises:
    encoding each character in the Chinese sentence to be processed through a preset deep neural network encoder to obtain the character representation vector set, each character representation vector corresponding to one character;
    extracting, according to the polyphonic character position information, the representation vector corresponding to the target polyphonic character from the character representation vector set to obtain the polyphonic character representation vector.
  15. The device for predicting the pronunciation of polyphonic characters according to any one of claims 9-14, wherein, after the processor causes the device to execute the step of calculating a target pinyin probability of the target vector through a preset linear layer, and determining the target pronunciation of the target polyphonic character according to the target pinyin probability, the method further comprises:
    obtaining an error value of the target pronunciation relative to the annotated pronunciation, and optimizing the acquisition strategy for the target pronunciation according to the error value, the acquisition strategy including the execution process, algorithm and network structure used to obtain the target pronunciation.
  16. A computer-readable storage medium storing instructions, wherein the instructions, when executed by a processor, implement the following method for predicting the pronunciation of polyphonic characters:
    performing word segmentation on the Chinese sentence to be processed to obtain a target word segmentation, and converting the character representation vector set into a word-level feature representation vector according to the target word segmentation;
    performing attention-based splicing on the polyphonic character representation vector and the word-level feature representation vector to obtain a target vector;
    calculating a target pinyin probability of the target vector through a preset linear layer, and determining the target pronunciation of the target polyphonic character according to the target pinyin probability.
  17. The computer-readable storage medium according to claim 16, wherein, when the instructions for predicting the pronunciation of polyphonic characters are executed by the processor to perform the step of performing word segmentation on the Chinese sentence to be processed to obtain a target word segmentation, and converting the character representation vector set into a word-level feature representation vector according to the target word segmentation, the step comprises:
    performing word segmentation on the Chinese sentence to be processed to obtain the target word segmentation;
    dividing the character representation vector set according to the target word segmentation to obtain a representation vector group for each word;
    performing mixed pooling on the representation vector group of each word through a preset mixed pooling layer to obtain the word-level feature representation vector.
  18. The computer-readable storage medium according to claim 16, wherein, when the instructions for predicting the pronunciation of polyphonic characters are executed by the processor to perform the step of performing attention-based splicing on the polyphonic character representation vector and the word-level feature representation vector to obtain a target vector, the step comprises:
    performing attention calculation on the polyphonic character representation vector and the word-level feature representation vector through a preset feedforward attention mechanism to obtain an attention vector;
    splicing the attention vector with the polyphonic character representation vector to obtain the target vector.
  19. The computer-readable storage medium according to claim 16, wherein, when the instructions for predicting polyphonic character pronunciation are executed by the processor to perform the step of calculating a target pinyin probability of the target vector through a preset linear layer, and determining the target pronunciation of the target polyphonic character according to the target pinyin probability, the step comprises:
    calculating, through the preset linear layer, the probability of the target vector for each pinyin to obtain a set of polyphonic character pinyin probability values;
    sorting the pinyin probability values in the set in descending order, and determining the first-ranked pinyin probability value as the target pinyin probability;
    determining the pinyin corresponding to the target pinyin probability as the target pronunciation of the target polyphonic character.
  20. The computer-readable storage medium according to claim 16, wherein, when the instructions for predicting polyphonic character pronunciation are executed by the processor to perform the step of obtaining an annotated to-be-processed Chinese sentence, and obtaining a character representation vector set and a polyphonic character representation vector of the to-be-processed Chinese sentence, the to-be-processed Chinese sentence including a target polyphonic character, the step comprises:
    obtaining an initial Chinese sentence, the target polyphonic character in the initial Chinese sentence, and polyphonic character position information corresponding to the target polyphonic character;
    annotating the target polyphonic character in the initial Chinese sentence according to the polyphonic character position information to obtain the to-be-processed Chinese sentence;
    sequentially performing character vector encoding and polyphonic character vector extraction on the to-be-processed Chinese sentence to obtain the character representation vector set and the polyphonic character representation vector.
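
As a non-limiting illustration, the steps recited in claims 17 to 20 could be sketched in Python roughly as follows. Everything specific here is an assumption made for the sketch rather than something the claims state: the function names, the `#` marker used for annotation, the fixed blend weight `alpha` between max and mean pooling, the use of a plain dot product as a stand-in for the scoring network of the feedforward attention mechanism, and the toy vectors and weights. The character encoder and the word segmenter are omitted, and their outputs are supplied by hand.

```python
import math

def mark_polyphone(sentence, position, marker="#"):
    """Wrap the character at `position` in a marker (hypothetical annotation format)."""
    return sentence[:position] + marker + sentence[position] + marker + sentence[position + 1:]

def mixed_pool(group, alpha=0.5):
    """Blend max pooling and mean pooling over one word's character vectors."""
    dims = range(len(group[0]))
    max_part = [max(vec[d] for vec in group) for d in dims]
    mean_part = [sum(vec[d] for vec in group) / len(group) for d in dims]
    return [alpha * mx + (1 - alpha) * mn for mx, mn in zip(max_part, mean_part)]

def word_level_vectors(char_vectors, words):
    """Split the character vectors by word length, then pool each group into one vector."""
    vectors, start = [], 0
    for word in words:
        vectors.append(mixed_pool(char_vectors[start:start + len(word)]))
        start += len(word)
    return vectors

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend_and_concat(poly_vec, word_vecs):
    """Weight the word vectors by their similarity to the polyphone vector, then concatenate."""
    # Dot-product scoring is a simplification; a learned feedforward scorer would go here.
    scores = [sum(p * w for p, w in zip(poly_vec, wv)) for wv in word_vecs]
    weights = softmax(scores)
    attention = [sum(wt * wv[d] for wt, wv in zip(weights, word_vecs))
                 for d in range(len(poly_vec))]
    return attention + list(poly_vec)

def predict_pinyin(target_vec, weight_rows, biases, candidates):
    """Apply a linear layer, convert to probabilities, sort descending, return the top pinyin."""
    logits = [sum(w * x for w, x in zip(row, target_vec)) + b
              for row, b in zip(weight_rows, biases)]
    probs = softmax(logits)
    ranked = sorted(zip(probs, candidates), key=lambda pair: pair[0], reverse=True)
    best_prob, best_pinyin = ranked[0]
    return best_pinyin, best_prob

# Toy run: the polyphonic character is at index 1 of the sentence.
sentence, position = "我长大了", 1
marked = mark_polyphone(sentence, position)  # "我#长#大了"
char_vectors = [[1.0, 0.0], [0.0, 2.0], [2.0, 0.0], [1.0, 1.0]]  # stand-in encoder output
word_vecs = word_level_vectors(char_vectors, ["我", "长大", "了"])
target = attend_and_concat(char_vectors[position], word_vecs)
pinyin, prob = predict_pinyin(target, [[0, 0, 0, 1], [0, 0, 0, 0]], [0.0, 0.0],
                              ["zhang3", "chang2"])
print(marked, pinyin, round(prob, 3))
```

In this sketch the target vector has twice the dimension of a character vector because the attention vector and the polyphonic character vector are concatenated, so the linear layer needs one weight row of that length per candidate pinyin.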
PCT/CN2021/083522 2020-12-10 2021-03-29 Method, apparatus and device for predicting heteronym pronunciation, and storage medium WO2022121166A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2021574349A JP7441864B2 (en) 2020-12-10 2021-03-29 Methods, devices, equipment, and storage media for predicting polyphonic pronunciation

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011432585.6A CN112528648A (en) 2020-12-10 2020-12-10 Method, device, equipment and storage medium for predicting polyphone pronunciation
CN202011432585.6 2020-12-10

Publications (1)

Publication Number Publication Date
WO2022121166A1 (en) 2022-06-16

Family

ID=74998777

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/083522 WO2022121166A1 (en) 2020-12-10 2021-03-29 Method, apparatus and device for predicting heteronym pronunciation, and storage medium

Country Status (3)

Country Link
JP (1) JP7441864B2 (en)
CN (1) CN112528648A (en)
WO (1) WO2022121166A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115273809A (en) * 2022-06-22 2022-11-01 北京市商汤科技开发有限公司 Training method of polyphone pronunciation prediction network, and speech generation method and device
CN116150697A (en) * 2023-04-19 2023-05-23 上海钐昆网络科技有限公司 Abnormal application identification method, device, equipment, storage medium and product
CN117592473A (en) * 2024-01-18 2024-02-23 武汉杏仁桉科技有限公司 Harmonic splitting processing method and device for multiple Chinese phrases

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528648A (en) * 2020-12-10 2021-03-19 平安科技(深圳)有限公司 Method, device, equipment and storage medium for predicting polyphone pronunciation
CN112989821B (en) * 2021-04-13 2021-08-13 北京世纪好未来教育科技有限公司 Phonetic notation method for polyphone and computer storage medium
CN113268989A (en) * 2021-05-14 2021-08-17 北京金山数字娱乐科技有限公司 Polyphone processing method and device
CN113268974B (en) * 2021-05-18 2022-11-29 平安科技(深圳)有限公司 Method, device and equipment for marking pronunciations of polyphones and storage medium
CN113823259A (en) * 2021-07-22 2021-12-21 腾讯科技(深圳)有限公司 Method and device for converting text data into phoneme sequence
CN114417832B (en) * 2021-12-08 2023-05-05 马上消费金融股份有限公司 Disambiguation method, training method and device of disambiguation model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110069781A (en) * 2019-04-24 2019-07-30 北京奇艺世纪科技有限公司 A kind of recognition methods of entity tag and relevant device
US20200117856A1 (en) * 2017-04-03 2020-04-16 Siemens Aktiengesellschaft A method and apparatus for performing hierarchiacal entity classification
CN111382567A (en) * 2020-05-29 2020-07-07 恒信东方文化股份有限公司 Method and device for recognizing Chinese word segmentation and Chinese character polyphones
CN111599340A (en) * 2020-07-27 2020-08-28 南京硅基智能科技有限公司 Polyphone pronunciation prediction method and device and computer readable storage medium
CN112528648A (en) * 2020-12-10 2021-03-19 平安科技(深圳)有限公司 Method, device, equipment and storage medium for predicting polyphone pronunciation

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222329B (en) * 2019-04-22 2023-11-24 平安科技(深圳)有限公司 Chinese word segmentation method and device based on deep learning
CN112052331A (en) * 2019-06-06 2020-12-08 武汉Tcl集团工业研究院有限公司 Method and terminal for processing text information
CN111144110A (en) * 2019-12-27 2020-05-12 科大讯飞股份有限公司 Pinyin marking method, device, server and storage medium
CN111967260A (en) * 2020-10-20 2020-11-20 北京金山数字娱乐科技有限公司 Polyphone processing method and device and model training method and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115273809A (en) * 2022-06-22 2022-11-01 北京市商汤科技开发有限公司 Training method of polyphone pronunciation prediction network, and speech generation method and device
CN116150697A (en) * 2023-04-19 2023-05-23 上海钐昆网络科技有限公司 Abnormal application identification method, device, equipment, storage medium and product
CN117592473A (en) * 2024-01-18 2024-02-23 武汉杏仁桉科技有限公司 Harmonic splitting processing method and device for multiple Chinese phrases
CN117592473B (en) * 2024-01-18 2024-04-09 武汉杏仁桉科技有限公司 Harmonic splitting processing method and device for multiple Chinese phrases

Also Published As

Publication number Publication date
CN112528648A (en) 2021-03-19
JP7441864B2 (en) 2024-03-01
JP2023509257A (en) 2023-03-08

Legal Events

Date Code Title Description
ENP Entry into the national phase (Ref document number: 2021574349; Country of ref document: JP; Kind code of ref document: A)
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21901882; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 21901882; Country of ref document: EP; Kind code of ref document: A1)