CN117056859A - Method for completing missing characters in classical Chinese texts - Google Patents

Method for completing missing characters in classical Chinese texts

Info

Publication number
CN117056859A
Authority
CN
China
Prior art keywords
text
missing
recognition model
emotion
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311025114.7A
Other languages
Chinese (zh)
Inventor
丁杨
胡凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN202311025114.7A priority Critical patent/CN117056859A/en
Publication of CN117056859A publication Critical patent/CN117056859A/en
Pending legal-status Critical Current

Classifications

    • G06F18/253 Fusion techniques of extracted features (pattern recognition)
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F40/166 Text processing; editing, e.g. inserting or deleting
    • G06F40/232 Orthographic correction, e.g. spell checking or vowelisation
    • G06F40/30 Semantic analysis
    • G06N3/0442 Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G06N3/045 Combinations of networks
    • G06N3/048 Activation functions
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention belongs to the technical field of artificial intelligence and discloses a method for completing missing characters in classical Chinese texts, comprising the following steps: step 1, constructing a classical Chinese text data set; step 2, constructing a missing-character recognition model for predicting the characters missing from a text, the model comprising an emotion recognition model, a semantic recognition model, a phonetic recognition model and a Transformer encoder, with the outputs of the emotion, semantic and phonetic recognition models connected to the Transformer encoder; step 3, training the missing-character recognition model; and step 4, inputting a classical Chinese sentence containing missing characters into the trained model and predicting the missing characters. By extracting and fusing emotion, semantics and pronunciation, the invention improves the efficiency and quality of character completion.

Description

Method for completing missing characters in classical Chinese texts
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a method for completing missing characters in classical Chinese texts.
Background Art
Classical Chinese is the traditional written language of China. In archaeological work, characters are often lost because oracle bones, stone stelae, bamboo slips and silk manuscripts have been damaged, so archaeologists must spend a great deal of time guessing at and restoring the lost characters, which hampers the study of the history of Chinese civilisation.
If existing artificial-intelligence technology could be used to guess at and restore the lost characters, and experts could cross-check the results against other content, it would be of great help to historical research.
Disclosure of Invention
Technical problem: based on an artificial-intelligence algorithm, a method for completing missing characters in classical Chinese texts is provided, which combines emotion recognition, contextual semantics and ancient pronunciation to complete the missing characters in a text.
The invention comprises the following steps: the invention provides a method for completing missing characters in classical Chinese texts, comprising:
step 1, constructing a classical Chinese text data set;
step 2, constructing a missing-character recognition model for predicting the characters missing from a text;
the missing-character recognition model comprises an emotion recognition model, a semantic recognition model, a phonetic recognition model and a Transformer encoder;
the outputs of the emotion recognition model, the semantic recognition model and the phonetic recognition model are connected to the Transformer encoder;
step 3, training the emotion recognition model, the semantic recognition model and the phonetic recognition model with the classical Chinese data set, and then training the missing-character recognition model as a whole with the same data set;
and step 4, inputting a classical Chinese sentence containing missing characters into the trained missing-character recognition model and predicting the missing characters in the sentence.
Further, in step 3 the emotion recognition model is trained with the classical Chinese data set: a classical Chinese sentence containing a missing character is input into the emotion recognition model, emotion recognition is performed, and the emotion tendency of the sentence containing the missing character is output.
Further, inputting the sentence containing the missing character into the emotion recognition model for emotion recognition comprises the following steps:
step 311, denote the text to the left of the missing character Sen_left and the text to its right Sen_right; apply the character-encoding operation Emb(·) to Sen_left and Sen_right respectively, obtaining two encoding tensors left and right:
left = Emb(Sen_left)
right = Emb(Sen_right)
step 312, input the encoding tensors left and right into a bidirectional long short-term memory network Bi_LSTM(·) and perform feature extraction:
out_l = Bi_LSTM(left)
out_r = Bi_LSTM(right)
concatenate out_l and out_r and pass the result through a Softmax activation function to obtain the emotion tendency of the sentence containing the missing character:
emotion = Softmax(Cat(out_l, out_r))
where Cat(·) denotes the concatenation of the two feature vectors, and Softmax(·) is the activation function used for the final classification.
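The shape of steps 311–312 (Emb → feature extraction → Cat → Softmax) can be sketched as follows. This is a toy illustration only: the embeddings are randomly initialised rather than trained, and the Bi_LSTM feature extractor is stood in for by simple mean pooling, so only the branch structure mirrors the method.

```python
import math
import random

random.seed(0)
EMB_DIM, NUM_EMOTIONS = 8, 8  # 8 emotion classes (0 = unknown ... 7 = very positive)

def emb(chars, table={}):
    # toy character-embedding lookup; vectors are random-initialised on first use
    out = []
    for c in chars:
        if c not in table:
            table[c] = [random.uniform(-1, 1) for _ in range(EMB_DIM)]
        out.append(table[c])
    return out

def feature(vectors):
    # stand-in for Bi_LSTM(): mean-pool the character vectors into one feature
    n = max(len(vectors), 1)
    return [sum(v[i] for v in vectors) / n for i in range(EMB_DIM)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def emotion_branch(sen_left, sen_right, w):
    out_l = feature(emb(sen_left))    # out_l = Bi_LSTM(Emb(Sen_left))
    out_r = feature(emb(sen_right))   # out_r = Bi_LSTM(Emb(Sen_right))
    cat = out_l + out_r               # Cat(out_l, out_r)
    logits = [sum(c * w[i][j] for j, c in enumerate(cat)) for i in range(NUM_EMOTIONS)]
    return softmax(logits)            # emotion = Softmax(Cat(out_l, out_r))

W = [[random.uniform(-1, 1) for _ in range(2 * EMB_DIM)] for _ in range(NUM_EMOTIONS)]
probs = emotion_branch("锄禾日当", "滴禾下土", W)
assert len(probs) == NUM_EMOTIONS and abs(sum(probs) - 1.0) < 1e-9
```

The predicted emotion class would then be the index of the largest probability in `probs`.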
Further, in step 3 the semantic recognition model is trained with the classical Chinese data set: a classical Chinese sentence containing a missing character is input into the semantic recognition model, semantic recognition is performed, and the semantic vector semantic of the missing character in the sentence is output.
Further, the semantic recognition model adopts a bidirectional LSTM model.
Further, in step 3 the phonetic recognition model is trained with the classical Chinese data set: the sentence containing the missing character is input into the phonetic recognition model for pinyin recognition, and the toned pinyin of the missing character is output.
further, the inputting the dialect Wen Yugou containing the missing text into the phonogram recognition model for the pinyin recognition includes the following steps:
step 321, word vector coding of Word2Vec and Huffman tree algorithm are adopted to carry out Pinyin recognition on a dialect Wen Yugou containing missing characters, so as to obtain Pinyin information of the missing characters; the pinyin information does not include tones;
step 322, coding the input dialect Wen Yugou by adopting an coding Word code to obtain a coding vector word_emb:
Word_emb=Emb(Sen)
step 323, inputting the code vector word_emb to a Bi-directional long-short-term memory network Bi_LSTM, and extracting features to obtain a feature vector Temp, wherein the Bi-directional long-term memory network Bi_LSTM is set to be 7 or 17;
Temp=Bi_LSTM(Word_emb)
step 324, the extracted feature vector Temp is continuously sent to a transducer network to extract global information, and the tone of the missing text is output
tone=Transformer_Layer(Temp)
Step 325, combining the tone of the missing text with its Pinyin to obtain the Pinyin with tone of the missing text.
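Step 325, merging the toneless pinyin of step 321 with the tone predicted in step 324, can be sketched as follows. The `apply_tone` helper and its simplified vowel-priority rule are illustrative assumptions, not part of the patent; real pinyin tone-mark placement has a few extra cases (e.g. the *iu/ui* rule).

```python
TONE_MARKS = {  # vowel -> its tone-marked forms for tones 1-4 (index 0 = toneless)
    "a": "aāáǎà", "e": "eēéěè", "i": "iīíǐì", "o": "oōóǒò", "u": "uūúǔù",
}

def apply_tone(pinyin, tone):
    """Merge toneless pinyin (step 321) with the predicted tone (step 324)."""
    if tone == 0:
        return pinyin
    # mark the first vowel found in a-o-e-i-u priority order (simplified rule)
    for v in "aoeiu":
        if v in pinyin:
            return pinyin.replace(v, TONE_MARKS[v][tone], 1)
    return pinyin

# e.g. the missing character 汗 ("sweat"): toneless "han" + predicted tone 4
assert apply_tone("han", 4) == "hàn"
assert apply_tone("zhong", 1) == "zhōng"
```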
Further, in step 3 the missing-character recognition model is trained as a whole: the emotion tendency, the semantic vector and the toned pinyin of the missing character, output by the emotion, semantic and phonetic recognition models respectively, are input into the Transformer encoder, which predicts the missing character.
Further, inputting the emotion tendency emotion, the semantic vector Sen and the toned pinyin Pinyin output by the three models into the Transformer encoder to predict the missing character comprises the following steps:
step 331, pass the semantic vector Sen, the emotion tendency emotion and the toned pinyin Pinyin through an embedding layer for encoding; the resulting encoding vectors are:
Word_emb1 = Emb(Sen)
Word_emb2 = Emb(emotion)
Word_emb3 = Emb(Pinyin)
step 332, concatenate the encoding vectors obtained in step 331:
input = Cat(Word_emb1, Word_emb2, Word_emb3)
where Cat(·) denotes the concatenation of the feature vectors;
step 333, send the fused tensor input into the Transformer encoder for feature extraction, predict the missing character and output:
Output = Transformer(input)
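The result-level fusion of steps 331–333 amounts to embedding the three branch outputs and concatenating them before the encoder. A minimal structural sketch, in which the embedding layer is stood in for by fixed pseudo-random vectors and the Transformer encoder itself is omitted:

```python
import random

def emb(dim, seed):
    # toy stand-in for an embedding layer Emb(): a fixed pseudo-random vector
    rng = random.Random(seed)
    return [rng.uniform(-1, 1) for _ in range(dim)]

DIM = 16
word_emb1 = emb(DIM, seed=1)   # Word_emb1 = Emb(Sen), the semantic vector
word_emb2 = emb(DIM, seed=2)   # Word_emb2 = Emb(emotion), the emotion tendency
word_emb3 = emb(DIM, seed=3)   # Word_emb3 = Emb(Pinyin), the toned pinyin

# step 332: input = Cat(Word_emb1, Word_emb2, Word_emb3)
fused = word_emb1 + word_emb2 + word_emb3
assert len(fused) == 3 * DIM
# step 333 would feed `fused` to a Transformer encoder: Output = Transformer(input)
```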
Beneficial effects: in predicting the missing characters, the invention extracts and fuses emotion, semantics and pronunciation, improving the efficiency and quality of character completion.
Drawings
FIG. 1 is an overall flow chart of the present invention;
FIG. 2 is a diagram of the standard continuous bag-of-words (CBOW) model;
FIG. 3 is a flow chart of the phonetic recognition of classical Chinese;
FIG. 4 is an overall detailed flow chart of the present invention.
Detailed Description
Existing work on character completion concentrates mainly on semantic features. However, because many works in ancient Chinese were written by their authors under different emotions, emotion is a feature carrying important information; in addition, ancient Chinese follows extensive writing conventions of tonal pattern and rhyme, so pronunciation is also an important feature. Current research neither extracts the emotion and pronunciation features nor fuses them with the semantic features, so a great deal of feature information is lost and the quality of completion is reduced. Extracting and fusing the 3 feature dimensions of emotion, pronunciation and semantics can therefore improve both the breadth and the accuracy of completion; it is convenient for philologists to operate and has innovative and practical significance. In addition, the invention presents the separate results of the emotion, semantic and pronunciation analyses, together with the final result, to expert users, so that the experts can analyse and adopt them from several angles.
The theoretical basis of the invention has 3 starting points:
(1) Semantic meaning
Chinese characters are ideographic: each character can express several meanings depending on its context, so each character must be understood in combination with the surrounding text, and an intermediate character can be guessed from the characters around it. There is therefore a Markov chain of local context information over the characters themselves.
(2) Speech sound
Because Chinese has many polyphonic characters, ancient writers paid close attention to the rules of tonal pattern (level and oblique tones) and rhyme; besides the surrounding characters themselves, pronunciation can thus provide information for guessing a missing character. There is likewise a Markov chain of local context information over character pronunciations.
(3) Mood of emotion
Literary works carry emotion: the author's emotion toward a given situation affects the wording of the entire work, making emotion a globally acting feature.
Based on these 3 factors, the invention provides a deep-learning method for the classical Chinese completion task that adopts a 2-level, multi-branch structure. The first level predicts the emotion, the semantics and the toned pinyin of the character to be completed; toned pinyin means pinyin marked with one of the four tones. The second level fuses the emotion, semantics and toned pinyin. After fusion, a group of candidate characters is output for philologists to consider.
Because expert users need to see the results of each angle's independent analysis in actual work, which facilitates their final comprehensive judgement, the model for each angle must be able to analyse independently. For this reason, the overall model is designed so that the 3 angles can be learned, analysed and output separately; of the three fusion schemes in the data-fusion field — pixel-level, feature-level and result-level — the scheme of fusing at the result level is therefore chosen. The method for completing missing characters in classical Chinese texts comprises the main stages of data-set construction, the learning process, the usage process and the feedback process.
Step 1, data set construction
Step 1.1, constructing the classical Chinese data set
The publicly available chinese-poetry electronic corpus on GitHub (https://github.com/chinese-poetry) is used; the data set can also be downloaded from sites such as CSDN. The data set includes the Analects, classical poetry, the Four Books and Five Classics, traditional primers, 55,000 Tang poems, 260,000 Song poems and 21,000 Song ci lyrics. Every sentence carries the punctuation added by later editors, and each punctuation mark is stored as a character.
In addition, the data set includes pronunciation data corresponding to the texts. Since publicly available commercial text-to-speech software has fully learned the pronunciation rules of polyphonic characters, the pronunciations of the texts are generated automatically by computer, in Mandarin: the software reads the sentences of the data set and produces a group of pronunciations. The pronunciation data can either be stored in the data set, or lossless-standard MP3 sound data for a sentence can be generated with the software each time that sentence is used.
The texts are stored in a database in encoded form. The encoding standard is GB18030-2022, the upgrade of GB18030 issued in 2022; this national standard for Chinese character encoding extends GBK encoding, covers Chinese and other scripts, and records more than 80,000 Chinese characters.
When each model is trained independently, characters in the existing classical Chinese data set are hidden at random, and the original emotion value, semantic value, pinyin and tone are used as the model's target outputs, forming paired training data.
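The random-masking step that produces the paired training data can be sketched as follows; the `□` placeholder and the helper name are illustrative assumptions.

```python
import random

def make_training_pair(sentence, rng):
    """Hide one random character; return (masked sentence, position, target)."""
    i = rng.randrange(len(sentence))
    masked = sentence[:i] + "□" + sentence[i + 1:]  # □ marks the hidden slot
    return masked, i, sentence[i]

rng = random.Random(42)
masked, pos, target = make_training_pair("锄禾日当午，汗滴禾下土。", rng)
assert masked[pos] == "□"
# restoring the target character at the masked position recovers the original
assert masked[:pos] + target + masked[pos + 1:] == "锄禾日当午，汗滴禾下土。"
```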
Step 2, learning process
Principle part:
the existing natural-language-processing (NLP) field has a common shortcoming in English applications: the sentences of a contradictory pair do not necessarily come from different semantic categories. Merely optimising an entailment/contradiction inference objective therefore does not adequately capture higher-order semantic features, i.e. it cannot represent finer-grained semantics. The shortcoming arises because local, finer-grained losses can only be learned from a single sentence pair or triplet, leading to poor local optima.
Combining this principle with the characteristics above: if the model attends only to low-order features in a local range, their contradictory information is easily amplified and the model falls into a local optimum; if it attends only to high-order features in the global range, much detail information is lost and the optimal solution cannot be found. The two should therefore be combined, with the high-order features guiding the extraction and learning of the low-order features, so as to reach a better balance.
Since the principles of natural language processing carry over, the same reasoning applies to the classical Chinese texts addressed by the invention. Viewing the problem in terms of multidimensional features, the invention proposes using emotion, a high-order feature, to guide feature extraction and learning together with the two low-order features, semantics and pronunciation.
In addition, since expert users need the results of each angle's independent analysis in actual work, facilitating their final comprehensive judgement, the model for each angle must be able to analyse independently. In view of this, as shown in fig. 1, the overall model is designed so that the 3 angles can be analysed and learned independently, and result-level fusion is selected from the pixel-level, feature-level and result-level schemes of the data-fusion field.
In the overall structure, as shown in fig. 1, the proposed model is divided into 2 stages: stage 1 is emotion recognition, semantic recognition and phonetic recognition; stage 2 performs a Transformer-based fusion of the predictions of these high- and low-order features. Because the emotion feature is a global high-order feature while the context-extraction features are local low-order features, the invention extracts and learns with the multiple features of emotion, semantics and pronunciation, in the hope of achieving a better balance.
Emotion recognition is a specialised field with dedicated data sets and algorithms; the invention uses existing mature algorithms.
Step 2.1, emotion recognition model.
The emotion recognition model adopts the bidirectional LSTM recognition scheme from the 2021 North China Electric Power University master's thesis of Leng Yongcai, "Research on deep-learning-based short-text sentiment analysis algorithms", to recognise the emotion in a classical Chinese text. On this basis, the invention sets the recognised emotion values to 8 kinds of results: 0 (unknown), 1 (very negative), 2 (negative), 3 (slightly negative), 4 (neutral), 5 (slightly positive), 6 (positive), 7 (very positive). The value 0–7 is input to the next level and, as the high-order semantics, guides the learning of the low-order semantics.
Analysis of the classical Chinese data set showed that a reasonable balance between sentence length and emotion accuracy lies at 28–35 characters, so the number of LSTM units in the bidirectional LSTM of the emotion recognition model is set to 35, i.e. 17 characters before and 17 after the character being recognised (for characters near the beginning or end of a text, a side with fewer than 17 characters is padded with 0; punctuation marks are also input as characters). This is one parameter setting of the emotion model. The invention thus takes 35 characters as the length of one sentence, recognises the overall emotion of the 35-character span centred on each character, and assigns that overall emotion value as the emotion feature of the character.
Using the emotion-recognition data set from the above thesis as input, a suitable emotion recognition model M1 can be trained; M1 then performs emotion recognition on the input text data Text1, yielding the model's output sequence, the emotion result OutT1.
The invention treats the character to be completed as a blank value: its emotion value is computed from the 34 surrounding values and filled into the emotion result sequence OutT1 of model M1. Directly using the 34 characters around the missing character to recognise its emotion would be computationally expensive, so the invention proceeds as follows.
Step 2.1-A: for the 34 characters before and after the character to be completed, form a 3-character phrase from each character and the characters on either side of it (for the first and last characters, use the following 2 or preceding 2 characters), and recognise the emotion value of each of the 34 characters.
Step 2.1-B: feed the sequence of 34 emotion values to the network and recognise the emotion value of the missing character to be completed. With this 2-stage scheme, the computational effort is greatly reduced.
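The 35-character window with zero padding described in step 2.1 can be sketched as follows; `PAD` is a stand-in for the "supplement 0" padding token.

```python
PAD = "\0"  # placeholder for positions past the text boundary ("supplement 0")

def context_window(text, i, half=17):
    """35-character window centred on position i, padded at the edges."""
    left = text[max(0, i - half):i]
    right = text[i + 1:i + 1 + half]
    left = PAD * (half - len(left)) + left
    right = right + PAD * (half - len(right))
    return left + text[i] + right

w = context_window("锄禾日当午，汗滴禾下土。", 6)  # centred on 汗, punctuation kept
assert len(w) == 35 and w[17] == "汗"
```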
The output of emotion recognition model M1 is a sequence of emotion values whose length matches the length LT1 of the input text Text1: OutT1 = {Oa1_1, Oa1_2, ..., Oa1_i, ..., Oa1_LT1}, i = 1 ... LT1, where i indexes the i-th character; the embodiment of the invention uses LT1 = 35. For example, Oa1_2, the emotion result of the 2nd character of the text, takes one of the 8 values: 0 (unknown), 1 (very negative), 2 (negative), 3 (slightly negative), 4 (neutral), 5 (slightly positive), 6 (positive), 7 (very positive).
With this setting, OutT1 is produced by forming, for each character, the sentence of surrounding characters centred on it, recognising the overall emotion of that sentence, and assigning the overall emotion value to the character's position, yielding a high-order feature sequence with emotion.
The concrete scheme of the bidirectional LSTM in emotion recognition model M1 is as follows. Sen_left and Sen_right, the text to the left and right of the missing character, each undergo the character-encoding operation Emb(·). As shown in fig. 2 (see the paper "Efficient Estimation of Word Representations in Vector Space"), this encoding represents each character with a low-dimensional vector; through training optimisation of a neural network, the vectors come to express the correlations between characters. Two parameters must be set: the maximum size of the dictionary, and the dimension of the desired output vector.
In a specific embodiment of the invention, for step 2.1-A the phrase 锄禾日当午 ("hoeing grain at midday") is split into the characters 锄, 禾, 日, 当, 午. Each character is sent to the encoding layer for encoding, with the output dimension specified as 3; after the model is trained, a real vector is obtained for each character. For example, the vector encoding 锄 ("hoe") might be [0.2, 0.4, -0.1]. This yields a low-dimensional vector representation of the text, whose dimension can be set in advance.
left = Emb(Sen_left)
right = Emb(Sen_right)
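The Emb(·) encoding just described, with its two parameters — a maximum dictionary size and an output vector dimension — can be sketched as a minimal lookup table; the class below is an illustrative stand-in (random-initialised, untrained), not the patent's implementation.

```python
import random

class Emb:
    """Minimal embedding lookup with the two parameters named in the text:
    a maximum dictionary size and an output vector dimension."""
    def __init__(self, vocab_size, dim, seed=0):
        self.vocab_size, self.dim = vocab_size, dim
        self.rng = random.Random(seed)
        self.table = {}
    def __call__(self, chars):
        out = []
        for c in chars:
            if c not in self.table:
                assert len(self.table) < self.vocab_size, "dictionary full"
                self.table[c] = [self.rng.uniform(-1, 1) for _ in range(self.dim)]
            out.append(self.table[c])
        return out

emb = Emb(vocab_size=80000, dim=3)   # dim=3 matches the worked example above
vecs = emb("锄禾日")                  # one 3-dimensional real vector per character
assert len(vecs) == 3 and all(len(v) == 3 for v in vecs)
```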
Because this step becomes computationally intensive as the dimension grows, and 35 dimensions would be demanding, the invention splits it into 2 steps: step 2.1-A uses only 3 characters to extract the low-dimensional vector representation of each character, and step 2.1-B works directly on the OutT1 obtained in step 2.1-A, so the amount of computation is reduced.
Encoding the text to the left and right of the missing character, Sen_left and Sen_right, yields the two tensors left and right.
The bidirectional LSTM in emotion recognition model M1 is Bi_LSTM, a bidirectional long short-term memory network formed by combining a forward LSTM and a backward LSTM. The number of LSTM units required in the two steps differs, however: step 2.1-A uses 3 LSTM units, with the two tensors left and right as input, while step 2.1-B uses 35 LSTM units, with OutT1 as input.
Such forward-and-backward networks are commonly used to model context information in natural-language-processing tasks (see, e.g., the paper "Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting"). After the encoding operation above produces the encoding tensors of the text on either side of the missing character, the two tensors left and right each undergo feature extraction through the bidirectional network Bi_LSTM(·): the forward LSTM reads the input character by character from front to back, and the backward LSTM reads it from back to front. The network contains multiple LSTM units, each maintaining a hidden state that represents its understanding of the current context; finally, the hidden states of the two directions are combined into a comprehensive understanding of the context of the whole sentence.
out_l = Bi_LSTM(left)
out_r = Bi_LSTM(right)
After feature extraction by the bidirectional networks of the left and right branches, the obtained feature vectors are fused and passed through the Softmax activation function to obtain the emotion tendency of the sentence:
emotion = Softmax(Cat(out_l, out_r))
where Cat(·) denotes the concatenation of the two feature vectors, and Softmax(·) is the activation function used for the final classification: it scores the nodes of the preceding linear layer, scaling each element of the output vector to a value between 0 and 1, with all values summing to 1. The model finally outputs the index of the largest probability value, i.e. the predicted emotion class. Here, in step 2.1-A, the emotions of the single characters, each computed from a 3-character phrase, are written one by one into OutT1; in step 2.1-B, the emotion of the missing character is determined from the 35 characters (this may be reduced according to the actual situation).
Take the poem 锄禾日当午，汗滴禾下土。谁知盘中餐，粒粒皆辛苦。("Hoeing grain at midday, sweat drips onto the soil below; who knows that every grain on the plate is hard-won.") as an example, and suppose the missing character is 汗 ("sweat"). The scheme proceeds as follows:
in step 2.1-A, the 3-character phrases 锄禾日, 禾日当, 日当午, 当午，and so on are extracted and recognised with a Bi_LSTM of 3 LSTM units, so that every character of the whole text is assigned an emotion, giving OutT1;
in step 2.1-B, a Bi_LSTM of 35 LSTM units recognises the input OutT1 and obtains the emotion value of the missing character. In this example, of the 17 LSTM units on the left, only 6 are used, corresponding to the emotion values of the characters of 锄禾日当午，; the remaining 11 slots are filled with 0 ("unknown"). All 17 units on the right are used, corresponding to the emotion values of the 17 characters of 滴禾下土。谁知盘中餐，粒粒皆辛苦。
Thus, under a limited computation budget, the invention splits the work into 2 main steps based on Bi_LSTM structures (of 3 and 35 LSTM units), turning one 35-dimensional computation into 34 3-dimensional computations and obtaining the emotion recognition of the characters within a 35-character range.
Model M1 can therefore work independently, output an emotion sequence and feed it into the later fusion. This meets the requirement that expert users see the results of each angle's independent analysis in actual work, facilitating their final comprehensive judgement.
Step 2.2, semantic recognition model
Here we continue to use the bidirectional LSTM scheme of the emotion recognition model for semantic prediction. Since many quatrain (jueju) lines in classical texts are 7 characters long, and the minimum unit used is 1 character, the first natural relation length is 7 characters.
The method is as follows: the 7 characters of such a line are input in order, one character per unit, into the 1st through 7th LSTM units, with the missing position marked as unknown.
From this example it is not difficult to see that a missing character in the middle may tend to combine with the character that follows it, for example with 酒 ("wine") to form a common two-character word such as 醉酒 ("drunk"); or with the character that precedes it, as in 劝酒 ("urging wine"). The scheme of using a bidirectional LSTM is therefore reasonable, with 3 characters before and 3 after, for a total length of 7.
In addition, quatrain couplets often form upper/lower antithetic pairs. For example, in 两个黄鹂鸣翠柳，一行白鹭上青天 ("Two orioles sing in the green willows; a line of egrets climbs into the blue sky"), 两 pairs with 一 and 个 pairs with 行, one-to-one. Their spacing is 7 characters plus one punctuation mark, i.e. a distance of 8, so the 2nd natural relation distance is 8×2+1=17.
Since both 7 and 17 are reasonable relation lengths, the larger range of 17 can be chosen as the maximum distance, so that as many of these relations as possible are captured.
However, analyzing with 17-dimensional data, as in step 2.1, is computationally very expensive. To minimize computation, on top of the 7-length window, the antithetic partner character at the 8th distance can also be supplied, i.e. an 8-dimensional input. In the example, if the missing value is 行, the paired character 个 is added to the 3 characters before and after the missing position. Whether the text consists of five-character or seven-character lines can simply be supplied by the researcher, since the invention is an auxiliary algorithm offered to researchers.
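The antithetic-partner lookup can be sketched as a hypothetical helper; the function name and interface are illustrative, not part of the patent:

```python
# Given the index of the missing character inside a couplet, find its
# antithetic partner in the parallel line. With 7-character lines plus one
# punctuation mark, corresponding positions sit 8 apart, as in the patent's
# 两/一 and 个/行 example. line_len is supplied by the researcher (5 for
# five-character lines, 7 for seven-character ones).

def partner_index(missing_idx, line_len=7):
    stride = line_len + 1          # characters per line plus its punctuation
    if missing_idx >= stride:      # missing character is in the second line
        return missing_idx - stride
    return missing_idx + stride    # missing character is in the first line

# "两个黄鹂鸣翠柳，一行白鹭上青天。": index 9 is 行, index 1 is 个
assert partner_index(9) == 1
assert partner_index(1) == 9
```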
Based on this analysis, and because a bidirectional LSTM easily discovers detailed information that is hard for humans to spot, and because the data set built here together with the bidirectional LSTM scheme already adopted in emotion recognition gives good feature-extraction performance, a bidirectional LSTM model of 8 LSTM units is chosen.
The specific scheme of this sub-model M2 is consistent with that of the emotion recognition model M1, so the details are not repeated; its output result is Sen.
Thus, the M2 model can work independently, outputting a semantic sequence that is fed into the later fusion. This meets the need of expert users to see, in actual work, the result of each angle analyzed independently, which facilitates their final comprehensive analysis.
Step 2.3, phonogram recognition model
Step 2.3.1, pinyin recognition
Word2Vec word-vector coding is adopted, combined with a Huffman tree algorithm. As shown in fig. 2, a standard CBOW (continuous bag-of-words) model comprises an Embedding layer, a hidden layer and an output layer; the hidden-layer outputs are spliced, normalized and then passed through the output layer. Since roughly 3,500 first-level Chinese characters are in common use, a softmax over a traditional fully connected output layer would be computationally expensive, so the coding format of the Huffman tree is adopted to reduce computation. Such combinations are well known and are not described in detail here; parameter details may be found at (https://blog.csdn.net/qq_45198339/arc/details/128772164).
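To see why the Huffman coding reduces the output cost relative to a flat softmax over ~3,500 characters, here is a minimal sketch of Huffman code-length computation over a toy character-frequency table; the frequencies are invented for illustration:

```python
import heapq

# With a Huffman tree, each character is reached by a path of binary
# decisions; frequent characters get short paths, so the expected number of
# decisions per prediction is far below the vocabulary size.

def huffman_code_lengths(freqs):
    # Each heap entry: (total_frequency, tie_breaker, {symbol: code_length})
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        fa, _, a = heapq.heappop(heap)
        fb, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (fa + fb, tie, merged))
        tie += 1
    return heap[0][2]

lengths = huffman_code_lengths({"的": 50, "一": 30, "酒": 12, "鹭": 5, "锄": 3})
# Frequent characters get shorter paths than rare ones.
assert lengths["的"] < lengths["锄"]
```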
After this step, the pinyin of the target Chinese character, without tone information, is obtained (e.g., for the Chinese character 我, "wo" is obtained without its 3rd-tone information). This result is then input into step 2.3.2, ping-ze tone recognition, and combined with the tone to form the full tonal pinyin.
The reason for this design is that the existing scheme is widely accepted, and its code and model are public; in effect it provides good toneless pinyin information.
Step 2.3.2, ping-ze tone recognition
The phonogram model mainly considers the tonal antithesis between the upper and lower lines of classical quatrains, especially the ping-ze (level and oblique tone) pattern. Such texts basically obey the following rules:
(1) Within each line, level and oblique tones alternate; within each couplet they are opposed; and the closing line of one couplet has the same pattern as the opening line of the next couplet.
(2) Three consecutive level tones must not occur at the end of a line.
(3) In a five-character level-start level-end line (second character level, last character level), and among the first, third and fifth characters of a seven-character oblique-start level-end line (second character oblique, last character level), there must be at least one level tone; otherwise the line commits "solitary level" (孤平).
Beyond the above there are further ping-ze rules. These rules can be handled either with a purely rule-based approach or with a deep-learning approach. A purely rule-based scheme can encode the rules in more detail, but the model then becomes bulky and struggles to capture subtle information that people also find hard to spot. Since the data set built here and the bidirectional LSTM scheme are already used in emotion recognition, and considering that the line-final characters of regulated verse share a unified level-tone pattern, a bidirectional LSTM model over a maximum 17-character distance is adopted as the phonogram recognition model M3.
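As a sketch of the rule-based alternative mentioned above, rule (2) can be checked directly. Treating Mandarin tones 1 and 2 as level (ping) and tones 3 and 4 as oblique (ze) is a simplifying assumption made for this example:

```python
# Rule (2): the end of a line must not carry three consecutive level tones.
# Tone codes follow the patent's scheme: 0 unknown, 1-4 the four tones.

LEVEL = {1, 2}   # simplifying assumption: tones 1 and 2 count as level

def ends_with_three_level(tone_line):
    tail = tone_line[-3:]
    return len(tail) == 3 and all(t in LEVEL for t in tail)

assert ends_with_three_level([3, 4, 2, 2, 2, 1, 1])       # violates the rule
assert not ends_with_three_level([3, 4, 2, 2, 2, 4, 1])   # oblique breaks it
```

A pure rule engine would need many such checks, which is why the patent prefers a learned model for the bulk of the pattern.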
In this way, the invention predicts the tone of the missing character using M3, taking only the four Mandarin tones as the result. The model's output is one of 5 classes: 0 (unknown), 1 (ˉ tone), 2 (ˊ tone), 3 (ˇ tone) and 4 (ˋ tone), and this result is used in the subsequent steps. In this scheme, although the model spans a maximum 17-character distance, the input takes only 5 values, not the 3,500-plus values of character identity, so the computation is much smaller. The specific scheme is shown in fig. 3:
As in M1, take for example 两个黄鹂鸣翠柳，一行白鹭上青天。 After character-level segmentation, the tone-annotated form of the poem is: 两(3, ˇ) 个(4, ˋ) 黄(2, ˊ) 鹂(2, ˊ) 鸣(2, ˊ) 翠(4, ˋ) 柳(3, ˇ) ，(0, unknown) 一(1, ˉ) 行(2, ˊ) 白(2, ˊ) 鹭(4, ˋ) 上(4, ˋ) 青(1, ˉ) 天(1, ˉ) 。(0, unknown), i.e. the tone sequence 3422243012244110;
Assuming that the missing character is 行, the sequence 3422243010244110 (with the missing position set to 0) is input into the 35 LSTM units of model M3; the 35 LSTM units form Bi_LSTM(·), a bidirectional long short-term memory network, and the encoded sentence is then sent into this network for feature extraction:
Temp=Bi_LSTM(Word_emb)
where word_emb=emb (Sen) represents the encoded sentence;
The extracted feature vector Temp is then sent into a Transformer network to extract global information, and finally the tone of the missing character is output:
tone=Transformer_Layer(Temp)
Step 2.3.3, the tone of the missing character is combined with its pinyin to obtain the tonal pinyin of the missing character;
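This combination step can be sketched as follows; the tone-number notation ("xing2" rather than a diacritic) and the helper name `with_tone` are illustrative assumptions:

```python
# Attach the tone class predicted by M3b to the toneless pinyin from M3a.
# Class 0 means the tone is unknown, in which case the pinyin is left as-is.

def with_tone(pinyin, tone_class):
    if tone_class == 0:
        return pinyin
    return f"{pinyin}{tone_class}"

assert with_tone("xing", 2) == "xing2"   # 行 with a rising tone
assert with_tone("wo", 0) == "wo"        # unknown tone stays toneless
```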
Thus, the M3a and M3b models can work independently, outputting the pinyin Pinyin and the tone tone_pz respectively; these can also be combined into complete tonal pinyin and fed into the later fusion. This meets the need of expert users to see, in actual work, the result of each angle analyzed independently, which facilitates their final comprehensive analysis.
Step 2.4, synthesis output module
After the prediction results of the 3 dimensions have been obtained, a synthesized output must be produced from them. The process comprises 2 parts, coding-and-splicing and prediction, and is shown as a whole in fig. 4.
Step 2.4.1, coding and splicing.
The original sentence and the previously predicted emotion, semantics and tonal pinyin are encoded. Emb(·) is the Embedding character-coding operation (see the paper Efficient Estimation of Word Representations in Vector Space): a character-coding scheme that uses low-dimensional vectors, optimized through neural-network training so that it can express the relatedness between words.
Word_emb1=Emb(Sen)
Word_emb2=Emb(emotion)
Word_emb3=Emb(Pinyin)
Wherein Sen, emotion and Pinyin respectively represent the results of semantic recognition, emotion recognition and phonogram recognition.
Then, splicing the obtained coding vectors to obtain a coding tensor input:
input=Cat(Word_emb1,Word_emb2,Word_emb3)
where Cat(·) represents the concatenation of the feature vectors.
The fused tensor input is sent into a Transformer for feature extraction, and the character missing from the ancient poem is finally predicted:
Output=Transformer(input)
The Transformer model was first proposed in the Google paper Attention is All You Need; here its parameters are set as follows:
Length of the input vector: 8;
Number of hidden neurons in the feedforward neural network: 2048;
Dimension of the query, key and value vectors: 512;
Number of stacked modules: 12;
Number of attention heads in multi-head attention: 8.
[1] Input representation (Input representation):
First, each word or token in the input sequence is converted into a vector representation, typically by word embedding: a character-coding scheme that uses low-dimensional vectors, optimized through neural-network training so that it can express the relatedness between words. Two parameters must be set: the first is the maximum size of the dictionary, and the second is the dimension of the desired output vector.
For example, the line 锄禾日当午, segmented by character, becomes: 锄, 禾, 日, 当, 午. Each character is sent to the coding layer; with the output dimension specified as 3, training the model yields a real-valued vector for each character. For example, the vector coding 锄 might be [0.2, 0.4, -0.1]. This yields a low-dimensional vector representation of the character, whose specific dimension is set in advance.
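A toy version of this embedding lookup, with random untrained vectors of dimension 3 standing in for learned ones:

```python
import numpy as np

# Toy embedding lookup matching the example above: a vocabulary of 5
# characters, output dimension 3. The vector values here are random
# stand-ins; in the real model they are learned during training.

vocab = {ch: i for i, ch in enumerate("锄禾日当午")}
rng = np.random.default_rng(0)
emb_matrix = rng.normal(size=(len(vocab), 3))   # (dictionary size, output dim)

def emb(ch):
    return emb_matrix[vocab[ch]]

vec = emb("锄")
assert vec.shape == (3,)   # each character becomes a 3-dimensional vector
```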
Here, prediction is performed using only the Transformer encoder part, which is mainly a stack of identical attention modules (Transformer_Layer); the processing of each module can be expressed as follows:
[2] Self-Attention calculation (Self-Attention):
At the heart of the Transformer encoder is the self-attention mechanism, which allows the model to build associations between different positions in the sequence. The input sequence undergoes three linear transformations to obtain the query, key and value representations.
An attention weight is calculated using the query vector for measuring the relevance of each position in the input sequence to the query position. This may be achieved by calculating the inner product of the query vector and all key vectors.
Attention weights are applied to the value vectors to obtain a weighted sum representing a contextual representation related to the query position. The calculation of the self-attention mechanism can be expressed as the following formula:
Attention(Q,K,V)=softmax(QK^T/√d_k)V
where Q, K and V represent the query vector, the key vector and the value vector, d_k represents the dimension of the key vectors, and softmax(·) represents the activation function.
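The scaled dot-product attention just described corresponds directly to the following NumPy sketch:

```python
import numpy as np

# Scaled dot-product attention.
# Shapes: Q is (n_queries, d_k), K is (n_keys, d_k), V is (n_keys, d_v).

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # inner products of queries and keys
    weights = softmax(scores, axis=-1)   # attention weights per query
    return weights @ V                   # weighted sum of the value vectors

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 16))
out = attention(Q, K, V)
assert out.shape == (4, 16)   # one context vector per query position
```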
[3] Multi-Head Attention (Multi-Head Attention):
The Transformer model uses multiple independent self-attention mechanisms, called multi-head attention. Each attention head performs different query, key and value linear transformations, allowing the model to capture different information in different representation subspaces. The flow of multi-head attention can be divided into the following steps:
dividing the input sequence data into a plurality of heads;
performing independent query, key, value linear transformation for each header;
performing a self-attention calculation on each head, resulting in an output of the head;
splice the outputs of all heads together and perform an output linear transformation.
The outputs of the multiple heads of attention are connected and linearly transformed to generate a final attention representation.
[4] Layer normalization (Layer Normalization):
After the self-attention calculation and multi-head attention, a layer normalization operation is performed. This normalizes the attention representation, making it easier to train and more stable.
[5] Feed forward neural network (Feed-forward Neural Network):
in each of the attention modules, a feed-forward neural network is also included. It performs a nonlinear transformation on the attention representation of each location to enhance the representation capabilities of the model.
[6] Residual connections (Residual Connections) and layer normalization (Layer Normalization):
in each attention module, residual connection and layer normalization are used to enhance the flow and gradient propagation of information.
In the prediction result, the original scheme fills in the character with the largest prediction probability as the missing character. However, because the semantics of text content are subjective, the character with the largest prediction probability is not necessarily the most suitable. Therefore, in the final selection, the characters with the highest prediction probabilities are all output: the five predicted candidate characters, together with the output values of the M1, M2, M3a and M3b models, are provided to the user, who intervenes manually to choose the character most suitable to fill the gap. This greatly improves the fluency of the text and its semantics.
On the question of the data set: when each model is trained independently, the invention randomly hides characters present in the existing ancient-text data set, and uses each hidden character's original emotion value, semantic value, pinyin value and ping-ze tone as the learning target for that character, forming paired learning data.
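The random-hiding construction can be sketched like this; the mask symbol and the helper name `make_pair` are illustrative, and the attribute labels (emotion, semantics, pinyin, tone) would be looked up from the intact character in practice:

```python
import random

# Randomly hide one character of an intact sentence and keep the hidden
# character as the training target.

def make_pair(sentence, rng=random):
    idx = rng.randrange(len(sentence))
    target = sentence[idx]
    masked = sentence[:idx] + "□" + sentence[idx + 1:]   # □ marks the gap
    return masked, target, idx

random.seed(0)
masked, target, idx = make_pair("锄禾日当午")
assert len(masked) == 5 and masked[idx] == "□"
assert target == "锄禾日当午"[idx]
```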
The whole learning process is divided into 2 sub-steps. In sub-step 1, the sub-models M1L, M2L and M3L are trained by independent learning on the data set. After sub-step 1, the 3 trained models are brought into the whole algorithm to train the later fusion part, obtaining the overall model MTL; this is the content of sub-step 2.
In this embodiment, the emotion recognition model M1 does not take part in this learning; using as input the emotion-recognition data set from the 2021 North China Electric Power University thesis "Research on short-text sentiment analysis algorithms based on deep learning" (Leng Yongcai), a suitable emotion recognition model M1L can be trained, where L denotes that the model has already been trained.
M2 and M3 (including M3a and M3b) remain to be learned. The invention randomly divides the whole data DataT1 of the ancient-text electronic data set Chinese-point into 3 parts: 30% for separate pre-learning, 50% for unified learning and 20% for testing. On the 30% portion, M2 and M3 are trained to obtain preliminary models; these are then substituted into the overall step 2.4 for whole-model learning, yielding the trained M2L and M3L (M3aL and M3bL), and finally the overall model MTL.
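The 30/50/20 random split can be sketched as:

```python
import random

# Split a data set into 30% pre-learning, 50% unified-learning and 20% test
# portions, after a seeded shuffle for reproducibility.

def split_dataset(items, seed=0):
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    a = int(0.3 * n)
    b = a + int(0.5 * n)
    return items[:a], items[a:b], items[b:]

pre, unified, test = split_dataset(range(100))
assert (len(pre), len(unified), len(test)) == (30, 50, 20)
assert sorted(pre + unified + test) == list(range(100))   # nothing lost
```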
1. The use process
In this process, the 35 characters surrounding the character to be recognized are input into M1L, M2L and M3L. The emotion value is obtained through M1L, the tonal pinyin through M3L, and the semantic vector through M2L. The overall MTL is then used to predict the specific character.
2. Feedback process
In this process, the invention collects the cases which, during use, text experts determined to be filled incorrectly, feeds them back into the learning process, and improves learning accuracy by increasing the number of learning passes over these error cases.

Claims (9)

1. A method for completing missing characters in ancient texts, characterized by comprising the following steps:
step 1, constructing an ancient-text data set;
step 2, constructing a missing-character recognition model for predicting the characters missing from an ancient text; the missing-character recognition model comprises an emotion recognition model, a semantic recognition model, a phonogram recognition model and a Transformer encoder;
the output ends of the emotion recognition model, the semantic recognition model and the phonogram recognition model are connected to the Transformer encoder;
step 3, training the emotion recognition model, the semantic recognition model and the phonogram recognition model with the ancient-text data set, and after that training the missing-character recognition model as a whole with the data set;
and step 4, inputting an ancient-text sentence containing a missing character into the trained missing-character recognition model, and predicting the missing character in the text.
2. The method according to claim 1, wherein in step 3 the emotion recognition model is trained with the ancient-text data set; specifically, an ancient-text sentence containing a missing character is input into the emotion recognition model for emotion recognition, and the emotion tendency of the sentence containing the missing character is output.
3. The method according to claim 2, wherein inputting the ancient-text sentence containing the missing character into the emotion recognition model for emotion recognition comprises the following steps:
step 311, the text to the left of the missing character is denoted Sen_left, and the text to the right of the missing character is denoted Sen_right; the Emb(·) character-coding operation is applied to Sen_left and Sen_right respectively to obtain two coding tensors left and right, expressed as:
left=Emb(Sen_left)
right=Emb(Sen_right)
step 312, the coding tensors left and right are input into the Bi_LSTM(·) bidirectional long short-term memory network for feature extraction, obtaining:
out_l=Bi_LSTM(left)
out_r=Bi_LSTM(right)
out_l and out_r are then spliced and output through the Softmax activation function to obtain the emotion tendency of the sentence containing the missing character:
emotion=Softmax(Cat(out_l,out_r))
wherein Cat(·) represents the splicing operation on the two feature vectors, and Softmax(·) is the activation function for the final classification.
4. The method according to claim 1, wherein in step 3 the semantic recognition model is trained with the ancient-text data set; specifically, an ancient-text sentence containing a missing character is input into the semantic recognition model for semantic recognition, and the semantic vector of the missing character in the sentence is output.
5. The method according to claim 1, wherein the semantic recognition model adopts a bidirectional LSTM model.
6. The method according to claim 1, wherein in step 3 the phonogram recognition model is trained with the ancient-text data set; specifically, an ancient-text sentence containing a missing character is input into the phonogram recognition model for pinyin recognition, and the tonal pinyin of the missing character is output.
7. The method according to claim 6, wherein inputting the ancient-text sentence containing the missing character into the phonogram recognition model for pinyin recognition comprises the following steps:
step 321, Word2Vec word-vector coding combined with the Huffman tree algorithm is used to carry out pinyin recognition on the ancient-text sentence containing the missing character, obtaining the pinyin information of the missing character; the pinyin information does not include tones;
step 322, the input sentence Sen is encoded with the Emb(·) character coding to obtain the coding vector Word_emb:
Word_emb=Emb(Sen)
step 323, the coding vector Word_emb is input into the bidirectional long short-term memory network Bi_LSTM for feature extraction, obtaining the feature vector Temp;
Temp=Bi_LSTM(Word_emb)
step 324, the extracted feature vector Temp is then sent into a Transformer network to extract global information, and the tone of the missing character is output:
tone=Transformer_Layer(Temp)
Step 325, combining the tone of the missing text with its Pinyin to obtain the Pinyin with tone of the missing text.
8. The method according to claim 1, wherein in step 3 the missing-character recognition model is trained as a whole; specifically, the emotion tendency, the semantic vector and the tonal pinyin of the missing character output by the emotion recognition model, the semantic recognition model and the phonogram recognition model are input into the Transformer encoder so as to predict the missing character in the text.
9. The method according to claim 8, wherein inputting the emotion tendency, the semantic vector Sen and the tonal pinyin of the missing character output by the emotion recognition model, the semantic recognition model and the phonogram recognition model into the Transformer encoder so as to predict the missing character in the text comprises the following steps:
step 331, the semantic vector Sen, the emotion tendency emotion and the tonal pinyin Pinyin are respectively output to an Embedding layer for coding; the resulting coding vectors are expressed as follows:
Word_emb1=Emb(Sen)
Word_emb2=Emb(emotion)
Word_emb3=Emb(Pinyin)
step 332, the coding vectors obtained in step 331 are spliced, expressed as follows:
input=Cat(Word_emb1,Word_emb2,Word_emb3)
wherein Cat(·) represents the splicing operation on the feature vectors;
step 333, the fused tensor input is sent into the Transformer encoder for feature extraction, and the missing character is predicted and output:
Output=Transformer(input)。
CN202311025114.7A 2023-08-15 2023-08-15 Method for complementing missing characters in cultural relics Pending CN117056859A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311025114.7A CN117056859A (en) 2023-08-15 2023-08-15 Method for complementing missing characters in cultural relics


Publications (1)

Publication Number Publication Date
CN117056859A true CN117056859A (en) 2023-11-14

Family

ID=88668739



Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110334210A (en) * 2019-05-30 2019-10-15 哈尔滨理工大学 A kind of Chinese sentiment analysis method merged based on BERT with LSTM, CNN
CN110516244A (en) * 2019-08-26 2019-11-29 西安艾尔洛曼数字科技有限公司 A kind of sentence Research on Automatic Filling based on BERT
CN111310441A (en) * 2020-01-20 2020-06-19 上海眼控科技股份有限公司 Text correction method, device, terminal and medium based on BERT (binary offset transcription) voice recognition
CN111708882A (en) * 2020-05-29 2020-09-25 西安理工大学 Transformer-based Chinese text information missing completion method
CN112580310A (en) * 2020-12-28 2021-03-30 河北省讯飞人工智能研究院 Missing character/word completion method and electronic equipment
CN113205817A (en) * 2021-07-06 2021-08-03 明品云(北京)数据科技有限公司 Speech semantic recognition method, system, device and medium
CN114118065A (en) * 2021-10-28 2022-03-01 国网江苏省电力有限公司电力科学研究院 Chinese text error correction method and device in electric power field, storage medium and computing equipment
CN115374784A (en) * 2022-07-11 2022-11-22 北京理工大学 Chinese named entity recognition method based on multi-mode information selective fusion
CN115437626A (en) * 2022-08-17 2022-12-06 北京航空航天大学 OCL statement automatic generation method and device based on natural language
CN115438154A (en) * 2022-09-19 2022-12-06 上海大学 Chinese automatic speech recognition text restoration method and system based on representation learning


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DONG-HO LEE ET AL.: "Improving Text Auto-Completion with Next Phrase Prediction", arXiv:2109.07067v1, 15 September 2021, pages 1-5 *
ZIHAO YE ET AL.: "Textual emotion recognition method based on ALBERT-BiLSTM model and SVM-NB classification", Soft Computing, 28 February 2023, pages 5063-5075 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination