CN116415582A - Text processing method, text processing device, computer readable storage medium and electronic equipment - Google Patents

Text processing method, text processing device, computer readable storage medium and electronic equipment

Info

Publication number
CN116415582A
CN116415582A
Authority
CN
China
Prior art keywords
pinyin
target
word
vector
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310591400.3A
Other languages
Chinese (zh)
Other versions
CN116415582B (en)
Inventor
赵韡
刁晓林
张海波
曹旭
王玉鑫
任立新
廉晓丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuwai Hospital of CAMS and PUMC
Original Assignee
Fuwai Hospital of CAMS and PUMC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuwai Hospital of CAMS and PUMC filed Critical Fuwai Hospital of CAMS and PUMC
Priority to CN202310591400.3A priority Critical patent/CN116415582B/en
Publication of CN116415582A publication Critical patent/CN116415582A/en
Application granted granted Critical
Publication of CN116415582B publication Critical patent/CN116415582B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063 Training
    • G10L 2015/0631 Creating reference templates; Clustering
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a text processing method, a text processing device, a computer readable storage medium and electronic equipment. The method comprises the following steps: obtaining a target word, wherein the target word is a word extracted from a voice transcription text; determining the pinyin of the target word, and representing the pinyin as a vector to obtain a target vector; obtaining a plurality of standard words and the pinyin vectors of the plurality of standard words, and screening at least one candidate word from the plurality of standard words according to the target vector and the pinyin vectors of the plurality of standard words; and calculating a first edit distance between the target word and each candidate word, and screening a target matching word from the at least one candidate word according to the first edit distance, wherein the target matching word is used for replacing the target word in the voice transcription text. The method and the device address the technical problem in the related art of low accuracy when normalizing words in a voice transcription text.

Description

Text processing method, text processing device, computer readable storage medium and electronic equipment
Technical Field
The present invention relates to the field of artificial intelligence, and in particular, to a text processing method, apparatus, computer readable storage medium, and electronic device.
Background
Speech is one of the main media for conveying information. In speech-related applications (e.g., medical, dining, finance), a basic and important task is to extract key information from speech. For example, a hospital follow-up robot records and tracks a patient's physical state and medication through voice dialogue; during the dialogue, the robot needs to recognize key information such as the patient's clinical manifestations, diseases and medicines, so that the key information and the dialogue script can be saved in a targeted manner. To implement the above logic, as shown in fig. 1, the solution in the related art is: first, the speech audio is transcribed into text through ASR (Automatic Speech Recognition); then, important information in the transcribed text is extracted by an extraction model to obtain a variant text; finally, the variant text is normalized to obtain the canonical expression of the text.
In practice, the transcribed text contains serious variant phenomena such as homophones, dialect pronunciations, extra words and missing words, and these phenomena occur frequently, especially in professional fields and with dialects. At present, the related art usually screens for the canonical text corresponding to a variant text based on character edit distance, which cannot fully handle the complex and changeable variant problem and therefore suffers from low normalization accuracy.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiment of the invention provides a text processing method, a text processing device, a computer readable storage medium and electronic equipment, which at least solve the technical problem in the related art of low accuracy when normalizing words in a voice transcription text.
According to an aspect of an embodiment of the present invention, there is provided a text processing method including: obtaining a target word, wherein the target word is a word extracted from a voice transcription text; determining the pinyin of the target word, and representing the pinyin as a vector to obtain a target vector; obtaining a plurality of standard words and the pinyin vectors of the plurality of standard words, and screening at least one candidate word from the plurality of standard words according to the target vector and the pinyin vectors of the plurality of standard words; and calculating a first edit distance between the target word and each candidate word, and screening a target matching word from the at least one candidate word according to the first edit distance, wherein the target matching word is used for replacing the target word in the voice transcription text.
Further, the text processing method further comprises the following steps: splitting the pinyin of the target word to obtain the pinyin of each character in the target word; respectively extracting the characteristics of the pinyin of each character to obtain a characteristic vector corresponding to each character; and carrying out feature fusion on the feature vector of each character to obtain a target vector.
Further, the text processing method further comprises the following steps: for each character, representing each letter in the pinyin of the current character as a vector to obtain the letter vector of each letter, and screening out the letter vector matched with the initial letter; extracting features from the letter vectors of all letters to obtain the full-letter feature vector of the current character; representing the pinyin of the current character as a vector to obtain a full-pinyin vector; and splicing the full-letter feature vector, the initial-letter vector and the full-pinyin vector to obtain the feature vector of the current character.
Further, the text processing method further comprises the following steps: for each standard word, performing target processing on the pinyin of the current standard word to obtain a positive sample matched with the current standard word, wherein the target processing comprises at least one of the following: deletion, insertion and replacement; for each standard word, randomly sampling N standard words other than the current standard word from the plurality of standard words to obtain negative samples matched with the current standard word, wherein N is a positive integer; taking each standard word and its matched positive and negative samples as a training sample to construct a training sample set; and obtaining an initial vector model, and training the initial vector model according to a loss function and the training sample set in a contrastive learning manner to obtain a target vector model.
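The training sample construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names (`perturb_pinyin`, `build_training_set`) and the sample layout are hypothetical, and each positive sample is produced by a single random deletion, insertion or replacement on the pinyin string.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def perturb_pinyin(pinyin: str, rng: random.Random) -> str:
    """Apply one random edit (deletion, insertion or replacement) to a pinyin string."""
    letters = list(pinyin)
    op = rng.choice(["delete", "insert", "replace"])
    pos = rng.randrange(len(letters))
    if op == "delete" and len(letters) > 1:
        del letters[pos]
    elif op == "insert":
        letters.insert(pos, rng.choice(ALPHABET))
    else:
        letters[pos] = rng.choice(ALPHABET)
    return "".join(letters)

def build_training_set(pinyin_of: dict, n_neg: int, seed: int = 0) -> list:
    """One training sample per standard word: the word, a perturbed positive
    pinyin, and N other standard words randomly sampled as negatives."""
    rng = random.Random(seed)
    words = list(pinyin_of)
    samples = []
    for word in words:
        positive = perturb_pinyin(pinyin_of[word], rng)
        negatives = rng.sample([w for w in words if w != word], n_neg)
        samples.append({"word": word, "positive": positive, "negatives": negatives})
    return samples
```

In a contrastive setup, the loss would then pull the vector of each positive pinyin toward its standard word's vector and push the negatives away.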
Further, the text processing method further comprises the following steps: obtaining a plurality of dialect-Mandarin parallel pairs, wherein each dialect-Mandarin parallel pair comprises a dialect sentence and a Mandarin sentence, and the dialect sentence and the Mandarin sentence each consist of pinyin; for each dialect-Mandarin parallel pair, determining whether at least one pinyin co-occurrence pair exists, wherein the pinyin co-occurrence pair comprises a dialect pinyin and a Mandarin pinyin, the dialect pinyin and the Mandarin pinyin are different pinyins appearing in the current dialect-Mandarin parallel pair, and the dialect pinyin and the Mandarin pinyin do not appear in the same sentence of the current dialect-Mandarin parallel pair; and in the case that pinyin co-occurrence pairs exist in the plurality of dialect-Mandarin parallel pairs, replacing any pinyin in the current standard word with a first target dialect pinyin according to the pinyin co-occurrence pairs of the plurality of dialect-Mandarin parallel pairs to obtain a positive sample, wherein the replaced pinyin and the first target dialect pinyin belong to the same pinyin co-occurrence pair.
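The co-occurrence-pair detection above can be sketched as a set difference over each parallel sentence pair. A minimal sketch, assuming each sentence is already given as a list of pinyin syllables; the function name is hypothetical.

```python
def pinyin_cooccurrence_pairs(parallel_pairs):
    """parallel_pairs: list of (dialect_syllables, mandarin_syllables).

    A pinyin co-occurrence pair is (dialect pinyin, Mandarin pinyin) where the
    two pinyins differ and each appears in only one sentence of the parallel
    pair (i.e., never in the same sentence).
    """
    pairs = []
    for dialect, mandarin in parallel_pairs:
        only_dialect = set(dialect) - set(mandarin)   # pinyin unique to the dialect side
        only_mandarin = set(mandarin) - set(dialect)  # pinyin unique to the Mandarin side
        for d in sorted(only_dialect):
            for m in sorted(only_mandarin):
                pairs.append((d, m))
    return pairs
```

For example, a dialect rendering "hu jian" aligned with Mandarin "fu jian" yields the single co-occurrence pair ("hu", "fu").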
Further, the text processing method further comprises the following steps: for each pinyin co-occurrence pair, counting the number of times the pinyin co-occurrence pair appears in the plurality of dialect-Mandarin parallel pairs; determining the pinyin co-occurrence pairs whose counts are greater than a preset threshold as target pinyin co-occurrence pairs; and replacing any pinyin in the current standard word with a second target dialect pinyin according to a target pinyin co-occurrence pair to obtain a positive sample, wherein the replaced pinyin and the second target dialect pinyin belong to the same target pinyin co-occurrence pair.
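The frequency filtering and the replacement step can be sketched together. A minimal illustration with hypothetical function names; replacing only the first matching syllable is an assumption, since the text says "any one pinyin".

```python
from collections import Counter

def target_cooccurrence_pairs(cooccurrence_pairs, min_count):
    """Keep only the pinyin co-occurrence pairs seen more than min_count times."""
    counts = Counter(cooccurrence_pairs)
    return {pair for pair, c in counts.items() if c > min_count}

def dialect_positive(standard_syllables, target_pairs):
    """Replace one syllable of the standard word's pinyin with its dialect
    counterpart from a target co-occurrence pair, yielding a positive sample."""
    dialect_of = {mandarin: dialect for dialect, mandarin in target_pairs}
    out = list(standard_syllables)
    for i, syllable in enumerate(out):
        if syllable in dialect_of:
            out[i] = dialect_of[syllable]
            break
    return out
```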
Further, the text processing method further comprises the following steps: obtaining a plurality of Mandarin sentences, and collecting the pinyins appearing in the plurality of Mandarin sentences to obtain a pinyin set, wherein the Mandarin sentences consist of pinyin; combining any two pinyins in the pinyin set to obtain a plurality of near-sound pinyin pairs; calculating a second edit distance between the pinyins of each near-sound pinyin pair, and comparing each second edit distance with a preset edit distance to obtain a comparison result for each near-sound pinyin pair; determining, according to the comparison results, the near-sound pinyin pairs whose second edit distance is smaller than the preset edit distance as target near-sound pinyin pairs; and replacing any pinyin in the current standard word with a near-sound pinyin according to a target near-sound pinyin pair to obtain a positive sample, wherein the replaced pinyin and the near-sound pinyin belong to the same target near-sound pinyin pair.
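The near-sound pair mining above can be sketched by pairing every two distinct pinyins and keeping those within the preset edit distance. A minimal sketch with hypothetical names; the edit distance is the classic Levenshtein dynamic program.

```python
from itertools import combinations

def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def near_sound_pairs(mandarin_sentences, max_distance):
    """Collect the pinyin set from the sentences, pair up every two distinct
    pinyins, and keep the pairs whose edit distance is below max_distance."""
    pinyin_set = {p for sentence in mandarin_sentences for p in sentence}
    return {(a, b) for a, b in combinations(sorted(pinyin_set), 2)
            if edit_distance(a, b) < max_distance}
```

With a threshold of 2, "zhang"/"zang" and "shuan"/"suan" (one deleted letter each) qualify, while unrelated syllables do not.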
Further, the text processing method further comprises the following steps: calculating the similarity between the target vector and the pinyin vector of each standard word to obtain a similarity score between the target vector and the pinyin vector of each standard word; screening out the similarity scores greater than a preset threshold to obtain at least one target similarity score; and determining the standard words matched with the at least one target similarity score as the at least one candidate word.
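The similarity screening above can be sketched with cosine similarity, a common choice for comparing embedding vectors; the source does not fix the similarity function, so treating it as cosine is an assumption, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def screen_candidates(target_vector, pinyin_vectors, threshold):
    """pinyin_vectors: {standard word: pinyin vector}.
    Keep the standard words whose similarity score exceeds the threshold."""
    return [word for word, vec in pinyin_vectors.items()
            if cosine(target_vector, vec) > threshold]
```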
Further, the text processing method further comprises the following steps: calculating the edit distance between the pinyin of the target word and the pinyin of each candidate word to obtain a first sub edit distance; calculating the edit distance between the characters of the target word and the characters of each candidate word to obtain a second sub edit distance; and calculating the sum of the first sub edit distance and the second sub edit distance to obtain the first edit distance.
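The combined distance above is simply the sum of two Levenshtein distances, one over pinyin strings and one over character strings. A minimal sketch; `first_edit_distance` is a hypothetical name for the quantity the text calls the first edit distance.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def first_edit_distance(target_chars, target_pinyin, cand_chars, cand_pinyin):
    """Sum of the pinyin edit distance (first sub edit distance) and the
    character edit distance (second sub edit distance)."""
    return edit_distance(target_pinyin, cand_pinyin) + edit_distance(target_chars, cand_chars)
```

For a homophone variant such as 歪州血栓 for 外周血栓, the pinyin distance is 0 while the character distance is 2, so pinyin-identical candidates still rank well.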
According to another aspect of the embodiments of the present invention, there is also provided a text processing apparatus including: an acquisition module, configured to acquire a target word, wherein the target word is a word extracted from a voice transcription text; a determining module, configured to determine the pinyin of the target word and represent the pinyin as a vector to obtain a target vector; a first screening module, configured to acquire a plurality of standard words and the pinyin vectors of the plurality of standard words, and screen at least one candidate word from the plurality of standard words according to the target vector and the pinyin vectors of the plurality of standard words; and a second screening module, configured to calculate a first edit distance between the target word and each candidate word and screen a target matching word from the at least one candidate word according to the first edit distance, wherein the target matching word is used for replacing the target word in the voice transcription text.
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium having a computer program stored therein, wherein the computer program is configured to execute the above-described text processing method when run.
According to another aspect of an embodiment of the present invention, there is also provided an electronic device including one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the text processing method described above.
In the embodiment of the invention, a standard word matched with a target word is determined according to the characteristics of the target word in the pinyin dimension. The target word, which is a word extracted from a voice transcription text, is obtained; the pinyin of the target word is determined and represented as a vector to obtain a target vector; a plurality of standard words and the pinyin vectors of the plurality of standard words are obtained, and at least one candidate word is screened from the plurality of standard words according to the target vector and the pinyin vectors of the plurality of standard words; a first edit distance between the target word and each candidate word is then calculated, and a target matching word, which is used for replacing the target word in the voice transcription text, is screened from the at least one candidate word according to the first edit distance.
It is easy to note that in the above process, the target vector is obtained by representing the pinyin of the target word as a vector, so that the characteristics of the target word in the pinyin dimension are determined. Further, screening at least one candidate word from the plurality of standard words according to the target vector and the pinyin vectors of the plurality of standard words can effectively select, from the standard words, candidate words whose pinyin is similar to that of the target word. This avoids the problem in the related art that homophones and dialects cannot be handled when candidate words are selected based on semantic vectors, and improves the accuracy of normalizing words in the voice transcription text. Furthermore, screening the target matching word from the at least one candidate word based on the edit distance enables effective determination of the target matching word, which further improves normalization accuracy.
Therefore, the scheme provided by the application represents the pinyin of the target word as a vector and determines the standard word matched with the target word according to the obtained vector, thereby achieving the technical effect of improving normalization accuracy and solving the technical problem in the related art of low accuracy when normalizing words in a voice transcription text.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:
FIG. 1 is a schematic diagram of an alternative related art text processing method;
FIG. 2 is a schematic diagram of an alternative text processing method according to an embodiment of the invention;
FIG. 3 is a flow chart of an alternative text processing method according to an embodiment of the invention;
FIG. 4 is a schematic diagram of an alternative object vector model for determining full-letter feature vectors in accordance with an embodiment of the present invention;
FIG. 5 is a schematic diagram of an alternative object vector model determination operation according to an embodiment of the present invention;
FIG. 6 is a training flow diagram of an alternative target vector model according to an embodiment of the invention;
FIG. 7 is a schematic diagram of an alternative text processing device according to an embodiment of the invention;
fig. 8 is a schematic diagram of an alternative electronic device according to an embodiment of the invention.
Description of the embodiments
In order that those skilled in the art will better understand the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
In accordance with an embodiment of the present invention, an embodiment of a text processing method is provided. It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, such as by a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that shown herein.
The text processing method in this embodiment can be applied to an intelligent voice dialogue scene. In such a scene, the real-time voice stream is transcribed into text by an ASR module during the dialogue, key information words are then obtained through an extraction model, and the key information words are normalized (i.e., standardized) by the text processing method of this embodiment to obtain the standard words corresponding to the key information words, so that the key information words can be replaced with the corresponding standard words, which facilitates advancing the conversation and saving the key points.
FIG. 2 is a schematic diagram of an alternative text processing method according to an embodiment of the invention, as shown in FIG. 2, comprising the steps of:
In step S201, a target word is obtained, where the target word is a word extracted from a voice transcription text.
Alternatively, the target word may be obtained by an electronic device, an application system, a server, or the like. In this embodiment, for convenience of description, a target system is defined and used as the subject that executes the text processing method; the target word is acquired through the target system. The target word is the word to be normalized (i.e., standardized), which corresponds to the aforementioned key information word and also to the variant text.
Step S202, determining the pinyin of the target word, and carrying out vector representation on the pinyin to obtain a target vector.
In step S202, the target system may convert the target word into pinyin and then vector the pinyin. FIG. 3 is a flowchart of an alternative text processing method according to an embodiment of the present invention, as shown in FIG. 3, in the vectorization process, the target system may use a pre-trained target vector model to vectorize the pinyin of the target word, so as to implement a vector representation of the pinyin of the target word, thereby obtaining a target vector.
It should be noted that representing the pinyin of the target word as a vector yields the target vector, which captures the characteristics of the target word in the pinyin dimension.
Step S203, a plurality of standard words and the pinyin vectors of the standard words are obtained, and at least one candidate word is selected from the standard words according to the target vector and the pinyin vectors of the standard words.
Optionally, the plurality of standard words and the pinyin vectors of the plurality of standard words are stored in a pre-constructed standard word vector index library. In the process of constructing the standard word vector index library, the standard words can be preset by staff. Further, the target system may convert each standard word into pinyin and, as shown in fig. 3, vectorize the pinyin of each standard word using the pre-trained target vector model to obtain the pinyin vector of the standard word; the pinyin vectors of all standard words together form the standard word vector index library.
In step S203, as shown in fig. 3, the target system may recall candidate words using the target vector against the pinyin vectors of the standard words in the standard word vector index library. Specifically, this step recalls similar vectors to obtain a sorted set of pinyin vectors of standard words similar to the target vector, and then selects the standard words corresponding to the first z vectors in the sorted set as candidate words, where z is a positive integer.
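The top-z recall above can be sketched as a sort over similarity scores. A minimal illustration assuming cosine similarity and a plain dictionary index (hypothetical names); a production system could instead use an approximate-nearest-neighbor library such as FAISS for large index libraries.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def recall_top_z(target_vector, index, z):
    """index: {standard word: pinyin vector}.
    Return the z standard words most similar to the target vector."""
    ranked = sorted(index, key=lambda w: cosine(target_vector, index[w]), reverse=True)
    return ranked[:z]
```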
It should be noted that screening at least one candidate word from the plurality of standard words according to the target vector and the pinyin vectors of the plurality of standard words can effectively select, from the standard words, candidate words whose pinyin is similar to that of the target word, so that homophone and dialect problems can be effectively handled, which improves the accuracy of normalizing words in the voice transcription text.
Step S204, calculating a first edit distance between the target word and each candidate word, and screening a target matching word from the at least one candidate word according to the first edit distance, wherein the target matching word is used for replacing the target word in the voice transcription text.
In step S204, after the at least one candidate word is selected, as shown in fig. 3, the target system may calculate the edit distance between the target word and each candidate word on pinyin to obtain the first edit distance. Optionally, the target system may instead calculate the edit distance between the target word and each candidate word on characters to obtain the first edit distance, or calculate the edit distance on both pinyin and characters to obtain the first edit distance.
Further, after determining the first edit distance between the target word and each candidate word, the target system may, as shown in fig. 3, determine the candidate word with the smallest edit distance from the target word as the target matching word.
It should be noted that determining the target matching word that best matches the target word according to the edit distance, after the candidate standard words have been determined, enables efficient determination of the target matching word.
Based on the schemes defined in steps S201 to S204, in the embodiment of the present invention a standard word matched with a target word is determined according to the characteristics of the target word in the pinyin dimension. The target word, which is a word extracted from a voice transcription text, is obtained; the pinyin of the target word is determined and represented as a vector to obtain a target vector; a plurality of standard words and the pinyin vectors of the plurality of standard words are obtained, and at least one candidate word is screened from the plurality of standard words according to the target vector and the pinyin vectors of the plurality of standard words; a first edit distance between the target word and each candidate word is then calculated, and a target matching word, which is used for replacing the target word in the voice transcription text, is screened from the at least one candidate word according to the first edit distance.
It is easy to note that in the above process, the target vector is obtained by representing the pinyin of the target word as a vector, so that the characteristics of the target word in the pinyin dimension are determined. Further, screening at least one candidate word from the plurality of standard words according to the target vector and the pinyin vectors of the plurality of standard words can effectively select, from the standard words, candidate words whose pinyin is similar to that of the target word. This avoids the problem in the related art that homophones and dialects cannot be handled when candidate words are selected based on semantic vectors, and improves the accuracy of normalizing words in the voice transcription text. Furthermore, screening the target matching word from the at least one candidate word based on the edit distance enables effective determination of the target matching word, which further improves normalization accuracy.
Therefore, the scheme provided by the application represents the pinyin of the target word as a vector and determines the standard word matched with the target word according to the obtained vector, thereby achieving the technical effect of improving normalization accuracy and solving the technical problem in the related art of low accuracy when normalizing words in a voice transcription text.
In an alternative embodiment, in the process of representing the pinyin as a vector to obtain the target vector, the target system may split the pinyin of the target word to obtain the pinyin of each character in the target word, then extract features from the pinyin of each character to obtain a feature vector corresponding to each character, and then fuse the feature vectors of the characters to obtain the target vector.
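The per-character split above can be sketched as a lookup over a character-to-pinyin table. The table below is a hypothetical fragment for illustration only; a real system could obtain per-character pinyin with a library such as pypinyin.

```python
# Hypothetical char-to-pinyin table covering only the example word.
CHAR_PINYIN = {"外": "wai", "周": "zhou", "血": "xue", "栓": "shuan"}

def split_pinyin(word: str) -> list:
    """Per-character pinyin of the target word."""
    return [CHAR_PINYIN[ch] for ch in word]
```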
Optionally, in this embodiment, the target system represents the pinyin as a vector using the pre-trained target vector model. Specifically, the target system first splits the pinyin of the target word to obtain the pinyin of each character in the target word. For example, the pinyin "waizhouxueshuan" of 外周血栓 ("peripheral thrombus") is split into the pinyin "wai" of the character 外, the pinyin "zhou" of the character 周, the pinyin "xue" of the character 血, and the pinyin "shuan" of the character 栓. Then, the target system can input the pinyin of each character into the target vector model to extract features from the pinyin of each character and obtain the feature vector corresponding to each character. The target vector model uses a dedicated encoder structure to model pinyin features; this structure extracts the feature vector of each character by jointly modeling three aspects: the full pinyin, the initial letter, and all letters. Fig. 4 is a schematic diagram of an alternative target vector model determining the full-letter feature vector according to an embodiment of the present invention; as shown in fig. 4, the structure combines a convolutional neural network (CNN) layer, a pooling layer and a random inactivation (Dropout) layer to improve model robustness and generalization.
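The CNN-plus-pooling portion of the encoder in fig. 4 can be illustrated with a toy sketch. This is not the patented model: it uses a fixed averaging kernel instead of learned filters, a deterministic toy letter embedding instead of a learned embedding table, and omits the Dropout layer (a training-time regularizer with no effect at inference).

```python
import math

def letter_embedding(letter: str, dim: int = 4) -> list:
    """Deterministic toy embedding per letter (stand-in for a learned table)."""
    code = ord(letter) - ord("a")
    return [math.sin((code + 1) * (k + 1)) for k in range(dim)]

def full_letter_feature(pinyin: str, window: int = 2) -> list:
    """1-D convolution over letter embeddings (fixed averaging kernel here),
    followed by max-over-time pooling per dimension."""
    embeddings = [letter_embedding(c) for c in pinyin]
    dim = len(embeddings[0])
    convolved = [
        [sum(e[k] for e in embeddings[i:i + window]) / window for k in range(dim)]
        for i in range(len(embeddings) - window + 1)
    ]
    return [max(row[k] for row in convolved) for k in range(dim)]
```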
Further, FIG. 5 is a schematic diagram of an alternative operation of determining a target vector by using a target vector model according to an embodiment of the present invention. As shown in FIG. 5, after obtaining the feature vector of each character (i.e., x1, x2, x3, x4 in FIG. 5), the target vector model sends these feature vectors into a random inactivation (Dropout) layer and a long short-term memory (LSTM) layer for sequence modeling, and then processes the result sequentially through a convolutional neural network (CNN) layer, a Pooling layer, a Dropout layer and a fully connected (Dense) layer to obtain the target vector.
It should be noted that extracting features from the pinyin of each character in the target word separately effectively determines the features of each character in the pinyin dimension, thereby improving the accuracy of the resulting target vector of the target word.
In an alternative embodiment, in the process of extracting features from the pinyin of each character to obtain per-character feature vectors, the target system may, for each character: represent each letter in the pinyin of the current character as a letter vector and screen out the letter vector matching the initial; extract features from the letter vectors to obtain the full-letter feature vector of the current character; represent the whole pinyin of the current character as a full-pinyin vector; and finally splice the full-letter feature vector, the initial-matched letter vector and the full-pinyin vector to obtain the feature vector of the current character.
Alternatively, in the present embodiment, for each character, the letters in the pinyin of the character can be expressed as xi(c1, c2, c3, ..., cn), where xi denotes the i-th character and c1-cn are the pinyin letters of xi. For example, the pinyin of "周" (zhou) includes the pinyin letters 'z', 'h', 'o', 'u'.
Alternatively, for each character, the target vector model may first represent each letter in the pinyin of the current character as a letter vector, send the letter vectors into the random inactivation (Dropout) layer shown in FIG. 4, and then process them sequentially through the convolutional neural network (CNN) layer, Pooling layer, Dropout layer and fully connected (Dense) layer shown in FIG. 4 to obtain the full-letter feature vector of the current character, denoted [z, h, o, u]. Alternatively, after obtaining the letter vector of each letter of the current character, the target vector model may screen out the letter vector matching the initial; e.g., for "周" ('z', 'h', 'o', 'u'), 'z' is the initial. Optionally, the target vector model may also represent the entire pinyin of the current character, i.e., "zhou", as a vector to obtain the full-pinyin vector.
Further, the target vector model may splice the obtained full-letter feature vector, the initial-matched letter vector and the full-pinyin vector, thereby obtaining the feature vector of the current character "周": ([z, h, o, u], z, zhou).
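A minimal sketch of the splicing described above. The embeddings are deterministic hash-based stand-ins for learned weights, and mean pooling stands in for the CNN+Pooling encoder of FIG. 4; only the assembly (full-letter feature + initial-letter vector + full-pinyin vector) mirrors the text.

```python
import hashlib

DIM = 4  # toy embedding width

def embed(token):
    """Deterministic toy embedding derived from a hash (stand-in for learned weights)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:DIM]]

def mean_pool(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def char_feature(syllable):
    letter_vecs = [embed(c) for c in syllable]   # one vector per pinyin letter
    full_letter = mean_pool(letter_vecs)         # stand-in for the CNN+Pooling encoder
    initial = letter_vecs[0]                     # letter vector matching the initial
    full_pinyin = embed(syllable)                # vector of the whole syllable
    return full_letter + initial + full_pinyin   # splice the three parts

vec = char_feature("zhou")
print(len(vec))   # 12: three concatenated 4-dimensional parts
```

Note that in real pinyin the initial may span two letters ("zh"); taking the first letter here is a simplification consistent with the "z for zhou" example in the text.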
Still further, if the target word is "外周血栓" (peripheral thrombus), then after obtaining the feature vector of each character, as shown in FIG. 5, the target vector model sends "x1: 外 ([w, a, i], w, wai), x2: 周 ([z, h, o, u], z, zhou), x3: 血 ([x, u, e], x, xue), x4: 栓 ([s, h, u, a, n], s, shuan)" into a random inactivation (Dropout) layer and a long short-term memory (LSTM) layer for sequence modeling, and then processes the result sequentially through a convolutional neural network (CNN) layer, a Pooling layer, a Dropout layer and a fully connected (Dense) layer to obtain the final target vector.
It should be noted that, by designing the target vector model based on multiple angles (full pinyin, initial and full letter), the robustness and generalization of the model are effectively improved, so that the standardization accuracy of target words is further improved.
In an alternative embodiment, the pinyin is represented by a vector by a target vector model, so as to obtain a target vector, wherein the target vector model is trained by the following method: for each standard word, the target system can perform target processing on pinyin of the current standard word to obtain a positive sample matched with the current standard word, then randomly sampling N standard words except the current standard word from a plurality of standard words to obtain a negative sample matched with the current standard word, then constructing a training sample set by taking each standard word and the positive sample and the negative sample matched with the standard word as a training sample, then obtaining an initial vector model, and training the initial vector model according to a loss function and the training sample set based on a comparison learning mode to obtain a target vector model. Wherein the target process comprises at least one of: deleting, inserting and replacing, wherein N is a positive integer.
Optionally, fig. 6 is a training flowchart of an optional target vector model according to an embodiment of the present invention. As shown in fig. 6, in constructing the training sample set, the target system may process the pinyin of the current standard word at both the character level and the word level to obtain the positive sample matching the current standard word. Specifically, at the character level, the target system may apply operations such as randomly taking only the initial letter, randomly inserting a letter, randomly taking a sub-pinyin string, or randomly deleting a letter to the pinyin of a certain character of the current standard word. For example, if the target word is "peripheral thrombus" (外周血栓), the pinyin obtained after taking a random initial may be "waizxueshuan", where only the initial "z" of "zhou" is kept; after randomly inserting a letter, the pinyin may be "waizhouxueshuang", where the letter "g" is appended at the end; after randomly taking a sub-pinyin string, the pinyin may be "waizhouxueshu", where only the sub-string "shu" of "shuan" is kept; and after randomly deleting a letter, the pinyin may be "waizhouxueshua", where the final letter "n" is removed. It should be noted that randomly taking the initial letter and randomly inserting letters into the pinyin of a character of the current standard word address the missing-word and extra-word problems when constructing positive and negative samples, while randomly taking sub-pinyin strings and randomly deleting letters allow harmonic sounds and dialects to be covered when constructing positive and negative samples.
Optionally, at the word level, the target system may delete the entire pinyin of a certain character of the current standard word, or randomly replace the pinyin of a certain character, optionally according to a pre-constructed dialect comparison table or harmonic comparison table. For example, if the target word is "peripheral thrombus", after deleting the entire pinyin of a certain character, the resulting pinyin may be "waizhoushuan", where the pinyin "xue" of the character "血" (blood) is deleted; after randomly replacing the pinyin of a certain character, the resulting pinyin may be "waiguoxueshuan", where the pinyin "zhou" of the character "周" is replaced by "guo". In this embodiment, a single processing manner at the character level or word level, or several of them combined, may be applied, and only one positive sample is constructed for each standard word. By randomly deleting pinyin and randomly replacing pinyin at the word level, harmonic sounds and dialects can both be covered when constructing positive and negative samples.
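The character-level and word-level perturbations above can be sketched as one augmentation step over a syllable list. The operations and alphabet are simplified illustrations of the patent's processing (e.g. "random prefix" stands in for a random sub-pinyin string), not its exact implementation.

```python
import random

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def augment(syllables, rng):
    """Apply one randomly chosen perturbation: a character-level edit inside one
    syllable, or a word-level edit on the syllable list."""
    out = list(syllables)
    i = rng.randrange(len(out))
    op = rng.choice(["initial", "insert", "substring", "delete_letter", "drop_syllable"])
    if op == "initial":                                  # keep only the initial letter
        out[i] = out[i][0]
    elif op == "insert":                                 # insert a random letter
        p = rng.randrange(len(out[i]) + 1)
        out[i] = out[i][:p] + rng.choice(LETTERS) + out[i][p:]
    elif op == "substring":                              # random prefix as sub-pinyin string
        out[i] = out[i][:rng.randrange(1, len(out[i]) + 1)]
    elif op == "delete_letter" and len(out[i]) > 1:      # drop one letter
        p = rng.randrange(len(out[i]))
        out[i] = out[i][:p] + out[i][p + 1:]
    elif op == "drop_syllable" and len(out) > 1:         # word level: drop a syllable
        del out[i]
    return out

rng = random.Random(0)
for _ in range(3):
    print(augment(["wai", "zhou", "xue", "shuan"], rng))
```

Seeding the generator makes the augmentation reproducible across runs, which helps when debugging sample construction.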
Optionally, as shown in fig. 6, in the process of constructing the training sample set, the target system may randomly sample N standard words other than the current standard word from the plurality of standard words, thereby obtaining a plurality of negative samples matching the current standard word. Further, after obtaining the positive and negative samples corresponding to each standard word, the target system can take each standard word together with its matching positive and negative samples as one training sample, thereby constructing the training sample set.
Then, as shown in fig. 6, the target system may construct an initial vector model and, based on a contrastive learning framework, form positive sample pairs from anchor samples (i.e., standard words) and their positive samples, and negative sample pairs from anchor samples and their negative samples. A batch of anchor samples together with their positive and negative samples is fed into the initial vector model, the representation vector of each sample is obtained through the aforementioned encoder structure, and the model is trained with a contrastive loss. During training, the loss function is constructed so that the vector of an anchor sample is pulled closer to the vector of its positive sample and pushed away from the vectors of its negative samples, so that after training is completed, the effective target vector model shown in fig. 6 is obtained.
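The contrastive objective described above can be illustrated with an InfoNCE-style loss over toy vectors: the loss is small when the anchor is close to its positive and large when it is not. This sketches only the loss computation; real training would backpropagate it through the encoder, and the vectors and temperature here are assumptions.

```python
import math

def dot(a, b): return sum(x * y for x, y in zip(a, b))
def norm(a): return math.sqrt(dot(a, a))
def cos(a, b): return dot(a, b) / (norm(a) * norm(b))

def info_nce(anchor, positive, negatives, temperature=0.1):
    """-log softmax of the positive pair among positive + negative pairs."""
    logits = [cos(anchor, positive) / temperature] + [
        cos(anchor, n) / temperature for n in negatives]
    m = max(logits)                                   # stabilize the log-sum-exp
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_denom)

anchor    = [1.0, 0.0]
negatives = [[0.0, 1.0], [-1.0, 0.2]]
loss_near = info_nce(anchor, [0.9, 0.1], negatives)   # positive close to anchor
loss_far  = info_nce(anchor, [0.0, 1.0], negatives)   # positive far from anchor
print(loss_near < loss_far)   # True
```

Minimizing this loss is exactly the "pull the positive closer, push negatives apart" behavior the text describes.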
Note that, since the target processing includes at least one of deletion, insertion and replacement, obtaining positive samples by applying the target processing to the current standard word realizes, on the one hand, a way of constructing contrastive samples with missing and extra words, addressing the common missing-word and repeated-word problems in speech transcription, and on the other hand, a way of constructing contrastive samples with harmonic sounds and dialects, addressing the poor recognition of common harmonic sounds and dialect words in speech transcription.
In an alternative embodiment, when the target processing is replacement, in the process of applying the target processing to the pinyin of the current standard word to obtain its matching positive sample, the target system may obtain a plurality of dialect-Mandarin parallel pairs and determine, for each dialect-Mandarin parallel pair, whether at least one pinyin co-occurrence pair exists. If pinyin co-occurrence pairs exist in the plurality of dialect-Mandarin parallel pairs, any pinyin in the current standard word is replaced with a first target dialect pinyin according to the pinyin co-occurrence pairs of the plurality of dialect-Mandarin parallel pairs, to obtain the positive sample. A dialect-Mandarin parallel pair comprises a dialect sentence and a Mandarin sentence, each composed of pinyin. A pinyin co-occurrence pair comprises a dialect pinyin and a Mandarin pinyin, which are different pinyins appearing in the current dialect-Mandarin parallel pair and which do not appear in the same sentence of that pair; the replaced pinyin and the first target dialect pinyin belong to the same pinyin co-occurrence pair.
Alternatively, in the process of constructing the positive sample, a dialect comparison table may be constructed in advance, and the foregoing word-level processing implemented through it. Specifically, the target system may first obtain a dialect-Mandarin parallel corpus FP = (f1-p1, f2-p2, f3-p3, ..., fi-pi), where fi-pi denotes the i-th dialect-Mandarin parallel pair; the dialect sentence fi consists of the pinyin of each character in the sentence, fw1, fw2, fw3, ..., fwn, with fwn denoting the pinyin of the n-th character in the dialect sentence, and the Mandarin sentence pi consists of pw1, pw2, pw3, ..., pwn, with pwn denoting the pinyin of the n-th character in the Mandarin sentence.
Further, for each dialect-Mandarin parallel pair, it is determined whether at least one pinyin co-occurrence pair exists. For example, if the dialect-Mandarin parallel pair is "zheshinigezidian-zheshinidezidian", where the Mandarin sentence is "zheshinidezidian" (the pinyin of "这是你的字典", "this is your dictionary") and the dialect sentence is "zheshinigezidian" (the pinyin of "这是你个字典"), then "de" and "ge" are different and do not appear in the same sentence of this parallel pair; thus "ge" in the dialect sentence can be determined as the dialect pinyin of a pinyin co-occurrence pair, and "de" in the Mandarin sentence as its Mandarin pinyin.
Alternatively, the target system may traverse the dialect-Mandarin parallel corpus FP to determine whether pinyin co-occurrence pairs exist in the plurality of dialect-Mandarin parallel pairs, and if so, add all pinyin co-occurrence pairs that the dialect-Mandarin parallel pairs yield to the dialect comparison table. Then, the target system can randomly select one pinyin to be replaced from the current standard word, select from the obtained pinyin co-occurrence pairs one that includes the pinyin to be replaced, and determine the other pinyin of that co-occurrence pair as the first target dialect pinyin, with which the replacement is performed to obtain the positive sample. Optionally, the target system may also select the pinyin co-occurrence pair ultimately used for replacement according to the number of occurrences of each pinyin co-occurrence pair.
It should be noted that, by constructing the dialect comparison table and constructing the positive sample of the standard words based on the dialect comparison table, the target vector model obtained by training can effectively identify the dialect words and output accurate vector representation, and further the standardization accuracy of the application can be improved.
In an alternative embodiment, in the process of replacing any pinyin in the current standard word with the first target dialect pinyin according to the pinyin co-occurrence pairs of the plurality of dialect-Mandarin parallel pairs to obtain the positive sample, the target system may, for each pinyin co-occurrence pair, count its number of occurrences across the plurality of dialect-Mandarin parallel pairs, and determine the pinyin co-occurrence pairs whose number of occurrences exceeds a preset threshold as target pinyin co-occurrence pairs. Any pinyin in the current standard word is then replaced with a second target dialect pinyin according to a target pinyin co-occurrence pair to obtain the positive sample, where the replaced pinyin and the second target dialect pinyin belong to the same target pinyin co-occurrence pair.
Optionally, the target system may count the number of occurrences of each pinyin co-occurrence pair while traversing the dialect-Mandarin parallel corpus FP, and then construct a pinyin co-occurrence matrix using the occurrence counts as element values.
Furthermore, the target system can screen out, based on the pinyin co-occurrence matrix, the pinyin co-occurrence pairs whose number of occurrences exceeds the preset threshold, take them as target pinyin co-occurrence pairs, and add only the target pinyin co-occurrence pairs to the dialect comparison table. The target system may then perform replacement on the current standard word according to a target pinyin co-occurrence pair; since this is done in the same way as replacement according to an ordinary pinyin co-occurrence pair, the details are not repeated here.
It should be noted that, by further screening the pinyin co-occurrence pairs obtained, the validity of the dialect comparison table is ensured, so that the accuracy of recognition of the dialect by the target vector model is further improved.
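The co-occurrence-pair mining and frequency thresholding described above can be sketched as follows, with aligned toy sentences standing in for the dialect-Mandarin parallel corpus FP; the position-by-position alignment and the threshold value are illustrative assumptions, not the patent's exact matching rule.

```python
from collections import Counter

def cooccurrence_pairs(dialect, mandarin):
    """Pinyin pairs that differ between aligned sentences and do not also
    appear in the other sentence of the pair."""
    return [(f, p) for f, p in zip(dialect, mandarin)
            if f != p and f not in mandarin and p not in dialect]

# Toy parallel corpus: (dialect sentence, Mandarin sentence), one syllable list each.
corpus = [
    (["zhe", "shi", "ni", "ge", "zi", "dian"], ["zhe", "shi", "ni", "de", "zi", "dian"]),
    (["zhe", "shi", "wo", "ge", "shu"],        ["zhe", "shi", "wo", "de", "shu"]),
]

counts = Counter(pair for f, p in corpus for pair in cooccurrence_pairs(f, p))
threshold = 1   # keep only pairs seen more than `threshold` times
dialect_table = {d: m for (d, m), n in counts.items() if n > threshold}
print(dialect_table)   # {'ge': 'de'}
```

With this toy corpus the pair ("ge", "de") occurs twice and survives the threshold, reproducing the "你个字典/你的字典" example from the text.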
In an alternative embodiment, when the target processing is replacement, in the process of applying the target processing to the pinyin of the current standard word to obtain its matching positive sample, the target system may obtain a plurality of Mandarin sentences and count the pinyins occurring in them to obtain a pinyin set. Any two pinyins in the set are then combined to obtain a plurality of approximate-sound pinyin pairs; a second edit distance between the two pinyins of each approximate-sound pinyin pair is calculated and compared with a preset edit distance to obtain a comparison result for each pair. According to the comparison results, the approximate-sound pinyin pairs whose second edit distance is smaller than the preset edit distance are determined as target approximate-sound pinyin pairs, and any pinyin in the current standard word is replaced with an approximate-sound pinyin according to a target approximate-sound pinyin pair to obtain the positive sample, where the replaced pinyin and the approximate-sound pinyin belong to the same target approximate-sound pinyin pair. Each Mandarin sentence is composed of pinyin.
Alternatively, in the process of constructing the positive sample, a harmonic comparison table may be constructed in advance, and the foregoing word-level processing implemented through it; in this embodiment, a harmonic denotes a pinyin with a similar pronunciation, i.e., an approximate sound. Specifically, the target system may first obtain a Mandarin sentence set P = (p1, p2, p3, ..., pi), where the Mandarin sentence pi consists of the pinyin of each character in the sentence, pw1, pw2, pw3, ..., pwn, with pwn denoting the pinyin of the n-th character in the Mandarin sentence.
Then, the target system can count the pinyin occurring in the plurality of Mandarin sentences to obtain a pinyin set W = (w1, w2, w3, ..., wx), where wx denotes the x-th pinyin in the set and all pinyin in the set are distinct. For example, if the Mandarin sentence set includes p1 = "nide" and p2 = "nihao", the pinyin set is (ni, de, hao).
Still further, the target system may combine any two pinyins in the pinyin set to obtain a plurality of approximate-sound pinyin pairs; e.g., for the foregoing pinyin set (ni, de, hao), the approximate-sound pinyin pairs "(ni, hao)", "(ni, de)" and "(hao, de)" are obtained. Then, the target system may calculate the second edit distance between the two pinyins of each approximate-sound pinyin pair, determine the pairs whose second edit distance is smaller than the preset edit distance as target approximate-sound pinyin pairs, and add them to the harmonic comparison table. The target system may then perform replacement on the current standard word according to a target approximate-sound pinyin pair; since this is done in the same way as replacement according to a pinyin co-occurrence pair, the details are not repeated here.
It should be noted that, through constructing the harmonic sound comparison table and constructing the positive sample of the standard words based on the harmonic sound comparison table, the target vector model obtained through training can effectively identify the harmonic sound words and output accurate vector representation, and further can improve the standardization accuracy of the application.
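The near-sound pair construction above can be sketched with a plain Levenshtein edit distance over a toy Mandarin corpus; the sentences and the threshold value are illustrative assumptions.

```python
from itertools import combinations

def edit_distance(a, b):
    """Classic Levenshtein distance (rolling 1-D DP row)."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            # up+1 (delete), left+1 (insert), diagonal+cost (substitute/keep)
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

# Toy Mandarin corpus: each sentence is a list of pinyin syllables.
sentences = [["ni", "de"], ["ni", "hao"], ["nin", "hao"]]
syllables = sorted({s for sent in sentences for s in sent})

PRESET_EDIT_DISTANCE = 2   # assumed threshold for "near sound"
near_pairs = [(a, b) for a, b in combinations(syllables, 2)
              if edit_distance(a, b) < PRESET_EDIT_DISTANCE]
print(near_pairs)   # [('ni', 'nin')]
```

Only ("ni", "nin") survives here: one inserted letter, edit distance 1, below the assumed threshold.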
In an alternative embodiment, in the process of screening at least one candidate word from the plurality of standard words according to the target vector and the pinyin vectors of the standard words, the target system may compute the similarity between the target vector and the pinyin vector of each standard word to obtain a similarity score for each, screen out from these the similarity scores greater than a preset threshold to obtain at least one target similarity score, and determine the standard words matching the at least one target similarity score as the at least one candidate word.
Optionally, in determining the candidate words, the target system may compute a pairwise dot-product similarity between the target vector and the pinyin vector of each standard word. Since the similarity is computed over vectors, the computation can be accelerated with matrix operations, i.e., the similarity scores between the target vector and all vectors in the standard-word vector index library are computed at once.
Then, the target system may sort the similarity scores and select from the sorted scores those greater than a preset threshold to obtain at least one target similarity score. A fixed value may be used as the preset threshold, or the z-th similarity score in the ranking may be taken as the preset threshold.
Still further, the target system may determine the canonical term for which the at least one target similarity score matches as at least one candidate term.
By adopting a vector index, the problems of overlong posting lists (inverted chains) and high time complexity found in related-art recall based on word/pinyin inverted indexes are avoided, so the recall speed can be increased and the overall time complexity reduced. In addition, screening candidate words by similarity score effectively reduces the amount of computation when determining the target matching word, thereby improving efficiency.
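A minimal sketch of the recall step: score the target vector against every vector in the standard-word index and keep candidates above a threshold. The toy index, words and threshold are assumptions; with L2-normalized vectors the cosine similarity reduces to a dot product, which is what lets a real system batch the whole index as one matrix multiply.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def recall(target, index, threshold=0.8):
    """Return index words whose cosine similarity to `target` exceeds the
    threshold, sorted by descending score."""
    t = normalize(target)
    scores = {word: sum(a * b for a, b in zip(t, normalize(v)))
              for word, v in index.items()}
    return sorted((w for w, s in scores.items() if s > threshold),
                  key=lambda w: -scores[w])

index = {                       # toy standard-word pinyin-vector index
    "jiangya":  [0.9, 0.1, 0.0],
    "jiangyou": [0.7, 0.6, 0.1],
    "niuyuan":  [0.0, 0.2, 0.9],
}
print(recall([1.0, 0.1, 0.0], index))   # ['jiangya', 'jiangyou']
```

In production the dictionary loop would be a single matrix-vector product over the pre-normalized index, which is the acceleration the text refers to.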
In an alternative embodiment, in the process of calculating the first edit distance between the target word and each candidate word, the target system may calculate the edit distance between the pinyin of the target word and the pinyin of each candidate word to obtain a first sub-edit distance, calculate the edit distance between the characters of the target word and the characters of each candidate word to obtain a second sub-edit distance, and sum the first and second sub-edit distances to obtain the first edit distance.
For example, if the target word is "我的" (my) and a certain candidate word is "你的" (your), the edit distance between the characters of the two words is 1, and the edit distance between the pinyin "wo de" of the target word and the pinyin "ni de" of the candidate word is 2, so the first edit distance between them is determined to be 3.
Still further, the target system may determine the candidate word with the smallest first edit distance to the target word as the target matching word.
It should be noted that, by determining the edit distance from two dimensions of the character and the pinyin, the reference dimension of the edit distance is enriched, so that the target matching word can be determined more accurately.
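The two-dimension edit distance above (characters plus pinyin) can be reproduced directly; the Levenshtein implementation is a standard one, and the example reuses the "我的"/"你的" pair from the text.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance (rolling 1-D DP row)."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def first_edit_distance(word_a, pinyin_a, word_b, pinyin_b):
    # first edit distance = character-level distance + pinyin-level distance
    return edit_distance(word_a, word_b) + edit_distance(pinyin_a, pinyin_b)

# "我的" (wo de) vs "你的" (ni de): 1 character edit + 2 pinyin-letter edits
print(first_edit_distance("我的", "wo de", "你的", "ni de"))   # 3
```

Picking the candidate with the smallest combined distance then implements the re-ranking step described above.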
Optionally, an application of the present application in a medical scenario is illustrated, where the dialogue between a medical follow-up robot and a patient is transcribed as follows:
Machine: Hello, this is follow-up specialist xxx of xxx hospital. I would like to do a follow-up with you now; it will take a few minutes. Am I speaking with the patient?
Patient: Yes, that's me.
Machine: May I ask whether you have been taking the antihypertensive medication recently?
Patient: You mean the "sauced duck"? Yes, I'm taking that, the other medicines too, and the "danshou" one as well.
Machine: OK, I have noted your situation here. If any problem comes up later, you can call the nurse station at xxx. Thank you for your cooperation; wishing you good health. Goodbye.
Optionally, the text processing method provided by this embodiment can convert the above "sauced duck" into "blood-pressure-lowering" and the above "danshou" into "atenolol", so that the robot's back end can perform logic judgment and medication recording according to standard information.
Optionally, an application of the present application in a dining scenario is illustrated. The conversation between a meal-ordering robot and a guest is transcribed as follows:
Machine: Welcome to the xxx restaurant ordering hotline; I am attendant xxx. Happy to serve you!
Guest: i want to go to a period of 331482, by a period of three and a half pm.
Machine: is you already bound here, ask what can help you?
Guest: and not.
Machine: good , thank you for trust of the restaurant, and you want to get dinner happily and see more than
Optionally, the text processing method provided in this embodiment may convert the above "33148b" into "tomorrow" and the above "eater" into "ten", so that the robot's back end can perform logic judgment and record the time and the number of diners according to standard information.
Example 2
According to an embodiment of the present invention, there is provided an embodiment of a text processing apparatus, wherein fig. 7 is a schematic diagram of an alternative text processing apparatus according to an embodiment of the present invention, as shown in fig. 7, the apparatus includes:
the obtaining module 701 is configured to obtain a target word, where the target word is a word extracted from a speech transcription text;
the determining module 702 is configured to determine pinyin of a target word, and perform vector representation on the pinyin to obtain a target vector;
a first screening module 703, configured to obtain a plurality of standard terms and pinyin vectors of the plurality of standard terms, and screen at least one candidate term from the plurality of standard terms according to the target vector and the pinyin vectors of the plurality of standard terms;
the second screening module 704 is configured to calculate a first editing distance between the target word and each candidate word, and screen a target matching word from at least one candidate word according to the first editing distance, where the target matching word is used to replace a target word in the speech transcription text.
It should be noted that the above-mentioned obtaining module 701, determining module 702, first screening module 703 and second screening module 704 correspond to steps S201 to S204 in the above embodiment; the four modules implement the same examples and application scenarios as the corresponding steps, but are not limited to the content disclosed in Embodiment 1 above.
Optionally, the determining module 702 further includes: the splitting module is used for splitting the pinyin of the target word to obtain the pinyin of each character in the target word; the feature extraction submodule is used for respectively carrying out feature extraction on the pinyin of each character to obtain a feature vector corresponding to each character; and the feature fusion sub-module is used for carrying out feature fusion on the feature vector of each character to obtain a target vector.
Optionally, the feature extraction sub-module further includes: the screening unit is used for carrying out vector representation on each letter in the pinyin of the current character to obtain a letter vector of each letter, and screening out the letter vector matched with the first letter; the feature extraction unit is used for extracting the features of the letter vectors of each letter to obtain the full-letter feature vector of the current character; the vector representation unit is used for carrying out vector representation on the pinyin of the current character to obtain a full pinyin vector; and the splicing unit is used for splicing the full-letter feature vector, the initial matched letter vector and the full-pinyin vector to obtain the feature vector of the current character.
Optionally, the text processing device further includes: the first processing module is used for carrying out target processing on pinyin of the current standard word for each standard word to obtain a positive sample matched with the current standard word, wherein the target processing comprises at least one of the following steps: deletion, insertion, replacement; the second processing module is used for randomly sampling N standard words except the current standard word from the plurality of standard words for each standard word to obtain a negative sample matched with the current standard word, wherein N is a positive integer; the construction module is used for constructing a training sample set by taking each standard word and positive and negative samples matched with the standard word as a training sample; the training module is used for acquiring an initial vector model, training the initial vector model according to the loss function and the training sample set based on a comparison learning mode, and obtaining a target vector model.
Optionally, the first processing module further includes: the first acquisition submodule is used for acquiring a plurality of dialect-mandarin parallel pairs, wherein the dialect-mandarin parallel pairs comprise dialect sentences and mandarin sentences, and the dialect sentences and the mandarin sentences are respectively composed of pinyin; a first determining sub-module for determining, for each dialect-mandarin parallel pair, whether there is at least one pinyin co-occurrence pair, wherein the pinyin co-occurrence pair includes a dialect pinyin and a mandarin pinyin, the dialect pinyin and the mandarin pinyin are different pinyins that appear in the current dialect-mandarin parallel pair, and the dialect pinyin and the mandarin pinyin do not appear in the same sentence in the current dialect-mandarin parallel pair; and the first processing submodule is used for replacing any pinyin in the current standard word with the first target dialect pinyin according to the pinyin co-occurrence pairs of the dialect-mandarin parallel pairs under the condition that the pinyin co-occurrence pairs exist in the dialect-mandarin parallel pairs, so as to obtain a positive sample, wherein any pinyin and the first target dialect pinyin belong to the same pinyin co-occurrence pair.
Optionally, the second processing submodule further includes: a statistics unit, configured to count, for each pinyin co-occurrence pair, the number of times the pinyin co-occurrence pair occurs across the plurality of dialect-Mandarin parallel pairs; a determining unit, configured to determine a pinyin co-occurrence pair whose count is greater than a preset threshold as a target pinyin co-occurrence pair; and a processing unit, configured to replace any pinyin in the current standard word with a second target dialect pinyin according to the target pinyin co-occurrence pair to obtain the positive sample, where the replaced pinyin and the second target dialect pinyin belong to the same target pinyin co-occurrence pair.
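The frequency filtering above can be sketched as follows: count each co-occurrence pair over the whole parallel corpus and keep only those seen more often than a preset threshold. The pair-extraction logic inlined here (dialect-only pinyin crossed with Mandarin-only pinyin) is one reading of the claim, and the function name is an illustrative assumption.

```python
from collections import Counter

def target_cooccurrence_pairs(parallel_pairs, threshold):
    """Keep pinyin co-occurrence pairs seen more than `threshold` times.

    parallel_pairs : iterable of (dialect_sent, mandarin_sent) tuples,
                     each sentence a list of pinyin syllables.
    """
    counts = Counter()
    for dialect_sent, mandarin_sent in parallel_pairs:
        d_only = set(dialect_sent) - set(mandarin_sent)
        m_only = set(mandarin_sent) - set(dialect_sent)
        counts.update((d, m) for d in d_only for m in m_only)
    return {pair for pair, n in counts.items() if n > threshold}
```

Thresholding suppresses accidental pairings that arise from a single noisy parallel pair, so only systematic dialect-Mandarin substitutions survive as replacement candidates.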
Optionally, the first processing module further includes: a second acquisition submodule, configured to obtain a plurality of Mandarin sentences and count the pinyins occurring in them to obtain a pinyin set, where each Mandarin sentence is composed of pinyin; a second processing submodule, configured to combine any two pinyins in the pinyin set to obtain a plurality of approximate-tone pinyin pairs; a first calculating submodule, configured to calculate a second edit distance between the two pinyins of each approximate-tone pinyin pair and compare each second edit distance with a preset edit distance to obtain a comparison result for each approximate-tone pinyin pair; a second determining submodule, configured to determine, according to the comparison results, the approximate-tone pinyin pairs whose second edit distance is smaller than the preset edit distance as target approximate-tone pinyin pairs; and a third processing submodule, configured to replace any pinyin in the current standard word with an approximate-tone pinyin according to a target approximate-tone pinyin pair to obtain the positive sample, where the replaced pinyin and the approximate-tone pinyin belong to the same target approximate-tone pinyin pair.
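The approximate-tone pair mining above amounts to pairing every two syllables in the pinyin set and keeping pairs whose edit distance falls below a preset value. A minimal sketch, assuming plain Levenshtein distance (the patent does not name a specific edit-distance variant):

```python
from itertools import combinations

def edit_distance(a, b):
    """Levenshtein distance between two sequences (one-row DP)."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

def near_homophone_pairs(pinyin_set, max_dist=2):
    """Approximate-tone pinyin pairs: syllable pairs whose edit distance
    is smaller than the preset threshold `max_dist`."""
    return {(a, b) for a, b in combinations(sorted(pinyin_set), 2)
            if edit_distance(a, b) < max_dist}
```

With a threshold of 2 this keeps confusable pairs such as "zhang"/"zang" (a common retroflex/flat-tongue confusion in speech transcription) while rejecting unrelated syllables.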
Optionally, the first screening module further includes: a second calculating submodule, configured to perform similarity calculation between the target vector and the pinyin vector of each standard word to obtain a similarity score between the target vector and the pinyin vector of each standard word; a screening submodule, configured to screen out, from the similarity scores, those greater than a preset threshold to obtain at least one target similarity score; and a third determining submodule, configured to determine the standard words matching the at least one target similarity score as the at least one candidate word.
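The candidate screening above can be sketched as a similarity sweep over the standard-word vectors. The patent does not fix a similarity measure; cosine similarity is assumed here for illustration, and the function name is hypothetical.

```python
import math

def screen_candidates(target_vec, standard_vecs, threshold):
    """Return standard words whose pinyin vector scores above `threshold`
    in cosine similarity against the target vector.

    standard_vecs: dict mapping standard word -> pinyin vector (list of floats).
    """
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv) if nu and nv else 0.0

    scores = {w: cosine(target_vec, v) for w, v in standard_vecs.items()}
    return [w for w, s in scores.items() if s > threshold]
```

In practice the standard-word vectors would be precomputed once by the target vector model and indexed, so this sweep reduces to a nearest-neighbour lookup.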
Optionally, the second screening module further includes: a third calculating submodule, configured to calculate the edit distance between the pinyin of the target word and the pinyin of each candidate word to obtain a first sub edit distance; a fourth calculating submodule, configured to calculate the edit distance between the characters of the target word and the characters of each candidate word to obtain a second sub edit distance; and a fifth calculating submodule, configured to calculate the sum of the first sub edit distance and the second sub edit distance to obtain the first edit distance.
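The combined distance above (pinyin edit distance plus character edit distance) can be sketched as below; the same Levenshtein routine works on a list of pinyin syllables and on a string of characters. The function name and signature are illustrative assumptions.

```python
def combined_edit_distance(target_word, target_pinyin, cand_word, cand_pinyin):
    """First edit distance of the claims: pinyin-level edit distance plus
    character-level edit distance, summed.

    target_word / cand_word     : strings of Chinese characters
    target_pinyin / cand_pinyin : lists of pinyin syllables
    """
    def levenshtein(a, b):
        dp = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            prev, dp[0] = dp[0], i
            for j, cb in enumerate(b, 1):
                prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                         prev + (ca != cb))
        return dp[-1]

    return levenshtein(target_pinyin, cand_pinyin) + levenshtein(target_word, cand_word)
```

Summing the two components lets a homophone error (identical pinyin, different character) still rank close to its standard word, which is exactly the speech-transcription failure mode the method targets.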
Example 3
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium storing a computer program, where the computer program is configured to perform the above text processing method when executed.
Example 4
According to another aspect of the embodiments of the present invention, there is also provided an electronic device. Fig. 8 is a schematic diagram of an optional electronic device according to an embodiment of the present invention. As shown in Fig. 8, the electronic device includes one or more processors and a memory for storing one or more programs, where the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the text processing method described above.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
In the foregoing embodiments of the present invention, each embodiment is described with its own emphasis; for any part not described in detail in one embodiment, reference may be made to the related descriptions of the other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technical content may be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a logical functional division, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces, units, or modules, and may be electrical or take other forms.
The units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
If the integrated units are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make several improvements and modifications without departing from the principles of the present invention, and such improvements and modifications shall also fall within the protection scope of the present invention.

Claims (12)

1. A text processing method, comprising:
obtaining a target word, wherein the target word is a word extracted from a voice transcription text;
determining the pinyin of the target word, and performing vector representation on the pinyin to obtain a target vector;
obtaining a plurality of standard words and pinyin vectors of the plurality of standard words, and screening out at least one candidate word from the plurality of standard words according to the target vector and the pinyin vectors of the plurality of standard words; and
calculating a first edit distance between the target word and each candidate word, and screening out a target matching word from the at least one candidate word according to the first edit distance, wherein the target matching word is used to replace the target word in the voice transcription text.
2. The method of claim 1, wherein performing vector representation on the pinyin to obtain the target vector comprises:
splitting the pinyin of the target word to obtain the pinyin of each character in the target word;
performing feature extraction on the pinyin of each character to obtain a feature vector corresponding to each character; and
performing feature fusion on the feature vectors of the characters to obtain the target vector.
3. The method of claim 2, wherein performing feature extraction on the pinyin of each character to obtain the feature vector corresponding to each character comprises:
for each character, performing vector representation on each letter in the pinyin of the current character to obtain a letter vector of each letter, and screening out the letter vector matching the initial letter;
performing feature extraction on the letter vectors of the letters to obtain a full-letter feature vector of the current character;
performing vector representation on the pinyin of the current character to obtain a full-pinyin vector; and
splicing the full-letter feature vector, the letter vector matching the initial letter, and the full-pinyin vector to obtain the feature vector of the current character.
4. The method according to any one of claims 1 to 3, wherein the pinyin is vector-represented by a target vector model to obtain the target vector, and the target vector model is trained by:
for each standard word, performing target processing on the pinyin of the current standard word to obtain a positive sample matching the current standard word, wherein the target processing comprises at least one of: deletion, insertion, and replacement;
for each standard word, randomly sampling N standard words other than the current standard word from the plurality of standard words to obtain negative samples matching the current standard word, wherein N is a positive integer;
taking each standard word and its matching positive and negative samples as one training sample to construct a training sample set; and
obtaining an initial vector model, and training the initial vector model according to a loss function and the training sample set in a contrastive-learning manner to obtain the target vector model.
5. The method of claim 4, wherein, in the case that the target processing is replacement, performing target processing on the pinyin of the current standard word to obtain the positive sample matching the current standard word comprises:
obtaining a plurality of dialect-Mandarin parallel pairs, wherein each dialect-Mandarin parallel pair comprises a dialect sentence and a Mandarin sentence, both composed of pinyin;
for each dialect-Mandarin parallel pair, determining whether at least one pinyin co-occurrence pair exists, wherein the pinyin co-occurrence pair comprises a dialect pinyin and a Mandarin pinyin, the dialect pinyin and the Mandarin pinyin are different pinyins appearing in the current dialect-Mandarin parallel pair, and the dialect pinyin and the Mandarin pinyin do not appear in the same sentence of the current dialect-Mandarin parallel pair; and
in the case that pinyin co-occurrence pairs exist in the plurality of dialect-Mandarin parallel pairs, replacing any pinyin in the current standard word with a first target dialect pinyin according to the pinyin co-occurrence pairs of the plurality of dialect-Mandarin parallel pairs to obtain the positive sample, wherein the any pinyin and the first target dialect pinyin belong to the same pinyin co-occurrence pair.
6. The method of claim 5, wherein replacing any pinyin in the current standard word with the first target dialect pinyin according to the pinyin co-occurrence pairs of the plurality of dialect-Mandarin parallel pairs to obtain the positive sample comprises:
counting, for each pinyin co-occurrence pair, the number of times the pinyin co-occurrence pair occurs in the plurality of dialect-Mandarin parallel pairs;
determining a pinyin co-occurrence pair whose count is greater than a preset threshold as a target pinyin co-occurrence pair; and
replacing any pinyin in the current standard word with a second target dialect pinyin according to the target pinyin co-occurrence pair to obtain the positive sample, wherein the any pinyin and the second target dialect pinyin belong to the same target pinyin co-occurrence pair.
7. The method of claim 4, wherein, in the case that the target processing is replacement, performing target processing on the pinyin of the current standard word to obtain the positive sample matching the current standard word comprises:
obtaining a plurality of Mandarin sentences, and counting the pinyins present in the plurality of Mandarin sentences to obtain a pinyin set, wherein each Mandarin sentence is composed of pinyin;
combining any two pinyins in the pinyin set to obtain a plurality of approximate-tone pinyin pairs;
calculating a second edit distance between the two pinyins of each approximate-tone pinyin pair, and comparing each second edit distance with a preset edit distance to obtain a comparison result for each approximate-tone pinyin pair;
determining, according to the comparison results, an approximate-tone pinyin pair whose second edit distance is smaller than the preset edit distance as a target approximate-tone pinyin pair; and
replacing any pinyin in the current standard word with an approximate-tone pinyin according to the target approximate-tone pinyin pair to obtain the positive sample, wherein the any pinyin and the approximate-tone pinyin belong to the same target approximate-tone pinyin pair.
8. The method of claim 1, wherein screening out at least one candidate word from the plurality of standard words according to the target vector and the pinyin vectors of the plurality of standard words comprises:
performing similarity calculation between the target vector and the pinyin vector of each standard word to obtain a similarity score between the target vector and the pinyin vector of each standard word;
screening out, from the similarity scores, similarity scores greater than a preset threshold to obtain at least one target similarity score; and
determining the standard words matching the at least one target similarity score as the at least one candidate word.
9. The method of claim 1, wherein calculating the first edit distance between the target word and each candidate word comprises:
calculating the edit distance between the pinyin of the target word and the pinyin of each candidate word to obtain a first sub edit distance;
calculating the edit distance between the characters of the target word and the characters of each candidate word to obtain a second sub edit distance; and
calculating the sum of the first sub edit distance and the second sub edit distance to obtain the first edit distance.
10. A text processing apparatus, comprising:
an acquisition module, configured to obtain a target word, wherein the target word is a word extracted from a voice transcription text;
a determining module, configured to determine the pinyin of the target word and perform vector representation on the pinyin to obtain a target vector;
a first screening module, configured to obtain a plurality of standard words and pinyin vectors of the plurality of standard words, and screen out at least one candidate word from the plurality of standard words according to the target vector and the pinyin vectors of the plurality of standard words; and
a second screening module, configured to calculate a first edit distance between the target word and each candidate word and screen out a target matching word from the at least one candidate word according to the first edit distance, wherein the target matching word is used to replace the target word in the voice transcription text.
11. A computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and the computer program is configured to perform the text processing method according to any one of claims 1 to 9 when executed.
12. An electronic device, comprising one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the text processing method according to any one of claims 1 to 9.
CN202310591400.3A 2023-05-24 2023-05-24 Text processing method, text processing device, computer readable storage medium and electronic equipment Active CN116415582B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310591400.3A CN116415582B (en) 2023-05-24 2023-05-24 Text processing method, text processing device, computer readable storage medium and electronic equipment


Publications (2)

Publication Number Publication Date
CN116415582A true CN116415582A (en) 2023-07-11
CN116415582B CN116415582B (en) 2023-08-25

Family

ID=87059575

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310591400.3A Active CN116415582B (en) 2023-05-24 2023-05-24 Text processing method, text processing device, computer readable storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN116415582B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108073565A (en) * 2016-11-10 2018-05-25 NTT DOCOMO, INC. Method and apparatus for word normalization, and machine translation method and device
US20190294674A1 (en) * 2018-03-20 2019-09-26 Boe Technology Group Co., Ltd. Sentence-meaning recognition method, sentence-meaning recognition device, sentence-meaning recognition apparatus and storage medium
CN111460175A (en) * 2020-04-08 2020-07-28 福州数据技术研究院有限公司 SNOMED-CT-based medical noun dictionary construction and expansion method
CN111611792A (en) * 2020-05-21 2020-09-01 全球能源互联网研究院有限公司 Entity error correction method and system for voice transcription text
CN111737957A (en) * 2020-08-25 2020-10-02 北京世纪好未来教育科技有限公司 Chinese character pinyin conversion method and device, electronic equipment and storage medium
CN112599113A (en) * 2020-12-30 2021-04-02 北京大米科技有限公司 Dialect voice synthesis method and device, electronic equipment and readable storage medium
CN113963682A (en) * 2021-10-22 2022-01-21 鼎富新动力(北京)智能科技有限公司 Voice recognition correction method and device, electronic equipment and storage medium
CN114638217A (en) * 2022-03-14 2022-06-17 支付宝(杭州)信息技术有限公司 Address text processing method and device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WU Bangyu; ZHOU Yue; ZHAO Qunfei; ZHANG Pengzhu: "A Chinese Dialogue Model Using Pinyin Dimensionality Reduction", Journal of Chinese Information Processing, no. 05 *

Also Published As

Publication number Publication date
CN116415582B (en) 2023-08-25

Similar Documents

Publication Publication Date Title
Lippi et al. Argument mining from speech: Detecting claims in political debates
US11645547B2 (en) Human-machine interactive method and device based on artificial intelligence
KR102041621B1 (en) System for providing artificial intelligence based dialogue type corpus analyze service, and building method therefor
CN111161739A (en) Speech recognition method and related product
CN106909783A (en) A timeline-based method for discovering medical knowledge from case-history text
CN109949799B (en) Semantic parsing method and system
CN110119443A (en) A sentiment analysis method for recommendation services
CN111209363B (en) Corpus data processing method, corpus data processing device, server and storage medium
CN113779972B (en) Speech recognition error correction method, system, device and storage medium
CN109977215A (en) Sentence recommendation method and device based on associated points of interest
CN108460150A (en) Method and device for processing news headlines
CN112417127A (en) Method, device, equipment and medium for training conversation model and generating conversation
Deschamps-Berger et al. Investigating Transformer Encoders and Fusion Strategies for Speech Emotion Recognition in Emergency Call Center Conversations.
CN110069614A (en) A question-answering interaction method and device
Needle et al. Phonotactic and morphological effects in the acceptability of pseudowords
CN116959754A (en) Feature extraction method of structured interview recording transcribed text based on intention slots
CN116415582B (en) Text processing method, text processing device, computer readable storage medium and electronic equipment
CN112307754A (en) Statement acquisition method and device
Erlin et al. Cultural capitals: Modeling minor european literature
CN115188376A (en) Personalized voice interaction method and system
WO2021042234A1 (en) Application introduction method, mobile terminal, and server
CN116842168B (en) Cross-domain problem processing method and device, electronic equipment and storage medium
CN117453895B (en) Intelligent customer service response method, device, equipment and readable storage medium
CN112562856B (en) Method and system for searching health knowledge through voice
CN117133413B (en) NLP-based user psychological state assessment method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant