CN116595970A - Sentence synonymous rewriting method and device and electronic equipment - Google Patents

Info

Publication number
CN116595970A
Authority
CN
China
Prior art keywords
sentence
processed
synonym
dialogue
dimension reduction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310269238.3A
Other languages
Chinese (zh)
Inventor
欧文杰
林悦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Netease Hangzhou Network Co Ltd
Original Assignee
Netease Hangzhou Network Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Netease Hangzhou Network Co Ltd
Priority to CN202310269238.3A
Publication of CN116595970A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a sentence synonymous rewriting method, a sentence synonymous rewriting device and electronic equipment. First, a sentence to be processed and personalized parameters are obtained; normalization processing is performed on the sentence to be processed to obtain a corresponding standard expression sentence; synonym rewriting is performed on the standard expression sentence to obtain a plurality of candidate synonyms; and, based on the personalized parameters and a preset stylized word stock, personalized processing is performed on the candidate synonyms to obtain a plurality of final synonyms corresponding to the sentence to be processed. Starting from a single sentence, the method automatically obtains a plurality of final synonyms with the same meaning as that sentence, so that the same sentence can show synonymous expressions of different personalities under different settings. In a game scene, a planner therefore only needs to write one line of dialogue for each scene NPC to obtain a plurality of personalized synonymous expressions for that NPC, which increases the realism of the game world and reduces the planner's workload.

Description

Sentence synonymous rewriting method and device and electronic equipment
Technical Field
The invention relates to the technical field of data processing, and in particular to a sentence synonymous rewriting method, a sentence synonymous rewriting device and electronic equipment.
Background
During game development, to make the game content richer and give players more fun in exploring, a planner usually designs multiple different lines of dialogue that match the personality of each non-player character (NPC) in the game. However, a game generally uses a large number of scene NPCs to enrich its scenes, and designing varied dialogue for every scene NPC would cost the planner a great deal of time and effort. In the related art, each scene NPC therefore has only a single, repeated line of dialogue, which weakens the realism of the game world and makes players likely to ignore scene NPCs when exploring the game.
Disclosure of Invention
The invention aims to provide a sentence synonymous rewriting method, a device and electronic equipment that obtain a plurality of synonymous sentences, so that the same sentence can show synonymous expressions of different personalities under different settings, improving the realism of the game world.
In a first aspect, the present invention provides a method for rewriting synonyms of sentences, the method comprising: acquiring sentences to be processed and personalized parameters; the personalized parameters are used for carrying out personalized processing on the sentences; carrying out normalization processing on the statement to be processed to obtain a standard expression statement corresponding to the statement to be processed; carrying out synonym rewriting on the standard expression sentence to obtain a plurality of candidate synonyms; based on the personalized parameters and a preset stylized word stock, personalized processing is carried out on the candidate synonyms, and a plurality of final synonyms corresponding to the to-be-processed sentences are obtained.
In a second aspect, the present invention provides a sentence synonymous rewriting device, including: a sentence acquisition module, used for acquiring a sentence to be processed and personalized parameters, the personalized parameters being used for personalized processing of the sentence; a normalization processing module, used for normalizing the sentence to be processed to obtain a standard expression sentence corresponding to the sentence to be processed; a synonym rewriting module, used for rewriting the standard expression sentence to obtain a plurality of candidate synonyms; and a personalized processing module, used for personalizing the candidate synonyms based on the personalized parameters and a preset stylized word stock to obtain a plurality of final synonyms corresponding to the sentence to be processed.
In a third aspect, the present invention provides an electronic device comprising a processor and a memory, the memory storing machine-executable instructions that can be executed by the processor to implement the sentence synonymous rewriting method described above.
In a fourth aspect, the present invention provides a computer-readable storage medium storing computer-executable instructions that, when invoked and executed by a processor, cause the processor to implement the statement synonym rewrite method described above.
The embodiment of the invention has the following beneficial effects:
The invention provides a sentence synonymous rewriting method, a sentence synonymous rewriting device and electronic equipment. First, a sentence to be processed and personalized parameters are obtained; the sentence to be processed is normalized to obtain a corresponding standard expression sentence; the standard expression sentence is then rewritten to obtain a plurality of candidate synonyms; finally, based on the personalized parameters and a preset stylized word stock, the candidate synonyms are personalized to obtain a plurality of final synonyms corresponding to the sentence to be processed. Starting from a single sentence, the method automatically obtains a plurality of final synonyms with the same meaning as that sentence, so that the same sentence can show synonymous expressions of different personalities under different settings. In a game scene, a planner therefore only needs to write one line of dialogue for each scene NPC to obtain a plurality of personalized synonymous expressions for that NPC, which increases the realism of the game world and reduces the planner's workload.
Additional features and advantages of the invention will be set forth in the description which follows, or in part will be obvious from the description, or may be learned by practice of the invention.
In order to make the above objects, features and advantages of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for rewriting synonyms of sentences according to an embodiment of the present invention;
FIG. 2 is a flowchart of another method for rewriting synonyms of sentences according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of another method for rewriting synonyms of sentences according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a device for rewriting synonyms of sentences according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
With continued in-depth research on artificial intelligence and natural language processing, more and more deep learning techniques are being put into practice in the game field. During game development, to make the game content richer and give players more fun in exploring, a planner usually designs multiple different lines of dialogue that match the personality of each non-player character (NPC) in the game. However, a game generally uses a large number of scene NPCs to enrich its scenes, and designing varied dialogue for every scene NPC would cost the planner a great deal of time and effort. In the related art, each scene NPC therefore has only a single, repeated line of dialogue, which weakens the realism of the game world and makes players likely to ignore scene NPCs when exploring the game.
Based on the above problems, the embodiments of the present invention provide a sentence synonymous rewriting method, a device and electronic equipment. The technique can be applied to scenarios in which dialogue is rewritten synonymously, especially lines of dialogue in a game.
To facilitate understanding of the embodiments of the present invention, a sentence synonymous rewriting method disclosed in the embodiments is first described in detail. As shown in FIG. 1, the method includes the following specific steps:
Step S102, acquiring a sentence to be processed and personalized parameters; the personalized parameters are used for personalized processing of the sentence.
In specific implementation, the sentence to be processed may be any sentence, or a sentence designed by a planner for an NPC in the game; for example, the sentence to be processed is "Young knight-errant, your martial arts are good, I admire you very much." The personalized parameters can be set according to development requirements and are mainly used to personalize and polish the obtained sentences, so that the final sentences are richer and more personalized.
Step S104, carrying out normalization processing on the statement to be processed to obtain a standard expression statement corresponding to the statement to be processed.
The normalization processing standardizes the sentence to be processed so that different sentences can be handled by a unified method. In specific implementation, the semantics of the sentence to be processed are analyzed first, and the sentence is then normalized according to those semantics; for example, personalized vocabulary in the sentence, such as title words (forms of address) and mood words, can be normalized to obtain the standard expression sentence.
Step S106, carrying out synonym rewriting on the standard expression sentence to obtain a plurality of candidate synonyms.
In specific implementation, each candidate synonym has the same semantics as the standard expression sentence but a different mode of expression. Specifically, when the standard expression sentence is rewritten, it can be converted into a specified style, which can be set according to development requirements; for example, the specified style may be Jin Yong style, anime (two-dimensional) style, emotion style, and the like.
In practical application, the standard expression sentence can be split into a plurality of clauses. If a clause is short, a plurality of candidate synonyms with high similarity to the clause are retrieved from a preset corpus by vector search; if a clause is long, it is rewritten by a pre-trained synonym generation model capable of generating candidate synonyms of the specified style, yielding a plurality of rewritten candidate synonyms. The preset corpus contains a large number of sentences in the specified style. The synonym generation model may be a neural network model or a deep learning model and is trained on a synonymous sentence data set of the specified style; the data set contains a large number of synonymous sentence pairs, each comprising a dialogue sentence in the specified style and a corresponding translation in a specified language.
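The retrieval branch for short clauses can be sketched as follows. This is a minimal illustration, not the patented implementation: the bag-of-words `embed` function stands in for whatever sentence encoder a real system would use, and the corpus sentences, function names and `top_k` parameter are all hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words counts (assumed stand-in for a sentence encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve_candidates(clause, corpus, top_k=2):
    """Return the top_k corpus sentences most similar to the clause."""
    query = embed(clause)
    ranked = sorted(corpus, key=lambda s: cosine(query, embed(s)), reverse=True)
    return ranked[:top_k]

# Hypothetical styled corpus, in English for readability
corpus = [
    "your skill is truly good",
    "the weather is cold today",
    "your sword skill is good indeed",
]
print(retrieve_candidates("your skill is good", corpus))
```

In practice the corpus vectors would be computed once and stored in a vector index rather than re-embedded on every query.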
Step S108, performing personalized processing on the candidate synonyms based on the personalized parameters and a preset stylized word stock to obtain a plurality of final synonyms corresponding to the sentence to be processed.
The stylized word stock contains a large number of words in the specified style, typically title words, mood words and the like. Based on the personalized parameters and the stylized word stock, the candidate synonyms can be polished in a personalized way to obtain a plurality of final synonyms of the sentence to be processed, so that the same sentence shows synonymous expressions of different personalities under different settings.
In the sentence synonymous rewriting method provided by the embodiment of the invention, a sentence to be processed and personalized parameters are first obtained; the sentence to be processed is normalized to obtain a corresponding standard expression sentence; the standard expression sentence is then rewritten to obtain a plurality of candidate synonyms; finally, based on the personalized parameters and a preset stylized word stock, the candidate synonyms are personalized to obtain a plurality of final synonyms corresponding to the sentence to be processed. Starting from a single sentence, the method automatically obtains a plurality of final synonyms with the same meaning as that sentence, so that the same sentence can show synonymous expressions of different personalities under different settings. In a game scene, a planner therefore only needs to write one line of dialogue for each scene NPC to obtain a plurality of personalized synonymous expressions for that NPC, which increases the realism of the game world and reduces the planner's workload.
The embodiment of the invention also provides another sentence synonymous rewriting method, implemented on the basis of the above embodiment. This method mainly describes the specific process of normalizing the sentence to be processed to obtain the corresponding standard expression sentence (steps S204-S208 below), and the specific process of personalizing the candidate synonyms based on the personalized parameters and the preset stylized word stock to obtain the final synonyms corresponding to the sentence to be processed (steps S212-S214 below). As shown in FIG. 2, the method comprises the following specific steps:
Step S202, acquiring a sentence to be processed and personalized parameters; wherein the personalized parameters include the personality and gender of both parties to the dialogue.
The personalized parameters may be set by a planner as required. Since the sentence to be processed is usually a line of dialogue of a scene NPC in the game, the personalized parameters specify the personality and gender of the speaker and of the person being spoken to; for example, the speaker may be a woman and the person being spoken to a man, or the speaker may be an elderly person and the person being spoken to a young man.
Step S204, carrying out semantic analysis on the statement to be processed to obtain a semantic analysis result.
In specific implementation, when a sentence to be processed is received, semantic analysis is first performed on it to obtain its semantics. The embodiment of the invention may use any existing method or tool for semantic analysis of natural language.
Step S206, determining the title word and the mood word in the sentence to be processed based on the semantic analysis result.
Step S208, normalizing the title words in the sentence to be processed into standard title words, and deleting the mood words in the sentence to be processed, to obtain the standard expression sentence corresponding to the sentence to be processed.
The standard title words include "you" and "me"; the mood words include modal particles and the like. In specific implementation, the title words in the sentence to be processed are normalized to unified standard title words and the mood words are removed, yielding a normalized expression sentence (equivalent to the standard expression sentence above).
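Steps S206-S208 can be illustrated with a small lexicon-based sketch. It assumes the title words and mood words have already been identified; here a plain dictionary lookup replaces the semantic analysis step, and the English lexicon entries and the `normalize` function are hypothetical stand-ins for illustration only.

```python
import re

# Hypothetical lexicons (English stand-ins); a real system would obtain
# these from semantic analysis of the sentence to be processed.
TITLE_WORDS = {"young hero": "you", "this old man": "I"}
MOOD_WORDS = ["ah", "oh", "alas"]

def normalize(sentence):
    """Map title words to standard title words and delete mood words."""
    for title, standard in TITLE_WORDS.items():
        sentence = re.sub(rf"\b{re.escape(title)}\b", standard, sentence, flags=re.IGNORECASE)
    mood = "|".join(map(re.escape, MOOD_WORDS))
    # Remove each mood word together with an immediately following comma or "!"
    sentence = re.sub(rf"\b(?:{mood})\b[,!]?\s*", "", sentence, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", sentence).strip()

print(normalize("Ah, young hero fought well, this old man was impressed"))
# prints: you fought well, I was impressed
```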
Step S210, carrying out synonym rewriting on the standard expression sentence to obtain a plurality of candidate synonyms.
Step S212, calculating the semantic similarity between each candidate synonym and the sentence to be processed, and determining each candidate synonym whose semantic similarity is greater than a preset similarity threshold as a standby synonym.
The preset similarity threshold may be set according to development requirements, for example 95% or 90%. In specific implementation, the candidate synonyms whose semantic similarity exceeds the preset similarity threshold are selected from the rewritten candidate synonyms and used as standby synonyms in subsequent processing.
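The threshold filter of step S212 might look like the sketch below. The source does not say how semantic similarity is computed, so the word-overlap `jaccard` score is only an assumed placeholder for a real semantic similarity model, and the lower threshold in the usage example reflects that this lexical measure is much stricter than an embedding-based one.

```python
def jaccard(a, b):
    """Word-overlap score in [0, 1]; assumed stand-in for semantic similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def select_standby(source, candidates, threshold=0.9, score=jaccard):
    """Keep the candidates whose similarity to the source exceeds the threshold."""
    return [c for c in candidates if score(source, c) > threshold]

candidates = ["you fought very well", "you ran away"]
print(select_standby("you fought well", candidates, threshold=0.7))
# prints: ['you fought very well']
```

Passing the scorer as a parameter keeps the filter independent of the similarity model, so a sentence-embedding scorer can be swapped in without changing the selection logic.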
Step S214, for each standby synonym, determining a target title word from a title word stock and a target mood word from a mood word stock according to the personality and gender of both parties to the dialogue; replacing the standard title word in the standby synonym with the target title word, and adding the target mood word to the standby synonym after the replacement, to obtain a final synonym.
The title word stock and the mood word stock are stylized word stocks of the specified style. The target title word and the target mood word are selected from the title word stock and the mood word stock, respectively, according to the personality and gender of both parties to the dialogue. The final synonym is usually a synonymous sentence, in the specified style, corresponding to the sentence to be processed.
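As a rough sketch of step S214, the snippet below replaces standard title words and prepends a mood word chosen by persona. The word stocks, the persona labels ("elder", "young man"), the choice to key the title stock by a (speaker, listener) pair, and the position of the mood word are all assumptions made for illustration.

```python
import random

# Hypothetical stylized word stocks (English stand-ins)
TITLE_STOCK = {
    ("elder", "young man"): {"you": ["young hero"], "I": ["this old man"]},
}
MOOD_STOCK = {"elder": ["ha ha"]}

def personalize(sentence, speaker, listener, rng=random):
    """Swap standard title words for persona-specific ones and add a mood word."""
    titles = TITLE_STOCK[(speaker, listener)]
    words = [rng.choice(titles[w]) if w in titles else w for w in sentence.split()]
    mood = rng.choice(MOOD_STOCK[speaker])
    return f"{mood}, {' '.join(words)}"

print(personalize("you fought well, I was impressed", "elder", "young man"))
# prints: ha ha, young hero fought well, this old man was impressed
```

With several entries per persona in each stock, repeated calls would yield different final synonyms for the same standby synonym.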
In specific implementation, the title word stock and the mood word stock are constructed through the following steps 10-12:
Step 10, acquiring a dialogue database of the specified style; the dialogue database comprises a plurality of pieces of dialogue data.
The specified style may be set according to development requirements; for example, the specified style may be Jin Yong style, anime (two-dimensional) style, emotion style, etc. The dialogue database contains a large amount of dialogue data in the specified style, and may consist of all the dialogue in novels of the specified style.
Step 11, performing Chinese word-class tagging on the dialogue database, extracting the target words tagged as interjections, and constructing the mood word stock from the target words.
In specific implementation, a semantic analysis tool may be used to tag the dialogue data in the dialogue database with Chinese word classes. Specifically, the WordTag tool (a Chinese word-class knowledge tagging tool) may be used. WordTag is a word-class knowledge tagging tool that covers all Chinese words and aims to provide comprehensive, rich knowledge tagging results for Chinese text analysis. It can be applied to natural language processing tasks such as template generation and matching (mining templates and parsing templates) and knowledge mining (new word discovery and relation mining) to improve the precision of text analysis and mining, and it can also serve as a Chinese text feature generator that supplies text features to various machine learning models. In practice, words that WordTag tags as interjections may be taken as the target words.
Step 12, performing lexical analysis on the dialogue database to obtain the word of a specified number of characters at the beginning of each piece of dialogue data, and constructing, from these words, a plurality of dialogue personas and a title word stock corresponding to each persona.
In practical applications, the LAC (Lexical Analysis of Chinese) tool may be used for lexical analysis of the dialogue data; LAC provides Chinese word segmentation, part-of-speech tagging, proper noun recognition and similar functions. In specific implementation, the specified number of characters indicates the length of a title word in the dialogue data; since the first 2 or 3 characters of a line of dialogue are usually a title word, the specified number may be set to 2 or 3. Different personas can be constructed from the different title words obtained, and the number of personas can be set according to development requirements. For example, from the title word "old man", a corresponding persona can be obtained.
For example, assuming the dialogue database contains all the dialogue in a corpus of martial arts novels, the dialogue data is analyzed with WordTag and LAC, the words of length 2-3 appearing at the beginning of sentences are extracted, counted and ranked, the top 2500 words are then screened manually, and 13 personas (including 'good person', 'men', 'ancestor', 'evening', 'woman', 'neutral', 'hiking', 'civilian', 'follow' and 'child') are constructed.
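The counting and ranking of sentence-initial words described above could be approximated as follows. This sketch takes raw 2- and 3-character prefixes rather than words segmented by LAC or WordTag, which is a simplification; the three sample dialogue lines are invented for illustration, and the manual screening step is not modeled.

```python
from collections import Counter

def leading_word_counts(dialogues, lengths=(2, 3)):
    """Count the 2- and 3-character prefixes of each line of dialogue."""
    counts = Counter()
    for line in dialogues:
        for n in lengths:
            if len(line) >= n:
                counts[line[:n]] += 1
    return counts

# Hypothetical dialogue lines in a martial arts style
dialogues = ["少侠好功夫", "少侠请留步", "老夫佩服"]
counts = leading_word_counts(dialogues)
print(counts.most_common(1))
# prints: [('少侠', 2)]
```

The highest-ranked prefixes would then be reviewed by hand and grouped into personas, each with its own title word stock.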
With the sentence synonymous rewriting method, synonymous rewriting can be performed automatically from a single line of NPC dialogue. The semantics of the input sentence are analyzed and the sentence is normalized; for short clauses, synonymous sentences are then retrieved from a database by vector search, while for long clauses they are produced by the specified-style synonym generation model; finally, the obtained synonymous sentences are polished in a personalized way according to the settings, so that the same line of dialogue shows synonymous expressions of different personalities under different settings. This greatly enriches the game content while reducing the workload of text planning. In addition, the method can polish the sentences based on the title word stock, the mood word stock and the personalized parameters, further personalizing the rewritten content.
The embodiment of the invention also provides another sentence synonymous rewriting method, implemented on the basis of the above embodiments. This method mainly describes the specific process of rewriting the standard expression sentence to obtain a plurality of candidate synonyms (steps S306-S318 below). As shown in FIG. 3, the method comprises the following specific steps:
Step S302, acquiring a sentence to be processed and personalized parameters; the personalized parameters are used for personalized processing of the sentence.
Step S304, carrying out normalization processing on the sentence to be processed to obtain a standard expression sentence corresponding to the sentence to be processed.
In specific implementation, after a sentence to be processed is received, its semantics are analyzed; the title words contained in the sentence are then determined according to the semantics and normalized into standard title words, and the mood words in the sentence are removed, to obtain the standard expression sentence corresponding to the sentence to be processed.
Step S306, splitting the standard expression sentence into clauses according to its punctuation marks to obtain at least one clause.
For example, the standard expression sentence is: "Your martial arts are good, I admire you very much." Based on the punctuation, this sentence can be divided into two clauses, "your martial arts are good" and "I admire you very much".
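Punctuation-based clause splitting as in step S306 can be sketched with a regular expression. The set of delimiters (ASCII and full-width Chinese punctuation) is an assumption; the source only says that clauses are split according to punctuation marks.

```python
import re

# Assumed delimiter set: common ASCII and full-width punctuation marks
CLAUSE_DELIMITERS = r"[,.;!?，。；！？]+"

def split_clauses(sentence):
    """Split a sentence into non-empty clauses at punctuation marks."""
    parts = re.split(CLAUSE_DELIMITERS, sentence)
    return [p.strip() for p in parts if p.strip()]

print(split_clauses("Your martial arts are good, I admire you very much."))
# prints: ['Your martial arts are good', 'I admire you very much']
```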
Step S308, determining a first clause in the standard expression sentence as a current clause; step S310 is performed.
In a specific implementation, each of at least one clause of the standard expression sentence divided is required to be a current clause once, and the following steps S310 to S314 are performed.
Step S310, judging whether the length of the current clause is larger than a preset length threshold value; if the length is greater than the preset length threshold, step S312 is executed; otherwise, step S314 is performed.
The preset length threshold may be set according to development requirements, for example 5 or 8. Clauses are then handled differently depending on their length.
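The branch in step S310 amounts to a simple dispatch on clause length. In the sketch below, `retrieve` and `generate` are hypothetical callables standing in for the vector search and the synonym generation model, and length is assumed to be measured in characters, which the source does not state explicitly.

```python
def rewrite_clause(clause, retrieve, generate, length_threshold=5):
    """Send long clauses to the generation model and short ones to retrieval."""
    if len(clause) > length_threshold:
        return generate(clause)
    return retrieve(clause)

def rewrite_all(clauses, retrieve, generate, length_threshold=5):
    """Process every clause in turn, collecting its candidate synonyms."""
    return {c: rewrite_clause(c, retrieve, generate, length_threshold) for c in clauses}

# Stub back ends that only record which branch was taken
def retrieve(clause):
    return [f"retrieved:{clause}"]

def generate(clause):
    return [f"generated:{clause}"]

print(rewrite_all(["abc", "abcdefgh"], retrieve, generate))
# prints: {'abc': ['retrieved:abc'], 'abcdefgh': ['generated:abcdefgh']}
```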
Step S312, inputting the current clause into a pre-trained synonym generation model, and outputting a plurality of candidate synonyms of the current clause through the synonym generation model; step S316 is performed.
If the length of the current clause is greater than the preset length threshold, the current clause is rewritten by the synonym generation model to obtain a plurality of rewritten candidate synonyms corresponding to the current clause. Specifically, the synonym generation model is obtained by training on a preset synonymous sentence data set, which contains a plurality of synonymous sentence pairs; the synonymous sentence pairs may be determined through the following steps 20-21:
Step 20, obtaining dialogue sentences of the specified style.
The dialogue sentences may be extracted from the dialogue in novels of the specified style.
Step 21, obtaining the synonymous sentence pairs corresponding to the dialogue sentences by round-trip back-translation with multiple translation engines; each synonymous sentence pair comprises a dialogue sentence and the translation in a specified language corresponding to that dialogue sentence.
A translation engine may be a translation tool that supports multiple languages. In specific implementation, one translation engine translates the dialogue sentence into a first language, another translation engine translates the first-language sentence into the specified language, and the resulting translation is then cleaned to obtain the final translation.
For example, assuming the specified style is the Jin Yong style, all dialogue and narration in the 14 Jin Yong novels may be separated and extracted to obtain 133595 and 120771 sentences, respectively. Multiple pairs of Jin Yong text (corresponding to the dialogue sentence) and vernacular text (corresponding to the translation in the specified language), that is, synonymous sentence pairs, may then be constructed by round-trip back-translation through multiple translation engines. The quality of the text obtained by translating Jin Yong text into English and back into vernacular text is uneven, especially for entities such as personal names, because entities are usually converted into pinyin when translated into English, and inconsistencies arise when the pinyin is converted back into Chinese. These synonymous sentence pairs are therefore further cleaned. The main method is to use an NER tool (named entity recognition, a technique for extracting and labeling entity nouns such as person names, place names, and organization names in sentences) to extract the entities in the Jin Yong sentence and the vernacular sentence respectively, align them by pinyin, and directly replace the entities in the vernacular sentence with the corresponding Jin Yong entities.
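The entity cleaning described above can be sketched as follows. It is a hedged illustration rather than the method actually used: the entity lists are assumed to come from an NER tool, the `TOY_PINYIN` table is a hypothetical stand-in for a real pinyin converter, and the names `romanize` and `align_entities` as well as the sample sentence are invented for the example.

```python
from typing import Callable

# Hypothetical romanization table standing in for a real pinyin converter.
TOY_PINYIN = {"郭靖": "guojing", "郭静": "guojing", "黄蓉": "huangrong", "黄容": "huangrong"}


def romanize(entity: str) -> str:
    """Toy stand-in: look the entity up, falling back to the entity itself."""
    return TOY_PINYIN.get(entity, entity)


def align_entities(
    source_entities: list[str],
    translated_sentence: str,
    translated_entities: list[str],
    to_pinyin: Callable[[str], str] = romanize,
) -> str:
    """Replace entities in the back-translated sentence with the source
    entities that share the same romanization."""
    by_pinyin = {to_pinyin(entity): entity for entity in source_entities}
    cleaned = translated_sentence
    for entity in translated_entities:
        original = by_pinyin.get(to_pinyin(entity))
        if original is not None and original != entity:
            cleaned = cleaned.replace(entity, original)
    return cleaned


# The back-translated names use different characters with the same pinyin.
cleaned = align_entities(["郭靖", "黄蓉"], "郭静和黄容一起离开了", ["郭静", "黄容"])
```

Entities whose romanization has no match in the source sentence are left unchanged, which keeps the cleaning conservative.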
In training the synonym generation model, a training sample is first obtained from the synonymous sentence pair data set and input into an initial model to obtain an output result, and a loss value is calculated based on the output result and the training sample. If the loss value does not meet a preset condition, the parameters of the initial model are adjusted based on the loss value, and further training samples are obtained from the data set and input into the adjusted model. This is repeated until the loss value meets the preset condition, at which point the adjusted initial model is determined as the synonym generation model.
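The control flow of this training procedure (sample, forward pass, loss, check against a preset condition, adjust, repeat) can be illustrated with a toy example. The sketch below deliberately replaces the real generation model with a one-parameter linear model `y = w * x`; the learning rate, the loss threshold, and the name `train_until_converged` are assumptions made for illustration only.

```python
import random


def train_until_converged(dataset, lr=0.05, loss_threshold=1e-6, max_steps=10_000, seed=0):
    """Train a toy model y = w * x until the sample loss meets the preset condition."""
    rng = random.Random(seed)
    w = 0.0  # the single trainable parameter of the toy model
    for step in range(1, max_steps + 1):
        x, y = rng.choice(dataset)       # acquire a training sample
        output = w * x                   # input the sample into the model
        loss = (output - y) ** 2         # loss from the output and the sample
        if loss <= loss_threshold:       # preset condition met: stop training
            return w, step
        w -= lr * 2 * (output - y) * x   # otherwise adjust the model and continue
    return w, max_steps


# Toy data generated by y = 2x, so training should drive w toward 2.
weight, steps_used = train_until_converged([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
```

In a real setting the parameter update would be performed by an optimizer over all model weights; only the stopping logic carries over from this sketch.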
In practical application, the initial model may be a T5-Pegasus model, which is a Transformer-based encoder-decoder text generation model pre-trained for the summarization task. Specifically, the T5-Pegasus model is first fine-tuned on the synonymous sentence pair data set to obtain the trained synonym generation model. A clause whose length is greater than the preset length threshold is then input directly into the synonym generation model to obtain candidate outputs; SimBERT is used to calculate the similarity between each candidate output and the sentence to be processed, and the candidate outputs whose similarity is not less than a preset similarity threshold are taken as the candidate synonyms of the current clause. SimBERT is a BERT-based model for computing sentence vectors and is generally used for sentence encoding and sentence similarity calculation.
In the field of synonymous sentence generation, an initial model is usually trained on constructed synonymous sentence pairs, but a simple synonymous rewriting technique tends to produce similar outputs for the same input and therefore cannot meet the requirement of personalized rewriting. The invention therefore trains the initial model on the above synonymous sentence pair data set, so that personalized and diverse candidate synonyms can be obtained.
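The similarity filtering of candidate outputs can be sketched as below. The sketch assumes that sentence vectors have already been produced by some sentence encoder and that cosine similarity is the similarity measure; the toy two-dimensional vectors, the threshold of 0.8, and the function names are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filter_candidates(query_vec, candidates, threshold=0.8):
    """Keep candidate texts whose similarity to the query is not less than the threshold.

    candidates: list of (text, sentence_vector) pairs.
    """
    return [text for text, vec in candidates if cosine(query_vec, vec) >= threshold]


# Toy vectors standing in for encoder output.
kept = filter_candidates(
    [1.0, 0.0],
    [("candidate a", [0.9, 0.1]), ("candidate b", [0.1, 0.9]), ("candidate c", [1.0, 1.0])],
)
```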
Step S314, determining a plurality of candidate synonymous sentences similar to the current clause from a preset corpus.
If the length of the current clause is not greater than the preset length threshold, a plurality of candidate synonyms with high similarity to the current clause can be retrieved from the preset corpus by vector search. Specifically, a semantic vector search method may be adopted. In this method, a BERT encoder is generally used to encode all content in the preset corpus into embedding vectors (corresponding to the encoding vectors described below), and an index is built with a vector search method or tool. The content to be searched (corresponding to the current clause) is encoded with the same encoder and searched through the index. Candidate synonyms obtained in this way are semantically similar to the query but need not contain the same keywords. Here, embedding refers to mapping a high-dimensional representation into a continuous vector space of much lower dimension.
In specific implementation, the above-mentioned preset corpus is determined by the following steps 30-34:
Step 30, acquiring a plurality of dialogue clauses having the specified style.
The dialogue clauses having the specified style may be obtained by dividing the dialogue sentences in a novel corpus having the specified style into clauses.
Step 31, encoding each dialogue clause using a preset vector model to obtain an encoding vector corresponding to each dialogue clause.
The vector model may be a SimBERT model; each dialogue clause may be encoded using the SimBERT model to obtain its encoding vector.
Step 32, classifying and dividing the coding vectors corresponding to the dialogue clauses to obtain a plurality of division results; wherein each division result comprises at least one coding vector.
In specific implementation, the code vectors corresponding to the plurality of dialogue clauses may be randomly divided into a plurality of division results, or the code vectors corresponding to the plurality of dialogue clauses may be divided into a plurality of division results according to a specified rule. The above specified rules may be set according to development requirements, and are not specifically limited herein.
Step 33, for each division result, performing principal component analysis on the encoding vectors in the current division result, and performing dimension reduction on those encoding vectors according to the principal component analysis result to obtain a dimension reduction processing result; the dimension reduction processing result includes the dimension-reduced encoding vectors in the current division result and the dimension reduction parameters used for the dimension reduction.
In a specific implementation, each of the division results is taken in turn as the current division result; the encoding vectors in the current division result are reduced in dimension through a whitening operation, and the dimension reduction parameters are extracted and stored. The whitening operation performs principal component analysis on a batch of vectors, reduces their dimension, and extracts the transformation parameters (corresponding to the dimension reduction parameters above) used for the reduction.
In practical application, a search index also needs to be built with Faiss over all the dimension-reduced encoding vectors, so that encoding vectors can subsequently be retrieved from the preset corpus by vector search. Faiss is a library for efficient similarity search and clustering of dense vectors.
Step 34, generating a preset corpus based on the dimension reduction processing result corresponding to each division result; the preset corpus comprises a plurality of dimension reduction processing results corresponding to the division results.
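The whitening operation of step 33 can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the principal directions are found by power iteration with deflation (a real system would more likely use a linear algebra library), the stored dimension reduction parameters are taken to be the mean and the top-k eigenpairs of the covariance matrix, and the names `fit_whitening` and `apply_whitening` are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit_whitening(vectors, k, iters=200):
    """Return (mean, components): the dimension reduction parameters.

    components is a list of (eigenvalue, unit eigenvector) pairs for the
    top-k principal directions of the covariance matrix.
    """
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    # Population covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)] for i in range(d)]
    components = []
    for _component in range(k):
        vec = [1.0 / math.sqrt(d)] * d  # fixed start vector for reproducibility
        for _iteration in range(iters):
            nxt = [dot(row, vec) for row in cov]
            norm = math.sqrt(dot(nxt, nxt))
            if norm == 0.0:
                break
            vec = [x / norm for x in nxt]
        val = dot(vec, [dot(row, vec) for row in cov])
        components.append((val, vec))
        # Deflation: remove the found direction before searching for the next.
        cov = [[cov[i][j] - val * vec[i] * vec[j] for j in range(d)] for i in range(d)]
    return mean, components


def apply_whitening(vector, mean, components, eps=1e-12):
    """Project onto the principal directions and scale each to unit variance."""
    centered = [x - m for x, m in zip(vector, mean)]
    return [dot(centered, vec) / math.sqrt(max(val, eps)) for val, vec in components]


# Toy 3-dimensional "encoding vectors" reduced to 2 dimensions.
data = [[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
mean, components = fit_whitening(data, k=2)
reduced = [apply_whitening(v, mean, components) for v in data]
```

Storing `mean` and `components` alongside the reduced vectors is what allows a later query vector to be mapped into the same reduced space.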
In specific implementation, the above step S314 may be implemented by the following steps 40-43:
Step 40, randomly sampling the preset corpus to obtain the dimension reduction processing result of a target division result.
The dimension reduction processing result of the target division result may be the dimension reduction processing result of any one division result in the preset corpus.
Step 41, encoding the current clause based on the dimension reduction parameters corresponding to the dimension reduction processing result of the target division result, to obtain the encoding vector corresponding to the current clause.
In a specific implementation, the SimBERT model can be used, together with the dimension reduction parameters corresponding to the dimension reduction processing result of the target division result, to encode the current clause and obtain its encoding vector.
Step 42, performing principal component analysis on the encoding vector corresponding to the current clause, and performing dimension reduction on it according to the principal component analysis result to obtain a dimension reduction vector.
In a specific implementation, the encoding vector corresponding to the current clause can be reduced in dimension through the whitening operation to obtain the dimension reduction vector corresponding to the current clause.
Step 43, calculating the similarity between the dimension reduction vector and each dimension-reduced encoding vector in the dimension reduction processing result of the target division result, and determining the dialogue clauses corresponding to the encoding vectors whose similarity meets a preset condition as the candidate synonyms corresponding to the current clause.
The preset condition may be determined according to development requirements; for example, it may be that the similarity is greater than a preset similarity threshold, or that the dialogue clause ranks among the top ten or twelve by similarity.
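Steps 40 to 43 can be sketched as follows. The sketch assumes the query clause has already been encoded into a vector, that each division result stores its dimension reduction parameters as a mean vector and a projection matrix (here called `mean` and `kernel`), and that cosine similarity is used; the dictionary layout, the function names, and the toy data are all hypothetical.

```python
import math
import random


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def reduce_query(vec, mean, kernel):
    """Apply stored dimension reduction parameters: (vec - mean) times kernel."""
    centered = [x - m for x, m in zip(vec, mean)]
    return [sum(c * row[j] for c, row in zip(centered, kernel)) for j in range(len(kernel[0]))]


def retrieve(query_vec, partition, top_n=None, threshold=None):
    """Rank the clauses of one division result by similarity to the query."""
    q = reduce_query(query_vec, partition["mean"], partition["kernel"])
    scored = sorted(((cosine(q, v), text) for text, v in partition["entries"]), reverse=True)
    if threshold is not None:   # preset condition: similarity above a threshold
        scored = [(s, t) for s, t in scored if s > threshold]
    if top_n is not None:       # preset condition: top-N by similarity
        scored = scored[:top_n]
    return [t for _, t in scored]


def retrieve_from_corpus(query_vec, corpus, rng=random, **kwargs):
    partition = rng.choice(corpus)  # step 40: randomly sample a division result
    return retrieve(query_vec, partition, **kwargs)


# Toy division result: the kernel simply drops the third dimension.
partition = {
    "mean": [0.0, 0.0, 0.0],
    "kernel": [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
    "entries": [("clause A", [1.0, 0.0]), ("clause B", [0.8, 0.6]), ("clause C", [0.0, 1.0])],
}
top_two = retrieve([1.0, 0.1, 5.0], partition, top_n=2)
above = retrieve([1.0, 0.1, 5.0], partition, threshold=0.9)
```

Because a different division result may be sampled on each call, repeated queries with the same clause can return different candidates, which matches the goal of varied output.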
In a specific implementation, the invention uses an existing tool to calculate the similarity between the dimension reduction vector and the dimension-reduced encoding vectors in the dimension reduction processing result of the target division result, and obtains a plurality of phrases with high similarity.
Specifically, assume that the dimension reduction processing result of the target division result contains the dimension-reduced encoding vectors of 100,000 phrases. A plurality of cluster centers of these vectors are calculated according to a preset number of classes. The similarity between the dimension reduction vector of the current phrase and each cluster center is calculated, and a cluster center with high similarity is selected. The lower-level cluster centers contained in that cluster are then determined and their similarity to the dimension reduction vector of the current phrase is calculated, and so on, until the specified number of most similar dimension-reduced encoding vectors is obtained. The phrases corresponding to these encoding vectors are determined as the candidate synonyms of the current phrase.
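The cluster-center search can be illustrated with a single-level sketch: the query is first compared with the cluster centers, and only the members of the closest cluster are then ranked. The sketch assumes the cluster centers are already available (for instance from a prior clustering run), uses cosine similarity, and stops at one level instead of descending through nested clusters; the names `build_clusters` and `coarse_to_fine_search` are hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_clusters(entries, centers):
    """Assign each (text, vector) entry to its most similar cluster center."""
    clusters = {i: [] for i in range(len(centers))}
    for text, vec in entries:
        best = max(range(len(centers)), key=lambda i: cosine(vec, centers[i]))
        clusters[best].append((text, vec))
    return clusters


def coarse_to_fine_search(query, centers, clusters, n_probe=1, top_n=2):
    """Search only the n_probe clusters whose centers are closest to the query."""
    ranked = sorted(range(len(centers)), key=lambda i: cosine(query, centers[i]), reverse=True)
    pool = [entry for i in ranked[:n_probe] for entry in clusters[i]]
    pool.sort(key=lambda entry: cosine(query, entry[1]), reverse=True)
    return [text for text, _ in pool[:top_n]]


# Toy data: two clusters of two phrases each.
centers = [[1.0, 0.0], [0.0, 1.0]]
entries = [
    ("phrase a", [0.9, 0.1]),
    ("phrase b", [0.8, 0.3]),
    ("phrase c", [0.1, 0.9]),
    ("phrase d", [0.2, 0.7]),
]
clusters = build_clusters(entries, centers)
nearest = coarse_to_fine_search([1.0, 0.2], centers, clusters, n_probe=1, top_n=2)
```

Comparing against cluster centers first avoids scoring every stored vector, which is the point of the procedure when the division result holds on the order of 100,000 phrases.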
Step S316, judging whether the current clause is the last clause in the standard expression sentence; if not, step S318 is performed, otherwise, step S320 is performed.
Step S318, the next clause of the current clause in the standard expression sentence is used as a new current clause, and step S310 is executed.
Step S320, based on the personalized parameters and a preset stylized word stock, personalized processing is carried out on each candidate synonym, and a plurality of final synonyms corresponding to the to-be-processed sentences are obtained.
In some embodiments, in order to obtain synonyms whose semantics are closer to the sentence to be processed, the semantic similarity between each obtained candidate synonym and the sentence to be processed may first be calculated. Candidate synonyms whose semantic similarity is less than a preset similarity threshold are filtered out, and those whose semantic similarity is greater than or equal to the threshold are kept. Personalized rewriting is then performed according to the title words and mood words set for the NPC in the personalized parameters, yielding the final personalized rewritten content of the NPC in the specified style, that is, the final synonyms corresponding to the sentence to be processed.
In a specific implementation, the plurality of final synonyms can be used as the dialogue output of the NPC. Because the purpose of the invention is to enrich the corpus of the NPC, at the application layer one line of dialogue is input and the goal is to output an unlimited supply of synonymous stylized corpus. The final synonyms may be played in turn in the game, or a different final synonym may be output each time the player clicks the NPC.
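The filtering and personalized rewriting of step S320 can be sketched as follows. Everything concrete in the sketch is an assumption made for illustration: the `[TITLE]` marker standing in for the standard title word, the contents and key layout of `TITLE_LIBRARY` and `MOOD_LIBRARY`, the threshold of 0.8, and the choice to place the mood word at the start of the sentence. Similarity scores are assumed to have been computed beforehand.

```python
import random

# Hypothetical lexicons; the real stylized word stock is built from a corpus.
TITLE_LIBRARY = {("female", "male"): ["little brother"], ("male", "female"): ["young lady"]}
MOOD_LIBRARY = {"female": ["Hee hee"], "male": ["Ha"]}
STANDARD_TITLE = "[TITLE]"  # assumed marker for the standard title word


def personalize(scored_candidates, speaker, listener, threshold=0.8, rng=None):
    """Filter candidates by similarity, then insert a title word and a mood word.

    scored_candidates: list of (text, similarity to the sentence to be processed).
    """
    rng = rng or random.Random(0)
    finals = []
    for text, similarity in scored_candidates:
        if similarity < threshold:
            continue  # filter out candidates below the similarity threshold
        title = rng.choice(TITLE_LIBRARY[(speaker, listener)])
        mood = rng.choice(MOOD_LIBRARY[speaker])
        finals.append(f"{mood}, {text.replace(STANDARD_TITLE, title)}")
    return finals


finals = personalize(
    [("[TITLE], your martial arts are very good", 0.93), ("[TITLE], the weather is fine", 0.41)],
    speaker="female",
    listener="male",
)
```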
To facilitate understanding of the embodiment of the invention, an example is given in which the sentence to be processed is "young knight-errant, your martial arts are really good, I admire you very much". First, the sentence to be processed is normalized to obtain the standard expression sentence "your martial arts are really good, I admire you very much". The standard expression sentence is then divided into clauses according to punctuation marks, giving a first clause "your martial arts are really good" and a second clause "I admire you very much". For the second clause, whose length is within 5, the candidate synonyms most similar to the second clause are retrieved from the preset corpus by vector search: "I admire you very much", "that is very admirable" and "it really makes me admire you". For the first clause, whose length is greater than 5, rewriting is performed by the synonym generation model, which can produce sentences in the specified style, giving a plurality of candidate synonyms corresponding to the first clause: "your martial arts are very good", "your martial arts are very good" and "your art is very good". Finally, the semantic similarity between each candidate synonym and the corresponding clause in the sentence to be processed is calculated, the candidate synonyms whose semantic similarity is greater than the preset similarity threshold are kept, and personalized rewriting is performed according to the associated title words and mood words set for the NPC in the personalized parameters to obtain the final synonyms. For example, if the personalized parameters specify that the speaker is female and the listener is male, the final synonyms output may be: "Hee hee, little brother, your martial arts are very good, I admire you all the more" and "Interesting, your art is very good, that is very admirable".
According to the sentence synonymous rewriting method, a synonymous sentence pair data set is first constructed automatically from a corpus of the specified style, and a corpus rewriting system for the specified style is then built on vector retrieval and the synonym generation model. This greatly reduces the writing workload of planners when designing corpus-rich NPC scenes: a planner only needs to write at least one sentence for each NPC and set its personality, and the system generates diverse personalized synonymous expressions. In addition, in a game scene with corpus-rich NPCs, players explore the game content more actively, which encourages spontaneous promotion by players and saves promotion costs.
Corresponding to the above method embodiment, the embodiment of the present invention further provides a device for rewriting synonyms of sentences, as shown in fig. 4, where the device includes:
a sentence acquisition module 40, configured to acquire a sentence to be processed and a personalized parameter; the personalized parameters are used for personalized processing of the sentences.
The normalization processing module 41 is configured to normalize the sentence to be processed, and obtain a standard expression sentence corresponding to the sentence to be processed.
The synonym rewriting module 42 is configured to perform synonymous rewriting on the standard expression sentence to obtain a plurality of candidate synonyms.
The personalized processing module 43 is configured to perform personalized processing on the multiple candidate synonyms based on the personalized parameters and a preset stylized word stock, so as to obtain multiple final synonyms corresponding to the to-be-processed sentence.
The sentence synonymous rewriting device firstly acquires a sentence to be processed and personalized parameters; carrying out normalization processing on the sentence to be processed to obtain a standard expression sentence corresponding to the sentence to be processed; then, carrying out synonym rewriting on the standard expression sentence to obtain a plurality of candidate synonyms; and then, based on the personalized parameters and a preset stylized word stock, carrying out personalized processing on the candidate synonyms to obtain a plurality of final synonyms corresponding to the sentence to be processed. According to the method, based on one sentence, a plurality of final synonymous sentences with the same semantic meaning as the sentence can be automatically obtained, so that the same sentence can show synonymous expressions with different personalities under different settings; therefore, the method can obtain a plurality of personalized synonymous expressions corresponding to each scene NPC in the game scene only by setting one sentence of dialogs for each scene NPC by the planner, thereby increasing the reality of the game world and reducing the workload of the planner.
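The way the four modules chain together can be sketched as a small pipeline. The class name `SentenceRewritingDevice`, the callable signatures, and the stub functions in the usage example are assumptions made only to show the data flow; they do not reproduce the actual modules.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SentenceRewritingDevice:
    normalize: Callable[[str], str]                      # role of module 41
    rewrite: Callable[[str], list[str]]                  # role of module 42
    personalize: Callable[[list[str], dict], list[str]]  # role of module 43

    def run(self, sentence: str, params: dict) -> list[str]:
        """Inputs correspond to what module 40 acquires."""
        standard = self.normalize(sentence)
        candidates = self.rewrite(standard)
        return self.personalize(candidates, params)


# Trivial stand-in functions, used only to exercise the data flow.
device = SentenceRewritingDevice(
    normalize=lambda s: s.strip().lower(),
    rewrite=lambda s: [s, s + "!"],
    personalize=lambda cands, p: [f"{p['mood']} {c}" for c in cands],
)
outputs = device.run("  Hello ", {"mood": "Oh,"})
```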
Specifically, the normalization processing module 41 is configured to: perform semantic analysis on the sentence to be processed to obtain a semantic analysis result; determine the title words and mood words in the sentence to be processed based on the semantic analysis result; and normalize the title words in the sentence to be processed into standard title words and delete the mood words in the sentence to be processed, to obtain the standard expression sentence corresponding to the sentence to be processed.
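A rough sketch of this normalization is given below. It swaps the semantic analysis for a simple lexicon lookup, which is a deliberate simplification, and the word lists, the `[TITLE]` marker used as the standard title word, and the English example sentence are all assumptions made for illustration.

```python
import re

# Hypothetical lexicons; the text derives these words from semantic analysis.
TITLE_WORDS = {"young hero", "little brother", "sir"}
MOOD_WORDS = {"ah", "oh", "hey"}
STANDARD_TITLE = "[TITLE]"  # assumed marker for the standard title word


def normalize(sentence: str) -> str:
    """Map title words to the standard title word and delete mood words."""
    text = sentence
    # Replace longer title words first so that overlapping entries behave predictably.
    for title in sorted(TITLE_WORDS, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(title)}\b", STANDARD_TITLE, text, flags=re.IGNORECASE)
    for mood in MOOD_WORDS:
        # Remove the mood word together with the whitespace in front of it.
        text = re.sub(rf"\s*\b{re.escape(mood)}\b", "", text, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", text).strip().lstrip(" ,")


standard = normalize("Young hero, your martial arts are really good ah, I admire you")
```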
Further, the synonym rewriting module 42 is configured to: divide the standard expression sentence into clauses according to the punctuation marks in it to obtain at least one clause; for each clause, judge whether the length of the current clause is greater than a preset length threshold; if it is greater than the preset length threshold, input the current clause into a pre-trained synonym generation model and output a plurality of candidate synonyms of the current clause through the synonym generation model; and if the length of the current clause is not greater than the preset length threshold, determine a plurality of candidate synonyms similar to the current clause from the preset corpus.
In a specific implementation, the synonym generation model is obtained by training on a preset synonymous sentence pair data set; the data set comprises a plurality of synonymous sentence pairs. Based on this, the apparatus further includes a synonymous sentence pair determining module configured to: obtain a dialogue sentence having a specified style; and obtain the synonymous sentence pair corresponding to the dialogue sentence by round-trip back-translation through a plurality of translation engines; the synonymous sentence pair comprises the dialogue sentence and a translation in the specified language corresponding to the dialogue sentence.
In practical application, the device further comprises a corpus generating module configured to: acquire a plurality of dialogue clauses having the specified style; encode each dialogue clause using a preset vector model to obtain the encoding vector corresponding to each dialogue clause; classify and divide the encoding vectors corresponding to the dialogue clauses to obtain a plurality of division results, each division result comprising at least one encoding vector; for each division result, perform principal component analysis on the encoding vectors in the current division result and perform dimension reduction on them according to the principal component analysis result to obtain a dimension reduction processing result, the dimension reduction processing result including the dimension-reduced encoding vectors in the current division result and the dimension reduction parameters used for the dimension reduction; and generate the preset corpus based on the dimension reduction processing result corresponding to each division result, the preset corpus comprising the dimension reduction processing results corresponding to the plurality of division results.
Further, the synonym rewrite module 42 is further configured to: randomly sampling a preset corpus to obtain a dimension reduction processing result of a target division result; based on the dimension reduction parameters corresponding to the dimension reduction processing result of the target division result, encoding the current clause to obtain an encoding vector corresponding to the current clause; performing principal component analysis on the code vector corresponding to the current clause, and performing dimension reduction on the code vector corresponding to the current clause according to the principal component analysis result to obtain a dimension reduction vector; and calculating the similarity of the dimension reduction vector and the code vector after the dimension reduction processing in the dimension reduction processing result of the target division result, and determining the dialogue clause corresponding to the code vector with the similarity meeting the preset condition as the candidate synonymous sentence corresponding to the current clause.
In a specific implementation, the personalized parameters include the personalities and genders of both parties to the dialogue; the stylized word stock includes a title word stock and a mood word stock. The personalized processing module 43 is configured to: calculate the semantic similarity between each candidate synonym and the sentence to be processed, and determine the candidate synonyms whose semantic similarity is greater than a preset similarity threshold as standby synonyms; and, for each standby synonym, determine a target title word from the title word stock and a target mood word from the mood word stock according to the personalities and genders of both parties to the dialogue, replace the standard title word in the standby synonym with the target title word, and add the target mood word to the standby synonym after the title word replacement, to obtain a final synonym.
Further, the apparatus further includes a word stock determining module configured to: acquire a dialogue database having the specified style, the dialogue database comprising a plurality of pieces of dialogue data; perform Chinese part-of-speech tagging on the dialogue database, extract the target words tagged as interjections, and construct the mood word stock based on the target words; and perform lexical analysis on the dialogue database to obtain the words located within a specified number of words from the beginning of each piece of dialogue data, and construct, based on those words, a plurality of dialogue persons and a title word stock corresponding to each dialogue person.
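The construction of the two word stocks can be sketched as follows, assuming the dialogue data has already been segmented and part-of-speech tagged by some external tool. The input layout, the tag names (`"e"` for interjections, `"n"` and `"nr"` for nouns and person names), the rule of keeping only noun-like words among the leading tokens, and the sample dialogue data are all assumptions made for illustration.

```python
from collections import defaultdict

INTERJECTION_TAG = "e"   # assumed tag name; depends on the tagger used
NOUN_TAGS = {"n", "nr"}  # assumed tags for words that can serve as title words


def build_lexicons(dialogues, head_size=2):
    """Build a mood word set and a per-speaker title word set.

    dialogues: list of {"speaker": str, "tokens": [(word, pos_tag), ...]}.
    head_size: number of leading words inspected for title words.
    """
    mood_words = set()
    titles = defaultdict(set)
    for item in dialogues:
        tokens = item["tokens"]
        mood_words.update(word for word, tag in tokens if tag == INTERJECTION_TAG)
        for word, tag in tokens[:head_size]:
            if tag in NOUN_TAGS:
                titles[item["speaker"]].add(word)
    return mood_words, dict(titles)


# Made-up, pre-tagged dialogue data for illustration only.
dialogues = [
    {"speaker": "speaker_1", "tokens": [("哎呀", "e"), ("靖哥哥", "n"), ("你", "r"), ("来", "v"), ("了", "u")]},
    {"speaker": "speaker_2", "tokens": [("蓉儿", "nr"), ("我", "r"), ("来", "v"), ("了", "u")]},
]
mood_words, title_words = build_lexicons(dialogues)
```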
The sentence synonymous rewriting device provided by the embodiment of the present invention has the same implementation principle and technical effects as those of the foregoing method embodiment, and for the sake of brevity, reference may be made to the corresponding contents in the foregoing method embodiment where the device embodiment section is not mentioned.
The embodiment of the invention also provides an electronic device, as shown in fig. 5, which comprises a processor and a memory, wherein the memory stores machine-executable instructions that can be executed by the processor, and the processor executes the machine-executable instructions to implement the sentence synonymous rewriting method described above.
Specifically, the sentence synonymous rewriting method comprises the following steps: acquiring a sentence to be processed and personalized parameters, the personalized parameters being used for personalized processing of sentences; normalizing the sentence to be processed to obtain a standard expression sentence corresponding to the sentence to be processed; performing synonymous rewriting on the standard expression sentence to obtain a plurality of candidate synonyms; and performing personalized processing on the candidate synonyms based on the personalized parameters and a preset stylized word stock to obtain a plurality of final synonyms corresponding to the sentence to be processed.
According to the sentence synonymous rewriting method, based on one sentence, a plurality of final synonymous sentences with the same semantic meaning as the sentence can be automatically obtained, so that the same sentence can show synonymous expressions with different personalities under different settings; therefore, the method can obtain a plurality of personalized synonymous expressions corresponding to each scene NPC in the game scene only by setting one sentence of dialogs for each scene NPC by the planner, thereby increasing the reality of the game world and reducing the workload of the planner.
In an alternative embodiment, the step of normalizing the sentence to be processed to obtain the standard expression sentence corresponding to the sentence to be processed includes: performing semantic analysis on the sentence to be processed to obtain a semantic analysis result; determining the title words and mood words in the sentence to be processed based on the semantic analysis result; and normalizing the title words in the sentence to be processed into standard title words and deleting the mood words in the sentence to be processed, to obtain the standard expression sentence corresponding to the sentence to be processed.
In an alternative embodiment, the step of performing synonymous rewriting on the standard expression sentence to obtain a plurality of candidate synonyms includes: dividing the standard expression sentence into clauses according to the punctuation marks in it to obtain at least one clause; for each clause, judging whether the length of the current clause is greater than a preset length threshold; if it is greater than the preset length threshold, inputting the current clause into a pre-trained synonym generation model and outputting a plurality of candidate synonyms of the current clause through the synonym generation model; and if the length of the current clause is not greater than the preset length threshold, determining a plurality of candidate synonyms similar to the current clause from the preset corpus.
In an alternative embodiment, the synonym generation model is obtained by training on a preset synonymous sentence pair data set; the data set comprises a plurality of synonymous sentence pairs; a synonymous sentence pair is determined by: obtaining a dialogue sentence having a specified style; and obtaining the synonymous sentence pair corresponding to the dialogue sentence by round-trip back-translation through a plurality of translation engines; the synonymous sentence pair comprises the dialogue sentence and a translation in the specified language corresponding to the dialogue sentence.
In an alternative embodiment, the preset corpus is determined by: acquiring a plurality of dialogue clauses having the specified style; encoding each dialogue clause using a preset vector model to obtain the encoding vector corresponding to each dialogue clause; classifying and dividing the encoding vectors corresponding to the dialogue clauses to obtain a plurality of division results, each division result comprising at least one encoding vector; for each division result, performing principal component analysis on the encoding vectors in the current division result and performing dimension reduction on them according to the principal component analysis result to obtain a dimension reduction processing result, the dimension reduction processing result including the dimension-reduced encoding vectors in the current division result and the dimension reduction parameters used for the dimension reduction; and generating the preset corpus based on the dimension reduction processing result corresponding to each division result, the preset corpus comprising the dimension reduction processing results corresponding to the plurality of division results.
In an alternative embodiment, the step of determining a plurality of candidate synonyms similar to the current clause from the preset corpus includes: randomly sampling a preset corpus to obtain a dimension reduction processing result of a target division result; based on the dimension reduction parameters corresponding to the dimension reduction processing result of the target division result, encoding the current clause to obtain an encoding vector corresponding to the current clause; performing principal component analysis on the code vector corresponding to the current clause, and performing dimension reduction on the code vector corresponding to the current clause according to the principal component analysis result to obtain a dimension reduction vector; and calculating the similarity of the dimension reduction vector and the code vector after the dimension reduction processing in the dimension reduction processing result of the target division result, and determining the dialogue clause corresponding to the code vector with the similarity meeting the preset condition as the candidate synonymous sentence corresponding to the current clause.
In an alternative embodiment, the personalized parameters include characters and gender of both parties of the dialogue; the stylized word stock comprises a title word stock and a mood word stock; the step of performing personalized processing on the candidate synonyms based on the personalized parameters and the preset stylized word stock to obtain a plurality of final synonyms corresponding to the sentence to be processed comprises the following steps: calculating the semantic similarity of each candidate synonym and the sentence to be processed, and determining the candidate synonym with the semantic similarity larger than a preset similarity threshold as a standby synonym; and for each standby synonym, determining a target nominal predicate from a nominal word library according to characters and sexes of both sides of the dialogue, determining a target word from a word stock, replacing the standard nominal predicate in the standby synonym with the target nominal word, and adding the target word into the standby synonym with the replaced nominal word to obtain the final synonym.
In an alternative embodiment, the title word stock and the mood word stock are determined by: acquiring a dialogue database with a specified style, the dialogue database comprising a plurality of pieces of dialogue data; performing Chinese part-of-speech tagging on the dialogue database, extracting the target words tagged as interjections, and constructing the mood word stock based on the target words; and performing lexical analysis on the dialogue database to obtain, from the beginning of each piece of dialogue data, the words whose length is within a specified number of characters, and constructing a plurality of dialogue personas and a title word stock corresponding to each dialogue persona based on those words.
Further, the electronic device shown in fig. 5 further includes a bus 102 and a communication interface 103, and the processor 101, the communication interface 103, and the memory 100 are connected through the bus 102.
The memory 100 may include a high-speed random access memory (Random Access Memory, RAM), and may further include a non-volatile memory, such as at least one magnetic disk memory. The communication connection between the system network element and at least one other network element is implemented via at least one communication interface 103 (which may be wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, etc. Bus 102 may be an ISA bus, a PCI bus, an EISA bus, or the like. The buses may be classified as address buses, data buses, control buses, etc. For ease of illustration, only one bidirectional arrow is shown in FIG. 5, but this does not mean that there is only one bus or one type of bus.
The processor 101 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 101 or by instructions in the form of software. The processor 101 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP for short), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC for short), a field-programmable gate array (Field-Programmable Gate Array, FPGA for short) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory 100, and the processor 101 reads the information in the memory 100 and, in combination with its hardware, performs the steps of the method of the previous embodiments.
The embodiment of the invention also provides a computer-readable storage medium storing computer-executable instructions that, when invoked and executed by a processor, cause the processor to implement the sentence synonymous rewriting method described above; for the specific implementation, refer to the method embodiment, which is not repeated here.
Specifically, the sentence synonymous rewriting method comprises the following steps: acquiring a sentence to be processed and personalized parameters, the personalized parameters being used for personalized processing of sentences; normalizing the sentence to be processed to obtain a standard expression sentence corresponding to the sentence to be processed; performing synonymous rewriting on the standard expression sentence to obtain a plurality of candidate synonyms; and performing personalized processing on the candidate synonyms based on the personalized parameters and a preset stylized word stock to obtain a plurality of final synonyms corresponding to the sentence to be processed.
With this sentence synonymous rewriting method, a plurality of final synonyms with the same meaning as a single input sentence can be obtained automatically, so that the same sentence takes on different personalized synonymous expressions under different settings. In a game scene, the planner therefore only needs to write one line of dialogue for each scene NPC to obtain a plurality of personalized synonymous expressions for that NPC, which increases the realism of the game world and reduces the planner's workload.
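As a rough illustration of how the four steps above fit together, the following Python sketch chains them as injectable callables. The names `PersonalizedParams`, `rewrite_sentence`, and the three stage parameters are hypothetical and do not come from the patent; the stages themselves are left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PersonalizedParams:
    """Hypothetical container for the personalized parameters."""
    speaker_personality: str
    listener_gender: str


def rewrite_sentence(
    sentence: str,
    params: PersonalizedParams,
    normalize: Callable[[str], str],
    paraphrase: Callable[[str], List[str]],
    personalize: Callable[[List[str], str, PersonalizedParams], List[str]],
) -> List[str]:
    # Normalization: map the input to its standard expression sentence.
    standard = normalize(sentence)
    # Synonymous rewriting: produce candidate synonyms of the standard expression.
    candidates = paraphrase(standard)
    # Personalization: filter and restyle candidates against the original sentence.
    return personalize(candidates, sentence, params)
```

Keeping each stage behind a plain callable lets the normalization rules, the generation model, and the word stocks be swapped independently.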
In an alternative embodiment, the step of normalizing the sentence to be processed to obtain a standard expression sentence corresponding to the sentence to be processed includes: performing semantic analysis on the sentence to be processed to obtain a semantic analysis result; determining the title words and mood words in the sentence to be processed based on the semantic analysis result; and normalizing the title words in the sentence to be processed into standard title words and deleting the mood words in the sentence to be processed, to obtain the standard expression sentence corresponding to the sentence to be processed.
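The normalization step can be sketched as follows. This is a minimal sketch that substitutes a plain lexicon lookup for the semantic analysis described above; the word lists, the standard title word "你", and the function name `normalize` are all assumptions made for illustration.

```python
# Hypothetical lexicons; the patent identifies these words through semantic analysis.
TITLE_WORDS = ["大侠", "少侠", "姑娘"]   # e.g. "hero", "young hero", "miss"
MOOD_WORDS = ["啊", "呀", "呢", "吧"]     # sentence-final particles
STANDARD_TITLE = "你"                    # assumed standard title word ("you")


def normalize(sentence: str) -> str:
    # Replace longer title words first so a short entry never splits a longer one.
    for title in sorted(TITLE_WORDS, key=len, reverse=True):
        sentence = sentence.replace(title, STANDARD_TITLE)
    # Naive deletion: this would also strip these characters inside ordinary words.
    for mood in MOOD_WORDS:
        sentence = sentence.replace(mood, "")
    return sentence


print(normalize("大侠，这把剑不错啊！"))
```

Character-level deletion is the weak point of this shortcut, which is presumably why the described method relies on semantic analysis rather than string matching.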
In an alternative embodiment, the step of performing synonymous rewriting on the standard expression sentence to obtain a plurality of candidate synonyms includes: splitting the standard expression sentence into at least one clause according to the punctuation marks in the standard expression sentence; for each clause, judging whether the length of the current clause is greater than a preset length threshold; if the length of the current clause is greater than the preset length threshold, inputting the current clause into a pre-trained synonym generation model, and outputting a plurality of candidate synonyms of the current clause through the synonym generation model; and if the length of the current clause is not greater than the preset length threshold, determining a plurality of candidate synonyms similar to the current clause from the preset corpus.
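The clause splitting and length-based routing above might look like the following sketch. The delimiter set, the function names, and the `generate` and `retrieve` callables (standing in for the synonym generation model and the corpus lookup) are assumptions; how per-clause candidates are recombined into whole sentences is not specified in this passage and is left out.

```python
import re
from typing import Callable, List

# Assumed delimiter set: common full-width and ASCII punctuation.
CLAUSE_DELIMITERS = r"[，。！？；,.!?;]"


def split_clauses(sentence: str) -> List[str]:
    """Split on punctuation and drop empty fragments."""
    return [c.strip() for c in re.split(CLAUSE_DELIMITERS, sentence) if c.strip()]


def rewrite_clauses(
    sentence: str,
    length_threshold: int,
    generate: Callable[[str], List[str]],
    retrieve: Callable[[str], List[str]],
) -> List[List[str]]:
    results = []
    for clause in split_clauses(sentence):
        if len(clause) > length_threshold:
            # Long clause: hand it to the (stubbed) generation model.
            results.append(generate(clause))
        else:
            # Short clause: look up similar clauses in the (stubbed) corpus.
            results.append(retrieve(clause))
    return results
```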
In an alternative embodiment, the synonym generation model is obtained by training on a preset synonym data set; the synonym data set comprises a plurality of synonymous sentence pairs; each synonymous sentence pair is determined by: obtaining a dialogue sentence with a specified style; and obtaining the synonymous sentence pair corresponding to the dialogue sentence through double back-translation using a plurality of translation engines, wherein the synonymous sentence pair comprises the dialogue sentence and the translation in a specified language corresponding to the dialogue sentence.
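One possible reading of the back-translation step is sketched below: each dialogue sentence is translated into a pivot language by one engine and back by another, and the round-trip result is paired with the original. The `forward` and `backward` callables are placeholders for real translation engines, and dropping pairs whose round trip returns the input unchanged is an added assumption, not something the text states.

```python
from typing import Callable, Iterable, List, Tuple

Translator = Callable[[str], str]


def build_synonym_pairs(
    sentences: Iterable[str], forward: Translator, backward: Translator
) -> List[Tuple[str, str]]:
    pairs = []
    for sentence in sentences:
        pivot = forward(sentence)        # source language -> pivot language
        round_trip = backward(pivot)     # pivot language -> source language
        if round_trip != sentence:       # assumed filter: skip trivial pairs
            pairs.append((sentence, round_trip))
    return pairs
```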
In an alternative embodiment, the preset corpus is determined by: acquiring a plurality of dialogue clauses with a specified style; encoding each dialogue clause using a preset vector model to obtain an encoding vector corresponding to each dialogue clause; classifying and dividing the encoding vectors corresponding to the dialogue clauses to obtain a plurality of division results, wherein each division result comprises at least one encoding vector; for each division result, performing principal component analysis on the encoding vectors in the current division result, and performing dimension reduction on the encoding vectors in the current division result according to the principal component analysis result to obtain a dimension reduction processing result, wherein the dimension reduction processing result comprises the dimension-reduced encoding vectors in the current division result and the dimension reduction parameters used for the dimension reduction processing; and generating the preset corpus based on the dimension reduction processing results corresponding to the division results, wherein the preset corpus comprises the dimension reduction processing results corresponding to the plurality of division results.
In an alternative embodiment, the step of determining a plurality of candidate synonyms similar to the current clause from the preset corpus includes: randomly sampling the preset corpus to obtain the dimension reduction processing result of a target division result; encoding the current clause, based on the dimension reduction parameters corresponding to the dimension reduction processing result of the target division result, to obtain an encoding vector corresponding to the current clause; performing principal component analysis on the encoding vector corresponding to the current clause, and performing dimension reduction on that encoding vector according to the principal component analysis result to obtain a dimension-reduced vector; and calculating the similarity between the dimension-reduced vector and each dimension-reduced encoding vector in the dimension reduction processing result of the target division result, and determining the dialogue clauses corresponding to the encoding vectors whose similarity meets a preset condition as the candidate synonyms corresponding to the current clause.
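The corpus construction and lookup described in the two paragraphs above can be illustrated for a single division result with the pure-Python sketch below. Several points are assumptions: encoding vectors are supplied precomputed (the vector model is not shown), principal components are found by power iteration with deflation, the stored mean and components play the role of the "dimension reduction parameters" and are applied to the query vector, and Euclidean distance in the reduced space stands in for the unspecified similarity measure. All function names are hypothetical.

```python
import math


def fit_pca(vectors, k, iters=200):
    """Return (mean, components) for one division result."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(col) / n for col in zip(*vectors)]
    centered = [[x - m for x, m in zip(v, mean)] for v in vectors]
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(k):
        v = [float(i + 1) for i in range(d)]  # fixed, non-symmetric start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left in this direction
                break
            v = [x / norm for x in w]
        eig = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflate so the next pass converges to the next component.
        cov = [[cov[i][j] - eig * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return mean, components


def project(vector, mean, components):
    centered = [x - m for x, m in zip(vector, mean)]
    return [sum(c * x for c, x in zip(comp, centered)) for comp in components]


def build_partition(clauses, vectors, k):
    """Store reduced vectors together with the parameters that produced them."""
    mean, components = fit_pca(vectors, k)
    reduced = [project(v, mean, components) for v in vectors]
    return {"clauses": clauses, "reduced": reduced,
            "mean": mean, "components": components}


def find_candidates(query_vector, partition, top_n=1):
    """Reduce the query with the stored parameters and return the closest clauses."""
    q = project(query_vector, partition["mean"], partition["components"])
    ranked = sorted(zip(partition["clauses"], partition["reduced"]),
                    key=lambda item: math.dist(q, item[1]))
    return [clause for clause, _ in ranked[:top_n]]
```

In practice a numerical library would replace the hand-written power iteration; it is spelled out here only to keep the sketch dependency-free.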
In an alternative embodiment, the personalized parameters include the personalities and genders of both parties to the dialogue; the stylized word stock comprises a title word stock and a mood word stock; the step of performing personalized processing on the candidate synonyms based on the personalized parameters and the preset stylized word stock to obtain a plurality of final synonyms corresponding to the sentence to be processed comprises the following steps: calculating the semantic similarity between each candidate synonym and the sentence to be processed, and determining each candidate synonym whose semantic similarity is greater than a preset similarity threshold as a standby synonym; and for each standby synonym, determining a target title word from the title word stock according to the personalities and genders of both parties to the dialogue, determining a target mood word from the mood word stock, replacing the standard title word in the standby synonym with the target title word, and adding the target mood word to the standby synonym after the title word has been replaced, to obtain the final synonym.
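The personalization step could be sketched as below. The surface-level `difflib` ratio is only a stand-in for the semantic similarity the text calls for, the word stocks are toy data, and keying the title stock by speaker personality and listener gender is a simplification of "the personalities and genders of both parties". The placement of the mood word before the final punctuation mark is likewise an assumption.

```python
import random
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple

STANDARD_TITLE = "你"  # assumed standard title word
# Hypothetical stocks keyed by (speaker personality, listener gender).
TITLE_STOCK: Dict[Tuple[str, str], List[str]] = {
    ("bold", "male"): ["兄弟"],      # "brother"
    ("gentle", "female"): ["姑娘"],  # "miss"
}
MOOD_STOCK: Dict[str, List[str]] = {"bold": ["哈"], "gentle": ["呢"]}
END_PUNCTUATION = "。！？"


def similarity(a: str, b: str) -> float:
    # Character-overlap ratio; a real system would compare sentence embeddings.
    return SequenceMatcher(None, a, b).ratio()


def personalize(
    candidates: List[str],
    source: str,
    speaker_personality: str,
    listener_gender: str,
    threshold: float = 0.6,
    rng: Optional[random.Random] = None,
) -> List[str]:
    rng = rng or random.Random(0)
    finals = []
    for candidate in candidates:
        # Keep only candidates that stay close enough to the source sentence.
        if similarity(candidate, source) <= threshold:
            continue
        title = rng.choice(TITLE_STOCK[(speaker_personality, listener_gender)])
        mood = rng.choice(MOOD_STOCK[speaker_personality])
        styled = candidate.replace(STANDARD_TITLE, title)
        # Insert the mood word before the closing punctuation mark, if any.
        if styled and styled[-1] in END_PUNCTUATION:
            styled = styled[:-1] + mood + styled[-1]
        else:
            styled += mood
        finals.append(styled)
    return finals
```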
In an alternative embodiment, the title word stock and the mood word stock are determined by: acquiring a dialogue database with a specified style, the dialogue database comprising a plurality of pieces of dialogue data; performing Chinese part-of-speech tagging on the dialogue database, extracting the target words tagged as interjections, and constructing the mood word stock based on the target words; and performing lexical analysis on the dialogue database to obtain, from the beginning of each piece of dialogue data, the words whose length is within a specified number of characters, and constructing a plurality of dialogue personas and a title word stock corresponding to each dialogue persona based on those words.
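A minimal sketch of the word stock construction follows, assuming the dialogue data has already been segmented and part-of-speech tagged and that each line carries a persona label. The tag name "e" for interjections, the persona labels, and the function name `build_word_stocks` are illustrative assumptions; the text does not say how personas are derived from the leading words.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

INTERJECTION_TAG = "e"  # assumed tag name for interjections

TaggedLine = Tuple[str, List[Tuple[str, str]]]  # (persona, [(word, tag), ...])


def build_word_stocks(
    dialogues: List[TaggedLine], max_title_len: int = 2
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    mood_stock: Set[str] = set()
    title_stocks: Dict[str, Set[str]] = defaultdict(set)
    for persona, tokens in dialogues:
        # Every word tagged as an interjection goes into the mood word stock.
        for word, tag in tokens:
            if tag == INTERJECTION_TAG:
                mood_stock.add(word)
        # The leading word, if short enough, is treated as a title word.
        if tokens:
            first_word, first_tag = tokens[0]
            if first_tag != INTERJECTION_TAG and len(first_word) <= max_title_len:
                title_stocks[persona].add(first_word)
    return mood_stock, dict(title_stocks)
```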
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a terminal device, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
In the description of the present invention, it should be noted that the directions or positional relationships indicated by the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above examples are only specific embodiments of the present invention and are not intended to limit its scope; the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing examples, those skilled in the art should understand that anyone familiar with the technical field may, within the technical scope disclosed by the present invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of the technical features; such modifications, changes, or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (11)

1. A method for synonymous rewriting of sentences, the method comprising:
acquiring sentences to be processed and personalized parameters; the personalized parameters are used for carrying out personalized processing on the sentences;
normalizing the statement to be processed to obtain a standard expression statement corresponding to the statement to be processed;
carrying out synonym rewriting on the standard expression statement to obtain a plurality of candidate synonyms;
and carrying out personalized processing on the candidate synonyms based on the personalized parameters and a preset stylized word stock to obtain a plurality of final synonyms corresponding to the statement to be processed.
2. The method of claim 1, wherein the step of normalizing the to-be-processed sentence to obtain a standard expression sentence corresponding to the to-be-processed sentence comprises:
carrying out semantic analysis on the statement to be processed to obtain a semantic analysis result;
determining a title word and a mood word in the sentence to be processed based on the semantic analysis result;
normalizing the title words in the sentence to be processed into standard title words, and deleting the mood words in the sentence to be processed, to obtain the standard expression sentence corresponding to the sentence to be processed.
3. The method of claim 1, wherein the step of performing synonym rewrite on the standard expression sentence to obtain a plurality of candidate synonyms comprises:
splitting the standard expression sentence into at least one clause according to the punctuation marks in the standard expression sentence;
for each clause, judging whether the length of the current clause is greater than a preset length threshold; if the length of the current clause is greater than the preset length threshold, inputting the current clause into a pre-trained synonym generation model, and outputting a plurality of candidate synonyms of the current clause through the synonym generation model; and if the length of the current clause is not greater than the preset length threshold, determining a plurality of candidate synonyms similar to the current clause from a preset corpus.
4. The method of claim 3, wherein the synonym generation model is obtained by training on a preset synonym data set; wherein the synonym data set comprises a plurality of synonymous sentence pairs; each synonymous sentence pair is determined by:
obtaining a dialogue sentence with a specified style;
obtaining a synonymous sentence pair corresponding to the dialogue sentence through double back-translation using a plurality of translation engines; the synonymous sentence pair comprises the dialogue sentence and the translation in the specified language corresponding to the dialogue sentence.
5. The method of claim 3, wherein the preset corpus is determined by:
acquiring a plurality of dialogue clauses with appointed styles;
using a preset vector model to encode each dialogue clause to obtain a corresponding encoding vector of each dialogue clause;
classifying and dividing the coding vectors corresponding to the dialogue clauses to obtain a plurality of division results; wherein each division result comprises at least one coding vector;
for each division result, performing principal component analysis on the encoding vectors in the current division result, and performing dimension reduction processing on the encoding vectors in the current division result according to the principal component analysis result to obtain a dimension reduction processing result; the dimension reduction processing result comprises: the dimension-reduced encoding vectors in the current division result and the dimension reduction parameters used for the dimension reduction processing;
generating the preset corpus based on the dimension reduction processing result corresponding to each division result; the preset corpus comprises a plurality of dimension reduction processing results corresponding to the division results.
6. The method of claim 5, wherein the step of determining a plurality of candidate synonyms from a preset corpus that are similar to the current clause comprises:
randomly sampling the preset corpus to obtain a dimension reduction processing result of a target division result;
coding the current clause based on the dimension reduction parameters corresponding to the dimension reduction processing result of the target division result to obtain a coding vector corresponding to the current clause;
performing principal component analysis on the code vector corresponding to the current clause, and performing dimension reduction on the code vector corresponding to the current clause according to a principal component analysis result to obtain a dimension reduction vector;
and calculating the similarity of the dimension reduction vector and the dimension reduction processed coding vector in the dimension reduction processing result of the target division result, and determining dialogue clauses corresponding to the coding vector with the similarity meeting the preset condition as candidate synonyms corresponding to the current clause.
7. The method of claim 1, wherein the personalized parameters include personality and gender of the two parties to the conversation; the stylized word stock comprises a title word stock and a mood word stock;
the step of performing personalized processing on the candidate synonyms based on the personalized parameters and a preset stylized word stock to obtain a plurality of final synonyms corresponding to the sentence to be processed comprises the following steps:
calculating the semantic similarity between each candidate synonym and the sentence to be processed, and determining each candidate synonym whose semantic similarity is greater than a preset similarity threshold as a standby synonym;
and for each standby synonym, determining a target title word from the title word stock according to the personalities and genders of the two parties to the dialogue, determining a target mood word from the mood word stock, replacing the standard title word in the standby synonym with the target title word, and adding the target mood word to the standby synonym after the title word has been replaced, to obtain a final synonym.
8. The method of claim 7, wherein the title word stock and the mood word stock are determined by:
acquiring a dialogue database with a specified style; wherein, the dialogue database comprises a plurality of dialogue data;
performing Chinese part-of-speech tagging on the dialogue database, extracting the target words tagged as interjections in the dialogue database, and constructing the mood word stock based on the target words;
and performing lexical analysis on the dialogue database to obtain, from the beginning of each piece of dialogue data, the words whose length is within a specified number of characters, and constructing a plurality of dialogue personas and a title word stock corresponding to each dialogue persona based on those words.
9. A sentence synonymous rewriting device, the device comprising:
the sentence acquisition module is used for acquiring sentences to be processed and personalized parameters; the personalized parameters are used for carrying out personalized processing on the sentences;
the normalization processing module is used for carrying out normalization processing on the statement to be processed to obtain a standard expression statement corresponding to the statement to be processed;
the synonym rewriting module is used for rewriting synonyms of the standard expression sentences to obtain a plurality of candidate synonyms;
and the individuation processing module is used for individuating the candidate synonyms based on the individuation parameters and a preset stylized word stock to obtain a plurality of final synonyms corresponding to the statement to be processed.
10. An electronic device comprising a processor and a memory, the memory storing machine-executable instructions executable by the processor, the processor executing the machine-executable instructions to implement the method of synonymous rewriting of sentences of any one of claims 1 to 8.
11. A computer-readable storage medium storing computer-executable instructions that, when invoked and executed by a processor, cause the processor to implement the method of synonymous rewriting of sentences of any one of claims 1 to 8.
CN202310269238.3A 2023-03-15 2023-03-15 Sentence synonymous rewriting method and device and electronic equipment Pending CN116595970A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310269238.3A CN116595970A (en) 2023-03-15 2023-03-15 Sentence synonymous rewriting method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310269238.3A CN116595970A (en) 2023-03-15 2023-03-15 Sentence synonymous rewriting method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN116595970A true CN116595970A (en) 2023-08-15

Family

ID=87605073

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310269238.3A Pending CN116595970A (en) 2023-03-15 2023-03-15 Sentence synonymous rewriting method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN116595970A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117951246A (en) * 2024-03-26 2024-04-30 中国电子科技集团公司第三十研究所 New word discovery and application field prediction method and system for network technology
CN117951246B (en) * 2024-03-26 2024-05-28 中国电子科技集团公司第三十研究所 New word discovery and application field prediction method and system for network technology

Similar Documents

Publication Publication Date Title
CN113792818B (en) Intention classification method and device, electronic equipment and computer readable storage medium
CN109408642B (en) Domain entity attribute relation extraction method based on distance supervision
CN107798140B (en) Dialog system construction method, semantic controlled response method and device
WO2018097091A1 (en) Model creation device, text search device, model creation method, text search method, data structure, and program
US20080221863A1 (en) Search-based word segmentation method and device for language without word boundary tag
CN110717045A (en) Letter element automatic extraction method based on letter overview
CN109614620B (en) HowNet-based graph model word sense disambiguation method and system
CN103314369B (en) Machine translation apparatus and method
KR102043353B1 (en) Apparatus and method for recognizing Korean named entity using deep-learning
CN116628186B (en) Text abstract generation method and system
CN112185361B (en) Voice recognition model training method and device, electronic equipment and storage medium
CN111444704A (en) Network security keyword extraction method based on deep neural network
CN113705237A (en) Relation extraction method and device fusing relation phrase knowledge and electronic equipment
CN114757184B (en) Method and system for realizing knowledge question and answer in aviation field
CN116343747A (en) Speech synthesis method, speech synthesis device, electronic device, and storage medium
CN115497477A (en) Voice interaction method, voice interaction device, electronic equipment and storage medium
CN115759119A (en) Financial text emotion analysis method, system, medium and equipment
CN116595970A (en) Sentence synonymous rewriting method and device and electronic equipment
CN114611529B (en) Intention recognition method and device, electronic equipment and storage medium
CN114722774B (en) Data compression method, device, electronic equipment and storage medium
CN109960782A (en) A kind of Tibetan language segmenting method and device based on deep neural network
CN116483314A (en) Automatic intelligent activity diagram generation method
CN113486666A (en) Medical named entity recognition method and system
Ducoffe et al. Machine Learning under the light of Phraseology expertise: use case of presidential speeches, De Gaulle-Hollande (1958-2016)
CN113011141A (en) Buddha note model training method, Buddha note generation method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination