CN116842961A - Text generation method and device, nonvolatile storage medium and electronic equipment - Google Patents

Text generation method and device, nonvolatile storage medium and electronic equipment

Info

Publication number
CN116842961A
Authority
CN
China
Prior art keywords
initial
corpus
feature
probability distribution
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310755909.7A
Other languages
Chinese (zh)
Inventor
齐宝森
张鹏
王超远
吕梦阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202310755909.7A priority Critical patent/CN116842961A/en
Publication of CN116842961A publication Critical patent/CN116842961A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0499 Feedforward networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a text generation method and device, a nonvolatile storage medium and electronic equipment, and relates to the field of artificial intelligence. The method comprises the following steps: acquiring N initial corpus, wherein context association relations exist among the N initial corpus, and N is a positive integer greater than or equal to 1; determining semantic features for representing the context association based on the N initial corpus; determining a first probability distribution for expressing the N initial corpus selection probabilities based on the semantic features; and selecting the N initial corpus by adopting the first probability distribution to obtain a target corpus, and generating a target text based on the target corpus. The application solves the problem of non-ideal text generation accuracy in the related art.

Description

Text generation method and device, nonvolatile storage medium and electronic equipment
Technical Field
The application relates to the field of artificial intelligence, in particular to a text generation method and device, a nonvolatile storage medium and electronic equipment.
Background
At present, the generation quality of statistical language models is greatly influenced by the training samples: the richer the training samples, the better the corresponding training effect. The probability of a word, a sentence or even a document needs to be calculated so that, from a probabilistic perspective, a computer can predict the next word or sentence and the validity of its semantics. The related art uses neural network models for this processing, but such models have high complexity, low algorithmic efficiency and poor interpretability of the generated text, so the efficiency of generating text is insufficient. Moreover, the related art only considers the corpus currently being processed when generating text and lacks an estimate of context continuity, so the quality and accuracy of the generated text are not ideal.
Aiming at the problem of non-ideal text generation accuracy in the related art, no effective solution has been proposed at present.
Disclosure of Invention
The application mainly aims to provide a text generation method, a device, a nonvolatile storage medium and electronic equipment, so as to solve the problem of non-ideal accuracy of text generation in the related technology.
In order to achieve the above object, according to one aspect of the present application, there is provided a text generation method. The method comprises the following steps: acquiring N initial corpus, wherein context association relations exist among the N initial corpus; determining semantic features for representing the context association based on the N initial corpus; determining a first probability distribution for expressing the N initial corpus selection probabilities based on the semantic features; and selecting the N initial corpus by adopting the first probability distribution to obtain a target corpus, and generating a target text based on the target corpus.
In order to achieve the above object, according to another aspect of the present application, there is provided a text generating apparatus. The device comprises: the first acquisition module is used for acquiring N initial corpus, wherein context association relations exist among the N initial corpus; the semantic determining module is used for determining semantic features for representing the context association relation based on the N initial corpus; the probability determining module is used for determining a first probability distribution for expressing the N initial corpus selection probabilities based on the semantic features; the first generation module is used for selecting the N initial corpus by adopting the first probability distribution to obtain a target corpus, and generating a target text based on the target corpus.
To achieve the above object, according to another aspect of the present application, there is provided a non-volatile storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the text generation method of any one of the above.
In order to achieve the above object, according to another aspect of the present application, there is provided an electronic device including one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any one of the text generation methods.
According to the application, the following steps are adopted: acquiring N initial corpus, wherein context association relations exist among the N initial corpus; determining semantic features for representing the context association based on the N initial corpus; determining a first probability distribution for expressing the N initial corpus selection probabilities based on the semantic features; and selecting the N initial corpus by adopting the first probability distribution to obtain a target corpus, and generating a target text based on the target corpus. This achieves the purpose of enabling the generated text to have context continuity, and solves the problem of non-ideal accuracy of the generated text in the related art, thereby achieving the effect of improving the accuracy of the generated text.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application. In the drawings:
FIG. 1 is a flow chart of a text generation method provided according to an embodiment of the present application;
FIG. 2 is an algorithm block diagram of a text generation method provided according to an embodiment of the present application; and
FIG. 3 is a schematic diagram of a text generating apparatus provided according to an embodiment of the present application.
Detailed Description
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
In order that those skilled in the art will better understand the present application, a technical solution in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate in order to describe the embodiments of the application herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For convenience of description, the following will describe some terms or terminology involved in the embodiments of the present application:
the BERT model is a pre-training model that performs bidirectional encoding at the encoder end based on the Transformer deep learning network. BERT is essentially a self-supervised learning method that runs over a massive corpus so that words can learn better feature representations. Unlike supervised learning, self-supervised learning also has labels that guide training, but the labels are not manually annotated; instead, they are learned from the relationship between the input and output data. BERT may serve as a feature extractor that is fine-tuned or kept fixed for different NLP (Natural Language Processing) tasks.
The BI-LSTM model, namely the bidirectional long short-term memory network, processes the input sequence in both the forward and the backward direction on the basis of the long short-term memory network (LSTM). Compared with a forward-only LSTM, a BI-LSTM processing a context-dependent sequence can also take the backward correlation of the input data into account and can better capture the bidirectional semantic dependence of the context, thereby modeling the contextual information of the text.
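For illustration only, the bidirectional encoding described above can be sketched in a few lines of PyTorch (an assumed framework, not named in this application); the layer sizes and tensor shapes are arbitrary example values.

```python
import torch
import torch.nn as nn

# Minimal BI-LSTM sketch: the sequence is processed in both the forward and the
# backward direction, and the two hidden states are concatenated per time step.
bilstm = nn.LSTM(input_size=768, hidden_size=256,
                 num_layers=1, batch_first=True, bidirectional=True)

x = torch.randn(2, 10, 768)        # (batch, sequence length, feature dimension), dummy input
outputs, (h_n, c_n) = bilstm(x)    # outputs: (2, 10, 512), i.e. 2 * hidden_size per step
```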
It should be noted that, the relevant information related to the disclosure (including, but not limited to, the account, and the information related to the account) and the data (including, but not limited to, N initial corpus used to generate the initial features) are information and data authorized by the user or sufficiently authorized by each party. For example, an interface is provided between the system and the related user or organization, before acquiring the related account information and corpus data, an acquisition request needs to be sent to the user or organization through the interface, and after receiving the consent information fed back by the user or organization, the related account information and corpus data are acquired.
Based on the above-mentioned problems, an embodiment of the present application provides a text generation method, and fig. 1 is a flowchart of the text generation method provided according to an embodiment of the present application, as shown in fig. 1, the method includes the following steps:
Step S102, N initial corpus are obtained, wherein a context association relationship exists among the N initial corpus;
it can be understood that, in order to generate the target text, N initial corpus need to be acquired, and context association relations exist among the N initial corpus. Compared with the related art, the method has a stronger capability of capturing features of the context association relation, which helps the generated target text reach a higher text quality.
It should be noted that the N initial corpus may be words, terms, sentences, and the like.
In an optional embodiment, the acquiring N initial corpora includes: performing data crawling on a target server to obtain initial data corresponding to a plurality of accounts respectively, wherein the initial data comprises corpus content and account information respectively associated with the plurality of accounts; according to a preset first type, carrying out emotion classification based on initial data corresponding to the plurality of accounts respectively to obtain a plurality of accounts carrying emotion tags; according to a preset second type, carrying out preference classification based on the plurality of accounts carrying the emotion tags to obtain a plurality of accounts carrying the emotion tags and the preference tags; and generating the N initial corpus based on the initial data corresponding to the plurality of accounts respectively, the emotion tags and the preference tags.
It can be understood that information collection is performed on a plurality of accounts using the target server (the information collection is fully authorized), so as to obtain initial data corresponding to the plurality of accounts respectively. In order to give the generated N initial corpus personalized style characteristics, emotion classification is performed according to the first type to obtain a plurality of accounts carrying emotion tags. The emotion tags may be, for example, positive, neutral and negative. Then, preference classification is performed on each account carrying an emotion tag according to the second type, to obtain a plurality of accounts carrying both emotion tags and preference tags, where each account corresponds to a respective emotion tag and preference tag. The preference tags may be, for example: living, leisure, entertainment, performance, responsibility, and unknown-characteristic users. Based on the initial data, emotion tags and preference tags respectively corresponding to the accounts, the N initial corpus are generated. It should be noted that the emotion tags and preference tags described above may be used to obtain account style characteristics.
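A minimal sketch of how the crawled account data might be organized with emotion and preference tags before the initial corpus is formed is given below; the field names and label values are illustrative assumptions rather than part of the claimed method.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class AccountData:
    account_id: str
    corpus_content: List[str]              # texts associated with the account
    emotion_tag: Optional[str] = None      # e.g. "positive", "neutral", "negative"
    preference_tag: Optional[str] = None   # e.g. "living", "leisure", "entertainment"

def build_initial_corpus(accounts: List[AccountData]) -> List[dict]:
    """Attach the style tags to every corpus item so later steps can use them."""
    items = []
    for acc in accounts:
        for text in acc.corpus_content:
            items.append({"text": text,
                          "emotion": acc.emotion_tag,
                          "preference": acc.preference_tag})
    return items
```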
Optionally, before performing emotion classification, the initial data corresponding to the plurality of accounts respectively further needs to be cleaned, so as to reduce data noise.
Optionally, after the plurality of accounts carrying the emotion tags and the preference tags are obtained, the method further comprises: performing data completion on the initial data corresponding to the plurality of accounts respectively. The initial data with missing fields is completed according to a preset statistical measure, where the statistical measure may be the mode, the median, the mean, and the like.
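The field completion step can be sketched with Python's standard statistics module; treating the statistical measure as a configurable parameter is an assumption made for this example.

```python
import statistics
from typing import List, Optional

def complete_field(values: List[Optional[float]], measure: str = "median") -> List[float]:
    """Fill missing (None) entries of one field with a chosen statistic of the observed values."""
    observed = [v for v in values if v is not None]
    if measure == "mode":
        fill = statistics.mode(observed)
    elif measure == "mean":
        fill = statistics.mean(observed)
    else:
        fill = statistics.median(observed)
    return [fill if v is None else v for v in values]

# complete_field([3.0, None, 5.0, None], measure="mean") -> [3.0, 4.0, 5.0, 4.0]
```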
Step S104, determining semantic features for representing the context association relation based on the N initial corpus;
it can be appreciated that, based on the N initial corpus, semantic features can be determined, that is, features which enable the generated target text to have the context association relationship.
In an optional embodiment, the determining, based on the N initial corpora, semantic features for characterizing the context association includes: processing the N initial corpora with a preset first encoder to generate initial features; and inputting the initial features into a preset second encoder for processing to obtain semantic features corresponding to the initial features.
It can be understood that the encoding of the N initial corpus is implemented by the preset first encoder, generating the initial features to be input into the second encoder. The second encoder performs further processing based on the obtained initial features and determines the semantic features that can characterize the context association relationship. Through this processing, the first encoder can be configured so that the generated initial features are accurate, and semantic features with high accuracy can then be obtained when the second encoder processes them.
Alternatively, the first encoder may be set as a BERT model.
Alternatively, the second encoder may be configured as a BI-LSTM model.
In an optional embodiment, the processing, based on the N initial corpora, with a preset first encoder to generate initial features includes: classifying the N initial corpus to generate a first sequence and a second sequence, wherein the first sequence comprises corpus contents respectively corresponding to the N initial corpus, and the second sequence comprises account style features existing in the N initial corpus; and determining an initial characteristic corresponding to the first sequence as a first initial characteristic and an initial characteristic corresponding to the second sequence as a second initial characteristic by adopting the first encoder.
It can be appreciated that, in order to enable the generated target text to have account style characteristics, N initial corpus can be classified to obtain a first sequence including corpus contents corresponding to the N initial corpus respectively, and a second sequence including account style characteristics existing in the N initial corpus. A first encoder is employed to generate a corresponding first initial feature for a first sequence of inputs and a corresponding second initial feature for a second sequence of inputs.
It should be noted that the first initial feature and the second initial feature are obtained by classifying the N initial corpus, and together they correspond to the initial features generated directly based on the N initial corpus.
Alternatively, the first sequence may be expressed as $A = \{a_1, a_2, \ldots, a_n\}$ and the second sequence as $B = \{b_1, b_2, \ldots, b_m\}$, where $a_i$ is the i-th element in the first sequence, $n$ is the total number of elements included in the first sequence, $i = 1, \ldots, n$, $b_j$ is the j-th element in the second sequence, $m$ is the total number of elements included in the second sequence, and $j = 1, \ldots, m$. In the case that the first encoder is a BERT model, the BERT model performs feature embedding on the first sequence and the second sequence, and applies the self-attention mechanism and normalization operations: for the input $a_i$ a mapping $\tilde{a}_i$ is obtained, where $\tilde{a}_i$ is the i-th mapped element corresponding to the first sequence, and for the input $b_j$ a mapping $\tilde{b}_j$ is obtained, where $\tilde{b}_j$ is the j-th mapped element corresponding to the second sequence. Based on $\tilde{a}_i$ ($i = 1, \ldots, n$), the first initial feature is generated as $X = \{X_1, X_2, \ldots, X_n\}$, where $X_i$ is the i-th element in the first initial feature; based on $\tilde{b}_j$ ($j = 1, \ldots, m$), the second initial feature is generated as $G = \{G_1, G_2, \ldots, G_m\}$, where $G_j$ is the j-th element in the second initial feature.
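A hedged sketch of this encoding step using the Hugging Face transformers library (an assumption; the application does not name a specific BERT implementation or checkpoint): the two sequences are encoded separately and the per-token hidden states are taken as the initial features X and G.

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")   # assumed checkpoint
bert = BertModel.from_pretrained("bert-base-chinese")

def encode_sequence(elements):
    """Map a sequence of corpus elements to BERT features (one vector per token)."""
    enc = tokenizer(elements, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        out = bert(**enc)
    return out.last_hidden_state          # shape: (len(elements), seq_len, 768)

X = encode_sequence(["今天天气不错", "适合出去走走"])   # first initial feature (corpus content)
G = encode_sequence(["positive", "leisure"])            # second initial feature (account style)
```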
In an optional embodiment, the inputting the initial feature into a preset second encoder to process to obtain a semantic feature corresponding to the initial feature includes: determining a first intermediate feature, wherein the first intermediate feature is obtained by hidden layer processing included in the last processing before the second encoder generates the semantic feature; determining a second intermediate feature obtained by processing the initial feature by a hidden layer included in the second encoder; acquiring a loss function value corresponding to the initial characteristic; the semantic feature is obtained based on the first intermediate feature, the loss function value, and the second intermediate feature.
It can be understood that the second encoder includes a hidden layer, and the input initial feature is processed to obtain the second intermediate feature, which is the processing result obtained by the hidden layer during the present processing. The first intermediate feature is the result obtained by the hidden layer of the second encoder during the last processing. A loss function value of the second encoder for the present processing is obtained, which can be characterized as a difference between the first intermediate feature and the second intermediate feature. Based on the first intermediate feature, the second intermediate feature, and the loss function value, the semantic feature characterizing the context association relationship of the initial feature is obtained. Through this processing, an iterative approach is adopted, and the intermediate feature obtained in the previous processing (i.e., the first intermediate feature) is used as one of the parameters for generating the semantic feature, so that the semantic feature can better capture the bidirectional semantics of the context.
In an optional embodiment, the acquiring a loss function value corresponding to the initial feature includes: acquiring a first weight value corresponding to the initial feature, a first decoding result and a second intermediate feature, wherein the first decoding result is obtained by a preset updating gate module included in the last processing before the first probability distribution is generated by a decoder, the decoder is used for generating a corresponding probability distribution based on semantic features output by the second encoder, and the second intermediate feature is obtained by a hidden layer included by the second encoder based on the initial feature; the loss function value is determined based on the first weight value, the first decoding result, and the second intermediate feature.
It will be appreciated that a loss function value is typically used to characterize the difference between an actual result and a calculated result; here, the loss function calculation is used to derive the deviation in alignment with the first decoding result from the previous processing. A channel-attention style of weighting is adopted, where a higher weight value means that more attention is paid to the corresponding feature. The initial feature is preset with a corresponding first weight value, and the loss function value corresponding to the initial feature can be determined according to the first weight value, the first decoding result, and the second intermediate feature corresponding to the initial feature.
Optionally, in the case that the initial feature is divided into a first initial feature corresponding to the first sequence and a second initial feature corresponding to the second sequence, and the second encoder is a BI-LSTM model, the first initial feature is $X = \{X_1, X_2, \ldots, X_n\}$, where $X_i$ is the i-th element in the first initial feature, and the second initial feature is $G = \{G_1, G_2, \ldots, G_m\}$, where $G_j$ is the j-th element in the second initial feature. For $X = \{X_1, X_2, \ldots, X_n\}$, the second encoder feeds $X$ through the hidden layer of the BI-LSTM model to obtain the second intermediate feature corresponding to the first initial feature. Denoting the first intermediate feature corresponding to the first initial feature as $h_{t-1}^{X_i}$, the second intermediate feature $h_t^{X_i}$ corresponding to the first initial feature can be obtained by:
$$h_t^{X_i} = \mathrm{LSTM}_{enc}\left(X_i,\ h_{t-1}^{X_i}\right)$$
where $t$ represents the present processing of generating the semantic feature corresponding to the first initial feature, $t-1$ represents the last processing before generating the semantic feature corresponding to the first initial feature, and $\mathrm{LSTM}_{enc}(\cdot)$ represents the processing of the first initial feature by the hidden layer of the BI-LSTM model.
For $G = \{G_1, G_2, \ldots, G_m\}$, the second encoder feeds $G$ through the hidden layer of the BI-LSTM model. Denoting the first intermediate feature corresponding to the second initial feature as $h_{t-1}^{G_j}$, the second intermediate feature $h_t^{G_j}$ corresponding to the second initial feature can be obtained by:
$$h_t^{G_j} = \mathrm{LSTM}_{enc}\left(G_j,\ h_{t-1}^{G_j}\right)$$
where $\mathrm{LSTM}_{enc}(\cdot)$ here represents the processing of the second initial feature by the hidden layer of the BI-LSTM model.
After obtaining the second intermediate feature $h_t^{X_i}$ (from the first initial feature, i.e. the initial feature generated based on the corpus content, processed by the hidden layer of the second encoder, the BI-LSTM model) and the second intermediate feature $h_t^{G_j}$ (from the second initial feature, i.e. the initial feature generated based on the account style features, processed by the hidden layer of the second encoder), the hidden layer states are summarized using an attention mechanism to obtain the semantic feature $C_t^{X}$ corresponding to the first initial feature, which can represent the context association relationship, and the semantic feature $C_t^{G}$ corresponding to the second initial feature.
Denoting the loss function value corresponding to the first initial feature as $a_{ti}$ and using the second intermediate feature $h_t^{X_i}$ corresponding to the first initial feature, the semantic feature corresponding to the first initial feature can be obtained by:
$$C_t^{X} = \sum_{i=1}^{n} a_{ti}\, h_t^{X_i}$$
The loss function value $a_{ti}$ corresponding to the first initial feature can be obtained in the following way:
$$a_{ti} = \mathrm{softmax}(e_{ti})$$
where $\mathrm{softmax}(\cdot)$ is the loss function, $e_{ti}$ is the alignment score of the first initial feature, which may be expressed as $e_{ti} = W_a\left[s_{t-1};\ h_t^{X_i}\right]$, $W_a$ is the first weight value corresponding to the first initial feature, and $s_{t-1}$ is the first decoding result obtained by the update gate module included in the last processing before the decoder generates the first probability distribution.
Denoting the loss function value corresponding to the second initial feature as $b_{tj}$ and using the second intermediate feature $h_t^{G_j}$ corresponding to the second initial feature, the semantic feature corresponding to the second initial feature can be obtained by:
$$C_t^{G} = \sum_{j=1}^{m} b_{tj}\, h_t^{G_j}$$
The loss function value $b_{tj}$ corresponding to the second initial feature can be obtained in the following way:
$$b_{tj} = \mathrm{softmax}(e_{tj})$$
where $e_{tj}$ is the alignment score of the second initial feature, which may be expressed as $e_{tj} = W_b\left[s_{t-1};\ h_t^{G_j}\right]$, and $W_b$ is the first weight value corresponding to the second initial feature.
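The attention summarization above, $a_{ti} = \mathrm{softmax}(e_{ti})$ and $C_t^{X} = \sum_i a_{ti} h_t^{X_i}$, can be sketched as follows; the module and parameter names are illustrative, and the concatenation-based alignment score is an assumed standard form, since only $W_a$, $s_{t-1}$ and the hidden states are named here.

```python
import torch
import torch.nn as nn

class AlignmentAttention(nn.Module):
    """Summarize encoder hidden states h_t^i into a semantic feature C_t,
    weighted by their alignment with the previous decoding result s_{t-1}."""
    def __init__(self, enc_dim: int, dec_dim: int):
        super().__init__()
        self.W_a = nn.Linear(enc_dim + dec_dim, 1)    # plays the role of the first weight value

    def forward(self, h: torch.Tensor, s_prev: torch.Tensor) -> torch.Tensor:
        # h: (n, enc_dim) hidden states; s_prev: (dec_dim,) previous decoding result
        s_rep = s_prev.unsqueeze(0).expand(h.size(0), -1)
        e = self.W_a(torch.cat([s_rep, h], dim=-1)).squeeze(-1)   # alignment scores e_{ti}
        a = torch.softmax(e, dim=0)                               # a_{ti}
        return (a.unsqueeze(-1) * h).sum(dim=0)                   # C_t = sum_i a_{ti} * h_{ti}

attn = AlignmentAttention(enc_dim=512, dec_dim=256)
C_t_X = attn(torch.randn(10, 512), torch.randn(256))              # semantic feature for X
```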
Step S106, determining a first probability distribution for expressing the N initial corpus selection probabilities based on the semantic features;
it will be appreciated that after the semantic features are obtained, a first probability distribution expressing the selection probabilities of the N initial corpus is determined. For a machine, the language model outputs corpus by selecting the corpus combination with a high probability as readable text. In order to improve the training effect on the model, before the target text used for training is generated, the first probability distribution used for selection is determined; since the first probability distribution is determined based on the semantic features, the subsequently generated target text has better context consistency, and a model trained with the target text has better text paraphrasing capability.
In an optional embodiment, the determining, based on the semantic features, a first probability distribution for expressing the N initial corpus selection probabilities includes: determining the first decoding result and a second probability distribution, wherein the second probability distribution is a probability distribution obtained by the decoder before the first probability distribution is generated; and determining the first probability distribution based on the semantic features, the second probability distribution, and the first decoding result.
It will be appreciated that the decoder obtained the second probability distribution in the last processing and obtains the first probability distribution in the present processing. Based on the semantic features, the second probability distribution, and the first decoding result obtained in the previous processing, the first probability distribution can be determined.
Alternatively, the decoder may be an LSTM model.
Alternatively, the semantic feature corresponding to the initial feature may be denoted as $C_t$. The semantic feature $C_t$ and the first decoding result obtained by the decoder in the last processing (denoted as $s_{t-1}$) are input into the update gate of the decoder LSTM model to obtain the second decoding result (denoted as $s_t$):
$$s_t = \mathrm{LSTM}_{dec}\left(s_{t-1},\ \left[C_t \cdot e(y_{t-1})\right]\right)$$
where $\mathrm{LSTM}_{dec}(\cdot)$ denotes the processing by the update gate of the decoder, $e(y_{t-1})$ is the parameter representing the second probability distribution $y_{t-1}$, and $y_{t-1}$ is the second probability distribution obtained in the last processing of the decoder.
In the case that the initial features are the first initial feature and the second initial feature respectively, the semantic feature $C_t^{X}$ corresponding to the first initial feature, the semantic feature $C_t^{G}$ corresponding to the second initial feature, and the first decoding result $s_{t-1}$ are input into the update gate of the decoder LSTM model to obtain the second decoding result $s_t$ in the same manner.
A second weight value $W_0$ can be set for the decoding result, and $y_t$, the first probability distribution generated by the decoder based on the $s_t$ obtained in the above processing, is obtained by:
$$y_t \sim \mathrm{softmax}(W_0\, s_t)$$
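One decoding step, $s_t = \mathrm{LSTM}_{dec}(s_{t-1}, [C_t \cdot e(y_{t-1})])$ followed by $y_t \sim \mathrm{softmax}(W_0 s_t)$, can be sketched as below; the use of torch.nn.LSTMCell and of an embedding of the previously selected index for $e(\cdot)$ are assumptions made for the example.

```python
import torch
import torch.nn as nn

class DecoderStep(nn.Module):
    def __init__(self, ctx_dim: int, dec_dim: int, vocab_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, ctx_dim)   # e(.) for the previous selection
        self.cell = nn.LSTMCell(ctx_dim, dec_dim)        # update-gate processing LSTM_dec
        self.W_0 = nn.Linear(dec_dim, vocab_size)        # second weight value

    def forward(self, C_t, y_prev, s_prev, c_prev):
        inp = C_t * self.embed(y_prev)                   # [C_t * e(y_{t-1})]
        s_t, c_t = self.cell(inp, (s_prev, c_prev))      # second decoding result s_t
        y_t = torch.softmax(self.W_0(s_t), dim=-1)       # first probability distribution
        return y_t, s_t, c_t

step = DecoderStep(ctx_dim=512, dec_dim=256, vocab_size=5000)
y_t, s_t, c_t = step(torch.randn(1, 512), torch.tensor([42]),
                     torch.zeros(1, 256), torch.zeros(1, 256))
```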
in an optional embodiment, the determining, based on the semantic features, a first probability distribution for expressing the N initial corpus selection probabilities includes: determining predetermined discrete features existing in the N initial corpus, the first decoding result, and a second probability distribution, wherein the second probability distribution is a probability distribution obtained by the decoder before the first probability distribution is generated; the first probability distribution is determined based on the predetermined discrete feature, the semantic feature, the second probability distribution, and the first decoding result.
It can be appreciated that predetermined discrete features exist in the N initial corpus; discretized data, such as account data, can yield corresponding discrete features. The decoder obtained the second probability distribution in the last processing and obtains the first probability distribution in the present processing. The predetermined discrete features are added to the process of generating the first probability distribution, so that the generated first probability distribution is affected by the predetermined discrete features, such as the account data and other discrete data, which gives the selection probabilities account-personalized characteristics. The first probability distribution may be determined based on the predetermined discrete features, the semantic features, the second probability distribution, and the first decoding result obtained in the previous processing.
Optionally, feature embedding is performed based on the N initial corpus to obtain a third sequence $y = \{y_1, y_2, \ldots, y_n\}$ for representing the discrete features, which is processed by a gate control unit preset in the decoder to obtain the predetermined discrete feature, denoted here as $d_t$.
Alternatively, the second decoding result $s_t$ of the decoder is then obtained by additionally feeding the predetermined discrete feature $d_t$, together with the inputs described above, into the update gate of the decoder. The other parameters are the same as defined above and are not described again; correspondingly, based on the above $s_t$, the first probability distribution $y_t$ can be obtained.
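Where the predetermined discrete feature is used, one possible reading of the gate control unit (an assumption; the application does not give its exact form) is a gated projection of the discrete feature, whose output is then fed into the decoder together with the other inputs.

```python
import torch
import torch.nn as nn

class DiscreteGate(nn.Module):
    """Gate the embedded discrete (e.g. account) feature before it enters the decoder."""
    def __init__(self, disc_dim: int, ctx_dim: int):
        super().__init__()
        self.proj = nn.Linear(disc_dim, ctx_dim)
        self.gate = nn.Linear(disc_dim, ctx_dim)

    def forward(self, d: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.gate(d)) * torch.tanh(self.proj(d))   # gated feature d_t

# The gated output can be concatenated with [C_t * e(y_{t-1})] and passed to an
# LSTMCell whose input size is enlarged accordingly.
gate = DiscreteGate(disc_dim=32, ctx_dim=512)
d_t = gate(torch.randn(1, 32))
```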
Step S108, selecting the N initial corpus by adopting the first probability distribution to obtain a target corpus, and generating a target text based on the target corpus.
It can be understood that the first probability distribution is a probability distribution generated to represent the context characteristics. By selecting from the N initial corpus with the first probability distribution, the target corpus can be obtained, so that the text quality of the target text generated based on the target corpus is higher, and a model trained with the target text can achieve better context consistency.
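A minimal sketch of the selection step: the target corpus is picked from the N initial corpus according to the first probability distribution, either by sampling or greedily; which of the two is used is an assumption here.

```python
import torch

def select_target(y_t: torch.Tensor, corpus: list, k: int = 1, sample: bool = True) -> list:
    """Pick k corpus elements according to the first probability distribution y_t."""
    if sample:
        idx = torch.multinomial(y_t, num_samples=k)   # stochastic selection
    else:
        idx = torch.topk(y_t, k=k).indices            # greedy selection
    return [corpus[i] for i in idx.tolist()]

# select_target(torch.tensor([0.1, 0.7, 0.2]), ["语料A", "语料B", "语料C"], sample=False) -> ["语料B"]
```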
In an optional embodiment, the generating the target text based on the target corpus includes: combining the target corpus with the N initial corpus to obtain the target text, and training a preset initial text paraphrase model to obtain a target text paraphrase model.
It can be understood that the N initial corpus are not selected by the first probability distribution, whereas the target corpus is selected by the first probability distribution generated based on the semantic features, so the target text can represent the continuity between contexts. The effect of training the initial text paraphrase model by directly adopting the N initial corpus is not ideal, with problems such as unsmooth output text and wrong semantic order. Therefore, a mode of combining the target corpus with the N initial corpus is adopted, so that the training process of the initial text paraphrase model is corrected by the target text, and a target text paraphrase model with a better training effect is obtained.
Optionally, the target text paraphrase model is LaserTagger, an open-source text editing model, which can convert an input source text into a new synonymous text and has good adaptability to application scenarios with longer input and output texts.
Alternatively, any of the features of the embodiments described above may be represented in vector form.
According to the text generation method provided by the embodiment of the application, N initial corpus are obtained, wherein context association relations exist among the N initial corpus, and N is a positive integer greater than or equal to 1; semantic features for representing the context association relationship are determined based on the N initial corpus; a first probability distribution for expressing the N initial corpus selection probabilities is determined based on the semantic features; and the N initial corpus are selected by adopting the first probability distribution to obtain a target corpus, and a target text is generated based on the target corpus. The method achieves the purpose of enabling the generated text to have context continuity, and solves the problem of non-ideal accuracy of the generated text in the related art, thereby achieving the effect of improving the accuracy of the generated text.
Based on the foregoing embodiments, the present application further provides an optional specific implementation. FIG. 2 is an algorithm structure diagram of a text generation method according to an embodiment of the present application. The text generation model for generating the target text shown in FIG. 2 adopts the following structure: the first encoder is a BERT model, the second encoder is a BI-LSTM model, and the decoder is an LSTM model. After processing such as data crawling, data cleaning and data completion, N initial corpus containing account personalized features (such as emotion tags and preference tags) can be obtained. Based on the N initial corpus, a first sequence $A = \{a_1, a_2, \ldots, a_n\}$ representing the corpus content and a second sequence $B = \{b_1, b_2, \ldots, b_m\}$ characterizing the account style features can be obtained, where $a_i$ is the i-th element in the first sequence, $n$ is the total number of elements included in the first sequence, $i = 1, \ldots, n$, $b_j$ is the j-th element in the second sequence, $m$ is the total number of elements included in the second sequence, and $j = 1, \ldots, m$.
The BERT model performs feature embedding on the first sequence and the second sequence respectively, and applies the self-attention mechanism and normalization operations: for the input $a_i$ the mapping $\tilde{a}_i$ is obtained, where $\tilde{a}_i$ is the i-th mapped element corresponding to the first sequence, and for the input $b_j$ the mapping $\tilde{b}_j$ is obtained, where $\tilde{b}_j$ is the j-th mapped element corresponding to the second sequence. Based on $\tilde{a}_i$ ($i = 1, \ldots, n$), the first initial feature is generated as $X = \{X_1, X_2, \ldots, X_n\}$, where $X_i$ is the i-th element in the first initial feature; based on $\tilde{b}_j$ ($j = 1, \ldots, m$), the second initial feature is generated as $G = \{G_1, G_2, \ldots, G_m\}$, where $G_j$ is the j-th element in the second initial feature.
The first initial feature $X$ and the second initial feature $G$ are input into the second encoder, the BI-LSTM model, for processing. The second intermediate features of the present processing can be obtained through the hidden layer of the BI-LSTM model: the second intermediate feature corresponding to the first initial feature is $h_t^{X_i}$, and the second intermediate feature corresponding to the second initial feature is $h_t^{G_j}$.
The hidden layer states in the second encoder are summarized using an attention mechanism to obtain the semantic feature $C_t^{X}$ corresponding to the first initial feature, which can represent the context association relationship, and the semantic feature $C_t^{G}$ corresponding to the second initial feature.
The N initial corpus contain discretized account data and other data, from which the predetermined discrete feature, expressed as $d_t$, can be obtained. The semantic feature $C_t^{X}$ corresponding to the first initial feature, the semantic feature $C_t^{G}$ corresponding to the second initial feature, the first decoding result $s_{t-1}$ and the predetermined discrete feature $d_t$ are input into the update gate of the LSTM model of the decoder to obtain the second decoding result $s_t$.
A second weight value $W_0$ can be set for the decoding result, and the first probability distribution $y_t$ is obtained based on the second weight value $W_0$ and the second decoding result $s_t$, i.e. $y_t \sim \mathrm{softmax}(W_0\, s_t)$.
The first probability distribution $y_t$ is used to select from the N initial corpus to obtain the target corpus. In order to improve the training effect of the initial text paraphrase model, the target corpus and the N initial corpus are combined to generate the target text, and the target text corrects the training process of the initial text paraphrase model, so that a target text paraphrase model (LaserTagger) with a better training effect is obtained.
Through the above-described processing, the present embodiment can achieve the following effects: considering differences in text expression, the corpus is classified and described according to different account style characteristics, reflecting the personalized characteristics of the account; an attention mechanism and a gated memory unit are introduced, so that the text content and the account style features guide the semantic features used for generating a specific style; the encoder uses BERT for enhanced word-vector training to improve the quality of the generated text; and the resulting higher-quality text is used to train the initial text paraphrase model, so that the training effect is better than training without the target text, and because the target text introduces context-related features, the smoothness and consistency of the generated text in application are ensured.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
The embodiment of the application also provides a text generation device, and the text generation device can be used for executing the text generation method provided by the embodiment of the application. The text generating device provided by the embodiment of the application is described below.
Fig. 3 is a schematic diagram of a text generating apparatus according to an embodiment of the present application. As shown in fig. 3, the apparatus includes: the first obtaining module 302, the semantic determining module 304, the probability determining module 306, the first generating module 308, are described in detail below:
the first obtaining module 302 is configured to obtain N initial corpora, where a context association relationship exists between the N initial corpora, and N is a positive integer greater than or equal to 1;
the semantic determining module 304 is connected to the first obtaining module 302, and is configured to determine semantic features for characterizing the context association based on the N initial corpora;
The probability determining module 306 is connected to the semantic determining module 304, and is configured to determine, based on the semantic features, a first probability distribution for expressing the N initial corpus selection probabilities;
the first generation module 308 is connected to the probability determination module 306, and is configured to select the N initial corpora by using the first probability distribution, obtain a target corpus, and generate a target text based on the target corpus.
The text generation device provided by the embodiment of the application is used for acquiring N initial corpus through the first acquisition module 302, wherein a context association relationship exists among the N initial corpus, and N is a positive integer greater than or equal to 1; the semantic determining module 304 is connected to the first obtaining module 302, and is configured to determine semantic features for characterizing the context association based on the N initial corpora; the probability determining module 306 is connected to the semantic determining module 304, and is configured to determine, based on the semantic features, a first probability distribution for expressing the N initial corpus selection probabilities; the first generation module 308 is connected to the probability determination module 306, and is configured to select the N initial corpora by using the first probability distribution, obtain a target corpus, and generate a target text based on the target corpus. The device achieves the purpose of enabling the generated text to have context continuity, and solves the problem of non-ideal accuracy of the generated text in the related art, thereby achieving the effect of improving the accuracy of the generated text.
In an alternative embodiment, the semantic determining module 304 includes: the first coding module is used for processing by adopting a preset first coder based on the N initial corpus to generate initial characteristics; and the second coding module is used for inputting the initial characteristics into a preset second coder for processing to obtain semantic characteristics corresponding to the initial characteristics.
In an alternative embodiment, the first encoding module includes: the first classification module is used for classifying the N initial linguistic data to generate a first sequence and a second sequence, wherein the first sequence comprises the linguistic data content corresponding to the N initial linguistic data respectively, and the second sequence comprises account style characteristics existing in the N initial linguistic data; and the third coding module is used for determining the initial characteristic corresponding to the first sequence as a first initial characteristic and the initial characteristic corresponding to the second sequence as a second initial characteristic by adopting the first coder.
In an alternative embodiment, the second encoding module includes: a first determining module, configured to determine a first intermediate feature, where the first intermediate feature is obtained by processing a hidden layer included in a last processing before the second encoder generates the semantic feature; a second determining module, configured to determine a second intermediate feature obtained by processing the initial feature by a hidden layer included in the second encoder; the second acquisition module is used for acquiring the loss function value corresponding to the initial characteristic; and a third obtaining module, configured to obtain the semantic feature based on the first intermediate feature, the loss function value, and the second intermediate feature.
In an alternative embodiment, the second obtaining module includes: a fourth obtaining module, configured to obtain a first weight value corresponding to the initial feature, a first decoding result, and a second intermediate feature, where the first decoding result is obtained by a preset updating gate module included in a last processing before the first probability distribution is generated by a decoder, the decoder is configured to generate a corresponding probability distribution based on a semantic feature output by the second encoder, and the second intermediate feature is obtained by a hidden layer included in the second encoder based on the initial feature; and a third determining module configured to determine the loss function value based on the first weight value, the first decoding result, and the second intermediate feature.
In an alternative embodiment, the probability determination module 306 includes: a fourth determining module, configured to determine predetermined discrete features existing in the N initial corpora, the first decoding result, and a second probability distribution, where the second probability distribution is a probability distribution obtained by the decoder before the first probability distribution is generated; a fifth determining module, configured to determine the first probability distribution based on the predetermined discrete feature, the semantic feature, the second probability distribution, and the first decoding result.
In an alternative embodiment, the first generating module 308 includes: the second generation module is used for combining the target corpus with the N initial corpora to obtain the target text, and training a preset initial text paraphrase model to obtain the target text paraphrase model.
It should be noted that each of the above modules may be implemented by software or hardware, for example, in the latter case, it may be implemented by: the above modules may be located in the same processor; alternatively, the various modules described above may be located in different processors in any combination.
Here, the first obtaining module 302, the semantic determining module 304, the probability determining module 306, and the first generating module 308 correspond to steps S102 to S108 in the embodiment, and the above modules are the same as the examples and application scenarios implemented by the corresponding steps, but are not limited to the disclosure of the above embodiments. It should be noted that the above modules may be run in a computer terminal as part of the apparatus.
It should be noted that, the optional or preferred implementation manner of this embodiment may be referred to the related description in the embodiment, and will not be repeated herein.
The text generating device comprises a processor and a memory, wherein the units and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize corresponding functions.
The processor includes a kernel, and the kernel fetches the corresponding program unit from the memory. The kernel may set one or more, and text generation is performed by adjusting kernel parameters.
The memory may include volatile memory, random Access Memory (RAM), and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM), among other forms in computer readable media, the memory including at least one memory chip.
An embodiment of the present invention provides a non-volatile storage medium having stored thereon a program which, when executed by a processor, implements the above-described text generation method.
The embodiment of the invention provides a processor, which is used for running a program, wherein the text generation method is executed when the program runs.
The embodiment of the invention provides an electronic device, which comprises a processor, a memory and a program stored on the memory and capable of running on the processor, wherein the following steps are realized when the processor executes the program: acquiring N initial corpus, wherein context association relations exist among the N initial corpus, and N is a positive integer greater than or equal to 1; determining semantic features for representing the context association relationship based on the N initial corpus; determining a first probability distribution for expressing the N initial corpus selection probabilities based on the semantic features; and selecting the N initial corpus by adopting the first probability distribution to obtain a target corpus, and generating a target text based on the target corpus. The device herein may be a server, PC, PAD, cell phone, etc.
The application also provides a computer program product adapted to perform, when executed on a data processing device, a program initialized with the method steps of: acquiring N initial corpus, wherein context association relations exist among the N initial corpus, and N is a positive integer greater than or equal to 1; determining semantic features for representing the context association relationship based on the N initial corpus; determining a first probability distribution for expressing the N initial corpus selection probabilities based on the semantic features; and selecting the N initial corpus by adopting the first probability distribution to obtain a target corpus, and generating a target text based on the target corpus.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random Access Memory (RAM) and/or nonvolatile memory, etc., such as Read Only Memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.
Computer readable media, including both non-transitory and non-transitory, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises an element.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims (10)

1. A text generation method, comprising:
acquiring N initial corpus, wherein context association relations exist among the N initial corpus, and N is a positive integer greater than or equal to 1;
determining semantic features for representing the context association based on the N initial corpus;
determining a first probability distribution for expressing the N initial corpus selection probabilities based on the semantic features;
and selecting the N initial corpus by adopting the first probability distribution to obtain a target corpus, and generating a target text based on the target corpus.
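As a minimal, non-authoritative sketch of the claimed flow (the function and variable names below, such as semantic_encoder and distribution_head, are illustrative assumptions rather than elements of the claim), the selection step can be read as sampling one target corpus from a distribution computed over the N initial corpora:
```python
import random

def generate_target_text(initial_corpora, semantic_encoder, distribution_head):
    """Illustrative flow: N initial corpora -> semantic features ->
    first probability distribution -> target corpus -> target text."""
    # Semantic features characterize the context association among the corpora.
    semantic_features = semantic_encoder(initial_corpora)
    # First probability distribution: one selection probability per initial corpus.
    selection_probs = distribution_head(semantic_features)
    # Select the target corpus according to the first probability distribution.
    target_corpus = random.choices(initial_corpora, weights=selection_probs, k=1)[0]
    # Generate the target text from the selected corpus (placeholder rendering).
    return target_corpus

# Toy usage with stand-in components (both lambdas are assumptions):
corpora = ["corpus A", "corpus B", "corpus C"]
text = generate_target_text(
    corpora,
    semantic_encoder=lambda c: [len(x) for x in c],        # stand-in feature extractor
    distribution_head=lambda f: [v / sum(f) for v in f],   # stand-in distribution head
)
```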
2. The method of claim 1, wherein the determining semantic features for representing the context association based on the N initial corpus comprises:
processing by adopting a preset first encoder based on the N initial corpus to generate initial features;
inputting the initial features into a preset second encoder for processing to obtain semantic features corresponding to the initial features.
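A minimal sketch of one way the two-encoder arrangement of claim 2 could be realized, assuming GRU layers and the dimensions shown here purely for illustration (the class and parameter names are not taken from the application):
```python
import torch
import torch.nn as nn

class TwoStageEncoder(nn.Module):
    """First encoder: tokens -> initial features; second encoder: initial
    features -> semantic features (layer choices are assumptions)."""
    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.first_encoder = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.second_encoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids):
        # Preset first encoder: token ids -> initial features.
        initial_features, _ = self.first_encoder(self.embedding(token_ids))
        # Preset second encoder: initial features -> semantic features.
        semantic_features, _ = self.second_encoder(initial_features)
        return semantic_features
```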
3. The method of claim 2, wherein the processing with a preset first encoder based on the N initial corpus to generate initial features includes:
classifying the N initial corpus to generate a first sequence and a second sequence, wherein the first sequence comprises corpus contents respectively corresponding to the N initial corpus, and the second sequence comprises account style features existing in the N initial corpus;
and determining, by adopting the first encoder, initial features corresponding to the first sequence as first initial features and initial features corresponding to the second sequence as second initial features.
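The split described in claim 3 might look like the following sketch, where the style-marker vocabulary and the helper names are assumptions introduced only for illustration:
```python
def split_corpora(initial_corpora, style_markers=("formal", "colloquial", "promotional")):
    """Build the first sequence (corpus contents) and the second sequence
    (account style features found in each corpus)."""
    first_sequence = list(initial_corpora)
    second_sequence = [[m for m in style_markers if m in corpus]
                       for corpus in initial_corpora]
    return first_sequence, second_sequence

def encode_sequences(first_sequence, second_sequence, first_encoder):
    """Apply the same first encoder to both sequences to obtain the
    first and second initial features."""
    first_initial_features = first_encoder(first_sequence)
    second_initial_features = first_encoder(second_sequence)
    return first_initial_features, second_initial_features
```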
4. The method according to claim 2, wherein the inputting the initial feature into a preset second encoder for processing to obtain a semantic feature corresponding to the initial feature includes:
determining a first intermediate feature, wherein the first intermediate feature is obtained by a hidden layer included in the second encoder during the last processing before the second encoder generates the semantic feature;
determining a second intermediate feature obtained by processing the initial feature by a hidden layer included in the second encoder;
acquiring a loss function value corresponding to the initial feature;
and obtaining the semantic feature based on the first intermediate feature, the loss function value, and the second intermediate feature.
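One hedged reading of claim 4 is a gated combination of the previous hidden state and the current hidden state, weighted by the loss value; the sigmoid gating below is an assumption, not a formula stated by the application:
```python
import torch

def fuse_semantic_feature(first_intermediate, second_intermediate, loss_value):
    """Combine the hidden state from the previous pass with the current one,
    using the loss value as an interpolation weight (assumed form)."""
    gate = torch.sigmoid(torch.as_tensor(float(loss_value)))
    return gate * first_intermediate + (1.0 - gate) * second_intermediate
```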
5. The method of claim 4, wherein the acquiring the loss function value corresponding to the initial feature comprises:
acquiring a first weight value corresponding to the initial feature, a first decoding result, and a second intermediate feature, wherein the first decoding result is obtained by a preset updating gate module included in the last processing before the first probability distribution is generated by a decoder, the decoder is configured to generate a corresponding probability distribution based on semantic features output by the second encoder, and the second intermediate feature is obtained by a hidden layer included in the second encoder based on the initial feature;
and determining the loss function value based on the first weight value, the first decoding result, and the second intermediate feature.
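A sketch of how a loss value could be formed from the three quantities named in claim 5; the weighted squared-error form is an assumption used only to make the data flow concrete:
```python
import torch

def loss_from_decoder_state(first_weight_value, first_decoding_result, second_intermediate):
    """Compare the update-gate output of the previous decoding step with the
    encoder hidden state and scale by the first weight value (assumed form)."""
    diff = first_decoding_result - second_intermediate
    return first_weight_value * torch.mean(diff * diff)
```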
6. The method of claim 5, wherein the determining a first probability distribution for expressing the N initial corpus selection probabilities based on the semantic features comprises:
determining predetermined discrete features existing in the N initial corpus, the first decoding result, and a second probability distribution, wherein the second probability distribution is a probability distribution obtained by the decoder before the first probability distribution is generated;
and determining the first probability distribution based on the predetermined discrete features, the semantic features, the second probability distribution, and the first decoding result.
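A sketch of one way the four inputs of claim 6 could be combined into the first probability distribution; the concatenation, linear projection, and equal-weight mixing below are assumptions:
```python
import torch
import torch.nn.functional as F

def first_probability_distribution(discrete_features, semantic_features,
                                   second_distribution, first_decoding_result,
                                   score_proj):
    """Score each of the N corpora, then smooth with the distribution produced
    at the previous decoding step (score_proj is an assumed nn.Linear)."""
    scores = score_proj(torch.cat([discrete_features, semantic_features,
                                   first_decoding_result], dim=-1)).squeeze(-1)
    fresh_distribution = F.softmax(scores, dim=-1)
    return 0.5 * fresh_distribution + 0.5 * second_distribution
```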
7. The method of any of claims 1 to 5, wherein the generating target text based on the target corpus comprises:
combining the target corpus with the N initial corpora to obtain the target text, and training a preset initial text review model to obtain a target text review model.
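Purely as a sketch of the data-assembly side of claim 7 (the concatenation format and the sample-collection step are assumptions), the target corpus can be spliced back into the N initial corpora to form a training sample for the text review model:
```python
def build_review_sample(target_corpus, initial_corpora):
    """Splice the selected target corpus into the initial corpora to obtain
    the target text used as a training sample for the review model."""
    return "\n".join(list(initial_corpora) + [target_corpus])

# Hypothetical usage: collect samples and hand them to a trainer elsewhere.
samples = [build_review_sample("generated reply", ["post 1", "post 2"])]
```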
8. A text generating apparatus, comprising:
the first acquisition module is used for acquiring N initial corpus, wherein context association relations exist among the N initial corpus, and N is a positive integer greater than or equal to 1;
the semantic determining module is used for determining semantic features for representing the context association relation based on the N initial corpus;
the probability determining module is used for determining a first probability distribution for expressing the N initial corpus selection probabilities based on the semantic features;
the first generation module is used for selecting the N initial corpus by adopting the first probability distribution to obtain a target corpus, and generating a target text based on the target corpus.
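A minimal sketch of how the four modules of claim 8 could be wired together; the injected callables and method names are assumptions for illustration only:
```python
class TextGenerationApparatus:
    """Mirror of the method: acquisition, semantic determination,
    probability determination, and generation modules."""
    def __init__(self, acquire, determine_semantics, determine_distribution, generate):
        self.first_acquisition_module = acquire
        self.semantic_determining_module = determine_semantics
        self.probability_determining_module = determine_distribution
        self.first_generation_module = generate

    def run(self):
        corpora = self.first_acquisition_module()
        semantic_features = self.semantic_determining_module(corpora)
        first_distribution = self.probability_determining_module(semantic_features)
        return self.first_generation_module(corpora, first_distribution)
```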
9. A non-volatile storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the text generation method of any of claims 1 to 7.
10. An electronic device comprising one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the text generation method of any of claims 1-7.
CN202310755909.7A 2023-06-25 2023-06-25 Text generation method and device, nonvolatile storage medium and electronic equipment Pending CN116842961A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310755909.7A CN116842961A (en) 2023-06-25 2023-06-25 Text generation method and device, nonvolatile storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310755909.7A CN116842961A (en) 2023-06-25 2023-06-25 Text generation method and device, nonvolatile storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN116842961A true CN116842961A (en) 2023-10-03

Family

ID=88162744

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310755909.7A Pending CN116842961A (en) 2023-06-25 2023-06-25 Text generation method and device, nonvolatile storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN116842961A (en)

Similar Documents

Publication Publication Date Title
US11869485B2 (en) Method for generating style statement, method and apparatus for training model, and computer device
CN110929515B (en) Reading understanding method and system based on cooperative attention and adaptive adjustment
CN118349673A (en) Training method of text processing model, text processing method and device
CN112528637B (en) Text processing model training method, device, computer equipment and storage medium
CN117173504A (en) Training method, training device, training equipment and training storage medium for text-generated graph model
WO2020073700A1 (en) Image description model training method and device, and storage medium
CN113609284A (en) Method and device for automatically generating text abstract fused with multivariate semantics
CN112417092A (en) Intelligent text automatic generation system based on deep learning and implementation method thereof
KR20190053028A (en) Neural machine translation apparatus and method of operation thereof based on neural network learning using constraint strength control layer
CN116956835A (en) Document generation method based on pre-training language model
CN117252957A (en) Method, device and storage medium for generating picture with accurate text according to text description
CN116150306A (en) Training method of question-answering robot, question-answering method and device
Wang et al. Data augmentation for internet of things dialog system
CN116913278B (en) Voice processing method, device, equipment and storage medium
CN117746186A (en) Training method of low-rank adaptive model, text image generation method and system
CN117473951A (en) Text processing method, device and storage medium
CN117132329A (en) Product recommendation method and device, storage medium and electronic equipment
CN116644180A (en) Training method and training system for text matching model and text label determining method
CN116028626A (en) Text matching method and device, storage medium and electronic equipment
CN116911306A (en) Natural language understanding method and device, server and storage medium
CN113704466B (en) Text multi-label classification method and device based on iterative network and electronic equipment
CN116842961A (en) Text generation method and device, nonvolatile storage medium and electronic equipment
CN111291576B (en) Method, device, equipment and medium for determining internal representation information quantity of neural network
CN114386480A (en) Training method, application method, device and medium of video content description model
Harichandana et al. Adaptive Beam Search to Enhance On-device Abstractive Summarization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination