WO2023071242A1 - Text generation method and apparatus, and storage medium - Google Patents

Text generation method and apparatus, and storage medium

Info

Publication number
WO2023071242A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
sample
keyword
type
target
Application number
PCT/CN2022/100545
Other languages
French (fr)
Chinese (zh)
Inventor
王昕远
郑少杰
范增虎
Original Assignee
深圳前海微众银行股份有限公司
Application filed by 深圳前海微众银行股份有限公司

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Definitions

  • The present application relates to the technical field of artificial intelligence, and in particular to a text generation method and device, and a storage medium.
  • Every day, the network pushes a large amount of text information about an object to users, so that users can understand the object in depth from that text information and then process the object accordingly.
  • The embodiments of the present application aim to provide a text generation method and device, and a storage medium, that make the text generation device more intelligent when it generates text information.
  • An embodiment of the present application provides a text generation method, including:
  • the target template is obtained from the template library;
  • the template in the template library is a text template provided with a text type;
  • An embodiment of the present application provides a text generation device, the device includes:
  • The obtaining part is configured to obtain text keywords from a text generation instruction when the text generation instruction is received, and, when the template library contains a target template containing the target text type, to obtain the target template from the template library; the templates in the template library are text templates provided with text types;
  • The determining part is configured to determine the target text type corresponding to the text keyword;
  • The replacement part is configured to find the position of the target text type in the target template and, at that position, to replace the field information corresponding to the target text type with the field information of the text keyword, so as to obtain the target text containing the text keyword.
  • An embodiment of the present application provides a text generation device, the device includes:
  • The memory communicates with the processor through the communication bus; the memory stores a text generation program executable by the processor, and when the text generation program is executed, the processor performs the above text generation method.
  • An embodiment of the present application provides a storage medium on which a computer program is stored, which is applied to a text generation device.
  • When the computer program is executed by a processor, the above text generation method is implemented.
  • Embodiments of the present application provide a text generation method and device, and a storage medium.
  • The text generation method includes: when a text generation instruction is received, obtaining text keywords from the text generation instruction and determining the target text type corresponding to the text keywords; if the template library contains a target template containing the target text type, obtaining the target template from the template library, where the templates in the template library are text templates provided with text types; finding the position of the target text type in the target template; and, at that position, replacing the field information corresponding to the target text type with the field information of the text keyword to obtain the target text containing the text keyword.
  • When the text generation device receives the text generation instruction, it obtains the text keywords from the instruction, searches the template library for a target template that contains the target text type corresponding to the text keywords, finds the position of the target text type in the target template, and replaces the field information corresponding to the target text type at that position with the field information of the text keyword, so as to obtain the target text containing the text keyword. The text information therefore does not need to be produced manually, which improves the intelligence of the text generation device when generating text information.
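The template-based flow summarized above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `<...>` placeholder syntax, the names `TEMPLATE_LIBRARY`, `find_target_template` and `generate_text`, and the rule that a template matches only when its placeholders are exactly the requested text types are all assumptions made for the example.

```python
import re
from typing import Dict, List, Optional, Set

# Hypothetical template library: each text type is marked as <type name>.
TEMPLATE_LIBRARY: List[str] = [
    "Come and get the <company name> <issued item>!",
    "<company name> gives away <amount> <issued item>",
]

PLACEHOLDER = re.compile(r"<([^<>]+)>")


def find_target_template(types: Set[str], library: List[str]) -> Optional[str]:
    """Return the first template whose placeholders equal the target text types.

    Exact matching is a simplifying assumption for this sketch.
    """
    for template in library:
        if set(PLACEHOLDER.findall(template)) == types:
            return template
    return None


def generate_text(keywords: Dict[str, str], library: List[str]) -> Optional[str]:
    """Fill a matching template; `keywords` maps text type -> text keyword."""
    template = find_target_template(set(keywords), library)
    if template is None:
        return None  # no target template in the library
    # Replace the field at each text-type position with the keyword of that type.
    return PLACEHOLDER.sub(lambda m: keywords[m.group(1)], template)


print(generate_text({"company name": "bank", "issued item": "red envelope"},
                    TEMPLATE_LIBRARY))
```

Keying the keywords by text type keeps the replacement step a single pass over the template, which mirrors the "find the position, then replace the field" wording of the method.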
  • Fig. 1 is a flow chart of a text generation method provided by the embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of an exemplary BERT provided in the embodiment of the present application.
  • FIG. 3 is a schematic diagram of an exemplary supervised training BERT model provided by an embodiment of the present application.
  • Fig. 4 is a flow chart of an exemplary training BERT model provided by the embodiment of the present application.
  • FIG. 5 is an exemplary text template persistence flowchart provided by the embodiment of the present application.
  • FIG. 6 is a flow chart of an exemplary text generation method provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a seed stage and an automatic training stage of an exemplary text generation method provided by an embodiment of the present application.
  • FIG. 8 is a first structural diagram of a text generation device provided by an embodiment of the present application.
  • FIG. 9 is a second schematic diagram of the composition and structure of a text generation device provided by an embodiment of the present application.
  • Fig. 1 is a flow chart 1 of a text generation method provided in the embodiment of the present application.
  • the text generation method may include:
  • a text generation method provided by an embodiment of the present application is applicable to a scenario where a target text is generated according to text keywords carried in a text generation instruction.
  • the text generation device may be implemented in various forms.
  • The text generation device described in this application may include devices such as mobile phones, cameras, tablet computers, notebook computers, palmtop computers, personal digital assistants (PDA), portable media players (PMP), navigation devices, wearable devices, smart bracelets and pedometers, as well as devices such as digital TVs and desktop computers.
  • The text generation instruction can be an instruction for generating marketing text, an instruction for generating advertising text, or an instruction for generating other text; the specific text generation instruction can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • The text generation device may include a display screen and receive the text generation instruction through the display screen; the text generation device may also receive a text generation instruction transmitted by other devices.
  • The text generation device may also receive the text generation instruction in other ways; the specific way in which the text generation device receives the text generation instruction can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • the text keyword may be information used to generate the target text corresponding to the text generation instruction.
  • The number of text keywords can be one, two or more; the specific number of text keywords can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • the text keywords include bank, coupon, 10 yuan, January 1st to January 30th, movie viewing, card binding, etc.
  • The number of target text types can be one, two or more; the specific number of target text types can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • The target text type can be a company name, a product name, a distribution item, a numerical amount, an activity time or an activity description; the specific target text type can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • The process by which the text generation device determines the target text type corresponding to the text keyword includes: when the text generation instruction does not carry the target text type, the text generation device inputs the text keyword into the type recognition model to obtain the target text type; when the text generation instruction carries the target text type, the text generation device obtains the target text type from the text generation instruction.
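The two branches described above can be sketched as follows. The dictionary layout of the instruction (`"keywords"` and an optional `"types"` entry), the function name `resolve_target_types`, and the lookup table standing in for the type recognition model are all hypothetical; a real system would call a trained classifier instead.

```python
from typing import Callable, Dict, List

# Toy stand-in for a trained type recognition model (assumption for this sketch).
TOY_TYPE_TABLE: Dict[str, str] = {
    "bank": "company name",
    "coupon": "issued item",
    "10 yuan": "amount",
}


def toy_type_model(keyword: str) -> str:
    """Map a keyword to a text type; unknown keywords get a generic type."""
    return TOY_TYPE_TABLE.get(keyword, "activity description")


def resolve_target_types(instruction: Dict,
                         type_model: Callable[[str], str]) -> List[str]:
    """Use the types carried in the instruction, else ask the type model."""
    carried = instruction.get("types")
    if carried is not None:
        return list(carried)
    return [type_model(keyword) for keyword in instruction["keywords"]]


print(resolve_target_types({"keywords": ["bank", "coupon"]}, toy_type_model))
```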
  • The type recognition model can be a model configured in the text generation device; it can also be a model that the text generation device obtains from other devices before inputting the text keywords into the type recognition model; it can also be a model obtained by the text generation device in other ways. The specific manner in which the text generation device obtains the type recognition model may be determined according to actual conditions, which is not limited in this embodiment of the present application.
  • The type recognition model can be a text classification (FastText) model; it can also be another model that can determine the text type from text keywords. The specific type recognition model can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • Before the text generation device inputs the text keywords into the type recognition model to obtain the target text type, it also obtains the second sample keyword and the second sample text type, and uses them to train an initial type recognition model to obtain the type recognition model.
  • The second sample keyword can be a preset keyword, a keyword transmitted to the text generation device by other devices, or a keyword that the text generation device receives through manual labeling; the specific way in which the text generation device obtains the second sample keyword can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • the second sample text type is a text type corresponding to the second sample keyword.
  • The second sample text type can be a preset text type, a text type transmitted to the text generation device by other devices, or a text type that the text generation device receives through manual labeling; the specific way in which the text generation device obtains the second sample text type can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • the text generation device may acquire the second sample keyword and the second sample text type only once.
  • the second sample keywords include bank, coupon, 10 yuan, January 1 to January 30, movie watching, card binding and so on.
  • The second sample text type includes company name, product name, issued item, numerical value, activity time, activity description and so on; the specific second sample text type can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • the templates in the template library are text templates set with the text type.
  • After the text generation device determines the target text type corresponding to the text keyword, if the template library contains a target template containing the target text type, the text generation device obtains the target template from the template library.
  • The number of text templates can be one, two or more; the specific number of text templates can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • Before the text generation device obtains the target template from the template library, it also obtains the first sample text and inputs the first sample text into the keyword recognition model to obtain the first sample keyword corresponding to the first sample text, the first sample type, and the first position of the first sample keyword in the first sample text. The text generation device then inputs the first sample keyword into the text generation model to obtain a first output text, obtains a text template according to the first output text, the first sample text, the first sample keyword, the first sample type and the first position, and adds the text template to the template library.
  • the text generation device may obtain the first sample text every preset time period; the text generation device may also obtain the first sample text when receiving the sample text acquisition instruction. Note that the text generation device can also obtain the first sample text in other ways; the specific method for the text generation device to obtain the first sample text can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • The preset time period can be a time period configured in the text generation device; it can also be a time period received by the text generation device before it obtains the first sample text; it can also be a time period obtained by the text generation device in other ways. The specific manner in which the text generation device obtains the preset time period may be determined according to actual conditions, which is not limited in this embodiment of the present application.
  • The preset time period can be one week, one month or one day; the specific preset time period can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • The keyword recognition model can be a model configured in the text generation device; it can also be a model transmitted by other devices and received by the text generation device; it can also be a model obtained by the text generation device in other ways. The specific manner in which the text generation device obtains the keyword recognition model may be determined according to actual conditions, which is not limited in this embodiment of the present application.
  • The keyword recognition model can be a model built from a language representation model (Bidirectional Encoder Representations from Transformers, BERT) and a conditional random field model; it can also be another model that can obtain, from a sample text, the sample keywords, the sample types, and the positions of the sample keywords in the sample text. The specific keyword recognition model can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • The text generation model can be a model configured in the text generation device; it can also be a model transmitted by other devices and received by the text generation device; it can also be a model obtained by the text generation device in other ways. The specific manner in which the text generation device obtains the text generation model can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • The text generation model can be the Fixed-Keywords BERT model; it can also be another model that can generate output text from text keywords. The specific text generation model can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • The text generation device obtains the text template according to the first output text, the first sample text, the first sample keyword, the first sample type and the first position as follows: the text generation device uses the keyword recognition model to determine the second position of the first sample keyword in the first output text; it replaces the first sample keyword at the second position in the first output text with the first sample type to obtain the first template; it replaces the first sample keyword at the first position in the first sample text with the first sample type to obtain the second template; and it takes the first template and the second template as text templates.
  • To determine the second position of the first sample keyword in the first output text, the text generation device may input the first output text into the keyword recognition model and use the keyword recognition model to locate the first sample keyword in the first output text.
  • The first template and the second template can be the same or different; if there are multiple first templates and multiple second templates, some of the first templates and second templates may be the same and others different. The specifics can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • The first position and the second position may be the same or different; the specifics may be determined according to actual conditions, which is not limited in this embodiment of the present application.
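The template construction described above (replacing each sample keyword at its position with its sample type) can be sketched as follows. The same helper would apply to the first output text with the second positions and to the first sample text with the first positions. The `<type>` placeholder syntax, the end-exclusive `(start, end)` convention, the English sample sentence and the function names are assumptions made for the example.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end exclusive, sample type)


def span_of(text: str, keyword: str, sample_type: str) -> Span:
    """Locate a keyword by plain search; stands in for the keyword recognition model."""
    start = text.index(keyword)
    return (start, start + len(keyword), sample_type)


def to_template(text: str, spans: List[Span]) -> str:
    """Replace every keyword span with a <sample type> placeholder."""
    pieces, cursor = [], 0
    for start, end, sample_type in sorted(spans):
        pieces.append(text[cursor:start])   # text before the keyword is kept
        pieces.append(f"<{sample_type}>")   # keyword becomes a typed slot
        cursor = end
    pieces.append(text[cursor:])            # trailing text after the last keyword
    return "".join(pieces)


sample_text = "Get the bank red envelope now"
spans = [
    span_of(sample_text, "bank", "company name"),
    span_of(sample_text, "red envelope", "issued item"),
]
print(to_template(sample_text, spans))
```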
  • Before the text generation device inputs the first sample text into the keyword recognition model to obtain the first sample keyword, the first sample type and the first position of the first sample keyword in the first sample text, it also obtains the second sample text together with the second sample keyword corresponding to the second sample text, the second sample type, and the third position of the second sample keyword in the second sample text. The text generation device uses the second sample keyword, the second sample type, the third position and the second sample text to train an initial keyword recognition model to obtain the keyword recognition model.
  • The text generation device is configured with a regular expression combining {marketing words} and {product/company name}. The text generation device can use the regular expression to obtain the second sample text from the full amount of Internet data; the corresponding second sample keyword, the second sample type, and the third position of the second sample keyword in the second sample text are then marked in the second sample text by manual labeling and transmitted to the text generation device, which thereby acquires the second sample keyword, the second sample type and the third position.
  • The marketing words are words related to financial marketing that are configured in the text generation device. The marketing words include: receiving, benefits, discounts, red envelopes, limited time, special prices, free shipping, recharge, coupons, members, voucher, blockbuster, good news, exclusive, exclusive, super value, special offer, gift, reward, exchange, activation, gift, subsidy, 11.11, 12.12, double 11, double 12, lottery, double 11, double 12.
  • The product/company name is the name or abbreviation of a finance-related product or company, represented by "{product/company name}".
  • The regular expression combining {marketing word} and {product/company name} can be {marketing word}.*{product/company name}; it can also be {product/company name}.*{marketing word}.
  • The first sample text may also be sample text obtained from the full amount of Internet data every preset time period by using the regular expression.
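The regular-expression filter described above can be sketched with Python's `re` module. The word lists below are short illustrative stand-ins (the company name `ExampleBank` is invented), and `build_sample_filter` is a hypothetical helper name; both orderings of the pattern are combined into one expression.

```python
import re
from typing import List

# Illustrative stand-ins for the configured word lists (assumptions).
MARKETING_WORDS: List[str] = ["coupon", "red envelope", "discount", "11.11"]
PRODUCT_OR_COMPANY_NAMES: List[str] = ["ExampleBank", "ExamplePay"]


def build_sample_filter(marketing_words: List[str], names: List[str]) -> re.Pattern:
    """Match text containing a marketing word and a product/company name in either order."""
    mw = "|".join(re.escape(word) for word in marketing_words)
    pn = "|".join(re.escape(name) for name in names)
    # {marketing word}.*{product/company name} or {product/company name}.*{marketing word}
    return re.compile(rf"(?:{mw}).*(?:{pn})|(?:{pn}).*(?:{mw})")


SAMPLE_FILTER = build_sample_filter(MARKETING_WORDS, PRODUCT_OR_COMPANY_NAMES)

candidates = [
    "Grab a coupon from ExampleBank today",
    "ExamplePay sends you a red envelope",
    "The weather is nice",
]
samples = [text for text in candidates if SAMPLE_FILTER.search(text)]
print(samples)
```

Escaping each word with `re.escape` matters for entries such as `11.11`, where an unescaped dot would match any character.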
  • For example, if the second sample keyword is bank, the corresponding second sample type is company name; if the second sample keyword is coupon, the corresponding second sample type is issued item; if the second sample keyword is 10 yuan, the corresponding second sample type is amount value; if the second sample keyword is January 1 to January 30, the corresponding second sample type is activity time.
  • For example, in one second sample text, the first second sample keyword is the company, with second sample type company name and third position (0, 2); the second second sample keyword is 50 yuan, with second sample type amount value and third position (7, 10); and the third second sample keyword is red envelope, with second sample type issued item and third position (10, 12).
  • the third position may be a pair of starting and ending positions where the second sample keyword appears in the second sample text.
  • The text generation device determines at least two empty positions and at least two groups of characters corresponding to the empty positions; it splices the empty positions and the text keywords according to the groups of characters to obtain splicing information; it inputs the splicing information into the text generation model to obtain at least two groups of target character information corresponding to the empty positions; and it adds the groups of target character information to the corresponding empty positions in the splicing information to obtain the target text.
  • The at least two empty positions correspond one-to-one to the at least two groups of characters; that is, each empty position corresponds to one group of characters.
  • When the number of text keywords is one, there is one empty position on the left of the text keyword and a second empty position on its right. When the number of text keywords is two, there is one empty position on the left of the first text keyword, a second empty position between the first and second text keywords, and a third empty position on the right of the second text keyword. In general, when the number of text keywords is N, there is one empty position on the left of the first text keyword, a second empty position between the first and second text keywords, ..., an N-th empty position between the (N-1)-th and N-th text keywords, and an (N+1)-th empty position on the right of the N-th text keyword. That is, N text keywords correspond to N+1 empty positions.
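The N keywords to N+1 empty positions relationship can be illustrated with a short sketch. The `[SLOT]` marker and the function name `layout_with_slots` are hypothetical and only serve to make the interleaving visible.

```python
from typing import List

SLOT = "[SLOT]"  # hypothetical marker for one empty position


def layout_with_slots(keywords: List[str]) -> List[str]:
    """Interleave N keywords with N + 1 empty positions."""
    layout = [SLOT]              # empty position left of the first keyword
    for keyword in keywords:
        layout.append(keyword)
        layout.append(SLOT)      # empty position right of this keyword
    return layout


print(layout_with_slots(["bank", "red envelope"]))
```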
  • The text generation device inputs the splicing information into the text generation model to obtain at least two groups of target character information corresponding to the at least two empty positions as follows: the text generation device inputs the splicing information into the text generation model and uses the model to obtain, by sampling, the first word in each of the empty positions, that is, at least two groups of first characters; it then inputs the first characters together with the splicing information into the text generation model and uses the model to obtain, by sampling, at least two groups of second characters in the empty positions; this continues until every word in the empty positions has been obtained by sampling, which yields the at least two groups of target character information.
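The round-by-round filling described above can be sketched as a loop that asks a model for one more character per empty position in each round. The model here is a deterministic stub that reads from canned strings, which is an assumption standing in for sampling from a trained text generation model; `fill_slots`, `make_stub_model` and the `-` padding symbol used for "no character" are likewise illustrative.

```python
from typing import Callable, List

PAD = "-"  # assumed symbol for "no character at this position"

NextChars = Callable[[List[str], List[List[str]]], List[str]]


def make_stub_model(targets: List[str]) -> NextChars:
    """Stub model: returns the next canned character for every empty position."""
    def next_chars(keywords: List[str], slots: List[List[str]]) -> List[str]:
        return [target[len(slot)] for target, slot in zip(targets, slots)]
    return next_chars


def fill_slots(keywords: List[str], slot_len: int, model: NextChars) -> str:
    """Fill N + 1 empty positions one character per round, then assemble the text."""
    slots: List[List[str]] = [[] for _ in range(len(keywords) + 1)]
    for _ in range(slot_len):
        # Each round sees the keywords plus everything generated so far.
        for slot, char in zip(slots, model(keywords, slots)):
            slot.append(char)
    pieces = []
    for index, slot in enumerate(slots):
        pieces.append("".join(c for c in slot if c != PAD))  # drop padding
        if index < len(keywords):
            pieces.append(keywords[index])
    return "".join(pieces)


stub = make_stub_model(["Go ", "'s ", "!--"])
print(fill_slots(["bank", "gift"], 3, stub))
```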
  • BERT can be divided into three parts: a word vector conversion part, an encoding part and a supervision part. When input text is received, the word vector conversion part first converts the input text into a word vector sequence (CLS, word 1, word 2, word 3, ..., word N); the encoding part then encodes the word vector sequence; finally, the supervision part determines the text category of the encoded input text.
  • the encoding part is the main body of BERT. Its main function is to encode the input N+1 word vectors to allow information interaction between all input vectors.
  • The coding part is composed of several layers of coding blocks. The first coding block in the coding part encodes the word vector sequence to obtain the first coding sequence (E_CLS, E_1, E_2, E_3, ..., E_n) (the first coding sequence is the coding sequence closest to the word vector sequence in Fig. 3), and the last coding block in the coding part produces the coding output sequence (E_CLS, E_1, E_2, E_3, ..., E_n) (the coding output sequence is the coding sequence closest to the supervision part in Fig. 3).
  • the supervised part includes the labels corresponding to the input text needed for supervised training of BERT.
  • Figure 3 shows the multi-category classification of the input text.
  • the supervision part can be adjusted according to the task goal, such as performing named entity recognition, question answering, etc.
  • the function of the text generation model is to generate marketing copy (that is, target text) containing these template keywords according to the given template keywords.
  • Fig. 3 shows the supervised training process in which "bank" and "red envelope" are input as sample keywords to generate the marketing copy (output text) "Come and get the bank red envelope!".
  • Word vector conversion is performed on "bank", "red envelope" and the mask part to obtain a word vector sequence; the first coding block in the coding part encodes the word vector sequence to obtain the first coding sequence (E_CLS, E_M, E_M, E_M, E_silver, E_row, E_M, E_M, E_M, E_red, E_packet, E_M, E_M, E_M), and so on until the last coding block in the coding part produces the coding output sequence; the supervision part then supervises the coding output sequence to obtain the prediction result (come and get it, ---, la!-).
  • First, according to the sample text and the sample keywords, determine the number of words in each vacancy formed by the sample keywords in the sample text. For example, for the marketing copy "Come and get the bank red envelope!" and the sample keywords it contains:
  • Sample type: <distributed item>, sample keyword: red envelope, keyword position: (5, 7)
  • Second, take the maximum of the word counts of all vacancies, denoted L_M. Then L_M mask vectors (denoted M) are inserted into each vacancy formed by the sample keywords. For example, for the inputs "bank" and "red envelope", L_M mask vectors are inserted before "bank", between "bank" and "red envelope", and after "red envelope". As shown in Fig. 3, assuming that L_M is 3, the word vector conversion part shows the final result including the masks.
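The computation of L_M from keyword positions can be sketched as follows. The positions (3, 5) and (5, 7) are the ones given in this description; the total length of nine characters is an assumption inferred from the 13-entry training label given later, and the function names are hypothetical. Positions are treated as end-exclusive.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end exclusive) of one sample keyword


def gap_lengths(num_chars: int, spans: List[Span]) -> List[int]:
    """Character counts of the vacancies before, between and after the keywords."""
    gaps, cursor = [], 0
    for start, end in sorted(spans):
        gaps.append(start - cursor)
        cursor = end
    gaps.append(num_chars - cursor)
    return gaps


def max_mask_length(samples: List[Tuple[int, List[Span]]]) -> int:
    """L_M: the largest vacancy over all (text length, keyword spans) samples."""
    return max(max(gap_lengths(num_chars, spans)) for num_chars, spans in samples)


# "bank" at (3, 5) and "red envelope" at (5, 7) in an assumed 9-character copy.
example = (9, [(3, 5), (5, 7)])
print(gap_lengths(*example), max_mask_length([example]))
```

With these inputs the vacancies hold 3, 0 and 2 characters, so L_M is 3, in line with the value assumed for Fig. 3.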
  • Word supervision is performed on the mask part. If the number of words at the corresponding position in the sample copy is less than L_M, supervision starts from the leftmost mask, and the supervision target of the remaining positions is "-", as shown by "---" in Fig. 3, which means that no word exists at those positions.
  • the final marketing copy can be obtained according to the “-” in the predicted target character and the sample keyword.
  • the text generation device acquires a second sample text and a second sample keyword corresponding to the second sample text.
  • the text generation device constructs a word vector sequence by using the second sample keywords.
  • Second sample type: <company name>, second sample keyword: bank, third position: (3, 5)
  • Second sample type: <issued item>, second sample keyword: red envelope, third position: (5, 7)
  • The second sample keywords in the marketing copy are converted into word vectors from left to right, each character being converted directly into a 200-dimensional vector. The word vectors of the two keywords are spliced together to form a sequence of four 200-dimensional word vectors. Mask vectors are then filled into all vacancies formed by the two second sample keywords.
  • The result is a sequence of 200-dimensional vectors of length 3 (mask vector sequence) + 2 ("bank" vector sequence) + 3 (mask vector sequence) + 2 ("red envelope" vector sequence) + 3 (mask vector sequence) = 13.
  • the text generation device constructs a training label according to the second sample text.
  • the training label represents the expected result after inputting the data into the model, that is, the real marketing copy.
  • a mask is inserted for each vacancy, and it is necessary to ensure that the training label and the word vector sequence correspond to each word position.
  • For example, the vector sequence [M, M, M, 银, 行, M, M, M, 红, 包, M, M, M] can be constructed (银行 is "bank", 红包 is "red envelope"), and the corresponding training label is then [快, 来, 领, 银, 行, -, -, -, 红, 包, 啦, !, -], where "-" indicates that there is no character at the corresponding position.
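A minimal sketch of how such a position-aligned training label could be built. It assumes the Chinese sample copy "快来领银行红包啦!" and 0-based, end-exclusive keyword positions; `build_training_label` is a hypothetical helper, not a name from the application.

```python
PAD = "-"  # label meaning "no character at this position"


def build_training_label(text, spans, l_m):
    """Align the sample text with the masked input, one label per position.

    Each vacancy is left-aligned and padded with PAD up to l_m labels;
    keyword characters are copied through unchanged.
    """
    labels = []
    cursor = 0
    for start, end in spans:
        gap = list(text[cursor:start])
        labels.extend(gap + [PAD] * (l_m - len(gap)))
        labels.extend(text[start:end])
        cursor = end
    tail = list(text[cursor:])
    labels.extend(tail + [PAD] * (l_m - len(tail)))
    return labels


label = build_training_label("快来领银行红包啦!", [(3, 5), (5, 7)], l_m=3)
print(label)
```

Left-aligning each vacancy keeps the label the same length as the masked input, so position i of the label always supervises position i of the encoded output.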
  • the text generation device inputs the word vector sequence into the encoding part of the initial text generation model to obtain an encoded output sequence.
  • the text generation device trains the initial text generation model according to the encoded output sequence and the training labels, and obtains the text generation model.
  • The text generation device maps each vector in the encoded output sequence (except the CLS vector) to the vocabulary (including "-").
  • For example, inputting the word-vector sequence [M, M, M, 银, 行, M, M, M, 红, 包, M, M, M] into the encoding part of the initial text generation model yields an encoded output sequence of 13 200-dimensional vectors. Each vector is multiplied by a trainable matrix of shape 200 × (vocabulary size + 1), where the extra 1 accounts for "-", so as to map the vector to the target vocabulary (including "-"). The cross entropy between the mapped vector sequence and the training label [快, 来, 领, 银, 行, -, -, -, 红, 包, 啦, !, -] is then computed, and gradient descent is used to fine-tune the parameters of the initial text generation model.
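The projection and loss described above can be illustrated in plain Python. This is a toy sketch under stated assumptions: the dimension is 2 rather than 200, the vocabulary has two characters plus "-", the loss is averaged over positions (the text does not say how positions are combined), and the helper names are hypothetical. The gradient-descent update itself is not shown.

```python
import math


def project(vector, weight):
    """Multiply a d-dim vector by a d x (V + 1) matrix to get V + 1 logits."""
    return [sum(v * row[j] for v, row in zip(vector, weight))
            for j in range(len(weight[0]))]


def cross_entropy(logits, target_index):
    """Softmax cross entropy for one position (log-sum-exp for stability)."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[target_index]


def sequence_loss(encoded, weight, target_indices):
    """Mean cross entropy over all supervised positions (assumed averaging)."""
    losses = [cross_entropy(project(vec, weight), t)
              for vec, t in zip(encoded, target_indices)]
    return sum(losses) / len(losses)


# Toy sizes: d = 2 instead of 200, two characters plus "-" in the vocabulary.
weight = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0]]
encoded = [[2.0, 0.0], [0.0, 2.0]]
loss = sequence_loss(encoded, weight, [0, 1])
print(loss)
```

In a real setup the matrix would be a trainable layer in a deep-learning framework, and the loss would be back-propagated through the encoder as well.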
  • When the initial text generation model converges (that is, its parameters no longer change) or the maximum number of training steps is reached, the initial text generation model is considered trained, which yields the text generation model.
  • After training, the Fixed-Keywords BERT model acquires the following capability: given the input keywords "bank" and "red envelope", it outputs "快来领" ("come and get") to the left of "bank", "---" between "bank" and "red envelope", and "啦!-" to the right of "red envelope".
  • Here "-" indicates that there is no character at that position. After the "-" entries are removed, the mask part is spliced together with "bank" and "red envelope" in order to get the complete marketing copy: "Come and get the bank red envelope!".
  • the text generation device acquires a first sample keyword.
  • the text generation device while acquiring the first sample keyword, will also acquire the first sample type corresponding to the first sample keyword. Specifically, the text generation device may input the first sample text into the keyword recognition model to obtain the first sample keyword and the first sample type corresponding to the first sample text.
  • the text generation device also needs the first sample keyword sequence.
  • The input first sample keywords may take the following form:
  • First sample type: <company name>, first sample keyword: bank;
  • First sample type: <issued item>, first sample keyword: interest-free coupon
  • the input first sample keyword sequence is order-sensitive, that is, the order of the input first sample keyword sequence is consistent with the order in which it appears in the final generated marketing copy. In order to subsequently generate a text template, it is necessary to obtain the first sample type.
  • the text generation device inputs the first sample keywords into the text generation model to obtain a first output text.
  • The two keywords "bank" and "interest-free coupon" form vacancies in sequence (to the left of "bank", between "bank" and "interest-free coupon", and to the right of "interest-free coupon").
  • L_M is the maximum number of characters that appear in a vacancy.
  • The word-vector sequence that can be constructed is: [M, M, M, 银, 行, M, M, M, 免, 息, 券, M, M, M] (银行 is "bank", 免息券 is "interest-free coupon").
  • Input the constructed word vector sequence into the encoding part of the Fixed-Keywords BERT model, and the encoded output sequence output by the last encoding layer (the last encoding block) can be obtained.
  • Each vector in the encoded output sequence (except those at the positions of E_CLS and the first sample keywords) is mapped to the vocabulary (including "-"), and the character with the largest probability after the mapping is selected as the prediction for that position.
  • After the mapping, each character position has a numerical (probability) vector that gives the likelihood of each entry in the vocabulary (including "-"), and the entry with the highest probability is used as the predicted character for that position.
  • Once all nine mask positions are predicted, they are combined with the first sample keywords to get: {-, -, -, 银, 行, 大, 额, -, 免, 息, 券, 享, 不, 停}.
  • Removing the "-" entries, which represent non-existent characters, gives the predicted marketing copy (the first output text): "Enjoy non-stop interest-free bank coupons".
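The prediction and clean-up steps above can be sketched as follows. The vocabulary and probabilities are made-up toy values, the Chinese characters are an assumed reading of the translated example, and only the middle vacancy is decoded from probabilities; the other vacancies are hard-coded for brevity.

```python
PAD = "-"


def decode(prob_rows, vocab):
    """Pick the highest-probability vocabulary entry for each position."""
    return [vocab[max(range(len(row)), key=row.__getitem__)] for row in prob_rows]


def assemble(tokens):
    """Drop PAD entries and join the remaining characters."""
    return "".join(t for t in tokens if t != PAD)


vocab = ["大", "额", PAD]                      # toy vocabulary, "-" included
probs = [[0.7, 0.2, 0.1],                      # one probability row per mask position
         [0.1, 0.8, 0.1],
         [0.2, 0.2, 0.6]]
middle = decode(probs, vocab)                  # vacancy between the two keywords
# Left and right vacancies are hard-coded here instead of being predicted.
sequence = [PAD] * 3 + list("银行") + middle + list("免息券") + list("享不停")
text = assemble(sequence)
print(middle, text)
```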
  • the text generation device determines a second position of the first sample keyword in the first output text by using the keyword recognition model.
  • the text generation device replaces the first sample keyword with the first sample type at the second position in the first output text to obtain a first template; and uses the first template as a text template.
  • The first sample keywords in the predicted marketing copy are replaced with the corresponding first sample types: "bank" is replaced by the first sample type "<company name>" and "interest-free coupon" is replaced by the first sample type "<issued item>", which gives the final first template: "<company name> large amount <issued item> enjoy non-stop".
  • the first template is stored, that is, the persistence of the marketing copy template is completed.
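A minimal sketch of turning an output text into a template by replacing keyword spans with type tags. The Chinese output text and the spans (0, 2) and (4, 7) are assumptions standing in for what the keyword recognition model would return; `make_template` is a hypothetical helper.

```python
def make_template(text, annotations):
    """Replace each keyword span in `text` with its type tag.

    `annotations` is a list of (type_tag, (start, end)) pairs with 0-based,
    exclusive-end spans that do not overlap.
    """
    pieces = []
    cursor = 0
    for type_tag, (start, end) in sorted(annotations, key=lambda a: a[1][0]):
        pieces.append(text[cursor:start])   # text before the keyword
        pieces.append(type_tag)             # type tag instead of the keyword
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


output_text = "银行大额免息券享不停"          # assumed first output text
annotations = [("<company name>", (0, 2)),    # span of 银行, assumed
               ("<issued item>", (4, 7))]     # span of 免息券, assumed
template = make_template(output_text, annotations)
template_library = [template]                 # persistence reduced to a list
print(template)
```

The same helper applies to the first sample text with the first position, which is how the second template described later in the text would be produced.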
  • After finding a target template containing the target text type in the template library, the text generation device locates the position of the target text type in the target template and, at that position, replaces the field information corresponding to the target text type with the field information of the text keyword, obtaining the target text containing the text keyword.
  • the target text is the text corresponding to the text generation instruction.
  • the text generation device acquires text keywords from the text generation instruction.
  • the text generation device inputs text keywords into the type recognition model to obtain the target text type.
  • the text generation device obtains the target text type from the text generation instruction.
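The two ways of obtaining the target text type can be sketched as one small function. Representing the instruction as a dictionary with `keyword` and `target_type` keys is an assumption made for illustration, and `toy_type_model` is a hypothetical stand-in for the trained type recognition model.

```python
def resolve_target_type(instruction, type_model):
    """Use the type carried by the instruction, else ask the type model."""
    carried = instruction.get("target_type")
    if carried is not None:
        return carried
    return type_model(instruction["keyword"])


def toy_type_model(keyword):
    """Hypothetical stand-in for the trained type recognition model."""
    return {"bank": "<company name>"}.get(keyword, "<issued item>")


inferred = resolve_target_type({"keyword": "bank"}, toy_type_model)
carried = resolve_target_type(
    {"keyword": "bank", "target_type": "<issued item>"}, toy_type_model)
print(inferred, carried)
```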
  • the text generation device acquires the target template from the template library.
  • the text generation device searches for the position of the target text type in the target template, and replaces the field information corresponding to the target text type with the field information of the text keyword at the position, to obtain the target text containing the text keyword.
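The lookup and replacement steps can be sketched as follows. The template string and the keywords are made up for illustration, and both function names are hypothetical; the template library is reduced to a list of strings.

```python
def find_template(template_library, target_type):
    """Return the first template containing the target type tag, or None."""
    for template in template_library:
        if target_type in template:
            return template
    return None


def fill_template(template, target_type, keyword):
    """Replace the type tag at its first position with the keyword."""
    position = template.find(target_type)
    if position < 0:
        raise ValueError(f"{target_type} not found in template")
    return template[:position] + keyword + template[position + len(target_type):]


library = ["<company name> offers <issued item> today"]   # made-up template
target_text = find_template(library, "<issued item>")
for type_tag, keyword in [("<company name>", "ABC Bank"),
                          ("<issued item>", "red envelopes")]:
    target_text = fill_template(target_text, type_tag, keyword)
print(target_text)
```

Working on the position returned by `find` rather than a blanket string replace mirrors the wording of the step, which locates the type first and substitutes at that position.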
  • the text generation device determines at least two empty positions formed according to the text keywords and at least two groups of character quantities corresponding to the at least two empty positions.
  • the text generation device splices the at least two empty positions and the text keywords according to the at least two groups of character quantities to obtain splicing information.
  • the text generation device inputs the splicing information into the text generation model to obtain at least two sets of target character information corresponding to at least two empty positions.
  • the text generation device adds at least two sets of target character information to at least two empty positions in the splicing information to obtain the target text.
  • an exemplary text generation method includes a seed stage and an automatic training stage, as shown in FIG. 7 .
  • The seed stage first obtains the second sample text, which is manually annotated to obtain the second sample keyword and the second sample type corresponding to the second sample text, and the third position of the second sample keyword in the second sample text. The second sample keyword, the second sample type, the third position and the second sample text are then used to train the initial keyword recognition model to obtain the keyword recognition model (training the keyword recognition model).
  • the initial type recognition model is trained by using the second sample keywords and the second sample text type to obtain a type recognition model (training type recognition model).
  • The automatic training stage obtains the first sample text and inputs it into the keyword recognition model to obtain the first sample keyword and the first sample type corresponding to the first sample text, and the first position of the first sample keyword in the first sample text (annotating the first sample text with the keyword recognition model). The first sample keyword is input into the text generation model to obtain the first output text, and the keyword recognition model is used to determine the second position of the first sample keyword in the first output text. At the second position in the first output text, the first sample keyword is replaced with the first sample type to obtain the first template; at the first position in the first sample text, the first sample keyword is replaced with the first sample type to obtain the second template. The first template and the second template are used as text templates (obtaining the text templates) and added to the template library, so that when a text generation instruction is received, the target text containing the text keyword is obtained according to the text keyword in the text generation instruction and the target template in the template library.
  • On receiving a text generation instruction, the text generation device obtains the text keyword from the instruction, searches the template library for a target template that includes the target text type corresponding to the text keyword, finds the position of the target text type in the template, and replaces the field information corresponding to the target text type at that position with the field information of the text keyword. The target text containing the text keyword is thus obtained without manual work, which improves the intelligence of the text generation device when generating text information.
  • FIG. Text generating device 1 may include:
  • the obtaining part 11 is configured to obtain text keywords from a text generation instruction when the text generation instruction is received, and to obtain the target template from the template library if the template library contains a target template that includes the target text type; the templates in the template library are text templates provided with text types;
  • the determining part 12 is configured to determine the target text type corresponding to the text keyword
  • the replacement part 13 is configured to replace the field information corresponding to the target text type with the field information of the text keyword at the position, so as to obtain the target text containing the text keyword.
  • the device further includes an input part and an adding part
  • the acquisition part 11 is configured to acquire the first sample text
  • the input part is configured to input the first sample text into the keyword recognition model to obtain the first sample keyword and the first sample type corresponding to the first sample text, and the first position of the first sample keyword in the first sample text; to input the first sample keyword into the text generation model to obtain the first output text; and to obtain the text template according to the first output text, the first sample text, the first sample keyword, the first sample type and the first position;
  • the adding part is configured to add the text template to the template library.
  • the determining part 12 is configured to determine a second position of the first sample keyword in the first output text by using a keyword recognition model
  • the replacement part 13 is configured to replace the first sample keyword with the first sample type at the second position in the first output text to obtain a first template; to replace the first sample keyword with the first sample type at the first position in the first sample text to obtain a second template; and to use the first template and the second template as the text template.
  • the device further includes a training part
  • the acquiring part 11 is configured to acquire the second sample text, the second sample keyword and the second sample type corresponding to the second sample text, and the third position of the second sample keyword in the second sample text;
  • the training part is configured to use the second sample keyword, the second sample type, the third position and the second sample text to train an initial keyword recognition model to obtain the keyword recognition model.
  • the device further includes a splicing part
  • the determining part 12 is configured to determine at least two empty positions formed according to the text keywords, and at least two groups of character quantities corresponding to the at least two empty positions; the at least two empty positions correspond one to one to the at least two groups of character quantities;
  • the splicing part is configured to splice the at least two empty positions and the keyword according to the at least two groups of characters to obtain splicing information
  • the input part is configured to input the splicing information into the text generation model to obtain at least two sets of target character information corresponding to the at least two empty positions;
  • the adding part is configured to add the at least two groups of target character information to the at least two empty positions in the splicing information to obtain the target text.
  • the input part is configured to input the text keywords into the type recognition model to obtain the target text type when the text generation instruction does not carry the target text type;
  • the obtaining part 11 is configured to obtain the target text type from the text generation instruction if the text generation instruction carries the target text type.
  • the acquisition part 11 is configured to acquire a second sample keyword and a second sample text type
  • the training part is configured to use the second sample keywords and the second sample text type to train an initial type recognition model to obtain the type recognition model.
  • the above-mentioned acquisition part 11, determination part 12 and replacement part 13 can be implemented by the processor 14 of the text generation device 1, specifically by a central processing unit (CPU), a microprocessor unit (MPU), a digital signal processor (DSP) or a field programmable gate array (FPGA); the above-mentioned data storage can be implemented by the memory 15 of the text generation device 1.
  • the embodiment of the present application also provides a text generating device 1. As shown in FIG.
  • the processor 14 communicates, and the memory 15 stores a program executable by the processor 14. When the program is executed, the processor 14 executes the text generation method as described above.
  • the above-mentioned memory 15 can be a volatile memory, such as a random-access memory (RAM), or a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); it provides instructions and data to the processor 14.
  • An embodiment of the present application provides a computer-readable storage medium, on which a computer program is carried, and when the program is executed by the processor 14, the text generation method as described above is implemented.
  • the embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage and optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, and the instruction means implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • Embodiments of the present application provide a text generation method and device, and a storage medium.
  • the text generation method includes: in the case of receiving a text generation instruction, obtaining text keywords from the text generation instruction, and determining the target corresponding to the text keyword Text type; if there is a target template containing the target text type in the template library, obtain the target template from the template library; the template in the template library is a text template with a text type; find the target text type in the target template position, and replace the field information corresponding to the target text type with the field information of the text keyword at the position to obtain the target text containing the text keyword.
  • On receiving a text generation instruction, the text generation device obtains the text keywords from the instruction, searches the template library for a target template that includes the target text type corresponding to the text keyword, finds the position of the target text type in the target template, and replaces the field information corresponding to the target text type at that position with the field information of the text keyword. The target text containing the text keyword is thus obtained without manual work, which improves the intelligence of the text generation device when generating text information.

Abstract

Embodiments of the present application disclose a text generation method and apparatus, and a storage medium. The method comprises: obtaining a text keyword from a text generation instruction when the text generation instruction is received, and determining a target text type corresponding to the text keyword; when a target template comprising the target text type exists in a template library, obtaining the target template from the template library, templates in the template library being text templates that are provided with text types; and finding the position of the target text type in the target template, and replacing field information corresponding to the target text type by using field information of the text keyword at the position to obtain a target text comprising the text keyword.

Description

A text generation method and apparatus, and storage medium
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202111284961.6, entitled "A text generation method and apparatus, and storage medium", filed with the China Patent Office on November 1, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of artificial intelligence, and in particular to a text generation method and apparatus, and a storage medium.
Background
With the development of Internet technology, networks push text information about many objects to users every day, so that users can learn about an object in depth from the text information and then carry out processing of that object.
In the prior art, once description information related to the object is obtained, a corresponding description template is looked up manually, and the description template is manually associated with the description information to obtain the corresponding text information, which reduces the intelligence with which text information is generated.
Summary
To solve the above technical problem, the embodiments of the present application aim to provide a text generation method and apparatus, and a storage medium, which can improve the intelligence of the text generation device when generating text information.
The technical solution of the present application is implemented as follows:
An embodiment of the present application provides a text generation method, including:
when a text generation instruction is received, obtaining a text keyword from the text generation instruction, and determining a target text type corresponding to the text keyword;
when a target template containing the target text type exists in a template library, obtaining the target template from the template library, the templates in the template library being text templates provided with text types;
finding the position of the target text type in the target template, and at that position replacing the field information corresponding to the target text type with the field information of the text keyword, to obtain a target text containing the text keyword.
An embodiment of the present application provides a text generation device, the device including:
an obtaining part, configured to obtain a text keyword from a text generation instruction when the text generation instruction is received, and to obtain the target template from the template library when a target template containing the target text type exists in the template library, the templates in the template library being text templates provided with text types;
a determining part, configured to determine the target text type corresponding to the text keyword;
a replacement part, configured to replace, at the position, the field information corresponding to the target text type with the field information of the text keyword, to obtain the target text containing the text keyword.
An embodiment of the present application provides a text generation device, the device including:
a memory, a processor and a communication bus, where the memory communicates with the processor through the communication bus and stores a text generation program executable by the processor; when the text generation program is executed, the processor performs the text generation method described above.
An embodiment of the present application provides a storage medium storing a computer program and applied to a text generation device; when the computer program is executed by a processor, the text generation method described above is implemented.
The embodiments of the present application provide a text generation method and apparatus, and a storage medium. The text generation method includes: when a text generation instruction is received, obtaining a text keyword from the text generation instruction and determining the target text type corresponding to the text keyword; when a target template containing the target text type exists in the template library, obtaining the target template from the template library, the templates in the template library being text templates provided with text types; and finding the position of the target text type in the target template and, at that position, replacing the field information corresponding to the target text type with the field information of the text keyword to obtain a target text containing the text keyword. With this scheme, on receiving a text generation instruction the text generation device obtains the text keyword from the instruction, searches the template library for a target template that includes the target text type corresponding to the text keyword, finds the position of the target text type in the target template, and replaces the field information corresponding to the target text type at that position with the field information of the text keyword. The target text containing the text keyword is thus obtained without manual work, which improves the intelligence of the text generation device when generating text information.
Brief Description of the Drawings
The accompanying drawings are incorporated into and form part of this specification. They show embodiments consistent with the present application and, together with the specification, serve to explain the technical solution of the present application.
Fig. 1 is a flowchart of a text generation method provided by an embodiment of the present application;
Fig. 2 is a schematic structural diagram of an exemplary BERT provided by an embodiment of the present application;
Fig. 3 is a schematic diagram of exemplary supervised training of a BERT model provided by an embodiment of the present application;
Fig. 4 is a flowchart of exemplary training of a BERT model provided by an embodiment of the present application;
Fig. 5 is a flowchart of exemplary text template persistence provided by an embodiment of the present application;
Fig. 6 is a flowchart of an exemplary text generation method provided by an embodiment of the present application;
Fig. 7 is a schematic diagram of the seed stage and the automatic training stage of an exemplary text generation method provided by an embodiment of the present application;
Fig. 8 is a first schematic structural diagram of a text generation device provided by an embodiment of the present application;
Fig. 9 is a second schematic structural diagram of a text generation device provided by an embodiment of the present application.
Detailed Description
To give a fuller understanding of the features and technical content of the embodiments of the present application, their implementation is described in detail below with reference to the accompanying drawings. The drawings are for reference and illustration only and are not intended to limit the embodiments of the present application.
实施例一Embodiment one
本申请实施例提供了一种文本生成方法,图1为本申请实施例提供的一种文本生成方法流程图一,如图1所示,文本生成方法可以包括:The embodiment of the present application provides a text generation method, and Fig. 1 is a flow chart 1 of a text generation method provided in the embodiment of the present application. As shown in Fig. 1, the text generation method may include:
S101. When a text generation instruction is received, acquire text keywords from the text generation instruction, and determine the target text type corresponding to the text keywords.
The text generation method provided by the embodiment of the present application is applicable to scenarios in which a target text is generated from the text keywords carried in a text generation instruction.
In the embodiment of the present application, the text generation device may be implemented in various forms. For example, the text generation device described in the present application may include devices such as a mobile phone, a camera, a tablet computer, a notebook computer, a palmtop computer, a personal digital assistant (PDA), a portable media player (PMP), a navigation device, a wearable device, a smart bracelet, or a pedometer, as well as devices such as a digital TV or a desktop computer.
In the embodiment of the present application, the text generation instruction may be an instruction to generate marketing text, an instruction to generate advertising text, or an instruction to generate other text. The specific text generation instruction may be determined according to the actual situation, which is not limited in the embodiment of the present application.
In the embodiment of the present application, the text generation device may include a display screen and may receive the text generation instruction through the display screen; the text generation device may also receive the text generation instruction from another device, or in some other way. The specific way in which the text generation device receives the text generation instruction may be determined according to the actual situation, which is not limited in the embodiment of the present application.
In the embodiment of the present application, a text keyword may be information used to generate the target text corresponding to the text generation instruction.
In the embodiment of the present application, there may be one, two, or more text keywords. The specific number of text keywords may be determined according to the actual situation, which is not limited in the embodiment of the present application.
Exemplarily, the text keywords include "bank", "coupon", "10 yuan", "January 1 to January 30", "movie viewing", "card binding", and so on.
In the embodiment of the present application, there may be one, two, or more target text types. The specific number of target text types may be determined according to the actual situation, which is not limited in the embodiment of the present application.
Exemplarily, the target text type may be a company name, a product name, an issued item, a numerical amount, an activity time, or an activity description. The specific target text type may be determined according to the actual situation, which is not limited in the embodiment of the present application.
It should be noted that text keywords and target text types may be in one-to-one correspondence, that is, one text keyword corresponds to one target text type; alternatively, two text keywords may correspond to one target text type, or multiple text keywords may correspond to one target text type. The specific correspondence between text keywords and target text types may be determined according to the actual situation, which is not limited in the embodiment of the present application.
In the embodiment of the present application, the process by which the text generation device determines the target text type corresponding to the text keywords includes: when the text generation instruction does not carry the target text type, the text generation device inputs the text keywords into a type recognition model to obtain the target text type; when the text generation instruction carries the target text type, the text generation device obtains the target text type from the text generation instruction.
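By way of non-limiting illustration, the two branches described above can be sketched as follows. The dictionary layout of the instruction (the `keywords` and `types` fields), the function names, and the toy lookup table are assumptions made for this sketch only; the toy classifier merely stands in for a trained type recognition model.

```python
from typing import Callable, Dict, List


def resolve_target_types(instruction: Dict, classify: Callable[[str], str]) -> List[str]:
    """Return one target text type per keyword carried in the instruction."""
    keywords = instruction["keywords"]
    carried = instruction.get("types")  # hypothetical optional field
    if carried is not None:
        # The instruction carries the target text types: use them directly.
        return list(carried)
    # Otherwise query the type recognition model (here: any callable).
    return [classify(keyword) for keyword in keywords]


# Toy stand-in for a trained type recognition model (assumption, not the real model).
TOY_TYPES = {
    "bank": "company name",
    "coupon": "issued item",
    "10 yuan": "numerical amount",
}


def toy_classify(keyword: str) -> str:
    return TOY_TYPES.get(keyword, "activity description")


print(resolve_target_types({"keywords": ["bank", "coupon"]}, toy_classify))
print(resolve_target_types({"keywords": ["bank"], "types": ["company name"]}, toy_classify))
```

In this sketch the classifier is passed in as a callable, so the same function can be exercised with a lookup table during testing and with a trained model in deployment.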
It should be noted that the type recognition model may be a model configured in the text generation device; it may be a model that the text generation device obtains from another device before inputting the text keywords into the type recognition model; or it may be a model that the text generation device obtains in some other way. The specific way in which the text generation device obtains the type recognition model may be determined according to the actual situation, which is not limited in the embodiment of the present application.
In the embodiment of the present application, the type recognition model may be a text classification (FastText) model, or another model capable of determining a text type from text keywords. The specific type recognition model may be determined according to the actual situation, which is not limited in the embodiment of the present application.
In the embodiment of the present application, before inputting the text keywords into the type recognition model to obtain the target text type, the text generation device also acquires second sample keywords and second sample text types, and trains an initial type recognition model with the second sample keywords and the second sample text types to obtain the type recognition model.
In the embodiment of the present application, the second sample keywords may be preset keywords, keywords transmitted to the text generation device by another device, or manually annotated keywords received by the text generation device. The specific way in which the text generation device obtains the second sample keywords may be determined according to the actual situation, which is not limited in the embodiment of the present application.
In the embodiment of the present application, a second sample text type is the text type corresponding to a second sample keyword. The second sample text types may be preset text types, text types transmitted to the text generation device by another device, or manually annotated text types received by the text generation device. The specific way in which the text generation device obtains the second sample text types may be determined according to the actual situation, which is not limited in the embodiment of the present application.
In the embodiment of the present application, the text generation device may acquire the second sample keywords and the second sample text types only once.
Exemplarily, the second sample keywords include "bank", "coupon", "10 yuan", "January 1 to January 30", "movie viewing", "card binding", and so on.
Exemplarily, the second sample text types include: company name, product name, issued item, numerical amount, activity time, activity description, and so on. The specific second sample text types may be determined according to the actual situation, which is not limited in the embodiment of the present application.
S102. When the template library contains a target template that includes the target text type, acquire the target template from the template library; the templates in the template library are text templates in which text types are set.
In the embodiment of the present application, after the text generation device determines the target text type corresponding to the text keywords, if the template library contains a target template that includes the target text type, the text generation device acquires the target template from the template library.
It should be noted that the templates in the template library are text templates in which text types are set.
In the embodiment of the present application, there may be one, two, or more text templates. The specific number of text templates may be determined according to the actual situation, which is not limited in the embodiment of the present application.
In the embodiment of the present application, before acquiring the target template from the template library, the text generation device also acquires a first sample text and inputs it into a keyword recognition model to obtain the first sample keyword corresponding to the first sample text, the first sample type, and the first position of the first sample keyword in the first sample text. The text generation device inputs the first sample keyword into a text generation model to obtain a first output text. It then obtains a text template from the first output text, the first sample text, the first sample keyword, the first sample type, and the first position, and adds the text template to the template library.
In the embodiment of the present application, the text generation device may acquire the first sample text at every preset time period; it may acquire the first sample text upon receiving a sample text acquisition instruction; or it may acquire the first sample text in some other way. The specific way in which the text generation device acquires the first sample text may be determined according to the actual situation, which is not limited in the embodiment of the present application.
It should be noted that the preset time period may be a time period configured in the text generation device; it may be a time period received by the text generation device before it acquires the first sample text; or it may be a time period that the text generation device obtains in some other way. The specific way in which the text generation device obtains the preset time period may be determined according to the actual situation, which is not limited in the embodiment of the present application.
It should also be noted that the preset time period may be one week, one month, or one day. The specific preset time period may be determined according to the actual situation, which is not limited in the embodiment of the present application.
In the embodiment of the present application, the keyword recognition model may be a model configured in the text generation device, a model transmitted by another device and received by the text generation device, or a model that the text generation device obtains in some other way. The specific way in which the text generation device obtains the keyword recognition model may be determined according to the actual situation, which is not limited in the embodiment of the present application.
In the embodiment of the present application, the keyword recognition model may be a model built from a language representation model (Bidirectional Encoder Representations from Transformers, BERT) and a conditional random field (CRF) model; it may also be another model that can derive, from a sample text, the corresponding sample keywords, sample types, and positions of the sample keywords in the sample text. The specific keyword recognition model may be determined according to the actual situation, which is not limited in the embodiment of the present application.
In the embodiment of the present application, the text generation model may be a model configured in the text generation device, a model transmitted by another device and received by the text generation device, or a model that the text generation device obtains in some other way. The specific way in which the text generation device obtains the text generation model may be determined according to the actual situation, which is not limited in the embodiment of the present application.
In the embodiment of the present application, the text generation model may be a Fixed-Keywords BERT model, or another model that can generate output text from text keywords. The specific text generation model may be determined according to the actual situation, which is not limited in the embodiment of the present application.
In the embodiment of the present application, the process by which the text generation device obtains the text template from the first output text, the first sample text, the first sample keyword, the first sample type, and the first position includes: the text generation device uses the keyword recognition model to determine the second position of the first sample keyword in the first output text; at the second position in the first output text, the text generation device replaces the first sample keyword with the first sample type to obtain a first template; at the first position in the first sample text, the text generation device replaces the first sample keyword with the first sample type to obtain a second template; the text generation device takes the first template and the second template as the text templates.
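By way of non-limiting illustration, replacing keywords with their types at known positions can be sketched as follows. The `{type}` placeholder syntax, the function name, and the span format (start, end, type) with an exclusive end index are assumptions made for this sketch; the same function would be applied once to the first output text (with the second positions) and once to the first sample text (with the first positions).

```python
from typing import List, Tuple


def make_template(text: str, spans: List[Tuple[int, int, str]]) -> str:
    """Replace each keyword span (start, end, type_name) with a {type_name} slot.

    `end` is exclusive. Spans are processed from right to left so that
    earlier replacements do not shift the indices of later ones.
    """
    out = text
    for start, end, type_name in sorted(spans, reverse=True):
        out = out[:start] + "{" + type_name + "}" + out[end:]
    return out


# Illustrative sample text with three annotated keywords.
sample_text = "公司发福利啦!50元红包,快来领取喔"
sample_spans = [
    (0, 2, "company name"),       # 公司
    (7, 10, "numerical amount"),  # 50元
    (10, 12, "issued item"),      # 红包
]
template = make_template(sample_text, sample_spans)
print(template)

# Collecting templates in a set keeps a single copy when the first and
# second templates turn out to be identical.
template_library = {template}
```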
In the embodiment of the present application, the text generation device may determine the second position of the first sample keyword in the first output text by inputting the first output text into the keyword recognition model, which then determines the second position of the first sample keyword in the first output text.
In the embodiment of the present application, the first template and the second template may be the same or different. If there are multiple first templates and multiple second templates, some of the first templates and second templates may be the same while others differ. The specifics may be determined according to the actual situation, which is not limited in the embodiment of the present application.
It should be noted that the first position and the second position may be the same or different. The specifics may be determined according to the actual situation, which is not limited in the embodiment of the present application.
In the embodiment of the present application, before inputting the first sample text into the keyword recognition model to obtain the first sample keyword, the first sample type, and the first position of the first sample keyword in the first sample text, the text generation device also acquires a second sample text, the second sample keyword corresponding to the second sample text, the second sample type corresponding to the second sample text, and the third position of the second sample keyword in the second sample text. The text generation device trains an initial keyword recognition model with the second sample keyword, the second sample type, the third position, and the second sample text to obtain the keyword recognition model.
In the embodiment of the present application, the text generation device is configured with a regular expression that combines {marketing word} and {product/company name}. The text generation device can use this regular expression to obtain the second sample text from the full Internet data. The corresponding second sample keyword, second sample type, and third position of the second sample keyword in the second sample text are then annotated manually in the second sample text and transmitted to the text generation device, at which point the text generation device has acquired the second sample keyword, the second sample type, and the third position.
In the embodiment of the present application, the marketing words are words related to financial marketing that are configured in the text generation device. Given here in English translation (the configured entries are the Chinese terms, and the pattern entries are reproduced verbatim), the marketing words include: claim, benefits, discount, red envelope, limited time, special price, free shipping, top-up, coupon, member, voucher, blockbuster, good news, exclusive, exclusive supply, great value, special offer, gift, give-back, redeem, activate, give away, subsidy, 11.11, 12.12, Double Eleven, Double Twelve, prize draw, Double 11, Double 12, thoughtful, save money, percent off, premium goods, free shipping, warm winter, exquisite, waiting for you to claim, flash sale, free, coupon, percent off, give away, gift, free gift, store anniversary, only, discount, redeem, good news, surprise, carnival, shocking, launch, campaign, special price, special, special offer, incoming, "wool" (slang for freebies), price drop, save money, subsidy, instant discount, red envelope, limited time, points, go live, stunning, "gone if you are slow", [低少下让减降].*[息率利费价] (a lowering word followed by an interest, rate, fee, or price word), [息率利费价].*[低少下减降] (the reverse order), 满.*减 (spend a threshold amount, get an amount off), dedicated, unsecured, come and grab, come quickly, hurry, must-have, top-up, rebate, grand opening, latest.
In the embodiment of the present application, the product/company name is the name or abbreviation of a finance-related product or company, denoted by "{product/company name}".
Exemplarily, the regular expression that combines {marketing word} and {product/company name} may be {marketing word}.*{product/company name}; it may also be {product/company name}.*{marketing word}.
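By way of non-limiting illustration, the two orderings above can be combined into one compiled pattern as sketched below. The short word lists, the function names, and the decision to treat every entry as a literal string are assumptions made for this sketch; entries that are themselves patterns (such as 满.*减) would need to be inserted without escaping.

```python
import re

# Small illustrative subsets (assumptions), not the full configured lists.
MARKETING_WORDS = ["红包", "优惠", "福利"]
NAMES = ["银行", "公司"]  # hypothetical product/company names


def build_pattern(marketing_words, names):
    """Match a marketing word and a product/company name in either order."""
    m = "|".join(re.escape(word) for word in marketing_words)
    n = "|".join(re.escape(name) for name in names)
    return re.compile(f"(?:{m}).*(?:{n})|(?:{n}).*(?:{m})")


PATTERN = build_pattern(MARKETING_WORDS, NAMES)


def is_candidate(text: str) -> bool:
    """Return True if the text qualifies as a candidate sample text."""
    return PATTERN.search(text) is not None


print(is_candidate("公司发福利啦!50元红包,快来领取喔"))  # name, then marketing word
print(is_candidate("银行今日正常营业"))  # name only, no marketing word
```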
It should be noted that the first sample text may also be sample text obtained from the full Internet data with this regular expression at every preset time period.
In the embodiment of the present application, if the second sample keyword is "bank", the corresponding second sample type is company name; if the second sample keyword is "coupon", the corresponding second sample type is issued item; if the second sample keyword is "10 yuan", the corresponding second sample type is numerical amount; if the second sample keyword is "January 1 to January 30", the corresponding second sample type is activity time.
Exemplarily, if the second sample text is "公司发福利啦!50元红包,快来领取喔" ("The company is giving out benefits! A 50-yuan red envelope, come and claim it"), then the first second sample keyword is 公司 ("company"), its second sample type is company name, and its third position is (0, 2); the second second sample keyword is 50元 ("50 yuan"), its second sample type is numerical amount, and its third position is (7, 10); the third second sample keyword is 红包 ("red envelope"), its second sample type is issued item, and its third position is (10, 12).
It should be noted that the third position may be the pair of start and end positions at which the second sample keyword appears in the second sample text.
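By way of non-limiting illustration, the position pairs in the example above are consistent with zero-based character indices and an exclusive end index, which the sketch below reproduces. In the embodiment the positions are annotated manually; the function name and the assumption that the keywords are listed in order of appearance are introduced only for this sketch.

```python
from typing import List, Tuple


def keyword_spans(text: str, keywords: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end) character positions of each keyword, end exclusive.

    Keywords are assumed to be listed in their order of appearance.
    """
    spans = []
    cursor = 0
    for keyword in keywords:
        start = text.find(keyword, cursor)
        if start == -1:
            raise ValueError(f"keyword not found: {keyword!r}")
        end = start + len(keyword)
        spans.append((start, end))
        cursor = end  # continue searching after this keyword
    return spans


sample_text = "公司发福利啦!50元红包,快来领取喔"
print(keyword_spans(sample_text, ["公司", "50元", "红包"]))
# → [(0, 2), (7, 10), (10, 12)]
```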
In the embodiment of the present application, after the text generation device determines the target text type corresponding to the text keywords, if the template library does not contain a target template that includes the target text type, the text generation device determines at least two empty positions formed by the text keywords and at least two groups of character counts corresponding to the at least two empty positions. The text generation device splices the at least two empty positions and the keywords according to the at least two groups of character counts to obtain splicing information; inputs the splicing information into the text generation model to obtain at least two groups of target character information corresponding to the at least two empty positions; and adds the at least two groups of target character information at the at least two empty positions in the splicing information to obtain the target text.
It should be noted that the at least two empty positions correspond one-to-one to the at least two groups of character counts, that is, each empty position corresponds to one group of character counts.
It should be noted that when there is one text keyword, there is one empty position to its left and a second empty position to its right. When there are two text keywords, there is one empty position to the left of the first text keyword, a second empty position between the first and second text keywords, and a third empty position to the right of the second text keyword. In general, when there are N text keywords, there is one empty position to the left of the first text keyword, a second empty position between the first and second text keywords, and so on, an N-th empty position between the (N-1)-th and N-th text keywords, and an (N+1)-th empty position to the right of the N-th text keyword. That is, N text keywords yield N+1 empty positions.
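By way of non-limiting illustration, splicing N keywords with their N+1 empty positions can be sketched as follows. The `[MASK]` token name, the function name, and the representation of the splicing information as a list of single-character tokens are assumptions made for this sketch.

```python
from typing import List

MASK = "[MASK]"  # assumed placeholder token for one unfilled character


def splice(keywords: List[str], slot_lengths: List[int]) -> List[str]:
    """Interleave N keywords with N+1 empty positions of the given lengths."""
    if len(slot_lengths) != len(keywords) + 1:
        raise ValueError("N keywords require exactly N+1 slot lengths")
    tokens: List[str] = []
    for i, length in enumerate(slot_lengths):
        tokens.extend([MASK] * length)   # empty position i
        if i < len(keywords):
            tokens.extend(keywords[i])   # keyword i, one token per character
    return tokens


tokens = splice(["银行", "红包"], [3, 3, 3])
print(len(tokens))  # 3 + 2 + 3 + 2 + 3 tokens
print(tokens)
```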
In the embodiment of the present application, the way in which the text generation device inputs the splicing information into the text generation model to obtain the at least two groups of target character information corresponding to the at least two empty positions includes: the text generation device inputs the splicing information into the text generation model and uses the model to obtain, by sampling, the first character of each of the at least two empty positions, that is, at least two groups of first characters; it then inputs the at least two groups of first characters together with the splicing information into the text generation model and uses the model to obtain, by sampling, at least two groups of second characters in the at least two empty positions; this continues until the model has produced, by sampling, every character in the at least two empty positions, that is, the at least two groups of target character information.
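By way of non-limiting illustration, the round-by-round filling described above can be sketched as follows. The `predict` callable is a hypothetical stand-in for one sampling step of the text generation model; here it is replaced by a fixed lookup so that the sketch is deterministic, and the "-" entries in the canned answer are assumed placeholders for unused characters. The `[MASK]` token and all function names are likewise assumptions.

```python
from typing import Callable, List, Tuple

MASK = "[MASK]"  # assumed placeholder token for one unfilled character


def mask_runs(tokens: List[str]) -> List[Tuple[int, int]]:
    """Return (start, length) for each maximal run of MASK tokens."""
    runs, i = [], 0
    while i < len(tokens):
        if tokens[i] == MASK:
            j = i
            while j < len(tokens) and tokens[j] == MASK:
                j += 1
            runs.append((i, j - i))
            i = j
        else:
            i += 1
    return runs


def fill_slots(tokens: List[str], predict: Callable[[List[str], int], str]) -> List[str]:
    """Fill the k-th character of every empty position in round k."""
    tokens = list(tokens)
    runs = mask_runs(tokens)
    rounds = max((length for _, length in runs), default=0)
    for k in range(rounds):
        targets = [start + k for start, length in runs if k < length]
        # All predictions of one round see the same partially filled sequence.
        predicted = [(index, predict(tokens, index)) for index in targets]
        for index, char in predicted:
            tokens[index] = char
    return tokens


# Deterministic stub in place of model sampling (assumption).
CANNED = list("快来领银行---红包啦!-")


def toy_predict(tokens: List[str], index: int) -> str:
    return CANNED[index]


spliced = [MASK] * 3 + list("银行") + [MASK] * 3 + list("红包") + [MASK] * 3
print("".join(fill_slots(spliced, toy_predict)))
```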
It should be noted that the structure of BERT is shown in Fig. 2. BERT can be divided into three parts: a character vector conversion part, an encoding part, and a supervision part. When an input text is received, the character vector conversion part first converts the input text into a character vector sequence (CLS, character 1, character 2, character 3, ..., character N); the encoding part then encodes the character vector sequence; finally, the supervision part determines the text category of the encoded input text. The encoding part is the main body of BERT. Its main function is to encode the N+1 input character vectors so that information is exchanged among all input vectors. The encoding part consists of several layers of encoding blocks. The first encoding block encodes the character vector sequence into a first encoded sequence (E_CLS, E_1, E_2, E_3, ..., E_N), which is the encoded sequence closest to the character vector sequence in Fig. 3. After encoding by the last encoding block, the encoded output sequence (E_CLS, E_1, E_2, E_3, ..., E_N) is obtained, which is the encoded sequence closest to the supervision part in Fig. 3. Finally, the supervision part includes the labels, corresponding to the input text, that are required for supervised training of BERT. Fig. 3 shows multi-class classification of the input text. In this case, it is only necessary to take the encoded output sequence produced by the encoding part and map it to the target category (text category) to begin supervised training. The supervision part can be adjusted according to the task objective, for example named entity recognition or question answering.
在本申请实施例中,若文本生成模型为Fixed-Keywords BERT模型,文本生成模型的作用为根据给定的模板关键词,生成出包含这些模板关键词的营销文案(即目标文本)。In the embodiment of the present application, if the text generation model is the Fixed-Keywords BERT model, the function of the text generation model is to generate marketing copy (that is, target text) containing these template keywords according to the given template keywords.
In the embodiments of the present application, Figure 3 shows the supervised training process in which "银行" ("bank") and "红包" ("red envelope") are input as sample keywords to generate the marketing copy (output text) "快来领银行红包啦!" ("Come and get your bank red envelope!"). First, "银行", "红包", and the mask parts are converted into character vectors to obtain a character-vector sequence. The first encoding block in the encoding part encodes the character-vector sequence to obtain the first encoded sequence (E_CLS, E_M, E_M, E_M, E_银, E_行, E_M, E_M, E_M, E_红, E_包, E_M, E_M, E_M), and encoding continues until the last encoding block produces the encoded output sequence. The supervision part then supervises the encoded output sequence to obtain the prediction result ("快来领", "---", "啦!-").
Specifically, because the input consists of only the two words "银行" and "红包", it cannot form a complete piece of marketing copy. However, characters that make up the copy may appear in any of the gaps defined by these two words in order (to the left of "银行", between "银行" and "红包", and to the right of "红包"). The maximum number of characters that may appear in a gap, L_M, must therefore be set first. There are two ways to set L_M:
The first is to determine, from the sample text and the sample keywords, the number of characters in each gap that the sample keywords define in the sample text. For example, consider the marketing copy "快来领银行红包啦!" and the sample keywords it contains:
Sample type: <company name>, sample keyword: 银行, keyword position: (3, 5)
Sample type: <distributed item>, sample keyword: 红包, keyword position: (5, 7)
From these positions, the numbers of characters in the three gaps are 3, 0, and 2, respectively.
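The gap counts above can be reproduced with a few lines of Python. This is only a sketch: the function name `gap_sizes` is hypothetical, and it assumes that keyword positions are zero-based, half-open character spans listed from left to right, which is the reading that yields 3, 0, and 2 for this example.

```python
def gap_sizes(text_length, keyword_spans):
    """Count the characters in each gap around the keywords.

    keyword_spans: (start, end) spans, assumed zero-based, half-open,
    and sorted from left to right.
    """
    sizes = []
    cursor = 0
    for start, end in keyword_spans:
        sizes.append(start - cursor)   # characters before this keyword
        cursor = end
    sizes.append(text_length - cursor)  # characters after the last keyword
    return sizes


sample_text = "快来领银行红包啦!"
spans = [(3, 5), (5, 7)]                # "银行" and "红包"
print(gap_sizes(len(sample_text), spans))  # [3, 0, 2]
```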
The second is to take the maximum of the character counts over all gaps as L_M. Then L_M mask vectors (denoted M) are inserted into each gap defined by the sample keywords. For example, for the input "银行" and "红包", L_M mask vectors can be inserted before "银行", between "银行" and "红包", and after "红包". As shown in Figure 3, assuming L_M is 3, the character-vector conversion part shows the final result including the masks.
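As an illustration of the mask insertion just described, the following sketch works on symbolic tokens rather than vectors. The names `MASK` and `build_masked_sequence` are assumptions made for this example and do not come from the application.

```python
MASK = "M"  # symbolic stand-in for one mask vector


def build_masked_sequence(keywords, max_gap):
    """Insert max_gap mask tokens before, between, and after the keywords."""
    sequence = [MASK] * max_gap
    for keyword in keywords:
        sequence.extend(keyword)           # one token per character
        sequence.extend([MASK] * max_gap)
    return sequence


tokens = build_masked_sequence(["银行", "红包"], max_gap=3)
print(tokens)
# ['M', 'M', 'M', '银', '行', 'M', 'M', 'M', '红', '包', 'M', 'M', 'M']
```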
It can be understood that, when the maximum value is used as the number of characters for every gap, there are enough positions (enough mask parts) to predict the target characters in the gaps, which improves the accuracy of predicting the target character information.
In the embodiments of the present application, the supervision part supervises the characters at the mask parts. If the number of characters at the corresponding position in the sample copy is less than L_M, supervision starts from the leftmost mask position, and the supervision target for the remaining positions is "-" (shown as "---" in Figure 3), which indicates that no character exists there. Once the predicted target characters for the mask parts have been obtained, the final marketing copy is obtained by removing the "-" symbols from the predicted target characters and combining the result with the sample keywords.
Exemplarily, the training process of the text generation model is shown in Figure 4:
S41. The text generation apparatus acquires a second sample text and the second sample keywords corresponding to the second sample text.
S42. The text generation apparatus constructs a character-vector sequence from the second sample keywords.
Exemplarily, consider the marketing copy (the second sample text) "快来领银行红包啦!" and the second sample keywords it contains:
Second sample type: <company name>, second sample keyword: 银行, third position: (3, 5)
Second sample type: <distributed item>, second sample keyword: 红包, third position: (5, 7)
First, the second sample keywords are converted into character vectors from left to right in the order of their positions in the marketing copy, with each character converted directly into a 200-dimensional vector. Concatenating all character vectors of the two words yields a sequence of four 200-dimensional character vectors. Then all gaps defined by the two second sample keywords are filled with mask vectors.
Exemplarily, if the number of characters L_M in each mask part is 3, three 200-dimensional mask vectors, each initialized to the all-zero vector, are inserted into every gap. After insertion, the result is a sequence of 200-dimensional vectors whose length is 3 (mask vectors) + 2 ("银行" vectors) + 3 (mask vectors) + 2 ("红包" vectors) + 3 (mask vectors) = 13. This completes the character-vector conversion part of the Fixed-Keywords BERT model; that is, the character-vector sequence has been obtained.
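A minimal sketch of this construction is given below. The 200-dimensional size and the all-zero mask vectors follow the text, while `toy_lookup` is a purely hypothetical stand-in for whatever character-embedding table the model actually uses.

```python
EMBED_DIM = 200  # dimension stated in the text


def toy_lookup(ch):
    # Hypothetical placeholder for a learned character-embedding table.
    return [float(ord(ch) % 7)] * EMBED_DIM


def embed_with_masks(keywords, max_gap, lookup):
    """Build the vector sequence: zero mask vectors in every gap, looked-up vectors for keyword characters."""
    def masks():
        return [[0.0] * EMBED_DIM for _ in range(max_gap)]

    sequence = masks()
    for keyword in keywords:
        sequence.extend(lookup(ch) for ch in keyword)
        sequence.extend(masks())
    return sequence


vectors = embed_with_masks(["银行", "红包"], max_gap=3, lookup=toy_lookup)
print(len(vectors), len(vectors[0]))  # 13 200
```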
S43. The text generation apparatus constructs training labels from the second sample text.
In the embodiments of the present application, the training labels represent the result expected after the data is input into the model, namely the real marketing copy. Because a mask is inserted into every gap when the character-vector sequence is constructed, the training labels must correspond to the character-vector sequence at every character position.
Exemplarily, if the constructed vector sequence is [M, M, M, 银, 行, M, M, M, 红, 包, M, M, M], the corresponding training labels are [快, 来, 领, 银, 行, -, -, -, 红, 包, 啦, !, -], where "-" indicates that there is no character at that position.
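The label alignment can be sketched as follows. The helper name `build_labels` is hypothetical, keyword positions are assumed to be zero-based half-open spans, and every gap is assumed to hold at most L_M characters; each gap is filled from the left and padded on the right with "-".

```python
PAD = "-"  # marks a mask position that holds no character


def build_labels(text, keyword_spans, max_gap):
    """Align the sample text with the masked input, padding short gaps with PAD."""
    labels = []
    cursor = 0
    for start, end in keyword_spans:
        labels.extend(text[cursor:start].ljust(max_gap, PAD))  # gap before keyword
        labels.extend(text[start:end])                         # keyword characters
        cursor = end
    labels.extend(text[cursor:].ljust(max_gap, PAD))           # gap after last keyword
    return labels


labels = build_labels("快来领银行红包啦!", [(3, 5), (5, 7)], max_gap=3)
print(labels)
# ['快', '来', '领', '银', '行', '-', '-', '-', '红', '包', '啦', '!', '-']
```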
S44. The text generation apparatus inputs the character-vector sequence into the encoding part of the initial text generation model to obtain an encoded output sequence.
S45. The text generation apparatus trains the initial text generation model according to the encoded output sequence and the training labels to obtain the text generation model.
It should be noted that, after obtaining the encoded output sequence, the text generation apparatus maps each vector in the encoded output sequence (except the CLS vector) onto the character vocabulary (which includes "-").
Exemplarily, inputting the character-vector sequence [M, M, M, 银, 行, M, M, M, 红, 包, M, M, M] into the encoding part of the initial text generation model yields an encoded output sequence of thirteen 200-dimensional vectors. Each vector is multiplied by a trainable matrix of shape 200 × (vocabulary size + 1), where the extra 1 accounts for "-", which maps the vector onto the target vocabulary (including "-"). The cross-entropy between the mapped vector sequence and the training labels [快, 来, 领, 银, 行, -, -, -, 红, 包, 啦, !, -] can then be computed, and the parameters of the initial text generation model can be fine-tuned by gradient descent or a similar method. When the initial text generation model converges (that is, its parameters are no longer updated) or the maximum number of training steps is reached, the initial text generation model is considered trained, and the text generation model is thereby obtained.
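The projection and loss described above can be illustrated in plain Python. This is a sketch under stated assumptions: the function names are hypothetical, the toy dimensions are far smaller than 200, and averaging the per-position losses is an assumption, since the text only says that a cross-entropy is computed.

```python
import math


def project(vector, weight):
    """Multiply a d-dimensional vector by a d x V matrix to get V logits."""
    columns = len(weight[0])
    return [sum(v * row[j] for v, row in zip(vector, weight)) for j in range(columns)]


def cross_entropy(logits, target_index):
    """Softmax cross-entropy for one position, computed stably."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[target_index]


def sequence_loss(encoded, weight, target_indices):
    """Mean cross-entropy over all positions (the averaging is an assumption)."""
    losses = [
        cross_entropy(project(vec, weight), target)
        for vec, target in zip(encoded, target_indices)
    ]
    return sum(losses) / len(losses)


# Toy check: with an all-zero matrix every character is equally likely,
# so the loss equals log(V) for a vocabulary of V entries.
toy_weight = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]   # d = 2, V = 3
toy_loss = sequence_loss([[1.0, 2.0], [0.5, 0.5]], toy_weight, [0, 2])
```

In a real training loop the matrix and the encoder parameters would be updated by gradient descent on this loss; that step is omitted here.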
In the embodiments of the present application, after training, the Fixed-Keywords BERT model acquires the following capability: given the two words "银行" and "红包" as input, it outputs "快来领" to the left of "银行", "---" between "银行" and "红包", and "啦!-" to the right of "红包". Here "-" indicates that there is no character at that position. Removing the "-" symbols from the mask parts and concatenating the result with "银行" and "红包" in order gives the complete marketing copy "快来领银行红包啦!". Because "银行" and "红包" are sample keywords that each have a corresponding sample type, the sample keywords in the generated marketing copy can further be replaced with their sample types: replacing "银行" with "<company name>" and "红包" with "<distributed item>" gives the template "快来领<company name><distributed item>啦!". This completes persistence of the template. Specifically, the flow for persisting a text template is shown in Figure 5:
S51. The text generation apparatus acquires first sample keywords.
In the embodiments of the present application, when acquiring a first sample keyword, the text generation apparatus also acquires the first sample type corresponding to that keyword. Specifically, the text generation apparatus may input a first sample text into the keyword recognition model to obtain the first sample keywords and first sample types corresponding to the first sample text.
In the embodiments of the present application, the text generation apparatus also requires a first sample keyword sequence. Exemplarily, the input first sample keywords may take the following form:
First sample type: <company name>, first sample keyword: 银行 ("bank");
First sample type: <distributed item>, first sample keyword: 免息券 ("interest-free coupon")
It should be noted that the input first sample keyword sequence is order-sensitive: the order of the keywords in the input sequence is the same as the order in which they appear in the generated marketing copy. The first sample types are acquired so that a text template can be generated later.
S52. The text generation apparatus inputs the first sample keywords into the text generation model to obtain a first output text.
In the embodiments of the present application, L_M is the maximum number of characters that may appear in each gap defined, in order, by the two words "银行" and "免息券" (to the left of "银行", between "银行" and "免息券", and to the right of "免息券"). If L_M is 3, the character-vector sequence that can be constructed is [M, M, M, 银, 行, M, M, M, 免, 息, 券, M, M, M]. Inputting this sequence into the encoding part of the Fixed-Keywords BERT model yields the encoded output sequence produced by the last encoding layer (the last encoding block). Each vector in the encoded output sequence (except E_CLS and the positions occupied by the first sample keywords) is mapped onto the vocabulary (including "-"), and the character with the highest probability after mapping is selected as the prediction for the current position.
Exemplarily, for the character-vector sequence [M, M, M, 银, 行, M, M, M, 免, 息, 券, M, M, M], the nine vectors at positions (0, 3), (5, 8), and (11, 14) are each mapped onto the whole vocabulary (including "-"). After mapping, each character position has a vector of values (probabilities) giving the likelihood of each character in the vocabulary (including "-"), and the character with the highest probability is taken as the prediction for that position. Once characters have been predicted for all nine positions, they are combined with the first sample keywords to obtain [-, -, -, 银, 行, 大, 额, -, 免, 息, 券, 享, 不, 停].
It should be noted that removing the "-" symbols, which indicate that no character exists, gives the predicted marketing copy (the first output text): "银行大额免息券享不停" (roughly, "Enjoy large-amount interest-free bank coupons non-stop").
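The decoding step can be sketched as an argmax over each probability vector followed by removal of the "-" placeholders. The helper names and the three-entry toy vocabulary below are illustrative assumptions only.

```python
PAD = "-"  # predicted when a mask position holds no character


def argmax_char(probabilities, vocab):
    """Pick the vocabulary entry with the highest probability."""
    best = max(range(len(vocab)), key=lambda i: probabilities[i])
    return vocab[best]


def assemble(predicted_sequence):
    """Drop PAD placeholders and join the remaining characters."""
    return "".join(ch for ch in predicted_sequence if ch != PAD)


# One mask position with a toy three-entry vocabulary.
print(argmax_char([0.1, 0.7, 0.2], ["-", "大", "额"]))  # 大

predicted = ["-", "-", "-", "银", "行", "大", "额", "-",
             "免", "息", "券", "享", "不", "停"]
print(assemble(predicted))  # 银行大额免息券享不停
```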
S53. The text generation apparatus uses the keyword recognition model to determine the second position of each first sample keyword in the first output text.
S54. At the second position in the first output text, the text generation apparatus replaces the first sample keyword with the first sample type to obtain a first template, and uses the first template as a text template.
In the embodiments of the present application, the first sample keywords in the predicted marketing copy "银行大额免息券享不停" are replaced with the corresponding first sample types: "银行" is replaced with the first sample type "<company name>", and "免息券" is replaced with the first sample type "<distributed item>". This gives the final first template "<company name>大额<distributed item>享不停". Storing the first template completes persistence of the marketing copy template.
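The replacement in S54 can be sketched as below. The function name `to_template` is hypothetical, and the spans (0, 2) and (4, 7) are assumed zero-based half-open positions of "银行" and "免息券" in the output text; in the described method these positions would come from the keyword recognition model.

```python
def to_template(text, typed_spans):
    """Replace each keyword span with its type label, working left to right.

    typed_spans: ((start, end), type_label) pairs sorted by start position.
    """
    parts = []
    cursor = 0
    for (start, end), type_label in typed_spans:
        parts.append(text[cursor:start])  # text between keywords is kept
        parts.append(type_label)          # keyword becomes its type label
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts)


first_output = "银行大额免息券享不停"
template = to_template(
    first_output,
    [((0, 2), "<company name>"), ((4, 7), "<distributed item>")],
)
print(template)  # <company name>大额<distributed item>享不停
```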
S103. Find the position of the target text type in the target template and, at that position, replace the field information corresponding to the target text type with the field information of the text keyword, to obtain a target text containing the text keyword.
In the embodiments of the present application, after finding a target template containing the target text type in the template library, the text generation apparatus can find the position of the target text type in the target template and, at that position, replace the field information corresponding to the target text type with the field information of the text keyword, to obtain a target text containing the text keyword.
It should be noted that the target text is the text corresponding to the text generation instruction.
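Step S103 amounts to locating a type placeholder in the template and substituting the keyword at that position. The sketch below assumes that placeholders are stored literally in the template string; the name `fill_template` is hypothetical.

```python
def fill_template(template, type_label, keyword):
    """Find the type label in the template and put the keyword in its place."""
    position = template.find(type_label)
    if position < 0:
        raise ValueError(f"template has no slot for {type_label}")
    return template[:position] + keyword + template[position + len(type_label):]


text = "<company name>大额<distributed item>享不停"
text = fill_template(text, "<company name>", "银行")
text = fill_template(text, "<distributed item>", "免息券")
print(text)  # 银行大额免息券享不停
```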
Exemplarily, a schematic diagram of a text generation method is shown in Figure 6:
S61. When a text generation instruction is received, the text generation apparatus acquires text keywords from the text generation instruction.
S62. When the text generation instruction does not carry a target text type, the text generation apparatus inputs the text keywords into the type recognition model to obtain the target text type.
S63. When the text generation instruction carries the target text type, the text generation apparatus acquires the target text type from the text generation instruction.
S64. When a target template containing the target text type exists in the template library, the text generation apparatus acquires the target template from the template library.
S65. The text generation apparatus finds the position of the target text type in the target template and, at that position, replaces the field information corresponding to the target text type with the field information of the text keyword, to obtain a target text containing the text keyword.
S66. When the template library does not contain a target template with the target text type, the text generation apparatus determines at least two empty positions formed according to the text keywords and at least two groups of character counts corresponding to the at least two empty positions.
S67. The text generation apparatus concatenates the at least two empty positions and the keywords according to the at least two groups of character counts to obtain concatenated information.
S68. The text generation apparatus inputs the concatenated information into the text generation model to obtain at least two groups of target character information corresponding to the at least two empty positions.
S69. The text generation apparatus adds the at least two groups of target character information at the at least two empty positions in the concatenated information to obtain the target text.
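Steps S61 to S69 can be summarized as a small dispatch routine. Everything named here is an assumption made for illustration: the instruction is modeled as a dictionary, `classify_type` stands in for the type recognition model, `generate_with_model` stands in for the mask-filling path of S66 to S69, and a template is treated as matching only if it contains every target type.

```python
def generate_text(instruction, template_library, classify_type, generate_with_model):
    """Sketch of S61-S69: use a stored template if one matches, else fall back to the model."""
    keywords = instruction["keywords"]                      # S61
    types = instruction.get("types")                        # S63
    if types is None:
        types = [classify_type(k) for k in keywords]        # S62
    for template in template_library:                       # S64
        if all(t in template for t in types):
            text = template
            for type_label, keyword in zip(types, keywords):  # S65
                text = text.replace(type_label, keyword, 1)
            return text
    return generate_with_model(keywords)                    # S66-S69


library = ["<company name>大额<distributed item>享不停"]
request = {
    "keywords": ["银行", "免息券"],
    "types": ["<company name>", "<distributed item>"],
}
print(generate_text(request, library, None, None))  # 银行大额免息券享不停
```

When no stored template contains the requested types, the call falls through to `generate_with_model`, which in the described method would build the masked sequence and let the text generation model fill the gaps.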
Exemplarily, a text generation method includes a seed stage and an automatic training stage, as shown in Figure 7. In the seed stage, a second sample text is first acquired and manually annotated to obtain the second sample keywords corresponding to the second sample text, the second sample types corresponding to the second sample text, and the third positions of the second sample keywords in the second sample text. The second sample keywords, second sample types, third positions, and second sample text are used to train an initial keyword recognition model to obtain the keyword recognition model (training the keyword recognition model). The second sample keywords and the second sample text types are used to train an initial type recognition model to obtain the type recognition model (training the type recognition model).
In the automatic training stage, a first sample text is acquired and input into the keyword recognition model to obtain the first sample keywords, the first sample types, and the first positions of the first sample keywords in the first sample text (annotating the first sample text with the keyword recognition model). The first sample keywords are input into the text generation model to obtain a first output text. The keyword recognition model is used to determine the second positions of the first sample keywords in the first output text. At the second positions in the first output text, the first sample keywords are replaced with the first sample types to obtain a first template; at the first positions in the first sample text, the first sample keywords are replaced with the first sample types to obtain a second template. The first template and the second template are used as text templates (obtaining the text templates) and are added to the template library so that, when a text generation instruction is received, a target text containing the text keyword can be obtained from the text keyword in the instruction and a target template in the template library.
It can be understood that, when the text generation apparatus receives a text generation instruction, it acquires the text keyword from the instruction, searches the template library for a target template that includes the target text type corresponding to the text keyword, and finds the position of the target text type in the target template, so that the field information corresponding to the target text type can be replaced at that position with the field information of the text keyword. A target text containing the text keyword is thereby obtained without manual effort, which makes the text generation apparatus more intelligent when generating text information.
Embodiment 2
Based on the same inventive concept as Embodiment 1, an embodiment of the present application provides a text generation apparatus 1 corresponding to the text generation method. Figure 8 is a first schematic structural diagram of a text generation apparatus provided by an embodiment of the present application. The text generation apparatus 1 may include:
an acquisition part 11, configured to acquire a text keyword from a text generation instruction when the text generation instruction is received, and to acquire the target template from the template library when a target template containing the target text type exists in the template library, where the templates in the template library are text templates provided with text types;
a determination part 12, configured to determine the target text type corresponding to the text keyword;
a replacement part 13, configured to replace, at the position, the field information corresponding to the target text type with the field information of the text keyword, to obtain a target text containing the text keyword.
In some embodiments of the present application, the apparatus further includes an input part and an adding part;
the acquisition part 11 is configured to acquire a first sample text;
the input part is configured to input the first sample text into the keyword recognition model to obtain the first sample keyword corresponding to the first sample text, the first sample type, and the first position of the first sample keyword in the first sample text; to input the first sample keyword into the text generation model to obtain a first output text; and to obtain the text template according to the first output text, the first sample text, the first sample keyword, the first sample type, and the first position;
the adding part is configured to add the text template to the template library.
In some embodiments of the present application, the determination part 12 is configured to determine, using the keyword recognition model, the second position of the first sample keyword in the first output text;
the replacement part 13 is configured to replace, at the second position in the first output text, the first sample keyword with the first sample type to obtain a first template; to replace, at the first position in the first sample text, the first sample keyword with the first sample type to obtain a second template; and to use the first template and the second template as the text template.
In some embodiments of the present application, the apparatus further includes a training part;
the acquisition part 11 is configured to acquire a second sample text, the second sample keyword corresponding to the second sample text, the second sample type corresponding to the second sample text, and the third position of the second sample keyword in the second sample text;
the training part is configured to train an initial keyword recognition model using the second sample keyword, the second sample type, the third position, and the second sample text, to obtain the keyword recognition model.
In some embodiments of the present application, the apparatus further includes a concatenation part;
the determination part 12 is configured to determine, when the template library does not contain the target template with the target text type, at least two empty positions formed according to the text keywords and at least two groups of character counts corresponding to the at least two empty positions, where the at least two empty positions correspond one-to-one to the at least two groups of character counts;
the concatenation part is configured to concatenate the at least two empty positions and the keywords according to the at least two groups of character counts to obtain concatenated information;
the input part is configured to input the concatenated information into the text generation model to obtain at least two groups of target character information corresponding to the at least two empty positions;
the adding part is configured to add the at least two groups of target character information at the at least two empty positions in the concatenated information to obtain the target text.
In some embodiments of the present application, the input part is configured to input the text keyword into the type recognition model to obtain the target text type when the text generation instruction does not carry the target text type;
the acquisition part 11 is configured to acquire the target text type from the text generation instruction when the text generation instruction carries the target text type.
In some embodiments of the present application, the acquisition part 11 is configured to acquire a second sample keyword and a second sample text type;
the training part is configured to train an initial type recognition model using the second sample keyword and the second sample text type, to obtain the type recognition model.
It should be noted that, in practical applications, the acquisition part 11, the determination part 12, and the replacement part 13 may be implemented by the processor 14 of the text generation apparatus 1, specifically by a CPU (Central Processing Unit), an MPU (Microprocessor Unit), a DSP (Digital Signal Processor), a field-programmable gate array (FPGA), or the like; the data storage described above may be implemented by the memory 15 of the text generation apparatus 1.
An embodiment of the present application further provides a text generation apparatus 1. As shown in Figure 9, the text generation apparatus 1 includes a processor 14, a memory 15, and a communication bus 16. The memory 15 communicates with the processor 14 through the communication bus 16 and stores a program executable by the processor 14; when the program is executed, the processor 14 performs the text generation method described above.
In practical applications, the memory 15 may be a volatile memory, such as a random-access memory (RAM); a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or a combination of the above kinds of memory. The memory 15 provides instructions and data to the processor 14.
An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the program is executed by the processor 14, the text generation method described above is implemented.
It can be understood that, when the text generation apparatus receives a text generation instruction, it acquires the text keyword from the instruction, searches the template library for a target template that includes the target text type corresponding to the text keyword, and finds the position of the target text type in the target template, so that the field information corresponding to the target text type can be replaced at that position with the field information of the text keyword. A target text containing the text keyword is thereby obtained without manual effort, which makes the text generation apparatus more intelligent when generating text information.
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage and optical storage) that contain computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above descriptions are merely preferred embodiments of the present application and are not intended to limit the protection scope of the present application.
Industrial Applicability
Embodiments of the present application provide a text generation method and apparatus, and a storage medium. The text generation method includes: upon receiving a text generation instruction, obtaining a text keyword from the text generation instruction, and determining a target text type corresponding to the text keyword; in a case where a target template containing the target text type exists in a template library, obtaining the target template from the template library, the templates in the template library being text templates provided with text types; and locating the position of the target text type in the target template, and replacing, at that position, the field information corresponding to the target text type with the field information of the text keyword, to obtain target text containing the text keyword. With this solution, the text generation apparatus, upon receiving a text generation instruction, obtains the text keyword from the instruction, searches the template library for a target template containing the target text type corresponding to the text keyword, and locates the position of the target text type in the target template, so that the field information corresponding to the target text type is replaced at that position with the field information of the text keyword. Target text containing the text keyword is thereby obtained without manual effort, which improves the intelligence of the text generation apparatus when generating text information.

Claims (10)

  1. A text generation method, the method comprising:
    upon receiving a text generation instruction, obtaining a text keyword from the text generation instruction, and determining a target text type corresponding to the text keyword;
    in a case where a target template containing the target text type exists in a template library, obtaining the target template from the template library, wherein the templates in the template library are text templates provided with text types;
    locating a position of the target text type in the target template, and replacing, at the position, field information corresponding to the target text type with field information of the text keyword, to obtain target text containing the text keyword.
  2. The method according to claim 1, wherein, before the obtaining the target template from the template library, the method further comprises:
    obtaining a first sample text, and inputting the first sample text into a keyword recognition model to obtain a first sample keyword corresponding to the first sample text, a first sample type, and a first position of the first sample keyword in the first sample text;
    inputting the first sample keyword into a text generation model to obtain a first output text;
    obtaining the text template according to the first output text, the first sample text, the first sample keyword, the first sample type, and the first position, and adding the text template to the template library.
  3. The method according to claim 2, wherein the obtaining the text template according to the first output text, the first sample text, the first sample keyword, the first sample type, and the first position comprises:
    determining, by using the keyword recognition model, a second position of the first sample keyword in the first output text;
    replacing, at the second position in the first output text, the first sample keyword with the first sample type, to obtain a first template;
    replacing, at the first position in the first sample text, the first sample keyword with the first sample type, to obtain a second template;
    using the first template and the second template as the text template.
  4. The method according to claim 2, wherein, before the inputting the first sample text into the keyword recognition model to obtain the first sample keyword corresponding to the first sample text, the first sample type, and the first position of the first sample keyword in the first sample text, the method further comprises:
    obtaining a second sample text, a second sample keyword corresponding to the second sample text, a second sample type corresponding to the second sample text, and a third position of the second sample keyword in the second sample text;
    training an initial keyword recognition model by using the second sample keyword, the second sample type, the third position, and the second sample text, to obtain the keyword recognition model.
  5. The method according to claim 1, wherein, after the determining the target text type corresponding to the text keyword, the method further comprises:
    in a case where the template library does not contain the target template of the target text type, determining at least two blank positions formed according to the text keyword and at least two groups of character quantities corresponding to the at least two blank positions, wherein the at least two blank positions are in one-to-one correspondence with the at least two groups of character quantities;
    splicing the at least two blank positions and the keyword according to the at least two groups of character quantities, to obtain spliced information;
    inputting the spliced information into a text generation model, to obtain at least two groups of target character information corresponding to the at least two blank positions;
    adding the at least two groups of target character information at the at least two blank positions in the spliced information, to obtain the target text.
  6. The method according to claim 1, wherein the determining the target text type corresponding to the text keyword comprises:
    in a case where the text generation instruction does not carry the target text type, inputting the text keyword into a type recognition model to obtain the target text type;
    in a case where the text generation instruction carries the target text type, obtaining the target text type from the text generation instruction.
  7. The method according to claim 6, wherein, before the inputting the text keyword into the type recognition model to obtain the target text type, the method further comprises:
    obtaining a second sample keyword and a second sample text type;
    training an initial type recognition model by using the second sample keyword and the second sample text type, to obtain the type recognition model.
  8. A text generation apparatus, the apparatus comprising:
    an obtaining part, configured to: upon receiving a text generation instruction, obtain a text keyword from the text generation instruction; and, in a case where a target template containing the target text type exists in a template library, obtain the target template from the template library, wherein the templates in the template library are text templates provided with text types;
    a determining part, configured to determine the target text type corresponding to the text keyword;
    a replacing part, configured to replace, at the position, field information corresponding to the target text type with field information of the text keyword, to obtain target text containing the text keyword.
  9. A text generation apparatus, the apparatus comprising:
    a memory, a processor, and a communication bus, wherein the memory communicates with the processor through the communication bus and stores a text generation program executable by the processor, and when the text generation program is executed, the processor performs the method according to any one of claims 1 to 7.
  10. A storage medium on which a computer program is stored, applied to a text generation apparatus, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 7.
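
The template construction recited in claims 2 and 3 can be illustrated with the following minimal Python sketch. It is offered only as an aid to understanding and is not part of the claims. The functions `toy_keyword_recognizer` and `toy_text_generator` are hypothetical stand-ins for the keyword recognition model and the text generation model, and the bracketed placeholder syntax (for example `[CITY]`) is likewise an assumption.

```python
from typing import Callable, List, Tuple

# (keyword, type, position) triple, as a keyword recognition model might return.
Recognition = Tuple[str, str, int]


def toy_keyword_recognizer(text: str) -> List[Recognition]:
    """Hypothetical stand-in for the keyword recognition model (lexicon lookup)."""
    lexicon = {"Shenzhen": "CITY"}
    found = []
    for keyword, sample_type in lexicon.items():
        position = text.find(keyword)
        if position != -1:
            found.append((keyword, sample_type, position))
    return found


def toy_text_generator(keyword: str) -> str:
    """Hypothetical stand-in for the text generation model."""
    return f"Many visitors travel to {keyword} every year."


def to_template(text: str, keyword: str, sample_type: str, position: int) -> str:
    """Replace the keyword at the given position with its type placeholder."""
    return text[:position] + f"[{sample_type}]" + text[position + len(keyword):]


def build_templates(
    sample_text: str,
    recognize: Callable[[str], List[Recognition]],
    generate: Callable[[str], str],
) -> List[str]:
    """Derive a first template (from generated text) and a second (from the sample)."""
    templates = []
    for keyword, sample_type, first_position in recognize(sample_text):
        output_text = generate(keyword)
        # Locate the keyword in the generated text (the "second position").
        for found_keyword, _, second_position in recognize(output_text):
            if found_keyword == keyword:
                templates.append(
                    to_template(output_text, keyword, sample_type, second_position)
                )
        # Replace the keyword in the sample text at the "first position".
        templates.append(to_template(sample_text, keyword, sample_type, first_position))
    return templates
```

Calling `build_templates` on a sample sentence with the two toy stand-ins yields one template from the generated text and one from the sample text, both of which could then be added to a template library.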
PCT/CN2022/100545 2021-11-01 2022-06-22 Text generation method and apparatus, and storage medium WO2023071242A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111284961.6A CN114118041A (en) 2021-11-01 2021-11-01 Text generation method and device and storage medium
CN202111284961.6 2021-11-01

Publications (1)

Publication Number Publication Date
WO2023071242A1 true WO2023071242A1 (en) 2023-05-04

Family

ID=80379767

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/100545 WO2023071242A1 (en) 2021-11-01 2022-06-22 Text generation method and apparatus, and storage medium

Country Status (2)

Country Link
CN (1) CN114118041A (en)
WO (1) WO2023071242A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114118041A (en) * 2021-11-01 2022-03-01 深圳前海微众银行股份有限公司 Text generation method and device and storage medium
CN114997131A (en) * 2022-05-19 2022-09-02 北京沃东天骏信息技术有限公司 File generation method, model training device, file generation device, file training equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7752033B2 (en) * 2002-03-18 2010-07-06 National Institute Of Information And Communications Technology, Independent Administrative Institution Text generation method and text generation device
CN111930976A (en) * 2020-07-16 2020-11-13 平安科技(深圳)有限公司 Presentation generation method, device, equipment and storage medium
CN112597312A (en) * 2020-12-28 2021-04-02 深圳壹账通智能科技有限公司 Text classification method and device, electronic equipment and readable storage medium
CN113076756A (en) * 2020-01-06 2021-07-06 北京沃东天骏信息技术有限公司 Text generation method and device
CN114118041A (en) * 2021-11-01 2022-03-01 深圳前海微众银行股份有限公司 Text generation method and device and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7752033B2 (en) * 2002-03-18 2010-07-06 National Institute Of Information And Communications Technology, Independent Administrative Institution Text generation method and text generation device
CN113076756A (en) * 2020-01-06 2021-07-06 北京沃东天骏信息技术有限公司 Text generation method and device
CN111930976A (en) * 2020-07-16 2020-11-13 平安科技(深圳)有限公司 Presentation generation method, device, equipment and storage medium
CN112597312A (en) * 2020-12-28 2021-04-02 深圳壹账通智能科技有限公司 Text classification method and device, electronic equipment and readable storage medium
CN114118041A (en) * 2021-11-01 2022-03-01 深圳前海微众银行股份有限公司 Text generation method and device and storage medium

Also Published As

Publication number Publication date
CN114118041A (en) 2022-03-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22885148

Country of ref document: EP

Kind code of ref document: A1