WO2023071242A1 - Text generation method and apparatus, and storage medium - Google Patents

Text generation method and apparatus, and storage medium

Info

Publication number
WO2023071242A1
WO2023071242A1 PCT/CN2022/100545 CN2022100545W WO2023071242A1 WO 2023071242 A1 WO2023071242 A1 WO 2023071242A1 CN 2022100545 W CN2022100545 W CN 2022100545W WO 2023071242 A1 WO2023071242 A1 WO 2023071242A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
sample
keyword
type
target
Prior art date
Application number
PCT/CN2022/100545
Other languages
English (en)
Chinese (zh)
Inventor
王昕远
郑少杰
范增虎
Original Assignee
深圳前海微众银行股份有限公司
Priority date
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2023071242A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Definitions

  • The present application relates to the technical field of artificial intelligence, and in particular to a text generation method and device, and a storage medium.
  • In practice, the network pushes a large amount of text information about an object to users every day, so that users can understand the object in depth from the text information and complete the corresponding processing of the object.
  • The embodiments of the present application are expected to provide a text generation method and device, and a storage medium, which can improve the intelligence of the text generation device when generating text information.
  • An embodiment of the present application provides a text generation method, including: in the case of receiving a text generation instruction, obtaining a text keyword from the text generation instruction, and determining a target text type corresponding to the text keyword; in the case that there is a target template containing the target text type in a template library, obtaining the target template from the template library, the templates in the template library being text templates provided with text types; and finding the position of the target text type in the target template, and replacing the field information corresponding to the target text type with the field information of the text keyword at the position, to obtain a target text containing the text keyword.
  • An embodiment of the present application provides a text generation device, the device includes:
  • the obtaining part is configured to, in the case of receiving a text generation instruction, obtain a text keyword from the text generation instruction, and, in the case that there is a target template containing the target text type in the template library, obtain the target template from the template library; the templates in the template library are text templates provided with text types;
  • the determining part is configured to determine the target text type corresponding to the text keyword;
  • the replacement part is configured to replace the field information corresponding to the target text type with the field information of the text keyword at the position of the target text type in the target template, so as to obtain the target text containing the text keyword.
  • An embodiment of the present application provides a text generation device, the device includes:
  • a memory, a processor and a communication bus; the memory communicates with the processor through the communication bus, the memory stores a text generation program executable by the processor, and when the text generation program is executed, the processor executes the above text generation method.
  • An embodiment of the present application provides a storage medium on which a computer program is stored, which is applied to a text generation device.
  • When the computer program is executed by a processor, the above text generation method is implemented.
  • Embodiments of the present application provide a text generation method and device, and a storage medium.
  • the text generation method includes: in the case of receiving a text generation instruction, obtaining a text keyword from the text generation instruction, and determining a target text type corresponding to the text keyword; if there is a target template containing the target text type in the template library, obtaining the target template from the template library, the templates in the template library being text templates provided with text types; and finding the position of the target text type in the target template, and replacing the field information corresponding to the target text type with the field information of the text keyword at the position, to obtain the target text containing the text keyword.
  • In this way, when the text generation device receives the text generation instruction, it obtains the text keyword from the text generation instruction, searches the template library for a target template that includes the target text type corresponding to the text keyword, finds the position of the target text type in the target template, and uses the field information of the text keyword to replace the field information corresponding to the target text type at this position, so as to obtain the target text containing the text keyword. There is thus no need to obtain the text information manually, which improves the intelligence of the text generation device when generating the text information.
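As an illustration only, the following Python sketch outlines the flow just described; the helper callables, the dictionary keys on the instruction, and the brace-style type slots are assumptions made for the example, not names defined by the application.

```python
# High-level sketch of the described flow (illustrative assumptions throughout).
def handle_text_generation_instruction(instruction,
                                       predict_text_type,      # type recognition model (e.g. FastText)
                                       find_target_template,   # template-library lookup by target text types
                                       fill_template,          # field replacement at the type's position
                                       generate_with_model):   # Fixed-Keywords generation fallback
    # 1. Obtain the text keywords carried in the text generation instruction.
    keywords = instruction["keywords"]

    # 2. Use the target text type carried by the instruction if present,
    #    otherwise determine it from the text keywords.
    target_types = instruction.get("types") or [predict_text_type(kw) for kw in keywords]

    # 3. If a target template containing the target text types exists, fill it
    #    with the field information of the text keywords.
    target_template = find_target_template(target_types)
    if target_template is not None:
        return fill_template(target_template, dict(zip(target_types, keywords)))

    # 4. Otherwise generate the target text directly from the keywords.
    return generate_with_model(keywords)
```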
  • Fig. 1 is a flow chart of a text generation method provided by the embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of an exemplary BERT provided in the embodiment of the present application.
  • FIG. 3 is a schematic diagram of an exemplary supervised training BERT model provided by an embodiment of the present application.
  • Fig. 4 is a flow chart of an exemplary training BERT model provided by the embodiment of the present application.
  • FIG. 5 is an exemplary text template persistence flowchart provided by the embodiment of the present application.
  • FIG. 6 is a flow chart of an exemplary text generation method provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a seed stage and an automatic training stage of an exemplary text generation method provided by an embodiment of the present application.
  • FIG. 8 is a first structural diagram of a text generation device provided by an embodiment of the present application.
  • FIG. 9 is a second schematic diagram of the composition and structure of a text generation device provided by an embodiment of the present application.
  • FIG. 1 is a flowchart of a text generation method provided in an embodiment of the present application.
  • the text generation method may include the steps described below.
  • a text generation method provided by an embodiment of the present application is applicable to a scenario where a target text is generated according to text keywords carried in a text generation instruction.
  • the text generation device may be implemented in various forms.
  • the text generation device described in this application may include mobile phones, cameras, tablet computers, notebook computers, palmtop computers, personal digital assistants (Personal Digital Assistant, PDA), portable media players (Portable Media Player, PMP), navigation devices, wearable devices, smart bracelets, pedometers and the like, as well as devices such as digital TVs and desktop computers.
  • the text generation instruction may be an instruction for generating marketing text; the text generation instruction may also be an instruction for generating advertising text; the text generation instruction may also be an instruction for generating other text; the specific text generation instruction can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • the text generation device may include a display screen and receive the text generation instruction through the display screen; the text generation device may also receive a text generation instruction transmitted by other devices.
  • the specific method for the text generation device to receive the text generation instruction can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • the text keyword may be information used to generate the target text corresponding to the text generation instruction.
  • the number of text keywords may be one, two, or more; the specific number of text keywords can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • the text keywords include bank, coupon, 10 yuan, January 1st to January 30th, movie viewing, card binding, etc.
  • the number of target text types may be one, two, or more; the specific number of target text types can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • the target text type may be a company name, a product name, a distributed item, a numerical amount, an activity time, or an activity description; the specific target text type can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • the process of the text generation device determining the target text type corresponding to the text keyword includes: when the text generation instruction does not carry the target text type, the text generation device inputs the text keyword into a type recognition model to obtain the target text type; when the text generation instruction carries the target text type, the text generation device obtains the target text type from the text generation instruction.
  • the type recognition model may be a model configured in the text generation device; the type recognition model may also be a model obtained from other devices before the text generation device inputs the text keyword into it; the type recognition model may also be a model obtained by the text generation device in other ways; the specific manner in which the text generation device obtains the type recognition model can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • the type recognition model may be a text classification (FastText) model; the type recognition model may also be another model that can determine the text type according to text keywords; the specific type recognition model can be determined according to the actual situation, which is not limited in this embodiment of the present application.
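As an illustration of this step, here is a minimal sketch of training and querying a FastText-style type classifier, assuming the open-source fasttext Python package; the training-file name and the label names are hypothetical.

```python
import fasttext

# keyword_types.txt holds one "__label__<type> <keyword>" pair per line, e.g.
#   __label__company_name 银行
#   __label__issued_item 红包
#   __label__amount 10元
model = fasttext.train_supervised(input="keyword_types.txt", epoch=25, lr=0.5)

labels, probs = model.predict("红包")                      # e.g. (('__label__issued_item',), array([0.97]))
target_text_type = labels[0].replace("__label__", "")      # -> "issued_item"
```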
  • before the text generation device inputs the text keyword into the type recognition model to obtain the target text type, the text generation device also obtains a second sample keyword and a second sample text type, and trains an initial type recognition model with the second sample keyword and the second sample text type to obtain the type recognition model.
  • the second sample keyword may be a preset keyword; the second sample keyword may also be a keyword transmitted to the text generation device by other devices; the second sample keyword may also be a keyword received by the text generation device through manual labeling; the specific method by which the text generation device obtains the second sample keyword can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • the second sample text type is a text type corresponding to the second sample keyword.
  • the second sample text type may be a preset text type; the second sample text type may also be a text type transmitted to the text generation device by other devices; the second sample text type may also be a text type received by the text generation device through manual labeling; the specific method by which the text generation device obtains the second sample text type can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • the text generation device may acquire the second sample keyword and the second sample text type only once.
  • the second sample keywords include bank, coupon, 10 yuan, January 1 to January 30, movie watching, card binding and so on.
  • the second sample text type includes: company name, product name, issued item, numerical value, activity time, activity description, etc.; the specific second sample text type can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • the templates in the template library are text templates in which text types have been set.
  • after the text generation device determines the target text type corresponding to the text keyword, if there is a target template containing the target text type in the template library, the text generation device obtains the target template from the template library.
  • the number of text templates may be one, two, or more; the specific number of text templates can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • before the text generation device obtains the target template from the template library, the text generation device also obtains a first sample text; inputs the first sample text into a keyword recognition model to obtain a first sample keyword corresponding to the first sample text, a first sample type, and a first position of the first sample keyword in the first sample text; inputs the first sample keyword into a text generation model to obtain a first output text; obtains a text template according to the first output text, the first sample text, the first sample keyword, the first sample type and the first position; and adds the text template to the template library.
  • the text generation device may obtain the first sample text every preset time period; the text generation device may also obtain the first sample text when receiving a sample text acquisition instruction; the text generation device may also obtain the first sample text in other ways; the specific method for the text generation device to obtain the first sample text can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • the preset time period may be a time period configured in the text generation device; the preset time period may also be a time period received by the text generation device before it obtains the first sample text; the preset time period may also be a time period obtained by the text generation device in other ways; the specific manner in which the text generation device obtains the preset time period can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • the preset time period may be one week, one month, or one day; the specific preset time period can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • the keyword recognition model may be a model configured in the text generation device; the keyword recognition model may also be a model transmitted by other devices and received by the text generation device; the keyword recognition model may also be a model obtained by the text generation device in other ways; the specific manner in which the text generation device obtains the keyword recognition model can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • the keyword recognition model may be a model obtained by combining a language representation model (Bidirectional Encoder Representations from Transformers, BERT) with a conditional random field (CRF) model; the keyword recognition model may also be another model that can obtain, from a sample text, the sample keywords corresponding to the sample text, the sample types, and the positions of the sample keywords in the sample text; the specific keyword recognition model can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • the text generation model may be a model configured in the text generation device; the text generation model may also be a model transmitted by other devices and received by the text generation device; the text generation model may also be a model obtained by the text generation device in other ways; the specific method for the text generation device to obtain the text generation model can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • the text generation model may be the Fixed-Keywords BERT model; the text generation model may also be another model that can generate an output text according to text keywords; the specific text generation model can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • the text generation device obtaining the text template according to the first output text, the first sample text, the first sample keyword, the first sample type and the first position includes: the text generation device uses the keyword recognition model to determine a second position of the first sample keyword in the first output text; the text generation device replaces the first sample keyword with the first sample type at the second position in the first output text to obtain a first template; the text generation device uses the first sample type to replace the first sample keyword at the first position in the first sample text to obtain a second template; and the text generation device uses the first template and the second template as the text template.
  • the text generation device using the keyword recognition model to determine the second position of the first sample keyword in the first output text may be that the text generation device inputs the first output text into the keyword recognition model and uses the keyword recognition model to determine the second position of the first sample keyword in the first output text.
  • the first template and the second template may be the same or different; if there are multiple first templates and multiple second templates, some of the first templates and second templates may be the same while others are different; the specifics can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • the first position and the second position may be the same or different; the specifics can be determined according to the actual situation, which is not limited in this embodiment of the present application.
  • before the text generation device inputs the first sample text into the keyword recognition model to obtain the first sample keyword corresponding to the first sample text, the first sample type, and the first position of the first sample keyword in the first sample text, the text generation device also obtains a second sample text and the second sample keyword corresponding to the second sample text, the second sample type corresponding to the second sample text, and a third position of the second sample keyword in the second sample text; the text generation device then uses the second sample keyword, the second sample type, the third position and the second sample text to train an initial keyword recognition model to obtain the keyword recognition model.
  • Exemplarily, the text generation device is configured with a regular expression combining {marketing word} and {product/company name}. The text generation device can use the regular expression to obtain the second sample text from the full amount of Internet data, and the corresponding second sample keyword, the second sample type, and the third position of the second sample keyword in the second sample text are marked from the second sample text by manual labeling. The second sample keyword, the second sample type and the third position are then transmitted to the text generation device; at this time, the text generation device acquires the second sample keyword, the second sample type and the third position.
  • the marketing words are words related to financial marketing that are configured in the text generation device, and include: receiving, benefits, discounts, red envelopes, limited time, special prices, free shipping, recharge, coupons, members, voucher, blockbuster, good news, exclusive, exclusive, super value, special offer, gift, reward, exchange, activation, gift, subsidy, 11.11, 12.12, double 11, double 12, lottery, double 11, double 12.
  • the product/company name is a finance-related product or company name or abbreviation, represented by "{product/company name}".
  • the regular expression combining {marketing word} and {product/company name} can be: {marketing word}.*{product/company name}; the combined regular expression can also be: {product/company name}.*{marketing word}.
  • the first sample text may also be sample text information obtained from the full amount of Internet data every preset time period by using the regular expression.
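The following is a hedged sketch of how such a regular expression could be assembled and applied; the abbreviated word lists and the corpus iterable are illustrative placeholders, not the configured lists themselves.

```python
import re

marketing_words = ["领取", "优惠", "红包", "限时", "特价", "免息券"]   # "receive", "discount", "red envelope", ...
product_company_names = ["银行", "信用卡"]                           # finance-related product/company names or abbreviations

# {marketing word}.*{product/company name} and the reversed order
pattern_a = re.compile("(" + "|".join(marketing_words) + ").*(" + "|".join(product_company_names) + ")")
pattern_b = re.compile("(" + "|".join(product_company_names) + ").*(" + "|".join(marketing_words) + ")")

def mine_sample_texts(corpus):
    """Keep sentences matching either pattern as candidate second sample texts."""
    return [sentence for sentence in corpus if pattern_a.search(sentence) or pattern_b.search(sentence)]
```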
  • For example, if the second sample keyword is a bank, the corresponding second sample type is a company name; if the second sample keyword is a coupon, the corresponding second sample type is an issued item; if the second sample keyword is 10 yuan, the corresponding second sample type is a monetary value; if the second sample keyword is January 1 to January 30, the corresponding second sample type is an activity time.
  • For example, for a given second sample text: the first of the second sample keywords is a company, its second sample type is a company name, and its third position is (0, 2); the second of the second sample keywords is 50 yuan, its second sample type is an amount value, and its third position is (7, 10); the third of the second sample keywords is a red envelope, its second sample type is a distributed item, and its third position is (10, 12).
  • the third position may be a pair of starting and ending positions where the second sample keyword appears in the second sample text.
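The following sketch illustrates the (keyword, type, third position) annotation just described, together with one way of turning it into character-level BIO tags for training the keyword recognition model; the sample sentence and the label codes are hypothetical examples chosen to be consistent with the positions (0, 2), (7, 10) and (10, 12) above.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class KeywordSpan:
    keyword: str        # second sample keyword
    sample_type: str    # second sample type
    start: int          # third position: start index in the second sample text
    end: int            # third position: end index (exclusive)

second_sample_text = "公司感恩回馈送50元红包"     # hypothetical example sentence
spans = [
    KeywordSpan("公司", "company_name", 0, 2),
    KeywordSpan("50元", "amount_value", 7, 10),
    KeywordSpan("红包", "issued_item", 10, 12),
]

def to_bio_tags(text: str, spans: List[KeywordSpan]) -> List[str]:
    """Character-level BIO tags usable as supervision for a BERT + CRF tagger."""
    tags = ["O"] * len(text)
    for span in spans:
        assert text[span.start:span.end] == span.keyword   # the third position is a (start, end) pair
        tags[span.start] = "B-" + span.sample_type
        for i in range(span.start + 1, span.end):
            tags[i] = "I-" + span.sample_type
    return tags
```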
  • the text generation device determines at least two empty positions formed according to the text keyword and at least two groups of character quantities corresponding to the at least two empty positions; the text generation device splices the at least two empty positions and the keyword according to the at least two groups of characters to obtain splicing information; the text generation device inputs the splicing information into the text generation model to obtain at least two groups of target character information corresponding to the at least two empty positions; and the text generation device adds the at least two groups of target character information to the at least two empty positions in the splicing information to obtain the target text.
  • the at least two empty positions correspond to the at least two groups of characters one to one, that is, each empty position corresponds to one group of characters.
  • when the number of text keywords is one, there is a first empty position on the left side of the text keyword and a second empty position on the right side of the text keyword; when the number of text keywords is two, there is a first empty position on the left side of the first text keyword, a second empty position between the first text keyword and the second text keyword, and a third empty position on the right side of the second text keyword; and so on.
  • when the number of text keywords is N, there is a first empty position on the left side of the first text keyword, a second empty position between the first text keyword and the second text keyword, ..., an Nth empty position between the (N-1)th text keyword and the Nth text keyword, and an (N+1)th empty position on the right side of the Nth text keyword. That is, when the number of text keywords is N, the corresponding number of empty positions is N+1.
  • the text generation device inputting the splicing information into the text generation model to obtain the at least two groups of target character information corresponding to the at least two empty positions includes: the text generation device inputs the splicing information into the text generation model and uses the text generation model to obtain, by sampling, the first word in each of the at least two empty positions, that is, at least two groups of first characters; the at least two groups of first characters and the splicing information are then input into the text generation model, and the text generation model is used to obtain, by sampling, at least two groups of second characters in the at least two empty positions; this continues until every word in the at least two empty positions has been obtained by sampling with the text generation model, that is, the at least two groups of target character information are obtained.
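The following is a minimal sketch of this iterative filling procedure; `model_predict` is a hypothetical callable standing in for the text generation model that returns one predicted character (possibly "-") for every position of the spliced sequence.

```python
MASK = "[M]"

def build_spliced_sequence(keywords, mask_len):
    """N keywords are spliced with N + 1 empty groups of mask_len mask tokens each."""
    seq, groups = [MASK] * mask_len, [list(range(mask_len))]
    for keyword in keywords:
        seq += list(keyword)
        start = len(seq)
        seq += [MASK] * mask_len
        groups.append(list(range(start, start + mask_len)))
    return seq, groups

def fill_empty_positions(keywords, mask_len, model_predict):
    seq, groups = build_spliced_sequence(keywords, mask_len)
    for round_index in range(mask_len):
        predictions = model_predict(seq)          # re-query the model with the characters filled so far
        for group in groups:                      # fill one character per empty group per round
            position = group[round_index]
            seq[position] = predictions[position]
    # "-" marks "no character here"; drop it and splice the rest into the target text
    return "".join(token for token in seq if token not in (MASK, "-"))
```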
  • BERT can be divided into three parts: a word vector conversion part, an encoding part and a supervision part. In the case of receiving an input text, the word vector conversion part is first used to perform word vector conversion on the input text to obtain a word vector sequence (CLS, word 1, word 2, word 3, ..., word N); the encoding part of BERT then encodes the word vector sequence; and finally the supervision part is used to determine the text category of the encoded input text.
  • the encoding part is the main body of BERT. Its main function is to encode the N+1 input word vectors so that information can interact among all input vectors.
  • the encoding part is composed of several layers of encoding blocks. After encoding the word vector sequence, the first encoding block in the encoding part obtains the first encoding sequence (E_CLS, E_1, E_2, E_3, ..., E_n) (the first encoding sequence is the encoding sequence closest to the word vector sequence in Fig. 3), and the last encoding block in the encoding part obtains the encoded output sequence (the encoded output sequence is the encoding sequence closest to the supervision part in Fig. 3).
  • the supervised part includes the labels corresponding to the input text needed for supervised training of BERT.
  • Figure 3 shows the multi-category classification of the input text.
  • the supervision part can be adjusted according to the task goal, such as performing named entity recognition, question answering, etc.
  • the function of the text generation model is to generate marketing copy (that is, target text) containing these template keywords according to the given template keywords.
  • Fig. 3 shows the supervised training process of inputting "bank" (银行) and "red envelope" (红包) as sample keywords to generate the marketing copy (output text) "Come and get the bank red envelope!" (快来领银行红包啦!).
  • Word vector conversion is performed on "bank", "red envelope" and the mask part to obtain a word vector sequence; the word vector sequence is encoded using the first encoding block in the encoding part to obtain the first encoding sequence (E_CLS, E_M, E_M, E_M, E_银, E_行, E_M, E_M, E_M, E_红, E_包, E_M, E_M, E_M), and so on until the last encoding block in the encoding part performs encoding to obtain the encoded output sequence; the supervision part is then used to supervise the encoded output sequence to obtain the prediction result for the three mask groups: 快来领 ("come and get"), ---, and 啦!- ("la!").
  • the first step is: according to the sample text and the sample keywords, determine the number of words corresponding to each vacancy formed by the sample keywords in the sample text. For example, for the marketing copy "Come and get the bank red envelope!" (快来领银行红包啦!) and the sample keywords it contains:
  • sample type <company name>, sample keyword: bank (银行), keyword position: (3, 5); sample type <distributed item>, sample keyword: red envelope (红包), keyword position: (5, 7).
  • the second step is: take the maximum of the numbers of words corresponding to all vacancies, denoted L_M. Then L_M mask vectors (denoted M) are inserted into each vacancy formed around the sample keywords. For example, for the input "bank" and "red envelope", L_M mask vectors can be inserted before "bank", between "bank" and "red envelope", and after "red envelope". As shown in Figure 3, assuming that L_M is 3, the word vector conversion part shows the final result including the masks.
  • word supervision is then performed on the mask part. If the number of words at the corresponding positions in the sample copy is less than L_M, the supervision starts from the leftmost part of the mask group, and the supervision object of the remaining positions is "-", as shown by "---" in Figure 3, which indicates that no word exists at that position.
  • After prediction, the final marketing copy can be obtained by removing the "-" symbols from the predicted target characters and splicing the rest with the sample keywords.
  • the text generation device acquires a second sample text and a second sample keyword corresponding to the second sample text.
  • the text generation device constructs a word vector sequence by using the second sample keywords.
  • second sample type <company name>, second sample keyword: bank (银行), third position: (3, 5);
  • second sample type <distributed item>, second sample keyword: red envelope (红包), third position: (5, 7).
  • the second sample keywords in the marketing copy are converted into word vectors from left to right, with each character directly converted into a 200-dimensional vector; the word vectors of the two keywords are spliced together to construct a 200-dimensional word vector sequence of length 4. Then, mask vectors are filled into all vacancies formed by the two second sample keywords.
  • A sequence of 200-dimensional vectors of length 3 (mask vector sequence) + 2 ("bank" vector sequence) + 3 (mask vector sequence) + 2 ("red envelope" vector sequence) + 3 (mask vector sequence) = 13 is thus obtained.
  • the text generation device constructs a training label according to the second sample text.
  • the training label represents the expected result after inputting the data into the model, that is, the real marketing copy.
  • a mask is inserted for each vacancy, and it is necessary to ensure that the training label and the word vector sequence correspond to each word position.
  • Exemplarily, the vector sequence can be constructed as [M, M, M, 银, 行, M, M, M, 红, 包, M, M, M] (银行 = bank, 红包 = red envelope), and the corresponding training label is constructed as [快, 来, 领, 银, 行, -, -, -, 红, 包, 啦, !, -] ("Come and get the bank red envelope!"). Here "-" indicates that there is no character at the corresponding position.
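Based on the example above, the following sketch builds the masked input sequence and its training label from a sample text and the (start, end) positions of its keywords; it assumes the keywords are given in reading order and that no vacancy is longer than L_M.

```python
MASK, PAD = "M", "-"

def build_training_pair(sample_text, keyword_spans, mask_len):
    """keyword_spans: (start, end) positions of the sample keywords, in reading order."""
    inputs, labels = [], []
    prev_end = 0
    for start, end in list(keyword_spans) + [(len(sample_text), len(sample_text))]:
        gap = sample_text[prev_end:start]                    # text in the vacancy before this keyword
        inputs += [MASK] * mask_len
        labels += list(gap) + [PAD] * (mask_len - len(gap))  # supervise from the left, pad with "-"
        inputs += list(sample_text[start:end])               # keyword characters are kept as-is
        labels += list(sample_text[start:end])
        prev_end = end
    return inputs, labels

inputs, labels = build_training_pair("快来领银行红包啦!", [(3, 5), (5, 7)], mask_len=3)
# inputs -> ['M', 'M', 'M', '银', '行', 'M', 'M', 'M', '红', '包', 'M', 'M', 'M']
# labels -> ['快', '来', '领', '银', '行', '-', '-', '-', '红', '包', '啦', '!', '-']
```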
  • the text generation device inputs the word vector sequence into the encoding part of the initial text generation model to obtain an encoded output sequence.
  • the text generation device trains the initial text generation model according to the encoded output sequence and the training labels, and obtains the text generation model.
  • the text generating device maps each vector (except the CLS vector) in the coded output sequence to the word list set (including "-").
  • Exemplarily, the encoded output sequence obtained after inputting the word vector sequence [M, M, M, 银, 行, M, M, M, 红, 包, M, M, M] into the encoding part of the initial text generation model is a sequence of 13 200-dimensional vectors. Each vector is multiplied by a trainable matrix (matrix shape: 200 × (word table size + 1), where the +1 corresponds to "-") so as to map the vector to the target word table (including "-"). After that, the cross entropy between the mapped vector sequence and the training label [快, 来, 领, 银, 行, -, -, -, 红, 包, 啦, !, -] can be determined, and gradient descent can be used to fine-tune and update the parameters of the initial text generation model.
  • When the initial text generation model converges (that is, its parameters can no longer be updated) or reaches the maximum number of training steps, the initial text generation model is considered trained, and the text generation model is thereby obtained.
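The following is a hedged PyTorch sketch of this training step: each encoded output vector is projected onto the word table plus "-", and the cross entropy against the training label drives gradient descent. The `encoder` callable, the vocabulary size and the optimizer wiring are assumptions for illustration.

```python
import torch
import torch.nn as nn

hidden_size, vocab_size = 200, 21128                    # vocabulary size is hypothetical
projection = nn.Linear(hidden_size, vocab_size + 1)     # the trainable 200 x (word table + 1) matrix; +1 is "-"
criterion = nn.CrossEntropyLoss()

def training_step(encoder, optimizer, word_vectors, label_ids):
    """word_vectors: (batch, seq_len, 200) mask/keyword vectors; label_ids: (batch, seq_len) indices."""
    encoded = encoder(word_vectors)                     # encoded output sequence, (batch, seq_len, 200)
    logits = projection(encoded)                        # (batch, seq_len, vocab + 1)
    loss = criterion(logits.reshape(-1, vocab_size + 1), label_ids.reshape(-1))
    optimizer.zero_grad()
    loss.backward()                                     # gradient descent fine-tunes encoder and projection
    optimizer.step()                                    # optimizer is assumed to hold both parameter sets
    return loss.item()
```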
  • After training, the Fixed-Keywords BERT model acquires the following capability: given the input words "bank" (银行) and "red envelope" (红包), it outputs "快来领" ("come and get") on the left side of "bank", "---" between "bank" and "red envelope", and "啦!-" on the right side of "red envelope".
  • "-" indicates that no character exists at that position; after removing "-", the mask parts are spliced together with "bank" and "red envelope" in order to obtain the complete marketing copy: "Come and get the bank red envelope!" (快来领银行红包啦!).
  • the text generation device acquires a first sample keyword.
  • While acquiring the first sample keyword, the text generation device also acquires the first sample type corresponding to the first sample keyword. Specifically, the text generation device may input the first sample text into the keyword recognition model to obtain the first sample keyword and the first sample type corresponding to the first sample text.
  • the text generation device also needs the first sample keyword sequence.
  • the input first sample keywords may take the form:
  • first sample type <company name>, first sample keyword: bank (银行);
  • first sample type <issued item>, first sample keyword: interest-free coupon (免息券).
  • the input first sample keyword sequence is order-sensitive, that is, the order of the input first sample keyword sequence is consistent with the order in which it appears in the final generated marketing copy. In order to subsequently generate a text template, it is necessary to obtain the first sample type.
  • the text generation device inputs the first sample keywords into the text generation model to obtain a first output text.
  • the two keywords "bank" (银行) and "interest-free coupon" (免息券) form vacancies in sequence (the left side of "bank", between "bank" and "interest-free coupon", and the right side of "interest-free coupon"), and each vacancy is filled with L_M mask vectors, where L_M is the maximum number of words that appeared in a vacancy during training.
  • the word vector sequence that can be constructed is: [M, M, M, 银, 行, M, M, M, 免, 息, 券, M, M, M].
  • Input the constructed word vector sequence into the encoding part of the Fixed-Keywords BERT model, and the encoded output sequence output by the last encoding layer (the last encoding block) can be obtained.
  • Each vector in the encoded output sequence (except E_CLS and the positions where the first sample keywords are located) is mapped to the word table (including "-"), and the word with the largest probability value obtained after the mapping is selected as the prediction for the current position.
  • After mapping, for each character position there is a numerical (probability) vector representing the possibility of each word in the word table (including "-"), and the word with the highest probability value is taken as the word predicted at that position.
  • After all nine empty positions are predicted, they are combined with the first sample keywords to obtain: [-, -, -, 银, 行, 大, 额, -, 免, 息, 券, 享, 不, 停].
  • The predicted marketing copy (the first output text) can be obtained by removing each "-" that indicates a non-existent character: 银行大额免息券享不停 ("Enjoy the bank's large-amount interest-free coupons non-stop").
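The following sketch illustrates the decoding just described: the highest-probability word is taken at every empty position, the keyword positions are kept unchanged, and the "-" symbols are removed before splicing; `logits`, `tokens`, `keyword_positions` and `id_to_word` are hypothetical inputs.

```python
import torch

def decode_output(logits, tokens, keyword_positions, id_to_word):
    """logits: (seq_len, vocab + 1) scores per position (CLS already excluded);
    tokens: the spliced input tokens; keyword_positions: indices occupied by the first sample keywords."""
    predicted = []
    for i, token in enumerate(tokens):
        if i in keyword_positions:
            predicted.append(token)                                     # keyword characters stay unchanged
        else:
            predicted.append(id_to_word[int(torch.argmax(logits[i]))])  # word with the largest probability
    return "".join(ch for ch in predicted if ch != "-")                 # drop "no character here" markers
```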
  • the text generation device determines a second position of the first sample keyword in the first output text by using the keyword recognition model.
  • the text generation device replaces the first sample keyword with the first sample type at the second position in the first output text to obtain a first template; and uses the first template as a text template.
  • That is, each first sample keyword in the predicted marketing copy is replaced with the corresponding first sample type: "bank" is replaced by the first sample type "<company name>", and "interest-free coupon" is replaced by the first sample type "<issued item>".
  • The final first template is thereby obtained: "<company name> large amount <issued item> enjoy non-stop" (<company name>大额<issued item>享不停).
  • The first template is then stored, that is, the persistence of the marketing copy template is completed.
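The following sketch illustrates the persistence step: the first sample keywords are replaced by their first sample types at the positions found by the keyword recognition model, and the resulting template is stored; the brace notation for type slots is an illustrative assumption.

```python
def make_template(text, keyword_spans):
    """keyword_spans: (start, end, sample_type) triples for the keywords in `text`, sorted by position."""
    parts, prev_end = [], 0
    for start, end, sample_type in keyword_spans:
        parts.append(text[prev_end:start])
        parts.append("{" + sample_type + "}")      # the keyword is replaced by its sample type
        prev_end = end
    parts.append(text[prev_end:])
    return "".join(parts)

template_library = set()
first_template = make_template("银行大额免息券享不停",
                               [(0, 2, "company name"), (4, 7, "issued item")])
template_library.add(first_template)               # "{company name}大额{issued item}享不停"
```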
  • after the text generation device finds the target template containing the target text type in the template library, the text generation device can search for the position of the target text type in the target template, and use the field information of the text keyword at that position to replace the field information corresponding to the target text type, so as to obtain the target text containing the text keyword.
  • the target text is the text corresponding to the text generation instruction.
  • the text generation device acquires text keywords from the text generation instruction.
  • the text generation device inputs text keywords into the type recognition model to obtain the target text type.
  • the text generation device obtains the target text type from the text generation instruction.
  • the text generation device acquires the target template from the template library.
  • the text generation device searches for the position of the target text type in the target template, and replaces the field information corresponding to the target text type with the field information of the text keyword at the position, to obtain the target text containing the text keyword.
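The following sketch illustrates this replacement step, reusing the same illustrative brace notation for the type slots; the keyword-to-type mapping is a hypothetical input.

```python
def fill_template(target_template, keywords_by_type):
    """keywords_by_type: mapping from target text type to the text keyword's field information."""
    target_text = target_template
    for text_type, keyword in keywords_by_type.items():
        slot = "{" + text_type + "}"
        if slot in target_text:                    # position of the target text type in the template
            target_text = target_text.replace(slot, keyword)
    return target_text

print(fill_template("{company name}大额{issued item}享不停",
                    {"company name": "银行", "issued item": "免息券"}))
# -> 银行大额免息券享不停
```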
  • the text generation device determines at least two empty positions formed according to the text keywords and at least two groups of character quantities corresponding to the at least two empty positions.
  • the text generating device splices at least two empty positions and keywords according to at least two sets of characters to obtain splicing information.
  • the text generation device inputs the splicing information into the text generation model to obtain at least two sets of target character information corresponding to at least two empty positions.
  • the text generation device adds at least two sets of target character information to at least two empty positions in the splicing information to obtain the target text.
  • an exemplary text generation method includes a seed stage and an automatic training stage, as shown in FIG. 7 .
  • in the seed stage, the second sample text is obtained first and manually labeled to obtain the second sample keyword corresponding to the second sample text, the second sample type, and the third position of the second sample keyword in the second sample text; the second sample keyword, the second sample type, the third position and the second sample text are then used to train the initial keyword recognition model to obtain the keyword recognition model (training the keyword recognition model).
  • in addition, the initial type recognition model is trained by using the second sample keyword and the second sample text type to obtain the type recognition model (training the type recognition model).
  • in the automatic training stage, the first sample text is obtained and input into the keyword recognition model to obtain the first sample keyword corresponding to the first sample text, the first sample type, and the first position of the first sample keyword in the first sample text (using the keyword recognition model to label the first sample text); the first sample keyword is input into the text generation model to obtain the first output text; the keyword recognition model is used to determine the second position of the first sample keyword in the first output text; at the second position in the first output text, the first sample keyword is replaced by the first sample type to obtain the first template; at the first position in the first sample text, the first sample type is used to replace the first sample keyword to obtain the second template; the first template and the second template are used as text templates (obtaining the text templates); and the text templates are added to the template library, so that when a text generation instruction is received, the target text containing the text keyword is obtained according to the text keyword in the text generation instruction and the target template in the template library.
  • In this way, when the text generation device receives the text generation instruction, it obtains the text keyword from the text generation instruction, searches the template library for a target template that includes the target text type corresponding to the text keyword, finds the position of the target text type in the target template, and uses the field information of the text keyword to replace the field information corresponding to the target text type at this position, so as to obtain the target text containing the text keyword. There is thus no need to obtain the text information manually, which improves the intelligence of the text generation device when generating text information.
  • FIG. 8 is a schematic diagram of the composition and structure of a text generation device 1 provided by an embodiment of the present application. The text generation device 1 may include:
  • the obtaining part 11, configured to, in the case of receiving a text generation instruction, obtain a text keyword from the text generation instruction, and, if there is a target template containing the target text type in the template library, obtain the target template from the template library; the templates in the template library are text templates provided with text types;
  • the determining part 12, configured to determine the target text type corresponding to the text keyword;
  • the replacement part 13, configured to replace the field information corresponding to the target text type with the field information of the text keyword at the position of the target text type in the target template, so as to obtain the target text containing the text keyword.
  • the device further includes an input part and an adding part
  • the acquisition part 11 is configured to acquire the first sample text
  • the input part is configured to input the first sample text into the keyword recognition model to obtain the first sample keyword corresponding to the first sample text, the first sample type, and the first position of the first sample keyword in the first sample text;
  • and to input the first sample keyword into the text generation model to obtain the first output text, and to obtain the text template according to the first output text, the first sample text, the first sample keyword, the first sample type and the first position;
  • the adding part is configured to add the text template to the template library.
  • the determining part 12 is configured to determine a second position of the first sample keyword in the first output text by using a keyword recognition model
  • the replacement part 13 is configured to replace the first sample keyword with the first sample type at the second position in the first output text to obtain a first template; in the At the first position in the first sample text, use the first sample type to replace the first sample keyword to obtain a second template; use the first template and the second template as The text template.
  • the device further includes a training part
  • the acquiring part 11 is configured to acquire the second sample text, the second sample keyword corresponding to the second sample text, the second sample type corresponding to the second sample text, and the third position of the second sample keyword in the second sample text;
  • the training part is configured to use the second sample keyword, the second sample type, the third position and the second sample text to train an initial keyword recognition model to obtain the keyword recognition model.
  • the device further includes a splicing part
  • the determining part 12 is configured to determine at least two empty positions formed according to the text keyword and at least two groups of characters corresponding to the at least two empty positions; the at least two empty positions correspond to the at least two groups of characters one to one;
  • the splicing part is configured to splice the at least two empty positions and the keyword according to the at least two groups of characters to obtain splicing information
  • the input part is configured to input the splicing information into the text generation model to obtain at least two sets of target character information corresponding to the at least two empty positions;
  • the adding part is configured to add the at least two groups of target character information to the at least two empty positions in the splicing information to obtain the target text.
  • the input part is configured to input the text keyword into the type recognition model to obtain the target text type when the target text type is not carried in the text generation instruction;
  • the obtaining part 11 is configured to obtain the target text type from the text generation instruction if the text generation instruction carries the target text type.
  • the acquisition part 11 is configured to acquire a second sample keyword and a second sample text type
  • the training part is configured to use the second sample keywords and the second sample text type to train an initial type recognition model to obtain the type recognition model.
  • In the embodiment of the present application, the above-mentioned acquisition part 11, determination part 12 and replacement part 13 may be implemented by a processor 14 on the text generation device 1, specifically by a CPU (Central Processing Unit), an MPU (Microprocessor Unit), a DSP (Digital Signal Processor), a Field Programmable Gate Array (FPGA), or the like; the above-mentioned data storage may be implemented by a memory 15 on the text generation device 1.
  • The embodiment of the present application also provides a text generation device 1. As shown in FIG. 9, the text generation device 1 includes a memory 15 and a processor 14 that communicate through a communication bus; the memory 15 stores a program executable by the processor 14, and when the program is executed, the processor 14 executes the text generation method described above.
  • In the embodiment of the present application, the above-mentioned memory 15 may be a volatile memory, such as a random-access memory (RAM); or a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); and it provides instructions and data to the processor 14.
  • An embodiment of the present application provides a computer-readable storage medium, on which a computer program is carried, and when the program is executed by the processor 14, the text generation method as described above is implemented.
  • In this way, when the text generation device receives the text generation instruction, it obtains the text keyword from the text generation instruction, searches the template library for a target template that includes the target text type corresponding to the text keyword, finds the position of the target text type in the target template, and uses the field information of the text keyword to replace the field information corresponding to the target text type at this position, so as to obtain the target text containing the text keyword. There is thus no need to obtain the text information manually, which improves the intelligence of the text generation device when generating text information.
  • the embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage and optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, and the instruction means realizes the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.
  • Embodiments of the present application provide a text generation method and device, and a storage medium.
  • the text generation method includes: in the case of receiving a text generation instruction, obtaining a text keyword from the text generation instruction, and determining a target text type corresponding to the text keyword; if there is a target template containing the target text type in the template library, obtaining the target template from the template library, the templates in the template library being text templates provided with text types; and finding the position of the target text type in the target template, and replacing the field information corresponding to the target text type with the field information of the text keyword at the position, to obtain the target text containing the text keyword.
  • In this way, when the text generation device receives the text generation instruction, it obtains the text keyword from the text generation instruction, searches the template library for the target template that includes the target text type corresponding to the text keyword, finds the position of the target text type in the target template, and uses the field information of the text keyword to replace the field information corresponding to the target text type at this position, so as to obtain the target text containing the text keyword. There is thus no need to obtain the text information manually, which improves the intelligence of the text generation device when generating the text information.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present application disclose a text generation method and apparatus, and a storage medium. The method comprises: obtaining a text keyword from a text generation instruction when the text generation instruction is received, and determining a target text type corresponding to the text keyword; when a target template comprising the target text type exists in a template library, obtaining the target template from the template library, templates in the template library being text templates provided with text types; and searching for the position of the target text type in the target template, and replacing field information corresponding to the target text type with field information of the text keyword at the position, so as to obtain a target text comprising the text keyword.
PCT/CN2022/100545 2021-11-01 2022-06-22 Text generation method and apparatus, and storage medium WO2023071242A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111284961.6A CN114118041A (zh) 2021-11-01 2021-11-01 一种文本生成方法及装置、存储介质
CN202111284961.6 2021-11-01

Publications (1)

Publication Number Publication Date
WO2023071242A1 true WO2023071242A1 (fr) 2023-05-04

Family

ID=80379767

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/100545 WO2023071242A1 (fr) 2021-11-01 2022-06-22 Procédé et appareil de génération de texte et support de stockage

Country Status (2)

Country Link
CN (1) CN114118041A (fr)
WO (1) WO2023071242A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114118041A (zh) * 2021-11-01 2022-03-01 深圳前海微众银行股份有限公司 一种文本生成方法及装置、存储介质
CN114997131A (zh) * 2022-05-19 2022-09-02 北京沃东天骏信息技术有限公司 文案生成方法、模型训练方法及装置、设备、存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7752033B2 (en) * 2002-03-18 2010-07-06 National Institute Of Information And Communications Technology, Independent Administrative Institution Text generation method and text generation device
CN111930976A (zh) * 2020-07-16 2020-11-13 平安科技(深圳)有限公司 演示文稿生成方法、装置、设备及存储介质
CN112597312A (zh) * 2020-12-28 2021-04-02 深圳壹账通智能科技有限公司 文本分类方法、装置、电子设备及可读存储介质
CN113076756A (zh) * 2020-01-06 2021-07-06 北京沃东天骏信息技术有限公司 一种文本生成方法和装置
CN114118041A (zh) * 2021-11-01 2022-03-01 深圳前海微众银行股份有限公司 一种文本生成方法及装置、存储介质

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7752033B2 (en) * 2002-03-18 2010-07-06 National Institute Of Information And Communications Technology, Independent Administrative Institution Text generation method and text generation device
CN113076756A (zh) * 2020-01-06 2021-07-06 北京沃东天骏信息技术有限公司 一种文本生成方法和装置
CN111930976A (zh) * 2020-07-16 2020-11-13 平安科技(深圳)有限公司 演示文稿生成方法、装置、设备及存储介质
CN112597312A (zh) * 2020-12-28 2021-04-02 深圳壹账通智能科技有限公司 文本分类方法、装置、电子设备及可读存储介质
CN114118041A (zh) * 2021-11-01 2022-03-01 深圳前海微众银行股份有限公司 一种文本生成方法及装置、存储介质

Also Published As

Publication number Publication date
CN114118041A (zh) 2022-03-01

Similar Documents

Publication Publication Date Title
WO2023071242A1 (fr) Procédé et appareil de génération de texte et support de stockage
US10922488B1 (en) Computing numeric representations of words in a high-dimensional space
AU2014201827B2 (en) Scoring concept terms using a deep network
KR102129640B1 (ko) 스트링 변환의 귀납적 합성을 위한 랭킹 기법
CN110321482A (zh) 一种信息的推荐方法、装置及设备
CN1606004B (zh) 从文本标识语义结构的方法和装置
CN101473325B (zh) 基于桶的搜索
US10936950B1 (en) Processing sequential interaction data
US11475227B2 (en) Intelligent routing services and systems
EP3580698B1 (fr) Placement de dispositif hiérarchique avec apprentissage par renforcement
CN110678882B (zh) 使用机器学习从电子文档选择回答跨距的方法及系统
US20210004370A1 (en) Machine learning based plug-in for providing access to cloud-based analytics engine
EP3563302A1 (fr) Traitement de données séquentielles à l'aide de réseaux neuronaux récurrents
US11741190B2 (en) Multi-dimensional language style transfer
US20240062253A1 (en) Advertisement title rewriting method, apparatus and device, and storage medium
CN103885767A (zh) 用于地理区域相关网站的系统和方法
CN112800339B (zh) 信息流搜索方法、装置及设备
RU2564641C1 (ru) Интеллектуальная информационная система выбора "оптимэль"
JP6979899B2 (ja) 生成装置、学習装置、生成方法、学習方法、生成プログラム、及び学習プログラム
CN116188125B (zh) 一种写字楼的招商管理方法、装置、电子设备及存储介质
Carroll Beyond spreadsheets with R: A beginner's guide to R and RStudio
CN108717587A (zh) 一种基于多面排序网络解决推文预测转发任务的方法
JP2019133565A (ja) ニュース素材分類装置、プログラム及び学習モデル
US20240184982A1 (en) Hierarchical text generation using language model neural networks
CN112633479A (zh) 一种目标数据的预测方法和装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22885148

Country of ref document: EP

Kind code of ref document: A1