WO2023115914A1 - Method and device for generating document having consistent writing style, and storage medium - Google Patents

Method and device for generating document having consistent writing style, and storage medium

Info

Publication number
WO2023115914A1
Authority
WO
WIPO (PCT)
Prior art keywords
document
author
collaborative
writing
model
Prior art date
Application number
PCT/CN2022/105318
Other languages
French (fr)
Chinese (zh)
Inventor
罗清彩
孙善宝
蒋梦梦
张晖
解萌
于晓艳
张鑫
Original Assignee
山东浪潮科学研究院有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202111562107.1A (CN114239600B)
Application filed by 山东浪潮科学研究院有限公司
Publication of WO2023115914A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks

Definitions

  • The present application relates to the field of natural language processing, and in particular to a method, device, and storage medium for generating documents with a consistent writing style.
  • Reinforcement learning differs from traditional supervised learning mainly in the reinforcement signal.
  • The reinforcement signal provided by the environment is an evaluation of the quality of the action taken.
  • Collaborative writing, or cooperative writing, refers to a writing project completed by several people together rather than by one individual alone. It mostly takes the form of crowdsourcing and distribution to achieve efficient content production and collaboration.
  • This application provides a method, device, and storage medium for generating documents with a consistent writing style, which solve the technical problem of inconsistent writing style and writing content when multiple people write collaboratively.
  • A method for generating a document with a consistent writing style, comprising:
  • Before the collaborative document is generated by the collaborative writing model, the method further includes pre-training the bidirectional encoder representation transformer BERT, including: collecting the text content of authors on the collaborative writing platform to form a document library; selecting an author's documents from the document library and training the general BERT model on them to obtain a personalized BERT author model; training an author content generator from the general BERT model and the BERT author model; training an author writing style discriminator from the BERT author model and a linear classifier; and training an author content rationality discriminator from the BERT author model and the linear classifier.
  • Before the collaborative document is generated by the collaborative writing model, the method further includes training the collaborative writing model, including: downloading documents from the collaborative writing platform; inputting the main author document into the document encoder to generate the context vector of the main author document; inputting the secondary author document into the document encoder to generate the sentence vector set of the secondary author document; passing the context vector and the sentence vector set through the author content generator to construct a sentence sequence; inputting the sentence sequence into the collaborative writing model, training the collaborative writing model, and generating a collaborative document; interacting the generated collaborative document with a feedback environment to obtain a feedback result; and transmitting the feedback result to the collaborative writing model, which updates its network parameters according to the feedback result, so that training yields a further optimized collaborative writing model.
  • Interacting the generated collaborative document with the feedback environment to obtain the feedback result specifically includes: forming the feedback environment from the author writing style discriminator, the author content rationality discriminator, and reader feedback, and determining a reward function; then interacting the generated collaborative document with the feedback environment and calculating the feedback result according to the reward function.
  • Evaluating according to the author writing style discriminator and the author content rationality discriminator, and selecting the collaborative document with the highest similarity as the final document, specifically includes: judging, with the author writing style discriminator, the similarity between each collaborative document and the author's writing style; judging, with the author content rationality discriminator, the similarity between each collaborative document and the author's writing content; weighting the writing style and content values computed for each collaborative document to obtain its final similarity value; and comparing the similarity values of the documents and taking the collaborative document with the highest similarity value as the final document.
  • Training the collaborative writing model specifically includes: training the collaborative writing model with the A3C algorithm; using multiple worker threads, each with the same network structure as the shared global neural network, to generate collaborative documents; inputting the collaborative documents into the feedback environment to obtain feedback results; and forming the final document generation strategy according to the feedback results.
  • Before the text content of authors on the collaborative writing platform is collected, the method further includes: the collaborative writing platform receives an author's registration, and reviews and marks the author's identity; the platform receives documents uploaded by authors; and when a document uploaded by a newly registered author appears, the new author's documents are obtained automatically for training.
  • A device for generating documents with a consistent writing style, comprising:
  • The memory stores instructions executable by the at least one processor; the instructions are executed by the at least one processor to enable the at least one processor to:
  • The at least one processor is further configured to pre-train the bidirectional encoder representation transformer BERT, including: collecting the text content of authors on the collaborative writing platform to form a document library; selecting an author's documents from the document library and training the general BERT model on them to obtain a personalized BERT author model; training an author content generator from the general BERT model and the BERT author model; training an author writing style discriminator from the BERT author model and a linear classifier; and training an author content rationality discriminator from the BERT author model and the linear classifier.
  • BERT: Bidirectional Encoder Representations from Transformers.
  • A non-volatile storage medium storing computer-executable instructions, wherein the computer-executable instructions are configured to:
  • The method, device, and storage medium provided by this application have at least the following beneficial effects: a document library is formed by collecting the text content of a large number of collaborative writing participants; a BERT model is used to build a personalized model of each author; and reinforcement learning is used to train the model through the collaborative writing platform and the interactive feedback environment. The result is a consistent-style collaborative writing model that generates documents with a uniform format, a consistent content style, and more accurate and reasonable content.
  • Collaborative documents generated by a model built with deep learning and reinforcement learning can better uncover the internal semantic connections of an article and more accurately imitate the author's writing style.
  • Pre-training with the BERT model makes it possible to form a targeted language model based on the author's actual documents.
  • FIG. 1 is a schematic diagram of the steps of a method for generating a consistent writing style document provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of model training provided by an embodiment of the present application.
  • FIG. 3 is a schematic composition diagram of a device for generating consistent writing style documents provided by an embodiment of the present application.
  • In reinforcement learning, the reinforcement signal provided by the environment is an evaluation of the quality of the action taken (usually a scalar signal), rather than an instruction telling the reinforcement learning system (RLS) how to produce the correct action.
  • Through interaction between an agent and its environment, reinforcement learning continuously learns to take optimal actions in different environments and uses this experience to form strategies, enabling higher machine intelligence. Reinforcement learning has been applied to robot control, autonomous driving, and recommender systems, and has surpassed human performance in a number of areas.
  • BERT (Bidirectional Encoder Representations from Transformers) is the encoder of a bidirectional Transformer. Compared with traditional natural language processing approaches, BERT is a revolutionary approach: it has important applications in natural language processing and has inspired many existing computing frameworks and training methods. In particular, its ability to abstract features from long continuous sequences has made it one of the most important language processing models at present.
  • The text content of a large number of collaborative writing participants is collected to form a document library. The BERT model is trained on the documents of different authors to form personalized BERT author models, and each BERT author model is used to build that author's content generator model, writing style discriminator model, and content rationality discriminator model. Reinforcement learning then trains the collaborative writing model through the collaborative writing platform and the interactive feedback environment, forming a consistent-style collaborative writing model that generates documents with a uniform format, a consistent content style, and more accurate and reasonable content.
  • FIG. 1 is a schematic diagram of the steps of a method for generating a consistent writing style document provided in an embodiment of the present application, which may include the following steps:
  • S101: Obtain the main author document through the collaborative writing platform, input the main author document into the document encoder, and generate a context vector.
  • S102: Obtain the secondary author document through the collaborative writing platform, input the secondary author document into the document encoder, and generate a set of document sentence vectors.
  • S103: Input the context vector and the set of document sentence vectors into the collaborative writing model to generate collaborative documents in the same style as the main author.
  • S104: Generate a plurality of collaborative documents, evaluate them with the author writing style discriminator and the author content rationality discriminator, and select the collaborative document with the highest similarity as the final document.
  • Before the authors' text content is collected on the collaborative writing platform, the platform receives an author's registration, and verifies and marks the author's identity; it receives the documents that authors upload; and when a document uploaded by a newly registered author appears, it automatically obtains the new author's documents for training.
  • The author content generator is trained.
  • The core of the author content generator ContentGen is the Transformer model.
  • The content generator, trained on the basis of the BERT language model, is used to generate sentences that conform to the author's style.
  • The author writing style discriminator is trained.
  • The core of the author writing style discriminator StyleClzfier consists of a BERT language model and a linear classifier, and it determines whether an input document matches the author's writing style.
  • The author content rationality discriminator is trained.
  • The core of the author content rationality discriminator ContClzfier consists of a BERT language model and a linear classifier, and it judges whether the semantics of an input document are reasonable.
  • The collaborative writing model must be trained before it is used to generate collaborative documents.
  • The core of the collaborative writing model is a neural network model based on the author content generator and a format generator.
  • A main author is selected for training, and the model generates a consistent-style collaborative document C; the A3C training method is used to interact with the interactive feedback environment, and a generation strategy is finally formed.
  • The interactive feedback environment is composed of the author writing style discriminator StyleClzfier, the author content rationality discriminator ContClzfier, and the reader feedback ReaderClzfier.
  • The reader feedback is the direct evaluation of the content by specific readers.
  • The collaborative writing platform is deployed in a cloud data center and provides services such as author registration review, reader management, online collaborative editing, and proofreading.
  • The cloud data center provides computing, storage, networking, and other cloud infrastructure services. It handles document collection, supplies base models such as the BERT language model, and provides the computing power, storage, and environment required for deep learning and reinforcement learning training tasks.
  • Multiple worker threads, each using the same network structure as the shared global neural network, generate collaborative documents; the collaborative documents are input into the feedback environment to obtain feedback results; and the final document generation strategy is formed from the feedback results.
  • The feedback environment is formed from the author writing style discriminator, the author content rationality discriminator, and reader feedback, and the reward function is determined; the generated collaborative document interacts with the feedback environment, and the feedback result is calculated according to the reward function.
  • The feedback results are transmitted to the collaborative writing model, which updates its network parameters accordingly; training yields a further optimized collaborative writing model.
  • The collaborative documents are evaluated with the author writing style discriminator and the author content rationality discriminator, and the collaborative document with the highest similarity is selected as the final document.
  • The author writing style discriminator judges the similarity between each collaborative document and the author's writing style; the author content rationality discriminator judges the similarity between each collaborative document and the author's writing content; the two values are weighted to obtain the final similarity value of each collaborative document; the similarity values are compared, and the collaborative document with the highest similarity value is taken as the final document.
  • Writing styles include romanticism, unrestrained style, postmodern style, documentary style, ideological genre, youth literature, network literature, romance style, critical style, pure literature style, etc.
  • In one example, the writing style of the main author's document is the critical style, its content is criticism of social phenomena, and its format is the total score format, while the secondary author's document is in the total score format or another format.
  • After the collaborative documents are generated by the collaborative writing model, the similarity between each document's style and the critical style, between its content and the social phenomena, and between its format and the total score format is judged. These similarities are weighted to obtain the similarity value of each document; the values are compared, and the collaborative document with the highest similarity value is taken as the final document.
  • An embodiment of the present application also provides a corresponding device for generating a consistent writing style document, as shown in FIG. 3.
  • This embodiment provides a device for generating consistent writing style documents, including:
  • The memory stores instructions executable by the at least one processor; the instructions are executed by the at least one processor to enable the at least one processor to:
  • The at least one processor is further configured to pre-train the bidirectional encoder representation transformer BERT, including: collecting the text content of authors on the collaborative writing platform to form a document library; selecting an author's documents from the document library and training the general BERT model on them to obtain a personalized BERT author model; training an author content generator from the general BERT model and the BERT author model; training an author writing style discriminator from the BERT author model and a linear classifier; and training an author content rationality discriminator from the BERT author model and the linear classifier.
  • Some embodiments of the present application also provide media corresponding to the above methods and devices.
  • Some embodiments of the present application provide a storage medium for generating consistent writing style documents, which stores computer-executable instructions, the computer-executable instructions being configured to:
  • The embodiments in the present application are described progressively: identical and similar parts of the embodiments can be cross-referenced, and each embodiment focuses on its differences from the others.
  • Where a description is relatively brief, refer to the corresponding parts of the method embodiments.
  • The devices and media provided in the embodiments of the present application correspond one-to-one to the methods, so they have beneficial technical effects similar to those of the corresponding methods. Since the beneficial technical effects of the methods have been described in detail above, they are not repeated here for the devices and media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Molecular Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Document Processing Apparatus (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed are a method and device for generating a document having a consistent writing style, and a storage medium. The method comprises: obtaining a main author document by means of a collaborative writing platform, and inputting the main author document into a document encoder to generate a context vector; obtaining a secondary author document by means of the collaborative writing platform, and inputting the secondary author document into the document encoder to generate a document statement vector set; inputting the context vector and the document statement vector set into a collaborative writing model to generate cooperative documents having the same style as the main author; and generating a plurality of cooperative documents, evaluating according to an author writing style discriminator and the author content rationality discriminator, and selecting a cooperative document having the highest similarity as a final document.

Description

A method, device, and storage medium for generating documents with a consistent writing style
Technical field
The present application relates to the field of natural language processing, and in particular to a method, device, and storage medium for generating documents with a consistent writing style.
Background
In recent years, reinforcement learning has received widespread attention and, especially in combination with deep learning, has brought great progress to the field of artificial intelligence. Reinforcement learning differs from traditional supervised learning mainly in the reinforcement signal: in reinforcement learning, the reinforcement signal provided by the environment is an evaluation of the quality of the action taken.
With the development of the Internet, digital content production has received more and more attention, and collaborative writing has gradually become an important means of content production. Collaborative writing, or cooperative writing, refers to a writing project completed by several people together rather than by one individual alone. It mostly takes the form of crowdsourcing and distribution to achieve efficient content production and collaboration.
Because the authors taking part in collaborative writing differ in writing style, whether in format or in content, the final document is often inconsistent in content and style, which degrades the reading experience.
A solution is therefore needed that keeps the content and style of collaboratively written documents consistent, so as to improve the user's reading experience.
Summary of the invention
This application provides a method, device, and storage medium for generating documents with a consistent writing style, which solve the technical problem of inconsistent writing style and writing content when multiple people write collaboratively.
A method for generating a document with a consistent writing style, comprising:
obtaining a main author document through a collaborative writing platform, and inputting the main author document into a document encoder to generate a context vector;
obtaining a secondary author document through the collaborative writing platform, and inputting the secondary author document into the document encoder to generate a set of document sentence vectors;
inputting the context vector and the set of document sentence vectors into a collaborative writing model to generate a collaborative document in the same style as the main author;
generating a plurality of collaborative documents, evaluating them with an author writing style discriminator and an author content rationality discriminator, and selecting the collaborative document with the highest similarity as the final document.
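The four steps above can be sketched as a small pipeline. This is only an illustrative sketch under stated assumptions, not the claimed implementation: `encode_document` is a toy hashing encoder standing in for the document encoder, and `generate` and `score` are hypothetical callables standing in for the collaborative writing model and the combined discriminator evaluation.

```python
from typing import Callable, List

Vector = List[float]


def encode_document(text: str, dim: int = 8) -> Vector:
    """Toy stand-in for the document encoder (assumption): a normalized
    histogram of character codes. A real system would use a trained encoder."""
    vec = [0.0] * dim
    for ch in text:
        vec[ord(ch) % dim] += 1.0
    total = sum(vec) or 1.0
    return [v / total for v in vec]


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter used only for this sketch."""
    return [s.strip() for s in text.split(".") if s.strip()]


def generate_final_document(
    main_doc: str,
    secondary_doc: str,
    generate: Callable[[Vector, List[Vector], int], str],
    score: Callable[[str], float],
    n_candidates: int = 3,
) -> str:
    # Step 1: main author document -> context vector.
    context = encode_document(main_doc)
    # Step 2: secondary author document -> set of sentence vectors.
    sentence_vectors = [encode_document(s) for s in split_sentences(secondary_doc)]
    # Step 3: the (hypothetical) collaborative writing model produces candidates.
    candidates = [generate(context, sentence_vectors, i) for i in range(n_candidates)]
    # Step 4: keep the candidate that the evaluation scores highest.
    return max(candidates, key=score)
```

Passing the generator and the scorer in as callables keeps the sketch independent of any particular model; both would be replaced by the trained components described in the embodiments.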
In an embodiment of the present application, before the collaborative document is generated by the collaborative writing model, the method further includes pre-training the bidirectional encoder representation transformer BERT, including: collecting the text content of authors on the collaborative writing platform to form a document library; selecting an author's documents from the document library and training the general BERT model on them to obtain a personalized BERT author model; training an author content generator from the general BERT model and the BERT author model; training an author writing style discriminator from the BERT author model and a linear classifier; and training an author content rationality discriminator from the BERT author model and the linear classifier.
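To illustrate the "BERT author model plus linear classifier" structure of the two discriminators, the following minimal sketch shows only the linear classifier part. It assumes the BERT author model has already turned a document into a fixed-length feature vector; the class name `LinearDiscriminator`, the learning rate, and the plain gradient-descent training loop are assumptions made for illustration and are not taken from the application.

```python
import math
from typing import List, Sequence


class LinearDiscriminator:
    """Logistic linear classifier over a document feature vector.

    Assumption: the feature vector comes from a separate encoder (for
    example a BERT author model), which is not part of this sketch."""

    def __init__(self, dim: int, lr: float = 0.5) -> None:
        self.w: List[float] = [0.0] * dim
        self.b: float = 0.0
        self.lr = lr

    def predict(self, x: Sequence[float]) -> float:
        """Return the probability that the document matches the author."""
        z = sum(w * xi for w, xi in zip(self.w, x)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, xs: Sequence[Sequence[float]], ys: Sequence[int],
            epochs: int = 200) -> None:
        """Stochastic gradient descent on the logistic (cross-entropy) loss."""
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                err = self.predict(x) - y
                self.w = [w - self.lr * err * xi for w, xi in zip(self.w, x)]
                self.b -= self.lr * err
```

The same structure could serve both the style discriminator and the content rationality discriminator, trained on different labels (author versus non-author text, reasonable versus unreasonable content).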
In an embodiment of the present application, before the collaborative document is generated by the collaborative writing model, the method further includes training the collaborative writing model, including: downloading documents from the collaborative writing platform; inputting the main author document into the document encoder to generate the context vector of the main author document; inputting the secondary author document into the document encoder to generate the sentence vector set of the secondary author document; passing the context vector and the sentence vector set through the author content generator to construct a sentence sequence; inputting the sentence sequence into the collaborative writing model, training the collaborative writing model, and generating a collaborative document; interacting the generated collaborative document with a feedback environment to obtain a feedback result; and transmitting the feedback result to the collaborative writing model, which updates its network parameters according to the feedback result, so that training yields a further optimized collaborative writing model.
In an embodiment of the present application, interacting the generated collaborative document with the feedback environment to obtain the feedback result specifically includes: forming the feedback environment from the author writing style discriminator, the author content rationality discriminator, and reader feedback, and determining a reward function; then interacting the generated collaborative document with the feedback environment and calculating the feedback result according to the reward function.
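The application does not give the form of the reward function. One possible shape, shown here purely as an assumption, is a weighted sum of the three feedback sources, each normalized to the range 0 to 1; the function name `reward` and the default weights are hypothetical.

```python
from typing import Tuple


def reward(style_prob: float, content_prob: float, reader_score: float,
           weights: Tuple[float, float, float] = (0.4, 0.4, 0.2)) -> float:
    """Combine the three feedback signals into one scalar reward.

    Assumptions: every signal lies in [0, 1], and the weights (hypothetical
    values) express how much each feedback source counts."""
    signals = (style_prob, content_prob, reader_score)
    for value in signals:
        if not 0.0 <= value <= 1.0:
            raise ValueError("feedback signals must lie in [0, 1]")
    return sum(w * s for w, s in zip(weights, signals))
```

A scalar reward of this kind matches the description of the reinforcement signal as an evaluation of the generated action rather than an instruction on how to act.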
In an embodiment of the present application, evaluating according to the author writing style discriminator and the author content rationality discriminator, and selecting the collaborative document with the highest similarity as the final document, specifically includes: judging, with the author writing style discriminator, the similarity between each collaborative document and the author's writing style; judging, with the author content rationality discriminator, the similarity between each collaborative document and the author's writing content; weighting the writing style and content values computed for each collaborative document to obtain its final similarity value; and comparing the similarity values of the documents and taking the collaborative document with the highest similarity value as the final document.
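The weighted selection described in this embodiment can be sketched as follows. The `Candidate` type and the default weights are assumptions for illustration; the application does not specify the weight values.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Candidate:
    text: str
    style_similarity: float    # from the writing style discriminator
    content_similarity: float  # from the content rationality discriminator


def select_final_document(candidates: Sequence[Candidate],
                          style_weight: float = 0.5,
                          content_weight: float = 0.5) -> Candidate:
    """Weight the two similarity values and return the best candidate.

    The weights are hypothetical; they are not given in the application."""
    if not candidates:
        raise ValueError("at least one candidate document is required")

    def final_similarity(c: Candidate) -> float:
        return style_weight * c.style_similarity + content_weight * c.content_similarity

    return max(candidates, key=final_similarity)
```

Changing the weights changes which candidate wins: a style-heavy weighting favors the document closest to the main author's style even if its content score is lower.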
In an embodiment of the present application, training the collaborative writing model specifically includes: training the collaborative writing model with the A3C algorithm; using multiple worker threads, each with the same network structure as the shared global neural network, to generate collaborative documents; inputting the collaborative documents into the feedback environment to obtain feedback results; and forming the final document generation strategy according to the feedback results.
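The worker-thread structure of A3C (asynchronous advantage actor-critic) can be illustrated with a heavily simplified sketch. Everything concrete here is an assumption: the "network" is reduced to a list of policy logits plus one value estimate, each episode is a single step, and `REWARDS` is a hypothetical fixed feedback table standing in for the feedback environment. A real A3C setup would use neural networks and multi-step returns.

```python
import math
import random
import threading
from typing import List, Tuple

# Hypothetical feedback for three candidate generation strategies.
REWARDS = [0.1, 0.9, 0.3]


class GlobalModel:
    """Shared parameters: policy logits (actor) and a value estimate (critic)."""

    def __init__(self, n_actions: int) -> None:
        self.logits = [0.0] * n_actions
        self.value = 0.0
        self.lock = threading.Lock()

    def snapshot(self) -> Tuple[List[float], float]:
        with self.lock:
            return list(self.logits), self.value

    def apply(self, logit_grads: List[float], value_grad: float, lr: float) -> None:
        with self.lock:
            self.logits = [l + lr * g for l, g in zip(self.logits, logit_grads)]
            self.value += lr * value_grad


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def worker(model: GlobalModel, steps: int, seed: int, lr: float = 0.1) -> None:
    rng = random.Random(seed)
    for _ in range(steps):
        logits, value = model.snapshot()      # local copy of the global parameters
        probs = softmax(logits)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        advantage = REWARDS[action] - value   # feedback minus the critic's baseline
        # Policy-gradient term: d log pi(action) / d logit_i = 1[i == action] - p_i.
        grads = [((1.0 if i == action else 0.0) - p) * advantage
                 for i, p in enumerate(probs)]
        model.apply(grads, advantage, lr)     # asynchronous update of the global model


def train(n_workers: int = 4, steps: int = 500) -> List[float]:
    model = GlobalModel(len(REWARDS))
    threads = [threading.Thread(target=worker, args=(model, steps, seed))
               for seed in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return softmax(model.logits)
```

Calling `train()` returns the learned probabilities over the three strategies; under these assumptions the strategy with the highest feedback should end up with the largest probability, which corresponds to "forming the final document generation strategy according to the feedback results".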
In an embodiment of the present application, before the text content of authors on the collaborative writing platform is collected, the method further includes: the collaborative writing platform receives an author's registration, and reviews and marks the author's identity; the platform receives documents uploaded by authors; and when a document uploaded by a newly registered author appears, the new author's documents are obtained automatically for training.
A device for generating a document with a consistent writing style, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to:
obtain a main author document through a collaborative writing platform, and input the main author document into a document encoder to generate a context vector;
obtain secondary author documents through the collaborative writing platform, and input the secondary author documents into the document encoder to generate a set of document sentence vectors;
input the context vector and the set of document sentence vectors into a collaborative writing model to generate a collaborative document in the same style as the main author;
generate multiple collaborative documents, evaluate them with an author writing style discriminator and an author content plausibility discriminator, and select the collaborative document with the highest similarity as the final document.
In one embodiment of the present application, the at least one processor is further configured to pre-train the Bidirectional Encoder Representations from Transformers (BERT) model, including: collecting the text content of authors on the collaborative writing platform to form a document library; selecting an author's documents from the document library and training the general BERT model on them to obtain a personalized BERT author model; training an author content generator from the general BERT model and the BERT author model; training an author writing style discriminator from the BERT author model and a linear classifier; and training an author content plausibility discriminator from the BERT author model and the linear classifier.
A non-volatile storage medium storing computer-executable instructions, the computer-executable instructions being configured to:
obtain a main author document through a collaborative writing platform, and input the main author document into a document encoder to generate a context vector;
obtain secondary author documents through the collaborative writing platform, and input the secondary author documents into the document encoder to generate a set of document sentence vectors;
input the context vector and the set of document sentence vectors into a collaborative writing model to generate a collaborative document in the same style as the main author;
generate multiple collaborative documents, evaluate them with an author writing style discriminator and an author content plausibility discriminator, and select the collaborative document with the highest similarity as the final document.
The present application provides a method, a device, and a storage medium for generating a document with a consistent writing style, with at least the following beneficial effects. A document library is formed by collecting the text content of a large number of collaborative writing participants; a personalized model of each author is built on the BERT model; and reinforcement learning is used to train the model through the collaborative writing platform and an interactive feedback environment. The result is a consistent-style collaborative writing model that can generate documents with a uniform format and a consistent content style that are more accurate and plausible. Compared with traditional ways of unifying content style, collaborative documents produced by a model built with deep learning and reinforcement learning capture the internal semantic connections of an article better and imitate the author's writing style more accurately. Pre-training with the BERT model yields a targeted language model based on the author's actual written documents, which on the one hand improves training efficiency and makes reasonable use of existing resources, and on the other hand better meets domain-specific personalization needs. Reinforcement learning training makes effective use of the collaborative writing platform's resources: deep learning discriminators, such as the author content discriminator and the author writing style discriminator, judge the plausibility and validity of the generated documents, while feedback from actual readers on the platform is also used to train the model. This improves the training result and produces a consistent-style document generation model that better matches the reading experience of real users.
Brief Description of the Drawings
The drawings described here provide a further understanding of the present application and form a part of it. The illustrative embodiments of the present application and their descriptions are used to explain the present application and do not unduly limit it. In the drawings:
FIG. 1 is a schematic diagram of the steps of a method for generating a document with a consistent writing style provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of model training provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of the composition of a device for generating a document with a consistent writing style provided by an embodiment of the present application.
Detailed Description
To make the purpose, technical solutions, and advantages of the present application clearer, the present application is described clearly and completely below with reference to specific embodiments. The described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in the present application without creative effort fall within the scope of protection of the present application.
In recent years, reinforcement learning has attracted wide attention, and its combination with deep learning in particular has brought major progress to the field of artificial intelligence. Reinforcement learning differs from traditional supervised learning mainly in its reinforcement signal: the signal provided by the environment is an evaluation of how good an action was (usually a scalar signal), rather than an instruction telling the reinforcement learning system (RLS) how to produce the correct action. Through tasks in which an agent interacts with the environment, reinforcement learning keeps learning to take the best action in different environments and uses what it perceives to form a policy, which allows it to achieve a higher level of machine intelligence. Reinforcement learning has been applied in robot control, autonomous driving, recommender systems, and other fields, and has surpassed human performance in many of them.
Bidirectional Encoder Representations from Transformers (BERT) is the encoder of a bidirectional Transformer. Compared with traditional natural language processing approaches, BERT represents a major advance. It has important applications in natural language processing and has inspired many existing computational frameworks and training methods. In particular, its ability to abstract features from long continuous sequences has made it one of the most important language processing models at present.
In one embodiment of the present application, the text content of a large number of collaborative writing participants is collected to form a document library. The BERT model is trained on the documents of each author to form a personalized BERT author model, and the BERT author model is used to build an author content generator model, an author writing style discriminator model, and an author content plausibility discriminator model for that specific author. Reinforcement learning is then used to train the collaborative writing model through the collaborative writing platform and an interactive feedback environment, forming a consistent-style collaborative writing model that generates documents with a uniform format and a consistent content style that are more accurate and plausible. Details are given below.
FIG. 1 is a schematic diagram of the steps of a method for generating a document with a consistent writing style provided by an embodiment of the present application. The method may include the following steps:
S101: Obtain the main author document through the collaborative writing platform, and input the main author document into the document encoder to generate a context vector.
S102: Obtain the secondary author documents through the collaborative writing platform, and input the secondary author documents into the document encoder to generate a set of document sentence vectors.
S103: Input the context vector and the set of document sentence vectors into the collaborative writing model to generate a collaborative document in the same style as the main author.
S104: Generate multiple collaborative documents, evaluate them with the author writing style discriminator and the author content plausibility discriminator, and select the collaborative document with the highest similarity as the final document.
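Steps S101 to S104 can be read as a small pipeline. The following is a minimal sketch under stated assumptions, not the applicant's implementation: the function name `generate_final_document`, the callable parameters `encode`, `write`, and `score`, the sentence split on periods, and the toy stand-ins are all hypothetical placeholders for the document encoder, the collaborative writing model, and the two discriminators.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def generate_final_document(
    main_doc: str,
    secondary_docs: Sequence[str],
    encode: Callable[[str], Vector],
    write: Callable[[Vector, List[Vector], int], str],
    score: Callable[[str], float],
    num_candidates: int = 3,
) -> str:
    # S101: encode the main author's document into a context vector.
    context = encode(main_doc)
    # S102: encode every sentence of the secondary authors' documents.
    # Splitting on "." is an assumption made only for this sketch.
    sentence_vectors = [
        encode(sentence)
        for doc in secondary_docs
        for sentence in doc.split(".")
        if sentence.strip()
    ]
    # S103: generate several candidate collaborative documents.
    candidates = [
        write(context, sentence_vectors, seed) for seed in range(num_candidates)
    ]
    # S104: keep the candidate that the discriminators score highest.
    return max(candidates, key=score)


# Toy stand-ins (hypothetical) so the sketch runs end to end.
def toy_encode(text: str) -> Vector:
    return [float(len(text))]


def toy_write(context: Vector, sentences: List[Vector], seed: int) -> str:
    return f"draft-{seed}"


def toy_score(document: str) -> float:
    return 1.0 if document.endswith("1") else 0.0


final = generate_final_document(
    "Main text.", ["Part one. Part two."], toy_encode, toy_write, toy_score
)
```

Passing the encoder, writer, and scorer as callables keeps the control flow of the four steps visible without committing to any particular model library.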
In one embodiment of the present application, before the collaborative document is generated by the collaborative writing model, the Bidirectional Encoder Representations from Transformers (BERT) model needs to be pre-trained.
In one embodiment of the present application, before the text content of authors on the collaborative writing platform is collected, the collaborative writing platform receives an author's registration and verifies and tags the author's identity; it receives documents uploaded by the author to the collaborative writing platform; and, when a document uploaded by a newly registered author appears, it automatically obtains the new author's documents for training.
The text content of authors is collected on the collaborative writing platform to form a document library, and the documents in the library are then used to train each of the models involved until they are usable. First, the BERT model is pre-trained to form a personalized BERT model for the author.
As shown in FIG. 2, the author's documents are selected from the document library and input into the general BERT model, and the general BERT model is trained on them to obtain a personalized BERT author model.
The author content generator is trained from the general BERT model and the BERT author model. The core of the author content generator, ContentGen, is a Transformer model; it is a content generator trained on the basis of the BERT language model and is used to generate sentences that match the author's style.
The author writing style discriminator is trained from the BERT author model and a linear classifier. The core of the author writing style discriminator, StyleClzfier, consists of the BERT language model and a linear classifier, and it determines whether an input document matches the author's writing style.
The author content plausibility discriminator is trained from the BERT author model and a linear classifier. The core of the author content plausibility discriminator, ContClzfier, consists of the BERT language model and a linear classifier, and it determines whether the semantics of an input document are plausible.
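Both discriminators are described as a BERT language model followed by a linear classifier. The sketch below illustrates only the linear classifier part, assuming the BERT author model has already turned each document into a fixed-size embedding. The class name `LinearDiscriminator`, the two-dimensional toy embeddings, and the learning rate are hypothetical; a real system would feed embeddings produced by the language model.

```python
import math
from typing import List, Sequence


class LinearDiscriminator:
    """Linear layer plus sigmoid over a document embedding (illustrative only)."""

    def __init__(self, dim: int, lr: float = 0.5) -> None:
        self.weights = [0.0] * dim
        self.bias = 0.0
        self.lr = lr

    def predict(self, embedding: Sequence[float]) -> float:
        # Probability that the document matches the author (1.0 = certain match).
        z = sum(w * x for w, x in zip(self.weights, embedding)) + self.bias
        return 1.0 / (1.0 + math.exp(-z))

    def fit(
        self,
        embeddings: Sequence[Sequence[float]],
        labels: Sequence[int],
        epochs: int = 200,
    ) -> None:
        # Plain stochastic gradient descent on the logistic loss.
        for _ in range(epochs):
            for x, y in zip(embeddings, labels):
                error = self.predict(x) - y
                self.weights = [
                    w - self.lr * error * xi for w, xi in zip(self.weights, x)
                ]
                self.bias -= self.lr * error


# Toy embeddings: label 1 = written by the author, label 0 = not by the author.
train_x: List[List[float]] = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_y = [1, 1, 0, 0]

style_discriminator = LinearDiscriminator(dim=2)
style_discriminator.fit(train_x, train_y)
```

The same head can serve as the style discriminator or the content plausibility discriminator; only the training labels differ.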
In one embodiment of the present application, before the collaborative document is generated by the collaborative writing model, the collaborative writing model needs to be trained. The core of the collaborative writing model π is a neural network model based on the author content generator and a format generator. The documents of the several authors taking part in the collaborative writing are input (main author document T and secondary author documents X1 to Xn), the main author is selected for training, and a collaborative document C in a consistent style is produced. The A3C training method is used to interact with the interactive feedback environment, and a generation policy is finally formed. The interactive feedback environment consists of the author writing style discriminator StyleClzfier, the author content plausibility discriminator ContClzfier, and reader feedback ReaderClzfier; reader feedback is the evaluation that actual readers give of the content directly.
Specifically, documents are downloaded from the collaborative writing platform, and the main author document is input into the document encoder to generate the context vector of the main author document. The collaborative writing platform is deployed in a cloud data center and provides services such as author registration and review, reader management, online collaborative editing, and proofreading. The cloud data center provides cloud infrastructure services such as computing, storage, and networking; it collects the documents, provides base models such as the BERT language model, and supplies the computing power, storage, and environment needed for deep learning and reinforcement learning training tasks.
The secondary author documents are input into the document encoder to generate the set of sentence vectors of the secondary author documents. The context vector and the set of sentence vectors are passed through the author content generator to construct a sentence sequence. The sentence sequence is input into the collaborative writing model, which is trained with the A3C algorithm: multiple worker threads, each with the same network structure as the shared neural network of the global model, generate collaborative documents; the collaborative documents are input into the feedback environment to obtain feedback results; and the final document generation policy is formed from the feedback results.
The feedback environment is formed from the author writing style discriminator, the author content plausibility discriminator, and reader feedback, and the reward function is determined. The generated collaborative document interacts with the feedback environment, and the feedback result is computed with the reward function.
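The text does not give the form of the reward function. One simple possibility, shown here purely as an assumption, is a weighted sum of the two discriminator outputs and the reader rating, each taken to lie in [0, 1]. The function name `feedback_reward`, the default weights, and the fallback for documents that have no reader rating yet are all hypothetical.

```python
from typing import Optional


def feedback_reward(
    style: float,
    content: float,
    reader: Optional[float] = None,
    w_style: float = 0.4,
    w_content: float = 0.4,
    w_reader: float = 0.2,
) -> float:
    """Combine discriminator scores and optional reader feedback into one scalar."""
    if reader is None:
        # No reader rating yet: renormalize over the two discriminator scores.
        return (w_style * style + w_content * content) / (w_style + w_content)
    return w_style * style + w_content * content + w_reader * reader


# A document that matches the style well but has a mediocre reader rating.
example_reward = feedback_reward(style=0.9, content=0.8, reader=0.5)
```

A scalar reward of this kind fits the reinforcement learning setting described earlier, where the environment evaluates how good an action was rather than prescribing the correct one.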
The feedback result is passed back to the collaborative writing model, which updates its network parameters according to the feedback result; training in this way yields a further optimized collaborative writing model.
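The worker and global model arrangement can be illustrated with a toy example. This sketch shows only the asynchronous parameter sharing pattern (each worker copies the global parameters, computes an update locally, and applies it to the global model); it is not a full A3C implementation, which would also need actor and critic networks and advantage estimates. The single scalar parameter, the quadratic toy loss, and the names `GlobalModel`, `worker`, and `train` are assumptions made for illustration.

```python
import threading
from typing import List


class GlobalModel:
    """Shared parameters that every worker reads from and writes to."""

    def __init__(self, value: float) -> None:
        self.value = value
        self.lock = threading.Lock()

    def snapshot(self) -> float:
        with self.lock:
            return self.value

    def apply_gradient(self, grad: float, lr: float) -> None:
        with self.lock:
            self.value -= lr * grad


def worker(model: GlobalModel, target: float, steps: int, lr: float) -> None:
    for _ in range(steps):
        local = model.snapshot()        # sync the local copy from the global model
        grad = 2.0 * (local - target)   # gradient of the toy loss (local - target)^2
        model.apply_gradient(grad, lr)  # push the update without waiting for others


def train(
    num_workers: int = 4, steps: int = 200, target: float = 3.0, lr: float = 0.05
) -> float:
    model = GlobalModel(0.0)
    threads: List[threading.Thread] = [
        threading.Thread(target=worker, args=(model, target, steps, lr))
        for _ in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return model.value
```

Because workers may apply gradients computed from slightly stale copies, the step size is kept small here so that the toy problem still converges.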
In one embodiment of the present application, the collaborative documents are evaluated with the author writing style discriminator and the author content plausibility discriminator, and the collaborative document with the highest similarity is selected as the final document.
Specifically, the author writing style discriminator determines how similar each collaborative document is to the author's writing style; the author content plausibility discriminator determines how similar each collaborative document is to the author's written content; the writing style value and the content value computed for each collaborative document are weighted to obtain a final similarity value for that document; and the similarity values of the documents are compared, with the collaborative document that has the highest similarity value taken as the final document.
For example, writing styles include romanticism, the bold and unconstrained style, postmodernism, documentary writing, stream of consciousness, youth literature, internet literature, romance, critical writing, pure literature, and so on. Suppose the main author document is written in a critical style, its content is a critique of a social phenomenon, and its format follows a general-specific-general structure, while the secondary author documents follow a general-specific structure or some other format. After the collaborative documents are generated by the collaborative writing model, the similarity between each collaborative document's style and the critical style, between its content and the social phenomenon, and between its format and the general-specific-general structure is determined. These similarities are weighted to obtain a similarity value for each document, the similarity values are compared, and the collaborative document with the highest similarity value is taken as the final document.
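The weighting and selection step amounts to a weighted sum followed by an argmax. The sketch below assumes each candidate already has style, content, and format similarities in [0, 1]; the candidate names, the score values, and the weights are made up for illustration and do not come from the application.

```python
from typing import Dict, Sequence, Tuple


def weighted_similarity(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of per-dimension similarity scores."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(w * s for w, s in zip(weights, scores))


def select_final_document(
    candidates: Dict[str, Sequence[float]], weights: Sequence[float]
) -> Tuple[str, float]:
    """Return the name and similarity value of the best collaborative document."""
    totals = {
        name: weighted_similarity(scores, weights)
        for name, scores in candidates.items()
    }
    best = max(totals, key=totals.get)
    return best, totals[best]


# Scores are (style, content, format) similarities; all values are illustrative.
candidates = {
    "C1": (0.9, 0.6, 0.5),
    "C2": (0.8, 0.8, 0.9),
    "C3": (0.7, 0.9, 0.4),
}
best_name, best_value = select_final_document(candidates, weights=(0.5, 0.3, 0.2))
```

With these numbers, C1 scores 0.73, C2 scores 0.82, and C3 scores 0.70, so C2 is selected even though C1 has the highest style score alone.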
The above is a method for generating a document with a consistent writing style provided by an embodiment of the present application. Based on the same inventive idea, an embodiment of the present application also provides a corresponding device for generating a document with a consistent writing style, as shown in FIG. 3.
This embodiment provides a device for generating a document with a consistent writing style, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to:
obtain a main author document through a collaborative writing platform, and input the main author document into a document encoder to generate a context vector;
obtain secondary author documents through the collaborative writing platform, and input the secondary author documents into the document encoder to generate a set of document sentence vectors;
input the context vector and the set of document sentence vectors into a collaborative writing model to generate a collaborative document in the same style as the main author;
generate multiple collaborative documents, evaluate them with the author writing style discriminator and the author content plausibility discriminator, and select the collaborative document with the highest similarity as the final document.
In one embodiment of the present application, the at least one processor is further configured to pre-train the Bidirectional Encoder Representations from Transformers (BERT) model, including: collecting the text content of authors on the collaborative writing platform to form a document library; selecting an author's documents from the document library and training the general BERT model on them to obtain a personalized BERT author model; training an author content generator from the general BERT model and the BERT author model; training an author writing style discriminator from the BERT author model and a linear classifier; and training an author content plausibility discriminator from the BERT author model and the linear classifier.
Based on the same idea, some embodiments of the present application also provide a medium corresponding to the above method and device.
Some embodiments of the present application provide a storage medium for generating a document with a consistent writing style, storing computer-executable instructions, the computer-executable instructions being configured to:
obtain a main author document through a collaborative writing platform, and input the main author document into a document encoder to generate a context vector;
obtain secondary author documents through the collaborative writing platform, and input the secondary author documents into the document encoder to generate a set of document sentence vectors;
input the context vector and the set of document sentence vectors into a collaborative writing model to generate a collaborative document in the same style as the main author;
generate multiple collaborative documents, evaluate them with the author writing style discriminator and the author content plausibility discriminator, and select the collaborative document with the highest similarity as the final document.
The embodiments in the present application are described progressively; for parts that are the same or similar across embodiments, the embodiments may be read with reference to one another, and each embodiment focuses on how it differs from the others. In particular, the device and medium embodiments are described relatively briefly because they are substantially similar to the method embodiment; for the relevant details, refer to the description of the method embodiment.
The device and medium provided by the embodiments of the present application correspond one-to-one with the method. The device and medium therefore have beneficial technical effects similar to those of the corresponding method. Since the beneficial technical effects of the method have been described in detail above, those of the device and medium are not repeated here.
It should also be noted that the terms "include", "comprise", and any variants of them are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The above are merely embodiments of the present application and are not intended to limit it. Various modifications and changes to the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.

Claims (10)

  1. A method for generating a document with a consistent writing style, characterized by comprising:
    obtaining a main author document through a collaborative writing platform, and inputting the main author document into a document encoder to generate a context vector;
    obtaining secondary author documents through the collaborative writing platform, and inputting the secondary author documents into the document encoder to generate a set of document sentence vectors;
    inputting the context vector and the set of document sentence vectors into a collaborative writing model to generate a collaborative document in the same style as the main author;
    generating multiple collaborative documents, evaluating them with an author writing style discriminator and an author content plausibility discriminator, and selecting the collaborative document with the highest similarity as the final document.
  2. The method according to claim 1, characterized in that, before the collaborative document is generated by the collaborative writing model, the method further comprises:
    pre-training the Bidirectional Encoder Representations from Transformers (BERT) model, including:
    collecting the text content of authors on the collaborative writing platform to form a document library;
    selecting the author's documents from the document library, and training the general BERT model on the author's documents to obtain a personalized BERT author model;
    training an author content generator from the general BERT model and the BERT author model;
    training the author writing style discriminator from the BERT author model and a linear classifier;
    training the author content plausibility discriminator from the BERT author model and the linear classifier.
  3. The method according to claim 2, characterized in that, before the collaborative document is generated by the collaborative writing model, the method further comprises:
    training the collaborative writing model, including:
    downloading documents from the collaborative writing platform, and inputting the main author document into the document encoder to generate the context vector of the main author document;
    inputting the secondary author documents into the document encoder to generate the set of sentence vectors of the secondary author documents;
    passing the context vector and the set of sentence vectors through the author content generator to construct a sentence sequence;
    inputting the sentence sequence into the collaborative writing model, and training the collaborative writing model to generate a collaborative document;
    making the generated collaborative document interact with a feedback environment to obtain a feedback result;
    passing the feedback result to the collaborative writing model, the collaborative writing model updating its network parameters according to the feedback result, so that training yields a further optimized collaborative writing model.
  4. The method according to claim 3, characterized in that making the generated collaborative document interact with the feedback environment to obtain a feedback result specifically comprises:
    forming the feedback environment from the author writing style discriminator, the author content plausibility discriminator, and reader feedback, and determining a reward function;
    making the generated collaborative document interact with the feedback environment, and computing the feedback result with the reward function.
  5. The method according to claim 1, wherein performing the evaluation according to the author writing style discriminator and the author content plausibility discriminator and selecting the collaborative document with the highest similarity as the final document specifically comprises:
    determining, by the author writing style discriminator, the similarity between each collaborative document and the author's writing style;
    determining, by the author content plausibility discriminator, the similarity between each collaborative document and the author's written content;
    weighting, for each collaborative document, the values calculated for the author's writing style and the author's content, to obtain a final similarity value for each collaborative document; and
    comparing the similarity values of the collaborative documents, and determining the collaborative document corresponding to the highest similarity value as the final document.
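The weighting and selection steps of claim 5 can be sketched as follows. This is a minimal example assuming each discriminator has already produced one score per candidate document; the function name `select_final_document` and the default weight of 0.5 are hypothetical, since the claim does not fix the weights.

```python
from typing import Sequence, Tuple


def select_final_document(
    style_scores: Sequence[float],
    content_scores: Sequence[float],
    style_weight: float = 0.5,
) -> Tuple[int, float]:
    """Return (index, similarity) of the candidate with the highest
    weighted similarity. style_weight is an assumed parameter."""
    if not style_scores or len(style_scores) != len(content_scores):
        raise ValueError("need one style and one content score per document")
    content_weight = 1.0 - style_weight
    # Weighted combination of the two discriminator outputs per document.
    combined = [
        style_weight * s + content_weight * c
        for s, c in zip(style_scores, content_scores)
    ]
    # Compare the similarity values and pick the highest one.
    best = max(range(len(combined)), key=combined.__getitem__)
    return best, combined[best]
```

For example, with style scores `[0.9, 0.6, 0.8]` and content scores `[0.5, 0.9, 0.8]`, the third candidate (index 2) is selected, because its two scores are both high rather than one being high at the expense of the other.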
  6. The method according to claim 3, wherein training the collaborative writing model specifically comprises:
    training the collaborative writing model by means of the A3C algorithm;
    generating collaborative documents using multiple worker threads that adopt the same network structure as the shared neural network of the global model;
    inputting the collaborative documents into the feedback environment to obtain feedback results; and
    forming a final document generation policy according to the feedback results.
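The asynchronous worker structure of claim 6 can be illustrated with a heavily simplified, A3C-style sketch. It is not the claimed collaborative writing model: the neural network is replaced by a softmax policy over three hypothetical generation strategies plus a scalar critic, and the feedback environment is replaced by the fixed reward table `REWARDS`. All names and numbers are assumptions.

```python
import math
import random
import threading

# Hypothetical feedback result for each of three generation strategies.
REWARDS = [0.1, 0.9, 0.3]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class GlobalModel:
    """Shared parameters: policy logits (actor) and a value estimate (critic)."""

    def __init__(self, n_actions):
        self.logits = [0.0] * n_actions
        self.value = 0.0
        self.lock = threading.Lock()

    def snapshot(self):
        with self.lock:
            return list(self.logits), self.value

    def apply(self, actor_grad, critic_grad, lr=0.1):
        with self.lock:
            for i, g in enumerate(actor_grad):
                self.logits[i] += lr * g
            self.value += lr * critic_grad


def worker(model, seed, steps):
    rng = random.Random(seed)  # per-thread generator, no shared random state
    for _ in range(steps):
        logits, value = model.snapshot()      # copy the global parameters
        probs = softmax(logits)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = REWARDS[action]              # stand-in for the feedback result
        advantage = reward - value            # advantage relative to the critic
        # Gradient of log pi(action) for a softmax policy, scaled by advantage.
        actor_grad = [
            advantage * ((1.0 if i == action else 0.0) - p)
            for i, p in enumerate(probs)
        ]
        model.apply(actor_grad, advantage)    # asynchronous update of the global model


def train(n_workers=4, steps=500):
    model = GlobalModel(len(REWARDS))
    threads = [
        threading.Thread(target=worker, args=(model, seed, steps))
        for seed in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return softmax(model.logits)
```

Calling `train()` returns the final policy as a probability distribution over the strategies. Because the workers update the shared parameters without waiting for one another, the exact probabilities vary between runs, while the strategy with the highest reward should end up with the largest probability.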
  7. The method according to claim 2, wherein before collecting the text content of authors on the collaborative writing platform, the method further comprises:
    receiving, by the collaborative writing platform, an author's registration, and verifying and tagging the author's identity;
    receiving documents uploaded by the author to the collaborative writing platform; and
    when a document uploaded by a newly registered author appears, automatically obtaining the new author's document for training.
  8. A device for generating a document having a consistent writing style, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to:
    obtain a main author's document through a collaborative writing platform, and input the main author's document into a document encoder to generate a context vector;
    obtain a secondary author's document through the collaborative writing platform, and input the secondary author's document into the document encoder to generate a set of document sentence vectors;
    input the context vector and the set of document sentence vectors into a collaborative writing model to generate a collaborative document in the same style as the main author; and
    generate multiple collaborative documents, evaluate them according to an author writing style discriminator and an author content plausibility discriminator, and select the collaborative document with the highest similarity as the final document.
  9. The device according to claim 8, wherein the at least one processor is further configured to:
    pre-train Bidirectional Encoder Representations from Transformers (BERT), including:
    collecting the text content of authors on the collaborative writing platform to form a document library;
    selecting an author's documents from the document library, and training a general BERT model on the author's documents to obtain a personalized BERT author model;
    training an author content generator according to the general BERT model and the BERT author model;
    training the author writing style discriminator according to the BERT author model and a linear classifier; and
    training the author content plausibility discriminator according to the BERT author model and the linear classifier.
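The pairing of an encoder with a linear classifier in claim 9 can be illustrated as follows. This sketch does not load or fine-tune BERT: the two-dimensional feature vectors stand in for document embeddings that a BERT author model would produce, and the linear classifier is a plain logistic regression trained by stochastic gradient descent. The function names, learning rate, and toy data are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_linear_discriminator(features, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression classifier on fixed feature vectors.
    labels: 1 = written in the author's style, 0 = not."""
    dim = len(features[0])
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def style_similarity(weights, bias, x):
    """Score in (0, 1): how strongly x resembles the author's style."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Toy stand-ins for encoder outputs (assumed, not real BERT embeddings).
FEATURES = [[1.0, 0.2], [0.9, 0.1], [0.1, 0.9], [0.2, 1.0]]
LABELS = [1, 1, 0, 0]
```

After `weights, bias = train_linear_discriminator(FEATURES, LABELS)`, a vector close to the first two examples receives a score above 0.5 and a vector close to the last two receives a score below 0.5. A content plausibility discriminator could follow the same pattern with different labels.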
  10. A non-volatile storage medium storing computer-executable instructions, wherein the computer-executable instructions are configured to:
    obtain a main author's document through a collaborative writing platform, and input the main author's document into a document encoder to generate a context vector;
    obtain a secondary author's document through the collaborative writing platform, and input the secondary author's document into the document encoder to generate a set of document sentence vectors;
    input the context vector and the set of document sentence vectors into a collaborative writing model to generate a collaborative document in the same style as the main author; and
    generate multiple collaborative documents, evaluate them according to an author writing style discriminator and an author content plausibility discriminator, and select the collaborative document with the highest similarity as the final document.
PCT/CN2022/105318 2021-12-20 2022-07-13 Method and device for generating document having consistent writing style, and storage medium WO2023115914A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111562107.1A CN114239600B (en) 2021-12-20 Method, equipment and storage medium for generating consistent writing style document
CN202111562107.1 2021-12-20

Publications (1)

Publication Number Publication Date
WO2023115914A1 (en) 2023-06-29

Family

ID=80759148

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/105318 WO2023115914A1 (en) 2021-12-20 2022-07-13 Method and device for generating document having consistent writing style, and storage medium

Country Status (1)

Country Link
WO (1) WO2023115914A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110688834A (en) * 2019-08-22 2020-01-14 阿里巴巴集团控股有限公司 Method and equipment for rewriting intelligent manuscript style based on deep learning model
CN111737983A (en) * 2020-06-22 2020-10-02 网易(杭州)网络有限公司 Text writing style processing method, device, equipment and storage medium
US20210149990A1 (en) * 2019-11-19 2021-05-20 International Business Machines Corporation Iteratively expanding concepts
CN113688606A (en) * 2021-07-30 2021-11-23 达观数据(苏州)有限公司 Method for automatically writing document report
CN114239600A (en) * 2021-12-20 2022-03-25 山东浪潮科学研究院有限公司 Method, equipment and storage medium for generating consistent writing style document

Also Published As

Publication number Publication date
CN114239600A (en) 2022-03-25

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22909245

Country of ref document: EP

Kind code of ref document: A1