CN117574920A - Training method and device for text prediction model and storage medium - Google Patents

Training method and device for text prediction model and storage medium

Info

Publication number
CN117574920A
CN117574920A (Application CN202311520789.9A)
Authority
CN
China
Prior art keywords
sample
text
target
dialogue
scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311520789.9A
Other languages
Chinese (zh)
Inventor
刘刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202311520789.9A priority Critical patent/CN117574920A/en
Publication of CN117574920A publication Critical patent/CN117574920A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis
    • G06F 40/35: Discourse or dialogue representation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/092: Reinforcement learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q 50/01: Social networking
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Marketing (AREA)
  • Biomedical Technology (AREA)
  • Human Resources & Organizations (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Economics (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a training method and device for a text prediction model, and a storage medium, applicable to various scenes such as cloud technology, artificial intelligence, intelligent transportation, and the Internet of Vehicles. The method comprises the following steps: acquiring first sample attribute information of a first sample object, second sample attribute information of a second sample object, and sample context information; constructing a first sample text in a dialogue opening scene based on the first sample attribute information and the second sample attribute information; extracting a sample question text based on the sample context information and constructing a second sample text in a dialogue reply scene; constructing a third sample text in a dialogue correction scene based on the sample context information; and training a large language model based on the first sample text, the second sample text, and the third sample text to obtain the text prediction model. The method and device improve the communication efficiency of objects across multiple scenes.

Description

Training method and device for text prediction model and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a training method and apparatus for a text prediction model, and a storage medium.
Background
With the rapid development of the mobile Internet, a large number of users have gathered on instant messaging platforms, including both acquaintances and strangers, and a large number of point-to-point, point-to-group, and other communication dialogues take place on these platforms every day. Communication frequency varies widely among users, and so does expressive ability: some users are not good at expressing themselves, and the differences between users can be large. Many people feel inarticulate or socially unskilled and are therefore reluctant to communicate with others. In the prior art, a user who does not know what topics to raise during a chat usually searches for trending topics through a search engine, or looks for suggestions on a knowledge-sharing platform such as Zhihu. A user who wants to express a particular idea, or to polish a piece of text before sending it in order to adjust and optimize its effect, may likewise look for standalone copywriting tools or community posts that collect experience and summaries for different scenes, such as dating, gaming, or film and television. These resources are not integrated with the existing social-network chat system itself, and because they depend on the keywords the user enters and on the user's own judgment, the information they provide is not targeted enough. Moreover, searching and querying while chatting seriously disrupts the dialogue and communication: there is no seamless connection, and the process is time-consuming and laborious.
Disclosure of Invention
The application provides a training method and device for a text prediction model, and a storage medium, which can improve the communication efficiency and communication effect between a target object and other objects in a dialogue opening scene, a dialogue reply scene, and a dialogue correction scene.
In one aspect, the present application provides a training method of a text prediction model, the method including:
acquiring first sample attribute information of a first sample object, second sample attribute information of a second sample object and sample context information; the second sample object is an interaction object of the first sample object, and the sample context information is interaction information of the first sample object and the second sample object;
constructing a first sample text in a dialogue opening scene based on the first sample attribute information and the second sample attribute information; the first sample text is labeled with a sample initial text label;
extracting a sample question text based on the sample context information, and constructing a second sample text in a dialogue reply scene; the second sample text is labeled with a sample reply text label;
constructing a third sample text in a dialogue correction scene based on the sample context information; the third sample text is labeled with a sample correction text label; the sample correction text is a text obtained by rewriting and polishing the third sample text;
Constructing a comprehensive sample text based on the first sample text, the second sample text and the third sample text;
training a large language model based on the comprehensive sample text, the sample initial text label, the sample reply text label and the sample correction text label to obtain a text prediction model; the text prediction model is used for predicting target text of a target object in the dialogue opening scene, the dialogue reply scene or the dialogue correction scene.
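The three-scene sample construction and merging described above can be sketched as follows. This is an illustrative sketch only: the dictionary fields, prompt templates, and helper names are assumptions for illustration, not the patent's actual implementation.

```python
# Illustrative sketch of constructing the three kinds of labeled sample texts
# and merging them into one comprehensive training set.

def build_opening_sample(first_attrs, second_attrs, opening_label):
    # First sample text: dialogue opening scene, built from both sample
    # objects' attribute information; labeled with a sample initial text label.
    prompt = (f"Profile A: {first_attrs}. Profile B: {second_attrs}. "
              "Write an opening message from A to B.")
    return {"scene": "opening", "input": prompt, "label": opening_label}

def build_reply_sample(context, question, reply_label):
    # Second sample text: dialogue reply scene, built from the sample context
    # plus an extracted sample question text; labeled with a reply text label.
    prompt = f"Context: {context}. Question: {question}. Write a reply."
    return {"scene": "reply", "input": prompt, "label": reply_label}

def build_correction_sample(context, draft, correction_label):
    # Third sample text: dialogue correction scene; the label is the rewritten,
    # polished version of the draft text.
    prompt = f"Context: {context}. Draft: {draft}. Rewrite and polish the draft."
    return {"scene": "correction", "input": prompt, "label": correction_label}

def build_comprehensive_samples(opening, reply, correction):
    # The comprehensive sample text is the union of the three scenes, so a
    # single model learns all three prediction tasks.
    return [opening, reply, correction]
```

A single training set built this way lets one fine-tuned model serve all three scenes instead of training three separate models.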
In another aspect, a training device for a text prediction model is provided, where the device includes:
an information acquisition module for acquiring first sample attribute information of a first sample object, second sample attribute information of a second sample object, and sample context information; the second sample object is an interaction object of the first sample object, and the sample context information is interaction information of the first sample object and the second sample object;
a first sample construction module, configured to construct a first sample text in a dialogue opening scene based on the first sample attribute information and the second sample attribute information; the first sample text is labeled with a sample initial text label;
a second sample text construction module, configured to extract a sample question text based on the sample context information and construct a second sample text in a dialogue reply scene; the second sample text is labeled with a sample reply text label;
a third sample text construction module, configured to construct a third sample text in a dialogue correction scene based on the sample context information; the third sample text is labeled with a sample correction text label; the sample correction text is a text obtained by rewriting and polishing the third sample text;
a comprehensive sample text construction module for constructing a comprehensive sample text based on the first sample text, the second sample text, and the third sample text;
a model training module, configured to train a large language model based on the comprehensive sample text, the sample initial text label, the sample reply text label, and the sample correction text label to obtain a text prediction model; the text prediction model is used for predicting target text of a target object in the dialogue opening scene, the dialogue reply scene, or the dialogue correction scene.
In another aspect, a training device for a text prediction model is provided, the device comprising a processor and a memory, the memory storing at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by the processor to implement a training method for a text prediction model as described above.
Another aspect provides a computer storage medium storing at least one instruction or at least one program loaded and executed by a processor to implement a training method of a text prediction model as described above.
Another aspect provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium and executes the computer instructions to cause the computer device to perform the training method of the text prediction model as described above.
The training method, the training device and the storage medium for the text prediction model have the following technical effects:
Acquiring first sample attribute information of a first sample object, second sample attribute information of a second sample object, and sample context information; the second sample object is an interaction object of the first sample object, and the sample context information is interaction information of the first sample object and the second sample object; constructing a first sample text in a dialogue opening scene based on the first sample attribute information and the second sample attribute information, the first sample text being labeled with a sample initial text label; extracting a sample question text based on the sample context information and constructing a second sample text in a dialogue reply scene, the second sample text being labeled with a sample reply text label; constructing a third sample text in a dialogue correction scene based on the sample context information, the third sample text being labeled with a sample correction text label, where the sample correction text is a text obtained by rewriting and polishing the third sample text; constructing a comprehensive sample text based on the first sample text, the second sample text, and the third sample text; and training the large language model based on the comprehensive sample text, the sample initial text label, the sample reply text label, and the sample correction text label to obtain the text prediction model. The text prediction model is used for predicting target text of a target object in the dialogue opening scene, the dialogue reply scene, or the dialogue correction scene.
The comprehensive sample text constructed by the application contains sample texts for the dialogue opening scene, the dialogue reply scene, and the dialogue correction scene. Training the large language model on this comprehensive sample text enables the resulting text prediction model to accurately predict target texts in all three scenes, thereby improving the communication efficiency and communication effect between the target object and other objects in the dialogue opening, dialogue reply, and dialogue correction scenes.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present specification and of the prior art, the following briefly introduces the drawings used in the description of the embodiments or the prior art. The drawings described below are obviously only some embodiments of the present application; other drawings may be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a schematic diagram of a training system for a text prediction model provided in an embodiment of the present disclosure;
FIG. 2 is a flow chart of a training method of a text prediction model according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of a method for obtaining first sample attribute information of a first sample object, second sample attribute information of a second sample object, and sample context information according to an embodiment of the present disclosure;
FIG. 4 is a flowchart of a method for constructing a third sample text in a dialogue correction scene based on the sample context information according to an embodiment of the present disclosure;
FIG. 5 is a flowchart of a method for training a large language model to obtain a text prediction model based on the integrated sample text, the sample initial text label, the sample reply text label, and the sample corrected text label according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a large language model provided by an embodiment of the present disclosure;
fig. 7 is a schematic diagram of the operating principle of a LoRA model according to an embodiment of the present disclosure;
fig. 8 is a schematic page diagram of a target terminal displaying a target initial text according to an embodiment of the present disclosure;
fig. 9 is a schematic page diagram of a target terminal displaying a target reply text according to an embodiment of the present disclosure;
fig. 10 is a schematic page diagram of a target terminal displaying a target corrected text according to an embodiment of the present disclosure;
FIG. 11 is a schematic structural diagram of a training device for text prediction model according to an embodiment of the present disclosure;
fig. 12 is a schematic structural diagram of a server according to an embodiment of the present disclosure.
Detailed Description
The technical solutions of the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present application based on the embodiments herein.
First, partial nouns or terms appearing in the course of the description of the embodiments of the specification are explained as follows:
Content: the content recommended to users in the "small world" feed, which may contain image-and-text posts or short videos. Image-and-text posts are usually actively edited and published by self-media after opening an official account; the videos include vertical mini-videos and horizontal short videos and are usually provided by PGC or UGC content producers. The content is delivered in the form of information streams, based on users' interest points, through push and social distribution engines.
PGC (Professionally Generated Content): an Internet term referring to content produced professionally by institutions or organizations.
UGC (User Generated Content): original content created by users, which emerged alongside the Web 2.0 concept, whose main feature is the advocacy of personalization. It is not a specific service but a new way for users to use the Internet: changing from primarily downloading to both downloading and uploading.
Instant messaging: Instant Messaging (IM) is the most popular way of communicating over the Internet, allowing two or more people to exchange text messages, documents, voice, and video in real time over a network. Instant messaging applications have emerged in great variety, and service providers offer an ever-growing range of communication functions. The Internet has thus become a true information highway: a very large and complex social network system has formed on top of instant messaging software, together with a complex and massive chain of user relationships.
LLM: a large language model (Large Language Model, LLM) is a computer model capable of processing and generating natural language. It represents a significant advance in the field of artificial intelligence and is expected to transform the field through the knowledge it learns. An LLM predicts the next word or sentence by learning the statistical regularities and semantic information of language data, and as the input dataset and parameter space continue to expand, the LLM's capability improves correspondingly. When extended to application fields involving other modalities, such as robotics, machine translation, speech recognition, and image processing, such a model is called a Multimodal Large Language Model (MLLM).
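The "predict the next word from statistical regularities" idea can be illustrated with a deliberately minimal toy bigram model; this is a teaching stand-in for an LLM, not how large language models are actually implemented.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    # Count how often each word follows each preceding word; these counts are
    # the "statistical regularities" learned from the language data.
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    # Predict the continuation seen most frequently in training, or None for
    # words never observed as a predecessor.
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

model = train_bigram(["the cat sat", "the cat ran", "the dog sat"])
# predict_next(model, "the") favors "cat", seen twice after "the".
```

An LLM replaces these counts with a neural network over a vast corpus, but the training objective, predicting the next token, is the same in spirit.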
Social distribution: recommending content to users through a social media platform, chat application, or other social network. In this way, streamed content can be recommended to a user's friends, followers, or groups so that they can see it as well. This distribution mode can be realized through the social network's algorithms, which recommend the most relevant content according to factors such as users' interests, behaviors, and social relations. Social distribution can also be achieved by users sharing content themselves: a user can share content of interest to their social network so that friends and followers can also see it. Social distribution helps streamed content spread faster, while also increasing the exposure and visibility of the content.
Instruction Tuning: instruction fine-tuning, in which the model is fine-tuned over a collection of tasks with the pre-trained model parameters unfrozen, typically over a large number of public NLP task datasets, and then evaluated for generalization ability (zero-shot) on specific tasks. An instruction is generated separately for each task, giving the model an explicit instruction to understand and respond to correctly; this elicits the language model's ability to understand instructions.
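A minimal sketch of the instruction-formatting step that instruction tuning relies on; the template wording and field names are assumptions for illustration, not a specific library's format.

```python
def to_instruction_sample(instruction, input_text, output_text):
    # Instruction tuning reformats each task example as an explicit instruction
    # followed by the input, so the model learns to follow instructions rather
    # than fit a single task's surface format.
    return {
        "prompt": f"Instruction: {instruction}\nInput: {input_text}\nAnswer:",
        "completion": " " + output_text,
    }

sample = to_instruction_sample(
    "Translate to English", "bonjour", "hello")
```

Fine-tuning on many such (instruction, input, answer) triples across tasks is what gives the model zero-shot generalization to new instructions.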
Prompt learning, a learning method in machine learning: without significantly changing the structure and parameters of the pre-trained language model, the model's performance is greatly improved by adding "prompt information" to the input as an enhancement. The prompt can be regarded as an instruction for the task and as a reuse of the pre-training objective; in essence, it makes more effective use of the pre-trained parameters. A prompt template is generated separately for each task, followed by fine-tuning and evaluation on that task.
RLHF: human feedback reinforcement learning (Reinforcement Learning with Human Feedback) is an extension of Reinforcement Learning (RL) in that it incorporates human feedback into the training process, providing a natural, humanized interactive learning process for the machine. In addition to the reward signal, RLHF agents get feedback from humans, learn with a wider view and higher efficiency, similar to how humans learn from another person's expertise. By setting up a bridge between the agent and the human, RLHF allows the human to direct the machine and allows the machine to master decision elements that are significantly embedded in human experience, as an effective alignment technique, RLHF can help to some extent mitigate the harmful content generated by Large Language Models (LLMs) and improve information integrity.
Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and extend human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence. Artificial intelligence, i.e. research on design principles and implementation methods of various intelligent machines, enables the machines to have functions of sensing, reasoning and decision.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, involving both hardware-level and software-level technologies. Artificial intelligence infrastructure technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, pre-training model technologies, operation/interaction systems, mechatronics, and the like. The pre-training model, also called a large model or foundation model, can be widely applied, after fine-tuning, to downstream tasks in all major directions of artificial intelligence. Artificial intelligence software technologies mainly include directions such as computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine Learning (ML) is a multi-domain interdisciplinary, involving multiple disciplines such as probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, etc. It is specially studied how a computer simulates or implements learning behavior of a human to acquire new knowledge or skills, and reorganizes existing knowledge structures to continuously improve own performance. Machine learning is the core of artificial intelligence, a fundamental approach to letting computers have intelligence, which is applied throughout various areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, confidence networks, reinforcement learning, transfer learning, induction learning, teaching learning, and the like. The pre-training model is the latest development result of deep learning, and integrates the technology.
The pre-training model (Pre-Training Model, PTM), also called a foundation model or large model, refers to a deep neural network (Deep Neural Network, DNN) with a large number of parameters, trained on massive unlabeled data. Leveraging the function-approximation capability of large-parameter DNNs, a PTM extracts common features from the data and is then adapted to downstream tasks through techniques such as fine-tuning, parameter-efficient fine-tuning (PEFT), and prompt tuning. A pre-training model can therefore achieve good results in few-shot or zero-shot scenarios. PTMs can be classified by the data modality they process into language models (ELMo, BERT, GPT), visual models (Swin Transformer, ViT, V-MoE), speech models (VALL-E), and multimodal models (ViLBERT, CLIP, Flamingo, Gato), where a multimodal model builds representations over the characteristics of two or more data modalities. The pre-training model is an important tool for producing AI-generated content (AIGC) and can also serve as a general-purpose interface connecting multiple task-specific models.
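Parameter-efficient fine-tuning with a low-rank update, the idea behind the LoRA model shown in fig. 7, can be sketched in plain Python. Shapes, names, and the scaling factor here are illustrative assumptions, not the patent's implementation.

```python
def matmul(A, B):
    # Plain-Python matrix multiply so the sketch needs no external libraries.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def lora_forward(x, W, A, B, scale=1.0):
    # LoRA-style PEFT: the frozen pre-trained weight W (d_out x d_in) is
    # augmented with a trainable low-rank update B @ A, where A is (r x d_in)
    # and B is (d_out x r), with rank r much smaller than d_in and d_out.
    # Only A and B are trained, so far fewer parameters are updated.
    delta = matmul(B, A)                     # low-rank update, d_out x d_in
    W_eff = [[w + scale * d for w, d in zip(wr, dr)]
             for wr, dr in zip(W, delta)]    # effective weight W + scale*BA
    # y = W_eff @ x, with x a vector given as a flat list.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W_eff]
```

With rank r = 1 on a 2x2 layer as below, the trainable update has 4 numbers instead of a full 2x2 copy of W; at LLM scale the savings are dramatic.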
Intelligent transportation applies new-generation information technologies such as the Internet of Things, spatial perception, cloud computing, and the mobile Internet to the whole transportation field, and comprehensively uses theories and tools from traffic science, systems methodology, artificial intelligence, and knowledge mining. With comprehensive perception, deep fusion, active service, and scientific decision-making as its goals, it builds real-time dynamic information service systems and deeply mines transportation-related data to form problem-analysis models. This improves the industry's capabilities in resource-allocation optimization, public decision-making, industry management, and public service; makes transportation operation safer, more efficient, more convenient, more economical, more environment-friendly, and more comfortable; and drives the transformation and upgrading of transportation-related industries.
It will be appreciated that in the specific embodiments of the present application, related data such as user session records, attribute information, etc. are referred to, and when the above embodiments of the present application are applied to specific products or technologies, user permissions or consents need to be obtained, and the collection, use, and processing of related data need to comply with related laws and regulations and standards of related countries and regions.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Referring to fig. 1, fig. 1 is a schematic diagram of a training system of a text prediction model according to an embodiment of the present disclosure, and as shown in fig. 1, the training system of the text prediction model may at least include a server 01 and a client 02.
Specifically, in the embodiment of the present disclosure, the server 01 may be an independently operating server, a distributed server, or a server cluster composed of a plurality of servers, and may also be a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), big data, and artificial intelligence platforms. The server 01 may include a network communication unit, a processor, a memory, and the like. Specifically, the server 01 may be configured to obtain a first sample text, a second sample text, and a third sample text, and construct a comprehensive sample text; train the large language model based on the comprehensive sample text, the sample initial text label, the sample reply text label, and the sample correction text label to obtain a text prediction model; and predict target text of a target object in a dialogue opening scene, a dialogue reply scene, or a dialogue correction scene according to the text prediction model.
Specifically, in the embodiment of the present disclosure, the client 02 may include smart phones, desktop computers, tablet computers, notebook computers, digital assistants, smart wearable devices, smart speakers, vehicle terminals, smart televisions, and other types of physical devices, or may include software running in the physical devices, for example, web pages provided by some service providers to users, or may also provide applications provided by the service providers to users. Specifically, the client 02 may be configured to display the target text of the target object in a session initiation scene, a session reply scene, or a session correction scene.
In the following, a training method for a text prediction model according to the present application is described. Fig. 2 is a schematic flowchart of a training method for a text prediction model according to an embodiment of the present application. The method operation steps are provided as in the examples or flowcharts, but more or fewer operation steps may be included based on conventional or non-inventive labor. The order of steps recited in the embodiments is merely one of many possible execution orders and does not represent the only one; when implemented in a real system or server product, the methods illustrated in the embodiments or figures may be performed sequentially or in parallel (e.g., in a parallel-processor or multithreaded environment). As shown in fig. 2, the method may include:
S201: acquiring first sample attribute information of a first sample object, second sample attribute information of a second sample object and sample context information; the second sample object is an interaction object of the first sample object, and the sample context information is interaction information of the first sample object and the second sample object.
In the embodiment of the present disclosure, the first sample object and the second sample object may be two interacting objects in an instant messaging application; the first sample attribute information may include, but is not limited to, portrait information, interaction style information, etc. of the first sample object; the second sample attribute information may include, but is not limited to, portrait information, interaction style information, etc. of the second sample object; the sample context information may be the historical interaction information of the first sample object and the second sample object; for example, the sample context information may be determined from the historical dialogue records of the first sample object with the second sample object, and may be the historical dialogue records within a preset historical period.
In the embodiment of the present specification, as shown in fig. 3, the above-mentioned acquisition of the first sample attribute information of the first sample object, the second sample attribute information of the second sample object, and the sample context information includes:
S20101: acquiring a first sample portrait and first interaction style information of the first sample object to obtain the first sample attribute information;
in the embodiment of the specification, the first sample portrait of the first sample object may be obtained from the profile information that the first sample object has set in the instant messaging application; the first interaction style information of the first sample object may be determined according to the portrait of the first sample object, the historical interaction information of the first sample object with other objects, and the like; the first interaction style information may include, but is not limited to, styles such as serious and lively.
S20103: acquiring a second sample portrait and second interaction style information of the second sample object to obtain second sample attribute information;
in the embodiment of the present specification, the second sample portrait of the second sample object may be obtained from the profile information that the second sample object has set in the instant messaging application; the second interaction style information of the second sample object may be determined according to the portrait of the second sample object, the historical interaction information of the second sample object with other objects, and the like; the second interaction style information may include, but is not limited to, styles such as serious and lively.
S20105: and acquiring the sample context information based on the historical interaction records of the first sample object and the second sample object.
In the embodiment of the present disclosure, the sample context information may be obtained according to a history of interaction between the first sample object and the second sample object; for example, a history interaction record within a preset history period may be determined as sample context information.
S203: constructing a first sample text in a dialogue opening scene based on the first sample attribute information and the second sample attribute information; the first sample text is labeled with a sample initial text label.
In an embodiment of the present disclosure, constructing a first sample text in a session initiation scene based on the first sample attribute information and the second sample attribute information includes:
constructing a first sample text of the first sample object in the dialogue opening scene based on the first sample portrait and the first interaction style information;
and constructing a first sample text of the second sample object in the dialogue opening scene based on the second sample portrait and the second interaction style information.
In this embodiment of the present disclosure, the dialogue opening scene is an ice-breaking or conversation-starting scene in which essentially no historical interaction record exists between the first sample object and the second sample object; for example, only one history record (such as a friend-adding message) exists in the last 10 minutes, or no history record exists in the last 10 minutes. In this case, the portraits and personality information of both chat parties can be comprehensively considered to obtain the first sample text.
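The opening-scene check described above (at most one history record, such as a friend-adding message, within the last 10 minutes) can be sketched as follows. This is an illustrative sketch rather than the patent's implementation; the function name, window length, and record threshold are assumptions.

```python
from datetime import datetime, timedelta

def is_opening_scene(history_timestamps, now, window_minutes=10, max_records=1):
    """Treat the dialogue as an opening scene when at most `max_records`
    history records (e.g. only a friend-adding message) fall inside the
    recent time window."""
    window_start = now - timedelta(minutes=window_minutes)
    recent = [t for t in history_timestamps if t >= window_start]
    return len(recent) <= max_records

now = datetime(2023, 11, 15, 12, 0, 0)
# Only one record (a friend-adding message) in the last 10 minutes -> opening scene.
print(is_opening_scene([now - timedelta(minutes=3)], now))          # True
# Several recent messages -> an ongoing dialogue, not an opening scene.
print(is_opening_scene([now - timedelta(minutes=1),
                        now - timedelta(minutes=2),
                        now - timedelta(minutes=4)], now))          # False
```

When the check returns true, the construction falls back to the portraits and interaction styles of both parties instead of the (empty) context.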
S205: based on the sample context information, extracting a sample problem text, and constructing a second sample text in a dialogue reply scene; the second sample text is marked with a sample reply text label.
In an embodiment of the present disclosure, extracting a sample question text based on the sample context information, and constructing a second sample text in a dialogue reply scene includes:
constructing a sample dialogue text based on the sample context information; the sample dialogue text comprises the sample question text and a sample reply text; the sample reply text is used for determining a label corresponding to the sample question text;
and constructing a second sample text in the dialogue reply scene based on the sample dialogue text.
In the embodiment of the present specification, in the dialogue reply scene, the first sample object is required to reply to a question text raised by the second sample object; a plurality of sample dialogue texts can be constructed according to the sample context information, each comprising a sample question text and a sample reply text; each sample question text can be labeled according to the correspondence between the sample question text and the sample reply text; then, the sample context information and the sample dialogue texts are determined as the second sample text in the dialogue reply scene; in this way, the second sample text in the dialogue reply scene is constructed quickly and accurately.
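The pairing of sample question texts with sample reply texts described above can be sketched as follows. This is a minimal illustration under the assumption that the context is a list of (speaker, text) pairs; all names are hypothetical.

```python
def build_second_sample_texts(context_messages):
    """Pair each question from the second sample object with the first
    sample object's next message, which serves as the sample reply label."""
    samples = []
    for i in range(len(context_messages) - 1):
        speaker, text = context_messages[i]
        next_speaker, next_text = context_messages[i + 1]
        if speaker == "second" and next_speaker == "first":
            samples.append({"context": list(context_messages[:i]),
                            "question": text,
                            "label": next_text})
    return samples

dialogue = [
    ("second", "Are you free this weekend?"),
    ("first", "Yes, Saturday afternoon works."),
    ("second", "Great, where shall we meet?"),
    ("first", "How about the cafe near the park?"),
]
pairs = build_second_sample_texts(dialogue)
print(len(pairs))                 # 2
print(pairs[0]["label"])          # Yes, Saturday afternoon works.
```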
In the embodiment of the specification, a reply can be automatically generated according to the text information corresponding to the context information of the objects' communication and the second sample text; because the generated result is closely related to the context, the accuracy of the reply text is improved.
S207: constructing a third sample text in the dialogue correction scene based on the sample context information; the third sample text is marked with a sample correction text label; the sample correction text is a text obtained by rewriting and polishing the third sample text.
In the embodiment of the specification, the user can be helped to check and correct grammar, spelling, punctuation, and other errors in the text, while also matching the user's own expectations and emotional expression, such as negative or positive, formal or informal; real chat scenes include subdivided scenes such as intimate chat, caring concern, greetings and blessings, and emotional life.
In the embodiment of the present specification, as shown in fig. 4, constructing a third sample text in the dialogue correction scene based on the above sample context information includes:
S2071: analyzing the sample context information into an initial sample text;
S2073: correcting at least one of grammar, spelling and punctuation in the initial sample text to obtain a first corrected text;
S2075: determining a sample text set of a sample object corresponding to the initial sample text based on the sample context information;
S2077: based on the sample text set, determining a sample emotion type of a sample object corresponding to the initial sample text;
S2079: based on the sample emotion type, adjusting the first corrected text to obtain a second corrected text;
S20711: and determining a third sample text in the dialogue correction scene based on the second corrected text.
In the embodiment of the specification, the sample context information can first undergo preliminary sentence breaking according to the sample object, and the preliminary result can then be further broken according to punctuation marks to generate a plurality of initial sample texts; at least one of grammar, spelling, and punctuation in each initial sample text is corrected to obtain a first corrected text, so that the initial sample text is checked and corrected and the character-level accuracy of the first corrected text is improved; the sample context information is clustered according to sample objects to obtain the sample text sets corresponding to different sample objects, that is, the sample text set of the sample object corresponding to the initial sample text; the sample text set corresponding to each sample object is analyzed to determine the sample emotion type of the sample object corresponding to the initial sample text; based on the sample emotion type, the first corrected text is adjusted to obtain a second corrected text; based on the second corrected text, a third sample text in the dialogue correction scene is determined; for example, the second corrected text alone, or the second corrected text together with the sample context information, may be determined as the third sample text; the third sample text may be at least one text.
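The pipeline above (sentence breaking by object and punctuation, grouping the context by sample object, and deriving a sample emotion type) can be sketched as follows. The keyword lists and the naive emotion scorer are purely illustrative stand-ins for the unspecified models in this embodiment.

```python
import re

POSITIVE = {"great", "happy", "thanks", "love"}
NEGATIVE = {"sad", "sorry", "tired", "angry"}

def break_sentences(message):
    """Break one object's message into initial sample texts using punctuation marks."""
    return [p for p in re.split(r"[.!?;]+\s*", message) if p]

def group_by_object(context_messages):
    """Cluster the sample context by sample object, yielding each object's sample text set."""
    groups = {}
    for speaker, text in context_messages:
        groups.setdefault(speaker, []).extend(break_sentences(text.lower()))
    return groups

def emotion_type(text_set):
    """A naive keyword-count stand-in for determining the sample emotion type."""
    words = re.findall(r"[a-z']+", " ".join(text_set))
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

context = [("A", "Thanks, that was great! See you soon."),
           ("B", "Sorry, I am tired today.")]
groups = group_by_object(context)
print(emotion_type(groups["A"]))  # positive
print(emotion_type(groups["B"]))  # negative
```

The derived emotion type would then steer the adjustment from the first corrected text to the second corrected text.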
In an embodiment of the present disclosure, determining, based on the second corrected text, a third sample text in the dialogue correction scene includes:
determining a sample communication scene type of a sample object corresponding to the initial sample text; the sample communication scene types comprise formal communication scene types and informal communication scene types;
and adjusting the second corrected text based on the sample communication scene type to obtain a third sample text in the dialogue correction scene.
In the embodiment of the present disclosure, a sample communication scene type of a sample object corresponding to an initial sample text may be determined according to context information corresponding to a first sample object and a second sample object; the sample communication scene types may include a formal communication scene type and an informal communication scene type; the formal communication scene can generally comprise a communication scene aiming at working matters, communication among teachers and students and the like; informal communication scenarios may include those of relatives, communications between friends, and the like. The second corrected text can be adjusted according to the communication scene type corresponding to the sample object, and a third sample text under the dialogue corrected scene is obtained, so that the third sample text accords with the corresponding communication scene, and the communication efficiency is improved.
In this embodiment of the present disclosure, adjusting the second corrected text based on the sample communication scene type to obtain a third sample text in the dialogue correction scene includes:
acquiring at least two communication sub-types corresponding to the informal communication scene type under the condition that the sample communication scene type is the informal communication scene type;
screening the communication sub-types matched with the sample context information from the at least two communication sub-types to obtain a target communication sub-type;
and adjusting the second corrected text based on the target communication subtype to obtain a third sample text in the dialogue correction scene.
In the embodiment of the present disclosure, in an informal communication scenario, at least two communication subtypes corresponding to the informal communication scene type may be obtained; the at least two communication subtypes include intimate communication, caring concern, greetings and blessings, emotional communication, life communication, and the like, from which the target communication subtype is matched; in the text adjustment process, the second corrected text can be adjusted according to the finer-grained target communication subtype to obtain a third sample text in the dialogue correction scene; this further improves the matching degree between the third sample text and the communication scene and improves communication efficiency.
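The screening of a target communication subtype that matches the sample context can be sketched as a keyword-overlap match. The subtype names follow the text above, while the keyword sets and the scoring rule are illustrative assumptions.

```python
import re

# Hypothetical keyword sets for the informal communication subtypes.
SUBTYPE_KEYWORDS = {
    "intimate communication": {"miss", "dear", "honey"},
    "caring concern": {"rest", "health", "cold", "sleep"},
    "greetings and blessings": {"birthday", "congratulations", "happy", "holiday"},
    "life communication": {"dinner", "weekend", "movie"},
}

def match_subtype(context_text):
    """Score each informal communication subtype by keyword overlap with
    the sample context and return the best match as the target subtype."""
    words = set(re.findall(r"[a-z]+", context_text.lower()))
    best, best_score = None, -1
    for subtype, keywords in SUBTYPE_KEYWORDS.items():
        score = len(words & keywords)
        if score > best_score:
            best, best_score = subtype, score
    return best

print(match_subtype("Happy birthday! Congratulations on the new job"))
# greetings and blessings
```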
S209: and constructing a comprehensive sample text based on the first sample text, the second sample text and the third sample text.
In the embodiment of the present specification, the first sample text, the second sample text, and the third sample text may be determined as integrated sample texts.
S2011: training a large language model based on the comprehensive sample text, the sample initial text label, the sample reply text label and the sample correction text label to obtain a text prediction model; the text prediction model is used for predicting target text of a target object in the dialogue opening scene, the dialogue reply scene or the dialogue correction scene.
In this embodiment of the present disclosure, the large language model includes a converter model and a parameter debugging model, as shown in fig. 5, training the large language model based on the integrated sample text, the sample initial text label, the sample reply text label, and the sample corrected text label to obtain a text prediction model, including:
s20111: inputting the comprehensive sample text into the large language model to perform text prediction processing to obtain a sample prediction text;
In the embodiment of the specification, the sample texts in the three scenes contained in the comprehensive sample text can be respectively input into the large language model for text prediction processing, and sample predicted texts corresponding to the various sample texts can be obtained. The large language model may be a large-scale Transformer-architecture model built on massive, rich internet base corpora; applicable models and techniques include, but are not limited to, low-rank adaptation (Low-Rank Adaptation of Large Language Models, LoRA), the LLaMA model (Large Language Model Meta AI, LLaMA), the General Language Model (GLM), and the like.
S20113: constructing loss data based on the difference between the sample label corresponding to the comprehensive sample text and the sample prediction text; the sample labels corresponding to the comprehensive sample text comprise the sample initial text label, the sample reply text label and the sample correction text label;
s20115: and adjusting parameters of the large language model until the training ending condition is met based on the loss data, and determining the large language model at the time of training ending as the text prediction model.
In the embodiment of the present specification, loss data may be constructed according to the difference between the sample labels corresponding to the comprehensive sample text and the sample predicted text; the parameters of the large language model are adjusted according to the loss data until the training end condition is met, and the large language model at the end of training is determined as the text prediction model. The training end condition can be set according to the actual situation; for example, it may be that the loss data is smaller than a preset value, or that the loss data is smaller than a preset value and the number of training iterations reaches a preset number.
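The parameter-adjustment loop with its training end condition (loss below a preset value, or the iteration count reaching a preset number) can be sketched on a toy model. The one-parameter least-squares fit below stands in for the large language model; it is not the patent's training procedure, only an illustration of the stopping logic.

```python
def train_until_converged(samples, labels, lr=0.1, loss_threshold=1e-4, max_iters=1000):
    """Toy stand-in for parameter adjustment: fit y = w * x by gradient
    descent, stopping once the loss falls below the preset value or the
    iteration count reaches the preset number."""
    w = 0.0
    loss = float("inf")
    for step in range(1, max_iters + 1):
        preds = [w * x for x in samples]
        loss = sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(samples)
        if loss < loss_threshold:           # training end condition met
            return w, step, loss
        grad = sum(2 * (p - y) * x for p, y, x in zip(preds, labels, samples)) / len(samples)
        w -= lr * grad                      # adjust the parameters
    return w, max_iters, loss

w, steps, loss = train_until_converged([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(round(w, 2))   # 2.0  (the underlying slope)
```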
In the embodiment of the present disclosure, inputting the integrated sample text into the large language model to perform text prediction processing, to obtain a sample predicted text, including:
inputting the first sample text into the large language model to perform text prediction processing to obtain a first sample predicted text;
inputting the second sample text into the large language model to perform text prediction processing to obtain a second sample predicted text;
and inputting the third sample text into the large language model to perform text prediction processing to obtain a third sample predicted text.
In some embodiments, constructing the loss data based on a difference between the sample tag corresponding to the integrated sample text and the sample predicted text includes:
determining a first difference between the first sample predicted text and the sample initial text label;
determining a second difference between the second sample predicted text and the sample reply text label;
determining a third difference between the third sample predicted text and the sample correction text label;
the loss data is constructed based on the first difference, the second difference, and the third difference.
In the embodiment of the present specification, the loss data may be calculated by combining the differences corresponding to the first sample predicted text, the second sample predicted text, and the third sample predicted text; alternatively, the large language model can be initially trained according to the difference corresponding to the first sample predicted text, the initially trained model can be trained a second time according to the difference corresponding to the second sample predicted text, and finally the model obtained from the second training can be trained a third time according to the difference corresponding to the third sample predicted text to obtain the text prediction model.
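Combining the three per-scene differences into one loss value can be sketched as follows. The token-level disagreement rate is an illustrative stand-in for whatever difference measure the loss data actually uses; the equal weighting is likewise an assumption.

```python
def scene_loss(predicted, label):
    """Token-level disagreement rate: an illustrative per-scene difference."""
    mismatches = sum(p != l for p, l in zip(predicted, label))
    extra = abs(len(predicted) - len(label))
    return (mismatches + extra) / max(len(predicted), len(label))

def combined_loss(first_diff, second_diff, third_diff, weights=(1.0, 1.0, 1.0)):
    """Combine the differences from the opening, reply, and correction
    scenes into one loss value (equal weights by default)."""
    w1, w2, w3 = weights
    return w1 * first_diff + w2 * second_diff + w3 * third_diff

d1 = scene_loss(["hi", "there"], ["hi", "there"])        # 0.0
d2 = scene_loss(["see", "you"], ["see", "you", "soon"])  # one missing token out of 3
d3 = scene_loss(["ok"], ["okay"])                        # 1.0
print(round(combined_loss(d1, d2, d3), 3))               # 1.333
```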
In the embodiment of the present disclosure, inputting the integrated sample text into the large language model to perform text prediction processing, to obtain a sample predicted text, including:
inputting the comprehensive sample text into the converter model to extract text characteristics, and obtaining first sample characteristics;
inputting the comprehensive sample text into the parameter debugging model to extract text characteristics, so as to obtain second sample characteristics; the model parameters of the parameter debugging model are smaller than a preset threshold value;
and determining a sample prediction text corresponding to the integrated sample text based on the first sample feature and the second sample feature.
In this embodiment of the present disclosure, as shown in fig. 6, fig. 6 is a schematic structural diagram of a large language model, including a converter model and a parameter debugging model; an instruction fine-tuning task may be set in the model, including the instruction definition and examples of instruction inputs and output results in the three scenarios, and the instruction fine-tuning task can construct sample texts in the three scenes according to the input information. The fine-tuning task specifically includes three typical tasks: assisted generation of opening dialogue, assisted reply generation based on the context dialogue, and rewriting and polishing of dialogue information. Opening dialogue: normally there is no chat history, and one party actively starts the conversation; for example, only one history record (such as a friend-adding message) exists in the last 10 minutes, or no history record exists in the last 10 minutes; in this case an opening line can be generated, and the portraits and personal information of both chat parties can be taken into consideration so that the generated result is more targeted. Assisted reply generation: a reply is automatically generated according to the text information of the context of the user's chat, and the generated result is closely related to the context. Rewriting and polishing: the user can be helped to check and correct grammar, spelling, punctuation, and other errors in the text while matching the user's expectations and emotional expression, such as negative or positive, formal or informal; real chat scenes include intimate chat, caring concern, greetings and blessings, emotional life, and the like, and a portion of typical fine-tuning instruction sample data is constructed based on each scene.
The instruction fine-tuning mainly adopts the LoRA model, which can be regarded as a method for learning and editing the weights of large models. Specifically, LoRA freezes the parameters of the pre-trained model and adds extra Dropout + Linear + Conv1D parameters to each decoder layer. Dropout is a common regularization method for reducing the overfitting of neural networks; it means that, during the training of a deep learning network, neural network units are temporarily discarded from the network with a certain probability. Conv1D is a one-dimensional convolution layer.
The converter model (transformer) comprises structural layers such as L1, L2, ..., L12. A bypass module can be inserted in parallel at any linear layer of the converter model; the input of the linear layer passes through the weight transformations of both the original weight and the bypass module, and the two results are added and output. The bypass module is a plug-in parameter adopting a low-rank matrix structure: through matrix decomposition, the original d x d dimensional matrix W is represented by the product of a d x r dimensional matrix A and an r x d dimensional matrix B, where r << d. The quantity of introduced plug-in parameters is therefore very small; most parameters do not need to be adjusted, and the result of the model can be influenced by adjusting only a small number of parameters, which reduces the amount of computation, speeds up model convergence, and requires very few training machine resources. In addition, this multi-layer fine-tuning mode supports capturing more high-level semantic features of the chat: lower layers capture more basic features while higher layers lean more toward the semantic level, which ensures better effect and ideographic capability of the finally generated dialogue text and thus powerfully assists the dialogue. In the multi-layer neural network, a lower layer such as L1 captures representations of primary features such as words and phrases, while higher layers such as L10 and L12 capture more features at the semantic and grammar levels; adding adjustable parameters at each layer expands the capabilities of both the low-layer and high-layer neural networks.
This scheme avoids the defects of two other fine-tuning schemes. The Adapter method uses a serial structure, so the inserted Adapter module easily becomes a computational bottleneck, especially when the parallelism is low (small batches, short sequences), which has a larger impact on the computational efficiency of the model. The Prefix-tuning method, although using a parallel structure, introduces prefix tokens that occupy the available input length of the model, so its scalability is poor: increasing the number of parameters requires increasing the number of prefix tokens, which occupies even more of the model's available input length and may also change the input distribution, thereby affecting the final fine-tuning effect.
In an exemplary embodiment, as shown in fig. 7, fig. 7 is a schematic diagram of the working principle of the LoRA model. LoRA (Low-Rank Adaptation of Large Language Models) works by freezing the weights of the pre-trained model and injecting a trainable layer in each transformer block. This results in a significant reduction in the computational effort of fine-tuning the model, while the fine-tuning quality is comparable to full-model fine-tuning. The pre-trained weights (pretrained weights, W) in fig. 7 are frozen; the parameters A and B are the adjustable parameters in the training process; the input and output parts are fused, and adding the bypass module enlarges the range of the adjustable parameters.
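The low-rank bypass described above (output = W·x plus B·(A·x), with r << d) can be sketched with plain lists. The identity base weight, the rank, and the scaling factor are illustrative choices; this is a sketch of the LoRA idea, not a faithful implementation of the model in fig. 7.

```python
def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha=1.0):
    """Output of a LoRA-adapted linear layer: the frozen weight path W*x
    plus the low-rank bypass B*(A*x), scaled by alpha."""
    base = matvec(W, x)
    bypass = matvec(B, matvec(A, x))
    return [b + alpha * p for b, p in zip(base, bypass)]

d, r = 4, 1                       # r << d: the bypass is a rank-1 update here
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen (identity for illustration)
A = [[0.5, 0.0, 0.0, 0.0]]        # trainable, r x d
B = [[0.0], [1.0], [0.0], [0.0]]  # trainable, d x r

x = [2.0, 0.0, 0.0, 0.0]
print(lora_forward(W, A, B, x))   # [2.0, 1.0, 0.0, 0.0]
# The trainable bypass adds only 2*d*r parameters instead of d*d:
print(2 * d * r, "vs", d * d)     # 4 vs 16
```

Only A and B would be updated during fine-tuning, which is why the quantity of adjustable parameters stays small.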
In an embodiment of the present disclosure, the comprehensive sample text corresponds to a target number of sample labels, and determining, based on the first sample feature and the second sample feature, a sample predicted text corresponding to the comprehensive sample text includes:
generating at least two initial sample predictive texts based on the first sample features and the second sample features;
taking any initial sample prediction text as a first screening text, and determining the similarity between each remaining initial sample prediction text and the first screening text;
ranking the rest initial sample predictive texts based on the similarity determination result;
determining a second screening text based on the sorting result and the target number;
determining the first screening text and the second screening text as sample prediction texts corresponding to the comprehensive sample texts; the number of the sample predicted texts is a target number;
in the embodiment of the present disclosure, the target number of the sample labels may be set according to the actual situation, where the sum of the numbers of the first screening text and the second screening texts is the target number; the remaining initial sample predicted texts are the texts other than the first screening text among the at least two initial sample predicted texts. In the screening process for the second screening texts, the remaining initial sample predicted texts can be ranked by similarity from high to low, and a preset number of texts at the end of the ranking are determined as the second screening texts, so that the similarity among the multiple sample predicted texts is reduced.
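The screening above (rank the remaining candidates by similarity to the first screening text from high to low, then take texts from the end of the ranking until the target number is reached) can be sketched as follows. The Jaccard word-overlap similarity is an assumed stand-in for the unspecified similarity measure.

```python
def jaccard(a, b):
    """Word-overlap similarity between two texts (an illustrative measure)."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def screen_predictions(candidates, target_number):
    """Keep the first candidate as the first screening text, rank the rest
    by similarity to it from high to low, and pick texts from the end of
    the ranking (the least similar ones) until the target number is met."""
    first = candidates[0]
    rest = sorted(candidates[1:], key=lambda t: jaccard(first, t), reverse=True)
    needed = target_number - 1
    second = rest[len(rest) - needed:] if needed > 0 else []
    return [first] + second

candidates = [
    "shall we grab dinner tonight",
    "shall we grab dinner this evening",
    "how about a movie this weekend",
    "shall we grab dinner tonight then",
]
picked = screen_predictions(candidates, target_number=2)
print(picked[0])  # shall we grab dinner tonight
print(picked[1])  # how about a movie this weekend
```

Because the least-similar candidates are kept, the returned sample predicted texts are diverse rather than near-duplicates.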
Illustratively, constructing the loss data based on a difference between the sample label corresponding to the integrated sample text and the sample predicted text includes:
constructing at least two groups of sample label text pairs based on the sample labels of the target number and the sample prediction texts of the target number;
determining initial loss data corresponding to each set of sample label text pairs based on differences between the sample labels and the sample prediction text in each set of sample label text pairs;
and determining the loss data according to the sum of the initial loss data corresponding to each group of sample label text pairs.
In the present description embodiment, each set of sample tag text pairs includes one sample tag and one sample prediction text; at least two groups of sample label text pairs can be constructed based on the sample labels of the target number and the sample prediction texts of the target number; the initial loss data corresponding to each group of sample label text pairs can be determined according to the difference between the sample labels and the sample prediction text in each group of sample label text pairs; and determining the loss data according to the sum of the initial loss data corresponding to each group of sample label text pairs, so as to train a large language model according to the loss data and obtain a text prediction model.
In an embodiment of the present disclosure, determining the loss data according to a sum of initial loss data corresponding to each set of sample tag text pairs includes:
determining first loss data according to the sum of initial loss data corresponding to each group of sample label text pairs;
acquiring sample evaluation information corresponding to the sample prediction text;
constructing second loss data based on the difference between the sample evaluation information and the standard evaluation information;
the loss data is determined based on the first loss data and the second loss data.
In the embodiment of the specification, evaluation information of human feedback reinforcement learning can be introduced, the evaluation is used as a supervision signal for model training, and loss corresponding to the evaluation is introduced in the process of determining loss data, so that the accuracy of the model is improved.
In the embodiment of the specification, in order to ensure that the output of the final actual task can be aligned with human expectations, reinforcement learning from human feedback (RLHF) is introduced: the results of the LLM are manually aligned with human expectations before the model formally goes online, and the results are improved through reinforcement learning. The main process is to make discriminative evaluations of the output results through manual scoring, construct a scoring model for reinforcement-learning quality, and then, in production, screen and sort the quality of the generated results based on this instruction scoring model. Specifically, a plurality of prediction results are output by the text prediction model; an evaluation model is obtained through reinforcement learning from human feedback; the plurality of prediction results are evaluated by the evaluation model and sorted according to the evaluation results, so that the target prediction result is determined according to the sorting result. For example, a fixed number of prediction results with high evaluation scores may be recommended to the terminal as the final output results. Illustratively, the results of users' online feedback and conversion clicks are filtered to obtain result pairs matching users' expectations, which serve as sample data for further refinement and alignment.
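The scoring-and-sorting step (evaluate each prediction with the reward model trained from human feedback, sort by score from high to low, and recommend the top results) can be sketched as follows. The toy reward function is purely illustrative; in practice it would be the learned evaluation model.

```python
def rank_by_reward(predictions, reward_model, top_k):
    """Score each predicted text with the reward model, sort from high to
    low, and recommend the top_k results to the terminal."""
    return sorted(predictions, key=reward_model, reverse=True)[:top_k]

def toy_reward(text):
    """A toy stand-in for the learned scoring model: it rewards polite words."""
    bonus = {"please": 2, "thanks": 2, "sure": 1}
    words = text.lower().replace(",", "").split()
    return sum(bonus.get(w, 0) for w in words)

candidates = ["sure, thanks for asking", "no", "ok thanks"]
print(rank_by_reward(candidates, toy_reward, top_k=2))
# ['sure, thanks for asking', 'ok thanks']
```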
In an embodiment of the present disclosure, the method further includes:
acquiring first attribute information of a target object and second attribute information corresponding to an interaction object of the target object;
inputting the first attribute information and the second attribute information into the text prediction model to perform text prediction processing to obtain a target initial text;
and sending the target initial text to a target terminal corresponding to the target object so that the target terminal displays the target initial text.
In the embodiment of the present disclosure, the target initial text may be one or at least two, and the specific number may be set in the training process of the text prediction model;
in this embodiment of the present disclosure, the target terminal may, in response to an opening instruction triggered by the target object in a dialog box of the instant messaging application, obtain the first attribute information of the target object and the second attribute information corresponding to the interaction object of the target object, and send them to the server. As shown in fig. 8, fig. 8 is a schematic page diagram of a target terminal displaying target initial texts: the target object has not contacted the interaction object for a long time and needs to communicate with the interaction object again; a plurality of target initial texts are displayed, including target initial text 1, target initial text 2, target initial text 3, and target initial text 4, and the target terminal displays target initial text 1 in the dialog box in response to an operation instruction triggered based on target initial text 1.
In an embodiment of the present disclosure, the method further includes:
acquiring target context information of the target object and the interactive object in the interaction process and target problem text of the interactive object;
inputting the target context information and the target problem text into the text prediction model, and performing text prediction processing to obtain a target reply text;
and sending the target reply text to a target terminal corresponding to the target object, so that the target terminal displays the target reply text.
In the embodiment of the present disclosure, the target reply text may be one or at least two, and the specific number may be set in the training process of the text prediction model; as shown in fig. 9, fig. 9 is a schematic page diagram of a target terminal displaying target reply texts, where the target reply texts include a plurality of target reply texts, including a target reply text 1, a target reply text 2, a target reply text 3, and a target reply text 4, and the target terminal displays the target reply text 2 in a dialog box in response to an operation instruction triggered based on the target reply text 2.
In an embodiment of the present disclosure, the method further includes:
acquiring target context information of the target object and the interaction object in the interaction process, and a target dialogue text input by the target object in a dialog box of an instant messaging application;
inputting the target dialogue text into the text prediction model, and performing text prediction processing to obtain a target corrected text;
and sending the target correction text to a target terminal corresponding to the target object so that the target terminal displays the target correction text.
In the embodiment of the present disclosure, there may be one or at least two target corrected texts, and the specific number may be set during the training process of the text prediction model. As shown in fig. 10, fig. 10 is a schematic page diagram of a target terminal displaying target corrected texts, where the target object inputs a target dialogue text in an input box, and a plurality of target corrected texts are then displayed in the page, including a target corrected text 1, a target corrected text 2, a target corrected text 3 and a target corrected text 4; in response to an operation instruction triggered based on the target corrected text 2, the target terminal displays the target corrected text 2 in the dialog box.
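The three assisted flows described above (dialogue opening, dialogue reply, and dialogue correction) all feed the same text prediction model with different inputs. The following is a minimal, hypothetical sketch of how a server might assemble a scene-specific prompt for the model; the field names, prompt wording, and routing logic are illustrative assumptions, not the actual implementation.

```python
# Illustrative sketch (assumed, not the patent's actual code): routing the
# three assisted-dialogue scenes to one text prediction model by building a
# scene-specific prompt from the fields each scene requires.

def build_prompt(scene, context=None, question=None, draft=None,
                 host_profile=None, guest_profile=None):
    """Assemble a prompt for the text prediction model for a given scene."""
    if scene == "opening":
        return (f"Host profile: {host_profile}\nGuest profile: {guest_profile}\n"
                f"Task: write an opening message.")
    if scene == "reply":
        return (f"Context: {context}\nQuestion: {question}\n"
                f"Task: write a reply.")
    if scene == "correction":
        return (f"Context: {context}\nDraft: {draft}\n"
                f"Task: rewrite and polish the draft.")
    raise ValueError(f"unknown scene: {scene}")

prompt = build_prompt("reply", context="A and B discussed hiking.",
                      question="Are you free this weekend?")
```

In practice the assembled prompt would be sent to the trained text prediction model, and the returned candidates displayed in the terminal page as in figs. 8-10.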
For example, in an application process, since auxiliary session enhancement is performed in a social network chat scenario, some basic information of the session needs to be uploaded, such as user representation information (a host user ID and a guest user ID) and user remarks (e.g., the remark that B has written for A), which may be used as auxiliary input for fine tuning. The client uploads the information-providing fields shown in Table 1 below:
TABLE 1
Auxiliary dialogue scene 1: the dialogue reply scene, in which, if a historical dialogue record exists, an intelligent reply is generated according to the dialogue record, as shown in Table 2 below:
TABLE 2
Auxiliary dialogue scene 2: the dialogue opening scene, in which an opening message is generated without a historical record, for example when there has been no dialogue within the last 10 minutes, as shown in Table 3 below:
Table 3
Auxiliary dialogue scene 3: the dialogue correction scene, in which the chat dialogue content is polished and rewritten, as shown in Table 4 below:
Table 4
Based on actual fine-tuning instruction examples, on the order of one thousand (K-level) fine-tuning instructions are collected for each scene structure. This fine-tuning scheme can effectively activate the relevant capabilities of the large language model, ensuring that the finally generated dialogue text has better quality and expressive capability and strongly assists the dialogue. This ultimately increases the number of active dialogues and dialogue messages on the platform, and improves the communication efficiency and communication effect of the object in the dialogue opening scene, the dialogue reply scene and the dialogue correction scene.
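As a rough illustration of what such K-level fine-tuning instructions might look like when collected across the three scenes, the sketch below builds a few instruction records and serializes them as JSON lines; the schema, field names, and example texts are assumptions for illustration only.

```python
# Hypothetical sketch of an instruction fine-tuning corpus spanning the three
# auxiliary dialogue scenes; field names ("scene", "instruction", ...) are
# illustrative, not the patent's actual schema.
import json

def make_record(scene, instruction, inputs, label):
    return {"scene": scene, "instruction": instruction,
            "input": inputs, "output": label}

records = [
    make_record("opening", "Write an opening message",
                {"host_id": "u1", "guest_id": "u2", "remark": "college friend"},
                "Long time no see! How have you been?"),
    make_record("reply", "Reply to the last message",
                {"context": ["How was the trip?"]},
                "It was great, thanks for asking!"),
    make_record("correction", "Polish the draft message",
                {"draft": "u free tmrw?"},
                "Are you free tomorrow?"),
]

# Serialize as JSON lines, one instruction per line, as is common for
# instruction fine-tuning corpora.
jsonl = "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
```

A real corpus would contain on the order of a thousand such records per scene, mixed together into the comprehensive sample text used for training.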
As can be seen from the technical solutions provided in the above embodiments of the present specification, the embodiments of the present specification obtain first sample attribute information of a first sample object, second sample attribute information of a second sample object, and sample context information; the second sample object is an interaction object of the first sample object, and the sample context information is interaction information of the first sample object and the second sample object; construct a first sample text in a dialogue opening scene based on the first sample attribute information and the second sample attribute information, the first sample text being labeled with a sample initial text label; extract a sample question text based on the sample context information, and construct a second sample text in a dialogue reply scene, the second sample text being labeled with a sample reply text label; construct a third sample text in a dialogue correction scene based on the sample context information, the third sample text being labeled with a sample correction text label, where the sample correction text is a text obtained by rewriting and polishing the third sample text; construct a comprehensive sample text based on the first sample text, the second sample text and the third sample text; and train the large language model based on the comprehensive sample text, the sample initial text label, the sample reply text label and the sample correction text label to obtain a text prediction model; the text prediction model is used for predicting a target text of a target object in the dialogue opening scene, the dialogue reply scene or the dialogue correction scene.
The comprehensive sample text constructed in this way includes sample texts under the dialogue opening scene, the dialogue reply scene and the dialogue correction scene, and the large language model is trained on this comprehensive sample text, so that the trained text prediction model can accurately predict target texts in all three scenes, improving the communication efficiency and communication effect of the target object with other objects in the dialogue opening scene, the dialogue reply scene and the dialogue correction scene.
The embodiment of the present disclosure further provides a training device for a text prediction model, as shown in fig. 11, where the device includes:
an information acquisition module 1110 for acquiring first sample attribute information of a first sample object, second sample attribute information of a second sample object, and sample context information; the second sample object is an interaction object of the first sample object, and the sample context information is interaction information of the first sample object and the second sample object;
a first sample construction module 1120, configured to construct a first sample text in a dialogue opening scene based on the first sample attribute information and the second sample attribute information; the first sample text is labeled with a sample initial text label;
a second sample text construction module 1130, configured to extract a sample question text based on the sample context information, and construct a second sample text in a dialogue reply scene; the second sample text is marked with a sample reply text label;
a third sample text construction module 1140, configured to construct a third sample text in the dialogue correction scene based on the sample context information; the third sample text is marked with a sample correction text label; the sample correction text is a text obtained by rewriting and polishing the third sample text;
A comprehensive sample text construction module 1150, configured to construct a comprehensive sample text based on the first sample text, the second sample text, and the third sample text;
the model training module 1160 is configured to train the large-scale language model based on the comprehensive sample text, the sample initial text label, the sample reply text label, and the sample correction text label, to obtain a text prediction model; the text prediction model is used for predicting target text of a target object in the dialogue opening scene, the dialogue reply scene or the dialogue correction scene.
In some embodiments, the third sample text construction module includes:
an initial text parsing unit, configured to parse the sample context information into an initial sample text;
the first correction unit is used for correcting at least one of grammar, spelling and punctuation in the initial sample text to obtain a first correction text;
a sample text set determining unit, configured to determine a sample text set of a sample object corresponding to the initial sample text based on the sample context information;
a sample emotion type determining unit, configured to determine, based on the sample text set, a sample emotion type of a sample object corresponding to the initial sample text;
The second correction unit is used for adjusting the first correction text based on the emotion type of the sample to obtain a second correction text;
and a third sample text determining unit configured to determine a third sample text in the dialogue modification scene based on the second modification text.
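The pipeline above (surface correction, emotion detection over the sample text set, then emotion-aware adjustment) can be sketched as follows. The correction rules and the emotion classifier are toy stand-ins chosen for illustration; real implementations would use proper grammar-correction and sentiment models.

```python
# Minimal sketch of the third-sample-text construction pipeline: fix
# grammar/spelling/punctuation, then adjust tone by the sample emotion type.
# All rules below are illustrative assumptions.

def fix_surface_errors(text):
    """First correction: naive spelling/punctuation repair (illustrative)."""
    fixes = {"teh": "the", "dont": "don't"}
    words = [fixes.get(w, w) for w in text.split()]
    out = " ".join(words)
    return out if out.endswith((".", "!", "?")) else out + "."

def detect_emotion(sample_texts):
    """Toy emotion classifier over the sample text set of the object."""
    joined = " ".join(sample_texts).lower()
    return "positive" if any(w in joined for w in ("great", "thanks", "love")) else "neutral"

def adjust_for_emotion(text, emotion):
    """Second correction: tone adjustment based on the sample emotion type."""
    return text + " :)" if emotion == "positive" else text

first = fix_surface_errors("I dont know teh answer")
second = adjust_for_emotion(first, detect_emotion(["thanks a lot", "great job"]))
```

The second corrected text produced here would then be further adjusted by the communication scene type, as described next.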
In some embodiments, the third sample text determining unit includes:
the scene determining subunit is used for determining the sample communication scene type of the sample object corresponding to the initial sample text; the sample communication scene types comprise formal communication scene types and informal communication scene types;
and the text adjustment subunit is used for adjusting the second corrected text based on the sample communication scene type to obtain a third sample text in the dialogue corrected scene.
In some embodiments, the text adjustment subunit includes:
a communication subtype obtaining subunit, configured to obtain at least two communication subtypes corresponding to an informal communication scene type when the sample communication scene type is the informal communication scene type;
a target communication subtype screening subunit, configured to screen, from the at least two communication subtypes, a communication subtype that matches the sample context information, to obtain a target communication subtype;
And the corrected text adjustment subunit is used for adjusting the second corrected text based on the target communication subtype to obtain a third sample text in the dialogue correction scene.
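The subtype screening just described (for informal scenes, pick the communication subtype best matching the sample context, then adjust the text) can be sketched as below. The subtypes, keyword sets, and the overlap-based matching heuristic are assumptions for illustration.

```python
# Hedged sketch of scene-type-driven adjustment: formal scenes keep the
# corrected text; informal scenes screen a matching communication subtype.
# Subtypes and keywords are hypothetical.

SUBTYPES = {
    "friends": {"movie", "game", "weekend"},
    "family": {"dinner", "home", "mom"},
}

def pick_subtype(context_words):
    """Screen the subtype sharing the most keywords with the context."""
    scores = {name: len(kw & context_words) for name, kw in SUBTYPES.items()}
    return max(scores, key=scores.get)

def adjust(text, scene_type, context_words):
    if scene_type == "formal":
        return text  # keep the second corrected text as-is for formal scenes
    subtype = pick_subtype(context_words)
    # tag the text with the target subtype (a real system would restyle it)
    return f"[{subtype}] {text}"

third = adjust("Want to catch up?", "informal", {"weekend", "movie"})
```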
In some embodiments, the information acquisition module includes:
a first sample attribute information obtaining unit configured to obtain a first sample portrait and first interaction style information of the first sample object, to obtain the first sample attribute information;
a second sample attribute information obtaining unit configured to obtain a second sample portrait and second interaction style information of the second sample object, to obtain the second sample attribute information;
a sample context information obtaining unit, configured to obtain the sample context information based on a history interaction record of the first sample object and the second sample object;
the first sample construction module includes:
a first construction unit configured to construct a first sample text of the first sample object in the session start scene based on the first sample portrait and the first interaction style information;
and a second construction unit configured to construct a first sample text of the second sample object in the session start scene based on the second sample portrait and the second interaction style information.
In some embodiments, the second sample text building module includes:
a sample dialogue text construction unit for constructing a sample dialogue text based on the sample context information; the sample dialogue text comprises the sample question text and a sample reply text; the sample reply text is used for determining a label corresponding to the sample question text;
and the second sample text construction unit is used for constructing the second sample text in the dialogue reply scene based on the sample dialogue text.
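The question/reply pairing described above, where each reply in the sample dialogue text serves as the label for the preceding question, might look like the following sketch; the turn format is an assumption.

```python
# Illustrative sketch: split sample context information (an alternating
# dialogue) into question/reply pairs, each reply becoming the label for
# the preceding question.

def build_reply_samples(turns):
    """turns: list of (speaker, text); pair each message with the next reply."""
    samples = []
    for (s1, q), (s2, a) in zip(turns, turns[1:]):
        if s1 != s2:  # only pair messages from different speakers
            samples.append({"question": q, "reply_label": a})
    return samples

pairs = build_reply_samples([("A", "How are you?"), ("B", "Fine, thanks."),
                             ("B", "And you?"), ("A", "Doing well.")])
```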
In some embodiments, the large language model includes a converter model and a parameter tuning model, and the model training module includes:
the sample prediction text determining unit is used for inputting the comprehensive sample text into the large language model to perform text prediction processing to obtain a sample prediction text;
a loss data construction unit, configured to construct loss data based on a difference between a sample tag corresponding to the integrated sample text and the sample prediction text; the sample labels corresponding to the comprehensive sample text comprise the sample initial text label, the sample reply text label and the sample correction text label;
And a text prediction model determination unit configured to adjust parameters of the large language model until a training end condition is satisfied based on the loss data, and determine the large language model at the time of the training end as the text prediction model.
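The loop structure implied by these units (predict, compute loss against the sample labels, adjust parameters, stop when an end condition is met) can be sketched with a toy scalar model; the quadratic loss and gradient step stand in for the real model update and are not the patent's actual training procedure.

```python
# Hypothetical training-loop skeleton: compute loss between labels and
# predictions, adjust parameters, stop at max steps or a loss threshold.
# The scalar "model" and quadratic loss are toys for illustration.

def train(model_params, samples, lr=0.1, max_steps=100, tol=1e-3):
    step = 0
    loss = float("inf")
    while step < max_steps and loss > tol:
        # toy loss: squared distance of a scalar parameter from its label
        loss = sum((model_params["w"] - y) ** 2 for _, y in samples) / len(samples)
        grad = sum(2 * (model_params["w"] - y) for _, y in samples) / len(samples)
        model_params["w"] -= lr * grad  # parameter adjustment
        step += 1
    return model_params, loss

params, final_loss = train({"w": 0.0}, [("x", 1.0), ("x", 1.0)])
```

When the end condition is met, the parameters at that point define the text prediction model.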
In some embodiments, the sample predicted text determining unit includes:
a first sample feature determining subunit, configured to input the integrated sample text into the converter model to perform text feature extraction, so as to obtain a first sample feature;
the second sample feature determining subunit is used for inputting the comprehensive sample text into the parameter debugging model to extract text features, so as to obtain second sample features; the number of model parameters of the parameter debugging model is smaller than a preset threshold value;
and the sample prediction text determination subunit is used for determining the sample prediction text corresponding to the comprehensive sample text based on the first sample characteristic and the second sample characteristic.
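The two-branch feature extraction described here — a large frozen converter (transformer) branch plus a small parameter-debugging branch whose features are combined — resembles adapter/LoRA-style tuning and can be sketched with toy matrices; no deep-learning framework is needed for the illustration, and the numbers are arbitrary.

```python
# Sketch of combining a frozen converter-model feature with a low-parameter
# debug-branch feature (adapter/LoRA-like). Matrices are toy values.

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

W_frozen = [[1.0, 0.0], [0.0, 1.0]]      # large pretrained weights (frozen)
A = [[0.1], [0.2]]                        # small trainable up-projection
B = [[0.5, 0.5]]                          # small trainable down-projection

def forward(x):
    first_feature = matvec(W_frozen, x)        # converter-model feature
    second_feature = matvec(A, matvec(B, x))   # low-parameter branch feature
    return add(first_feature, second_feature)  # combined sample feature

out = forward([2.0, 4.0])
```

Only the small A and B matrices would be updated during fine tuning, which keeps the trainable parameter count far below that of the frozen model.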
In some embodiments, the integrated sample text corresponds to a target number of sample tags, and the sample predicted text determining subunit includes:
an initial sample prediction text generation subunit, configured to generate at least two initial sample prediction texts based on the first sample feature and the second sample feature;
A similarity determining subunit, configured to determine, using any initial sample prediction text as a first screening text, a similarity between each remaining initial sample prediction text and the first screening text;
a ranking subunit, configured to rank each remaining initial sample prediction text based on the similarity determination result;
a second screening text determining subunit, configured to determine a second screening text based on the sorting result and the target number;
a sample prediction text determining subunit, configured to determine the first screening text and the second screening text as the sample prediction texts corresponding to the integrated sample text; the number of the sample prediction texts is the target number.
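The candidate-screening step above can be sketched as follows: take one candidate as the first screening text, rank the rest by similarity to it, and keep enough to reach the target number. Jaccard word overlap stands in for the real similarity measure, and preferring the least-similar candidates (to favor diverse predictions) is an assumption, since the description leaves the ranking direction unspecified.

```python
# Illustrative candidate screening: rank remaining candidates by similarity
# to the first screening text and keep target_number predictions in total.

def jaccard(a, b):
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

def select_predictions(candidates, target_number):
    first = candidates[0]                       # first screening text
    rest = sorted(candidates[1:],
                  key=lambda c: jaccard(first, c))  # least similar first
    # keep the first text plus enough others to reach the target number
    return [first] + rest[:target_number - 1]

picked = select_predictions(
    ["see you soon", "see you very soon", "talk later", "bye for now"], 2)
```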
in some embodiments, the loss data construction unit includes:
a sample label text pair constructing subunit, configured to construct at least two groups of sample label text pairs based on the target number of sample labels and the target number of sample prediction texts;
an initial loss data determining subunit, configured to determine initial loss data corresponding to each set of sample label text pairs based on a difference between the sample labels and the sample prediction text in each set of sample label text pairs;
And the loss data determining subunit is used for determining the loss data according to the sum of the initial loss data corresponding to each group of sample label text pairs.
In some embodiments, the loss data determining subunit includes:
a first loss data determining subunit, configured to determine first loss data according to a sum of initial loss data corresponding to each set of sample tag text pairs;
a sample evaluation information obtaining subunit, configured to obtain sample evaluation information corresponding to the sample prediction text;
a second loss data constructing subunit configured to construct second loss data based on a difference between the sample evaluation information and the standard evaluation information;
and a loss data subunit configured to determine the loss data based on the first loss data and the second loss data.
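The two-part loss just described — a first loss summing per-pair label-vs-prediction differences, and a second loss comparing sample evaluation information with standard evaluation information — can be sketched as below. The word-mismatch difference, squared-error evaluation loss, and the weighting are all illustrative assumptions.

```python
# Sketch of the combined loss: first loss over sample label text pairs plus
# a second loss over evaluation scores. Formulas are illustrative stand-ins.

def first_loss(label_text_pairs):
    def diff(label, pred):
        # toy "difference": normalized word mismatch between label and prediction
        l, p = label.split(), pred.split()
        mismatch = sum(a != b for a, b in zip(l, p)) + abs(len(l) - len(p))
        return mismatch / max(len(l), len(p))
    return sum(diff(lbl, prd) for lbl, prd in label_text_pairs)

def second_loss(sample_scores, standard_score):
    # squared deviation of sample evaluation info from the standard evaluation
    return sum((s - standard_score) ** 2 for s in sample_scores) / len(sample_scores)

def total_loss(pairs, scores, standard=1.0, weight=0.5):
    return first_loss(pairs) + weight * second_loss(scores, standard)

loss = total_loss([("hello there", "hello there"), ("see you", "see ya")],
                  [0.8, 1.0])
```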
In some embodiments, the apparatus further comprises:
the second attribute information acquisition module is used for acquiring the first attribute information of the target object and the second attribute information corresponding to the interaction object of the target object;
the target initial text determining module is used for inputting the first attribute information and the second attribute information into the text prediction model to perform text prediction processing to obtain a target initial text;
And the target initial text sending module is used for sending the target initial text to a target terminal corresponding to the target object so that the target terminal can display the target initial text.
In some embodiments, the apparatus further comprises:
the target question text acquisition module is used for acquiring target context information of the target object and the interactive object in the interaction process and target question text of the interactive object;
the target reply text determining module is used for inputting the target context information and the target problem text into the text prediction model and performing text prediction processing to obtain a target reply text;
and the target reply text sending module is used for sending the target reply text to the target terminal corresponding to the target object so that the target terminal can display the target reply text.
In some embodiments, the apparatus further comprises:
the target dialogue text acquisition module is used for acquiring target context information of the target object and the interaction object in the interaction process, and a target dialogue text input by the target object in a dialog box of the instant messaging application;
the target correction text determining module is used for inputting the target dialogue text into the text prediction model, and performing text prediction processing to obtain a target correction text;
And the target correction text sending module is used for sending the target correction text to a target terminal corresponding to the target object so that the target terminal can display the target correction text.
The apparatus embodiments described above and the corresponding method embodiments are based on the same inventive concept.
Embodiments of the present disclosure provide a training device for a text prediction model, the device comprising a processor and a memory, the memory storing at least one instruction or at least one program, the at least one instruction or at least one program being loaded and executed by the processor to implement a training method for a text prediction model as provided by the method embodiments described above.
Embodiments of the present application also provide a computer storage medium, which may be provided in a terminal to store at least one instruction or at least one program related to the training method for a text prediction model in the method embodiments, where the at least one instruction or the at least one program is loaded and executed by a processor to implement the training method for a text prediction model provided in the method embodiments above.
Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes to implement the training method of the text prediction model provided by the above method embodiment.
Alternatively, in this embodiment of the present description, the storage medium may be located in at least one network server among a plurality of network servers of a computer network. Alternatively, in this embodiment, the storage medium may include, but is not limited to: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other media capable of storing program code.
The memory described above may be used for storing software programs and modules, and the processor executes the software programs and modules stored in the memory to perform various functional applications and data processing. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, application programs required for functions, and the like; the storage data area may store data created according to the use of the above-described device, or the like. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory may also include a memory controller to provide access to the memory by the processor.
The training method embodiments of the text prediction model provided in the embodiments of the present disclosure may be executed in a mobile terminal, a computer terminal, a server, or a similar computing device. Taking operation on a server as an example, fig. 12 is a block diagram of the hardware structure of a server for a training method of a text prediction model according to the embodiment of the present disclosure. As shown in fig. 12, the server 1200 may vary considerably in configuration or performance, and may include one or more Central Processing Units (CPU) 1210 (the central processing unit 1210 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), a memory 1230 for storing data, and one or more storage media 1220 (e.g., one or more mass storage devices) for storing applications 1223 or data 1222. The memory 1230 and the storage medium 1220 may be transitory or persistent. The program stored on the storage medium 1220 may include one or more modules, each of which may include a series of instruction operations on the server. Still further, the central processor 1210 may be configured to communicate with the storage medium 1220 and execute a series of instruction operations in the storage medium 1220 on the server 1200. The server 1200 may also include one or more power supplies 1260, one or more wired or wireless network interfaces 1250, one or more input/output interfaces 1240, and/or one or more operating systems 1221, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
The input-output interface 1240 may be used to receive or transmit data via a network. The specific example of the network described above may include a wireless network provided by a communication provider of the server 1200. In one example, the input-output interface 1240 includes a network adapter (Network Interface Controller, NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the input/output interface 1240 may be a Radio Frequency (RF) module for communicating with the internet wirelessly.
It will be appreciated by those of ordinary skill in the art that the configuration shown in fig. 12 is merely illustrative and is not intended to limit the configuration of the electronic device described above. For example, the server 1200 may also include more or fewer components than shown in fig. 12, or have a different configuration than shown in fig. 12.
As can be seen from the above embodiments of the method, apparatus, device, or storage medium for training a text prediction model provided in the present application, the present application obtains first sample attribute information of a first sample object, second sample attribute information of a second sample object, and sample context information; the second sample object is an interaction object of the first sample object, and the sample context information is interaction information of the first sample object and the second sample object; constructs a first sample text in a dialogue opening scene based on the first sample attribute information and the second sample attribute information, the first sample text being labeled with a sample initial text label; extracts a sample question text based on the sample context information, and constructs a second sample text in a dialogue reply scene, the second sample text being labeled with a sample reply text label; constructs a third sample text in a dialogue correction scene based on the sample context information, the third sample text being labeled with a sample correction text label, where the sample correction text is a text obtained by rewriting and polishing the third sample text; constructs a comprehensive sample text based on the first sample text, the second sample text and the third sample text; and trains the large language model based on the comprehensive sample text, the sample initial text label, the sample reply text label and the sample correction text label to obtain a text prediction model; the text prediction model is used for predicting a target text of a target object in the dialogue opening scene, the dialogue reply scene or the dialogue correction scene.
The comprehensive sample text constructed in this way includes sample texts under the dialogue opening scene, the dialogue reply scene and the dialogue correction scene, and the large language model is trained on this comprehensive sample text, so that the trained text prediction model can accurately predict target texts in all three scenes, improving the communication efficiency and communication effect of the target object with other objects in the dialogue opening scene, the dialogue reply scene and the dialogue correction scene.
It should be noted that: the embodiment sequence of the present disclosure is only for description, and does not represent the advantages and disadvantages of the embodiments. And the foregoing description has been directed to specific embodiments of this specification. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the apparatus, device, storage medium embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and references to the parts of the description of the method embodiments are only required.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing description covers merely preferred embodiments of the present application and is not intended to limit the present application; the scope of protection is defined by the appended claims rather than by the particular embodiments described above.

Claims (15)

1. A method of training a text prediction model, the method comprising:
acquiring first sample attribute information of a first sample object, second sample attribute information of a second sample object and sample context information; the second sample object is an interaction object of the first sample object, and the sample context information is interaction information of the first sample object and the second sample object;
constructing a first sample text in a dialogue opening scene based on the first sample attribute information and the second sample attribute information; the first sample text is labeled with a sample initial text label;
extracting a sample question text based on the sample context information, and constructing a second sample text in a dialogue reply scene; the second sample text is marked with a sample reply text label;
constructing a third sample text in a dialogue correction scene based on the sample context information; the third sample text is marked with a sample correction text label; the sample correction text is a text obtained by rewriting and polishing the third sample text;
Constructing a comprehensive sample text based on the first sample text, the second sample text and the third sample text;
training a large language model based on the comprehensive sample text, the sample initial text label, the sample reply text label and the sample correction text label to obtain a text prediction model; the text prediction model is used for predicting target text of a target object in the dialogue opening scene, the dialogue reply scene or the dialogue correction scene.
2. The method of claim 1, wherein constructing a third sample text in a dialogue correction scene based on the sample context information comprises:
analyzing the sample context information into an initial sample text;
correcting at least one of grammar, spelling and punctuation in the initial sample text to obtain a first corrected text;
determining a sample text set of a sample object corresponding to the initial sample text based on the sample context information;
based on the sample text set, determining a sample emotion type of a sample object corresponding to the initial sample text;
based on the sample emotion type, adjusting the first corrected text to obtain a second corrected text;
and determining a third sample text in the dialogue correction scene based on the second corrected text.
3. The method of claim 2, wherein the determining a third sample text in the dialogue correction scene based on the second corrected text comprises:
determining a sample communication scene type of a sample object corresponding to the initial sample text; the sample communication scene type comprises a formal communication scene type and an informal communication scene type;
and adjusting the second corrected text based on the sample communication scene type to obtain a third sample text in the dialogue corrected scene.
4. The method of claim 3, wherein adjusting the second corrected text based on the sample communication scene type to obtain a third sample text in the dialog corrected scene comprises:
acquiring at least two communication sub-types corresponding to the informal communication scene type under the condition that the sample communication scene type is the informal communication scene type;
screening the communication sub-types matched with the sample context information from the at least two communication sub-types to obtain a target communication sub-type;
And adjusting the second corrected text based on the target communication subtype to obtain a third sample text in the dialogue corrected scene.
5. The method of any of claims 1-4, wherein obtaining the first sample attribute information of the first sample object, the second sample attribute information of the second sample object, and the sample context information comprises:
acquiring a first sample profile and first interaction style information of the first sample object to obtain the first sample attribute information;
acquiring a second sample profile and second interaction style information of the second sample object to obtain the second sample attribute information;
acquiring the sample context information based on historical interaction records of the first sample object and the second sample object;
and wherein constructing the first sample text in the dialogue opening scene based on the first sample attribute information and the second sample attribute information comprises:
constructing a first sample text of the first sample object in the dialogue opening scene based on the first sample profile and the first interaction style information;
and constructing a first sample text of the second sample object in the dialogue opening scene based on the second sample profile and the second interaction style information.
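The opening-scene sample construction of claim 5 amounts to serializing a sample profile and interaction style information into a training text. The template below is an assumption for illustration only:

```python
def build_opening_sample(profile, style):
    """Serialize a sample profile and interaction style into a
    prompt-style first sample text for the dialogue opening scene."""
    fields = ", ".join(f"{k}={v}" for k, v in sorted(profile.items()))
    return f"profile: {fields}; style: {style}; task: write a dialogue opening line"

text = build_opening_sample({"age": "20s", "hobby": "hiking"}, "casual")
print(text)
```

The corresponding sample initial text label would be a human-written or curated opening line for the same profile.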
6. The method of any of claims 1-4, wherein extracting the sample question text based on the sample context information and constructing a second sample text in a dialogue reply scene comprises:
constructing a sample dialogue text based on the sample context information; the sample dialogue text comprises the sample question text and a sample reply text; the sample reply text is used for determining a label corresponding to the sample question text;
and constructing the second sample text in the dialogue reply scene based on the sample dialogue text.
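A minimal sketch of claim 6, assuming the sample context information is a chronological list of turns: each reply becomes the label for the preceding question. The field names and pairing rule are illustrative assumptions:

```python
def build_reply_samples(context_turns):
    """context_turns: list of (speaker, utterance) in chronological order.
    Returns question/label pairs for the dialogue reply scene."""
    samples = []
    for i in range(len(context_turns) - 1):
        q_speaker, question = context_turns[i]
        r_speaker, reply = context_turns[i + 1]
        if q_speaker != r_speaker:  # a genuine question/reply exchange
            samples.append({"question": question, "label": reply})
    return samples

turns = [("A", "Are you free tonight?"), ("B", "Yes, after 8pm."),
         ("B", "Why do you ask?"), ("A", "Dinner?")]
pairs = build_reply_samples(turns)
print(len(pairs))  # 2
```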
7. The method of any of claims 1-4, wherein the large language model comprises a transformer model and a parameter debugging model, and wherein training the large language model based on the integrated sample text, the sample initial text label, the sample reply text label, and the sample correction text label to obtain the text prediction model comprises:
inputting the integrated sample text into the large language model for text prediction processing to obtain a sample prediction text;
constructing loss data based on the difference between the sample label corresponding to the integrated sample text and the sample prediction text; the sample labels corresponding to the integrated sample text comprise the sample initial text label, the sample reply text label, and the sample correction text label;
and adjusting parameters of the large language model based on the loss data until a training-end condition is met, and determining the large language model at the end of training as the text prediction model.
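The claim-7 loop (compute loss from the gap between prediction and label, adjust parameters, stop at the training-end condition) can be sketched with a toy scalar model standing in for the large language model; every number here is illustrative:

```python
def train(samples, lr=0.1, max_steps=200, tol=1e-4):
    """Gradient-descent loop: loss data is the mean squared gap
    between label and prediction; training ends when loss < tol."""
    w = 0.0  # single stand-in parameter
    for step in range(max_steps):
        loss = sum((y - w * x) ** 2 for x, y in samples) / len(samples)
        if loss < tol:  # training-end condition met
            break
        grad = sum(-2 * x * (y - w * x) for x, y in samples) / len(samples)
        w -= lr * grad  # parameter adjustment based on the loss data
    return w, loss

w, loss = train([(1.0, 2.0), (2.0, 4.0)])
print(round(w, 2))  # 2.0
```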
8. The method of claim 7, wherein inputting the integrated sample text into the large language model for text prediction processing to obtain the sample prediction text comprises:
inputting the integrated sample text into the transformer model for text feature extraction to obtain a first sample feature;
inputting the integrated sample text into the parameter debugging model for text feature extraction to obtain a second sample feature; the number of model parameters of the parameter debugging model is smaller than a preset threshold;
and determining the sample prediction text corresponding to the integrated sample text based on the first sample feature and the second sample feature.
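The two-branch extraction of claim 8, a frozen base branch plus a small trainable branch whose parameter count stays under a threshold, resembles LoRA-style low-rank adapter tuning; that reading, and all matrices and sizes below, are assumptions for illustration:

```python
import random

random.seed(0)
DIM, RANK = 8, 2  # RANK << DIM keeps the adapter branch small

W = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]  # frozen base
A = [[0.0] * DIM for _ in range(RANK)]  # trainable down-projection (zero init)
B = [[0.0] * RANK for _ in range(DIM)]  # trainable up-projection

def matvec(M, v):
    return [sum(m, start=0.0) if False else sum(m_i * x for m_i, x in zip(row, v)) for row in M for m in [row]][:len(M)] if False else [sum(m_i * x for m_i, x in zip(row, v)) for row in M]

def forward(x):
    first = matvec(W, x)              # first sample feature (base branch)
    second = matvec(B, matvec(A, x))  # second sample feature (adapter branch)
    return [f + s for f, s in zip(first, second)]

adapter_params = RANK * DIM + DIM * RANK
print(adapter_params < DIM * DIM)  # True: adapter stays below the base size
```

With the adapter zero-initialized, `forward` initially reproduces the base branch alone; training would update only `A` and `B`.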
9. The method of claim 8, wherein the integrated sample text corresponds to a target number of sample labels, and wherein determining the sample prediction text corresponding to the integrated sample text based on the first sample feature and the second sample feature comprises:
generating at least two initial sample prediction texts based on the first sample feature and the second sample feature;
taking any one of the initial sample prediction texts as a first screening text, and determining the similarity between each remaining initial sample prediction text and the first screening text;
ranking the remaining initial sample prediction texts based on the determined similarities;
determining a second screening text based on the ranking result and the target number;
and determining the first screening text and the second screening text as the sample prediction texts corresponding to the integrated sample text; the number of the sample prediction texts is the target number;
and wherein constructing the loss data based on the difference between the sample label corresponding to the integrated sample text and the sample prediction text comprises:
constructing at least two sets of sample-label text pairs based on the target number of sample labels and the target number of sample prediction texts;
determining initial loss data corresponding to each set of sample-label text pairs based on the difference between the sample label and the sample prediction text in each set;
and determining the loss data according to the sum of the initial loss data corresponding to each set of sample-label text pairs.
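Claim 9's screening step can be sketched as follows: pick one candidate as the first screening text, rank the rest by similarity to it, and keep enough to reach the target number. The Jaccard word-overlap similarity is a stand-in for whatever metric the patent intends:

```python
def jaccard(a, b):
    """Word-level Jaccard similarity between two texts (assumed metric)."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)

def select_predictions(candidates, target_number):
    first = candidates[0]                # first screening text
    rest = sorted(candidates[1:],
                  key=lambda c: jaccard(c, first), reverse=True)
    second = rest[:target_number - 1]    # second screening texts
    return [first] + second              # exactly target_number texts

cands = ["hello there friend", "hello there", "goodbye now", "hello friend"]
picked = select_predictions(cands, target_number=2)
print(picked)  # ['hello there friend', 'hello there']
```

Each selected prediction would then be paired with a sample label, and the per-pair losses summed.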
10. The method of claim 9, wherein determining the loss data according to the sum of the initial loss data corresponding to each set of sample-label text pairs comprises:
determining first loss data according to the sum of the initial loss data corresponding to each set of sample-label text pairs;
acquiring sample evaluation information corresponding to the sample prediction text;
constructing second loss data based on the difference between the sample evaluation information and standard evaluation information;
and determining the loss data based on the first loss data and the second loss data.
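Claim 10 combines the summed per-pair text losses with an evaluation-gap term. The squared-gap form and the weights below are assumptions; the patent does not fix how the two losses are composed:

```python
def combined_loss(pair_losses, sample_score, standard_score,
                  w1=1.0, w2=0.5):
    """first loss = sum of per-pair losses (claim 9);
    second loss = gap between sample and standard evaluation scores."""
    first_loss = sum(pair_losses)
    second_loss = (sample_score - standard_score) ** 2
    return w1 * first_loss + w2 * second_loss

loss = combined_loss([0.2, 0.3], sample_score=0.6, standard_score=1.0)
print(round(loss, 3))  # 0.58
```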
11. The method of any one of claims 1-4, further comprising:
acquiring first attribute information of a target object and second attribute information corresponding to an interaction object of the target object;
inputting the first attribute information and the second attribute information into the text prediction model for text prediction processing to obtain a target initial text;
and sending the target initial text to a target terminal corresponding to the target object, so that the target terminal displays the target initial text.
12. The method of any one of claims 1-4, further comprising:
acquiring target context information of the target object and the interaction object during interaction, and a target question text of the interaction object;
inputting the target context information and the target question text into the text prediction model for text prediction processing to obtain a target reply text;
and sending the target reply text to a target terminal corresponding to the target object, so that the target terminal displays the target reply text.
13. The method of any one of claims 1-4, further comprising:
acquiring target context information of the target object and the interaction object during interaction, and a target dialogue text input by the target object in a dialogue box of an instant messaging application;
inputting the target dialogue text into the text prediction model for text prediction processing to obtain a target correction text;
and sending the target correction text to a target terminal corresponding to the target object, so that the target terminal displays the target correction text.
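Claims 11-13 describe one trained model serving three scenes, with the scene deciding which inputs are packed into the prompt. A hypothetical serving-side dispatch, with `predict` as a stand-in for the real text prediction model:

```python
def predict(prompt):
    """Stand-in for the trained text prediction model."""
    return f"[generated for: {prompt}]"

def serve(scene, **inputs):
    if scene == "opening":      # claim 11: attribute information only
        prompt = f"open|{inputs['first_attrs']}|{inputs['second_attrs']}"
    elif scene == "reply":      # claim 12: context + question text
        prompt = f"reply|{inputs['context']}|{inputs['question']}"
    elif scene == "correction": # claim 13: context + dialogue-box text
        prompt = f"correct|{inputs['context']}|{inputs['dialogue_text']}"
    else:
        raise ValueError(scene)
    return predict(prompt)      # text sent on to the target terminal

out = serve("reply", context="ctx", question="how are you?")
print(out)  # [generated for: reply|ctx|how are you?]
```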
14. A training device for a text prediction model, the device comprising:
an information acquisition module, configured to acquire first sample attribute information of a first sample object, second sample attribute information of a second sample object, and sample context information; the second sample object is an interaction object of the first sample object, and the sample context information is interaction information of the first sample object and the second sample object;
a first sample text construction module, configured to construct a first sample text in a dialogue opening scene based on the first sample attribute information and the second sample attribute information; the first sample text is marked with a sample initial text label;
a second sample text construction module, configured to extract a sample question text based on the sample context information and construct a second sample text in a dialogue reply scene; the second sample text is marked with a sample reply text label;
a third sample text construction module, configured to construct a third sample text in a dialogue correction scene based on the sample context information; the third sample text is marked with a sample correction text label; the sample correction text is obtained by rewriting and polishing the third sample text;
an integrated sample text construction module, configured to construct an integrated sample text based on the first sample text, the second sample text, and the third sample text;
and a model training module, configured to train a large language model based on the integrated sample text, the sample initial text label, the sample reply text label, and the sample correction text label to obtain a text prediction model; the text prediction model is used for predicting a target text of a target object in the dialogue opening scene, the dialogue reply scene, or the dialogue correction scene.
15. A computer storage medium storing at least one instruction or at least one program, wherein the at least one instruction or the at least one program is loaded and executed by a processor to implement the training method for a text prediction model according to any one of claims 1-13.
CN202311520789.9A 2023-11-14 2023-11-14 Training method and device for text prediction model and storage medium Pending CN117574920A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311520789.9A CN117574920A (en) 2023-11-14 2023-11-14 Training method and device for text prediction model and storage medium


Publications (1)

Publication Number Publication Date
CN117574920A true CN117574920A (en) 2024-02-20

Family

ID=89883591


Country Status (1)

Country Link
CN (1) CN117574920A (en)


Legal Events

Date Code Title Description
PB01 Publication