CN116881127A - Method and device for testing stylized rewriting capability of model

Info

Publication number: CN116881127A
Application number: CN202310746750.2A
Authority: CN
Inventors: 郑杰文
Original assignee: Netease Hangzhou Network Co Ltd
Current assignee: Netease Hangzhou Network Co Ltd
Priority date: 2023-06-21
Filing date: 2023-06-21
Publication date: 2023-10-13

Abstract

The invention discloses a method and a device for testing stylized rewriting capability of a model. The method comprises the following steps: acquiring a target interaction data set; inputting interaction information of at least one interaction target contained in the target interaction data set into a language model to obtain stylized information in the interaction information and simulation reply information corresponding to the interaction information; inputting the interaction information, the stylized information and the simulation reply information into at least one model to be evaluated to obtain the rewritten reply information of the simulation reply information, wherein the rewritten reply information is used for representing that the simulation reply information is stylized rewritten according to the stylized information to obtain the interaction information; and testing at least one model to be evaluated according to the rewritten reply information and the interaction information to obtain a test result of the at least one model to be evaluated. The invention solves the technical problem of lower stylized rewriting efficiency of the interactive information in the related technology.

Description

Method and device for testing stylized rewriting capability of model

Technical Field

The invention relates to the field of language processing, in particular to a method and a device for testing stylized rewriting capability of a model.

Background

Currently, in the field of natural language processing, the rewriting capability refers to reconstructing or rewriting an original text to achieve a better expression effect or adapt to a specific application scene. The rewriting capability plays an important role in the fields of machine translation, abstract generation, text correction, and the like. However, the conventional technology has a problem that the stylized rewriting efficiency of the interactive information is low.

In view of the above problems, no effective solution has been proposed at present.

Disclosure of Invention

At least some embodiments of the present invention provide a method and an apparatus for testing stylized rewrite capability of a model, so as to at least solve a technical problem in the related art that the stylized rewrite efficiency of interactive information is low.

According to one embodiment of the present invention, there is provided a method for testing stylized rewrite capability of a model, including: acquiring a target interaction data set, wherein the target interaction data set comprises interaction information among a plurality of interaction objects, and the plurality of interaction objects comprise at least one interaction target outputting stylized information; inputting interaction information of at least one interaction target contained in the target interaction data set into a language model to obtain stylized information in the interaction information and simulation reply information corresponding to the interaction information, wherein the interaction information is used for representing information of interaction between the at least one interaction target and other interaction objects in a plurality of interaction objects; inputting the interaction information, the stylized information and the simulation reply information into at least one model to be evaluated to obtain the rewritten reply information of the simulation reply information, wherein the rewritten reply information is used for representing that the simulation reply information is stylized rewritten according to the stylized information to obtain the interaction information; and testing at least one model to be evaluated according to the rewriting reply information and the interaction information to obtain a test result of the at least one model to be evaluated, wherein the test result is used for representing the stylized rewriting capability of the at least one model to be evaluated.

According to one embodiment of the present invention, there is also provided an apparatus for testing the stylized rewriting capability of a model, including: an acquisition module, configured to acquire a target interaction data set, wherein the target interaction data set comprises interaction information among a plurality of interaction objects, and the plurality of interaction objects comprise at least one interaction target outputting stylized information; a first input module, configured to input interaction information of the at least one interaction target contained in the target interaction data set into a language model to obtain stylized information in the interaction information and simulated reply information corresponding to the interaction information, wherein the interaction information is used for representing information of interaction between the at least one interaction target and other interaction objects among the plurality of interaction objects; a second input module, configured to input the interaction information, the stylized information and the simulated reply information into at least one model to be evaluated to obtain rewritten reply information of the simulated reply information, wherein the rewritten reply information is used for representing that the simulated reply information is stylistically rewritten according to the stylized information so as to obtain the interaction information; and a test module, configured to test the at least one model to be evaluated according to the rewritten reply information and the interaction information to obtain a test result of the at least one model to be evaluated, wherein the test result is used for representing the stylized rewriting capability of the at least one model to be evaluated.

According to one embodiment of the present disclosure, there is also provided a computer-readable storage medium having a computer program stored therein, wherein the computer program is configured to execute the stylized rewrite capability test method of the model in the above embodiment when executed by a processor.

According to one embodiment of the present disclosure, there is also provided an electronic device including a memory in which a computer program is stored, and a processor configured to run the computer program to perform the stylized rewrite capability test method of the model in the above embodiment.

In at least some embodiments of the present invention, a target interaction dataset is obtained, where the target interaction dataset includes interaction information between a plurality of interaction objects, and the plurality of interaction objects includes at least one interaction target that outputs stylized information; inputting interaction information of at least one interaction target contained in the target interaction data set into a language model to obtain stylized information in the interaction information and simulation reply information corresponding to the interaction information, wherein the interaction information is used for representing information of interaction between the at least one interaction target and other interaction objects in a plurality of interaction objects; inputting the interaction information, the stylized information and the simulation reply information into at least one model to be evaluated to obtain the rewritten reply information of the simulation reply information, wherein the rewritten reply information is used for representing that the simulation reply information is stylized rewritten according to the stylized information to obtain the interaction information; and testing at least one model to be evaluated according to the rewriting reply information and the interaction information to obtain a test result of the at least one model to be evaluated, wherein the test result is used for representing the stylized rewriting capability of the at least one model to be evaluated. 
It is easy to see that, because the at least one model to be evaluated performs stylized rewriting of the simulated reply information corresponding to the interaction information, each resulting piece of rewritten reply information can be tested for its degree of association with the original interaction information. The test result of each model to be evaluated is thereby determined, and the model whose stylized rewriting is most accurate according to the test results can be identified and then used for stylized rewriting of interaction information. This achieves the technical effect of improving the efficiency with which the model to be evaluated performs stylized rewriting of interaction information, and solves the technical problem in the related art that the stylized rewriting efficiency of interaction information is low.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:

FIG. 1 is a block diagram of a hardware architecture of a mobile terminal of a stylized rewrite capability test method of a model according to an embodiment of the present application;

FIG. 2 is a flow chart of a method for stylized rewrite capability testing of a model according to one embodiment of the application;

FIG. 3 is a flow chart of a stylized rewrite capability evaluation of a model according to one embodiment of the application;

FIG. 4 is a block diagram of a stylized rewrite capability test apparatus for a model according to one embodiment of the application;

FIG. 5 is a schematic diagram of an electronic device according to an embodiment of the application.

Detailed Description

In order that those skilled in the art will better understand the present application, a technical solution in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, shall fall within the scope of the present application.

It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

According to one embodiment of the present invention, there is provided an embodiment of a stylized rewrite capability test method for a model, it being noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be performed in an order other than that illustrated herein.

The method embodiments may be performed in a mobile terminal, a computer terminal, or similar computing device. Taking the mobile terminal as an example, the mobile terminal can be a terminal device such as a smart phone (e.g. an Android mobile phone, an iOS mobile phone, etc.), a tablet computer, a palm computer, a mobile internet device (MID for short), a PAD, a game console, etc. Fig. 1 is a block diagram of a hardware structure of a mobile terminal according to a stylized rewrite capability test method of a model according to an embodiment of the present invention. As shown in fig. 1, a mobile terminal may include one or more (only one is shown in fig. 1) processors 102 (the processors 102 may include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processor (GPU), a Digital Signal Processing (DSP) chip, a Microprocessor (MCU), a programmable logic device (FPGA), a neural Network Processor (NPU), a Tensor Processor (TPU), an Artificial Intelligence (AI) type processor, etc.) and a memory 104 for storing data. Optionally, the mobile terminal may further include a transmission device 106, an input-output device 108, and a display device 110 for communication functions. It will be appreciated by those skilled in the art that the structure shown in fig. 1 is merely illustrative and not limiting of the structure of the mobile terminal described above. For example, the mobile terminal may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.

The memory 104 may be used to store a computer program, for example, a software program of an application software and a module, such as a computer program corresponding to the stylized rewrite capability test method of the model in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the computer program stored in the memory 104, that is, implements the stylized rewrite capability test method of the model described above. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory remotely located relative to the processor 102, which may be connected to the mobile terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the mobile terminal. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, simply referred to as NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is configured to communicate with the internet wirelessly.

The input of the input-output device 108 may come from a plurality of human interface devices (Human Interface Device, abbreviated as HID), for example: keyboard and mouse, gamepad, or other special game controllers (e.g., steering wheel, fishing pole, dance mat, remote control, etc.). Some human interface devices may provide output functions in addition to input functions, such as force feedback and vibration of a gamepad or audio output of a controller.

The display device 110 may be, for example, a head-up display (HUD), a touch-screen liquid crystal display (LCD), or a touch display (also referred to as a "touch screen" or "touch display screen"). The liquid crystal display may enable a user to interact with a user interface of the mobile terminal. In some embodiments, the mobile terminal has a graphical user interface (GUI), and the user may interact with the GUI through finger contacts and/or gestures on the touch-sensitive surface. The human-machine interaction functions optionally include creating web pages, drawing, word processing, making electronic documents, games, video conferencing, instant messaging, sending and receiving electronic mail, call interfaces, playing digital video, playing digital music, and/or web browsing. Executable instructions for performing these human-machine interaction functions are configured/stored in a computer program product or readable storage medium executable by one or more processors.

In a possible implementation manner, the embodiment of the invention provides a stylized rewriting capability test method of a model, and a graphical user interface is provided through a terminal device, wherein the terminal device can be the aforementioned local terminal device or the aforementioned client device in a cloud interaction system. Fig. 2 is a flowchart of a method for testing stylized rewrite capability of a model according to one embodiment of the present invention, in which a graphical user interface is provided by a terminal device, and contents displayed on the graphical user interface include a touch area, as shown in fig. 2, the method includes the steps of:

Step S202, a target interaction data set is obtained, wherein the target interaction data set comprises interaction information among a plurality of interaction objects, and the interaction objects comprise at least one interaction target outputting stylized information.

Specifically, the target interaction data set represents interaction information among a plurality of interaction objects. It may be interaction data between multiple users, interaction data for a language processing model, interaction data for obtaining a target text based on a text generation model, and the like, and is not specifically limited here. The plurality of interaction objects include at least one interaction target with stylized information.

In an alternative embodiment, in the process of performing the interaction, at least one interaction target with stylized information needs to be determined as one party of the interaction, and other interaction targets except the interaction target with stylized information in the plurality of interaction objects are taken as the other party of the interaction. Alternatively, in response to the target interaction data set being used to characterize interaction data between multiple users, an interaction target with stylized information may be used as one party to the interaction and the other users as another party to the interaction.

Illustratively, the above-described target interaction data set is expressed as:

Speaker 1 (Persona 1) (interaction object 1): Hey, how's it going?

Speaker 2 (Persona 2) (interaction object 2): Not too bad, thanks. How about you?

Speaker 1 (Persona 1): I'm doing pretty well. So, what do you like to do in your free time?

Speaker 2 (Persona 2): I enjoy reading and playing video games. How about you?

Here, interaction object 1 in the conversation is characterized as a college student who likes playing basketball and watching movies, and interaction object 2 is characterized as a software engineer who likes reading and playing games in leisure time. Since both interaction object 1 and interaction object 2 are stylized, either one can be used as the evaluation target. In the present invention, interaction object 2 is used as the evaluation target for the purpose of illustration, and the interaction information of the non-evaluation target (i.e. interaction object 1) is retained.
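By way of illustration only, the exemplary data set above might be held in memory as follows. This is a minimal Python sketch under stated assumptions: the class names `Turn` and `Dialogue`, their field names, and the persona wording are hypothetical and are not specified by the present invention.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Turn:
    speaker: str  # label of the interaction object that produced the utterance
    text: str     # the utterance itself


@dataclass
class Dialogue:
    # Persona description per speaker (assumed representation of the "style").
    personas: Dict[str, str]
    turns: List[Turn] = field(default_factory=list)


dialogue = Dialogue(
    personas={
        "Speaker 1": "college student; likes playing basketball and watching movies",
        "Speaker 2": "software engineer; likes reading and playing games",
    },
    turns=[
        Turn("Speaker 1", "Hey, how's it going?"),
        Turn("Speaker 2", "Not too bad, thanks. How about you?"),
        Turn("Speaker 1", "I'm doing pretty well. So, what do you like to do in your free time?"),
        Turn("Speaker 2", "I enjoy reading and playing video games. How about you?"),
    ],
)

# Interaction object 2 is chosen as the evaluation target, as in the example.
evaluation_target = "Speaker 2"
```

Keeping the persona separate from the turns lets the same dialogue be reused with either speaker as the evaluation target.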

Step S204, inputting interaction information of at least one interaction target contained in the target interaction data set into a language model to obtain stylized information in the interaction information and simulation reply information corresponding to the interaction information, wherein the interaction information is used for representing information of interaction between the at least one interaction target and other interaction objects in the plurality of interaction objects.

Specifically, the interaction information represents the information exchanged between the at least one interaction target and other interaction objects among the plurality of interaction objects. For an example, refer to the exchange above: "Speaker 1 (Persona 1) (interaction object 1): Hey, how's it going? Speaker 2 (Persona 2) (interaction object 2): Not too bad, thanks. How about you?"

The language model is a model for generating general-purpose text; as applied in this method, it is the model that produces a general text reply to the interaction information.

The stylized information may be used to represent the stylized information of the non-evaluation target; based on the example above, it may be expressed as "university student" for interaction object 1.

The simulated reply information can be used for representing the corresponding reply information obtained by carrying out language model training on the interactive information, and the simulated reply information is characterized as reply information without stylization.

In an alternative embodiment, after the target interaction data set is acquired, the interaction information in the target interaction data set may be uploaded to the language model to perform the reply processing of the general text, so as to obtain the simulated reply information corresponding to the interaction information, and the simulated reply information may be labeled as R.

Step S206, inputting the interaction information, the stylized information and the simulated reply information into at least one model to be evaluated to obtain the rewritten reply information of the simulated reply information, wherein the rewritten reply information is used for representing that the simulated reply information is stylistically rewritten according to the stylized information so as to obtain the interaction information.

Specifically, the model to be evaluated is a model that performs stylized rewriting of the simulated reply information. By evaluating the stylized rewriting results produced by the at least one model to be evaluated, the model whose result best matches the stylized information can be identified among them.

The rewritten reply information represents the result obtained when the model to be evaluated performs stylized rewriting of the simulated reply information.

In an alternative embodiment, after the stylized information in the interaction information and the simulated reply information corresponding to the interaction information are obtained, the interaction information, the stylized information and the simulated reply information may be input into at least one model to be evaluated. Each model to be evaluated performs stylized rewriting of the simulated reply information, yielding at least one piece of rewritten reply information, which is labeled RR.
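The data flow of step S206 can be sketched as follows. This is an illustrative Python sketch only: the callable signature `(interaction, style, simulated_reply) -> rewritten_reply` is an assumption, and the two stand-in functions are hypothetical placeholders for real models to be evaluated, which the present invention does not limit.

```python
from typing import Callable, Dict

# Assumed interface of a model to be evaluated:
# (interaction information, stylized information, simulated reply R) -> rewritten reply RR
RewriteModel = Callable[[str, str, str], str]


def collect_rewrites(
    interaction: str,
    style: str,
    simulated_reply: str,
    models: Dict[str, RewriteModel],
) -> Dict[str, str]:
    """Feed the same three inputs to every model and collect one RR per model."""
    return {
        name: model(interaction, style, simulated_reply)
        for name, model in models.items()
    }


# Hypothetical stand-ins used only to make the sketch runnable.
def identity_model(interaction: str, style: str, reply: str) -> str:
    return reply  # performs no rewriting at all


def tagging_model(interaction: str, style: str, reply: str) -> str:
    return f"{reply} (speaking as a {style})"


rr = collect_rewrites(
    interaction="Hey, how's it going?",
    style="software engineer",
    simulated_reply="I am fine. And you?",
    models={"identity": identity_model, "tagging": tagging_model},
)
```

Giving every model identical inputs keeps the resulting RR values comparable in the subsequent test step.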

Step S208, testing at least one model to be evaluated according to the rewritten reply information and the interaction information to obtain a test result of the at least one model to be evaluated, wherein the test result is used for representing the stylized rewriting capability of the at least one model to be evaluated.

Specifically, the test result represents the stylized rewriting capability of the at least one model to be evaluated on input information. In general, the test result of a model to be evaluated may be determined according to the degree of association between the rewritten reply information output by that model and the interaction information: the higher the degree of association, the stronger the stylized rewriting capability of the corresponding model to be evaluated.

In an alternative embodiment, after the rewritten reply information corresponding to the simulated reply information is obtained, the at least one model to be evaluated may be tested according to the rewritten reply information and the original interaction information, so as to obtain at least one test result for the at least one model to be evaluated with respect to the interaction information.
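As a rough illustration of ranking models by degree of association, the following Python sketch scores each rewritten reply against the original utterance of the evaluation target. The metric used here, Jaccard overlap of lowercase word sets, is an assumption chosen for simplicity and is not stated to be the measure used by the present invention; the model names and replies are likewise hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lowercase word set; punctuation is discarded."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def association(rewritten: str, original: str) -> float:
    """Assumed stand-in metric: Jaccard overlap between the two word sets."""
    a, b = tokens(rewritten), tokens(original)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def rank_models(rewrites: Dict[str, str], original: str) -> List[Tuple[str, float]]:
    """Score every model's RR against the original reply, best first."""
    scores = {name: association(rr, original) for name, rr in rewrites.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


original = "Not too bad, thanks. How about you?"
rewrites = {
    "model_a": "Not bad, thanks. How about you?",
    "model_b": "I am fine.",
}
ranking = rank_models(rewrites, original)
```

In this toy input, `model_a` shares six of seven distinct words with the original and is ranked first, while `model_b` shares none. Any other similarity measure could be substituted for `association` without changing the ranking logic.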

In summary, by acquiring a target interaction data set, wherein the target interaction data set comprises interaction information among a plurality of interaction objects, and the plurality of interaction objects comprise at least one interaction target outputting stylized information; inputting interaction information of at least one interaction target contained in the target interaction data set into a language model to obtain stylized information in the interaction information and simulation reply information corresponding to the interaction information, wherein the interaction information is used for representing information of interaction between the at least one interaction target and other interaction objects in a plurality of interaction objects; inputting the interaction information, the stylized information and the simulation reply information into at least one model to be evaluated to obtain the rewritten reply information of the simulation reply information, wherein the rewritten reply information is used for representing that the simulation reply information is stylized rewritten according to the stylized information to obtain the interaction information; and testing at least one model to be evaluated according to the rewriting reply information and the interaction information to obtain a test result of the at least one model to be evaluated, wherein the test result is used for representing the stylized rewriting capability of the at least one model to be evaluated. 
It is easy to see that, because the at least one model to be evaluated performs stylized rewriting of the simulated reply information corresponding to the interaction information, each resulting piece of rewritten reply information can be tested for its degree of association with the original interaction information. The test result of each model to be evaluated is thereby determined, and the model whose stylized rewriting is most accurate according to the test results can be identified and then used for stylized rewriting of interaction information. This achieves the technical effect of improving the efficiency with which the model to be evaluated performs stylized rewriting of interaction information, and solves the technical problem in the related art that the stylized rewriting efficiency of interaction information is low.

Optionally, inputting the interaction information of at least one interaction target contained in the target interaction data set into the language model to obtain stylized information in the interaction information and simulation reply information corresponding to the interaction information, where the method includes: determining first interaction information corresponding to at least one interaction target and second interaction information corresponding to other interaction objects from the interaction information, wherein the first interaction information is used for representing information output by the at least one interaction target in an interaction process, and the second interaction information is used for representing information output by the other interaction objects in the interaction process; and respectively inputting the first interaction information and the second interaction information into the language model to obtain stylized information of the first interaction information and simulation reply information corresponding to the second interaction information.

Specifically, the first interaction information represents the information output by the at least one interaction target during the interaction and may be labeled YY. For example, the interaction information of the evaluation target (i.e. interaction object 2) is taken as the first interaction information: "Not too bad, thanks. How about you?" and "I enjoy reading and playing video games. How about you?"

The second interaction information represents the information output by the other interaction objects during the interaction and may be labeled XX. For example, the interaction information of the non-evaluation target (i.e. interaction object 1) is taken as the second interaction information: "Hey, how's it going?" and "I'm doing pretty well. So, what do you like to do in your free time?"
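The separation into first interaction information (YY) and second interaction information (XX) can be sketched as follows. This is a minimal Python illustration; the tuple representation of a turn and the function name `split_interaction` are assumptions introduced here.

```python
from typing import List, Tuple

Turn = Tuple[str, str]  # (speaker label, utterance)


def split_interaction(turns: List[Turn], target: str) -> Tuple[List[str], List[str]]:
    """Return (YY, XX): utterances of the evaluation target and of all other speakers."""
    yy = [text for speaker, text in turns if speaker == target]
    xx = [text for speaker, text in turns if speaker != target]
    return yy, xx


turns = [
    ("Speaker 1", "Hey, how's it going?"),
    ("Speaker 2", "Not too bad, thanks. How about you?"),
    ("Speaker 1", "I'm doing pretty well. So, what do you like to do in your free time?"),
    ("Speaker 2", "I enjoy reading and playing video games. How about you?"),
]

# Interaction object 2 is the evaluation target in the running example.
yy, xx = split_interaction(turns, target="Speaker 2")
```

With the example dialogue, `yy` holds the two utterances of interaction object 2 and `xx` holds the two utterances of interaction object 1, each in conversation order.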

The stylized information represents the stylized information of the at least one interaction target, obtained by using the language model to distinguish stylized from non-stylized content in the first interaction information; for example, the stylized information corresponding to interaction object 1 is "university student".

The simulated reply information can be used for representing the simulated reply information corresponding to the second interaction information obtained by performing the language model-based reply training without stylization on the second interaction information.

In an alternative embodiment, in the process of inputting the interaction information of at least one interaction target contained in the target interaction data set into the language model to obtain the stylized information in the interaction information and the simulation reply information corresponding to the interaction information, the first interaction information corresponding to the at least one interaction target and the second interaction information corresponding to other interaction objects interacting with the at least one interaction target need to be determined from the interaction information, and then model training is performed on the first interaction information and the second interaction information respectively, so that the stylized information in the first interaction information and the simulation reply information corresponding to the second interaction information can be obtained.

In another alternative embodiment, in the process of training on the first interaction information and the second interaction information respectively: in response to the first interaction information being trained through the first language model, the training task of the first language model may be to distinguish stylized information from non-stylized information in the interaction information; in response to the second interaction information being trained through the first language model, the training task of the first language model may be to produce a simulated reply to the interaction information without a style; in response to the first interaction information being trained through the second language model, the training task of the second language model may be to distinguish stylized information from non-stylized information in the interaction information; and in response to the second interaction information being trained through the second language model, the training task of the second language model may be to produce a simulated reply to the interaction information without a style. In the present invention, the first language model and the second language model are not specifically limited and can be adjusted according to actual conditions.

Optionally, the first interaction information and the second interaction information are respectively input into a language model to obtain stylized information of the first interaction information and simulation reply information corresponding to the second interaction information, including: inputting the first interaction information, the target style of the first interaction information and the first task description into a first language model in the language model to obtain stylized information in the first interaction information, wherein the first task description is used for issuing tasks for distinguishing the stylized information and the non-stylized information in the first interaction information to the first language model; and inputting the second interaction information and the second task description into a second language model in the language model to obtain simulation reply information corresponding to the second interaction information, wherein the second task description is used for sending a task for replying the second interaction information to the second language model, and the simulation reply information is used for representing information which simulates at least one interaction target to output in the interaction process.

Specifically, the target style may be used to represent the style of the interaction object in the first interaction information; for example, the style of the evaluation target may be that of a software engineer.

The first language model may be a general language model used to distinguish the stylized information from the non-stylized information in the first interaction information.

The first task description may be used to represent the task, issued to the first language model, of distinguishing the stylized information from the non-stylized information in the first interaction information.

The second language model may be a general text generation model used to generate a simulated reply to the second interaction information.

The second task description may be used to represent the task, issued to the second language model, of replying to the second interaction information.

In an alternative embodiment, to obtain the stylized information of the first interaction information and the simulated reply information corresponding to the second interaction information, the first interaction information, the target style and the first task description are input into the general language model. The interaction information associated with the target style is retained as the stylized information and marked as chosen_persona (the selected persona), while the interaction information irrelevant to the target style is removed. The second interaction information and the second task description are input into the general text generation model to obtain the simulated reply information corresponding to the second interaction information, denoted R.
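The two prompting tasks described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `llm` callable, the prompt wording, and the function names are hypothetical and do not come from the specification; only the `chosen_persona` and `no-related` labels are taken from the flow of Fig. 3.

```python
from typing import Callable, List, Optional

NO_RELATED = "no-related"  # label used in the flow of Fig. 3 for unrelated utterances


def extract_stylized_info(utterance: str, persona: List[str],
                          llm: Callable[[str], str]) -> Optional[str]:
    """First task description: keep only the persona content that the utterance reflects.

    `llm` is a hypothetical callable wrapping any general language model.
    Returns the chosen_persona text, or None when the model answers no-related.
    """
    task = ("Identify which persona sentences the utterance reflects. "
            f"Answer '{NO_RELATED}' if it reflects none of them.")
    prompt = f"{task}\nPersona: {' | '.join(persona)}\nUtterance: {utterance}"
    answer = llm(prompt).strip()
    return None if answer == NO_RELATED else answer


def simulate_reply(other_utterance: str, llm: Callable[[str], str]) -> str:
    """Second task description: produce a style-free simulated reply R."""
    task = "Reply to the following message in a neutral tone, without any particular style."
    return llm(f"{task}\nMessage: {other_utterance}").strip()
```

Passing the model in as a plain callable keeps the sketch independent of any particular model interface, which the specification leaves open.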

Optionally, inputting the interaction information, the stylized information and the simulated reply information into at least one model to be evaluated to obtain the rewritten reply information of the simulated reply information, including: generating an interaction context condition based on the first interaction information, the stylized information and the simulated reply information in the interaction information; and inputting the interaction context condition and a third task description into at least one model to be evaluated to obtain the rewritten reply information of the simulated reply information, wherein the third task description is used for issuing a task for rewriting the style of the simulated reply information to the at least one model to be evaluated.

Specifically, the above-mentioned interactive context conditions may be used to represent a context obtained by combining the first interactive information XX, the stylized information chosen_persona, and the simulated reply information R.

The third task description described above can be used to represent a task of rewriting the simulated reply information R in the input interaction context.

In an alternative embodiment, in the process of inputting the interaction information, the stylized information and the simulated reply information into at least one model to be evaluated to obtain the rewritten reply information of the simulated reply information, the interaction context condition can be obtained by combining the first interaction information XX, the stylized information chosen_persona and the simulated reply information R, and the simulated reply information in the input interaction context condition is rewritten by at least one model to be evaluated to obtain at least one rewritten reply information RR.
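A minimal sketch of how the interaction context condition and the third task description might be assembled and sent to several models to be evaluated is given below. The prompt layout, the `speaker_name` parameter, and the model callables are assumptions made for illustration only; the specification does not fix a prompt format.

```python
from typing import Callable, Dict


def build_rewrite_prompt(xx: str, r: str, chosen_persona: str,
                         speaker_name: str = "S2") -> str:
    """Combine XX, chosen_persona and R into a context condition plus a rewrite task."""
    context = (f"Dialogue context (XX): {xx}\n"
               f"Persona of {speaker_name}: {chosen_persona}\n"
               f"Plain reply (R) by {speaker_name}: {r}")
    # Third task description: ask for a style rewrite of R only.
    task = (f"Rewrite R so that it reflects the persona of {speaker_name}. "
            "Output only the rewritten reply.")
    return f"{context}\n{task}"


def rewrite_with_models(prompt: str,
                        models: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Send the same prompt to every model to be evaluated and collect each RR."""
    return {name: model(prompt).strip() for name, model in models.items()}
```

Using one shared prompt for all models keeps the comparison between models to be evaluated on equal footing.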

Optionally, testing at least one model to be evaluated according to the rewritten reply information and the interaction information to obtain a test result of the at least one model to be evaluated includes: determining a recall rate of the at least one model to be evaluated based on the rewritten reply information and the first interaction information in the interaction information, wherein the recall rate is used for representing the proportion of interaction information that the rewritten reply information has in common with the first interaction information; determining test data of the at least one model to be evaluated based on the similarity between the rewritten reply information and the stylized information, wherein the test data is used for representing data for testing the stylized rewriting capability; and determining the test result of the at least one model to be evaluated based on the recall rate and/or the test data.

Specifically, the recall rate may be used to represent the proportion of interaction information shared by the rewritten reply information and the first interaction information. In general, the higher the recall rate, the larger the shared proportion and, in turn, the higher the rewriting efficiency of the corresponding model to be evaluated.

The test data may be used to indicate the degree of association between the rewritten reply information and the stylized information, as given by their similarity.

In an alternative embodiment, for each set of interaction information, BLEU and ROUGE are calculated between the rewritten reply information RR and the original utterance YY in the first interaction information to obtain the recall condition; meanwhile, BERTScore is calculated between the rewritten reply information RR and the stylized information chosen_persona to obtain the association condition. The test result of the at least one model to be evaluated is then determined according to the recall rate and/or the test data.

Both BLEU and ROUGE measure the word overlap between the generated text and the text serving as the standard answer; this embodiment uses that overlap as the recall rate, that is, how much of the text in the annotated answer appears in the generated text. BERTScore measures how closely the generated text is associated with the persona. If a model's generated text is more strongly associated with a given persona, that is, if its BERTScore is higher, the generated text is more likely to embody the corresponding persona, that is, the corresponding style.
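As an illustration of the recall rate described above, the following sketch computes a simple unigram recall between a rewritten reply and the original utterance. It is a simplified stand-in, assuming whitespace tokenization and lowercasing; it is not the exact overlap metric used by the embodiment.

```python
from collections import Counter


def unigram_recall(candidate: str, reference: str) -> float:
    """Fraction of reference tokens that also appear in the candidate.

    Tokens are lowercased and split on whitespace (an assumption made for
    this sketch); repeated tokens are counted with clipping.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand_counts[token]) for token, count in ref_counts.items())
    return overlap / total
```

For example, with the candidate "i code" and the reference "i love code love", two of the four reference tokens are covered, so the function returns 0.5.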
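To make the association measure more concrete, the sketch below shows the greedy cosine matching that BERTScore-style metrics are built on. It is only an approximation under stated assumptions: the token vectors are supplied by the caller, whereas the real metric derives them from a contextual BERT encoder, and the function names are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def greedy_match_f1(cand_vecs: Sequence[Vector], ref_vecs: Sequence[Vector]) -> float:
    """Greedy-matching F1 between candidate (RR) and reference (chosen_persona) token vectors."""
    if not cand_vecs or not ref_vecs:
        return 0.0
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(max(cosine(c, r) for r in ref_vecs) for c in cand_vecs) / len(cand_vecs)
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(max(cosine(r, c) for c in cand_vecs) for r in ref_vecs) / len(ref_vecs)
    if precision + recall <= 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A higher value indicates that the tokens of the rewritten reply lie closer, in embedding space, to the tokens of the persona text.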

Optionally, in the case that the to-be-evaluated models are plural, the method further includes: determining average indexes of a plurality of models to be evaluated based on the recall rate and the test data; sequencing the multiple models to be evaluated based on the average index to obtain sequencing results of the multiple models to be evaluated; and determining a target evaluation model from the multiple models to be evaluated based on the sequencing result, wherein the target evaluation model is one of the multiple models to be evaluated, and the rewriting capability of the model to be evaluated meets a preset standard.

Specifically, the average index may be used to represent an index obtained by performing an average calculation on the recall and the test data.

The sorting result may be used to represent the result of ordering the multiple models to be evaluated by the magnitude of their average indexes.

The target evaluation model can be used for representing the to-be-evaluated model with the rewriting capability meeting the preset standard in the multiple to-be-evaluated models.

The preset standard may be used to represent a standard under which the rewriting capability reaches a predetermined degree. In general, the preset standard is not specifically limited, and a plurality of target evaluation models may be determined based on it.

In an alternative embodiment, when there are a plurality of models to be evaluated, the average index of the recall rate and the test data is determined for each model. The models to be evaluated are ranked based on the average indexes to obtain a sorting result, and are then screened against the preset standard to obtain the target evaluation model, so that the plurality of models to be evaluated can be screened accurately.
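The averaging, ranking and screening described above can be sketched as follows. The score layout (one recall value and one test-data value per model) and the numeric `threshold` standing in for the preset standard are assumptions made for illustration.

```python
from statistics import mean
from typing import Dict, List, Tuple


def rank_models(scores: Dict[str, Tuple[float, float]],
                threshold: float) -> Tuple[List[Tuple[str, float]], List[str]]:
    """Average (recall, test data) per model, rank, and keep models meeting the threshold.

    `threshold` is a hypothetical stand-in for the preset standard.
    """
    averaged = {name: mean(pair) for name, pair in scores.items()}
    # Sort from the highest average index to the lowest.
    ranking = sorted(averaged.items(), key=lambda item: item[1], reverse=True)
    targets = [name for name, avg in ranking if avg >= threshold]
    return ranking, targets
```

Because the threshold is applied after ranking, more than one model can qualify as a target evaluation model, in line with the preset standard not being specifically limited.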

Optionally, the method further comprises: acquiring a plurality of preset interaction data sets, wherein the preset interaction data sets are used for representing interaction information comprising at least one interaction target; a target interaction data set is selected from a plurality of preset interaction data sets.

Specifically, the preset interaction data sets may be used to represent a plurality of interaction data sets acquired in advance, including data sets containing at least one interaction target, data sets without an interaction target, and the like.

The target interaction data set may be used to represent a data set selected from a preset interaction data set and including at least one interaction target.

In an alternative embodiment, a plurality of preset interaction data sets are acquired in advance. To realize stylized rewriting of the interaction information, the interaction information of at least one interaction target in the plurality of preset interaction data sets is determined and used as the target interaction data.
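One way to picture this selection step is a filter that keeps only the dialogues in which some speaker has a persona. The record layout below (a `personas` mapping from speaker to persona sentences) is a hypothetical format chosen for this sketch, not one defined by the specification.

```python
from typing import Any, Dict, List


def has_interaction_target(dialogue: Dict[str, Any]) -> bool:
    """True if at least one speaker in the dialogue has a non-empty persona."""
    personas = dialogue.get("personas") or {}
    return any(sentences for sentences in personas.values())


def select_target_datasets(
        presets: Dict[str, List[Dict[str, Any]]]) -> Dict[str, List[Dict[str, Any]]]:
    """Keep, per preset data set, only dialogues containing an interaction target."""
    selected = {}
    for name, dialogues in presets.items():
        kept = [d for d in dialogues if has_interaction_target(d)]
        if kept:  # drop data sets that contain no interaction target at all
            selected[name] = kept
    return selected
```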

FIG. 3 is a flow chart of a stylized rewrite capability evaluation of a model according to one embodiment of the invention. As shown in fig. 3:

S301, preparing a dialogue data set containing personas;

A common data set containing conversations and personas is obtained.

S302, selecting an evaluation party;

The party with a persona is selected as the evaluation target.

S303, grouping the utterances in pairs according to S1 and S2;

S304, obtaining R;

For each group, the utterance of the non-evaluation target is retained, and the retained utterance and the generated task description are input into a general language model to obtain a reply R without persona.

S305, judging whether the utterance is irrelevant to the persona;

For each group, the retained utterance, the corresponding persona and the task description are input into a general language model, which judges whether the utterance is irrelevant to the persona, yielding either the chosen_persona corresponding to the utterance or no-related.

S306, obtaining (S1: XX, S2: YY, R, chosen_persona);

For each group, if the result of S305 is no-related, the group is discarded; otherwise the original data is changed from (S1: XX, S2: YY) to (S1: XX, S2: YY, R, chosen_persona).

S307, obtaining RR;

For each group, XX, R and chosen_persona are taken as the context condition, and the task description "rewrite R" is added; in actual operation, the name corresponding to R is also given. The context condition and the task description are combined and input into the model to be evaluated to obtain the model output RR.

S308, calculating the BLEU and ROUGE values;

For each group, BLEU, ROUGE and similar metrics are calculated between the obtained RR and YY to obtain the recall condition.

S309, calculating the BERTScore value;

BERTScore is calculated between RR and the group's chosen_persona to examine the association with the persona.

S310, calculating an average index.

The indexes of all groups are summed and averaged to obtain an evaluation value of the stylized rewriting capability of the model.
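Steps S303 to S310 can be tied together in a short sketch such as the one below. It rests on several assumptions that are not stated in the specification: the models and metrics are passed in as hypothetical callables, the prompt wording is invented for illustration, the utterance checked against the persona in S305 is taken to be YY, and the per-group index is taken to be the mean of the two metric values.

```python
from statistics import mean
from typing import Callable, List, Optional, Tuple

NO_RELATED = "no-related"


def evaluate_model(pairs: List[Tuple[str, str]],
                   persona: List[str],
                   general_llm: Callable[[str], str],
                   model_under_test: Callable[[str], str],
                   overlap_metric: Callable[[str, str], float],
                   association_metric: Callable[[str, str], float]) -> Optional[float]:
    """Return the averaged index over all kept groups, or None if every group is discarded."""
    group_scores = []
    for xx, yy in pairs:  # S303: one (S1: XX, S2: YY) group at a time
        # S304: reply R without persona, generated from the non-evaluation utterance
        r = general_llm(f"Reply without any particular style.\nMessage: {xx}").strip()
        # S305: decide which persona content the utterance reflects (YY assumed here)
        chosen = general_llm(
            f"Name the persona sentences the utterance reflects, or answer '{NO_RELATED}'.\n"
            f"Persona: {' | '.join(persona)}\nUtterance: {yy}").strip()
        if chosen == NO_RELATED:  # S306: discard groups unrelated to the persona
            continue
        # S307: context condition plus the rewrite task, sent to the model to be evaluated
        rr = model_under_test(
            f"Context: {xx}\nPersona of S2: {chosen}\nReply R by S2: {r}\n"
            "Rewrite R in the style of the persona.").strip()
        recall = overlap_metric(rr, yy)               # S308: overlap with the original YY
        association = association_metric(rr, chosen)  # S309: association with the persona
        group_scores.append(mean([recall, association]))
    return mean(group_scores) if group_scores else None  # S310: average over groups
```

Injecting the metrics as callables lets BLEU, ROUGE or BERTScore implementations be swapped in without changing the flow itself.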

From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus the necessary general hardware platform, but of course also by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present invention.

In this embodiment, a device for testing the stylized rewriting capability of a model is further provided. The device is used to implement the foregoing embodiments and preferred embodiments, and what has already been described is not repeated here. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the device described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

Fig. 4 is a block diagram of a stylized rewrite capability test apparatus for a model according to one embodiment of the present invention, in which a graphical user interface is provided by a terminal device, the content displayed by the graphical user interface including a touch area, as shown in fig. 4, the apparatus includes:

an obtaining module 402, configured to obtain a target interaction data set, where the target interaction data set includes interaction information between a plurality of interaction objects, and the plurality of interaction objects includes at least one interaction target that outputs stylized information;

the first input module 404 is configured to input interaction information of at least one interaction target included in the target interaction data set into the language model, to obtain stylized information in the interaction information and simulation reply information corresponding to the interaction information, where the interaction information is used to characterize information that the at least one interaction target interacts with other interaction objects in the plurality of interaction objects;

the second input module 406 is configured to input the interaction information, the stylized information, and the simulated reply information into at least one model to be evaluated, to obtain rewritten reply information of the simulated reply information, where the rewritten reply information is used to represent that the simulated reply information is stylized rewritten according to the stylized information to obtain the interaction information;

the test module 408 is configured to test at least one model to be evaluated according to the rewritten reply information and the interaction information, and obtain a test result of the at least one model to be evaluated, where the test result is used to represent the stylized rewriting capability of the at least one model to be evaluated.

Optionally, the first input module 404 includes: a determining module, configured to determine, from the interaction information, first interaction information corresponding to the at least one interaction target and second interaction information corresponding to the other interaction objects, wherein the first interaction information is used for representing information output by the at least one interaction target in the interaction process, and the second interaction information is used for representing information output by the other interaction objects in the interaction process; and an interaction information input module, configured to input the first interaction information and the second interaction information into the language model respectively, to obtain the stylized information of the first interaction information and the simulated reply information corresponding to the second interaction information.

Optionally, the interactive information input module includes: the stylized information obtaining module is used for inputting the first interaction information, the target style of the first interaction information and the first task description into a first language model in the language model to obtain the stylized information in the first interaction information, wherein the first task description is used for issuing a task for distinguishing the stylized information and the non-stylized information in the first interaction information to the first language model; the simulation reply information obtaining module is used for inputting the second interaction information and the second task description into a second language model in the language model to obtain simulation reply information corresponding to the second interaction information, wherein the second task description is used for sending a task for replying the second interaction information to the second language model, and the simulation reply information is used for representing information which simulates at least one interaction target to output in the interaction process.

Optionally, the second input module 406 includes: the context condition generation module is used for generating interaction context conditions based on the first interaction information, the stylized information and the simulated reply information in the interaction information; and the rewriting reply information obtaining module is used for inputting the interaction context condition and the third task description into at least one model to be evaluated to obtain the rewriting reply information of the simulation reply information, wherein the third task description is used for issuing a task for rewriting the style of the simulation reply information to the at least one model to be evaluated.

Optionally, the test module 408 includes: the recall rate determining module is used for determining the recall rate of at least one model to be evaluated based on the first interaction information in the rewritten reply information and the interaction information, wherein the recall rate is used for representing the proportion of the rewritten reply information and the same interaction information in the first interaction information; the test data determining module is used for determining test data of at least one model to be evaluated based on the similarity of the rewriting reply information and the stylized information, wherein the test data are used for representing data for testing the stylized rewriting capability; and the test result determining module is used for determining the test result of at least one model to be evaluated based on the recall rate and/or the test data.

Optionally, the apparatus further comprises: the average index determining module is used for determining average indexes of the plurality of models to be evaluated based on the recall rate and the test data; the model ordering module is used for ordering the plurality of models to be evaluated based on the average index to obtain ordering results of the plurality of models to be evaluated; the target evaluation model determining module is used for determining a target evaluation model from a plurality of to-be-evaluated models based on the sorting result, wherein the target evaluation model is a to-be-evaluated model with the rewriting capability meeting a preset standard in the plurality of to-be-evaluated models.

Optionally, the apparatus further comprises: an acquisition module, configured to acquire a plurality of preset interaction data sets, wherein the preset interaction data sets are used for representing interaction information comprising at least one interaction target; and a selection module, configured to select a target interaction data set from the plurality of preset interaction data sets.

It should be noted that each of the above modules may be implemented by software or hardware, and for the latter, it may be implemented by, but not limited to: the modules are all located in the same processor; alternatively, the above modules may be located in different processors in any combination.

From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a computer readable storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, including several instructions to cause a computing device (may be a personal computer, a server, a terminal device, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.

In an exemplary embodiment of the present application, a computer-readable storage medium stores thereon a program product capable of implementing the method described above in this embodiment. In some possible implementations, aspects of the disclosed embodiments may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the disclosure as described in the "exemplary methods" section of the disclosure, when the program product is run on the terminal device.

A program product for implementing the above-described method according to an embodiment of the present disclosure may employ a portable compact disc read-only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the embodiments of the present disclosure is not limited thereto, and in the embodiments of the present disclosure, the computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The program product described above can employ any combination of one or more computer-readable storage media. The computer-readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

It should be noted that the program code embodied on the computer readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Alternatively, in the present embodiment, the above-described computer-readable storage medium may include, but is not limited to: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other various media capable of storing a computer program.

An embodiment of the invention also provides an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.

In the electronic device of this embodiment, a technical solution of a stylized rewrite capability test of a model is provided. Acquiring a target interaction data set, wherein the target interaction data set comprises interaction information among a plurality of interaction objects, and the plurality of interaction objects comprise at least one interaction target outputting stylized information; inputting interaction information of at least one interaction target contained in the target interaction data set into a language model to obtain stylized information in the interaction information and simulation reply information corresponding to the interaction information, wherein the interaction information is used for representing information of interaction between the at least one interaction target and other interaction objects in a plurality of interaction objects; inputting the interaction information, the stylized information and the simulation reply information into at least one model to be evaluated to obtain the rewritten reply information of the simulation reply information, wherein the rewritten reply information is used for representing that the simulation reply information is stylized rewritten according to the stylized information to obtain the interaction information; and testing at least one model to be evaluated according to the rewriting reply information and the interaction information to obtain a test result of the at least one model to be evaluated, wherein the test result is used for representing the stylized rewriting capability of the at least one model to be evaluated. The method and the device achieve the technical effect of improving the stylized rewriting efficiency of the model to be evaluated on the interactive information, and further solve the technical problem of low stylized rewriting efficiency of the interactive information in the related technology.

Fig. 5 is a schematic diagram of an electronic device according to an embodiment of the application. As shown in fig. 5, the electronic device 500 is merely an example, and should not be construed as limiting the functionality and scope of use of the embodiments of the present application.

As shown in fig. 5, the electronic apparatus 500 is embodied in the form of a general purpose computing device. The components of the electronic device 500 may include, but are not limited to: the at least one processor 510, the at least one memory 520, a bus 530 connecting the various system components including the memory 520 and the processor 510, and a display 540.

Wherein the memory 520 stores program code that can be executed by the processor 510 such that the processor 510 performs the steps according to various exemplary embodiments of the present application described in the above method section of the embodiment of the present application.

The memory 520 may include readable media in the form of volatile memory units, such as Random Access Memory (RAM) 5201 and/or cache memory 5202, and may further include Read Only Memory (ROM) 5203, and may also include nonvolatile memory, such as one or more magnetic storage devices, flash memory, or other nonvolatile solid state memory.

In some examples, memory 520 may also include a program/utility 5204 having a set (at least one) of program modules 5205, such program modules 5205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Memory 520 may further include memory located remotely from processor 510, which may be connected to electronic device 500 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

Bus 530 may be one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, processor 510, or a local bus using any of a variety of bus architectures.

The display 540 may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the electronic device 500.

Optionally, the electronic apparatus 500 may also communicate with one or more external devices 1400 (e.g., keyboard, pointing device, bluetooth device, etc.), one or more devices that enable a user to interact with the electronic apparatus 500, and/or any device (e.g., router, modem, etc.) that enables the electronic apparatus 500 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 550. Also, the electronic device 500 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through a network adapter 560. As shown in fig. 5, network adapter 560 communicates with other modules of electronic device 500 over bus 530. It should be appreciated that although not shown in fig. 5, other hardware and/or software modules may be used in connection with electronic device 500, which may include, but are not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.

The electronic device 500 may further include: a keyboard, a cursor control device (e.g., a mouse), an input/output interface (I/O interface), a network interface, a power supply, and/or a camera.

It will be appreciated by those of ordinary skill in the art that the configuration shown in fig. 5 is merely illustrative and is not intended to limit the configuration of the electronic device described above. For example, electronic device 500 may also include more or fewer components than shown in FIG. 5, or have a different configuration than shown in FIG. 1. The memory 520 may be used to store a computer program and corresponding data, such as a computer program and corresponding data corresponding to a stylized rewrite capability test method of a model in an embodiment of the present application. The processor 510 executes a computer program stored in the memory 520 to perform various functional applications and data processing, that is, to implement the stylized rewrite capability test method of the model described above.

The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.

In the foregoing embodiments of the present application, the description of each embodiment has its own emphasis; for a portion that is not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.

In the several embodiments provided in the present application, it should be understood that the disclosed technology may be implemented in other manners. The above-described embodiments of the apparatus are merely exemplary, and the division of the units, for example, may be a logic function division, and may be implemented in another manner, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some interfaces, units or modules, or may be in electrical or other forms.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

If the integrated units are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied, in essence or in whole or in part, in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium capable of storing program code.

The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present invention, and such modifications and adaptations are also intended to fall within the scope of protection of the present invention.

Claims

1. A method for testing the stylized rewriting capability of a model, characterized by comprising the following steps:

acquiring a target interaction data set, wherein the target interaction data set comprises interaction information among a plurality of interaction objects, and the plurality of interaction objects comprise at least one interaction target outputting stylized information;

inputting interaction information of at least one interaction target contained in the target interaction data set into a language model to obtain stylized information in the interaction information and simulation reply information corresponding to the interaction information, wherein the interaction information is used for representing information of interaction between the at least one interaction target and other interaction objects in the plurality of interaction objects;

inputting the interaction information, the stylized information and the simulation reply information into at least one model to be evaluated to obtain rewritten reply information of the simulation reply information, wherein the rewritten reply information is used for representing that the simulation reply information is subjected to stylized rewriting according to the stylized information to obtain the interaction information;

and testing the at least one model to be evaluated according to the rewritten reply information and the interaction information to obtain a test result of the at least one model to be evaluated, wherein the test result is used for representing the stylized rewrite capability of the at least one model to be evaluated.

2. The method for testing the stylized rewrite capability of a model according to claim 1, wherein inputting the interaction information of at least one interaction target included in the target interaction data set into a language model to obtain stylized information in the interaction information and simulated reply information corresponding to the interaction information, includes:

determining first interaction information corresponding to the at least one interaction target and second interaction information corresponding to the other interaction objects from the interaction information, wherein the first interaction information is used for representing information output by the at least one interaction target in an interaction process, and the second interaction information is used for representing information output by the other interaction objects in the interaction process;

and respectively inputting the first interaction information and the second interaction information into the language model to obtain the stylized information of the first interaction information and the simulated reply information corresponding to the second interaction information.
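
As a non-limiting illustration of the separation recited in claim 2, the following Python sketch splits a dialogue into first and second interaction information by speaker. The `Turn` type, the `split_interaction` function, and the speaker names are hypothetical; they are introduced only for illustration and are not taken from the application.

```python
from typing import List, NamedTuple, Tuple


class Turn(NamedTuple):
    """One utterance in a dialogue (hypothetical structure)."""
    speaker: str
    text: str


def split_interaction(turns: List[Turn], target: str) -> Tuple[List[str], List[str]]:
    """Separate what the interaction target said from what the other objects said."""
    # First interaction information: output by the interaction target.
    first_info = [t.text for t in turns if t.speaker == target]
    # Second interaction information: output by all other interaction objects.
    second_info = [t.text for t in turns if t.speaker != target]
    return first_info, second_info


dialogue = [
    Turn("player", "Where is the blacksmith?"),
    Turn("npc", "Aye, ye'll find him by the old mill."),
    Turn("player", "Thanks."),
]
first, second = split_interaction(dialogue, target="npc")
```

Here `first` holds the stylized character's lines and `second` holds the lines the character is replying to.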

3. The method for testing the stylized rewrite capability of a model according to claim 2, wherein respectively inputting the first interaction information and the second interaction information into the language model to obtain the stylized information of the first interaction information and the simulated reply information corresponding to the second interaction information includes:

inputting the first interaction information, the target style of the first interaction information and a first task description into a first language model in the language models to obtain the stylized information in the first interaction information, wherein the first task description is used for issuing a task for distinguishing the stylized information from the non-stylized information in the first interaction information to the first language model;

and inputting the second interaction information and a second task description into a second language model in the language model to obtain the simulated reply information corresponding to the second interaction information, wherein the second task description is used for issuing, to the second language model, a task of replying to the second interaction information, and the simulated reply information is used for representing information that simulates the output of the at least one interaction target in the interaction process.
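
As a non-limiting illustration of the two task descriptions in claim 3, the following Python sketch builds one prompt for style extraction and one for reply simulation. The prompt wording, the function names, and the `LanguageModel` callable type (prompt in, text out) are assumptions made for illustration; the application does not specify them.

```python
from typing import Callable, Tuple

# Hypothetical interface: any callable that maps a prompt string to generated text.
LanguageModel = Callable[[str], str]


def build_style_prompt(first_info: str, target_style: str) -> str:
    """First task description: separate stylized from non-stylized content."""
    return (
        f"Target style: {target_style}\n"
        f"Utterance: {first_info}\n"
        "Task: list the parts of the utterance that express the target style."
    )


def build_reply_prompt(second_info: str) -> str:
    """Second task description: reply to the other party's message."""
    return f"Message: {second_info}\nTask: write a plain reply to this message."


def annotate(
    first_info: str,
    second_info: str,
    target_style: str,
    style_model: LanguageModel,
    reply_model: LanguageModel,
) -> Tuple[str, str]:
    """Return (stylized information, simulated reply information)."""
    stylized_info = style_model(build_style_prompt(first_info, target_style))
    simulated_reply = reply_model(build_reply_prompt(second_info))
    return stylized_info, simulated_reply
```

The two models are passed in separately so that the same function covers both the case of two distinct language models and the case of one model used for both tasks.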

4. The method for testing the stylized rewrite capability of a model according to claim 1, wherein inputting the interaction information, the stylized information, and the simulated reply information into at least one model to be evaluated to obtain the rewrite reply information of the simulated reply information includes:

generating an interaction context condition based on the first interaction information, the stylized information and the simulated reply information in the interaction information;

and inputting the interaction context conditions and a third task description into at least one model to be evaluated to obtain the rewritten reply information of the simulated reply information, wherein the third task description is used for issuing a task for rewriting the style of the simulated reply information to the at least one model to be evaluated.
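
As a non-limiting illustration of claim 4, the following Python sketch assembles an interaction context condition and sends it, together with a third task description, to each model to be evaluated. The field labels, function names, and the prompt-in/text-out model interface are hypothetical and are not specified by the application.

```python
from typing import Callable, Dict


def build_context_condition(first_info: str, stylized_info: str, simulated_reply: str) -> str:
    """Combine the three inputs of claim 4 into one context string."""
    return (
        f"Reference utterance: {first_info}\n"
        f"Stylized elements: {stylized_info}\n"
        f"Reply to rewrite: {simulated_reply}"
    )


def rewrite_with_models(
    context: str,
    task_description: str,
    models: Dict[str, Callable[[str], str]],
) -> Dict[str, str]:
    """Ask every model to be evaluated for a rewritten reply; keyed by model name."""
    prompt = f"{context}\nTask: {task_description}"
    return {name: model(prompt) for name, model in models.items()}
```

Every model receives an identical prompt, so differences in the rewritten replies can be attributed to the models rather than to the input.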

5. The method for testing the stylized rewrite capability of a model according to claim 1, wherein the testing the at least one model to be evaluated according to the rewrite reply information and the interaction information to obtain a test result of the at least one model to be evaluated includes:

determining a recall rate of the at least one model to be evaluated based on the rewritten reply information and first interaction information in the interaction information, wherein the recall rate is used for representing the proportion of information that is the same in the rewritten reply information and the first interaction information;

determining test data of the at least one model to be evaluated based on the similarity between the rewritten reply information and the stylized information, wherein the test data is used for representing data for testing the stylized rewriting capability;

and determining the test result of the at least one model to be evaluated based on the recall rate and/or the test data.
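
As a non-limiting illustration of the two measurements in claim 5, the following Python sketch computes a token-overlap recall and a string similarity. The claim does not fix either formula: the overlap-count recall, the use of `difflib.SequenceMatcher` as the similarity measure, and the function names are all assumptions chosen for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Callable, List


def token_recall(
    rewritten: str,
    reference: str,
    tokenize: Callable[[str], List[str]] = str.split,
) -> float:
    """Share of reference tokens (first interaction information) found in the rewrite."""
    ref_tokens = tokenize(reference)
    if not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it occurs in both.
    overlap = Counter(tokenize(rewritten)) & Counter(ref_tokens)
    return sum(overlap.values()) / len(ref_tokens)


def style_similarity(rewritten: str, stylized_info: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for the unspecified measure."""
    return SequenceMatcher(None, rewritten, stylized_info).ratio()
```

The default tokenizer splits on whitespace. For text without spaces between words, such as Chinese, passing `tokenize=list` gives a character-level recall instead.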

6. The method for testing the stylized rewrite capability of a model according to claim 5, wherein, in a case where there are a plurality of models to be evaluated, the method further comprises:

determining average indexes of a plurality of models to be evaluated based on the recall rate and the test data;

sequencing the plurality of models to be evaluated based on the average index to obtain sequencing results of the plurality of models to be evaluated;

and determining a target evaluation model from the plurality of models to be evaluated based on the sorting result, wherein the target evaluation model is a model, among the plurality of models to be evaluated, whose rewriting capability meets a preset standard.
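
As a non-limiting illustration of claim 6, the following Python sketch averages the recall rate and the test data for each model, sorts the models by that average, and selects the top model if it meets a threshold. The unweighted mean, the `threshold` parameter standing in for the preset standard, and the function name are assumptions; the application does not specify them.

```python
from typing import Dict, List, Optional, Tuple


def rank_models(
    scores: Dict[str, Tuple[float, float]],
    threshold: float,
) -> Tuple[List[str], Optional[str]]:
    """Rank models by the mean of (recall, similarity); pick the best if it passes."""
    # Average index per model: unweighted mean of the two metrics (an assumption).
    averaged = {name: (recall + sim) / 2 for name, (recall, sim) in scores.items()}
    # Sorting result: highest average index first.
    ranking = sorted(averaged, key=averaged.get, reverse=True)
    # Target evaluation model: top-ranked model, if it meets the preset standard.
    target = ranking[0] if ranking and averaged[ranking[0]] >= threshold else None
    return ranking, target


ranking, target = rank_models(
    {"model_a": (0.5, 0.7), "model_b": (0.9, 0.8), "model_c": (0.2, 0.2)},
    threshold=0.8,
)
```

With these made-up scores, `model_b` has the highest average and exceeds the threshold, so it is returned as the target evaluation model.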

7. The method for testing stylized rewrite ability of a model according to claim 1, further comprising:

acquiring a plurality of preset interaction data sets, wherein the preset interaction data sets are used for representing interaction information comprising the at least one interaction target;

and selecting the target interaction data set from the plurality of preset interaction data sets.

8. A stylized rewrite capability test apparatus for a model, comprising:

an acquisition module, configured to acquire a target interaction data set, wherein the target interaction data set comprises interaction information among a plurality of interaction objects, and the plurality of interaction objects comprise at least one interaction target outputting stylized information;

a first input module, configured to input interaction information of at least one interaction target contained in the target interaction data set into a language model to obtain stylized information in the interaction information and simulated reply information corresponding to the interaction information, wherein the interaction information is used for representing information of interaction between the at least one interaction target and other interaction objects in the plurality of interaction objects;

a second input module, configured to input the interaction information, the stylized information and the simulated reply information into at least one model to be evaluated to obtain rewritten reply information of the simulated reply information, wherein the rewritten reply information is used for representing information obtained by stylized rewriting of the simulated reply information according to the stylized information;

and a test module, configured to test the at least one model to be evaluated according to the rewritten reply information and the interaction information to obtain a test result of the at least one model to be evaluated, wherein the test result is used for representing the stylized rewrite capability of the at least one model to be evaluated.

9. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program, wherein the computer program is arranged to, when run by a processor, perform the stylized rewrite capability test method of the model according to any one of claims 1 to 7.

10. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to run the computer program to perform the stylized rewrite capability test method of the model according to any one of claims 1 to 7.