CN117422067A - Information processing method, information processing device, electronic equipment and storage medium - Google Patents


Publication number
CN117422067A
Authority
CN
China
Prior art keywords: evaluation, prompt, dimension, target, word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311309876.XA
Other languages
Chinese (zh)
Inventor
岳双燕
樊中恺
周廷帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202311309876.XA
Publication of CN117422067A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/20 - Natural language analysis
    • G06F 40/279 - Recognition of textual entities
    • G06F 40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval of unstructured textual data
    • G06F 16/33 - Querying
    • G06F 16/332 - Query formulation
    • G06F 16/3329 - Natural language query formulation or dialogue systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval of unstructured textual data
    • G06F 16/35 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/20 - Natural language analysis
    • G06F 40/205 - Parsing
    • G06F 40/216 - Parsing using statistical methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The disclosure provides an information processing method and apparatus, an electronic device, and a storage medium, relating to the technical field of artificial intelligence and in particular to deep learning. The specific implementation scheme is as follows: acquire a prompt word to be evaluated; generate a preset example set from the prompt word; determine evaluation dimensions from the prompt word and the preset example set, and generate evaluation steps for those dimensions; and evaluate the prompt word based on the preset example set, the evaluation dimensions, and the evaluation steps to obtain a target evaluation value of the prompt word. Generating the preset example set from the prompt word and determining the evaluation dimensions and steps automatically enables automated evaluation, saving substantial time and labor cost; customizing the evaluation dimensions makes the evaluation direction controllable, allowing a flexible evaluation mode and enlarging the evaluation scope.

Description

Information processing method, information processing device, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular deep learning, and specifically to an information processing method and apparatus, an electronic device, and a storage medium.
Background
Question-and-answer results generated by large language models suffer from accuracy and consistency problems, so the models need to be evaluated. In the prior art, however, large language models are evaluated manually, which requires substantial human resources and time; moreover, only annotated data sets can be evaluated, so diverse evaluation tasks cannot be served.
Disclosure of Invention
The present disclosure provides a method, an apparatus, an electronic device, and a storage medium for information processing.
According to an aspect of the present disclosure, there is provided an information processing method including: acquiring a prompt word to be evaluated; generating a preset example set according to the prompt word; determining an evaluation dimension according to the prompt word and the preset example set, and generating an evaluation step for the evaluation dimension; and based on the preset example set, the evaluation dimension and the evaluation step, evaluating the prompt word to obtain a target evaluation value of the prompt word.
According to another aspect of the present disclosure, there is provided an information processing apparatus including: the acquisition module is used for acquiring prompt words to be evaluated; the first generation module is used for generating a preset example set according to the prompt words; the second generation module is used for determining an evaluation dimension according to the prompt words and the preset example set and generating an evaluation step for the evaluation dimension; and the evaluation module is used for evaluating the prompt word based on the preset example set, the evaluation dimension and the evaluation step to obtain a target evaluation value of the prompt word.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the information processing method according to the embodiment of the above aspect.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium having stored thereon computer instructions for causing the computer to execute the information processing method according to the embodiment of the above aspect.
According to another aspect of the present disclosure, there is provided a computer program product including a computer program/instruction which, when executed by a processor, implements the information processing method according to the embodiment of the above aspect.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is a schematic flow chart of an information processing method according to an embodiment of the disclosure;
FIG. 2 is a flowchart of another information processing method according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of another information processing method according to an embodiment of the present disclosure;
FIG. 4 is a flowchart of another information processing method according to an embodiment of the present disclosure;
FIG. 5 is a flowchart of another information processing method according to an embodiment of the present disclosure;
FIG. 6 is a schematic flow chart of evaluating a model provided by an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of an information processing apparatus according to an embodiment of the present disclosure;
fig. 8 is a block diagram of an electronic device for implementing an information processing method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, including various details of the embodiments to facilitate understanding, and these should be considered merely exemplary. Those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
Information processing methods, apparatuses, and electronic devices according to embodiments of the present disclosure are described below with reference to the accompanying drawings.
Artificial intelligence (AI) is the discipline of making computers simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and comprises technologies at both the hardware and software levels. Artificial intelligence software technologies generally include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technologies.
Natural language processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods for enabling effective communication between humans and computers in natural language, integrating linguistics, computer science, and mathematics. NLP is mainly applied to machine translation, public opinion monitoring, automatic summarization, viewpoint extraction, text classification, question answering, text semantic comparison, speech recognition, and related tasks.
Deep learning (DL) is a research direction within machine learning (ML) that moves machine learning closer to its original goal, artificial intelligence. Deep learning learns the inherent regularities and representation hierarchies of sample data, and the information obtained in this learning helps interpret data such as text, images, and sound. Its ultimate goal is to give machines human-like analytical learning ability to recognize text, image, and sound data. As a complex machine learning approach, deep learning achieves results in speech and image recognition that far exceed earlier techniques.
Smart search is a new generation of search engine that incorporates artificial intelligence technology. In addition to traditional functions such as fast search and relevance ranking, it can provide user role registration, automatic identification of user interests, semantic understanding of content, intelligent information filtering, and push functions.
Machine translation, also known as automatic translation, is the process of using a computer to convert one natural language (the source language) into another (the target language). It is a branch of computational linguistics and one of the goals of artificial intelligence.
Fig. 1 is a flow chart of an information processing method according to an embodiment of the disclosure.
As shown in fig. 1, the information processing method may include:
s101, acquiring prompt words to be evaluated.
It should be noted that, in the embodiment of the present disclosure, the execution body of the information processing method may be a hardware device having data information processing capability and/or software necessary for driving the hardware device to operate. Alternatively, the execution body may include a server, a computer, a user terminal, and other intelligent devices. Optionally, the user terminal includes, but is not limited to, a mobile phone, a computer, an intelligent voice interaction device, etc. Alternatively, the server includes, but is not limited to, a web server, an application server, a server of a distributed system, a server incorporating a blockchain, etc.
In some implementations, the prompt word entered by the user may be used as the prompt word to be evaluated. Optionally, a prompt word directly input by the user is received and taken as the prompt word to be evaluated; alternatively, voice input from the user is converted into text, and that text is taken as the prompt word to be evaluated. For example, if the user enters "artificial intelligence," that is the prompt word to be evaluated.
In some implementations, the prompt word takes the form of natural-language text, where the natural language includes but is not limited to Chinese, English, Japanese, German, and so on; the embodiments of the present disclosure do not limit the language type.
S102, generating a preset example set according to the prompt words.
In some implementations, the set of preset examples includes one or more questions and answers that match the questions, that is, one question and answer that match it may be taken as one preset example. Alternatively, one or more questions may be generated from the prompt word and answers to the questions may be determined. Taking an answer of a question and the matching of the question as a preset example, and combining all preset examples into a preset example set.
Alternatively, one or more questions may be generated from the prompt word based on the large language model, and an answer matching each question may be automatically determined. And further takes a question and an answer matched with the question as a preset example.
For example, assuming that the prompt word is "a", the large language model may generate a question 1, a question 2, and a question 3 around the prompt word, and automatically determine an answer 1, an answer 2, and an answer 3 that match the question, and take the question 1 and the answer 1 as one preset example 1, and the preset example set includes the preset example 1, the preset example 2, and the preset example 3.
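The example-set generation of S102 can be sketched as follows. This is a minimal illustration, not the patent's implementation: `llm` stands for a hypothetical `llm(text) -> str` callable wrapping the large language model, and the prompt wording passed to it is an assumption.

```python
def build_example_set(prompt_word, llm, n_examples=3):
    """Generate `n_examples` (question, answer) pairs around a prompt word."""
    examples = []
    for i in range(n_examples):
        # Ask the model for a question centered on the prompt word.
        question = llm(f"Generate question {i + 1} about: {prompt_word}")
        # Ask the model for an answer matching that question.
        answer = llm(f"Given the prompt '{prompt_word}', answer: {question}")
        # One question plus its matching answer forms one preset example.
        examples.append({"question": question, "answer": answer})
    return examples
```

Each returned dict is one preset example; the list as a whole is the preset example set.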
And S103, determining an evaluation dimension according to the prompt word and the preset example set, and generating an evaluation step for the evaluation dimension.
In some implementations, the evaluation dimensions include a first evaluation dimension and a second evaluation dimension. The first evaluation dimension is a general evaluation dimension for the prompt word, and the second evaluation dimension is an expanded, prompt-specific evaluation dimension.
Optionally, prompt words may be classified by type, and a general evaluation dimension manually formulated for each type to determine the first evaluation dimension; different types of prompt words correspond to different first evaluation dimensions. Alternatively, the generation capability of the large language model may be used to automatically produce targeted, innovative evaluation dimensions capable of evaluating the preset example set, which serve as the second evaluation dimension.
Further, a corresponding evaluation step may be generated for each evaluation dimension by the large language model, and a chain of thought (CoT) over the evaluation steps may be generated according to the relationships between them; performing the evaluation in chain-of-thought form lets the evaluation result approximate human thinking.
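The chain-of-thought step generation can be sketched as below. This is an illustrative assumption, not the patent's exact mechanism: `llm` is a hypothetical model callable, and the chaining is expressed by feeding each generated step into the prompt for the next one, so consecutive steps stay logically linked.

```python
def build_step_chain(dimension, llm, n_steps=3):
    """Generate an ordered evaluation-step chain for one evaluation dimension."""
    steps = []
    previous = "none"
    for k in range(1, n_steps + 1):
        step = llm(
            f"Evaluation dimension: {dimension}. "
            f"Previous step: {previous}. Write step {k}."
        )
        steps.append(step)
        previous = step  # thread the chain: the next step conditions on this one
    return steps
```

The returned list order encodes the logical dependency between steps.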
And S104, based on a preset example set, an evaluation dimension and an evaluation step, evaluating the prompt word to obtain a target evaluation value of the prompt word.
In some implementations, the prompt word can be evaluated based on a target prompt template. Optionally, the target prompt template is obtained by assembling the prompt word, the preset example set, the evaluation dimension, and the evaluation step; evaluating this template yields the target evaluation value of the prompt word.
In some implementations, the target evaluation value of the prompt word may be determined based on the probability distribution of the target prompt template over the respective scores. Optionally, the probability distribution of each evaluation dimension over each score is obtained, and the target evaluation value of the prompt word is calculated from those probability distributions.
For example, with a scoring range of 1-5 points, the target evaluation value of the prompt word may be calculated from the probabilities of scoring 1, 2, 3, 4, and 5 points in each evaluation dimension.
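Collapsing a probability distribution over the 1-5 score range into one value can be done with a probability-weighted sum, as sketched below. The patent does not give the formula; the expected-value form here is an assumption consistent with the description.

```python
def expected_score(prob_dist):
    """Collapse a probability distribution over the discrete scores 1..N
    into a single evaluation value via a probability-weighted sum."""
    scores = range(1, len(prob_dist) + 1)
    total = sum(prob_dist)  # normalise in case probabilities don't sum to 1
    return sum(s * p for s, p in zip(scores, prob_dist)) / total
```

Because the result is a continuous value rather than a single sampled score, it can reflect subtle differences between answers.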
According to the information processing method provided by this embodiment of the disclosure, the prompt word to be evaluated is obtained, a preset example set is generated from it, and the evaluation dimensions and evaluation steps of the prompt word are then determined. Evaluating the prompt word against the preset example set, evaluation dimensions, and evaluation steps yields the target evaluation value, enabling automated evaluation and saving substantial time and labor cost. Customizing the evaluation dimensions makes the evaluation direction controllable, allowing a flexible evaluation mode and a broader evaluation scope. Because the evaluation steps take the form of a chain of thought, they can simulate the human thinking process during evaluation, making the result closer to manual evaluation. Computing the score from a probability distribution avoids low score variance and weak correlation with human judgment.
Fig. 2 is a flow chart of an information processing method according to an embodiment of the disclosure.
As shown in fig. 2, the information processing method may include:
s201, acquiring prompt words to be evaluated.
The relevant content of step S201 may be referred to the above embodiments, and will not be described herein.
S202, generating one or more problems related to the prompt word according to the prompt word by the large language model.
In some implementations, the large language model may first determine the context related to the prompt word, in order to understand the prompt word's meaning and its relationship to other words and sentences; the model then generates a plurality of candidate questions based on the prompt word and that context.
Optionally, the large language model may filter the generated candidate questions, selecting those most relevant to the prompt word as the one or more questions related to it.
For example, if the prompt word is "artificial intelligence," the questions generated around it might include "What are the development prospects of artificial intelligence?", "How should artificial intelligence be evaluated?", "What is the core of artificial intelligence development?", and the like.
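The relevance filtering of candidate questions can be sketched as follows. The patent does not specify the relevance measure, so `relevance` here is any scoring callable (for example, embedding similarity); the function name and word-overlap example are illustrative assumptions.

```python
def top_relevant_questions(prompt_word, candidates, relevance, k=3):
    """Rank model-generated candidate questions by a relevance score
    against the prompt word and keep the top k."""
    ranked = sorted(
        candidates,
        key=lambda q: relevance(prompt_word, q),
        reverse=True,  # highest relevance first
    )
    return ranked[:k]
```

A simple stand-in relevance function is shared-word count between the prompt word and each candidate.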
S203, determining an answer matched with the question according to the prompt word and the question.
S204, taking the questions and answers matched with the questions as a preset example to obtain a preset example set.
In some implementations, answers to questions can be automatically generated by a large language model based on the prompt words and the questions. Alternatively, the large language model may generate, for each question, an answer matching the question by analyzing and understanding the prompt word and the question.
In some implementations, by taking a question and an answer matching the question as a pre-set example, a basis may be provided for a subsequent evaluation process. Combining multiple preset examples into a preset example set allows for more comprehensive evaluation.
For example, if the answer matched with the question a is the answer a, the question a+the answer a is a preset example a, and if the answer matched with the question B is the answer B, the question b+the answer B is a preset example B. The preset examples a and B constitute a preset example set.
S205, determining an evaluation dimension according to the prompt word and the preset example set, and generating an evaluation step for the evaluation dimension.
S206, based on the preset example set, the evaluation dimension and the evaluation step, evaluating the prompt word to obtain a target evaluation value of the prompt word.
The relevant content of steps S205-S206 can be seen in the above embodiments, and will not be described here again.
According to the information processing method provided by this embodiment of the disclosure, the prompt word to be evaluated is obtained, one or more questions related to it are determined, a preset example set is generated from the questions and their matching answers, and the evaluation dimensions and evaluation steps of the prompt word are then determined. Evaluating the prompt word against the preset example set, evaluation dimensions, and evaluation steps yields the target evaluation value, enabling automated evaluation and saving substantial time and labor cost. Customizing the evaluation dimensions makes the evaluation direction controllable, allowing a flexible evaluation mode and a broader evaluation scope. Because the evaluation steps take the form of a chain of thought, they can simulate the human thinking process during evaluation, making the result closer to manual evaluation. Computing the score from a probability distribution avoids low score variance and weak correlation with human judgment.
Fig. 3 is a flow chart of an information processing method according to an embodiment of the disclosure.
As shown in fig. 3, the information processing method may include:
s301, acquiring prompt words to be evaluated.
S302, generating a preset example set according to the prompt words.
The relevant content of steps S301 to S302 can be seen in the above embodiments, and will not be described here again.
S303, classifying and identifying the prompt words, and determining the target type to which the prompt words belong.
In some implementations, feature information of the prompt word may be determined by feature extraction, and the target type of the prompt word determined from that feature information. Optionally, keywords or key phrases in the prompt word serve as its feature information: the keywords are identified, and the target type is determined from them.
As an illustration, let the prompt word be "please play a knowledgeable friend"; if the keyword "play" is identified, the target type of the prompt word is determined to be the role class. Let the prompt word be "please have an optimistic dialogue with me"; if the keyword "optimistic" is identified, the target type is determined to be the emotion class.
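The keyword-based classification of S303 can be sketched as follows. The keyword table here is an illustrative stand-in (the patent does not list the actual keywords), and the `"general"` fallback for unmatched prompts is an assumption.

```python
def classify_prompt(prompt_word):
    """Determine the target type of a prompt word from keywords it contains."""
    KEYWORD_TYPES = {  # hypothetical keyword -> type table
        "play": "role",
        "act as": "role",
        "optimistic": "emotion",
        "empath": "emotion",
    }
    lowered = prompt_word.lower()
    for keyword, prompt_type in KEYWORD_TYPES.items():
        if keyword in lowered:
            return prompt_type
    return "general"  # fall back when no keyword matches
```

A production classifier would more likely use a trained model or embedding similarity; substring matching is only the simplest realization of the keyword idea.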
S304, determining a general first evaluation dimension corresponding to the prompt word based on the target type.
In some implementations, for each type of prompt word, a general evaluation dimension may be manually formulated as the first evaluation dimension. That is, prompt words of the same type share the same first evaluation dimensions, and prompt words of different types have different first evaluation dimensions.
For example, if the target type of the prompt word is the role class, the first evaluation dimension includes: the degree of role awareness (e.g., whether the set role is actually played), whether the prompt word is leaked, the degree of intelligibility, and so on. If the target type is the emotion class, the first evaluation dimension includes: emotional-intelligence performance, empathy ability, whether sympathy is shown, and so on.
S305, generating exclusive second evaluation dimensions corresponding to the prompt words by the large language model according to the preset example set.
In some implementations, the large language model may generate targeted, innovative second evaluation dimensions for the prompt word based on the questions and matching answers in the preset example set. Each prompt word's second evaluation dimensions are exclusive to it; that is, different prompt words have different second evaluation dimensions.
As an illustration, let the prompt word be "please play a close friend"; from the preset example set corresponding to this prompt word, the generated second evaluation dimensions include: awareness, humor level, feasibility of answers, friendliness, and the like. Let the prompt word be "play a sharp-tongued girlfriend"; from its preset example set, the generated second evaluation dimensions include: irony, practicality, degree of offensiveness, and the like.
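Assembling the final dimension list from the generic first dimensions (S304) and the model-generated second dimensions (S305) can be sketched as below. The lookup table contents, the `llm` callable, and the comma-separated reply format are all illustrative assumptions.

```python
def build_dimensions(prompt_type, example_set, llm):
    """Combine type-generic first dimensions with prompt-specific second
    dimensions generated by the model from the preset example set."""
    FIRST_DIMENSIONS = {  # manually curated per prompt type (hypothetical)
        "role": ["role awareness", "prompt-word leakage", "intelligibility"],
        "emotion": ["empathy", "emotional intelligence"],
    }
    first = FIRST_DIMENSIONS.get(prompt_type, [])
    raw = llm("Propose evaluation dimensions for these examples: "
              + "; ".join(e["question"] for e in example_set))
    second = [d.strip() for d in raw.split(",") if d.strip()]
    # Deduplicate while keeping order: generic dims first, specific ones after.
    seen, dims = set(), []
    for d in first + second:
        if d not in seen:
            seen.add(d)
            dims.append(d)
    return dims
```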
S306, generating an evaluation step for the evaluation dimension.
In some implementations, the first and second evaluation dimensions together constitute the evaluation dimensions of the prompt word, and the evaluation steps for each dimension may be generated by the large language model from the prompt word: combining the prompt word, the model generates corresponding evaluation steps for each evaluation dimension. The evaluation steps carry a chain of thought (CoT) representing the logical relationship between them; adopting the chain-of-thought form simulates the human thinking process during evaluation, making the result closer to manual evaluation.
Meanwhile, the large language model can also generate corresponding scoring criteria for each evaluation dimension. For example, with a scoring range of 1-5 points, the criterion specifies which score an answer should receive in a given evaluation dimension; for instance, answer A should receive 3 points on a particular dimension.
S307, based on the preset example set, the evaluation dimension and the evaluation step, the prompt word is evaluated to obtain a target evaluation value of the prompt word.
The relevant content of step S307 can be seen in the above embodiment, and will not be described here again.
According to the information processing method provided by this embodiment of the disclosure, the prompt word to be evaluated is obtained and a preset example set is generated from it. The first and second evaluation dimensions are determined from the prompt word and the preset example set, and evaluation steps are generated for those dimensions. Evaluating the prompt word against the preset example set, evaluation dimensions, and evaluation steps yields the target evaluation value, enabling automated evaluation and saving substantial time and labor cost. Customizing the evaluation dimensions makes the evaluation direction controllable, allowing a flexible evaluation mode and a broader evaluation scope. Because the evaluation steps take the form of a chain of thought, they can simulate the human thinking process during evaluation, making the result closer to manual evaluation. Computing the score from a probability distribution avoids low score variance and weak correlation with human judgment.
Fig. 4 is a flowchart of an information processing method according to an embodiment of the present disclosure.
As shown in fig. 4, the information processing method may include:
s401, acquiring prompt words to be evaluated.
S402, generating a preset example set according to the prompt words.
S403, determining an evaluation dimension according to the prompt word and the preset example set, and generating an evaluation step for the evaluation dimension.
The relevant content of steps S401 to S403 can be seen in the above embodiments, and will not be described here again.
S404, carrying out templated combination on the prompt words, the preset example set, the evaluation dimension and the evaluation step to obtain a target prompt template.
In some implementations, to improve evaluation efficiency and keep the evaluation format consistent, the prompt word, the preset example set, the evaluation dimension, and the evaluation step may be combined into a template based on a preset template, obtaining the target prompt template.
In some implementations, model configuration information is also obtained, and the prompt word, the preset example set, the evaluation dimension, the evaluation step, and the model configuration information are combined into the template. Optionally, acquiring the model configuration information determines the parameters of the large language model, and the templated combination is performed based on those parameters. For example, a model parameter may be set to 1 to increase the determinism of the target prompt template.
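The templated combination of S404 can be sketched as follows. The patent does not disclose the template layout, so the field names, section order, and the `temperature` configuration key below are all assumptions for illustration.

```python
def build_target_template(prompt_word, example_set, dimensions, steps,
                          temperature=1.0):
    """Pack the prompt word, preset examples, dimensions, evaluation steps,
    and model configuration into one evaluation prompt template."""
    example_text = "\n".join(
        f"Q: {e['question']}\nA: {e['answer']}" for e in example_set
    )
    step_text = "\n".join(
        f"[{dim}] " + " -> ".join(dim_steps)  # ordered chain per dimension
        for dim, dim_steps in steps.items()
    )
    template = (
        f"Prompt word: {prompt_word}\n"
        f"Examples:\n{example_text}\n"
        f"Dimensions: {', '.join(dimensions)}\n"
        f"Steps:\n{step_text}\n"
        f"For each example and dimension, output a probability distribution "
        f"over the scores 1-5."
    )
    config = {"temperature": temperature}  # model configuration information
    return template, config
```

Keeping the template assembly in one place is what gives the consistent evaluation format the text mentions.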
S405, evaluating the prompt words by the large language model according to the target prompt template to obtain target evaluation values of the prompt words.
In some implementations, the target prompt template is input into a large language model, and the large language model outputs, for each preset example, the probability distribution of each evaluation dimension over the respective scores. For example, for preset example A in evaluation dimension 1, the probability of a score of 1 is 10%, of a score of 2 is 50%, of a score of 3 is 70%, of a score of 4 is 95%, and of a score of 5 is 40%.
Further, the target evaluation value of the prompt word may be obtained based on the probability distributions in each evaluation dimension under the preset examples. Optionally, the target evaluation value of the prompt word can be obtained by weighting the probability distributions, so that low correlation between the target evaluation value and human judgment can be avoided and subtle score differences can be reflected.
In some implementations, for each evaluation dimension, a first evaluation value of the evaluation dimension may be determined based on the probability distribution over that evaluation dimension. Optionally, the first evaluation value of each evaluation dimension may be obtained by calculating a weighted sum of the probability distribution over that evaluation dimension.
Further, the target evaluation value of the prompt word is obtained based on the first evaluation values in each evaluation dimension under the preset examples. Optionally, for each preset example, the first evaluation values in each evaluation dimension may be weighted to obtain a second evaluation value of that preset example, and the second evaluation values of the preset examples may then be weighted to obtain the target evaluation value of the prompt word. The first weight values used to weight the first evaluation values may be the same or different, and likewise the second weight values used to weight the second evaluation values may be the same or different. The first evaluation value, the second evaluation value and the target evaluation value are all specific scores.
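The weighting scheme in the two paragraphs above can be sketched as follows. This is a minimal illustration under assumed numbers: the probability distributions, weights and function names are hypothetical, and the distributions are chosen to be normalized (unlike the raw percentages quoted earlier in the text).

```python
def expected_score(distribution):
    """First evaluation value: weighted sum of the probability
    distribution over the scores of one evaluation dimension."""
    return sum(score * p for score, p in distribution.items())

def weighted_mean(values, weights=None):
    """Weighted combination of evaluation values; the weights may be
    the same (the default, equal weights) or different."""
    if weights is None:
        weights = [1.0 / len(values)] * len(values)
    return sum(v * w for v, w in zip(values, weights))

# One preset example with two evaluation dimensions (assumed numbers).
distributions = [
    {1: 0.05, 2: 0.10, 3: 0.20, 4: 0.45, 5: 0.20},  # dimension 1
    {1: 0.00, 2: 0.05, 3: 0.25, 4: 0.50, 5: 0.20},  # dimension 2
]
first_values = [expected_score(d) for d in distributions]   # per dimension
second_value = weighted_mean(first_values)    # per preset example
target_value = weighted_mean([second_value])  # across all preset examples
```

With these numbers the first evaluation values are 3.65 and 3.85, giving a second (and, with a single example, also target) evaluation value of 3.75 — a specific score, as the text requires.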
According to the information processing method provided by the embodiments of the present disclosure, the prompt word to be evaluated is acquired, the preset example set is generated according to the prompt word, and the evaluation dimension and the evaluation step of the prompt word can then be determined. The prompt word, the preset example set, the evaluation dimension and the evaluation step are combined into a template to obtain the target prompt template. The probability distribution in each evaluation dimension is obtained based on the target prompt template, and the target evaluation value is determined by weighting the probability distributions, so that automated evaluation can be realized and a large amount of time and labor cost can be saved. The evaluation direction can be controlled by customizing the evaluation dimensions, so that a flexible evaluation mode can be realized and the evaluation range enlarged. The evaluation step adopts a chain-of-thought mode, which simulates the thinking process of a human evaluator, so that the evaluation result is closer to manual evaluation. The scoring mode is calculated from probability distributions, so that low score variance and low correlation with human judgment can be avoided.
Fig. 5 is a flowchart of an information processing method according to an embodiment of the present disclosure.
As shown in fig. 5, the information processing method may include:
S501, acquiring prompt words to be evaluated.
S502, generating a preset example set according to the prompt words.
S503, determining an evaluation dimension according to the prompt word and the preset example set.
S504, generating, by the large language model, an evaluation step for the evaluation dimension according to the prompt word.
S505, carrying out templated combination on the prompt words, the preset example set, the evaluation dimension and the evaluation step to obtain a target prompt template.
S506, inputting the target prompt template into a large language model, and outputting probability distribution of each evaluation dimension on each score under a preset example by the large language model.
S507, for each evaluation dimension, determining a first evaluation value of the evaluation dimension based on the probability distribution over the evaluation dimension.
S508, obtaining a target evaluation value of the prompt word based on the first evaluation value in each evaluation dimension in the preset example.
According to the information processing method provided by the embodiments of the present disclosure, the prompt word to be evaluated is acquired, the preset example set is generated according to the prompt word, and the evaluation dimension and the evaluation step of the prompt word can then be determined. The target evaluation value is obtained by evaluating the prompt word based on the preset example set, the evaluation dimension and the evaluation step, so that automated evaluation can be realized and a large amount of time and labor cost can be saved. The evaluation direction can be controlled by customizing the evaluation dimensions, so that a flexible evaluation mode can be realized and the evaluation range enlarged. The evaluation step adopts a chain-of-thought mode, which simulates the thinking process of a human evaluator, so that the evaluation result is closer to manual evaluation. The scoring mode is calculated from probability distributions, so that low score variance and low correlation with human judgment can be avoided.
Fig. 6 shows a schematic flow chart of evaluation by a model. A prompt word is acquired; one or more questions related to the prompt word are generated according to the prompt word; answers matching the questions are determined; and each question, together with its matching answer, is taken as one preset example to form the preset example set. A first evaluation dimension is determined according to the prompt word, and a second evaluation dimension is determined according to the preset example set, the evaluation dimension consisting of the first evaluation dimension and the second evaluation dimension. Further, for each evaluation dimension of the prompt word, a corresponding evaluation step is determined by the large language model, a chain of thought representing the logical relationship between the evaluation steps is generated, and a corresponding scoring standard is automatically generated, for example score levels of 1-5 and which kind of answer should receive which score.
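The example-set construction described above might look like the following sketch, where `call_llm` is a hypothetical stand-in for whichever large-language-model client is actually used, and the prompt wordings are illustrative only.

```python
def generate_preset_examples(prompt_word, call_llm, n_questions=3):
    """Build the preset example set: questions related to the prompt
    word, each paired with an answer matched to it."""
    questions = [
        call_llm(f"Write question {i + 1} related to this prompt: "
                 f"{prompt_word}")
        for i in range(n_questions)
    ]
    # Each (question, matching answer) pair is one preset example.
    return [
        (q, call_llm(f"Prompt: {prompt_word}\nQuestion: {q}\nAnswer:"))
        for q in questions
    ]

# A stub in place of a real model call keeps the sketch self-contained.
examples = generate_preset_examples(
    "Translate English to French.",
    lambda text: f"<model output for: {text.splitlines()[0]}>",
)
```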
Further, based on the model configuration information, the prompt word, the preset example set and the evaluation method are combined into a template to obtain the target prompt template, where the evaluation method comprises the evaluation dimension and the evaluation step. The target prompt template is input into the large language model, which evaluates the prompt word according to the target prompt template. For each preset example, the probability distribution of each evaluation dimension over the respective scores is determined, and the first evaluation value of each evaluation dimension is calculated from that distribution. A weighted sum of the first evaluation values over the evaluation dimensions is then calculated to obtain the second evaluation value of the preset example, and a weighted sum of the second evaluation values over the preset examples yields the target evaluation value of the prompt word, which is a specific score. For example, the target evaluation value of prompt word A is 4 points.
Corresponding to the information processing methods provided in the above embodiments, an embodiment of the present disclosure further provides an information processing apparatus. Since the apparatus corresponds to the methods provided in the above embodiments, the implementation of the information processing method described above also applies to the information processing apparatus provided in this embodiment and will not be described in detail below.
Fig. 7 is a schematic structural diagram of an information processing apparatus according to an embodiment of the present disclosure.
As shown in fig. 7, an information processing apparatus 700 of an embodiment of the present disclosure includes an acquisition module 701, a first generation module 702, a second generation module 703, and an evaluation module 704.
The obtaining module 701 is configured to obtain a prompt word to be evaluated.
A first generating module 702, configured to generate a preset example set according to the prompt word.
A second generating module 703, configured to determine an evaluation dimension according to the prompt word and the preset example set, and generate an evaluation step for the evaluation dimension.
And the evaluation module 704 is configured to evaluate the prompt word based on the preset example set, the evaluation dimension and the evaluation step, so as to obtain a target evaluation value of the prompt word.
In one embodiment of the present disclosure, the first generating module 702 is further configured to: generating one or more questions related to the prompt word according to the prompt word by a large language model; determining an answer matched with the question according to the prompt word and the question; and taking the question and an answer matched with the question as a preset example to obtain the preset example set.
In one embodiment of the present disclosure, the second generating module 703 is further configured to: classifying and identifying the prompting words, and determining the target types to which the prompting words belong; determining a general first evaluation dimension corresponding to the prompt word based on the target type; and generating a proprietary second evaluation dimension corresponding to the prompt word by the large language model according to the preset example set.
In one embodiment of the present disclosure, the second generating module 703 is further configured to: generate, by the large language model according to the prompt word, evaluation steps of the evaluation dimension, wherein chains of thought are used to represent the logical relations among the evaluation steps.
In one embodiment of the present disclosure, the evaluation module 704 is further configured to: template combination is carried out on the prompt words, the preset example set, the evaluation dimension and the evaluation step to obtain a target prompt template; and evaluating the prompt word by the large language model according to the target prompt template to obtain a target evaluation value of the prompt word.
In one embodiment of the present disclosure, the evaluation module 704 is further configured to: and obtaining model configuration information, and carrying out templated combination on the prompt words, the preset example set, the evaluation dimension, the evaluation step and the model configuration information to obtain the target prompt template.
In one embodiment of the present disclosure, the evaluation module 704 is further configured to: inputting the target prompt template into the large language model, and outputting probability distribution of each evaluation dimension on each score under the preset example by the large language model; and obtaining target evaluation values of the prompt words based on the probability distribution in each evaluation dimension under the preset examples.
In one embodiment of the present disclosure, the evaluation module 704 is further configured to: determining, for each of the evaluation dimensions, a first evaluation value for the evaluation dimension based on the probability distribution over the evaluation dimension; and obtaining a target evaluation value of the prompt word based on the first evaluation value of each evaluation dimension under the preset example.
In one embodiment of the present disclosure, the evaluation module 704 is further configured to: for each preset example, weighting the first evaluation value in each evaluation dimension to obtain a second evaluation value of the preset example; and carrying out weighting operation on the second evaluation value of each preset example to obtain a target evaluation value of the prompt word.
According to the information processing apparatus provided by the embodiments of the present disclosure, the prompt word to be evaluated is acquired, the preset example set is generated according to the prompt word, and the evaluation dimension and the evaluation step of the prompt word can then be determined. The target evaluation value is obtained by evaluating the prompt word based on the preset example set, the evaluation dimension and the evaluation step, so that automated evaluation can be realized and a large amount of time and labor cost can be saved. The evaluation direction can be controlled by customizing the evaluation dimensions, so that a flexible evaluation mode can be realized and the evaluation range enlarged. The evaluation step adopts a chain-of-thought mode, which simulates the thinking process of a human evaluator, so that the evaluation result is closer to manual evaluation. The scoring mode is calculated from probability distributions, so that low score variance and low correlation with human judgment can be avoided.
In the technical solutions of the present disclosure, the acquisition, storage and application of the personal information of the users involved all comply with the provisions of relevant laws and regulations and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 8 illustrates a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the device 800 includes a computing unit 801 that can perform various suitable actions and processes according to computer programs/instructions stored in a read-only memory (ROM) 802 or loaded from a storage unit 808 into a random access memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Various components in device 800 are connected to I/O interface 805, including: an input unit 806 such as a keyboard, mouse, etc.; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, etc.; and a communication unit 809, such as a network card, modem, wireless communication transceiver, or the like. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be any of various general and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various specialized artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 801 performs the respective methods and processes described above, for example the information processing method. In some embodiments, the information processing method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program/instructions may be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program/instructions are loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the information processing method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the information processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include being implemented in one or more computer programs/instructions that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), the internet, and blockchain networks.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs/instructions running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (21)

1. An information processing method, wherein the method comprises:
acquiring a prompt word to be evaluated;
generating a preset example set according to the prompt word;
determining an evaluation dimension according to the prompt word and the preset example set, and generating an evaluation step for the evaluation dimension;
and based on the preset example set, the evaluation dimension and the evaluation step, evaluating the prompt word to obtain a target evaluation value of the prompt word.
2. The method of claim 1, wherein the generating a set of preset examples from the hint word comprises:
generating one or more questions related to the prompt word according to the prompt word by a large language model;
determining an answer matched with the question according to the prompt word and the question;
and taking the question and an answer matched with the question as a preset example to obtain the preset example set.
3. The method of claim 1, wherein the determining an evaluation dimension from the hint word and the set of preset examples comprises:
classifying and identifying the prompting words, and determining the target types to which the prompting words belong;
determining a general first evaluation dimension corresponding to the prompt word based on the target type;
and generating a proprietary second evaluation dimension corresponding to the prompt word by the large language model according to the preset example set.
4. A method according to claim 3, wherein the generating an evaluation step for the evaluation dimension comprises:
and generating evaluation steps of the evaluation dimension by the large language model according to the prompt word, wherein chains of thought are used to represent the logical relations among the evaluation steps.
5. The method of claim 1, wherein the evaluating the hint word based on the set of preset examples, the evaluation dimension, and the evaluation step to obtain a target evaluation value of the hint word includes:
template combination is carried out on the prompt words, the preset example set, the evaluation dimension and the evaluation step to obtain a target prompt template;
and evaluating the prompt word by the large language model according to the target prompt template to obtain a target evaluation value of the prompt word.
6. The method of claim 5, wherein the templated combining of the hint words and the set of preset examples, the evaluation dimension, and the evaluation step results in a target hint template, comprising:
and obtaining model configuration information, and carrying out templated combination on the prompt words, the preset example set, the evaluation dimension, the evaluation step and the model configuration information to obtain the target prompt template.
7. The method according to claim 5 or 6, wherein the evaluating the prompt word by the large language model according to the target prompt template to obtain a target evaluation value of the prompt word includes:
inputting the target prompt template into the large language model, and outputting probability distribution of each evaluation dimension on each score under the preset example by the large language model;
and obtaining target evaluation values of the prompt words based on the probability distribution in each evaluation dimension under the preset examples.
8. The method of claim 7, wherein the obtaining the target evaluation value of the hint word based on the probability distribution over each evaluation dimension of the preset examples includes:
determining, for each of the evaluation dimensions, a first evaluation value for the evaluation dimension based on the probability distribution over the evaluation dimension;
and obtaining a target evaluation value of the prompt word based on the first evaluation value of each evaluation dimension under the preset example.
9. The method of claim 8, wherein the obtaining the target evaluation value of the hint word based on the first evaluation value in each of the evaluation dimensions under the preset examples includes:
for each preset example, weighting the first evaluation value in each evaluation dimension to obtain a second evaluation value of the preset example;
and carrying out weighting operation on the second evaluation value of each preset example to obtain a target evaluation value of the prompt word.
10. An information processing apparatus, wherein the apparatus comprises:
the acquisition module is used for acquiring prompt words to be evaluated;
the first generation module is used for generating a preset example set according to the prompt words;
the second generation module is used for determining an evaluation dimension according to the prompt words and the preset example set and generating an evaluation step for the evaluation dimension;
and the evaluation module is used for evaluating the prompt word based on the preset example set, the evaluation dimension and the evaluation step to obtain a target evaluation value of the prompt word.
11. The apparatus of claim 10, wherein the first generation module is further configured to:
generating one or more questions related to the prompt word according to the prompt word by a large language model;
determining an answer matched with the question according to the prompt word and the question;
and taking the question and an answer matched with the question as a preset example to obtain the preset example set.
12. The apparatus of claim 10, wherein the second generation module is further configured to:
classifying and identifying the prompting words, and determining the target types to which the prompting words belong;
determining a general first evaluation dimension corresponding to the prompt word based on the target type;
and generating a proprietary second evaluation dimension corresponding to the prompt word by the large language model according to the preset example set.
13. The apparatus of claim 12, wherein the second generation module is further configured to:
and generating evaluation steps of the evaluation dimension by the large language model according to the prompt word, wherein chains of thought are used to represent the logical relations among the evaluation steps.
14. The apparatus of claim 10, wherein the evaluation module is further to:
template combination is carried out on the prompt words, the preset example set, the evaluation dimension and the evaluation step to obtain a target prompt template;
and evaluating the prompt word by the large language model according to the target prompt template to obtain a target evaluation value of the prompt word.
15. The apparatus of claim 14, wherein the evaluation module is further to:
and obtaining model configuration information, and carrying out templated combination on the prompt words, the preset example set, the evaluation dimension, the evaluation step and the model configuration information to obtain the target prompt template.
16. The apparatus of claim 14 or 15, wherein the evaluation module is further configured to:
inputting the target prompt template into the large language model, and outputting probability distribution of each evaluation dimension on each score under the preset example by the large language model;
and obtaining target evaluation values of the prompt words based on the probability distribution in each evaluation dimension under the preset examples.
17. The apparatus of claim 16, wherein the evaluation module is further configured to:
determining, for each of the evaluation dimensions, a first evaluation value for the evaluation dimension based on the probability distribution over the evaluation dimension;
and obtaining a target evaluation value of the prompt word based on the first evaluation value of each evaluation dimension under the preset example.
18. The apparatus of claim 17, wherein the evaluation module is further configured to:
for each preset example, weighting the first evaluation value in each evaluation dimension to obtain a second evaluation value of the preset example;
and carrying out weighting operation on the second evaluation value of each preset example to obtain a target evaluation value of the prompt word.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
20. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-9.
21. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the method steps of any of claims 1-9.
CN202311309876.XA 2023-10-10 2023-10-10 Information processing method, information processing device, electronic equipment and storage medium Pending CN117422067A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311309876.XA CN117422067A (en) 2023-10-10 2023-10-10 Information processing method, information processing device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN117422067A true CN117422067A (en) 2024-01-19

Family

ID=89529291

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311309876.XA Pending CN117422067A (en) 2023-10-10 2023-10-10 Information processing method, information processing device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117422067A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117744753A (en) * 2024-02-19 2024-03-22 浙江同花顺智能科技有限公司 Method, device, equipment and medium for determining prompt word of large language model
CN117744753B (en) * 2024-02-19 2024-05-03 浙江同花顺智能科技有限公司 Method, device, equipment and medium for determining prompt word of large language model

Similar Documents

Publication Publication Date Title
EP3910492A2 (en) Event extraction method and apparatus, and storage medium
CN113590776B (en) Knowledge graph-based text processing method and device, electronic equipment and medium
EP4113357A1 (en) Method and apparatus for recognizing entity, electronic device and storage medium
CN116737908A (en) Knowledge question-answering method, device, equipment and storage medium
CN117422067A (en) Information processing method, information processing device, electronic equipment and storage medium
EP3992814A2 (en) Method and apparatus for generating user interest profile, electronic device and storage medium
CN111460810A (en) Crowd-sourced task spot check method and device, computer equipment and storage medium
CN112287085B (en) Semantic matching method, system, equipment and storage medium
CN117747087A (en) Training method of large inquiry model, inquiry method and device based on large inquiry model
CN113792230B (en) Service linking method, device, electronic equipment and storage medium
CN114118049B (en) Information acquisition method, device, electronic equipment and storage medium
CN113641724B (en) Knowledge tag mining method and device, electronic equipment and storage medium
CN112559713B (en) Text relevance judging method and device, model, electronic equipment and readable medium
CN114817476A (en) Language model training method and device, electronic equipment and storage medium
CN115840867A (en) Generation method and device of mathematical problem solving model, electronic equipment and storage medium
CN114118937A (en) Information recommendation method and device based on task, electronic equipment and storage medium
CN116244432B (en) Pre-training method and device for language model and electronic equipment
CN114201607B (en) Information processing method and device
CN116932714B (en) Method and device for training generated dialogue model and realizing generated dialogue
CN112989797B (en) Model training and text expansion methods, devices, equipment and storage medium
CN113344405B (en) Method, device, equipment, medium and product for generating information based on knowledge graph
CN116226478B (en) Information processing method, model training method, device, equipment and storage medium
CN114330345B (en) Named entity recognition method, training method, device, electronic equipment and medium
CN117093601A (en) Recall method, device, equipment and medium for structured data
CN117421403A (en) Intelligent dialogue method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination