CN118093838A - Large language model prompt word generation method, system, terminal equipment and medium - Google Patents

Large language model prompt word generation method, system, terminal equipment and medium

Info

Publication number
CN118093838A
CN118093838A (application CN202410494748.5A)
Authority
CN
China
Prior art keywords
word
relation
prompt
node
prompt word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410494748.5A
Other languages
Chinese (zh)
Other versions
CN118093838B (en)
Inventor
刘星宝
李鑫
刘庆东
李迦迦
张言波
刘利枚
贺勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiangjiang Laboratory
Original Assignee
Xiangjiang Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiangjiang Laboratory filed Critical Xiangjiang Laboratory
Priority to CN202410494748.5A priority Critical patent/CN118093838B/en
Publication of CN118093838A publication Critical patent/CN118093838A/en
Application granted granted Critical
Publication of CN118093838B publication Critical patent/CN118093838B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3322 Query formulation using system suggestions
    • G06F16/3323 Query formulation using system suggestions using document space presentation or visualization, e.g. category, hierarchy or range presentation and selection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G06F16/337 Profile generation, learning or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Machine Translation (AREA)

Abstract

The application is suitable for the technical field of large language models, and provides a large language model prompt word generation method, system, terminal equipment, and medium. The method constructs a personalized corpus based on user data; acquires a prompt word input by a user and segments the prompt word; selects a core word as the root node and determines the relations between the core word and the other unit texts according to grammatical relations; constructs a prompt word analysis tree according to the root node and the relations; calculates the evolution probability and the node depth probability of each node and evolves the prompt word analysis tree to obtain a final prompt word analysis tree; and generates a plurality of new prompt words according to the final prompt word analysis tree, calculates the comprehensive score of each new prompt word based on the personalized corpus and a preset professional field corpus, and outputs the new prompt word corresponding to the highest comprehensive score to the user. The method and the device can improve the accuracy of prompt word generation.

Description

Large language model prompt word generation method, system, terminal equipment and medium
Technical Field
The application belongs to the technical field of large language models, and particularly relates to a large language model prompt word generation method, system, terminal equipment, and medium.
Background
A large language model is a deep learning model trained on a large amount of text data. It can generate natural language text or understand the meaning of language text, can handle various natural language tasks such as text classification, question answering, and dialogue, and is an important path toward artificial intelligence.
However, when interacting with large language models, the prompt words entered by users often suffer from inaccuracy, ambiguity, incompleteness, missing key information, misleading wording, grammatical mistakes, misspellings, vague or ambiguous words, and the like.
Furthermore, for domain-specific questions, the user may not provide the background information or terminology of the relevant field, so the model fails to understand or answer the question. Some large language models also limit the length of the prompt words entered by the user, which causes the prompt words to be truncated or shortened and, in turn, information to be lost or incomplete. Therefore, a method capable of accurately generating prompt words is needed.
Disclosure of Invention
The application provides a large language model prompt word generation method, system, terminal equipment, and medium, which can improve the accuracy of prompt word generation.
In a first aspect, the present application provides a method for generating a large language model prompt word, including:
constructing a personalized corpus based on pre-acquired user data; the personalized corpus is used for providing prompt words which better meet the requirements and preferences of specific users, thereby improving interaction experience and accuracy.
Acquiring a prompt word input by a user, and segmenting the prompt word; wherein the prompt word is segmented into at least one unit text;
selecting a core word from at least one unit text as a root node, and determining the relation between the core word and other unit texts according to a preset grammar relation;
Constructing a prompt word analysis tree according to the root node and the relations; the nodes of the prompt word analysis tree correspond to the unit texts one by one, and the edges of the prompt word analysis tree represent the relations among the nodes;
Respectively calculating the evolution probability and the node depth probability of each node, and evolving the prompt word analysis tree according to the evolution probability and the node depth probability until a new prompt word analysis tree obtained by evolution meets a preset evolution termination condition, so as to obtain a final prompt word analysis tree; the node depth probability is used for calculating the contribution of a node's depth to its evolution probability, which has the effect of adding a depth-related weight to the node's evolution probability so as to account for the node's importance in the prompt word structure and its context dependency.
Generating a plurality of new prompt words according to the final prompt word analysis tree, calculating the comprehensive score of each new prompt word based on the personalized corpus and the preset professional field corpus, and outputting the new prompt word corresponding to the highest comprehensive score to the user.
Optionally, the user data includes user behavior information, user portrait information, and user feedback information.
Optionally, the core word is the verb in a subject-predicate relation;

the grammatical relations include a subject-predicate relation, a direct object relation, an indirect object relation, a clausal subject relation, a clausal complement relation, an open clausal complement relation, a passive subject relation, an auxiliary verb relation, a copular verb relation, a determiner relation, an adjectival modifier relation, a numeric modifier relation, an appositive relation, an idiom relation, a compound word relation, a marker relation, a prepositional relation, a relative clause modifier relation, a coordination relation, a punctuation relation, a parataxis relation, and a flat modifier relation.
Optionally, the evolution probability is calculated by the formula P(v_i) = f(w_i) · D(v_i), where P(v_i) denotes the evolution probability of the i-th node v_i, f(·) denotes a mapping function, w_i denotes the part-of-speech feature of node v_i, and D(v_i) denotes the node depth probability of the i-th node v_i, computed from its position depth d_i;

the node depth probability is calculated by the formula D(v_i) = 1 - d_i / d_max, where d_i denotes the position depth of the i-th node v_i and d_max denotes the maximum depth of the prompt word analysis tree.
Optionally, the comprehensive score is computed as S = S_LLM(p_new, p_orig) + Σ_j I(t_j) · sim(t_j, C_per, C_dom), where S denotes the comprehensive score; S_LLM(p_new, p_orig) denotes the base score given by the large model when comparing the new prompt word p_new with the original prompt word p_orig; C_per denotes the personalized corpus; C_dom denotes the professional field corpus; I(t_j) denotes the importance score of unit text t_j; and sim(t_j, C_per, C_dom) denotes the cosine similarity score between unit text t_j and the corpora C_per and C_dom.
Optionally, the evolution termination condition is that the number of evolutions is greater than or equal to a preset maximum number of evolutions.
In a second aspect, the present application provides a large language model prompt word generation system, including:
the corpus construction module is used for constructing a personalized corpus based on the user data acquired in advance;
the prompt word segmentation module is used for acquiring the prompt words input by the user and segmenting the prompt words; wherein the prompt word is segmented into at least one unit text;
the relation determining module is used for selecting a core word from at least one unit text as a root node and determining the relation between the core word and other unit texts according to a preset grammar relation;
the analysis tree construction module is used for constructing a prompt word analysis tree according to the root node and the relations; the nodes of the prompt word analysis tree correspond to the unit texts one by one, and the edges of the prompt word analysis tree represent the relations among the nodes;
The analysis tree evolution module is used for calculating the evolution probability and the node depth probability of each node respectively, and evolving the prompt word analysis tree according to the evolution probability and the node depth probability until the new prompt word analysis tree obtained by evolution meets the preset evolution termination condition, so as to obtain a final prompt word analysis tree;
The prompt word generation module is used for generating a plurality of new prompt words according to the final prompt word analysis tree, calculating the comprehensive score of each new prompt word based on the personalized corpus and the preset professional field corpus, and outputting the new prompt word corresponding to the highest comprehensive score to the user.
In a third aspect, the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the method for generating a large language model prompt word described above when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above-described large language model prompt word generation method.
The scheme of the application has the following beneficial effects:
According to the large language model prompt word generation method provided by the application, the prompt word analysis tree is evolved according to the evolution probability and the node depth probability to obtain the final prompt word analysis tree, so that prompt words with a larger influence on the overall semantics can be retained, which improves the accuracy of the generated prompt words; based on the personalized corpus and the preset professional field corpus, the comprehensive score of each new prompt word is calculated, combining the importance and relevance of the new prompt word in the personalized corpus and the professional field corpus, and the new prompt word with the highest comprehensive score is output, which can further improve the accuracy of the generated prompt words.
Other advantageous effects of the present application will be described in detail in the detailed description section which follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments or the description of the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for generating large language model hint words according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a hint word parse tree according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a large language model prompt word generation system according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in the present description and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
Furthermore, the terms "first," "second," "third," and the like in the description of the present specification and in the appended claims, are used for distinguishing between descriptions and not necessarily for indicating or implying a relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
To address the low accuracy of prompt words generated by traditional methods, the application provides a large language model prompt word generation method, system, terminal equipment, and medium. The method evolves a prompt word analysis tree according to the evolution probability and the node depth probability to obtain a final prompt word analysis tree, so that prompt words with a larger influence on the overall semantics can be retained, which improves the accuracy of the generated prompt words; based on the personalized corpus and the preset professional field corpus, the comprehensive score of each new prompt word is calculated, combining the importance and relevance of the new prompt word in the personalized corpus and the professional field corpus, and the new prompt word with the highest comprehensive score is output, which can further improve the accuracy of the generated prompt words.
The method for generating the large language model prompt word provided by the application is exemplified below.
As shown in FIG. 1, the method for generating the large language model prompt word provided by the application comprises the following steps:
Step 11, constructing a personalized corpus based on the pre-acquired user data.
The personalized corpus is used for providing prompt words which better meet the requirements and preferences of specific users, thereby improving interaction experience and accuracy.
Specifically, the user data includes user behavior information, user portrait information, and user feedback information. In an embodiment of the present application, after the data is collected, the data may be stored in a user information table of a corresponding user, and a unique identifier is set for each user to prevent confusion, and the collected user information tables of all users form the personalized corpus. The database may be created using existing database creation methods, the kind of which is not limited here.
In an embodiment of the present application, the above data information is specifically:
User behavior information including user operations in a large language model, search records, click preferences, and the like. The data is analyzed to mine the user's preferences, interests and usage habits, such as words, phrases or expressions that are frequently used by the user.
User profile information including age, gender, occupation, geographic location, etc. Based on this information, a user portrait is created that helps to understand the characteristics and context of the user.
User feedback information including user ratings, preferences, satisfaction, etc. And analyzing the feedback information to know the preference and the requirement of the user for generating the prompt words.
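As an illustrative, non-limiting sketch of how the personalized corpus described in this step could be organized, the following Python snippet stores one information table per user under a unique identifier; the `UserRecord` and `PersonalizedCorpus` names and fields are hypothetical, not prescribed by the application.

```python
from dataclasses import dataclass, field

@dataclass
class UserRecord:
    """One user's information table; the field names are illustrative."""
    user_id: str                                   # unique identifier, set to prevent confusion
    behavior: list = field(default_factory=list)   # operations, search records, click preferences
    portrait: dict = field(default_factory=dict)   # age, gender, occupation, geographic location
    feedback: list = field(default_factory=list)   # ratings, preferences, satisfaction

class PersonalizedCorpus:
    """The collected user information tables of all users form the corpus."""
    def __init__(self):
        self._tables = {}

    def add_user(self, record: UserRecord) -> None:
        self._tables[record.user_id] = record

    def get_user(self, user_id: str) -> UserRecord:
        return self._tables[user_id]

# usage
corpus = PersonalizedCorpus()
corpus.add_user(UserRecord(user_id="u001",
                           behavior=["search: dependency parsing"],
                           portrait={"occupation": "engineer"}))
```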
Step 12, acquiring the prompt word input by the user, and segmenting the prompt word.
Wherein the prompt word is segmented into at least one unit text.
Illustratively, in one embodiment of the present application, if the prompt word is "The cat chased the mouse", each word is used as a unit text, and the segmentation result is {The, cat, chased, the, mouse}. In other embodiments of the present application, individual words or terms may be used as the unit text, which is not limited here.
Step 13, selecting a core word from the at least one unit text as a root node, and determining the relation between the core word and the other unit texts according to a preset grammatical relation.
Wherein the core word is the verb in a subject-predicate relation.
Specifically, in the embodiment of the present application, the above-mentioned grammatical relations include a subject-predicate relation, a direct object relation, an indirect object relation, a clausal subject relation, a clausal complement relation, an open clausal complement relation, a passive subject relation, an auxiliary verb relation, a copular verb relation, a determiner relation, an adjectival modifier relation, a numeric modifier relation, an appositive relation, an idiom relation, a compound word relation, a marker relation, a prepositional relation, a relative clause modifier relation, a coordination relation, a punctuation relation, a parataxis relation, and a flat modifier relation.
Step 14, constructing a prompt word analysis tree according to the root node and the relations.
The nodes of the prompt word analysis tree are in one-to-one correspondence with the unit texts, and the edges of the prompt word analysis tree represent the relation among the nodes.
In an embodiment of the present application, the prompt word analysis tree consists of a node set and an edge set. The node set may be represented as V = {v_1, v_2, …, v_n}, where v_i denotes the i-th node and n denotes the number of nodes. The numbering of the nodes starts from root node number 1 and proceeds layer by layer, from top to bottom and from left to right. Illustratively, a prompt word analysis tree is shown in FIG. 2.

Further, the value of a node v_i represents the features of that node, with the expression v_i = (s_i, w_i, l_i), where s_i denotes the semantic feature of the node (the unit text), w_i denotes its part-of-speech feature, and l_i denotes its location feature (where the node is located in the tree).

For example, for the node in FIG. 2 whose text is "chased", the semantic feature s is the meaning of "chased" as interpreted from context, the part-of-speech feature w is verb, and the location feature l gives the node's position in the tree.

The edge set may be represented as E = {e_ij}, where e_ij denotes the edge connecting node v_i and node v_j. The feature value of e_ij is expressed as r_ij, where r_ij denotes the grammatical relation between node v_i and node v_j.

The prompt word analysis tree may be represented as T_i, where T_1 (i.e., when i = 1) denotes the whole prompt word analysis tree and, for other values of i, T_i denotes the subtree rooted at node v_i. The feature value of T_i is expressed as (V_i, E_i, W_i), where V_i denotes the node set included in tree T_i, E_i denotes the edge set included in tree T_i, and W_i denotes the set of words included in tree T_i.
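Steps 12 to 14 correspond closely to what an off-the-shelf dependency parser produces. A minimal sketch is given below using spaCy as one possible, assumed tool (the application does not prescribe a parser); the `build_parse_tree` helper and its return format are illustrative.

```python
import spacy  # assumes the en_core_web_sm model is installed

nlp = spacy.load("en_core_web_sm")

def build_parse_tree(prompt: str):
    """Segment the prompt into unit texts and derive the node and edge sets."""
    doc = nlp(prompt)
    # node features (s_i, w_i, l_i): unit text, part-of-speech feature, location
    nodes = [(tok.text, tok.pos_, tok.i) for tok in doc]
    # edges carry the grammatical relation r_ij between head and dependent
    edges = [(tok.head.i, tok.i, tok.dep_) for tok in doc if tok.head is not tok]
    root = next(tok for tok in doc if tok.head is tok)  # the core word (main verb)
    return root, nodes, edges

root, nodes, edges = build_parse_tree("The cat chased the mouse")
# root.text == "chased"; edges typically include (2, 1, "nsubj") and (2, 4, "dobj")
```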
Step 15, respectively calculating the evolution probability and the node depth probability of each node, and evolving the prompt word analysis tree according to the evolution probability and the node depth probability until the new prompt word analysis tree obtained by evolution meets the preset evolution termination condition, so as to obtain a final prompt word analysis tree.
Wherein the node depth probability is used to calculate the contribution of a node's depth to its evolution probability. This has the effect of adding a depth-related weight to each node's evolution probability, so as to account for the node's importance in the prompt word structure and its context dependency.
Specifically, the evolution probability is calculated by the formula P(v_i) = f(w_i) · D(v_i), where P(v_i) denotes the evolution probability of the i-th node v_i, f(·) denotes a mapping function, w_i denotes the part-of-speech feature of node v_i, and D(v_i) denotes the node depth probability of the i-th node v_i, computed from its position depth d_i.

The node depth probability is calculated by the formula D(v_i) = 1 - d_i / d_max, where d_i denotes the position depth of the i-th node v_i and d_max denotes the maximum depth of the prompt word analysis tree. In this formula, nodes with smaller depth (closer to the root node) have higher probability weights, because they play a more important role in the prompt word structure and context dependencies. Conversely, nodes with greater depth (closer to the leaf nodes) have lower probability weights, because they have relatively less semantic impact on the overall sentence.
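A minimal sketch of these two formulas follows, assuming the forms P(v_i) = f(w_i) · D(v_i) and D(v_i) = 1 - d_i / d_max given above, with a simple part-of-speech lookup table standing in for the mapping function f; the table values are illustrative assumptions, not values fixed by the application.

```python
# illustrative mapping function f over part-of-speech features; the weights are assumed
POS_WEIGHT = {"VERB": 1.0, "NOUN": 0.9, "ADJ": 0.7, "ADV": 0.6, "DET": 0.3}

def node_depth_probability(depth: int, max_depth: int) -> float:
    """D(v_i) = 1 - d_i / d_max: shallower nodes receive higher weight."""
    return 1.0 - depth / max_depth

def evolution_probability(pos: str, depth: int, max_depth: int) -> float:
    """P(v_i) = f(w_i) * D(v_i)."""
    return POS_WEIGHT.get(pos, 0.5) * node_depth_probability(depth, max_depth)

# e.g. the root verb (depth 0) versus a determiner at the maximum depth 2:
print(evolution_probability("VERB", 0, 2))  # 1.0
print(evolution_probability("DET", 2, 2))   # 0.0 (deep determiners rarely evolve)
```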
In the embodiment of the application, the evolution termination condition is that the number of evolutions is greater than or equal to a preset maximum number of evolutions.
The evolution process in the embodiment of the present application is exemplarily described below.
Illustratively, first, the maximum number of evolutions and the number of prompt word analysis trees to be generated are initialized.
Then, all nodes are traversed starting from the root node, and whether a node is to be evolved is judged by the importance of the feature information (the evolution probability and the node depth probability) contained in the node. When the evolution probability of a node is greater than a preset evolution probability threshold, the node is determined to evolve; in an embodiment of the present application, the preset evolution probability threshold may be set to 0.8.
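The traversal just described might look like the following sketch, which reuses `evolution_probability` from the previous snippet; the `TreeNode` structure and the placeholder `evolve_node` operation are assumptions, since the concrete evolution operation is left open here.

```python
from dataclasses import dataclass, field

EVOLUTION_THRESHOLD = 0.8   # preset evolution probability threshold from the embodiment
MAX_EVOLUTIONS = 10         # assumed value for the preset maximum number of evolutions

@dataclass
class TreeNode:
    text: str                 # unit text (semantic feature)
    pos: str                  # part-of-speech feature
    depth: int                # position depth in the analysis tree
    children: list = field(default_factory=list)

def evolve_node(node: TreeNode) -> None:
    """Placeholder for the evolution operation, e.g. substituting or expanding
    the node's unit text; its concrete form is not fixed in this sketch."""
    pass

def evolve_tree(root: TreeNode, max_depth: int) -> None:
    """One evolution pass: traverse all nodes starting from the root."""
    stack = [root]
    while stack:
        node = stack.pop()
        if evolution_probability(node.pos, node.depth, max_depth) > EVOLUTION_THRESHOLD:
            evolve_node(node)
        stack.extend(node.children)

def run_evolution(root: TreeNode, max_depth: int) -> None:
    # termination condition: the number of evolutions reaches the preset maximum
    for _ in range(MAX_EVOLUTIONS):
        evolve_tree(root, max_depth)
```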
Step 16, generating a plurality of new prompt words according to the final prompt word analysis tree, calculating the comprehensive score of each new prompt word based on the personalized corpus and the preset professional field corpus, and outputting the new prompt word corresponding to the highest comprehensive score to the user.
Specifically, the comprehensive score is computed as S = S_LLM(p_new, p_orig) + Σ_j I(t_j) · sim(t_j, C_per, C_dom), where S denotes the comprehensive score; S_LLM(p_new, p_orig) denotes the base score given by the large model when comparing the new prompt word p_new with the original prompt word p_orig; C_per denotes the personalized corpus; C_dom denotes the professional field corpus; I(t_j) denotes the importance score of unit text t_j; and sim(t_j, C_per, C_dom) denotes the cosine similarity score between unit text t_j and the corpora C_per and C_dom.
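A minimal sketch of this scoring step is given below, assuming the additive form reconstructed above; the embeddings, the importance weights, and the way the large-model base score is obtained are all illustrative assumptions.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def comprehensive_score(base_score: float,
                        unit_vecs: list,        # embeddings of the new prompt's unit texts
                        importance: list,       # importance score I(t_j) per unit text
                        corpus_vec: np.ndarray  # combined personalized/professional corpus embedding
                        ) -> float:
    """S = S_LLM + sum_j I(t_j) * sim(t_j, corpora)."""
    relevance = sum(w * cosine(v, corpus_vec)
                    for w, v in zip(importance, unit_vecs))
    return base_score + relevance

# the candidate prompt with the highest comprehensive score is output to the user:
# best = max(candidates, key=lambda c: comprehensive_score(c.base, c.vecs, c.weights, corpus_vec))
```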
According to the large language model prompt word generation method provided by the application, the prompt word analysis tree is evolved according to the evolution probability and the node depth probability to obtain the final prompt word analysis tree, so that prompt words with a larger influence on the overall semantics can be retained, which improves the accuracy of the generated prompt words; based on the personalized corpus and the preset professional field corpus, the comprehensive score of each new prompt word is calculated, combining the importance and relevance of the new prompt word in the personalized corpus and the professional field corpus, and the new prompt word with the highest comprehensive score is output, which can further improve the accuracy of the generated prompt words.
The large language model prompt word generating system provided by the application is exemplified below.
As shown in fig. 3, the large language model prompt word generation system 300 includes:
The corpus construction module 301 is configured to construct a personalized corpus based on the user data collected in advance;
the prompt word segmentation module 302 is configured to obtain a prompt word input by a user, and segment the prompt word; wherein the prompt word is segmented into at least one unit text;
A relationship determining module 303, configured to select a core word from at least one unit text as a root node, and determine a relationship between the core word and other unit texts according to a preset grammatical relationship;
The parse tree construction module 304 is configured to construct a prompt word parse tree according to the root node and the relationship; the nodes of the prompt word analysis tree correspond to the unit texts one by one, and the edges of the prompt word analysis tree represent the relation among the nodes;
The analysis tree evolution module 305 is configured to calculate the evolution probability and the node depth probability of each node respectively, and evolve the prompt word analysis tree according to the evolution probability and the node depth probability until a new prompt word analysis tree obtained by evolution meets a preset evolution termination condition, so as to obtain a final prompt word analysis tree;
the prompt word generation module 306 is configured to generate a plurality of new prompt words according to the final prompt word analysis tree, calculate the comprehensive score of each new prompt word based on the personalized corpus and the preset professional field corpus, and output the new prompt word corresponding to the highest comprehensive score to the user.
It should be noted that, because the content of information interaction and execution process between the above devices/units is based on the same concept as the method embodiment of the present application, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
As shown in fig. 4, an embodiment of the present application provides a terminal device. The terminal device D10 of this embodiment includes: at least one processor D100 (only one processor is shown in fig. 4), a memory D101, and a computer program D102 stored in the memory D101 and executable on the at least one processor D100, the processor D100 implementing the steps in any of the various method embodiments described above when executing the computer program D102.
Specifically, when the processor D100 executes the computer program D102, a personalized corpus is constructed based on the user data collected in advance; a prompt word input by a user is acquired and segmented; a core word is selected from at least one unit text as the root node, and the relations between the core word and the other unit texts are determined according to preset grammatical relations; a prompt word analysis tree is constructed according to the root node and the relations; the evolution probability and the node depth probability of each node are calculated respectively, and the prompt word analysis tree is evolved according to the evolution probability and the node depth probability until a new prompt word analysis tree obtained by evolution meets a preset evolution termination condition, so as to obtain a final prompt word analysis tree; a plurality of new prompt words are generated according to the final prompt word analysis tree, the comprehensive score of each new prompt word is calculated based on the personalized corpus and the preset professional field corpus, and the new prompt word corresponding to the highest comprehensive score is output to the user. By evolving the prompt word analysis tree according to the evolution probability and the node depth probability to obtain the final prompt word analysis tree, prompt words with a larger influence on the overall semantics can be retained, improving the accuracy of the generated prompt words; by calculating the comprehensive score of each new prompt word based on the personalized corpus and the preset professional field corpus, combining the importance and relevance of the new prompt word in both corpora, and outputting the new prompt word with the highest comprehensive score, the accuracy of the generated prompt words can be further improved.
The processor D100 may be a central processing unit (CPU); the processor D100 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory D101 may, in some embodiments, be an internal storage unit of the terminal device D10, for example a hard disk or memory of the terminal device D10. In other embodiments, the memory D101 may also be an external storage device of the terminal device D10, for example a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the terminal device D10. Further, the memory D101 may also include both the internal storage unit and an external storage device of the terminal device D10. The memory D101 is used for storing an operating system, application programs, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program. The memory D101 may also be used to temporarily store data that has been output or is to be output.
Embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, implements steps for implementing the various method embodiments described above.
Embodiments of the present application provide a computer program product enabling a terminal device to carry out the steps of the method embodiments described above when the computer program product is run on the terminal device.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the present application may implement all or part of the flow of the methods of the above embodiments by instructing the relevant hardware through a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, etc. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to the large language model prompt word generation system/terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In some jurisdictions, in accordance with legislation and patent practice, computer-readable media may not be electrical carrier signals and telecommunication signals.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts that are not described or illustrated in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other manners. For example, the apparatus/network device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
While the foregoing is directed to the preferred embodiments of the present application, it will be appreciated by those skilled in the art that various modifications and adaptations can be made without departing from the principles of the present application, and such modifications and adaptations are intended to be comprehended within the scope of the present application.

Claims (9)

1. A method for generating a large language model prompt word, comprising:
constructing a personalized corpus based on pre-acquired user data; the personalized corpus is used for providing prompt words which better meet the requirements and preferences of specific users;
acquiring a prompt word input by a user, and segmenting the prompt word; wherein the prompt word is segmented into at least one unit text;
selecting a core word from the at least one unit text as a root node, and determining the relation between the core word and other unit texts according to a preset grammar relation;
Constructing a prompt word analysis tree according to the root node and the relation; the nodes of the prompt word analysis tree correspond to the unit texts one by one, and the edges of the prompt word analysis tree represent the relation among the nodes;
Respectively calculating the evolution probability and the node depth probability of each node, and evolving the prompt word analysis tree according to the evolution probability and the node depth probability until a new prompt word analysis tree obtained by evolution meets a preset evolution termination condition, so as to obtain a final prompt word analysis tree; the node depth probability is used for calculating the contribution of a node's depth to its evolution probability, so as to add a depth-related weight to the evolution probability of the node, taking into account the node's importance in the prompt word structure and its context dependency;
Generating a plurality of new prompt words according to the final prompt word analysis tree, calculating the comprehensive score of each new prompt word based on the personalized corpus and the preset professional field corpus, and outputting the new prompt word corresponding to the highest comprehensive score to a user.
2. The large language model prompt word generation method of claim 1, wherein the user data includes historical input data, historical behavior data, user personal information, prompt word feedback information, prompt word interaction information, user preference information, and user portrait information.
3. The large language model prompt word generation method of claim 1, wherein the core word is the verb in a subject-predicate relation;

the grammatical relations include a subject-predicate relation, a direct object relation, an indirect object relation, a clausal subject relation, a clausal complement relation, an open clausal complement relation, a passive subject relation, an auxiliary verb relation, a copular verb relation, a determiner relation, an adjectival modifier relation, a numeric modifier relation, an appositive relation, an idiom relation, a compound word relation, a marker relation, a prepositional relation, a relative clause modifier relation, a coordination relation, a punctuation relation, a parataxis relation, and a flat modifier relation.
4. The large language model prompt word generation method according to claim 1, wherein the evolution probability is calculated by the formula P(v_i) = f(w_i) · D(v_i), where P(v_i) denotes the evolution probability of the i-th node v_i, f(·) denotes a mapping function, w_i denotes the part-of-speech feature of node v_i, and D(v_i) denotes the node depth probability of the i-th node v_i, computed from its position depth d_i;

the node depth probability is calculated by the formula D(v_i) = 1 - d_i / d_max, where d_i denotes the position depth of the i-th node v_i and d_max denotes the maximum depth of the prompt word analysis tree.
5. The large language model prompt word generation method of claim 1, wherein the comprehensive score is computed as S = S_LLM(p_new, p_orig) + Σ_j I(t_j) · sim(t_j, C_per, C_dom), where S denotes the comprehensive score; S_LLM(p_new, p_orig) denotes the base score given by the large model when comparing the new prompt word p_new with the original prompt word p_orig; C_per denotes the personalized corpus; C_dom denotes the professional field corpus; I(t_j) denotes the importance score of unit text t_j; and sim(t_j, C_per, C_dom) denotes the cosine similarity score between unit text t_j and the corpora C_per and C_dom.
6. The large language model prompt word generation method according to claim 1, wherein the evolution termination condition is that the number of evolutions is greater than or equal to a preset maximum number of evolutions.
7. A large language model prompt word generation system, comprising:
the corpus construction module is used for constructing a personalized corpus based on the user data acquired in advance;
the prompt word segmentation module is used for acquiring the prompt words input by the user and segmenting the prompt words; wherein the prompt word is segmented into at least one unit text;
The relation determining module is used for selecting a core word from the at least one unit text as a root node and determining the relation between the core word and other unit texts according to a preset grammar relation;
The analysis tree construction module is used for constructing a prompt word analysis tree according to the root node and the relation; the nodes of the prompt word analysis tree correspond to the unit texts one by one, and the edges of the prompt word analysis tree represent the relation among the nodes;
The analysis tree evolution module is used for respectively calculating the evolution probability and the node depth probability of each node, and evolving the prompt word analysis tree according to the evolution probability and the node depth probability until a new prompt word analysis tree obtained by evolution meets a preset evolution termination condition, so as to obtain a final prompt word analysis tree;
The prompt word generation module is used for generating a plurality of new prompt words according to the final prompt word analysis tree, calculating the comprehensive score of each new prompt word based on the personalized corpus and the preset professional field corpus, and outputting the new prompt word corresponding to the highest comprehensive score to a user.
8. A terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the large language model prompt word generation method according to any one of claims 1 to 6 when executing the computer program.
9. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the large language model prompt word generation method according to any one of claims 1 to 6.
CN202410494748.5A 2024-04-24 2024-04-24 Large language model prompt word generation method, system, terminal equipment and medium Active CN118093838B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410494748.5A CN118093838B (en) 2024-04-24 2024-04-24 Large language model prompt word generation method, system, terminal equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410494748.5A CN118093838B (en) 2024-04-24 2024-04-24 Large language model prompt word generation method, system, terminal equipment and medium

Publications (2)

Publication Number Publication Date
CN118093838A true CN118093838A (en) 2024-05-28
CN118093838B CN118093838B (en) 2024-07-16

Family

ID=91155533

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410494748.5A Active CN118093838B (en) 2024-04-24 2024-04-24 Large language model prompt word generation method, system, terminal equipment and medium

Country Status (1)

Country Link
CN (1) CN118093838B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140163962A1 (en) * 2012-12-10 2014-06-12 International Business Machines Corporation Deep analysis of natural language questions for question answering system
CN103914569A (en) * 2014-04-24 2014-07-09 百度在线网络技术(北京)有限公司 Input prompt method and device and dictionary tree model establishing method and device
CN106294324A (en) * 2016-08-11 2017-01-04 上海交通大学 A kind of machine learning sentiment analysis device based on natural language parsing tree
CN116012492A (en) * 2022-12-13 2023-04-25 特赞(上海)信息科技有限公司 Prompt word intelligent optimization method and system for character generation image
CN116881415A (en) * 2023-07-11 2023-10-13 严梓铭 System and method for large language model second order prompt words for digital person intelligence
CN117744753A (en) * 2024-02-19 2024-03-22 浙江同花顺智能科技有限公司 Method, device, equipment and medium for determining prompt word of large language model

Also Published As

Publication number Publication date
CN118093838B (en) 2024-07-16

Similar Documents

Publication Publication Date Title
US11816438B2 (en) Context saliency-based deictic parser for natural language processing
US10997370B2 (en) Hybrid classifier for assigning natural language processing (NLP) inputs to domains in real-time
US11017178B2 (en) Methods, devices, and systems for constructing intelligent knowledge base
CN108304375B (en) Information identification method and equipment, storage medium and terminal thereof
US10496749B2 (en) Unified semantics-focused language processing and zero base knowledge building system
US7035789B2 (en) Supervised automatic text generation based on word classes for language modeling
US20140351228A1 (en) Dialog system, redundant message removal method and redundant message removal program
US11113470B2 (en) Preserving and processing ambiguity in natural language
CN108460011A (en) A kind of entitative concept mask method and system
CN110147544B (en) Instruction generation method and device based on natural language and related equipment
CN112417846B (en) Text automatic generation method and device, electronic equipment and storage medium
US11170169B2 (en) System and method for language-independent contextual embedding
JP2006065387A (en) Text sentence search device, method, and program
Denis New learning models for robust reference resolution
CN118093838B (en) Large language model prompt word generation method, system, terminal equipment and medium
CN111611793B (en) Data processing method, device, equipment and storage medium
CN110750967A (en) Pronunciation labeling method and device, computer equipment and storage medium
CN113255374B (en) Question and answer management method and system
CN113536776A (en) Confusion statement generation method, terminal device and computer-readable storage medium
CN112732885A (en) Answer extension method and device for question-answering system and electronic equipment
CN115437620B (en) Natural language programming method, device, equipment and storage medium
CN114091465B (en) Semantic recognition method, semantic recognition device, storage medium and electronic device
Truskinger et al. Reconciling folksonomic tagging with taxa for bioacoustic annotations
Narayan et al. Pre-Neural Approaches
Henrich et al. LISGrammarChecker: Language Independent Statistical Grammar Checking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant