CN117786061B - Large language model prediction method and device based on space-time attention mechanism - Google Patents


Info

Publication number: CN117786061B
Authority: CN (China)
Prior art keywords: entity, language model, target entity, prompt, preset
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202311675342.9A
Other languages: Chinese (zh)
Other versions: CN117786061A
Inventors: 吴迪, 胡汉一, 卢冰洁, 刘天蒙, 那崇宁
Current and original assignee: Zhejiang Lab
Application filed by Zhejiang Lab
Priority to CN202311675342.9A
Publication of CN117786061A and, upon grant, CN117786061B


Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The specification discloses a large language model prediction method and device based on a space-time attention mechanism, which can be used to process downstream prediction tasks for a target entity and some of its associated entities. The method comprises the following steps: inputting the initial features of the target entity and its associated entities, together with a dynamic knowledge graph, into a spatial attention network to obtain spatial topological features; inputting the spatial topological features into a time sequence attention network to obtain space-time features; generating an implicit prompt from the space-time features of the target entity and some associated entities, and combining it with explicit event text of the target entity to generate prompt information; inputting the prompt information into a large language model and, with the parameters of the large language model kept fixed, training the spatial attention network and the time sequence attention network using the labeling information of the downstream prediction task and a loss function; and finally, processing the downstream prediction tasks of the target entity and some associated entities with the trained networks.

Description

Large language model prediction method and device based on space-time attention mechanism
Technical Field
The present disclosure relates to the field of knowledge graph and deep learning, and in particular, to a method and apparatus for predicting a large language model based on a spatiotemporal attention mechanism.
Background
As the number of training parameters and the amount of training data for large language models grow, capabilities such as in-context learning and logical reasoning gradually emerge, and performance in scenarios such as text generation, question-answering systems, sentiment analysis, and automatic summarization improves markedly. At the same time, the human-machine dialogue and task-solving abilities exhibited by large language models greatly lower the barrier to use for business personnel.
On the other hand, lacking the assistance of background information such as knowledge graphs and time series, a large language model that relies only on event text information cannot handle downstream prediction tasks well, and it is difficult to integrate these three sources of information.
Knowledge graphs often contain spatial topology information between entities (in this specification, "space" generally refers to various structural spaces), and time series often contain information about an entity's development trend. However, not every associated entity, nor the entity's state at every time point, is equally important to a downstream prediction task.
Therefore, a method is needed that extracts features from associated entities with different weights and from time points with different weights, and fuses them with event text information to help the large language model better handle downstream prediction tasks, thereby integrating all three sources of information.
Disclosure of Invention
The present specification provides a method and apparatus for large language model prediction based on spatio-temporal attention mechanism to solve the above-mentioned problems existing in the prior art.
The technical scheme adopted in the specification is as follows:
the specification provides a large language model prediction method based on a space-time attention mechanism, which comprises the following steps:
determining a dynamic knowledge graph within a preset time period;
inputting the dynamic knowledge graph, the initial characteristics of the target entity and the initial characteristics of the associated entity of the target entity in the preset time period into a preset spatial attention network to obtain spatial topological characteristics of the target entity and the associated entity in the preset time period;
inputting the spatial topological characteristics of the target entity and the associated entity in the preset time period into a preset time sequence attention network to obtain the space-time characteristics of the target entity and the associated entity;
acquiring an implicit prompt based on the space-time characteristics, and splicing the implicit prompt with a prompt text in a preset prompt template to acquire spliced prompt information;
inputting the spliced prompt information into a preset large language model to obtain an output result of the large language model, and training at least the spatial attention network and the time sequence attention network so as to minimize the difference between the output result and the labeling information corresponding to the target entity and at least some associated entities, so that business related to the entities can be predicted through the trained spatial attention network, the trained time sequence attention network, and the large language model.
Optionally, the initial feature is obtained from various indicators of the corresponding entity, where the indicators include at least technical indicators and fundamental indicators.
Optionally, the dynamic knowledge graph is used for representing changes of a plurality of business relationships between the entities in the preset time period.
Optionally, the dynamic knowledge graph in the preset time period includes a knowledge graph corresponding to each time point in the preset time period;
Inputting the dynamic knowledge graph, the initial characteristics of the target entity and the initial characteristics of the associated entity of the target entity in the preset time period into a preset spatial attention network to obtain spatial topological characteristics of the target entity and the associated entity in the preset time period, wherein the method specifically comprises the following steps of:
And inputting the knowledge graph corresponding to each time point in the preset time period, the initial characteristics of the target entity and the initial characteristics of the associated entity of the target entity into the spatial attention network to obtain the spatial topological characteristics corresponding to each time point in the preset time period of the target entity and the associated entity.
Optionally, inputting the spatial topological features of the target entity and the associated entity in the preset time period into a preset time sequence attention network to obtain the space-time features of the target entity and the associated entity, which specifically includes:
and inputting the spatial topological features of the target entity and the associated entity corresponding to each time point in the preset time period into the time sequence attention network, determining the attention weight corresponding to each time point through the time sequence attention network, and weighting and fusing, separately for each entity according to the attention weight of each time point, the spatial topological features corresponding to each time point in the preset time period, so as to obtain the space-time features of the target entity and the associated entity.
Optionally, obtaining an implicit prompt based on the space-time features specifically includes:
Screening out target associated entities from the associated entities of the target entities in the dynamic knowledge graph; and obtaining implicit prompt according to the space-time characteristics of the target entity and the space-time characteristics of the target associated entity.
Optionally, inputting the spliced prompt information into a preset large language model to obtain an output result of the large language model, which specifically includes:
inputting the spliced prompt information into a preset large language model, obtaining a text vector corresponding to a prompt text in the spliced prompt information through a text embedding layer in the large language model, and splicing a feature vector corresponding to an implicit prompt in the spliced prompt information with the text vector to obtain a spliced vector;
and obtaining an output result of the large language model according to the spliced vector.
The present specification provides a large language model prediction apparatus based on a spatiotemporal attention mechanism, comprising:
the determining module is used for determining a dynamic knowledge graph in a preset time period;
The space feature determining module is used for inputting the dynamic knowledge graph, the initial feature of the target entity and the initial feature of the associated entity of the target entity in the preset time period into a preset space attention network to obtain the space topological features of the target entity and the associated entity in the preset time period;
The space-time feature determining module is used for inputting the space topological features of the target entity and the associated entity in the preset time period into a preset time sequence attention network to obtain the space-time features of the target entity and the associated entity;
The prompt information determining module is used for obtaining an implicit prompt based on the space-time characteristics, and splicing the implicit prompt with a prompt text in a preset prompt template to obtain spliced prompt information;
The training module is used for inputting the spliced prompt information into a preset large language model to obtain an output result of the large language model, and for training at least the spatial attention network and the time sequence attention network so as to minimize the difference between the output result and the labeling information corresponding to the target entity and at least some associated entities, so that business related to the entities can be predicted through the trained spatial attention network, the trained time sequence attention network, and the large language model.
The present specification provides a computer readable storage medium storing a computer program which when executed by a processor implements the above-described spatiotemporal attention mechanism based large language model prediction method.
The present specification provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the above-described spatiotemporal attention mechanism based large language model prediction method when executing the program.
The above-mentioned at least one technical scheme that this specification adopted can reach following beneficial effect:
According to the large language model prediction method based on the space-time attention mechanism described above, a dynamic knowledge graph in a preset time period is determined, and the dynamic knowledge graph, the initial features of the target entity, and the initial features of the target entity's associated entities are input into a preset spatial attention network to obtain the spatial topological features of the target entity and the associated entities in the preset time period. These spatial topological features can then be input into a preset time sequence attention network to obtain the space-time features of the target entity and the associated entities. Furthermore, an implicit prompt can be obtained based on the space-time features and spliced with the prompt text in a preset prompt template to obtain spliced prompt information; the spliced prompt information is input into a preset large language model to obtain the model's output result, the difference between the output result and the labeling information corresponding to the target entity and at least some associated entities is minimized, at least the spatial attention network and the time sequence attention network are trained, and business related to the entities is predicted through the trained spatial attention network, the trained time sequence attention network, and the large language model.
Compared with the prior art, the invention has the following beneficial effects:
(1) The method can extract features of associated entities with different weights based on the spatial attention network, providing important spatial topology information for the large language model and improving both the accuracy of the downstream prediction task and the model's ability to draw inferences from one case to others;
(2) The invention can extract features of the entities corresponding to time points with different weights based on the time sequence attention network, providing important development trend information for the large language model and improving the accuracy of the downstream prediction task;
(3) The invention can process the downstream prediction task by combining the space-time features and the event text information simultaneously, making up for the large language model's shortage of background information.
Drawings
The accompanying drawings, which are included to provide a further understanding of the specification, illustrate exemplary embodiments of the present specification and, together with the description, serve to explain the specification; they do not unduly limit it. In the drawings:
FIG. 1 is a schematic flow chart of a large language model prediction method based on a space-time attention mechanism provided in the present specification;
FIG. 2 is a schematic flow chart of determining spatial topology features provided in the present specification;
FIG. 3 is a schematic diagram of a model structure provided in the present specification;
FIG. 4 is a schematic diagram of a large language model prediction device based on a spatio-temporal attention mechanism provided in the present specification;
fig. 5 is a schematic structural view of the electronic device corresponding to fig. 1 provided in the present specification.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present specification more apparent, the technical solutions of the present specification will be clearly and completely described below with reference to specific embodiments of the present specification and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present specification. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a large language model prediction method based on a spatio-temporal attention mechanism provided in the present specification, specifically including the following steps:
s100: and determining a dynamic knowledge graph within a preset time period.
S102: inputting the dynamic knowledge graph, the initial characteristics of the target entity and the initial characteristics of the associated entity of the target entity in the preset time period into a preset spatial attention network to obtain the spatial topological characteristics of the target entity and the associated entity in the preset time period.
S104: and inputting the spatial topological characteristics of the target entity and the associated entity in the preset time period into a preset time sequence attention network to obtain the space-time characteristics of the target entity and the associated entity.
In this specification, a prediction of an entity's future business result needs to be performed based on a large language model combined with a dynamic knowledge graph. The entity mentioned here may refer to a listed company, and the business result may be of several types, for example, a risk occurrence rate or the stock price trend of the listed company.
Based on this, the dynamic knowledge graph in the preset time period can be determined, and the dynamic knowledge graph, the initial features of the target entity, and the initial features of the target entity's associated entities can be input into a preset spatial attention network to obtain the spatial topological features of the target entity and the associated entities in the preset time period. The dynamic knowledge graph can represent the changes of a plurality of business relationships among the entities in the preset time period. The above-mentioned associated entity may refer to an entity directly connected to the target entity, i.e., a neighbor entity of the target entity.
The dynamic knowledge graph may include knowledge graphs corresponding to a plurality of time points in a preset time period, so that when determining the spatial topological feature, the spatial topological feature corresponding to each time point may be determined, as shown in fig. 2.
Fig. 2 is a schematic flow chart for determining a spatial topology feature provided in the present specification.
The knowledge graph corresponding to each time point in the preset time period, the initial characteristics of the target entity and the initial characteristics of the associated entity of the target entity can be input into a spatial attention network to obtain the spatial topological characteristics corresponding to each time point in the preset time period of the target entity and the associated entity.
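Structurally, this per-time-point computation is just a loop over the snapshots of the dynamic knowledge graph; a minimal sketch, assuming the dynamic graph is encoded as a mapping from time points to graph snapshots (the function and argument names are illustrative, not from the patent):

```python
def spatial_features_per_time(dynamic_graph, update_fn, x0):
    """Apply a spatial-attention update at every time point of a dynamic knowledge graph.

    dynamic_graph : dict t -> graph snapshot at time t (encoding assumed, e.g. edge lists)
    update_fn     : callable(graph_t, x0) -> spatial topological features at time t
    x0            : initial entity features, shared across all time points
    Returns a dict t -> spatial topological features at time t.
    """
    return {t: update_fn(g, x0) for t, g in sorted(dynamic_graph.items())}
```

The resulting per-time-point features form the sequence that the time sequence attention network consumes next.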
It should be noted that, the historical service data of the target entity and the related entity may be used for training the spatial attention network and the time sequence attention network in the method, so that the service of the entity may be predicted through the trained spatial attention network, the trained time sequence attention network and the large language model.
At a time point $t$ within the above-mentioned preset time period, the initial features of the entities may be represented as $D = [x_1, \ldots, x_i, \ldots, x_n]^T \in \mathbb{R}^{n \times k}$, where $n$ denotes the number of entities and $k$ the feature dimension.
By way of example, an entity may refer to a listed company. The initial features may include the listed company's technical indicators and fundamental indicators (the fundamental indicators may include revenue, profit, debt level, price-to-earnings ratio, price-to-book ratio, etc.; the technical indicators may include stock price, trading volume, etc.), and the relationships in the knowledge graph may include industry-chain upstream/downstream, equity relationships, sector relationships, shareholding relationships, etc. The spatial attention network may be a multi-relational graph attention network: it can compute the attention weight of each entity's associated entities under each type of relationship, and update the entity's features using the differently weighted features of its associated entities to obtain the entity's spatial topological features.
Of course, the entities include the target entity and the associated entities. Taking the target entity as an example, at time point $t$, the initial feature $x_i$ of target entity $i$ is updated over the set of all relationships $Relation$ as follows:

$$x'_i = \frac{1}{|Relation|} \sum_{r \in Relation} x_i^{(r)}$$

where $x_i^{(r)}$ is the spatial topological feature obtained by updating the initial feature $x_i$ of target entity $i$ with the associated entity set $N_r(i)$ under relationship $r$, and $x'_i$, the spatial topological feature of target entity $i$ at time point $t$, is obtained from the per-relationship update:

$$x_i^{(r)} = \sum_{j \in N_r(i)} \alpha_{ij}^{(r)} x_j$$

where $x_j$ is the initial feature of associated entity $j$ under relationship $r$. The attention weight $\alpha_{ij}^{(r)}$ is computed as:

$$\alpha_{ij}^{(r)} = \frac{\exp\!\left(e_{ij}^{(r)}\right)}{\sum_{u \in N_r(i)} \exp\!\left(e_{iu}^{(r)}\right)}$$

where the attention score $e_{ij}^{(r)}$ is computed as:

$$e_{ij}^{(r)} = \left(Q^{(r)} x_i\right)^{T}\!\left(K^{(r)} x_j\right) w_{ij}^{(r)}$$

where $Q^{(r)} x_i$ and $K^{(r)} x_j$ are the query vector and key vector respectively, $w_{ij}^{(r)}$ is the edge weight under relationship $r$, and $Q^{(r)}$ and $K^{(r)}$ are learning parameters.
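A minimal sketch of this multi-relational graph-attention update, under the stated assumptions that the per-relation updates are averaged and that each score is the dot product of projected query/key vectors scaled by the edge weight (the dict-based graph encoding and all names such as `spatial_attention` are illustrative, not from the patent):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def spatial_attention(x, relations, Q, K):
    """One multi-relational graph-attention update.

    x         : (n, k) initial entity features
    relations : dict r -> {i: [(j, w_ij), ...]} neighbours and edge weights
    Q, K      : dict r -> (k, k) learned query / key projections
    Returns (n, k) spatial topological features x'.
    """
    out = np.array(x, dtype=float)
    for i in range(x.shape[0]):
        per_relation = []
        for r, nbrs in relations.items():
            edges = nbrs.get(i, [])
            if not edges:
                continue
            q_i = Q[r] @ x[i]                                  # query vector of entity i
            scores = np.array([(q_i @ (K[r] @ x[j])) * w       # e_ij scaled by edge weight w_ij
                               for j, w in edges])
            alpha = softmax(scores)                            # attention over N_r(i)
            per_relation.append(sum(a * x[j] for a, (j, _) in zip(alpha, edges)))
        if per_relation:                                       # average the per-relation updates
            out[i] = np.mean(per_relation, axis=0)
    return out
```

An entity with no neighbours under any relationship keeps its initial feature, which is one reasonable convention when the aggregation set is empty.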
After determining the spatial topological characteristics of the target entity and the associated entity in the preset time period, the spatial topological characteristics of the target entity and the associated entity in the preset time period can be input into a preset time sequence attention network to obtain the spatial and temporal characteristics of the target entity and the associated entity.
Specifically, the spatial topological features of the target entity and the associated entities at each time point in the preset time period can be input into the time sequence attention network, which determines the attention weight corresponding to each time point; the spatial topological features at each time point are then weighted and fused, separately for each entity, according to those attention weights, so as to obtain the space-time features of the target entity and the associated entities.
In the time interval $[t-L+1, t]$, the set of spatial topological feature sequences of all entities may be represented as $S = [S_1, \ldots, S_i, \ldots, S_n] \in \mathbb{R}^{n \times L \times k'}$, where $S_i \in \mathbb{R}^{L \times k'}$ is the spatial topological feature time sequence of entity $i$, $n$ is the number of entities, $L$ is the time span, and $k'$ is the spatial topological feature dimension. The time sequence attention network is a self-attention network: it can compute each entity's attention weights at the different time points, and update the entity's spatial topological feature at the current time point $t$ using the differently weighted spatial topological features at each time point, to obtain the entity's space-time feature at the current time point $t$.
Of course, the entities include the target entity and the associated entities. Taking the target entity as an example, in the time interval $[t-L+1, t]$, the spatial topological feature time sequence $S_i$ of target entity $i$ in the preset time period is updated by a self-attention mechanism as follows:

$$Q = S_i W^{(q)}, \qquad K = S_i W^{(k)}, \qquad V = S_i W^{(v)}$$

$$S'_i = \mathrm{softmax}\!\left(\frac{Q K^T}{\sqrt{d_k}}\right) V$$

where $Q$, $K$, $V$ are the query matrix, key matrix, and value matrix respectively, $\mathrm{softmax}$ normalizes the attention scores over the time points, $d_k$ is the column dimension of the key matrix $K$ (scaling by $\sqrt{d_k}$ prevents the vanishing-gradient problem caused by overly large softmax inputs), and $W^{(q)}$, $W^{(k)}$, $W^{(v)}$ are learning parameters.

At this point, $x'_i = S'_i[L, :]^T$, the last row of $S'_i$, is the space-time feature of target entity $i$ corresponding to the current time point $t$.
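As a minimal sketch of this step for one entity's sequence, assuming single-head scaled dot-product self-attention (the function name and array shapes are illustrative, not from the patent):

```python
import numpy as np

def temporal_self_attention(S_i, Wq, Wk, Wv):
    """Scaled dot-product self-attention over one entity's feature sequence.

    S_i          : (L, k') spatial topological features at L time points
    Wq, Wk, Wv   : (k', d) learned query / key / value projections
    Returns the space-time feature at the current time point t,
    i.e. the last row of S'_i = softmax(Q K^T / sqrt(d_k)) V.
    """
    Q, K, V = S_i @ Wq, S_i @ Wk, S_i @ Wv
    d_k = K.shape[1]
    scores = Q @ K.T / np.sqrt(d_k)                # (L, L) attention logits
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)              # softmax over time points
    S_prime = A @ V                                # weighted fusion over time
    return S_prime[-1]                             # feature at current time t
```

Each row of the attention matrix fuses the sequence with its own weights; only the last row, corresponding to the current time point, is kept as the entity's space-time feature.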
S106: and obtaining an implicit prompt based on the space-time characteristics, and splicing the implicit prompt with a prompt text in a preset prompt template to obtain spliced prompt information.
S108: inputting the spliced prompt information into a preset large language model to obtain an output result of the large language model so as to minimize the difference between the output result and the labeling information corresponding to the target entity and at least part of related entities, and training at least the spatial attention network and the time sequence attention network so as to predict the business related to the entity through the trained spatial attention network, the trained time sequence attention network and the large language model.
After the space-time features of the target entity and the associated entities are determined, an implicit prompt can be obtained based on the space-time features, and the implicit prompt spliced with the prompt text in a preset prompt template to obtain spliced prompt information. The spliced prompt information can then be input into a preset large language model to obtain the model's output result, so that the difference between the output result and the labeling information corresponding to the target entity and at least some associated entities is minimized; at least the spatial attention network and the time sequence attention network are trained in this way, so that business related to the entities can be predicted through the trained spatial attention network, the trained time sequence attention network, and the large language model.
Fig. 3 is a schematic diagram of a model structure provided in the present specification.
The above-mentioned output result may refer to the prediction results of the business related to the target entity and some associated entities. This specification does not limit the type of entity or business: for example, the entity may be a listed company, and the prediction may be whether a risk will arise for that company; or the entity may be a user of a service platform, and the prediction may be whether the user presents a risk.
The implicit cues mentioned above may include not only spatiotemporal features of the target entity, but also spatiotemporal features of the partially associated entity of the target entity.
In order to strengthen the effect of background information in business prediction, during training a target associated entity can be screened from the target entity's associated entities in the dynamic knowledge graph, and the implicit prompt obtained from the space-time features of the target entity and of the target associated entity. In this way, the large language model can produce, through the implicit prompt, an output result containing a business prediction for the target entity and a business prediction for the target associated entity, and the spatial attention network and the time sequence attention network can be trained by minimizing at least the difference between the output result and the labeling information corresponding to the target entity and the target associated entity (the parameters of the large language model may be fixed during training, or the large language model may also be fine-tuned).
That is, in the above manner, the large language model needs to predict not only the target entity but also the target associated entity, so the information input into the large language model includes the space-time features of both. Supervised training is then performed with the labeling information of the target entity (for example, the actual business result of whether a risk occurred) and of the target associated entity, thereby improving the prediction capability of the trained overall model.
In other words, the target associated entity may be an entity, screened out by some policy, that is closely related to the target entity. When screening target associated entities from the target entity's associated entities in the dynamic knowledge graph, the top-k target associated entities can be selected by the size of the edge weight between the target entity and each associated entity at the current time point. The target associated entities can be screened by ranking the edge weights directly, or by a model that takes the edge weights into account.
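A minimal sketch of the top-k screening step, assuming the edge weights at the current time point are available as a plain mapping (the function and argument names are illustrative, not from the patent):

```python
def topk_associated(edge_weights, k):
    """Screen the k most closely related neighbours of the target entity
    by edge weight at the current time point.

    edge_weights : dict entity_id -> edge weight to the target entity
    k            : number of target associated entities to keep
    Returns entity ids sorted by descending edge weight, truncated to k.
    """
    ranked = sorted(edge_weights, key=edge_weights.get, reverse=True)
    return ranked[:k]
```

A learned model could replace the plain ranking here, as the text notes, while keeping the same interface.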
The spliced prompt information can be input into a preset large language model (which may be an existing trained large language model). The text embedding layer in the large language model produces a text vector for the prompt text in the spliced prompt information; the feature vector corresponding to the implicit prompt in the spliced prompt information is spliced with the text vector to obtain a spliced vector, and the output result of the large language model is obtained from the spliced vector.
The spliced prompt information is illustrated here:

"Task: process the downstream prediction tasks of the target entity and some associated entities [illustratively, the downstream prediction task may be risk occurrence, stock price trend, etc.];

Background information prompt: the spatio-temporal feature of target entity i is [P_i];

Under relationship r [illustratively, an industry-chain upstream/downstream relationship, an equity relationship, etc.]:

the spatio-temporal feature of associated entity i_1 is [P_{i1}];

...

the spatio-temporal feature of associated entity i_k is [P_{ik}];

Event text prompt: the event information of target entity i is [illustratively, news and public opinion, legal litigation, company announcements, etc.];

Input question: please think step by step according to the context information, and process the downstream prediction tasks of the target entity and of some associated entities under the different relationships;

Output prompt: output the prediction results of the downstream tasks one by one for the target entity and for some associated entities under the different relationships."
Here, [P_i], [P_{i1}], ..., [P_{ik}] are implicit prompts, distinguished from the explicit prompt text in the prompt template. When the spliced prompt information is input into the large language model, the prompt text is first embedded into text vectors by the text embedding module in the large language model and then further processed by the model, while the embedding vectors of the implicit prompts are replaced by the space-time features corresponding to the target entity and some associated entities.
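The replacement of implicit-prompt embeddings by space-time features can be sketched as follows, under the assumption of a single reserved placeholder token id and a plain embedding-table lookup (a simplification of a real LLM's text embedding layer; all names here are illustrative):

```python
import numpy as np

def splice_prompt(token_ids, embed_table, placeholder_id, spatiotemporal):
    """Build the LLM input by replacing each implicit-prompt placeholder
    token with a space-time feature vector.

    token_ids      : list of int token ids; placeholder_id marks the [P_*] slots
    embed_table    : (vocab, d) text embedding weights
    placeholder_id : token id reserved for implicit prompts
    spatiotemporal : list of (d,) space-time feature vectors, consumed in order
    Returns a (seq_len, d) matrix of spliced input vectors.
    """
    feats = iter(spatiotemporal)
    rows = [next(feats) if t == placeholder_id else embed_table[t]
            for t in token_ids]
    return np.stack(rows)
```

Ordinary prompt-text tokens keep their learned embeddings, so only the [P_*] positions carry the attention networks' output into the model.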
During training, the spliced prompt information can be input into the large language model, the parameters of the large language model are kept unchanged, and the spatial attention network and the time sequence attention network are trained using the labeling information of the downstream prediction task and a loss function.
Illustratively, the large language model may be ChatGLM, LLaMA, or the like, and the loss function may be a cross-entropy loss function.
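The frozen-LLM training setup just described, where the large language model's parameters stay fixed while only the attention networks receive gradients from a cross-entropy loss, can be illustrated with a minimal NumPy sketch. A toy linear head stands in for the frozen LLM, and a single weight matrix stands in for the trainable attention networks; all shapes and data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

W_llm = rng.normal(size=(4, 2))   # frozen "LLM" head: never updated
W_att = rng.normal(size=(4, 4))   # stand-in for the trainable attention networks

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def loss_fn(x, y):
    """Cross-entropy of the frozen head applied to trainable features."""
    h = np.tanh(x @ W_att)
    p = softmax(h @ W_llm)
    return -np.mean(np.log(p[np.arange(len(y)), y]))

x = rng.normal(size=(8, 4))       # spliced prompt features (toy data)
y = rng.integers(0, 2, size=8)    # downstream-task labels

init_loss = loss_fn(x, y)
for _ in range(100):
    h = np.tanh(x @ W_att)
    p = softmax(h @ W_llm)
    g_logits = p.copy()
    g_logits[np.arange(8), y] -= 1.0        # d(cross-entropy)/d(logits)
    g_h = g_logits @ W_llm.T                # backprop through the frozen head
    W_att -= 0.1 * x.T @ (g_h * (1 - h**2)) / 8   # update only W_att
final_loss = loss_fn(x, y)
```

The gradient passes through the frozen head (W_llm appears in the backward computation) but only W_att is updated, mirroring the scheme of training the spatial and time sequence attention networks against a fixed large language model.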
Finally, the downstream prediction tasks of the target entity and some of the associated entities are processed using the trained spatial attention network, the trained time sequence attention network, and the large language model.
For convenience of description, the execution subject of the method is described as a server; in practice, the execution subject may also be a computer, a large-scale service platform, or the like, which is not limited herein.
In addition, all actions of acquiring signals, information or data in the present specification are performed in compliance with the corresponding local data protection regulations and policies, and with the authorization of the owner of the corresponding device.
The above is the large language model prediction method based on a spatio-temporal attention mechanism provided by one or more embodiments of the present specification. Based on the same idea, the present specification further provides a corresponding large language model prediction device based on a spatio-temporal attention mechanism, as shown in FIG. 4.
FIG. 4 is a schematic diagram of a large language model prediction device based on a spatio-temporal attention mechanism provided in the present specification, including:
a determining module 401, configured to determine a dynamic knowledge graph within a preset time period;
the spatial feature determining module 402 is configured to input the dynamic knowledge graph in the preset time period, the initial feature of the target entity, and the initial feature of the associated entity of the target entity into a preset spatial attention network, so as to obtain spatial topological features of the target entity and the associated entity in the preset time period;
A space-time feature determining module 403, configured to input spatial topological features of the target entity and the associated entity in the preset time period into a preset time sequence attention network, so as to obtain space-time features of the target entity and the associated entity;
The prompt information determining module 404 is configured to obtain an implicit prompt based on the space-time feature, and splice the implicit prompt with a prompt text in a preset prompt template to obtain spliced prompt information;
The training module 405 is configured to input the spliced prompt information to a preset large language model, obtain an output result of the large language model, so as to minimize a difference between the output result and labeling information corresponding to the target entity and at least a part of associated entities, and train at least the spatial attention network and the time-sequence attention network, so as to predict a service related to an entity through the trained spatial attention network, the trained time-sequence attention network and the large language model.
Optionally, the initial feature is obtained through various indexes of an entity corresponding to the initial feature, wherein the various indexes at least comprise a technical index and a basic surface index.
Optionally, the dynamic knowledge graph is used for representing changes of a plurality of business relationships between the entities in the preset time period.
Optionally, the dynamic knowledge graph in the preset time period includes a knowledge graph corresponding to each time point in the preset time period; the spatial feature determining module 402 is specifically configured to input a knowledge graph corresponding to each time point in the preset time period, an initial feature of a target entity, and an initial feature of an associated entity of the target entity into the spatial attention network, so as to obtain spatial topological features corresponding to each time point in the preset time period of the target entity and the associated entity.
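The per-time-point spatial attention described above can be sketched as a graph-attention layer over the knowledge graph at one time point. The patent does not fix the exact layer form; this GAT-style formulation, and all names and shapes in it, are illustrative assumptions:

```python
import numpy as np

def _softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def spatial_attention(features, adj, W, a):
    """One GAT-style layer: each entity aggregates its neighbors'
    transformed features using learned attention weights, yielding the
    spatial topological feature at a single time point."""
    h = features @ W                       # linear transform, (N, D')
    out = np.zeros_like(h)
    for i in range(len(h)):
        nbrs = np.flatnonzero(adj[i])
        if nbrs.size == 0:                 # isolated entity: keep its own feature
            out[i] = h[i]
            continue
        scores = np.array([a @ np.concatenate([h[i], h[j]]) for j in nbrs])
        out[i] = _softmax(scores) @ h[nbrs]   # attention-weighted aggregation
    return out

# toy graph: 3 entities, self-loops plus an edge between entities 0 and 1
adj = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1]])
feats = np.eye(3)[:, :2]                   # (3, 2) initial entity features
out = spatial_attention(feats, adj, W=np.eye(2), a=np.ones(4))
```

Running this layer once per knowledge graph in the preset time period yields the per-time-point spatial topological features that the time sequence attention network then fuses.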
Optionally, the space-time feature determining module 403 is specifically configured to input the spatial topological features of the target entity and the associated entity corresponding to each time point in the preset time period into the time sequence attention network, determine the attention weight corresponding to each time point through the time sequence attention network, and weight and fuse, for the target entity and the associated entity respectively, the spatial topological features corresponding to each time point in the preset time period, so as to obtain the space-time features of the target entity and the associated entity.
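The attention-weighted temporal fusion just described can be sketched as follows; the softmax scoring and toy values are illustrative assumptions:

```python
import numpy as np

def temporal_fuse(spatial_feats, scores):
    """Fuse per-time-point spatial topological features into a single
    spatio-temporal feature using softmax attention weights over time."""
    w = np.exp(scores - scores.max())
    w = w / w.sum()                    # attention weight for each time point
    return w @ spatial_feats           # weighted fusion: (T,) @ (T, D) -> (D,)

# toy example: 3 time points, feature dimension 2; the last time point dominates
feats = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
fused = temporal_fuse(feats, scores=np.array([0.0, 0.0, 10.0]))
```

With equal scores the fusion reduces to a plain average over time points; skewed scores let recent or more informative time points dominate the resulting spatio-temporal feature. The fusion is applied independently to the target entity and to each associated entity, as stated above.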
Optionally, the prompt information determining module 404 is specifically configured to screen out a target associated entity from the associated entities of the target entity in the dynamic knowledge graph; and obtaining implicit prompt according to the space-time characteristics of the target entity and the space-time characteristics of the target associated entity.
Optionally, the training module 405 is specifically configured to input the spliced prompt information into a preset large language model, obtain a text vector corresponding to a prompt text in the spliced prompt information through a text embedding layer in the large language model, and splice a feature vector corresponding to an implicit prompt in the spliced prompt information with the text vector to obtain a spliced vector; and obtaining an output result of the large language model according to the spliced vector.
The present specification also provides a computer readable storage medium storing a computer program operable to perform the above-described spatio-temporal attention-based large language model prediction method.
The present specification also provides a schematic structural diagram of the electronic device shown in FIG. 5. At the hardware level, the electronic device includes a processor, an internal bus, a network interface, a memory, and a non-volatile storage, as illustrated in FIG. 5, and may of course also include hardware required by other services. The processor reads the corresponding computer program from the non-volatile storage into the memory and then runs it, so as to implement the large language model prediction method based on the spatio-temporal attention mechanism described above.
Of course, the present specification does not exclude other implementations, such as logic devices or combinations of hardware and software; that is, the execution subject of the above processing flows is not limited to each logic unit, and may also be hardware or a logic device.
In the 1990s, an improvement to a technology could be clearly distinguished as an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement to a method flow). With the development of technology, however, many improvements to method flows today can be regarded as direct improvements to hardware circuit structures: designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be realized by a hardware entity module. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a PLD, without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually manufacturing integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code before compilation must also be written in a specific programming language called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM and RHDL (Ruby Hardware Description Language), of which VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing the logic method flow can be readily obtained by merely slightly programming the method flow into an integrated circuit using several of the hardware description languages described above.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer readable medium storing computer readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20 and Silicon Labs C8051F320; a memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller in pure computer readable program code, it is entirely possible to implement the same functionality by logically programming the method steps so that the controller takes the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included therein for performing various functions may also be regarded as structures within the hardware component. Or even the means for performing various functions may be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present specification.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present description is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random Access Memory (RAM) and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
Computer readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. As defined herein, computer readable media do not include transitory computer readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, as relevant to see a section of the description of method embodiments.
The foregoing is merely exemplary of the present disclosure and is not intended to limit the disclosure. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present description.

Claims (9)

1. A method for large language model prediction based on a spatiotemporal attention mechanism, comprising:
determining a dynamic knowledge graph within a preset time period;
inputting the dynamic knowledge graph, the initial characteristics of the target entity and the initial characteristics of the associated entity of the target entity in the preset time period into a preset spatial attention network to obtain spatial topological characteristics of the target entity and the associated entity in the preset time period;
Inputting the spatial topological features of the target entity and the associated entity corresponding to each time point in the preset time period to a preset time sequence attention network, determining the attention weight corresponding to each time point through the time sequence attention network, and respectively and independently weighting and fusing the spatial topological features of the target entity and the associated entity corresponding to each time point in the preset time period to obtain the spatial and temporal features of the target entity and the associated entity;
acquiring an implicit prompt based on the space-time characteristics, and splicing the implicit prompt with a prompt text in a preset prompt template to acquire spliced prompt information;
Inputting the spliced prompt information into a preset large language model to obtain an output result of the large language model so as to minimize the difference between the output result and the labeling information corresponding to the target entity and at least part of related entities, and training at least the spatial attention network and the time sequence attention network so as to predict the business related to the entity through the trained spatial attention network, the trained time sequence attention network and the large language model.
2. The method of claim 1, wherein the initial feature is obtained by each index of an entity corresponding to the initial feature, and each index at least comprises a technical index and a basic surface index.
3. The method of claim 1, wherein the dynamic knowledge graph is used to represent changes in a plurality of business relationships between entities within the preset time period.
4. The method of claim 1, wherein the dynamic knowledge-graph within the preset time period comprises a knowledge-graph corresponding to each time point within the preset time period;
Inputting the dynamic knowledge graph, the initial characteristics of the target entity and the initial characteristics of the associated entity of the target entity in the preset time period into a preset spatial attention network to obtain spatial topological characteristics of the target entity and the associated entity in the preset time period, wherein the method specifically comprises the following steps of:
And inputting the knowledge graph corresponding to each time point in the preset time period, the initial characteristics of the target entity and the initial characteristics of the associated entity of the target entity into the spatial attention network to obtain the spatial topological characteristics corresponding to each time point in the preset time period of the target entity and the associated entity.
5. The method according to claim 1, wherein an implicit hint is obtained based on the spatio-temporal features, comprising in particular:
screening out target associated entities from the associated entities of the target entities in the dynamic knowledge graph;
And obtaining implicit prompt according to the space-time characteristics of the target entity and the space-time characteristics of the target associated entity.
6. The method of claim 1, wherein inputting the spliced prompt message into a preset large language model to obtain an output result of the large language model, specifically comprises:
inputting the spliced prompt information into a preset large language model, obtaining a text vector corresponding to a prompt text in the spliced prompt information through a text embedding layer in the large language model, and splicing a feature vector corresponding to an implicit prompt in the spliced prompt information with the text vector to obtain a spliced vector;
and obtaining an output result of the large language model according to the spliced vector.
7. A large language model prediction apparatus based on a spatiotemporal attention mechanism, comprising:
the determining module is used for determining a dynamic knowledge graph in a preset time period;
The space feature determining module is used for inputting the dynamic knowledge graph, the initial feature of the target entity and the initial feature of the associated entity of the target entity in the preset time period into a preset space attention network to obtain the space topological features of the target entity and the associated entity in the preset time period;
The time-space feature determining module is used for inputting the spatial topological features of the target entity and the associated entity corresponding to each time point in the preset time period to a preset time sequence attention network, determining the attention weight corresponding to each time point through the time sequence attention network, and respectively and independently weighting and fusing the spatial topological features of the target entity and the associated entity corresponding to each time point in the preset time period to obtain the time-space features of the target entity and the associated entity;
The prompt information determining module is used for obtaining an implicit prompt based on the space-time characteristics, and splicing the implicit prompt with a prompt text in a preset prompt template to obtain spliced prompt information;
The training module is used for inputting the spliced prompt information into a preset large language model to obtain an output result of the large language model so as to minimize the difference between the output result and the labeling information corresponding to the target entity and at least part of related entities, and training at least the spatial attention network and the time sequence attention network so as to predict the business related to the entity through the trained spatial attention network, the trained time sequence attention network and the large language model.
8. A computer readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method of any of the preceding claims 1-6.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of the preceding claims 1-6 when executing the program.
CN202311675342.9A 2023-12-06 2023-12-06 Large language model prediction method and device based on space-time attention mechanism Active CN117786061B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311675342.9A CN117786061B (en) 2023-12-06 2023-12-06 Large language model prediction method and device based on space-time attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311675342.9A CN117786061B (en) 2023-12-06 2023-12-06 Large language model prediction method and device based on space-time attention mechanism

Publications (2)

Publication Number Publication Date
CN117786061A CN117786061A (en) 2024-03-29
CN117786061B true CN117786061B (en) 2024-06-04

Family

ID=90385950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311675342.9A Active CN117786061B (en) 2023-12-06 2023-12-06 Large language model prediction method and device based on space-time attention mechanism

Country Status (1)

Country Link
CN (1) CN117786061B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116502176A (en) * 2023-03-28 2023-07-28 支付宝(杭州)信息技术有限公司 Pre-training method and device of language model, medium and electronic equipment
CN116756574A (en) * 2023-08-16 2023-09-15 腾讯科技(深圳)有限公司 Training method, using method, device and equipment of multi-mode pre-training model
CN116821294A (en) * 2023-06-20 2023-09-29 浙江大学 Question-answer reasoning method and device based on implicit knowledge ruminant
CN117058595A (en) * 2023-10-11 2023-11-14 齐鲁工业大学(山东省科学院) Video semantic feature and extensible granularity perception time sequence action detection method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116578877B (en) * 2023-07-14 2023-12-26 之江实验室 Method and device for model training and risk identification of secondary optimization marking
CN116882767B (en) * 2023-09-08 2024-01-05 之江实验室 Risk prediction method and device based on imperfect heterogeneous relation network diagram

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116502176A (en) * 2023-03-28 2023-07-28 支付宝(杭州)信息技术有限公司 Pre-training method and device of language model, medium and electronic equipment
CN116821294A (en) * 2023-06-20 2023-09-29 浙江大学 Question-answer reasoning method and device based on implicit knowledge ruminant
CN116756574A (en) * 2023-08-16 2023-09-15 腾讯科技(深圳)有限公司 Training method, using method, device and equipment of multi-mode pre-training model
CN117058595A (en) * 2023-10-11 2023-11-14 齐鲁工业大学(山东省科学院) Video semantic feature and extensible granularity perception time sequence action detection method and device

Also Published As

Publication number Publication date
CN117786061A (en) 2024-03-29

Similar Documents

Publication Publication Date Title
CN113688313A (en) Training method of prediction model, information pushing method and device
CN115238826B (en) Model training method and device, storage medium and electronic equipment
CN112966186A (en) Model training and information recommendation method and device
CN115146731A (en) Model training method, business wind control method and business wind control device
CN114997472A (en) Model training method, business wind control method and business wind control device
CN110738562B (en) Method, device and equipment for generating risk reminding information
CN113343085B (en) Information recommendation method and device, storage medium and electronic equipment
CN116029556B (en) Service risk assessment method, device, equipment and readable storage medium
CN115545353B (en) Business wind control method, device, storage medium and electronic equipment
CN116308738B (en) Model training method, business wind control method and device
CN117786061B (en) Large language model prediction method and device based on space-time attention mechanism
CN116882767A (en) Risk prediction method and device based on imperfect heterogeneous relation network diagram
CN117093862A (en) Model training method and device, electronic equipment and storage medium
CN116824331A (en) Model training and image recognition method, device, equipment and storage medium
CN116363418A (en) Method and device for training classification model, storage medium and electronic equipment
CN116308620A (en) Model training and information recommending method, device, storage medium and equipment
CN115758141A (en) Method and device for model training and business wind control
CN113344590A (en) Method and device for model training and complaint rate estimation
CN114116813A (en) Information recommendation method and recommendation device
CN114120273A (en) Model training method and device
CN118428333B (en) Method, device, storage medium and electronic equipment for enhancing text data
CN116501852B (en) Controllable dialogue model training method and device, storage medium and electronic equipment
CN117494052A (en) Prediction method and device based on automatic generation of space-time static information
CN116109008B (en) Method and device for executing service, storage medium and electronic equipment
CN117876114A (en) Method and device for service execution and model training

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant