CN118013934A - Text generation method, apparatus, electronic device, medium and computer program product

Text generation method, apparatus, electronic device, medium and computer program product

Info

Publication number
CN118013934A
Authority
CN
China
Prior art keywords
text, processor, feature vector, round, text feature
Prior art date
Legal status
Pending
Application number
CN202410170837.4A
Other languages
Chinese (zh)
Inventor
周健
Current Assignee
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd filed Critical Vivo Mobile Communication Co Ltd
Priority to CN202410170837.4A
Publication of CN118013934A

Landscapes

  • Machine Translation (AREA)

Abstract

The application discloses a text generation method, an apparatus, an electronic device, a medium and a computer program product, belonging to the technical field of text processing. The method comprises the following steps: acquiring a first text feature vector sequence corresponding to a first text; performing first-round prediction on the first text based on at least one first processor of the electronic device and the first text feature vector sequence to obtain a first text feature vector; performing an ith round of prediction based on at least one second processor of the electronic device, the first text feature vector sequence and the text feature vectors predicted for the first text in each round before the ith round, to obtain an ith text feature vector, where i ∈ [2, …, N] and N is the total number of prediction rounds based on the first text; and sorting the text feature vectors from the first text feature vector to the ith text feature vector to obtain a second text feature vector sequence, and outputting a second text based on the second text feature vector sequence.

Description

Text generation method, apparatus, electronic device, medium and computer program product
Technical Field
The application belongs to the technical field of text processing, and particularly relates to a text generation method, a text generation device, electronic equipment, a medium and a computer program product.
Background
Currently, users can predict subsequent text content of an entered text through a large language model (Large Language Model, LLM). In general, a large language model may map an input text into a sequence of text feature vectors, and then generate a sequence of text feature vectors corresponding to a subsequent text of the input text based on the sequence of text feature vectors.
However, since the large language model performs computation in a data-parallel manner across a plurality of processors when processing text, the predicted subsequent text content can be obtained only when all the processors participating in the computation have completed their computation. Therefore, if the computing process of one of the processors participating in the computation is complex and takes a long time, the computing processes of the other processors are blocked. As a result, the speed of text prediction is slow and the time required is long.
Disclosure of Invention
The embodiment of the application aims to provide a text generation method, a device, electronic equipment, a medium and a computer program product, which can improve the speed of text prediction and shorten the time required by text prediction.
In a first aspect, an embodiment of the present application provides a text generation method, which is executed by an electronic device, and includes: acquiring a first text feature vector sequence corresponding to a first text; performing first-round prediction on the first text based on at least one first processor of the electronic device and the first text feature vector sequence to obtain a first text feature vector; performing an ith round of prediction based on at least one second processor of the electronic device, the first text feature vector sequence and the text feature vectors predicted for the first text in each round before the ith round, to obtain an ith text feature vector, where i ∈ [2, …, N] and N is the total number of prediction rounds based on the first text; and sorting the text feature vectors from the first text feature vector to the ith text feature vector to obtain a second text feature vector sequence, and outputting a second text based on the second text feature vector sequence.
In a second aspect, an embodiment of the present application provides a text generating apparatus, including: an acquisition module, a processing module and an output module. The acquisition module is used for acquiring a first text feature vector sequence corresponding to the first text; the processing module is used for performing first-round prediction on the first text based on at least one first processor of the electronic device and the first text feature vector sequence acquired by the acquisition module, to obtain a first text feature vector; the processing module is further used for performing an ith round of prediction based on at least one second processor of the electronic device, the first text feature vector sequence obtained by the acquisition module and the text feature vectors predicted for the first text in each round before the ith round, to obtain an ith text feature vector, where i ∈ [2, …, N] and N is the total number of prediction rounds based on the first text; the processing module is further used for sorting the text feature vectors from the first text feature vector to the ith text feature vector to obtain a second text feature vector sequence; and the output module is used for outputting the second text based on the second text feature vector sequence obtained by the processing module.
In a third aspect, an embodiment of the present application provides an electronic device comprising a processor and a memory storing a program or instructions executable on the processor, which when executed by the processor, implement the steps of the method as described in the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium having stored thereon a program or instructions which when executed by a processor perform the steps of the method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and where the processor is configured to execute a program or instructions to implement a method according to the first aspect.
In a sixth aspect, embodiments of the present application provide a computer program product stored in a storage medium, the program product being executable by at least one processor to implement the method according to the first aspect.
In the embodiment of the application, the electronic device can acquire a first text feature vector sequence corresponding to the first text; perform first-round prediction on the first text based on at least one first processor of the electronic device and the first text feature vector sequence to obtain a first text feature vector; perform an ith round of prediction based on at least one second processor of the electronic device, the first text feature vector sequence and the text feature vectors predicted for the first text in each round before the ith round, to obtain an ith text feature vector, where i ∈ [2, …, N] and N is the total number of prediction rounds based on the first text; and sort the text feature vectors from the first text feature vector to the ith text feature vector to obtain a second text feature vector sequence, and output a second text based on the second text feature vector sequence. According to this scheme, when the electronic device predicts a text, the first round of prediction can be performed by at least one first processor in the electronic device, and the non-first-round ith round of prediction can be performed by at least one second processor in the electronic device. Thus, by distributing the computation required by text prediction among multiple processors, the processor resources used by the electronic device for text prediction are isolated. Therefore, the shorter computing process of the non-first-round ith round of prediction, which has a smaller amount of computation, can be prevented from being blocked by the longer computing process of the first round of prediction, which has a larger amount of computation, thereby improving the resource utilization of the processors in the electronic device, increasing the speed of text prediction and shortening the time required for text prediction.
Drawings
FIG. 1 is one of the flowcharts of a text generation method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of an example of text prediction by a language model according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an example of a first-round prediction process blocking a non-first-round prediction process according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an architecture of an electronic device for dividing a processor required for first-round prediction and a processor required for non-first-round prediction according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an example of an electronic device partitioning a second text feature vector according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an example of acquiring a corresponding KV Cache in a non-first-round prediction according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a text generating device according to an embodiment of the present application;
fig. 8 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application;
fig. 9 is a second schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions of the embodiments of the present application will be clearly described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which are obtained by a person skilled in the art based on the embodiments of the present application, fall within the scope of protection of the present application.
The terms "first," "second," and the like in the description of the present application, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances such that the embodiments of the application are capable of operation in sequences other than those illustrated or otherwise described herein, and that the objects identified by "first," "second," etc. are generally of a type not limited to the number of objects, for example, the first object may be one or more. In addition, "and/or" in the specification means at least one of the connected objects, and the character "/", generally means a relationship in which the associated objects are one kind of "or".
The terms "at least one", and the like in the description of the present application mean that they encompass any one, any two, or a combination of two or more of the objects. For example, at least one of a, b, c (item) may represent: "a", "b", "c", "a and b", "a and c", "b and c" and "a, b and c", wherein a, b, c may be single or plural. Similarly, the term "at least two" means two or more, and the meaning of the expression is similar to the term "at least one".
The text generation method, the device, the electronic equipment, the medium and the computer program product provided by the embodiment of the application are described in detail through specific embodiments and application scenes thereof with reference to the accompanying drawings.
The text generation method, the device, the electronic equipment, the medium and the computer program product provided by the embodiment of the application can be applied to a scene of text prediction of the electronic equipment through a language model.
Currently, large language models can rely on a model network structure and a parameter set to predict possible output text based on existing input text information, so as to accomplish text generation reasoning. In general, a large language model is an autoregressive, generative model based on the Transformer structure. In the reasoning process, the input text can be mapped into a sequence of text feature vectors (tokens) corresponding to the large language model's vocabulary, and the large language model can generate the next new text feature vector according to the text feature vector sequence of the context and the already generated text feature vector sequence; this process loops until an end symbol is generated and the reasoning terminates.
For a large language model with trillion-level parameters, mainstream reasoning currently adopts a data-parallel mode on a single machine with multiple processor cards: the inference parameters are divided into a plurality of parts placed on a plurality of processor cards, each processor card computes on its corresponding part of the data, and the processing results of the processor cards are then combined to obtain the final result. Generally, in order to improve the computing efficiency of the processors, the text feature vector sequences corresponding to a plurality of input texts may be combined into one large input matrix for overall computation, that is, batch computing, so as to calculate the next text feature vector corresponding to each input text.
However, because of the complex network structure and the huge model parameters of the large language models, the inference response of the large language models has a bottleneck, and hundreds of milliseconds or even seconds are often required to generate the next output, which results in limited usability and applicability of the large language models in real-time application scenarios.
It can be appreciated that, since the large language model performs computation in a data-parallel manner across multiple processors when processing text, the predicted subsequent text content can be obtained only when all the processors participating in the computation have completed their computation. Therefore, if the computing process of one of the processors participating in the computation is complex and takes a long time, the computing processes of the other processors are blocked. As a result, the speed of text prediction is slow and the time required is long.
In the text generation method, apparatus, electronic device, medium and computer program product provided by the embodiments of the application, when the electronic device predicts a text, the first round of prediction can be performed by at least one first processor in the electronic device, and the non-first-round ith round of prediction can be performed by at least one second processor in the electronic device. Thus, by distributing the computation required by text prediction among multiple processors, the processor resources used by the electronic device for text prediction are isolated. Therefore, the shorter computing process of the non-first-round ith round of prediction, which has a smaller amount of computation, can be prevented from being blocked by the longer computing process of the first round of prediction, which has a larger amount of computation, thereby improving the resource utilization of the processors in the electronic device, increasing the speed of text prediction and shortening the time required for text prediction.
The execution subject of the text generation method provided by the embodiment of the application can be a text generation device. The text generating means may be an electronic device or a component in the electronic device, such as an integrated circuit or a chip, for example. The text generation method provided by the embodiment of the application will be exemplarily described below by taking an electronic device as an example.
The embodiment of the application provides a text generation method, and fig. 1 shows a flowchart of the text generation method provided by the embodiment of the application, and the method can be applied to electronic equipment. As shown in fig. 1, the text generating method provided by the embodiment of the present application may include the following steps 101 to 104.
Step 101, the electronic device acquires a first text feature vector sequence corresponding to a first text.
In some embodiments of the present application, the first text may be a text obtained by the electronic device from another device, a text stored in the electronic device, or a text manually input by a user. The embodiment of the present application is not particularly limited.
In some embodiments of the present application, the electronic device may obtain the first text feature vector sequence corresponding to the first text by extracting the text feature vector of the first text.
In some embodiments of the present application, the first text feature vector sequence may be a text feature vector of the first text, and the vector sequence is arranged according to a text order of the first text.
In some embodiments of the present application, the text feature vector of the first text may be a feature vector of a word in the first text extracted by the electronic device.
In some embodiments of the present application, before the step 101, the text generating method provided by the embodiment of the present application may further include a step 105 described below.
Step 105, the electronic device inputs the first text into the language model.
In some embodiments of the present application, the language model described above may be used to predict subsequent text content of the first text.
In some embodiments of the present application, the language model may include an input layer, a codec layer, and an output layer.
In some embodiments of the present application, the electronic device may extract a text feature vector of the first text through an input layer in the language model, to obtain a first text feature vector sequence corresponding to the first text.
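As an illustration only, the mapping performed by the input layer can be sketched in Python as follows. The vocabulary below is a made-up toy whose word-to-tag pairs merely echo the numeric tags of the fig. 2 example; it is not the language model's actual word list.

# A minimal sketch, not the patent's implementation: a toy vocabulary maps
# each word of the first text to a numeric tag, preserving the text order,
# which yields the first text feature vector sequence of step 101.
VOCAB = {"how": 101, "is": 9867, "the": 2672, "weather": 999, "today": 3857}  # assumed tags

def text_to_feature_sequence(text: str) -> list[int]:
    # One tag per word, arranged according to the text order of the first text.
    return [VOCAB[word] for word in text.split()]

print(text_to_feature_sequence("how is the weather today"))  # [101, 9867, 2672, 999, 3857]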
Step 102, the electronic device performs first-round prediction on the first text based on at least one first processor of the electronic device and the first text feature vector sequence to obtain a first text feature vector.
Step 103, the electronic device performs the ith round of prediction based on at least one second processor of the electronic device, the first text feature vector sequence and the text feature vectors predicted for the first text in each round before the ith round, to obtain the ith text feature vector.
where i ∈ [2, …, N], and N is the total number of prediction rounds based on the first text.
In some embodiments of the present application, the first processor and the second processor may be different processors in the electronic device.
In some embodiments of the present application, the first processor may include, but is not limited to: a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU).
In some embodiments of the present application, the second processor may include, but is not limited to: CPU, GPU.
In some embodiments of the present application, the electronic device may obtain the predicted text corresponding to the first text by performing multiple rounds of prediction on the first text. When the electronic device performs the ith round of prediction on the first text, it can reuse the first text feature vector sequence and the result of each round of prediction before the ith round, so as to reduce the amount of computation of the ith round of prediction.
Illustratively, take i = 4 as an example. The electronic device may perform a 4th round of prediction based on the first text feature vector sequence, the first text feature vector obtained in the first round of prediction on the first text, the 2nd text feature vector obtained in the 2nd round, and the 3rd text feature vector obtained in the 3rd round, to obtain a 4th text feature vector.
It can be appreciated that the ith round of prediction may be any prediction round except the first round of prediction in the process of predicting the first text by the electronic device.
In some embodiments of the present application, the first text feature vector may be a text feature vector obtained by first-round prediction of the first text by the electronic device. That is, the first text feature vector may be a text feature vector obtained by predicting the first text for the first time by the electronic device.
In some embodiments of the present application, the text feature vector obtained by prediction may serve as a key-value cache (Key-Value Cache, KV Cache) for use when the second processor performs the non-first-round ith prediction.
In some embodiments of the present application, the KV Cache may be the matrix products of the input first text feature vector sequence with the key (K) and value (V) weights of the attention layer in the codec layer.
In some embodiments of the present application, in the process of performing text prediction by using the language model, the language model may perform multiple rounds of prediction based on the first text feature vector sequence of the input first text, so as to obtain a subsequent text of the first text.
It can be appreciated that in the first-round prediction (ContextDecoder) performed by the language model, the matrix products of the input first text feature vector sequence with the key (K) and value (V) weights of the attention layer in the codec layer need to be computed, so the computational complexity is positively correlated with the length of the first text feature vector sequence. Thus, the first round of prediction is a compute-intensive task.
In the process of performing the non-first-round ith prediction (Generator), the amount of computation can be reduced, via the KV Cache, to that of a single text feature vector, which shrinks the computing task of the non-first-round ith prediction. Therefore, the non-first-round ith prediction is a read/write-intensive task.
It should be noted that the KV Cache is an optimization technique in Transformer models, used for caching the computation results of the cross-attention layer (Cross-Attention), reducing the overhead of repeated computation and improving computing efficiency. When cross-attention is used, the input text is processed by a multi-head self-attention layer (Multi-Head Attention) to obtain the three matrices query (Q), K and V; an attention distribution is then computed from Q and K, a cross-attention context vector is obtained by weighted-averaging the K matrix with the attention distribution, and finally this vector is weighted-averaged with the V matrix obtained by the multi-head self-attention layer to produce the final output text feature vector.
Therefore, when the same Q matrix needs to be calculated for a plurality of times, the corresponding K matrix and V matrix can be cached. When the next calculation is performed, if the same Q matrix appears, the cached K matrix and the cached V matrix can be reused, so that the calculation amount is greatly reduced, and the calculation efficiency of the language model is improved.
Illustratively, as shown in fig. 2, assume that the first text feature vector sequence corresponding to the first text is: 101, 9867, 2672, 999, 3857. The electronic device may obtain the first text feature vector "6782" through the first round of prediction. When the electronic device performs the 2nd round of calculation, it can directly obtain the corresponding text feature vectors from the cached first text feature vector sequence and the KV Cache of the first text feature vector, without repeating the calculation. That is, the electronic device only needs to predict the text feature vector "672" in the 2nd round of calculation, so as to obtain the corresponding 2nd text feature vector. In this way, the electronic device can complete N rounds of calculation on the first text with a small amount of computation, obtaining the predicted text corresponding to the first text.
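The asymmetry between the two kinds of rounds can be sketched in Python as follows. This is a toy illustration only: the embedding, the projection weights and the sampled token IDs are stand-ins, and the attention-score computation is omitted.

import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 8
W_K = rng.normal(size=(HIDDEN, HIDDEN))  # assumed key weights of one attention layer
W_V = rng.normal(size=(HIDDEN, HIDDEN))  # assumed value weights

def embed(token_ids):
    # Stand-in embedding: one feature row per text feature vector.
    return rng.normal(size=(len(token_ids), HIDDEN))

kv_cache = {"K": None, "V": None}

def first_round(prompt_ids):
    # First-round prediction: K/V matrix products over the WHOLE input
    # sequence, so the cost grows with the prompt length (compute-intensive).
    x = embed(prompt_ids)
    kv_cache["K"], kv_cache["V"] = x @ W_K, x @ W_V
    return 6782  # stand-in for the sampled first text feature vector

def non_first_round(new_token_id):
    # Non-first round: only the single newest token is projected; the cached
    # K/V rows of all earlier tokens are reused (read/write-intensive).
    x = embed([new_token_id])
    kv_cache["K"] = np.vstack([kv_cache["K"], x @ W_K])
    kv_cache["V"] = np.vstack([kv_cache["V"], x @ W_V])
    return 672  # stand-in for the next sampled text feature vector

token = first_round([101, 9867, 2672, 999, 3857])
token = non_first_round(token)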
It is understood that the number corresponding to the text feature vector may be a number tag corresponding to a word in the first text.
It will be appreciated that the first-round prediction described above is a compute-intensive task and therefore takes a relatively long time compared with the non-first-round ith prediction. If, when the electronic device performs text prediction through the language model, first-round predictions and non-first-round predictions exist in the same batch of computation, the first-round prediction process blocks the non-first-round prediction processes, lengthening them and increasing the time required for prediction.
Illustratively, as shown in FIG. 3, assume that the electronic device predicts text 1, text 2, text 3, text 4, and text 5 simultaneously. Wherein, text 1, text 2, text 3 and text 4 are non-first-round predictions, and text 5 is a first-round prediction. Then, since the electronic device performs parallel prediction of the text 1, the text 2, the text 3, the text 4 and the text 5 simultaneously, the non-first-round prediction process corresponding to the text 1, the text 2, the text 3 and the text 4 is shorter, and the first-round prediction process corresponding to the text 5 is longer. Thus, the first round prediction process of text 5 will block the non-first round prediction processes of text 1, text 2, text 3, and text 4.
In some embodiments of the application, the electronic device may separate the processor used to make the first-round prediction from the processor used to make the non-first-round, i-th-round prediction, by using a different processor to make the first-round or i-th-round prediction. Therefore, the first-round prediction process can be prevented from blocking the non-first-round prediction process, so that the speed of text prediction is increased, and the time required by text prediction is shortened.
Step 104, the electronic device sorts the text feature vectors from the first text feature vector to the ith text feature vector to obtain a second text feature vector sequence, and outputs the second text based on the second text feature vector sequence.
In some embodiments of the present application, the electronic device may splice the text feature vectors predicted in the 1st round to the Nth round after the first text feature vector sequence according to the prediction order, to obtain the text feature vector sequence corresponding to the second text, so as to output the second text.
According to the text generation method provided by the embodiment of the application, when the electronic equipment predicts the text, the first round of prediction of the text can be performed by at least one first processor in the electronic equipment, and the ith round of prediction of the text, which is not the first round of prediction, can be performed by at least one second processor in the electronic equipment. Thus, by distributing the computation by text prediction among multiple processors, processor resources used when the electronic device performs text prediction through the language model are isolated. Therefore, the shorter calculation process of the i-th round of prediction of the non-first round with smaller calculation amount can be prevented from being blocked by the longer calculation process of the first round of prediction with larger calculation amount, so that the resource utilization rate of a processor in the electronic equipment is improved, the speed of text prediction is further improved, and the time required by the text prediction is shortened.
In some embodiments of the present application, before the step 102, the text generating method provided in the embodiment of the present application may further include steps 106 to 108 described below.
Step 106, the electronic device determines the number of processors required by the ith round of prediction based on the first parameter.
The first parameter may include: the space temporarily occupied by the language model at runtime, the maximum number of texts the language model can process simultaneously, the video memory size of a single processor in the electronic device, and the maximum cache size of a processor in the electronic device.
In some embodiments of the present application, the electronic device may use multiple processors to perform computation during text prediction through the language model, so as to improve the computation performance of data parallelism.
In some embodiments of the present application, data transmission between multiple processors in an electronic device may be performed through a communication high-speed interface to ensure efficient inter-processor communication.
Data transfer between multiple processors in an electronic device may be performed, for example, through a Peripheral Component Interconnect Express (PCI-E) bus interface or a high-speed interconnect channel (NVIDIA NVLink).
In some embodiments of the application, computational power and memory are two primary indicators that measure processor performance.
In some embodiments of the application, the computing power of the processor may be a quantitative indicator of the speed and capacity with which the processor processes data, typically expressed as the peak speed of floating-point operations in TFLOPS (trillions of floating-point operations per second). It will be appreciated that the higher the computing power of the processor, the stronger its processing capability and the faster it performs massive parallel operations.
In some embodiments of the application, the memory of the processor may be video memory used by the processor to store image data, texture data, intermediate computation results, etc., typically expressed in gigabytes (GB). It will be appreciated that the size of the memory can affect the processor's performance in large-scale computing and in loading large data sets. The larger the processor's memory, the more data it can cache and the better it can handle large data sets.
In some embodiments of the present application, since first-round prediction is a computationally intensive task, the process of first-round prediction by the electronic device requires adequate computational support; since the ith round of prediction of the non-first round is a read-write intensive task, the process of the electronic equipment for carrying out the ith round of prediction of the non-first round needs to have enough video memory support. In this manner, the electronic device may determine a first processor that makes a first round of prediction and a second processor that makes a non-first round of prediction by calculating processor resources required for the first round of prediction and the non-first round of prediction.
In some embodiments of the application, the electronic device may calculate the number of processors required for the i-th round of prediction for the non-first round by equation (1) below.
where M may represent the number of processors required for the ith round of computation; max_batch may represent the maximum number of texts processed simultaneously by the language model; V_kv_cache may represent the maximum cache size of a processor in the electronic device; runtime_memory may represent the space temporarily occupied by the language model at runtime; model_size may represent the size of the language model weights; memory_size_per_card may represent the video memory size of a single processor in the electronic device.
It should be noted that POWER_UPPER(x) may represent a function returning the smallest power of 2 that is greater than or equal to x.
Illustratively, POWER_UPPER(3.4) = 4.
Illustratively, POWER_UPPER(4) = 4.
Illustratively, POWER_UPPER(5.6) = 8.
In some embodiments of the present application, the function POWER_UPPER(x) may be implemented in Python as follows:
def POWER_UPPER(n):
    # Return the smallest power of 2 that is greater than or equal to n.
    power = 1
    while power < n:
        power *= 2
    return power
Step 107, the electronic device determines the number of processors required for the first round of prediction based on the total number of processors in the electronic device and the number of processors required for the ith round of prediction.
In some embodiments of the application, the electronic device may calculate the number of processors required for the first round of prediction by equation (2) below.
N = POWER_LOWER(all_card_number - M)    equation (2)
Where N may represent the number of processors required for the first round of prediction; all_card_number may represent the total number of processors in the electronic device.
It should be noted that POWER_LOWER(x) may represent a function returning the largest power of 2 that is less than or equal to x.
Illustratively, POWER_LOWER(3.4) = 2.
Illustratively, POWER_LOWER(4) = 4.
Illustratively, POWER_LOWER(5.6) = 4.
In some embodiments of the present application, the function POWER_LOWER(x) may be implemented in Python as follows:
def POWER_LOWER(n):
    # Return the largest power of 2 that is less than or equal to n.
    power = 1
    while power <= n:
        power *= 2
    return power // 2
Step 108, the electronic device determines a first processor and a second processor from the processors of the electronic device according to the number of processors required by the first round of prediction and the number of processors required by the ith round of prediction.
In some embodiments of the present application, the electronic device may divide the multiple processors in the electronic device into two modules of the first-round prediction and the i-th-round prediction according to the calculated number of processors required by the first-round prediction and the calculated number of processors required by the i-th-round prediction, where the two modules may perform data transmission through a high-speed bus protocol.
Illustratively, the first processor and the second processor are each GPUs. As shown in fig. 4, the electronic device may determine that N GPUs are used for first-round prediction and M GPUs are used for ith-round prediction that is not first-round.
Therefore, the first and second processors can be determined from the processors of the electronic device according to the number of the processors required by the first-round prediction and the number of the processors required by the i-th-round prediction, so that the first-round prediction and the non-first-round prediction can be performed through different processors, the computation process of the computationally intensive first-round prediction is prevented from blocking the computation process of the non-first-round prediction, the resource utilization rate of the processors in the electronic device is improved, the speed of text prediction is further improved, and the time required by the text prediction is shortened.
In some embodiments of the present application, before the step 106, the text generating method provided by the embodiment of the present application may further include a step 109 described below.
Step 109, the electronic device determines a maximum cache size of a processor in the electronic device based on the second parameter.
Wherein the second parameter may include: the number of layers of the codec layer of the language model, the dimension of the hidden layer of the language model, the calculation accuracy of the language model, the maximum text input length of the language model and the maximum generatable text length of the language model.
In some embodiments of the present application, the maximum cache size of a processor in the electronic device may represent the largest cache that may need to be buffered on any processor of the electronic device. In other words, it is the largest cache that any single processor of the electronic device may need to hold when the electronic device performs text prediction.
In some embodiments of the present application, the electronic device may calculate the maximum cache size of a processor in the electronic device by the following equation (3):
V_kv_cache = 2 × num_layer × (max_input_token_length + max_new_token_length) × hidden_size × sizeT    equation (3)
where num_layer may represent the number of layers of the codec layer of the language model; max_input_token_length may represent the maximum text input length of the language model; max_new_token_length may represent the maximum generatable text length of the language model; hidden_size may represent the dimension of the language model's hidden layer; and sizeT may represent the calculation precision of the language model.
In this way, the electronic device can calculate the maximum buffer size of the processors in the electronic device, so as to calculate the number of processors required by the i-th prediction of the non-first round and the processors required by the first round, and the electronic device can determine the processor for the i-th prediction of the non-first round and the processor for the first round.
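Putting steps 106 to 109 together, the processor split can be sketched in Python as follows. Note that the body of equation (1) is not reproduced in the text above, so the expression for the required memory below is an assumption pieced together from equation (1)'s variable list; equations (2) and (3) follow the formulas given.

def POWER_UPPER(n):
    power = 1
    while power < n:
        power *= 2
    return power

def POWER_LOWER(n):
    power = 1
    while power <= n:
        power *= 2
    return power // 2

def kv_cache_size(num_layer, max_input_token_length, max_new_token_length,
                  hidden_size, sizeT):
    # Equation (3): K and V caches (factor 2) over the full context length.
    return (2 * num_layer * (max_input_token_length + max_new_token_length)
            * hidden_size * sizeT)

def split_processors(all_card_number, max_batch, v_kv_cache,
                     runtime_memory, model_size, memory_size_per_card):
    # Assumed reading of equation (1): total memory demanded by the
    # non-first-round module, divided by per-card memory and rounded up
    # to a power of 2, gives the number M of second processors.
    needed_memory = max_batch * v_kv_cache + runtime_memory + model_size
    M = POWER_UPPER(needed_memory / memory_size_per_card)
    # Equation (2): the remaining cards, rounded down to a power of 2,
    # give the number N of first processors.
    N = POWER_LOWER(all_card_number - M)
    return N, M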
In some embodiments of the present application, the at least one second processor may include M second processors, where M is a positive integer.
In some embodiments of the present application, after the step 102, the text generating method provided in the embodiment of the present application may further include the following steps 110 to 112.
Step 110, the electronic device asynchronously transmits, for a third processor in at least one second processor, the first text feature vector obtained by each first processor to a temporary storage area corresponding to the third processor.
Wherein the third processor is any one of at least one second processor.
It will be appreciated that the transfer of the first text feature vector from the first round of prediction between the first processor and the second processor is an important process for the electronic device to complete the text prediction through the language model.
It can be understood that the first text feature vector can serve as the KV Cache obtained in the first-round prediction process. The KV Cache is essentially two two-dimensional matrices: a K Cache matrix and a V Cache matrix. The matrix shape of the K Cache matrix may be [seq_length, hidden_size ÷ P], where seq_length may represent the input text length, hidden_size may represent the hidden-layer dimension of the language model, and P may represent the number of first processors. The matrix shape of the V Cache matrix may be identical to that of the K Cache.
In some embodiments of the present application, during the first-round prediction process, the electronic device may asynchronously transfer the KV Cache generated by the first text feature vector sequence in the cross-attention layer of the codec layer, together with the request ID, the layer sequence number (layer_no) and the processor card serial number (rank) used for the first-round prediction, to the temporary storage area corresponding to the second processor.
It should be noted that, the request ID may be used to locate a request to which the KV Cache belongs. That is, it can be used to locate the input text to which the KV Cache belongs.
The layer sequence number can be used to locate which attention layer in the codec layer of the language model the KV Cache belongs to.
The card serial number may indicate which processor the KV Cache came from.
In some embodiments of the present application, the buffer corresponding to the third processor may be located on a video memory of a processor used for non-first-round prediction.
For example, the buffer corresponding to the third processor may be located on the video memory of the No. 0 processor used for the non-first-round prediction.
It can be appreciated that the electronic device asynchronously transmits the first text feature vector obtained by each first processor to the corresponding temporary storage area of the third processor, so that the influence on the main operation process of the first-round prediction can be reduced.
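As a host-side analogy only, the asynchronous hand-off can be sketched in Python as follows; a real implementation would use asynchronous device-to-device copies rather than threads and queues, and the tuple layout is an assumption.

import queue
import threading

# Staging (temporary storage) area on the third processor's side: KV Cache
# chunks arrive tagged with (request ID, layer_no, rank) so that they can
# later be matched, spliced and divided.
staging_area = queue.Queue()

def async_send(request_id, layer_no, rank, k_cache, v_cache):
    # Fire-and-forget transfer, so the main first-round computation is not
    # stalled waiting for the copy to finish.
    item = ((request_id, layer_no, rank), (k_cache, v_cache))
    threading.Thread(target=staging_area.put, args=(item,), daemon=True).start()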
Step 111, the electronic device divides, by the third processor, the first text feature vectors stored in the temporary storage area corresponding to the third processor, to obtain M second text feature vectors.
Step 112, the electronic device assigns M second text feature vectors to the at least one second processor.
Wherein M is the number of second processors, and one second text feature vector corresponds to one second processor.
In some embodiments of the present application, the electronic device may splice the first text feature vectors obtained by each first processor in the temporary storage area corresponding to the third processor, and divide the first text feature vectors according to the number of the second processors.
For example, as shown in fig. 5, assuming that the number of first processors is N and the number of second processors is M, the electronic device may, in the temporary storage area corresponding to the second processor, take the first text feature vectors obtained by each first processor that have identical request IDs and layer sequence numbers, arrange them in ascending rank order along the column direction, and splice them into a matrix of shape [seq_length, hidden_size]; the matrix is then divided into M parts along the column direction, thereby obtaining the partitioned matrices.
In some embodiments of the present application, when the electronic device allocates the second text feature vector to the second processor, the text feature vector equally divided by columns may be allocated according to the card serial number of the second processor.
For example, the electronic device may allocate the K Cache matrix and the V Cache matrix that are equally divided by columns to the second processor video memory in a one-to-one correspondence according to the rank order of the second processor.
In some embodiments of the present application, in the process of allocating the second text feature vector, the electronic device may carry the request ID and the layer sequence number corresponding to the second text feature vector in the second text feature vector, and send the request ID and the layer sequence number to the corresponding second processor.
In some embodiments of the present application, the order in which the electronic device assigns and transmits the second text feature vectors may be consistent with the order of the column-wise equal division.
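A minimal sketch of this splice-and-split step, assuming hidden_size is divisible by M and using NumPy arrays as stand-ins for the per-card cache pieces:

import numpy as np

def splice_and_split(chunks_by_rank, M):
    # chunks_by_rank: rank -> K (or V) Cache piece of shape
    # [seq_length, hidden_size / P] for one (request ID, layer sequence
    # number). Concatenate in ascending rank order along the columns...
    full = np.concatenate(
        [chunks_by_rank[rank] for rank in sorted(chunks_by_rank)], axis=1)
    # ...then divide column-wise into M equal parts, to be assigned one-to-one
    # to the second processors in their rank order.
    return np.split(full, M, axis=1)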
Therefore, the electronic equipment can efficiently transfer the first-round prediction result to the second processor for use, so that the computing resources adopted in the process of the language model reasoning calculation can be isolated, the use efficiency of the processor is improved, and the reasoning speed is improved.
In some embodiments of the present application, the step 103 may include the following steps 103a and 103b.
Step 103a, for the ith round of prediction, performing prediction according to the at least one second processor, the first text feature vector sequence, the second text feature vector corresponding to each second processor, and the text feature vectors predicted for the first text in each round before the ith round except the first round, so as to obtain a third text feature vector.
And 103b, splicing the third text feature vectors obtained by each second processor to obtain the ith text feature vector.
In some embodiments of the present application, after the electronic device allocates the M second text feature vectors to the at least one second processor, each second processor of the at least one second processor may compute based on the first text feature vector sequence, the second text feature vector corresponding to that second processor, and the text feature vectors predicted for the first text in each round before the ith round except the first round, so as to obtain at least one third text feature vector corresponding to that second processor. The electronic device can then splice the third text feature vectors obtained by each second processor to obtain the ith text feature vector.
In some embodiments of the present application, the electronic device may splice at least one third text feature vector corresponding to each second processor according to the sequence of the processor card serial numbers of the at least one processor and arranged in the column direction, to obtain an ith text feature vector.
In some embodiments of the present application, when the electronic device performs the ith round of prediction, the electronic device may acquire the corresponding text feature vector to participate in calculation according to the request ID and the layer sequence number of the current request.
Illustratively, take the second text feature vector being the KV Cache, and the second processor being a GPU, as an example. As shown in fig. 6, when the request corresponding to the first text is computed, on the second GPU with rank = W, up to the Kth attention layer in the codec layer, the electronic device may obtain the KV Cache corresponding to the request ID and layer sequence number from the video memory of the Wth card.
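A minimal sketch of this lookup, assuming each second processor keeps its assigned slices in a per-card map keyed by request ID and layer sequence number:

# Per-card store: (request_id, layer_no) -> (K slice, V slice).
kv_store = {}

def put_slice(request_id, layer_no, k_slice, v_slice):
    # Called when a second text feature vector is assigned to this card.
    kv_store[(request_id, layer_no)] = (k_slice, v_slice)

def get_slice(request_id, layer_no):
    # Called when the request reaches attention layer layer_no on this card
    # during the non-first-round ith prediction.
    return kv_store[(request_id, layer_no)]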
It should be noted that, for the second processors other than the third processor among the at least one second processor, the ith round of computation may likewise be performed based on the first text feature vector sequence, the second text feature vector allocated to that second processor, and the text feature vectors predicted for the first text in each round before the ith round, to obtain the ith text feature vector. To avoid repetition, no further description is provided here.
Therefore, the electronic equipment can directly use the corresponding text feature vector to conduct non-first-round prediction, so that the calculated amount of the electronic equipment in the process of conducting non-first-round ith-round prediction can be saved, the speed of text prediction is improved, and the time required by text prediction is shortened.
The above embodiments of the method, or various possible implementation manners in the embodiments of the method, may be executed separately, or may be executed in any two or more combinations with each other, and may specifically be determined according to actual use requirements, which is not limited by the embodiments of the present application.
According to the text generation method provided by the embodiment of the application, the execution main body can be a text generation device. In the embodiment of the present application, a text generating device is described by taking a text generating method performed by the text generating device as an example.
Fig. 7 shows a schematic diagram of a possible configuration of a text generating apparatus according to an embodiment of the present application. As shown in fig. 7, the text generating apparatus 70 may include: an acquisition module 71, a processing module 72 and an output module 73.
The acquiring module 71 is configured to acquire a first text feature vector sequence corresponding to a first text; the processing module 72 is configured to perform first-round prediction on the first text based on the at least one first processor of the electronic device and the first text feature vector sequence acquired by the acquiring module 71, to obtain a first text feature vector; the processing module 72 is further configured to perform an ith round of prediction based on the at least one second processor of the electronic device, the first text feature vector sequence acquired by the acquiring module 71 and the text feature vectors predicted for the first text in each round before the ith round, to obtain an ith text feature vector, where i ∈ [2, …, N] and N is the total number of prediction rounds based on the first text; the processing module 72 is further configured to sort the text feature vectors from the first text feature vector to the ith text feature vector to obtain a second text feature vector sequence; and the output module 73 is configured to output the second text based on the second text feature vector sequence obtained by the processing module 72.
In one possible implementation manner, the apparatus further includes: an input module;
The input module is configured to input the first text into the language model before the obtaining module 71 obtains a first text feature vector sequence corresponding to the first text;
The processing module 72 is further configured to, before the first-round prediction is performed on the first text based on at least one first processor of the electronic device and the first text feature vector sequence to obtain the first text feature vector, determine the number of processors required for the ith round of prediction based on the first parameter; determine the number of processors required for the first round of prediction based on the total number of processors in the electronic device and the number of processors required for the ith round of prediction; and determine the first processor and the second processor from the processors of the electronic device according to the number of processors required for the first round of prediction and the number of processors required for the ith round of prediction;
wherein the first parameter comprises: the space temporarily occupied by the language model at runtime, the maximum number of texts the language model can process simultaneously, the video memory size of a single processor in the electronic device, and the maximum cache size of a processor in the electronic device.
In a possible implementation manner, the processing module 72 is further configured to determine, based on the second parameter, a maximum buffer size of the processor in the electronic device before determining, based on the first parameter, the number of processors required for the i-th round of prediction;
Wherein the second parameter comprises: the number of layers of the codec layer of the language model, the dimension of the hidden layer of the language model, the calculation accuracy of the language model, the maximum text input length of the language model and the maximum generatable text length of the language model.
In a possible implementation manner, the processing module 72 is further configured to asynchronously transmit, for a third processor of the at least one second processor, the first text feature vector obtained by each first processor to the temporary storage area corresponding to the third processor, where the third processor is any one of the at least one second processor; divide, by the third processor, the first text feature vectors stored in the temporary storage area corresponding to the third processor to obtain M second text feature vectors, where M is the number of second processors; and allocate the M second text feature vectors to the at least one second processor, where one second text feature vector corresponds to one second processor.
In one possible implementation, the processing module 72 is specifically configured to:
perform prediction according to the at least one second processor, the first text feature vector sequence, the second text feature vector corresponding to each second processor, and the text feature vectors predicted for the first text in each round before the ith round except the first round, to obtain a third text feature vector;
And splicing the third text feature vector obtained by each second processor to obtain an ith text feature vector.
The embodiment of the application provides a text generating apparatus which, when the electronic device predicts a text, can perform the first round of prediction through at least one first processor in the electronic device and perform the non-first-round ith round of prediction through at least one second processor in the electronic device. Thus, by distributing the computation required by text prediction among multiple processors, the processor resources used when the electronic device performs text prediction through the language model are isolated. Therefore, the shorter computing process of the non-first-round ith round of prediction, which has a smaller amount of computation, can be prevented from being blocked by the longer computing process of the first round of prediction, which has a larger amount of computation, thereby improving the resource utilization of the processors in the electronic device, increasing the speed of text prediction and shortening the time required for text prediction.
The text generating apparatus in the embodiment of the application can be an electronic device or a component in the electronic device, such as an integrated circuit or a chip. The electronic device may be a terminal, or may be a device other than a terminal. The electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted electronic device, a Mobile Internet Device (MID), an augmented reality (AR)/virtual reality (VR) device, a robot, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a personal digital assistant (PDA), etc., and may also be a server, a network attached storage (NAS), a personal computer (PC), a television (TV), a teller machine, a self-service machine, etc., which are not particularly limited in the embodiments of the present application.
The text generating device in the embodiment of the application can be a device with an operating system. The operating system may be an Android operating system, an iOS operating system, or other possible operating systems, and the embodiment of the present application is not limited specifically.
The text generating device provided by the embodiment of the application can realize the processes realized by the embodiment of the text generating method, achieve the same technical effects, and are not repeated here for avoiding repetition.
Optionally, as shown in fig. 8, the embodiment of the present application further provides an electronic device 800, including a processor 801 and a memory 802, where the memory 802 stores a program or an instruction that can be executed on the processor 801, and the program or the instruction implements each step of the above-mentioned text generation method embodiment when executed by the processor 801, and the steps achieve the same technical effects, so that repetition is avoided, and no further description is given here.
The electronic device in the embodiment of the application includes both the mobile electronic device and the non-mobile electronic device described above.
Fig. 9 is a schematic hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 1000 includes, but is not limited to: radio frequency unit 1001, network module 1002, audio output unit 1003, input unit 1004, sensor 1005, display unit 1006, user input unit 1007, interface unit 1008, memory 1009, and processor 1010.
Those skilled in the art will appreciate that the electronic device 1000 may also include a power source (e.g., a battery) for powering the various components. The power source may be logically connected to the processor 1010 through a power management system, so that functions such as charge management, discharge management, and power consumption management are performed through the power management system. The electronic device structure shown in fig. 9 does not constitute a limitation of the electronic device; the electronic device may include more or fewer components than shown, combine certain components, or use a different arrangement of components, which are not described in detail here.
The processor 1010 is configured to obtain a first text feature vector sequence corresponding to a first text; perform first-round prediction on the first text based on at least one first processor of the electronic device and the first text feature vector sequence to obtain a first text feature vector; perform an i-th round of prediction based on at least one second processor of the electronic device, the first text feature vector sequence, and the text feature vectors predicted for the first text in each round before the i-th round, to obtain an i-th text feature vector, where i ∈ [2, …, N] and N is the total number of prediction rounds based on the first text; sort the text feature vectors from the first text feature vector to the i-th text feature vector to obtain a second text feature vector sequence; and output a second text based on the second text feature vector sequence.
In one possible implementation, the processor 1010 is configured to input the first text into the language model before acquiring the first text feature vector sequence corresponding to the first text;
The processor 1010 is further configured to: before performing the first-round prediction on the first text based on the at least one first processor of the electronic device and the first text feature vector sequence to obtain the first text feature vector, determine the number of processors required for the i-th round of prediction based on a first parameter; determine the number of processors required for the first round of prediction based on the total number of processors in the electronic device and the number of processors required for the i-th round of prediction; and determine the first processor and the second processor from the processors of the electronic device according to the number of processors required for the first round of prediction and the number of processors required for the i-th round of prediction;
wherein the first parameter comprises: the temporary space occupied by the language model at run time, the maximum number of texts the language model can process simultaneously, the video memory size of one processor in the electronic device, and the maximum cache size of a processor in the electronic device.
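One plausible way these quantities could be combined (an assumption for illustration, not a formula disclosed by the application) is to divide the memory the i-th round must hold, i.e. the runtime footprint plus one cache per concurrently processed text, by the video memory of a single processor:

```python
import math

def processors_for_round_i(runtime_temp_bytes, max_batch,
                           per_gpu_memory_bytes, kv_cache_max_bytes):
    """One possible allocation rule (an assumption, not the patent's
    formula): the decode rounds must hold the model's temporary runtime
    footprint plus one cache per concurrently processed text."""
    needed = runtime_temp_bytes + max_batch * kv_cache_max_bytes
    return math.ceil(needed / per_gpu_memory_bytes)

def split_processors(total, needed_round_i):
    # Remaining processors serve the first (prefill) round.
    first = total - needed_round_i
    return first, needed_round_i

n_i = processors_for_round_i(2 * 2**30, 8, 24 * 2**30, 1 * 2**30)
print(split_processors(8, n_i))  # (7, 1) with 24 GiB processors
```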
In one possible implementation, the processor 1010 is further configured to determine, based on the second parameter, a maximum cache size of a processor in the electronic device before determining, based on the first parameter, a number of processors required for the ith round of prediction;
wherein the second parameter comprises: the number of coding and decoding layers of the language model, the dimension of the hidden layer of the language model, the computation precision of the language model, the maximum text input length of the language model, and the maximum generatable text length of the language model.
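A standard transformer key-value-cache bound can be built from exactly these five quantities; the formula below is offered as an assumed reading of how the maximum cache size could be computed, not as the application's own formula.

```python
def max_cache_bytes(num_layers, hidden_dim, bytes_per_value,
                    max_input_len, max_gen_len):
    """Common transformer KV-cache bound (an assumption): keys and
    values (factor 2) for every layer, across the longest possible
    input-plus-generated sequence."""
    seq_len = max_input_len + max_gen_len
    return 2 * num_layers * seq_len * hidden_dim * bytes_per_value

# e.g. 32 layers, hidden 4096, fp16 (2 bytes), 2048 in / 2048 out:
print(max_cache_bytes(32, 4096, 2, 2048, 2048) / 2**30, "GiB")  # 2.0 GiB
```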
In a possible implementation, the processor 1010 is further configured to asynchronously transmit the first text feature vector obtained by each first processor to a temporary storage area corresponding to a third processor, where the third processor is any one of the at least one second processor; the first text feature vectors stored in the temporary storage area corresponding to the third processor are divided by the third processor to obtain M second text feature vectors, where M is the number of second processors; and the processor 1010 is further configured to allocate the M second text feature vectors to the at least one second processor, one second text feature vector corresponding to one second processor.
In one possible implementation, the processor 1010 is specifically configured to:
predict, for the i-th round of prediction, the first text based on the at least one second processor, the first text feature vector sequence, the second text feature vector corresponding to each second processor, and the text feature vectors predicted in each round before the i-th round other than the first round, to obtain a third text feature vector; and
splice the third text feature vectors obtained by the second processors to obtain the i-th text feature vector.
The embodiment of the application provides an electronic device. When the electronic device predicts text, the first round of prediction can be performed by at least one first processor in the electronic device, and each non-first i-th round of prediction by at least one second processor. Distributing the text-prediction computation among multiple processors in this way isolates the processor resources used when the electronic device performs text prediction through the language model. The short computation of a low-cost non-first round is therefore not blocked by the long computation of the costly first round, which improves processor utilization in the electronic device, speeds up text prediction, and shortens the time it requires.
It should be appreciated that in embodiments of the present application, the input unit 1004 may include a graphics processor (Graphics Processing Unit, GPU) 10041 and a microphone 10042, where the graphics processor 10041 processes image data of still pictures or video obtained by an image capturing device (e.g., a camera) in a video capturing mode or an image capturing mode. The display unit 1006 may include a display panel 10061, and the display panel 10061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 1007 includes at least one of a touch panel 10071 and other input devices 10072. The touch panel 10071 is also referred to as a touch screen. The touch panel 10071 can include two portions, a touch detection device and a touch controller. Other input devices 10072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and so forth, which are not described in detail herein.
The memory 1009 may be used to store software programs as well as various data. The memory 1009 may mainly include a first storage area storing programs or instructions and a second storage area storing data, where the first storage area may store an operating system, and application programs or instructions (such as a sound playing function and an image playing function) required for at least one function. Further, the memory 1009 may include volatile memory or non-volatile memory, or the memory 1009 may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), a static RAM (SRAM), a dynamic RAM (DRAM), a synchronous DRAM (SDRAM), a double data rate SDRAM (DDR SDRAM), an enhanced SDRAM (ESDRAM), a synch-link DRAM (SLDRAM), or a direct RAM (DRRAM). The memory 1009 in embodiments of the application includes, but is not limited to, these and any other suitable types of memory.
The processor 1010 may include one or more processing units. Optionally, the processor 1010 integrates an application processor, which primarily handles operations involving the operating system, user interface, application programs, and the like, and a modem processor, such as a baseband processor, which primarily handles wireless communication signals. It will be appreciated that the modem processor may alternatively not be integrated into the processor 1010.
The embodiment of the application also provides a readable storage medium having stored thereon a program or instructions which, when executed by a processor, implement each process of the text generation method embodiments described above and achieve the same technical effects; to avoid repetition, the details are not described again here.
The processor is the processor in the electronic device described in the above embodiments. The readable storage medium includes a computer-readable storage medium, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The embodiment of the application further provides a chip, including a processor and a communication interface coupled to the processor, where the processor is configured to run programs or instructions to implement each process of the text generation method embodiments described above and achieve the same technical effects; to avoid repetition, the details are not described again here.
It should be understood that the chip referred to in the embodiments of the present application may also be called a system-on-chip, a chip system, a system-on-a-chip, or the like.
Embodiments of the present application provide a computer program product stored in a storage medium, where the program product is executed by at least one processor to implement the respective processes of the above-described text generating method embodiment, and achieve the same technical effects, and for avoiding repetition, a detailed description is omitted herein.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed; it may also include performing the functions in a substantially simultaneous manner or in the reverse order, depending on the functions involved. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is the preferred implementation. Based on such understanding, the technical solution of the present application, or the part of it contributing to the prior art, may be embodied in the form of a computer software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, or optical disk) and including instructions for causing a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive. Inspired by the present application, those of ordinary skill in the art may derive many other forms without departing from the spirit of the present application and the scope of protection of the claims, all of which fall within the protection of the present application.

Claims (13)

1. A text generation method, performed by an electronic device, the method comprising:
acquiring a first text feature vector sequence corresponding to a first text;
performing first-round prediction on the first text based on at least one first processor of the electronic equipment and the first text feature vector sequence to obtain a first text feature vector;
performing, based on at least one second processor of the electronic device, the first text feature vector sequence, and the text feature vectors predicted for the first text in each round before the i-th round, the i-th round of prediction to obtain an i-th text feature vector, wherein i ∈ [2, …, N] and N is the total number of prediction rounds based on the first text;
And sequencing the text feature vectors from the first text feature vector to the ith text feature vector to obtain a second text feature vector sequence, and outputting a second text based on the second text feature vector sequence.
2. The method of claim 1, wherein prior to the obtaining the first text feature vector sequence corresponding to the first text, the method further comprises:
Inputting the first text into a language model;
The method further comprises, before the first-round prediction is performed on the first text based on the at least one first processor of the electronic device and the first text feature vector sequence to obtain a first text feature vector:
determining the number of processors required for the ith round of prediction based on the first parameter;
determining the number of processors required for the first round of prediction based on the total number of processors in the electronic device and the number of processors required for the i-th round of prediction;
Determining the first processor and the second processor from the processors of the electronic device according to the number of processors required by the first round of prediction and the number of processors required by the ith round of prediction;
wherein the first parameter comprises: the temporary space occupied by the language model at run time, the maximum number of texts the language model can process simultaneously, the video memory size of one processor in the electronic device, and the maximum cache size of a processor in the electronic device.
3. The method of claim 2, wherein prior to determining the number of processors required for the ith round of prediction based on the first parameter, the method further comprises:
Determining a maximum cache size of a processor in the electronic device based on a second parameter;
wherein the second parameter comprises: the number of coding and decoding layers of the language model, the dimension of the hidden layer of the language model, the computation precision of the language model, the maximum text input length of the language model, and the maximum generatable text length of the language model.
4. The method of claim 1, wherein after the performing first-round prediction on the first text based on the at least one first processor of the electronic device and the first text feature vector sequence to obtain the first text feature vector, the method further comprises:
For a third processor in the at least one second processor, asynchronously transmitting the first text feature vector obtained by each first processor to a temporary storage area corresponding to the third processor, wherein the third processor is any processor in the at least one second processor;
Dividing the first text feature vectors stored in the temporary storage areas corresponding to the third processors by the third processors to obtain M second text feature vectors, wherein M is the number of the second processors;
And distributing the M second text feature vectors to the at least one second processor, wherein one second text feature vector corresponds to one second processor.
5. The method of claim 4, wherein the performing the i-th round of prediction based on the at least one second processor of the electronic device, the first text feature vector sequence, and the text feature vectors predicted for the first text in each round before the i-th round, to obtain the i-th text feature vector comprises:
predicting, for the i-th round of prediction, the first text based on the at least one second processor, the first text feature vector sequence, the second text feature vector corresponding to each second processor, and the text feature vectors predicted in each round before the i-th round other than the first round, to obtain a third text feature vector; and
splicing the third text feature vectors obtained by the second processors to obtain the i-th text feature vector.
6. A text generation apparatus, the apparatus comprising: the device comprises an acquisition module, a processing module and an output module;
The acquisition module is used for acquiring a first text feature vector sequence corresponding to the first text;
the processing module is used for performing first-round prediction on the first text based on at least one first processor of the electronic equipment and the first text feature vector sequence acquired by the acquisition module to obtain a first text feature vector;
The processing module is further configured to perform an i-th round of prediction based on the at least one second processor of the electronic device, the first text feature vector sequence acquired by the acquiring module, and the text feature vectors predicted for the first text in each round before the i-th round, to obtain an i-th text feature vector, where i ∈ [2, …, N] and N is the total number of prediction rounds based on the first text;
The processing module is further configured to sort text feature vectors included in the first to i-th text feature vectors to obtain a second text feature vector sequence;
And the output module is used for outputting a second text based on the second text feature vector sequence obtained by the processing module.
7. The apparatus of claim 6, wherein the apparatus further comprises: an input module;
the input module is used for inputting the first text into a language model before the acquisition module acquires a first text feature vector sequence corresponding to the first text;
The processing module is further configured to determine, based on a first parameter, a number of processors required for the i-th round of prediction before performing first round of prediction on the first text based on at least one first processor of the electronic device and the first text feature vector sequence to obtain a first text feature vector;
the processing module is further configured to determine a number of processors required for the first round of prediction based on a total number of processors in the electronic device and the number of processors required for the i-th round of prediction;
the processing module is further configured to determine the first processor and the second processor from the processors of the electronic device according to the number of processors required for the first round of prediction and the number of processors required for the i-th round of prediction;
wherein the first parameter comprises: the temporary space occupied by the language model at run time, the maximum number of texts the language model can process simultaneously, the video memory size of one processor in the electronic device, and the maximum cache size of a processor in the electronic device.
8. The apparatus of claim 7, wherein the processing module is further configured to determine a maximum cache size for a processor in the electronic device based on a second parameter before determining the number of processors required for the ith round of prediction based on the first parameter;
wherein the second parameter comprises: the number of coding and decoding layers of the language model, the dimension of the hidden layer of the language model, the computation precision of the language model, the maximum text input length of the language model, and the maximum generatable text length of the language model.
9. The apparatus of claim 6, wherein the processing module is further configured to: after performing first-round prediction on the first text based on at least one first processor of the electronic device and the first text feature vector sequence to obtain the first text feature vector, asynchronously transmit, for a third processor of the at least one second processor, the first text feature vector obtained by each first processor to a temporary storage area corresponding to the third processor, where the third processor is any one of the at least one second processor;
The processing module is further configured to divide, by the third processor, the first text feature vectors stored in the temporary storage area corresponding to the third processor, to obtain M second text feature vectors, where M is the number of the second processors;
the processing module is further configured to allocate the M second text feature vectors to the at least one second processor, where one second text feature vector corresponds to one second processor.
10. The apparatus according to claim 9, wherein the processing module is specifically configured to:
predicting, for the i-th round of prediction, the first text based on the at least one second processor, the first text feature vector sequence, the second text feature vector corresponding to each second processor, and the text feature vectors predicted in each round before the i-th round other than the first round, to obtain a third text feature vector; and
splicing the third text feature vectors obtained by the second processors to obtain the i-th text feature vector.
11. An electronic device comprising a processor and a memory storing a program or instructions executable on the processor, which when executed by the processor, implement the steps of the text generation method of any of claims 1 to 5.
12. A readable storage medium, characterized in that the readable storage medium has stored thereon a program or instructions which, when executed by a processor, implement the steps of the text generation method according to any of claims 1 to 5.
13. A computer program product stored in a storage medium, the computer program product being executed by at least one processor to implement the steps of the text generation method of any of claims 1 to 5.
CN202410170837.4A 2024-02-06 2024-02-06 Text generation method, apparatus, electronic device, medium and computer program product Pending CN118013934A (en)
