CN110909527B - Text processing model running method and device, electronic equipment and storage medium - Google Patents

Text processing model running method and device, electronic equipment and storage medium

Info

Publication number
CN110909527B
CN110909527B
Authority
CN
China
Prior art keywords
layer
data
text
encoder
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911222138.5A
Other languages
Chinese (zh)
Other versions
CN110909527A (en)
Inventor
王晓晖
李磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN201911222138.5A priority Critical patent/CN110909527B/en
Publication of CN110909527A publication Critical patent/CN110909527A/en
Application granted granted Critical
Publication of CN110909527B publication Critical patent/CN110909527B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the disclosure discloses a method, a device, an electronic device and a storage medium for operating a text processing model, wherein the text processing model comprises at least one encoder layer and at least one decoder layer, and the method is characterized by comprising the following steps: acquiring an input text vector; inputting the text vector into at least one encoder layer for processing to form an implicit layer vector; inputting the hidden layer vector into at least one decoder layer for processing to generate an output text vector; and in the data calculation process of the encoder layer and/or the decoder layer, a combined kernel function is called to process the data, the combined kernel function comprises at least two basic kernel functions, the basic kernel functions are used for completing mathematical level calculation of the data, and the combined kernel functions are used for completing functional level calculation of the data. The technical scheme of the embodiment can reduce the read-write times of the video memory of the GPU and improve the running efficiency.

Description

Text processing model running method and device, electronic equipment and storage medium
Technical Field
The embodiment of the disclosure relates to the technical field of natural language processing, in particular to a method and a device for operating a text processing model, electronic equipment and a storage medium.
Background
In the field of natural language processing, a sequence-to-sequence (seq2seq) model is typically employed for processing; it mainly comprises at least one encoder and at least one decoder. The main principle of the seq2seq model is: the source sentence is split into a word sequence and input into the encoder, which outputs hidden layer vectors; the hidden layer vectors serve as the input of the decoder, which generates one target word at each time step, each subsequent target word being generated from the hidden layer and the previously output target words; the final target word sequence forms the translated target sentence.
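Stated compactly, the seq2seq principle above can be written as follows (a generic formulation for reference only; the symbols x_i, y_t, h, n and m are placeholders rather than notation from this disclosure):

```latex
h = \operatorname{Encoder}(x_1, x_2, \dots, x_n), \qquad
p\bigl(y_t \mid y_{<t}, x\bigr) = \operatorname{Decoder}\bigl(h, y_1, \dots, y_{t-1}\bigr), \quad t = 1, \dots, m,
```

so the translated target sentence is the generated sequence y_1, ..., y_m.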
In general, when the seq2seq model performs natural language processing, the amount of computation is so large that processing takes a long time; for example, translating a 20-word sentence with a Transformer model on a P4 GPU takes about 1 second. In business scenarios that often handle tens of thousands of translation requests per second, this is unacceptable both from a machine cost perspective and a user experience perspective.
Disclosure of Invention
In view of this, embodiments of the present disclosure provide a method, an apparatus, an electronic device, and a storage medium for operating a text processing model, so as to reduce the read-write frequency of a GPU video memory, and improve the operating efficiency.
Other features and advantages of embodiments of the present disclosure will be apparent from the following detailed description, or may be learned by practice of embodiments of the disclosure in part.
In a first aspect, embodiments of the present disclosure provide a method of operating a text processing model, the text processing model including at least one encoder layer and at least one decoder layer, comprising:
acquiring an input text vector;
inputting the text vector into at least one encoder layer for processing to form an implicit layer vector;
inputting the hidden layer vector into at least one decoder layer for processing to generate an output text vector;
and in the data calculation process of the encoder layer and/or the decoder layer, a combined kernel function is called to process the data, the combined kernel function comprises at least two basic kernel functions, the basic kernel functions are used for completing mathematical level calculation of the data, and the combined kernel functions are used for completing functional level calculation of the data.
In a second aspect, an embodiment of the present disclosure further provides an apparatus for executing a text processing model, where the text processing model includes at least one encoder layer and at least one decoder layer, and the apparatus includes:
An input acquisition unit configured to acquire an input text vector;
an encoder layer processing unit, configured to input the text vector into at least one encoder layer for processing to form an implicit layer vector;
a decoder layer processing unit, configured to input the implicit layer vector into at least one decoder layer for processing, so as to generate an output text vector;
wherein the encoder layer processing unit calls a combined kernel function to process data during the data calculation of the encoder layer, and/or the decoder layer processing unit calls a combined kernel function to process data during the data calculation of the decoder layer; the combined kernel function comprises at least two basic kernel functions, the basic kernel functions are used for completing mathematical-level calculation of the data, and the combined kernel function is used for completing functional-level calculation of the data.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, including:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the instructions of the method of any of the first aspects.
In a fourth aspect, the presently disclosed embodiments also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method according to any of the first aspects.
Embodiments of the present disclosure process an input text vector by inputting it into at least one encoder layer to form an implied layer vector; and inputting the hidden layer vector into at least one decoder layer for processing to generate an output text vector, wherein in the data calculation process of the encoder layer and/or the decoder layer, a combined kernel function is called to process the data, the combined kernel function comprises at least two basic kernel functions, the basic kernel functions are used for completing mathematical level calculation of the data, and the combined kernel functions are used for completing functional level calculation of the data. According to the embodiment of the disclosure, the read-write frequency of the video memory of the GPU can be greatly reduced, and the running efficiency can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the following description will briefly explain the drawings required to be used in the description of the embodiments of the present disclosure, and it is apparent that the drawings in the following description are only some of the embodiments of the present disclosure, and other drawings may be obtained according to the contents of the embodiments of the present disclosure and these drawings without inventive effort for those skilled in the art.
FIG. 1 is a flow diagram of a method of operating a text processing model provided in an embodiment of the present disclosure;
FIG. 2 is a flow diagram of an example of invoking a combined kernel to process data provided by embodiments of the present disclosure;
FIG. 3 is a schematic diagram of an example of a Transformer model provided in an embodiment of the present disclosure;
FIG. 4 is a schematic internal block diagram of the encoder and decoder layers of a Transformer model provided by embodiments of the present disclosure;
FIG. 5 is a schematic diagram of the calculation of three vectors of the self-attention mechanism layer of a Transformer model according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a calculation process of a self-attention mechanism layer of a Transformer model according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a calculation process of a self-attention mechanism layer of a Transformer model according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a calculation process of a self-attention mechanism layer of a Transformer model according to an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of a calculation process of a self-attention mechanism layer of a Transformer model according to an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of a calculation process of a self-attention mechanism layer of a Transformer model according to an embodiment of the present disclosure;
FIG. 11 is a flow diagram of another method of operating a text processing model provided by an embodiment of the present disclosure;
FIG. 12 is a schematic diagram of a text processing model running device according to an embodiment of the present disclosure;
fig. 13 shows a schematic structural diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
In order to make the technical problems solved, the technical solutions adopted and the technical effects achieved by the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described in further detail below with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments, but not all embodiments of the present disclosure. All other embodiments, which are derived by a person skilled in the art from the embodiments of the present disclosure without creative efforts, fall within the protection scope of the embodiments of the present disclosure.
It should be noted that the terms "system" and "network" in the embodiments of the present disclosure are often used interchangeably herein. References to "and/or" in the embodiments of the present disclosure are intended to encompass any and all combinations of one or more of the associated listed items. The terms first, second and the like in the description and in the claims and drawings are used for distinguishing between different objects and not for limiting a particular order.
It should be further noted that, in the embodiments of the present disclosure, the following embodiments may be implemented separately, or may be implemented in combination with each other, which is not specifically limited by the embodiments of the present disclosure.
The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.
The technical solutions of the embodiments of the present disclosure are further described below with reference to the accompanying drawings and through specific implementations.
Fig. 1 is a flow chart illustrating a method for operating a text processing model according to an embodiment of the present disclosure, where the embodiment is applicable to a case of operating a text processing model including at least one encoder layer and at least one decoder layer, and the method may be performed by an operating device of a text processing model configured in an electronic device, as shown in fig. 1, where the method for operating a text processing model according to the present embodiment includes:
in step S110, an input text vector is acquired.
In step S120, the text vector is input to at least one encoder layer for processing, so as to form an implicit layer vector, and in the data calculation process of each encoder layer, a combined kernel function is called to process the data.
The combined kernel function comprises at least two basic kernel functions, wherein the basic kernel functions are used for completing mathematical level calculation of data, and the combined kernel functions are used for completing functional level calculation of the data.
In the data calculation process of each encoder layer, computation steps generally involve calling kernel functions, for example kernel functions such as matrix row average, matrix row variance and matrix dot product. The kernel functions in a scientific-computing base library are usually fine-grained, and if they are called directly, frequent video memory reads and writes are required. If, in the data calculation process of each encoder layer in this step, the operations of the base-library kernel functions are spliced and combined into a combined kernel function (for example, one covering matrix row average, matrix row variance, matrix dot product and the like) and the combined kernel function is called to process the data, intermediate calculation values can be temporarily stored in registers and the final result of each component written directly into the video memory, so the number of video memory reads and writes can be reduced several-fold.
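To make the functional-level versus mathematical-level distinction concrete, the row normalization that the later description refers to as a "normalized processing function" composes exactly these three basic kernels. A sketch in generic notation (the scale gamma, shift beta and stability constant epsilon are assumed parameters, not taken from this disclosure):

```latex
\mu_i = \frac{1}{d}\sum_{j=1}^{d} x_{ij} \;\;\text{(matrix row average)}, \qquad
\sigma_i^2 = \frac{1}{d}\sum_{j=1}^{d} \bigl(x_{ij} - \mu_i\bigr)^2 \;\;\text{(matrix row variance)},
\qquad
y_{ij} = \gamma_j \cdot \frac{x_{ij} - \mu_i}{\sqrt{\sigma_i^2 + \epsilon}} + \beta_j \;\;\text{(element-wise dot product and shift)}.
```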
For example, if the text processing model adopts a Transformer model, the computation of the self-attention mechanism layer requires sequentially calling base-library kernel functions such as matrix row average, matrix row variance and matrix dot product, producing at least 5 reads and writes of the whole matrix; if a combined kernel function is called instead, and the sequence of matrix row average, matrix row variance, matrix dot product and similar kernels is realized by one combined kernel function using thread-group communication, only one matrix read and write is needed, and the latency can be reduced by about 80%.
The combined kernel function in this step refers to a large-granularity kernel function obtained by combining at least two basic kernel functions that are called in sequence; which basic kernel functions are combined is determined by the basic kernel functions actually involved in the calculation of each encoder layer, which is not limited in this embodiment.
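A minimal CUDA sketch of such a combined kernel is given below, assuming one thread block per matrix row and a power-of-two block size (the function and parameter names are illustrative, not part of this disclosure). Intermediate values (row mean and variance) stay in registers and shared memory, and no intermediate matrix is ever written back to the video memory between the mean, variance and normalization steps, unlike calling three separate base-library kernels:

```cuda
#include <cuda_runtime.h>

// Combined kernel: matrix row average + row variance + normalize (scale/shift)
// in a single pass. Only the final result is written to global (video) memory.
__global__ void fused_row_normalize(const float* __restrict__ in,
                                    float* __restrict__ out,
                                    const float* __restrict__ gamma,
                                    const float* __restrict__ beta,
                                    int cols, float eps) {
    extern __shared__ float buf[];               // 2 * blockDim.x floats
    int row = blockIdx.x;
    const float* x = in + (size_t)row * cols;
    float* y = out + (size_t)row * cols;

    // 1) per-thread partial sums for mean and sum of squares (kept in registers)
    float s = 0.f, sq = 0.f;
    for (int j = threadIdx.x; j < cols; j += blockDim.x) {
        float v = x[j];
        s += v;
        sq += v * v;
    }

    // 2) block-wide reduction through shared memory (thread-group communication)
    buf[threadIdx.x] = s;
    buf[blockDim.x + threadIdx.x] = sq;
    __syncthreads();
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride) {
            buf[threadIdx.x] += buf[threadIdx.x + stride];
            buf[blockDim.x + threadIdx.x] += buf[blockDim.x + threadIdx.x + stride];
        }
        __syncthreads();
    }
    float mean = buf[0] / cols;
    float var  = buf[blockDim.x] / cols - mean * mean;
    float inv_std = rsqrtf(var + eps);

    // 3) normalize and write the final result: the only matrix-sized write
    for (int j = threadIdx.x; j < cols; j += blockDim.x)
        y[j] = gamma[j] * (x[j] - mean) * inv_std + beta[j];
}
```

A launch such as `fused_row_normalize<<<rows, 256, 2 * 256 * sizeof(float)>>>(in, out, gamma, beta, cols, 1e-6f)` assumes a power-of-two block size; this is only a sketch of the fusion idea, not the disclosure's actual kernel.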
In step S130, the implicit layer vector is input to at least one decoder layer for processing, so as to generate an output text vector, and in the data calculation process of each decoder layer, a combined kernel function is called to process the data.
Likewise, the combined kernel includes at least two basic kernels for performing mathematical level calculations of the data, and the combined kernel for performing functional level calculations of the data.
For the same reason as in step S120, in order to reduce the number of video memory reads and writes, in the data calculation process of each decoder layer a combined kernel function may likewise be built in advance from the basic kernel functions that would otherwise be called, and that combined kernel function called to process the data; during the calculation, intermediate values may be temporarily stored in registers and the final result of each component written directly into the video memory, so as to reduce the number of video memory reads and writes.
It should be noted that, compared with the prior art that each calculation process directly calls the basic kernel function in the basic library, in this embodiment, if the combined kernel function is called to process the data in the data calculation process of any encoder layer or any decoder layer, the effect of reducing the read-write times of the video memory is achieved, so that the read-write frequency of the video memory can be reduced, and the operation efficiency is improved.
Illustratively, fig. 2 is a flowchart of an example of calling a combined kernel function to process data, where, as shown in fig. 2, in a data calculation process of the encoder layer and/or the decoder layer, the calling the combined kernel function to process data includes:
in step S210, during the data calculation of the encoder layer and/or the decoder layer, a thread group is allocated to the called combined kernel, and data is read from the video memory space.
In step S220, the combined kernel is run by a thread group, and intermediate data in the combined kernel calculation process is read and written in a register by using a thread group communication method.
In step S230, the final output data of the combined kernel is written into the video memory space.
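To illustrate steps S210 to S230 concretely, the sketch below keeps the intermediate reduction entirely in registers by using warp-level thread-group communication (the CUDA warp primitive __shfl_down_sync). The kernel and variable names are illustrative assumptions, one warp of 32 threads is assumed to handle one row, and a simple row sum stands in for a full combined kernel so the pattern stays visible:

```cuda
#include <cuda_runtime.h>

// One warp per row: the row sum (an intermediate value) never leaves registers;
// only the final output of the combined kernel is written back to video memory.
__global__ void warp_row_sum(const float* __restrict__ in, float* __restrict__ out,
                             int cols) {
    int row  = blockIdx.x;                       // one block (one warp) per row
    int lane = threadIdx.x;                      // 0..31

    // Step S210: the thread group reads its data in from the video memory space.
    float partial = 0.f;
    for (int j = lane; j < cols; j += 32)
        partial += in[(size_t)row * cols + j];

    // Step S220: thread-group communication keeps intermediate data in registers.
    for (int offset = 16; offset > 0; offset >>= 1)
        partial += __shfl_down_sync(0xffffffff, partial, offset);

    // Step S230: write only the final output data into the video memory space.
    if (lane == 0)
        out[row] = partial;
}
```

A launch of `warp_row_sum<<<rows, 32>>>(in, out, cols)` would exercise this pattern; the disclosure's combined kernels would apply the same register-resident reduction to their own functional-level computations.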
Illustratively, the technical solution of this embodiment is described below by taking a Transformer model as the text processing model. The Transformer model used as the text processing model includes a plurality of sequentially connected encoder layers and a plurality of sequentially connected decoder layers, and an implicit layer vector is transmitted between the last encoder layer and each decoder layer; each encoder layer includes at least a self-attention mechanism layer and a feedforward neural network layer; each decoder layer includes at least a self-attention mechanism layer, a codec attention mechanism layer, and a feedforward neural network layer.
Taking the Transformer model shown in Fig. 3 as an example, which comprises 6 encoder layers and 6 decoder layers, the simplified internal structure of each encoder layer and decoder layer is shown in Fig. 4.
As shown in Fig. 4, the encoder layer comprises two layers, a self-attention mechanism layer and a feed-forward neural network layer; the self-attention mechanism layer helps the current node not only focus on the current word but also acquire the context semantics.
The decoder layer also comprises the two layers of networks mentioned for the encoder layer, but a codec attention mechanism layer is arranged between them to help the current node acquire the important content that currently needs attention.
For the internal details of the model, refer to Fig. 5. First, the model performs an embedding operation on the input data (similar to a word2vec operation, used for mapping natural language into a numeric vector, i.e. mapping the text vector into a sequence vector). After the embedding operation is finished, the data is input to an encoder layer; after the self-attention mechanism layer processes the data, it is sent to the feedforward neural network layer, whose calculation can be done in parallel, and the resulting output is input to the next encoder layer. If the input text vectors are "Thinking" and "Machines", the sequence vectors X1 and X2 are obtained after the embedding mapping.
The idea of the self-attention mechanism layer is similar to the general attention mechanism, but self-attention is the approach the Transformer uses to fold its "understanding" of other relevant words into the word currently being processed.
First, the self-attention mechanism layer calculates three new vectors that reflect the weight of a word from three aspects. For example, three weight calculations are applied to each sequence vector, and the three resulting vectors are called Query, Key and Value. They are obtained by multiplying the embedding vector (such as the sequence vector X1 or X2) with randomly initialized matrices (such as the matrices WQ, WK and WV shown in Fig. 5): as shown in Fig. 5, multiplying X1 with WQ, WK and WV gives the vectors q1, k1 and v1, and multiplying X2 with WQ, WK and WV gives the vectors q2, k2 and v2.
For example, each such matrix is randomly initialized with dimensions (64, 512). Note that the second dimension needs to be the same as the embedding dimension; the matrix values are continuously updated during backpropagation (BP), and the three resulting vectors have dimension 64, which is lower than the embedding dimension. The calculation process is illustrated in Fig. 5.
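Written out with the symbols of Fig. 5, and reading the stated (64, 512) dimensions as matrix-times-column-vector (the transposed convention x_i·W is equally possible; this is only one consistent interpretation), the three vectors are:

```latex
q_i = W^{Q} x_i, \qquad k_i = W^{K} x_i, \qquad v_i = W^{V} x_i, \qquad i = 1, 2,
\quad\text{with } W^{Q}, W^{K}, W^{V} \in \mathbb{R}^{64 \times 512},\;
x_i \in \mathbb{R}^{512},\; q_i, k_i, v_i \in \mathbb{R}^{64}.
```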
Then a score value of the self-attention mechanism layer needs to be calculated, which determines the degree of attention paid to other parts of the input sentence when a word at a certain position is encoded. The score is computed as the dot product of Query and Key. Taking Fig. 6 as an example (with the reference names of Fig. 5), for the word "Thinking", the scores of the other words with respect to this word are calculated: first q1·k1, and then, for the second word, q1·k2. The calculation process is shown in Fig. 6: in the example of Fig. 6, the dot product of q1 and k1 gives a Score of 112, and the dot product of q1 and k2 is 112.
Next, the dot-product result is divided by a constant; this value is typically the square root of the first dimension of the matrix mentioned above, for example 8 for 64, although other values may be chosen. The result is then passed through a softmax to determine the text and text credibility probability of the output bit; the result obtained indicates how strongly each word is correlated with the word at the current position, and the word at the current position itself will of course have a large value.
Building on Fig. 6, the calculation result of this step for the above example is shown in Fig. 7, where the reference numerals refer to those of Figs. 5 and 6; the notation in the figure indicates that the above Score is divided by 8, the square root of 64.
Then the Value vectors are multiplied by the values obtained from the softmax and summed; the result obtained is the value of the self-attention mechanism layer at the current node. The calculation result of this step for the above example is shown in Fig. 8, where the reference numerals refer to those of Figs. 5 to 7 and Sum denotes the addition.
In an actual application scenario, to increase the calculation speed, matrices are used to compute the Query, Key and Value matrices directly; the calculation result of this step is shown in Fig. 9.
The embedding values are then multiplied directly by the three weight matrices; the resulting matrix Q is multiplied by K, divided by the constant, passed through a softmax operation, and finally multiplied by the V matrix. The softmax calculation process for the above example is shown in Fig. 10, labeled with reference to Figs. 5 to 9.
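In matrix form, the computation illustrated in Figs. 9 and 10 is the standard scaled dot-product attention, restated here for reference (with d_k = 64, the first dimension of the weight matrices, so the divisor is 8):

```latex
\operatorname{Attention}(Q, K, V) = \operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V,
\qquad \sqrt{d_k} = \sqrt{64} = 8.
```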
In the above model, with the prior art, the self-attention mechanism layers of each encoder layer and each decoder layer need to call basic kernel functions in the process of computing softmax: for example, matrix multiplication kernel functions are called when the Query, Key and Value matrices are calculated, matrix dot-product and summation kernel functions are called when the Query and Key matrices of each layer are multiplied, and variance functions are called when the final softmax values are obtained.
According to this embodiment, the combined kernel function can be built in advance according to the call flow of these functions and then called instead, which greatly reduces the number of video memory reads and writes and improves running efficiency.
Furthermore, on the basis of the above, the combined kernel function can be operated through a thread group, intermediate data in the calculation process of the combined kernel function is read and written in a register by utilizing a thread group communication mode, and final output data of the combined kernel function is written in a video memory space, so that the efficiency of calling the combined kernel function can be further improved.
The present embodiment processes by inputting an input text vector into at least one encoder layer to form an implicit layer vector; and inputting the hidden layer vector into at least one decoder layer for processing to generate an output text vector, wherein in the data calculation process of the encoder layer and/or the decoder layer, a combined kernel function is called to process the data, the combined kernel function comprises at least two basic kernel functions, the basic kernel functions are used for completing mathematical level calculation of the data, and the combined kernel functions are used for completing functional level calculation of the data. The read-write frequency of the video memory can be greatly reduced, and the running efficiency can be improved.
Fig. 11 is a flow chart illustrating another method for operating a text processing model according to an embodiment of the present disclosure, which is based on the foregoing embodiment, and performs improved optimization. As shown in fig. 11, the method for operating a text processing model according to the present embodiment includes:
In step S1110, an input text vector is acquired.
In step S1120, a fixed-position video memory space is allocated to at least one computing module of the encoder layer, where the size of the video memory space is fixed, and the video memory space remains in a free state when no data is read and written.
Taking the models shown in Figs. 3 and 4 as an example, the self-attention mechanism modules of the first, third and fifth encoder layers of the Transformer model may be allocated a first video memory space at a fixed position, and the self-attention mechanism modules of the second, fourth and sixth encoder layers may be allocated a second video memory space at a fixed position. Because the self-attention mechanism modules of the second, fourth and sixth encoder layers read and write the allocated video memory sequentially rather than at the same time, their accesses never overlap, so the same video memory space (namely the second video memory space) can be reused, and the attention mechanism modules do not need to release the video memory space after finishing their read/write operations on it. This reduces the number of times video memory space is released and allocated while the model is running, and can further improve processing efficiency.
In order to give the text processing model a wider processing range, the size of the allocated video memory space is determined by the maximum length of text vector that can be input. This maximum can be determined from the historical maximum of input text vectors, so that the text processing model is able to process any text vector whose length does not exceed it.
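A host-side CUDA sketch of this fixed, reusable allocation is given below, assuming (as an illustration only, with made-up names and sizing) that odd-numbered encoder layers share one workspace and even-numbered layers share another; the buffers are sized for the maximum input length and never freed between requests:

```cuda
#include <cuda_runtime.h>

// Illustrative fixed-position workspaces for the self-attention modules:
// odd-numbered encoder layers (1, 3, 5) share ws_a, even-numbered (2, 4, 6) share ws_b.
struct AttentionWorkspace {
    float* ws_a = nullptr;
    float* ws_b = nullptr;
    size_t bytes = 0;
};

// Allocate once, sized by the maximum input text length the model should accept.
cudaError_t init_workspace(AttentionWorkspace* ws, int max_tokens, int hidden_dim) {
    ws->bytes = (size_t)max_tokens * hidden_dim * sizeof(float);
    cudaError_t err = cudaMalloc((void**)&ws->ws_a, ws->bytes);
    if (err != cudaSuccess) return err;
    return cudaMalloc((void**)&ws->ws_b, ws->bytes);   // kept for the model's lifetime
}

// Each layer just picks its fixed buffer; nothing is freed or reallocated per request.
float* layer_buffer(const AttentionWorkspace* ws, int layer_index /* 1-based */) {
    return (layer_index % 2 == 1) ? ws->ws_a : ws->ws_b;
}
```

With such a scheme, cudaMalloc and cudaFree are called only at model start-up and shut-down, matching the description of keeping the fixed-size spaces vacant between reads and writes.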
In step S1130, the text vector is input to at least one encoder layer for processing to form an implicit layer vector, and in the data calculation process of each encoder layer, a combined kernel function is called to process the data.
In step S1140, the implicit layer vector is input to at least one decoder layer for processing to generate an output text vector.
It should be noted that the decoder layer also includes at least two computing modules having a time-sharing computing relationship; the same fixed-position video memory space may likewise be allocated to these at least two computing modules, with its size set to be fixed and kept vacant when no data is being read or written, so that the at least two computing modules multiplex the same video memory space. Similarly, each module does not need to release the multiplexed video memory space after its read/write operations on it are finished, which reduces the number of times GPU video memory space is released and allocated while the text processing model is running, and can further improve processing efficiency.
Taking a Transformer model as the text processing model as an example, the self-attention mechanism layer of the Transformer model has a mechanism called "multi-head attention". In an embodiment, a logical equivalence transformation is performed on this operation, turning it into a calculation mode with higher concurrency; in the data calculation process of the encoder layer and/or the decoder layer, a logically equivalent computing module can be used to process the data, which improves concurrency and further improves processing efficiency.
For example, a splicing weight calculation module may be used to calculate the matrix of sequence vectors of the input text vector and generate a splicing matrix of the sequence vectors. For example, the splicing weight calculation module includes a query weight calculation matrix, a key weight calculation matrix, a numerical weight calculation matrix and the like that are spliced together, and the splicing matrix of the sequence vector includes a query weight matrix, a key weight matrix, a numerical weight matrix and the like that are spliced together.
Processing the data using the logically equivalent computing module may include: during data processing at the decoder layer, determining the text and text credibility probability of each output bit through a softmax function; and, for the current output bit, retaining only the partial texts and text credibility probabilities whose credibility meets a set condition, and passing them in the form of a variable vector to the next output bit for calculating its text and text credibility probability.
This scheme changes the inherent calculation process of the Transformer model while guaranteeing logical equivalence. In the multi-head attention process, query, key and value are normally obtained by mapping with three separate weight matrices; after the logical equivalence transformation, the three weight matrices can be spliced along their rows, and a single mapping against the spliced weight matrix yields the spliced (query, key, value) matrix, so that concurrency is improved. In addition, for example, in the process of calculating softmax, only the probability values of the top-K ranked outputs are saved for the subsequent beam search calculation, instead of computing the probability value of every symbol (token) of the whole vocabulary, which further improves processing efficiency.
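The splicing-based equivalence described above can be summarized as a single mapping; a sketch (the block notation and the per-head split are simplified, and whether the weights are concatenated along rows or columns depends on the layout convention):

```latex
\bigl[\,Q \;\; K \;\; V\,\bigr] \;=\; X\,\bigl[\,W^{Q} \;\; W^{K} \;\; W^{V}\,\bigr],
```

so one large matrix multiplication replaces three separate ones, and at each output bit only the top-K softmax probabilities are kept and passed on for the beam search rather than the full vocabulary distribution.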
As an implementation of the method shown in the above figures, the present application provides an embodiment of an operation device of a text processing model, fig. 12 shows a schematic structure of the operation device of the text processing model provided in this embodiment, where an embodiment of the device corresponds to the embodiment of the method shown in fig. 1 to 11, and the device may be applied to various electronic devices specifically. As shown in fig. 12, the operation device of the text processing model according to the present embodiment includes an input acquisition unit 1210, an encoder layer processing unit 1220, and a decoder layer processing unit 1230.
The input acquisition unit 1210 is configured to acquire an input text vector.
The encoder layer processing unit 1220 is configured for inputting the text vector into at least one encoder layer for processing to form an implicit layer vector.
The decoder layer processing unit 1230 is configured to input the hidden layer vector into at least one decoder layer for processing to generate an output text vector.
Wherein the encoder layer processing unit calls a combined kernel function to process data during the data calculation of the encoder layer, and/or the decoder layer processing unit calls a combined kernel function to process data during the data calculation of the decoder layer; the combined kernel function comprises at least two basic kernel functions, the basic kernel functions are used for completing mathematical-level calculation of the data, and the combined kernel function is used for completing functional-level calculation of the data.
Further, the base kernel function includes at least one of: matrix row average, matrix row variance, and matrix dot product;
the combined kernel function includes a normalized processing function including matrix row average, matrix row variance, and matrix dot product.
Further, the encoder layer processing unit 1220 and/or the decoder layer processing unit 1230 invoking a combined kernel function to process data during the data calculation of the encoder layer and/or the decoder layer includes:
In the data calculation process of the encoder layer and/or the decoder layer, distributing a thread group to the called combined kernel function, and reading in data from a video memory space;
operating the combined kernel function through a thread group, and reading and writing intermediate data in a calculation process of the combined kernel function in a register by utilizing a thread group communication mode;
and writing the final output data of the combined kernel function into a video memory space.
In an embodiment, the apparatus further comprises a video memory space configuration unit (not shown in fig. 12) for, prior to the encoder layer and/or decoder layer data calculation process: allocating a fixed-position video memory space for at least one computing module of the encoder layer and/or the decoder layer; the size of the video memory space is fixed, and the video memory space is kept in a vacant state when no data is read and written.
Further, the video memory space configuration unit is configured such that video memory spaces allocated to at least two of the calculation modules having a time-sharing calculation relationship are the same space multiplexed.
Further, in the video memory space configuration unit, the computing module multiplexing the same video memory space includes at least one group of following: a self-attention mechanism module of two encoder layers at intervals.
In one embodiment, the size of the video memory space is determined by the maximum value of the input text vector.
In one embodiment, in the process of calculating the data at the encoder layer by the encoder layer processing unit 1220 and/or the decoder layer by the decoder layer processing unit 1230, the method further comprises using a logical equivalent calculation module to process the data.
Further, processing the data with the logically equivalent computing module includes:
a splicing weight calculation module is adopted to calculate a matrix of the sequence vector of the input text vector, and a splicing matrix of the sequence vector is generated;
the splicing weight calculation module comprises a query weight calculation matrix, a key weight calculation matrix and a numerical weight calculation matrix which are spliced together; the splicing matrix of the sequence vector comprises a query weight matrix, a key weight matrix and a numerical weight matrix which are spliced together.
Further, processing the data with the logically equivalent computing module includes:
determining the text of the output bit and the text credibility probability through a softmax function in the data processing process of the decoder layer;
and for the current output bit, reserving part of texts and text credibility probabilities of which the text credibility meets the set conditions, and transmitting the part of texts and text credibility probabilities to the next output bit in a variable vector mode to calculate the text and text credibility probabilities.
In one embodiment, the text processing model includes a plurality of encoder layers connected in sequence and a plurality of decoder layers connected in sequence, with implicit layer vectors being transmitted between the last encoder layer and each decoder layer; each encoder layer includes at least a self-attention mechanism layer and a feedforward neural network layer; each decoder layer includes at least a self-attention mechanism layer, a codec attention mechanism layer, and a feedforward neural network layer.
The running device of the text processing model provided by the embodiment can execute the running method of the text processing model provided by the embodiment of the method disclosed by the invention, and has the corresponding functional modules and beneficial effects of the executing method.
Referring now to fig. 13, a schematic diagram of an electronic device 1300 suitable for use in implementing embodiments of the present disclosure is shown. The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 13 is merely an example and should not impose any limitations on the functionality and scope of use of embodiments of the present disclosure.
As shown in fig. 13, the electronic device 1300 may include a processing means (e.g., a central processor, a graphics processor, etc.) 1301, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1302 or a program loaded from a storage means 1308 into a Random Access Memory (RAM) 1303. In the RAM 1303, various programs and data necessary for the operation of the electronic apparatus 1300 are also stored. The processing device 1301, the ROM 1302, and the RAM 1303 are connected to each other through a bus 1304. An input/output (I/O) interface 1305 is also connected to bus 1304.
In general, the following devices may be connected to the I/O interface 1305: input devices 1306 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 1307 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 1308 including, for example, magnetic tape, hard disk, etc.; and communication means 1309. The communication means 1309 may allow the electronic device 1300 to communicate with other devices wirelessly or by wire to exchange data. While fig. 13 shows an electronic device 1300 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communications device 1309, or installed from the storage device 1308, or installed from the ROM 1302. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing device 1301.
It should be noted that, the computer readable medium described above in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the disclosed embodiments, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the disclosed embodiments, the computer-readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to:
acquiring an input text vector;
inputting the text vector into at least one encoder layer for processing to form an implicit layer vector;
inputting the hidden layer vector into at least one decoder layer for processing to generate an output text vector;
and in the data calculation process of the encoder layer and/or the decoder layer, a combined kernel function is called to process the data, the combined kernel function comprises at least two basic kernel functions, the basic kernel functions are used for completing mathematical level calculation of the data, and the combined kernel functions are used for completing functional level calculation of the data.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in one or more programming languages, including an object-oriented programming language such as Java, Smalltalk or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware. The name of the unit does not in any way constitute a limitation of the unit itself, for example the first acquisition unit may also be described as "unit acquiring at least two internet protocol addresses".
According to one or more embodiments of the present disclosure, in the method for operating a text processing model, the basic kernel function includes at least one of: matrix row average, matrix row variance, and matrix dot product;
the combined kernel function includes a normalized processing function including matrix row average, matrix row variance, and matrix dot product.
According to one or more embodiments of the present disclosure, in the method for operating a text processing model, in a data calculation process of the encoder layer and/or the decoder layer, invoking a combined kernel function to process data includes:
in the data calculation process of the encoder layer and/or the decoder layer, distributing a thread group to the called combined kernel function, and reading in data from a video memory space;
operating the combined kernel function through a thread group, and reading and writing intermediate data in a calculation process of the combined kernel function in a register by utilizing a thread group communication mode;
and writing the final output data of the combined kernel function into a video memory space.
According to one or more embodiments of the present disclosure, in the method for operating a text processing model, before the data calculation process of the encoder layer and/or the decoder layer, the method further includes:
Allocating a fixed-position video memory space for at least one computing module of the encoder layer and/or the decoder layer; the size of the video memory space is fixed, and the video memory space is kept in a vacant state when no data is read and written.
According to one or more embodiments of the present disclosure, in the method for operating a text processing model, the video memory spaces allocated by at least two computing modules having a time-sharing computing relationship are the same space that is multiplexed.
According to one or more embodiments of the present disclosure, the computing module that multiplexes the same video memory space includes at least one of the following:
a self-attention mechanism module of two encoder layers at intervals.
According to one or more embodiments of the present disclosure, in the method for operating a text processing model, the size of the video memory space is determined by a maximum value of the input text vector.
According to one or more embodiments of the present disclosure, in the operation method of the text processing model, in the data calculation process of the encoder layer and/or the decoder layer, the method further includes:
and processing the data by adopting a logic equivalent calculation module.
According to one or more embodiments of the present disclosure, in the method for operating a text processing model, processing data using a logically equivalent computing module includes:
A splicing weight calculation module is adopted to calculate a matrix of the sequence vector of the input text vector, and a splicing matrix of the sequence vector is generated;
the splicing weight calculation module comprises a query weight calculation matrix, a key weight calculation matrix and a numerical weight calculation matrix which are spliced together; the splicing matrix of the sequence vector comprises a query weight matrix, a key weight matrix and a numerical weight matrix which are spliced together.
According to one or more embodiments of the present disclosure, in the method for operating a text processing model, processing data using a logically equivalent computing module includes:
determining the text of the output bit and the text credibility probability through a softmax function in the data processing process of the decoder layer;
and for the current output bit, reserving part of texts and text credibility probabilities of which the text credibility meets the set conditions, and transmitting the part of texts and text credibility probabilities to the next output bit in a variable vector mode to calculate the text and text credibility probabilities.
According to one or more embodiments of the present disclosure, in the method for operating a text processing model, the text processing model includes a plurality of encoder layers sequentially connected and a plurality of decoder layers sequentially connected, and an implicit layer vector is transmitted between a last encoder layer and each decoder layer; each encoder layer includes at least a self-attention mechanism layer and a feedforward neural network layer; each decoder layer includes at least a self-attention mechanism layer, a codec attention mechanism layer, and a feedforward neural network layer.
According to one or more embodiments of the present disclosure, in the running device of the text processing model, the basic kernel function includes at least one of: matrix row average, matrix row variance, and matrix dot product;
the combined kernel function includes a normalized processing function including matrix row average, matrix row variance, and matrix dot product.
According to one or more embodiments of the present disclosure, in the running apparatus of the text processing model, the invoking, by the encoder layer processing unit and/or the decoder layer processing unit, of a combined kernel function to process data during the data calculation of the encoder layer and/or the decoder layer includes:
in the data calculation process of the encoder layer and/or the decoder layer, distributing a thread group to the called combined kernel function, and reading in data from a video memory space;
operating the combined kernel function through a thread group, and reading and writing intermediate data in a calculation process of the combined kernel function in a register by utilizing a thread group communication mode;
and writing the final output data of the combined kernel function into a video memory space.
According to one or more embodiments of the present disclosure, the running apparatus of the text processing model further includes a video memory space configuration unit for, prior to the data calculation process of the encoder layer and/or the decoder layer:
Allocating a fixed-position video memory space for at least one computing module of the encoder layer and/or the decoder layer; the size of the video memory space is fixed, and the video memory space is kept in a vacant state when no data is read and written.
According to one or more embodiments of the present disclosure, in the running device of the text processing model, the video memory space configuration unit is further configured such that the video memory spaces allocated to at least two computing modules having a time-sharing computing relationship are the same multiplexed space.
According to one or more embodiments of the present disclosure, in the video memory space configuration unit, the computing module that multiplexes the same video memory space includes at least one set of:
a self-attention mechanism module of two encoder layers at intervals.
According to one or more embodiments of the present disclosure, in the running apparatus of the text processing model, the size of the video memory space is determined by a maximum value of the input text vector.
According to one or more embodiments of the present disclosure, in the running apparatus of the text processing model, in the data calculation process of the encoder layer processing unit at the encoder layer and/or the decoder layer processing unit at the decoder layer, the method further includes processing data by using a logical equivalent calculation module.
According to one or more embodiments of the present disclosure, in the running device of the text processing model, the processing the data with the logical equivalent computing module includes:
a splicing weight calculation module is adopted to calculate a matrix of the sequence vector of the input text vector, and a splicing matrix of the sequence vector is generated;
the splicing weight calculation module comprises a query weight calculation matrix, a key weight calculation matrix and a numerical weight calculation matrix which are spliced together; the splicing matrix of the sequence vector comprises a query weight matrix, a key weight matrix and a numerical weight matrix which are spliced together.
According to one or more embodiments of the present disclosure, in the running apparatus of the text processing model, the processing of the data with the logically equivalent computing module includes:
determining, in the data processing of the decoder layer, the texts of an output position and their text confidence probabilities through a softmax function;
and, for the current output position, retaining only those texts and text confidence probabilities whose confidence meets the set condition, and passing them as a variable-length vector to the next output position for calculating the texts and text confidence probabilities there.
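Read this way, the step resembles beam-search pruning. The host-side sketch below assumes the "set condition" is simply keeping the top beam_width candidates by accumulated confidence, which is an illustrative choice rather than the patent's definition; the Candidate struct and all names are hypothetical.

```cuda
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// A candidate output text and its accumulated confidence (log probability).
struct Candidate {
    std::vector<int> tokens;
    float log_prob = 0.0f;
};

// probs[b][v]: softmax probability of vocabulary entry v for surviving
// candidate b at the current output position. Every candidate is expanded,
// then only the top beam_width by confidence are kept and handed to the
// next output position as a variable-length vector.
std::vector<Candidate> prune_step(const std::vector<Candidate>& beams,
                                  const std::vector<std::vector<float>>& probs,
                                  int beam_width) {
    std::vector<Candidate> expanded;
    for (std::size_t b = 0; b < beams.size(); ++b) {
        for (std::size_t v = 0; v < probs[b].size(); ++v) {
            Candidate c = beams[b];
            c.tokens.push_back(static_cast<int>(v));
            c.log_prob += std::log(probs[b][v] + 1e-12f);
            expanded.push_back(std::move(c));
        }
    }
    int keep = std::min<int>(beam_width, static_cast<int>(expanded.size()));
    std::partial_sort(expanded.begin(), expanded.begin() + keep, expanded.end(),
                      [](const Candidate& a, const Candidate& b) {
                          return a.log_prob > b.log_prob;
                      });
    expanded.resize(keep);
    return expanded;
}
```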
According to one or more embodiments of the present disclosure, in the running apparatus of the text processing model, the text processing model includes a plurality of sequentially connected encoder layers and a plurality of sequentially connected decoder layers, and a hidden layer vector is transmitted between the last encoder layer and each decoder layer; each encoder layer includes at least a self-attention mechanism layer and a feedforward neural network layer; and each decoder layer includes at least a self-attention mechanism layer, an encoder-decoder (codec) attention mechanism layer, and a feedforward neural network layer.
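For orientation only, the layer composition described above can be sketched as the following data structure; the module types are empty placeholders and every name is hypothetical rather than taken from the patent.

```cuda
#include <vector>

// Empty placeholder module types, named after the layers described above.
struct SelfAttention {};
struct EncoderDecoderAttention {};
struct FeedForward {};

struct EncoderLayer {
    SelfAttention self_attention;  // self-attention mechanism layer
    FeedForward   feed_forward;    // feedforward neural network layer
};

struct DecoderLayer {
    SelfAttention           self_attention;             // self-attention mechanism layer
    EncoderDecoderAttention encoder_decoder_attention;  // codec attention mechanism layer
    FeedForward             feed_forward;               // feedforward neural network layer
};

struct TextProcessingModel {
    std::vector<EncoderLayer> encoder_layers;  // connected in sequence
    std::vector<DecoderLayer> decoder_layers;  // each receives the last encoder layer's hidden vector
};
```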
The foregoing description covers only preferred embodiments of the present disclosure and illustrates the technical principles employed. Those skilled in the art will appreciate that the scope of the disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the embodiments of the present disclosure.

Claims (12)

1. A method of operating a text processing model, the text processing model including at least one encoder layer and at least one decoder layer, the method comprising:
acquiring an input text vector;
inputting the text vector into at least one encoder layer for processing to form a hidden layer vector;
inputting the hidden layer vector into at least one decoder layer for processing to generate an output text vector;
in the data calculation process of the encoder layer and/or the decoder layer, a combined kernel function is called to process data, the combined kernel function comprises at least two basic kernel functions, the basic kernel functions are used for completing mathematical level calculation of the data, and the combined kernel function is used for completing functional level calculation of the data;
wherein, in the data calculation process of the encoder layer and/or the decoder layer, calling the combined kernel function to process the data comprises:
in the data calculation process of the encoder layer and/or the decoder layer, allocating a thread group to the called combined kernel function and reading input data from a video memory space;
running the combined kernel function through the thread group, and reading and writing the intermediate data produced during its calculation in registers by means of thread-group communication;
and writing the final output data of the combined kernel function into the video memory space.
2. The method according to claim 1, characterized in that:
the basic kernel function includes at least one of: matrix row average, matrix row variance, and matrix dot product;
the combined kernel function includes a normalization processing function, which comprises matrix row average, matrix row variance, and matrix dot product.
3. The method according to claim 1, further comprising, prior to the encoder layer and/or decoder layer data calculation process:
allocating a video memory space at a fixed position for at least one computing module of the encoder layer and/or the decoder layer; the size of the video memory space is fixed, and the space remains reserved, in a vacant state, when no data is being read or written.
4. The method according to claim 3, wherein the video memory spaces allocated to at least two computing modules having a time-shared computing relationship are multiplexed as the same space.
5. The method according to claim 4, wherein the computing modules that multiplex the same video memory space comprise at least the following set:
the self-attention mechanism modules of two encoder layers that are spaced apart from each other.
6. The method according to claim 3, wherein the size of the video memory space is determined by the maximum size of the input text vector.
7. The method according to claim 1, further comprising, in the data calculation process of the encoder layer and/or the decoder layer:
processing the data by using a logically equivalent computing module;
wherein processing the data by using the logically equivalent computing module comprises:
using a spliced (concatenated) weight calculation module to perform matrix calculation on the sequence vectors of the input text vector, generating a spliced matrix of the sequence vectors;
wherein the spliced weight calculation module comprises a query weight calculation matrix, a key weight calculation matrix and a value weight calculation matrix spliced together, and the spliced matrix of the sequence vectors comprises a query weight matrix, a key weight matrix and a value weight matrix spliced together.
8. The method according to claim 7, wherein processing the data by using the logically equivalent computing module comprises:
determining, in the data processing of the decoder layer, the texts of an output position and their text confidence probabilities through a softmax function;
and, for the current output position, retaining only those texts and text confidence probabilities whose confidence meets the set condition, and passing them as a variable-length vector to the next output position for calculating the texts and text confidence probabilities there.
9. The method according to any one of claims 1-8, wherein the text processing model comprises a plurality of sequentially connected encoder layers and a plurality of sequentially connected decoder layers, and a hidden layer vector is transmitted between the last encoder layer and each decoder layer; each encoder layer includes at least a self-attention mechanism layer and a feedforward neural network layer; and each decoder layer includes at least a self-attention mechanism layer, an encoder-decoder (codec) attention mechanism layer, and a feedforward neural network layer.
10. An apparatus for running a text processing model, the text processing model comprising at least one encoder layer and at least one decoder layer, the apparatus comprising:
an input acquisition unit configured to acquire an input text vector;
an encoder layer processing unit, configured to input the text vector into at least one encoder layer for processing to form a hidden layer vector;
a decoder layer processing unit, configured to input the hidden layer vector into at least one decoder layer for processing to generate an output text vector;
wherein the encoder layer processing unit and/or the decoder layer processing unit call a combined kernel function to process data in the data calculation process of the encoder layer and/or the decoder layer, the combined kernel function comprises at least two basic kernel functions, the basic kernel functions are used for completing mathematical level calculation of the data, and the combined kernel function is used for completing functional level calculation of the data;
wherein calling the combined kernel function to process the data comprises:
in the data calculation process of the encoder layer and/or the decoder layer, allocating a thread group to the called combined kernel function and reading input data from a video memory space;
running the combined kernel function through the thread group, and reading and writing the intermediate data produced during its calculation in registers by means of thread-group communication;
and writing the final output data of the combined kernel function into the video memory space.
11. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-9.
12. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1-9.
CN201911222138.5A 2019-12-03 2019-12-03 Text processing model running method and device, electronic equipment and storage medium Active CN110909527B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911222138.5A CN110909527B (en) 2019-12-03 2019-12-03 Text processing model running method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911222138.5A CN110909527B (en) 2019-12-03 2019-12-03 Text processing model running method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110909527A CN110909527A (en) 2020-03-24
CN110909527B true CN110909527B (en) 2023-12-08

Family

ID=69822124

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911222138.5A Active CN110909527B (en) 2019-12-03 2019-12-03 Text processing model running method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110909527B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112818663A (en) * 2021-01-15 2021-05-18 北京有竹居网络技术有限公司 Processing method for language model, text generation method, text generation device and medium
CN115129233B (en) * 2021-03-26 2024-03-19 中科寒武纪科技股份有限公司 Data processing device, method and related product
CN117059081A (en) * 2023-08-30 2023-11-14 易方信息科技股份有限公司 Lightweight voice recognition method, computer equipment and readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104834746A (en) * 2015-05-23 2015-08-12 华东交通大学 Heterogeneous feature time sequence data evolution and clustering method based on graphic processing unit
CN109543200A (en) * 2018-11-30 2019-03-29 腾讯科技(深圳)有限公司 A kind of text interpretation method and device
CN109635269A (en) * 2019-01-31 2019-04-16 苏州大学 A kind of post-editing method and device of machine translation text
CN109783827A (en) * 2019-01-31 2019-05-21 沈阳雅译网络技术有限公司 A kind of deep layer nerve machine translation method based on dynamic linear polymerization
US10380236B1 (en) * 2017-09-22 2019-08-13 Amazon Technologies, Inc. Machine learning system for annotating unstructured text
CN110502627A (en) * 2019-08-28 2019-11-26 上海海事大学 A kind of answer generation method based on multilayer Transformer polymerization encoder

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10437936B2 (en) * 2018-02-01 2019-10-08 Jungle Disk, L.L.C. Generative text using a personality model


Also Published As

Publication number Publication date
CN110909527A (en) 2020-03-24

Similar Documents

Publication Publication Date Title
CN107766148B (en) Heterogeneous cluster and task processing method and device
US10789544B2 (en) Batching inputs to a machine learning model
CN110909527B (en) Text processing model running method and device, electronic equipment and storage medium
CN110321958B (en) Training method of neural network model and video similarity determination method
CN110413812B (en) Neural network model training method and device, electronic equipment and storage medium
CN108460458B (en) Method, system and storage medium for executing computation graph on graphic processing unit
WO2022151966A1 (en) Processing method and apparatus for language model, text generation method and apparatus, and medium
WO2020207174A1 (en) Method and apparatus for generating quantized neural network
US20230119229A1 (en) Augmenting neural networks
CN115908640A (en) Method and device for generating image, readable medium and electronic equipment
CN111681661B (en) Speech recognition method, apparatus, electronic device and computer readable medium
WO2022228067A1 (en) Speech processing method and apparatus, and electronic device
CN110009101B (en) Method and apparatus for generating a quantized neural network
CN113191257B (en) Order of strokes detection method and device and electronic equipment
CN112416303B (en) Software development kit hot repair method and device and electronic equipment
CN116741197B (en) Multi-mode image generation method and device, storage medium and electronic equipment
WO2023185896A1 (en) Text generation method and apparatus, and computer device and storage medium
CN110069195B (en) Image dragging deformation method and device
CN111460211A (en) Audio information playing method and device and electronic equipment
CN116957006A (en) Training method, device, equipment, medium and program product of prediction model
CN111475618B (en) Method and device for generating information
CN113570053A (en) Neural network model training method and device and computing equipment
CN113761416A (en) Request processing method, device, server and storage medium
CN112346728B (en) Device adaptation method, apparatus, device and computer readable medium
CN116994572A (en) Voice recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant