CN116991431A - GPU-based coding and decoding model static deployment method, electronic equipment and medium

Info

Publication number: CN116991431A
Application number: CN202310983908.8A
Authority: CN
Inventors: 谢佳形
Original assignee: Muxi Integrated Circuit Hangzhou Co ltd
Current assignee: Muxi Lingzhi Technology Hangzhou Co ltd
Priority date: 2023-08-04
Filing date: 2023-08-04
Publication date: 2023-11-03
Anticipated expiration: 2043-08-04
Also published as: CN116991431B

Abstract

The application relates to the technical field of computers, and in particular to a static deployment method for a coding and decoding (codec) model based on a graphics processing unit (GPU), an electronic device, and a medium. The method comprises the following steps: S1, acquiring an original feature matrix, and padding the columns of the original feature matrix to generate a padded feature matrix; S2, acquiring a relative position coding matrix corresponding to the padded feature matrix; S3, inputting the relative position coding matrix into an encoder of the codec model to generate relative position coding information; S4, padding the historical prediction information into R-bit input information, inputting the padded R-bit input information and the current number of valid information bits X in the R-bit input information into a decoder of the codec model, reading the relative position coding information, and generating the (X+1)-th prediction information, wherein X ranges from 0 to R-1, and generating target information based on the R-1 prediction information. The application realizes static deployment of the codec model on the GPU and meets the inference-speed requirement of the codec model.

Description

GPU-based coding and decoding model static deployment method, electronic equipment and medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a static deployment method for a codec model based on a GPU, an electronic device, and a medium.

Background

For some encoder-decoder models (hereinafter, codec models), such as automatic speech recognition (ASR) models, the input length changes dynamically, so the output length and the intermediate feature length change dynamically as well. ASR, a technology that converts human speech into text, is an important application in the AI field. Codec models place high demands on inference speed, so a trained codec model generally needs to be deployed on a graphics processing unit (GPU). Some GPUs support dynamic deployment of the codec model, but others do not. How to statically deploy the codec model on a GPU while meeting its inference-speed requirement has therefore become a technical problem to be solved urgently.

Disclosure of Invention

The application aims to provide a GPU-based static deployment method for a codec model, an electronic device, and a medium, which realize static deployment of the codec model on the GPU and meet the inference-speed requirement of the codec model.

According to a first aspect of the present application, there is provided a GPU-based codec model static deployment method, including:

S1, acquiring an original feature matrix, and padding the columns of the original feature matrix to generate a padded feature matrix, wherein the number of rows and the number of columns of the original feature matrix are both M, the number of rows of the padded feature matrix is M, the number of columns of the padded feature matrix is N, and N > M;

S2, acquiring a relative position coding matrix corresponding to the padded feature matrix, wherein the number of rows of the relative position coding matrix is N-1, and the number of columns of the relative position coding matrix is M;

S3, inputting the relative position coding matrix into an encoder of the codec model to generate relative position coding information, and storing the relative position coding information in a preset memory;

S4, padding the historical prediction information into R-bit input information, inputting the padded R-bit input information and the current number of valid information bits X in the R-bit input information into a decoder of the codec model, reading the relative position coding information from the preset memory, and generating the (X+1)-th prediction information, wherein X ranges from 0 to R-1, and generating target information based on the (R-1)-th prediction information.

According to a second aspect of the present application, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being arranged to perform the method according to the first aspect of the application.

According to a third aspect of the present application there is provided a computer readable storage medium storing computer executable instructions for performing the method of the first aspect of the present application.

Compared with the prior art, the application has obvious advantages and beneficial effects. By means of the above technical solution, the GPU-based codec model static deployment method, the electronic device, and the medium provided by the application achieve considerable technical progress and practicality, have wide industrial utilization value, and have at least the following beneficial effects:

According to the embodiment of the application, the original feature matrix is padded to generate the padded feature matrix, the corresponding relative position coding matrix is obtained based on the padded feature matrix, and the relative position coding information is generated based on the relative position coding matrix. In the decoding process, the padded R-bit input information and the current number of valid information bits X in the R-bit input information are input into the decoder of the codec model to predict the (X+1)-th prediction information. Static deployment of the codec model on the GPU is thereby realized, and the inference-speed requirement of the codec model is met.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart of a static deployment method for a GPU-based codec model according to an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to fall within the scope of the application.

The embodiment of the application provides a GPU-based coding and decoding model static deployment method, which is shown in fig. 1 and comprises the following steps:

S1, acquiring an original feature matrix, and padding the columns of the original feature matrix to generate a padded feature matrix, wherein the number of rows and the number of columns of the original feature matrix are both M, the number of rows of the padded feature matrix is M, the number of columns of the padded feature matrix is N, and N > M.

The number of columns of the original feature matrix may change dynamically. Padding fixes the number of columns of the matrix input to the codec model, which enables static deployment of the encoder of the codec model. The codec model may specifically be an ASR model.
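The padding in step S1 can be illustrated with a minimal NumPy sketch. The function name `pad_columns`, the use of zero as the padding value, and right-side padding are assumptions made for illustration; the text above only states that columns are padded from M up to a fixed N.

```python
import numpy as np

def pad_columns(features: np.ndarray, n_cols: int) -> np.ndarray:
    """Right-pad a 2-D feature matrix with zero columns up to n_cols.

    Assumption: the padding value is 0 and padding is appended on the right.
    """
    m_rows, m_cols = features.shape
    if n_cols <= m_cols:
        raise ValueError("n_cols must exceed the current column count (N > M)")
    padded = np.zeros((m_rows, n_cols), dtype=features.dtype)
    padded[:, :m_cols] = features  # copy the original columns unchanged
    return padded

# Example: an M x M original matrix with M = 3, padded to N = 5 columns.
original = np.arange(1, 10, dtype=np.float32).reshape(3, 3)
padded = pad_columns(original, n_cols=5)
```

Because N is fixed ahead of time, every input reaching the encoder has the same shape regardless of the actual value of M, which is what a statically compiled graph requires.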

S2, acquiring a relative position coding matrix corresponding to the padded feature matrix, wherein the number of rows of the relative position coding matrix is N-1, and the number of columns of the relative position coding matrix is M.

S3, inputting the relative position coding matrix into an encoder of the codec model to generate relative position coding information, and storing the relative position coding information in a preset memory.

In the prior art, the model can be run dynamically so that each prediction is made for the last bit of the input. Because the present application is statically deployed, the input at each step is instead the padded R-bit input information, and the bit to be predicted at each step is generally not the last bit of that padded input.

It should be noted that, in the prior art, the conversion from absolute position coding to relative position coding can be implemented directly by applying rel_shift to the original feature matrix. In the present application, however, the original feature matrix may change dynamically and padded features are involved, so the existing rel_shift cannot be used directly. On this basis, the embodiment of the application provides an approach suited to this setting that accurately implements the conversion from absolute position coding to relative position coding. As one embodiment, the elements of the relative position coding matrix are $a_{ij}$, where $i$ is the row number of $a_{ij}$, $j$ is the column number of $a_{ij}$, $i$ ranges from 1 to $M$, and $j$ ranges from 1 to $N-1$. The relative position coding matrix is divided into a first region, a second region, and a third region. An element $a_{ij}$ in the first region satisfies $i \le M-1$, $j < M$, and $a_{ij}$ lies to the left of $a_{i(i+1)}$. An element $a_{ij}$ in the second region satisfies $i < M-1$, $2 < j \le M$, and $a_{ij}$ lies to the right of $a_{i(i+1)}$. The elements in the third region satisfy $j = i+1$ or $i > M-1$; it can be understood that the third region consists of all parts of the relative position coding matrix outside the first region and the second region. Step S2 includes:

Step S21, setting a first matrix and a second matrix, wherein the number of rows and the number of columns of the first matrix and the second matrix are M, each row of the first matrix is $(1, 2, 3, \dots, m, \dots, M)$, each column of the second matrix is $(1, 2, 3, \dots, m, \dots, M)$, and $m$ ranges from 1 to $M$.

Step S22, a first intermediate matrix and a second intermediate matrix are obtained based on the first matrix and the second matrix, and the number of rows and the number of columns of the first intermediate matrix and the second intermediate matrix are M.

Step S23, determining elements in a first area in a relative position coding matrix based on the first intermediate matrix, determining elements in a second area based on the second intermediate matrix, setting all elements in a third area to 0 based on a mask operation, and generating the relative position matrix.

Note that the portion corresponding to the third region is a portion not requiring attention, and therefore the elements of the third region are directly set to 0 entirely based on the masking operation.

As an embodiment, the first matrix includes elements $b_{xy}$, the second matrix includes elements $c_{xy}$, the first intermediate matrix includes elements $d_{xy}$, and the second intermediate matrix includes elements $e_{xy}$, where $x$ is the row number and $y$ is the column number of $b_{xy}$, $c_{xy}$, $d_{xy}$, and $e_{xy}$, $x$ ranges from 1 to $M$, and $y$ ranges from 1 to $M$. Step S22 includes:

Step S221, determining $d_{xy}$ based on $b_{xy}$, $c_{xy}$, and $M$: $d_{xy} = b_{xy} - c_{xy} + M - 1$.

Based on $d_{xy} = b_{xy} - c_{xy} + M - 1$, each $d_{xy}$ can be determined to obtain the first intermediate matrix, which guides the generation of the elements in the first region of the relative position coding matrix.

Step S222, determining $e_{xy}$ based on $b_{xy}$ and $c_{xy}$: $e_{xy} = b_{xy} - c_{xy} - 1$.

Based on $e_{xy} = b_{xy} - c_{xy} - 1$, each $e_{xy}$ can be determined to obtain the second intermediate matrix, which guides the generation of the elements in the second region of the relative position coding matrix.
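The construction of the first and second matrices and of the two intermediate matrices can be sketched in NumPy as follows. The function name `intermediate_matrices` is hypothetical; the values keep the 1-based convention of the text, so the entries of the intermediate matrices are 1-based column numbers rather than array offsets.

```python
import numpy as np

def intermediate_matrices(m: int):
    """Build b, c, d, e for a given M, following the formulas above."""
    idx = np.arange(1, m + 1)
    b = np.tile(idx, (m, 1))            # every row is (1, ..., M), so b[x, y] = y
    c = np.tile(idx[:, None], (1, m))   # every column is (1, ..., M), so c[x, y] = x
    d = b - c + m - 1                   # first intermediate matrix
    e = b - c - 1                       # second intermediate matrix
    return b, c, d, e

b, c, d, e = intermediate_matrices(4)
print(d)
print(e)
```

Since $b_{xy} = y$ and $c_{xy} = x$, the formulas reduce to $d_{xy} = y - x + M - 1$ and $e_{xy} = y - x - 1$, so each intermediate matrix depends only on the offset between column and row.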

As an embodiment, the step S23 includes:

Step S231, determining the element located in row $i$ and column $d_{ij}$ of the original feature matrix as the element $a_{ij}$ in the first region, where $d_{ij}$ is the element in row $i$ and column $j$ of the first intermediate matrix.

The first region is the lower triangular region of the relative position coding matrix, so step S231 only needs the elements in the lower triangular region of the first intermediate matrix. For each element in that region, its row is taken as the row in the original feature matrix and the value of $d_{ij}$ is taken as the column in the original feature matrix; the corresponding element is then fetched from the original feature matrix and filled into the corresponding position in the first region of the relative position coding matrix.

Step S232, determining the element located in row $i$ and column $e_{ij}$ of the original feature matrix as the element $a_{ij}$ in the second region, where $e_{ij}$ is the element in row $i$ and column $j$ of the second intermediate matrix.

The second region is the upper triangular region of the relative position coding matrix, so step S232 only needs the elements in the upper triangular region of the second intermediate matrix. For each element in that region, its row is taken as the row in the original feature matrix and the value of $e_{ij}$ is taken as the column in the original feature matrix; the corresponding element is then fetched from the original feature matrix and filled into the corresponding position in the second region of the relative position coding matrix.
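Steps S231 and S232, together with the third-region mask of step S23, can be sketched as one gather operation. This is an illustrative reading rather than the patented implementation. It assumes the relative position coding matrix is stored as an array indexed by $a_{ij}$ with $i$ from 1 to M and $j$ from 1 to N-1, converts the 1-based indices of the text to 0-based array offsets, and uses the hypothetical name `build_relative_matrix`.

```python
import numpy as np

def build_relative_matrix(features: np.ndarray, n: int) -> np.ndarray:
    """Fill the first and second regions by gathering from the original
    M x M feature matrix; everything else (third region) stays 0."""
    m = features.shape[0]
    i = np.arange(1, m + 1)[:, None]   # 1-based row numbers, shape (M, 1)
    j = np.arange(1, n)[None, :]       # 1-based column numbers, shape (1, N-1)

    # Region definitions from the text; "left/right of a_i(i+1)" is read
    # as j < i + 1 and j > i + 1 respectively.
    first = (i <= m - 1) & (j < m) & (j < i + 1)
    second = (i < m - 1) & (j > 2) & (j <= m) & (j > i + 1)

    # With b[x, y] = y and c[x, y] = x, the intermediate matrices reduce to:
    d = j - i + m - 1
    e = j - i - 1

    rel = np.zeros((m, n - 1), dtype=features.dtype)   # third region = 0
    rows = np.broadcast_to(i, rel.shape)
    rel[first] = features[rows[first] - 1, d[first] - 1]
    rel[second] = features[rows[second] - 1, e[second] - 1]
    return rel

features = np.arange(1, 17).reshape(4, 4)   # M = 4
rel = build_relative_matrix(features, n=6)  # N = 6
print(rel)
```

All index arithmetic here depends only on M and N, not on the data, so the index arrays and masks could be precomputed once for a fixed input shape.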

As an embodiment, the step S3 includes:

Step S31, inputting the relative position coding matrix into the encoder of the codec model to execute depthwise convolution operations and generate relative position coding information, wherein the elements of the third region of the relative position coding matrix are reset to 0 based on a mask operation before each depthwise convolution operation is executed.

It should be noted that the encoder performs multiple depthwise convolution operations. Although the elements of the third region of the relative position coding matrix are set to 0 by the mask operation in the initial state, the depthwise convolution involves computations such as correlation between neighboring elements, so some elements of the third region may no longer be 0 afterwards. If these elements were not reset, the encoding result would be affected. Therefore, before each depthwise convolution operation, the elements of the third region of the relative position coding matrix need to be reset to 0 based on the mask operation to ensure the accuracy of the encoding.
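The effect of re-masking before every convolution can be seen in a small NumPy sketch. The helper names `depthwise_conv1d` and `encode`, the layout (channels by positions), and the boolean `keep` mask marking the non-third-region positions are all assumptions for illustration; a real encoder would contain other layers between the convolutions.

```python
import numpy as np

def depthwise_conv1d(x: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Convolve each row (channel) of x with its own kernel, 'same' length.

    Assumes each kernel is no longer than a row of x.
    """
    return np.stack([np.convolve(row, k, mode="same")
                     for row, k in zip(x, kernels)])

def encode(x: np.ndarray, keep: np.ndarray, layer_kernels: list) -> np.ndarray:
    """Apply the layers in order, zeroing masked positions before each one."""
    for kernels in layer_kernels:
        x = np.where(keep, x, 0.0)          # reset masked entries to 0
        x = depthwise_conv1d(x, kernels)    # may leak values into masked entries
    return x

x = np.ones((2, 6))
keep = np.array([True, True, True, True, False, False])
layers = [np.ones((2, 3)), np.ones((2, 3))]
out = encode(x, keep, layers)
print(out)
```

In this toy case the first convolution writes a nonzero value into a masked position; without the reset, that value would feed into the second convolution and change the results at valid positions.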

As an embodiment, the step S4 includes:

Step S41, initially setting X = 0, and initially setting the historical prediction information to $U_0$.

$U_0$ is used to predict $U_1$. It will be appreciated that $U_0$ is not set randomly but is determined according to the specific encoded content.

Step S42, generating R-bit input information from the historical prediction information by padding the missing part with 0, inputting the current padded R-bit input information and X into the decoder of the codec model, reading the relative position coding information from the preset memory, and generating the (X+1)-th prediction information $U_{X+1}$.

Inputting X into the decoder of the codec model is equivalent to specifying in advance the bit to be predicted. This ensures that each input is R bits and preserves accuracy while realizing static deployment.

Step S43, if X < R-1, setting the historical prediction information to $(U_0, U_1, \dots, U_{X+1})$, setting X = X+1, and returning to step S42; if X = R-1, generating target information based on the R-1 pieces of prediction information.
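The loop of steps S41 to S43 can be sketched as follows. The decoder is represented by a hypothetical callable `decoder_step(padded_input, x)` that stands in for the statically deployed decoder; reading the relative position coding information from the preset memory is assumed to happen inside that callable. The function returns every prediction it produced and leaves the assembly of the target information to the caller.

```python
import numpy as np
from typing import Callable, List

def static_decode(u0: int, r: int,
                  decoder_step: Callable[[np.ndarray, int], int]) -> List[int]:
    """Run the fixed-shape decoding loop for X = 0 .. R-1."""
    history = [u0]                          # S41: history starts as (U_0)
    predictions = []
    for x in range(r):
        padded = np.zeros(r, dtype=np.int64)
        padded[:len(history)] = history     # S42: pad the history to R entries with 0
        u_next = decoder_step(padded, x)    # decoder predicts U_{X+1}
        predictions.append(u_next)
        history.append(u_next)              # S43: extend history, then X = X + 1
    return predictions

# Toy stand-in decoder (an assumption): returns the entry at position X plus 1.
def toy_decoder(padded: np.ndarray, x: int) -> int:
    return int(padded[x]) + 1

print(static_decode(u0=10, r=4, decoder_step=toy_decoder))
```

Every call to `decoder_step` receives an array of exactly R entries plus the scalar X, so the decoder input shape never changes between steps.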

It should be noted that some exemplary embodiments are described as a process or a method depicted as a flowchart. Although a flowchart depicts steps as a sequential process, many of the steps may be implemented in parallel, concurrently, or with other steps. Furthermore, the order of the steps may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figures. The processes may correspond to methods, functions, procedures, subroutines, and the like.

The embodiment of the application also provides electronic equipment, which comprises: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being configured to perform the methods of embodiments of the present application.

The embodiment of the application also provides a computer readable storage medium, which stores computer executable instructions for executing the method according to the embodiment of the application.

The present application is not limited to the above-mentioned embodiments, and any modifications and equivalents can be made to the above-mentioned embodiments without departing from the scope of the application.

Claims

1. The method for statically deploying the encoding and decoding model based on the GPU is characterized by comprising the following steps of:

2. The method of claim 1, wherein

the elements of the relative position coding matrix are $a_{ij}$, where $i$ is the row number of $a_{ij}$, $j$ is the column number of $a_{ij}$, $i$ ranges from 1 to $M$, and $j$ ranges from 1 to $N-1$; the relative position coding matrix is divided into a first region, a second region, and a third region; an element $a_{ij}$ in the first region satisfies $i \le M-1$, $j < M$, and $a_{ij}$ lies to the left of $a_{i(i+1)}$; an element $a_{ij}$ in the second region satisfies $i < M-1$, $2 < j \le M$, and $a_{ij}$ lies to the right of $a_{i(i+1)}$; the elements in the third region satisfy $j = i+1$ or $i > M-1$; the step S2 includes:

step S21, setting a first matrix and a second matrix, wherein the number of rows and the number of columns of the first matrix and the second matrix are M, each row of the first matrix is $(1, 2, 3, \dots, m, \dots, M)$, each column of the second matrix is $(1, 2, 3, \dots, m, \dots, M)$, and $m$ ranges from 1 to $M$;

step S22, a first intermediate matrix and a second intermediate matrix are obtained based on the first matrix and the second matrix, and the number of rows and the number of columns of the first intermediate matrix and the second intermediate matrix are M;

3. The method of claim 2, wherein

the first matrix includes elements $b_{xy}$, the second matrix includes elements $c_{xy}$, the first intermediate matrix includes elements $d_{xy}$, and the second intermediate matrix includes elements $e_{xy}$, where $x$ is the row number and $y$ is the column number of $b_{xy}$, $c_{xy}$, $d_{xy}$, and $e_{xy}$, $x$ ranges from 1 to $M$, and $y$ ranges from 1 to $M$; the step S22 includes:

step S221, determining $d_{xy}$ based on $b_{xy}$, $c_{xy}$, and $M$: $d_{xy} = b_{xy} - c_{xy} + M - 1$;

4. The method of claim 3, wherein

the step S23 includes:

step S231, determining the element located in row $i$ and column $d_{ij}$ of the original feature matrix as the element $a_{ij}$ in the first region, where $d_{ij}$ is the element in row $i$ and column $j$ of the first intermediate matrix;

5. The method of claim 1, wherein

the step S3 includes:

6. The method of claim 1, wherein

the step S4 includes:

step S41, initially setting X = 0, and initially setting the historical prediction information to $U_0$;

step S42, generating R-bit input information from the historical prediction information by padding the missing part with 0, inputting the current padded R-bit input information and X into the decoder of the codec model, reading the relative position coding information from the preset memory, and generating the (X+1)-th prediction information $U_{X+1}$;

7. The method of claim 1, wherein

the coding and decoding model is an ASR model.

8. An electronic device, comprising:

at least one processor;

and a memory communicatively coupled to the at least one processor;

wherein the memory stores instructions executable by the at least one processor, the instructions being arranged to perform the method of any of the preceding claims 1-7.

9. A computer readable storage medium, characterized in that computer executable instructions are stored for performing the method of any of the preceding claims 1-7.