Disclosure of Invention
The application aims to provide a GPU-based static deployment method for a codec model, an electronic device, and a medium, which achieve static deployment of the codec model on the GPU and meet the inference-speed requirement of the codec model.
According to a first aspect of the present application, there is provided a GPU-based codec model static deployment method, including:
S1: acquiring an original feature matrix, and padding the columns of the original feature matrix to generate a padded feature matrix, wherein the number of rows and the number of columns of the original feature matrix are M, the number of rows of the padded feature matrix is M, the number of columns of the padded feature matrix is N, and N > M;
S2: acquiring a relative position coding matrix corresponding to the padded feature matrix, wherein the number of rows of the relative position coding matrix is N-1, and the number of columns of the relative position coding matrix is M;
S3: inputting the relative position coding matrix into an encoder of the codec model to generate relative position coding information, and storing the relative position coding information in a preset memory;
S4: padding the historical prediction information into R-bit input information, inputting the padded R-bit input information and the current number X of valid information bits in the R-bit input information into a decoder of the codec model, reading the relative position coding information from the preset memory, and generating the (X+1)th prediction information, wherein X ranges from 0 to R-1, and generating target information based on the (R-1)th prediction information.
According to a second aspect of the present application, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being arranged to perform the method according to the first aspect of the application.
According to a third aspect of the present application there is provided a computer readable storage medium storing computer executable instructions for performing the method of the first aspect of the present application.
Compared with the prior art, the application has clear advantages and beneficial effects. With the above technical solution, the GPU-based static deployment method for a codec model, the electronic device, and the medium provided by the application achieve considerable technical progress and practicality, have broad industrial value, and have at least the following beneficial effects:
According to the embodiment of the application, the original feature matrix is padded to generate the padded feature matrix, the corresponding relative position coding matrix is obtained based on the padded feature matrix, and the relative position coding information is generated based on the relative position coding matrix. During decoding, the padded R-bit input information and the current number X of valid information bits in the R-bit input information are input into the decoder of the codec model to predict the (X+1)th prediction information. Static deployment of the codec model on the GPU is thereby achieved, and the inference-speed requirement of the codec model is met.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
The embodiment of the application provides a GPU-based static deployment method for a codec model which, as shown in FIG. 1, comprises the following steps:
S1: acquiring an original feature matrix, and padding the columns of the original feature matrix to generate a padded feature matrix, wherein the number of rows and the number of columns of the original feature matrix are M, the number of rows of the padded feature matrix is M, the number of columns of the padded feature matrix is N, and N > M.
The number of columns of the original feature matrix may vary dynamically. Padding fixes the number of columns of the matrix that is input to the codec model, which enables static deployment of the encoder of the codec model. The codec model may specifically be an ASR model.
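As an illustration of the padding in step S1, the following minimal Python sketch pads every row of a matrix up to a fixed column count. The function name `pad_columns`, the zero fill value, and padding on the right-hand side are assumptions made for the example; the description above does not specify the fill value or the padding side.

```python
def pad_columns(matrix, n_cols, fill=0.0):
    """Right-pad every row of `matrix` with `fill` until it has n_cols columns."""
    padded = []
    for row in matrix:
        if len(row) > n_cols:
            raise ValueError("row is longer than the fixed column count")
        # Assumption: padding is appended on the right with a constant value.
        padded.append(list(row) + [fill] * (n_cols - len(row)))
    return padded


# Original feature matrix with M = 3 rows and 3 columns.
original = [[1.0, 2.0, 3.0],
            [4.0, 5.0, 6.0],
            [7.0, 8.0, 9.0]]

# Padded feature matrix with M = 3 rows and N = 5 columns (N > M).
padded = pad_columns(original, n_cols=5)
```

Because the output always has N columns, a model compiled for a fixed input shape can accept it regardless of how many columns the original matrix had.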
S2: acquiring a relative position coding matrix corresponding to the padded feature matrix, wherein the number of rows of the relative position coding matrix is N-1, and the number of columns of the relative position coding matrix is M.
S3: inputting the relative position coding matrix into an encoder of the codec model to generate relative position coding information, and storing the relative position coding information in a preset memory.
S4: padding the historical prediction information into R-bit input information, inputting the padded R-bit input information and the current number X of valid information bits in the R-bit input information into a decoder of the codec model, reading the relative position coding information from the preset memory, and generating the (X+1)th prediction information, wherein X ranges from 0 to R-1, and generating target information based on the (R-1)th prediction information.
In the prior art, the input used for prediction can grow dynamically, so the bit predicted each time is the last bit of the input. Because the present application is statically deployed, the padded R-bit input information is input every time, and the bit to be predicted is therefore not necessarily the last bit of that input.
It should be noted that, in the prior art, the conversion from absolute position coding to relative position coding can be implemented directly by applying rel_shift to the original feature matrix. In the present application, however, the original feature matrix may change dynamically and padded features are involved, so the existing rel_shift cannot be used directly for this conversion. The embodiment of the application therefore provides a scheme suited to the application that accurately converts absolute position codes into relative position codes.
As one embodiment, the elements of the relative position coding matrix are $a_{ij}$, where $i$ is the row number of $a_{ij}$, $j$ is the column number of $a_{ij}$, $i$ ranges from 1 to M, and $j$ ranges from 1 to N-1. The relative position coding matrix is divided into a first region, a second region and a third region. An element $a_{ij}$ in the first region satisfies: $i \le M-1$, $j < M$, and $a_{ij}$ is located to the left of $a_{i(i+1)}$. An element $a_{ij}$ in the second region satisfies: $i < M-1$, $2 < j \le M$, and $a_{ij}$ is located to the right of $a_{i(i+1)}$. The elements in the third region satisfy $j = i+1$ or $i > M-1$; it can be understood that the third region consists of all parts of the relative position coding matrix other than the first region and the second region. The step S2 includes:
Step S21: setting a first matrix and a second matrix, wherein the number of rows and the number of columns of the first matrix and of the second matrix are M, each row of the first matrix is (1, 2, 3, …, m, …, M), each column of the second matrix is (1, 2, 3, …, m, …, M), and m ranges from 1 to M.
Step S22, a first intermediate matrix and a second intermediate matrix are obtained based on the first matrix and the second matrix, and the number of rows and the number of columns of the first intermediate matrix and the second intermediate matrix are M.
Step S23, determining elements in a first area in a relative position coding matrix based on the first intermediate matrix, determining elements in a second area based on the second intermediate matrix, setting all elements in a third area to 0 based on a mask operation, and generating the relative position matrix.
Note that the portion corresponding to the third region is a portion not requiring attention, and therefore the elements of the third region are directly set to 0 entirely based on the masking operation.
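The first and second matrices of step S21 can be sketched in plain Python as follows. The helper name `index_matrices` is hypothetical; the sketch only reflects the stated property that every row of the first matrix and every column of the second matrix runs from 1 to M.

```python
def index_matrices(M):
    """Return the first and second M x M matrices described in step S21."""
    # First matrix: every row is (1, 2, ..., M), so the entry equals its column number.
    first = [[y for y in range(1, M + 1)] for _ in range(M)]
    # Second matrix: every column is (1, 2, ..., M), so the entry equals its row number.
    second = [[x] * M for x in range(1, M + 1)]
    return first, second


first, second = index_matrices(3)
# first  is [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
# second is [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
```

In other words, the first matrix stores the column index of each position and the second matrix stores the row index, which is what allows later steps to express column offsets as simple element-wise arithmetic.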
As an embodiment, the first matrix includes elements $b_{xy}$, the second matrix includes elements $c_{xy}$, the first intermediate matrix includes elements $d_{xy}$, and the second intermediate matrix includes elements $e_{xy}$, where x is the row number and y is the column number of $b_{xy}$, $c_{xy}$, $d_{xy}$ and $e_{xy}$, x ranges from 1 to M, and y ranges from 1 to M. The step S22 includes:
Step S221: determining $d_{xy}$ based on $b_{xy}$, $c_{xy}$ and M: $d_{xy} = b_{xy} - c_{xy} + M - 1$.
Each $d_{xy}$ can be determined from $d_{xy} = b_{xy} - c_{xy} + M - 1$, which yields the first intermediate matrix; the first intermediate matrix guides the generation of the elements in the first region of the relative position coding matrix.
Step S222: determining $e_{xy}$ based on $b_{xy}$ and $c_{xy}$: $e_{xy} = b_{xy} - c_{xy} - 1$.
Each $e_{xy}$ can be determined from $e_{xy} = b_{xy} - c_{xy} - 1$, which yields the second intermediate matrix; the second intermediate matrix guides the generation of the elements in the second region of the relative position coding matrix.
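To make the two formulas concrete, the sketch below evaluates both intermediate matrices for M = 4. It relies on the fact that, by the definition in step S21, the first matrix holds the column number of each entry and the second matrix holds the row number. The function name `intermediate_matrices` is hypothetical.

```python
def intermediate_matrices(M):
    """Evaluate d = b - c + M - 1 and e = b - c - 1 element-wise (1-based values)."""
    b = [[y for y in range(1, M + 1)] for _ in range(M)]  # first matrix: column numbers
    c = [[x] * M for x in range(1, M + 1)]                # second matrix: row numbers
    d = [[b[x][y] - c[x][y] + M - 1 for y in range(M)] for x in range(M)]
    e = [[b[x][y] - c[x][y] - 1 for y in range(M)] for x in range(M)]
    return d, e


d, e = intermediate_matrices(4)
# d == [[3, 4, 5, 6], [2, 3, 4, 5], [1, 2, 3, 4], [0, 1, 2, 3]]
# e == [[-1, 0, 1, 2], [-2, -1, 0, 1], [-3, -2, -1, 0], [-4, -3, -2, -1]]
```

Only part of each matrix is consumed later. Within the first region (i ≤ M-1 and j ≤ i) the values of d lie between 1 and M-1, and within the second region (i < M-1 and j ≥ i+2) the values of e lie between 1 and M-2, so in both cases they are valid 1-based column numbers of the original feature matrix.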
As an embodiment, the step S23 includes:
Step S231: determining the element located in the i-th row and the $d_{ij}$-th column of the original feature matrix as the element $a_{ij}$ in the first region, where $d_{ij}$ is the element in the i-th row and j-th column of the first intermediate matrix.
The first region is the lower triangular region of the relative position coding matrix, so step S231 only needs the elements of the lower triangular region of the first intermediate matrix. For each element in the lower triangular region, its row is taken as the row in the original feature matrix and its value $d_{ij}$ is taken as the column in the original feature matrix; the corresponding element is fetched from the original feature matrix and written to the corresponding position in the first region of the relative position coding matrix.
Step S232: determining the element located in the i-th row and the $e_{ij}$-th column of the original feature matrix as the element $a_{ij}$ in the second region, where $e_{ij}$ is the element in the i-th row and j-th column of the second intermediate matrix.
The second region is the upper triangular region of the relative position coding matrix, so step S232 only needs the elements of the upper triangular region of the second intermediate matrix. For each element in the upper triangular region, its row is taken as the row in the original feature matrix and its value $e_{ij}$ is taken as the column in the original feature matrix; the corresponding element is fetched from the original feature matrix and written to the corresponding position in the second region of the relative position coding matrix.
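Steps S231 and S232, together with the zeroing of the third region, can be sketched as a single gather over the original feature matrix. This is a minimal illustration under stated assumptions rather than the implementation of the application: the function name `build_relative_position_matrix` is hypothetical, the output is taken to have M rows and N-1 columns as in the definition of $a_{ij}$, "left of" and "right of" $a_{i(i+1)}$ are read as j ≤ i and j ≥ i+2, and the intermediate values are computed inline from the row and column numbers.

```python
def build_relative_position_matrix(original, n_cols):
    """Fill the first and second regions from `original` (M x M); leave the rest 0."""
    M = len(original)
    # Every position starts at 0, so the third region is already masked.
    out = [[0] * (n_cols - 1) for _ in range(M)]
    for i in range(1, M + 1):          # 1-based indices, as in the description
        for j in range(1, M + 1):
            if i <= M - 1 and j <= i:
                # First region: column given by d_ij = j - i + M - 1.
                d_ij = j - i + M - 1
                out[i - 1][j - 1] = original[i - 1][d_ij - 1]
            elif i < M - 1 and j >= i + 2:
                # Second region: column given by e_ij = j - i - 1.
                e_ij = j - i - 1
                out[i - 1][j - 1] = original[i - 1][e_ij - 1]
    return out


# Toy original feature matrix with M = 4, and N = 6 columns after padding.
original = [[1, 2, 3, 4],
            [5, 6, 7, 8],
            [9, 10, 11, 12],
            [13, 14, 15, 16]]
rel = build_relative_position_matrix(original, n_cols=6)
# rel == [[3, 0, 1, 2, 0],
#         [6, 7, 0, 5, 0],
#         [9, 10, 11, 0, 0],
#         [0, 0, 0, 0, 0]]
```

On a GPU the same lookup would normally be expressed as one vectorised gather followed by a multiplication with a 0/1 mask, so that the tensor shapes stay fixed; the explicit loops here are only for readability.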
As an embodiment, the step S3 includes:
Step S31: inputting the relative position coding matrix into the encoder of the codec model to perform depthwise convolution operations and generate the relative position coding information, wherein the elements of the third region of the relative position coding matrix are reset to 0 by a mask operation before each depthwise convolution operation is performed.
It should be noted that the encoder performs multiple depthwise convolution operations. Although the elements of the third region of the relative position coding matrix are already set to 0 by the mask operation in the initial state, a depthwise convolution combines neighbouring elements, so some elements of the third region may no longer be 0 afterwards. If these elements were not reset, the encoding result would be affected. Therefore, before each depthwise convolution operation, the elements of the third region of the relative position coding matrix are reset to 0 by the mask operation to ensure the accuracy of the encoding.
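The need to re-apply the mask before every depth (depthwise) convolution can be seen in a small toy example. The sketch below is illustrative only: the helpers `third_region_mask`, `apply_mask`, `depthwise_conv_rows` and `encode` are hypothetical, each row is treated as one channel filtered independently with zero padding, and a single kernel is shared by all rows for brevity, which is a simplification of a real depthwise convolution layer.

```python
def third_region_mask(M, n_cols):
    """1 for positions in the first or second region, 0 for the third region."""
    mask = [[0] * (n_cols - 1) for _ in range(M)]
    for i in range(1, M + 1):
        for j in range(1, M + 1):
            if (i <= M - 1 and j <= i) or (i < M - 1 and j >= i + 2):
                mask[i - 1][j - 1] = 1
    return mask


def apply_mask(x, mask):
    """Reset every third-region element of x to 0."""
    return [[v * m for v, m in zip(rx, rm)] for rx, rm in zip(x, mask)]


def depthwise_conv_rows(x, kernel):
    """Toy stand-in for a depthwise convolution: filter each row on its own."""
    half = len(kernel) // 2
    out = []
    for row in x:
        n = len(row)
        out.append([
            sum(kernel[k] * row[t + k - half]
                for k in range(len(kernel)) if 0 <= t + k - half < n)
            for t in range(n)
        ])
    return out


def encode(x, mask, kernels):
    """Apply the convolutions in sequence, masking before each one."""
    for kernel in kernels:
        x = apply_mask(x, mask)            # reset the third region first
        x = depthwise_conv_rows(x, kernel)
    return x


mask = third_region_mask(4, 6)
x = [[3, 0, 1, 2, 0], [6, 7, 0, 5, 0], [9, 10, 11, 0, 0], [0, 0, 0, 0, 0]]
leaked = depthwise_conv_rows(x, [1, 1, 1])   # position (1, 2) is now 3 + 0 + 1 = 4
encoded = encode(x, mask, [[1, 1, 1], [1, 1, 1]])
```

After a single convolution the masked position in row 1, column 2 already holds a non-zero value, which is exactly the leakage that the repeated mask operation removes before the next convolution.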
As an embodiment, the step S4 includes:
Step S41: initially setting X = 0, and initially setting the historical prediction information to $U_0$.
$U_0$ is used to predict $U_1$. It can be understood that $U_0$ is not set randomly but is set according to the specific encoded content.
Step S42: generating R-bit input information from the historical prediction information, padding the unfilled portion of the input information with 0, inputting the current padded R-bit input information and X into the decoder of the codec model, reading the relative position coding information from the preset memory, and generating the (X+1)th prediction information $U_{X+1}$.
Inputting X into the decoder of the codec model is equivalent to specifying in advance the bit to be predicted. This ensures that every input has R bits and preserves accuracy while enabling static deployment.
Step S43: if X < R-1, setting the historical prediction information to $(U_0, U_1, \ldots, U_{X+1})$, setting X = X+1, and returning to step S42; if X = R-1, generating the target information based on the R-1 pieces of prediction information.
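The loop formed by steps S41 to S43 can be sketched as follows. This is a hedged illustration, not the decoder of the application: `static_decode` and `decoder_step` are hypothetical names, `decoder_step(padded, X)` stands in for one call of the statically deployed decoder (which in the described method would also read the cached relative position coding information), and the sketch simply returns every prediction it collects instead of deriving the target information.

```python
def static_decode(decoder_step, u0, R, pad=0):
    """Run the fixed-length decoding loop of steps S41 to S43."""
    history = [u0]       # S41: history starts as (U_0) and X starts at 0
    predictions = []
    X = 0
    while True:
        # S42: the input always has exactly R positions; unused ones are 0.
        padded = history + [pad] * (R - len(history))
        u_next = decoder_step(padded, X)   # X tells the decoder which bit to predict
        predictions.append(u_next)
        if X == R - 1:                     # S43: stop once X reaches R-1
            return predictions
        history.append(u_next)             # S43: history becomes (U_0, ..., U_{X+1})
        X += 1


# Toy decoder for demonstration only: it returns the last valid input value plus one.
def toy_decoder(padded, X):
    return padded[X] + 1


outputs = static_decode(toy_decoder, u0=0, R=4)   # [1, 2, 3, 4]
```

Because the decoder receives X explicitly, it does not need to infer the prediction position from the input length, which is what allows the input shape to stay fixed at R across all iterations.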
It should be noted that some exemplary embodiments are described as a process or a method depicted as a flowchart. Although a flowchart depicts steps as a sequential process, many of the steps may be implemented in parallel, concurrently, or with other steps. Furthermore, the order of the steps may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figures. The processes may correspond to methods, functions, procedures, subroutines, and the like.
The embodiment of the application also provides electronic equipment, which comprises: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being configured to perform the methods of embodiments of the present application.
The embodiment of the application also provides a computer readable storage medium, which stores computer executable instructions for executing the method according to the embodiment of the application.
According to the embodiment of the application, the original feature matrix is padded to generate the padded feature matrix, the corresponding relative position coding matrix is obtained based on the padded feature matrix, and the relative position coding information is generated based on the relative position coding matrix. During decoding, the padded R-bit input information and the current number X of valid information bits in the R-bit input information are input into the decoder of the codec model to predict the (X+1)th prediction information. Static deployment of the codec model on the GPU is thereby achieved, and the inference-speed requirement of the codec model is met.
The present application is not limited to the embodiments described above; any modifications, equivalent substitutions and variations may be made to these embodiments without departing from the scope of the application.