CN113034339B

CN113034339B - Method for improving vibration data transmission bandwidth based on GPU acceleration

Info

Publication number: CN113034339B
Application number: CN202011153850.7A
Authority: CN
Inventors: 刘隆波; 彭军; 程红伟; 袁玉道; 刘文浩; 王思文; 贺喆; 王大翊; 刘鹏鹏; 熊玲
Original assignee: Tongfang Test Technology Beijing Co ltd; Chinese People's Liberation Army 92942 Army
Current assignee: Tongfang Test Technology Beijing Co ltd; Chinese People's Liberation Army 92942 Army
Priority date: 2020-10-26
Filing date: 2020-10-26
Publication date: 2024-04-09
Anticipated expiration: 2040-10-26
Also published as: CN113034339A

Abstract

The invention discloses a method for improving vibration data transmission bandwidth based on GPU acceleration, and belongs to the technical field of data compression algorithms and data transmission. In the method, a GPU compression algorithm module compresses the data volume, and after the data is transmitted, a GPU decompression algorithm module restores the data losslessly. The GPU compression algorithm module omits the sign bits by exploiting the alternating characteristic of vibration data, and then compresses the data by a large ratio according to the resolution of the collector and a normalized recording scheme. During the compression calculation, the data is uploaded to the graphics card in the form of an RGBA image, the compression calculation is carried out by OpenGL rendering, the resulting image is transferred back to local memory, and the image is packed into a data structure to complete the compression. The GPU decompression algorithm first restores the magnitude of each value recorded in the compressed data packet according to the logic of the compression algorithm, and then restores the sign. The invention can more than double the data transmission bandwidth, thereby reducing the probability of failure during data transmission.

Description

Method for improving vibration data transmission bandwidth based on GPU acceleration

Technical Field

The invention relates to a method for improving vibration data transmission bandwidth in reliability testing through GPU acceleration, and belongs to the technical fields of data compression algorithms, OpenGL image rendering and data transmission.

Background

Reliability testing is an important part of reliability work, and vibration testing is a significant part of reliability testing. In current vibration test practice, vibration data is collected as 64-bit double-precision floating-point values, each occupying 8 bytes of memory. Typically, one collector has 64 sensors sampled at 5120 Hz, so a single sensor generates 40.96 KB of data per second and a single collector generates 2.62 MB per second; the network must therefore sustain a steady 2.62 MB/s, that is, 20.971 Mbit/s. Judged on speed alone, a 100-megabit network card can handle this easily. In vibration test practice, however, continuous operation is required for 7-24 hours, and the heavier the load on the network card, the more likely it is to become unstable, which in turn makes data backlog or transmission interruption more likely and causes the test to fail. It is therefore important to increase the data transmission bandwidth.
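The figures quoted above can be reproduced with a few lines of arithmetic. The following is a minimal sketch; the constant and function names are illustrative and do not come from the patent.

```python
# Worked arithmetic for the data rates quoted in the background section.
BYTES_PER_SAMPLE = 8          # one 64-bit double-precision value
SAMPLE_RATE_HZ = 5120         # samples per second per sensor
SENSORS_PER_COLLECTOR = 64


def collector_rates():
    """Return (bytes/s per sensor, bytes/s per collector, bits/s per collector)."""
    per_sensor = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE
    per_collector = per_sensor * SENSORS_PER_COLLECTOR
    return per_sensor, per_collector, per_collector * 8


if __name__ == "__main__":
    per_sensor, per_collector, bits = collector_rates()
    print(per_sensor / 1e3, "KB/s per sensor")
    print(per_collector / 1e6, "MB/s per collector")
    print(bits / 1e6, "Mbit/s per collector")
```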

At present there are two main ways to increase bandwidth. The first is to upgrade the hardware, for example replacing ordinary 100-megabit equipment with gigabit network cards, gigabit routers and other gigabit-class equipment. The second is to compress the data with a general-purpose compression algorithm, so that more data can be transmitted at an unchanged transmission speed. The first approach is costly: replacing equipment involves purchasing the new equipment, removing the old equipment, and installing and debugging the new equipment, which greatly increases the cost in time, money and labor. The second approach uses a general-purpose data compression method that does not exploit the characteristics of vibration data and merely reuses repeated data, so effective compression is difficult when most of the values differ from one another, and the compression ratio of a general-purpose algorithm on vibration data is unstable. In addition, a general-purpose compression algorithm consumes CPU resources and increases CPU load, which is a serious challenge for an acquisition computer that runs under high load for long periods.

Reliability tests are costly and time-consuming, and any fault during transmission can cause the whole test to fail; the prior art requires high bandwidth to transmit vibration data and has a high probability of faults during long periods of continuous operation.

Disclosure of Invention

In view of the above, the invention provides a method for improving vibration data transmission bandwidth based on GPU acceleration, which can more than double the data transmission bandwidth, thereby reducing the probability of failure during data transmission, reducing the additional CPU burden imposed by a compression algorithm, and helping to ensure that the reliability test is carried out successfully.

The method uses a GPU compression algorithm module to compress the data volume, and uses a GPU decompression algorithm module to restore the data after transmission;

the GPU compression algorithm module omits the sign bits by exploiting the alternating characteristic of vibration data, and then compresses the data by a large ratio according to the resolution of the collector and a normalized recording scheme; during the compression calculation, the data is uploaded to the graphics card in the form of an RGBA image, the compression calculation is carried out by OpenGL rendering, the resulting image is transferred back to local memory, and the image is packed into a data structure to complete the compression; the GPU decompression algorithm first restores the magnitude of each value recorded in the compressed data packet according to the logic of the compression algorithm, and then restores the sign according to the alternating characteristic of the vibration data.

Further, the steps implemented by the GPU compression algorithm module include:

the first step: writing the header information (0x0A 0x55), which identifies the start position of the data packet;

the second step: writing the sensitivity, where 0x01 denotes 16 bits and 0x02 denotes 32 bits;

the third step: writing the sign of the first value, taken directly from the sign bit of the first number in the source data; 0x00 represents '+' and 0x01 represents '-';

the fourth step: writing the maximum absolute value; the absolute values of the source data are traversed to find their maximum, denoted $|A|_{\max}$, which is written as an 8-byte double;

the fifth step: writing the data quantity; the source data is traversed, and the number N of source values is counted and written;

the sixth step: preparing GPU rendering;

the seventh step: performing GPU rendering and, after it completes, writing the data into the data structure.
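The first five steps amount to writing a fixed-length packet header. The sketch below uses Python's `struct` module and takes the field widths from the compression and decompression steps (2-byte marker, 1-byte sensitivity, 1-byte first sign, 8-byte double, 8-byte count). The little-endian byte order, the absence of padding, and the names `HEADER_FORMAT` and `pack_header` are assumptions made for illustration; the text does not specify them.

```python
import struct

HEADER_MAGIC = b"\x0a\x55"    # step 1: packet start marker
# Assumed layout: little-endian, no padding (byte order is not stated in the text).
HEADER_FORMAT = "<2sBBdQ"     # marker, sensitivity, first sign, |A|_max, N


def pack_header(samples, sensitivity_bits=16):
    """Build the packet header for a non-empty list of floating-point samples."""
    if not samples:
        raise ValueError("at least one sample is required")
    sensitivity = {16: 0x01, 32: 0x02}[sensitivity_bits]    # step 2
    first_sign = 0x01 if samples[0] < 0 else 0x00           # step 3
    abs_max = float(max(abs(v) for v in samples))           # step 4
    count = len(samples)                                    # step 5
    return struct.pack(HEADER_FORMAT, HEADER_MAGIC, sensitivity,
                       first_sign, abs_max, count)


if __name__ == "__main__":
    print(pack_header([-0.5, 0.25, 1.5]).hex())
```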

Further, the process of preparing the GPU rendering in the sixth step includes

(1) Creating an OpenGL rendering context for representing an environment in which the current rendering is running;

(2) creating vertexes, wherein 4 vertexes are selected to create two triangles to finally form a square;

(3) creating a frame buffer, and setting the frame buffer as a target frame buffer of an OpenGL rendering context after the frame buffer is created;

(4) creating a texture object using the OpenGL function for creating texture objects, setting the sampling mode of the texture to nearest-neighbor sampling, leaving the wrap mode at the default repeat mode, and binding the texture to the corresponding texture unit GL_TEXTUREi, where i = 0, 1, ..., 31;

(5) creating vertex shader and fragment shader objects using the corresponding OpenGL functions, and binding the shaders to the current context.

Further, the process of GPU rendering in the seventh step includes

(1) copying the original data into the pixels of an image to obtain an image of the packaged data, then calling the OpenGL function for uploading texture data to upload the image to the corresponding texture unit that was created, with the upload format set to GL_RGBA32F_ARB;

(2) in the fragment shader, sampling the i-th input texture to obtain the four pixel values R, G, B and A, substituting each of the four values for D in $n = |D| / |A|_{\max} \times 2^{16}$ to obtain Rn, Gn, Bn and An, and writing the obtained Rn, Gn, Bn and An into the shader's built-in variable gl_FragColor;

(3) compiling the fragment shader using an OpenGL function and starting rendering; the result data obtained after rendering is in the attachment of the frame buffer, and the OpenGL function for downloading data is then used to copy the data from video memory to local memory;

(4) changing the value of i, and carrying out rendering and downloading 32 times in total;

(5) unpacking the obtained data in sequence and writing it into the data structure.
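The per-pixel calculation in item (2) can be emulated on the CPU to see what each shader invocation computes. The sketch below is a stand-in for the fragment shader, not OpenGL code. The function name `quantize_pixel`, the truncation toward zero, and the clamp to the largest representable code are assumptions; the text only gives the formula.

```python
def quantize_pixel(rgba, abs_max, bits=16):
    """Map each channel value D of one RGBA pixel to n = |D| / |A|_max * 2**bits."""
    scale = 1 << bits
    top = scale - 1
    if abs_max == 0:
        return tuple(0 for _ in rgba)        # all-zero data: nothing to scale
    out = []
    for d in rgba:
        n = int(abs(d) / abs_max * scale)    # assumption: truncate toward zero
        out.append(min(n, top))              # assumption: clamp so |D| == |A|_max fits in `bits` bits
    return tuple(out)


if __name__ == "__main__":
    print(quantize_pixel((0.0, 0.5, -0.25, 1.0), abs_max=1.0))
```

On the GPU the same expression runs once per pixel for every pixel of the texture, which is where the parallel speed-up comes from.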

Further, the step implemented by the GPU decompression algorithm module includes:

the first step: taking two bytes and checking whether they are the packet header information (0x0A 0x55); if the check passes, the data packet DM is a data packet of the decompression algorithm module and can be parsed; otherwise DM is not a data packet of the decompression algorithm module and is not parsed;

the second step: taking one byte, parsing it as an 8-bit integer, and denoting it sendi; the length of each value is then s = sendi × 16 bits;

the third step: taking one byte, parsing it as an 8-bit integer, and denoting it sn; the first sign is determined from the value of sn: sign = 1 when sn = 0x00, and sign = -1 when sn = 0x01;

the fourth step: taking eight bytes, parsing them as a 64-bit double, and denoting it $|A|_{\max}$;

the fifth step: taking eight bytes, parsing them as a 64-bit long, and denoting it N;

the sixth step: taking s bits, parsing them as an unsigned integer, and denoting it ds0; then calculating, according to the compression algorithm, $D_0 = sign \times |A|_{\max} \times ds_0 / 2^{s}$;

the seventh step: taking s bits, again parsing them as an unsigned integer, and denoting it ds1; calculating the absolute value according to the compression algorithm, $|D_1| = |A|_{\max} \times ds_1 / 2^{s}$; then determining the sign; all remaining values are calculated in sequence in the same way.

Wherein, the process of determining the sign comprises the following steps:

step a), computing the left derivative at the point, $Dleft_1 = |D_1| - |D_0|$, and the right derivative, $Dright_1 = |D_2| - |D_1|$;

step b), when $Dleft_1 \le 0$ and $Dright_1 \ge 0$, the sign changes between positions D1 and D2, and sign is flipped: sign = -sign; in all other cases sign is unchanged;

step c), computing $D_1 = sign \times |D_1|$.
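The decompression steps can be sketched as a small parser. The code below follows the literal order of steps a) to c), flipping the sign before the current value is signed. The little-endian layout, the function name `decode_packet`, and the choice to leave the sign unchanged at the last value (which has no right neighbour) are assumptions; the text does not settle them.

```python
import struct

# Assumed little-endian layout: marker, sensitivity, first sign, |A|_max, N.
HEADER_FORMAT = "<2sBBdQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def decode_packet(dm):
    """Return the restored values of packet `dm`, or None if it is not ours."""
    if len(dm) < HEADER_SIZE:
        return None
    magic, sendi, sn, abs_max, count = struct.unpack_from(HEADER_FORMAT, dm, 0)
    if magic != b"\x0a\x55":                 # step 1: header check
        return None
    s = sendi * 16                           # step 2: bits per value
    sign = 1 if sn == 0x00 else -1           # step 3: sign of the first value
    code = "H" if s == 16 else "I"           # unsigned 16-bit or 32-bit codes
    raw = struct.unpack_from("<%d%s" % (count, code), dm, HEADER_SIZE)
    mags = [abs_max * ds / float(1 << s) for ds in raw]   # steps 6-7: |D_i|
    values = []
    for i, mag in enumerate(mags):
        if 0 < i < count - 1:                # assumption: no flip at the last value
            d_left = mag - mags[i - 1]       # step a)
            d_right = mags[i + 1] - mag
            if d_left <= 0 and d_right >= 0: # step b)
                sign = -sign
        values.append(sign * mag)            # step c)
    return values
```

A packet whose first two bytes are not 0x0A 0x55 is rejected without further parsing, matching the first decompression step.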

The beneficial effects are that:

1. The invention achieves lossless compression of the data according to the sensitivity of the collector. The compression rate of the compression algorithm depends on the sensitivity of the acquisition card; for a given sensitivity the compression rate is fixed and independent of the data, so the compression rate is stable.

2. The invention decompresses the data specifically according to the continuously alternating characteristic of vibration data. The compression algorithm is designed for vibration data and does not depend on the data values, and a general-purpose compression algorithm can be applied after this compression is finished to obtain a higher compression rate.

3. The invention exploits the massively parallel computation of the GPU during compression and decompression, which improves the operation speed and reduces the CPU burden of the collector.

Drawings

FIG. 1 is a diagram of device vibration data;

FIG. 2 is a schematic diagram of the hardware components of the present invention;

FIG. 3 is a data structure diagram of a compression algorithm;

FIG. 4 is a schematic representation of the alternating position of the vibration function image;

FIG. 5 is a schematic diagram of the operation of data to image;

FIG. 6 is a schematic diagram of a process for processing vibration data by a GPU;

FIG. 7 is a schematic diagram of converting frame buffer data into a data sequence.

Detailed Description

The invention will now be described in detail by way of example with reference to the accompanying drawings.

The invention provides a method for improving vibration data transmission bandwidth based on GPU acceleration, which addresses vibration data such as that shown in FIG. 1. FIG. 1 is the function image of the x-direction acceleration component over a period of time; the image clearly shows that, in a vibration test, the component of vibration acceleration in any direction alternates between positive and negative over time.

FIG. 4 is a schematic representation of the alternation positions in the vibration function image. The left side is the function image of the acceleration component in a certain direction against time t, illustrating the relation between vibration data and time in a vibration test; the positive-to-negative and negative-to-positive alternation points are marked with black dots. The right side is the corresponding image after the absolute value is taken, illustrating the vibration data after compression; its small black dots correspond to those of the left image. The figure shows that the points of the compressed data where the left derivative is less than or equal to 0 and the right derivative is greater than or equal to 0 are the sign-alternation points of the original function.

As shown in FIG. 2, the hardware used by the method comprises a sensor, a collector, a network interface, a graphics card and an upper computer. The collector comprises an acquisition card, a cache, a CPU, a graphics card and the GPU compression algorithm module; the upper computer comprises a database and the GPU decompression algorithm module. The sensor in the figure measures vibration acceleration in a vibration test and returns a voltage value to the acquisition card according to its own vibration; the acquisition card is installed in the collector. The collector packs the data from the acquisition card using the GPU compression algorithm module and sends it to the upper computer through the network interface. After the upper computer receives the data packet, it restores the data using the decompression algorithm and uploads the data to the database.

The GPU compression algorithm module comprises the following steps of:

fifth step: writing the data quantity; the source data is traversed, and the number N of source values is counted and written. A fixed amount of data is recommended here: an integer multiple of 32 × 1024 × 1024 × 4 = 128M values. This is because the minimum texture resolution required by the OpenGL standard is 1024 × 1024, the maximum number of textures is 32, and each pixel holds at most 4 values; this choice is therefore compatible with all graphics cards.
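The recommended batch size follows directly from the three limits named above (32 textures, 1024 × 1024 pixels each, 4 values per pixel). A minimal sketch of that arithmetic, with illustrative names that are not taken from the patent:

```python
TEXTURE_UNITS = 32        # textures used per batch
TEXTURE_WIDTH = 1024      # pixels
TEXTURE_HEIGHT = 1024     # pixels
VALUES_PER_PIXEL = 4      # R, G, B, A


def batch_capacity():
    """Number of values that fit in one full batch of textures."""
    return TEXTURE_UNITS * TEXTURE_WIDTH * TEXTURE_HEIGHT * VALUES_PER_PIXEL


def padded_count(n_values):
    """Round n_values up to the next integer multiple of the batch capacity."""
    capacity = batch_capacity()
    batches = (n_values + capacity - 1) // capacity
    return batches * capacity


if __name__ == "__main__":
    print(batch_capacity(), "values per batch")
    print(batch_capacity() // (1024 * 1024), "M values per batch")
```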

Sixth step: GPU rendering preparation.

(1) An OpenGL rendering context is created to represent the environment in which the current rendering runs; this is a necessary step for OpenGL rendering.

(2) Vertices are created. Four vertices are selected: upper left (-1.0, 1.0, 0.0), upper right (1.0, 1.0, 0.0), lower right (1.0, -1.0, 0.0) and lower left (-1.0, -1.0, 0.0). These four vertices are used to create two triangles (the basic primitive of OpenGL), which together form a square.

(3) A frame buffer is created using the OpenGL function for creating frame buffers (OpenGL specifies that a complete frame buffer must contain a texture attachment; the data format of this texture is set here to GL_RGBA32F_ARB). After creation, the frame buffer is set as the target frame buffer of the OpenGL rendering context, so that the output image is written into this buffer. To be compatible with all graphics cards, a frame buffer of size 1024 × 1024 is selected; a larger frame buffer can be selected according to the performance of the graphics card.

(4) A texture object is created using the OpenGL function for creating texture objects. The sampling mode of the texture is set to nearest-neighbor sampling, the wrap mode can be left at the default repeat mode, and the texture is then bound to the corresponding texture unit GL_TEXTUREi (where i = 0, 1, ..., 31).

(5) Vertex shader and fragment shader objects are created using the corresponding OpenGL functions, and the shaders are bound to the current context.

OpenGL 3.0 or above is preferred. The code executed by the GPU is placed in the program as character strings, compiled using OpenGL functions, bound to the vertex shader and the fragment shader respectively after compilation, and can then be used in rendering. The texture is set to nearest-neighbor mode because sampling must return the original values stored in the image; if another sampling mode is selected, the system interpolates automatically during sampling and the data becomes inaccurate. The default repeat mode can be used for the texture wrap mode because the image and the frame buffer used by the method have the same resolution, so the texture covers the target exactly without repeating. The frame buffer format is set to GL_RGBA32F_ARB because the default format stores each value as uchar, whereas floating-point values are processed here.
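The reason for nearest-neighbor sampling can be illustrated with a one-dimensional CPU model of a texture lookup. This is only a sketch of the two filtering ideas, not OpenGL's actual sampler; the function names and the edge clamping are assumptions made for the illustration.

```python
import math


def sample_nearest(texels, u):
    """Return the stored texel whose cell contains coordinate u in [0, 1)."""
    index = min(int(u * len(texels)), len(texels) - 1)
    return texels[index]


def sample_linear(texels, u):
    """Blend the two texels around u, as a linear filter would (edges clamped)."""
    pos = u * len(texels) - 0.5              # position measured from texel centres
    lo = max(int(math.floor(pos)), 0)
    hi = min(lo + 1, len(texels) - 1)
    t = min(max(pos - lo, 0.0), 1.0)
    return texels[lo] * (1.0 - t) + texels[hi] * t


if __name__ == "__main__":
    row = [1.0, 3.0]
    print(sample_nearest(row, 0.5), sample_linear(row, 0.5))
```

Whenever the sample coordinate does not land exactly on a texel centre, the linear filter returns a blend that was never stored in the data, while the nearest-neighbor lookup always returns one of the stored values.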

Seventh step: GPU rendering (in this step the invention performs 32 renderings, indexed by i). The code controlling the rendering pipeline in this step is written mainly in the fragment shader.

(1) As shown in FIG. 5, the original data is copied into the pixels of an image to obtain an image of the packaged data; the OpenGL function for uploading texture data is then called to upload the image to the corresponding texture unit created in the sixth step, with the upload format set to GL_RGBA32F_ARB.

(2) In the fragment shader, the i-th input texture is sampled to obtain the four pixel values R, G, B and A. Each of the four values is then substituted for D in $n = |D| / |A|_{\max} \times 2^{16}$ to obtain Rn, Gn, Bn and An, which are written into the shader's built-in variable gl_FragColor.

(3) The fragment shader is compiled using an OpenGL function and rendering is started. The result data obtained after rendering is in the attachment of the frame buffer created in the sixth step, and the OpenGL function for downloading data is used to copy it from video memory to local memory.

(4) The value of i is changed, and rendering and downloading are carried out 32 times in total (a larger amount of data requires more renderings; an integer multiple of 32 is suggested).

(5) The obtained data is unpacked in sequence as shown in FIG. 7 and written into the data structure.

FIG. 5 shows the data-to-image process. First, 4 values are copied into the red, green, blue and opacity (alpha) values of a pixel, and this pixel is copied into the first pixel of the image; all the data is copied into the image in turn in this way, forming an image of the packaged data. The image data format selected here is the GL_RGBA32F format of the OpenGL standard (the highest-precision floating-point type allowed by OpenGL).

FIG. 6 shows the process by which the GPU processes vibration data. First, the image of the packaged data is uploaded to the graphics card as a texture and stored in the texture storage area corresponding to GL_TEXTUREi in OpenGL; next, a frame buffer (Framebuffer Object) is created and the shader loads the texture; finally, the frame buffer is set as the current output target, and the GPU executes the shader code through the rendering pipeline specified by OpenGL to sample and compute on the texture data. After rendering is finished, the result data resides on the graphics card and is downloaded to local memory using the function OpenGL provides for reading the buffer.

FIG. 7 shows the conversion of frame buffer data into a data sequence. After the data is downloaded from the graphics card to local memory, it is in RGBA image format; the pixel data is then placed into a data sequence in order, giving the data sequence converted from the frame buffer.
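The data-to-image copy of FIG. 5 and the reverse conversion of FIG. 7 can be sketched with plain lists. The function names and the zero padding of an incomplete final pixel are assumptions made for illustration; the text recommends a fixed data quantity, in which case no padding is needed.

```python
def data_to_pixels(values, pad=0.0):
    """FIG. 5: group values four at a time into (R, G, B, A) pixels."""
    padded = list(values) + [pad] * (-len(values) % 4)   # fill the last pixel
    return [tuple(padded[i:i + 4]) for i in range(0, len(padded), 4)]


def pixels_to_sequence(pixels, count):
    """FIG. 7: flatten pixels in order and keep the first `count` values."""
    flat = [channel for pixel in pixels for channel in pixel]
    return flat[:count]


if __name__ == "__main__":
    pixels = data_to_pixels([1.0, 2.0, 3.0, 4.0, 5.0])
    print(pixels)
    print(pixels_to_sequence(pixels, 5))
```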

The fragment shader is an important object that controls the rendering pipeline; its code logic is distributed to every GPU unit in the GPU array. Once rendering starts, all GPU units begin working at the same time, each sampling and computing the pixel value at its corresponding coordinate. This is how the GPU increases the operation speed.

The variables are defined as follows: the data packet read in from the network interface is DM; the first value after parsing is D0, the second is D1, and so on up to Dn; the left derivative at each parsed value is $Dleft_i$ and the right derivative is $Dright_i$, where i = 0, 1, 2, ..., n;

the GPU decompression algorithm module comprises the following steps of:

Seventh step: taking s bits, again parsing them as an unsigned integer, and denoting it ds1; calculating the absolute value according to the compression algorithm, $|D_1| = |A|_{\max} \times ds_1 / 2^{s}$; then determining the sign; all remaining values are calculated in sequence in the same way.

Wherein, the process of determining the sign comprises the following steps:

step c), computing $D_1 = sign \times |D_1|$.

In practical objects and engineering structures, the mass and elasticity of the system are continuously distributed, the system has 3 degrees of freedom, and in engineering test analysis the system can generally be simplified into a multi-degree-of-freedom vibration system. The differential equation for free vibration of an undamped system with 6 degrees of freedom can be derived from Newton's second law as follows:

$$M\ddot{x} + Kx = 0$$

wherein: M is the mass matrix of the system, K is the stiffness matrix, x is the displacement vector, and $\ddot{x}$ is the acceleration vector. According to differential equation theory, its general solution satisfies the form:

$$x(t) = \sum_{i} A_i \varphi_i \sin(\omega_i t + \theta_i)$$

with $\varphi_i$ the i-th mode shape, $\omega_i$ the corresponding natural frequency, and $A_i$, $\theta_i$ constants fixed by the initial conditions. From the general solution it can be seen that the displacement vector is a superposition of a plurality of sinusoidal functions. x(t) is a continuously differentiable function, and its second derivative has the same form as x(t). Formally, therefore, the acceleration-time function of the vibration is continuously differentiable; that is, the signal source of the vibration sensor is a continuously differentiable function of time. From the properties of a continuously differentiable function, the left derivative equals the right derivative at every point.

After processing by the compression algorithm module, the function x(t) becomes |x(t)|. The function |x(t)| is not continuously differentiable, and the points where it fails to be differentiable are the positions where the sign alternates. Because the actual values are all discrete, the left and right derivatives at each point of x(t) cannot be exactly equal; their difference depends on the sampling density. When the sampling frequency is high enough, however, a left derivative of |x(t)| below 0 at a point indicates that the values to the left of the point are decreasing, while a right derivative of |x(t)| greater than or equal to 0 at the point indicates that the values to the right are unchanged or increasing; such a point must be a non-differentiable point, that is, a point where the sign of the original function x(t) changes.
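The argument can be checked numerically on a sampled sinusoid by comparing the minima of |x| with the places where x actually changes sign. The sketch below is an illustration only; the sampling parameters and function names are arbitrary choices and do not come from the patent.

```python
import math


def sign_change_indices(samples):
    """Indices i where samples[i] has the opposite sign to samples[i - 1]."""
    return [i for i in range(1, len(samples))
            if samples[i - 1] * samples[i] < 0]


def abs_minimum_indices(samples):
    """Indices i where |x| has left difference <= 0 and right difference >= 0."""
    mags = [abs(v) for v in samples]
    return [i for i in range(1, len(mags) - 1)
            if mags[i] - mags[i - 1] <= 0 and mags[i + 1] - mags[i] >= 0]


def sampled_sine(count=60, period=20, phase=-0.2):
    """Three periods of a sinusoid at 20 samples per period (arbitrary choice)."""
    return [math.sin(2.0 * math.pi * k / period + phase) for k in range(count)]


if __name__ == "__main__":
    xs = sampled_sine()
    print("sign changes at:", sign_change_indices(xs))
    print("minima of |x| at:", abs_minimum_indices(xs))
```

For a densely sampled sinusoid such as this one, every minimum of |x| found by the derivative test lies at most one sample away from a sign change of x.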

Each given set of data is ultimately packed into the data structure shown in FIG. 3. Head is the identifier of the data packet and only needs to be distinctive; Sensitivity represents the sensitivity and takes the values 0x01 and 0x02, corresponding to 16 bits and 32 bits; First Sign represents the sign of the first value in the given set of data and takes the values 0x00 and 0x01, corresponding to '+' and '-' respectively; Abs Max represents the maximum absolute value in the given set of data; the total amount of data represents the number of values in the given set of data; Data0, Data1, ... store the values of the given set of data.

According to the data structure of the compression algorithm shown in FIG. 3, when the sensitivity of the acquisition card is 16 bits, 2 bytes are needed to store one value, which saves 75% of the space compared with the 8 bytes currently used per value. When a large amount of data is compressed, the fixed-length 16 bytes in the data structure can be ignored, so the bandwidth can be increased by 300%.
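The saving can be worked through for a concrete data quantity. In the sketch below the fixed part of the packet is taken as the 16 bytes quoted in the text and passed as a parameter; the function names are illustrative.

```python
def raw_bytes(n_values):
    """Size of n_values stored as 8-byte doubles."""
    return n_values * 8


def packet_bytes(n_values, bits_per_value=16, header_bytes=16):
    """Size of the compressed packet: fixed part plus s bits per value."""
    return header_bytes + n_values * bits_per_value // 8


def bandwidth_gain(n_values, bits_per_value=16, header_bytes=16):
    """How many times more values fit through the same link after compression."""
    return raw_bytes(n_values) / packet_bytes(n_values, bits_per_value, header_bytes)


if __name__ == "__main__":
    n = 32 * 1024 * 1024 * 4
    print("16-bit gain:", bandwidth_gain(n))
    print("32-bit gain:", bandwidth_gain(n, bits_per_value=32))
```

A gain of about 4 at 16-bit sensitivity corresponds to the 300% increase stated above, and a gain of about 2 at 32-bit sensitivity corresponds to a doubling of the bandwidth.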

In summary, the above embodiments are only preferred embodiments of the present invention, and are not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method for improving vibration data transmission bandwidth based on GPU acceleration, characterized in that a GPU compression algorithm module is adopted to compress the data volume, and a GPU decompression algorithm module is used to restore the data after the data is transmitted;

the GPU compression algorithm module omits the sign bits by exploiting the alternating characteristic of vibration data, and then compresses the data by a large ratio according to the resolution of the collector and a normalized recording scheme; during the compression calculation, the data is uploaded to the graphics card in the form of an RGBA image, the compression calculation is carried out by OpenGL rendering, the resulting image is transferred back to local memory, and the image is packed into a data structure to complete the compression; the GPU decompression algorithm first restores the magnitude of each value recorded in the compressed data packet according to the logic of the compression algorithm, and then restores the sign according to the alternating characteristic of the vibration data;

the GPU compression algorithm module comprises the following steps of:

sixth step: preparing GPU rendering;

seventh step: writing data into a data structure after GPU rendering is completed;

the GPU decompression algorithm module comprises the following steps of:

2. The method for improving vibration data transmission bandwidth based on GPU acceleration of claim 1, wherein the process of GPU rendering preparation in the sixth step comprises

3. The method for improving vibration data transmission bandwidth based on GPU acceleration of claim 2, wherein the process of GPU rendering in the seventh step comprises

4. The method for improving vibration data transmission bandwidth based on GPU acceleration of claim 1, wherein determining the sign in the seventh step comprises:

step c), computing $D_1 = sign \times |D_1|$.