CN108233942A

CN108233942A - Method, apparatus and computer device for data storage

Info

Publication number: CN108233942A
Application number: CN201810014959.9A
Authority: CN
Inventors: 胡耀文; 张文明; 陈少杰
Original assignee: Wuhan Douyu Network Technology Co Ltd
Current assignee: Zichang Dongkai Network Technology Co.,Ltd.
Priority date: 2018-01-08
Filing date: 2018-01-08
Publication date: 2018-06-29
Anticipated expiration: 2038-01-08
Also published as: CN108233942B

Abstract

An embodiment of the present invention provides a method, an apparatus and a computer device for data storage, applied to a live broadcast platform. The method includes: obtaining multiple pieces of performance sampling data, the performance sampling data including multiple method names and corresponding operation data; performing character string encoding with each method name in each piece of performance sampling data as a distinct basic element, to generate multiple non-shared coding tables; looking up each encoded method name based on the multiple non-shared coding tables; performing primary compression on the non-shared coding tables and each encoded method name, to obtain each method name after primary compression; serializing each method name after primary compression together with the operation data corresponding to each method name, to generate a character string sequence; and performing secondary compression on the character string sequence to obtain a compression result, and storing the compression result in a database.

Description

Method and device for data storage and computer equipment

Technical Field

The invention belongs to the technical field of network operation, and particularly relates to a method and a device for data storage and computer equipment.

Background

After performance sampling using xhprof, the sampled data needs to be stored in a database to be retrieved and analyzed.

The performance sampling data is a large array, and the common approach is to compress it with a conventional compression algorithm before storing it.

Disclosure of Invention

Aiming at the problems in the prior art, the embodiments of the present invention provide a method, an apparatus and a computer device for data storage, which are used to solve the technical problem in the prior art that when performance sample data is compressed, the compression rate is low, which results in large occupied storage space.

The invention provides a method for data storage, which is applied to a live broadcast platform and comprises the following steps:

obtaining a plurality of pieces of performance sampling data, wherein the performance sampling data comprises: multiple method names and corresponding operation data;

performing character string encoding with each method name in each piece of performance sampling data as a distinct basic element, and generating a plurality of non-shared coding tables;

looking up each encoded method name based on the plurality of non-shared coding tables;

performing primary compression on the non-shared coding tables and each encoded method name, to obtain each method name after primary compression;

serializing each method name after primary compression and the operation data corresponding to each method name, to generate a character string sequence;

and performing secondary compression on the character string sequence to obtain a compression result, and storing the compression result into a database.

In the above scheme, performing character string encoding with each method name as a distinct basic element and generating the non-shared coding tables includes:

respectively counting the number of occurrences of each method name in the plurality of pieces of performance sampling data;

assigning a unique character string to each method name in each piece of performance sampling data;

and storing each method name, its number of occurrences and its corresponding character string for each piece of performance sampling data into a corresponding mapping table, wherein the mapping tables are the non-shared coding tables, and the corresponding character strings are preset.

In the above scheme, in the character string sequence, separators are provided between different character strings.

In the foregoing solution, the operation data includes: the running time, the number of runs, the memory occupied while running, and the Central Processing Unit (CPU) utilization of each method name.

The present invention also provides an apparatus for data storage, the apparatus comprising:

an obtaining unit, configured to obtain multiple pieces of performance sampling data, where the performance sampling data includes: multiple method names and corresponding operation data;

an encoding unit, configured to perform character string encoding with each method name in each piece of performance sampling data as a distinct basic element, to generate a plurality of non-shared coding tables;

a searching unit, configured to look up each encoded method name based on the plurality of non-shared coding tables;

a first compression unit, configured to perform primary compression on the non-shared coding tables and each encoded method name, to obtain each method name after primary compression;

a generating unit, configured to serialize each method name after primary compression and the operation data corresponding to each method name, to generate a character string sequence;

a second compression unit, configured to perform secondary compression on the character string sequence to obtain a compression result;

and a storage unit, configured to store the compression result into a database.

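In the foregoing solution, the encoding unit is specifically configured to:

respectively count the number of occurrences of each method name in the plurality of pieces of performance sampling data;

assign a unique character string to each method name in each piece of performance sampling data;

and store each method name, its number of occurrences and its corresponding character string for each piece of performance sampling data into a corresponding mapping table, wherein the mapping tables are the non-shared coding tables, and the corresponding character strings are preset.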

In the foregoing solution, the operation data includes: the running time, the number of runs, the memory occupied while running, and the Central Processing Unit (CPU) utilization of each method name.

The invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, is capable of performing the method as described in any one of the above.

The present invention also provides a computer device for data storage, comprising:

at least one processor; and

at least one memory communicatively coupled to the processor, wherein,

the memory stores program instructions executable by the processor, the processor being capable of performing the method as described in any one of the above.

The embodiments of the invention provide a method, an apparatus and a computer device for data storage, applied to a live broadcast platform. The method comprises: obtaining a plurality of pieces of performance sampling data, wherein the performance sampling data comprises multiple method names and corresponding operation data; performing character string encoding with each method name in each piece of performance sampling data as a distinct basic element, and generating a plurality of non-shared coding tables; looking up each encoded method name based on the plurality of non-shared coding tables; performing primary compression on the non-shared coding tables and each encoded method name, to obtain each method name after primary compression; serializing each method name after primary compression and the operation data corresponding to each method name, to generate a character string sequence; and performing secondary compression on the character string sequence to obtain a compression result, and storing the compression result into a database. Thus, when data is stored, the non-shared coding tables and the encoded method names first undergo primary compression, and the resulting character string sequence then undergoes secondary compression; compressing the data twice improves the compression ratio. Moreover, unlike the prior art, which encodes single characters, the encoding takes each whole method name as a basic element, which further reduces the data to be stored and therefore the storage space occupied.

Drawings

Fig. 1 is a schematic flowchart of a method for data storage according to a first embodiment of the present invention;

Fig. 2 is a schematic structural diagram of an apparatus for data storage according to a second embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a computer device for data storage according to a third embodiment of the present invention.

Detailed Description

In the prior art, the compression ratio is low when performance sampling data is compressed, so the occupied storage space is large. To solve this technical problem, the invention provides a method, an apparatus and a computer device for data storage, applied to a live broadcast platform. The method comprises: obtaining a plurality of pieces of performance sampling data, wherein the performance sampling data comprises multiple method names and corresponding operation data; performing character string encoding with each method name in each piece of performance sampling data as a distinct basic element, and generating a plurality of non-shared coding tables; looking up each encoded method name based on the plurality of non-shared coding tables; performing primary compression on the non-shared coding tables and each encoded method name, to obtain each method name after primary compression; serializing each method name after primary compression and the operation data corresponding to each method name, to generate a character string sequence; and performing secondary compression on the character string sequence to obtain a compression result, and storing the compression result into a database.

The technical solution of the present invention is further described in detail by the accompanying drawings and the specific embodiments.

Example one

This embodiment provides a method for data storage, which is applied to a live broadcast platform. As shown in Fig. 1, the method includes:

S110, acquiring multiple pieces of performance sampling data, wherein the performance sampling data comprises: multiple method names and corresponding operation data;

In this step, performance sampling data is acquired. The performance sampling data includes multiple pieces, and each piece includes multiple method names and corresponding operation data. For example, if the user name needs to be obtained, the corresponding method name is username or userID; the operation data comprises the running time, the number of runs, the memory occupied while running, the CPU utilization and the like of each method name.

S111, performing character string encoding with each method name in each piece of performance sampling data as a distinct basic element, to generate a plurality of non-shared coding tables;

Here, character string encoding is performed separately for each piece of performance sampling data, with each method name in that piece taken as a distinct basic element, so that a plurality of non-shared coding tables is generated.

Specifically, the number of occurrences of each method name in the plurality of pieces of performance sampling data is counted; a unique character string is assigned to each method name in each piece of performance sampling data; and, based on Huffman coding, each method name, its number of occurrences and its corresponding character string for each piece of performance sampling data are stored into a corresponding mapping table. The mapping tables are the non-shared coding tables, and the corresponding character strings are preset.

For example, the mapping table generated for a certain piece of performance sampling data is shown in Table 1, and the mapping tables for the other pieces of performance sampling data may be generated in the same manner. The character strings may be chosen from the 26 letters in upper and lower case.

TABLE 1

Name of method	Number of occurrences	Character encoding
A	243	f
B	173	d
C	99	e

It should be noted that when different pieces of performance sampling data contain the same method name, the same character string needs to be used for that method name.

Of course, if the compression rate needs to be further improved, any one of the performance sample data may also be used to generate a shared coding table, and the specific generation manner is the same as the generation manner of the non-shared coding table, which is not described herein again.
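The table-building step of S111 can be sketched as follows. This is a minimal illustration and not the patented implementation: the helper names `code_strings` and `build_coding_table` are hypothetical, and the sketch assigns the shortest letter strings to the most frequent method names instead of running an actual Huffman construction. The codes it produces therefore differ from the preset codes f, d, e of Table 1.

```python
from collections import Counter
from itertools import product
from string import ascii_letters


def code_strings():
    """Yield 'a', 'b', ..., 'Z', then 'aa', 'ab', ...: shortest strings first."""
    length = 1
    while True:
        for chars in product(ascii_letters, repeat=length):
            yield "".join(chars)
        length += 1


def build_coding_table(method_calls):
    """Build the coding table for ONE piece of performance sampling data.

    Returns {method name: (number of occurrences, code string)}.
    """
    counts = Counter(method_calls)
    codes = code_strings()
    # most_common() lists names from most to least frequent, so the most
    # frequent names receive the shortest code strings (a simplification
    # of the Huffman-based assignment mentioned in the text).
    return {name: (count, next(codes)) for name, count in counts.most_common()}


# Occurrence counts taken from Table 1; the call order is made up.
sample = ["A"] * 243 + ["B"] * 173 + ["C"] * 99
table = build_coding_table(sample)
for name, (count, code) in table.items():
    print(name, count, code)
```

Calling `build_coding_table` once per piece of sampling data yields one non-shared table per piece; calling it once on a chosen piece and reusing the result would correspond to the shared-table variant.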

S112, looking up each encoded method name based on the non-shared coding tables;

In this step, when the encoded method names in a piece of performance sampling data need to be looked up, the lookup is performed in the non-shared coding table corresponding to that piece. For example, Table 1 is the coding table of the first piece of performance sampling data; when the character string code corresponding to the method name A in the first piece of performance sampling data is needed, the corresponding code f can be found in Table 1.

S113, performing primary compression on the non-shared coding tables and each encoded method name, to obtain each method name after primary compression;

After each encoded method name is obtained, primary compression is performed on the non-shared coding table and the encoded method names, and each method name after primary compression is obtained.

Still taking the first piece of performance sampling data as an example, suppose the code obtained for the method name A is f, the code obtained for the method name B is d, and the first piece of performance sampling data includes: method A, method A, method B, method A; then the method names after primary compression are: method A->f, method B->d; f, f, d, f.
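A minimal sketch of the primary compression in S113, assuming the coding table is already available as a plain name-to-code mapping. The function name `primary_compress` is hypothetical, and the call sequence below is chosen so that the output matches the string f, f, d, f given in the example.

```python
def primary_compress(method_calls, coding_table):
    """Replace every method name in the call sequence by its code string.

    Returns the coding table and the encoded sequence as two text parts.
    """
    table_part = ", ".join(
        f"method {name}->{code}" for name, code in coding_table.items()
    )
    codes_part = ", ".join(coding_table[name] for name in method_calls)
    return table_part, codes_part


coding_table = {"A": "f", "B": "d"}   # codes taken from Table 1
calls = ["A", "A", "B", "A"]          # assumed call sequence
table_part, codes_part = primary_compress(calls, coding_table)
print(f"{table_part}; {codes_part}")
# prints: method A->f, method B->d; f, f, d, f
```

Each full method name is written once, in the table part; every further occurrence costs only the length of its code.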

S114, serializing each method name after primary compression and running data corresponding to each method name respectively to generate a character string sequence;

in this step, when multiple sets of performance sampling data of the same application program need to be stored, each compressed method name and the running data corresponding to each method name are serialized respectively to generate a character string sequence; in the character string sequence, separators are added among different character strings to facilitate accurate retrieval. Here, the corresponding character string is preset.

For example, the method names included in a certain piece of performance sampling data are A and B; in the first piece of performance sampling data, the operation data corresponding to method A is 1, and the operation data corresponding to method B is 3; then the character string sequence is:

method A->f, method B->d; f, d; 1, 3;

similarly, if 1000 pieces of performance sample data need to be stored, they may be generated separately in the same manner as described above. Here, each piece of operation data has a corresponding offset or extraction identifier; the offset may be a time offset, a sequence number offset, or an address offset.
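The serialization in S114 might look like the sketch below. The section order (coding table, codes, operation data), the comma and semicolon separators, and the function name `serialize` are assumptions read off the single example string above; the text does not fix a format beyond requiring separators.

```python
def serialize(coding_table, operation_data):
    """Serialize one piece of sampling data into a separator-delimited string.

    coding_table:   {method name: code string}
    operation_data: {method name: operation data value}
    """
    names = list(coding_table)
    sections = [
        ", ".join(f"method {n}->{coding_table[n]}" for n in names),  # table
        ", ".join(coding_table[n] for n in names),                   # codes
        ", ".join(str(operation_data[n]) for n in names),            # data
    ]
    # Every section is closed by ';' so that it can be located on retrieval.
    return " ".join(section + ";" for section in sections)


sequence = serialize({"A": "f", "B": "d"}, {"A": 1, "B": 3})
print(sequence)
# prints: method A->f, method B->d; f, d; 1, 3;
```

For 1000 pieces of sampling data, the same function would simply be called once per piece.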

S115, performing secondary compression on the character string sequence to obtain a compression result, and storing the compression result into a database.

After the character string sequence is obtained, secondary compression is performed on it to obtain a compression result, and the compression result is stored into the database.

Here, a method name generally contains several to several tens of characters, so compressing it directly character by character takes up a relatively large space; after character string encoding, a method name is generally represented by one to a few characters, so the space occupied during compression is relatively small.
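A sketch of S115 under two stated assumptions: the text names neither the secondary compression algorithm nor the database, so `zlib` stands in for the conventional compressor and an in-memory SQLite table stands in for the database. The table name `samples` and the helpers `store` and `load` are hypothetical.

```python
import sqlite3
import zlib


def store(conn, sample_id, string_sequence):
    """Compress the character string sequence and store it as a BLOB."""
    compressed = zlib.compress(string_sequence.encode("utf-8"), 9)
    conn.execute(
        "INSERT INTO samples (id, payload) VALUES (?, ?)",
        (sample_id, compressed),
    )
    conn.commit()
    return len(compressed)


def load(conn, sample_id):
    """Fetch a stored compression result and restore the string sequence."""
    row = conn.execute(
        "SELECT payload FROM samples WHERE id = ?", (sample_id,)
    ).fetchone()
    return zlib.decompress(row[0]).decode("utf-8")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, payload BLOB)")

sequence = "method A->f, method B->d; f, d; 1, 3;"
store(conn, 1, sequence)
restored = load(conn, 1)
print(restored == sequence)
```

On a string this short the compressed form is not smaller than the input; the benefit only appears on realistically sized sampling data.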

Taking the sampling data of a single request in a live broadcast room as an example, 476 distinct method names appear a total of 1788 times, that is, each method name appears 3.8 times on average. Weighted by the number of occurrences, the average length of a method name is 28.6 characters. If the number of occurrences is not considered, the average length of the 476 method names is 31.4 characters. In the serialized character string, the characters of the method names account for 58.0% of the total content.

After the method names are Huffman-coded as described above, a method name can be represented by 8.0 bits on average. In contrast, if stored in character form, each character in the method name requires 1 byte (1 byte = 8 bits). This means that after encoding, a method name occupies only 1/28.6 of the space it occupied before.
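Written out, the comparison uses the weighted average name length of 28.6 characters at 8 bits per character against the average code length of 8.0 bits:

```latex
\[
\frac{\text{encoded size}}{\text{size in character form}}
  = \frac{8.0\ \text{bits}}{28.6 \times 8\ \text{bits}}
  = \frac{1}{28.6}
  \approx 3.5\%
\]
```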

An extra coding table must also be stored, that is, each method name must still appear once in character form. Taking this into account, if a method name is repeated twice in the sampling data, this encoding saves about 50% of the character content (counting only the method-name part); if it is repeated three times, about 64% is saved; at the average of 3.8 repetitions, nearly 70% of the character space is saved. Since the method names account for 58% of the total content, the overall saving is 58% × 70% = 40.6%. That is, in theory the present scheme reduces the content to be compressed by more than 40% before the conventional compression method is applied.
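The estimate can be checked with a rough cost model. The model is an assumption, since the text does not state the exact formula behind its figures: each method name is stored once in full (28.6 bytes on average) in the coding table, and each occurrence then costs one code of 1 byte (8.0 bits). The function name `saving` is hypothetical. Under this model the results come out close to, but not exactly equal to, the rounded percentages quoted above.

```python
NAME_LENGTH = 28.6   # weighted average method-name length, in bytes
CODE_LENGTH = 1.0    # average code length: 8.0 bits = 1 byte
NAME_SHARE = 0.58    # share of method-name characters in the total content


def saving(repetitions):
    """Fraction of method-name bytes saved when a name occurs `repetitions` times."""
    original = repetitions * NAME_LENGTH
    encoded = NAME_LENGTH + repetitions * CODE_LENGTH  # table entry + codes
    return 1 - encoded / original


for n in (2, 3, 3.8):
    print(f"{n} repetitions: {saving(n):.1%} of the method-name content saved")

print(f"overall reduction: {NAME_SHARE * saving(3.8):.1%}")
```

The saving grows with the number of repetitions because the one-off cost of the table entry is spread over more occurrences.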

It should be noted that the remaining 42% of the content, namely the operation data, is compressed and stored in a conventional compression manner.

When the performance sampling data needs to be retrieved, the character string code corresponding to the target method name can be found by the code table (shared code table or non-shared code table), then the total operation data corresponding to the character string code is found, and then the target operation data is extracted from the total operation data according to the offset address or the extraction identifier of the target operation data.
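The retrieval path can be sketched as follows, assuming the decompressed string has the three-section layout of the earlier example (coding table; codes; operation data) and that the offset is a sequence-number offset, which is one of the offset kinds the description allows. The helpers `parse` and `lookup` are hypothetical.

```python
def parse(sequence):
    """Split a string sequence into (coding table, codes, operation data)."""
    table_part, codes_part, data_part = [
        section.strip() for section in sequence.split(";")[:3]
    ]
    table = {}
    for entry in table_part.split(","):
        left, code = entry.strip().split("->")
        name = left.split(" ", 1)[1]          # drop the leading "method "
        table[name] = code
    codes = [c.strip() for c in codes_part.split(",")]
    data = [d.strip() for d in data_part.split(",")]
    return table, codes, data


def lookup(sequence, method_name):
    """Return the operation data stored for `method_name`."""
    table, codes, data = parse(sequence)
    code = table[method_name]      # 1. code found through the coding table
    offset = codes.index(code)     # 2. sequence-number offset of that code
    return data[offset]            # 3. operation data at the same offset


sequence = "method A->f, method B->d; f, d; 1, 3;"
print(lookup(sequence, "B"))
# prints: 3
```

The separators are what make step 2 possible: without them the position of a code in the sequence could not be determined reliably.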

Example two

Corresponding to the first embodiment, this embodiment further provides an apparatus for data storage. As shown in Fig. 2, the apparatus includes: an obtaining unit 21, an encoding unit 22, a searching unit 23, a first compression unit 24, a generating unit 25, a second compression unit 26 and a storage unit 27; wherein,

the obtaining unit 21 is configured to obtain performance sampling data, where the performance sampling data includes multiple pieces, and each piece includes multiple method names and corresponding operation data. For example, if the user name needs to be obtained, the corresponding method name is username or userID; the operation data comprises the running time, the number of runs, the memory occupied while running, the CPU utilization and the like of each method name.

Here, the encoding unit 22 may perform character string encoding separately for each piece of performance sampling data, with each method name in that piece taken as a distinct basic element, and generate a plurality of non-shared coding tables.

Specifically, the encoding unit 22 counts the number of occurrences of each method name in the plurality of pieces of performance sampling data; assigns a unique character string to each method name in each piece of performance sampling data; and, based on Huffman coding, stores each method name, its number of occurrences and its corresponding character string for each piece of performance sampling data into a corresponding mapping table, wherein the mapping tables are the non-shared coding tables, and the corresponding character strings are preset.

For example, the mapping table generated for a certain piece of performance sampling data is shown in Table 1, and the mapping tables for the other pieces of performance sampling data may be generated in the same manner. The character strings may be chosen from the 26 letters in upper and lower case.

TABLE 1

Name of method	Number of occurrences	Character encoding
A	243	f
B	173	d
C	99	e

When the method name after each encoding in each performance sample data needs to be searched after the encoding table is generated, the searching unit 23 may perform the searching based on the corresponding non-shared encoding table. For example, table 1 is an encoding table of the first piece of performance sampling data, and when the character string encoding corresponding to the method name a in the first piece of performance sampling data is needed, the corresponding character string encoding f can be found based on table 1.

After obtaining the method name after each encoding, the first compression unit 24 is configured to perform primary compression on the non-shared encoding table and the method name after each encoding, respectively, and obtain names of each method after the primary compression.

When multiple sets of performance sampling data of the same application program need to be stored, the generating unit 25 is configured to serialize each of the once-compressed method names and the operation data corresponding to each of the method names, and generate a character string sequence; in the character string sequence, separators are added among different character strings to facilitate accurate retrieval. Here, the corresponding character string is preset.

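For example, for a piece of performance sampling data containing the method names A and B, whose operation data are 1 and 3 respectively, the character string sequence is:

method A->f, method B->d; f, d; 1, 3;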

Finally, the second compression unit 26 performs secondary compression on the character string sequence to obtain a compression result, and the storage unit 27 stores the compression result in the database.

It should be noted that the remaining 42% of the content, namely the operation data, is compressed in a conventional compression manner.

Example three

The present embodiment further provides a computer device for data storage. As shown in Fig. 3, the computer device includes: Radio Frequency (RF) circuitry 310, memory 320, input unit 330, display unit 340, audio circuitry 350, WiFi module 360, processor 370, and power supply 380. Those skilled in the art will appreciate that the computer device configuration illustrated in Fig. 3 does not constitute a limitation of computer devices, which may include more or fewer components than those illustrated, combine some components, or use a different arrangement of components.

The following describes the components of the computer device in detail with reference to fig. 3:

RF circuitry 310 may be used for receiving and transmitting signals; in particular, it receives downlink information from base stations and passes it to processor 370 for processing. In general, the RF circuit 310 includes, but is not limited to, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like.

The memory 320 may be used to store software programs and modules, and the processor 370 executes the various functional applications and data processing of the computer device by running the software programs and modules stored in the memory 320. The memory 320 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function, and the like; the data storage area may store data created according to the use of the computer device, and the like. Further, the memory 320 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.

The input unit 330 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the computer device. Specifically, the input unit 330 may include a keyboard 331 and other input devices 332. The keyboard 331 can collect the user's input operations on it and drive the corresponding connection device according to a preset program; it sends the collected input information to the processor 370. The other input devices 332 may include, but are not limited to, one or more of a touch panel, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.

The display unit 340 may be used to display information input by a user or information provided to the user, as well as the various menus of the computer device. The display unit 340 may include a display panel 341; optionally, the display panel 341 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like. Further, the keyboard 331 may cover the display panel 341; when the keyboard 331 detects a touch operation on or near it, it transmits the touch event to the processor 370 to determine the type of the touch event, and the processor 370 then provides a corresponding visual output on the display panel 341 according to the type of the touch event. Although the keyboard 331 and the display panel 341 are shown in Fig. 3 as two separate components implementing the input and output functions of the computer device, in some embodiments the keyboard 331 and the display panel 341 may be integrated to implement the input and output functions of the computer device.

Audio circuitry 350, speaker 351 and microphone 352 may provide an audio interface between a user and the computer device. The audio circuit 350 may transmit the electrical signal converted from the received audio data to the speaker 351, and the speaker 351 converts the electrical signal into a sound signal and outputs it.

WiFi belongs to short-distance wireless transmission technology, and computer equipment can help a user to receive and send e-mails, browse webpages, access streaming media and the like through the WiFi module 360, and provides wireless broadband internet access for the user. Although fig. 3 shows the WiFi module 360, it is understood that it does not belong to the essential constitution of the computer device, and may be omitted entirely as needed within the scope not changing the essence of the invention.

The processor 370 is the control center of the computer device. It connects the various parts of the entire computer device using various interfaces and lines, and performs the various functions of the computer device and processes data by running or executing the software programs and/or modules stored in the memory 320 and calling the data stored in the memory 320, thereby monitoring the computer device as a whole. Optionally, the processor 370 may include one or more processing units; preferably, the processor 370 may integrate an application processor, wherein the application processor primarily handles the operating system, user interfaces, application programs, and the like.

The computer device also includes a power supply 380 (such as a power adapter) for powering the various components, which may preferably be logically connected to the processor 370 through a power management system.

The method, apparatus and computer device for data storage provided by the embodiments of the invention have the following beneficial effects:

The embodiments of the invention provide a method, an apparatus and a computer device for data storage, applied to a live broadcast platform. The method comprises: obtaining a plurality of pieces of performance sampling data, wherein the performance sampling data comprises multiple method names and corresponding operation data; performing character string encoding with each method name in each piece of performance sampling data as a distinct basic element, and generating a plurality of non-shared coding tables; looking up each encoded method name based on the plurality of non-shared coding tables; performing primary compression on the non-shared coding tables and each encoded method name, to obtain each method name after primary compression; serializing each method name after primary compression and the operation data corresponding to each method name, to generate a character string sequence; and performing secondary compression on the character string sequence to obtain a compression result, and storing the compression result into a database. Thus, when data is stored, the non-shared coding tables and the encoded method names first undergo primary compression, and the resulting character string sequence then undergoes secondary compression; compressing the data twice improves the compression ratio. Unlike the prior art, which encodes single characters, the encoding takes each whole method name as a basic element, which further reduces the data to be stored and therefore the storage space occupied. Because each method name is encoded as a character string, the encoded data is readable and remains readable after it is stored; when searching for information, a user can therefore find the corresponding method name from its character string and then obtain the performance sampling data of that method name. In addition, since each method name is encoded as a basic element, less data remains to be compressed when the method names are compressed, which improves the compression rate.

The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.

In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.

Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components of a gateway, proxy server, or system according to embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on a computer-readable storage medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.

The above description is only exemplary of the present invention and should not be taken as limiting the scope of the present invention, and any modifications, equivalents, improvements, etc. that are within the spirit and principle of the present invention should be included in the present invention.

Claims

1. A method for data storage, applied in a live streaming platform, the method comprising:

2. The method of claim 1, wherein said string encoding each method name as a distinct base element, generating an encoding table, comprises:

3. The method of claim 1, wherein separators are provided between different strings in the sequence of strings.

4. The method of claim 1, wherein the operational data comprises: the running time, the number of runs, the memory occupied while running, and the central processing unit (CPU) utilization rate of each method name.

5. An apparatus for data storage, the apparatus comprising:

6. The apparatus of claim 5, wherein the encoding unit is specifically configured to:

7. The apparatus of claim 5, wherein a separator is disposed between different strings in the sequence of strings.

8. The apparatus of claim 5, wherein the operational data comprises: the running time, the number of runs, the memory occupied while running, and the central processing unit (CPU) utilization rate of each method name.

9. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the method according to any one of claims 1 to 4.

10. A computer device for data storage, comprising:

at least one processor; and

at least one memory communicatively coupled to the processor, wherein,

the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method of any of claims 1 to 4.