CN117271595A - Data processing method, plug-in, device and storage medium - Google Patents

Data processing method, plug-in, device and storage medium Download PDF

Info

Publication number
CN117271595A
CN117271595A CN202311400360.6A
Authority
CN
China
Prior art keywords
data
query
transfer
transfer data
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311400360.6A
Other languages
Chinese (zh)
Inventor
蒋尧鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kangjian Information Technology Shenzhen Co Ltd
Original Assignee
Kangjian Information Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kangjian Information Technology Shenzhen Co Ltd filed Critical Kangjian Information Technology Shenzhen Co Ltd
Priority to CN202311400360.6A priority Critical patent/CN117271595A/en
Publication of CN117271595A publication Critical patent/CN117271595A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/258Data format conversion from or to a database

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the technical fields of computers and digital healthcare, and provides a data processing method, plug-in, device and storage medium. The method includes: acquiring original data in response to a data storage request; converting the format of the original data, compressing the format-converted data to obtain transfer data, and storing the transfer data in a cache space; reading the transfer data from the cache space in response to a data query request; decompressing the transfer data, performing reverse format conversion on the decompressed data to obtain query data, and storing the query data in memory space; and writing the query data stored in the memory space into the ES index library in batches. The data processing method reduces the number of documents in the ES index library, saves machine resources, and achieves better retrieval and analysis performance.

Description

Data processing method, plug-in, device and storage medium
Technical Field
The present application relates to the field of computer and digital healthcare technology, and in particular, to a data processing method, a plug-in, a device, and a storage medium.
Background
ElasticSearch (ES) is a powerful open source search and analysis engine that is widely used for large-scale data storage and retrieval. In the conventional approach, all data is written directly into the ES index library. To support subsequent parsing, data written into the ES index library must store transaction logs (translogs), inverted indexes, and the like in addition to the original document. Processing such large-scale data in ES therefore consumes a large amount of machine resources and requires more machines to provide enough central processing unit (CPU), memory, and disk resources.
In practical use, some of this large-scale data generally does not need to be retrieved; only in specific cases does a part of it need to be fetched for retrieval, such as certain system logs and non-critical service logs. If the data is written directly into the ES index library as soon as it is generated, a large amount of "dormant" data will occupy machine resources, and writing and storing data at such volume is costly with poor cost-effectiveness.
Disclosure of Invention
In order to solve the above problems, the embodiments of the present application provide a data processing method, plug-in, device, and storage medium, in which only the original data is stored at first, and only when some data needs to be retrieved and analyzed is it exported and written into the ES index library, thereby saving machine resources, reducing the number of documents in ES, and improving data retrieval and analysis performance.
The embodiment of the application adopts the following technical scheme:
in a first aspect, a data processing method is provided, the method comprising:
acquiring original data in response to a data storage request;
converting the format of the original data, compressing the original data after format conversion to obtain transfer data, and storing the transfer data into a cache space;
reading transfer data from the cache space in response to the data query request;
decompressing the transfer data, performing reverse format conversion on the decompressed transfer data to obtain query data, and storing the query data in a memory space;
and writing the query data stored in the memory space into the ES index library in batches.
In a second aspect, there is provided a data processing plug-in comprising:
an acquisition unit configured to acquire original data in response to a data storage request;
the buffer unit is used for carrying out format conversion on the original data, compressing the original data subjected to format conversion to obtain transfer data, and storing the transfer data into a buffer space;
the query unit is used for responding to the data query request and reading the transfer data;
the transfer unit is used for decompressing the transfer data, performing reverse format conversion on the decompressed transfer data to obtain query data, and storing the query data in the memory space;
and the writing unit is used for writing the query data stored in the memory space into the ES index library in batches.
In a third aspect, a computer device is provided comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the data processing method described above when the computer program is executed.
In a fourth aspect, a computer readable storage medium is provided, the computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the data processing method described above.
At least one of the technical solutions adopted in the embodiments of the present application can achieve the following beneficial effects:
the data processing method provided by the application is used for responding to a data storage request to acquire original data; converting the format of the original data, compressing the original data after format conversion to obtain transfer data, and storing the transfer data into a cache space; reading transfer data from the cache space in response to the data query request; decompressing the transfer data, performing reverse format conversion on the decompressed transfer data to obtain query data, and storing the query data in a memory space; and writing the query data stored in the memory space into the ES index library in batches. According to the data processing method, the original data are firstly obtained, the transfer data converted from the original data are cached, the transfer data are reversely converted into the original data according to the requirement, and finally the original data are imported into the ES index library. The data processing method reduces the number of documents in the ES index library, saves server machine resources, and can realize better retrieval analysis performance. The data processing method provided by the application can be realized as a customized plug-in of the elastic search, and the plug-in is integrated into the elastic search instance without additionally deploying new services.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 illustrates an application environment schematic of a data processing method according to one embodiment of the present application;
FIG. 2 shows a flow diagram of a data processing method according to one embodiment of the present application;
FIG. 3 shows a flow diagram of a data processing method according to another embodiment of the present application;
FIG. 4 illustrates a schematic diagram of a data processing plug-in according to one embodiment of the present application;
FIG. 5 illustrates a functional schematic of a data processing plug-in according to another embodiment of the present application;
FIG. 6 illustrates a schematic structural diagram of a computer device according to one embodiment of the present application;
fig. 7 shows a schematic structural diagram of a computer device according to another embodiment of the present application.
Detailed Description
For the purposes, technical solutions and advantages of the present application, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments of the present application and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
In order to make the technical solutions provided by the embodiments of the present application more clearly understood to those skilled in the art, the background art related to the present application is first explained.
ElasticSearch (ES) is a powerful open-source search and analysis engine. It provides a plug-in mechanism so that developers can extend ElasticSearch, adding custom functionality or integrating third-party tools and libraries according to actual needs. Customized functions can thus be attached to an existing ElasticSearch instance without deploying additional new services. Based on this extension mechanism, the present application designs a data processing plug-in that replaces the direct writing of all data into the ElasticSearch index library with the following: the original data is first acquired and converted, the converted data is converted back into the original data on demand, and finally the original data is imported into the ES index library, improving data retrieval and analysis performance.
The data processing method provided by the embodiment of the application can be applied to an environment as shown in fig. 1, wherein a client communicates with a server through a network. The clients may be, but are not limited to, devices with display screens and input means, such as various personal computers, notebook computers, smartphones, tablet computers, and portable wearable devices. The service end can be realized by an independent service platform or a service platform cluster formed by a plurality of service platforms. One server may communicate with multiple clients.
The data storage request and the data query request are sent from a client, and the client sending the data storage request and the client sending the data query request can be the same client or different clients. The server responds to a data storage request sent by a client by acquiring the original data, performing format conversion on it, compressing the format-converted data to obtain transfer data, and storing the transfer data in a cache space. When the client subsequently sends a data query request, the server reads the transfer data from the cache space, decompresses it, performs reverse format conversion on the decompressed data to obtain query data, and stores the query data in memory space. The server then writes the query data from the memory space into the ES index library in batches, which reduces the number of documents in the ES index library, saves machine resources, and improves data retrieval and analysis performance.
In an actual digital healthcare scenario, a user may use a client to store user information while retrieving pharmacy information, medicine information, doctor information, insurance information, and the like; a doctor may use a client to store doctor information while retrieving user information, medicine information, and the like. The data involved in digital healthcare is massive, and the content stored and retrieved by different clients may differ. If all of this information were saved and written directly into the ES index library, it would place enormous pressure on the server-side devices. Therefore, with the data processing method provided by the application, after a client sends a data storage request, the server converts the original data into transfer data and stores it in the cache space; only after the client sends a data query request does the server read the corresponding transfer data from the cache space, convert it, store the result in memory space, and write it into the ES index library in batches.
The present application is described in detail below by way of specific examples.
Fig. 2 is a schematic flow chart of a data processing method according to an embodiment of the present application; as shown in fig. 2, the method includes steps S210 to S250:
step S210, acquiring the original data in response to the data storage request.
The data storage request is sent by the client to the server. Only when a user sends a data storage request through a client does the server acquire the original data and perform subsequent processing. The server may send a front-end page to the client to receive the data storage request, or the client may set a timed storage function so that the server receives data storage requests at certain time intervals.
The original data acquired by the server is usually in a JSON format, which is a JavaScript native format and is a lighter weight data exchange format than XML.
Step S220, format conversion is carried out on the original data, compression is carried out on the original data after format conversion to obtain transfer data, and the transfer data are stored in a cache space.
In order to relieve the storage pressure on the cache space, the acquired original data is not transferred to the cache space directly, but first undergoes format conversion and compression. Common serialization formats for data transmission include XML, JSON, and Protobuf. The server can convert the received original data in JSON format into data in the Protobuf serialization format, discarding field names and other metadata so that the format-converted data occupies less space. The format-converted data is then compressed; compression may be performed using, but is not limited to, algorithms such as gzip or ZSTD. The compressed data is stored in the cache space as the transfer data. The cache space is a storage area set aside by the server specifically for the transfer data; it can be part of the server's internal storage or external expansion storage.
Step S230, reading the transfer data from the buffer space in response to the data query request.
The data query request is also sent by the client to the server. Only when the user sends a data query request through the client does the server read the transfer data in the cache space and perform subsequent processing. The client that issues the data query request in this step may be the same as or different from the client that issues the data storage request in step S210. The server may send a front-end page to the client to receive the data query request, or the client may set a timed query function so that the server receives data query requests at certain time intervals.
The server can determine the data range to be queried based on the data query request, and thereby determine which transfer data to read from the cache space. For example, the server may provide the client with a front-end page that has an input function and determine the query range from the user's input; or the client may set a function that periodically queries a specific data range, and the server determines the range to query after receiving such a request.
Step S240, decompressing the transfer data, performing reverse format conversion on the decompressed transfer data to obtain query data, and storing the query data in a memory space.
After the server exports the transfer data according to the data query request, the reverse operations of step S220 are performed on it: the compressed data is decompressed, and the decompressed data is converted back from the transfer format to obtain the query data. The decompression of the transfer data can use the same algorithm as the compression operation in step S220.
The obtained query data is stored in the memory space of the server. It can be stored piece by piece; however, piece-by-piece storage tends to scatter the query data, so the query data can instead be classified and stored in the memory space according to some classification scheme. For example, a plurality of memory queues is set up in the memory space, and the query data is classified into the corresponding memory queues according to the chosen scheme.
Step S250, the query data stored in the memory space is written into the ES index library in batches.
The server only writes the query data stored in the memory space into the ES index library, greatly reducing the number of documents in the ES index library. During writing, efficiency can be improved by writing in batches with multiple threads.
As can be seen from the method shown in fig. 2, the data processing method provided in the present application acquires original data in response to a data storage request; converts the format of the original data, compresses the format-converted data to obtain transfer data, and stores the transfer data in a cache space; reads the transfer data from the cache space in response to a data query request; decompresses the transfer data, performs reverse format conversion on the decompressed data to obtain query data, and stores the query data in memory space; and writes the query data stored in the memory space into the ES index library in batches. In this method, the original data is first acquired, the transfer data converted from it is cached, the transfer data is converted back into the original data on demand, and finally the original data is imported into the ES index library. The method reduces the number of documents in the ES index library, saves server machine resources, and achieves better retrieval and analysis performance. The data processing method provided by the application can be implemented as a custom ElasticSearch plug-in integrated into the ElasticSearch instance, without deploying additional new services.
In some optional embodiments of the above method, step S210, acquiring the original data in response to the data storage request, includes: providing a front-end page, wherein the front-end page includes a stored data control; receiving a data storage request through the stored data control; and calling a data acquisition processor through the RestHandler API of ElasticSearch to acquire the original data in JSON format.
The server sends the front-end page to the client to receive the data-storage instruction issued by the user through the page. A stored data control may be provided in the front-end page to prompt the user whether to store data. For example, the server may send a prompt to the front-end page asking whether to save the data; when the user clicks the "yes" option, the client sends a data storage request to the server.
The server uses the RestHandler API of ElasticSearch to add a data acquisition processor whose path is Save Original Data. After receiving the data storage request, the server uses this data acquisition processor to receive the original data, which is in JSON format.
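For illustration only, a minimal sketch of how such a data acquisition processor could be registered as a custom REST handler is given below. The class name SaveOriginalDataHandler and the path /_save_original_data are assumptions (the patent only names the processor "Save Original Data"), and the exact BaseRestHandler signatures differ between ElasticSearch versions; this follows the 7.x shape.

```java
import org.elasticsearch.client.node.NodeClient;
import org.elasticsearch.rest.BaseRestHandler;
import org.elasticsearch.rest.BytesRestResponse;
import org.elasticsearch.rest.RestRequest;
import org.elasticsearch.rest.RestStatus;

import java.util.List;

// Hypothetical handler sketch: receives the original JSON data via a custom REST path.
public class SaveOriginalDataHandler extends BaseRestHandler {

    @Override
    public String getName() {
        return "save_original_data_handler";
    }

    @Override
    public List<Route> routes() {
        // Assumed path; the patent only names the processor "Save Original Data".
        return List.of(new Route(RestRequest.Method.POST, "/_save_original_data"));
    }

    @Override
    protected RestChannelConsumer prepareRequest(RestRequest request, NodeClient client) {
        // The request body is the original data in JSON format.
        String originalJson = request.content().utf8ToString();
        // Hand originalJson to the conversion/compression pipeline of step S220 here.
        return channel -> channel.sendResponse(
                new BytesRestResponse(RestStatus.OK, "application/json", "{\"stored\":true}"));
    }
}
```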
In some optional embodiments of the above method, step S220, performing format conversion on the original data, compressing the format-converted original data to obtain transfer data, and storing the transfer data in a cache space, includes: converting the original data in JSON format into data in the Protobuf serialization format; compressing the Protobuf-serialized data with the ZSTD algorithm to obtain the transfer data; and storing the transfer data, via MMAP, under at least one level of directory in the cache space, with each piece of transfer data occupying its own line.
Although JSON is a lightweight data exchange format, in order to further reduce the storage pressure on the cache space and improve compression efficiency, the server still converts the data format after receiving the JSON original data through the data acquisition processor. In this application, the original data in JSON format is converted into data in the Protobuf serialization format.
Protobuf is a lightweight and efficient structured data storage format that is language-independent, platform-independent, and extensible. As a binary serialization format, Protobuf saves more space than JSON because it does not need to store field names and other metadata. In addition, Protobuf uses an efficient encoder based on code generation, so data can be packed and parsed quickly, faster than JSON. JSON can be parsed into the Protobuf format using a codec library provided by Protobuf (e.g., the JsonFormat utility). After the original data in JSON format has been converted into Protobuf data, it can be compressed further.
The compression step uses the ZSTD (Zstandard) algorithm. ZSTD is a newer compression algorithm that offers a high compression ratio together with high compression speed and low memory consumption. Compared with traditional compression algorithms such as gzip, ZSTD targets zlib-level or better compression ratios while remaining fast and lossless.
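As a minimal sketch of this conversion-and-compression step, assuming a pre-defined Protobuf message type (the hypothetical OriginalRecord generated from a .proto schema) together with the protobuf-java-util and zstd-jni libraries:

```java
import com.github.luben.zstd.Zstd;
import com.google.protobuf.util.JsonFormat;

// OriginalRecord is a hypothetical message generated from a .proto file describing
// the fields of the original data; it stands in for whatever schema a real
// deployment would define.
public final class TransferCodec {

    // JSON original data -> Protobuf binary -> ZSTD-compressed transfer data.
    public static byte[] toTransferData(String originalJson) throws Exception {
        OriginalRecord.Builder builder = OriginalRecord.newBuilder();
        JsonFormat.parser().ignoringUnknownFields().merge(originalJson, builder);
        byte[] protoBytes = builder.build().toByteArray();   // drops field names / metadata
        return Zstd.compress(protoBytes);                    // compact transfer data
    }
}
```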
The compressed data is stored in the cache space as the transfer data. When storing the transfer data in the cache space, the MMAP (memory-mapped file) mode can be used.
MMAP maps a file object into the address space of the process, establishing a one-to-one mapping between the object's disk address and a segment of virtual addresses in the process's virtual address space, so that the file can be read and written directly. Compared with conventional storage and I/O approaches, MMAP not only saves memory but also improves data read-write performance and throughput.
When the transfer data is stored in the cache space, it is stored under at least one level of directory, with each piece of transfer data occupying its own line. That is, in the cache space, the transfer data is classified and stored by directory level. As an alternative implementation, the transfer data can be catalogued and stored in two dimensions, application and time: in the cache space, the application name serves as the primary directory and the hour dimension as the secondary directory. After the original data has been format-converted and compressed into transfer data, the transfer data is first routed to the primary directory according to the application name of the original data, and then to the secondary directory according to the hour range of the original data's acquisition time, completing its storage. With this storage scheme, data can be exported precisely by application and time dimension.
In a digital healthcare scenario: for example, original data from the "online consultation" module acquired during the 12:00-13:00 hour on 12 December 2020 is, after format conversion and compression, stored in the cache space under the directory "online consultation (primary directory) - 202012121213 (secondary directory)"; original data from the "online consultation" module acquired during the 15:00-16:00 hour on 12 December 2020 is, after format conversion and compression, stored under the directory "online consultation (primary directory) - 202012121516 (secondary directory)"; and original data from the "pharmacy guidance" module acquired during the 12:00-13:00 hour on 12 December 2020 is, after format conversion and compression, stored under the directory "pharmacy guidance (primary directory) - 202012121213 (secondary directory)".
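A rough sketch of writing one piece of transfer data into this application/hour directory structure through a memory-mapped file is shown below. The cache root path, the file name data.log, and the one-Base64-line-per-record layout (the compressed bytes are binary, so they are encoded here to keep the line structure) are assumptions for illustration:

```java
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Base64;

public final class CacheWriter {

    private static final Path CACHE_ROOT = Paths.get("/data/es-transfer-cache"); // assumed base path

    // Appends one piece of transfer data as a single line under
    // <cache>/<application>/<yyyyMMddHH(HH+1)>/data.log through a memory-mapped file.
    public static void append(String application, LocalDateTime acquiredAt, byte[] transferData)
            throws Exception {
        String hourBucket = acquiredAt.format(DateTimeFormatter.ofPattern("yyyyMMddHH"))
                + String.format("%02d", acquiredAt.getHour() + 1);
        Path dir = CACHE_ROOT.resolve(application).resolve(hourBucket);
        Files.createDirectories(dir);
        Path file = dir.resolve("data.log");

        // One Base64 line per record keeps the "each record occupies one line" layout.
        byte[] line = (Base64.getEncoder().encodeToString(transferData) + "\n")
                .getBytes(StandardCharsets.US_ASCII);

        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            long offset = channel.size();
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, offset, line.length);
            buffer.put(line); // write through the memory mapping
        }
    }
}
```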
In some optional embodiments of the above method, step S230, reading the transfer data from the cache space in response to the data query request, includes: providing a front-end page, wherein the front-end page includes a query data control; receiving a data query request through the query data control, and determining a data retrieval range according to the data query request; and calling a data query processor through the RestHandler API of ElasticSearch to acquire the transfer data based on the data retrieval range.
The server sends the front-end page to the client to receive the data query instruction issued by the user through the page. A query data control may be provided in the front-end page to prompt the user to fill in the data to be queried. For example, the server may send a "please input a query requirement" prompt to the front-end page; when the user fills in the corresponding query requirement, the client sends a data query request to the server.
After receiving the data query request, the server determines a data retrieval range from it; the range may include the application name of the data and/or the time dimension of the data. The purpose of determining the data retrieval range is to determine from which directories of the cache space the transfer data should be retrieved.
The server uses the RestHandler API of ElasticSearch to add a data query processor whose path is Export Original Data. Using this data query processor, the server acquires the transfer data in the cache space according to the determined data retrieval range.
The client sending the data storage request and the client sending the data query request may be the same client or different clients. The transfer data acquired after the retrieval range has been determined from the data query request is only a part of all the transfer data stored in the cache space. That is, all transfer data is kept in the cache space, and only after the client issues a data query request is the qualifying portion exported.
In a digital healthcare scenario, after the server sends the "please input a query requirement" prompt to the client, the user can fill in the application name and/or the time range to be queried in the input box. For example, after the user inputs the application name "insurance service" and the time range "12 December 2020", the server receives the data query request and determines the data retrieval range as the primary directory "insurance service" and the 24 secondary directories from "202012120001" to "202012122324". The server then exports the transfer data under these directories through the data query processor.
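A small sketch of turning such a retrieval range into the list of secondary directories to read might look as follows; the directory naming simply follows the examples above and is otherwise an assumption:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public final class RetrievalRange {

    // For an application name and a date, lists the 24 hour-range directories,
    // e.g. "insurance service/202012120001" ... "insurance service/202012122324".
    public static List<String> hourDirectories(String application, LocalDate day) {
        String prefix = day.format(DateTimeFormatter.ofPattern("yyyyMMdd"));
        List<String> dirs = new ArrayList<>();
        for (int hour = 0; hour < 24; hour++) {
            dirs.add(String.format("%s/%s%02d%02d", application, prefix, hour, hour + 1));
        }
        return dirs;
    }
}
```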
In some optional embodiments of the above method, step S240, decompressing the transfer data, performing reverse format conversion on the decompressed transfer data to obtain query data, and storing the query data in the memory space, includes: decompressing the transfer data with the ZSTD algorithm; deserializing the decompressed transfer data into query data in JSON format; and storing the query data in a plurality of memory queues in the memory space.
After the server determines the data query range, it traverses the files one by one, reads the data line by line, and exports the required transfer data from the cache space. The server then decompresses and reverse-format-converts the transfer data: decompression is again performed with the ZSTD algorithm, followed by Protobuf deserialization, converting the transfer data into query data in JSON format. The query data thus corresponds to the transfer data restored to the original data.
The query data is placed into memory queues on the server and divided into batches of a certain size. A maximum capacity can be set for each memory queue: when the amount of query data in one queue reaches the limit, subsequent query data is placed into the next queue; when the query data in a queue has been written into the ES index library, its space is released and new query data can again be stored in that queue. Alternatively, each memory queue can be assigned the application names it accepts, and query data is placed into the corresponding queue according to its application name.
For example, in a digital healthcare scenario, multiple memory queues may be partitioned by application names such as "online consultation", "pharmacy guidance", and "insurance service". Query data with the application name "online consultation" is placed in the "online consultation" queue, query data with the application name "pharmacy guidance" in the "pharmacy guidance" queue, and query data with the application name "insurance service" in the "insurance service" queue.
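A minimal sketch of this export path is given below, reusing the hypothetical OriginalRecord message and Base64 line layout assumed in the caching sketch earlier, with one bounded queue per application name (the queue capacity is an assumed limit):

```java
import com.github.luben.zstd.Zstd;
import com.google.protobuf.util.JsonFormat;

import java.util.Base64;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

public final class QueryDataRouter {

    // One bounded memory queue per application name (capacity is an assumed limit).
    private final Map<String, BlockingQueue<String>> queues = new ConcurrentHashMap<>();

    // One cache line -> decompress -> Protobuf -> JSON query data -> per-application queue.
    public void route(String application, String cacheLine) throws Exception {
        byte[] compressed = Base64.getDecoder().decode(cacheLine.trim());
        byte[] protoBytes = Zstd.decompress(compressed, (int) Zstd.decompressedSize(compressed));
        OriginalRecord record = OriginalRecord.parseFrom(protoBytes);   // hypothetical message type
        String queryJson = JsonFormat.printer().print(record);          // back to JSON
        queues.computeIfAbsent(application, k -> new ArrayBlockingQueue<>(10_000))
              .put(queryJson);                                          // blocks when the queue is full
    }
}
```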
In some alternative embodiments of the above method, step S250, writing the query data stored in the memory space into the ES index library in batches, includes: writing the query data to the ES index library by asynchronously calling the Bulk API of ElasticSearch through a thread pool.
The server calls the Bulk API of ElasticSearch asynchronously through a thread pool and writes the query data into the ES index library in batches. Because the query data is local to the server, no network transmission is needed and writing is fast. Calling the Bulk API in batches and asynchronously offers higher performance and allows the query data to be written to the ES index library quickly.
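As a hedged illustration only, the sketch below submits batches asynchronously from a thread pool using the widely known (now deprecated) RestHighLevelClient and its bulkAsync call; the index name, batch handling, and 7.x package layout are assumptions:

```java
import org.elasticsearch.action.ActionListener;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;

import java.util.List;
import java.util.concurrent.ExecutorService;

public final class BulkWriter {

    private final RestHighLevelClient client;
    private final ExecutorService threadPool;

    public BulkWriter(RestHighLevelClient client, ExecutorService threadPool) {
        this.client = client;
        this.threadPool = threadPool;
    }

    // Submits one batch of JSON query data and writes it to the ES index asynchronously.
    public void writeBatch(List<String> queryJsonBatch) {
        threadPool.submit(() -> {
            BulkRequest bulk = new BulkRequest();
            for (String json : queryJsonBatch) {
                bulk.add(new IndexRequest("query_data_index")          // assumed index name
                        .source(json, XContentType.JSON));
            }
            client.bulkAsync(bulk, RequestOptions.DEFAULT, new ActionListener<BulkResponse>() {
                @Override public void onResponse(BulkResponse response) { /* log successes/failures */ }
                @Override public void onFailure(Exception e) { /* retry or log */ }
            });
        });
    }
}
```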
Since only query data in the specified range is exported on demand, the number of documents in the ES index library stays small, and the memory, central processing unit (CPU), and storage resources required by the ES index library are not excessively occupied.
In some alternative embodiments, in the above method, step S250, after the step of writing the query data stored in the memory space into the ES index library in batches, the method further includes: deleting the transfer data meeting the preset conditions in the cache space at a first preset time interval, wherein the preset conditions comprise: the storage time exceeds a first preset duration, the original data acquisition time belongs to a preset time period, and the original data application name is a preset name; and/or deleting the query data written into the ES index library at a second preset time interval for more than a second preset time period.
Transfer data kept in the cache space for a long time occupies a large amount of storage space, and the probability that such historical transfer data will ever be exported is very low, if not zero. Therefore, transfer data that has been stored for too long can be deleted periodically: when its time in the cache space exceeds the first preset duration, the transfer data is cleared from the cache space.
In some cases, the original data acquired during a certain period is no longer needed. Transfer data whose original data was acquired during a preset time period can therefore be deleted periodically: when the acquisition time of the original data corresponding to a piece of transfer data falls within the preset period, that transfer data is cleared from the cache space.
In addition, as some applications are taken offline, the transfer data under their application names can be cleaned up. Transfer data whose application name is a preset name can therefore be deleted periodically: when the application name of the original data corresponding to a piece of transfer data matches the preset name, that transfer data is cleared from the cache space.
Of course, the above preset conditions can be combined: the transfer data can be cleaned up periodically across the dimensions of storage time, original data acquisition time, original data application name, and so on, releasing the cache space.
Query data that has been written into the ES index library also does not need to be kept long-term. Query data with a long retention time in the ES index library can therefore be deleted at the second preset time interval: when the time since the query data was written into the ES index library exceeds the second preset duration, it is cleaned up. This saves storage space and reduces the number of documents in the ES index library, improving data retrieval and analysis performance.
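A sketch of the periodic cleanup under stated assumptions follows. The cache root path, retention period, and schedule are placeholders, and only the cache-side cleanup is coded; removing expired query data from the ES index library is noted in a comment, since it could be done with a delete-by-query on a timestamp field or an index lifecycle policy:

```java
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;

public final class CacheCleaner {

    private static final Path CACHE_ROOT = Paths.get("/data/es-transfer-cache"); // assumed path
    private static final Duration MAX_AGE = Duration.ofDays(30);                 // first preset duration (assumed)

    public static void start() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // First preset time interval: run once per hour (assumed).
        scheduler.scheduleAtFixedRate(CacheCleaner::cleanOnce, 1, 1, TimeUnit.HOURS);
        // Expired query data already written to the ES index library would be removed
        // separately, e.g. by a delete-by-query on a timestamp field or an ILM policy.
    }

    private static void cleanOnce() {
        Instant cutoff = Instant.now().minus(MAX_AGE);
        try (Stream<Path> files = Files.walk(CACHE_ROOT)) {
            files.filter(Files::isRegularFile)
                 .filter(p -> lastModified(p).toInstant().isBefore(cutoff)) // stored too long
                 .forEach(p -> {
                     try { Files.delete(p); } catch (IOException ignored) { }
                 });
        } catch (IOException ignored) {
        }
    }

    private static FileTime lastModified(Path p) {
        try { return Files.getLastModifiedTime(p); }
        catch (IOException e) { return FileTime.from(Instant.now()); }
    }
}
```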
Fig. 3 shows a flow diagram of a data processing method according to another embodiment of the present application; as illustrated in fig. 3, the method may include the following steps.
In step S301, a front page is provided, where the front page includes a storage data control, and a data storage request is received through the storage data control.
In step S302, the data acquisition processor is called through the RestHandler API of ElasticSearch to acquire the original data in JSON format.
Step S303, converting the original data in the JSON format into data in the Protobuf serialization format.
Step S304, compressing the Protobuf serialization format data with the ZSTD algorithm to obtain transfer data.
In step S305, the transfer data is stored under at least one level of directory in the cache space in an MMAP manner, where each transfer data occupies a line independently.
Step S306, providing a front-end page, wherein the front-end page comprises a query data control, receiving a data query request through the query data control, and determining a data retrieval range according to the data query request.
Step S307, calling a data query processor through the RestHandler API of ElasticSearch, and acquiring transfer data based on the data retrieval range.
Step S308, decompressing the transfer data with the ZSTD algorithm.
Step S309, deserializing the decompressed transfer data into query data in JSON format.
In step S310, the query data is stored in a plurality of memory queues in the memory space.
In step S311, the Bulk API of ElasticSearch is called asynchronously through a thread pool to write the query data into the ES index library.
Step S312, deleting the transfer data meeting the preset conditions in the buffer space at a first preset time interval, wherein the preset conditions include: the storage time exceeds a first preset duration, the original data acquisition time belongs to a preset time period, and the application name of the original data is a preset name.
Step S313, deleting the query data written into the ES index library at a second preset time interval exceeding a second preset time period.
FIG. 4 shows a schematic diagram of a data processing plug-in according to one embodiment of the present application; as shown in FIG. 4, the plug-in 400 includes:
An acquisition unit 410 for acquiring the original data in response to the data storage request;
the buffer unit 420 is configured to perform format conversion on the original data, compress the original data after format conversion to obtain transfer data, and store the transfer data into a buffer space;
a query unit 430 for reading the transfer data in response to the data query request;
the transfer unit 440 is configured to decompress the transfer data, perform reverse format conversion on the decompressed transfer data to obtain query data, and store the query data in the memory space;
the writing unit 450 is configured to write the query data stored in the memory space into the ES index library in batches.
In some alternative embodiments, in the above plug-in 400, the acquisition unit 410 is specifically configured to: provide a front-end page, wherein the front-end page includes a stored data control; receive a data storage request through the stored data control; and call a data acquisition processor through the RestHandler API of ElasticSearch to acquire the original data in JSON format.
In some alternative embodiments, in the above plug-in 400, the buffer unit 420 is specifically configured to: convert the original data in JSON format into data in the Protobuf serialization format; compress the Protobuf-serialized data with the ZSTD algorithm to obtain transfer data; and store the transfer data, via MMAP, under at least one level of directory in the cache space, with each piece of transfer data occupying its own line.
In some alternative embodiments, in the above plug-in 400, the query unit 430 is specifically configured to: provide a front-end page, wherein the front-end page includes a query data control; receive a data query request through the query data control and determine a data retrieval range according to the data query request; and call a data query processor through the RestHandler API of ElasticSearch to acquire the transfer data based on the data retrieval range.
In some alternative embodiments, in the above plug-in 400, the transfer unit 440 is specifically configured to: decompress the transfer data with the ZSTD algorithm; deserialize the decompressed transfer data into query data in JSON format; and store the query data in a plurality of memory queues in the memory space.
In some alternative embodiments, in the above plug-in 400, the writing unit 450 is specifically configured to: write the query data to the ES index library by asynchronously calling the Bulk API of ElasticSearch through a thread pool.
In some alternative embodiments, the insert 400 further includes: the deleting unit is specifically configured to delete, at a first preset time interval, transfer data in the cache space, where the transfer data meets a preset condition, where the preset condition includes: the storage time exceeds a first preset duration, the original data acquisition time belongs to a preset time period, and the application name of the original data is a preset name; and/or deleting the query data written into the ES index library at a second preset time interval for more than a second preset time period.
It should be noted that the foregoing data processing method may be implemented by the above data processing plug-in 400, and details are not repeated here.
Fig. 5 shows a functional schematic of a data processing plug-in according to another embodiment of the present application; as illustrated in fig. 5, the plug-in performs the following functions through its units:
the acquisition unit 410 provides a front page to the client, the front page including a stored data control; receiving a data storage request through a storage data control; the original data in the JSON format is acquired by calling a data acquisition processor through the RestHandler API of the elastic search.
The buffer unit 420 converts the original data in JSON format into Protobuf serialization format data; compressing the Protobuf serialization format data by adopting a ZSTD algorithm to obtain transfer data; and storing the transfer data to at least one level of directory in the cache space in an MMAP mode, wherein each transfer data independently occupies one line.
The query unit 430 provides a front page to the client, the front page including a query data control; receiving a data query request through a query data control, and determining a data retrieval range according to the data query request; the data query processor is called through the RestHandler API of the elastic search, and the transfer data is acquired based on the data retrieval range.
The transfer unit 440 decompresses the transfer data using ZSTD algorithm; the decompressed transfer data is inversely sequenced into query data in a JSON format; the query data is stored in a plurality of memory queues of the memory space.
The writing unit 450 writes the query data to the ES index library by calling the Bulk API of the elastic search through the thread Chi Yibu.
The deleting unit 460 deletes the transfer data meeting the preset conditions in the buffer space at preset time intervals, where the preset conditions include: the storage time exceeds a first preset duration, the original data acquisition time belongs to a preset time period, and the application name of the original data is a preset name; and deleting the query data written into the ES index library at preset time intervals for exceeding a second preset time length.
Fig. 6 shows a schematic structural diagram of a computer device according to an embodiment of the present application; as shown in fig. 6, the internal structure of the computer device may include a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes non-volatile and/or volatile storage media and internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external client through a network connection. The computer program is executed by the processor to implement the server-side functions or steps of the data processing method.
In one embodiment, the computer device provided in the present application includes a memory and a processor, the memory storing a database and a computer program executable on the processor, the processor executing the computer program to perform the steps of:
acquiring original data in response to a data storage request;
converting the format of the original data, compressing the original data after format conversion to obtain transfer data, and storing the transfer data into a cache space;
reading transfer data from the cache space in response to the data query request;
decompressing the transfer data, performing reverse format conversion on the decompressed transfer data to obtain query data, and storing the query data in a memory space;
and writing the query data stored in the memory space into the ES index library in batches.
In one embodiment, a computer device is also provided, which may be a client, and whose internal structure may be as shown in fig. 7. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer program in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external server through a network connection. The computer program is executed by the processor to implement the client-side functions or steps of the data processing method.
In one embodiment, there is also provided a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring original data in response to a data storage request;
converting the format of the original data, compressing the original data after format conversion to obtain transfer data, and storing the transfer data into a cache space;
reading transfer data from the cache space in response to the data query request;
decompressing the transfer data, performing reverse format conversion on the decompressed transfer data to obtain query data, and storing the query data in a memory space;
and writing the query data stored in the memory space into the ES index library in batches.
It should be noted that, the functions or steps that can be implemented by the computer device or the computer readable storage medium may correspond to the relevant descriptions in the foregoing method embodiments, and are not described herein for avoiding repetition.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. The nonvolatile memory can include Read Only Memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double Data Rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), memory bus direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), among others.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims (10)

1. A method of data processing, the method comprising:
acquiring original data in response to a data storage request;
performing format conversion on the original data, compressing the original data subjected to format conversion to obtain transfer data, and storing the transfer data into a cache space;
reading the transfer data from the cache space in response to a data query request;
decompressing the transfer data, performing reverse format conversion on the decompressed transfer data to obtain query data, and storing the query data in a memory space;
and writing the query data stored in the memory space into an ES index library in batches.
2. The data processing method of claim 1, wherein the obtaining the raw data in response to the data storage request comprises:
providing a front-end page, wherein the front-end page comprises a stored data control;
receiving a data storage request through the stored data control;
and calling a data acquisition processor through the RestHandler API of ElasticSearch to acquire the original data in JSON format.
3. The method of claim 1, wherein the performing format conversion on the original data, compressing the format-converted original data to obtain transfer data, and storing the transfer data in a cache space comprises:
converting the original data in the JSON format into Protobuf serialization format data;
compressing the Protobuf serialization format data by adopting a ZSTD algorithm to obtain transfer data;
and storing the transfer data to at least one level of directory in the cache space in an MMAP mode, wherein each piece of transfer data independently occupies one line.
4. The data processing method of claim 1, wherein the reading the staging data from the cache space in response to a data query request comprises:
providing a front-end page, wherein the front-end page comprises a query data control;
receiving a data query request through the query data control, and determining a data retrieval range according to the data query request;
and calling a data query processor through the RestHandler API of ElasticSearch, and acquiring the transfer data based on the data retrieval range.
5. The data processing method according to claim 1, wherein decompressing the transfer data, performing reverse format conversion on the decompressed transfer data to obtain query data, and storing the query data in a memory space comprises:
decompressing the transfer data by adopting a ZSTD algorithm;
deserializing the decompressed transfer data into query data in JSON format;
and storing the query data in a plurality of memory queues in a memory space.
6. The method of claim 1, wherein writing the query data stored in the memory space into the ES index library in batches comprises:
the query data is written to the ES index library by asynchronously calling the Bulk API of ElasticSearch through a thread pool.
7. The method of claim 1, wherein after the step of batch writing the query data stored in the memory space to an ES index library, the method further comprises:
deleting the transfer data meeting preset conditions in the cache space at a first preset time interval, wherein the preset conditions comprise: the storage time exceeds a first preset duration, the original data acquisition time belongs to a preset time period, and the original data application name is a preset name;
and/or deleting the query data written into the ES index library at a second preset time interval for more than a second preset time period.
8. A data processing plug-in, the plug-in comprising:
an acquisition unit configured to acquire original data in response to a data storage request;
the buffer unit is used for carrying out format conversion on the original data, compressing the original data subjected to format conversion to obtain transfer data, and storing the transfer data into a buffer space;
the query unit is used for responding to the data query request and reading the transfer data;
the transfer unit is used for decompressing the transfer data, performing reverse format conversion on the decompressed transfer data to obtain query data, and storing the query data in a memory space;
and the writing unit is used for writing the query data stored in the memory space into the ES index library in batches.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the data processing method according to any of claims 1 to 7 when the computer program is executed.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the data processing method according to any one of claims 1 to 7.
CN202311400360.6A 2023-10-26 2023-10-26 Data processing method, plug-in, device and storage medium Pending CN117271595A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311400360.6A CN117271595A (en) 2023-10-26 2023-10-26 Data processing method, plug-in, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311400360.6A CN117271595A (en) 2023-10-26 2023-10-26 Data processing method, plug-in, device and storage medium

Publications (1)

Publication Number Publication Date
CN117271595A (en) 2023-12-22

Family

ID=89206229

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311400360.6A Pending CN117271595A (en) 2023-10-26 2023-10-26 Data processing method, plug-in, device and storage medium

Country Status (1)

Country Link
CN (1) CN117271595A (en)

Similar Documents

Publication Publication Date Title
CN107169083B (en) Mass vehicle data storage and retrieval method and device for public security card port and electronic equipment
US9678969B2 (en) Metadata updating method and apparatus based on columnar storage in distributed file system, and host
US20140215170A1 (en) Block Compression in a Key/Value Store
US10649905B2 (en) Method and apparatus for storing data
CN112486913B (en) Log asynchronous storage method and device based on cluster environment
CN110611592B (en) Log recording method and device
CN110069557B (en) Data transmission method, device, equipment and storage medium
CN112613271A (en) Data paging method and device, computer equipment and storage medium
CN105191144A (en) Compression device, compression method, decompression device, decompression method, and information processing system
CN102255866A (en) Method and device for downloading data
Zhai et al. Hadoop perfect file: A fast and memory-efficient metadata access archive file to face small files problem in hdfs
CN111857574A (en) Write request data compression method, system, terminal and storage medium
CN115470156A (en) RDMA-based memory use method, system, electronic device and storage medium
CN113726341B (en) Data processing method and device, electronic equipment and storage medium
US11327929B2 (en) Method and system for reduced data movement compression using in-storage computing and a customized file system
CN109522273B (en) Method and device for realizing data writing
CN106980618B (en) File storage method and system based on MongoDB distributed cluster architecture
CN115774699B (en) Database shared dictionary compression method and device, electronic equipment and storage medium
CN117271595A (en) Data processing method, plug-in, device and storage medium
CN111090782A (en) Graph data storage method, device, equipment and storage medium
US11423000B2 (en) Data transfer and management system for in-memory database
CN113010103B (en) Data storage method and device, related equipment and storage medium
CN113010113B (en) Data processing method, device and equipment
CN114817176A (en) Distributed file storage system and method based on Nginx + MinIO + Redis
CN113704588A (en) File reading method and system based on mapping technology

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination