CN112131295A

CN112131295A - Data processing method and device based on Elasticsearch

Info

Publication number: CN112131295A
Application number: CN202011034098.4A
Authority: CN
Inventors: 王永亮
Original assignee: Ping An Medical and Healthcare Management Co Ltd
Current assignee: Shenzhen Ping An Medical Health Technology Service Co Ltd
Priority date: 2020-09-27
Filing date: 2020-09-27
Publication date: 2020-12-25

Abstract

The application relates to the technical field of medical informatization in digital healthcare and discloses an Elasticsearch-based data processing method and device. The method comprises the following steps: importing Hive metadata into a database through a metadata management unit and, during the import, converting the structured Hive metadata into JSON format through Elasticsearch (ES) and creating an original index, an index type and documents; performing heat statistics on, and sorting, the retrieval data of ES through a data processing unit, and determining a heat label according to the sorting result; establishing, in ES, an inverted index for the data in the database according to the heat label; and receiving keywords to be retrieved through a retrieval service unit, retrieving according to the keywords through ES with dynamic adjustment, and determining a target retrieval result. As a result, the data needed by a client can be acquired quickly without being affected by the heat statistics, and retrieval efficiency is improved.

Description

Data processing method and device based on Elasticsearch

Technical Field

The application relates to the technical field of medical informatization in digital healthcare, and in particular to an Elasticsearch-based data processing method and device.

Background

In digital healthcare, the medical data accumulated during treatment form data assets. Quickly connecting the upstream and downstream relationships among these medical data, adding a corresponding category at each level, and rapidly retrieving the data assets by category and by heat currently face an efficiency bottleneck.

Disclosure of Invention

The main purpose of the application is to provide an Elasticsearch-based data processing method and device, so as to solve the technical problems in the prior art that, when the system data volume is large, the I/O performance and statistical analysis performance of a traditional relational database struggle to meet user requirements and overall retrieval efficiency is difficult to improve.

In order to achieve the above object, the present application provides an Elasticsearch-based data processing method, including:

data acquisition: importing Hive metadata into a database through a metadata management unit, and, during the import, converting the structured Hive metadata into JSON format and creating an original index, an index type and a document through Elasticsearch, wherein the Hive metadata is data obtained from medical data;

data processing: performing heat statistics on the retrieval data of the Elasticsearch through a data processing unit to obtain a heat statistical result, sorting the retrieval data according to the heat statistical result to obtain a sorting result, and determining a heat label according to the sorting result;

index construction: establishing, through Elasticsearch, an inverted index for the data in the database according to the heat label;

data retrieval: receiving a keyword to be retrieved through a retrieval service unit, retrieving according to the keyword through Elasticsearch to obtain a primary retrieval result, and dynamically adjusting the primary retrieval result through the retrieval service unit to determine a target retrieval result.

Further, the step of converting the structured Hive metadata into JSON format through Elasticsearch during the import includes:

importing the Hive metadata into the database through a data import function of the metadata management unit, and, during the import, converting the structured data of the database into JSON format through XContentBuilder in the Elasticsearch SDK.

Further, the step of performing heat statistics on the retrieval data of the Elasticsearch through a data processing unit to obtain a heat statistical result includes:

converting the retrieval data of Elasticsearch into DML (data manipulation language) data through the data processing unit to obtain DML language data, and storing the DML language data in the data processing unit;

determining, through the data processing unit and according to the DML language data, the total number of submitted applications and the total number of detail views corresponding to the retrieved data;

weighting, through the data processing unit, the total number of submitted applications and the total number of detail views of the same retrieved data to obtain the heat statistical result corresponding to the retrieved data.

Further, the step of determining, by the data processing unit, the total number of applications submitted and the total number of details viewed corresponding to the retrieved data according to the DML language data includes:

extracting the DML language data according to a preset statistical duration through the data processing unit to obtain data to be counted;

determining the total number of submitted applications and the total number of detail views corresponding to the retrieved data according to the data to be counted and a preset statistical rule.

Further, the step of dynamically adjusting the primary retrieval result through the retrieval service unit to determine the target retrieval result includes:

retrieving within the primary retrieval result through the retrieval service unit according to a combination condition and a filtering function to obtain the target retrieval result.

Further, the combination condition includes at least one of a custom search condition, a built-in query condition of Elasticsearch, and a compound query condition of Elasticsearch.

Further, the step of retrieving the primary retrieval result by the retrieval service unit according to a combination condition and a filtering function to obtain the target retrieval result includes:

when the combination condition comprises the custom search condition, obtaining the custom search condition according to a query condition dragged to a retrieval area of the retrieval service unit, wherein the query condition is a condition in a query condition list;

assembling the custom search condition into encapsulated retrieval parameters through the retrieval service unit;

retrieving through Elasticsearch according to the encapsulated retrieval parameters;

wherein the drag function of the retrieval service unit is provided by the Vue framework, and when the page of the retrieval service unit loads, it is loaded asynchronously according to the page default data to obtain the query condition list.
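The step of assembling dragged query conditions into encapsulated retrieval parameters can be sketched as building an Elasticsearch bool query body. This is only an illustration under assumptions: the field names `content`, `category` and `heat_label`, the (field, value) shape of a dragged condition, and the choice of `match` for the keyword and `term` filters for the conditions are all hypothetical; the patent does not specify the query shape.

```python
import json

def assemble_query(keyword, dragged_conditions):
    """Assemble a keyword plus dragged conditions into one bool query body.

    dragged_conditions is a hypothetical list of (field, value) pairs standing
    in for the conditions a user drags into the retrieval area.
    """
    return {
        "query": {
            "bool": {
                # Full-text part: scored by relevance.
                "must": [{"match": {"content": keyword}}],
                # Exact-match conditions run in filter context: not scored, cacheable.
                "filter": [{"term": {field: value}} for field, value in dragged_conditions],
            }
        }
    }

body = assemble_query("diabetes", [("category", "medication"), ("heat_label", 5)])
print(json.dumps(body, indent=2))
```

Putting exact conditions in `filter` rather than `must` keeps them out of relevance scoring, which suits label-style conditions.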

The present application also proposes an Elasticsearch-based data processing device, the device comprising:

the data acquisition module, configured to import Hive metadata into a database through a metadata management unit, and, during the import, convert the structured Hive metadata into JSON format and create an original index, an index type and a document through Elasticsearch, wherein the Hive metadata is data obtained from medical data;

the data processing module, configured to perform heat statistics on the retrieval data of Elasticsearch through the data processing unit to obtain a heat statistical result, sort the retrieval data according to the heat statistical result to obtain a sorting result, and determine a heat label according to the sorting result;

the index construction module, configured to establish, through Elasticsearch, an inverted index for the data in the database according to the heat label;

and the data retrieval module, configured to receive keywords to be retrieved through the retrieval service unit, retrieve according to the keywords through Elasticsearch to obtain a primary retrieval result, and dynamically adjust the primary retrieval result through the retrieval service unit to determine a target retrieval result.

The present application further proposes a computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the steps of any of the above methods when executing the computer program.

The present application also proposes a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method of any of the above.

In the Elasticsearch-based data processing method and device of the present application, the data processing unit performs heat statistics on the retrieval data of Elasticsearch to obtain a heat statistical result, sorts the retrieval data according to the heat statistical result to obtain a sorting result, and determines a heat label according to the sorting result; Elasticsearch then establishes an inverted index for the data in the database according to the heat label, so that the data needed by a client can be obtained quickly without being affected by the heat statistics. The retrieval service unit receives the keywords to be retrieved, Elasticsearch retrieves according to those keywords to obtain a primary retrieval result, and the retrieval service unit dynamically adjusts the primary retrieval result to determine a target retrieval result. This shields users from operating directly on the Hive metadata, allows data to be obtained quickly even when the system data volume is large, and improves retrieval efficiency.

Drawings

FIG. 1 is a schematic flowchart of an Elasticsearch-based data processing method according to an embodiment of the present application;

FIG. 2 is a schematic block diagram of the structure of an Elasticsearch-based data processing device according to an embodiment of the present application;

FIG. 3 is a block diagram of the structure of a computer device according to an embodiment of the present application.

The realization of the objectives, the functional features and the advantages of the present application will be further explained with reference to the accompanying drawings in conjunction with the embodiments.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

The terminology used in this application is to be interpreted as follows:

Elasticsearch, ES for short, is a search server based on Lucene. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface. Elasticsearch is developed in Java and released as open source under the terms of the Apache License; it is a popular enterprise-level search engine. Elasticsearch is used in cloud computing, supports real-time search, and is stable, reliable, fast, and easy to install and use. Official clients are available in Java, .NET (C#), PHP, Python, Apache Groovy, Ruby and many other languages. According to the DB-Engines ranking, Elasticsearch is the most popular enterprise search engine, followed by Apache Solr, which is also based on Lucene.

Hive is a data warehouse tool based on Hadoop, is used for data extraction, transformation and loading, and is a mechanism capable of storing, querying and analyzing large-scale data stored in Hadoop.

Metadata mainly records the model definitions in the data warehouse, the mapping relationships between layers, the status of data in the monitored data warehouse, and the running status of ETL tasks. Metadata is generally stored and managed uniformly in a metadata repository, whose main purpose is to achieve collaboration and consistency in the design, deployment, operation and management of the data warehouse.

Hive metadata refers to the metadata of Hive. It is stored in MySQL, in which a Hive database holds it.

JSON is a lightweight data-interchange format. Based on a subset of ECMAScript (the JavaScript specification maintained by Ecma International, formerly the European Computer Manufacturers Association), it stores and represents data in a text format that is completely independent of any programming language. Its concise and clear hierarchy makes JSON an ideal data-interchange language: it is easy for people to read and write, easy for machines to parse and generate, and effectively improves network transmission efficiency.

DML (data manipulation language) is a language through which a user performs basic operations on a database, such as querying, inserting, deleting and modifying the data in a table. With DML, an application program can perform insertion, deletion, modification, sorting and retrieval operations on a database.

An SDK (software development kit) is a collection of development tools used by software engineers to create application software for a specific software package, software framework, hardware platform, operating system, and the like; the term is often associated with kits for developing applications on the Windows platform. An SDK may simply be files providing an application programming interface (API) for a certain programming language, but it may also include complex hardware capable of communicating with a certain embedded system. Typical tools include utilities for debugging and other purposes. SDKs also often include example code, technical notes, or other supporting documentation that clarify questions arising from the basic reference material.

Lucene is an open-source full-text search engine toolkit. It is not a complete full-text search engine but rather a full-text search engine architecture, providing a complete query engine, an index engine and part of a text analysis engine (for English, German and other Western languages).

Vue is a progressive JavaScript framework for building user interfaces. Unlike other large frameworks, Vue is designed to be adopted incrementally from the bottom up. Vue's core library focuses only on the view layer, making it easy to integrate with third-party libraries or existing projects.

To solve the technical problems in the prior art that, when the system data volume is large, the I/O performance and statistical analysis performance of a traditional relational database struggle to meet user requirements and overall retrieval efficiency is difficult to improve, the present application provides an Elasticsearch-based data processing method applied in the technical field of digital healthcare, and more specifically in medical informatization. The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.

The Elasticsearch-based data processing method establishes an inverted index for the data in the database based on the heat label, retrieves according to the keyword through Elasticsearch to obtain a primary retrieval result, and dynamically adjusts the primary retrieval result through the retrieval service unit to determine a target retrieval result. This shields users from operating directly on the Hive metadata, enables rapid data acquisition when the system data volume is large, and improves retrieval efficiency.

Referring to fig. 1, the data processing method based on the Elasticsearch includes:

S1: data acquisition: importing Hive metadata into a database through a metadata management unit, and, during the import, converting the structured Hive metadata into JSON format and creating an original index, an index type and a document through Elasticsearch, wherein the Hive metadata is data obtained from medical data;

S2: data processing: performing heat statistics on the retrieval data of Elasticsearch through a data processing unit to obtain a heat statistical result, sorting the retrieval data according to the heat statistical result to obtain a sorting result, and determining a heat label according to the sorting result;

S3: index construction: establishing, through Elasticsearch, an inverted index for the data in the database according to the heat label;

S4: data retrieval: receiving a keyword to be retrieved through a retrieval service unit, retrieving according to the keyword through Elasticsearch to obtain a primary retrieval result, and dynamically adjusting the primary retrieval result through the retrieval service unit to determine a target retrieval result.

In this method, the data processing unit performs heat statistics on the retrieval data of Elasticsearch to obtain a heat statistical result, sorts the retrieval data according to that result, and determines a heat label according to the sorting result; Elasticsearch then establishes an inverted index for the data in the database according to the heat label, so that the data needed by a client can be obtained quickly without being affected by the heat statistics. The retrieval service unit receives the keywords to be retrieved, Elasticsearch retrieves according to them to obtain a primary retrieval result, and the retrieval service unit dynamically adjusts the primary retrieval result to determine a target retrieval result. This shields users from operating directly on the Hive metadata, allows data to be obtained quickly even when the system data volume is large, and improves retrieval efficiency.

When Elasticsearch retrieves medical data, the large volume of the medical data makes retrieval inefficient. Processing the Hive metadata generated from the medical data through steps S1 to S4 improves retrieval efficiency, which in turn improves the working efficiency of medical staff and the efficiency with which social workers retrieve medical data, and increases the utility of large-volume medical data assets.

For step S1, the structured Hive metadata is converted into JSON format through Elasticsearch during the import; that is, the structured data of the database are converted into JSON strings.

Preferably, the database is a MySQL resource database.

The medical data include at least one of case data, treatment order data, medication data and medical expense data. It is understood that the medical data may also be other data of the medical industry, which is not specifically limited herein.

The metadata management unit, i.e. a metadata management tool, is configured to manage metadata. The metadata management tool may be implemented with any existing technique, which is not described herein.

During the import, the original index, the index type and the document are created from the structured Hive metadata by using the IndexResponse construction method of Elasticsearch.

The original index is an index directly created according to the Hive metadata by an IndexResponse construction method, and can be used as a field of the inverted index.

In Elasticsearch, one index object can store multiple objects with different purposes, and different objects within a single index object are distinguished by an index type (index_type), which can be understood as a table in a relational database. Each index type may have a different structure, but different index types cannot assign different data types to the same attribute.

The main entity stored in Elasticsearch is called a document, which can be understood as a row of records in a table of a relational database. Each document consists of multiple fields. Elasticsearch is an unstructured database, so each document can have different fields, and each document has a unique identifier.
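One way to see how index, type and document fit together is the request body of the Elasticsearch `_bulk` API, where each document is sent as an action line (naming `_index`, `_type` and `_id`) followed by its source line, in newline-delimited JSON. The sketch below only builds that body as a string; the index name `hive_metadata`, the type `table` and the document fields are hypothetical. Note that `_type` is included only because the text describes index types: multiple types per index were possible only before Elasticsearch 6.0, types were deprecated in 7.x and removed in 8.x.

```python
import json

def to_bulk_ndjson(index, doc_type, docs):
    """Render (doc_id, source) pairs as a _bulk request body (NDJSON).

    Every document contributes an action line and a source line, and the
    body must end with a trailing newline.
    """
    lines = []
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action, ensure_ascii=False))
        lines.append(json.dumps(source, ensure_ascii=False))
    return "\n".join(lines) + "\n"

# Hypothetical documents describing two Hive tables.
docs = [
    ("dwd.case_record", {"db": "dwd", "table": "case_record", "columns": ["patient_id", "diagnosis"]}),
    ("dwd.drug_order", {"db": "dwd", "table": "drug_order", "columns": ["order_id", "drug_code"]}),
]
payload = to_bulk_ndjson("hive_metadata", "table", docs)
print(payload)
```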

For step S2, the data processing unit performs offline heat statistics on the retrieval data of Elasticsearch to obtain a heat statistical result for each retrieved data item; the retrieved data are sorted in descending order of the heat statistical result, and the sorted retrieved data are taken as the sorting result; the retrieved data are then labeled according to the sorting result to determine the heat label of each retrieved data item.

The data processing unit is a tool for performing offline statistics.

The retrieved data are the data assets. The retrieval data are the operation data generated when the data assets are retrieved.

Sorting in descending order means that retrieved data with higher heat statistical results are ranked first and retrieved data with lower heat statistical results are ranked last.

The heat label represents the query heat of the metadata. For example, the heat label may be one of five levels, from one star to five stars, where a five-star label corresponds to a higher heat statistical result than a four-star label. For instance, the retrieved data ranked in the top 20% (inclusive) receive a five-star heat label, those ranked in (20%, 40%] a four-star label, those in (40%, 60%] a three-star label, those in (60%, 80%] a two-star label, and those ranked beyond the top 80% a one-star label. This example is not limiting.
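The sorting and star-labeling of step S2 can be sketched as follows, using the example bands above. This is a minimal illustration, assuming the heat statistical results are already available as a mapping from a (hypothetical) asset name to a score; integer arithmetic is used for the percentile bands so that boundary ranks such as exactly 20% are not affected by floating-point rounding.

```python
def assign_heat_labels(heat_scores):
    """Sort assets by heat (descending) and map rank percentile to 1-5 stars.

    Bands follow the example in the text: top 20% (inclusive) -> 5 stars,
    (20%, 40%] -> 4, (40%, 60%] -> 3, (60%, 80%] -> 2, the rest -> 1.
    """
    ranked = sorted(heat_scores.items(), key=lambda kv: kv[1], reverse=True)
    n = len(ranked)
    labels = {}
    for rank, (asset, _score) in enumerate(ranked, start=1):
        # Integer ceiling of rank * 5 / n gives the quintile bucket 1..5.
        bucket = (rank * 5 + n - 1) // n
        labels[asset] = 6 - bucket  # bucket 1 -> 5 stars, ..., bucket 5 -> 1 star
    return ranked, labels

# Hypothetical heat statistical results for five data assets.
scores = {"case_record": 66.0, "drug_order": 56.0, "fee_detail": 30.5,
          "lab_result": 12.0, "dept_dim": 2.4}
ranked, labels = assign_heat_labels(scores)
print(labels)
```

With five assets each rank falls into its own band, so the labels run from five stars down to one star.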

For step S3, Elasticsearch builds, based on Lucene, an inverted index for the data in the database according to the heat label; that is, the original index becomes a field of the inverted index. The inverted index may be built with any existing technique, which is not described herein.

Elasticsearch uses Lucene's inverted index: when the index module of Elasticsearch stores data, it creates an inverted index for each field of the data in the database, and through the inverted index maps the indexed words (terms) to the documents containing them.

Lucene is a JAR package containing packaged code, including various algorithms, for building inverted indexes and searching them.
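The term-to-document mapping described above can be sketched in a few lines. This is a toy in-memory model, not Lucene: the `\w+` tokenizer stands in for a real analyzer, and ordering each posting list by heat label (highest first) is an assumption about how the heat label might influence retrieval order, since the text does not specify the mechanism.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Minimal lowercase word tokenizer; Elasticsearch would use the field's analyzer.
    return re.findall(r"\w+", text.lower())

def build_inverted_index(docs, heat_labels):
    """Map each term to the ids of the documents containing it.

    Posting lists are ordered by heat label (highest first, ties by id),
    an illustrative assumption about how heat could shape result order.
    """
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for value in fields.values():  # every field is indexed
            for term in tokenize(str(value)):
                index[term].add(doc_id)
    return {
        term: sorted(ids, key=lambda d: (-heat_labels.get(d, 0), d))
        for term, ids in index.items()
    }

# Hypothetical metadata documents and their heat labels (stars).
docs = {
    "t1": {"table": "case_record", "comment": "Inpatient case records"},
    "t2": {"table": "drug_order", "comment": "Outpatient drug orders"},
    "t3": {"table": "case_summary", "comment": "Case summary by department"},
}
heat = {"t1": 3, "t2": 5, "t3": 4}
index = build_inverted_index(docs, heat)
print(index["case"])
```

A keyword lookup is then a dictionary access, which is why an inverted index avoids scanning every document.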

For step S4, the retrieval service unit receives the keyword to be retrieved and calls Elasticsearch to perform full-text retrieval according to the keyword, taking the retrieval result as the primary retrieval result. The retrieval service unit then performs linkage queries based on the primary retrieval result, dynamically adjusts the available screening labels in a drill-down manner, and takes the result of this dynamic adjustment as the target retrieval result.

Drilling down expands the next level of data below the current data. For example, if the data under a certain classification are further divided into item names, drilling down goes from the classification list to the item name list.
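The drill-down from a classification list to an item name list can be sketched as returning the distinct values one level below the current selection path. The two-level hierarchy `category` then `item_name`, and the record values, are hypothetical examples.

```python
def drill_down(records, level_fields, path=()):
    """Return the distinct values one level below the current drill-down path.

    records: list of dicts; level_fields: hierarchy from top to bottom;
    path: values already selected at the upper levels.
    """
    depth = len(path)
    if depth >= len(level_fields):
        return []  # already at the lowest level
    matching = [r for r in records
                if all(r[f] == v for f, v in zip(level_fields, path))]
    return sorted({r[level_fields[depth]] for r in matching})

records = [
    {"category": "medication", "item_name": "insulin"},
    {"category": "medication", "item_name": "metformin"},
    {"category": "examination", "item_name": "blood routine"},
    {"category": "medication", "item_name": "insulin"},
]
levels = ["category", "item_name"]
top = drill_down(records, levels)                      # classification list
items = drill_down(records, levels, ("medication",))   # item names under one class
print(top, items)
```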

The screening label is a cascade label. The screening labels are set according to characteristics of the medical data.

The linkage query is a joint index query on the inverted index. The joint index function is implemented with the bitset data structure in Lucene; after the results of the two linked queries are obtained, they are loaded asynchronously at the front end (the retrieval service unit), and the different labels defined for each data item are displayed.

Preferably, the label is a classification defined for each data item according to the characteristics of the medical data.
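The idea behind a bitset-based joint query is that each condition's matching documents are encoded as bits, so combining two linked conditions is a single bitwise AND. The sketch below uses Python integers as bitsets purely for illustration; Lucene has its own bitset classes, and the document ids here are hypothetical.

```python
def to_bitset(doc_ids):
    """Encode integer document ids as a Python int used as a bitset."""
    bits = 0
    for doc_id in doc_ids:
        bits |= 1 << doc_id
    return bits

def from_bitset(bits):
    """Decode a bitset back into a sorted list of document ids."""
    ids, pos = [], 0
    while bits:
        if bits & 1:
            ids.append(pos)
        bits >>= 1
        pos += 1
    return ids

# Documents matching each linked condition (hypothetical ids).
category_medication = to_bitset([1, 4, 5, 9])
heat_five_star = to_bitset([2, 4, 9, 11])
both = category_medication & heat_five_star  # joint query = bitwise AND
print(from_bitset(both))
```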

In an embodiment, the step of converting the structured Hive metadata into JSON format through Elasticsearch during the import includes:

The Hive metadata is imported into a Hive database of the database through the data import function of the metadata management unit, and during the import the structured data of the database are converted into JSON strings through XContentBuilder in the Elasticsearch SDK. That is, the data finally stored in the Hive database are unstructured JSON strings.

When the mapping type of the index is defined and the index string is obtained, XContentBuilder converts it into the settings and the mapping of the specified index and generates a mapping source. The mapping uses one or more analyzers (tokenizers) to split the data of the JSON string, each analyzer is composed of one or more filters, and each filter is configured. When Elasticsearch indexes a document, the content of each field is passed to the analyzer and transformed (tokenization, indexing and similar operations) through its filters, so that the index can be managed; the index serves to increase retrieval speed.
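The conversion of structured rows into JSON strings can be sketched as follows. The patent performs it with XContentBuilder from the Java SDK; this Python sketch uses the standard `json` module as a stand-in, and the column names and row values are hypothetical. `ensure_ascii=False` keeps Chinese table comments readable in the output.

```python
import json

def rows_to_json_strings(columns, rows):
    """Convert structured rows (tuples in column order) into JSON strings."""
    return [json.dumps(dict(zip(columns, row)), ensure_ascii=False) for row in rows]

# Hypothetical Hive table metadata rows.
columns = ["db_name", "table_name", "table_comment", "owner"]
rows = [
    ("dwd", "case_record", "病历数据", "etl"),
    ("dwd", "drug_order", "用药数据", "etl"),
]
json_docs = rows_to_json_strings(columns, rows)
print(json_docs[0])
```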
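The analyzer-and-filter chain described above can be illustrated with an index body in Elasticsearch's REST JSON format (typeless, 7.x and later), where a custom analyzer is one tokenizer followed by a list of token filters, plus a small local imitation of what that analyzer does to a field value. The analyzer name `metadata_analyzer`, the field `table_comment` and the filter choice are assumptions; the regex tokenizer and the short stop-word list are rough stand-ins for Elasticsearch's `standard` tokenizer and `stop` filter.

```python
import re

# Index settings and mapping: a custom analyzer = one tokenizer + token filters.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "analyzer": {
                "metadata_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "stop"],
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "table_comment": {"type": "text", "analyzer": "metadata_analyzer"}
        }
    },
}

# Small subset of English stop words, for illustration only.
STOP_WORDS = {"a", "an", "and", "of", "the"}

def analyze(text):
    """Local imitation of metadata_analyzer: tokenize, then apply each filter."""
    tokens = re.findall(r"\w+", text)                  # stand-in for the standard tokenizer
    tokens = [t.lower() for t in tokens]               # lowercase filter
    return [t for t in tokens if t not in STOP_WORDS]  # stop filter

print(analyze("Records of the Inpatient Cases"))
```

The resulting tokens are the terms that end up in the inverted index for that field.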

In an embodiment, the step of performing heat statistics on the retrieval data of the Elasticsearch through a data processing unit to obtain a heat statistical result includes:

S21: converting the retrieval data of Elasticsearch into DML language data through the data processing unit, and storing the DML language data in the data processing unit;

S22: determining, through the data processing unit and according to the DML language data, the total number of submitted applications and the total number of detail views corresponding to the retrieved data;

S23: weighting, through the data processing unit, the total number of submitted applications and the total number of detail views of the same retrieved data to obtain the heat statistical result corresponding to the retrieved data.

In this embodiment, offline heat statistics are performed on the retrieval data of Elasticsearch, which provides a basis for labeling the retrieved data.

For step S21, the data processing unit converts the retrieval data of Elasticsearch into DML language data and stores them in the data processing unit, which facilitates offline heat statistics by the data processing unit and prevents the heat statistics from affecting the retrieval performance of Elasticsearch.

It can be understood that the conversion of the retrieval data of Elasticsearch into DML language data by the data processing unit may be performed in real time, periodically, or at a preset time.

For step S22, the data processing unit finds all retrieved data in the DML language data, then counts the submitted applications of each retrieved data item according to the DML language data to obtain its total number of submitted applications, and counts the detail views of each retrieved data item to obtain its total number of detail views.

A submitted application is a request for access to the retrieved data.

Viewing details means viewing the detailed information of the retrieved data.
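Steps S21 and S22 can be sketched with an in-memory SQLite database: each retrieval operation is persisted with a DML `INSERT`, and the per-asset totals of detail views and submitted applications come from one `SELECT ... GROUP BY`. The table name `retrieval_log`, its columns and the action values `view` and `apply` are hypothetical; the patent does not specify the storage schema of the data processing unit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE retrieval_log (user_id TEXT, asset_id TEXT, action TEXT, ts TEXT)")

# DML: each retrieval operation is stored as an INSERTed row.
events = [
    ("u1", "case_record", "view", "2020-09-01 10:00"),
    ("u2", "case_record", "view", "2020-09-01 11:00"),
    ("u2", "case_record", "apply", "2020-09-01 11:05"),
    ("u3", "drug_order", "view", "2020-09-02 09:00"),
]
conn.executemany("INSERT INTO retrieval_log VALUES (?, ?, ?, ?)", events)

# Per-asset totals of detail views and submitted applications.
rows = conn.execute(
    """
    SELECT asset_id,
           SUM(CASE WHEN action = 'view' THEN 1 ELSE 0 END) AS views,
           SUM(CASE WHEN action = 'apply' THEN 1 ELSE 0 END) AS applications
    FROM retrieval_log
    GROUP BY asset_id
    ORDER BY asset_id
    """
).fetchall()
print(rows)
```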

For step S23, the heat statistical result R_i of the i-th retrieved data is calculated as:

$$R_i = a \times T_i + b \times S_i$$

where T_i is the total number of detail views of the i-th retrieved data, S_i is the total number of submitted applications for the i-th retrieved data, and a and b are weighting coefficients satisfying a + b = 1.

Preferably, a is 0.4 and b is 0.6, so that the heat of the retrieved data reflected by the heat statistical result better matches users' usage habits.
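The weighting of step S23 is a direct application of the formula with the preferred coefficients. In this minimal sketch the view and application counts are hypothetical inputs; with a = 0.4 and b = 0.6, an asset with 120 detail views and 30 applications scores 0.4 × 120 + 0.6 × 30 = 66, while one with 50 views and 60 applications scores 20 + 36 = 56.

```python
def heat_score(views, applications, a=0.4, b=0.6):
    """R_i = a * T_i + b * S_i (T_i: detail views, S_i: submitted applications)."""
    if abs(a + b - 1.0) > 1e-9:
        raise ValueError("weighting coefficients a and b must sum to 1")
    return a * views + b * applications

r_a = heat_score(views=120, applications=30)
r_b = heat_score(views=50, applications=60)
print(r_a, r_b)
```

Because b > a, a submitted application contributes more to heat than a single detail view.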

In an embodiment, the step of determining, by the data processing unit, the total number of applications submitted and the total number of details viewed corresponding to the retrieved data according to the DML language data includes:

s221: extracting the DML language data according to preset statistical duration through a data processing unit to obtain data to be counted;

s222: and determining the total number of applications submitted and the total number of details checked corresponding to the data to be searched according to the data to be counted and a preset statistical rule.

In the embodiment, the DML language data is extracted according to the preset statistical duration, so that the influence of too long historical data on the accuracy of the heat statistical result is avoided.

A comparison step S221, extracting the DML language data from the historical time by using the current time as the starting time through a data processing unit to obtain data to be counted; the time length of the data to be counted is the same as the preset counting time length.

The preset statistical duration may be directly input by a user, may be written in a program file for implementing the Elasticsearch-based data processing method of the present application, and may be acquired from a database.
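The extraction of step S221 amounts to keeping only the events that fall within the preset statistical duration counted back from the current time. In this sketch the event structure, the 30-day duration and the dates are hypothetical, and the half-open window (now - duration, now] is one reasonable choice of boundary.

```python
from datetime import datetime, timedelta

def extract_window(events, now, duration):
    """Return events whose timestamp lies in the window (now - duration, now]."""
    start = now - duration
    return [e for e in events if start < e["ts"] <= now]

now = datetime(2020, 9, 27, 0, 0)
events = [
    {"asset": "case_record", "action": "view", "ts": datetime(2020, 9, 20, 8, 30)},
    {"asset": "drug_order", "action": "apply", "ts": datetime(2020, 8, 1, 14, 0)},
    {"asset": "case_record", "action": "apply", "ts": datetime(2020, 9, 26, 17, 45)},
]
to_count = extract_window(events, now, duration=timedelta(days=30))
print([e["asset"] for e in to_count])
```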

And a comparison step S222, finding out all retrieved data to be retrieved from the data to be counted through a data processing unit, then carrying out application submitting frequency statistics on each retrieved data according to the data to be counted and preset statistical rules to obtain the total application submitting frequency corresponding to the retrieved data, and carrying out detail checking frequency statistics on each retrieved data according to the data to be counted and preset statistical rules to obtain the total detail checking frequency corresponding to the retrieved data.

And counting the number of times of checking details according to a preset counting rule, namely checking the searched data once, and recording the data as the once checking details.

Preferably, the preset statistical rule counts all applications made by the same user for the same feature data within a preset application duration as a single application. For example, a user first applies for access to a piece of feature data (i.e., retrieved data); once authorized, the user may view that feature data for 12 hours, after which a new application is required. Here, a feature refers to a tag, a category, or a domain, together with the metadata describing that tag, category, or domain; this example is not limiting.
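A minimal sketch of this statistical rule follows, assuming each DML entry has already been reduced to a hypothetical `DmlRecord` holding a user, a data identifier, an action of `"apply"` or `"view"`, and a timestamp. Every detail view counts once, while repeated applications by the same user for the same data within the 12-hour window from the example are merged into one. Measuring the window from the last counted application, rather than from the authorization time, is a simplifying assumption.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DmlRecord:
    """Hypothetical, already-parsed DML log entry."""
    user_id: str
    data_id: str        # identifier of the retrieved (feature) data
    action: str         # "apply" or "view"
    timestamp: datetime


# Preset application duration, taken from the 12-hour example above.
APPLY_WINDOW = timedelta(hours=12)


def count_heat_inputs(records, apply_window=APPLY_WINDOW):
    """Return (applications, views): two Counters keyed by data_id."""
    applications, views = Counter(), Counter()
    last_counted = {}  # (user_id, data_id) -> time of the last counted application
    for rec in sorted(records, key=lambda r: r.timestamp):
        if rec.action == "view":
            views[rec.data_id] += 1  # every detail view counts once
        elif rec.action == "apply":
            key = (rec.user_id, rec.data_id)
            prev = last_counted.get(key)
            # An application inside a still-open window is merged into the earlier one.
            if prev is None or rec.timestamp - prev >= apply_window:
                applications[rec.data_id] += 1
                last_counted[key] = rec.timestamp
    return applications, views
```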

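In an embodiment, the step of dynamically adjusting the primary retrieval result through the retrieval service unit to determine a target retrieval result includes: retrieving, by the retrieval service unit, the primary retrieval result according to a combination condition and a filter function to obtain the target retrieval result.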

In this embodiment, retrieving according to the combination condition and the filter function helps users discover and understand the retrieved data and quickly apply for, obtain, use, and share it. Retrieval response time remains acceptable even for petabyte-scale data, which improves both retrieval efficiency and user experience.

Retrieving according to the combination condition and the filter function constitutes the dynamic adjustment.

Specifically, the retrieval service unit retrieves within the primary retrieval result according to the combination condition and applies the filter function during retrieval, finally obtaining the target retrieval result.
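One possible way to express the combination condition and the filter function in the Elasticsearch query DSL is a `bool` query, sketched below in Python as a request-body dictionary. The field name `name`, the function name `build_dynamic_query`, and the `heat_label` filter used in the usage check are assumptions for illustration; the source does not specify the index schema.

```python
def build_dynamic_query(keyword, combination_clauses=(), filter_clauses=()):
    """Build an Elasticsearch search body for the dynamic adjustment step.

    - The keyword match and the combination conditions go in `must`
      (query context: they contribute to the relevance score).
    - Filter conditions go in `filter` (filter context: documents are only
      included or excluded, without scoring).
    """
    return {
        "query": {
            "bool": {
                "must": [{"match": {"name": keyword}}, *combination_clauses],
                "filter": list(filter_clauses),
            }
        }
    }
```

Clauses in `must` affect the relevance score, whereas clauses in `filter` only decide inclusion and can be cached by Elasticsearch, which suits yes/no constraints such as a heat label or a time range.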

In one embodiment, the combination condition includes at least one of a custom search condition, a built-in query condition of Elasticsearch, and a compound query condition of Elasticsearch.

A custom search condition is a search condition defined by the user.

The built-in query conditions of Elasticsearch are implemented with Elasticsearch search templates and consist mainly of Boolean queries, regular-expression queries, and the like.
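The built-in query conditions could, for example, be stored as an Elasticsearch search template written in Mustache. The sketch below only builds the request bodies: the body from `build_template_script` would be sent with `PUT _scripts/<template_id>`, and the body from `build_template_search` with `GET <index>/_search/template`. The template ID, the field names `name` and `code`, and the parameter names are hypothetical.

```python
# Hypothetical built-in template: a Boolean query combined with a regexp query.
BUILT_IN_TEMPLATE = {
    "query": {
        "bool": {
            "must": [{"match": {"name": "{{keyword}}"}}],
            "filter": [{"regexp": {"code": "{{code_pattern}}"}}],
        }
    }
}


def build_template_script(template_source):
    """Body for PUT _scripts/<template_id>: registers a stored search template."""
    return {"script": {"lang": "mustache", "source": template_source}}


def build_template_search(template_id, **params):
    """Body for GET <index>/_search/template: runs the stored template with params."""
    return {"id": template_id, "params": params}
```

Keeping the query shape in a stored template means clients only send parameters, so the built-in conditions can be changed on the server without touching client code.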

The compound query conditions of Elasticsearch include the phrase matching query (also called phrase matching retrieval) and the package query (also called package retrieval); they are implemented through the Elasticsearch Java API, mainly using QueryBuilder objects.
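The sketch below builds both compound queries as query-DSL dictionaries in Python. It assumes that the "package query" corresponds to the Elasticsearch `wrapper` query, which accepts another query as a base64-encoded JSON string; in the Java API the counterparts would be `QueryBuilders.matchPhraseQuery` and `QueryBuilders.wrapperQuery`. The field name `description` in the example is hypothetical.

```python
import base64
import json


def phrase_query(field, phrase, slop=0):
    """match_phrase: terms must appear in order; `slop` is the maximum
    number of positions allowed between matching terms."""
    return {"match_phrase": {field: {"query": phrase, "slop": slop}}}


def wrapper_query(inner_query):
    """wrapper: embeds an arbitrary query as a base64-encoded JSON string."""
    raw = json.dumps(inner_query).encode("utf-8")
    return {"wrapper": {"query": base64.b64encode(raw).decode("ascii")}}


# Example: a phrase query wrapped as a package query (field name is hypothetical).
wrapped = wrapper_query(phrase_query("description", "blood glucose test"))
```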

In one embodiment, the step of retrieving, by the retrieval service unit, the primary retrieval result according to a combination condition and a filtering function to obtain the target retrieval result includes:

S41: when the combination condition includes the custom search condition, obtaining the custom search condition from the query conditions dragged into the retrieval area of the retrieval service unit, the query conditions being taken from a query condition list;

S42: assembling, by the retrieval service unit, the custom search condition into a package retrieval;

S43: retrieving, by Elasticsearch, according to the package retrieval.

In this embodiment, the custom search condition defined by the user is obtained and used for retrieval, which meets users' personalized search needs and improves the user experience.

For step S41, the user drags query conditions from the query condition list into the retrieval area using the drag function of the retrieval service unit and then triggers a generation instruction, in response to which the retrieval service unit generates the custom search condition.

The query conditions in the query condition list include, for example, commodity name, start time, end time, and commodity state, and the user can drag any one of these conditions into the retrieval area.
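Steps S41 to S43 could be sketched as follows, assuming the dragged conditions reach the retrieval service unit as a dictionary keyed by hypothetical names (`commodity_name`, `start_time`, `end_time`) and that the time conditions apply to a hypothetical `created_at` field. Treating the package retrieval as an Elasticsearch `wrapper` query is likewise an assumption; in step S43 the returned body would be sent to `<index>/_search`.

```python
import base64
import json


def conditions_to_clauses(dragged):
    """S41: translate the dragged query conditions into query clauses."""
    clauses = []
    if "commodity_name" in dragged:
        clauses.append({"match": {"commodity_name": dragged["commodity_name"]}})
    time_range = {}
    if "start_time" in dragged:
        time_range["gte"] = dragged["start_time"]
    if "end_time" in dragged:
        time_range["lte"] = dragged["end_time"]
    if time_range:
        clauses.append({"range": {"created_at": time_range}})
    return clauses


def assemble_package_retrieval(dragged):
    """S42: wrap the custom search condition into a wrapper query body."""
    custom = {"bool": {"must": conditions_to_clauses(dragged)}}
    encoded = base64.b64encode(json.dumps(custom).encode("utf-8")).decode("ascii")
    return {"query": {"wrapper": {"query": encoded}}}
```

Wrapping the assembled condition keeps the custom query opaque to intermediate layers, which then only forward a single body to Elasticsearch.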

With reference to fig. 2, the present application also proposes an Elasticsearch-based data processing apparatus, said apparatus comprising:

Referring to fig. 3, an embodiment of the present application further provides a computer device, which may be a server whose internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores data used by the Elasticsearch-based data processing method. The network interface of the computer device communicates with an external terminal over a network connection. When executed by the processor, the computer program implements an Elasticsearch-based data processing method.
The Elasticsearch-based data processing method includes the following steps: data acquisition: importing Hive metadata into a database through a metadata management unit, where, during the import, Elasticsearch converts the structured Hive metadata into JSON format and creates an original index, an index type, and a document, the Hive metadata being obtained from medical data; data processing: performing, through a data processing unit, heat statistics on the retrieval data of Elasticsearch to obtain a heat statistical result, sorting the retrieval data according to the heat statistical result to obtain a sorting result, and determining a heat label according to the sorting result; index construction: building, through Elasticsearch, an inverted index for the data in the database according to the heat label; and data retrieval: receiving a keyword to be retrieved through a retrieval service unit, retrieving with Elasticsearch according to the keyword to obtain a primary retrieval result, and dynamically adjusting the primary retrieval result through the retrieval service unit to determine a target retrieval result.
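For the data acquisition step, the following sketch shows one way the structured Hive metadata of a table could be turned into a JSON document and rendered in the newline-delimited format of the Elasticsearch `_bulk` API. It assumes the table's columns have already been read from the Hive metastore as `(name, type, comment)` tuples; the document layout, the index name, and the use of `database.table` as the document ID are hypothetical. Because mapping types were deprecated in Elasticsearch 7.0 and removed in 8.0, the sketch does not set an explicit index type.

```python
import json


def hive_table_to_doc(database, table, columns, owner=None):
    """Convert one Hive table's metadata into a JSON-serializable document.

    columns: list of (name, hive_type, comment) tuples.
    """
    return {
        "database": database,
        "table": table,
        "qualified_name": f"{database}.{table}",
        "owner": owner,
        "columns": [
            {"name": name, "type": hive_type, "comment": comment}
            for name, hive_type, comment in columns
        ],
    }


def to_bulk_body(index, docs):
    """Render documents as _bulk NDJSON: an action line followed by a source
    line per document, ending with the trailing newline _bulk requires."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["qualified_name"]}}))
        lines.append(json.dumps(doc, ensure_ascii=False))
    return "\n".join(lines) + "\n"
```

Using a stable ID derived from the table name means re-importing the same table overwrites its document instead of creating a duplicate.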

An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements an Elasticsearch-based data processing method including the following steps: data acquisition: importing Hive metadata into a database through a metadata management unit, where, during the import, Elasticsearch converts the structured Hive metadata into JSON format and creates an original index, an index type, and a document, the Hive metadata being obtained from medical data; data processing: performing, through a data processing unit, heat statistics on the retrieval data of Elasticsearch to obtain a heat statistical result, sorting the retrieval data according to the heat statistical result to obtain a sorting result, and determining a heat label according to the sorting result; index construction: building, through Elasticsearch, an inverted index for the data in the database according to the heat label; and data retrieval: receiving a keyword to be retrieved through a retrieval service unit, retrieving with Elasticsearch according to the keyword to obtain a primary retrieval result, and dynamically adjusting the primary retrieval result through the retrieval service unit to determine a target retrieval result.

In the Elasticsearch-based data processing method executed above, the data processing unit performs heat statistics on the retrieval data of Elasticsearch to obtain a heat statistical result, sorts the retrieval data according to that result, and determines a heat label from the sorting result, and Elasticsearch builds an inverted index for the data in the database according to the heat label, so the data a client needs can be obtained quickly without affecting the heat statistics. The retrieval service unit receives the keyword to be retrieved, Elasticsearch retrieves according to that keyword to obtain a primary retrieval result, and the retrieval service unit dynamically adjusts the primary retrieval result to determine a target retrieval result. Users are thereby shielded from operating directly on the Hive metadata, and data can be obtained quickly even when the system holds a large volume of data, which improves retrieval efficiency.
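The toy sketch below illustrates the chain from heat statistics to heat label to inverted index. The scoring weights, the `hot_ratio` cut-off, and the label names are hypothetical, since this section does not define how application and view counts are combined. In practice Elasticsearch builds the inverted index itself, so the hand-built index here only shows how postings could be ordered so that hotter data is listed first.

```python
from collections import defaultdict


def heat_labels(stats, hot_ratio=0.2, w_apply=2.0, w_view=1.0):
    """stats: data_id -> (applications, views).

    Returns (ranking, labels): data IDs sorted by descending heat score,
    and a "hot"/"normal" label per ID (hypothetical weights and cut-off).
    """
    scores = {d: w_apply * a + w_view * v for d, (a, v) in stats.items()}
    ranking = sorted(scores, key=lambda d: (-scores[d], d))
    n_hot = max(1, int(len(ranking) * hot_ratio)) if ranking else 0
    labels = {d: ("hot" if i < n_hot else "normal") for i, d in enumerate(ranking)}
    return ranking, labels


def build_inverted_index(docs, ranking):
    """docs: data_id -> text. Maps each term to data IDs ordered by heat rank."""
    rank = {d: i for i, d in enumerate(ranking)}
    index = defaultdict(list)
    for doc_id in sorted(docs, key=lambda d: rank.get(d, len(rank))):
        for term in set(docs[doc_id].lower().split()):
            index[term].append(doc_id)
    return dict(index)
```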

Those skilled in the art will understand that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of other identical elements in the process, apparatus, article, or method that includes the element.

The above description is only a preferred embodiment of the present application and is not intended to limit its scope. Any equivalent structural or process modification made using the contents of the specification and drawings of the present application, or any direct or indirect application thereof in other related technical fields, likewise falls within the scope of the present application.

Claims

1. An Elasticsearch-based data processing method, characterized in that the method comprises:

2. The Elasticsearch-based data processing method according to claim 1, wherein the step of converting, by Elasticsearch, the structured Hive metadata into JSON format during the import comprises:

3. The Elasticsearch-based data processing method according to claim 1, wherein the step of performing, through a data processing unit, heat statistics on the retrieval data of Elasticsearch to obtain a heat statistical result comprises:

4. The Elasticsearch-based data processing method as claimed in claim 3, wherein the step of determining, by the data processing unit, the total number of applications submitted and the total number of details viewed corresponding to the retrieved data according to the DML language data includes:

5. The Elasticsearch-based data processing method according to claim 1, wherein the step of determining the target search result by dynamically adjusting the primary search result through the search service unit comprises:

6. The Elasticsearch-based data processing method according to claim 5, wherein said combination condition comprises at least one of a custom search condition, a built-in query condition of Elasticsearch, and a compound query condition of Elasticsearch.

7. The Elasticsearch-based data processing method according to claim 6, wherein said step of retrieving said primary search result by said search service unit according to a combination condition and a filtering function to obtain said target search result comprises:

retrieving, by Elasticsearch, according to the package retrieval.

8. An Elasticsearch-based data processing apparatus, the apparatus comprising:

9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.