CN112131295A - Data processing method and device based on Elasticsearch - Google Patents

Data processing method and device based on Elasticsearch

Publication number
CN112131295A
Authority
CN
China
Prior art keywords
data
retrieval
result
search
data processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011034098.4A
Other languages
Chinese (zh)
Inventor
王永亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Ping An Medical Health Technology Service Co Ltd
Original Assignee
Ping An Medical and Healthcare Management Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Medical and Healthcare Management Co Ltd filed Critical Ping An Medical and Healthcare Management Co Ltd
Priority to CN202011034098.4A priority Critical patent/CN112131295A/en
Publication of CN112131295A publication Critical patent/CN112131295A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/319 Inverted lists

Abstract

The application relates to the technical field of medical informatization in digital healthcare, and discloses a data processing method and device based on Elasticsearch (ES), wherein the method comprises the following steps: importing Hive metadata into a database through a metadata management unit, converting the structured Hive metadata into JSON format through Elasticsearch during the import, and creating an original index, an index type and a document; performing heat statistics and sorting on the retrieval data of ES through a data processing unit, and determining a heat label according to the sorting result; establishing, in ES, an inverted index for the data in the database according to the heat label; and receiving the keyword to be retrieved through a retrieval service unit, retrieving through ES according to the keyword and dynamically adjusting the result, and determining a target retrieval result. In this way, the data needed by a client can be obtained quickly without affecting the heat statistics, and retrieval efficiency is improved.

Description

Data processing method and device based on Elasticsearch
Technical Field
The application relates to the technical field of medical informatization in digital healthcare, and in particular to a data processing method and device based on Elasticsearch.
Background
In digital healthcare, medical data forms data assets. Quickly connecting the upstream and downstream relationships of the medical data while these data assets are processed, adding corresponding categories at each level, and quickly retrieving the data assets by category and heat all face an efficiency bottleneck.
Disclosure of Invention
The main purpose of the application is to provide a data processing method and device based on Elasticsearch, aiming to solve the technical problems in the prior art that, when the system data volume is large, the I/O performance and statistical analysis performance of a traditional relational database can hardly meet user requirements and the overall retrieval efficiency is hard to improve.
In order to achieve the above object, the present application provides an Elasticsearch-based data processing method, including:
data acquisition: importing Hive metadata into a database through a metadata management unit, and, during the import, converting the structured Hive metadata into JSON format and creating an original index, an index type and a document through Elasticsearch, wherein the Hive metadata is data obtained from medical data;
data processing: performing heat statistics on the retrieval data of the Elasticsearch through a data processing unit to obtain a heat statistical result, sorting the retrieval data according to the heat statistical result to obtain a sorting result, and determining a heat label according to the sorting result;
index construction: establishing, through Elasticsearch, an inverted index for the data in the database according to the heat label;
data retrieval: receiving a keyword to be retrieved through a retrieval service unit, retrieving through Elasticsearch according to the keyword to be retrieved to obtain a primary retrieval result, and dynamically adjusting the primary retrieval result through the retrieval service unit to determine a target retrieval result.
Further, the step of converting the structured Hive metadata into JSON format through Elasticsearch during the import includes:
importing the Hive metadata into a database through a data import function of a metadata management unit, and, during the import, converting the structured data of the database into JSON format through XContentBuilder in the Elasticsearch SDK.
Further, the step of performing heat statistics on the retrieval data of the Elasticsearch through a data processing unit to obtain a heat statistical result includes:
converting the retrieval data of the Elasticsearch into a DML language through a data processing unit to obtain DML language data, and storing the DML language data in the data processing unit;
determining the total number of applications submitted and the total number of details checked corresponding to the retrieved data according to the DML language data through a data processing unit;
and weighting the total number of times of submitting applications and the total number of times of checking details of the same retrieved data through a data processing unit to obtain a heat statistical result corresponding to the retrieved data.
Further, the step of determining, by the data processing unit, the total number of applications submitted and the total number of details viewed corresponding to the retrieved data according to the DML language data includes:
extracting the DML language data according to preset statistical duration through a data processing unit to obtain data to be counted;
and determining the total number of applications submitted and the total number of details checked corresponding to the retrieved data according to the data to be counted and a preset statistical rule.
Further, the step of dynamically adjusting the primary search result by the search service unit to determine a target search result includes:
and searching the primary search result through the search service unit according to the combination condition and the filtering function to obtain the target search result.
Further, the combination condition includes at least one of a custom search condition, a built-in query condition of Elasticsearch, and a compound query condition of Elasticsearch.
Further, the step of retrieving the primary retrieval result by the retrieval service unit according to a combination condition and a filtering function to obtain the target retrieval result includes:
when the combination condition comprises the custom search condition, obtaining the custom search condition according to a query condition dragged to a retrieval area of the retrieval service unit, wherein the query condition is a condition in a query condition list;
assembling, through the retrieval service unit, the custom search condition into an encapsulated retrieval input parameter;
retrieving through Elasticsearch according to the encapsulated retrieval input parameter;
wherein the dragging function of the retrieval service unit is provided by the Vue framework, and the page of the retrieval service unit is loaded asynchronously according to page default data during loading to obtain the query condition list.
The present application also proposes an Elasticsearch-based data processing apparatus, said apparatus comprising:
the data acquisition module is used for importing Hive metadata into a database through a metadata management unit, and, during the import, converting the structured Hive metadata into JSON format and creating an original index, an index type and a document through Elasticsearch, wherein the Hive metadata is data obtained from medical data;
the data processing module is used for carrying out heat statistics on the retrieval data of the Elasticsearch through the data processing unit to obtain a heat statistical result, sorting the retrieval data according to the heat statistical result to obtain a sorting result, and determining a heat label according to the sorting result;
the index building module is used for establishing, through Elasticsearch, an inverted index for the data in the database according to the heat label;
and the data retrieval module is used for receiving the keyword to be retrieved through the retrieval service unit, retrieving through Elasticsearch according to the keyword to be retrieved to obtain a primary retrieval result, and dynamically adjusting the primary retrieval result through the retrieval service unit to determine a target retrieval result.
The present application further proposes a computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the steps of any of the above methods when executing the computer program.
The present application also proposes a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method of any of the above.
According to the Elasticsearch-based data processing method and device, heat statistics is performed on the retrieval data of Elasticsearch through the data processing unit to obtain a heat statistical result, the retrieval data is sorted according to the heat statistical result to obtain a sorting result, the heat label is determined according to the sorting result, and an inverted index is established in Elasticsearch for the data in the database according to the heat label, so that the data needed by a client can be obtained quickly without affecting the heat statistics; the retrieval service unit receives the keyword to be retrieved, Elasticsearch retrieves according to the keyword to obtain a primary retrieval result, and the retrieval service unit dynamically adjusts the primary retrieval result to determine a target retrieval result, so that direct user operations on the Hive metadata are shielded, data can be obtained quickly when the system data volume is large, and retrieval efficiency is improved.
Drawings
Fig. 1 is a schematic flowchart of a data processing method based on an Elasticsearch according to an embodiment of the present application;
Fig. 2 is a schematic block diagram of the structure of an Elasticsearch-based data processing apparatus according to an embodiment of the present application;
Fig. 3 is a block diagram illustrating a structure of a computer device according to an embodiment of the present application.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The terminology used in this application is to be interpreted as follows:
Elasticsearch, ES for short, is a search server based on Lucene. It provides a distributed, multi-user-capable full-text search engine with a RESTful web interface. Elasticsearch is developed in Java and released as open source under the terms of the Apache license, and is a popular enterprise-level search engine. Elasticsearch is used in cloud computing, supports real-time search, and is stable, reliable, fast, and easy to install and use. Official clients are available in Java, .NET (C#), PHP, Python, Apache Groovy, Ruby and many other languages. According to the DB-Engines ranking, Elasticsearch is the most popular enterprise search engine, followed by Apache Solr, which is also based on Lucene.
Hive is a data warehouse tool based on Hadoop, is used for data extraction, transformation and loading, and is a mechanism capable of storing, querying and analyzing large-scale data stored in Hadoop.
Metadata mainly records the definitions of the models in the data warehouse, the mapping relations among layers, the monitored data state of the data warehouse, and the task running state of the ETL. Metadata is generally stored and managed uniformly through a metadata repository (Metadata Repository), whose main purpose is to enable collaboration and consistency in the design, deployment, operation and management of data warehouses.
Hive metadata refers to the metadata of Hive. The Hive metadata is stored in MySQL, in a Hive library kept in MySQL.
The JSON format is a lightweight data exchange format. Based on a subset of ECMAScript (the JavaScript specification defined by Ecma International, formerly the European Computer Manufacturers Association), it stores and represents data in a text format that is completely independent of any programming language. Its compact and clear hierarchy makes JSON an ideal data exchange language. It is easy for people to read and write, easy for machines to parse and generate, and effectively improves network transmission efficiency.
The DML language is a data manipulation language through which a user can perform basic operations on a database, such as querying, inserting, deleting and modifying the data in a table. Through DML, an application program can perform insertion, deletion, modification, sorting and retrieval operations on a database.
The SDK refers to a software development kit, which is a collection of development tools used by software engineers to create application software for a specific software package, software framework, hardware platform, operating system, and the like; in a narrower sense, the term often refers to an SDK used for developing application programs on the Windows platform. It may simply be a set of files that provides an application programming interface (API) for a certain programming language, but may also include complex hardware that can communicate with a certain embedded system. Typical tools include utilities for debugging and other purposes. SDKs also often include example code, supporting technical notes, or other supporting documentation that clarifies points of doubt left by the basic reference material.
Lucene is an open-source full-text search engine toolkit. It is not a complete full-text search engine but a full-text search engine architecture, and provides a complete query engine and index engine and a partial text analysis engine (for two Western languages, English and German).
The Vue framework is a progressive JavaScript framework for building user interfaces. Unlike other large frameworks, Vue is designed to be adopted incrementally from the bottom up. The Vue core library focuses only on the view layer, which makes it easy to integrate with third-party libraries or existing projects.
In order to solve the technical problems in the prior art that, when the system data volume is large, the I/O performance and statistical analysis performance of a traditional relational database can hardly meet user requirements and the overall retrieval efficiency is hard to improve, the application provides a data processing method based on Elasticsearch. The method is applied to the technical field of digital healthcare, and more specifically to the technical field of medical informatization. The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
The Elasticsearch-based data processing method establishes the inverted index for the data in the database based on the heat label, retrieves through Elasticsearch according to the keyword to be retrieved to obtain a primary retrieval result, and dynamically adjusts the primary retrieval result through the retrieval service unit to determine a target retrieval result. This shields direct user operations on the Hive metadata, allows data to be obtained quickly when the system data volume is large, and improves retrieval efficiency.
Referring to fig. 1, the data processing method based on the Elasticsearch includes:
S1: data acquisition: importing Hive metadata into a database through a metadata management unit, and, during the import, converting the structured Hive metadata into JSON format and creating an original index, an index type and a document through Elasticsearch, wherein the Hive metadata is data obtained from medical data;
S2: data processing: performing heat statistics on the retrieval data of the Elasticsearch through a data processing unit to obtain a heat statistical result, sorting the retrieval data according to the heat statistical result to obtain a sorting result, and determining a heat label according to the sorting result;
S3: index construction: establishing, through Elasticsearch, an inverted index for the data in the database according to the heat label;
S4: data retrieval: receiving a keyword to be retrieved through a retrieval service unit, retrieving through Elasticsearch according to the keyword to be retrieved to obtain a primary retrieval result, and dynamically adjusting the primary retrieval result through the retrieval service unit to determine a target retrieval result.
According to the method, heat statistics is performed on the retrieval data of Elasticsearch through the data processing unit to obtain a heat statistical result, the retrieval data is sorted according to the heat statistical result to obtain a sorting result, a heat label is determined according to the sorting result, and an inverted index is established in Elasticsearch for the data in the database according to the heat label, so that the data needed by a client can be obtained quickly without affecting the heat statistics; the retrieval service unit receives the keyword to be retrieved, Elasticsearch retrieves according to the keyword to obtain a primary retrieval result, and the retrieval service unit dynamically adjusts the primary retrieval result to determine a target retrieval result, so that direct user operations on the Hive metadata are shielded, data can be obtained quickly when the system data volume is large, and retrieval efficiency is improved.
In the process of retrieving the medical data by the Elasticsearch, the data volume of the medical data is large, so that the retrieval efficiency is low, and the Hive metadata generated according to the medical data is subjected to data processing through the steps S1 to S4, so that the retrieval efficiency is improved, the working efficiency of medical workers is improved, the efficiency of social workers in retrieving the medical data is improved, and the use value of medical assets with large data volume is improved.
For step S1, the structured Hive metadata is converted into JSON format by an Elasticsearch during the import process, that is, the structured data of the database is converted into JSON character strings.
Preferably, the database is a MySQL resource database.
The medical data includes: at least one of case data, treatment order data, medication data, and medical expense data, it is understood that the medical data may also be data of the medical industry, and is not specifically limited herein.
The metadata management unit, that is, a metadata management tool, is configured to manage metadata. The implementation method of the metadata management tool may be selected from the prior art, and will not be described herein.
During the import, an original index, an index type and a document are created from the structured Hive metadata by using the IndexResponse construction method of Elasticsearch.
The original index is an index directly created according to the Hive metadata by an IndexResponse construction method, and can be used as a field of the inverted index.
In Elasticsearch, one index object can store a plurality of objects with different purposes, and different objects in a single index object can be distinguished through an index type (index_type), which can be understood as a table in a relational database. Each index type may have a different structure, but different index types cannot set different types for the same attribute.
The main entity stored in Elasticsearch is called a document, which can be understood as a row in a table of a relational database. Each document is composed of a plurality of fields. Elasticsearch is an unstructured database: each document can have different fields and has a unique identifier.
For step S2, performing offline heat statistics on the retrieved data of the Elasticsearch through a data processing unit to obtain a heat statistical result corresponding to the retrieved data; sorting the retrieved data in a reverse order according to the heat statistical result, and taking the sorted retrieved data as a sorting result; and labeling the retrieved data according to the sorting result, and determining the heat label of each retrieved data.
The data processing unit is a tool for performing offline statistics.
The retrieved data are the data assets. The retrieval data is the operational data generated when the data assets are retrieved.
Sorting the retrieved data in reverse order means that retrieved data with a high heat statistical result is placed first and retrieved data with a low heat statistical result is placed last.
The heat label represents the query heat of the metadata. For example, the heat label can be one-star, two-star, three-star, four-star or five-star, where the heat statistical result of the five-star level is higher than that of the four-star level. For example, the retrieved data ranked in the top 20% (including 20%) is assigned the five-star heat label, the retrieved data ranked in 20%-40% (excluding 20%, including 40%) is assigned the four-star heat label, the retrieved data ranked in 40%-60% (excluding 40%, including 60%) is assigned the three-star heat label, the retrieved data ranked in 60%-80% (excluding 60%, including 80%) is assigned the two-star heat label, and the retrieved data ranked below 80% (excluding 80%) is assigned the one-star heat label; this example is not specifically limiting.
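The percentile rule in the example above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `assign_heat_labels`, the dictionary input, and the tie-breaking by insertion order are hypothetical and are not specified by the application.

```python
def assign_heat_labels(heat_by_item):
    """Map each retrieved item to a star level (5 = hottest) by rank percentile."""
    # Sort in descending order of heat; ties keep their insertion order.
    ranked = sorted(heat_by_item, key=heat_by_item.get, reverse=True)
    n = len(ranked)
    labels = {}
    for position, item in enumerate(ranked, start=1):
        # Integer ceiling of position * 5 / n gives a bucket from 1 to 5;
        # each 20% bucket includes its upper bound, as in the example above.
        bucket = -(-(position * 5) // n)
        labels[item] = 6 - bucket
    return labels


# Hypothetical heat statistical results for ten data assets.
heat = {"t%d" % i: 100 - i for i in range(10)}
labels = assign_heat_labels(heat)
```

With ten items, the two hottest receive five stars and the two coldest receive one star.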
For step S3, an inverted index is built for the data in the database according to the heat label based on Lucene through the Elasticsearch. That is, the original index will be a field of the inverted index. The method for establishing the inverted index may be selected from the prior art, and will not be described herein.
Elasticsearch uses the inverted index of Lucene. When the index module of Elasticsearch stores data, Elasticsearch creates an inverted index for each field of the data in the database, and the inverted index maps each indexed word (term) to the documents containing that word.
Lucene is a JAR package that contains packaged code for building the inverted index and for searching, including various algorithms.
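To make the term-to-document mapping concrete, here is a toy inverted index in Python. It is a sketch only: lowercasing and whitespace splitting stand in for Lucene's analyzers, and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Build a mapping from each term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive analysis: lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


# Hypothetical documents keyed by document ID.
docs = {
    1: "medication order data",
    2: "medical expense data",
    3: "medication cost",
}
index = build_inverted_index(docs)
```

Looking up a term such as `index["medication"]` then returns the IDs of the matching documents without scanning every document.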
For step S4, receiving a keyword to be retrieved by the retrieval service unit; calling Elasticsearch through the retrieval service unit to perform full-text retrieval according to the keyword to be retrieved, and taking the retrieval result as a primary retrieval result; and performing a linkage query according to the primary retrieval result through the retrieval service unit, dynamically adjusting the available screening labels, dynamically adjusting the screening labels in a drill-down manner, and taking the result of the dynamic adjustment as a target retrieval result.
Drilling down means expanding from the current data to the next layer of data. For example, if a classification of certain data is subdivided into item names, drilling down goes from the classification list to the item name list.
The screening label is a cascade label. The screening labels are set according to characteristics of the medical data.
The linkage query is a joint index query on the inverted index. The joint index function is implemented with the bitset data structure in Lucene. After the two linked results are queried, the front end (retrieval service unit) loads them asynchronously and displays different labels, where the labels are defined for each piece of data.
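The bitset intersection behind such a joint query can be sketched as follows. This is a plain-Python illustration that uses integers as bitsets, not Lucene's own implementation; the document IDs and the two filter conditions are hypothetical.

```python
def to_bitset(doc_ids):
    """Encode a collection of non-negative document IDs as an integer bitset."""
    bits = 0
    for doc_id in doc_ids:
        bits |= 1 << doc_id
    return bits


def from_bitset(bits):
    """Decode an integer bitset back into a sorted list of document IDs."""
    ids = []
    doc_id = 0
    while bits:
        if bits & 1:
            ids.append(doc_id)
        bits >>= 1
        doc_id += 1
    return ids


# Hypothetical posting lists for two screening conditions.
category_hits = to_bitset([0, 2, 5, 7])  # documents labeled "medication data"
heat_hits = to_bitset([2, 3, 7])         # documents with a five-star heat label

# A bitwise AND keeps only the documents that satisfy both conditions.
both = from_bitset(category_hits & heat_hits)
```

Because the intersection is a single bitwise operation, combining several screening labels stays cheap even when each label matches many documents.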
Preferably, the label is a classification defined for each data, and means a classification defined for each data according to the characteristics of the medical data.
In an embodiment, the step of converting the structured Hive metadata into JSON format through Elasticsearch during the import includes:
importing the Hive metadata into a database through a data import function of a metadata management unit, and, during the import, converting the structured data of the database into JSON format through XContentBuilder in the Elasticsearch SDK.
The Hive metadata is imported into a Hive library of the database through the data import function of the metadata management unit, and during the import the structured data of the database is converted into a JSON string through XContentBuilder in the Elasticsearch SDK. That is, the data that is ultimately stored in the Hive library of the database is an unstructured JSON string.
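As a rough illustration of this conversion, the following Python sketch turns one structured metadata row into a JSON string. The application performs this step with XContentBuilder from the Elasticsearch Java SDK; the standard-library `json` module is substituted here, and the field names (`table_name`, `db_name`, `columns`, `category`) are hypothetical.

```python
import json


def row_to_json(row):
    """Serialize one structured metadata row (a dict) to a JSON string."""
    # sort_keys gives a stable field order; ensure_ascii=False keeps
    # non-ASCII text (for example Chinese table comments) readable.
    return json.dumps(row, sort_keys=True, ensure_ascii=False)


# Hypothetical Hive table metadata row.
row = {
    "table_name": "medication_orders",
    "db_name": "medical_dw",
    "columns": ["order_id", "drug_code", "dose"],
    "category": "medication data",
}
doc = row_to_json(row)
```

The resulting string is the kind of JSON document body that would then be indexed.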
When defining the mapping type of the index and obtaining the index string, XContentBuilder needs to convert the index into the settings and the mapping of the specified index and generate a mapping source. The mapping is composed of one or more analyzers, which use tokenizers to segment the data of the JSON string; each analyzer is composed of one or more filters, and each filter is configured. When Elasticsearch indexes a document, the content of each field is passed to the analyzer and transformed by its filters (word segmentation, indexing and other operations) so that the index can be managed; the index is used to increase retrieval speed.
In an embodiment, the step of performing heat statistics on the retrieval data of the Elasticsearch through a data processing unit to obtain a heat statistical result includes:
S21: converting the retrieval data of the Elasticsearch into a DML language through a data processing unit to obtain DML language data, and storing the DML language data in the data processing unit;
S22: determining the total number of applications submitted and the total number of details checked corresponding to the retrieved data according to the DML language data through a data processing unit;
S23: weighting the total number of times of submitting applications and the total number of times of checking details of the same retrieved data through a data processing unit to obtain a heat statistical result corresponding to the retrieved data.
According to the embodiment, offline heat statistics is performed on the retrieval data of the Elasticsearch, and a basis is provided for tagging the retrieved data.
For step S21, the data processing unit converts the search data of the Elasticsearch into DML language to obtain DML language data, and stores the DML language data in the data processing unit, thereby facilitating the data processing unit to perform offline heat statistics and avoiding the influence of the heat statistics on the search performance of the Elasticsearch.
It can be understood that the conversion of the retrieval data of Elasticsearch into DML language data by the data processing unit may be performed in real time, periodically, or at a preset time.
For step S22, the data processing unit finds all retrieved data in the DML language data, then counts the submitted applications for each retrieved data item according to the DML language data to obtain the total number of submitted applications corresponding to that item, and counts the detail views for each retrieved data item according to the DML language data to obtain the total number of detail views corresponding to that item.
The submitted application is used for applying for access to the retrieved data.
Viewing details refers to viewing the detailed information of the retrieved data.
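Steps S21 and S22 can be sketched as follows, under the assumption that "DML language data" means behavior records expressed as SQL `INSERT` statements stored in a relational table. The table name `behavior`, its columns, and the sample records are hypothetical, and an in-memory SQLite database stands in for whatever store the data processing unit actually uses.

```python
import sqlite3

# Hypothetical behavior records exported from the search service:
# (user_id, data_id, action), where action is "apply" or "view".
records = [
    ("u1", "d1", "view"),
    ("u2", "d1", "view"),
    ("u1", "d1", "apply"),
    ("u3", "d2", "view"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE behavior (user_id TEXT, data_id TEXT, action TEXT)")

# S21: express each record as a DML statement (a parameterized INSERT).
conn.executemany("INSERT INTO behavior VALUES (?, ?, ?)", records)

# S22: per retrieved data item, total applications and total detail views.
rows = conn.execute(
    """
    SELECT data_id,
           SUM(action = 'apply') AS total_applies,
           SUM(action = 'view')  AS total_views
    FROM behavior
    GROUP BY data_id
    ORDER BY data_id
    """
).fetchall()

totals = {data_id: (applies, views) for data_id, applies, views in rows}
print(totals)
```

Keeping the aggregation in a separate store is what lets the statistics run offline, away from the Elasticsearch cluster that serves queries.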
For step S23, the calculation formula of the heat statistic result R is:
$$R = T_i \times a + S_i \times b$$
where $T_i$ is the total number of detail views of the $i$-th retrieved data item, $S_i$ is the total number of submitted applications for the $i$-th retrieved data item, and $a$ and $b$ are weighting coefficients with $a + b = 1$.
Preferably, a is 0.4 and b is 0.6, so that the heat of the retrieved data reflected by the heat statistical result is more consistent with users' usage habits.
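The weighting in step S23 is simple enough to check by hand. The sketch below uses the preferred coefficients a = 0.4 and b = 0.6 as defaults; the function name `heat_score` and the sample counts are illustrative and not taken from the application.

```python
import math


def heat_score(total_views, total_applies, a=0.4, b=0.6):
    """Compute R = T_i * a + S_i * b, where the weights satisfy a + b = 1."""
    if not math.isclose(a + b, 1.0):
        raise ValueError("weights a and b must sum to 1")
    return total_views * a + total_applies * b


# 10 detail views and 5 submitted applications: 10 * 0.4 + 5 * 0.6
score = heat_score(total_views=10, total_applies=5)
print(score)
```

Because b is larger than a, one submitted application contributes more to the heat than one detail view.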
In an embodiment, the step of determining, by the data processing unit, the total number of applications submitted and the total number of details viewed corresponding to the retrieved data according to the DML language data includes:
S221: extracting, by the data processing unit, the DML language data according to a preset statistical duration to obtain data to be counted;
S222: determining the total number of submitted applications and the total number of detail views corresponding to the retrieved data according to the data to be counted and a preset statistical rule.
In this embodiment, the DML language data is extracted according to the preset statistical duration, which prevents overly old historical data from distorting the heat statistical result.
For step S221, the data processing unit takes the current time as the starting point and extracts the DML language data backward in time to obtain the data to be counted; the time span of the data to be counted equals the preset statistical duration.
The preset statistical duration may be input directly by a user, written in a program file that implements the Elasticsearch-based data processing method of the present application, or acquired from a database.
For step S222, the data processing unit finds all retrieved data in the data to be counted, then counts the submitted applications for each retrieved data item according to the data to be counted and the preset statistical rule to obtain the total number of submitted applications corresponding to that item, and counts the detail views for each retrieved data item according to the data to be counted and the preset statistical rule to obtain the total number of detail views corresponding to that item.
For counting detail views, the preset statistical rule is that each viewing of the retrieved data is recorded as one detail view.
Preferably, under the preset statistical rule, applications by the same user for the same feature data within a preset application duration are counted as one application. For example, a user first applies for access to a certain feature data (i.e., retrieved data); after authorization, the user's viewing validity period for that feature data is 12 hours, and once the validity period has elapsed the user needs to apply again. Here a feature refers to a tag, a category, or a domain, together with the metadata describing the tag, category, or domain; this example is not specifically limiting.
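Steps S221 and S222, including the rule that repeated applications by one user inside the validity period count once, might look like the following sketch. The event tuple layout `(user, data_id, action, timestamp)`, the function name `count_events`, and the 7-day window in the usage example are assumptions; only the 12-hour validity period comes from the example in the text.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def count_events(events, now, window, apply_ttl=timedelta(hours=12)):
    """Count detail views and de-duplicated applications inside the window.

    events: iterable of (user, data_id, action, timestamp) tuples, where
    action is "view" or "apply" (hypothetical layout).
    """
    start = now - window
    views = defaultdict(int)
    applies = defaultdict(int)
    last_counted_apply = {}  # (user, data_id) -> time of last counted application

    for user, data_id, action, ts in sorted(events, key=lambda e: e[3]):
        if not (start <= ts <= now):
            continue  # S221: keep only data inside the statistical duration
        if action == "view":
            views[data_id] += 1  # every viewing counts as one detail view
        elif action == "apply":
            key = (user, data_id)
            prev = last_counted_apply.get(key)
            # A new application counts only after the validity period elapsed.
            if prev is None or ts - prev >= apply_ttl:
                applies[data_id] += 1
                last_counted_apply[key] = ts
    return dict(views), dict(applies)


now = datetime(2020, 9, 27, 12, 0)
events = [
    ("u1", "d1", "apply", now - timedelta(hours=30)),
    ("u1", "d1", "apply", now - timedelta(hours=29)),  # within 12 h: not counted
    ("u1", "d1", "apply", now - timedelta(hours=10)),  # validity expired: counted
    ("u1", "d1", "view", now - timedelta(hours=2)),
    ("u2", "d1", "view", now - timedelta(hours=1)),
    ("u1", "d1", "view", now - timedelta(days=10)),    # outside the window
]
views, applies = count_events(events, now, window=timedelta(days=7))
print(views, applies)
```

The validity period is measured from the last counted application, which matches the reading that a user re-applies only after the previous authorization has expired.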
In an embodiment, the step of dynamically adjusting the primary search result by the search service unit to determine a target search result includes:
and searching the primary search result through the search service unit according to the combination condition and the filtering function to obtain the target search result.
In this embodiment, retrieving according to the combination condition and the filtering function helps users discover and understand the retrieved data, so that they can quickly apply for, obtain, use, and share the data. Retrieval response time is thereby maintained even for PB-scale data, which improves retrieval efficiency and user experience.
Wherein, the retrieval is carried out according to the combination condition and the filtering function, namely the dynamic adjustment is carried out.
The retrieval service unit searches the primary retrieval result according to the combination condition and applies the filtering function during the search to finally obtain the target retrieval result.
In one embodiment, the combination condition includes at least one of a custom search condition, a built-in query condition of Elasticsearch, and a compound query condition of Elasticsearch.
The self-defined search condition is a search condition self-defined by a user.
The built-in query condition of Elasticsearch is implemented by means of the Template of Elasticsearch and mainly covers Boolean queries, regular expression queries, and the like.
The compound query conditions of Elasticsearch comprise phrase matching query (also called phrase matching retrieval) and package query (also called package retrieval); they are implemented through the Java API of Elasticsearch and mainly use the QueryBuilder object.
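To make the combination of conditions and filtering more concrete, the sketch below assembles a query body in the Elasticsearch Query DSL that combines a phrase match, a regular expression query, and a filter clause inside a Boolean query. The application describes building such queries through the Java API; the body is written here as a plain Python dictionary for brevity, and the field names `description`, `table_name`, and `category` are hypothetical.

```python
import json


def build_combined_query(phrase, name_pattern, category):
    """Combine phrase, regexp, and filter conditions in one bool query."""
    return {
        "query": {
            "bool": {
                # Scored conditions: every clause in "must" has to match.
                "must": [
                    {"match_phrase": {"description": phrase}},
                    {"regexp": {"table_name": name_pattern}},
                ],
                # Filter clauses narrow the result set without affecting scores.
                "filter": [
                    {"term": {"category": category}},
                ],
            }
        }
    }


query_body = build_combined_query("outpatient visit", "dw_.*", "medical")
print(json.dumps(query_body, indent=2))
```

Placing exact-match restrictions in `filter` rather than `must` keeps them out of relevance scoring, which is one common way to apply a filtering function during retrieval.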
In one embodiment, the step of retrieving, by the retrieval service unit, the primary retrieval result according to a combination condition and a filtering function to obtain the target retrieval result includes:
S41: when the combination condition comprises the custom search condition, obtaining the custom search condition according to a query condition dragged to a retrieval area of the retrieval service unit, wherein the query condition is a condition in a query condition list;
S42: assembling, by the retrieval service unit, the custom search condition into an input parameter of a package retrieval;
S43: retrieving through Elasticsearch according to the package retrieval;
wherein the dragging function of the retrieval service unit is provided by the Vue framework, and when the page of the retrieval service unit loads, page default data is loaded asynchronously to obtain the query condition list.
According to the embodiment, the user-defined search condition defined by the user is obtained, and the search is carried out according to the user-defined search condition, so that the personalized search requirement of the user is met, and the user experience is improved.
For step S41, the user drags the query condition of the query condition list to the retrieval area of the retrieval service unit through the dragging function of the retrieval service unit, the user triggers a generation instruction, and the retrieval service unit generates the customized search condition according to the generation instruction.
The query conditions in the query condition list are, for example, commodity name, start time, and end time; a user can also drag a single condition, such as commodity name or commodity status.
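Steps S41 to S43 can be sketched as a small function that turns the conditions dragged into the retrieval area into a single query body. The condition layout (`field` and `value` keys), the timestamp field `created_at`, and the mapping of each condition to a `match` or `range` clause are assumptions for illustration; the application does not describe the exact format of the assembled input parameter.

```python
def assemble_query(dragged_conditions):
    """Assemble dragged query conditions into one bool query body.

    dragged_conditions: list of dicts such as
    {"field": "commodity_name", "value": "aspirin"} (hypothetical layout).
    """
    must, filters = [], []
    time_range = {}
    for cond in dragged_conditions:
        field, value = cond["field"], cond["value"]
        if field == "commodity_name":
            must.append({"match": {"commodity_name": value}})
        elif field == "start_time":
            time_range["gte"] = value
        elif field == "end_time":
            time_range["lte"] = value
        else:
            raise ValueError(f"unknown query condition: {field}")
    if time_range:
        # Start and end time collapse into one range filter.
        filters.append({"range": {"created_at": time_range}})
    return {"query": {"bool": {"must": must, "filter": filters}}}


custom_query = assemble_query([
    {"field": "commodity_name", "value": "aspirin"},
    {"field": "start_time", "value": "2020-01-01"},
    {"field": "end_time", "value": "2020-09-27"},
])
print(custom_query)
```

Rejecting unknown condition names keeps the assembled query limited to the entries of the query condition list.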
With reference to fig. 2, the present application also proposes an Elasticsearch-based data processing apparatus, said apparatus comprising:
the data acquisition module is used for importing Hive metadata into a database through a metadata management unit, and, during the import, converting the structured Hive metadata into JSON format and creating an original index, an index type, and a document through Elasticsearch, wherein the Hive metadata is data obtained from medical data;
the data processing module is used for performing heat statistics on the retrieval data of Elasticsearch through the data processing unit to obtain a heat statistical result, sorting the retrieval data according to the heat statistical result to obtain a sorting result, and determining a heat label according to the sorting result;
the index building module is used for building, through Elasticsearch, an inverted index for the data in the database according to the heat label;
and the data retrieval module is used for receiving a keyword to be retrieved through the retrieval service unit, retrieving according to the keyword through Elasticsearch to obtain a primary retrieval result, and dynamically adjusting the primary retrieval result through the retrieval service unit to determine a target retrieval result.
In this embodiment, the data processing unit performs heat statistics on the retrieval data of Elasticsearch to obtain a heat statistical result, the retrieval data is sorted according to the heat statistical result to obtain a sorting result, a heat label is determined according to the sorting result, and Elasticsearch builds an inverted index for the data in the database according to the heat label, so that the data a client needs can be obtained quickly without affecting the heat statistics. The retrieval service unit receives the keyword to be retrieved, Elasticsearch retrieves according to the keyword to obtain a primary retrieval result, and the retrieval service unit dynamically adjusts the primary retrieval result to determine the target retrieval result. Users are thereby shielded from operating directly on the Hive metadata, data can be obtained quickly even when the system holds a large volume of data, and retrieval efficiency is improved.
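The sorting and labeling performed by the data processing module can be illustrated with the sketch below. How the heat label is derived from the sorting result is not detailed in this passage, so the rule used here (the top `hot_top_n` items are labeled "hot", the rest "normal") and the label names are purely hypothetical.

```python
def assign_heat_labels(heat_by_id, hot_top_n=2):
    """Sort items by heat (descending) and label the top N as "hot".

    heat_by_id: dict mapping a data identifier to its heat statistical result.
    The top-N rule and the label names are assumptions for illustration.
    """
    ranked = sorted(heat_by_id.items(), key=lambda kv: kv[1], reverse=True)
    labels = {}
    for rank, (data_id, _score) in enumerate(ranked):
        labels[data_id] = "hot" if rank < hot_top_n else "normal"
    return ranked, labels


ranked, labels = assign_heat_labels({"d1": 7.0, "d2": 1.2, "d3": 9.5}, hot_top_n=1)
print(ranked)
print(labels)
```

The resulting label could then be stored as a field on each document so that the index built by Elasticsearch can take it into account.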
Referring to fig. 3, an embodiment of the present application also provides a computer device, which may be a server and whose internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing the data used by the Elasticsearch-based data processing method. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by the processor to implement the Elasticsearch-based data processing method.
The Elasticsearch-based data processing method comprises the following steps: data acquisition: importing Hive metadata into a database through a metadata management unit, and, during the import, converting the structured Hive metadata into JSON format and creating an original index, an index type, and a document through Elasticsearch, wherein the Hive metadata is data obtained from medical data; data processing: performing heat statistics on the retrieval data of Elasticsearch through a data processing unit to obtain a heat statistical result, sorting the retrieval data according to the heat statistical result to obtain a sorting result, and determining a heat label according to the sorting result; index construction: building, through Elasticsearch, an inverted index for the data in the database according to the heat label; data retrieval: receiving a keyword to be retrieved through a retrieval service unit, retrieving according to the keyword through Elasticsearch to obtain a primary retrieval result, and dynamically adjusting the primary retrieval result through the retrieval service unit to determine a target retrieval result.
In this embodiment, the data processing unit performs heat statistics on the retrieval data of Elasticsearch to obtain a heat statistical result, the retrieval data is sorted according to the heat statistical result to obtain a sorting result, a heat label is determined according to the sorting result, and Elasticsearch builds an inverted index for the data in the database according to the heat label, so that the data a client needs can be obtained quickly without affecting the heat statistics. The retrieval service unit receives the keyword to be retrieved, Elasticsearch retrieves according to the keyword to obtain a primary retrieval result, and the retrieval service unit dynamically adjusts the primary retrieval result to determine the target retrieval result. Users are thereby shielded from operating directly on the Hive metadata, data can be obtained quickly even when the system holds a large volume of data, and retrieval efficiency is improved.
An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the Elasticsearch-based data processing method, which comprises the following steps: data acquisition: importing Hive metadata into a database through a metadata management unit, and, during the import, converting the structured Hive metadata into JSON format and creating an original index, an index type, and a document through Elasticsearch, wherein the Hive metadata is data obtained from medical data; data processing: performing heat statistics on the retrieval data of Elasticsearch through a data processing unit to obtain a heat statistical result, sorting the retrieval data according to the heat statistical result to obtain a sorting result, and determining a heat label according to the sorting result; index construction: building, through Elasticsearch, an inverted index for the data in the database according to the heat label; data retrieval: receiving a keyword to be retrieved through a retrieval service unit, retrieving according to the keyword through Elasticsearch to obtain a primary retrieval result, and dynamically adjusting the primary retrieval result through the retrieval service unit to determine a target retrieval result.
With the Elasticsearch-based data processing method executed as above, the data processing unit performs heat statistics on the retrieval data of Elasticsearch to obtain a heat statistical result, the retrieval data is sorted according to the heat statistical result to obtain a sorting result, a heat label is determined according to the sorting result, and Elasticsearch builds an inverted index for the data in the database according to the heat label, so that the data a client needs can be obtained quickly without affecting the heat statistics. The retrieval service unit receives the keyword to be retrieved, Elasticsearch retrieves according to the keyword to obtain a primary retrieval result, and the retrieval service unit dynamically adjusts the primary retrieval result to determine the target retrieval result. Users are thereby shielded from operating directly on the Hive metadata, data can be obtained quickly even when the system holds a large volume of data, and retrieval efficiency is improved.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium provided herein and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (10)

1. An Elasticsearch-based data processing method, characterized in that the method comprises:
data acquisition: importing Hive metadata into a database through a metadata management unit, and, during the import, converting the structured Hive metadata into JSON format and creating an original index, an index type, and a document through Elasticsearch, wherein the Hive metadata is data obtained from medical data;
data processing: performing heat statistics on the retrieval data of Elasticsearch through a data processing unit to obtain a heat statistical result, sorting the retrieval data according to the heat statistical result to obtain a sorting result, and determining a heat label according to the sorting result;
index construction: building, through Elasticsearch, an inverted index for the data in the database according to the heat label;
data retrieval: receiving a keyword to be retrieved through a retrieval service unit, retrieving according to the keyword through Elasticsearch to obtain a primary retrieval result, and dynamically adjusting the primary retrieval result through the retrieval service unit to determine a target retrieval result.
2. The Elasticsearch-based data processing method according to claim 1, wherein the step of converting the structured Hive metadata into JSON format through Elasticsearch during the import comprises:
importing the Hive metadata into the database through a data import function of the metadata management unit, and, during the import, converting the structured data of the database into JSON format through XContentBuilder in the Elasticsearch SDK.
3. The Elasticsearch-based data processing method according to claim 1, wherein the step of performing heat statistics on the retrieval data of Elasticsearch through a data processing unit to obtain a heat statistical result comprises:
converting, by the data processing unit, the retrieval data of Elasticsearch into DML language to obtain DML language data, and storing the DML language data in the data processing unit;
determining, by the data processing unit, the total number of submitted applications and the total number of detail views corresponding to each retrieved data item according to the DML language data;
and weighting, by the data processing unit, the total number of submitted applications and the total number of detail views of the same retrieved data item to obtain the heat statistical result corresponding to that retrieved data item.
4. The Elasticsearch-based data processing method according to claim 3, wherein the step of determining, by the data processing unit, the total number of submitted applications and the total number of detail views corresponding to the retrieved data according to the DML language data comprises:
extracting, by the data processing unit, the DML language data according to a preset statistical duration to obtain data to be counted;
and determining the total number of submitted applications and the total number of detail views corresponding to the retrieved data according to the data to be counted and a preset statistical rule.
5. The Elasticsearch-based data processing method according to claim 1, wherein the step of determining the target search result by dynamically adjusting the primary search result through the search service unit comprises:
and searching the primary search result through the search service unit according to the combination condition and the filtering function to obtain the target search result.
6. The Elasticsearch-based data processing method according to claim 5, wherein said combination condition comprises: at least one of a custom search condition, a built-in query condition of Elasticsearch, and a compound query condition of Elasticsearch.
7. The Elasticsearch-based data processing method according to claim 6, wherein said step of retrieving said primary search result by said search service unit according to a combination condition and a filtering function to obtain said target search result comprises:
when the combination condition comprises the custom search condition, obtaining the custom search condition according to a query condition dragged to a retrieval area of the retrieval service unit, wherein the query condition is a condition in a query condition list;
assembling, by the retrieval service unit, the custom search condition into an input parameter of a package retrieval;
retrieving through Elasticsearch according to the package retrieval;
wherein the dragging function of the retrieval service unit is provided by the Vue framework, and when the page of the retrieval service unit loads, page default data is loaded asynchronously to obtain the query condition list.
8. An Elasticsearch-based data processing apparatus, the apparatus comprising:
the data acquisition module is used for importing Hive metadata into a database through a metadata management unit, and, during the import, converting the structured Hive metadata into JSON format and creating an original index, an index type, and a document through Elasticsearch, wherein the Hive metadata is data obtained from medical data;
the data processing module is used for performing heat statistics on the retrieval data of Elasticsearch through the data processing unit to obtain a heat statistical result, sorting the retrieval data according to the heat statistical result to obtain a sorting result, and determining a heat label according to the sorting result;
the index building module is used for building, through Elasticsearch, an inverted index for the data in the database according to the heat label;
and the data retrieval module is used for receiving a keyword to be retrieved through the retrieval service unit, retrieving according to the keyword through Elasticsearch to obtain a primary retrieval result, and dynamically adjusting the primary retrieval result through the retrieval service unit to determine a target retrieval result.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202011034098.4A 2020-09-27 2020-09-27 Data processing method and device based on Elasticsearch Pending CN112131295A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011034098.4A CN112131295A (en) 2020-09-27 2020-09-27 Data processing method and device based on Elasticsearch

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011034098.4A CN112131295A (en) 2020-09-27 2020-09-27 Data processing method and device based on Elasticsearch

Publications (1)

Publication Number Publication Date
CN112131295A true CN112131295A (en) 2020-12-25

Family

ID=73840351

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011034098.4A Pending CN112131295A (en) 2020-09-27 2020-09-27 Data processing method and device based on Elasticsearch

Country Status (1)

Country Link
CN (1) CN112131295A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649426A (en) * 2016-08-05 2017-05-10 浪潮软件股份有限公司 Data analysis method, data analysis platform and server
CN106776878A (en) * 2016-11-29 2017-05-31 西安交通大学 A kind of method for carrying out facet retrieval to MOOC courses based on ElasticSearch
CN110555152A (en) * 2018-03-31 2019-12-10 甘肃万维信息技术有限责任公司 distributed search system based on Elasticissearch framework
CN109299102A (en) * 2018-10-23 2019-02-01 中国电子科技集团公司第二十八研究所 A kind of HBase secondary index system and method based on Elastcisearch

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112765131A (en) * 2021-01-22 2021-05-07 重庆邮电大学 Heterogeneous medical health data storage and retrieval method and system
CN112765131B (en) * 2021-01-22 2023-03-24 重庆邮电大学 Heterogeneous medical health data storage and retrieval method and system
CN112818013A (en) * 2021-01-27 2021-05-18 北京百度网讯科技有限公司 Time sequence database query optimization method, device, equipment and storage medium
CN112818013B (en) * 2021-01-27 2023-07-21 北京百度网讯科技有限公司 Time sequence database query optimization method, device, equipment and storage medium
CN113380416A (en) * 2021-06-11 2021-09-10 山东健康医疗大数据有限公司 Regional medical data rapid retrieval method
CN113779349A (en) * 2021-08-11 2021-12-10 中央广播电视总台 Data retrieval system, apparatus, electronic device, and readable storage medium
CN114443728A (en) * 2022-01-04 2022-05-06 广州粤建三和软件股份有限公司 Detection report searching method and device based on elastic search
CN114443728B (en) * 2022-01-04 2022-11-15 广州粤建三和软件股份有限公司 Detection report searching method and device based on Elasticissearch
CN115563127A (en) * 2022-11-10 2023-01-03 神州医疗科技股份有限公司 Query method and system based on big data medical general retrieval index construction
CN115563127B (en) * 2022-11-10 2023-02-24 神州医疗科技股份有限公司 Query method and system based on big data medical general retrieval index construction
CN117493641A (en) * 2024-01-02 2024-02-02 中国电子科技集团公司第二十八研究所 Secondary fuzzy search method based on semantic metadata
CN117493641B (en) * 2024-01-02 2024-03-22 中国电子科技集团公司第二十八研究所 Secondary fuzzy search method based on semantic metadata

Similar Documents

Publication Publication Date Title
CN112131295A (en) Data processing method and device based on Elasticsearch
US7912816B2 (en) Adaptive archive data management
US8751466B1 (en) Customizable answer engine implemented by user-defined plug-ins
US8180758B1 (en) Data management system utilizing predicate logic
US7539669B2 (en) Methods and systems for providing guided navigation
US20080086490A1 (en) Discovery of services matching a service request
CN111125086B (en) Method, device, storage medium and processor for acquiring data resources
CN111666401A (en) Official document recommendation method and device based on graph structure, computer equipment and medium
WO2013071305A2 (en) Systems and methods for manipulating data using natural language commands
CN111782763A (en) Information retrieval method based on voice semantics and related equipment thereof
CN112883030A (en) Data collection method and device, computer equipment and storage medium
US11841846B1 (en) Generating object morphisms during object search
KR20240020166A (en) Method for learning machine-learning model with structured ESG data using ESG auxiliary tool and service server for generating automatically completed ESG documents with the machine-learning model
US11328005B2 (en) Machine learning (ML) based expansion of a data set
CN117149804A (en) Data processing method, device, electronic equipment and storage medium
CN117033744A (en) Data query method and device, storage medium and electronic equipment
CN114547257B (en) Class matching method and device, computer equipment and storage medium
Batista-Navarro et al. A text mining-based framework for constructing an RDF-compliant biodiversity knowledge repository
CN114117242A (en) Data query method and device, computer equipment and storage medium
Ma et al. API prober–a tool for analyzing web API features and clustering web APIs
CN114564482A (en) Multi-entity-oriented label system and processing method
Singh et al. Semantic web mining: survey and analysis
CN115185973A (en) Data resource sharing method, platform, device and storage medium
CN114648121A (en) Data processing method and device, electronic equipment and storage medium
CN107038172A (en) A kind of oil field search engine construction method based on semanteme

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220601

Address after: 518000 China Aviation Center 2901, No. 1018, Huafu Road, Huahang community, Huaqiang North Street, Futian District, Shenzhen, Guangdong Province

Applicant after: Shenzhen Ping An medical and Health Technology Service Co.,Ltd.

Address before: Room 12G, Block H, 666 Beijing East Road, Huangpu District, Shanghai 200000

Applicant before: PING AN MEDICAL AND HEALTHCARE MANAGEMENT Co.,Ltd.
