CN117688124A - Data query index creation method and device, storage medium and electronic equipment - Google Patents

Info

Publication number
CN117688124A
Authority
CN
China
Prior art keywords
data
query
update data
mapping information
change
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311490405.3A
Other languages
Chinese (zh)
Inventor
张美智
韩爱珍
王漫漫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Weizheng Intellectual Property Technology Co ltd
Original Assignee
Weizheng Intellectual Property Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Weizheng Intellectual Property Technology Co ltd filed Critical Weizheng Intellectual Property Technology Co ltd
Priority to CN202311490405.3A priority Critical patent/CN117688124A/en
Publication of CN117688124A publication Critical patent/CN117688124A/en
Pending legal-status Critical Current

Abstract

The application provides a data query index creation method and device, a storage medium, and electronic equipment, and relates to the technical field of data retrieval. The method comprises: constructing a MySQL database of policy data; synchronizing the policy data in the MySQL database to an Elasticsearch store; monitoring change data of the policy data in the MySQL database by using a Canal component, wherein the change data is generated from a change record; processing and aggregating the change data by using correlation data of the change data to obtain update data; and creating an Elasticsearch index of the update data by configuring mapping information corresponding to the update data. The method can improve query efficiency and speed and reduce the response time of the policy query function.

Description

Data query index creation method and device, storage medium and electronic equipment
Technical Field
The present invention relates to the field of data retrieval, and in particular, to a method and apparatus for creating a data query index, a storage medium, and an electronic device.
Background
With the release and promotion of a large number of policies nationwide, the volume of public policies and related data has grown. Users who want to learn about a relevant policy need to search for it.
In the related art, when a user uses MySQL to perform multi-table join queries, the query is slow because the volume of policy data is huge and, in addition, the amount of newly added policy data is large or many users query at the same time.
Disclosure of Invention
The application provides a data query index creation method and device, a storage medium, and electronic equipment, in which monitored change information in a MySQL database is mapped into an Elasticsearch index. Compared with querying policy data in the MySQL database, this can reduce the response time of policy data queries.
In a first aspect, the present application provides a method for creating a data query index, the method including:
constructing a MySQL database of policy data; synchronizing the policy data in the MySQL database to an Elasticsearch store; monitoring change data of the policy data in the MySQL database by using a Canal component, wherein the change data is generated from a change record; processing and aggregating the change data by using correlation data of the change data to obtain update data, wherein the correlation data of the change data is obtained by querying with the unique identifier of the change data; and creating an Elasticsearch index of the update data by configuring mapping information corresponding to the update data.
With this technical solution, when change data in the MySQL database is detected, the change data is processed and aggregated with its correlation data to obtain update data, and mapping information corresponding to the update data is configured. Because the mapping information includes the data types and attributes of the fields in the update data, the fields that need to be searched and analyzed can be determined, so that a more accurate Elasticsearch index is created.
Optionally, creating the Elasticsearch index of the update data by configuring mapping information corresponding to the update data includes:
acquiring a first field name and a first word segmenter in the mapping information corresponding to the update data, wherein the first word segmenter is a rule for segmenting text fields;
inputting the first field name and the first word segmenter into a performance prediction model to obtain a configuration score corresponding to the update data;
and taking the first field name and the first word segmenter whose configuration score is greater than or equal to a first threshold as the mapping information corresponding to the update data, and creating the Elasticsearch index of the update data.
With this technical solution, the test results are analyzed through the configuration score to check whether the optimized index achieves the expected performance improvement, which guarantees a lower bound on the query response speed.
Optionally, after the first field name and the first word segmenter are input into the performance prediction model to obtain the configuration score corresponding to the update data, the method further includes:
acquiring a second field name whose similarity to the first field name is greater than a second threshold and a second word segmenter whose similarity to the first word segmenter is greater than a third threshold;
taking mapping information that contains both the second field name and the second word segmenter as similar mapping information, wherein the similar mapping information corresponds to similar data;
and merging the Elasticsearch indexes of the similar data and the update data, and storing the similar data and the update data in the same shard.
With this technical solution, the field names and word segmenters are used to determine the similar mapping information and thus the corresponding similar data. Because the similar data and the update data are stored in the same shard, a query for similar data can be confined to a specific shard, which improves the query response speed.
Optionally, inputting the first field name and the first word segmenter into the performance prediction model to obtain the configuration score corresponding to the update data includes:
generating query scenarios based on the first field name and the first word segmenter, wherein a query scenario is a simulated user query operation;
testing the query response speed corresponding to each query scenario by using a performance testing tool in the performance prediction model;
and computing the root mean square of the query response speeds corresponding to the query scenarios to obtain the configuration score corresponding to the update data.
With this technical solution, the performance testing tool tests the query response speed of query scenarios that simulate user query operations, and the configuration score is calculated as the root mean square of those response speeds, which can improve the accuracy of the configuration score.
Optionally, after creating the Elasticsearch index of the update data by configuring mapping information corresponding to the update data, the method further includes:
judging whether mapping information corresponding to the update data exists in a historical Elasticsearch index;
if the mapping information corresponding to the update data exists in the historical Elasticsearch index, querying the corresponding historical text through the mapping information corresponding to the update data, and performing an add or delete operation on the historical text based on the update data.
With this technical solution, if mapping information corresponding to the update data already exists in an existing historical Elasticsearch index, an add or delete operation is performed on the corresponding historical text, so that the historical text in the Elasticsearch index can be updated quickly and the accuracy of data queries can be improved.
Optionally, after creating the Elasticsearch index of the update data by configuring mapping information corresponding to the update data, the method further includes:
acquiring the access frequency of the update data;
and adjusting the size and number of shards of the update data in the Elasticsearch index based on the access frequency.
With this technical solution, the size and number of shards of the update data can be adjusted dynamically according to the access frequency of the update data, and the distribution and query process of the update data can be optimized, thereby improving the efficiency and accuracy of policy data queries.
Optionally, the method further includes: acquiring a first search result returned by a policy data query against the MySQL database, and acquiring a second search result returned by a policy data query against the Elasticsearch index;
combining the first search result and the second search result to obtain a combined search result;
and outputting the combined search result to a user side.
With this technical solution, the accuracy of the search result can be improved, meeting the user's need for precise searches of policy data.
In a second aspect, the present application provides a data query index creation apparatus, the apparatus including:
the database construction module is used for constructing a MySQL database of policy data;
the data synchronization module is used for synchronizing policy data in the MySQL database to an Elasticsearch store;
the change data monitoring module is used for monitoring change data of policy data in the MySQL database by using the Canal component, and the change data is generated from a change record;
the update data processing module is used for processing and aggregating the change data by using the correlation data of the change data to obtain update data, wherein the correlation data of the change data is obtained by querying with the unique identifier of the change data;
the index creation module is used for creating an Elasticsearch index of the update data by configuring mapping information corresponding to the update data;
and the data query module is used for querying the policy data by using the Elasticsearch index.
In a third aspect, the present application provides a computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform any of the methods described above.
In a fourth aspect, the present application provides an electronic device comprising a processor, a memory for storing instructions, and a transceiver for communicating with other devices, the processor for executing the instructions stored in the memory to cause the electronic device to perform a method as in any one of the above.
In summary, the beneficial effects brought by the technical scheme of the application include:
when change data in the MySQL database is detected, the change data is processed and aggregated with its correlation data to obtain update data, and mapping information corresponding to the update data is configured. Because the mapping information includes the data types and attributes of the fields in the update data, the fields that need to be searched and analyzed can be determined, so that a more accurate Elasticsearch index can be created. Whereas the MySQL database queries policy data as structured data, the Elasticsearch index also allows policy data to be queried as unstructured or semi-structured data, which improves query efficiency and speed and reduces the response time of policy data queries.
Drawings
Fig. 1 is a flowchart of a method for creating a data query index according to an embodiment of the present application;
Fig. 2 is a flowchart of another method for creating a data query index according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a data query index creation device according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Reference numerals: 301. database construction module; 302. data synchronization module; 303. change data monitoring module; 304. update data processing module; 305. index creation module; 400. electronic device; 401. processor; 402. communication bus; 403. user interface; 404. network interface; 405. memory.
Detailed Description
In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present application, but not all embodiments.
In the description of embodiments of the present application, words such as "exemplary," "such as" or "for example" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "illustrative," "such as" or "for example" is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "illustratively," "such as" or "for example," etc., is intended to present related concepts in a concrete fashion.
In the description of the embodiments of the present application, the term "plurality" means two or more. For example, a plurality of systems means two or more systems, and a plurality of screen terminals means two or more screen terminals. Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating an indicated technical feature. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
First, the operating architecture of the policy data query involved in the present application is briefly explained. Compared with general data, policy data not only contains a large amount of text and structured data but also has a relatively complex structure and complex association relationships. Therefore, when choosing how to store policy data, considerations such as consistency, complexity, storage capacity and cost, security, and access control lead to storing the policy data in a MySQL database rather than in Elasticsearch. However, the search capability of a MySQL database is relatively weak; with large data sets and complex query conditions in particular, query performance is severely limited. Elasticsearch is a distributed search engine built for search and analysis, and creating an Elasticsearch index from the policy information in the MySQL database enables faster and more flexible search and query functions.
Referring to Fig. 1, a flowchart of a method for creating a data query index according to an embodiment of the present application is provided. The method may be implemented by a computer program or by a single-chip microcomputer, or may run on a data query index creation device based on the von Neumann architecture. The computer program may be integrated into an application or may run as a stand-alone tool application. The application takes a server for querying policy data as the execution subject and describes the specific steps of the data query index creation method in detail.
S101, constructing a MySQL database of policy data.
Policy data is unique in its complexity and diversity of content. Policy data typically contains a large amount of textual descriptions and related information such as policy terms, explanations, scope of applicability, etc. Policy data may also relate to a number of areas and topics, such as economic policies, environmental policies, educational policies, and the like. In addition, the policy data is updated frequently, and needs to be updated and maintained in time.
Considering the complexity and diversity of policy data, the MySQL database has the advantages of good performance, flexible expansion and the like, and can meet the basic requirements of policy data management and query.
S102, synchronizing policy data in the MySQL database to an Elasticsearch store.
The MySQL database is a relational database whose query language is Structured Query Language (SQL). Its query performance is severely limited when it processes large-scale data and complex queries, especially when operations such as full-text search, fuzzy matching, and aggregation analysis are required. Elasticsearch is a full-text search engine with powerful full-text search and analysis functions that can process unstructured and semi-structured data, and it performs better when dealing with large-scale data and complex queries.
Specifically, importing the policy data in the MySQL database into the Elasticsearch store may be accomplished with an open-source data collection and processing tool. An exemplary synchronization approach is to use Logstash, configuring the input plugin to read from the MySQL database and the output plugin to write to the Elasticsearch store.
S103, monitoring change data of policy data in the MySQL database by using the Canal component, wherein the change data is generated from the change record.
The Canal component is an open-source incremental log subscription component for MySQL databases and can be used to monitor change data in the MySQL database. Specifically, the monitored object in the present application is the change data of the policy data. The change data is typically an object that contains a change type and a change record.
The change record may be the Binlog, which is the binary log of the MySQL database. It records the change operations in the database, including insert, update, and delete operations. By parsing the Binlog, the change data of the database can be obtained. In another implementation, a third-party library may be used to help monitor the change data in the MySQL database.
The Canal component is connected to the MySQL database, parses the Binlog files of the MySQL database, and extracts the change operations from them, thereby monitoring the change data of the policy information in the database.
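The monitoring step can be illustrated with a minimal sketch that turns one change message into change-data objects carrying a change type and a change record. The message layout (`type`, `table`, `isDdl`, `pkNames`, `data`, `old`) is assumed here to follow Canal's flat JSON message format; the function name `to_change_data` and the sample `policy` table are hypothetical.

```python
import json

def to_change_data(raw_message: str) -> list[dict]:
    """Convert one change message into a list of change-data objects.

    Assumption: the message is JSON with "type", "table", "pkNames",
    "data" (rows after the change) and "old" (changed columns before it).
    """
    msg = json.loads(raw_message)
    if msg.get("isDdl"):
        return []  # schema changes carry no row data to index
    rows = msg.get("data") or []
    olds = msg.get("old") or [None] * len(rows)
    pk_names = msg.get("pkNames") or []
    changes = []
    for row, old in zip(rows, olds):
        changes.append({
            "change_type": msg["type"],  # INSERT / UPDATE / DELETE
            "table": msg["table"],
            "unique_id": tuple(row[k] for k in pk_names),
            "record": row,   # the change record itself
            "before": old,   # previous values, when the message has them
        })
    return changes

# Hypothetical sample message for a single updated row.
sample = json.dumps({
    "type": "UPDATE", "table": "policy", "isDdl": False, "pkNames": ["id"],
    "data": [{"id": "7", "title": "Subsidy notice (revised)"}],
    "old": [{"title": "Subsidy notice"}],
})
changes = to_change_data(sample)
```

Keeping the primary-key values as `unique_id` makes them available as the lookup key when correlation data is fetched later.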
S104, processing and aggregating the change data by using the correlation data of the change data to obtain update data, wherein the correlation data of the change data is obtained by querying with the unique identifier of the change data.
The correlation data is other data related to or associated with the change data. It can be obtained by querying the related data tables based on an associated field in the change data, for example a public notice ID or a project ID.
In one implementation, the correlation data may be obtained as follows:
querying with the unique identifier of the change data to obtain the correlation data.
The unique identifier is a field directly tied to the change data, so querying with it ensures that the correlation data retrieved is exactly the data associated with the change data. Querying by unique identifier is also efficient, so the correlation data associated with the change data can be located quickly.
The unique identifier of the change data is also extensible: it may be any field or combination of fields, and different unique identifier fields can be selected for the query according to the type of change data and the type of associated data, so as to meet the requirements of different query scenarios.
In addition, the unique identifier of the change data is maintainable, because it makes the association between the change data and the correlation data explicit. For example, if the change data changes, only the index of the correlation data needs to be updated, and the querying and handling of other data are not affected.
The correlation data associated with the change data can be retrieved by querying the related data tables with the unique identifier field as the condition.
Processing and aggregation consist of merging, computing on, and filtering the change data and the correlation data according to the query requirements so as to generate the update data. For example, public notice information and project information may be merged into one complete piece of update data that includes fields such as the notice title, the project name, and subsidy information.
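The aggregation step can be sketched as a lookup-and-merge keyed by identifier fields. In the sketch below, the in-memory dictionaries stand in for the related MySQL tables, and the field names (`notice_id`, `notice_title`, `project_id`, `project_name`, `subsidy`) are hypothetical examples rather than a schema taken from the application.

```python
def aggregate(change: dict, related_tables: dict[str, dict]) -> dict:
    """Merge a changed row with rows looked up by its identifier fields.

    related_tables maps an identifier field name in the changed row to a
    table represented as {identifier value: related row}.
    """
    record = change["record"]
    update = dict(record)  # start from the changed row itself
    for id_field, table in related_tables.items():
        related = table.get(record.get(id_field))
        if related:
            # Keep only fields the changed row does not already define.
            update.update({k: v for k, v in related.items() if k not in update})
    return update

# Stand-in for a project table keyed by project ID (hypothetical data).
projects = {"p1": {"project_name": "Green energy", "subsidy": "50,000"}}
change = {"record": {"notice_id": "n9",
                     "notice_title": "2023 subsidy list",
                     "project_id": "p1"}}
update_data = aggregate(change, {"project_id": projects})
```

The result is one flat document per changed row, which is a convenient shape for indexing; whether fields from the changed row should take precedence over related fields is a design choice assumed here.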
S105, creating an Elasticsearch index of the update data by configuring mapping information corresponding to the update data.
The mapping information corresponding to the update data refers to the data types and attributes, such as the word segmenter, of the fields in the update data. The data types include the text type (text), numeric types (integer, float, etc.), the date type (date), and so on. The word segmenter splits text fields into terms so that keywords can be matched more accurately during search; examples include the standard word segmenter (standard) and Chinese word segmenters (ik_smart, ik_max_word). It should be understood that suitable attributes may be selected according to the specific requirements of the update data, so as to improve the accuracy and efficiency of subsequent queries.
The Elasticsearch index is a core concept in Elasticsearch, similar to a table in a relational database. An index is a logical container used to store and organize data, and it contains a series of documents with the same structure. In Elasticsearch, the document is the smallest unit of data and can be understood as one record. Each document has a unique ID for identification and retrieval. A document is made up of multiple fields, each containing a data value of a type such as text, number, or date.
Since the mapping information of the Elasticsearch index defines the data types, word segmenters, and other attributes applied when the update data is written, the update data can be inserted or updated according to that mapping information through an API or tool provided by Elasticsearch, thereby creating the Elasticsearch index of the update data.
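As an illustration of configuring mapping information, the following sketch assembles a request body in the `mappings` / `properties` layout that Elasticsearch index-creation requests use. The helper name `build_mapping` and the field names are hypothetical, and the `ik_max_word` analyzer is assumed to be available through the separately installed IK analysis plugin.

```python
def build_mapping(fields: dict[str, dict]) -> dict:
    """Build an index-creation body from per-field type/analyzer settings."""
    properties = {}
    for name, spec in fields.items():
        prop = {"type": spec["type"]}
        if spec["type"] == "text" and "analyzer" in spec:
            # Analyzers (word segmenters) are only meaningful for text fields.
            prop["analyzer"] = spec["analyzer"]
        properties[name] = prop
    return {"mappings": {"properties": properties}}

# Hypothetical fields of the update data.
body = build_mapping({
    "title": {"type": "text", "analyzer": "ik_max_word"},
    "subsidy_amount": {"type": "float"},
    "publish_date": {"type": "date"},
})
```

The resulting dictionary could be sent as the body of an index-creation request by whichever client or HTTP tool is in use; the transport is deliberately left out of the sketch.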
It is to be appreciated that Elasticsearch indexes the update data and builds an inverted index over it to support efficient querying.
In one implementation, it is judged whether mapping information corresponding to the update data exists in a historical Elasticsearch index; if it does, the corresponding historical text is queried through the mapping information corresponding to the update data, and an add or delete operation is performed on the historical text based on the update data.
Whether the corresponding mapping information exists can be queried by calling the mapping API. If mapping information corresponding to the update data exists in the historical Elasticsearch index, an index corresponding to the update data was created earlier and already contains historical text data. The add or delete operation on the historical text can be carried out with a write API or tool in Elasticsearch.
Judging whether the mapping information already exists in a historical index avoids creating the index and mapping repeatedly, which saves resources and time.
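One way to express the add and delete operations on historical text is as a newline-delimited JSON payload in the format accepted by the Elasticsearch bulk API, where an `index` action is followed by the document source and a `delete` action stands alone. The sketch below only renders the payload; the helper name `bulk_lines`, the index name `policy`, and the shape of the change objects are assumptions.

```python
import json

def bulk_lines(index: str, changes: list[dict]) -> str:
    """Render changes as newline-delimited JSON for a bulk request."""
    lines = []
    for c in changes:
        if c["change_type"] == "DELETE":
            lines.append(json.dumps({"delete": {"_index": index, "_id": c["id"]}}))
        else:
            # INSERT and UPDATE both write the full document under its ID.
            lines.append(json.dumps({"index": {"_index": index, "_id": c["id"]}}))
            lines.append(json.dumps(c["doc"], ensure_ascii=False))
    return "\n".join(lines) + "\n"  # a bulk body must end with a newline

payload = bulk_lines("policy", [
    {"change_type": "UPDATE", "id": "7", "doc": {"title": "Subsidy notice"}},
    {"change_type": "DELETE", "id": "8"},
])
```

Using the document ID from the source row means a repeated update overwrites the earlier version instead of creating a duplicate.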
In one implementation, when the Elasticsearch index is used for policy data queries, the index may be further optimized according to observed query behavior. The specific optimization steps include:
acquiring the access frequency of the update data; and adjusting the size and number of shards of the update data in the Elasticsearch index based on the access frequency.
The access frequency of the update data can be obtained by monitoring and counting accesses to the update data, recording the number of accesses and the time intervals between them for each piece of update data.
The access frequency of the update data reflects how hot the data is and how much concurrent access it receives. For frequently accessed hot data, the shard size can be set smaller to increase the response speed of queries. For cold data that is accessed infrequently, the shard size can be set larger to save resources and storage space. For data with high concurrent access, the number of shards can be increased to improve the concurrent processing capability of queries. For data with low concurrent access, the number of shards can be reduced to save resources and improve query efficiency.
The size and number of shards can be adjusted by changing the settings of the Elasticsearch index.
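The shard adjustment described above can be sketched as a simple planning function. Every threshold in it (the 100 queries-per-second cutoff and the 10 GB and 40 GB target shard sizes) is an illustrative assumption rather than a value from the application, and the name `plan_shards` is hypothetical. Note also that in Elasticsearch the primary shard count is fixed when an index is created, so applying a new count in practice generally means creating a new index or using the split, shrink, or reindex operations.

```python
import math

def plan_shards(total_gb: float, queries_per_sec: float,
                hot_qps: float = 100.0) -> dict:
    """Pick a shard count and target shard size from volume and load.

    Hot data (many queries per second) gets smaller shards and therefore
    more of them; cold data gets fewer, larger shards.
    """
    target_gb = 10.0 if queries_per_sec >= hot_qps else 40.0
    shards = max(1, math.ceil(total_gb / target_gb))
    return {"number_of_shards": shards, "target_shard_gb": target_gb}

hot = plan_shards(total_gb=80, queries_per_sec=250)
cold = plan_shards(total_gb=80, queries_per_sec=5)
```

For the same 80 GB of data, the hot plan spreads queries over more shards while the cold plan keeps the shard count low.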
In the embodiment of the application, the mapping information is configured to create the Elasticsearch index, and policy data is queried through that index. Compared with querying the MySQL database, the mapping information defines the field types and attributes in the documents, so Elasticsearch can handle query requests more selectively, searching and analyzing only the necessary fields and avoiding unnecessary computation. This improves query efficiency and speed and reduces the response time of the policy query function.
Referring to Fig. 2, a flowchart of another method for creating a data query index according to an embodiment of the present application is shown. The step of creating the Elasticsearch index of the update data is described in more detail, with the aim of improving the performance of policy data queries.
S201, constructing a MySQL database of policy data.
S202, synchronizing policy data in the MySQL database to an Elasticsearch store.
S203, monitoring change data of policy data in the MySQL database by using the Canal component, wherein the change data is generated from the change record.
S204, processing and aggregating the change data by using the correlation data of the change data to obtain update data, wherein the correlation data of the change data is obtained by querying with the unique identifier of the change data.
Steps S201 to S204 are described in detail under steps S101 to S104 in the above embodiment and are not repeated here.
S205, acquiring a first field name and a first word segmenter in the mapping information corresponding to the update data, wherein the first word segmenter is a rule for segmenting text fields.
After the mapping information corresponding to the update data is obtained, the field names and word segmenter information in it are parsed. Specifically, for each field, its name and corresponding word segmenter are acquired and referred to as the first field name and the first word segmenter. The word segmenter may be a built-in analyzer provided by Elasticsearch, such as the Standard Analyzer or the Simple Analyzer, or a custom word segmenter rule optimized for policy data; this is not limited here.
S206, inputting the first field name and the first word segmenter into the performance prediction model to obtain the configuration score corresponding to the update data.
The performance prediction model predicts the configuration score corresponding to the update data from features such as the field name and the word segmenter. The model may be trained and built with machine learning algorithms such as regression models or decision trees. For training, a sample data set is constructed with field names and word segmenters as input data. The sample data set includes multiple samples, each containing a field name and a word segmenter as features and the corresponding configuration score as the label. The performance prediction model is then built by learning the relationship between the features and the labels in the sample data set.
By predicting the configuration score, the degree of performance optimization of the update data can be evaluated quickly for different query scenarios and data characteristics, which helps improve the response speed.
Specifically, step S206 further includes:
Step S2061, generating query scenarios based on the first field name and the first word segmenter, where a query scenario is a simulated user query operation.
The query scenarios may include different query conditions, sort orders, aggregation operations, and so on, to cover different query requirements and usage scenarios. Specifically, parameters such as the number of concurrent users and the request frequency can be set to simulate a realistic query load.
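Scenario generation can be sketched as the Cartesian product of the candidate fields, a few query kinds, and a few load levels. The function name `generate_scenarios`, the query kinds, and the concurrency levels below are assumptions chosen for illustration; a real test plan would also vary sorting and aggregation, as the text notes.

```python
from itertools import product

def generate_scenarios(fields, concurrency_levels=(1, 50)):
    """Enumerate simulated query scenarios.

    fields is a list of (field_name, word_segmenter) pairs; each scenario
    combines one field with a query kind and a number of concurrent users.
    """
    query_kinds = ("match", "match_phrase")  # assumed query kinds
    scenarios = []
    for (name, analyzer), kind, users in product(fields, query_kinds,
                                                 concurrency_levels):
        scenarios.append({
            "field": name,
            "analyzer": analyzer,
            "query": kind,
            "concurrent_users": users,
        })
    return scenarios

scenarios = generate_scenarios([("title", "ik_max_word"),
                                ("summary", "standard")])
```

Each scenario dictionary can then be handed to whichever load-testing tool is used to measure the response speed.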
Step S2062, testing the query response speed corresponding to each query scenario using the performance testing tool in the performance prediction model.
The query response speed under each query scenario can be tested with an open-source performance testing tool such as Apache JMeter or Gatling, or with a testing tool developed in-house according to the performance testing requirements, as long as the measured response speed stays within the acceptable error; details are not repeated here.
Step S2063, computing the root mean square of the query response speeds corresponding to the query scenarios to obtain the configuration score corresponding to the update data.
Predicting the configuration score of the update data from the root mean square of the query response speeds can improve the accuracy of the prediction. Simulating realistic query scenarios also allows the policy query to be optimized further, improving query response speed and user experience.
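The root-mean-square step can be written out directly: for response speeds v_1 ... v_n, the score is sqrt((v_1^2 + ... + v_n^2) / n). The sketch below assumes the speeds are already expressed in a single consistent unit (for example, queries per second); the unit and the function name `configuration_score` are assumptions, since the text does not fix them.

```python
import math

def configuration_score(response_speeds: list[float]) -> float:
    """Root mean square of the per-scenario query response speeds."""
    if not response_speeds:
        raise ValueError("at least one query scenario is required")
    mean_square = sum(v * v for v in response_speeds) / len(response_speeds)
    return math.sqrt(mean_square)

score = configuration_score([3.0, 4.0])
```

Because the values are squared before averaging, the root mean square weights large values more heavily than a plain arithmetic mean does.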
In one implementation, a second field name whose similarity to the first field name is greater than a second threshold and a second word segmentation device whose similarity to the first word segmentation device is greater than a third threshold are obtained; mapping information containing both the second field name and the second word segmentation device is taken as similar mapping information, where the similar mapping information corresponds to similar data; the Elasticsearch indexes of the similar data and the update data are merged, and the similar data and the update data are stored in the same shard.
Starting from the first field name and the first word segmentation device, a similarity algorithm (such as edit distance or cosine similarity) is used to calculate the similarity of other field names and word segmentation devices to the first field name and the first word segmentation device. A second field name whose similarity is greater than the second threshold and a second word segmentation device whose similarity is greater than the third threshold are then screened out; the mapping information containing both is the similar mapping information, which corresponds to the similar data. The similar data and the update data can be merged into the same Elasticsearch index using the index merging function of Elasticsearch, so that they are stored in the same shard. This reduces the number of shards a query must touch and the complexity of the query system, which can optimize query performance.
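The screening of similar mapping information can be illustrated with edit distance, one of the similarity algorithms mentioned above. This is a sketch under assumptions: the normalization `1 - distance / max_length`, the function names, and the sample field and analyzer names are all chosen for the example and are not taken from the application.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def find_similar_mappings(
    first_field: str,
    first_analyzer: str,
    candidates: Sequence[Tuple[str, str]],
    second_threshold: float,
    third_threshold: float,
) -> List[Tuple[str, str]]:
    """Keep candidates whose field name AND analyzer both exceed their thresholds."""
    return [
        (field, analyzer)
        for field, analyzer in candidates
        if similarity(field, first_field) > second_threshold
        and similarity(analyzer, first_analyzer) > third_threshold
    ]
```

A candidate is kept only when both conditions hold at once, matching the requirement that the similar mapping information contain the second field name and the second word segmentation device simultaneously.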
Step S207, the first field name and the first word segmentation device whose configuration score is greater than or equal to a first threshold are used as the mapping information corresponding to the update data, and an Elasticsearch index of the update data is created.
Screening by configuration score selects suitable fields and word segmentation devices, which improves the accuracy and recall of queries.
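Step S207 can be sketched as a threshold filter followed by construction of an index mapping body. The helper names, the sample scores, and the threshold value are hypothetical. The dictionary follows the usual Elasticsearch `mappings`/`properties` layout for `text` fields, but it is only built here and not sent to a cluster.

```python
from typing import Dict, List, Tuple

Mapping = Tuple[str, str]  # (field name, analyzer name)


def select_mappings(scores: Dict[Mapping, float], first_threshold: float) -> List[Mapping]:
    """Keep (field, analyzer) pairs whose configuration score reaches the threshold."""
    return [pair for pair, score in scores.items() if score >= first_threshold]


def build_index_body(selected: List[Mapping]) -> dict:
    """Build an index-creation body declaring each selected field as analyzed text."""
    return {
        "mappings": {
            "properties": {
                field: {"type": "text", "analyzer": analyzer}
                for field, analyzer in selected
            }
        }
    }


# Illustrative scores only; real values would come from the performance prediction model.
example_scores = {
    ("policy_title", "ik_max_word"): 0.92,
    ("policy_body", "standard"): 0.41,
}
index_body = build_index_body(select_mappings(example_scores, first_threshold=0.6))
```

The resulting body could then be passed to whatever client creates the index.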
In one implementation, an accurate search mode is provided for the user, which performs a parallel search across Elasticsearch and the MySQL database.
A first search result returned by querying policy data in the MySQL database is acquired, and a second search result returned by querying policy data through the Elasticsearch index is acquired; the first search result and the second search result are merged to obtain a merged search result; and the merged search result is output to the user side.
First, a data synchronization tool may be used to periodically synchronize the Elasticsearch index data with the data in the MySQL database. Second, the search request may be sent to both Elasticsearch and the MySQL database at the same time using a multi-threaded or asynchronous approach. Finally, the search results returned by Elasticsearch and the MySQL database are merged, and the final merged result is returned to the user.
Because Elasticsearch and the MySQL database differ in how they store data and in their search mechanisms, searching both in parallel can improve the efficiency and accuracy of policy data search.
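The parallel search and merge steps can be sketched with a thread pool. The two search callables are placeholders for real MySQL and Elasticsearch queries, and merging by an `id` key with the MySQL row taking precedence is an assumption of this example; the application does not specify a merge rule.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, object]
SearchFn = Callable[[str], List[Row]]


def merge_results(first: List[Row], second: List[Row]) -> List[Row]:
    """Merge two result lists, dropping duplicates by 'id' (first list wins)."""
    merged: Dict[object, Row] = {}
    for row in list(first) + list(second):
        merged.setdefault(row["id"], row)
    return list(merged.values())


def parallel_search(keyword: str, search_mysql: SearchFn, search_es: SearchFn) -> List[Row]:
    """Send the same request to both back ends concurrently and merge the results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        first_future = pool.submit(search_mysql, keyword)
        second_future = pool.submit(search_es, keyword)
        first_result = first_future.result()
        second_result = second_future.result()
    return merge_results(first_result, second_result)
```

An asynchronous variant would follow the same shape, with the two requests awaited together instead of submitted to a pool.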
In the embodiments of the application, suitable mapping information is screened out by simulating actual policy data query scenarios, and the creation of the Elasticsearch index is optimized accordingly, so that both the query speed of the policy data and the user's query experience can be improved.
The following are device embodiments of the present application, which may be used to perform method embodiments of the present application. For details not disclosed in the device embodiments of the present application, please refer to the method embodiments of the present application.
Referring to fig. 3, a schematic structural diagram of a data query index creating apparatus according to an exemplary embodiment of the present application is shown. The apparatus may be implemented as all or part of a device by software, hardware, or a combination of both. The apparatus comprises a database construction module 301, a data synchronization module 302, a change data monitoring module 303, an update data processing module 304 and an index creation module 305.
A database construction module 301, configured to construct a MySQL database of policy data;
a data synchronization module 302, configured to synchronize policy data in the MySQL database to an Elasticsearch library;
a change data monitoring module 303, configured to monitor change data of policy data in the MySQL database by using the Canal component, where the change data is generated from the change record;
an update data processing module 304, configured to process and aggregate the change data by using the correlation data of the change data to obtain update data, where the correlation data of the change data is obtained by querying a unique identifier of the change data;
an index creation module 305, configured to create an Elasticsearch index of the update data by configuring mapping information corresponding to the update data.
Optionally, the index creation module 305 further includes a mapping information screening unit, a similar data merging unit, a history text adjustment unit, an access frequency adjustment unit, and a parallel search unit, where the mapping information screening unit further includes a scenario simulation subunit.
The mapping information screening unit is used for acquiring a first field name and a first word segmentation device in the mapping information corresponding to the update data, where the first word segmentation device is a rule for segmenting text-type fields; inputting the first field name and the first word segmentation device into a performance prediction model to obtain a configuration score corresponding to the update data; and taking the first field name and the first word segmentation device whose configuration score is greater than or equal to a first threshold as the mapping information corresponding to the update data, and creating an Elasticsearch index of the update data.
The similar data merging unit is used for acquiring a second field name whose similarity to the first field name is greater than a second threshold and a second word segmentation device whose similarity to the first word segmentation device is greater than a third threshold; taking mapping information containing both the second field name and the second word segmentation device as similar mapping information, where the similar mapping information corresponds to similar data; and merging the Elasticsearch indexes of the similar data and the update data, and storing the similar data and the update data in the same shard.
The history text adjustment unit is used for determining whether mapping information corresponding to the update data exists in the history Elasticsearch index; if it exists, querying the corresponding history text through the mapping information corresponding to the update data, and performing an addition or deletion operation on the history text based on the update data.
The scenario simulation subunit is used for generating a query scenario based on the first field name and the first word segmentation device, where the query scenario is a simulated user query operation; testing the query response speed corresponding to each query scenario by using a performance testing tool in the performance prediction model; and calculating the root mean square of the query response speeds corresponding to the query scenarios to obtain the configuration score corresponding to the update data.
The access frequency adjustment unit is used for acquiring the access frequency of the update data, and adjusting the size and number of shards of the update data in the Elasticsearch index based on the access frequency.
The parallel search unit is used for acquiring a first search result returned by querying policy data in the MySQL database and acquiring a second search result returned by querying policy data through the Elasticsearch index; merging the first search result and the second search result to obtain a merged search result; and outputting the merged search result to the user side.
The embodiment of the present application further provides a computer storage medium that may store a plurality of instructions adapted to be loaded and executed by a processor; for the specific execution process, refer to the description of the embodiments shown in figs. 1-2, which is not repeated here.
Referring to fig. 4, a schematic structural diagram of an electronic device is provided in an embodiment of the present application. As shown in fig. 4, the electronic device 400 may include: at least one processor 401, at least one network interface 404, a user interface 403, a memory 405, and at least one communication bus 402.
The communication bus 402 is used to enable connection and communication between these components.
The user interface 403 may include a display screen (Display) and a camera (Camera); optionally, the user interface 403 may further include a standard wired interface and a standard wireless interface.
Optionally, the network interface 404 may include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface).
The processor 401 may include one or more processing cores. The processor 401 connects the various parts of the electronic device using various interfaces and lines, and performs the functions of the device and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 405 and by invoking data stored in the memory 405. Optionally, the processor 401 may be implemented in at least one hardware form of digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA), or programmable logic array (Programmable Logic Array, PLA). The processor 401 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU renders and draws the content to be displayed on the display screen; the modem handles wireless communications. It will be appreciated that the modem may also not be integrated into the processor 401 and may instead be implemented as a separate chip.
The Memory 405 may include a random access Memory (Random Access Memory, RAM) or a Read-Only Memory (Read-Only Memory). Optionally, the memory 405 includes a non-transitory computer readable medium (non-transitory computer-readable storage medium). Memory 405 may be used to store instructions, programs, code sets, or instruction sets. The memory 405 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the above-described various method embodiments, etc.; the storage data area may store data or the like involved in the above respective method embodiments. The memory 405 may also optionally be at least one storage device located remotely from the aforementioned processor 401. As shown in fig. 4, an operating system, a network communication module, a user interface module, and an application program of a data query index creation method may be included in the memory 405 as a computer storage medium.
In the electronic device 400 shown in fig. 4, the user interface 403 is mainly used as an interface for providing input for a user, and obtains data input by the user; and processor 401 may be used to invoke an application program in memory 405 that stores a data query index creation method, which when executed by one or more processors, causes the electronic device to perform the method as in one or more of the embodiments described above.
An electronic-device-readable storage medium stores instructions which, when executed by one or more processors, cause the electronic device to perform the method of one or more of the above embodiments.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations, but it should be understood by those skilled in the art that the present application is not limited by the order of actions described, as some steps may be performed in other order or simultaneously in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required in the present application.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided herein, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into units is merely a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some service interface, device, or unit, and may be electrical or take other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk.
The above are merely exemplary embodiments of the present disclosure and are not intended to limit its scope. That is, equivalent changes and modifications made in accordance with the teachings of this disclosure fall within its scope. Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow its general principles and include such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the scope and spirit of the disclosure being indicated by the claims.

Claims (10)

1. A method of creating a data query index, the method comprising:
constructing a MySQL database of policy data;
synchronizing policy data in the MySQL database to an Elasticsearch library;
monitoring change data of policy data in a MySQL database by using a Canal component, wherein the change data is generated from a change record;
processing and aggregating the change data by using the correlation data of the change data to obtain updated data, wherein the correlation data of the change data is obtained by inquiring the unique identification of the change data;
and creating an Elasticsearch index of the update data by configuring mapping information corresponding to the update data.
2. The method according to claim 1, wherein creating the Elasticsearch index of the update data by configuring mapping information corresponding to the update data comprises:
acquiring a first field name and a first word segmentation device in mapping information corresponding to the update data, wherein the first word segmentation device is a rule for segmenting text fields;
inputting the first field name and the first word segmentation device into a performance prediction model to obtain a configuration score corresponding to the updated data;
and taking the first field name and the first word segmentation device whose configuration score is greater than or equal to a first threshold as the mapping information corresponding to the update data, and creating an Elasticsearch index of the update data.
3. The method of claim 2, wherein the inputting the first field name and the first word segmentation device into a performance prediction model to obtain the configuration score corresponding to the update data further comprises:
acquiring a second field name whose similarity to the first field name is greater than a second threshold and a second word segmentation device whose similarity to the first word segmentation device is greater than a third threshold;
taking mapping information containing both the second field name and the second word segmentation device as similar mapping information, wherein the similar mapping information corresponds to similar data;
and merging the Elasticsearch indexes of the similar data and the update data, and storing the similar data and the update data in the same shard.
4. The method of claim 2, wherein the inputting the first field name and the first word segmentation device into a performance prediction model to obtain the configuration score corresponding to the update data comprises:
generating a query scenario based on the first field name and the first word segmentation device, wherein the query scenario is a simulated user query operation;
testing the query response speed corresponding to each query scenario by using a performance testing tool in the performance prediction model;
and calculating the root mean square of the query response speeds corresponding to the query scenarios to obtain the configuration score corresponding to the update data.
5. The method according to claim 1, wherein after creating the Elasticsearch index of the update data by configuring the mapping information corresponding to the update data, the method further comprises:
determining whether mapping information corresponding to the update data exists in the history Elasticsearch index;
if the mapping information corresponding to the update data exists in the history Elasticsearch index, querying a corresponding history text through the mapping information corresponding to the update data, and performing an addition or deletion operation on the history text based on the update data.
6. The method according to claim 1, wherein after creating the Elasticsearch index of the update data by configuring the mapping information corresponding to the update data, the method further comprises:
acquiring the access frequency of the update data;
and adjusting the size and the number of shards of the update data in the Elasticsearch index based on the access frequency.
7. The method according to claim 1, wherein after creating the Elasticsearch index of the update data by configuring the mapping information corresponding to the update data, the method further comprises:
acquiring a first search result returned by querying policy data in the MySQL database, and acquiring a second search result returned by querying policy data through the Elasticsearch index;
merging the first search result and the second search result to obtain a merged search result;
and outputting the merged search result to a user side.
8. A data query index creation apparatus, the apparatus comprising:
the database construction module is used for constructing a MySQL database of policy data;
the data synchronization module is used for synchronizing policy data in the MySQL database to an Elasticsearch library;
the change data monitoring module is used for monitoring change data of policy data in the MySQL database by using the Canal component, wherein the change data is generated from a change record;
the update data processing module is used for processing and aggregating the change data by using the correlation data of the change data to obtain update data, wherein the correlation data of the change data is obtained by querying the unique identifier of the change data;
and the index creation module is used for creating an Elasticsearch index of the update data by configuring mapping information corresponding to the update data.
9. A computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the method of any one of claims 1 to 7.
10. An electronic device comprising a processor, a memory and a transceiver, the memory configured to store instructions, the transceiver configured to communicate with other devices, the processor configured to execute the instructions stored in the memory, to cause the electronic device to perform the method of any one of claims 1-7.
CN202311490405.3A 2023-11-08 2023-11-08 Data query index creation method and device, storage medium and electronic equipment Pending CN117688124A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311490405.3A CN117688124A (en) 2023-11-08 2023-11-08 Data query index creation method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN117688124A true CN117688124A (en) 2024-03-12

Family

ID=90127456

Country Status (1)

Country Link
CN (1) CN117688124A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination