CN114817253A - Method for building document model based on search analysis engine and application thereof - Google Patents

Method for building document model based on search analysis engine and application thereof

Publication number
CN114817253A
Authority
CN
China
Prior art keywords
document
target
query
population
business
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210415136.3A
Other languages
Chinese (zh)
Inventor
黄练纲
张翔宇
张帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CCI China Co Ltd
Original Assignee
CCI China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CCI China Co Ltd
Priority to CN202210415136.3A
Publication of CN114817253A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/211 Schema design and management
    • G06F16/212 Schema design and management with details for data modelling support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5077 Logical partitioning of resources; Management or configuration of virtualized resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Library & Information Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This scheme provides a method for building a document model based on a search analysis engine, and an application thereof, for searching target crowds. Population basic information and business information of the target crowds are acquired and a corresponding document model is built: the population basic information is filled into the retrieval fields of the document model and the business information into its business fields, forming a document for each target crowd, so that target crowds can be searched quickly under different query conditions even with a large data volume.

Description

Method for building document model based on search analysis engine and application thereof
Technical Field
The invention relates to the field of retrieval and query, in particular to a method for building a document model based on a search analysis engine and application thereof.
Background
With the rapid development of society and the spread of communication devices, data volumes are growing rapidly. Especially now that social networks and mobile communication have brought people into an era of massive data, retrieval and query have become very complicated: the time cost of accurately querying a single record in a very large database is extremely high, and processing massive information efficiently has become a difficult problem for many enterprises and institutions.
Data is divided into structured data and unstructured data. Structured data, also called row data, is logically expressed as a two-dimensional table, strictly follows data format and length specifications, and is mainly stored and managed in relational databases. Unstructured data, by contrast, is data that is not suited to a two-dimensional database table, including office documents in all formats, XML, HTML, reports of various types, pictures, and audio and video information.
The data of target crowds exists as structured data. The data volume of the target crowd library keeps growing, and users' query demands grow with it, which places higher requirements on the storage capacity of the library and on its capacity to provide retrieval services externally. Existing database-layer optimizations such as indexes and partitions cannot deliver timely results when large amounts of data are aggregated and queried. When specific information is needed, every table that may be involved must be queried and joined, which hurts query responsiveness, puts the system under high load, and hinders business applications at large data volumes.
Disclosure of Invention
The scheme provides a method for building a document model based on a search analysis engine and application thereof, wherein the corresponding document model is built based on population basic information and business information of target population, and the search analysis engine is implemented
In a first aspect, the scheme provides a method for building a document model based on a search analysis engine, including: acquiring population basic information and business information of at least one type of target crowd, wherein the population basic information comprises general attributes corresponding to different types of target crowds, and the business information comprises the crowd type, business items and attribution information of the corresponding target crowd;
and building a document model corresponding to each target crowd, filling the population basic information corresponding to the target crowd into the retrieval field of the document model, and filling the business information corresponding to the target crowd into the business field of the document model to obtain a document corresponding to the target crowd.
In some embodiments, the business items are special attributes corresponding to different types of target crowds, and the special attributes are used to distinguish the different types.
In some embodiments, the attribution information is the region code and the grid to which the target crowd belongs.
In some embodiments, each document is named with a uniform document prefix and a grid number, the grid number being the code of the grid recorded in the attribution information of the corresponding document.
In some embodiments, each document comprises document directory metadata and document identification metadata; the document identification metadata corresponds to a target crowd, and the document identification metadata and the document directory metadata together locate a particular document.
In some embodiments, non-time fields in the document model use the keyword type and time fields use timestamps.
In a second aspect, the scheme provides a document model built based on the method.
In a third aspect, the scheme provides a query method of a document model built based on the method, which comprises the following steps:
acquiring a query request, wherein business information of a target crowd is recorded in the query request;
and querying the corresponding document model based on the query request.
When a query request is obtained, a query statement is predefined with its parameters reserved; query conditions recording the incoming parameters are obtained, and the incoming parameters are substituted into the query statement, which is then packaged into the query request.
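The parameter substitution described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the field names `crowd_type` and `grid_code`, the `{name}` placeholder convention, and the helper `build_query_request` are all assumptions made for the example. The resulting dictionary has the shape of an Elasticsearch `_search` request body using a `bool` query with `term` filters.

```python
import json

# Hypothetical predefined query statement. Strings of the form "{name}"
# mark the reserved parameters (placeholder convention assumed here).
QUERY_TEMPLATE = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"crowd_type": "{crowd_type}"}},
                {"term": {"grid_code": "{grid_code}"}},
            ]
        }
    },
    "from": 0,
    "size": 20,
}


def build_query_request(template, params):
    """Return a copy of the template with reserved parameters filled in."""
    def fill(node):
        if isinstance(node, dict):
            return {key: fill(value) for key, value in node.items()}
        if isinstance(node, list):
            return [fill(value) for value in node]
        if isinstance(node, str) and node.startswith("{") and node.endswith("}"):
            name = node[1:-1]
            if name not in params:
                raise KeyError(f"missing query parameter: {name}")
            return params[name]
        return node

    return fill(template)


request_body = build_query_request(
    QUERY_TEMPLATE, {"crowd_type": "A", "grid_code": "1100000000001"}
)
print(json.dumps(request_body, indent=2))
```

Keeping the statement fixed and substituting only values means the business system never assembles query syntax from user input; a missing parameter fails early instead of producing a malformed request.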
In a fourth aspect, the present disclosure provides a target crowd querying device, including:
the document building module is used for building a document model corresponding to each target crowd, population basic information of the target crowd is filled in retrieval fields of the document model, and business information of the target crowd is filled in business fields of the document model, wherein the population basic information comprises general attributes corresponding to different types of target crowds, the business information comprises crowd types, business items and attribution information corresponding to the target crowd, documents corresponding to the target crowd are obtained, and the documents are named by the business information and the document uniform prefix;
and the query module is used for querying the corresponding document based on the query request.
In a fifth aspect, the present disclosure provides an electronic device, which includes a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to perform the above method for building a document model based on a search analysis engine and the above method for querying a target group.
In a sixth aspect, the present disclosure provides a readable storage medium in which a computer program is stored, the computer program including program code for controlling a process to execute a procedure, the procedure including the above method for building a document model based on a search analysis engine and the above target crowd query method.
Compared with the prior art, the technical scheme has the following beneficial effects: based on an analysis of the data characteristics and application scenarios of different types of target crowds, a business-oriented document model is designed for the target crowd data, and a per-document design allows many target crowds to be aggregated and queried;
in addition, the target crowds are stored in the search analysis engine in modeled form, so results can be retrieved quickly under different query conditions. Compared with a traditional structured database table design, this avoids complicated SQL and queries that join every sub-table, and significantly improves scalability and performance. Using the common _search request, the target data can be found quickly in a large amount of data, with significantly better performance than a single-table query; compared with a sub-table query, it avoids designing a complex query algorithm and supports querying all fields over the full data set.
Drawings
Fig. 1 is a flowchart of a method for building a document model based on Elasticsearch according to the present invention.
Fig. 2 is a schematic view of a document model.
Fig. 3 is a schematic diagram of data entry by Logstash.
Fig. 4 is a flowchart of a target population query method provided in the present embodiment.
Fig. 5 is a schematic diagram of a business system query.
Fig. 6 is a schematic diagram of the unified attribute information of the target population.
Fig. 7 is a block diagram of a target crowd query device according to an embodiment of the present application.
Fig. 8 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of one or more embodiments of the specification, as detailed in the claims which follow.
It should be noted that: in other embodiments, the steps of the corresponding methods are not necessarily performed in the order shown and described herein. In some other embodiments, the method may include more or fewer steps than those described herein. Moreover, a single step described in this specification may be broken down into multiple steps for description in other embodiments; multiple steps described in this specification may be combined into a single step in other embodiments.
With this method and device, a target crowd application model based on a search analysis engine service can be established and used to aggregate and query target crowds. Based on the business characteristics of each data item of the target crowd, the documents of the search analysis engine are given a structured design (basic information and business information); the aggregation and sharing scenarios of each part of the structured data are analyzed and the fields of each part are further refined, so that the target crowd can be stored in the search analysis engine in structured form. Using the service characteristics of the search analysis engine, the stored documents are split and flattened, which realizes the storage of the target crowd in the engine. Because the search analysis engine queries quickly at large data volumes, target crowds can be queried quickly under different query conditions and in different business scenarios, which improves query efficiency and supports applications built on the target crowd.
To facilitate understanding of the present solution, first, terms related to the present solution are introduced:
Search analysis engine: the search analysis engine used in this scheme is Elasticsearch, a distributed, highly scalable, near-real-time search and data analysis engine that makes it convenient to search, analyze, and explore large amounts of data.
Example one
The scheme provides a method for building a document model based on Elasticsearch, which comprises the following steps:
acquiring population basic information and service information of at least one type of target population, wherein the population basic information comprises general attributes corresponding to different types of target populations, and the service information comprises population types, service items and attribution information corresponding to the target populations;
building a document model corresponding to each target crowd, filling the retrieval field of the document model with the population basic information corresponding to the target crowd, and filling the business field of the document model with the business information corresponding to the target crowd to obtain the document corresponding to the target crowd.
Each target group has a population identification, and a target group can be uniquely determined through the identification.
Specifically, the information of the target group may be imported from another system, or may be stored by a manager manually entering the system.
Regarding "wherein the population basic information comprises general attributes corresponding to different types of target crowds": as shown in Fig. 6, the information to be recorded differs for different types of target crowds, but the population basic information has uniform attributes: name, former name, gender, identification number, date of birth, ethnicity, native place, marital status, political status, education level, religious belief, occupation category, occupation, place of residence, household registration address, and place of residence.
In some embodiments, the document model needs to be imported into Elasticsearch for use. Unlike a structured database, Elasticsearch has poor support for join queries, so the document design should be flattened to ensure that all required fields are covered.
In the step of "the service information includes a crowd type, a service item and attribution information corresponding to the target crowd", the service item is a special attribute corresponding to different types of target crowds, and the special attribute is used for distinguishing different types of target crowds.
For example, if the type of a target crowd is target crowd A, the business items include A1, A2, A3, A4, and A5; if the type of a target crowd is target crowd B, the business items include B1, B2, B3, B4, and B5.
The attribution information is a region code and a grid to which the target crowd belongs, and the record of the attribution information can facilitate management personnel to quickly inquire the target crowd.
For different types of target crowds, the business items contained in the business information differ, but they all correspond to uniform population basic information. The uniform population basic information and the differing business items are packaged into one object per type of target crowd, so that different types of target crowds are packaged uniformly and the structured storage of the data in Elasticsearch is completed.
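As a rough sketch of this packaging, the function below merges the uniform population basic information with type-specific business items into one flat document. All field names (`population_id`, `crowd_type`, `region_code`, `grid_code`, `A1`, `A2`) and the helper `build_document` are hypothetical and chosen only for illustration.

```python
def build_document(basic_info, crowd_type, business_items, region_code, grid_code):
    """Package uniform basic info and type-specific business items into one flat dict."""
    document = dict(basic_info)  # retrieval fields shared by every crowd type
    business_fields = {
        "crowd_type": crowd_type,
        "region_code": region_code,
        "grid_code": grid_code,
    }
    business_fields.update(business_items)  # special attributes of this crowd type
    for name, value in business_fields.items():
        if name in document:
            # A flat document has one namespace, so field names must not clash.
            raise ValueError(f"field name used twice: {name}")
        document[name] = value
    return document


doc_a = build_document(
    basic_info={"population_id": "p1", "name": "Person a", "gender": "F"},
    crowd_type="A",
    business_items={"A1": "value-1", "A2": "value-2"},
    region_code="330108",
    grid_code="1100000000001",
)
print(doc_a)
```

Because every field sits at the top level, a query never needs a join between a "basic information" record and a "business" record, which matches the flattened design motivated above.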
For example, if person a in target crowd A belongs to Binjiang District, Hangzhou, Zhejiang, the recorded grid is "Binjiang District, Hangzhou". The granularity of the grid can be adjusted.
In the step of obtaining the documents corresponding to the target crowds, each type of target crowd corresponds to a specific document model. Each document is named with a uniform document prefix and a grid number, the grid number being the division code recorded in the attribution information of the corresponding document. Specifically, the document naming rule is a uniform prefix followed by an underscore; in each data entry request, the document name is composed of the specified prefix and the code of the grid to which the current request belongs, so that data is automatically stored in Elasticsearch document by document according to the grid.
In addition, when the document model of this scheme is recorded into Elasticsearch, an Elasticsearch template is designed using the alias mechanism of Elasticsearch: data recorded into Elasticsearch is divided by the grid to which it belongs while sharing a uniform alias, so that the data stored across the documents, and the documents within the Elasticsearch indexes, are managed uniformly.
Specifically, Elasticsearch can assign an alias to one or more indexes, and the content of those indexes can then be searched through the alias. Internally, Elasticsearch maps the alias to the corresponding indexes; a filter or routing can also be attached to the alias. An alias must be unique in the system and must not duplicate an index name. This enables unified querying of the data of multiple documents in Elasticsearch.
A template is created for the documents in Elasticsearch, and the document rule of the template is a uniform prefix (such as document_). In each data entry request, the document name is composed of the prefix specified in the template and the code of the grid to which the current request belongs (such as document_1100000000001), so that data in Elasticsearch is automatically stored document by document according to the grid, the data distribution is dispersed, and query pressure is reduced.
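The naming rule can be illustrated with a small helper. This sketch assumes that the per-grid "document name" corresponds to an Elasticsearch index name and that grid codes are purely numeric, as in the example code 1100000000001; the function name `index_name_for` is hypothetical.

```python
import re

INDEX_PREFIX = "document_"  # uniform prefix taken from the example in the text


def index_name_for(grid_code, prefix=INDEX_PREFIX):
    """Compose the per-grid storage name from the uniform prefix and the grid code."""
    if not re.fullmatch(r"[0-9]+", grid_code):
        # Assumption: grid codes are numeric strings; reject anything else.
        raise ValueError(f"unexpected grid code: {grid_code!r}")
    return f"{prefix}{grid_code}"


print(index_name_for("1100000000001"))
```

Deriving the name from the request's own grid code is what lets each entry request land in the right per-grid store without a lookup table.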
Each index is designed with a primary shard and a replica shard and contains a plurality of documents, so that query requests are shared.
Specifically, shards are the key to how Elasticsearch distributes data within a cluster. A shard can be thought of as a container for data: document data is stored in shards, and shards are allocated to nodes in the cluster. When the cluster grows or shrinks, Elasticsearch automatically migrates shards among nodes to keep the cluster balanced. A shard is either a primary shard or a replica shard.
Specifically, a replica shard serves as a backup in case the primary shard fails; when a primary shard becomes unavailable, Elasticsearch holds a new election and promotes the most up-to-date replica shard to primary.
Specifically, multiple primary shards and multiple replica shards may exist on one node, but a primary shard and its own replica shards cannot be on the same node.
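The template, alias, and shard settings described above could be expressed as an Elasticsearch composable index template body along the following lines. This is a sketch under assumptions: the alias name `target_crowd` and the helper `build_index_template` are invented for the example, and one primary shard with one replica is only one reading of the text. The dictionary is what would be sent as the body of a `PUT _index_template/<name>` request; no client call is made here.

```python
from fnmatch import fnmatch


def build_index_template(prefix="document_", alias="target_crowd",
                         primary_shards=1, replicas=1):
    """Build a composable index template body for all per-grid indexes."""
    pattern = f"{prefix}*"
    if fnmatch(alias, pattern):
        # An alias may not share a name with an index, so keep it outside the pattern.
        raise ValueError("alias must not collide with the per-grid index names")
    return {
        "index_patterns": [pattern],
        "template": {
            "settings": {
                "number_of_shards": primary_shards,
                "number_of_replicas": replicas,
            },
            # Every index created from this template joins the same alias,
            # so one search against the alias covers all grids.
            "aliases": {alias: {}},
        },
    }


template_body = build_index_template()
print(template_body)
```

Since a primary shard and its replica cannot share a node, a replica count of one only takes effect on a cluster with at least two data nodes.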
And filling population basic information corresponding to the target population into a retrieval field of the document model, filling business information corresponding to the target population into a business field of the document model, wherein a non-time field in the document model adopts a keyword, and a time field adopts a timestamp.
All non-time fields in the document use the keyword type. This has the advantages that the accuracy of data query is guaranteed and the physical storage size of each record is reduced.
Specifically, when data is stored as the keyword type, it is not tokenized when the index is built. Once a field is mapped as keyword, nothing related to an analyzer needs to be configured; the type is by default a text data type without tokenization. Storing the data untokenized therefore improves query accuracy and saves storage space.
All time fields in the document adopt long (integer) time stamps, which has the advantages that: the universality of the data in different systems is ensured, the conversion and application of the data in different systems are facilitated, and meanwhile, the inquiry and statistical errors caused by time zone conversion are avoided.
Specifically, different database systems interpret time types differently. In terms of field types, the date types (the time class, with time-related methods such as timestamp used to operate on objects of this type) of the popular relational databases Oracle and MySQL cannot be converted directly into one another; in terms of the values recorded in the fields, that is, the data itself, conversion is possible with a data migration tool. Therefore, to ease converting and applying data across different systems while avoiding query and statistical errors caused by time zone conversion, all time fields in the Elasticsearch documents use long timestamps, which makes the data cross-platform.
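The field-type rule above (keyword for non-time fields, long timestamps for time fields) can be sketched as follows. The field names and both helpers are hypothetical, and epoch milliseconds are an assumed unit, since the text only says "long timestamp". The conversion rejects naive datetimes so that the stored value never depends on a server's local time zone.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment):
    """Convert a timezone-aware datetime to integer milliseconds since the epoch."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return (moment - EPOCH) // timedelta(milliseconds=1)


def build_mappings(keyword_fields, time_fields):
    """Map non-time fields to keyword and time fields to long."""
    properties = {name: {"type": "keyword"} for name in keyword_fields}
    properties.update({name: {"type": "long"} for name in time_fields})
    return {"properties": properties}


mappings = build_mappings(["name", "crowd_type", "grid_code"], ["birth_date", "updated_at"])

# The same instant written in two time zones yields one stored value.
utc_value = to_epoch_millis(datetime(2022, 4, 20, 0, 0, tzinfo=timezone.utc))
beijing = timezone(timedelta(hours=8))
local_value = to_epoch_millis(datetime(2022, 4, 20, 8, 0, tzinfo=beijing))
print(utc_value, local_value)
```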
The population basic information and business information are filled into the corresponding document model to obtain specific documents. So that each target crowd corresponds to a specific document, each document includes document identification metadata, which corresponds to a unique target crowd identifier. This guarantees the uniqueness of each target crowd stored in Elasticsearch and avoids data errors caused by one target crowd having multiple records in Elasticsearch after repeated add, delete, and modify operations.
Specifically, the document id of each record in the index of the Elasticsearch corresponds to the target population one by one, so that the storage uniqueness of the target population in the Elasticsearch is ensured.
In order to record the storage positions of the documents, each document comprises document directory metadata, and the document identification metadata and the document directory metadata locate a specific document.
Specifically, similar data is placed in one index, and non-similar data is placed in different indices, for example, the target group a data, the target group B data, and the target group C data are all placed in one large personnel index (total data), and one index may contain multiple documents.
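One way to picture the two kinds of metadata is the action line of an Elasticsearch bulk request, where `_index` says where a document is stored and `_id` says which document it is. The sketch below assumes that the document directory metadata maps to `_index`, that the document identification metadata maps to `_id`, and that indexes follow the per-grid naming used earlier; the field names and the helper `to_bulk_body` are hypothetical.

```python
import json


def to_bulk_body(documents, prefix="document_", id_field="population_id"):
    """Render documents as newline-delimited JSON for the bulk API."""
    lines = []
    for document in documents:
        action = {
            "index": {
                "_index": f"{prefix}{document['grid_code']}",  # where it is stored
                "_id": document[id_field],                     # which person it is
            }
        }
        lines.append(json.dumps(action))
        lines.append(json.dumps(document, ensure_ascii=False))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


documents = [
    {"population_id": "p1", "grid_code": "1100000000001", "name": "Person a"},
    {"population_id": "p1", "grid_code": "1100000000001", "name": "Person a (edited)"},
]
bulk_body = to_bulk_body(documents)
print(bulk_body)
```

Both entries carry the same `_index` and `_id`, so the second write replaces the first rather than creating a second record for the same person.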
Example two
The scheme provides a document model based on Elasticsearch, which is built by the method for building a document model based on Elasticsearch described in Example one.
To record data into Elasticsearch, the scheme may adopt Logstash synchronization, whose flow is shown in Fig. 3 and includes the following steps:
packaging according to the document model created in Elasticsearch;
and requesting Elasticsearch through Logstash, and recording the target crowd data into the document model in Elasticsearch.
In the step of requesting Elasticsearch through Logstash and recording the target crowd data into the document model in Elasticsearch, SQL statements are written based on the structured database, and the target crowd data is recorded into the document model in Elasticsearch by timed batch synchronization according to a preset synchronization strategy.
In some embodiments, after the target crowd data is recorded into the document model in Elasticsearch, the underlying structured business data can be processed to ensure the correctness of the data in Elasticsearch.
Logstash is an open-source data collection engine with real-time pipeline capability; it can dynamically unify data from different sources and normalize it to a chosen output destination. A Logstash pipeline is an independent unit of execution within Logstash. Each pipeline contains two required elements, input and output, and one optional element, filter. The event processing pipeline coordinates the execution of the inputs, filters, and outputs; inputs and outputs support codecs, which can encode or decode data as it enters or leaves the pipeline without a separate filter. Data is recorded into Elasticsearch through the deployed Logstash, and the Logstash version is kept the same as the Elasticsearch version to avoid incompatibilities caused by version mismatch.
In the step of packaging according to the document model created in Elasticsearch, encapsulation means hiding the attributes and implementation details of an object, exposing only an interface, and controlling the access level for reading and modifying attributes in the program. The abstracted data and behaviors (or functions) are combined into an organic whole, that is, the data and the source code that operates on it are combined into a class, of which the data and functions are members. The data object is packaged as a JSON object so that business systems can call it easily.
Specifically, the data object can be encapsulated as a JSON object by using a Map: the required data is put into a Map, all Maps are then put into a List, and the List and JSON can be converted into each other.
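The Map and List wording suggests a Java implementation; the same round trip can be sketched in Python with a `dict` per record and a `list` of records. The field names are hypothetical.

```python
import json

# One mapping per record (the "Map" in the text).
person_a = {"population_id": "p1", "name": "Person a", "crowd_type": "A"}
person_b = {"population_id": "p2", "name": "Person b", "crowd_type": "B"}

# All mappings are collected into one list (the "List" in the text).
records = [person_a, person_b]

# List -> JSON text, and JSON text -> list again.
payload = json.dumps(records, ensure_ascii=False)
restored = json.loads(payload)
print(payload)
```

`ensure_ascii=False` keeps non-ASCII names readable in the JSON text, which matters for data such as Chinese personal names.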
In the step of writing SQL statements based on the structured database and adopting timed batch synchronization, the SQL statements can be written against the structured database, and the data changed within a specified time is synchronized through Logstash at the specified time.
Specifically, timed batch processing means that the data to be processed is accumulated into a "batch" and processed at once at a specified time; a timed batch job can automatically process a large amount of data at a specified point in time.
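A minimal sketch of the accumulate-then-flush idea behind timed batch processing is given below. The function names and the `write_batch` callback are hypothetical, and the scheduler that would call `run_scheduled_sync` at the specified time is assumed rather than shown.

```python
from itertools import islice

def batches(rows, batch_size):
    """Yield successive lists of at most batch_size rows."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

def run_scheduled_sync(pending_rows, batch_size, write_batch):
    """Flush everything accumulated so far, one batch at a time.

    Intended to be invoked by a scheduler at the specified time.
    Returns the number of rows written.
    """
    written = 0
    for batch in batches(pending_rows, batch_size):
        write_batch(batch)
        written += len(batch)
    return written
```

In the scheme described here, Logstash plays the role of both the scheduler and the writer; the sketch only illustrates the batching behavior.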
In the step of entering the target crowd data into the Elasticsearch document model according to the preset synchronization strategy, the target crowd data synchronized by Logstash is entered into the document model built in the first embodiment.
Specifically, to enter MySQL database information into Elasticsearch by Logstash synchronization, a basic configuration file, mysql.conf, must first be written. The mysql.conf configuration file specifies the database to connect to, the location of the connection driver, the class name of the driver, whether paging is needed, the page size, the query statement, the timing field, and so on. The written configuration file and the database driver are placed in the same folder, and, with the Elasticsearch service running, the synchronization command is entered to start synchronization.
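The items listed for mysql.conf map onto options of the Logstash jdbc input plugin and elasticsearch output plugin. The sketch below renders such a file as text; it is an illustration only, and every concrete value (connection string, credentials, driver path, the `update_time` tracking column, the index name) is a hypothetical placeholder, not a value taken from this application.

```python
def render_mysql_conf(jdbc_url, user, password, driver_path, statement,
                      schedule, page_size, es_hosts, index):
    """Return the text of a Logstash pipeline file with a jdbc input
    and an elasticsearch output."""
    return f"""input {{
  jdbc {{
    jdbc_connection_string => "{jdbc_url}"
    jdbc_user => "{user}"
    jdbc_password => "{password}"
    jdbc_driver_library => "{driver_path}"
    jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
    jdbc_paging_enabled => true
    jdbc_page_size => {page_size}
    statement => "{statement}"
    schedule => "{schedule}"
    use_column_value => true
    tracking_column => "update_time"
    tracking_column_type => "timestamp"
  }}
}}
output {{
  elasticsearch {{
    hosts => ["{es_hosts}"]
    index => "{index}"
    document_id => "%{{id}}"
  }}
}}
"""

# All values below are hypothetical placeholders.
conf = render_mysql_conf(
    jdbc_url="jdbc:mysql://localhost:3306/population_db",
    user="sync_user",
    password="change-me",
    driver_path="./mysql-connector-j.jar",
    statement="SELECT * FROM person WHERE update_time > :sql_last_value",
    schedule="*/5 * * * *",
    page_size=5000,
    es_hosts="http://localhost:9200",
    index="population_demo",
)
```

A file rendered this way would typically be started with `bin/logstash -f mysql.conf`. The `schedule` option uses cron syntax, and `:sql_last_value` lets each run pick up only rows changed since the previous run.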
In the step of processing the underlying structured business data, the data is analyzed, sorted, calculated, edited and otherwise processed, and valuable, meaningful data is derived from the large amount of data entered into the document model.
Specifically, the target crowd data contains a large amount of duplicated population basic information, which occupies storage space and slows queries. The useless information is therefore deleted through data processing, which ensures the accuracy of the basic information of each target crowd, saves storage space, and improves search efficiency.
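One possible way to remove duplicated population basic information is sketched below. The key field `id_number` and the rule that a later record overwrites an earlier one field by field are assumptions made for illustration; the text does not specify the deduplication rule.

```python
def deduplicate(records, key="id_number"):
    """Keep one record per key value.

    Assumption: records are ordered oldest first, so fields of a later
    record overwrite those of an earlier record with the same key.
    """
    merged = {}
    for record in records:
        merged.setdefault(record[key], {}).update(record)
    return list(merged.values())
```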
In addition, since Elasticsearch does not support transactions, the entry of data into Elasticsearch should be performed as the last step of the business logic in order to guarantee data consistency. If the entry into Elasticsearch fails, an exception is thrown, so that the data in the structured database and in Elasticsearch remains consistent when Elasticsearch is combined with the insert, delete and update logic of the structured database.
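The "write to the search engine last and throw on failure" pattern can be sketched as follows. SQLite stands in for the structured database, and `index_document` is a hypothetical callback standing in for the Elasticsearch write; neither is taken from this application.

```python
import sqlite3

class SearchIndexError(RuntimeError):
    """Raised when the write to the search engine fails."""

def save_person(conn, person, index_document):
    """Write to the structured database first and to the search index last.

    `with conn` commits on success and rolls back if an exception escapes,
    so a failed index write also undoes the database insert.
    """
    with conn:
        conn.execute(
            "INSERT INTO person (id, name) VALUES (?, ?)",
            (person["id"], person["name"]),
        )
        # Last step of the business logic: the non-transactional write.
        if not index_document(person):
            raise SearchIndexError(f"indexing failed for {person['id']}")

# In-memory database used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT)")
```

Because the index write is the final step, nothing that could still fail runs after it, and a failure before or during it leaves both stores unchanged.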
This third embodiment also provides a data search method based on the Elasticsearch document model, in which data is queried using the Elasticsearch-based document model described in the first embodiment.
A target population query method, as shown in fig. 4 and 5, comprising the steps of:
building a document model corresponding to each target crowd, wherein the retrieval fields of the document model are filled with the population basic information of the target crowd and the business fields of the document model are filled with the business information of the target crowd, the population basic information comprises general attributes corresponding to different types of target crowds, and the business information comprises the crowd type, business items and attribution information of the corresponding target crowd; obtaining the documents corresponding to the target crowds, each document being named using the business information and a uniform document prefix;
and acquiring a query request, and querying a corresponding document model based on the query request.
How the document is built is described in the first embodiment and is not repeated here.
In the step of obtaining the query request, a query statement is predefined, parameters of the query statement are reserved, at least one query condition for recording incoming parameters is obtained, and the incoming parameters are substituted into the parameters in the query statement to be packaged to form the query request.
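The idea of a predefined query statement with reserved parameters can be sketched as below. The `{{name}}` placeholder convention, the field names and the `render` helper are hypothetical choices made for illustration.

```python
# Predefined query statement; "{{...}}" strings mark the reserved parameters.
QUERY_TEMPLATE = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"crowd_type": "{{crowd_type}}"}},
                {"range": {"birth_date": {"gte": "{{born_after}}"}}},
            ]
        }
    }
}

def render(template, params):
    """Return a copy of template in which every "{{name}}" string
    is replaced by params[name]. The template itself is not modified."""
    if isinstance(template, dict):
        return {key: render(value, params) for key, value in template.items()}
    if isinstance(template, list):
        return [render(value, params) for value in template]
    if (isinstance(template, str)
            and template.startswith("{{") and template.endswith("}}")):
        return params[template[2:-2]]
    return template

request = render(
    QUERY_TEMPLATE,
    {"crowd_type": "elderly", "born_after": "1950-01-01"},
)
```

Substituting values into a parsed structure rather than into raw text avoids quoting problems when an incoming parameter contains special characters.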
In the step of querying the corresponding document based on the query request, business information is recorded in the query request, and the names of the documents are matched based on the business information so that the matching documents are queried uniformly.
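Matching document names against the business information in a request can be sketched as follows, assuming the naming rule "uniform prefix plus grid number". The prefix `population_` and the grid numbers are hypothetical.

```python
from fnmatch import fnmatchcase

DOCUMENT_PREFIX = "population_"  # hypothetical uniform document prefix

def document_name(grid_number):
    """Name a document by the uniform prefix followed by the grid number."""
    return f"{DOCUMENT_PREFIX}{grid_number}"

def matching_documents(all_names, grid_prefix):
    """Return the names whose grid number starts with grid_prefix."""
    pattern = f"{DOCUMENT_PREFIX}{grid_prefix}*"
    return [name for name in all_names if fnmatchcase(name, pattern)]
```

For example, a request carrying a region-level grid prefix would select every document under that region in one query instead of naming each document individually.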
In addition, in the step of "querying the corresponding document based on the query request", if the query condition of the query request is a simple query condition, a unified query can be performed on all documents through RestHighLevelClient (a high-level client) based on the alias specified in the Elasticsearch template, which avoids the need to join all tables as in conventional table-by-table queries. The structure of the query statement in DSL (a query mode whose main purpose is to process a query and return a result) can be predefined with parameters reserved; the parameters are received at query time and rendered into the complete DSL for the query.
Specifically, DSL is the feature-rich and flexible query language of Elasticsearch; by interacting with Elasticsearch through a JSON-format request body, a wide variety of query requests can be implemented.
Specific DSL queries mainly contain two types of query statements:
Leaf query statement: queries a specific value in a specific field, for example match, term, range, etc.
Compound query statement: other leaf queries or compound queries can be merged, thereby implementing very complex query logic.
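The two kinds of statement can be illustrated by building a request body in which a compound `bool` query combines `match`, `term` and `range` leaf queries. The helper functions and field names are hypothetical; the resulting JSON follows the standard Elasticsearch query DSL shape.

```python
import json

def match(field, text):
    """Leaf query: full-text match on one field."""
    return {"match": {field: text}}

def term(field, value):
    """Leaf query: exact value on one field."""
    return {"term": {field: value}}

def value_range(field, gte=None, lte=None):
    """Leaf query: bounded range on one field; unset bounds are omitted."""
    bounds = {k: v for k, v in (("gte", gte), ("lte", lte)) if v is not None}
    return {"range": {field: bounds}}

def bool_query(must=(), filters=(), must_not=()):
    """Compound query: combines leaf or compound queries; empty clauses are omitted."""
    clauses = {"must": list(must), "filter": list(filters), "must_not": list(must_not)}
    return {"bool": {name: items for name, items in clauses.items() if items}}

body = {
    "query": bool_query(
        must=[match("name", "Zhang")],
        filters=[term("crowd_type", "elderly"),
                 value_range("birth_date", lte="1960-12-31")],
    )
}
request_json = json.dumps(body)
```

Since `bool_query` accepts any query dictionary, a compound query can itself be nested inside another compound query.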
In the step of "querying the corresponding document based on the query request", if the query condition of the query request is a complex query condition, the query conditions are integrated and uniformly encapsulated as a query object, and the query data is obtained by requesting Elasticsearch through RestHighLevelClient under the different query conditions.
For the above query mode, where Elasticsearch is required to return all target crowd data meeting the conditions, a POST (a common request type) request is made using the document id (the target crowd identifier) to achieve an exact query, and the basic information and business information in the returned result can be used directly by the business system;
for a paging query, the specific target crowd data does not need to be returned: size is set to 0 and a POST search request is made, then the document_count (the number of documents) of the corresponding group is taken out according to the specific aggregation conditions and used as the total_count (the total number) of the paging information, and the result is encapsulated as JSON for the business system to use.
Specifically, when the size value is set to 0, only the aggregation result is returned without returning the query result.
That is, in the step of "querying the corresponding document based on the query request", population basic information and business information recorded by the document are returned, or paging information of the document is returned.
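The paging-count step can be sketched as below. The request body uses the standard `size` and `aggs` keys of the Elasticsearch search API, and bucket counts appear in a terms aggregation response under `doc_count`. The aggregation name `groups`, the field names and the sample response are hypothetical and hand-written for illustration, not output captured from a real cluster.

```python
def count_request(group_field, query=None):
    """Build a search body that returns aggregation results only (size = 0)."""
    return {
        "size": 0,
        "query": query or {"match_all": {}},
        "aggs": {"groups": {"terms": {"field": group_field}}},
    }

def paging_info(response, group_key, page_size):
    """Take the document count of one group and use it as total_count."""
    buckets = response["aggregations"]["groups"]["buckets"]
    total = next((b["doc_count"] for b in buckets if b["key"] == group_key), 0)
    total_pages = -(-total // page_size)  # ceiling division
    return {"total_count": total, "page_size": page_size, "total_pages": total_pages}

# Hand-written sample in the shape of a terms aggregation response.
sample_response = {
    "hits": {"hits": []},
    "aggregations": {
        "groups": {
            "buckets": [
                {"key": "elderly", "doc_count": 42},
                {"key": "student", "doc_count": 7},
            ]
        }
    },
}
```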
This scheme provides a document partitioning strategy for large data volumes, solves the data inconsistency that arises because Elasticsearch has no transactions when it is combined with a traditional structured database, and writes the query conditions into an object from which they are taken directly at query time, thereby avoiding cumbersome writing and maintenance of query conditions.
The versions of Elasticsearch, RestHighLevelClient, Logstash and the other Elasticsearch-based software components involved in the invention should be kept the same to avoid incompatibility caused by version differences.
The software of the invention can run on mainstream systems such as Windows, macOS and Linux and implement the above functions; the data in the database is stored in a large-capacity memory, and the insert, delete, update and query operations and all related operations of the database are completed given sufficient processor computing power.
Example four
Based on the same concept, with reference to the drawings, the application also provides a target crowd inquiring device, which comprises:
the document building module is used for building a document model corresponding to each target crowd, wherein the retrieval fields of the document model are filled with the population basic information of the target crowd and the business fields of the document model are filled with the business information of the target crowd, the population basic information comprises general attributes corresponding to different types of target crowds, and the business information comprises the crowd type, business items and attribution information of the corresponding target crowd; the documents corresponding to the target crowds are obtained, and each document is named using the business information and a uniform document prefix;
and the query module is used for querying the corresponding document based on the query request.
The technical features appearing in the fourth embodiment are the same as those in the third embodiment and are not repeated here.
Example five
The present embodiment further provides an electronic apparatus, referring to fig. 8, including a memory 804 and a processor 802, where the memory 804 stores a computer program, and the processor 802 is configured to execute the computer program to perform the steps in any one of the above-mentioned embodiments of the document model based on Elasticsearch or the target population query method.
Specifically, the processor 802 may include a Central Processing Unit (CPU), or an Application Specific Integrated Circuit (ASIC), or may be configured to implement one or more integrated circuits of the embodiments of the present application.
Memory 804 may include, among other things, mass storage for data or instructions. By way of example, and not limitation, memory 804 may include a hard disk drive (HDD), a floppy disk drive, a solid state drive (SSD), flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 804 may include removable or non-removable (or fixed) media, where appropriate. The memory 804 may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory 804 is a non-volatile memory. In certain embodiments, memory 804 includes read-only memory (ROM) and random access memory (RAM). The ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), or flash memory (FLASH), or a combination of two or more of these, where appropriate. The RAM may be a static random access memory (SRAM) or a dynamic random access memory (DRAM), where the DRAM may be a fast page mode dynamic random access memory (FPMDRAM), an extended data output dynamic random access memory (EDODRAM), a synchronous dynamic random access memory (SDRAM), or the like.
The memory 804 may be used to store or cache various data files for processing and/or communication purposes, as well as possibly computer program instructions for execution by the processor 802.
The processor 802 reads and executes the computer program instructions stored in the memory 804 to implement the method for building the Elasticsearch-based document model or the target crowd query method in any of the above embodiments.
Optionally, the electronic apparatus may further include a transmission device 806 and an input/output device 808, where the transmission device 806 is connected to the processor 802, and the input/output device 808 is connected to the processor 802.
The transmission device 806 may be used to receive or transmit data via a network. Specific examples of the network described above may include wired or wireless networks provided by the communication provider of the electronic apparatus. In one example, the transmission device 806 includes a network interface controller (NIC) that can be connected to other network devices through a base station to communicate with the internet. In one example, the transmission device 806 can be a radio frequency (RF) module, which is used to communicate with the internet in a wireless manner.
The input/output device 808 is used to input or output information. In this embodiment, the input information may be target crowd data, a query request, and the like, and the output information may be a query result, and the like.
Alternatively, in this embodiment, the processor 802 may be configured to execute the following steps by a computer program:
S101, acquiring population basic information and business information of at least one type of target population, wherein the population basic information comprises general attributes corresponding to different types of target populations, and the business information comprises population types, business items and attribution information of the corresponding target populations;
S102, building a document model corresponding to each target crowd, filling the retrieval field of the document model with the population basic information corresponding to the target crowd, and filling the business field of the document model with the business information corresponding to the target crowd to obtain the document corresponding to the target crowd.
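Steps S101 and S102 can be sketched as a function that places the population basic information in a retrieval section and the business information in a business section. The section names `retrieval` and `business` and the key names are hypothetical; only the split into general attributes on one side and crowd type, business items and attribution information on the other comes from the text.

```python
REQUIRED_BUSINESS_KEYS = ("crowd_type", "business_items", "attribution")

def build_document(basic_info, business_info):
    """Fill the retrieval fields with basic information and the business
    fields with business information; reject incomplete business data."""
    missing = [key for key in REQUIRED_BUSINESS_KEYS if key not in business_info]
    if missing:
        raise ValueError(f"missing business information: {missing}")
    return {
        "retrieval": dict(basic_info),
        "business": {key: business_info[key] for key in REQUIRED_BUSINESS_KEYS},
    }

# Hypothetical sample input.
document = build_document(
    {"name": "Zhang San", "gender": "male", "birth_date": "1950-03-01"},
    {
        "crowd_type": "elderly",
        "business_items": {"lives_alone": True},
        "attribution": {"region_code": "330102", "grid": "330102001"},
    },
)
```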
It should be noted that, for specific examples in this embodiment, reference may be made to examples described in the foregoing embodiments and optional implementations, and details of this embodiment are not described herein again.
In general, the various embodiments may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. Some aspects of the invention may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
Embodiments of the invention may be implemented by computer software executable by a data processor of the mobile device, such as in a processor entity, or by hardware, or by a combination of software and hardware. Computer software or programs (also referred to as program products) including software routines, applets and/or macros can be stored in any device-readable data storage medium and they include program instructions for performing particular tasks. The computer program product may comprise one or more computer-executable components configured to perform the embodiments when the program is run. The one or more computer-executable components may be at least one software code or a portion thereof. Further in this regard it should be noted that any block of the logical flow as in figure 8 may represent a program step, or an interconnected set of logical circuits, blocks and functions, or a combination of a program step and a logical circuit, block and function. The software may be stored on physical media such as a memory chip or memory block implemented within the processor, magnetic media such as a hard disk or floppy disk, and optical media such as, for example, DVDs and data variants thereof, CDs. The physical medium is a non-transitory medium.
It should be understood by those skilled in the art that various features of the above embodiments can be combined arbitrarily, and for the sake of brevity, all possible combinations of the features in the above embodiments are not described, but should be considered as within the scope of the present disclosure as long as there is no contradiction between the combinations of the features.
The above examples are merely illustrative of several embodiments of the present application, and the description is more specific and detailed, but not to be construed as limiting the scope of the present application. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, and these are all within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (14)

1. A method for building a document model based on a search analysis engine is characterized in that:
acquiring population basic information and business information of at least one target population, wherein the population basic information comprises general attributes corresponding to different types of target populations, and the business information comprises population types, business items and attribution information of the corresponding target populations;
and building a document model corresponding to each target crowd, filling the population basic information corresponding to the target crowd into the retrieval field of the document model, and filling the business information corresponding to the target crowd into the business field of the document model to obtain a document corresponding to the target crowd.
2. The method for building a document model based on a search analysis engine according to claim 1, wherein the business items are special attributes corresponding to different types of target people, and the special attributes are used for distinguishing different types of target people.
3. The method for building a document model based on a search analysis engine according to claim 1, wherein the attribution information is a region code and a belonging grid to which the target population belongs.
4. The method for building a document model based on a search analysis engine according to claim 1, wherein each document is named by a document uniform prefix and a grid number, and the grid number is a code corresponding to the home grid information recorded in the corresponding document.
5. The method of claim 1, wherein each document contains document directory metadata and document identification metadata, the document identification metadata of the document corresponds to the target population unique identifier, and the document identification metadata and the document directory metadata locate a specific document.
6. The method for building a document model based on a search analysis engine according to claim 1, wherein the non-time field in the document model adopts keywords, and the time field adopts timestamps.
7. A document model based on a search analysis engine is built by adopting the building method of the document model based on the search analysis engine, which is disclosed by any one of the claims 1 to 6.
8. A target crowd query method is characterized by comprising the following steps:
establishing a document model corresponding to each target crowd, filling population basic information of the target crowd into retrieval fields of the document model, and filling business information of the target crowd into business fields of the document model, wherein the population basic information comprises general attributes corresponding to different types of target crowds, the business information comprises crowd types, business items and attribution information corresponding to the target crowds, obtaining documents corresponding to the target crowds, and naming the documents by using the business information and a uniform document prefix;
and acquiring a query request, and querying a corresponding document based on the query request.
9. The target population query method of claim 8, wherein in the step of obtaining the query request, a query statement is predefined and parameters of the query statement are reserved, at least one query condition for recording incoming parameters is obtained, and the incoming parameters are substituted into the parameters in the query statement to be packaged to form the query request.
10. The method as claimed in claim 8, wherein in the step of "querying corresponding documents based on the query request", business information is recorded in the query request, and the names of the documents are matched based on the business information so that the matching documents are queried uniformly.
11. The method as claimed in claim 8, wherein in the step of "querying the corresponding document based on the query request", the population basic information and business information recorded in the document are returned, or the paging information of the document is returned.
12. A target population querying device, comprising:
the document building module is used for building a document model corresponding to each target crowd, population basic information of the target crowd is filled in retrieval fields of the document model, and business information of the target crowd is filled in business fields of the document model, wherein the population basic information comprises general attributes corresponding to different types of target crowds, the business information comprises crowd types, business items and attribution information corresponding to the target crowd, documents corresponding to the target crowd are obtained, and the documents are named by the business information and the document uniform prefix;
and the query module is used for querying the corresponding document based on the query request.
13. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the computer program to perform the method for building a document model based on a search and analysis engine according to any one of claims 1 to 6 and the method for querying a target population according to any one of claims 9 to 11.
14. A readable storage medium, wherein a computer program is stored in the readable storage medium, the computer program comprising a program code for controlling a process to execute the process, the process comprising the method for building a document model based on a search analysis engine according to any one of claims 1 to 6 and the target population query method according to any one of claims 9 to 11.
CN202210415136.3A 2022-04-20 2022-04-20 Method for building document model based on search analysis engine and application thereof Pending CN114817253A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210415136.3A CN114817253A (en) 2022-04-20 2022-04-20 Method for building document model based on search analysis engine and application thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210415136.3A CN114817253A (en) 2022-04-20 2022-04-20 Method for building document model based on search analysis engine and application thereof

Publications (1)

Publication Number Publication Date
CN114817253A true CN114817253A (en) 2022-07-29

Family

ID=82506494

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210415136.3A Pending CN114817253A (en) 2022-04-20 2022-04-20 Method for building document model based on search analysis engine and application thereof

Country Status (1)

Country Link
CN (1) CN114817253A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination