CN112559671A

CN112559671A - ES-based text search engine construction method, device, equipment and medium

Info

Publication number: CN112559671A
Application number: CN202110191157.7A
Authority: CN
Inventors: 张玉君; 罗晓生; 钱勇; 杜晓东; 谢良义
Original assignee: Shenzhen Pingan Zhihui Enterprise Information Management Co ltd
Current assignee: Shenzhen Pingan Zhihui Enterprise Information Management Co ltd
Priority date: 2021-02-20
Filing date: 2021-02-20
Publication date: 2021-03-26
Anticipated expiration: 2041-02-20
Also published as: CN112559671B

Abstract

The application relates to the technical field of artificial intelligence and discloses a method, a device, equipment and a medium for constructing an ES-based text search engine. The method comprises the following steps: constructing an ES component and a search engine database; acquiring text data to be stored according to a data source set to be searched and storing it in the search engine database; performing field type analysis and importance scoring on each field to be analyzed in the text data to be analyzed acquired from the search engine database; obtaining a target matching mode according to the target field type; obtaining a target search index according to the text data to be analyzed and the target matching mode; obtaining a target search result sorting mode according to the target importance scoring result and the relevance scoring method of the ES component; and obtaining a target text search engine according to the ES component, the search engine database, the target search result sorting mode and the target search index. A separate text search engine therefore does not need to be built for each data source.

Description

ES-based text search engine construction method, device, equipment and medium

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a device, and a medium for constructing an ES-based text search engine.

Background

In the internet era, as enterprises digitize their operations, search engines are increasingly used to help employees retrieve internal information (such as personnel address books, personnel information, OA office modules, and files). Because the retrieved content comes from different data sources, each with different data content and a different interaction mode, prior-art search engines are difficult to adapt. Separate search engines therefore have to be developed, which increases development cost.

Disclosure of Invention

The main purpose of the application is to provide a method, a device, equipment and a medium for constructing an ES-based text search engine, so as to solve the technical problem that prior-art search engines are difficult to adapt to application scenarios in which the retrieved content comes from different data sources, the data content of each data source differs, and the interaction modes of the data sources differ.

In order to achieve the above object, the present application provides a method for constructing an ES-based text search engine, the method including:

constructing an ES component, and constructing a search engine database based on the ES component;

acquiring a data source set to be searched, acquiring text data to be stored according to the data source set to be searched, and storing the text data to be stored in the search engine database;

acquiring text data from the search engine database to obtain text data to be analyzed, and respectively performing field type analysis on each field to be analyzed in the text data to be analyzed to obtain a target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed;

respectively scoring the importance of each field to be analyzed in the text data to be analyzed to obtain a target importance scoring result corresponding to each field to be analyzed corresponding to the text data to be analyzed;

performing matching mode setting according to the target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain a target matching mode corresponding to each field to be analyzed corresponding to the text data to be analyzed;

constructing search indexes of the ES component according to the text data to be analyzed and the target matching modes corresponding to the fields to be analyzed corresponding to the text data to be analyzed to obtain target search indexes corresponding to the fields to be analyzed corresponding to the text data to be analyzed;

setting a search result sorting mode according to the target importance scoring result corresponding to each field to be analyzed corresponding to the text data to be analyzed and the relevancy scoring method of the ES component to obtain a target search result sorting mode;

and performing search engine encapsulation according to the ES component, the search engine database, the target search result ordering mode and the target search indexes corresponding to the fields to be analyzed corresponding to the text data to be analyzed to obtain a target text search engine.

Further, the step of performing field type analysis on each field to be analyzed in the text data to be analyzed to obtain a target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed includes:

respectively carrying out information entropy calculation on each field to be analyzed of the text data to be analyzed to obtain target field information entropy corresponding to each field to be analyzed corresponding to the text data to be analyzed;

and performing field type analysis according to the target field information entropy corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain the target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed.

Further, the step of performing field type analysis according to the target field information entropy corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain the target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed includes:

when [condition formula not reproduced in the source text], determining the target field type as the code value type, and otherwise determining the target field type as the non-code value type;

wherein, in the calculation formula type(i) of the target field type [formula not reproduced in the source text], the terms are: the target field information entropy corresponding to the i-th field to be analyzed corresponding to the text data to be analyzed; the number of non-empty field values of the i-th field to be analyzed; k(i), the number of field values of the i-th field to be analyzed; C, a constant; n(i), the number of de-duplicated field values of the i-th field to be analyzed; p(j), the probability that a field value of the i-th field to be analyzed is the j-th of the de-duplicated field values; log(), a logarithmic function; and an indicator term that is 0 when the j-th de-duplicated field value is empty and 1 when it is not empty.

Further, the step of setting a matching mode according to the target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain a target matching mode corresponding to each field to be analyzed corresponding to the text data to be analyzed includes:

respectively judging whether the target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed is a code value type;

and when the target field type is the code value type, determining the target matching mode corresponding to the field to be analyzed corresponding to the target field type as the exact-match search mode of the ES component, and otherwise determining the target matching mode corresponding to the field to be analyzed corresponding to the target field type as a keyword segmentation matching degree matching mode.

Further, the step of determining the target matching mode corresponding to the field to be analyzed corresponding to the target field type as a keyword segmentation matching degree matching mode includes:

determining the target matching mode corresponding to the field to be analyzed corresponding to the target field type in a non-code value type as the keyword segmentation matching degree matching mode, wherein the keyword segmentation matching degree matching mode refers to that the keyword segmentation matching degree is set to be 100%;

the keyword segmentation matching degree, match, is calculated by a formula [not reproduced in the source text] in terms of two quantities: the number of words obtained by segmenting the search keyword, and the number, after de-duplication, of words shared by the segmented search keyword and the hit search result.

Further, the step of constructing search indexes of the ES component according to the text data to be analyzed and the target matching manner corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain a target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed includes:

extracting one field to be analyzed from the text data to be analyzed as a target field to be analyzed;

constructing a search index of the ES component according to the field to be analyzed and the target matching mode corresponding to the field to be analyzed to obtain the target search index corresponding to the field to be analyzed;

and repeatedly executing the step of extracting one field to be analyzed from the text data to be analyzed as a target field to be analyzed until the target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed is determined.

Further, the step of performing search engine encapsulation according to the ES component, the search engine database, the target search result ranking manner, and the target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain a target text search engine includes:

setting the target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed as the index of the ES component to obtain the ES component with the index construction completed;

setting the target search result ordering mode as the search result ordering mode of the ES component with the index construction completed to obtain a target ES component;

and packaging the target ES component and the search engine database to obtain the target text search engine.

The present application also proposes an ES-based text search engine construction apparatus, the apparatus including:

the ES component and database construction module is used for constructing ES components and constructing a search engine database based on the ES components;

the text data acquisition module to be stored is used for acquiring a data source set to be searched, acquiring text data to be stored according to the data source set to be searched, and storing the text data to be stored in the search engine database;

the field type analysis module is used for acquiring text data from the search engine database to obtain text data to be analyzed, and respectively performing field type analysis on each field to be analyzed in the text data to be analyzed to obtain a target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed;

the importance scoring module is used for respectively scoring the importance of each field to be analyzed in the text data to be analyzed to obtain a target importance scoring result corresponding to each field to be analyzed corresponding to the text data to be analyzed;

a matching mode setting module, configured to perform matching mode setting according to the target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed, so as to obtain a target matching mode corresponding to each field to be analyzed corresponding to the text data to be analyzed;

the search index construction module of the ES component is used for constructing the search index of the ES component according to the text data to be analyzed and the target matching mode corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain the target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed;

a search result sorting mode setting module, configured to set a search result sorting mode according to the target importance scoring result corresponding to each field to be analyzed corresponding to the text data to be analyzed and a relevancy scoring method of the ES component, so as to obtain a target search result sorting mode;

and the search engine packaging module is used for packaging a search engine according to the ES component, the search engine database, the target search result ordering mode and the target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain a target text search engine.

The present application further proposes a computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the steps of any of the above methods when executing the computer program.

The present application also proposes a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method of any of the above.

According to the construction method, the construction device, the construction equipment and the construction medium of the text search engine based on the ES, the text data of the data source is stored in a search engine database constructed based on the ES component, then the data is obtained from the search engine database to carry out field type analysis and importance grading, and a target matching mode is obtained according to the target field type; obtaining a target search index according to the text data to be analyzed and the target matching mode; obtaining a target search result sorting mode according to the target importance scoring result and the relevance scoring method of the ES component; the target text search engine is obtained according to the ES component, the search engine database, the target search result ordering mode and the target search index, so that the target text search engine can be quickly constructed according to a plurality of data sources, and the text search engine does not need to be separately constructed for different data sources; the construction cost of the text search engine is simplified by automatically determining the matching mode of the fields of the text content; the search result ranking mode is set according to the field target importance scoring result and the ES component relevance scoring method, so that the ranking accuracy of the search results obtained by the constructed target text search engine is improved.

Drawings

FIG. 1 is a flow chart illustrating a method for constructing an ES-based text search engine according to an embodiment of the present application;

FIG. 2 is a block diagram schematically illustrating a construction apparatus of an ES-based text search engine according to an embodiment of the present application;

FIG. 3 is a block diagram illustrating a structure of a computer device according to an embodiment of the present application.

The objectives, features, and advantages of the present application will be further described with reference to the accompanying drawings.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

To solve the technical problem that prior-art search engines are difficult to adapt to application scenarios in which the retrieved content comes from different data sources, the data content of each data source differs, and the interaction modes of the data sources differ, the present application provides a method for constructing an ES-based text search engine, applied in the technical field of artificial intelligence. In this method, text data of the data sources is stored in a search engine database constructed based on the ES component; data is then obtained from the search engine database for field type analysis and importance scoring; a target matching mode is determined according to the field type analysis results; a target search index is obtained according to the text data to be analyzed and the target matching mode; a target search result sorting mode is obtained according to the importance scoring results and the relevance scoring method of the ES component; and the target text search engine is obtained according to the ES component, the search engine database, the target search result sorting mode and the target search index. The target text search engine can therefore be constructed quickly from a plurality of data sources, without building a separate text search engine for each data source. Automatically determining the matching mode for the fields of the text content reduces the cost of constructing the text search engine, and setting the search result sorting mode according to the target importance scoring results of the fields and the relevance scoring method of the ES component improves the ranking accuracy of the search results returned by the constructed target text search engine.

Referring to fig. 1, an embodiment of the present application provides a method for constructing an ES-based text search engine, where the method includes:

S1: constructing an ES component, and constructing a search engine database based on the ES component;

S2: acquiring a data source set to be searched, acquiring text data to be stored according to the data source set to be searched, and storing the text data to be stored in the search engine database;

S3: acquiring text data from the search engine database to obtain text data to be analyzed, and respectively performing field type analysis on each field to be analyzed in the text data to be analyzed to obtain a target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed;

S4: respectively scoring the importance of each field to be analyzed in the text data to be analyzed to obtain a target importance scoring result corresponding to each field to be analyzed corresponding to the text data to be analyzed;

S5: performing matching mode setting according to the target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain a target matching mode corresponding to each field to be analyzed corresponding to the text data to be analyzed;

S6: constructing search indexes of the ES component according to the text data to be analyzed and the target matching modes corresponding to the fields to be analyzed corresponding to the text data to be analyzed to obtain target search indexes corresponding to the fields to be analyzed corresponding to the text data to be analyzed;

S7: setting a search result sorting mode according to the target importance scoring result corresponding to each field to be analyzed corresponding to the text data to be analyzed and the relevancy scoring method of the ES component to obtain a target search result sorting mode;

S8: and performing search engine encapsulation according to the ES component, the search engine database, the target search result ordering mode and the target search indexes corresponding to the fields to be analyzed corresponding to the text data to be analyzed to obtain a target text search engine.

In the embodiment, text data of a data source is stored in a search engine database constructed based on an ES component, then data is obtained from the search engine database to perform field type analysis and importance scoring, and a target matching mode is obtained according to a target field type; obtaining a target search index according to the text data to be analyzed and the target matching mode; obtaining a target search result sorting mode according to the target importance scoring result and the relevance scoring method of the ES component; the target text search engine is obtained according to the ES component, the search engine database, the target search result ordering mode and the target search index, so that the target text search engine can be quickly constructed according to a plurality of data sources, and the text search engine does not need to be separately constructed for different data sources; the construction cost of the text search engine is simplified by automatically determining the matching mode of the fields of the text content; the search result ranking mode is set according to the field target importance scoring result and the ES component relevance scoring method, so that the ranking accuracy of the search results obtained by the constructed target text search engine is improved.

For S1, an ES installation file is acquired (ES stands for Elasticsearch, a Lucene-based search server that provides a distributed, multi-tenant-capable full-text search engine with a RESTful web interface); the ES installation file is installed to obtain the ES component; and a search engine database matched to the ES component is constructed.

For S2, the data source set to be searched may be obtained from user input, from a database, or from a third-party application system.

The data source set to be searched is a set of data sources which can be searched by the target text search engine realized by the application. The data source set to be searched comprises configuration data of a plurality of data sources. Configuration data for the data sources include, but are not limited to: data source name, data source access address, user name, password.
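As an illustration only, the configuration data listed above could be held in a small record type. The sketch below is an assumption about representation, not part of the described method: the class name `DataSourceConfig`, its field names, the example addresses, and the rule that data source names must be unique are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataSourceConfig:
    """Configuration of one data source (hypothetical field names)."""
    name: str            # data source name
    access_address: str  # data source access address
    username: str        # user name
    password: str        # password


def validate_source_set(sources: List[DataSourceConfig]) -> List[DataSourceConfig]:
    """Reject an empty set and duplicate names (an assumed sanity rule)."""
    if not sources:
        raise ValueError("the data source set to be searched is empty")
    names = [s.name for s in sources]
    if len(names) != len(set(names)):
        raise ValueError("data source names must be unique")
    return sources


# Example data source set to be searched (addresses are placeholders).
source_set = validate_source_set([
    DataSourceConfig("address_book", "jdbc:mysql://hr-db.example/contacts", "reader", "secret"),
    DataSourceConfig("oa_documents", "https://oa.example/api/docs", "reader", "secret"),
])
```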

Text data is acquired from each data source in the data source set to be searched, the acquired text data is taken as the text data to be stored, and all of the text data to be stored is stored in the search engine database.

The text data to be stored refers to the text data which needs to be stored in the search engine database.

For S3, text data is acquired from the search engine database and taken as the text data to be analyzed; the field type of a target field to be analyzed is analyzed according to the field values of that field in the text data to be analyzed, and the resulting field type is taken as the target field type corresponding to that field, where the target field to be analyzed is any one of the fields to be analyzed in the text data to be analyzed.

The text data to be analyzed refers to the text data which needs to be subjected to field type analysis and importance scoring. The text data to be analyzed comprises the text content of at least one field to be analyzed. It will be appreciated that each field to be analyzed includes one or more field values.

The target field type is simply the field type. The field types include the code value type and the non-code value type. The code value type means that the values fall within a limited, enumerable range. For example, the value range of education level is limited and enumerable, so when the field to be analyzed is education level, the target field type corresponding to the field to be analyzed is determined to be the code value type; this example is not limiting. The non-code value type means that the values are distributed over a relatively wide range. For example, the value range of names is relatively wide, so when the field to be analyzed is a name, the target field type corresponding to the field to be analyzed is determined to be the non-code value type; this example is likewise not limiting.

It can be understood that, when constructing the target text search engine, all the text data in the search engine database are extracted as the text data to be analyzed. In the use process after the construction of the target text search engine is completed, the text data newly stored in the search engine database can be extracted as the text data to be analyzed, and then steps S3 to S8 are performed to update the target text search engine. Therefore, the target matching mode and the target search index corresponding to all the fields to be analyzed of the target text search engine are automatically determined, the automation degree is improved, the cost for constructing the target text search engine is reduced, and the cost for using the target text search engine is also reduced.

For S4, importance scoring is performed on the target field to be analyzed according to the field value of the target field to be analyzed in the text data to be analyzed, and the obtained importance scoring result is used as the target importance scoring result corresponding to the target field to be analyzed, where the target field to be analyzed is any one of all the fields to be analyzed in the text data to be analyzed.

The target importance scoring result is calculated by the formula B(i) [formula not reproduced in the source text], wherein B(i) is the target importance scoring result of the i-th field to be analyzed of the text data to be analyzed, and the formula is expressed in terms of the average number of characters of the non-empty field values of the i-th field to be analyzed, the standard deviation of the distribution of the number of characters of those non-empty field values, and log(), a logarithmic function.

Optionally, the average and the standard deviation are computed over the number of Chinese characters, rather than all characters, of the non-empty field values of the i-th field to be analyzed of the text data to be analyzed. The search engine constructed by the method is thereby suited to Chinese search scenarios.
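The two statistics that feed the importance score can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `length_stats` is hypothetical, the population standard deviation is used because the text does not say whether a population or sample deviation is meant, and "Chinese character" is approximated by the basic CJK Unified Ideographs range U+4E00 to U+9FFF. How the two statistics are combined into B(i) is not recoverable from this text, so the sketch stops at the statistics.

```python
import statistics
from typing import Optional, Sequence, Tuple


def length_stats(values: Sequence[Optional[str]],
                 chinese_only: bool = False) -> Tuple[float, float]:
    """Return (mean, population std dev) of character counts of non-empty values."""
    lengths = []
    for value in values:
        if value is None or value.strip() == "":
            continue  # empty field values are excluded
        if chinese_only:
            # Assumption: count only characters in the basic CJK block.
            lengths.append(sum(1 for ch in value if "\u4e00" <= ch <= "\u9fff"))
        else:
            lengths.append(len(value))
    if not lengths:
        return 0.0, 0.0
    return statistics.fmean(lengths), statistics.pstdev(lengths)


mean_all, std_all = length_stats(["ab", "abcd", None, ""])
mean_cn, std_cn = length_stats(["张三a", "李四光"], chinese_only=True)
```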

For S5, when the target field type corresponding to the target field to be analyzed is the code value type, the target matching mode corresponding to the target field to be analyzed is determined to be the exact-match search mode of the ES component; when the target field type is the non-code value type, the target matching mode is determined to be the keyword segmentation matching degree matching mode. The target field to be analyzed is any one of the fields to be analyzed in the text data to be analyzed.
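One way to picture the S5 decision is as a choice between two Elasticsearch query clauses. The sketch below builds plain dictionaries in Elasticsearch query DSL form and does not call any client library. Mapping the code value type to a `term` query and the non-code value type to a `match` query with `minimum_should_match` set to `"100%"` is an assumption about how the two matching modes could be realized; the constants and the function name `query_clause` are hypothetical.

```python
from typing import Any, Dict

CODE_VALUE = "code_value"          # hypothetical label for the code value type
NON_CODE_VALUE = "non_code_value"  # hypothetical label for the non-code value type


def query_clause(field: str, field_type: str, search_keyword: str) -> Dict[str, Any]:
    """Build a query clause for one field according to its field type."""
    if field_type == CODE_VALUE:
        # Exact match: the keyword is compared to the stored value as-is.
        return {"term": {field: {"value": search_keyword}}}
    # Segmented match: every word produced by the analyzer must be found.
    return {"match": {field: {"query": search_keyword,
                              "minimum_should_match": "100%"}}}


exact = query_clause("education_level", CODE_VALUE, "bachelor")
segmented = query_clause("name", NON_CODE_VALUE, "zhang yujun")
```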

In the keyword segmentation matching degree matching mode, a match is decided by the matching degree of the words obtained by segmenting the search keyword.
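The matching degree can be illustrated with a small sketch. The formula itself is not reproduced in this text, so the ratio used here (shared de-duplicated words divided by de-duplicated search keyword words) is an assumption, as is the function name `matching_degree`; word segmentation is taken as already done.

```python
from typing import Iterable


def matching_degree(keyword_words: Iterable[str], hit_words: Iterable[str]) -> float:
    """Assumed ratio: shared distinct words / distinct search keyword words."""
    query = set(keyword_words)
    if not query:
        return 0.0
    shared = query & set(hit_words)
    return len(shared) / len(query)


partial = matching_degree(["ping", "an", "bank"], ["ping", "an", "group"])
full = matching_degree(["a", "b"], ["b", "a", "c"])
```

Under this assumption, requiring a matching degree of 100% means that every word of the segmented search keyword must appear in the hit.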

For S6, search indexes of the ES component are constructed by using an index construction method of ES according to the text data to be analyzed and the target matching manner corresponding to each field to be analyzed corresponding to the text data to be analyzed, and the constructed indexes are used as target search indexes corresponding to each field to be analyzed corresponding to the text data to be analyzed.

It can be understood that the specific implementation of the index construction method of the ES according to the text data to be analyzed and the target matching manner corresponding to each field to be analyzed corresponding to the text data to be analyzed may be selected from the prior art, and details are not repeated here.
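As one possible reading of S6, the per-field matching modes can be turned into an index definition by looping over the fields. The sketch assumes Elasticsearch 7 or later (typeless mappings), models the per-field target search indexes as per-field mappings inside a single index body, and maps the code value type to the `keyword` field type and the non-code value type to the `text` field type. These choices, the constants, and the function name `build_index_body` are assumptions, not the method prescribed by the text.

```python
from typing import Any, Dict

CODE_VALUE = "code_value"          # hypothetical label for the code value type
NON_CODE_VALUE = "non_code_value"  # hypothetical label for the non-code value type


def build_index_body(field_types: Dict[str, str]) -> Dict[str, Any]:
    """Build an index creation body with one mapping entry per field."""
    properties: Dict[str, Any] = {}
    for field_name, field_type in field_types.items():
        if field_type == CODE_VALUE:
            # keyword fields are indexed unanalyzed, which suits exact matching
            properties[field_name] = {"type": "keyword"}
        else:
            # text fields are analyzed (segmented) at index time
            properties[field_name] = {"type": "text"}
    return {"mappings": {"properties": properties}}


index_body = build_index_body({"education_level": CODE_VALUE, "name": NON_CODE_VALUE})
```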

For S7, the target search result sorting mode is given by the calculation formula S(i) [formula not reproduced in the source text], wherein m is the number of words obtained by segmenting the search keyword, B(j) is the target importance scoring result of the field to be analyzed corresponding to the j-th of those words, and the remaining term is the relevance score contribution of the j-th word relative to the search keyword.

Optionally, the relevance score contribution of the j-th word relative to the search keyword may be the score calculated by the TF-IDF (term frequency-inverse document frequency) algorithm of ES.
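A ranking of this kind can be sketched as follows. Because the formula S(i) is not reproduced in this text, the sketch assumes that the score of a search result is the sum, over the m words of the segmented search keyword, of the field importance B(j) multiplied by the relevance score contribution of the j-th word. That weighted-sum form, the function names, and the example numbers are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Each entry is (importance B(j), relevance score contribution of word j).
WordScores = Sequence[Tuple[float, float]]


def ranking_score(word_scores: WordScores) -> float:
    """Assumed form: sum over words of importance times relevance."""
    return sum(importance * relevance for importance, relevance in word_scores)


def rank_results(results: Dict[str, WordScores]) -> List[str]:
    """Order result identifiers by descending ranking score."""
    return sorted(results, key=lambda rid: ranking_score(results[rid]), reverse=True)


example = {
    "doc_a": [(2.0, 0.5), (1.0, 0.25)],  # 2.0*0.5 + 1.0*0.25 = 1.25
    "doc_b": [(2.0, 0.25), (1.0, 1.0)],  # 2.0*0.25 + 1.0*1.0 = 1.5
}
ranked = rank_results(example)
```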

The search keyword is a keyword for searching, which is input by a user into a target text search engine constructed by the present application.

Determining the target search result sorting mode from the target importance scoring result and the relevance scoring method of the ES component improves the accuracy of search result sorting and, in turn, user satisfaction.

For S8, the ES component, the search engine database, the target search result sorting mode, and the target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed are encapsulated into a search engine, and the search engine obtained by encapsulation is taken as the target text search engine.

In an embodiment, the step of performing field type analysis on each field to be analyzed in the text data to be analyzed to obtain a target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed includes:

s31: respectively carrying out information entropy calculation on each field to be analyzed of the text data to be analyzed to obtain target field information entropy corresponding to each field to be analyzed corresponding to the text data to be analyzed;

s32: and performing field type analysis according to the target field information entropy corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain the target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed.

This embodiment determines the target field type from the target field information entropy corresponding to the field to be analyzed, which provides a data basis for the subsequent matching mode setting.

For S31, an information entropy algorithm is applied to the field values of the target field to be analyzed in the text data to be analyzed, and the calculation result is taken as the target field information entropy corresponding to the target field to be analyzed; the target field to be analyzed is any one of the fields to be analyzed in the text data to be analyzed.
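The entropy calculation of S31 can be sketched as below. This is a minimal sketch under stated assumptions: it uses the standard Shannon entropy with the natural logarithm, computes each probability over all field values (including empty ones), and gives the empty value zero contribution, in line with the indicator term that is 0 for an empty value and 1 otherwise. The exact formula is not reproduced in this text, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Optional, Sequence


def is_empty(value: Optional[str]) -> bool:
    """Treat None and whitespace-only strings as empty field values."""
    return value is None or value.strip() == ""


def field_entropy(values: Sequence[Optional[str]]) -> float:
    """Shannon entropy over de-duplicated values; the empty value contributes 0."""
    total = len(values)
    if total == 0:
        return 0.0
    counts = Counter("" if is_empty(v) else v for v in values)
    entropy = 0.0
    for value, count in counts.items():
        if value == "":
            continue  # indicator term is 0 for the empty value
        p = count / total
        entropy -= p * math.log(p)
    return entropy


def non_empty_ratio(values: Sequence[Optional[str]]) -> float:
    """Number of non-empty field values divided by the number of field values."""
    if not values:
        return 0.0
    return sum(1 for v in values if not is_empty(v)) / len(values)
```

A field with few distinct values, such as education level, yields a low entropy, while a field such as name yields a high one, which is the intuition behind separating the code value type from the non-code value type.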

And S32, performing field type analysis on the target field information entropy corresponding to the target field to be analyzed, and taking the obtained field type as the target field type corresponding to the target field to be analyzed.
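The entropy calculation of S31 can be illustrated with a minimal sketch. The source does not state the logarithm base or how the probabilities are estimated, so the sketch assumes the natural logarithm and empirical frequencies; the function name `field_entropy` is hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def field_entropy(values: Iterable[Hashable]) -> float:
    """Shannon entropy of the empirical distribution of a field's values.

    Assumptions (not stated in the source): natural logarithm, and
    p(j) estimated as the relative frequency of the j-th distinct value.
    """
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# A field with few distinct values has low entropy; a field whose values
# are all different has the maximum entropy for its size.
low = field_entropy(["M", "F", "M", "F"])
high = field_entropy(["a", "b", "c", "d"])
```

A field with a limited value range, such as a status code, yields a small entropy, while a free-text field yields a large one, which is the property that S32 relies on.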

In an embodiment, the step of performing field type analysis according to the target field information entropy corresponding to each field to be analyzed to obtain the target field type corresponding to each field to be analyzed includes:

when [formula not reproduced in the source text],

wherein [symbol not reproduced in the source text] is the number of non-empty field values of the i-th field to be analyzed corresponding to the text data to be analyzed, k(i) is the number of field values of the i-th field to be analyzed corresponding to the text data to be analyzed, and C is a constant;

n(i) is the number of de-duplicated field values of the i-th field to be analyzed corresponding to the text data to be analyzed, p(j) is the probability that the i-th field to be analyzed corresponding to the text data to be analyzed takes the j-th field value among the de-duplicated field values, and log() is a logarithmic function;

[symbol not reproduced in the source text] is 0, and when the j-th field value among the de-duplicated field values is not null, [symbol not reproduced in the source text] is 1.

In this embodiment, field type analysis is performed according to the information entropy, which improves the accuracy of determining the target field type.

Optionally, the value of C is set to 10. The comparison between [expression not reproduced in the source text] and [expression not reproduced in the source text] can be regarded as a comparison between the target field information entropy and the non-empty distribution of the field values of the field to be analyzed. In general, when two numbers a and b are compared, a can be regarded as far greater than b if a is more than 10 times b. In this way, the significance of the information entropy of the field to be analyzed for its field values is taken into account, the situation in which a search is worthless because the field values are empty is avoided, and the interference of empty field values with the judgment result is also avoided.
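The quantities named in the definitions above can be computed as in the following sketch. Because the decision formula itself is not reproduced in the source text, the sketch does not guess how the quantities are combined into a field type. The masked sum `-sum(p(j) * log(p(j)) * indicator(j))` is an assumed reading of the definitions, the natural logarithm is an assumption, and the names `FieldStats`, `field_stats`, `is_empty`, and `far_greater` are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FieldStats:
    non_empty: int         # number of non-empty field values
    total: int             # k(i): number of field values
    distinct: int          # n(i): number of de-duplicated field values
    masked_entropy: float  # assumed form: -sum(p(j) * log p(j) * indicator(j))


def is_empty(value: Optional[str]) -> bool:
    """Treat None and the empty string as empty (an assumption)."""
    return value is None or value == ""


def field_stats(values: Sequence[Optional[str]]) -> FieldStats:
    counts = Counter(values)
    total = len(values)
    entropy = 0.0
    for value, count in counts.items():
        p = count / total                       # p(j)
        indicator = 0 if is_empty(value) else 1  # 1 when the value is not null
        entropy -= indicator * p * math.log(p)
    non_empty = sum(1 for v in values if not is_empty(v))
    return FieldStats(non_empty, total, len(counts), entropy)


def far_greater(a: float, b: float, c: float = 10.0) -> bool:
    """a is regarded as far greater than b when a is more than c times b."""
    return a > c * b


stats = field_stats(["A", "B", "A", None])
```

With C = 10, `far_greater` encodes only the rule of thumb stated in the text; which two quantities the original formula compares remains unspecified here.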

In an embodiment, the step of setting a matching mode according to the target field type corresponding to each field to be analyzed to obtain a target matching mode corresponding to each field to be analyzed includes:

S51: judging, for each field to be analyzed in the text data to be analyzed, whether the corresponding target field type is a code value type;

S52: when the target field type is the code value type, determining the target matching mode of the field to be analyzed corresponding to the target field type as the exact matching search mode of the ES component; otherwise, determining the target matching mode of the field to be analyzed corresponding to the target field type as a keyword segmentation matching degree matching mode.

In this embodiment, the matching mode is set according to the target field type, so that different matching modes are set for different target field types, which improves matching accuracy.

For S52, when the target field type is the code value type, the value range of the field values of the corresponding field to be analyzed is limited, and exact matching can be used to improve matching accuracy. The target matching mode of a field to be analyzed whose target field type is the code value type is therefore determined as the exact matching search mode of the ES component. When the target field type is a non-code value type, the field values of the corresponding field to be analyzed are widely distributed; if exact matching were used, a large number of values would have to be matched, which reduces search efficiency and accuracy. The target matching mode of a field to be analyzed whose target field type is a non-code value type is therefore determined as the keyword segmentation matching degree matching mode, which improves search efficiency and accuracy.

It can be understood that, when the target field type is the code value type, determining the target matching mode of the corresponding field to be analyzed as the exact matching search mode of ES avoids errors in specific search scenarios. The value range of a code value type field is very limited. In the prior art, either the phrase matching search mode or the exact matching search mode of ES is used, but the phrase matching search mode is unsuitable in certain scenarios. For example, when the searched field contains the values "regular employee" and "informal employee" and the search keyword is "regular employee", the phrase matching search mode of ES matches both "regular employee" and "informal employee", and this matching result does not meet the search requirement. The examples herein are not particularly limited.
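The choice made in S52 can be sketched as a function that builds an Elasticsearch query clause per field. The sketch assumes that the exact matching search mode corresponds to the `term` query and that the segmented matching mode corresponds to the `match` query of the Elasticsearch query DSL; the source does not name the specific queries. The type labels and the function name `build_field_query` are hypothetical.

```python
from typing import Any, Dict

# Hypothetical labels for the two target field types.
CODE_VALUE = "code_value"
NON_CODE_VALUE = "non_code_value"


def build_field_query(field: str, field_type: str, keyword: str) -> Dict[str, Any]:
    """Return a query clause for one field, chosen by its target field type."""
    if field_type == CODE_VALUE:
        # Exact match: the stored value must equal the keyword as a whole.
        return {"term": {field: {"value": keyword}}}
    # Analyzed match: the keyword is segmented and matched word by word.
    return {"match": {field: {"query": keyword}}}


exact = build_field_query("employee_type", CODE_VALUE, "regular employee")
fuzzy = build_field_query("job_title", NON_CODE_VALUE, "data engineer")
```

A `term` query is normally used against a field mapped as `keyword`, so that the whole stored value is compared rather than its segmented words.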

In an embodiment, the step of determining the target matching mode of the field to be analyzed corresponding to the target field type as a keyword segmentation matching degree matching mode includes:

[formula not reproduced in the source text]

wherein [symbol not reproduced in the source text] is the number of words obtained after the search keyword is segmented, [remaining definitions not reproduced in the source text].

In this embodiment, the matching degree of the keyword segmentation matching degree matching mode is set to 100%, which improves retrieval efficiency and accuracy.

For example, the word segmentation result of the search keyword "data engineer" is 3 words: "data", "engineering", and "teacher". For the search result "big data WEB development engineer", the de-duplicated words among the segmented words "data", "engineering", and "teacher" that hit the search result number 3 (namely "data", "engineering", and "teacher"), and the keyword matching degree is [value not reproduced in the source text].

The examples herein are not particularly limited.

For example, the word segmentation result of the search keyword "data engineer" is 3 words: "data", "engineering", and "teacher". For the search result "data analyst", the de-duplicated words among the segmented words "data", "engineering", and "teacher" that hit the search result number 2 (namely "data" and "teacher"), and the keyword matching degree is [value not reproduced in the source text].

The examples herein are not particularly limited.
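The two examples above can be reproduced with a short sketch. Since the matching degree formula is not reproduced in the source text, the sketch assumes that the matching degree is the number of de-duplicated segmented words that hit the search result divided by the number of words obtained by segmenting the search keyword. Segmentation itself is outside the sketch, so both sides are passed in as word lists; the function name and the word lists are illustrative assumptions.

```python
from typing import Iterable, Sequence


def keyword_matching_degree(query_words: Sequence[str], result_words: Iterable[str]) -> float:
    """Assumed ratio: de-duplicated hit words / number of segmented query words."""
    if not query_words:
        return 0.0
    hits = set(query_words) & set(result_words)
    return len(hits) / len(query_words)


query = ["data", "engineering", "teacher"]

# "big data WEB development engineer": all three words hit.
full = keyword_matching_degree(
    query, ["big", "data", "WEB", "development", "engineering", "teacher"]
)

# "data analyst": only "data" and "teacher" hit.
partial = keyword_matching_degree(query, ["data", "analysis", "teacher"])

# With the matching degree set to 100%, only the first result is kept.
kept = [name for name, degree in [("first", full), ("second", partial)] if degree >= 1.0]
```

Under this assumed ratio, the first result reaches the 100% matching degree and the second does not, which agrees with the hit counts given in the examples.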

In an embodiment, the step of constructing the search index of the ES component according to the text data to be analyzed and the target matching mode corresponding to each field to be analyzed to obtain the target search index corresponding to each field to be analyzed includes:

S61: extracting one field to be analyzed from the text data to be analyzed as a target field to be analyzed;

S62: constructing the search index of the ES component according to the target field to be analyzed and the target matching mode corresponding to the target field to be analyzed to obtain the target search index corresponding to the target field to be analyzed;

S63: repeating the step of extracting one field to be analyzed from the text data to be analyzed as a target field to be analyzed until the target search index corresponding to each field to be analyzed in the text data to be analyzed is determined.

In this embodiment, the search index of the ES component is constructed according to the target matching mode of each field to be analyzed, which improves the accuracy of the constructed target search index and the retrieval efficiency of the target text search engine.

For S61, one field to be analyzed is randomly extracted from the text data to be analyzed as the target field to be analyzed.

For S62, an ES index construction method is used to construct the search index of the ES component according to the target field to be analyzed and its corresponding target matching mode, and the constructed index is taken as the target search index corresponding to the target field to be analyzed.

For S63, steps S61 to S63 are repeated until the target search index corresponding to each field to be analyzed in the text data to be analyzed is determined.
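The loop of S61 to S63 can be sketched as building one mapping entry per field. The sketch assumes that a field using the exact matching search mode is mapped to the Elasticsearch `keyword` type and that every other field is mapped to the `text` type. The mode labels `"exact"` and `"segmented"`, the function name, and the field names are hypothetical; modelling each target search index as one entry of a single index mapping is also an assumption.

```python
from typing import Any, Dict, Mapping


def build_index_body(match_modes: Mapping[str, str]) -> Dict[str, Any]:
    """Build an index-creation body with one mapping entry per field."""
    properties: Dict[str, Any] = {}
    for field, mode in match_modes.items():  # S61/S63: one field at a time
        if mode == "exact":
            # S62: not analyzed, compared as a whole value.
            properties[field] = {"type": "keyword"}
        else:
            # S62: analyzed (segmented) full-text field.
            properties[field] = {"type": "text"}
    return {"mappings": {"properties": properties}}


body = build_index_body({"employee_type": "exact", "job_title": "segmented"})
```

The resulting dictionary has the shape of the request body accepted when an Elasticsearch index is created with explicit mappings.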

In an embodiment, the step of performing search engine encapsulation according to the ES component, the search engine database, the target search result sorting mode, and the target search index corresponding to each field to be analyzed to obtain the target text search engine includes:

S81: setting the target search index corresponding to each field to be analyzed in the text data to be analyzed as the index of the ES component to obtain the ES component with index construction completed;

S82: setting the target search result sorting mode as the search result sorting mode of the ES component with index construction completed to obtain a target ES component;

S83: encapsulating the target ES component and the search engine database to obtain the target text search engine.

In this embodiment, search engine encapsulation is performed according to the ES component, the search engine database, the target search result sorting mode, and the target search index corresponding to each field to be analyzed in the text data to be analyzed, so that the target text search engine can be constructed quickly from a plurality of data sources and a text search engine does not need to be constructed separately for each data source.

For S81, the index setting method of the ES component is used to set the target search index corresponding to each field to be analyzed as the index of the ES component, so that the ES component can search based on these target search indexes.

For S82, the sorting mode setting method of the ES component is used to set the target search result sorting mode as the search result sorting mode of the ES component with index construction completed, so that the ES component sorts search results in the target search result sorting mode.

For S83, the target ES component and the search engine database are encapsulated into a search engine, and the search engine obtained by encapsulation is taken as the target text search engine.
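The encapsulation of S81 to S83 can be pictured as a small wrapper that holds the index mapping and the per-field importance scores and turns a keyword into one search request. This section does not spell out how the importance scores and the ES relevance score are combined, so the sketch assumes that each importance score is applied as a per-field `boost`; the class name, field names, and scores are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Mapping


@dataclass(frozen=True)
class TextSearchEngine:
    index_body: Mapping[str, Any]          # S81: per-field index mapping
    field_importance: Mapping[str, float]  # S82: assumed to act as boosts

    def search_body(self, keyword: str) -> Dict[str, Any]:
        """Build one bool/should query covering every indexed field."""
        clauses: List[Dict[str, Any]] = []
        for field, spec in self.index_body["mappings"]["properties"].items():
            boost = self.field_importance.get(field, 1.0)
            if spec["type"] == "keyword":
                clauses.append({"term": {field: {"value": keyword, "boost": boost}}})
            else:
                clauses.append({"match": {field: {"query": keyword, "boost": boost}}})
        return {"query": {"bool": {"should": clauses}}}


engine = TextSearchEngine(
    index_body={
        "mappings": {
            "properties": {
                "employee_type": {"type": "keyword"},
                "job_title": {"type": "text"},
            }
        }
    },
    field_importance={"job_title": 3.0},
)
request = engine.search_body("data engineer")
```

Keeping the mapping and the scores in one object means a new data source only needs new field statistics, not a separately built engine, which is the benefit the embodiment claims.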

Referring to fig. 2, the present application also provides an apparatus for constructing an ES-based text search engine, the apparatus including:

an ES component and database construction module 100, configured to construct an ES component and construct a search engine database based on the ES component;

a to-be-stored text data obtaining module 200, configured to obtain a to-be-searched data source set, obtain to-be-stored text data according to the to-be-searched data source set, and store the to-be-stored text data in the search engine database;

a field type analysis module 300, configured to obtain text data from the search engine database as text data to be analyzed, and perform field type analysis on each field to be analyzed in the text data to be analyzed, respectively, to obtain a target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed;

an importance scoring module 400, configured to perform importance scoring on each field to be analyzed in the text data to be analyzed, respectively, to obtain a target importance scoring result corresponding to each field to be analyzed corresponding to the text data to be analyzed;

a matching mode setting module 500, configured to perform matching mode setting according to the target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed, so as to obtain a target matching mode corresponding to each field to be analyzed corresponding to the text data to be analyzed;

a search index construction module 600 of the ES component, configured to perform search index construction of the ES component according to the text data to be analyzed and the target matching manner corresponding to each field to be analyzed corresponding to the text data to be analyzed, so as to obtain a target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed;

a search result ranking mode setting module 700, configured to set a search result ranking mode according to the target importance scoring result corresponding to each to-be-analyzed field corresponding to the to-be-analyzed text data and a relevance scoring method of the ES component, so as to obtain a target search result ranking mode;

a search engine encapsulation module 800, configured to perform search engine encapsulation according to the ES component, the search engine database, the target search result ordering manner, and the target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed, so as to obtain a target text search engine.

Referring to fig. 3, an embodiment of the present application further provides a computer device, which may be a server and whose internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing data involved in the method for constructing the ES-based text search engine. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a method for constructing an ES-based text search engine.
The method for constructing the ES-based text search engine comprises the following steps: constructing an ES component, and constructing a search engine database based on the ES component; acquiring a data source set to be searched, acquiring text data to be stored according to the data source set to be searched, and storing the text data to be stored in the search engine database; acquiring text data from the search engine database to obtain text data to be analyzed, and performing field type analysis on each field to be analyzed in the text data to be analyzed to obtain a target field type corresponding to each field to be analyzed; scoring the importance of each field to be analyzed in the text data to be analyzed to obtain a target importance scoring result corresponding to each field to be analyzed; setting a matching mode according to the target field type corresponding to each field to be analyzed to obtain a target matching mode corresponding to each field to be analyzed; constructing the search index of the ES component according to the text data to be analyzed and the target matching mode corresponding to each field to be analyzed to obtain a target search index corresponding to each field to be analyzed; setting a search result sorting mode according to the target importance scoring result corresponding to each field to be analyzed and the relevance scoring method of the ES component to obtain a target search result sorting mode; and performing search engine encapsulation according to the ES component, the search engine database, the target search result sorting mode, and the target search index corresponding to each field to be analyzed to obtain a target text search engine.

An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored. The computer program, when executed by a processor, implements a method for constructing an ES-based text search engine, including the steps of: constructing an ES component, and constructing a search engine database based on the ES component; acquiring a data source set to be searched, acquiring text data to be stored according to the data source set to be searched, and storing the text data to be stored in the search engine database; acquiring text data from the search engine database to obtain text data to be analyzed, and performing field type analysis on each field to be analyzed in the text data to be analyzed to obtain a target field type corresponding to each field to be analyzed; scoring the importance of each field to be analyzed in the text data to be analyzed to obtain a target importance scoring result corresponding to each field to be analyzed; setting a matching mode according to the target field type corresponding to each field to be analyzed to obtain a target matching mode corresponding to each field to be analyzed; constructing the search index of the ES component according to the text data to be analyzed and the target matching mode corresponding to each field to be analyzed to obtain a target search index corresponding to each field to be analyzed; setting a search result sorting mode according to the target importance scoring result corresponding to each field to be analyzed and the relevance scoring method of the ES component to obtain a target search result sorting mode; and performing search engine encapsulation according to the ES component, the search engine database, the target search result sorting mode, and the target search index corresponding to each field to be analyzed to obtain a target text search engine.

According to the method for constructing the ES-based text search engine, the text data of the data sources is stored in a search engine database constructed based on the ES component, and data is then obtained from the search engine database for field type analysis and importance scoring. A target matching mode is obtained according to the target field type; a target search index is obtained according to the text data to be analyzed and the target matching mode; a target search result sorting mode is obtained according to the target importance scoring result and the relevance scoring method of the ES component; and the target text search engine is obtained according to the ES component, the search engine database, the target search result sorting mode, and the target search index. The target text search engine can therefore be constructed quickly from a plurality of data sources, and a text search engine does not need to be constructed separately for each data source. Automatically determining the matching mode of the fields of the text content reduces the construction cost of the text search engine. Setting the search result sorting mode according to the target importance scoring results of the fields and the relevance scoring method of the ES component improves the sorting accuracy of the search results returned by the constructed target text search engine.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium provided herein and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, apparatus, article, or method that includes the element.

The above description is only a preferred embodiment of the present application and is not intended to limit the scope of the present application. All equivalent structures or equivalent process modifications made using the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, are likewise included in the scope of protection of the present application.

Claims

1. A method for constructing an ES-based text search engine, characterized by comprising the following steps:

2. The construction method of the ES-based text search engine according to claim 1, wherein the step of performing field type analysis on each field to be analyzed in the text data to be analyzed to obtain a target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed comprises:

3. The construction method of the ES-based text search engine according to claim 2, wherein the step of performing field type analysis according to the target field information entropy corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain the target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed comprises:

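when [formula not reproduced in the source text],

wherein [definitions not reproduced in the source text]; [symbol not reproduced in the source text] is 0, and when the j-th field value among the de-duplicated field values is not null, [symbol not reproduced in the source text] is 1.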

4. The construction method of the ES-based text search engine according to claim 1, wherein the step of setting a matching manner according to the target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain a target matching manner corresponding to each field to be analyzed corresponding to the text data to be analyzed comprises:

5. The method according to claim 4, wherein the step of determining the target matching mode corresponding to the field to be analyzed corresponding to the target field type as a keyword segmentation matching degree matching mode comprises:

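[formula not reproduced in the source text]

wherein [symbol not reproduced in the source text] is the number of words obtained after the search keyword is segmented, [remaining definitions not reproduced in the source text].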

6. The method for building the ES-based text search engine according to claim 1, wherein the step of building the search index of the ES component according to the text data to be analyzed and the target matching manner corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain the target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed comprises:

7. The method according to claim 1, wherein the step of encapsulating the search engine according to the ES component, the search engine database, the target search result ordering manner, and the target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain the target text search engine comprises:

8. An apparatus for constructing an ES-based text search engine, the apparatus comprising:

9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.