CN112559671A - ES-based text search engine construction method, device, equipment and medium - Google Patents

Info

Publication number
CN112559671A
Authority
CN
China
Prior art keywords
analyzed
field
target
text data
search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110191157.7A
Other languages
Chinese (zh)
Other versions
CN112559671B (en
Inventor
张玉君
罗晓生
钱勇
杜晓东
谢良义
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Pingan Zhihui Enterprise Information Management Co ltd
Original Assignee
Shenzhen Pingan Zhihui Enterprise Information Management Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Pingan Zhihui Enterprise Information Management Co ltd
Priority to CN202110191157.7A
Publication of CN112559671A
Application granted
Publication of CN112559671B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the technical field of artificial intelligence and discloses a method, a device, equipment and a medium for constructing a text search engine based on ES. The method comprises the following steps: constructing an ES component and a search engine database; acquiring text data to be stored according to a data source set to be searched and storing it in the search engine database; performing field type analysis and importance scoring on each field to be analyzed in the text data to be analyzed acquired from the search engine database; obtaining a target matching mode according to the target field type; obtaining a target search index according to the text data to be analyzed and the target matching mode; obtaining a target search result sorting mode according to the target importance scoring result and the relevance scoring method of the ES component; and obtaining a target text search engine according to the ES component, the search engine database, the target search result sorting mode and the target search index. This removes the need to build a separate text search engine for each data source.

Description

ES-based text search engine construction method, device, equipment and medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a device, and a medium for constructing an ES-based text search engine.
Background
In the internet era, as enterprises digitize their operations, search engines are increasingly used to help employees retrieve internal information (such as personnel address books, personnel information, OA office modules and files). Because the content to be retrieved comes from different data sources, the data content of each data source is different, and the interaction modes of the data sources differ, prior-art search engines are difficult to adapt. A different search engine therefore has to be developed for each case, which increases development cost.
Disclosure of Invention
The main purpose of the application is to provide a method, a device, equipment and a medium for constructing an ES-based text search engine, so as to solve the technical problem that prior-art search engines are difficult to adapt to application scenarios in which the retrieval content comes from different data sources, the data content of each data source is different, and the interaction modes of the data sources differ.
In order to achieve the above object, the present application provides a method for constructing an ES-based text search engine, the method including:
constructing an ES component, and constructing a search engine database based on the ES component;
acquiring a data source set to be searched, acquiring text data to be stored according to the data source set to be searched, and storing the text data to be stored in the search engine database;
acquiring text data from the search engine database to obtain text data to be analyzed, and respectively performing field type analysis on each field to be analyzed in the text data to be analyzed to obtain a target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed;
respectively scoring the importance of each field to be analyzed in the text data to be analyzed to obtain a target importance scoring result corresponding to each field to be analyzed corresponding to the text data to be analyzed;
performing matching mode setting according to the target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain a target matching mode corresponding to each field to be analyzed corresponding to the text data to be analyzed;
constructing search indexes of the ES component according to the text data to be analyzed and the target matching modes corresponding to the fields to be analyzed corresponding to the text data to be analyzed to obtain target search indexes corresponding to the fields to be analyzed corresponding to the text data to be analyzed;
setting a search result sorting mode according to the target importance scoring result corresponding to each field to be analyzed corresponding to the text data to be analyzed and the relevancy scoring method of the ES component to obtain a target search result sorting mode;
and performing search engine encapsulation according to the ES component, the search engine database, the target search result ordering mode and the target search indexes corresponding to the fields to be analyzed corresponding to the text data to be analyzed to obtain a target text search engine.
Further, the step of performing field type analysis on each field to be analyzed in the text data to be analyzed to obtain a target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed includes:
respectively carrying out information entropy calculation on each field to be analyzed of the text data to be analyzed to obtain target field information entropy corresponding to each field to be analyzed corresponding to the text data to be analyzed;
and performing field type analysis according to the target field information entropy corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain the target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed.
Further, the step of performing field type analysis according to the target field information entropy corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain the target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed includes:
when
[formula image DEST_PATH_IMAGE002]
determining the target field type to be the code value type, and otherwise determining the target field type to be the non-code value type;
wherein the calculation formula type(i) of the target field type is:
[formula image DEST_PATH_IMAGE003]
where [symbol image DEST_PATH_IMAGE004] is the target field information entropy of the i-th field to be analyzed of the text data to be analyzed, [symbol image DEST_PATH_IMAGE005] is the number of de-duplicated field values of the i-th field to be analyzed, [symbol image DEST_PATH_IMAGE006] is the number of non-empty field values of the i-th field to be analyzed, k(i) is the number of all field values of the i-th field to be analyzed, and C is a constant;
n(i) is the number of field values of the i-th field to be analyzed after de-duplication, p(j) is the probability that a value of the i-th field to be analyzed is the j-th of the de-duplicated field values, and log() is a logarithmic function;
[symbol image DEST_PATH_IMAGE007] indicates whether the j-th of the de-duplicated field values of the i-th field to be analyzed is empty: it is 0 when the j-th de-duplicated field value is empty and 1 when the j-th de-duplicated field value is not empty.
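The quantities named in the definitions above can be computed per field as in the following minimal sketch. It assumes the standard Shannon entropy with p(j) taken as the relative frequency of the j-th de-duplicated value, and it treats `None` and the empty string as empty values; both are assumptions, as is the function name `field_statistics`. The type(i) formula itself is contained in an image that is not reproduced here, so the sketch stops at its inputs.

```python
import math
from collections import Counter


def field_statistics(values):
    """Compute the per-field quantities named in the text (hypothetical helper)."""
    k = len(values)                   # k(i): number of all field values
    counts = Counter(values)          # occurrences of each de-duplicated value
    n = len(counts)                   # n(i): number of de-duplicated field values
    # Assumption: None and "" are what the text calls empty field values.
    non_empty = sum(1 for v in values if v not in (None, ""))
    # Assumption: Shannon entropy with p(j) = count of value j / k.
    entropy = -sum((c / k) * math.log(c / k) for c in counts.values()) if k else 0.0
    return {"entropy": entropy, "n": n, "k": k, "non_empty": non_empty}
```

A field with few distinct values, such as an enumerated code, yields a low entropy relative to the number of values, while a field such as a name yields an entropy close to log k(i).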
Further, the step of setting a matching mode according to the target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain a target matching mode corresponding to each field to be analyzed corresponding to the text data to be analyzed includes:
respectively judging whether the target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed is a code value type;
and when the target field type is the code value type, determining the target matching mode of the corresponding field to be analyzed as the exact-match search mode of the ES component; otherwise, determining the target matching mode of the corresponding field to be analyzed as the keyword segmentation matching degree matching mode.
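As an illustration of this step, the sketch below maps the two field types to Elasticsearch query bodies. It assumes that the exact-match search mode corresponds to a `term` query and that the keyword segmentation matching degree mode corresponds to a `match` query with `minimum_should_match` set to `"100%"`; the text does not name these query types, and the labels `"code_value"` and `"non_code_value"` and both function names are hypothetical.

```python
def choose_match_mode(field_type):
    """Return the matching mode for a field type (labels are hypothetical)."""
    return "exact" if field_type == "code_value" else "segmented"


def build_field_query(field, field_type, keyword):
    """Build an Elasticsearch query body (a plain dict) for one field."""
    if choose_match_mode(field_type) == "exact":
        # Assumption: exact matching is expressed as a term query,
        # which compares against the stored value without analysis.
        return {"term": {field: {"value": keyword}}}
    # Assumption: segmented matching is a match query; "100%" requires
    # every word produced by analyzing the keyword to match.
    return {"match": {field: {"query": keyword, "minimum_should_match": "100%"}}}
```

The dictionaries can be passed as the `query` part of a search request; no client library is needed to build them.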
Further, the step of determining the target matching mode corresponding to the field to be analyzed corresponding to the target field type as a keyword segmentation matching degree matching mode includes:
determining the target matching mode of the field to be analyzed whose target field type is the non-code value type as the keyword segmentation matching degree matching mode, wherein the keyword segmentation matching degree matching mode means that the keyword segmentation matching degree is set to 100%;
the calculation formula match of the keyword segmentation matching degree is:
[formula image DEST_PATH_IMAGE010]
where [symbol image DEST_PATH_IMAGE011] is the number of words obtained by segmenting the search keyword, and [symbol image DEST_PATH_IMAGE012] is the de-duplicated number of words that appear both among the words of the segmented search keyword and among the words of the hit search result.
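The match formula is contained in an image that is not reproduced here. One plausible reading of the two definitions is a ratio of hit words to keyword words, sketched below. The ratio form, the de-duplication of the keyword words in the denominator, and the function name `keyword_match_degree` are all assumptions; word segmentation is assumed to have been done beforehand (for example by the analyzer of the ES component).

```python
def keyword_match_degree(keyword_words, hit_words):
    """Fraction of the segmented keyword words found in a hit result.

    Assumption: both sides are de-duplicated so the ratio stays in [0, 1].
    """
    unique_keyword_words = set(keyword_words)
    if not unique_keyword_words:
        return 0.0
    hit = unique_keyword_words & set(hit_words)
    return len(hit) / len(unique_keyword_words)
```

Under this reading, a matching degree of 100% means that every word of the segmented search keyword appears in the hit result.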
Further, the step of constructing search indexes of the ES component according to the text data to be analyzed and the target matching manner corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain a target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed includes:
extracting one field to be analyzed from the text data to be analyzed as a target field to be analyzed;
constructing a search index of the ES component according to the field to be analyzed and the target matching mode corresponding to the field to be analyzed to obtain the target search index corresponding to the field to be analyzed;
and repeatedly executing the step of extracting one field to be analyzed from the text data to be analyzed as a target field to be analyzed until the target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed is determined.
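The field-by-field loop above can be sketched as follows. The sketch assumes Elasticsearch 7 or later (typeless mappings) and models the per-field target search index as one entry per field in the `mappings.properties` section of a single index body, with code value fields mapped to the `keyword` type and other fields to the `text` type. This mapping choice, the type labels and the function name `build_index_body` are assumptions and are not stated in the text.

```python
def build_index_body(field_types):
    """Build an index-creation body from {field name: field type label}."""
    properties = {}
    # Take one field to be analyzed at a time until all fields are covered.
    for field, field_type in field_types.items():
        if field_type == "code_value":
            # keyword fields are stored unanalyzed and support exact matching
            properties[field] = {"type": "keyword"}
        else:
            # text fields are analyzed (segmented) for full-text matching
            properties[field] = {"type": "text"}
    return {"mappings": {"properties": properties}}
```

The returned dictionary has the shape expected by the create-index API and could be extended with an `analyzer` setting for Chinese word segmentation.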
Further, the step of performing search engine encapsulation according to the ES component, the search engine database, the target search result ranking manner, and the target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain a target text search engine includes:
setting the target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed as the index of the ES component to obtain the ES component with the index construction completed;
setting the target search result ordering mode as the search result ordering mode of the ES component with the index construction completed to obtain a target ES component;
and packaging the target ES component and the search engine database to obtain the target text search engine.
The present application also proposes an ES-based text search engine construction apparatus, the apparatus including:
the ES component and database construction module is used for constructing ES components and constructing a search engine database based on the ES components;
the text data acquisition module to be stored is used for acquiring a data source set to be searched, acquiring text data to be stored according to the data source set to be searched, and storing the text data to be stored in the search engine database;
the field type analysis module is used for acquiring text data from the search engine database to obtain text data to be analyzed, and respectively performing field type analysis on each field to be analyzed in the text data to be analyzed to obtain a target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed;
the importance scoring module is used for respectively scoring the importance of each field to be analyzed in the text data to be analyzed to obtain a target importance scoring result corresponding to each field to be analyzed corresponding to the text data to be analyzed;
a matching mode setting module, configured to perform matching mode setting according to the target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed, so as to obtain a target matching mode corresponding to each field to be analyzed corresponding to the text data to be analyzed;
the search index construction module of the ES component is used for constructing the search index of the ES component according to the text data to be analyzed and the target matching mode corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain the target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed;
a search result sorting mode setting module, configured to set a search result sorting mode according to the target importance scoring result corresponding to each field to be analyzed corresponding to the text data to be analyzed and a relevancy scoring method of the ES component, so as to obtain a target search result sorting mode;
and the search engine packaging module is used for packaging a search engine according to the ES component, the search engine database, the target search result ordering mode and the target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain a target text search engine.
The present application further proposes a computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the steps of any of the above methods when executing the computer program.
The present application also proposes a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method of any of the above.
According to the construction method, the construction device, the construction equipment and the construction medium of the text search engine based on the ES, the text data of the data source is stored in a search engine database constructed based on the ES component, then the data is obtained from the search engine database to carry out field type analysis and importance grading, and a target matching mode is obtained according to the target field type; obtaining a target search index according to the text data to be analyzed and the target matching mode; obtaining a target search result sorting mode according to the target importance scoring result and the relevance scoring method of the ES component; the target text search engine is obtained according to the ES component, the search engine database, the target search result ordering mode and the target search index, so that the target text search engine can be quickly constructed according to a plurality of data sources, and the text search engine does not need to be separately constructed for different data sources; the construction cost of the text search engine is simplified by automatically determining the matching mode of the fields of the text content; the search result ranking mode is set according to the field target importance scoring result and the ES component relevance scoring method, so that the ranking accuracy of the search results obtained by the constructed target text search engine is improved.
Drawings
FIG. 1 is a flow chart illustrating a method for constructing an ES-based text search engine according to an embodiment of the present application;
FIG. 2 is a block diagram schematically illustrating a construction apparatus of an ES-based text search engine according to an embodiment of the present application;
FIG. 3 is a block diagram illustrating a structure of a computer device according to an embodiment of the present application.
The objectives, features, and advantages of the present application will be further described with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
To solve the technical problem that prior-art search engines are difficult to adapt to application scenarios in which the retrieval content comes from different data sources, the data content of each data source is different, and the interaction modes of the data sources differ, the present application provides a method for constructing an ES-based text search engine, which is applied to the technical field of artificial intelligence. In the method, text data of the data sources is stored in a search engine database constructed based on the ES component; data is then obtained from the search engine database for field type analysis and importance scoring; a target matching mode is determined according to the field type analysis result; a target search index is obtained according to the text data to be analyzed and the target matching mode; a target search result sorting mode is obtained according to the importance scoring result and the relevance scoring method of the ES component; and the target text search engine is obtained according to the ES component, the search engine database, the target search result sorting mode and the target search index. The target text search engine can therefore be constructed quickly from a plurality of data sources, without building a separate text search engine for each data source. Automatically determining the matching mode of the fields of the text content reduces the construction cost of the text search engine, and setting the search result sorting mode according to the field importance scoring result and the relevance scoring method of the ES component improves the ranking accuracy of the search results returned by the constructed target text search engine.
Referring to fig. 1, an embodiment of the present application provides a method for constructing an ES-based text search engine, where the method includes:
S1: constructing an ES component, and constructing a search engine database based on the ES component;
S2: acquiring a data source set to be searched, acquiring text data to be stored according to the data source set to be searched, and storing the text data to be stored in the search engine database;
S3: acquiring text data from the search engine database to obtain text data to be analyzed, and performing field type analysis on each field to be analyzed in the text data to be analyzed to obtain a target field type for each field to be analyzed in the text data to be analyzed;
S4: scoring the importance of each field to be analyzed in the text data to be analyzed to obtain a target importance scoring result for each field to be analyzed in the text data to be analyzed;
S5: setting a matching mode according to the target field type of each field to be analyzed in the text data to be analyzed to obtain a target matching mode for each field to be analyzed in the text data to be analyzed;
S6: constructing search indexes of the ES component according to the text data to be analyzed and the target matching mode of each field to be analyzed in the text data to be analyzed to obtain a target search index for each field to be analyzed in the text data to be analyzed;
S7: setting a search result sorting mode according to the target importance scoring result of each field to be analyzed in the text data to be analyzed and the relevance scoring method of the ES component to obtain a target search result sorting mode;
S8: performing search engine encapsulation according to the ES component, the search engine database, the target search result sorting mode and the target search index of each field to be analyzed in the text data to be analyzed to obtain a target text search engine.
In the embodiment, text data of a data source is stored in a search engine database constructed based on an ES component, then data is obtained from the search engine database to perform field type analysis and importance scoring, and a target matching mode is obtained according to a target field type; obtaining a target search index according to the text data to be analyzed and the target matching mode; obtaining a target search result sorting mode according to the target importance scoring result and the relevance scoring method of the ES component; the target text search engine is obtained according to the ES component, the search engine database, the target search result ordering mode and the target search index, so that the target text search engine can be quickly constructed according to a plurality of data sources, and the text search engine does not need to be separately constructed for different data sources; the construction cost of the text search engine is simplified by automatically determining the matching mode of the fields of the text content; the search result ranking mode is set according to the field target importance scoring result and the ES component relevance scoring method, so that the ranking accuracy of the search results obtained by the constructed target text search engine is improved.
For S1, an installation file of ES (Elasticsearch, a Lucene-based search server that provides a distributed, multi-user-capable full-text search engine with a RESTful web interface) is acquired; the installation file is installed to obtain the ES component; and a search engine database matched with the ES component is constructed.
For S2, the data source set to be searched may be obtained from user input, from a database, or from a third-party application system.
The data source set to be searched is a set of data sources which can be searched by the target text search engine realized by the application. The data source set to be searched comprises configuration data of a plurality of data sources. Configuration data for the data sources include, but are not limited to: data source name, data source access address, user name, password.
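The configuration data listed above can be represented as a small record type. The following is a minimal sketch; the class name `DataSourceConfig`, its attribute names, the helper `data_source_names` and the example values are hypothetical and only mirror the four items named in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSourceConfig:
    """Configuration data of one data source (attribute names are hypothetical)."""
    name: str             # data source name
    access_address: str   # data source access address
    username: str         # user name
    password: str         # password


def data_source_names(data_sources):
    """Return the names of all data sources in the set to be searched."""
    return [source.name for source in data_sources]
```

A data source set to be searched is then simply a collection of such records, one per data source.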
Text data is acquired from each data source in the data source set to be searched, the acquired text data is taken as the text data to be stored, and all the text data to be stored is stored in the search engine database.
The text data to be stored is the text data that needs to be stored in the search engine database.
For S3, text data is acquired from the search engine database and taken as the text data to be analyzed; field type analysis is performed on the target field to be analyzed according to its field values in the text data to be analyzed, and the field type obtained by the analysis is taken as the target field type of the target field to be analyzed, where the target field to be analyzed is any one of the fields to be analyzed in the text data to be analyzed.
The text data to be analyzed refers to the text data which needs to be subjected to field type analysis and importance scoring. The text data to be analyzed comprises the text content of at least one field to be analyzed. It will be appreciated that each field to be analyzed includes one or more field values.
The target field type is the field type of a field. The field types include the code value type and the non-code value type. The code value type means that the values fall within a limited, enumerable range. For example, the value range of educational background is limited and enumerable, so when the field to be analyzed is educational background, the corresponding target field type is determined to be the code value type; this example is not limiting. The non-code value type means that the values are distributed over a relatively wide range. For example, the value range of names is relatively wide, so when the field to be analyzed is a name, the corresponding target field type is determined to be the non-code value type; this example is not limiting.
It can be understood that, when constructing the target text search engine, all the text data in the search engine database are extracted as the text data to be analyzed. In the use process after the construction of the target text search engine is completed, the text data newly stored in the search engine database can be extracted as the text data to be analyzed, and then steps S3 to S8 are performed to update the target text search engine. Therefore, the target matching mode and the target search index corresponding to all the fields to be analyzed of the target text search engine are automatically determined, the automation degree is improved, the cost for constructing the target text search engine is reduced, and the cost for using the target text search engine is also reduced.
For S4, importance scoring is performed on the target field to be analyzed according to the field value of the target field to be analyzed in the text data to be analyzed, and the obtained importance scoring result is used as the target importance scoring result corresponding to the target field to be analyzed, where the target field to be analyzed is any one of all the fields to be analyzed in the text data to be analyzed.
The calculation formula B(i) of the target importance scoring result is:
[formula image DEST_PATH_IMAGE013]
where B(i) is the target importance scoring result of the i-th field to be analyzed of the text data to be analyzed, [symbol image DEST_PATH_IMAGE014] is the average number of characters of the non-empty field values of the i-th field to be analyzed, [symbol image DEST_PATH_IMAGE015] is the standard deviation of the distribution of the number of characters of the non-empty field values of the i-th field to be analyzed, and log() is a logarithmic function.
Optionally, [symbol image DEST_PATH_IMAGE016] is the average number of Chinese characters of the non-empty field values of the i-th field to be analyzed, and [symbol image DEST_PATH_IMAGE017] is the standard deviation of the distribution of the number of Chinese characters of the non-empty field values of the i-th field to be analyzed. The search engine constructed by the method is therefore suited to Chinese-language search scenarios.
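The two inputs of B(i) can be computed as in the following minimal sketch. The B(i) formula itself is contained in an image that is not reproduced here, so the sketch does not combine the inputs. The use of the population standard deviation, the CJK Unified Ideographs range `\u4e00-\u9fff` as the definition of a Chinese character, and the function name `length_statistics` are assumptions.

```python
import re
import statistics

# Assumption: "Chinese character" means a code point in the basic CJK block.
CJK = re.compile(r"[\u4e00-\u9fff]")


def length_statistics(values, chinese_only=False):
    """Return (mean, standard deviation) of character counts of non-empty values."""
    lengths = [
        len(CJK.findall(v)) if chinese_only else len(v)
        for v in values
        if v  # skip None and empty strings
    ]
    if not lengths:
        return 0.0, 0.0
    # Assumption: population standard deviation; the text does not say which.
    return statistics.mean(lengths), statistics.pstdev(lengths)
```

With `chinese_only=True` the function counts only Chinese characters, which corresponds to the optional variant described above.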
For S5, when the target field type of the target field to be analyzed is the code value type, the target matching mode of the target field to be analyzed is determined to be the exact-match search mode of ES; when the target field type of the target field to be analyzed is the non-code value type, the target matching mode of the target field to be analyzed is determined to be the keyword segmentation matching degree matching mode; the target field to be analyzed is any one of the fields to be analyzed in the text data to be analyzed.
In the keyword segmentation matching degree matching mode, a match is decided according to the matching degree of the words obtained by segmenting the search keyword.
For S6, search indexes of the ES component are constructed by using an index construction method of ES according to the text data to be analyzed and the target matching manner corresponding to each field to be analyzed corresponding to the text data to be analyzed, and the constructed indexes are used as target search indexes corresponding to each field to be analyzed corresponding to the text data to be analyzed.
It can be understood that the specific implementation of the index construction method of the ES according to the text data to be analyzed and the target matching manner corresponding to each field to be analyzed corresponding to the text data to be analyzed may be selected from the prior art, and details are not repeated here.
For S7, the calculation formula S (i) of the target search result ranking manner is:
Figure DEST_PATH_IMAGE018
wherein m is the number of the words after the word segmentation of the search keyword, B (j) is the target importance scoring result of the field to be analyzed corresponding to the jth word in all the words after the word segmentation of the search keyword,
Figure DEST_PATH_IMAGE019
is the relevance score contribution value, relative to the search keyword, of the j-th word among all the words obtained by segmenting the search keyword.
Optionally, the relevance score contribution value, relative to the search keyword, of the j-th word among all the words obtained by segmenting the search keyword may be a scoring result calculated by the TF-IDF (term frequency-inverse document frequency) algorithm of the ES.
The search keyword is a keyword for searching, which is input by a user into a target text search engine constructed by the present application.
The target search result ranking mode is determined from the target importance scoring result and the relevance scoring method of the ES component, which improves the accuracy of search result ranking and user satisfaction.
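The formula image for S(i) is not reproduced in this text. Assuming, for illustration only, that S(i) sums over the m segmented words the product of the field importance B(j) and the word's relevance contribution, the ranking score could be sketched as follows. The function name `ranking_score`, the field names, and the numeric values are hypothetical.

```python
def ranking_score(word_hits, field_importance):
    """Combine per-word relevance contributions with per-field importance scores.

    word_hits: list of (field_name, relevance_contribution), one entry per
               segmented word of the search keyword.
    field_importance: mapping from field name to its importance score B.
    """
    # Assumption: a weighted sum; the exact combination is defined by the
    # formula image in the original document.
    return sum(field_importance[field] * relevance for field, relevance in word_hits)


importance = {"job_title": 2.0, "description": 0.5}   # hypothetical B values
hits = [("job_title", 1.2), ("description", 0.8)]     # hypothetical TF-IDF contributions
score = ranking_score(hits, importance)
```

With such a combination, a word that hits a high-importance field raises the score more than the same word hitting a low-importance field.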
For S8, encapsulating the ES component, the search engine database, the target search result ordering manner, and the target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed into a search engine, and taking the search engine obtained by encapsulation as a target text search engine.
In an embodiment, the step of performing field type analysis on each field to be analyzed in the text data to be analyzed to obtain a target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed includes:
s31: respectively carrying out information entropy calculation on each field to be analyzed of the text data to be analyzed to obtain target field information entropy corresponding to each field to be analyzed corresponding to the text data to be analyzed;
s32: and performing field type analysis according to the target field information entropy corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain the target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed.
In this embodiment, the target field type is determined according to the target field information entropy corresponding to the field to be analyzed, which provides a data basis for the subsequent matching mode setting.
For S31, performing information entropy calculation on the target field to be analyzed according to the field value of the target field to be analyzed in the text data to be analyzed by adopting an information entropy algorithm, and taking the calculation result as the target field information entropy corresponding to the target field to be analyzed; the target field to be analyzed is any field to be analyzed in all fields to be analyzed in the text data to be analyzed.
For S32, field type analysis is performed on the target field information entropy corresponding to the target field to be analyzed, and the obtained field type is taken as the target field type corresponding to the target field to be analyzed.
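The information entropy calculation of S31 can be sketched as the Shannon entropy of the distribution of a field's values, where p(j) is the relative frequency of the j-th de-duplicated value. The function name `field_entropy`, the natural logarithm, and the choice to treat values exactly as given are assumptions of this sketch, since the source does not state the logarithm base.

```python
import math
from collections import Counter


def field_entropy(values):
    """Shannon entropy of the value distribution of one field."""
    total = len(values)
    if total == 0:
        return 0.0
    counts = Counter(values)  # one entry per de-duplicated field value
    # Assumption: natural logarithm.
    return -sum((c / total) * math.log(c / total) for c in counts.values())


low = field_entropy(["M", "F", "M", "F"])         # few distinct values
high = field_entropy(["a1", "b2", "c3", "d4"])    # every value distinct
```

A field with a small, repeated set of values yields low entropy, which is the kind of signal the text uses to recognize code value fields.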
In an embodiment, the step of performing field type analysis according to the target field information entropy corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain the target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed includes:
when
Figure DEST_PATH_IMAGE020
is satisfied, determining the target field type to be the code value type; otherwise, determining the target field type to be the non-code value type;
wherein, the calculation formula type (i) of the target field type is as follows:
Figure DEST_PATH_IMAGE021
wherein,
Figure DEST_PATH_IMAGE022
is the target field information entropy corresponding to the ith field to be analyzed corresponding to the text data to be analyzed,
Figure DEST_PATH_IMAGE023
is the number of the de-duplicated field values corresponding to the ith field to be analyzed corresponding to the text data to be analyzed, k is the number of all field values of the ith field to be analyzed corresponding to the text data to be analyzed,
Figure DEST_PATH_IMAGE024
is the number of non-empty field values of the ith field to be analyzed corresponding to the text data to be analyzed, k (i) is the number of field values of the ith field to be analyzed corresponding to the text data to be analyzed, and C is a constant;
n(i) is the number of de-duplicated field values of the i-th field to be analyzed corresponding to the text data to be analyzed, p(j) is the probability that the i-th field to be analyzed corresponding to the text data to be analyzed takes the j-th field value among the de-duplicated field values, and log() is a logarithmic function;
Figure DEST_PATH_IMAGE025
indicates whether the j-th field value among the de-duplicated field values of the i-th field to be analyzed corresponding to the text data to be analyzed is empty; when the j-th field value among the de-duplicated field values is empty,
Figure DEST_PATH_IMAGE026
is 0, and when the j-th field value among the de-duplicated field values is not empty,
Figure DEST_PATH_IMAGE027
is 1.
In this embodiment, field type analysis is performed according to the information entropy, which improves the accuracy of determining the target field type.
Optionally, the value of C is set to 10. The comparison of
Figure DEST_PATH_IMAGE028
and
Figure DEST_PATH_IMAGE029
can be regarded as a comparison between the target field information entropy and the non-empty distribution of the field values of the field to be analyzed. In general, when two numbers a and b are compared, a can be regarded as far greater than b if a is more than 10 times b. The judgment therefore not only considers the significance of the information entropy of the field to be analyzed for its field values, but also avoids the situation in which a field has no search value because its field values are empty, and at the same time avoids the interference of empty field values with the judgment result.
In an embodiment, the step of setting a matching manner according to the target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain a target matching manner corresponding to each field to be analyzed corresponding to the text data to be analyzed includes:
s51: respectively judging whether the target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed is a code value type;
s52: and when the target field type is the code value type, determining the target matching mode corresponding to the field to be analyzed corresponding to the target field type as an accurate matching search mode of the ES component, otherwise, determining the target matching mode corresponding to the field to be analyzed corresponding to the target field type as a keyword segmentation matching degree matching mode.
In this embodiment, the matching mode is set according to the target field type, so that different matching modes are used for different target field types, which improves matching accuracy.
For S52, when the target field type is the code value type, it means that a value range of a field value of a field to be analyzed corresponding to the target field type is limited, and at this time, precise matching may be adopted to improve matching accuracy, so that the target matching manner corresponding to the field to be analyzed corresponding to the target field type of the code value type is determined as a precise matching search manner of the ES component. When the target field type is the non-code value type, the value range distribution of the field value of the field to be analyzed corresponding to the target field type is wide, and at the moment, if accurate matching is adopted, a large number of values are matched, so that the searching efficiency and accuracy are reduced, therefore, the target matching mode corresponding to the field to be analyzed corresponding to the non-code value type target field type is determined to be a keyword segmentation matching degree matching mode, and the searching efficiency and accuracy are improved through the keyword segmentation matching degree matching mode.
It can be understood that, when the target field type is the code value type, determining the target matching mode corresponding to the field to be analyzed corresponding to the target field type as the exact matching search mode of the ES avoids errors in specific search scenarios. The value range of a code value type field is very limited. In the prior art, either the phrase matching search mode or the exact matching search mode of the ES is used for such fields, but the phrase matching search mode is unsuitable in some scenarios. For example, when the searched field contains the values "regular employee" and "informal employee" and the search keyword is "regular employee", the phrase matching search mode of the ES matches both "regular employee" and "informal employee", so the matching result does not meet the search requirement. The example here is not specifically limiting.
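The choice between the two matching modes can be sketched as the construction of an Elasticsearch query body. The mapping used here is an assumption of this sketch rather than something the source specifies: the exact matching search mode is expressed as a `term` query, and the keyword segmentation matching mode as a `match` query whose `minimum_should_match` is "100%", mirroring the 100% matching degree described in this application. The function name `build_field_query` and the field names are hypothetical.

```python
def build_field_query(field, is_code_value, keyword):
    """Return an Elasticsearch query clause for one field."""
    if is_code_value:
        # Exact match on the whole field value, so "regular employee"
        # cannot match a longer value that merely contains it.
        return {"term": {field: keyword}}
    # Segmented match: every analyzed term of the keyword must be present.
    return {"match": {field: {"query": keyword, "minimum_should_match": "100%"}}}


exact = build_field_query("employee_type", True, "regular employee")
segmented = build_field_query("job_title", False, "data engineer")
```

For the `term` clause to behave as an exact match, the field would normally be indexed without analysis, for example as a `keyword` field.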
In an embodiment, the step of determining the target matching mode corresponding to the field to be analyzed corresponding to the target field type as a keyword segmentation matching degree matching mode includes:
determining the target matching mode corresponding to the field to be analyzed corresponding to the target field type of the non-code value type as the keyword segmentation matching degree matching mode, wherein the keyword segmentation matching degree matching mode means that the keyword segmentation matching degree is set to 100%;
the formula match for calculating the keyword segmentation matching degree is as follows:
Figure DEST_PATH_IMAGE030
wherein,
Figure DEST_PATH_IMAGE031
is the number of words after the search keyword is segmented,
Figure DEST_PATH_IMAGE032
is the number of de-duplicated words, among the words obtained by segmenting the search keyword, that hit the search result.
In the embodiment, the keyword segmentation matching degree matching mode is set to be 100%, so that the retrieval efficiency and accuracy are improved.
For example, the word segmentation result of the search keyword "data engineer" is 3 words: "data", "engineering", "teacher". For the search result "big data WEB development engineer", the de-duplicated segmented words of the search keyword that hit the search result number 3 (i.e., "data", "engineering", "teacher"), and the keyword matching degree is
Figure DEST_PATH_IMAGE033
The examples herein are not particularly limited.
For example, the word segmentation result of the search keyword "data engineer" is 3 words: "data", "engineering", "teacher". For the search result "data analyst", the de-duplicated segmented words of the search keyword that hit the search result number 2 (i.e., "data", "teacher"), and the keyword matching degree is
Figure DEST_PATH_IMAGE034
The examples herein are not particularly limited.
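The two examples above can be reproduced with a short sketch. It assumes that the matching degree is the ratio of the number of de-duplicated hit words to the number of segmented words of the search keyword; the formula image itself is not reproduced in this text. The function name `keyword_match_degree` and the tokenization of the two search results are hypothetical.

```python
def keyword_match_degree(query_words, result_words):
    """Fraction of the segmented query words that hit the search result."""
    if not query_words:
        return 0.0
    # De-duplicate the hit words, as described in the text.
    hit_words = set(query_words) & set(result_words)
    # Assumption: degree = de-duplicated hits / number of segmented query words.
    return len(hit_words) / len(query_words)


query = ["data", "engineering", "teacher"]
full = keyword_match_degree(
    query, ["big", "data", "WEB", "development", "engineering", "teacher"]
)
partial = keyword_match_degree(query, ["data", "analysis", "teacher"])
```

With the matching degree required to be 100%, only the first result would be returned for this search keyword.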
In an embodiment, the step of constructing the search index of the ES component according to the text data to be analyzed and the target matching manner corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain the target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed includes:
s61: extracting one field to be analyzed from the text data to be analyzed as a target field to be analyzed;
s62: constructing a search index of the ES component according to the field to be analyzed and the target matching mode corresponding to the field to be analyzed to obtain the target search index corresponding to the field to be analyzed;
s63: and repeatedly executing the step of extracting one field to be analyzed from the text data to be analyzed as a target field to be analyzed until the target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed is determined.
In this embodiment, the search index of the ES component is constructed according to the target matching mode of the field to be analyzed, which improves the accuracy of the constructed target search index and the retrieval efficiency of the target text search engine.
For S61, one of the fields to be analyzed is randomly extracted from the text data to be analyzed as a target field to be analyzed.
For S62, the search index of the ES component is constructed by using an index construction method of the ES according to the target field to be analyzed and the target matching mode corresponding to the target field to be analyzed, and the constructed index is taken as the target search index corresponding to the target field to be analyzed.
For S63, repeating steps S61 to S63 until determining the target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed.
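The loop over fields in S61 to S63 can be sketched as building an Elasticsearch index body with one mapping entry per field. This is only an illustration under stated assumptions: code value fields are mapped to the `keyword` field type and other fields to the `text` field type, and all fields are placed in a single index body. The function name `build_index_body`, the type labels, and the field names are hypothetical.

```python
def build_index_body(field_types):
    """Build an index body with one mapping entry per field to be analyzed."""
    properties = {}
    for field, field_type in field_types.items():  # one pass per field, as in S61 to S63
        if field_type == "code_value":
            # Not analyzed, so the field supports exact matching.
            properties[field] = {"type": "keyword"}
        else:
            # Analyzed (segmented) at index time for keyword segmentation matching.
            properties[field] = {"type": "text"}
    return {"mappings": {"properties": properties}}


index_body = build_index_body(
    {"employee_type": "code_value", "job_title": "non_code_value"}
)
```

A body of this shape is what would be passed to the index creation API; an analyzer suited to the language of the data could be added to the `text` fields.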
In an embodiment, the step of performing search engine encapsulation according to the ES component, the search engine database, the target search result ordering manner, and the target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain the target text search engine includes:
s81: setting the target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed as the index of the ES component to obtain the ES component with the index construction completed;
s82: setting the target search result ordering mode as the search result ordering mode of the ES component with the index construction completed to obtain a target ES component;
s83: and packaging the target ES component and the search engine database to obtain the target text search engine.
According to the embodiment, the search engine packaging is carried out according to the ES component, the search engine database, the target search result ordering mode and the target search indexes corresponding to the fields to be analyzed corresponding to the text data to be analyzed, so that the target text search engine can be quickly constructed according to a plurality of data sources, and the text search engine does not need to be separately constructed for different data sources.
For S81, the target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed is set as the index of the ES component by using the index setting method of the ES component, so that the ES component can search based on the target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed.
For S82, the ES component sorting method is adopted to set the target search result sorting method as the search result sorting method of the ES component that completes the index construction, so that the ES component can sort the search results in the target search result sorting method by adopting the ES component sorting method.
For S83, the target ES component and the search engine database are packaged into a search engine, and the packaged search engine is used as the target text search engine.
Referring to fig. 2, the present application also provides an apparatus for constructing an ES-based text search engine, the apparatus including:
an ES component and database construction module 100 for constructing ES components, constructing a search engine database based on the ES components;
a to-be-stored text data obtaining module 200, configured to obtain a to-be-searched data source set, obtain to-be-stored text data according to the to-be-searched data source set, and store the to-be-stored text data in the search engine database;
a field type analysis module 300, configured to obtain text data from the search engine database, obtain text data to be analyzed, and perform field type analysis on each field to be analyzed in the text data to be analyzed, respectively, to obtain a target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed;
the importance scoring module 400 is configured to perform importance scoring on each to-be-analyzed field in the to-be-analyzed text data, so as to obtain a target importance scoring result corresponding to each to-be-analyzed field corresponding to the to-be-analyzed text data;
a matching mode setting module 500, configured to perform matching mode setting according to the target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed, so as to obtain a target matching mode corresponding to each field to be analyzed corresponding to the text data to be analyzed;
a search index construction module 600 of the ES component, configured to perform search index construction of the ES component according to the text data to be analyzed and the target matching manner corresponding to each field to be analyzed corresponding to the text data to be analyzed, so as to obtain a target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed;
a search result ranking mode setting module 700, configured to set a search result ranking mode according to the target importance scoring result corresponding to each to-be-analyzed field corresponding to the to-be-analyzed text data and a relevance scoring method of the ES component, so as to obtain a target search result ranking mode;
a search engine encapsulation module 800, configured to perform search engine encapsulation according to the ES component, the search engine database, the target search result ordering manner, and the target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed, so as to obtain a target text search engine.
In the embodiment, text data of a data source is stored in a search engine database constructed based on an ES component, then data is obtained from the search engine database to perform field type analysis and importance scoring, and a target matching mode is obtained according to a target field type; obtaining a target search index according to the text data to be analyzed and the target matching mode; obtaining a target search result sorting mode according to the target importance scoring result and the relevance scoring method of the ES component; the target text search engine is obtained according to the ES component, the search engine database, the target search result ordering mode and the target search index, so that the target text search engine can be quickly constructed according to a plurality of data sources, and the text search engine does not need to be separately constructed for different data sources; the construction cost of the text search engine is simplified by automatically determining the matching mode of the fields of the text content; the search result ranking mode is set according to the field target importance scoring result and the ES component relevance scoring method, so that the ranking accuracy of the search results obtained by the constructed target text search engine is improved.
Referring to fig. 3, an embodiment of the present application further provides a computer device, which may be a server and whose internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is used to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used for storing data involved in the construction method of the ES-based text search engine. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by the processor to implement a method of constructing an ES-based text search engine.
The construction method of the ES-based text search engine comprises the following steps: constructing an ES component, and constructing a search engine database based on the ES component; acquiring a data source set to be searched, acquiring text data to be stored according to the data source set to be searched, and storing the text data to be stored in the search engine database; acquiring text data from the search engine database to obtain text data to be analyzed, and respectively performing field type analysis on each field to be analyzed in the text data to be analyzed to obtain a target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed; respectively scoring the importance of each field to be analyzed in the text data to be analyzed to obtain a target importance scoring result corresponding to each field to be analyzed corresponding to the text data to be analyzed; performing matching mode setting according to the target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain a target matching mode corresponding to each field to be analyzed corresponding to the text data to be analyzed; constructing search indexes of the ES component according to the text data to be analyzed and the target matching modes corresponding to the fields to be analyzed corresponding to the text data to be analyzed to obtain target search indexes corresponding to the fields to be analyzed corresponding to the text data to be analyzed; setting a search result sorting mode according to the target importance scoring result corresponding to each field to be analyzed corresponding to the text data to be analyzed and the relevancy scoring method of the ES component to obtain a target search result sorting mode; and performing search engine encapsulation according to the ES component, the search engine database, the target search result ordering mode and the target search indexes corresponding to the fields to be analyzed corresponding to the text data to be analyzed to obtain a target text search engine.
In the embodiment, text data of a data source is stored in a search engine database constructed based on an ES component, then data is obtained from the search engine database to perform field type analysis and importance scoring, and a target matching mode is obtained according to a target field type; obtaining a target search index according to the text data to be analyzed and the target matching mode; obtaining a target search result sorting mode according to the target importance scoring result and the relevance scoring method of the ES component; the target text search engine is obtained according to the ES component, the search engine database, the target search result ordering mode and the target search index, so that the target text search engine can be quickly constructed according to a plurality of data sources, and the text search engine does not need to be separately constructed for different data sources; the construction cost of the text search engine is simplified by automatically determining the matching mode of the fields of the text content; the search result ranking mode is set according to the field target importance scoring result and the ES component relevance scoring method, so that the ranking accuracy of the search results obtained by the constructed target text search engine is improved.
An embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements a method for constructing an ES-based text search engine, including the steps of: constructing an ES component, and constructing a search engine database based on the ES component; acquiring a data source set to be searched, acquiring text data to be stored according to the data source set to be searched, and storing the text data to be stored in the search engine database; acquiring text data from the search engine database to obtain text data to be analyzed, and respectively performing field type analysis on each field to be analyzed in the text data to be analyzed to obtain a target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed; respectively scoring the importance of each field to be analyzed in the text data to be analyzed to obtain a target importance scoring result corresponding to each field to be analyzed corresponding to the text data to be analyzed; performing matching mode setting according to the target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain a target matching mode corresponding to each field to be analyzed corresponding to the text data to be analyzed; constructing search indexes of the ES component according to the text data to be analyzed and the target matching modes corresponding to the fields to be analyzed corresponding to the text data to be analyzed to obtain target search indexes corresponding to the fields to be analyzed corresponding to the text data to be analyzed; setting a search result sorting mode according to the target importance scoring result corresponding to each field to be analyzed corresponding to the text data to be analyzed and the relevancy scoring method of the ES component to obtain a target search result sorting mode; and performing search engine encapsulation according to the ES component, the search engine database, the target search result ordering mode and the target search indexes corresponding to the fields to be analyzed corresponding to the text data to be analyzed to obtain a target text search engine.
According to the method for constructing the ES-based text search engine, the text data of the data source is stored in a search engine database constructed based on an ES component, then the data is obtained from the search engine database to carry out field type analysis and importance scoring, and a target matching mode is obtained according to the target field type; obtaining a target search index according to the text data to be analyzed and the target matching mode; obtaining a target search result sorting mode according to the target importance scoring result and the relevance scoring method of the ES component; the target text search engine is obtained according to the ES component, the search engine database, the target search result ordering mode and the target search index, so that the target text search engine can be quickly constructed according to a plurality of data sources, and the text search engine does not need to be separately constructed for different data sources; the construction cost of the text search engine is reduced by automatically determining the matching mode of the fields of the text content; the search result ranking mode is set according to the field target importance scoring result and the ES component relevance scoring method, so that the ranking accuracy of the search results obtained by the constructed target text search engine is improved.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium provided herein and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (10)

1. A construction method of an ES-based text search engine is characterized by comprising the following steps:
constructing an ES component, and constructing a search engine database based on the ES component;
acquiring a data source set to be searched, acquiring text data to be stored according to the data source set to be searched, and storing the text data to be stored in the search engine database;
acquiring text data from the search engine database to obtain text data to be analyzed, and respectively performing field type analysis on each field to be analyzed in the text data to be analyzed to obtain a target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed;
respectively scoring the importance of each field to be analyzed in the text data to be analyzed to obtain a target importance scoring result corresponding to each field to be analyzed corresponding to the text data to be analyzed;
performing matching mode setting according to the target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain a target matching mode corresponding to each field to be analyzed corresponding to the text data to be analyzed;
constructing search indexes of the ES component according to the text data to be analyzed and the target matching modes corresponding to the fields to be analyzed corresponding to the text data to be analyzed to obtain target search indexes corresponding to the fields to be analyzed corresponding to the text data to be analyzed;
setting a search result sorting mode according to the target importance scoring result corresponding to each field to be analyzed corresponding to the text data to be analyzed and the relevancy scoring method of the ES component to obtain a target search result sorting mode;
and performing search engine encapsulation according to the ES component, the search engine database, the target search result ordering mode and the target search indexes corresponding to the fields to be analyzed corresponding to the text data to be analyzed to obtain a target text search engine.
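As a minimal sketch of the sorting step in claim 1, and assuming that ES denotes Elasticsearch, the per-field importance scores could be handed to the relevancy scoring as field boosts in a `multi_match` query. The function name `build_ranked_query`, the field names, and the scores are hypothetical, and the claim itself does not state that boosts are the mechanism used.

```python
def build_ranked_query(keyword, importance):
    """Build an Elasticsearch query body that weights fields by importance.

    `importance` maps a field name to a positive score (hypothetical values).
    """
    # "field^boost" is the Elasticsearch syntax for a per-field boost.
    fields = [f"{name}^{score}" for name, score in sorted(importance.items())]
    return {"query": {"multi_match": {"query": keyword, "fields": fields}}}


# Hypothetical fields and scores, for illustration only.
body = build_ranked_query("loan contract", {"title": 3, "summary": 1.5})
print(body)
```

With boosts of this kind, a hit in a field with a higher importance score contributes more to the relevance score and therefore tends to rank earlier.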
2. The construction method of the ES-based text search engine according to claim 1, wherein the step of performing field type analysis on each field to be analyzed in the text data to be analyzed to obtain a target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed comprises:
respectively carrying out information entropy calculation on each field to be analyzed of the text data to be analyzed to obtain target field information entropy corresponding to each field to be analyzed corresponding to the text data to be analyzed;
and performing field type analysis according to the target field information entropy corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain the target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed.
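The information entropy calculation of claim 2 may be sketched as follows. The sketch assumes the standard Shannon entropy over the distribution of field values, a natural logarithm, and that empty values are ignored; the function name `field_entropy` and these choices are assumptions and are not taken from the claim.

```python
import math
from collections import Counter


def field_entropy(values):
    """Shannon entropy (natural log) of the non-empty values of one field."""
    # Assumption: None and "" count as empty and are left out.
    non_empty = [v for v in values if v not in (None, "")]
    if not non_empty:
        return 0.0
    counts = Counter(non_empty)
    total = len(non_empty)
    # p(j) = share of rows holding the j-th distinct value.
    return -sum((c / total) * math.log(c / total) for c in counts.values())


print(field_entropy(["M", "F", "M", "F"]))      # two equally likely values
print(field_entropy(["a", "b", "c", "d"]))      # four equally likely values
```

A field that holds few distinct values yields a low entropy, whereas a free-text field in which almost every value is different yields a high entropy.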
3. The construction method of the ES-based text search engine according to claim 2, wherein the step of performing field type analysis according to the target field information entropy corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain the target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed comprises:
when [formula image omitted in source: the condition on type(i)] holds, determining the target field type to be the code value type, and otherwise determining the target field type to be a non-code value type;
wherein the calculation formula type(i) of the target field type is:
[formula image omitted in source]
wherein [symbol image omitted] is the target field information entropy corresponding to the i-th field to be analyzed corresponding to the text data to be analyzed, [symbol image omitted] is the number of de-duplicated field values of the i-th field to be analyzed, k is the number of all field values of the i-th field to be analyzed, [symbol image omitted] is the number of non-empty field values of the i-th field to be analyzed, k(i) is the number of field values of the i-th field to be analyzed, and C is a constant;
n(i) is the number of de-duplicated field values of the i-th field to be analyzed, p(j) is the probability that a value of the i-th field to be analyzed is the j-th de-duplicated field value, and log() is a logarithmic function;
[symbol image omitted] indicates whether the j-th de-duplicated field value of the i-th field to be analyzed is empty: it is 0 when the j-th de-duplicated field value is empty, and 1 when the j-th de-duplicated field value is not empty.
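The formula for type(i) is available only as an image in the source, so it is not reproduced here. The following sketch merely computes the quantities that claim 3 names (number of field values, number of non-empty values, number of de-duplicated values, entropy) and then applies a stand-in decision rule. The rule, the threshold value, and all function names are assumptions for illustration and are not the formula of the claim.

```python
import math
from collections import Counter


def is_empty(value):
    """True when a field value is empty (assumption: None or empty string)."""
    return value is None or value == ""


def field_statistics(values):
    """Return the counts named in claim 3 for one field."""
    non_empty = [v for v in values if not is_empty(v)]
    counts = Counter(non_empty)
    total = len(non_empty)
    entropy = 0.0
    if total:
        entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return {
        "k": len(values),             # number of all field values
        "non_empty": total,           # number of non-empty field values
        "n_non_empty": len(counts),   # number of distinct non-empty values
        "entropy": entropy,
    }


def classify_field(values, threshold=0.5):
    """Stand-in rule (NOT the patent's formula): threshold on normalized entropy."""
    stats = field_statistics(values)
    if stats["non_empty"] < 2:
        # Arbitrary choice for this sketch: too little data counts as code value.
        return "code_value"
    # Dividing by log(non_empty) scales the entropy into the range 0 to 1.
    normalized = stats["entropy"] / math.log(stats["non_empty"])
    return "code_value" if normalized < threshold else "non_code_value"


print(classify_field(["M", "F"] * 4))
print(classify_field(["a", "b", "c", "d"]))
```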
4. The construction method of the ES-based text search engine according to claim 1, wherein the step of setting a matching manner according to the target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain a target matching manner corresponding to each field to be analyzed corresponding to the text data to be analyzed comprises:
respectively judging whether the target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed is a code value type;
and when the target field type is the code value type, determining the target matching mode corresponding to the field to be analyzed corresponding to the target field type as an accurate matching search mode of the ES component, otherwise, determining the target matching mode corresponding to the field to be analyzed corresponding to the target field type as a keyword segmentation matching degree matching mode.
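The selection in claim 4 can be illustrated with the Elasticsearch query DSL, assuming that ES denotes Elasticsearch: a `term` query for exact matching of code value fields, and an analyzed `match` query for the remaining fields. The function name `build_field_clause` and the type labels `"code_value"` and `"non_code_value"` are hypothetical.

```python
def build_field_clause(field, field_type, keyword):
    """Return an Elasticsearch query clause for one field."""
    if field_type == "code_value":
        # Exact match: the keyword is compared with the stored value as-is.
        return {"term": {field: keyword}}
    # Full-text match: the keyword is segmented by the field's analyzer, and
    # "100%" requires every resulting term to be present in the hit.
    return {"match": {field: {"query": keyword, "minimum_should_match": "100%"}}}


print(build_field_clause("status", "code_value", "ACTIVE"))
print(build_field_clause("title", "non_code_value", "text search engine"))
```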
5. The method according to claim 4, wherein the step of determining the target matching mode corresponding to the field to be analyzed corresponding to the target field type as a keyword segmentation matching degree matching mode comprises:
determining the target matching mode corresponding to the field to be analyzed whose target field type is a non-code value type as the keyword segmentation matching degree matching mode, wherein the keyword segmentation matching degree matching mode means that the keyword segmentation matching degree is set to 100%;
the calculation formula match of the keyword segmentation matching degree is:
[formula image omitted in source]
wherein [symbol image omitted] is the number of words obtained after the search keyword is segmented, and [symbol image omitted] is the de-duplicated number of words that are both among the words obtained after the search keyword is segmented and among the words hit in the search result.
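Because the formula for match is given only as an image in the source, the following is a sketch under an assumption: the matching degree is taken to be the de-duplicated number of query words that also appear in the hit, divided by the de-duplicated number of query words. The function name `match_degree` and the de-duplication of the denominator are assumptions.

```python
def match_degree(query_tokens, hit_tokens):
    """Fraction of distinct query tokens that also occur in the hit (0.0 to 1.0)."""
    distinct_query = set(query_tokens)
    if not distinct_query:
        return 0.0
    # Tokens present both in the segmented keyword and in the hit, de-duplicated.
    common = distinct_query & set(hit_tokens)
    return len(common) / len(distinct_query)


print(match_degree(["text", "search", "engine"], ["search", "engine", "index"]))
print(match_degree(["text", "search"], ["search", "text", "engine"]))
```

Under this reading, a matching degree of 100% means that every word of the segmented search keyword is found in the hit.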
6. The method for building the ES-based text search engine according to claim 1, wherein the step of building the search index of the ES component according to the text data to be analyzed and the target matching manner corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain the target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed comprises:
extracting one field to be analyzed from the text data to be analyzed as a target field to be analyzed;
constructing a search index of the ES component according to the field to be analyzed and the target matching mode corresponding to the field to be analyzed to obtain the target search index corresponding to the field to be analyzed;
and repeatedly executing the step of extracting one field to be analyzed from the text data to be analyzed as a target field to be analyzed until the target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed is determined.
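The field-by-field loop of claim 6 may be sketched as the construction of an index mapping, assuming that ES denotes Elasticsearch and that the per-field search index is modelled as a per-field mapping entry (`keyword` for code value fields, `text` for the others). The function name `build_index_body` and the type labels are hypothetical.

```python
def build_index_body(field_types):
    """Build an Elasticsearch create-index body, one field at a time."""
    properties = {}
    # One iteration per field to be analyzed, as in the repeated step of the claim.
    for field, field_type in field_types.items():
        if field_type == "code_value":
            properties[field] = {"type": "keyword"}   # stored un-analyzed, exact match
        else:
            properties[field] = {"type": "text"}      # analyzed for full-text search
    return {"mappings": {"properties": properties}}


print(build_index_body({"status": "code_value", "title": "non_code_value"}))
```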
7. The method according to claim 1, wherein the step of encapsulating the search engine according to the ES component, the search engine database, the target search result ordering manner, and the target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed to obtain the target text search engine comprises:
setting the target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed as the index of the ES component to obtain the ES component with the index construction completed;
setting the target search result ordering mode as the search result ordering mode of the ES component with the index construction completed to obtain a target ES component;
and packaging the target ES component and the search engine database to obtain the target text search engine.
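The encapsulation of claim 7 can be pictured as a thin wrapper that bundles the index name, the per-field matching modes, and the per-field importance scores behind a single search call. The class `TextSearchEngine`, its parameters, and the injected `search_fn` callable are hypothetical; the callable stands in for whatever client performs the actual request, so the sketch does not rely on any specific client library.

```python
class TextSearchEngine:
    """Hypothetical wrapper bundling index name, field types and field boosts."""

    def __init__(self, index_name, field_types, importance, search_fn):
        self.index_name = index_name
        self.field_types = field_types
        self.importance = importance
        # search_fn(index_name, body) sends the request; injected for testability.
        self.search_fn = search_fn

    def build_body(self, keyword):
        """Combine one clause per field into a single bool query."""
        clauses = []
        for field, field_type in self.field_types.items():
            boost = self.importance.get(field, 1.0)
            if field_type == "code_value":
                clauses.append({"term": {field: {"value": keyword, "boost": boost}}})
            else:
                clauses.append({"match": {field: {"query": keyword, "boost": boost}}})
        return {"query": {"bool": {"should": clauses, "minimum_should_match": 1}}}

    def search(self, keyword):
        return self.search_fn(self.index_name, self.build_body(keyword))


# Usage with a stand-in search function that records the call and returns no hits.
calls = []
engine = TextSearchEngine(
    "docs",
    {"status": "code_value", "title": "non_code_value"},
    {"title": 2.0},
    lambda index, body: calls.append((index, body)) or [],
)
result = engine.search("active")
print(calls[0])
```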
8. An apparatus for constructing an ES-based text search engine, the apparatus comprising:
an ES component and database construction module, configured to construct an ES component and to construct a search engine database based on the ES component;
a to-be-stored text data acquisition module, configured to acquire a data source set to be searched, acquire text data to be stored according to the data source set to be searched, and store the text data to be stored in the search engine database;
a field type analysis module, configured to acquire text data from the search engine database to obtain text data to be analyzed, and to perform field type analysis on each field to be analyzed in the text data to be analyzed, so as to obtain a target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed;
an importance scoring module, configured to score the importance of each field to be analyzed in the text data to be analyzed, so as to obtain a target importance scoring result corresponding to each field to be analyzed corresponding to the text data to be analyzed;
a matching mode setting module, configured to perform matching mode setting according to the target field type corresponding to each field to be analyzed corresponding to the text data to be analyzed, so as to obtain a target matching mode corresponding to each field to be analyzed corresponding to the text data to be analyzed;
an ES component search index construction module, configured to construct the search index of the ES component according to the text data to be analyzed and the target matching mode corresponding to each field to be analyzed corresponding to the text data to be analyzed, so as to obtain the target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed;
a search result sorting mode setting module, configured to set a search result sorting mode according to the target importance scoring result corresponding to each field to be analyzed corresponding to the text data to be analyzed and a relevancy scoring method of the ES component, so as to obtain a target search result sorting mode;
and a search engine encapsulation module, configured to perform search engine encapsulation according to the ES component, the search engine database, the target search result sorting mode, and the target search index corresponding to each field to be analyzed corresponding to the text data to be analyzed, so as to obtain a target text search engine.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202110191157.7A 2021-02-20 2021-02-20 ES-based text search engine construction method, device, equipment and medium Active CN112559671B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110191157.7A CN112559671B (en) 2021-02-20 2021-02-20 ES-based text search engine construction method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110191157.7A CN112559671B (en) 2021-02-20 2021-02-20 ES-based text search engine construction method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN112559671A true CN112559671A (en) 2021-03-26
CN112559671B CN112559671B (en) 2021-06-08

Family

ID=75034416

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110191157.7A Active CN112559671B (en) 2021-02-20 2021-02-20 ES-based text search engine construction method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN112559671B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107168966A (en) * 2016-03-07 2017-09-15 阿里巴巴集团控股有限公司 A kind of search engine index construction method and device
CN110502548A (en) * 2019-07-26 2019-11-26 视联动力信息技术股份有限公司 A kind of search result recommended method, device and computer readable storage medium
US20200050643A1 (en) * 2015-10-23 2020-02-13 International Business Machines Corporation Ingestion planning for complex tables
CN111026710A (en) * 2019-12-11 2020-04-17 华南师范大学 Data set retrieval method and system
CN111881170A (en) * 2020-07-14 2020-11-03 腾讯科技(深圳)有限公司 Method, device, equipment and storage medium for mining timeliness query content field
CN112052308A (en) * 2020-08-21 2020-12-08 腾讯科技(深圳)有限公司 Abstract text extraction method and device, storage medium and electronic equipment
CN112115229A (en) * 2019-06-20 2020-12-22 北京京东尚科信息技术有限公司 Text intention recognition method, device and system and text classification system
CN112115281A (en) * 2020-09-17 2020-12-22 杭州海康威视数字技术股份有限公司 Data retrieval method, device and storage medium

Also Published As

Publication number Publication date
CN112559671B (en) 2021-06-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant