CN107784123B - Topic-based search optimization method - Google Patents
- Publication number
- CN107784123B
- Authority
- CN
- China
- Prior art keywords
- keywords
- preset
- optimization method
- user
- topic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3334—Selection or weighting of terms from queries, including natural language queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a theme-based search optimization method comprising the following steps: step 1, acquiring a data set provided by a user, wherein the data set comprises vocabulary of the field to which the user belongs; step 2, generating a theme from the vocabulary, wherein the theme comprises preset keywords and inter-word logical relationships; and step 3, reading target keywords that the user inputs to a search engine, matching the target keywords against the preset keywords, and displaying search results to the user according to the inter-word logical relationships. The invention provides a search method that meets user requirements: it improves the accuracy and intelligence of searching, optimizes the search results, and efficiently provides accurate results to users.
Description
Technical Field
The invention relates to the technical field of full-text search engines, in particular to a theme-based search optimization method.
Background
At present, a conventional full-text search engine scans every word in the full text and builds an index entry for each word that records how often and where the word occurs in an article; when the user searches, the engine looks up the established index and returns the results found. However, as knowledge bases expand and users become more knowledgeable, this approach, which resembles looking up a word through the index table of a dictionary, can no longer meet users' requirements.
This conventional search method has the following problems: (1) search results are often irrelevant to the user's intent; (2) useful search results are ranked too low. For example, when the user inputs "Aliskiu Security", the displayed search results are often for "Aliskiu", "network Security", and the like, which, as those skilled in the art will appreciate, are not what the user wants.
Therefore, how to make search results conform to the user's intent and show useful results to the user first has become a technical problem that those skilled in the art urgently need to solve and a long-standing focus of research.
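The conventional indexing scheme described above can be sketched as a small inverted index. This is a minimal illustration, not the implementation of any particular engine; the names `build_index` and `search` and the two sample documents are assumptions made for the example.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each word to {doc_id: [positions]}; frequency is len(positions)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word].setdefault(doc_id, []).append(pos)
    return index


def search(index, word):
    """Return (doc_id, frequency) pairs, most frequent first."""
    postings = index.get(word.lower(), {})
    return sorted(((doc_id, len(positions)) for doc_id, positions in postings.items()),
                  key=lambda item: (-item[1], item[0]))


# Hypothetical sample documents
docs = {
    "a": "Machine learning needs an algorithm",
    "b": "An algorithm is an algorithm",
}
index = build_index(docs)
```

Such an index ranks only by how often a word occurs; it has no notion of how the words of a multi-word query relate to each other, which is the gap the background section points to.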
Disclosure of Invention
To solve the problems that the results returned by a conventional full-text search engine deviate too far from the user's intent and that useful results are ranked too low, the invention provides a theme-based search optimization method. It addresses the lack of intelligence, the inaccuracy, and the difficulty of optimization in the prior art, and improves the intelligence and classification accuracy of full-text search.
In order to achieve the technical purpose, the invention discloses a search optimization method based on a theme, which comprises the following steps:
step 1, acquiring a data set provided by a user, wherein the data set comprises vocabularies of the field to which the user belongs;
step 2, generating a theme by using the vocabulary, wherein the theme comprises preset keywords and an inter-word logical relationship;
step 3, reading target keywords input to a search engine by a user, matching the target keywords with the preset keywords, and displaying a search result to the user according to the inter-word logical relationship.
The invention not only addresses the problems that a conventional full-text search engine returns results far from the user's intent and ranks useful results too low, but also supports user-specific customization, so that the search results closely match the user's needs and searching is efficient.
Further, in step 2, the inter-word logical relationship includes an inter-word association relationship, and a highlight mark is assigned to the associated preset keywords;
in step 3, search results containing associated preset keywords that match the target keywords and carry a highlight mark are displayed preferentially.
With this improvement, when the user inputs several target keywords that have an association relationship, the invention displays the corresponding results more efficiently and improves search accuracy.
Further, in step 2, in descending order of highlight mark, the inter-word association relationship includes at least one of the following: the preset keywords are adjacent and appear in sequence; fewer than N characters separate the preset keywords; the preset keywords appear in the same natural sentence; the preset keywords appear in the same paragraph; and the preset keywords appear in the same article;
in step 3, if several preset keywords match the target keywords and carry different highlight marks, the search results are displayed in descending order of highlight mark.
With this improvement, if at least two kinds of association relationship exist among the target keywords input by the user, the invention preferentially displays the search results that better match the user's intent.
Further, step 2 includes a step of deepening the meaning of the preset keywords, in which at least one child preset keyword is derived from the meaning shared by at least two parent preset keywords, subject to the condition that an inter-word association relationship exists between the child preset keyword and the parent preset keywords.
With this improvement, the invention extends the keywords in a reasoned way, which improves the intelligence and accuracy of the search.
Further, in step 2, the inter-word logical relationship includes a weight scoring relationship, and weight scoring is performed on each preset keyword and/or on a phrase formed by a plurality of preset keywords having an inter-word association relationship with each other;
in step 3, the search results containing the preset keywords or phrases that match the target keywords are displayed in descending order of weight score.
Further, in step 2, the vocabulary in the data set is analyzed based on a logistic regression method, and a theme is automatically generated according to the part of speech and the frequency of the vocabulary.
Further, in step 2, based on the business rules of the group where the user is located, the vocabularies in the data set are analyzed by the business experts, and the topics are manually generated according to the analysis results.
Further, the search optimization method also comprises step 4: obtaining the user's feedback data on the search results, then returning to step 2 and adjusting the generated theme according to the feedback data.
Further, in step 3, a step of segmenting the read target keywords is also included, and at least two target keywords formed after segmentation are used for matching with the preset keywords.
Further, in step 1, the data set includes at least one of industry files and knowledge point files in the field to which the user belongs.
The beneficial effects of the invention are as follows: it provides a search method that meets user requirements, improves the accuracy and intelligence of searching, optimizes the search results, and efficiently provides accurate results to users.
Drawings
FIG. 1 is a flow diagram of a topic-based search optimization method.
Detailed Description
The invention is explained in detail below with reference to the drawing.
As shown in FIG. 1, the invention discloses a topic-based search optimization method, which comprises the following steps, wherein step 1 and step 2 are preparation steps for optimizing a search engine, and step 3 is a step when a user actually uses the search engine.
Step 1, acquiring a data set provided by a user or a demand party, wherein the data set comprises vocabularies of the field to which the user belongs; in this embodiment, the data set includes at least one of an industry file and a knowledge point file in a field to which the user belongs. Therefore, the invention can provide customized search for the user.
For example, for an IT application scenario and user population, the vocabulary for "oracle" contains:
Oracle,
database,
cloud service.
For example, for a user population in archaeology and historical research, the vocabulary for "oracle" contains:
history,
language,
writing.
Therefore, under the technical teaching of the invention, the search engine can be provided with the field selection function, so that the invention has wider application range.
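One way to picture the field selection function is a vocabulary table keyed by field. The sketch below is only an illustration: the field keys `"it"` and `"archaeology"`, the name `related_terms`, and the term lists (read from the translated "oracle" examples above) are assumptions.

```python
# Hypothetical per-field vocabularies for the same surface word
DOMAIN_VOCABULARY = {
    "it": {"oracle": ["Oracle", "database", "cloud service"]},
    "archaeology": {"oracle": ["history", "language", "writing"]},
}


def related_terms(domain, word):
    """Return the vocabulary associated with `word` in the selected field."""
    return DOMAIN_VOCABULARY.get(domain, {}).get(word.lower(), [])
```

Selecting a field before searching lets the same query word resolve to different vocabularies, which is what makes the customized search possible.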
Step 2, generating a theme from the vocabulary. The theme is derived from the data set and can be generated automatically or manually. For example, the vocabulary in the data set is analyzed with a logistic regression method, and the theme is generated automatically according to the part of speech and the frequency of the vocabulary. As another example, based on the business rules of the group to which the user belongs, business experts analyze the vocabulary in the data set and generate the theme manually from the analysis results; in this way the experts can build a top-down definition system that describes the whole data set, such as antiques comprising porcelain, pottery, jade ware, bronze ware and the like. The theme of the invention can be understood as a knowledge base.
The theme comprises preset keywords and an inter-word logical relationship; in this embodiment, the inter-word logical relationship includes an inter-word association relationship and a weight scoring relationship.
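The text names logistic regression over part of speech and word frequency but gives no features, labels, or coefficients. The sketch below is therefore only one possible reading: it applies a logistic (sigmoid) function to two assumed features, log frequency and whether the word is a noun, using hand-set coefficients. All names and numbers are hypothetical; a real system would fit the coefficients on labelled vocabulary.

```python
import math
from collections import Counter

# Hypothetical coefficients; a trained model would learn these from labelled data.
COEF_FREQ, COEF_NOUN, INTERCEPT = 1.2, 1.5, -3.0


def keyword_probability(freq, is_noun):
    """Logistic model: probability that a word should become a preset keyword."""
    z = COEF_FREQ * math.log1p(freq) + COEF_NOUN * (1.0 if is_noun else 0.0) + INTERCEPT
    return 1.0 / (1.0 + math.exp(-z))


def generate_preset_keywords(tagged_words, threshold=0.5):
    """tagged_words: iterable of (word, part_of_speech) pairs from the data set."""
    counts = Counter(tagged_words)
    scored = {}
    for (word, pos), freq in counts.items():
        p = keyword_probability(freq, pos == "noun")
        if p >= threshold:
            scored[word] = max(p, scored.get(word, 0.0))
    return scored
```

With these assumed coefficients, a noun that occurs several times passes the threshold, while a frequent function word or a noun seen only once does not.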
The inter-word association relationship is used to assign a highlight mark to the associated preset keywords. More specifically, in descending order of highlight mark, the inter-word association relationship includes at least one of the following: the preset keywords are adjacent and appear in sequence; fewer than N characters separate the preset keywords; the preset keywords appear in the same natural sentence; the preset keywords appear in the same paragraph; and the preset keywords appear in the same article. In a specific implementation of the invention, one or more of these association relationships may be adopted. The following are illustrative:
<adjacent and in sequence> (machine learning, algorithm),
<separated by fewer than N characters> (machine learning, algorithm),
<appear in one natural sentence> (machine learning, algorithm),
<appear in one paragraph> (machine learning, algorithm),
<appear in one article> (machine learning, algorithm).
For the above "N", a reasonable value, such as N = 10, can be chosen in light of the invention.
Under the technical teaching of the invention, the inter-word association relationship may also include, in descending order of highlight mark: at least one preset keyword appears; the preset keywords belong to the same domain; the preset keywords are near-synonyms; the preset keywords, when expressed in English, share the same root; and the preset keywords are related by typo correction.
In addition, the display order of the search results can be determined from the inter-word association relationships either with or without weights.
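The five association relationships can be checked mechanically. The following sketch returns the strongest relationship that holds between two keywords in an article. It rests on several assumptions not stated in the text: paragraphs are separated by blank lines, sentences end at `.`, `!`, `?` or their Chinese equivalents, "adjacent" means only whitespace lies between the keywords, and the two proximity checks require the first keyword to precede the second within one sentence. The names are hypothetical.

```python
import re

N = 10  # example value from the text

# Strongest relationship first
RELATIONS = ["adjacent", "within_n", "same_sentence", "same_paragraph", "same_article"]


def strongest_relation(article, first, second, n=N):
    """Return the strongest relationship between two keywords, or None."""
    best = None

    def consider(name):
        nonlocal best
        if best is None or RELATIONS.index(name) < RELATIONS.index(best):
            best = name

    if first in article and second in article:
        consider("same_article")
    for paragraph in article.split("\n\n"):
        if first in paragraph and second in paragraph:
            consider("same_paragraph")
        for sentence in re.split(r"[.!?。！？]", paragraph):
            if first in sentence and second in sentence:
                consider("same_sentence")
            for match in re.finditer(re.escape(first), sentence):
                start = sentence.find(second, match.end())
                if start == -1:
                    continue
                gap = sentence[match.end():start].strip()
                if gap == "":
                    consider("adjacent")
                elif len(gap) < n:
                    consider("within_n")
    return best
```

For instance, `strongest_relation("machine learning is an algorithm", "machine learning", "algorithm")` yields `"within_n"`, because only the five characters of "is an" separate the two keywords.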
The weight scoring relationship is used for performing weight scoring on each preset keyword and/or a phrase formed by a plurality of preset keywords having an interword association relationship with each other.
For example, for the keywords "machine learning" and "algorithm", if the concept of "machine learning" is to be highlighted, the weight of "machine learning" can be made higher than that of "algorithm";
Weight Keyword
[0.8] "machine learning",
[0.7] "algorithm".
As another example, the phrase formed by the preset keywords "machine learning" and "algorithm", which have an inter-word association relationship with each other, can be scored as follows:
Weight Inter-word association relationship (phrase)
[0.8] <adjacent and in sequence> (machine learning, algorithm),
[0.7] <separated by fewer than N characters> (machine learning, algorithm),
[0.6] <appear in one natural sentence> (machine learning, algorithm),
[0.4] <appear in one paragraph> (machine learning, algorithm),
[0.2] <appear in one article> (machine learning, algorithm).
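The weighted relationships above translate directly into a ranking rule. The sketch below assumes the relationship between the two keywords has already been determined for each document; the relationship labels and the name `rank_by_relation` are hypothetical, while the weights are the example values from the text.

```python
# Example weights from the text, keyed by hypothetical relationship labels
RELATION_WEIGHTS = {
    "adjacent": 0.8,
    "within_n": 0.7,
    "same_sentence": 0.6,
    "same_paragraph": 0.4,
    "same_article": 0.2,
}


def rank_by_relation(doc_relations):
    """doc_relations: {doc_id: relationship label or None}. Highest weight first."""
    scored = [(RELATION_WEIGHTS[rel], doc_id)
              for doc_id, rel in doc_relations.items()
              if rel in RELATION_WEIGHTS]
    return [doc_id for _, doc_id in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]
```

Documents in which the keywords are not related at all are left out of the ranking; ties are broken by document identifier, which is an arbitrary choice made for the example.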
In addition, this step further includes a step of deepening the meaning of the preset keywords: at least one child preset keyword is derived from the meaning shared by at least two parent preset keywords, subject to the condition that an inter-word association relationship exists between the child preset keyword and the parent preset keywords. For example, for the jade ornament, child preset keywords such as green jade, mutton-fat jade, white jade, yellow jade, bracelet, Buddha board and pendant can be derived, so that content related to the child preset keywords can be found during matching and the user's needs are met. In this embodiment, the "inter-word association relationship" referred to in the above condition covers three types: fewer than N characters separate the preset keywords, the preset keywords appear in the same natural sentence, and the preset keywords appear in the same paragraph.
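The derivation condition can be illustrated with a simple filter over candidate child keywords. The sketch makes two assumptions: it uses only the weakest of the three accepted relationships (appearing in the same paragraph), and it requires a candidate to share a paragraph with every parent keyword, which is one reading of the condition. The function names and the sample corpus are hypothetical.

```python
def cooccur_in_paragraph(corpus, a, b):
    """True if both strings appear in at least one common paragraph."""
    return any(a in paragraph and b in paragraph for paragraph in corpus.split("\n\n"))


def derive_children(corpus, parents, candidates):
    """Keep candidates that share a paragraph with every parent keyword."""
    return [child for child in candidates
            if all(cooccur_in_paragraph(corpus, child, parent) for parent in parents)]


# Hypothetical data set text
corpus = ("A white jade bracelet is a classic jade ornament.\n\n"
          "Old bronze ware is cast, not carved.")
children = derive_children(corpus, ["jade", "ornament"], ["white jade", "bracelet", "bronze"])
```

Here "bronze" is rejected because it never shares a paragraph with the parent keywords, so it would not be attached to the jade ornament theme.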
Step 3, reading the target keywords that the user inputs to the search engine, matching the target keywords against the preset keywords, and displaying the search results to the user according to the inter-word logical relationship. In this embodiment, search results containing associated preset keywords that match the target keywords and carry a highlight mark are displayed preferentially. More specifically, if several preset keywords match the target keywords and carry different highlight marks, the search results are displayed in descending order of highlight mark. If weight scoring is used, the search results containing the matched preset keywords or phrases are displayed in descending order of weight score.
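The matching and weight-ordered display of step 3 might look like the following sketch. Summing the weights of all matched preset keywords found in a document is an assumption, since the text does not say how several matches are combined; the names and sample documents are hypothetical, and the weights are the example values given earlier.

```python
# Example keyword weights from the description
PRESET_WEIGHTS = {"machine learning": 0.8, "algorithm": 0.7}


def match_targets(target_keywords, preset_weights):
    """Keep only the target keywords that are preset keywords of the theme."""
    return [keyword for keyword in target_keywords if keyword in preset_weights]


def rank_documents(docs, target_keywords, preset_weights=PRESET_WEIGHTS):
    """Order documents by the summed weight of the matched keywords they contain."""
    matched = match_targets(target_keywords, preset_weights)
    scored = []
    for doc_id, text in docs.items():
        score = sum(preset_weights[keyword] for keyword in matched if keyword in text)
        if score > 0:
            scored.append((score, doc_id))
    return [doc_id for _, doc_id in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]
```

A target keyword that is not in the theme contributes nothing, so documents that contain only such words are not displayed.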
Step 4, obtaining the user's feedback data on the search results, then returning to step 2 and adjusting the generated theme according to the feedback data. For example, if a business user finds in actual use that the results for a particular search keyword are not ideal, or wants to focus on a specific field, the keyword or the domain-related information can be added to the theme manually for later searches.
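The text describes manual adjustment of the theme and does not prescribe an update rule. As a hedged illustration of how feedback could flow back into step 2, the sketch below nudges keyword weights up or down by a fixed step and adds unseen keywords at a neutral default; the step size, the default of 0.5, and the name `apply_feedback` are all assumptions.

```python
def apply_feedback(weights, feedback, step=0.1, default=0.5):
    """feedback: iterable of (keyword, helpful) pairs; returns a new weight table."""
    updated = dict(weights)
    for keyword, helpful in feedback:
        current = updated.get(keyword, default)  # unseen keywords join the theme
        change = step if helpful else -step
        updated[keyword] = round(min(1.0, max(0.0, current + change)), 2)
    return updated
```

The original table is left untouched, so an adjusted theme can be reviewed by a business expert before it replaces the one in use.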
In addition, in step 3, a step of segmenting the read target keywords may also be included, and at least two target keywords formed after segmentation are used for matching with preset keywords.
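The text does not name a segmentation algorithm. One common choice, used here purely as an assumption, is forward maximum matching against the preset keywords: at each position the longest preset keyword that fits is taken, and characters that start no keyword are skipped. The name `segment` is hypothetical.

```python
def segment(query, preset_keywords):
    """Split a query into preset keywords by greedy longest match."""
    # Longest keywords first; empty strings are dropped to guarantee progress.
    keywords = sorted((k for k in preset_keywords if k), key=len, reverse=True)
    found, i = [], 0
    while i < len(query):
        for keyword in keywords:
            if query.startswith(keyword, i):
                found.append(keyword)
                i += len(keyword)
                break
        else:
            i += 1  # no keyword starts here; skip one character
    return found
```

Because the longest keyword wins, a query such as "machine learning algorithm" is split into "machine learning" and "algorithm" rather than "machine" and "algorithm".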
In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., and "A and/or B" means A alone, B alone, or both A and B, unless specifically limited otherwise.
In the description herein, references to the terms "the present embodiment," "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art can combine the various embodiments or examples described in this specification, and the features of different embodiments or examples, provided they do not contradict one another.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and simplifications made in the spirit of the present invention are intended to be included in the scope of the present invention.
Claims (9)
1. A topic-based search optimization method, characterized in that the search optimization method comprises the following steps:
step 1, acquiring a data set provided by a user, wherein the data set comprises vocabularies of the field to which the user belongs;
step 2, generating a theme by using the vocabulary, wherein the theme comprises preset keywords and an inter-word logical relationship;
step 2 further comprises a step of deepening the meaning of the preset keywords: deriving at least one child preset keyword based on the meaning between at least two parent preset keywords, subject to the condition that the child preset keyword and the parent preset keywords have an inter-word association relationship;
step 3, reading target keywords input to a search engine by a user, matching the target keywords with the preset keywords, and displaying a search result to the user according to the inter-word logical relationship.
2. The topic-based search optimization method of claim 1, wherein:
in step 2, the inter-word logical relationship comprises an inter-word association relationship, and a highlight mark is assigned to the associated preset keywords;
in step 3, search results containing associated preset keywords that match the target keywords and carry a highlight mark are displayed preferentially.
3. The topic-based search optimization method of claim 2, wherein:
in step 2, in descending order of highlight mark, the inter-word association relationship includes at least one of the following: the preset keywords are adjacent and appear in sequence; fewer than N characters separate the preset keywords; the preset keywords appear in the same natural sentence; the preset keywords appear in the same paragraph; and the preset keywords appear in the same article;
in step 3, if several preset keywords match the target keywords and carry different highlight marks, the search results are displayed in descending order of highlight mark.
4. The topic-based search optimization method of claim 1, wherein:
in step 2, the inter-word logical relationship comprises a weight scoring relationship, and weight scoring is carried out on each preset keyword and/or a phrase formed by a plurality of preset keywords which have an inter-word association relationship with each other;
in step 3, the search results containing the preset keywords or phrases that match the target keywords are displayed in descending order of weight score.
5. The topic-based search optimization method of claim 1 or 4, wherein:
in step 2, the vocabulary in the data set is analyzed based on a logistic regression method, and the theme is generated automatically according to the part of speech and the word frequency of the vocabulary.
6. The topic-based search optimization method of claim 1 or 4, wherein:
in step 2, based on the business rules of the group to which the user belongs, business experts analyze the vocabulary in the data set, and the theme is generated manually according to the analysis results.
7. The topic-based search optimization method of claim 1, wherein the search optimization method further comprises:
step 4, obtaining feedback data of the user on the search results; then returning to step 2 and adjusting the generated theme according to the feedback data.
8. The topic-based search optimization method of claim 1 or 4, wherein:
in step 3, the read target keywords are segmented, and at least two target keywords formed after segmentation are used for matching with the preset keywords.
9. The topic-based search optimization method of claim 1 or 4, wherein:
in step 1, the data set comprises at least one of industry files and knowledge point files in the field to which the user belongs.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711075947 | 2017-11-06 | ||
CN2017110759479 | 2017-11-06 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107784123A (en) | 2018-03-09 |
CN107784123B (en) | 2021-01-01 |
Family
ID=61430602
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711178366.8A, Active, CN107784123B (en) | 2017-11-06 | 2017-11-23 | Topic-based search optimization method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107784123B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110084668A (en) * | 2019-04-09 | 2019-08-02 | 北京中科智营科技发展有限公司 | A kind of data processing method and data processing equipment of interactive interface of classifying |
CN110472027B (en) * | 2019-07-18 | 2024-05-14 | 平安科技(深圳)有限公司 | Intent recognition method, apparatus, and computer-readable storage medium |
CN112100330B (en) * | 2020-09-09 | 2023-09-26 | 杭州凡闻科技有限公司 | Topic searching method and system based on artificial intelligence technology |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101196898A (en) * | 2007-08-21 | 2008-06-11 | 新百丽鞋业(深圳)有限公司 | Method for applying phrase index technology into internet search engine |
CN101923556A (en) * | 2010-02-09 | 2010-12-22 | 上海莱希信息科技有限公司 | Method and device for searching webpages according to sentence serial numbers |
CN102609512A (en) * | 2012-02-07 | 2012-07-25 | 北京中机科海科技发展有限公司 | System and method for heterogeneous information mining and visual analysis |
CN104778262A (en) * | 2015-04-21 | 2015-07-15 | 无锡天脉聚源传媒科技有限公司 | Searching method and searching device |
CN105468729A (en) * | 2015-11-23 | 2016-04-06 | 深圳大粤网络视界有限公司 | Internet mobile vertical search engine |
Also Published As
Publication number | Publication date |
---|---|
CN107784123A (en) | 2018-03-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107180025B (en) | Method and device for identifying new words | |
Johnston | The lexical database of auslan (australian sign language) | |
US20080221863A1 (en) | Search-based word segmentation method and device for language without word boundary tag | |
CN105957518A (en) | Mongolian large vocabulary continuous speech recognition method | |
Mori et al. | A machine learning approach to recipe text processing | |
CN103309926A (en) | Chinese and English-named entity identification method and system based on conditional random field (CRF) | |
CN107784123B (en) | Topic-based search optimization method | |
CN109522418A (en) | A kind of automanual knowledge mapping construction method | |
CN105320644B (en) | A kind of rule-based automatic Chinese syntactic analysis method | |
CN108681574A (en) | A kind of non-true class quiz answers selection method and system based on text snippet | |
Tachicart et al. | Building a Moroccan dialect electronic dictionary (MDED) | |
CN112633012B (en) | Login word replacement method based on entity type matching | |
CN105488098B (en) | A kind of new words extraction method based on field otherness | |
CN111488429A (en) | Short text clustering system based on search engine and short text clustering method thereof | |
CN110390022A (en) | A kind of professional knowledge map construction method of automation | |
JP2572314B2 (en) | Keyword extraction device | |
CN103020311B (en) | A kind of processing method of user search word and system | |
Lin et al. | A study on Chinese spelling check using confusion sets and? n-gram statistics | |
CN108255818B (en) | Combined machine translation method using segmentation technology | |
CN115831117A (en) | Entity identification method, entity identification device, computer equipment and storage medium | |
KR20080019948A (en) | Method for construction of lexical concept network based on lexicon and concept network using the same | |
Malandrakis et al. | Affective language model adaptation via corpus selection | |
US11132505B2 (en) | Chinese composition reviewing system | |
CN108280066B (en) | Off-line translation method from Chinese to English | |
Tianwen et al. | Evaluate the chinese version of machine translation based on perplexity analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||