CN112256822A - Text search method and device, computer equipment and storage medium - Google Patents

Publication number
CN112256822A
Authority
CN
China
Prior art keywords
text
search
word
searched
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011133988.0A
Other languages
Chinese (zh)
Inventor
李志韬
王健宗
吴天博
程宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202011133988.0A
Priority to PCT/CN2020/135243 (WO2021189951A1)
Publication of CN112256822A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the field of artificial intelligence, and in particular to a text search method, apparatus, computer device, and storage medium. By performing similar word matching on a text to be searched using a search engine whose index text library has undergone similar word expansion processing, the accuracy of search results can be improved. The text search method comprises: when a text search operation in a preset search page is detected, determining a text to be searched according to the text search operation; performing, based on a preset search engine, similar word matching on the text to be searched to obtain a target phrase corresponding to the text to be searched, wherein the search engine comprises an index text library subjected to similar word expansion processing; and generating a search result list according to the target phrase, and displaying the search result list on the search page. In addition, the application also relates to blockchain technology, and the index text library can be stored in a blockchain.

Description

Text search method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a text search method, apparatus, computer device, and storage medium.
Background
With the explosive growth of internet content, how to retrieve the required text from massive amounts of network information has become a focus of information processing technology.
Most existing search engines implement text search based on the term frequency-inverse document frequency (TF-IDF) algorithm, which can exactly match a search text against the texts in a database. However, when the wording of the user's search text deviates from the words in the database, the TF-IDF algorithm has difficulty matching the search results the user expects, which degrades the user experience.
Therefore, how to improve the accuracy of text search becomes an urgent problem to be solved.
Disclosure of Invention
The application provides a text search method, apparatus, computer device, and storage medium. Similar word matching is performed on a text to be searched using a search engine that comprises an index text library subjected to similar word expansion processing, which improves the accuracy of the search results.
In a first aspect, the present application provides a text search method, including:
when detecting a text searching operation in a preset searching page, determining a text to be searched according to the text searching operation;
based on a preset search engine, performing similar word matching on the text to be searched to obtain a target phrase corresponding to the text to be searched, wherein the search engine comprises an index text library subjected to similar word expansion processing;
and generating a search result list according to the target phrase, and displaying the search result list on the search page.
In a second aspect, the present application also provides a text search apparatus, including:
a to-be-searched text acquisition module, configured to determine a text to be searched according to a text search operation when the text search operation in a preset search page is detected;
a similar word matching module, configured to perform similar word matching on the text to be searched based on a preset search engine to obtain a target phrase corresponding to the text to be searched, wherein the search engine comprises an index text library subjected to similar word expansion processing;
and a search result generation module, configured to generate a search result list according to the target phrase and display the search result list on the search page.
In a third aspect, the present application further provides a computer device comprising a memory and a processor;
the memory is configured to store a computer program;
the processor is configured to execute the computer program and implement the text search method as described above when executing the computer program.
In a fourth aspect, the present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the text search method as described above.
The application discloses a text search method, apparatus, computer device, and storage medium. By detecting a text search operation in a preset search page and determining the text to be searched according to that operation, the text entered by the user can be determined conveniently. By performing similar word expansion on the index text library in the search engine, the expanded index texts contain more words with the same or similar semantics. By performing similar word matching on the text to be searched using the search engine comprising the expanded index text library, and generating a search result list from the matched target phrases, the accuracy of the search results is effectively improved.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a schematic flow chart of a text search method provided by an embodiment of the present application;
FIG. 2 is a schematic flow chart of the sub-steps of similar word expansion processing for an index text library provided by an embodiment of the present application;
FIG. 3 is a schematic flow chart of determining keywords in a text to be expanded provided by an embodiment of the present application;
FIG. 4 is a schematic flow chart of the sub-steps of determining similar words corresponding to each keyword word vector provided by an embodiment of the present application;
FIG. 5 is a schematic flow chart of the sub-steps of similar word matching for a text to be searched provided by an embodiment of the present application;
FIG. 6 is a scene diagram illustrating a user's text selection operation on a search result list provided by an embodiment of the present application;
FIG. 7 is a schematic block diagram of a text search apparatus provided by an embodiment of the present application;
FIG. 8 is a schematic block diagram of the structure of a computer device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The flow diagrams depicted in the figures are merely illustrative and do not necessarily include all of the elements and operations/steps, nor do they necessarily have to be performed in the order depicted. For example, some operations/steps may be decomposed, combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It is to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
The embodiment of the application provides a text search method and device, computer equipment and a storage medium. The text searching method can be applied to a server or a terminal, and the accuracy of a searching result can be improved by matching similar words of a text to be searched according to a searching engine comprising an index text base subjected to similar word expansion processing.
The server may be an independent server or a server cluster. The terminal can be an electronic device such as a smart phone, a tablet computer, a notebook computer, a desktop computer and the like.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
As shown in fig. 1, the text search method includes steps S10 through S30.
Step S10, when a text search operation in the preset search page is detected, determining the text to be searched according to the text search operation.
It should be noted that the preset search page may be a page in a server or a terminal, where the server or the terminal is provided with a search engine. When a user inputs a search text on the search page, the server or the terminal can invoke the search engine to perform similar word matching on the search text, so as to obtain a search result corresponding to the search text.
The search engine is a system for automatically collecting information from the internet, arranging the information and providing the information to a user for inquiry.
In the embodiment of the application, when the text search operation of a user in a search page is detected, the text to be searched is determined according to the text search operation. Illustratively, the text search operation may include a text input operation and a voice input operation.
In some embodiments, determining the text to be searched according to the text search operation may include: and when the text searching operation is a character input operation, obtaining a text to be searched according to the input character information.
For example, the text information input by the user in the input box of the search page may be obtained, and the input text information may be used as the text to be searched.
In other embodiments, determining the text to be searched according to the text searching operation may include: and when the text searching operation is a voice input operation, performing voice recognition on the input voice information to obtain a text to be searched.
It should be noted that, when the text search method provided by the embodiments of the present application is applied to a terminal, the terminal may also be a telemarketing robot. When a user inputs voice information on a search page of the telemarketing robot, the voice information can be received through the microphone array of the telemarketing robot.
In some embodiments, speech recognition may be performed on the input voice information using a pre-stored, trained speech recognition model.
By way of example, the speech recognition model may include, but is not limited to, a hidden Markov model, a convolutional neural network, a restricted Boltzmann machine, a recurrent neural network, or a long short-term memory (LSTM) network. Before speech recognition, noise reduction may also be performed on the voice information to obtain noise-reduced voice information, for example using an adaptive filter, spectral subtraction, Wiener filtering, or wavelet analysis. The specific noise reduction and speech recognition processes are not described herein.
The text to be searched is determined according to the character input operation and the voice input operation of the user in the preset search page, the text to be searched input by the user can be conveniently determined, and a more convenient and flexible text search mode can be provided for the user.
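The branching in step S10 between character input and voice input can be sketched as follows. This is a minimal illustration only: `SearchOperation`, `determine_query`, and `fake_recognizer` are hypothetical names that do not appear in the application, and the recognizer is a stub standing in for noise reduction plus a trained speech recognition model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SearchOperation:
    """Hypothetical container for what the search page captured."""
    kind: str                      # "text" or "voice"
    text: Optional[str] = None     # characters typed into the input box
    audio: Optional[bytes] = None  # raw audio from the microphone


def determine_query(op: SearchOperation, recognize: Callable[[bytes], str]) -> str:
    """Return the text to be searched for a character or voice input operation."""
    if op.kind == "text":
        return (op.text or "").strip()
    if op.kind == "voice":
        if op.audio is None:
            raise ValueError("voice operation without audio")
        # `recognize` stands in for noise reduction plus a trained recognizer.
        return recognize(op.audio).strip()
    raise ValueError(f"unsupported search operation: {op.kind!r}")


def fake_recognizer(audio: bytes) -> str:
    """Stub for illustration only; a real system would call a trained model."""
    return audio.decode("utf-8")


typed = determine_query(SearchOperation("text", text="  car loan "), fake_recognizer)
spoken = determine_query(SearchOperation("voice", audio=b"car loan"), fake_recognizer)
```

Passing the recognizer in as a callable keeps the dispatch logic independent of whichever speech recognition model is actually deployed.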
In the embodiment of the present application, similar word expansion processing may be performed on an index text library in a search engine in advance, so that the search engine includes the index text library subjected to the similar word expansion processing. Therefore, through the search engine, similar word matching can be carried out on the search text of the user, and the accuracy of the search result is improved.
It is to be understood that similar word expansion processing refers to supplementing the texts in the index text library with similar words, where similar words include synonyms and near-synonyms. Supplementing the texts with similar words makes the index text library contain more words with the same or similar semantics, so that subsequent similar word matching against the user's search text can be performed at the semantic level, which improves matching accuracy.
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating sub-steps of performing similar word expansion processing on an indexed text library according to an embodiment of the present application, and may specifically include the following steps S101 to S103.
Step S101, sequentially taking each text in the index text base as a text to be expanded, and determining at least one keyword in the text to be expanded.
Illustratively, the indexed text library includes at least one text. In the embodiment of the application, each text in the index text base can be sequentially used as a text to be expanded, so that similar word expansion processing can be performed on the text to be expanded, and the index text base after the similar word expansion processing is obtained.
When similar word expansion processing is performed on each text to be expanded, at least one keyword in each text to be expanded needs to be determined. Referring to fig. 3, fig. 3 is a schematic flowchart of a sub-step of determining at least one keyword in a text to be expanded according to an embodiment of the present application, and specifically includes the following steps S1011 and S1012.
Step S1011, performing word segmentation processing on each sentence in the text to be expanded to obtain a plurality of phrases corresponding to the text to be expanded.
For example, the text to be expanded may include a plurality of sentences. During word segmentation, each sentence in the text to be expanded can be segmented.
In some embodiments, the Viterbi algorithm is combined with a Hidden Markov Model (HMM) to perform word segmentation on each sentence in the text to be expanded, so as to obtain a plurality of phrases corresponding to the text to be expanded.
It should be noted that the Viterbi algorithm is commonly used in HMM-based word segmentation to determine the most probable hidden state sequence for a known observation sequence. The five elements of the HMM can be estimated by counting over a corpus: an initial probability matrix, a transition probability matrix, an emission probability matrix, an observation value set, and a state value set. With these three matrices and two sets, word segmentation becomes the problem of finding the optimal hidden state sequence, which the Viterbi algorithm is most often used to solve. The Viterbi algorithm adopts dynamic programming: for each position it recursively computes the most probable path ending in each state and records a back pointer to the preceding state, and the globally optimal hidden state sequence is then recovered by following the back pointers from the best final state.
Illustratively, each sentence in the text to be expanded may be input into the trained HMM model for word segmentation, so as to obtain one or more phrases corresponding to each sentence. Therefore, a plurality of phrases corresponding to the text to be expanded can be obtained.
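As a rough sketch of how Viterbi decoding over an HMM can segment a sentence, the example below assumes the common B/M/E/S tag set (word Begin, Middle, End, Single-character word) as the hidden states. The application does not specify the state set or any parameters, so the tag scheme, the `FLOOR` value for unseen characters, and the toy probability tables are all assumptions chosen for illustration rather than values counted from a corpus.

```python
import math

STATES = ("B", "M", "E", "S")  # assumed tag set: Begin, Middle, End, Single
NEG_INF = float("-inf")
FLOOR = -20.0  # assumed log-probability for characters unseen in a state


def viterbi(obs, start_p, trans_p, emit_p):
    """Return the most probable state sequence for obs (all values are log-probabilities)."""
    # scores[t][s]: best log-probability of any path that ends in state s at position t
    scores = [{s: start_p.get(s, NEG_INF) + emit_p[s].get(obs[0], FLOOR) for s in STATES}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(obs)):
        scores.append({})
        back.append({})
        for s in STATES:
            best_prev, best_score = None, NEG_INF
            for p in STATES:
                score = scores[t - 1][p] + trans_p[p].get(s, NEG_INF)
                if score > best_score:
                    best_prev, best_score = p, score
            scores[t][s] = best_score + emit_p[s].get(obs[t], FLOOR)
            back[t][s] = best_prev
    # A sentence can only end on a word-final tag.
    last = max(("E", "S"), key=lambda s: scores[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


def segment(sentence, start_p, trans_p, emit_p):
    """Cut the sentence after every character tagged E or S."""
    if not sentence:
        return []
    words, current = [], ""
    for ch, tag in zip(sentence, viterbi(sentence, start_p, trans_p, emit_p)):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


def _log(table):
    return {k: math.log(v) for k, v in table.items()}


# Toy parameters (assumed, not estimated from any corpus).
START_P = _log({"B": 0.5, "S": 0.5})
TRANS_P = {
    "B": _log({"M": 0.3, "E": 0.7}),
    "M": _log({"M": 0.4, "E": 0.6}),
    "E": _log({"B": 0.5, "S": 0.5}),
    "S": _log({"B": 0.5, "S": 0.5}),
}
EMIT_P = {
    "B": _log({"北": 0.8}),
    "M": {},
    "E": _log({"京": 0.8}),
    "S": _log({"我": 0.5, "爱": 0.5}),
}

print(segment("我爱北京", START_P, TRANS_P, EMIT_P))  # ['我', '爱', '北京']
```

Working in log-probabilities avoids numeric underflow on long sentences; the missing transitions (for example B to S) are treated as impossible.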
Step S1012, performing keyword extraction on the plurality of phrases according to a preset keyword extraction algorithm to obtain at least one keyword corresponding to the text to be expanded.
Illustratively, the preset keyword extraction algorithm may include the term frequency-inverse document frequency (TF-IDF) algorithm.
In the TF-IDF algorithm, TF denotes the term frequency and IDF denotes the inverse document frequency. TF-IDF is a weighting technique commonly used in information retrieval and data mining to evaluate how important a word is to one document in a document set or corpus.
Illustratively, the term frequency TF is calculated as follows:
$$\mathrm{TF} = \frac{n}{m}$$
In the formula, $n$ represents the number of occurrences of a given word in a document, and $m$ represents the total number of words in the document.
In a corpus, the inverse document frequency IDF is calculated as follows:
$$\mathrm{IDF} = \log\frac{W}{w}$$
In the formula, $W$ represents the total number of documents in the corpus, and $w$ represents the number of documents containing the word.
Illustratively, the TF-IDF value is calculated as follows:
$$\mathrm{TF\text{-}IDF} = \mathrm{TF} \times \mathrm{IDF}$$
It should be noted that the TF-IDF value increases with the number of occurrences of a word in a document and decreases with the number of documents in the corpus that contain the word. Keyword extraction can therefore be understood as calculating the TF-IDF value of each word of the text, sorting the words in descending order of TF-IDF value, and taking the top-ranked words as the keywords.
In the embodiment of the application, TF-IDF values corresponding to phrases in the text to be expanded can be calculated according to a TF-IDF algorithm, and the phrases of which the corresponding TF-IDF values are larger than a preset TF-IDF threshold value are determined as keywords corresponding to the text to be expanded.
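The threshold-based keyword selection described above can be sketched as follows. The sketch assumes the unsmoothed form IDF = log(W / w) and assumes that the text being scored is itself a member of the corpus (so that w is at least 1); the function names, the toy corpus, and the threshold value 0.2 are hypothetical.

```python
import math
from collections import Counter


def tf_idf_scores(doc_tokens, corpus_tokens):
    """Score each distinct word of one segmented document against a segmented corpus."""
    counts = Counter(doc_tokens)
    total_words = len(doc_tokens)    # m: total number of words in the document
    total_docs = len(corpus_tokens)  # W: total number of documents in the corpus
    scores = {}
    for word, n in counts.items():
        tf = n / total_words
        # w: number of documents containing the word (floored at 1 to avoid dividing by zero)
        docs_with_word = max(1, sum(1 for d in corpus_tokens if word in d))
        idf = math.log(total_docs / docs_with_word)
        scores[word] = tf * idf
    return scores


def extract_keywords(doc_tokens, corpus_tokens, threshold):
    """Return words whose TF-IDF value exceeds the threshold, highest first."""
    scores = tf_idf_scores(doc_tokens, corpus_tokens)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, score in ranked if score > threshold]


corpus = [
    ["loan", "rate", "loan", "bank"],
    ["bank", "card", "fee"],
    ["bank", "account", "open"],
]
keywords = extract_keywords(corpus[0], corpus, threshold=0.2)
print(keywords)  # ['loan', 'rate']
```

In this toy corpus "bank" appears in every document, so its IDF is log(3/3) = 0 and it is never selected, whereas "loan" and "rate" are specific to the first document.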
In some embodiments, when extracting the keywords of the text to be expanded, words around the keywords may also be set together as the keywords. Illustratively, verbs and/or nouns around a keyword may be collectively set as the keyword. By setting the words around the keyword as the keyword, the semantic richness of the keyword can be improved, and the readability of the keyword is further improved.
Extracting keywords from the phrases with the TF-IDF algorithm takes full advantage of the algorithm's speed and improves the efficiency of keyword extraction.
Step S102, calling a word vectorization model, and vectorizing each keyword to obtain a keyword word vector corresponding to the text to be expanded.
In some embodiments, the trained word vectorization model is called to vectorize each keyword to obtain a keyword word vector corresponding to the text to be expanded.
Illustratively, the word vectorization model may include a BERT (Bidirectional Encoder Representations from Transformers) model.
In the embodiment of the application, before the word vectorization model is called, the initial word vectorization model can be trained to obtain the trained word vectorization model.
Illustratively, the BERT model may be pre-trained on a large-scale text corpus unrelated to any specific NLP (Natural Language Processing) task, so as to obtain a trained word vectorization model. During training, the BERT model uses an attention mechanism that takes the semantic vector representations of a target word and of each context word as input. First, linear transformations produce a query vector for the target word, a key vector for each context word, and value vectors for the target word and each context word. Then, the similarity between the query vector of the target word and the key vector of each context word is calculated as a weight, and the value vectors of the target word and of the context words are fused by weighted sum. The result is the output of the attention mechanism, namely the semantically enhanced vector representation of the target word.
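The weighted fusion performed by the attention mechanism can be illustrated with a small, dependency-free sketch. It shows a single attention head for one target word, with softmax-normalized, scaled dot-product weights as in the standard Transformer formulation; it is not the BERT implementation itself. The function names are hypothetical, and the identity projection matrices in the usage lines are toy values standing in for learned parameters.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def attend(target, context, w_q, w_k, w_v):
    """Fuse context word vectors into an enhanced vector for the target word.

    `context` is the list of word vectors attended over (it may include the target itself).
    """
    query = matvec(w_q, target)                   # linear transform of the target word
    keys = [matvec(w_k, c) for c in context]      # linear transforms of the context words
    values = [matvec(w_v, c) for c in context]
    scale = math.sqrt(len(query))
    # Similarity between the query and each key, normalized into weights.
    weights = softmax([sum(q * k for q, k in zip(query, key)) / scale for key in keys])
    dim = len(values[0])
    fused = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return fused, weights


identity = [[1.0, 0.0], [0.0, 1.0]]  # toy stand-in for learned projection matrices
fused, weights = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], identity, identity, identity)
```

With these toy inputs the first context vector is aligned with the target and therefore receives the larger weight, so the fused vector leans toward it.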
It should be emphasized that, in order to further ensure the privacy and security of the trained word vectorization model, the trained word vectorization model may also be stored in a node of a blockchain. When vectorizing each keyword, the trained word vectorization model may be retrieved from the node of the blockchain.
By using the BERT model to vectorize each keyword, because the BERT model can extract semantic information around the keyword and blend the semantic information into word vectors, the keyword word vectors with enhanced semantics can be obtained, and more similar words with the same or similar semantics as the keywords can be obtained subsequently.
Step S103, determining at least one similar word corresponding to each keyword word vector in the index text base, and adding the at least one similar word to the text to be expanded.
Illustratively, the indexed text library includes a plurality of phrases. It will be appreciated that the indexed text library comprises at least one text, wherein each text comprises a plurality of sentences, and thus, the indexed text library comprises a plurality of word groups.
In the embodiment of the present application, similarity calculation may be performed on each keyword-word vector and all phrases in the index text library to determine a similar word corresponding to each keyword-word vector.
Referring to fig. 4, fig. 4 is a schematic flowchart of the sub-step of determining at least one similar word corresponding to each keyword word vector in the index text library in step S103, and may specifically include the following steps S1031 to S1033.
Step S1031, based on a preset similarity algorithm, calculating first similarities between each keyword word vector and word vectors corresponding to a plurality of phrases in the index text library.
Illustratively, the preset similarity algorithm may include, but is not limited to, Euclidean distance, cosine similarity, Manhattan distance, and Chebyshev distance.
In the embodiment of the present application, the similarity between each keyword word vector and the word vectors corresponding to the multiple word groups in the index text library may be calculated according to a cosine similarity algorithm, and may of course be calculated according to other similarity algorithms, and the specific process is not described herein again.
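For reference, the three distance measures named as alternatives to cosine similarity can be sketched as below. Because they are distances rather than similarities (smaller means closer), some mapping is needed before a "greater than threshold" test can be applied; the `1 / (1 + d)` conversion shown here is one common choice and is an assumption, not something stated in the application.

```python
import math


def euclidean_distance(v1, v2):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))


def manhattan_distance(v1, v2):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(v1, v2))


def chebyshev_distance(v1, v2):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(v1, v2))


def distance_to_similarity(distance):
    """Assumed mapping from a distance in [0, inf) to a similarity in (0, 1]."""
    return 1.0 / (1.0 + distance)
```

For the vectors `[0, 0]` and `[3, 4]`, the Euclidean, Manhattan, and Chebyshev distances are 5, 7, and 4 respectively.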
It should be noted that the cosine similarity algorithm uses the cosine of the angle between two vectors in the vector space as the measure of the similarity between the two vectors. Illustratively, the cosine of the angle is calculated as follows:
$$\cos\theta = \frac{\sum_{i=1}^{n} V_{1i} V_{2i}}{\sqrt{\sum_{i=1}^{n} V_{1i}^{2}}\,\sqrt{\sum_{i=1}^{n} V_{2i}^{2}}}$$
In the formula, $\theta$ represents the angle between vector $V_1$ and vector $V_2$, $V_{1i}$ and $V_{2i}$ represent their $i$-th components, and $n$ represents the dimension of $V_1$ and $V_2$. In general the cosine value $\cos\theta$ lies in $[-1, 1]$; it lies in $[0, 1]$ when all vector components are non-negative.
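A direct transcription of the cosine formula into code might look like the following sketch. Returning 0.0 for a zero-length vector is an assumed convention, since the cosine is undefined in that case.

```python
import math


def cosine_similarity(v1, v2):
    """Cosine of the angle between two equal-length vectors."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # assumed convention: undefined cosine treated as no similarity
    return dot / (norm1 * norm2)
```

Orthogonal vectors such as `[1, 0]` and `[0, 1]` score 0, parallel vectors such as `[1, 2]` and `[2, 4]` score 1 (up to rounding), and opposite vectors such as `[1, 0]` and `[-1, 0]` score -1.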
In some embodiments, before calculating the first similarity between each keyword word vector and the word vectors corresponding to the multiple phrases in the index text library, the multiple phrases in the index text library may be vectorized to obtain the word vectors corresponding to the multiple phrases.
Illustratively, each keyword word vector may be represented as $V_0$, and the word vectors corresponding to the multiple phrases in the index text library may be represented as $v_1, v_2, \ldots, v_k$, where $k$ represents the number of word vectors.
For example, the cosine of the angle between the keyword word vector $V_0$ and each of the word vectors $v_1, v_2, \ldots, v_k$ in the index text library may be calculated separately, to obtain the first similarity between $V_0$ and each word vector in the index text library.
Step S1032, determining, as a target word vector, each word vector whose first similarity is greater than a first preset similarity threshold.
For example, the first preset similarity threshold may be set according to actual conditions, and specific values are not limited herein.
Illustratively, after the first similarity between the keyword word vector $V_0$ and each word vector in the index text library is obtained, each word vector whose first similarity is greater than the first preset similarity threshold is determined as a target word vector of the keyword word vector $V_0$.
Step S1033, determining the word group corresponding to the target word vector as a similar word corresponding to each keyword word vector.
It can be understood that, since the word vectors in the index text library are obtained by vectorizing a plurality of word groups in the index text library, the word vectors in the index text library all have corresponding word groups.
In some embodiments, the word group corresponding to the target word vector of the keyword word vector is determined as the similar word corresponding to each keyword word vector.
Each keyword word vector has at least one target word vector, so at least one similar word can be obtained for each keyword word vector.
In the embodiment of the present application, after determining at least one similar word corresponding to each keyword word vector in the indexed text library, at least one similar word may be added to the text to be expanded. And adding similar words to each text in the index text library in sequence to obtain the index text library subjected to similar word expansion processing.
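Steps S1031 to S1033 and the subsequent addition of similar words to the text can be sketched end to end as follows. The word vectors here are hand-written two-dimensional toy values standing in for the output of a word vectorization model, and the function names, the threshold 0.8, and the choice to skip the keyword itself are assumptions made for illustration.

```python
import math


def cosine_similarity(v1, v2):
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norm if norm else 0.0


def find_similar_words(keyword, vectors, threshold):
    """Phrases in the library whose first similarity to the keyword exceeds the threshold."""
    keyword_vector = vectors[keyword]
    return [
        phrase
        for phrase, vector in vectors.items()
        if phrase != keyword  # assumption: a keyword is not its own similar word
        and cosine_similarity(keyword_vector, vector) > threshold
    ]


def expand_text(text_tokens, keywords, vectors, threshold):
    """Append the similar words of every keyword to a copy of the text."""
    expanded = list(text_tokens)
    for keyword in keywords:
        for word in find_similar_words(keyword, vectors, threshold):
            if word not in expanded:
                expanded.append(word)
    return expanded


# Toy word vectors for the phrases of a hypothetical index text library.
library_vectors = {
    "loan": [1.0, 0.1],
    "credit": [0.9, 0.2],
    "weather": [0.0, 1.0],
}
expanded = expand_text(["loan", "rate"], ["loan"], library_vectors, threshold=0.8)
print(expanded)  # ['loan', 'rate', 'credit']
```

Repeating `expand_text` for every text in the library yields the index text library subjected to similar word expansion processing.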
It should be emphasized that, in order to further ensure the privacy and security of the index text library subjected to similar word expansion processing, that index text library may also be stored in a node of a blockchain.
The similarity between each keyword word vector and the word vector in the index text library is calculated according to a similarity algorithm, and the similar words corresponding to each keyword word vector are added to the text to be expanded, so that the number of the similar words of each text in the index text library can be enriched.
Step S20, performing similar word matching on the text to be searched based on a preset search engine to obtain a target phrase corresponding to the text to be searched, where the search engine includes an index text library that is subjected to similar word expansion processing.
In the embodiment of the application, after the text to be searched is determined according to the text searching operation, similar word matching can be performed on the text to be searched based on a preset search engine, so as to obtain the target phrase corresponding to the text to be searched.
The search engine includes an index text library subjected to similar word expansion processing; for the specific similar word expansion process, refer to the detailed description of the above embodiments, which is not repeated here.
By performing similar word matching on the text to be searched input by the user through a search engine containing an index text library subjected to similar word expansion processing, a target phrase semantically similar to the text to be searched can be matched, which can effectively improve the accuracy of the search result.
Referring to fig. 5, fig. 5 is a schematic flowchart of the substep of performing similar word matching on the text to be searched in step S20 to obtain the target phrase corresponding to the text to be searched, and specifically may include the following steps S201 to S203.
Step S201, performing word segmentation processing on the text to be searched to obtain a phrase set corresponding to the text to be searched.
Illustratively, if the text to be searched comprises a sentence ABC, word segmentation of the sentence ABC yields the phrase set (A, B, C) corresponding to the text to be searched.
Step S202, calculating second similarity between the phrase set and a plurality of phrases in the index text library.
Illustratively, a second similarity between the set of phrases and the plurality of phrases in the indexed text library may be calculated according to a cosine similarity algorithm.
For example, if the index text library includes phrase A1, phrase A2, phrase A3, and phrase A4, the second similarities between the phrase set (A, B, C) and phrases A1, A2, A3, and A4 are calculated respectively, yielding a second similarity α1 for phrase A1, α2 for phrase A2, α3 for phrase A3, and α4 for phrase A4.
Step S203, using at least one phrase with the second similarity greater than a second preset similarity threshold as a target phrase corresponding to the phrase set.
For example, if the phrases with the second similarity greater than the second preset similarity threshold include phrase A1, phrase A2, and phrase A3, it may be determined that the target phrases corresponding to the phrase set (A, B, C) are phrase A1, phrase A2, and phrase A3.
For example, the second preset similarity threshold may be set according to actual conditions, and specific values are not limited herein.
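Steps S201 to S203 can be sketched as follows. The application does not specify how a similarity between a phrase set and a single phrase is computed, so this sketch assumes the phrase set is represented by the average of its word vectors. The whitespace segmenter, the function names, the toy vectors, and the threshold are likewise hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def segment(text):
    """Step S201: stand-in word segmentation (whitespace split, an assumption)."""
    return text.split()


def average_vector(words, word_vectors):
    """Represent a phrase set by the mean of its known word vectors (an assumption)."""
    vectors = [word_vectors[w] for w in words if w in word_vectors]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def match_target_phrases(query, word_vectors, index_phrases, threshold):
    """Steps S202 and S203: score each index phrase and keep those above the threshold."""
    query_vector = average_vector(segment(query), word_vectors)
    if query_vector is None:
        return {}
    scores = {}
    for phrase in index_phrases:
        if phrase not in word_vectors:
            continue
        similarity = cosine_similarity(query_vector, word_vectors[phrase])
        if similarity > threshold:
            scores[phrase] = similarity
    return scores


# Hypothetical two-dimensional word vectors.
word_vectors = {
    "cheap": [1.0, 0.0],
    "flights": [0.0, 1.0],
    "A1": [0.7, 0.7],
    "A2": [1.0, 0.1],
    "A3": [-1.0, 0.0],
}
targets = match_target_phrases(
    "cheap flights", word_vectors, ["A1", "A2", "A3"], threshold=0.7
)
```

Here phrases A1 and A2 exceed the second preset similarity threshold and become target phrases, while A3 is discarded.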
Step S30, generating a search result list according to the target phrase, and displaying the search result list on the search page.
In some embodiments, generating the search result list from the target phrase may include: acquiring a target text corresponding to the target phrase; and sorting the target texts according to the second similarity corresponding to the target phrases to obtain the search result list.
For example, if the target phrases corresponding to the phrase set include phrase A1, phrase A2, and phrase A3, the texts in which phrase A1, phrase A2, and phrase A3 are located may be used as the target texts. For example, the target texts include text 1, text 2, and text 3.
For example, when generating the search result list, the target texts may be arranged in descending order of the second similarity corresponding to the target phrases. If the second similarities corresponding to phrase A1, phrase A2, and phrase A3 satisfy α1 > α2 > α3, the resulting search result list is as shown in Table 1.
TABLE 1
Text 1
Text 2
Text 3
For example, when repeated texts exist in the target text, one of the repeated texts can be retained, and other repeated texts can be eliminated.
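Generating the search result list, including the removal of repeated texts, can be sketched as follows. The function name, the phrase-to-text mapping, and the similarity values are hypothetical and chosen only to reproduce the ordering of Table 1.

```python
def build_search_result_list(target_scores, phrase_to_text):
    """Sort target texts by descending second similarity, keeping one copy of each text."""
    ranked = sorted(target_scores.items(), key=lambda item: item[1], reverse=True)
    results = []
    seen = set()
    for phrase, _similarity in ranked:
        text = phrase_to_text[phrase]
        if text not in seen:  # retain the first occurrence, drop later repeats
            seen.add(text)
            results.append(text)
    return results


# Hypothetical second similarities and the texts in which each phrase is located.
target_scores = {"A1": 0.9, "A2": 0.8, "A3": 0.7, "A4": 0.6}
phrase_to_text = {"A1": "Text 1", "A2": "Text 2", "A3": "Text 3", "A4": "Text 1"}

result_list = build_search_result_list(target_scores, phrase_to_text)
# result_list is ["Text 1", "Text 2", "Text 3"]
```

Because phrase A4 is located in a text that already appears in the list, its lower-ranked duplicate is eliminated.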
In some embodiments, after generating the search result list, the search result list may be displayed on a search page. For example, the search result list may be rendered onto a search page to display the search result list on the search page.
In some embodiments, after displaying the search result list on the search page, the method further comprises: when a text selection operation for the search result list is received, determining a selected text according to the text selection operation; and determining the ranking value of the selected text in the search result list, and performing similar word expansion processing on the index text library when the ranking value is not the preset ranking value.
Referring to fig. 6, fig. 6 is a scene diagram illustrating a user's text selection operation on a search result list according to an embodiment of the present application. As shown in fig. 6, after inputting the text to be searched on the search page, the user may select a text in the search result list to read its content. The text selection operation of the user on the search result list in the search page can therefore be received, and the text selected by the user is determined according to the text selection operation; the ranking value of the selected text in the search result list is then determined.
For example, the preset ranking value can be set according to actual conditions. For example, the preset ranking value may be the first rank, or may include both the first rank and the second rank.
For example, when the ranking value of the selected text is the preset ranking value, for instance the first rank, this indicates that the search result is accurate.
For example, when the ranking value of the selected text is not the preset ranking value, this indicates that the search result is not good enough: the text desired by the user is not placed near the top of the search result list.
In some embodiments, when the ranking value is not the preset ranking value, the index text library is subjected to similar word expansion processing. For the specific process of performing similar word expansion processing on the index text library, refer to the detailed description of the above embodiments, which is not repeated here.
By receiving the text selection operation of the user in the search result list and determining from it whether similar word expansion processing needs to be performed on the index text library again, the index text library can be expanded again when necessary, which further improves the search accuracy of the search engine.
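The ranking check that triggers re-expansion can be sketched as follows. The function name and the representation of the preset ranking value as a set of 1-based ranks are assumptions of this sketch.

```python
def needs_reexpansion(result_list, selected_text, preset_ranks=frozenset({1})):
    """Return True when the selected text's rank is not a preset ranking value.

    Ranks are 1-based positions in the search result list. Raises ValueError
    if the selected text is not in the list.
    """
    rank = result_list.index(selected_text) + 1
    return rank not in preset_ranks


results = ["Text 1", "Text 2", "Text 3"]

# The user selected the first result: the search result is considered accurate.
accurate = not needs_reexpansion(results, "Text 1")

# The user selected the third result: similar word expansion should be run again.
expand_again = needs_reexpansion(results, "Text 3")
```

Passing `preset_ranks=frozenset({1, 2})` corresponds to the variant in which the preset ranking value includes both the first and second ranks.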
According to the text search method provided by the above embodiments, the text to be searched is determined according to the character input operation or voice input operation of the user in the preset search page, so that the text input by the user can be determined conveniently and a more convenient and flexible text search mode is provided. Similar words are supplemented to the texts in the index text library so that the library contains more words with the same or similar semantics; subsequent similar word matching against the user's search text can then be performed semantically, which improves matching accuracy. Keywords are extracted from the plurality of phrases according to the term frequency-inverse document frequency (TF-IDF) algorithm, which exploits the speed of that algorithm and improves the efficiency of keyword extraction. Each keyword is vectorized by using a BERT model; because the BERT model can extract semantic information around a keyword and integrate it into the word vector, semantically enhanced keyword word vectors are obtained, so that more similar words with the same or similar semantics as the keywords can be obtained subsequently. The similarity between each keyword word vector and the word vectors in the index text library is calculated according to a similarity algorithm, and the similar words corresponding to each keyword word vector are added to the text to be expanded, which increases the number of similar words in each text of the index text library. Similar word matching is performed on the text to be searched through a search engine containing the index text library subjected to similar word expansion processing, so that a target phrase semantically similar to the text to be searched can be matched and the accuracy of the search result can be effectively improved. Finally, by receiving the text selection operation of the user in the search result list and determining from it whether similar word expansion processing needs to be performed on the index text library again, the index text library can be expanded again when necessary, which further improves the search accuracy of the search engine.
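The term frequency-inverse document frequency (TF-IDF) keyword extraction mentioned in the summary above can be sketched as follows. Several TF-IDF variants exist; this sketch assumes one common form, tf = count / document length and idf = ln(N / document frequency). The function name, the `top_k` parameter, and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_keywords(documents, doc_index, top_k=2):
    """Return the top_k phrases of one segmented document ranked by TF-IDF score."""
    n_docs = len(documents)

    # Document frequency: the number of documents containing each phrase.
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))

    doc = documents[doc_index]
    counts = Counter(doc)
    scores = {
        word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
        for word, count in counts.items()
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


# Hypothetical texts, already segmented into phrases.
documents = [
    ["search", "engine", "index", "search"],
    ["engine", "index", "library"],
    ["index", "text", "library"],
]
keywords = tfidf_keywords(documents, doc_index=0, top_k=2)
# keywords is ["search", "engine"]
```

The phrase "index" occurs in every document, so its inverse document frequency is zero and it is not selected as a keyword.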
Referring to fig. 7, fig. 7 is a schematic block diagram of a text search apparatus 1000 according to an embodiment of the present application, where the text search apparatus is configured to perform the text search method described above. The text search apparatus may be configured in a server or a terminal.
As shown in fig. 7, the text search device 1000 includes: a text to be searched acquisition module 1001, a similar word matching module 1002 and a search result generation module 1003.
A text to be searched acquisition module 1001 configured to determine, when a text search operation in a preset search page is detected, a text to be searched according to the text search operation;
a similar word matching module 1002, configured to perform similar word matching on the text to be searched based on a preset search engine, to obtain a target phrase corresponding to the text to be searched, where the search engine includes an index text library subjected to similar word expansion processing;
a search result generating module 1003, configured to generate a search result list according to the target phrase, and display the search result list on the search page.
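The three-module structure of the apparatus can be sketched as a simple pipeline. The class name, the field names, and the stub callables below are hypothetical; they only illustrate how modules 1001 to 1003 hand data to one another and do not reproduce the apparatus of fig. 7.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TextSearchDevice:
    # Module 1001: determines the text to be searched from a search operation.
    acquire_text: Callable[[str], str]
    # Module 1002: matches similar words and returns target phrases with similarities.
    match_similar_words: Callable[[str], Dict[str, float]]
    # Module 1003: turns target phrases into an ordered search result list.
    generate_results: Callable[[Dict[str, float]], List[str]]

    def search(self, raw_input: str) -> List[str]:
        text = self.acquire_text(raw_input)
        target_phrases = self.match_similar_words(text)
        return self.generate_results(target_phrases)


# Stub modules used only to exercise the pipeline.
device = TextSearchDevice(
    acquire_text=lambda raw: raw.strip(),
    match_similar_words=lambda text: {"A1": 0.9, "A2": 0.8} if text else {},
    generate_results=lambda targets: sorted(targets, key=targets.get, reverse=True),
)
results = device.search("  query text  ")
# results is ["A1", "A2"]
```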
It should be noted that, as will be clear to those skilled in the art, for convenience and brevity of description, the specific working processes of the apparatus and the modules described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The apparatus described above may be implemented in the form of a computer program which is executable on a computer device as shown in fig. 8.
Referring to fig. 8, fig. 8 is a schematic block diagram of a computer device according to an embodiment of the present disclosure. The computer device may be a server or a terminal.
Referring to fig. 8, the computer device includes a processor and a memory connected by a system bus, wherein the memory may include a nonvolatile storage medium and an internal memory.
The processor is used for providing calculation and control capability and supporting the operation of the whole computer equipment.
The internal memory provides an environment for running the computer program stored in the non-volatile storage medium; when executed by the processor, the computer program causes the processor to perform any of the text search methods.
It should be understood that the processor may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor or any conventional processor.
In one embodiment, the processor is configured to execute a computer program stored in the memory to implement the following steps:
when detecting a text searching operation in a preset searching page, determining a text to be searched according to the text searching operation; based on a preset search engine, performing similar word matching on the text to be searched to obtain a target phrase corresponding to the text to be searched, wherein the search engine comprises an index text library subjected to similar word expansion processing; and generating a search result list according to the target phrase, and displaying the search result list on the search page.
In one embodiment, the index text library comprises at least one text; before determining the text to be searched according to the text search operation when a text search operation in the preset search page is detected, the processor is further configured to implement:
sequentially taking each text in the index text base as a text to be expanded, and determining at least one keyword in the text to be expanded; calling a word vectorization model, and vectorizing each keyword to obtain a keyword word vector corresponding to the text to be expanded; determining at least one similar word corresponding to each keyword word vector in the index text base, and adding the at least one similar word to the text to be expanded.
In one embodiment, the processor, in implementing determining at least one keyword in the text to be augmented, is configured to implement:
performing word segmentation processing on each sentence in the text to be expanded to obtain a plurality of word groups corresponding to the text to be expanded; and extracting keywords from the phrases according to a preset keyword extraction algorithm to obtain at least one keyword corresponding to the text to be expanded.
In one embodiment, the index text library comprises a plurality of phrases; when determining at least one similar word corresponding to each keyword word vector in the index text library, the processor is configured to implement:
calculating first similarity between each keyword word vector and word vectors corresponding to a plurality of phrases in the index text library based on a preset similarity algorithm; determining a corresponding target word vector with a first similarity larger than a first preset similarity threshold; and determining the phrases corresponding to the target word vectors as similar words corresponding to each keyword word vector.
In one embodiment, the text search operation includes a character input operation and a voice input operation; when determining the text to be searched according to the text search operation, the processor is configured to implement:
when the text searching operation is a character input operation, obtaining the text to be searched according to input character information; and when the text searching operation is a voice input operation, performing voice recognition on input voice information to obtain the text to be searched.
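The dispatch between character input and voice input can be sketched as follows. The function name, the operation-type strings, and the `recognize_speech` callable are hypothetical; the application does not name a speech recognition component, so the recognizer is passed in as a parameter and replaced by a stub here.

```python
from typing import Callable, Union


def determine_text_to_search(
    operation_type: str,
    payload: Union[str, bytes],
    recognize_speech: Callable[[bytes], str],
) -> str:
    """Return the text to be searched for a character or voice input operation."""
    if operation_type == "character":
        # The input character information is used directly.
        return str(payload).strip()
    if operation_type == "voice":
        # The input voice information is converted to text by speech recognition.
        return recognize_speech(payload)
    raise ValueError(f"unsupported text search operation: {operation_type}")


def fake_recognizer(audio: bytes) -> str:
    """Stub recognizer standing in for a real speech recognition service."""
    return "recognized query"


typed = determine_text_to_search("character", " weather today ", fake_recognizer)
spoken = determine_text_to_search("voice", b"\x00\x01", fake_recognizer)
```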
In one embodiment, when implementing similar word matching on the text to be searched to obtain a target phrase corresponding to the text to be searched, the processor is configured to implement:
performing word segmentation processing on the text to be searched to obtain a word group set corresponding to the text to be searched; calculating a second similarity between the phrase set and a plurality of phrases in the index text library; and taking at least one phrase with the second similarity larger than a second preset similarity threshold as a target phrase corresponding to the phrase set.
In one embodiment, the processor, when implementing generating a search result list from the target phrase, is configured to implement:
acquiring a target text corresponding to the target phrase; and sorting the target texts according to the second similarity corresponding to the target phrases to obtain the search result list.
In one embodiment, the processor, after enabling display of the search result list on the search page, is further configured to enable:
when a text selection operation on the search result list is received, determining a selected text according to the text selection operation; and determining the ranking value of the selected text in the search result list, and performing similar word expansion processing on the index text library when the ranking value is not a preset ranking value.
The embodiment of the application further provides a computer-readable storage medium storing a computer program, the computer program comprising program instructions; when the program instructions are executed by a processor, any text search method provided by the embodiments of the application is implemented.
The computer-readable storage medium may be an internal storage unit of the computer device described in the foregoing embodiment, for example, a hard disk or a memory of the computer device. The computer readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital Card (SD Card), a Flash memory Card (Flash Card), and the like provided on the computer device.
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, each containing information about a batch of network transactions, which is used to verify the validity (anti-counterfeiting) of the information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
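The idea of data blocks linked by a cryptographic method can be illustrated with a minimal hash chain. This is only a sketch of the linking principle using SHA-256 from the Python standard library; it omits consensus, networking, and every other part of a real blockchain platform, and the block layout and function names are assumptions.

```python
import hashlib
import json


def block_hash(data, previous_hash):
    """SHA-256 digest over a block's data and the hash of the previous block."""
    body = json.dumps({"data": data, "previous_hash": previous_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def make_block(data, previous_hash):
    """Create a block that records its predecessor's hash and its own hash."""
    return {
        "data": data,
        "previous_hash": previous_hash,
        "hash": block_hash(data, previous_hash),
    }


def verify_chain(chain):
    """Check that every block's hash is correct and links to its predecessor."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block["data"], block["previous_hash"]):
            return False
        if i > 0 and block["previous_hash"] != chain[i - 1]["hash"]:
            return False
    return True


# Store two hypothetical snapshots of the expanded index text library.
genesis = make_block({"library": ["car"]}, previous_hash="0" * 64)
second = make_block({"library": ["car", "automobile"]}, previous_hash=genesis["hash"])
chain = [genesis, second]

valid_before = verify_chain(chain)
chain[0]["data"] = {"library": ["tampered"]}  # modify stored data
valid_after = verify_chain(chain)
```

Changing the data in any block invalidates that block's hash, which is how the chain supports verification of stored information.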
While the invention has been described with reference to specific embodiments, the scope of the invention is not limited thereto, and those skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A text search method, comprising:
when detecting a text searching operation in a preset searching page, determining a text to be searched according to the text searching operation;
based on a preset search engine, performing similar word matching on the text to be searched to obtain a target phrase corresponding to the text to be searched, wherein the search engine comprises an index text library subjected to similar word expansion processing;
and generating a search result list according to the target phrase, and displaying the search result list on the search page.
2. The text search method of claim 1, wherein the index text library comprises at least one text; before the determining the text to be searched according to the text search operation when the text search operation in the preset search page is detected, the method further comprises:
sequentially taking each text in the index text base as a text to be expanded, and determining at least one keyword in the text to be expanded;
calling a word vectorization model, and vectorizing each keyword to obtain a keyword word vector corresponding to the text to be expanded;
determining at least one similar word corresponding to each keyword word vector in the index text base, and adding the at least one similar word to the text to be expanded.
3. The text search method of claim 2, wherein the determining at least one keyword in the text to be expanded comprises:
performing word segmentation processing on each sentence in the text to be expanded to obtain a plurality of word groups corresponding to the text to be expanded;
and extracting keywords from the phrases according to a preset keyword extraction algorithm to obtain at least one keyword corresponding to the text to be expanded.
4. The text search method of claim 2, wherein the indexed text library comprises a plurality of phrases; the determining at least one similar word corresponding to each keyword word vector in the index text library includes:
calculating first similarity between each keyword word vector and word vectors corresponding to a plurality of phrases in the index text library based on a preset similarity algorithm;
determining a corresponding target word vector with a first similarity larger than a first preset similarity threshold;
and determining the phrases corresponding to the target word vectors as similar words corresponding to each keyword word vector.
5. The text search method according to claim 1, wherein the text search operation includes a character input operation and a voice input operation; the determining the text to be searched according to the text search operation comprises:
when the text searching operation is a character input operation, obtaining the text to be searched according to input character information;
and when the text searching operation is a voice input operation, performing voice recognition on input voice information to obtain the text to be searched.
6. The text search method according to claim 1, wherein the performing similar word matching on the text to be searched to obtain a target phrase corresponding to the text to be searched comprises:
performing word segmentation processing on the text to be searched to obtain a word group set corresponding to the text to be searched;
calculating a second similarity between the phrase set and a plurality of phrases in the index text library;
taking at least one phrase with the second similarity larger than a second preset similarity threshold as a target phrase corresponding to the phrase set;
the generating of the search result list according to the target phrase includes:
acquiring a target text corresponding to the target phrase;
and sorting the target texts according to the second similarity corresponding to the target phrases to obtain the search result list.
7. The text search method of any one of claims 1-6, wherein after displaying the search result list on the search page, further comprising:
when a text selection operation on the search result list is received, determining a selected text according to the text selection operation;
and determining the ranking value of the selected text in the search result list, and performing similar word expansion processing on the index text library when the ranking value is not a preset ranking value.
8. A text search apparatus, comprising:
a to-be-searched text acquisition module, configured to determine, when a text search operation in a preset search page is detected, a text to be searched according to the text search operation;
a similar word matching module, configured to perform similar word matching on the text to be searched based on a preset search engine to obtain a target phrase corresponding to the text to be searched, wherein the search engine comprises an index text library subjected to similar word expansion processing;
and a search result generation module, configured to generate a search result list according to the target phrase and display the search result list on the search page.
9. A computer device, wherein the computer device comprises a memory and a processor;
the memory for storing a computer program;
the processor for executing the computer program and implementing the text search method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to implement the text search method according to any one of claims 1 to 7.
CN202011133988.0A 2020-10-21 2020-10-21 Text search method and device, computer equipment and storage medium Pending CN112256822A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011133988.0A CN112256822A (en) 2020-10-21 2020-10-21 Text search method and device, computer equipment and storage medium
PCT/CN2020/135243 WO2021189951A1 (en) 2020-10-21 2020-12-10 Text search method and apparatus, and computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011133988.0A CN112256822A (en) 2020-10-21 2020-10-21 Text search method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112256822A true CN112256822A (en) 2021-01-22

Family

ID=74263686

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011133988.0A Pending CN112256822A (en) 2020-10-21 2020-10-21 Text search method and device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112256822A (en)
WO (1) WO2021189951A1 (en)

Also Published As

Publication number Publication date
WO2021189951A1 (en) 2021-09-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination