WO2021189951A1 - Text search method and apparatus, computer device, and storage medium - Google Patents

Text search method and apparatus, computer device, and storage medium

Info

Publication number
WO2021189951A1
WO2021189951A1 PCT/CN2020/135243 CN2020135243W WO2021189951A1 WO 2021189951 A1 WO2021189951 A1 WO 2021189951A1 CN 2020135243 W CN2020135243 W CN 2020135243W WO 2021189951 A1 WO2021189951 A1 WO 2021189951A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
search
searched
word
expanded
Prior art date
Application number
PCT/CN2020/135243
Other languages
English (en)
Chinese (zh)
Inventor
李志韬
王健宗
吴天博
程宁
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021189951A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms

Definitions

  • This application relates to the field of artificial intelligence, and in particular to a text search method, device, computer equipment and storage medium.
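  • This application provides a text search method, which includes:
  • when a text search operation in a preset search page is detected, determining the text to be searched according to the text search operation;
  • performing similar word matching on the text to be searched based on a preset search engine to obtain a target phrase corresponding to the text to be searched, wherein the search engine includes an indexed text library that has been expanded with similar words;
  • generating a search result list according to the target phrase, and displaying the search result list on the search page.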
  • This application also provides a text search device, which includes:
  • the to-be-searched text obtaining module is configured to determine the to-be-searched text according to the text search operation when a text search operation in a preset search page is detected;
  • the similar word matching module is configured to perform similar word matching on the text to be searched based on a preset search engine to obtain the target phrase corresponding to the text to be searched, wherein the search engine includes an indexed text library that has been expanded with similar words;
  • the search result generation module is configured to generate a search result list according to the target phrase, and display the search result list on the search page.
  • the application also provides a computer device, which includes a memory and a processor;
  • the memory is used to store a computer program
  • the processor is configured to execute the computer program and implement the following steps when the computer program is executed:
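  • when a text search operation in a preset search page is detected, determining the text to be searched according to the text search operation;
  • performing similar word matching on the text to be searched based on a preset search engine to obtain a target phrase corresponding to the text to be searched, wherein the search engine includes an indexed text library that has been expanded with similar words;
  • generating a search result list according to the target phrase, and displaying the search result list on the search page.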
  • the present application also provides a computer-readable storage medium, the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the processor implements the following steps:
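  • when a text search operation in a preset search page is detected, determining the text to be searched according to the text search operation;
  • performing similar word matching on the text to be searched based on a preset search engine to obtain a target phrase corresponding to the text to be searched, wherein the search engine includes an indexed text library that has been expanded with similar words;
  • generating a search result list according to the target phrase, and displaying the search result list on the search page.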
  • FIG. 1 is a schematic flowchart of a text search method provided by an embodiment of the present application;
  • FIG. 2 is a schematic flowchart of the sub-steps of similar word expansion processing on an indexed text library provided by an embodiment of the present application;
  • FIG. 3 is a schematic flowchart of determining keywords in a text to be expanded according to an embodiment of the present application;
  • FIG. 4 is a schematic flowchart of the sub-steps of determining similar words corresponding to each keyword word vector provided by an embodiment of the present application;
  • FIG. 5 is a schematic flowchart of the sub-steps of matching similar words in the text to be searched according to an embodiment of the present application;
  • FIG. 6 is a schematic diagram of a scene of a user's text selection operation on a search result list provided by an embodiment of the present application;
  • FIG. 7 is a schematic block diagram of a text search device provided by an embodiment of the present application;
  • FIG. 8 is a schematic block diagram of the structure of a computer device provided by an embodiment of the present application.
  • the embodiments of the present application provide a text search method, device, computer equipment, and storage medium.
  • the text search method can be applied to a server or a terminal.
  • By matching similar words to the search text with a search engine that contains an indexed text library processed by similar word expansion, the accuracy of search results can be improved.
  • the server can be an independent server or a server cluster.
  • the terminal can be an electronic device such as a smart phone, a tablet computer, a notebook computer, and a desktop computer.
  • the text search method includes steps S10 to S30.
  • Step S10: When a text search operation in a preset search page is detected, the text to be searched is determined according to the text search operation.
  • the preset search page may be a page in a server or a terminal, where a search engine is provided in the server or the terminal.
  • the server or terminal can call the search engine to match similar words to the search text, thereby obtaining search results corresponding to the search text.
  • search engine refers to a system that automatically collects information from the Internet, organizes the information and provides it to users for inquiries.
  • the text to be searched is determined according to the text search operation.
  • the text search operation may include a text input operation and a voice input operation.
  • determining the text to be searched according to the text search operation may include: when the text search operation is a text input operation, obtaining the text to be searched according to the input text information.
  • the text information input by the user in the input box of the search page can be obtained, and the input text information can be used as the text to be searched.
  • determining the text to be searched according to the text search operation may include: when the text search operation is a voice input operation, performing voice recognition on the input voice information to obtain the text to be searched.
  • the terminal may also be a telemarketing robot.
  • the voice information input by the user can be received through the microphone array of the telemarketing robot.
  • When voice recognition is performed on the input voice information, the voice information may be recognized with a pre-stored, trained speech recognition model.
  • The speech recognition model may include, but is not limited to, a hidden Markov model, a convolutional neural network, a restricted Boltzmann machine, a recurrent neural network, and a long short-term memory (LSTM) network.
  • noise reduction processing may also be performed on the voice information to obtain the voice information after noise reduction.
  • the noise reduction can be performed according to methods such as adaptive filter, spectral subtraction, Wiener filtering, or wavelet analysis. The specific noise reduction process and speech recognition process will not be repeated here.
  • the text to be searched input by the user can be easily determined, and a more convenient and flexible text search method can also be provided for the user.
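  • The branching in step S10 between text input and voice input can be illustrated with a minimal sketch. The function name `determine_search_text`, the operation labels `"text"` and `"voice"`, and the `recognize` callable (standing in for a trained speech recognition model) are all hypothetical and are not taken from the application.

```python
from typing import Callable, Optional


def determine_search_text(
    operation: str,
    typed_text: Optional[str] = None,
    audio: Optional[bytes] = None,
    recognize: Optional[Callable[[bytes], str]] = None,
) -> str:
    """Return the text to be searched for a text-input or voice-input operation."""
    if operation == "text":
        if typed_text is None:
            raise ValueError("a text input operation requires typed_text")
        # The text typed into the search box is used directly.
        return typed_text.strip()
    if operation == "voice":
        if audio is None or recognize is None:
            raise ValueError("a voice input operation requires audio and a recognizer")
        # `recognize` is a placeholder for a trained speech recognition model.
        return recognize(audio).strip()
    raise ValueError(f"unsupported search operation: {operation!r}")
```

  • Passing the recognizer in as a callable keeps the sketch independent of any particular speech model; noise reduction, if used, would be applied to `audio` before this call.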
  • similar word expansion processing may be performed on the indexed text library in the search engine in advance, so that the search engine includes the indexed text library subjected to similar word expansion processing. Therefore, through the search engine, similar word matching can be performed on the user's search text, thereby improving the accuracy of the search results.
  • Similar word expansion processing refers to supplementing the texts in the indexed text library with similar words.
  • Similar words include synonyms and near-synonyms.
  • After similar word expansion processing, the indexed text library contains more words with the same or similar semantics.
  • FIG. 2 is a schematic flowchart of a sub-step of similar word expansion processing on an indexed text library provided by an embodiment of the present application, which may specifically include the following steps S101 to S103.
  • Step S101: Use each text in the index text library as a text to be expanded in turn, and determine at least one keyword in the text to be expanded.
  • The indexed text library includes at least one text.
  • Each text in the indexed text library may be used as the text to be expanded in turn, so that similar word expansion processing can be performed on the text to be expanded to obtain an indexed text library after similar word expansion processing.
  • FIG. 3 is a schematic flowchart of the sub-steps of determining at least one keyword in the text to be expanded according to an embodiment of the present application, which may specifically include the following steps S1011 and S1012.
  • Step S1011: Perform word segmentation processing on each sentence in the text to be expanded to obtain multiple phrases corresponding to the text to be expanded.
  • The text to be expanded may include multiple sentences. During word segmentation, each sentence in the text to be expanded can be segmented separately.
  • The Viterbi algorithm and a hidden Markov model (HMM) can be combined to perform word segmentation on each sentence in the text to be expanded, to obtain multiple phrases corresponding to the text to be expanded.
  • the Viterbi algorithm is a commonly used algorithm in the word segmentation processing of the HMM model.
  • the Viterbi algorithm is used to determine the most likely hidden sequence of the known observation sequence under the HMM model.
  • the five major elements of the HMM model can be obtained: the initial probability matrix, the transition probability matrix, the emission probability matrix, the observation value set and the state value set.
  • The word segmentation problem under the HMM model is thus transformed into the problem of finding the optimal hidden state sequence, and the Viterbi algorithm is the method most often used to solve it.
  • The Viterbi algorithm adopts the idea of dynamic programming: it recursively computes the most likely (locally optimal) path to each state at the current step and records backward pointers, which are then traced back to recover the optimal hidden state sequence.
  • each sentence in the text to be expanded may be separately input into the trained HMM model for word segmentation processing to obtain one or more phrases corresponding to each sentence.
  • multiple phrases corresponding to the text to be expanded can be obtained.
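  • The dynamic-programming recursion and backward pointers described above can be sketched as follows. This is a generic Viterbi decoder, not the application's trained model: the parameter names, the probability floor used to avoid log(0), and the B/M/E/S tagging convention in `tags_to_words` (begin, middle, end, single-character word) are assumptions made for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for `observations`."""
    if not observations:
        return []
    floor = 1e-12  # assumed floor so that unseen events do not produce log(0)

    def lg(p):
        return math.log(max(p, floor))

    # score[t][s]: best log-probability of any path that ends in state s at step t
    score = [{s: lg(start_p.get(s, 0.0)) + lg(emit_p[s].get(observations[0], 0.0))
              for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        score.append({})
        back.append({})
        for s in states:
            prev = max(states,
                       key=lambda r: score[t - 1][r] + lg(trans_p[r].get(s, 0.0)))
            score[t][s] = (score[t - 1][prev]
                           + lg(trans_p[prev].get(s, 0.0))
                           + lg(emit_p[s].get(observations[t], 0.0)))
            back[t][s] = prev
    # Trace the backward pointers from the best final state.
    path = [max(states, key=lambda s: score[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


def tags_to_words(sentence, tags):
    """Cut `sentence` after every character tagged E (end) or S (single)."""
    words, current = [], ""
    for ch, tag in zip(sentence, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words
```

  • For word segmentation, the observations would be the characters of a sentence, the states the B/M/E/S tags, and the three probability tables those of the trained HMM; `tags_to_words(sentence, viterbi(...))` then yields the phrases.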
  • Step S1012: Perform keyword extraction on the multiple phrases according to a preset keyword extraction algorithm to obtain at least one keyword corresponding to the text to be expanded.
  • The preset keyword extraction algorithm may include the term frequency-inverse document frequency (TF-IDF) algorithm, where TF denotes term frequency and IDF denotes inverse document frequency.
  • The TF-IDF algorithm is a commonly used weighting technique for information retrieval and data mining. It evaluates the importance of a word to a document in a document set or corpus.
  • The term frequency is computed as $$\mathrm{TF} = \frac{n}{m}$$ where n represents the number of occurrences of a word in the document and m represents the total number of words in the document.
  • The inverse document frequency, in its standard form, is computed as $$\mathrm{IDF} = \log\frac{w}{W}$$ where w represents the total number of documents in the corpus and W represents the number of documents containing the word. The TF-IDF value of a word is the product $\mathrm{TF} \times \mathrm{IDF}$.
  • The process of extracting keywords can be understood as calculating the TF-IDF value of each word in the text, sorting the words in descending order of TF-IDF value, and using the first few words as keywords.
  • For example, the TF-IDF value of each phrase in the text to be expanded can be calculated according to the TF-IDF algorithm, and each phrase whose TF-IDF value is greater than a preset TF-IDF threshold is determined as a keyword corresponding to the text to be expanded.
  • the words surrounding the keywords can also be set as keywords together.
  • the verbs and/or nouns around the keywords can be set as keywords together.
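  • The threshold-based keyword extraction of step S1012 can be sketched with the standard TF-IDF definition (TF as occurrences divided by document length, IDF as the logarithm of corpus size over the number of documents containing the word). The function names and the exact IDF variant are assumptions; the application does not state whether a smoothed form is used.

```python
import math
from collections import Counter


def tf_idf_scores(document, corpus):
    """Map each word of `document` (a list of phrases) to its TF-IDF value."""
    counts = Counter(document)
    total_words = len(document)
    scores = {}
    for word, n in counts.items():
        containing = sum(1 for doc in corpus if word in doc)
        # Guard against a document that is not itself part of the corpus.
        containing = max(containing, 1)
        tf = n / total_words
        idf = math.log(len(corpus) / containing)
        scores[word] = tf * idf
    return scores


def extract_keywords(document, corpus, threshold):
    """Return the words whose TF-IDF value exceeds `threshold`, best first."""
    scores = tf_idf_scores(document, corpus)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [word for word, score in ranked if score > threshold]
```

  • With this unsmoothed IDF, a word that occurs in every document of the corpus scores zero, so it is never selected as a keyword regardless of the threshold.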
  • Step S102: Invoke the word vectorization model to vectorize each keyword to obtain the keyword word vectors corresponding to the text to be expanded.
  • The trained word vectorization model is called to vectorize each keyword to obtain the keyword word vectors corresponding to the text to be expanded.
  • The word vectorization model may include a BERT (Bidirectional Encoder Representations from Transformers) model.
  • Before the word vectorization model is called, an initial word vectorization model may be trained to obtain the trained word vectorization model.
  • For example, a large-scale text corpus that is unrelated to any specific natural language processing (NLP) task may be used to train the BERT model in advance to obtain the trained word vectorization model.
  • The BERT model can use the attention mechanism, taking the semantic vector representations of the target word and of each context word as input.
  • Linear transformations first produce the query vector of the target word, the key vector of each context word, and the original value representations of the target word and of each context word.
  • The similarity between the query vector of the target word and the key vector of each context word is then calculated as a weight, and the value vectors of the target word and of each context word are fused by weighted sum to form the attention output, that is, the enhanced semantic vector representation of the target word.
  • the word vectorization model after training can also be stored in a node of a blockchain.
  • the trained word vectorization model can be called from the nodes of the blockchain.
  • Because the BERT model can extract the semantic information around a keyword and integrate it into the word vector, it yields keyword word vectors with enhanced semantics, so that more similar words with the same or similar semantics as the keywords can be obtained later.
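  • The weighted fusion performed by the attention mechanism can be illustrated for a single target word. This is a simplified sketch of scaled dot-product attention, not the BERT implementation: the query, key and value vectors are passed in directly, whereas in BERT they come from learned linear projections, and the function name `attend` is hypothetical.

```python
import math


def attend(query, keys, values):
    """Fuse `values` with softmax weights derived from query-key similarity.

    Returns (output_vector, weights). Scores are scaled by sqrt(d), as in
    scaled dot-product attention.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights
```

  • The returned `output` plays the role of the enhanced semantic vector of the target word: context words whose keys are more similar to the query contribute more of their value vectors.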
  • Step S103: Determine at least one similar word corresponding to each of the keyword word vectors in the index text library, and add the at least one similar word to the text to be expanded.
  • the indexed text library includes multiple phrases. It can be understood that the indexed text library includes at least one text, and each text includes multiple sentences. Therefore, the indexed text library includes multiple phrases.
  • the similarity between each keyword word vector and all the phrases in the index text database can be calculated to determine the similar words corresponding to each keyword word vector.
  • FIG. 4 is a schematic flowchart of the sub-steps of determining at least one similar word corresponding to each keyword word vector in the index text library in step S103, which specifically may include the following steps S1031 to S1033.
  • Step S1031: Based on a preset similarity algorithm, calculate the first similarity between each of the keyword word vectors and the word vectors corresponding to multiple phrases in the index text library.
  • The preset similarity algorithms may include, but are not limited to, Euclidean distance, cosine similarity, Manhattan distance, and Chebyshev distance.
  • For example, the first similarity between each keyword word vector and the word vectors corresponding to multiple phrases in the index text library can be calculated with the cosine similarity algorithm; other similarity algorithms can of course also be used, and the specific process is not repeated here.
  • the cosine similarity algorithm uses the cosine value of the angle between two vectors in the vector space as a measure of the degree of similarity between the two vectors.
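  • The formula for calculating the cosine of the included angle is: $$\cos\theta = \frac{\sum_{i=1}^{n} V_{1,i}\, V_{2,i}}{\sqrt{\sum_{i=1}^{n} V_{1,i}^{2}}\;\sqrt{\sum_{i=1}^{n} V_{2,i}^{2}}}$$
  • where $\theta$ represents the angle between the vector $V_1$ and the vector $V_2$, and $V_{1,i}$ and $V_{2,i}$ are their i-th components;
  • n represents the dimension of the vector $V_1$ and the vector $V_2$.
  • In general, $\cos\theta$ lies in [-1, 1]; it is confined to [0, 1] when all components of both vectors are non-negative. The closer the value is to 1, the more similar the two vectors are.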
  • The multiple phrases in the indexed text library may be vectorized to obtain the word vectors corresponding to the multiple phrases.
  • Each keyword word vector can be represented as $V_0$; the word vectors corresponding to the multiple phrases in the indexed text library can be represented as $v_1, v_2, \ldots, v_k$, where k represents the number of word vectors.
  • The cosine of the angle between the keyword word vector $V_0$ and each of the word vectors $v_1, v_2, \ldots, v_k$ in the index text library can be calculated separately to obtain the first similarity between $V_0$ and each word vector in the index text library.
  • Step S1032: Determine, as target word vectors, the word vectors whose first similarity is greater than the first preset similarity threshold.
  • The first preset similarity threshold may be set according to actual conditions, and the specific value is not limited here.
  • Step S1033: Determine the phrase corresponding to the target word vector as a similar word corresponding to each of the keyword word vectors.
  • Because the word vectors in the indexed text library are obtained by vectorizing the multiple phrases in the indexed text library, each of these word vectors has a corresponding phrase.
  • The phrase corresponding to a target word vector of a keyword word vector is determined as a similar word of that keyword word vector.
  • Each keyword word vector has at least one target word vector, so that at least one similar word of each keyword word vector can be obtained.
  • The at least one similar word may then be added to the text to be expanded. Adding similar words to each text in the indexed text library in turn yields an indexed text library that has undergone similar word expansion processing.
  • The indexed text library that has undergone similar word expansion processing can also be stored in a node of a blockchain.
  • In this way, the number of similar words in each text of the indexed text library can be enriched.
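  • Steps S1031 to S1033 and the final addition of similar words can be sketched as follows, using cosine similarity as the preset similarity algorithm. The function names, the dictionary from phrase to word vector, and the rule of skipping words already present in the text are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_words(keyword_vector, phrase_vectors, threshold):
    """Phrases whose vector is more similar to the keyword than `threshold`."""
    return [phrase for phrase, vector in phrase_vectors.items()
            if cosine_similarity(keyword_vector, vector) > threshold]


def expand_text(text_phrases, keyword_vectors, phrase_vectors, threshold):
    """Append the similar words of every keyword to a copy of the text's phrases."""
    expanded = list(text_phrases)
    for keyword_vector in keyword_vectors:
        for word in similar_words(keyword_vector, phrase_vectors, threshold):
            if word not in expanded:  # assumed: do not add a word twice
                expanded.append(word)
    return expanded
```

  • Comparing every keyword with every phrase vector is quadratic in the size of the library; the sketch favours clarity over speed, and a large library would typically need an approximate nearest-neighbour index instead.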
  • Step S20: Perform similar word matching on the text to be searched based on a preset search engine to obtain a target phrase corresponding to the text to be searched, wherein the search engine includes an indexed text library that has been expanded with similar words.
  • Similar word matching can be performed on the text to be searched based on a preset search engine to obtain the target phrase corresponding to the text to be searched.
  • The search engine includes an indexed text library that has undergone similar word expansion processing.
  • For the similar word expansion processing, refer to the detailed description of the foregoing embodiment, which is not repeated here.
  • In this way, target phrases whose semantics are similar to those of the text to be searched can be matched, which effectively improves the accuracy of the search results.
  • FIG. 5 is a schematic flowchart of the sub-steps of performing similar word matching on the text to be searched in step S20 to obtain the target phrase corresponding to the text to be searched, which may specifically include the following steps S201 to S203.
  • Step S201: Perform word segmentation processing on the text to be searched to obtain a set of phrases corresponding to the text to be searched.
  • For example, the phrase set corresponding to the text to be searched is obtained as (A, B, C).
  • Step S202: Calculate a second similarity between the phrase set and multiple phrases in the index text library.
  • The second similarity between the phrase set and multiple phrases in the indexed text library may be calculated according to the cosine similarity algorithm.
  • For example, if the indexed text library includes phrase A1, phrase A2, phrase A3, and phrase A4, the second similarity between the phrase set (A, B, C) and each of these phrases is calculated, yielding one second similarity value for each of phrase A1, phrase A2, phrase A3, and phrase A4.
  • Step S203: Use at least one phrase with a second similarity greater than a second preset similarity threshold as a target phrase corresponding to the phrase set.
  • For example, if the phrases whose second similarity is greater than the second preset similarity threshold are phrase A1, phrase A2, and phrase A3, the target phrases corresponding to the phrase set (A, B, C) are phrase A1, phrase A2, and phrase A3.
  • the second preset similarity threshold may be set according to actual conditions, and the specific value is not limited here.
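  • Steps S202 and S203 can be sketched as follows. The application does not say how a whole phrase set is compared with a single phrase; this sketch assumes, purely for illustration, that the phrase set is represented by the mean of its phrase vectors before the cosine similarity is taken. All function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def match_target_phrases(query_vectors, index_vectors, threshold):
    """Return {phrase: second_similarity} for phrases scoring above `threshold`.

    Assumption: the query phrase set is pooled into one vector by averaging.
    """
    query = mean_vector(query_vectors)
    matches = {}
    for phrase, vector in index_vectors.items():
        similarity = cosine_similarity(query, vector)
        if similarity > threshold:
            matches[phrase] = similarity
    return matches
```

  • Keeping the similarity value alongside each target phrase is convenient because the later ranking of the search results is based on the same second similarity.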
  • Step S30: Generate a search result list according to the target phrase, and display the search result list on the search page.
  • Generating the search result list according to the target phrase may include: obtaining the target text corresponding to the target phrase; and sorting the target text according to the second similarity corresponding to the target phrase to obtain the search result list.
  • For example, if the target phrases corresponding to the phrase set are phrase A1, phrase A2, and phrase A3, the texts in which phrase A1, phrase A2, and phrase A3 are located can be used as the target text.
  • For example, the target text includes text 1, text 2, and text 3.
  • The target text may be sorted in descending order of the second similarity corresponding to the target phrase. If the second similarity of phrase A1 is greater than that of phrase A2, which in turn is greater than that of phrase A3, the search result list obtained is shown in Table 1.
  • If the target text contains repeated texts, one of the repeated texts can be kept and the others eliminated.
  • the search result list may be displayed on the search page.
  • the search result list may be rendered on the search page to display the search result list on the search page.
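  • The generation of the search result list (sorting by second similarity in descending order and eliminating repeated texts) can be sketched as follows. The function name, the two dictionaries, and the choice to keep the highest-ranked copy of a repeated text are assumptions made for illustration.

```python
def build_result_list(phrase_scores, phrase_to_text):
    """Order target texts by the second similarity of their target phrases.

    phrase_scores:  {target phrase: second similarity}
    phrase_to_text: {target phrase: text in which the phrase is located}
    A text reached through several phrases is kept once, at its best rank.
    """
    ranked = sorted(phrase_scores.items(), key=lambda item: item[1], reverse=True)
    results, seen = [], set()
    for phrase, _score in ranked:
        text = phrase_to_text[phrase]
        if text not in seen:
            seen.add(text)
            results.append(text)
    return results
```

  • For instance, if phrases A1 and A3 are both located in the same text, that text appears only once, at the position earned by A1.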
  • After displaying the search result list on the search page, the method further includes: when a text selection operation on the search result list is received, determining the selected text according to the text selection operation; determining the ranking value of the selected text in the search result list; and when the ranking value is not the preset ranking value, performing similar word expansion processing on the indexed text library.
  • FIG. 6 is a schematic diagram of a scene of a user's text selection operation on a search result list provided by an embodiment of the present application.
  • the user's text selection operation on the search result list in the search page can be received, and the user's selected text can be determined according to the text selection operation; and then the ranking value of the selected text in the search result list can be judged.
  • the preset ranking value can be set according to actual conditions.
  • the preset ranking value may include the first ranking, and may also include the first ranking and the second ranking.
  • For example, when the ranking value of the selected text is the first ranking, the search result is accurate.
  • When the ranking value of the selected text is not a preset ranking value, similar word expansion processing is performed on the indexed text library again.
  • By receiving the user's text selection operation on the search result list and using it to decide whether the indexed text library needs similar word expansion again, the search accuracy of the search engine is further improved.
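  • The feedback check on the selected text can be sketched in a few lines. The function name and the default of treating only the first ranking as the preset ranking value are assumptions; the application leaves the preset ranking value to be set according to actual conditions.

```python
def needs_re_expansion(result_list, selected_text, preset_ranks=(1,)):
    """Return True when the selected text's 1-based rank is not a preset rank."""
    rank = result_list.index(selected_text) + 1
    return rank not in preset_ranks
```

  • A caller would run the similar word expansion on the indexed text library again whenever this returns True, for example when the user picks the third result while only the first ranking is preset.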
  • The text search method provided in the above embodiment determines the text to be searched from the user's text input operation or voice input operation in the preset search page, which makes it easy to determine the text to be searched and gives the user a more convenient and flexible way to search.
  • By adding similar words to the texts in the indexed text library, the library contains more words with the same or similar semantics, so that similar word matching on the user's search text can be performed at the semantic level, which improves matching accuracy.
  • By extracting keywords from the multiple phrases with the TF-IDF algorithm, the speed of that algorithm is exploited and the efficiency of keyword extraction is improved.
  • By vectorizing each keyword with the BERT model, which extracts the semantic information around the keyword and integrates it into the word vector, keyword word vectors with enhanced semantics are obtained, so that more similar words with the same or similar semantics as the keywords can be obtained later.
  • By calculating the similarity between each keyword word vector and the word vectors in the index text library with the similarity algorithm and adding the similar words of each keyword word vector to the text to be expanded, the number of similar words in each text of the indexed text library is enriched.
  • By matching similar words to the user's text to be searched with a search engine based on the expanded indexed text library, target phrases with semantics similar to the text to be searched are found, which effectively improves the accuracy of the search results.
  • By receiving the user's text selection operation on the search result list and using it to decide whether the indexed text library needs similar word expansion again, the library can be expanded again when necessary, which further improves the search accuracy of the search engine.
  • FIG. 7 is a schematic block diagram of a text search device 1000 according to an embodiment of the present application.
  • the text search device is used to execute the aforementioned text search method.
  • the text search device can be configured in a server or a terminal.
  • The text search device 1000 includes: a to-be-searched text acquisition module 1001, a similar word matching module 1002, and a search result generation module 1003.
  • The to-be-searched text acquisition module 1001 is configured to determine the to-be-searched text according to the text search operation when a text search operation in a preset search page is detected;
  • The similar word matching module 1002 is configured to perform similar word matching on the text to be searched based on a preset search engine to obtain the target phrase corresponding to the text to be searched, wherein the search engine includes an indexed text library that has undergone similar word expansion processing;
  • The search result generation module 1003 is configured to generate a search result list according to the target phrase, and display the search result list on the search page.
  • the above-mentioned apparatus can be implemented in the form of a computer program, and the computer program can be run on the computer device as shown in FIG. 8.
  • FIG. 8 is a schematic block diagram of a structure of a computer device provided by an embodiment of the present application.
  • the computer equipment can be a server or a terminal.
  • the computer device includes a processor and a memory connected through a system bus, where the memory may include a non-volatile storage medium and an internal memory.
  • the processor is used to provide computing and control capabilities and support the operation of the entire computer equipment.
  • the internal memory provides an environment for the operation of the computer program in the non-volatile storage medium.
  • When the computer program is executed, the processor can execute any of the text search methods.
  • The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor.
  • the processor is used to run a computer program stored in a memory to implement the following steps:
  • when a text search operation in a preset search page is detected, determining the text to be searched according to the text search operation; performing similar word matching on the text to be searched based on a preset search engine to obtain the target phrase corresponding to the text to be searched, wherein the search engine includes an indexed text library that has been expanded with similar words; generating a search result list according to the target phrase, and displaying the search result list on the search page.
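  • The indexed text library includes at least one text; before determining the text to be searched according to the text search operation when a text search operation in a preset search page is detected, the processor is further configured to implement:
  • using each text in the index text library as a text to be expanded in turn, and determining at least one keyword in the text to be expanded; invoking the word vectorization model to vectorize each keyword to obtain the keyword word vectors corresponding to the text to be expanded; determining at least one similar word corresponding to each of the keyword word vectors in the index text library, and adding the at least one similar word to the text to be expanded.
  • When determining at least one keyword in the text to be expanded, the processor is configured to implement:
  • performing word segmentation processing on each sentence in the text to be expanded to obtain multiple phrases corresponding to the text to be expanded; performing keyword extraction on the multiple phrases according to a preset keyword extraction algorithm to obtain at least one keyword corresponding to the text to be expanded.
  • The indexed text library includes a plurality of phrases; when determining at least one similar word corresponding to each of the keyword word vectors in the indexed text library, the processor is configured to implement:
  • calculating, based on a preset similarity algorithm, the first similarity between each of the keyword word vectors and the word vectors corresponding to multiple phrases in the index text library; determining the target word vectors whose first similarity is greater than the first preset similarity threshold; determining the phrase corresponding to the target word vector as a similar word corresponding to each of the keyword word vectors.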
  • The text search operation includes a text input operation and a voice input operation; when determining the text to be searched according to the text search operation, the processor is used to implement:
  • When the text search operation is a text input operation, the text to be searched is obtained according to the input text information.
  • When the text search operation is a voice input operation, voice recognition is performed on the input voice information to obtain the text to be searched.
  • When performing similar word matching on the text to be searched to obtain the target phrase corresponding to the text to be searched, the processor is used to implement:
  • When generating a search result list according to the target phrase, the processor is configured to implement:
  • After displaying the search result list on the search page, the processor is further configured to implement:
  • The selected text is determined according to the text selection operation; the ranking value of the selected text in the search result list is determined, and when the ranking value is not the preset ranking value, similar word expansion processing is performed on the index text library.
  • The embodiments of the present application also provide a computer-readable storage medium.
  • The computer-readable storage medium may be non-volatile or volatile.
  • The computer-readable storage medium stores a computer program.
  • The computer program includes program instructions, and the processor executes the program instructions to implement any text search method provided in the embodiments of the present application.
  • The computer-readable storage medium may be an internal storage unit of the computer device described in the foregoing embodiment, for example, the hard disk or memory of the computer device.
  • The computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the computer device.
  • The computer-readable storage medium may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required by at least one function, and so on; the data storage area may store data created through the use of nodes, and so on.
  • The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms.
  • A blockchain, essentially a decentralized database, is a series of data blocks linked by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • The blockchain can include an underlying blockchain platform, a platform product service layer, and an application service layer.
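
As a rough illustration of the processor steps described above (matching the text to be searched against an index text library that has been expanded with similar words, generating a ranked result list, and checking whether the text the user selected sat at the preset rank), the following Python sketch may help. It is not the applicant's implementation. The similar-word table `SIMILAR_WORDS`, the function names, the whitespace tokenisation, and the match-count scoring are all assumptions made for illustration; the application derives similar words from word vectors, which is not reproduced here.

```python
from collections import defaultdict

# Hypothetical similar-word table (assumption); the application obtains
# similar words from keyword word vectors instead of a fixed dictionary.
SIMILAR_WORDS = {
    "car": ["automobile", "vehicle"],
    "buy": ["purchase"],
}


def build_expanded_index(texts, similar_words):
    """Map every word, and each of its similar words, to the texts containing it."""
    index = defaultdict(set)
    for text_id, text in enumerate(texts):
        for word in text.lower().split():
            index[word].add(text_id)
            # Similar-word expansion: the text is also reachable via synonyms.
            for similar in similar_words.get(word, []):
                index[similar].add(text_id)
    return index


def search(query, texts, index):
    """Return texts ranked by how many query words hit them in the expanded index."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for text_id in index.get(word, ()):
            scores[text_id] += 1
    # Highest score first; ties broken by original position in the library.
    ranked = sorted(scores, key=lambda text_id: (-scores[text_id], text_id))
    return [texts[text_id] for text_id in ranked]


def needs_expansion(selected_text, results, preset_rank=1):
    """True when the selected text was not at the preset rank (1-based)."""
    return results.index(selected_text) + 1 != preset_rank


texts = ["buy a car online", "repair a bicycle", "car insurance quotes"]
index = build_expanded_index(texts, SIMILAR_WORDS)
results = search("purchase automobile", texts, index)
```

In this sketch the query "purchase automobile" shares no literal word with any stored text, yet the expanded index still reaches the texts about buying a car. `needs_expansion` mirrors the feedback step only loosely: a result of `True` would be the signal to run similar word expansion on the index text library again.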

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to the field of artificial intelligence and blockchain technology. By performing similar word matching on a text to be searched using a search engine that includes an index text library that has undergone similar word expansion processing, the accuracy of a search result can be improved, and the index text library can be stored in a blockchain. Disclosed are a text search method and apparatus, a computer device, and a storage medium. The text search method comprises the following steps: when a text search operation in a preset search page is detected, determining, according to the text search operation, a text to be searched (S10); performing, on the basis of a preset search engine, similar word matching on the text to be searched so as to obtain a target phrase corresponding to the text to be searched, the search engine comprising an index text library that has undergone similar word expansion processing (S20); and generating a search result list according to the target phrase and displaying the search result list on the search page (S30).
PCT/CN2020/135243 2020-10-21 2020-12-10 Procédé et appareil de recherche de texte, et dispositif informatique et support de stockage WO2021189951A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011133988.0 2020-10-21
CN202011133988.0A CN112256822A (zh) 2020-10-21 2020-10-21 文本搜索方法、装置、计算机设备和存储介质

Publications (1)

Publication Number Publication Date
WO2021189951A1 (fr) 2021-09-30

Family

ID=74263686

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/135243 WO2021189951A1 (fr) 2020-10-21 2020-12-10 Procédé et appareil de recherche de texte, et dispositif informatique et support de stockage

Country Status (2)

Country Link
CN (1) CN112256822A (fr)
WO (1) WO2021189951A1 (fr)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114222000A (zh) * 2021-12-13 2022-03-22 中国平安财产保险股份有限公司 信息推送方法、装置、计算机设备和存储介质
CN114492371A (zh) * 2022-02-11 2022-05-13 网易传媒科技(北京)有限公司 文本处理方法及装置、存储介质、电子设备
CN114780673A (zh) * 2022-03-28 2022-07-22 西安远诺技术转移有限公司 基于领域匹配的科技成果管理方法和科技成果管理平台
CN115357605A (zh) * 2022-10-19 2022-11-18 湖南创亚信息科技有限公司 一种客户信息检索方法、装置、电子设备及存储介质
CN115659046A (zh) * 2022-11-10 2023-01-31 果子(青岛)数字技术有限公司 基于ai大数据的技术交易推荐系统及方法
CN116756151A (zh) * 2023-08-17 2023-09-15 公安部信息通信中心 一种知识搜索与数据处理系统

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112988753B (zh) * 2021-03-31 2022-10-11 中国建设银行股份有限公司 一种数据搜索方法和装置
CN115408491B (zh) * 2022-11-02 2023-01-17 京华信息科技股份有限公司 一种历史数据的文本检索方法及系统
CN117972097A (zh) * 2024-03-29 2024-05-03 长城汽车股份有限公司 文本的分类方法、分类装置、电子设备及存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110184946A1 (en) * 2010-01-28 2011-07-28 International Business Machines Corporation Applying synonyms to unify text search with faceted browsing classification
CN102483757A (zh) * 2009-08-21 2012-05-30 米科·韦内宁 用于数据搜索和语言翻译的方法和装置
CN102999569A (zh) * 2012-11-09 2013-03-27 同济大学 用户需求分析定位器和分析及定位方法
US20180181988A1 (en) * 2016-12-26 2018-06-28 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for pushing information
CN108509474A (zh) * 2017-09-15 2018-09-07 腾讯科技(深圳)有限公司 搜索信息的同义词扩展方法及装置
CN111930880A (zh) * 2020-08-14 2020-11-13 易联众信息技术股份有限公司 一种文本编码检索的方法、装置及介质

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103177122B (zh) * 2013-04-15 2017-04-26 天津理工大学 一种基于同义词的个人桌面文件搜索方法
CN108776901B (zh) * 2018-04-27 2021-01-15 微梦创科网络科技(中国)有限公司 基于搜索词的广告推荐方法及系统
US10459962B1 (en) * 2018-09-19 2019-10-29 Servicenow, Inc. Selectively generating word vector and paragraph vector representations of fields for machine learning

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114222000A (zh) * 2021-12-13 2022-03-22 中国平安财产保险股份有限公司 信息推送方法、装置、计算机设备和存储介质
CN114222000B (zh) * 2021-12-13 2024-02-02 中国平安财产保险股份有限公司 信息推送方法、装置、计算机设备和存储介质
CN114492371A (zh) * 2022-02-11 2022-05-13 网易传媒科技(北京)有限公司 文本处理方法及装置、存储介质、电子设备
CN114780673A (zh) * 2022-03-28 2022-07-22 西安远诺技术转移有限公司 基于领域匹配的科技成果管理方法和科技成果管理平台
CN114780673B (zh) * 2022-03-28 2024-04-30 西安远诺技术转移有限公司 基于领域匹配的科技成果管理方法和科技成果管理平台
CN115357605A (zh) * 2022-10-19 2022-11-18 湖南创亚信息科技有限公司 一种客户信息检索方法、装置、电子设备及存储介质
CN115357605B (zh) * 2022-10-19 2023-02-10 湖南创亚信息科技有限公司 一种客户信息检索方法、装置、电子设备及存储介质
CN115659046A (zh) * 2022-11-10 2023-01-31 果子(青岛)数字技术有限公司 基于ai大数据的技术交易推荐系统及方法
CN116756151A (zh) * 2023-08-17 2023-09-15 公安部信息通信中心 一种知识搜索与数据处理系统
CN116756151B (zh) * 2023-08-17 2023-11-24 公安部信息通信中心 一种知识搜索与数据处理系统

Also Published As

Publication number Publication date
CN112256822A (zh) 2021-01-22

Similar Documents

Publication Publication Date Title
WO2021189951A1 (fr) Procédé et appareil de recherche de texte, et dispositif informatique et support de stockage
US11301637B2 (en) Methods, devices, and systems for constructing intelligent knowledge base
CN109101479B (zh) 一种用于中文语句的聚类方法及装置
CN108304375B (zh) 一种信息识别方法及其设备、存储介质、终端
WO2019105432A1 (fr) Procédé et appareil de recommandation de texte, et dispositif électronique
WO2019091026A1 (fr) Procédé de recherche rapide de document dans une base de connaissances, serveur d'application, et support d'informations lisible par ordinateur
US9613024B1 (en) System and methods for creating datasets representing words and objects
WO2017101342A1 (fr) Procédé et appareil de classification de sentiments
CN109299280B (zh) 短文本聚类分析方法、装置和终端设备
US20130060769A1 (en) System and method for identifying social media interactions
US20150161242A1 (en) Identifying and Displaying Relationships Between Candidate Answers
CN110162771B (zh) 事件触发词的识别方法、装置、电子设备
CN111797214A (zh) 基于faq数据库的问题筛选方法、装置、计算机设备及介质
WO2020232898A1 (fr) Procédé et appareil de classification de texte, dispositif électronique et support de stockage non volatil lisible par ordinateur
US20220261545A1 (en) Systems and methods for producing a semantic representation of a document
CN109885813A (zh) 一种基于词语覆盖度的文本相似度的运算方法、系统、服务器及存储介质
CN107885717B (zh) 一种关键词提取方法及装置
WO2023029356A1 (fr) Procédé et appareil de génération d'incorporation de phrases basés sur un modèle d'incorporation de phrases, et dispositif informatique
CN111160007B (zh) 基于bert语言模型的搜索方法、装置、计算机设备及存储介质
CN114880447A (zh) 信息检索方法、装置、设备及存储介质
CN112632261A (zh) 智能问答方法、装置、设备及存储介质
US20220365956A1 (en) Method and apparatus for generating patent summary information, and electronic device and medium
WO2023033942A1 (fr) Recherche efficace dans un index à l'aide de vecteurs agnostiques de langage et de vecteurs de contexte
CN111859079B (zh) 信息搜索方法、装置、计算机设备及存储介质
CN109918661B (zh) 同义词获取方法及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20926753

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20926753

Country of ref document: EP

Kind code of ref document: A1