CN117891904A - Searching method, terminal device and computer readable storage medium - Google Patents


Info

Publication number
CN117891904A
Authority
CN
China
Prior art keywords
information
similarity
search
characteristic information
calculating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311655683.XA
Other languages
Chinese (zh)
Inventor
王赞
董培
庞建新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ubtech Robotics Corp
Original Assignee
Ubtech Robotics Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ubtech Robotics Corp filed Critical Ubtech Robotics Corp
Priority to CN202311655683.XA priority Critical patent/CN117891904A/en
Publication of CN117891904A publication Critical patent/CN117891904A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application is applicable to the technical field of big data, and provides a searching method, terminal equipment and a computer readable storage medium, comprising the following steps: after receiving search information input by a user, performing word segmentation processing on the search information to obtain a plurality of Chinese characters; generating first characteristic information of the search information according to a plurality of Chinese characters; searching target information matched with the first characteristic information in a preset database. By the method, the storage space required by the question-answering system can be reduced, so that the deployment cost of the question-answering system is reduced.

Description

Searching method, terminal device and computer readable storage medium
Technical Field
The application belongs to the technical field of big data, and particularly relates to a searching method, terminal equipment and a computer readable storage medium.
Background
Natural language processing (NLP) is a discipline that uses computer technology to analyze, understand, and process natural language. Question answering is one of its important applications: search information (natural language) input by a user is converted into semantics that a computer can understand, so that the computer can answer questions related to the search information by drawing on a large-scale text dataset. At present, however, question-answering systems based on natural language processing occupy a large amount of storage space and are costly to deploy.
Disclosure of Invention
The embodiment of the application provides a searching method, terminal equipment and a computer readable storage medium, which can reduce the storage space required by a question-answering system, thereby reducing the deployment cost of the question-answering system.
In a first aspect, an embodiment of the present application provides a search method, including:
after receiving search information input by a user, performing word segmentation processing on the search information to obtain a plurality of Chinese characters;
generating first characteristic information of the search information according to a plurality of Chinese characters;
searching target information matched with the first characteristic information in a preset database.
In the embodiment of the application, the search information is split into Chinese characters, and matching against the information in the database is performed according to the features of those characters. Because the number of Chinese characters is far smaller than the number of words, this method can greatly reduce the storage space required by the question-answering system.
In a possible implementation manner of the first aspect, the performing word segmentation processing on the search information to obtain a plurality of Chinese characters includes:
filtering the search information to obtain Chinese characters in the search information;
and performing word segmentation processing on the Chinese characters to obtain a plurality of Chinese characters.
In the embodiment of the application, the search information is filtered before word segmentation, so that non-Chinese characters such as digits and English letters are deleted. This effectively reduces the influence of non-Chinese characters on word segmentation and improves word segmentation efficiency.
In a possible implementation manner of the first aspect, the generating, according to a plurality of Chinese characters, first feature information of the search information includes:
acquiring word vectors corresponding to the Chinese characters respectively;
and calculating sentence vectors of the search information according to the word vectors corresponding to the Chinese characters, wherein the sentence vectors are the first characteristic information.
One implementation of the sentence vector calculation may be to add the word vectors of the plurality of Chinese characters to obtain the sentence vector.
Another implementation of the sentence vector calculation may be to add the word vectors of the plurality of Chinese characters to obtain an accumulated vector, and then to average the accumulated vector to obtain the sentence vector.
The first calculation method accumulates the features of each character in the search information and thereby amplifies the features. The second calculation method averages the features of each character in the search information and thereby reflects the average of the features. In practical applications, the sentence vector calculation can be chosen according to the application scenario.
In a possible implementation manner of the first aspect, the searching, in a preset database, target information matched with the first feature information includes:
acquiring second characteristic information of the stored information in the database;
k candidate information is screened out from the database according to the first characteristic information and the second characteristic information;
calculating a first similarity between the search information and each piece of candidate information according to the first characteristic information and third characteristic information, wherein the third characteristic information is characteristic information of the candidate information;
and determining the target information from k pieces of candidate information according to the first similarity between the search information and each piece of candidate information.
In the embodiment of the application, the process of screening k candidate information is equivalent to a coarse ranking stage or a retrieval stage of searching, namely, a plurality of candidate information similar to the search information is quickly found from a database. The first similarity between the search information and each piece of candidate information is calculated, which is equivalent to the fine ranking stage of the search, and the target information closest to the search information can be accurately found out from the candidate information obtained in the coarse ranking stage. By the method, candidate information which is relatively similar to the search information can be quickly found from a large amount of storage information in the database, so that the search time is saved, and the search efficiency is improved; on the basis, candidate information closest to the search information is further determined, so that the search accuracy is ensured.
In a possible implementation manner of the first aspect, the calculating a first similarity between the search information and each piece of candidate information according to the first feature information and the third feature information includes:
calculating second similarity between the search information and the candidate information according to the first characteristic information and the third characteristic information, wherein the second similarity represents semantic similarity;
calculating a third similarity between the search information and the candidate information according to the first characteristic information and the third characteristic information, wherein the third similarity represents the similarity of texts;
and calculating the first similarity between the search information and the candidate information according to the second similarity and the third similarity.
In the embodiment of the application, the second similarity and the third similarity are fused, and the similarity of the semantics and the similarity of the text are comprehensively considered, so that the matching accuracy is improved.
In a possible implementation manner of the first aspect, the calculating the first similarity between the search information and the candidate information according to the second similarity and the third similarity includes:
and carrying out weighted summation on the second similarity and the third similarity to obtain the first similarity.
In the embodiment of the application, the second similarity and the third similarity are fused, and the similarity of the semantics and the similarity of the text are comprehensively considered, so that the matching accuracy is improved.
In a possible implementation manner of the first aspect, the calculating a second similarity between the search information and the candidate information according to the first feature information and the third feature information includes:
calculating a dot product between the first characteristic information and the third characteristic information to obtain first data;
calculating the product of the vector length of the first characteristic information and the vector length of the third characteristic information to obtain second data;
and calculating the second similarity according to the first data and the second data.
In a possible implementation manner of the first aspect, the calculating a third similarity between the search information and the candidate information according to the first feature information and the third feature information includes:
calculating the sum of the minimum values of corresponding elements in the first characteristic information and the third characteristic information to obtain third data;
calculating the sum of maximum values of corresponding elements in the first characteristic information and the third characteristic information to obtain fourth data;
and calculating the third similarity according to the third data and the fourth data.
In a second aspect, an embodiment of the present application provides a search apparatus, including:
a word segmentation unit, configured to perform word segmentation processing on the search information after receiving the search information input by the user, to obtain a plurality of Chinese characters;
a generating unit, configured to generate first characteristic information of the search information according to the plurality of Chinese characters;
and a searching unit, configured to search a preset database for target information matched with the first characteristic information.
In a third aspect, an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the search method according to any one of the first aspects when executing the computer program.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program which, when executed by a processor, implements a search method as in any one of the first aspects above.
In a fifth aspect, embodiments of the present application provide a computer program product, which, when run on a terminal device, causes the terminal device to perform the search method according to any one of the first aspects above.
It will be appreciated that the advantages of the second to fifth aspects may be found in the relevant description of the first aspect, and are not described here again.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required for the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a search method provided in an embodiment of the present application;
FIG. 2 is a schematic flow chart of information matching provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of a "skip list" data structure provided by an embodiment of the present application;
FIG. 4 is a block diagram of a search apparatus provided in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a terminal device provided in an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted as "when" or "once" or "in response to determining" or "in response to detecting", depending on the context. Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining" or "in response to determining" or "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
In addition, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise.
Natural language processing (NLP) is a discipline that uses computer technology to analyze, understand, and process natural language. Question answering is one of its important applications: search information (natural language) input by a user is converted into semantics that a computer can understand, so that the computer can answer questions related to the search information by drawing on a large-scale text dataset.
NLP-based question-answering systems need to store a large number of word vectors, which occupy a large amount of memory. Because the hardware performance of an offline terminal is generally lower than that of an online one, it cannot support a system with a large memory footprint; in addition, to ensure a good interactive experience, the hardware must respond quickly. As a result, a question-answering system is difficult to deploy on the offline end. At present, NLP-based question-answering systems are usually deployed online and need server support, so the deployment cost is high.
Based on this, the embodiment of the application provides a search method. In the embodiment of the application, the search information is split into Chinese characters, and matching against the information in the database is performed according to the features of those characters. Because the number of Chinese characters is far smaller than the number of words, this method can greatly reduce the storage space required by the question-answering system.
Referring to fig. 1, which is a schematic flow chart of a search method provided in an embodiment of the present application, by way of example and not limitation, the method may include the following steps:
S101, after receiving search information input by a user, performing word segmentation processing on the search information to obtain a plurality of Chinese characters.
This step is equivalent to splitting the search information into individual Chinese characters. In some application scenarios, the search information input by the user may include not only Chinese characters but also digits, English letters, and the like. Processing all of these characters would reduce processing efficiency. To improve processing efficiency, in some embodiments, S101 may include:
filtering the search information to obtain Chinese characters in the search information; and performing word segmentation processing on the Chinese characters to obtain a plurality of Chinese characters.
In one implementation of the filtering process, the search information is checked for characters that meet a first preset condition; if such a character exists, it is deleted. The first preset condition may be that the code of the character matches a first preset code.
In practical applications, the first preset code is obtained according to the character encoding used in developing the question-answering system. For example, the first preset code may be an ASCII code, with the ASCII codes of the digits, English letters, and other characters to be filtered defined in advance as first preset codes. During filtering, if the ASCII code of a character in the search information is the same as one of the first preset codes, the character is deleted.
It is understood that other encodings, such as ANSI, Unicode, or UTF-8, may also be used; the embodiments of the present application do not specifically limit the form of the preset code.
In another implementation of the filtering process, each character is checked in character order against a second preset condition; a character that meets the condition is retained, and one that does not is deleted. The second preset condition may be that the code of the character matches a second preset code.
Continuing the above example, the second preset code may be an ASCII code, with the ASCII codes of all Chinese characters defined in advance as second preset codes. During filtering, if the ASCII code of a character in the search information differs from every second preset code, the character is deleted.
In the embodiment of the application, the search information is filtered before word segmentation, so that non-Chinese characters such as digits and English letters are deleted. This effectively reduces the influence of non-Chinese characters on word segmentation and improves word segmentation efficiency.
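The second filtering variant (keep a character only if its code matches a preset code) can be sketched as follows. This is a minimal illustration, not the application's implementation: it assumes the preset codes are the Unicode code points of the CJK Unified Ideographs block (U+4E00 to U+9FFF), and the names `is_chinese` and `filter_chinese` are hypothetical.

```python
# Assumption: "Chinese character" is approximated by the CJK Unified
# Ideographs block; the source only speaks of "preset codes".
CJK_START = 0x4E00
CJK_END = 0x9FFF


def is_chinese(ch: str) -> bool:
    """Second preset condition: the character's code point lies in the preset range."""
    return CJK_START <= ord(ch) <= CJK_END


def filter_chinese(search_info: str) -> str:
    """Check characters in order; keep those that pass and drop the rest."""
    return "".join(ch for ch in search_info if is_chinese(ch))
```

Under these assumptions, `filter_chinese("今天abc天气123怎么样?")` keeps only the seven Chinese characters and drops the letters, digits, and punctuation.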
One implementation of the word segmentation process is to split the filtered Chinese characters, in character order, at a preset number of bytes to obtain a plurality of Chinese characters. For example, a Chinese character typically occupies 3 bytes (as in UTF-8), so the filtered text is split every 3 bytes; in other words, every 3 bytes represent one Chinese character.
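The byte-based split can be illustrated with a short sketch. It assumes the filtered text is encoded as UTF-8 and contains only characters that take exactly 3 bytes in that encoding; the function name `split_by_bytes` and the `width` parameter are hypothetical.

```python
def split_by_bytes(filtered: str, width: int = 3) -> list[str]:
    """Split already-filtered text into characters by cutting every `width` bytes.

    Assumption: the text is UTF-8 and every character occupies `width` bytes,
    which holds for common Chinese characters when width is 3.
    """
    data = filtered.encode("utf-8")
    if len(data) % width != 0:
        raise ValueError("text contains characters that are not 'width' bytes wide")
    # Each consecutive group of `width` bytes decodes to one character.
    return [data[i:i + width].decode("utf-8") for i in range(0, len(data), width)]
```

In Python, iterating over the string directly would yield the same characters; the byte-level version is shown only because it mirrors the description above, and it is the reason filtering has to happen first.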
S102, generating first characteristic information of the search information according to the Chinese characters.
In some embodiments, S102 may include:
acquiring word vectors corresponding to the Chinese characters respectively;
and calculating sentence vectors of the search information according to the word vectors corresponding to the Chinese characters, wherein the sentence vectors are the first characteristic information.
One implementation of the sentence vector calculation may be to add the word vectors of the plurality of Chinese characters to obtain the sentence vector.
Another implementation of the sentence vector calculation may be to add the word vectors of the plurality of Chinese characters to obtain an accumulated vector, and then to average the accumulated vector to obtain the sentence vector.
The first calculation method accumulates the features of each character in the search information and thereby amplifies the features. The second calculation method averages the features of each character in the search information and thereby reflects the average of the features. In practical applications, the sentence vector calculation can be chosen according to the application scenario.
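Both sentence vector calculations can be sketched in a few lines. The example below uses plain Python lists; the function names and the tiny two-dimensional vectors in the usage note are made up for illustration and are not taken from the application.

```python
def sum_vectors(vectors):
    """First method: element-wise sum of the per-character vectors."""
    return [sum(column) for column in zip(*vectors)]


def mean_vector(vectors):
    """Second method: element-wise sum divided by the number of characters."""
    if not vectors:
        return []
    count = len(vectors)
    return [total / count for total in sum_vectors(vectors)]
```

With the hypothetical character vectors `[1.0, 2.0]` and `[3.0, 4.0]`, `sum_vectors` returns `[4.0, 6.0]` and `mean_vector` returns `[2.0, 3.0]`. The summed vector grows with the length of the query, while the averaged vector does not, which is one practical consideration when choosing between the two.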
In one implementation of acquiring the word vectors, a trained feature model is obtained; the Chinese characters are input into the feature model, which outputs the word vector corresponding to each Chinese character. The feature model is used to extract the characteristic information of Chinese characters.
The feature model may be a neural network model or another algorithmic model capable of extracting the characteristic information of Chinese characters. Before the word vectors are obtained, the feature model is trained in advance on a large number of Chinese character samples to obtain the trained feature model. Using the trained feature model to extract the word vectors of the Chinese characters effectively improves the efficiency and accuracy of feature extraction.
S103, searching target information matched with the first characteristic information in a preset database.
In some embodiments, S103 may include: acquiring second characteristic information of the stored information in the database; and comparing the first characteristic information with the second characteristic information of each piece of stored information to obtain the target information matched with the first characteristic information. Although this approach is more accurate, its search efficiency is low because the database holds a large amount of stored information.
In the embodiment of the present application, FIG. 2 shows a schematic flow chart of information matching. By way of example and not limitation, as shown in FIG. 2, S103 may include:
S201, acquiring second characteristic information of the stored information in the database.
In this embodiment, the second characteristic information of each piece of stored information in the database is obtained in the same way as the first characteristic information of the search information in the foregoing embodiments; refer to that description for details, which are not repeated here.
Keeping the acquisition of the first and second characteristic information consistent ensures the success rate of subsequent matching.
S202, k candidate information is screened out from the database according to the first characteristic information and the second characteristic information.
In some embodiments, S202 may employ a Navigable Small World (NSW) algorithm or a Hierarchical Navigable Small World (HNSW) algorithm. The principle of the NSW algorithm is to find, for the vertex of each vector, the vertices of other vectors adjacent to it and to establish connections among these vectors, so that a plurality of candidate information relatively close to the search information can be determined. The HNSW algorithm builds on the NSW algorithm by adding a "skip list" data structure.
For example, FIG. 3 is a schematic diagram of the "skip list" data structure provided by an embodiment of the present application. The "skip list" data structure may be divided into a plurality of layers. By way of example and not limitation, FIG. 3 shows only 3 layers (layer0, layer1, and layer2). The bottommost layer0 stores all the data and is equivalent to a complete NSW; the upper layers, layer1 and layer2, store pointer indexes pointing to the graph nodes, and the higher the layer, the fewer indexes it holds.
The role of the "skip list" can be understood as first moving quickly to the vicinity of the point to be searched and then searching precisely. The HNSW algorithm therefore queries faster than the NSW algorithm, which helps improve the retrieval efficiency of the coarse ranking stage.
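The search over a single graph layer can be sketched as a best-first traversal that keeps the k closest nodes seen so far. This is a simplified illustration under several assumptions: the neighbor graph is already built, Euclidean distance is used, and the upper "skip list" layers of HNSW (which would only supply a better entry point) are omitted. The names `graph_search`, `vectors`, `neighbors`, and `entry` are hypothetical.

```python
import heapq
import math


def euclidean(a, b):
    """Distance used for the sketch; the source does not fix a metric."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def graph_search(vectors, neighbors, query, entry, k):
    """Best-first search on one proximity-graph layer.

    Returns up to k node ids, nearest first.
    """
    visited = {entry}
    d0 = euclidean(vectors[entry], query)
    frontier = [(d0, entry)]   # min-heap of nodes still to expand
    best = [(-d0, entry)]      # max-heap (negated distances) of the k closest so far
    while frontier:
        dist, node = heapq.heappop(frontier)
        if len(best) >= k and dist > -best[0][0]:
            break  # the closest unexpanded node is worse than the current k-th best
        for nb in neighbors[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d = euclidean(vectors[nb], query)
            if len(best) < k or d < -best[0][0]:
                heapq.heappush(frontier, (d, nb))
                heapq.heappush(best, (-d, nb))
                if len(best) > k:
                    heapq.heappop(best)  # drop the farthest of the k + 1
    return [node for _, node in sorted((-neg, node) for neg, node in best)]
```

For instance, with five one-dimensional vectors `[0.0]` through `[4.0]` linked in a chain, a query of `[3.2]`, entry node 0, and k = 2, the search walks along the chain and returns nodes 3 and 4. Only nodes reachable from the entry point are examined, which is why the result is approximate on larger graphs.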
S203, calculating a first similarity between the search information and each piece of candidate information according to the first characteristic information and third characteristic information, wherein the third characteristic information is characteristic information of the candidate information.
In the embodiment of the application, the process of screening k candidate information is equivalent to a coarse ranking stage or a retrieval stage of searching, namely, a plurality of candidate information similar to the search information is quickly found from a database. The first similarity between the search information and each piece of candidate information is calculated, which is equivalent to the fine ranking stage of the search, and the target information closest to the search information can be accurately found out from the candidate information obtained in the coarse ranking stage. By the method, candidate information which is relatively similar to the search information can be quickly found from a large amount of storage information in the database, so that the search time is saved, and the search efficiency is improved; on the basis, candidate information closest to the search information is further determined, so that the search accuracy is ensured.
Some implementations of S203 include:
I. Calculating a second similarity between the search information and the candidate information according to the first characteristic information and the third characteristic information, wherein the second similarity represents the degree of semantic similarity.
Optionally, the implementation manner of the step I may include:
calculating a dot product between the first characteristic information and the third characteristic information to obtain first data;
calculating the product of the vector length of the first characteristic information and the vector length of the third characteristic information to obtain second data;
and calculating the second similarity according to the first data and the second data.
Specifically, the second similarity can be calculated according to the formula $\mathrm{sim1} = \frac{X \cdot Y}{\|X\| \, \|Y\|}$, wherein sim1 represents the second similarity, X represents the first characteristic information, and Y represents the third characteristic information.
II. Calculating a third similarity between the search information and the candidate information according to the first characteristic information and the third characteristic information, wherein the third similarity represents the degree of text similarity.
Optionally, the implementation manner of step II may include:
calculating the sum of the minimum values of corresponding elements in the first characteristic information and the third characteristic information to obtain third data;
calculating the sum of maximum values of corresponding elements in the first characteristic information and the third characteristic information to obtain fourth data;
and calculating the third similarity according to the third data and the fourth data.
Specifically, the third similarity can be calculated according to the formula $\mathrm{sim2} = \frac{\sum_i \min(x_i, y_i)}{\sum_i \max(x_i, y_i)}$, wherein sim2 represents the third similarity, $x_i$ represents the ith element in the first characteristic information, and $y_i$ represents the ith element in the third characteristic information.
III. Calculating the first similarity between the search information and the candidate information according to the second similarity and the third similarity.
In one implementation of step III, the maximum value of the second similarity and the third similarity may be taken as the first similarity.
In another implementation of step III, the second similarity and the third similarity may be weighted and summed to obtain the first similarity. The weight of the second similarity and the weight of the third similarity can be set according to actual requirements.
Compared with the first implementation mode, the second implementation mode is equivalent to the fusion of the second similarity and the third similarity, and the similarity of the semantics and the similarity of the text are comprehensively considered, so that the matching accuracy is improved.
It will be appreciated that the calculation manners of the second similarity and the third similarity in the above embodiments are only an example, and other calculation manners of the similarity may be also adopted, as long as the second similarity can represent the similarity degree of the semantics and the third similarity can represent the similarity degree of the text. The embodiment of the present application does not specifically limit the calculation manner of the similarity.
In addition, in some embodiments, other similarity items besides the semantic similarity degree and the text similarity degree can be added to the fusion similarity degree, so that more similarity content between the first feature information and the third feature information can be represented. The embodiment of the present application is not particularly limited thereto.
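Steps I to III can be sketched together as follows. The sketch assumes that the second similarity is the ratio of the dot product to the product of vector lengths (cosine similarity) and that the third similarity is the ratio of the summed element-wise minima to the summed element-wise maxima. The equal weights of 0.5 are placeholders rather than values from the application, and all function names are hypothetical.

```python
import math


def cosine_similarity(x, y):
    """Step I: dot product (first data) over product of vector lengths (second data)."""
    dot = sum(a * b for a, b in zip(x, y))
    lengths = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / lengths if lengths else 0.0


def min_max_similarity(x, y):
    """Step II: sum of element-wise minima (third data) over sum of maxima (fourth data)."""
    minima = sum(min(a, b) for a, b in zip(x, y))
    maxima = sum(max(a, b) for a, b in zip(x, y))
    return minima / maxima if maxima else 0.0


def fused_similarity(x, y, w_semantic=0.5, w_text=0.5):
    """Step III: weighted sum of the two similarities (weights are assumptions)."""
    return w_semantic * cosine_similarity(x, y) + w_text * min_max_similarity(x, y)
```

For `x = [1.0, 2.0]` and `y = [2.0, 1.0]`, the min/max ratio is (1 + 1) / (2 + 2) = 0.5 and the cosine similarity is approximately 0.8, so the fused score with equal weights is approximately 0.65. The min/max ratio stays within [0, 1] only when the vector elements are non-negative, so vectors with negative components may need rescaling before this measure is meaningful.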
S204, determining the target information from k pieces of candidate information according to the first similarity between the search information and each piece of candidate information.
Specifically, the candidate information may be sorted in descending order of first similarity, and the first piece of candidate information (or the top n pieces of candidate information) in the sorted sequence is determined as the target information.
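A short sketch of step S204, assuming the first similarity of each candidate has already been computed. The name `select_targets` and the representation of candidates as strings are hypothetical.

```python
from typing import List, Sequence, Tuple


def select_targets(
    candidates: Sequence[str],
    first_sims: Sequence[float],
    n: int = 1,
) -> List[Tuple[str, float]]:
    """Sort candidates by first similarity, highest first, and keep the top n."""
    if len(candidates) != len(first_sims):
        raise ValueError("each candidate needs exactly one similarity score")
    ranked = sorted(zip(candidates, first_sims), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]
```

Calling `select_targets(candidates, first_sims)` returns only the best match, while a larger `n` returns the top n pieces of candidate information.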
In the embodiments of the present application, word segmentation processing splits the search information into individual Chinese characters, and the first feature vector of the search information is determined according to the features of each Chinese character. Because the number of Chinese characters is far smaller than the number of words, this greatly reduces the storage space required by the database. The search performs coarse ranking first and fine ranking afterwards, which effectively improves search efficiency. The HNSW algorithm may be adopted in the coarse ranking stage to further improve the efficiency of that stage. The fine ranking stage matches using the fused similarity, which takes both the semantic information and the text information of sentences into account and effectively improves the accuracy of the search results.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation of the embodiments of the present application in any way.
Corresponding to the search method described in the above embodiments, Fig. 4 is a structural block diagram of the search apparatus provided in an embodiment of the present application. For convenience of explanation, only the parts related to the embodiments of the present application are shown.
Referring to Fig. 4, the apparatus 4 includes:
a word segmentation unit 41, configured to perform word segmentation processing on search information after receiving the search information input by a user, so as to obtain a plurality of Chinese characters;
a generating unit 42, configured to generate first characteristic information of the search information according to the plurality of Chinese characters; and
a searching unit 43, configured to search a preset database for target information matching the first characteristic information.
Optionally, the word segmentation unit 41 is further configured to:
filtering the search information to obtain the Chinese text in the search information;
and performing word segmentation processing on the Chinese text to obtain the plurality of Chinese characters.
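The filtering and splitting performed by this unit can be sketched as below. This is an illustration under the assumption that "Chinese characters" means code points in the CJK Unified Ideographs block (U+4E00 to U+9FFF); the application does not state which character ranges are kept, and the name `split_chinese_chars` is hypothetical.

```python
import re
from typing import List

# Assumption: only the basic CJK Unified Ideographs block counts as Chinese characters.
_CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")


def split_chinese_chars(search_info: str) -> List[str]:
    """Drop everything that is not a Chinese character and return the rest one by one."""
    return _CJK_CHAR.findall(search_info)
```

Punctuation, Latin letters, digits, and whitespace are discarded, so a mixed-language query contributes only its Chinese characters to the feature vector.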
Optionally, the generating unit 42 is further configured to:
acquiring a word vector corresponding to each of the Chinese characters;
and calculating a sentence vector of the search information according to the word vectors corresponding to the Chinese characters, wherein the sentence vector is the first characteristic information.
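One simple way to derive a sentence vector from per-character vectors is to average them. The sketch below assumes averaging, a lookup table `char_vectors` keyed by character, and skipping characters with no stored vector; none of these choices is stated in this passage, so they should be read as illustrative only.

```python
from typing import Dict, List, Sequence


def sentence_vector(
    chars: Sequence[str],
    char_vectors: Dict[str, Sequence[float]],  # hypothetical per-character vector table
    dim: int,
) -> List[float]:
    """Average the vectors of the characters that appear in the lookup table."""
    total = [0.0] * dim
    count = 0
    for ch in chars:
        vec = char_vectors.get(ch)
        if vec is None:
            continue  # assumed behaviour: ignore characters without a stored vector
        for i in range(dim):
            total[i] += vec[i]
        count += 1
    if count == 0:
        return total  # all zeros when no character could be looked up
    return [value / count for value in total]
```

Because the table is keyed by single characters rather than words, its size is bounded by the number of distinct characters, which is the storage saving the application relies on.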
Optionally, the search unit 43 is further configured to:
acquiring second characteristic information of the information stored in the database;
screening out k pieces of candidate information from the database according to the first characteristic information and the second characteristic information;
calculating a first similarity between the search information and each piece of candidate information according to the first characteristic information and third characteristic information, wherein the third characteristic information is the characteristic information of the candidate information;
and determining the target information from the k pieces of candidate information according to the first similarity between the search information and each piece of candidate information.
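The coarse-then-fine flow listed above can be sketched end to end as follows. This is a sketch under stated assumptions rather than the applicant's implementation: the coarse stage uses a brute-force cosine scan as a stand-in for an approximate index such as HNSW, the fine stage uses an equally weighted sum of cosine similarity and the min/max ratio, and the names `search`, `cosine`, and `min_max_ratio` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]


def cosine(x: Vector, y: Vector) -> float:
    """Dot product divided by the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def min_max_ratio(x: Vector, y: Vector) -> float:
    """Sum of element-wise minima divided by sum of element-wise maxima."""
    upper = sum(max(a, b) for a, b in zip(x, y))
    return sum(min(a, b) for a, b in zip(x, y)) / upper if upper else 0.0


def search(
    query_vec: Vector,
    stored: Sequence[Tuple[str, Vector]],  # (stored information, its feature vector)
    k: int = 3,
    w_semantic: float = 0.5,  # assumed weight
    w_text: float = 0.5,      # assumed weight
) -> Tuple[str, float]:
    """Return the best-matching stored item and its fused similarity."""
    if not stored:
        raise ValueError("the database is empty")
    # Coarse stage: keep the k nearest items (brute force here, not HNSW).
    coarse = sorted(stored, key=lambda item: cosine(query_vec, item[1]), reverse=True)[:k]
    # Fine stage: re-score only the k candidates with the fused similarity.
    scored = [
        (text, w_semantic * cosine(query_vec, vec) + w_text * min_max_ratio(query_vec, vec))
        for text, vec in coarse
    ]
    return max(scored, key=lambda pair: pair[1])
```

Only the k surviving candidates pay the cost of the fused similarity, which is why the two-stage arrangement is cheaper than scoring every stored item with it.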
Optionally, the search unit 43 is further configured to:
calculating a second similarity between the search information and the candidate information according to the first characteristic information and the third characteristic information, wherein the second similarity represents semantic similarity;
calculating a third similarity between the search information and the candidate information according to the first characteristic information and the third characteristic information, wherein the third similarity represents the similarity of texts;
and calculating the first similarity between the search information and the candidate information according to the second similarity and the third similarity.
Optionally, the search unit 43 is further configured to:
and carrying out weighted summation on the second similarity and the third similarity to obtain the first similarity.
Optionally, the search unit 43 is further configured to:
calculating a dot product between the first characteristic information and the third characteristic information to obtain first data;
calculating the product of the vector length of the first characteristic information and the vector length of the third characteristic information to obtain second data;
and calculating the second similarity according to the first data and the second data.
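A minimal sketch of the second similarity, assuming that "according to the first data and the second data" means dividing the dot product by the product of the vector lengths, i.e. the standard cosine similarity. The function name and the zero-length convention are illustrative assumptions.

```python
import math
from typing import Sequence


def semantic_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    # "First data": dot product of the two vectors.
    first_data = sum(a * b for a, b in zip(x, y))
    # "Second data": product of the two vector lengths (Euclidean norms).
    second_data = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if second_data == 0:
        return 0.0  # assumed convention when either vector has zero length
    return first_data / second_data
```

The result depends only on the direction of the two vectors, not on their magnitude, so a vector and any positive multiple of it score 1.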
Optionally, the search unit 43 is further configured to:
calculating the sum of the minimum values of corresponding elements in the first characteristic information and the third characteristic information to obtain third data;
calculating the sum of maximum values of corresponding elements in the first characteristic information and the third characteristic information to obtain fourth data;
and calculating the third similarity according to the third data and the fourth data.
It should be noted that, because the information interaction between the above devices/units and their execution processes are based on the same concept as the method embodiments of the present application, their specific functions and technical effects can be found in the method embodiments and are not described again here.
In addition, the search apparatus shown in Fig. 4 may be a software unit, a hardware unit, or a unit combining software and hardware that is built into an existing terminal device; it may also be integrated into the terminal device as an independent plug-in, or exist as an independent terminal device.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above division of functional units and modules is given as an example. In practical applications, the above functions may be allocated to different functional units and modules as needed, i.e. the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated in one processing unit, each unit may exist alone physically, or two or more units may be integrated in one unit; the integrated units may be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only for distinguishing them from each other and do not limit the protection scope of the present application. For the specific working process of the units and modules in the above system, reference may be made to the corresponding process in the foregoing method embodiments, which is not described again here.
Fig. 5 is a schematic structural diagram of a terminal device provided in an embodiment of the present application. As shown in fig. 5, the terminal device 5 of this embodiment includes: at least one processor 50 (only one shown in fig. 5), a memory 51 and a computer program 52 stored in the memory 51 and executable on the at least one processor 50, the processor 50 implementing the steps in any of the various search method embodiments described above when executing the computer program 52.
The terminal device may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The terminal device may include, but is not limited to, a processor and a memory. It will be appreciated by those skilled in the art that Fig. 5 is merely an example of the terminal device 5 and does not limit the terminal device 5, which may include more or fewer components than shown, combine certain components, or use different components; for example, it may also include input/output devices, network access devices, and the like.
The processor 50 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 51 may in some embodiments be an internal storage unit of the terminal device 5, such as a hard disk or memory of the terminal device 5. In other embodiments, the memory 51 may be an external storage device of the terminal device 5, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the terminal device 5. Further, the memory 51 may include both an internal storage unit and an external storage device of the terminal device 5. The memory 51 is used for storing an operating system, application programs, a boot loader, data, and other programs, such as the program code of the computer program. The memory 51 may also be used to temporarily store data that has been output or is to be output.
Embodiments of the present application also provide a computer readable storage medium storing a computer program, which when executed by a processor, may implement the steps in the above-described method embodiments.
The embodiments of the present application provide a computer program product which, when run on a terminal device, causes the terminal device to perform the steps of the method embodiments described above.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on this understanding, all or part of the flow of the methods in the above embodiments may be implemented by a computer program instructing related hardware. The computer program may be stored in a computer readable storage medium and, when executed by a processor, may implement the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer readable medium may include at least: any entity or device capable of carrying the computer program code to an apparatus/terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In some jurisdictions, in accordance with legislation and patent practice, computer readable media may not include electrical carrier signals and telecommunications signals.
In the foregoing embodiments, each embodiment is described with its own emphasis. For parts that are not described or detailed in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other manners. For example, the apparatus/terminal device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical function division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (10)

1. A search method, comprising:
after receiving search information input by a user, performing word segmentation processing on the search information to obtain a plurality of Chinese characters;
generating first characteristic information of the search information according to a plurality of Chinese characters;
searching target information matched with the first characteristic information in a preset database.
2. The search method of claim 1, wherein said performing word segmentation processing on the search information to obtain a plurality of Chinese characters comprises:
filtering the search information to obtain Chinese characters in the search information;
and performing word segmentation processing on the Chinese characters to obtain a plurality of Chinese characters.
3. The search method of claim 1, wherein said generating first characteristic information of said search information from a plurality of said Chinese characters comprises:
acquiring word vectors corresponding to the Chinese characters respectively;
and calculating sentence vectors of the search information according to the word vectors corresponding to the Chinese characters, wherein the sentence vectors are the first characteristic information.
4. The search method of claim 1, wherein said searching for target information matching the first characteristic information in a preset database comprises:
acquiring second characteristic information of the information stored in the database;
screening out k pieces of candidate information from the database according to the first characteristic information and the second characteristic information;
calculating a first similarity between the search information and each piece of candidate information according to the first characteristic information and third characteristic information, wherein the third characteristic information is characteristic information of the candidate information;
and determining the target information from k pieces of candidate information according to the first similarity between the search information and each piece of candidate information.
5. The search method of claim 4, wherein said calculating a first similarity between said search information and each of said candidate information based on said first feature information and third feature information comprises:
calculating second similarity between the search information and the candidate information according to the first characteristic information and the third characteristic information, wherein the second similarity represents semantic similarity;
calculating a third similarity between the search information and the candidate information according to the first characteristic information and the third characteristic information, wherein the third similarity represents the similarity of texts;
and calculating the first similarity between the search information and the candidate information according to the second similarity and the third similarity.
6. The search method of claim 5, wherein said calculating a first similarity between the search information and the candidate information based on the second similarity and the third similarity comprises:
and carrying out weighted summation on the second similarity and the third similarity to obtain the first similarity.
7. The search method of claim 5, wherein the calculating a second similarity between the search information and the candidate information based on the first characteristic information and the third characteristic information comprises:
calculating a dot product between the first characteristic information and the third characteristic information to obtain first data;
calculating the product of the vector length of the first characteristic information and the vector length of the third characteristic information to obtain second data;
and calculating the second similarity according to the first data and the second data.
8. The search method of claim 5, wherein the calculating a third similarity between the search information and the candidate information based on the first characteristic information and the third characteristic information comprises:
calculating the sum of the minimum values of corresponding elements in the first characteristic information and the third characteristic information to obtain third data;
calculating the sum of maximum values of corresponding elements in the first characteristic information and the third characteristic information to obtain fourth data;
and calculating the third similarity according to the third data and the fourth data.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 8 when executing the computer program.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method according to any one of claims 1 to 8.
CN202311655683.XA 2023-12-01 2023-12-01 Searching method, terminal device and computer readable storage medium Pending CN117891904A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311655683.XA CN117891904A (en) 2023-12-01 2023-12-01 Searching method, terminal device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311655683.XA CN117891904A (en) 2023-12-01 2023-12-01 Searching method, terminal device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN117891904A true CN117891904A (en) 2024-04-16

Family

ID=90645530

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311655683.XA Pending CN117891904A (en) 2023-12-01 2023-12-01 Searching method, terminal device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN117891904A (en)

Similar Documents

Publication Publication Date Title
CN108287858B (en) Semantic extraction method and device for natural language
EP2829990B1 (en) Image search device, image search method, program, and computer-readable storage medium
CN110362824B (en) Automatic error correction method, device, terminal equipment and storage medium
EP2833275B1 (en) Image search device, image search method, program, and computer-readable storage medium
CN110941951A (en) Text similarity calculation method, text similarity calculation device, text similarity calculation medium and electronic equipment
KR101379128B1 (en) Dictionary generation device, dictionary generation method, and computer readable recording medium storing the dictionary generation program
CN112784009A (en) Subject term mining method and device, electronic equipment and storage medium
CN115392235A (en) Character matching method and device, electronic equipment and readable storage medium
CN115858773A (en) Keyword mining method, device and medium suitable for long document
JP7172187B2 (en) INFORMATION DISPLAY METHOD, INFORMATION DISPLAY PROGRAM AND INFORMATION DISPLAY DEVICE
CN113408660A (en) Book clustering method, device, equipment and storage medium
CN110705285B (en) Government affair text subject word library construction method, device, server and readable storage medium
CN112560425A (en) Template generation method and device, electronic equipment and storage medium
CN113449063B (en) Method and device for constructing document structure information retrieval library
CN115455416A (en) Malicious code detection method and device, electronic equipment and storage medium
CN112732743B (en) Data analysis method and device based on Chinese natural language
CN117891904A (en) Searching method, terminal device and computer readable storage medium
CN111310442B (en) Method for mining shape-word error correction corpus, error correction method, device and storage medium
CN113330430B (en) Sentence structure vectorization device, sentence structure vectorization method, and recording medium containing sentence structure vectorization program
CN115495636A (en) Webpage searching method, device and storage medium
CN112926297A (en) Method, apparatus, device and storage medium for processing information
JPH11328318A (en) Probability table generating device, probability system language processor, recognizing device, and record medium
CN114861062B (en) Information filtering method and device
CN117033769A (en) Front-end component retrieval method, device, equipment and storage medium
CN116306616A (en) Method and device for determining keywords of text

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination