CN110825859A - Retrieval method, retrieval device, readable storage medium and electronic equipment - Google Patents

Info

Publication number
CN110825859A
Authority
CN
China
Prior art keywords
information
determining
matching degree
retrieval
question
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911001415.XA
Other languages
Chinese (zh)
Inventor
姜梦晓
王乾
于首杰
王天华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rajax Network Technology Co Ltd
Original Assignee
Rajax Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rajax Network Technology Co Ltd
Priority to CN201911001415.XA
Publication of CN110825859A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis

Abstract

The embodiments of the invention disclose a retrieval method, a retrieval device, a readable storage medium and electronic equipment. The first statement vector and the second statement vector are statement vectors obtained by weighting the word vectors corresponding to the words in the retrieval information and in the question information, respectively. Because the method introduces the weight information of each word in the retrieval information when determining the matching degree between the retrieval information and each question information, the efficiency and accuracy of the whole retrieval process are improved.

Description

Retrieval method, retrieval device, readable storage medium and electronic equipment
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a retrieval method, a retrieval apparatus, a readable storage medium, and an electronic device.
Background
When using software, users often run into problems, and many of them seek help by voice or text, either by calling a customer service hotline or by contacting online customer service. As the number of users grows, so does the number of questions and service requests that must be handled. To save costs, more and more software responds to user questions through machine voice or machine online customer service. In this process, the user's question is obtained through speech recognition or from text entered by the user, an answer is retrieved, and the retrieved answer is fed back to the client. However, when machine customer service is used to solve problems, the accuracy of the returned answers is not high.
Disclosure of Invention
In view of the above, the embodiments of the present invention are directed to improving the efficiency and accuracy of the problem retrieval process.
In a first aspect, an embodiment of the present invention discloses a retrieval method, where the method includes:
determining retrieval information and a data information set, wherein the data information set comprises at least one information pair, and the information pair comprises question information and corresponding answer information;
determining word vectors corresponding to words contained in the retrieval information;
carrying out weighting processing on the word vector corresponding to each word to determine a first statement vector for representing the retrieval information;
determining a second statement vector corresponding to each question information in the data information set;
calculating a first matching degree of the first statement vector and each second statement vector;
determining candidate problem information according to the corresponding first matching degree;
and outputting answer information corresponding to the retrieval information according to the candidate question information.
Further, the calculating the first matching degree of the first statement vector and each second statement vector specifically includes:
and calculating the distance between the first statement vector and each second statement vector to determine the corresponding first matching degree.
Further, the determining of the candidate problem information according to the corresponding first matching degree specifically includes:
and determining the corresponding problem information with the first matching degree larger than the first threshold value as candidate problem information.
Further, the determining of the candidate problem information according to the corresponding first matching degree specifically includes:
and determining N pieces of problem information with the maximum first matching degree as candidate problem information, wherein N is a first preset constant.
Further, the outputting answer information corresponding to the retrieval information according to the candidate question information includes:
determining target problem information matched with the retrieval information in the candidate problem information;
and outputting answer information corresponding to the target question information.
Further, the determining candidate problem information according to the corresponding first matching degree includes:
determining a plurality of problem information in a data information set according to a preset rule;
determining a second matching degree of the retrieval information and each question information;
and determining at least one candidate question message according to the second matching degree corresponding to each question message.
Further, the determining of the second matching degree between the retrieval information and each question information specifically includes:
and inputting the retrieval information and each question information into a pre-trained prediction model respectively to determine a corresponding second matching degree.
Further, the determining at least one candidate question information according to the second matching degree corresponding to each question information specifically includes:
and determining the problem information with the second matching degree larger than a second threshold value as candidate problem information.
Further, the determining at least one candidate question information according to the second matching degree corresponding to each question information specifically includes:
and determining M pieces of problem information with the maximum second matching degree as candidate problem information, wherein M is a second preset constant.
In a second aspect, an embodiment of the present invention discloses a retrieval apparatus, where the apparatus includes:
the information determining module is used for determining retrieval information and a data information set, wherein the data information set comprises at least one information pair, and the information pair comprises question information and corresponding answer information;
the word vector determining module is used for determining the word vector corresponding to each word contained in the retrieval information;
the weighting module is used for weighting the word vector corresponding to each word so as to determine a first statement vector used for representing the retrieval information;
the statement vector determining module is used for determining a second statement vector corresponding to each question information in the data information set;
the calculating module is used for calculating the first matching degree of the first statement vector and each second statement vector;
the candidate question determining module is used for determining candidate question information according to the corresponding first matching degree;
and the answer output module is used for outputting answer information corresponding to the retrieval information according to the candidate question information.
In a third aspect, an embodiment of the present invention discloses a computer-readable storage medium for storing computer program instructions, which when executed by a processor implement the method according to any one of the first aspect.
In a fourth aspect, an embodiment of the present invention discloses an electronic device, including a memory and a processor, where the memory is used to store one or more computer program instructions, where the one or more computer program instructions are executed by the processor to implement the following steps:
determining retrieval information and a data information set, wherein the data information set comprises at least one information pair, and the information pair comprises question information and corresponding answer information;
determining word vectors corresponding to words contained in the retrieval information;
carrying out weighting processing on the word vector corresponding to each word to determine a first statement vector for representing the retrieval information;
determining a second statement vector corresponding to each question information in the data information set;
calculating a first matching degree of the first statement vector and each second statement vector;
determining candidate problem information according to the corresponding first matching degree;
and outputting answer information corresponding to the retrieval information according to the candidate question information.
Further, the calculating the first matching degree of the first statement vector and each second statement vector specifically includes:
and calculating the distance between the first statement vector and each second statement vector to determine the corresponding first matching degree.
Further, the determining of the candidate problem information according to the corresponding first matching degree specifically includes:
and determining the corresponding problem information with the first matching degree larger than the first threshold value as candidate problem information.
Further, the determining of the candidate problem information according to the corresponding first matching degree specifically includes:
and determining N pieces of problem information with the maximum first matching degree as candidate problem information, wherein N is a first preset constant.
Further, the outputting answer information corresponding to the retrieval information according to the candidate question information includes:
determining target problem information matched with the retrieval information in the candidate problem information;
and outputting answer information corresponding to the target question information.
Further, the determining candidate problem information according to the corresponding first matching degree includes:
determining a plurality of problem information in a data information set according to a preset rule;
determining a second matching degree of the retrieval information and each question information;
and determining at least one candidate question message according to the second matching degree corresponding to each question message.
Further, the determining of the second matching degree between the retrieval information and each question information specifically includes:
and inputting the retrieval information and each question information into a pre-trained prediction model respectively to determine a corresponding second matching degree.
Further, the determining at least one candidate question information according to the second matching degree corresponding to each question information specifically includes:
and determining the problem information with the second matching degree larger than a second threshold value as candidate problem information.
Further, the determining at least one candidate question information according to the second matching degree corresponding to each question information specifically includes:
and determining M pieces of problem information with the maximum second matching degree as candidate problem information, wherein M is a second preset constant.
The method determines a word vector corresponding to each word contained in retrieval information, determines a first statement vector after weighting the word vector, then calculates and determines a first matching degree of the first statement vector and a second statement vector corresponding to each question information in the data information set, determines candidate question information according to the first matching degree, and finally outputs answer information corresponding to the retrieval information according to the candidate question information. According to the method, the weight information of each word in the retrieval information is introduced in the process of determining the matching degree of the retrieval information and each question information, so that the efficiency and the accuracy of the whole retrieval process are improved.
Drawings
The above and other objects, features and advantages of the embodiments of the present invention will become more apparent from the following description of the embodiments of the present invention with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of a retrieval method according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating candidate problem information according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating answer information according to an embodiment of the present invention;
FIG. 4 is a flowchart of a retrieval method according to an alternative implementation of the present invention;
FIG. 5 is a diagram of a retrieving apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an electronic device according to an embodiment of the invention.
Detailed Description
The present invention will be described below based on examples, but the present invention is not limited to only these examples. In the following detailed description of the present invention, certain specific details are set forth. It will be apparent to one skilled in the art that the present invention may be practiced without these specific details. Well-known methods, procedures, components and circuits have not been described in detail so as not to obscure the present invention.
Further, those of ordinary skill in the art will appreciate that the drawings provided herein are for illustrative purposes and are not necessarily drawn to scale.
Unless the context clearly requires otherwise, throughout the description, the words "comprise", "comprising", and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is, what is meant is "including, but not limited to".
In the description of the present invention, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified.
Fig. 1 is a flowchart of a retrieval method according to an embodiment of the present invention, and as shown in fig. 1, the retrieval method includes:
Step S100, determining retrieval information and a data information set.
Specifically, the retrieval information is a question entered by the user that contains the content to be retrieved; it may be text entered by the user or text converted from voice input, for example, "how to modify the shipping address". The data information set is a set of all or part of the data in the database and comprises at least one information pair, each consisting of question information and corresponding answer information. For example: "Question information: How do I modify the address? Answer information: Click 'shipping address' in 'personal information' to edit."
Step S200, determining word vectors corresponding to the words contained in the retrieval information.
Specifically, the retrieval information is subjected to word segmentation to obtain a word sequence composed of a plurality of words. For example, when the input retrieval information is "what to do about an address error after ordering", the word sequence obtained after word segmentation is {"ordering", "after", "address", "error", "what to do"}. The word vector corresponding to each word in the word sequence is then determined, for example by inputting the words of the sequence one by one into a word vector model, which outputs the corresponding word vectors in order. The word vector model may be, for example, a word2vec model, a GloVe model, an ELMo model, a BERT model, or the like.
Step S300, performing weighting processing on the word vector corresponding to each word to determine a first statement vector for representing the retrieval information.
Specifically, the weighting process averages the sequence of word vectors corresponding to all the words to obtain an average vector, and weights that average according to the weight, within the retrieval information, of the word corresponding to each word vector, so as to determine the first statement vector. In the weighting process, a term frequency-inverse document frequency (TF-IDF) score is first computed for the word corresponding to each word vector; the average vector is then weighted by the TF-IDF score of each word vector to obtain a statement representation; finally, the common component of the statement representation is removed by principal component analysis to determine the first statement vector. Optionally, the statement representation may also be determined by the smooth inverse frequency (SIF) method, which sets each word's weight according to how often the word occurs in the retrieval information, giving words that occur less often a larger weight.
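The TF-IDF weighting of step S300 can be illustrated with a minimal sketch. The function names `idf_table` and `statement_vector`, the smoothed IDF formula, and the handling of out-of-vocabulary words are assumptions made for illustration and are not taken from the patent; the principal-component removal step is omitted for brevity.

```python
import math
from collections import Counter


def idf_table(corpus):
    """Smoothed inverse document frequency for each word of a tokenized corpus.

    `corpus` is a list of word lists (one per question). The smoothing
    formula is an assumption; the patent does not fix one.
    """
    n_docs = len(corpus)
    doc_freq = Counter(word for doc in corpus for word in set(doc))
    return {w: math.log((1 + n_docs) / (1 + df)) + 1.0 for w, df in doc_freq.items()}


def statement_vector(words, word_vectors, idf, default_idf=1.0):
    """TF-IDF-weighted average of the word vectors of `words`."""
    tf = Counter(words)
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word, count in tf.items():
        if word not in word_vectors:
            continue  # assumption: words without a vector are skipped
        weight = (count / len(words)) * idf.get(word, default_idf)
        for i, x in enumerate(word_vectors[word]):
            total[i] += weight * x
        weight_sum += weight
    if weight_sum == 0.0:
        return total  # no known words: zero vector
    return [x / weight_sum for x in total]
```

In this sketch a word that appears in fewer questions receives a larger IDF and therefore pulls the statement vector more strongly toward its own word vector.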
Step S400, determining a second statement vector corresponding to each question information in the data information set.
Specifically, the second statement vectors may be computed in advance and stored in the database, in one-to-one correspondence with the question information. Alternatively, each question information in the data information set may be processed in the same way as the first statement vector, that is, by the methods described in steps S200 and S300, to determine the corresponding second statement vector. Each second statement vector represents the corresponding question information in the data information set.
Step S500, calculating a first matching degree of the first statement vector and each second statement vector.
Specifically, the first matching degree of the first statement vector and each second statement vector may be determined by calculating a distance between the first statement vector and each second statement vector, where the distance may be a cosine distance. For example, when the first statement vector is q and the second statement vector is a, the first matching degree of the first statement vector and the second statement vector is:
$$\cos(q, a) = \frac{q \cdot a}{\lVert q \rVert \, \lVert a \rVert}$$
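A minimal sketch of the cosine-based first matching degree of step S500 follows. The function names and the choice to return 0.0 for a zero vector are assumptions for illustration.

```python
import math


def cosine_similarity(q, a):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(q, a))
    norm_q = math.sqrt(sum(x * x for x in q))
    norm_a = math.sqrt(sum(y * y for y in a))
    if norm_q == 0.0 or norm_a == 0.0:
        return 0.0  # assumption: a zero vector matches nothing
    return dot / (norm_q * norm_a)


def first_matching_degrees(first_vector, second_vectors):
    """First matching degree of the query vector against every question vector."""
    return [cosine_similarity(first_vector, a) for a in second_vectors]
```

A larger value means the two statement vectors point in more similar directions, which is why the later steps keep the question information with the largest first matching degree.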
Step S600, determining candidate question information according to the corresponding first matching degree.
Specifically, the candidate question information may be determined according to a preset first threshold; that is, when the first matching degree corresponding to a question information is greater than the first threshold, that question information is determined to be candidate question information. For example, when the question information includes {a1, a2, a3, a4, a5}, the corresponding first matching degrees are {0.23, 0.75, 0.19, 0.52, 0.91} respectively, and the preset first threshold is 0.5, the candidate question information is determined to be {a2, a4, a5}.
As another optional implementation manner of the embodiment of the present application, the candidate question information may be determined according to a preset number; that is, the N question information with the largest first matching degree are determined as candidate question information, where N is a first preset constant. For example, when the question information includes {a1, a2, a3, a4, a5}, the corresponding first matching degrees are {0.23, 0.75, 0.19, 0.52, 0.91} respectively, and the first preset constant N is 3, the candidate question information is determined to be {a2, a4, a5}.
As a further optional implementation manner of the embodiment of the present application, the candidate question information may be determined by combining the two rules above; that is, the question information in the database whose first matching degree is greater than the first threshold is determined first, and then, among that question information, the N question information with the largest first matching degree are determined as candidate question information. For example, when the question information includes {a1, a2, a3, a4, a5}, the corresponding first matching degrees are {0.23, 0.75, 0.19, 0.52, 0.91} respectively, the preset first threshold is 0.5, and the first preset constant N is 2, the candidate question information is determined to be {a2, a5}.
Optionally, N may be set as a constant range. When the number of question information whose first matching degree is greater than the first threshold is less than the minimum value Nmin of N, the Nmin question information with the largest first matching degree in the data information set are selected as candidate question information; when that number is between the minimum value Nmin and the maximum value Nmax of N, all the question information whose first matching degree is greater than the first threshold is determined as candidate question information; when that number is greater than the maximum value Nmax of N, Nmax question information are determined, from the question information whose first matching degree is greater than the first threshold, as candidate question information.
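The threshold rule, the top-N rule and their combination can be sketched with one small function. The name `select_candidates` and the dictionary layout are assumptions for illustration; the matching degrees are the ones from the worked examples in the text.

```python
def select_candidates(degrees, threshold=None, top_n=None):
    """Pick candidate questions from a {question: first matching degree} mapping.

    threshold only -> every question whose degree exceeds the threshold
    top_n only     -> the top_n questions with the largest degree
    both           -> the top_n largest among those above the threshold
    """
    ranked = sorted(degrees.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        ranked = [(name, d) for name, d in ranked if d > threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [name for name, _ in ranked]


degrees = {"a1": 0.23, "a2": 0.75, "a3": 0.19, "a4": 0.52, "a5": 0.91}
by_threshold = select_candidates(degrees, threshold=0.5)
by_count = select_candidates(degrees, top_n=3)
combined = select_candidates(degrees, threshold=0.5, top_n=2)
```

The returned lists are ordered by decreasing matching degree, so the three calls reproduce the candidate sets {a2, a4, a5}, {a2, a4, a5} and {a2, a5} given in the examples.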
As an optional implementation manner of the embodiment of the present application, the process of determining candidate problem information includes the following steps:
Step S610, determining a plurality of question information in the data information set according to a preset rule.
Specifically, the preset rule may be, for example, any of the rules described above for determining candidate question information: the preset first threshold, the preset number, or the combination of the two. Details are not repeated here.
Step S620, determining a second matching degree of the retrieval information and each question information.
Specifically, the second matching degree between the retrieval information and each question information is determined by inputting the retrieval information and each question information into a pre-trained prediction model, which outputs the second matching degree between each question information and the retrieval information. The prediction model is trained on the question information in the data information set, the historical retrieval information, and the question information clicked in each historical retrieval. That is, each question information and the historical retrieval information serve as the input of the prediction model; when the input question information was clicked in that retrieval, the output is set to 1, and when it was not clicked, the output is set to 0.
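The labeling scheme used to train the prediction model can be sketched as follows. The record layout (`query`, `shown`, `clicked`) and the function name are hypothetical; the patent only states that clicked question information is labeled 1 and unclicked question information is labeled 0, and does not specify the model itself.

```python
def build_training_examples(history):
    """Turn historical retrievals into (retrieval info, question info, label) triples.

    Each record is assumed to hold the historical query, the question
    information considered for it, and the subset the user clicked.
    """
    examples = []
    for record in history:
        clicked = set(record["clicked"])
        for question in record["shown"]:
            label = 1 if question in clicked else 0
            examples.append((record["query"], question, label))
    return examples


history = [
    {
        "query": "address wrong after ordering",
        "shown": ["How do I modify the address?", "How do I request a refund?"],
        "clicked": ["How do I modify the address?"],
    }
]
examples = build_training_examples(history)
```

Any binary classifier that accepts a pair of texts could be trained on such triples; its predicted probability of the label 1 would then play the role of the second matching degree.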
Step S630, determining at least one candidate question information according to the second matching degree corresponding to each question information.
Specifically, the candidate problem information may be determined, for example, by determining the problem information with the second matching degree greater than the second threshold as the candidate problem information, or determining M problem information with the largest second matching degree as the candidate problem information, where M is a second preset constant. Optionally, the candidate problem information may be determined by combining the two rules.
Fig. 2 is a schematic diagram illustrating candidate question information according to an embodiment of the present invention. As shown in Fig. 2, the candidate question information 20 may be displayed through a display interface of a client. Optionally, the candidate question information 20 is displayed through selection controls on the client display interface; that is, the user may select at least one candidate question information by triggering a selection control.
Step S700, outputting answer information corresponding to the retrieval information according to the candidate question information.
Specifically, the answer information corresponding to the retrieval information is the answer information corresponding to one candidate question information. After the candidate question information is determined in step S600, when one candidate question information is selected, the answer information corresponding to it is determined to be the answer information corresponding to the retrieval information and is output. The candidate question information may be selected, for example, by sending the server a selection instruction containing the identifier of the selected candidate question information; the selection instruction may be sent by triggering a selection control at the client.
Fig. 3 is a schematic diagram illustrating answer information display according to an embodiment of the present invention. As shown in Fig. 3, when the candidate question information is selected, answer information 30 corresponding to the retrieval information is output, and the answer information 30 may be displayed through a display interface of a client.
According to the method, the weight information of each word in the retrieval information is introduced in the process of determining the matching degree of the retrieval information and each question information, so that the efficiency and the accuracy of the whole retrieval process are improved.
Fig. 4 is a flowchart of a retrieval method in an alternative implementation manner according to an embodiment of the present invention, and as shown in fig. 4, the method includes:
Step S800, determining retrieval information.
Specifically, the determining process of the retrieval information may be that a user inputs voice information or text information through a client, where the client may be an intelligent terminal such as a mobile phone, a computer, a tablet computer, and the like, and when the user input information is voice information, the client may convert the voice information into text information.
Step S900, determining a first matching degree of the retrieval information and each question information.
Specifically, the process of determining the first matching degree is the same as the process described in S200-S500, and is not repeated here.
Step S1000, sorting the question information in descending order of first matching degree to determine a first list.
Specifically, all the problem information is acquired from the data information set, and then the problem information is sorted according to the first matching degree from large to small, so as to determine a first list, that is, the first list is a sequence in which the first matching degree decreases from front to back.
Step S1100, obtaining the question information with the first matching degree larger than the threshold value from the first list to determine a second list.
Specifically, whether the first matching degree of each question information in the first list is greater than a threshold value is judged, when the first matching degree of the question information is greater than the threshold value, the question information is sequentially acquired, a second list is determined, and the second list is a sequence in which the first matching degree is sequentially decreased from front to back.
Step S1200, determining the number M of the question information in the second list.
Specifically, the number M of question information in the second list is determined, and the question information contained in the first list or the second list is acquired according to a preset constant range N (Nmin to Nmax). When M < Nmin, the first Nmin question information in the first list are acquired; when Nmin < M < Nmax, all the question information in the second list is acquired; when M > Nmax, the first Nmax question information in the second list are acquired.
Taking the preset constant range of 5-10 as an example for explanation, when the number M of the question information is 4, acquiring the first 5 question information in the first list; when the number M of the question information is 7, acquiring 7 question information contained in the second list; when the number M of question information is 19, the top 10 question information pieces are acquired in the second list.
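The first-list and second-list logic of steps S1000 to S1200 can be sketched as below. The function name and the synthetic matching degrees are assumptions for illustration. The boundary cases M = Nmin and M = Nmax are handled as "take the whole second list", which gives the same result as the neighboring branches because the second list is a prefix of the first list.

```python
def select_by_range(first_list, threshold, n_min, n_max):
    """Pick question information using a constant range [n_min, n_max].

    `first_list` holds (question, first matching degree) tuples already
    sorted by decreasing matching degree.
    """
    second_list = [item for item in first_list if item[1] > threshold]
    m = len(second_list)
    if m < n_min:
        return first_list[:n_min]   # too few above threshold: fall back to the first list
    if m > n_max:
        return second_list[:n_max]  # too many above threshold: truncate
    return second_list              # within range: keep all of the second list


# Synthetic first list of 20 questions with degrees 1.00, 0.95, ..., 0.05.
first_list = [("q%d" % i, (20 - i) / 20) for i in range(20)]
picked_when_m_is_4 = select_by_range(first_list, threshold=0.82, n_min=5, n_max=10)
picked_when_m_is_7 = select_by_range(first_list, threshold=0.67, n_min=5, n_max=10)
picked_when_m_is_19 = select_by_range(first_list, threshold=0.07, n_min=5, n_max=10)
```

With the range 5 to 10 used in the text, the three calls return 5, 7 and 10 pieces of question information respectively.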
Step S1300, determining second matching degrees of the retrieval information and the acquired question information.
Specifically, the second matching degree may be determined, for example, by inputting the search information and the question information into a pre-trained model and outputting the corresponding second matching degree.
Step S1400, determining candidate question information according to the second matching degree.
Specifically, the process of determining candidate problem information is the same as that in step S630, and is not repeated here.
Step S1500, outputting answer information corresponding to the retrieval information according to the candidate question information.
Specifically, the step of determining and outputting the answer information corresponding to the search information is the same as the step S700, and is not repeated herein.
The method in this embodiment first screens the question information by determining a first matching degree between the search information and each question information, and then screens the question information a second time by determining a second matching degree between the search information and each question information. According to the method, the question information is screened by using two different screening modes to obtain the best matched question information, so that the corresponding answer information is determined, and the accuracy of the retrieval efficiency is improved.
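The two-pass screening described in this embodiment can be sketched as follows. This is an illustration under stated assumptions only: the word-overlap score stands in for the first matching degree, and a character-level `difflib` ratio stands in for the pre-trained model that produces the second matching degree. Neither is the scoring method of the embodiment, and all function names and sample data are hypothetical.

```python
import difflib
from typing import Callable, List, Optional, Tuple

Scorer = Callable[[str, str], float]


def word_overlap(a: str, b: str) -> float:
    """Stand-in first matching degree: Jaccard overlap of lowercase word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def char_similarity(a: str, b: str) -> float:
    """Stand-in second matching degree (a placeholder for the pre-trained model)."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


def two_stage_retrieve(
    query: str,
    qa_pairs: List[Tuple[str, str]],
    first_threshold: float,
    first_scorer: Scorer = word_overlap,
    second_scorer: Scorer = char_similarity,
) -> Optional[str]:
    """Return the answer of the best-matching question, or None if none qualifies."""
    # First screening: keep questions whose first matching degree exceeds the threshold.
    shortlisted = [(q, a) for q, a in qa_pairs if first_scorer(query, q) > first_threshold]
    if not shortlisted:
        return None
    # Second screening: pick the shortlisted question with the highest second matching degree.
    _, best_answer = max(shortlisted, key=lambda qa: second_scorer(query, qa[0]))
    return best_answer


qa_pairs = [
    ("how do i reset my password", "Open Settings and choose Reset password."),
    ("how do i change my address", "Open Profile and edit the address."),
    ("what are the delivery fees", "Fees depend on distance."),
]
print(two_stage_retrieve("how to reset password", qa_pairs, first_threshold=0.1))
```

Passing the scorers as parameters keeps the two screening stages independent, so a trained model can replace the placeholder without changing the control flow.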
Fig. 5 is a schematic diagram of a retrieval apparatus according to an embodiment of the present invention. As shown in Fig. 5, the retrieval apparatus includes an information determining module 50, a word vector determining module 51, a weighting module 52, a statement vector determining module 53, a calculating module 54, a candidate question determining module 55, and an answer output module 56.
Specifically, the information determining module 50 is configured to determine retrieval information and a data information set, where the data information set includes at least one information pair, and the information pair includes question information and corresponding answer information. The word vector determining module 51 is configured to determine a word vector corresponding to each word contained in the retrieval information. The weighting module 52 is configured to perform weighting processing on the word vector corresponding to each word to determine a first statement vector used for characterizing the retrieval information. The statement vector determining module 53 is configured to determine a second statement vector corresponding to each piece of question information in the data information set. The calculating module 54 is configured to calculate a first matching degree between the first statement vector and each second statement vector. The candidate question determining module 55 is configured to determine candidate question information according to the corresponding first matching degree. The answer output module 56 is configured to output answer information corresponding to the retrieval information according to the candidate question information.
The apparatus determines a word vector corresponding to each word contained in the retrieval information, determines a first statement vector by weighting the word vectors, calculates a first matching degree between the first statement vector and the second statement vector corresponding to each piece of question information in the data information set, determines candidate question information according to the first matching degree, and finally outputs answer information corresponding to the retrieval information according to the candidate question information. Because the retrieval method implemented by the apparatus introduces the weight of each word in the retrieval information when determining the matching degree between the retrieval information and each piece of question information, it improves the efficiency and accuracy of the whole retrieval process.
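The work of the weighting module and the calculating module can be sketched as follows. This passage does not fix the weighting scheme or the distance measure, so the sketch makes two assumptions: the first statement vector is a normalized weighted average of the word vectors, and the first matching degree is the cosine similarity. The function names, the toy word vectors and the weights are all hypothetical.

```python
import math
from typing import Dict, List


def weighted_statement_vector(
    words: List[str],
    word_vectors: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Weighted average of the word vectors of `words` (assumed weighting scheme)."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word in words:
        if word not in word_vectors:
            continue  # words without a vector are skipped
        w = weights.get(word, 1.0)  # default weight 1.0 is an assumption
        for i, x in enumerate(word_vectors[word]):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0.0:
        return total
    return [x / weight_sum for x in total]


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine similarity, used here as the first matching degree (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy two-dimensional vectors; real word vectors would come from a trained embedding.
word_vectors = {"reset": [1.0, 0.0], "password": [0.0, 1.0]}
weights = {"reset": 3.0, "password": 1.0}
first_vector = weighted_statement_vector(["reset", "password"], word_vectors, weights)
print(first_vector, cosine_similarity(first_vector, [1.0, 0.0]))
```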
Fig. 6 is a schematic diagram of an electronic device according to an embodiment of the present invention. In this embodiment, the electronic device may be a server or a terminal, and the terminal may be, for example, a smart device such as a mobile phone, a computer, or a tablet computer. As shown in Fig. 6, the electronic device includes: at least one processor 62; a memory 61 communicatively coupled to the at least one processor 62; and a communication component 63 communicatively coupled to the storage medium, the communication component 63 receiving and transmitting data under control of the processor 62; wherein the memory 61 stores instructions executable by the at least one processor 62, the instructions being executed by the at least one processor 62 to implement the following steps:
determining retrieval information and a data information set, wherein the data information set comprises at least one information pair, and the information pair comprises question information and corresponding answer information;
determining a word vector corresponding to each word contained in the retrieval information;
performing weighting processing on the word vector corresponding to each word to determine a first statement vector for characterizing the retrieval information;
determining a second statement vector corresponding to each piece of question information in the data information set;
calculating a first matching degree between the first statement vector and each second statement vector;
determining candidate question information according to the corresponding first matching degree;
and outputting answer information corresponding to the retrieval information according to the candidate question information.
Further, the calculating a first matching degree between the first statement vector and each second statement vector specifically includes:
calculating the distance between the first statement vector and each second statement vector to determine the corresponding first matching degree.
Further, the determining candidate question information according to the corresponding first matching degree specifically includes:
determining question information whose corresponding first matching degree is greater than a first threshold as candidate question information.
Further, the determining candidate question information according to the corresponding first matching degree specifically includes:
determining the N pieces of question information with the largest first matching degrees as candidate question information, wherein N is a first preset constant.
Further, the outputting answer information corresponding to the retrieval information according to the candidate question information includes:
determining, among the candidate question information, target question information that matches the retrieval information;
and outputting answer information corresponding to the target question information.
Further, the determining candidate question information according to the corresponding first matching degree includes:
determining a plurality of pieces of question information in the data information set according to a preset rule;
determining a second matching degree between the retrieval information and each piece of question information;
and determining at least one piece of candidate question information according to the second matching degree corresponding to each piece of question information.
Further, the determining a second matching degree between the retrieval information and each piece of question information specifically includes:
inputting the retrieval information and each piece of question information into a pre-trained prediction model to determine the corresponding second matching degree.
Further, the determining at least one piece of candidate question information according to the second matching degree corresponding to each piece of question information specifically includes:
determining question information whose second matching degree is greater than a second threshold as candidate question information.
Further, the determining at least one piece of candidate question information according to the second matching degree corresponding to each piece of question information specifically includes:
determining the M pieces of question information with the largest second matching degrees as candidate question information, wherein M is a second preset constant.
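The two candidate selection rules listed above, by threshold and by a preset count, apply equally to the first and the second matching degree. They can be sketched as follows; the function names and the `(question, matching_degree)` pair layout are assumptions made for illustration.

```python
import heapq
from typing import List, Tuple

Scored = List[Tuple[str, float]]  # (question, matching_degree) pairs, hypothetical layout


def select_above_threshold(scored: Scored, threshold: float) -> List[str]:
    """Keep every question whose matching degree is greater than the threshold."""
    return [question for question, degree in scored if degree > threshold]


def select_top_n(scored: Scored, n: int) -> List[str]:
    """Keep the n questions with the largest matching degrees, best first."""
    best = heapq.nlargest(n, scored, key=lambda pair: pair[1])
    return [question for question, _ in best]


scored = [("q1", 0.2), ("q2", 0.9), ("q3", 0.6)]
print(select_above_threshold(scored, 0.5))
print(select_top_n(scored, 2))
```

The threshold rule can return an empty list or a very long one, whereas the top-N rule returns at most N candidates (fewer only when fewer than N questions are scored), which is one reason an implementation might combine them.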
In particular, the memory 61, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The processor 62 executes various functional applications of the device and data processing by executing nonvolatile software programs, instructions, and modules stored in the memory, that is, implements the above-described retrieval method.
The memory 61 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store a list of options, etc. Further, the memory 61 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory 61 may optionally include memory located remotely from the processor 62, which may be connected to an external device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory 61 and, when executed by the one or more processors 62, perform the retrieval method of any of the method embodiments described above.
The product can execute the method disclosed in the embodiments of the present application, and has the functional modules and beneficial effects corresponding to that method. For technical details not described in this embodiment, reference may be made to the method disclosed in the embodiments of the present application.
The present invention also relates to a computer-readable storage medium for storing a computer-readable program for causing a computer to perform some or all of the above-described method embodiments.
That is, as those skilled in the art will understand, all or part of the steps of the methods in the embodiments described above may be implemented by a program instructing related hardware. The program is stored in a storage medium and includes several instructions that enable a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
The embodiment of the invention discloses A1, a retrieval method, the method comprising:
determining retrieval information and a data information set, wherein the data information set comprises at least one information pair, and the information pair comprises question information and corresponding answer information;
determining a word vector corresponding to each word contained in the retrieval information;
performing weighting processing on the word vector corresponding to each word to determine a first statement vector for characterizing the retrieval information;
determining a second statement vector corresponding to each piece of question information in the data information set;
calculating a first matching degree between the first statement vector and each second statement vector;
determining candidate question information according to the corresponding first matching degree;
and outputting answer information corresponding to the retrieval information according to the candidate question information.
A2, the method according to A1, wherein the calculating a first matching degree between the first statement vector and each second statement vector specifically includes:
calculating the distance between the first statement vector and each second statement vector to determine the corresponding first matching degree.
A3, the method according to A1, wherein the determining candidate question information according to the corresponding first matching degree specifically includes:
determining question information whose corresponding first matching degree is greater than a first threshold as candidate question information.
A4, the method according to A1, wherein the determining candidate question information according to the corresponding first matching degree specifically includes:
determining the N pieces of question information with the largest first matching degrees as candidate question information, wherein N is a first preset constant.
A5, the method according to A1, wherein the outputting answer information corresponding to the retrieval information according to the candidate question information includes:
determining, among the candidate question information, target question information that matches the retrieval information;
and outputting answer information corresponding to the target question information.
A6, the method according to A1, wherein the determining candidate question information according to the corresponding first matching degree includes:
determining a plurality of pieces of question information in the data information set according to a preset rule;
determining a second matching degree between the retrieval information and each piece of question information;
and determining at least one piece of candidate question information according to the second matching degree corresponding to each piece of question information.
A7, the method according to A6, wherein the determining a second matching degree between the retrieval information and each piece of question information specifically includes:
inputting the retrieval information and each piece of question information into a pre-trained prediction model to determine the corresponding second matching degree.
A8, the method according to A6, wherein the determining at least one piece of candidate question information according to the second matching degree corresponding to each piece of question information specifically includes:
determining question information whose second matching degree is greater than a second threshold as candidate question information.
A9, the method according to A6, wherein the determining at least one piece of candidate question information according to the second matching degree corresponding to each piece of question information specifically includes:
determining the M pieces of question information with the largest second matching degrees as candidate question information, wherein M is a second preset constant.
The embodiment of the invention also discloses B1, a retrieval apparatus, the apparatus comprising:
an information determining module, configured to determine retrieval information and a data information set, wherein the data information set comprises at least one information pair, and the information pair comprises question information and corresponding answer information;
a word vector determining module, configured to determine a word vector corresponding to each word contained in the retrieval information;
a weighting module, configured to perform weighting processing on the word vector corresponding to each word to determine a first statement vector for characterizing the retrieval information;
a statement vector determining module, configured to determine a second statement vector corresponding to each piece of question information in the data information set;
a calculating module, configured to calculate a first matching degree between the first statement vector and each second statement vector;
a candidate question determining module, configured to determine candidate question information according to the corresponding first matching degree;
and an answer output module, configured to output answer information corresponding to the retrieval information according to the candidate question information.
The embodiment of the invention also discloses C1, a computer-readable storage medium for storing computer program instructions, wherein the computer program instructions, when executed by a processor, implement the method of any one of A1-A9.
The embodiment of the invention also discloses D1, an electronic device, comprising a memory and a processor, wherein the memory is used for storing one or more computer program instructions, and the one or more computer program instructions are executed by the processor to implement the following steps:
determining retrieval information and a data information set, wherein the data information set comprises at least one information pair, and the information pair comprises question information and corresponding answer information;
determining a word vector corresponding to each word contained in the retrieval information;
performing weighting processing on the word vector corresponding to each word to determine a first statement vector for characterizing the retrieval information;
determining a second statement vector corresponding to each piece of question information in the data information set;
calculating a first matching degree between the first statement vector and each second statement vector;
determining candidate question information according to the corresponding first matching degree;
and outputting answer information corresponding to the retrieval information according to the candidate question information.
D2, the electronic device according to D1, wherein the calculating a first matching degree between the first statement vector and each second statement vector specifically includes:
calculating the distance between the first statement vector and each second statement vector to determine the corresponding first matching degree.
D3, the electronic device according to D1, wherein the determining candidate question information according to the corresponding first matching degree specifically includes:
determining question information whose corresponding first matching degree is greater than a first threshold as candidate question information.
D4, the electronic device according to D1, wherein the determining candidate question information according to the corresponding first matching degree specifically includes:
determining the N pieces of question information with the largest first matching degrees as candidate question information, wherein N is a first preset constant.
D5, the electronic device according to D1, wherein the outputting answer information corresponding to the retrieval information according to the candidate question information includes:
determining, among the candidate question information, target question information that matches the retrieval information;
and outputting answer information corresponding to the target question information.
D6, the electronic device according to D1, wherein the determining candidate question information according to the corresponding first matching degree includes:
determining a plurality of pieces of question information in the data information set according to a preset rule;
determining a second matching degree between the retrieval information and each piece of question information;
and determining at least one piece of candidate question information according to the second matching degree corresponding to each piece of question information.
D7, the electronic device according to D6, wherein the determining a second matching degree between the retrieval information and each piece of question information specifically includes:
inputting the retrieval information and each piece of question information into a pre-trained prediction model to determine the corresponding second matching degree.
D8, the electronic device according to D6, wherein the determining at least one piece of candidate question information according to the second matching degree corresponding to each piece of question information specifically includes:
determining question information whose second matching degree is greater than a second threshold as candidate question information.
D9, the electronic device according to D6, wherein the determining at least one piece of candidate question information according to the second matching degree corresponding to each piece of question information specifically includes:
determining the M pieces of question information with the largest second matching degrees as candidate question information, wherein M is a second preset constant.

Claims (10)

1. A retrieval method, the method comprising:
determining retrieval information and a data information set, wherein the data information set comprises at least one information pair, and the information pair comprises question information and corresponding answer information;
determining a word vector corresponding to each word contained in the retrieval information;
performing weighting processing on the word vector corresponding to each word to determine a first statement vector for characterizing the retrieval information;
determining a second statement vector corresponding to each piece of question information in the data information set;
calculating a first matching degree between the first statement vector and each second statement vector;
determining candidate question information according to the corresponding first matching degree;
and outputting answer information corresponding to the retrieval information according to the candidate question information.
2. The method according to claim 1, wherein the calculating a first matching degree between the first statement vector and each second statement vector specifically includes:
calculating the distance between the first statement vector and each second statement vector to determine the corresponding first matching degree.
3. The method according to claim 1, wherein the determining candidate question information according to the corresponding first matching degree specifically includes:
determining question information whose corresponding first matching degree is greater than a first threshold as candidate question information.
4. The method according to claim 1, wherein the determining candidate question information according to the corresponding first matching degree specifically includes:
determining the N pieces of question information with the largest first matching degrees as candidate question information, wherein N is a first preset constant.
5. The method according to claim 1, wherein the outputting answer information corresponding to the retrieval information according to the candidate question information comprises:
determining, among the candidate question information, target question information that matches the retrieval information;
and outputting answer information corresponding to the target question information.
6. The method according to claim 1, wherein the determining candidate question information according to the corresponding first matching degree comprises:
determining a plurality of pieces of question information in the data information set according to a preset rule;
determining a second matching degree between the retrieval information and each piece of question information;
and determining at least one piece of candidate question information according to the second matching degree corresponding to each piece of question information.
7. The method according to claim 6, wherein the determining a second matching degree between the retrieval information and each piece of question information specifically includes:
inputting the retrieval information and each piece of question information into a pre-trained prediction model to determine the corresponding second matching degree.
8. A retrieval apparatus, characterized in that the apparatus comprises:
an information determining module, configured to determine retrieval information and a data information set, wherein the data information set comprises at least one information pair, and the information pair comprises question information and corresponding answer information;
a word vector determining module, configured to determine a word vector corresponding to each word contained in the retrieval information;
a weighting module, configured to perform weighting processing on the word vector corresponding to each word to determine a first statement vector for characterizing the retrieval information;
a statement vector determining module, configured to determine a second statement vector corresponding to each piece of question information in the data information set;
a calculating module, configured to calculate a first matching degree between the first statement vector and each second statement vector;
a candidate question determining module, configured to determine candidate question information according to the corresponding first matching degree;
and an answer output module, configured to output answer information corresponding to the retrieval information according to the candidate question information.
9. A computer-readable storage medium storing computer program instructions which, when executed by a processor, implement the method of any one of claims 1-7.
10. An electronic device comprising a memory and a processor, wherein the memory is configured to store one or more computer program instructions, and the one or more computer program instructions are executed by the processor to implement the following steps:
determining retrieval information and a data information set, wherein the data information set comprises at least one information pair, and the information pair comprises question information and corresponding answer information;
determining a word vector corresponding to each word contained in the retrieval information;
performing weighting processing on the word vector corresponding to each word to determine a first statement vector for characterizing the retrieval information;
determining a second statement vector corresponding to each piece of question information in the data information set;
calculating a first matching degree between the first statement vector and each second statement vector;
determining candidate question information according to the corresponding first matching degree;
and outputting answer information corresponding to the retrieval information according to the candidate question information.
CN201911001415.XA 2019-10-21 2019-10-21 Retrieval method, retrieval device, readable storage medium and electronic equipment Pending CN110825859A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911001415.XA CN110825859A (en) 2019-10-21 2019-10-21 Retrieval method, retrieval device, readable storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN110825859A true CN110825859A (en) 2020-02-21

Family

ID=69549943

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911001415.XA Pending CN110825859A (en) 2019-10-21 2019-10-21 Retrieval method, retrieval device, readable storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN110825859A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180181673A1 (en) * 2016-12-28 2018-06-28 Beijing Baidu Netcom Science And Technology Co., Ltd. Answer searching method and device based on deep question and answer
CN108345672A (en) * 2018-02-09 2018-07-31 平安科技(深圳)有限公司 Intelligent response method, electronic device and storage medium
CN108628825A (en) * 2018-04-10 2018-10-09 平安科技(深圳)有限公司 Text message Similarity Match Method, device, computer equipment and storage medium
CN108763529A (en) * 2018-05-31 2018-11-06 苏州大学 A kind of intelligent search method, device and computer readable storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111639194A (en) * 2020-05-29 2020-09-08 天健厚德网络科技(大连)有限公司 Knowledge graph query method and system based on sentence vectors
CN111639194B (en) * 2020-05-29 2023-08-08 天健厚德网络科技(大连)有限公司 Knowledge graph query method and system based on sentence vector
CN113342968A (en) * 2021-05-21 2021-09-03 中国石油天然气股份有限公司 Text abstract extraction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200221
