CN113722459A - Question and answer searching method based on natural language processing model and related device - Google Patents
- Publication number
- CN113722459A CN113722459A CN202111014403.8A CN202111014403A CN113722459A CN 113722459 A CN113722459 A CN 113722459A CN 202111014403 A CN202111014403 A CN 202111014403A CN 113722459 A CN113722459 A CN 113722459A
- Authority
- CN
- China
- Prior art keywords
- target
- question
- word
- answer
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS; G06—COMPUTING, CALCULATING OR COUNTING; G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor; G06F16/30—Unstructured textual data; G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
- G06F16/3331—Query processing; G06F16/334—Query execution; G06F16/3344—Query execution using natural language analysis
- G06F40/00—Handling natural language data; G06F40/20—Natural language analysis; G06F40/205—Parsing; G06F40/216—Parsing using statistical methods
Abstract
The application relates to artificial intelligence technology and provides a question-and-answer searching method and related device based on a natural language processing model. The method comprises the following steps: receiving a search request for a target question input by a user; acquiring target characters and target words corresponding to the target question based on a natural language processing model; acquiring a first similarity value between the target question and a reference question in a preset question-and-answer text library based on the target characters; acquiring a second similarity value between the target question and the reference question based on the target words; acquiring a third similarity value between the target question and the reference question based on the text of the target question and the text of the reference question; performing a weighted calculation on the first similarity value, the second similarity value and the third similarity value to obtain a target similarity value between the target question and the reference question; and displaying the target answer corresponding to the maximum target similarity value. By adopting the method and device, the accuracy of searching for answers can be improved.
Description
Technical Field
The application relates to the technical field of artificial intelligence, and in particular to a question-and-answer searching method and related device based on a natural language processing model.
Background
With the rapid development of the internet, question-answering systems that once relied heavily on manual participation have gradually shifted to a combined automatic-and-manual mode. Using automatic question-and-answer recommendation to resolve part of the questions reduces manual participation and allows user requests to be answered quickly.
At present, recommendation is generally performed by simple word matching, according to the degree of overlap between the words in the target search terms and the questions in an existing knowledge base. However, because this scheme matches only on words, the recommended answers often fail to meet the user's needs, resulting in poor user experience.
Disclosure of Invention
The embodiment of the application provides a question and answer searching method and a related device based on a natural language processing model, which can improve the accuracy of answer searching.
In a first aspect, an embodiment of the present application provides a question and answer search method based on a natural language processing model, where:
receiving a search request for a target question input by a user;
acquiring target characters and target words corresponding to the target question based on a natural language processing model;
acquiring a first similarity value between the target question and a reference question in a preset question-and-answer text library based on the target characters;
acquiring a second similarity value between the target question and the reference question based on the target words;
acquiring a third similarity value between the target question and the reference question based on the text of the target question and the text of the reference question;
performing a weighted calculation on the first similarity value, the second similarity value and the third similarity value to obtain a target similarity value between the target question and the reference question;
and displaying the target answer corresponding to the maximum value of the target similarity values.
In a second aspect, an embodiment of the present application provides a question-answer searching apparatus based on a natural language processing model, where:
a communication unit for receiving a search request for a target question input by a user;
a processing unit for acquiring target characters and target words corresponding to the target question based on a natural language processing model; acquiring a first similarity value between the target question and a reference question in a preset question-and-answer text library based on the target characters; acquiring a second similarity value between the target question and the reference question based on the target words; acquiring a third similarity value between the target question and the reference question based on the text of the target question and the text of the reference question; and performing a weighted calculation on the first similarity value, the second similarity value and the third similarity value to obtain a target similarity value between the target question and the reference question;
and a display unit for displaying the target answer corresponding to the maximum value of the target similarity values.
In a third aspect, an embodiment of the present application provides a computer device, including a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for performing some or all of the steps described in the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium that stores a computer program, where the computer program causes a computer to perform some or all of the steps described in the first aspect.
The embodiment of the application has the following beneficial effects:
after the question-and-answer searching method based on the natural language processing model and the related device are adopted, when a search request for a target question input by a user is received, the target characters and target words corresponding to the target question are obtained based on the natural language processing model. A first similarity value between the target question and a reference question in a preset question-and-answer text library is then obtained based on the target characters, a second similarity value between the target question and the reference question is obtained based on the target words, and a third similarity value between the target question and the reference question is obtained based on the text of the target question and the text of the reference question. A weighted calculation is then performed on the first, second and third similarity values to obtain the target similarity value between the target question and each reference question, and the target answer corresponding to the maximum target similarity value is displayed. In this way, the similarity between the target question and a reference question is measured in three dimensions (characters, words and text), which can improve the accuracy of the target similarity value and thus the accuracy of the displayed answer to the target question.
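The weighted fusion and answer-selection steps described above can be sketched as follows. This is a minimal sketch: the weight values and the helper name `best_answer` are assumptions, since the application does not fix them.

```python
def combine_similarities(s1, s2, s3, weights=(0.3, 0.3, 0.4)):
    """Weighted calculation over the character-, word- and text-level
    similarity values; the weights are illustrative assumptions."""
    w1, w2, w3 = weights
    return w1 * s1 + w2 * s2 + w3 * s3

def best_answer(target_similarities, answers):
    """Return the answer whose reference question has the maximum
    target similarity value (hypothetical helper)."""
    best_idx = max(range(len(target_similarities)),
                   key=lambda i: target_similarities[i])
    return answers[best_idx]
```

With weights summing to 1, the combined value stays in the same range as the inputs, which keeps the maximum-selection step straightforward.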
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that those skilled in the art can obtain other drawings from them without creative effort.
Wherein:
fig. 1 is a schematic flowchart of a question-answer searching method based on a natural language processing model according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a question-answer searching apparatus based on a natural language processing model according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without inventive work fall within the scope of the present application.
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The network architecture applied by the embodiment of the application comprises a server and electronic equipment. The number of the electronic devices and the number of the servers are not limited in the embodiment of the application, and the servers can provide services for the electronic devices at the same time. The server may be an independent server, or may be a cloud server that provides basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), a big data and artificial intelligence platform, and the like. The server may alternatively be implemented as a server cluster consisting of a plurality of servers.
The electronic device may be a personal computer (PC), a notebook computer or a smartphone, and may also be an all-in-one machine, a palm computer, a tablet computer (pad), a smart television playback terminal, a vehicle-mounted terminal or a portable device. The operating system of a PC-side electronic device, such as a kiosk, may include, but is not limited to, Linux, Unix, the Windows series (e.g., Windows XP, Windows 7) and Mac OS X (the operating system of Apple computers). The operating system of a mobile-side electronic device, such as a smartphone, may include, but is not limited to, Android, iOS (the operating system of Apple mobile phones), Windows and other operating systems.
The electronic device may install and run an application program, and the server may be the server corresponding to the application program installed in the electronic device, providing the application service for it. The application program may be standalone integrated application software, an applet embedded in another application, or a web-page system, which is not limited herein. In the embodiment of the present application, the application program is used for searching for answers to questions, and may also be used for uploading answers to questions and the like. The application may be a question-and-answer application, for example a community question-and-answer application, or an application with a search function, such as a browser. The technical field of the questions to which the present application relates may include medicine, computing, finance, etc., and is not limited herein.
For example, a user may enter a target question through an application in the electronic device. The application program, or the server corresponding to the application program, can obtain a target result corresponding to the target question and display the target result through the electronic device.
The embodiment of the application provides a question and answer searching method based on a natural language processing model, which can be executed by a question and answer searching device based on the natural language processing model, wherein the device can be realized by software and/or hardware, can be generally integrated in electronic equipment or a server, and can improve the searching accuracy.
Referring to fig. 1, fig. 1 is a schematic flow chart of a question-answer searching method based on a natural language processing model according to the present application. Taking the application of the method to a server as an example for illustration, the method includes the following steps S101 to S107, wherein:
s101: a search request for a target issue input by a user is received.
In the embodiment of the present application, the search request is used to search for content corresponding to the target question; that is, the search result is the answer to the target question. The user is not limited in the present application and may be any registered user of the question-and-answer application or a visitor using the application. The target question is likewise not limited and may be any field composed of characters and punctuation, a field corresponding to a retrieval expression, or the like.
S102: and acquiring target words and target words corresponding to the target problem based on a natural language processing model.
In the embodiment of the present application, the target question may be decomposed into target characters and also into target words. It should be noted that a target word may consist of a single character. The natural language processing model is not limited; a jieba word segmentation tool or a word2vec word vector model may be adopted. While acquiring the target words, the part of speech of each target word can also be acquired, for example coarse categories such as nouns and verbs, finer categories such as person names, place names and organization names, or auxiliary verbs, nominal verbs and the like. The word sense of each target word can be acquired at the same time, so as to determine the meaning of the target word within the target question.
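The decomposition step can be sketched as follows. The text names jieba as one possible segmenter; the dependency-free `toy_segment` below is a stand-in assumption, and any callable that splits a string into words could replace it.

```python
def extract_units(question, segment):
    """Split the target question into target characters and target words.
    `segment` stands in for a real segmenter such as jieba.lcut
    (assumption: any callable str -> list[str] works here)."""
    chars = [c for c in question if not c.isspace()]
    words = segment(question)
    return chars, words

def toy_segment(text, vocab=("什么", "是", "词频")):
    """Greedy longest-match segmenter over a tiny vocabulary;
    a real system would use jieba or a comparable tool."""
    out, i = [], 0
    while i < len(text):
        for w in sorted(vocab, key=len, reverse=True):
            if text.startswith(w, i):
                out.append(w)
                i += len(w)
                break
        else:
            out.append(text[i])  # fall back to a single character
            i += 1
    return out
```

For the question "什么是词频" ("what is word frequency"), this yields 5 target characters and the 3 target words used in the examples below.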
The natural language processing model described above may be stored in one of the blocks created on the blockchain network. The Blockchain (Blockchain) is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. The blockchain is essentially a decentralized database, which is a string of data blocks associated by using cryptography, each data block contains information of a batch of network transactions, and the information is used for verifying the validity (anti-counterfeiting) of the information and generating the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer. Therefore, data are stored in a distributed mode through the block chain, data security is guaranteed, and meanwhile data sharing of information among different platforms can be achieved.
S103: and acquiring a first similarity value between the target question and a reference question in a preset question-and-answer text library based on the target characters.
S104: and acquiring a second similarity value between the target question and the reference question based on the target word.
S105: and acquiring a third similarity value between the target question and the reference question based on the text of the target question and the text of the reference question.
The execution order of steps S103, S104 and S105 is not limited in the present application: they may be executed in the order S103, S104, S105; alternatively, step S105 may be executed first and steps S103 and S104 afterwards; or steps S103, S104 and S105 may be executed simultaneously.
In the embodiment of the present application, questions and their corresponding answers may be stored in advance, so that an answer can be obtained for a question that is asked. The questions and their corresponding answers may be stored in a set, which may be referred to as the preset question-and-answer text library. The questions in the preset question-and-answer text library are referred to as reference questions, and it is understood that each reference question can also be parsed into characters and words, so that a reference question similar to the target question can be retrieved based on those characters and words. The preset question-and-answer text library may also be stored in a block created on the blockchain network.
In the embodiment of the present application, the first similarity value is the similarity between the target question and the reference question determined in the character dimension. The second similarity value is the similarity determined in the word dimension. The third similarity value is the similarity determined in the text dimension. The target similarity value may be understood as a composite similarity value between the target question and the reference question. It can be understood that obtaining the similarity between the target question and the reference question in the three dimensions of characters, words and text can improve the accuracy of the target similarity value, and thus the accuracy of the displayed answer to the target question.
The present application does not limit the methods for obtaining the first similarity value, the second similarity value and the third similarity value. In one possible example, step S103 may include the following steps A1 to A5, where:
A1: and acquiring a first word frequency of the target characters.
In the embodiment of the present application, term frequency (TF) describes the proportion of occurrences of a character (or word) in the target question. That is, the term frequency tf of a target character (or target word) equals the ratio of n_i, the number of times the ith target character (or target word) appears in the target question, to N, the total number of characters (or words) in the target question, as shown in formula (1). The frequency of a target character is referred to as the first word frequency, and the frequency of a target word as the second word frequency.
tf = n_i / N (1)
For example, if the target question is "what is word frequency", the target characters are the five single characters of the question (rendered in translation as "sh", "how", "is", "word" and "frequency"), 5 in total. With no repeated characters in the question, the first word frequency of each target character is 1/5, i.e. 0.2. The target words may be "what", "is" and "word frequency", 3 in total, and the second word frequency of each target word is 1/3, i.e. 0.33.
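The word-frequency calculation of formula (1) and this example can be sketched as follows; the function name is illustrative.

```python
def term_frequency(unit, units):
    """tf = n_i / N per formula (1): occurrences of one character (or word)
    divided by the total number of characters (or words) in the question."""
    return units.count(unit) / len(units)
```

Applied to the characters of "什么是词频", each character occurs once among 5, giving tf = 0.2; each of the 3 words gives tf = 1/3.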
A2: and acquiring a first inverse text frequency of the target characters based on a first number of reference questions containing the target characters among the reference questions of the preset question-and-answer text library and the total number of questions in the preset question-and-answer text library.
In the embodiment of the present application, the inverse text frequency (IDF) describes how rarely a character (or word) appears across the entire document collection. The inverse text frequency of a target character is referred to as the first inverse text frequency, and that of a target word as the second inverse text frequency. The inverse text frequency idf of a target character (or target word) can be calculated with reference to formula (2):
idf = log(M / (m_i + 1)) (2)
where M is the total number of questions in the preset question-and-answer text library and m_i is the number of reference questions containing the ith target character (or target word). As can be seen from formula (2), the more common a character (or word) is, the larger the denominator, and the closer the inverse text frequency idf is to 0. The +1 in the denominator avoids a denominator of 0, so that the inverse text frequency idf can still be calculated for a target character (or target word) that appears in none of the reference questions.
For example, suppose the preset question-and-answer text library includes 2 reference questions. The target question "what is word frequency" includes 5 target characters, and it also includes the 3 target words "what", "is" and "word frequency". The first inverse text frequencies of the five target characters are log(2/(2+1)), log(2/(2+1)), log(2/(1+1)), log(2/(2+1)) and log(2/(2+1)) respectively. The second inverse text frequencies of "what", "is" and "word frequency" are log(2/(2+1)), log(2/(1+1)) and log(2/(2+1)), i.e. 0.48, 0 and 0.48, respectively.
The execution order of steps A1 and A2 is not limited in the present application: step A1 may be executed first and then step A2; alternatively, step A2 may be executed first and then step A1; or steps A1 and A2 may be executed simultaneously.
A3: and calculating the product of the first word frequency and the first inverse text frequency to obtain a first score of the target character.
In the embodiment of the present application, the first score is used to describe the recall of the target character and may be the product of the first word frequency and the first inverse text frequency. The first score f1 can be calculated with reference to formula (3):
f1 = tf1 * idf1 (3)
where tf1 is the first word frequency and idf1 is the first inverse text frequency.
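Formulas (2) and (3) together can be sketched as follows. The logarithm base is unspecified in the text, so natural log is an assumption here, and the function names are illustrative.

```python
import math

def inverse_text_frequency(unit, reference_questions):
    """idf = log(M / (m_i + 1)) per formula (2); the +1 keeps the
    denominator nonzero when no reference question contains the unit."""
    M = len(reference_questions)
    m_i = sum(1 for q in reference_questions if unit in q)
    return math.log(M / (m_i + 1))

def tfidf_score(unit, units, reference_questions):
    """f = tf * idf per formula (3), for one character (or word)."""
    tf = units.count(unit) / len(units)
    return tf * inverse_text_frequency(unit, reference_questions)
```

Note that with this formula a unit appearing in exactly half of a 2-question library gets idf = log(1) = 0, and a unit appearing nowhere still gets a finite value.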
A4: and acquiring a first vector of the target question based on the first scores and the scores of the characters other than the target characters in the preset question-and-answer text library.
In an embodiment of the present application, the first vector may be composed of the first scores of the target characters and the scores of the characters other than the target characters in the preset question-and-answer text library. Note that the score of a character other than the target characters may be 0.
For example, if the target question is "what is word frequency", the first word frequencies of the target characters are all 0.2, and the first inverse text frequencies of the target characters are 0.48, 0, 0.48, and 0.48, respectively. The first vector of the target question, composed of the products of the first word frequencies and the first inverse text frequencies together with the scores of the characters other than the target characters in the preset question-and-answer text library, may be (0.096, 0.096, 0, 0.096, 0.096, 0.096, 0), as shown in Table 1 below.
TABLE 1
A5: and calculating the cosine similarity between the first vector and a first reference vector of the reference question to obtain a first similarity value between the target question and the reference question.
In the embodiment of the present application, the first similarity value S0 is the cosine similarity between the first vector a of the target question and the first reference vector d of the reference question, which can be calculated with reference to formula (4):
S0 = (a · d) / (|a| * |d|) (4)
It can be understood that in steps A1 to A5, the first scores of the target question are obtained as the product of the first word frequency and the first inverse text frequency of each target character. A first similarity value between the target question and the reference question is then calculated from the first vector, composed of the first scores and the scores of the other characters, and the first reference vector of the reference question. In this way, the similarity between the target question and each stored reference question is measured at the character level, which can improve the accuracy of the answer obtained for the target question.
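The cosine-similarity comparison of step A5 (formula (4)) can be sketched as:

```python
import math

def cosine_similarity(a, d):
    """S0 = (a · d) / (|a| * |d|): cosine similarity between the target
    question's score vector a and a reference question's score vector d."""
    dot = sum(x * y for x, y in zip(a, d))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_d = math.sqrt(sum(y * y for y in d))
    if norm_a == 0.0 or norm_d == 0.0:
        return 0.0  # guard against an all-zero vector (assumption)
    return dot / (norm_a * norm_d)
```

Identical vectors score 1, orthogonal vectors score 0, so the value behaves as a similarity in [0, 1] for the non-negative tf-idf vectors built above.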
In one possible example, step S104 may include the following steps B1-B5, wherein:
B1: and acquiring a second word frequency of the target word.
B2: and acquiring a second inverse text frequency of the target word based on a second number of reference questions containing the target word among the reference questions of the preset question-and-answer text library and the total number of questions in the preset question-and-answer text library.
The methods for obtaining the second word frequency and the second inverse text frequency may refer to the descriptions of the first word frequency and the first inverse text frequency, and are not repeated here. In addition, the execution order of steps B1 and B2 is not limited: step B1 may be executed first and then step B2; alternatively, step B2 may be executed first and then step B1; or steps B1 and B2 may be executed simultaneously.
B3: and acquiring a second score of the target word based on a preset adjusting parameter, the second word frequency and the second inverse text frequency.
In this embodiment of the present application, the preset adjustment parameter may include a preset weight of the target word, which describes the importance of the target word. That is, the larger the preset weight of a target word, the more important that word is in the target question. The preset weight of the target word is not limited in the present application. In a possible example, before step B3, the method further includes the following steps: acquiring the part of speech and/or word sense of the target word based on the natural language processing model; and determining the preset weight of the target word based on the part of speech and/or word sense.
As described above, the part of speech and/or word sense of the target word may be obtained based on the natural language processing model. The preset weight of the target word is determined based on the part of speech and/or the word meaning of the target word, so that the accuracy of setting the preset weight can be improved.
For example: the preset weight of a verb may be 1.3; the preset weight of an auxiliary verb may be 1.2; the preset weight of a place noun may be 1.4; the preset weight of an indicator word may be 0.7; the preset weight of a stop word may be 0.1; and so on.
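As a minimal sketch, such a part-of-speech weight table can be stored as a simple lookup. The mapping below reuses the example values from the text; the function name, the part-of-speech labels, and the fallback weight of 1.0 for unlisted parts of speech are assumptions for illustration, not part of the patent:

```python
# Illustrative part-of-speech weight table using the example values
# from the text. Labels and the neutral fallback are assumptions.
POS_WEIGHTS = {
    "verb": 1.3,
    "auxiliary_verb": 1.2,
    "place_noun": 1.4,
    "indicator_word": 0.7,
    "stop_word": 0.1,
}

def preset_weight(pos: str, default: float = 1.0) -> float:
    """Return the preset weight for a part of speech; unknown parts
    of speech fall back to a neutral weight (an assumption)."""
    return POS_WEIGHTS.get(pos, default)
```

For example, `preset_weight("verb")` returns 1.3, while a part of speech not in the table returns the neutral default.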
Stop words (stop words) are words that are automatically filtered out before or after processing natural language data (or text) in information retrieval, in order to save storage space and improve search efficiency. In an embodiment of the present application, a stop-word list may be stored in advance, and the stop-word list may include a plurality of stop words. Further, stop words can be divided into strict stop words and loose stop words. Strict stop words are less important than loose stop words; for example, question particles such as "do" and "woolen" may be strict stop words, while connectives such as "then" and "and" may be loose stop words.
In the embodiment of the present application, the preset adjustment parameters may include a harmonic parameter b, a document length ratio L, and a constant k. The harmonic parameter b limits the degree to which the document length influences the calculation result. The document length ratio L is equal to the ratio between the current document length and the average document length, i.e., the ratio between the character length of the target question and the average character length of the reference questions. The constant k limits the growth of the word frequency tf2: as tf2 increases, the product of tf2 and idf2 increases, so the constant k bounds the contribution of large tf2 values. The value of the constant k is not limited in the present application; for example, k = 1.2.
The second score f2 of the target word can be calculated with reference to equation (5) below:

f2 = idf2 * (k + 1) * tf2 / (k * (1.0 - b + b*L) + tf2)    (5)
wherein tf2 is the second word frequency and idf2 is the second inverse text frequency.
It should be noted that the preset adjustment parameters are not limited in the present application; in practical applications, other parameters may also be included. In one possible example, the second score of the target word obtained from equation (5) may additionally be multiplied by the preset weight of the target word, thereby improving the accuracy of the second vector.
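The BM25-style scoring of equation (5), including the optional multiplication by the preset weight, can be sketched as follows. The function name and the default parameter values (other than k = 1.2, which the text gives as an example) are assumptions:

```python
def second_score(tf2: float, idf2: float, k: float = 1.2,
                 b: float = 0.75, L: float = 1.0,
                 weight: float = 1.0) -> float:
    """Second score of a target word per equation (5): a BM25-style
    term score, optionally multiplied by the word's preset weight."""
    return weight * idf2 * (k + 1) * tf2 / (k * (1.0 - b + b * L) + tf2)
```

With the example values from the text (tf2 = 0.33, idf2 = 0.48, k = 1.2, L = 1), the score is about 0.23; note that when L = 1 the value of b does not affect the result, since (1.0 - b + b*L) = 1.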
B4: Acquire a second vector of the target question based on the second score and the scores of the words other than the target word in the preset question text library.
In an embodiment of the present application, the second vector may be composed of the second score of the target word and the scores of the words other than the target word in the preset question text library. It should be noted that the scores of the words other than the target word may be 0.
For example, if the target question is "what is word frequency", the target words may be "what", "is", and "word frequency". The second word frequencies of the target words are all 0.33, and the second inverse text frequencies are 0.48, 0, and 0.48, respectively. With document length ratio L = 1, k = 1.2, and assuming the harmonic parameter b is 2, the scores of the target words calculated from equation (5) are 0.23, 0, and 0.23. Assuming that the preset weight of the strict stop word "what" is 0.1, the preset weight of the loose stop word "is" is 0.2, and the preset weight of the noun "word frequency" is 1.4, the second scores of the target words are 0.023, 0, and 0.322. With the scores of the words other than the target words in the preset question text library set to 0, as shown in Table 2 below, the second vector may be (0.023, 0, 0.322, 0).
TABLE 2

| | What | Is | Word frequency | Is composed of |
| What is the word frequency | 0.023 | 0 | 0.322 | 0 |
B5: Calculate the cosine similarity between the second vector and a second reference vector of the reference question to obtain a second similarity value between the target question and the reference question.
In the embodiment of the present application, the second similarity value S1 may be the cosine similarity between the second vector e of the target question and the second reference vector g of the reference question, calculated as in equation (6) below:

S1 = (e · g) / (|e| * |g|)    (6)
It is to be understood that in steps B1-B5, the second score of the target word is obtained based on the preset weight of the target word, the second word frequency, the second inverse text frequency, and the preset adjustment parameters. A second similarity value between the target question and the reference question is then calculated based on the second vector, composed of the second scores of the target words and the scores of the other words, and the second reference vector of the reference question. Obtaining a similarity value between the target question and the stored reference questions from the viewpoint of words can thus improve the accuracy of obtaining the answer to the target question.
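The cosine similarity of equation (6), used for both the first and second similarity values, can be sketched as below. The function name and the choice of returning 0 for a zero-norm vector are assumptions:

```python
import math

def cosine_similarity(e, g):
    """Cosine similarity between two score vectors, per equation (6):
    S = (e . g) / (|e| * |g|)."""
    dot = sum(x * y for x, y in zip(e, g))
    norm_e = math.sqrt(sum(x * x for x in e))
    norm_g = math.sqrt(sum(y * y for y in g))
    if norm_e == 0 or norm_g == 0:
        return 0.0  # no scored words in common (an assumption)
    return dot / (norm_e * norm_g)
```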
In one possible example, step S105 may include the following step C1 and step C2, wherein:
C1: Acquire the Jaccard similarity coefficient and the edit distance between the text of the target question and the text of the reference question.
In an embodiment of the present application, the Jaccard similarity coefficient is used to calculate the degree of similarity between samples of a symbolic metric or a Boolean metric. The Jaccard similarity coefficient is equal to the ratio between the size of the intersection of the sample sets and the size of their union. The Jaccard similarity coefficient S21 is calculated as in equation (7) below:

S21 = N(A∩B) / N(A∪B)    (7)
wherein A is the sample set of the target question and B is the sample set of the reference question; N(A∩B) denotes the number of samples in the intersection of sample set A and sample set B, and N(A∪B) denotes the number of samples in their union. For example, if the target question is "what is word frequency", the target words may include "what", "is", and "word frequency", giving the sample set A = (what, is, word frequency); the words of the reference question likewise form a sample set B. If B shares two words with A and contains one word not in A, then N(A∪B) is 4 and N(A∩B) is 2, so based on formula (7), S21 = 2/4 = 0.5.
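Equation (7) can be sketched directly over word sets; the function name and the convention of returning 0 for two empty sets are assumptions:

```python
def jaccard(a, b):
    """Jaccard similarity coefficient per equation (7):
    S21 = N(A intersect B) / N(A union B)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty sets (an assumed convention)
    return len(a & b) / len(union)
```

For the example above, two three-word sets sharing two words give a coefficient of 2/4 = 0.5.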
In the embodiment of the present application, the edit distance refers to the number of single-character editing operations required to change one character string into another. The single-character operations include: insertion, deletion, and replacement. Denoting the edit distance by c, the edit-distance similarity S22 may be calculated as in equation (8).
S22 = (Max(len(doc1), len(doc2)) - c) / Max(len(doc1), len(doc2))    (8)
wherein len(doc1) is the text length of the target question and len(doc2) is the text length of the reference question. Since the number of operations may be greater than the length of the text, S22 may be negative, so the larger of this value and 0 is taken, i.e., S22 = Max(0, S22).
For example: the one-character operation between the target problem (what is the word frequency) and the reference problem (what is the word frequency) is a step of changing "yes" to "no". That is, c is 1, lent (doc1) is 3, lent (doc2) is 3, and the distance S is edited22=(Max(3,3)-1)/(Max(3,3))=0.67。
C2: Acquire a third similarity value between the text of the target question and the text of the reference question based on the Jaccard similarity coefficient and the edit distance.
In this embodiment, the third similarity value may be obtained by weighting the Jaccard similarity coefficient and the edit-distance similarity. The preset weights of the Jaccard similarity coefficient and the edit distance are not limited; they may be 0.5 and 0.5, 0.3 and 0.7, 0.6 and 0.4, and so on. The third similarity value S2 may be calculated as in equation (9).
S2=S21*V21+S22*V22 (9)
wherein S21 and S22 are the Jaccard similarity coefficient and the edit-distance similarity, respectively, and V21 and V22 are their respective preset weights.
It is understood that in steps C1 and C2, a third similarity value between the text of the target question and the text of the reference question is obtained based on the Jaccard similarity coefficient and the edit distance between them. Obtaining the third similarity value from two methods of calculating text similarity can improve the accuracy of obtaining the answer to the target question.
S106: Perform a weighted calculation on the first similarity value, the second similarity value, and the third similarity value to obtain a target similarity value between the target question and the reference question.
In the embodiment of the present application, the target similarity value may be understood as a comprehensive similarity value between the target question and the reference question. The preset weights of the first, second, and third similarity values are not limited and can be specified in advance. For example, if the preset weights of the first similarity value S0, the second similarity value S1, and the third similarity value S2 are 0.2, 0.3, and 0.5 respectively, then the target similarity value S = S0*0.2 + S1*0.3 + S2*0.5.
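The weighted fusion of step S106 is a one-line computation; the sketch below uses the example weights 0.2, 0.3, 0.5 from the text as defaults, and the function name is an assumption:

```python
def target_similarity(s0: float, s1: float, s2: float,
                      weights=(0.2, 0.3, 0.5)) -> float:
    """Weighted fusion of the character-level (s0), word-level (s1),
    and text-level (s2) similarity values into the target similarity
    value (step S106). Default weights are the example values."""
    return s0 * weights[0] + s1 * weights[1] + s2 * weights[2]
```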
S107: Display the target answer corresponding to the maximum value of the target similarity value.
In the embodiment of the present application, the maximum value of the target similarity value identifies the reference question most similar to the target question. The target answer corresponding to the maximum target similarity value may be the corresponding answer in the preset question-and-answer text library. The present application does not limit the method for obtaining the target answer, and in one possible example, step S107 may include the following steps D1 to D6, wherein:
D1: Acquire a reference answer of the reference question corresponding to the maximum value of the target similarity value.
D2: Acquire a first reference value of the reference answer based on the history of the reference user corresponding to the reference answer.
In the embodiment of the present application, the reference answer is the stored answer corresponding to the reference question with the maximum target similarity value. The reference user is the user who submitted the reference answer. The reference user's history refers to records in a question-and-answer application, and may include the reference user's history of uploading questions and answering questions, as well as browsing records and the like. It can be understood that the history of the reference user can be used to determine the reference user's range of knowledge and directions of interest, thereby improving the reference value obtained for the reference user's answer to the reference question. The reference value obtained based on the history of the reference user may be referred to as a first reference value; that is, the first reference value describes how referable the reference answer replied by the reference user is.
The method for obtaining the first reference value is not limited in the present application. In one possible example, step D2 may include the following steps: determining, from the history, the quality score of the reference user and the reference user's familiarity with the technical field of the target question; and acquiring a first reference value of the reference answer according to the quality score and the familiarity value.
In the embodiment of the present application, the quality score is used to describe an evaluation value of the behavior data corresponding to the history of the reference user. The behavior data may include browsing questions or answers, uploading questions, replying to questions, offering opinions or suggestions, and the like. Further, the behavior data can be divided into positive behavior data and negative behavior data. The positive behavior data may be positive or constructive behaviors of the reference user in the question-and-answer application, such as good question-and-answer behavior, good access behavior, and the like. The negative behavior data may be negative or harmful behaviors of the reference user in the question-and-answer application, such as forced-exit behavior, maliciously evaluating reply content, malicious promotion, and the like. It will be appreciated that analyzing the positive and/or negative behavior data of the reference user can help determine whether the reference user's behavior in the question-and-answer application is useful for solving the target question, and thus a quality score for the reference user may be determined based on the positive and/or negative behavior data.
The quality score may also be determined based on review information in the history, and the like, which is not limited herein. The comment information comprises comments of various users on the questions or answers uploaded by the reference users, can be used for determining question quality and answer quality of the reference users, and is favorable for obtaining quality scores of the reference users.
The technical field of the target question can be determined from the target words obtained by the natural language processing model and/or the technical fields corresponding to those target words. The familiarity value of the reference user with the technical field of the target question may be determined based on the number of the reference user's history records in that technical field, and/or the evaluation values corresponding to the comment information of each such record, which is not limited here.
It will be appreciated that in this example, the quality score of the reference user and the reference user's familiarity with the technical field of the target question are determined from the reference user's history, and the first reference value of the reference answer is acquired according to the quality score and the familiarity value. This improves the accuracy of the reference value obtained for the reference user's answer.
D3: Acquire a second reference value of the reference answer based on the natural language processing model.
In the embodiment of the present application, the reference value of the reference answer obtained based on the natural language processing model may be referred to as a second reference value. That is, the second reference value is used to describe the referential of the text of the reference answer. The method for obtaining the second reference value is not limited in the present application, and in a possible example, the step D3 may include the following steps: acquiring a keyword corresponding to the reference answer and the technical field of the target question based on the natural language processing model; acquiring the association value of the keyword based on the knowledge graph in the technical field; and acquiring a second reference value of the reference answer based on the correlation value.
Wherein the keywords are the vocabulary in the reference answer used to solve the target question. The natural language processing model may obtain the words in the reference answer along with the part of speech and word sense of each word. In this way, stop words in the reference answer, as well as words not related to the target question, can be filtered out.
The technical field can refer to the description of step D2, and will not be described in detail herein. A knowledge graph of a technical field may include associations between words in the technical field. The association value of the keyword may be obtained based on a numerical value corresponding to the connection relationship of the keyword in the knowledge graph, for example, the association value is equal to a product between numerical values on the connection relationship, and the like.
In the embodiment of the present application, mapping relationships between different associated values and reference values may be stored in advance. In this way, after the associated value is obtained, the reference value corresponding to the associated value can be obtained based on the mapping relationship, and is used as the second reference value of the reference answer. It can be understood that, in this example, the second reference value is obtained based on the associated value of the keyword obtained from the knowledge graph in the technical field of the target problem, and both the technical field and the keyword are obtained based on the natural language processing model, which is beneficial to improving the accuracy of obtaining the second reference value.
The execution order of steps D2 and D3 is not limited in the present application: step D2 may be executed before step D3, step D3 may be executed before step D2, or the two steps may be executed simultaneously.
D4: Acquire a target value of the reference answer based on the first reference value and the second reference value.
In the embodiment of the present application, the target value may be a weighted average value between the first reference value and the second reference value, or a minimum value between the first reference value and the second reference value, etc., which is not limited herein. It can be understood that, the target value of the reference answer is obtained from both the first reference value obtained from the history of the reference user and the second reference value obtained from the reference answer, so that the accuracy of obtaining the target value can be improved.
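The two combination modes described for step D4 (weighted average or minimum) can be sketched as below; the function name, the mode parameter, and the equal default weighting are assumptions:

```python
def target_value(first_ref: float, second_ref: float,
                 mode: str = "weighted", alpha: float = 0.5) -> float:
    """Combine the first and second reference values into the target
    value of a reference answer (step D4), either as a weighted
    average (with assumed weight alpha) or as a minimum."""
    if mode == "min":
        return min(first_ref, second_ref)
    return alpha * first_ref + (1.0 - alpha) * second_ref
```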
D5: selecting a target answer from the reference answers based on the target value.
The method for selecting the target answer is not limited. The reference answer corresponding to the maximum target value may be selected as the target answer. Alternatively, reference answers whose target values are greater than a specified threshold may be selected as target answers; for example, reference answers scoring above 80 points. The threshold may be specified in advance, determined based on the number of reference answers, or determined based on a number set in the user's question-and-answer application, which is not limited here. Alternatively, the reference answers corresponding to the N largest target values may be selected as target answers; for example, the reference answers with the top 10 target values. N may be determined based on the number of reference answers or based on a number set in the user's question-and-answer application, which is not limited here.
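The three selection strategies described for step D5 can be sketched as one function over (answer, target value) pairs. The function name, the strategy labels, and the defaults (threshold 80, top 10) reuse the example values from the text but are otherwise assumptions:

```python
def select_target_answers(scored_answers, strategy="max",
                          threshold=80.0, top_n=10):
    """Select target answers from (answer, target_value) pairs
    (step D5): the single best answer, all answers above a
    threshold, or the top-N answers by target value."""
    ranked = sorted(scored_answers, key=lambda pair: pair[1],
                    reverse=True)
    if strategy == "max":
        return [ranked[0][0]] if ranked else []
    if strategy == "threshold":
        return [ans for ans, val in ranked if val > threshold]
    return [ans for ans, _ in ranked[:top_n]]  # top-N strategy
```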
D6: Display the target answer.
It is understood that in steps D1-D6, the reference answer of the reference question corresponding to the maximum value of the target similarity value is obtained first. And then acquiring a first reference value of the reference answer based on the behavior data of the reference user corresponding to the reference answer, and acquiring a second reference value of the reference answer based on the natural language processing model. And then obtaining a target value of the reference answers based on the first reference value and the second reference value, selecting the target answers from the reference answers based on the target value, and displaying the target answers so that the user can obtain a search result of the target question. Thus, the accuracy of the search can be improved.
In the method shown in fig. 1, when a search request for a target question input by a user is received, the target characters and target words corresponding to the target question are obtained based on a natural language processing model. A first similarity value between the target question and a reference question in a preset question-and-answer text library is then acquired based on the target characters, a second similarity value between the target question and the reference question is acquired based on the target words, and a third similarity value is acquired based on the text of the target question and the text of the reference question. A weighted calculation is then performed on the first, second, and third similarity values to obtain the target similarity value between the target question and the reference question, and the target answer corresponding to the maximum target similarity value is displayed. Since the similarity between the target question and the reference question is obtained from the three dimensions of characters, words, and text, the accuracy of obtaining the target similarity value can be improved, thereby improving the accuracy of the displayed answer to the target question.
The method of the embodiments of the present application is set forth above in detail and the apparatus of the embodiments of the present application is provided below.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a question-answer searching apparatus based on a natural language processing model according to an embodiment of the present application, which is consistent with the embodiment shown in fig. 1. As shown in fig. 2, the question-answer searching apparatus 200 includes:
a communication unit 201 for receiving a search request for a target question input by a user;
the processing unit 202 is configured to obtain the target characters and target words corresponding to the target question based on a natural language processing model; acquire a first similarity value between the target question and a reference question in a preset question-and-answer text library based on the target characters; acquire a second similarity value between the target question and the reference question based on the target words; acquire a third similarity value based on the text of the target question and the text of the reference question; and perform a weighted calculation on the first similarity value, the second similarity value, and the third similarity value to obtain a target similarity value between the target question and the reference question;
and the display unit 203 is configured to display the target answer corresponding to the maximum value of the target similarity value.
In a possible example, the processing unit 202 is specifically configured to obtain a first word frequency of the target word; acquiring a first inverse text frequency of the target word based on a first number of reference questions including the target word in reference questions of a preset question-and-answer text library and a total number of questions in the preset question text library; calculating the product between the first word frequency and the first inverse text frequency to obtain a first score of the target word; acquiring a first vector of the target question based on the first score and the scores of the words except the target word in the preset question text library; and calculating cosine similarity between the first vector and a first reference vector of the reference problem to obtain a first similarity value between the target problem and the reference problem.
In a possible example, the processing unit 202 is specifically configured to obtain a second word frequency of the target word; acquiring a second inverse text frequency of the target word based on a second number of reference questions including the target word in reference questions of a preset question-and-answer text library and a total number of questions in the preset question text library; acquiring a second score of the target word based on a preset adjusting parameter, the second word frequency and the second inverse text frequency; acquiring a second vector of the target question based on the second score and the scores of the words except the target word in the preset question text library; and calculating cosine similarity between the second vector and a second reference vector of the reference problem to obtain a second similarity value between the target problem and the reference problem.
In a possible example, the preset adjustment parameter includes a preset weight of the target word, and the processing unit 202 is further configured to obtain a part of speech and/or a word sense of the target word based on the natural language processing model; and determining the preset weight of the target word based on the part of speech and/or the word sense.
In one possible example, the processing unit 202 is specifically configured to obtain a jackard similarity coefficient and an edit distance between the text of the target question and the text of the reference question; and acquiring a third similarity value between the text of the target problem and the text of the reference problem based on the Jacard similarity coefficient and the edit distance.
In a possible example, the processing unit 202 is further configured to obtain a reference answer to the reference question corresponding to the maximum value of the target similarity value; acquiring a first reference value of the reference answer based on a historical record of a reference user corresponding to the reference answer; acquiring a second reference value of the reference answer based on the natural language processing model; obtaining a target value of the reference answer based on the first reference value and the second reference value; selecting a target answer from the reference answers based on the target value; the display unit 203 is specifically configured to display the target answer.
In one possible example, the processing unit 202 is specifically configured to obtain, based on the natural language processing model, a keyword corresponding to the reference answer and a technical field of the target question; acquiring the association value of the keyword based on the knowledge graph in the technical field; and acquiring a second reference value of the reference answer based on the correlation value.
The detailed process executed by each unit in the question-answer searching device 200 can refer to the execution steps in the foregoing method embodiments, and is not described herein again.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure. As shown in fig. 3, the computer device 300 includes a processor 310, a memory 320, a communication interface 330, and one or more programs 340. The processor 310, the memory 320, and the communication interface 330 are interconnected via a bus 350. The related functions implemented by the communication unit 201 shown in fig. 2 may be implemented by the communication interface 330, and the related functions implemented by the processing unit 202 shown in fig. 2 may be implemented by the processor 310. The related functions realized by the display unit 203 shown in fig. 2 may be realized by a display.
The one or more programs 340 are stored in the memory 320 and configured to be executed by the processor 310, the programs 340 including instructions for:
receiving a search request of a target problem input by a user;
acquiring target characters and target words corresponding to the target problem based on a natural language processing model;
acquiring a first similarity value between the target question and a reference question in a preset question-and-answer text library based on the target word;
acquiring a second similarity value between the target question and the reference question based on the target word;
acquiring a third similarity value based on the text of the target question and the text of the reference question;
performing weighted calculation on the first similarity value, the second similarity value and the third similarity value to obtain a target similarity value between the target problem and the reference problem;
and displaying the target answer corresponding to the maximum value of the target similarity value.
In one possible example, in terms of obtaining the first similarity value between the target question and the reference question in the preset question-and-answer text library based on the target word, the program 340 specifically includes instructions for:
acquiring a first word frequency of the target word;
acquiring a first inverse text frequency of the target word based on a first number of reference questions including the target word in reference questions of a preset question-and-answer text library and a total number of questions in the preset question text library;
calculating the product between the first word frequency and the first inverse text frequency to obtain a first score of the target word;
acquiring a first vector of the target question based on the first score and the scores of the words except the target word in the preset question text library;
and calculating cosine similarity between the first vector and a first reference vector of the reference problem to obtain a first similarity value between the target problem and the reference problem.
In one possible example, in terms of the obtaining a second similarity value between the target question and the reference question based on the target word, the program 340 is specifically configured to execute the following steps:
acquiring a second word frequency of the target word;
acquiring a second inverse text frequency of the target word based on a second number of reference questions including the target word in reference questions of a preset question-and-answer text library and a total number of questions in the preset question text library;
acquiring a second score of the target word based on a preset adjusting parameter, the second word frequency and the second inverse text frequency;
acquiring a second vector of the target question based on the second score and the scores of the words except the target word in the preset question text library;
and calculating cosine similarity between the second vector and a second reference vector of the reference problem to obtain a second similarity value between the target problem and the reference problem.
In one possible example, the preset adjustment parameter comprises a preset weight of the target word, and before the obtaining the second score of the target word based on the preset adjustment parameter, the second word frequency and the second inverse text frequency, the program 340 is further configured to execute the following steps:
acquiring the part of speech and/or the meaning of the target word based on the natural language processing model;
and determining the preset weight of the target word based on the part of speech and/or the word sense.
In one possible example, in terms of acquiring the third similarity value based on the text of the target question and the text of the reference question, the program 340 is specifically configured to execute the following steps:
acquiring a Jaccard similarity coefficient and an edit distance between the text of the target question and the text of the reference question;
and acquiring a third similarity value between the text of the target question and the text of the reference question based on the Jaccard similarity coefficient and the edit distance.
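The Jaccard similarity coefficient and the edit distance can be computed directly from the two question texts. A minimal sketch; how the two measures are blended into a single third similarity value is an assumption (shown here as a weighted average), since this passage does not fix a formula:

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity coefficient over character sets."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def edit_distance(a: str, b: str) -> int:
    """Levenshtein edit distance via dynamic programming (two-row variant)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def third_similarity(a: str, b: str, alpha: float = 0.5) -> float:
    """Blend the Jaccard coefficient with a normalized edit-distance similarity."""
    ed_sim = 1 - edit_distance(a, b) / max(len(a), len(b), 1)
    return alpha * jaccard(a, b) + (1 - alpha) * ed_sim
```

Both measures operate on surface text, so this third value complements the two vector-based similarities computed from the segmented words.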
In one possible example, in the aspect of displaying the target answer corresponding to the maximum value of the target similarity value, the program 340 is specifically configured to execute the following steps:
obtaining a reference answer of a reference question corresponding to the maximum value of the target similarity value;
acquiring a first reference value of the reference answer based on a historical record of a reference user corresponding to the reference answer;
acquiring a second reference value of the reference answer based on the natural language processing model;
obtaining a target value of the reference answer based on the first reference value and the second reference value;
selecting a target answer from the reference answers based on the target value;
and displaying the target answer.
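The answer-selection steps above amount to scoring each candidate reference answer and displaying the best one. A minimal sketch, assuming the target value is a weighted sum of the two reference values (the weights and data layout are illustrative; this passage does not specify the combination):

```python
def select_target_answer(candidates, w_history=0.5, w_model=0.5):
    """Pick the answer with the highest target value.

    candidates: list of (answer_text, first_ref_value, second_ref_value), where
    the first reference value comes from the answering user's history and the
    second from the NLP-model-based evaluation.
    """
    def target_value(c):
        _, first, second = c
        return w_history * first + w_model * second
    return max(candidates, key=target_value)[0]
```

Shifting the weights toward `w_history` favors answers from users with a strong track record; shifting toward `w_model` favors answers the model itself rates highly.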
In one possible example, in the aspect of obtaining the second reference value of the reference answer based on the natural language processing model, the program 340 is specifically configured to execute the following steps:
acquiring a keyword corresponding to the reference answer and the technical field of the target question based on the natural language processing model;
acquiring the association value of the keyword based on the knowledge graph in the technical field;
and acquiring a second reference value of the reference answer based on the association value.
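The association value can be read as how strongly the answer's keywords are connected in the knowledge graph of the technical field. A toy sketch, assuming the graph is stored as an adjacency map with edge weights (the graph contents and the summation rule are illustrative assumptions):

```python
def association_value(keywords, knowledge_graph):
    """Sum the edge weights between every pair of keywords present in the graph.

    knowledge_graph: dict mapping a term to {neighbor: edge_weight}, representing
    a domain knowledge graph; edges may be recorded in either direction.
    """
    total = 0.0
    for i, a in enumerate(keywords):
        for b in keywords[i + 1:]:
            total += knowledge_graph.get(a, {}).get(b, 0.0)
            total += knowledge_graph.get(b, {}).get(a, 0.0)
    return total
```

An answer whose keywords form a tightly connected cluster in the domain graph thus receives a higher second reference value than one whose keywords are unrelated.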
Embodiments of the present application also provide a computer storage medium, where the computer storage medium stores a computer program that causes a computer to perform some or all of the steps of any of the methods described in the foregoing method embodiments. The computer includes an electronic device and a server.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any of the methods recited in the foregoing method embodiments. The computer program product may be a software installation package, and the computer includes an electronic device and a server.
It should be noted that, for simplicity of description, the above method embodiments are described as a series of action combinations, but those skilled in the art will recognize that the present application is not limited by the order of the actions described, as some steps may be performed in other orders or concurrently. Further, those skilled in the art will also appreciate that the embodiments described in this specification are preferred embodiments, and the actions and modules involved are not necessarily required by the present application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative: the division into units is merely a logical functional division, and an actual implementation may use another division; at least one unit or component may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may also be distributed on at least one network unit. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software program module.
If the integrated unit is implemented in the form of a software program module and sold or used as a stand-alone product, it may be stored in a computer-readable memory. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a memory and includes several instructions for causing a computer (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
The embodiments of the present application have been described in detail above, and specific examples have been used herein to illustrate the principles and implementations of the present application; the above description of the embodiments is only provided to help understand the method and core concept of the present application. Meanwhile, a person skilled in the art may, according to the idea of the present application, make changes to the specific implementations and the application scope. In summary, the content of this specification should not be construed as limiting the present application.
Claims (10)
1. A question-answer searching method based on a natural language processing model is characterized by comprising the following steps:
receiving a search request for a target question input by a user;
acquiring target characters and target words corresponding to the target question based on a natural language processing model;
acquiring a first similarity value between the target question and a reference question in a preset question-and-answer text library based on the target word;
acquiring a second similarity value between the target question and the reference question based on the target word;
acquiring a third similarity value based on the text of the target question and the text of the reference question;
performing weighted calculation on the first similarity value, the second similarity value and the third similarity value to obtain a target similarity value between the target question and the reference question;
and displaying the target answer corresponding to the maximum value of the target similarity value.
2. The method according to claim 1, wherein the obtaining a first similarity value between the target question and a reference question in a preset question-and-answer text base based on the target word comprises:
acquiring a first word frequency of the target word;
acquiring a first inverse text frequency of the target word based on a first number of reference questions including the target word in reference questions of a preset question-and-answer text library and a total number of questions in the preset question text library;
calculating the product between the first word frequency and the first inverse text frequency to obtain a first score of the target word;
acquiring a first vector of the target question based on the first score and the scores of the words except the target word in the preset question text library;
and calculating cosine similarity between the first vector and a first reference vector of the reference question to obtain a first similarity value between the target question and the reference question.
3. The method of claim 1, wherein the obtaining a second similarity value between the target question and the reference question based on the target word comprises:
acquiring a second word frequency of the target word;
acquiring a second inverse text frequency of the target word based on a second number of reference questions including the target word in reference questions of a preset question-and-answer text library and a total number of questions in the preset question text library;
acquiring a second score of the target word based on a preset adjusting parameter, the second word frequency and the second inverse text frequency;
acquiring a second vector of the target question based on the second score and the scores of the words except the target word in the preset question text library;
and calculating cosine similarity between the second vector and a second reference vector of the reference question to obtain a second similarity value between the target question and the reference question.
4. The method of claim 3, wherein the preset adjustment parameter comprises a preset weight of the target word, and before the obtaining the second score of the target word based on the preset adjustment parameter, the second word frequency and the second inverse text frequency, the method further comprises:
acquiring the part of speech and/or the meaning of the target word based on the natural language processing model;
and determining the preset weight of the target word based on the part of speech and/or the word sense.
5. The method according to any one of claims 1-4, wherein the acquiring a third similarity value based on the text of the target question and the text of the reference question comprises:
acquiring a Jaccard similarity coefficient and an edit distance between the text of the target question and the text of the reference question;
and acquiring a third similarity value between the text of the target question and the text of the reference question based on the Jaccard similarity coefficient and the edit distance.
6. The method according to any one of claims 1 to 4, wherein the displaying the target answer corresponding to the maximum value of the target similarity value comprises:
obtaining a reference answer of a reference question corresponding to the maximum value of the target similarity value;
acquiring a first reference value of the reference answer based on a historical record of a reference user corresponding to the reference answer;
acquiring a second reference value of the reference answer based on the natural language processing model;
obtaining a target value of the reference answer based on the first reference value and the second reference value;
selecting a target answer from the reference answers based on the target value;
and displaying the target answer.
7. The method of claim 6, wherein obtaining a second reference value of the reference answer based on the natural language processing model comprises:
acquiring a keyword corresponding to the reference answer and the technical field of the target question based on the natural language processing model;
acquiring the association value of the keyword based on the knowledge graph in the technical field;
and acquiring a second reference value of the reference answer based on the association value.
8. A question-answer search apparatus based on a natural language processing model, comprising:
a communication unit for receiving a search request of a target question input by a user;
the processing unit is used for acquiring target characters and target words corresponding to the target question based on a natural language processing model; acquiring a first similarity value between the target question and a reference question in a preset question-and-answer text library based on the target word; acquiring a second similarity value between the target question and the reference question based on the target word; acquiring a third similarity value based on the text of the target question and the text of the reference question; and performing weighted calculation on the first similarity value, the second similarity value and the third similarity value to obtain a target similarity value between the target question and the reference question;
and the display unit is used for displaying the target answer corresponding to the maximum value of the target similarity value.
9. A computer device comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing the steps of the method of any of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, the computer program causing a computer to perform the method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111014403.8A CN113722459A (en) | 2021-08-31 | 2021-08-31 | Question and answer searching method based on natural language processing model and related device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113722459A (en) | 2021-11-30
Family
ID=78679958
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111014403.8A Pending CN113722459A (en) | 2021-08-31 | 2021-08-31 | Question and answer searching method based on natural language processing model and related device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113722459A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109344236A (en) * | 2018-09-07 | 2019-02-15 | 暨南大学 | A question similarity calculation method based on multiple features |
KR20200033009A (en) * | 2018-09-19 | 2020-03-27 | 네이버 주식회사 | Method of providing automatic answer |
CN112667794A (en) * | 2020-12-31 | 2021-04-16 | 民生科技有限责任公司 | Intelligent question-answer matching method and system based on twin network BERT model |
CN112765306A (en) * | 2020-12-30 | 2021-05-07 | 金蝶软件(中国)有限公司 | Intelligent question answering method and device, computer equipment and storage medium |
CN112948562A (en) * | 2021-04-01 | 2021-06-11 | 广东优碧胜科技有限公司 | Question and answer processing method and device, computer equipment and readable storage medium |
CN113157867A (en) * | 2021-04-29 | 2021-07-23 | 阳光保险集团股份有限公司 | Question answering method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
AU2018383346B2 (en) | Domain-specific natural language understanding of customer intent in self-help | |
CN107704512B (en) | Financial product recommendation method based on social data, electronic device and medium | |
US10270791B1 (en) | Search entity transition matrix and applications of the transition matrix | |
CN110674271B (en) | Question and answer processing method and device | |
CN111797214A (en) | FAQ database-based problem screening method and device, computer equipment and medium | |
JP5281405B2 (en) | Selecting high-quality reviews for display | |
US20140006012A1 (en) | Learning-Based Processing of Natural Language Questions | |
CN109783631B (en) | Community question-answer data verification method and device, computer equipment and storage medium | |
CN109299280B (en) | Short text clustering analysis method and device and terminal equipment | |
US20120005204A1 (en) | System for determining and optimizing for relevance in match-making systems | |
CN112559895B (en) | Data processing method and device, electronic equipment and storage medium | |
US20200192921A1 (en) | Suggesting text in an electronic document | |
CN110147494B (en) | Information searching method and device, storage medium and electronic equipment | |
CN112395391B (en) | Concept graph construction method, device, computer equipment and storage medium | |
WO2021174829A1 (en) | Crowdsourced task inspection method, apparatus, computer device, and storage medium | |
CN106407316B (en) | Software question and answer recommendation method and device based on topic model | |
CN111552773A (en) | Method and system for searching key sentence of question or not in reading and understanding task | |
CN111369148A (en) | Object index monitoring method, electronic device and storage medium | |
CN111241397A (en) | Content recommendation method and device and computing equipment | |
CN113935322A (en) | Case allocation method, device, equipment and medium based on natural language processing | |
CN112184021A (en) | Answer quality evaluation method based on similar support set | |
CN110516062B (en) | Method and device for searching and processing document | |
CN111858686A (en) | Data display method and device, terminal equipment and storage medium | |
CN111708870A (en) | Deep neural network-based question answering method and device and storage medium | |
CN110717008A (en) | Semantic recognition-based search result ordering method and related device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||