CN113407813B - Method for determining candidate information, method for determining query result, device and equipment - Google Patents

Method for determining candidate information, method for determining query result, device and equipment

Info

Publication number
CN113407813B
Authority
CN
China
Prior art keywords
determining
word
weight
target
query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110722521.8A
Other languages
Chinese (zh)
Other versions
CN113407813A (en)
Inventor
刘子航
王锴睿
白亚楠
李鹏飞
欧阳宇
王丛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110722521.8A priority Critical patent/CN113407813B/en
Publication of CN113407813A publication Critical patent/CN113407813A/en
Application granted granted Critical
Publication of CN113407813B publication Critical patent/CN113407813B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90: Details of database functions independent of the retrieved data types
    • G06F 16/95: Retrieval from the web
    • G06F 16/953: Querying, e.g. by the use of web search engines
    • G06F 16/9532: Query formulation
    • G06F 16/9538: Presentation of query results

Abstract

The disclosure provides a method for determining candidate information, as well as a method, a device, equipment and a storage medium for determining query results, applied to the field of artificial intelligence, in particular to the technical fields of natural language processing and deep learning, and applicable to intelligent medical and search scenarios. A specific implementation of the method for determining candidate information is as follows: for each historical dialog segment of a plurality of historical dialog segments, extracting feature information of the historical dialog segment; determining a quality evaluation value of each historical dialog segment using a predetermined evaluation model based on the feature information; and determining, among the plurality of historical dialog segments, the historical dialog segments whose quality evaluation value is greater than a predetermined evaluation value threshold, to obtain candidate information.

Description

Method for determining candidate information, method for determining query result, device and equipment
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of natural language processing and deep learning, and can be applied to intelligent medical and search scenarios. More specifically, it relates to a method for determining candidate information, a method for determining query results, and a corresponding device, equipment and storage medium.
Background
In search scenarios, the content retrieved for a query tends to be generic. Targeted query results often cannot be provided because the query statement supplied by the user is incomplete. In scenarios where knowledge is obtained from dialog segments of online consultations, query results with reference value often cannot be provided because the quality of the dialog segments is uneven.
Disclosure of Invention
Provided are a method, an apparatus, a device and a storage medium for determining candidate information and for determining query results, which improve the quality of the candidate information and help improve the accuracy of the query results.
According to one aspect of the present disclosure, there is provided a method of determining candidate information, including: for each historical dialog segment of a plurality of historical dialog segments, extracting feature information of the historical dialog segment; determining a quality evaluation value of each historical dialog segment using a predetermined evaluation model based on the feature information; and determining, among the plurality of historical dialog segments, the historical dialog segments whose quality evaluation value is greater than a predetermined evaluation value threshold, to obtain candidate information.
According to another aspect of the present disclosure, there is provided a method of determining a query result, including: based on the query statement, obtaining a query expression for the query statement; obtaining a plurality of dialog segments from the candidate information based on the query expression; and determining a target dialog segment of the plurality of dialog segments as a query result for the query statement, wherein the candidate information is determined using the method of determining candidate information described above.
According to another aspect of the present disclosure, there is provided an apparatus for determining candidate information, including: a feature information extraction module configured to extract, for each historical dialog segment of a plurality of historical dialog segments, feature information of the historical dialog segment; a first evaluation value determining module configured to determine a quality evaluation value of each historical dialog segment using a predetermined evaluation model based on the feature information; and a candidate information obtaining module configured to determine, among the plurality of historical dialog segments, the historical dialog segments whose quality evaluation value is greater than a predetermined evaluation value threshold, to obtain candidate information.
According to another aspect of the present disclosure, there is provided an apparatus for determining a query result, including: an expression obtaining module configured to obtain, based on a query statement, a query expression for the query statement; a dialog segment obtaining module configured to obtain a plurality of dialog segments from the candidate information based on the query expression; and a query result determining module configured to determine a target dialog segment of the plurality of dialog segments as a query result for the query statement, where the candidate information is determined by the aforementioned apparatus for determining candidate information.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the methods of determining candidate information and/or methods of determining query results provided by the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of determining candidate information and/or the method of determining query results provided by the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of determining candidate information and/or the method of determining query results provided by the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is an application scenario diagram of a method of determining candidate information, a method of determining query results, and an apparatus according to an embodiment of the present disclosure;
FIG. 2 is a flow diagram of a method of determining candidate information according to an embodiment of the present disclosure;
FIG. 3 is a flow diagram of a method of determining query results according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of determining a first keyword for candidate information according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of determining weights of a first keyword according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of determining a second keyword for a query statement, according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of determining a target dialog segment of a plurality of dialog segments, according to an embodiment of the disclosure;
FIG. 8 is a schematic diagram of ordering a plurality of target dialog segments according to an embodiment of the disclosure;
FIG. 9 is a block diagram of an apparatus for determining candidate information according to an embodiment of the disclosure;
FIG. 10 is a block diagram of an apparatus for determining query results according to an embodiment of the present disclosure; and
FIG. 11 is a block diagram of an electronic device for implementing a method of determining candidate information and/or a method of determining query results in accordance with an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The present disclosure provides a method of determining candidate information, including a feature information extraction stage, an evaluation value determination stage, and a candidate information obtaining stage. In the feature information extraction stage, feature information of each historical dialog segment is extracted for each historical dialog segment of the plurality of historical dialog segments. In the evaluation value determination stage, a quality evaluation value of each historical dialog segment is determined using a predetermined evaluation model based on the feature information. In the candidate information obtaining stage, the historical dialog segments whose quality evaluation value is greater than a predetermined evaluation value threshold are determined among the plurality of historical dialog segments, and candidate information is obtained.
An application scenario of the method and apparatus provided by the present disclosure will be described below with reference to fig. 1.
Fig. 1 is an application scenario diagram of a method of determining candidate information, a method of determining query results, and an apparatus according to an embodiment of the present disclosure.
As shown in fig. 1, the application scenario 100 of this embodiment may include a user 110, a terminal device 120, and a server 130. Terminal device 120 may be communicatively coupled to server 130 via a network, which may include wired or wireless communication links.
The terminal device 120 may be any of various electronic devices that have a display function and can provide a human-machine interaction interface, including but not limited to a smartphone, a tablet computer, a laptop computer, a desktop computer, and the like. The user 110 may query information by interacting with the terminal device 120. The queried information may belong to various fields, such as the medical field or the educational field; for example, it may be disease information retrieved by searching for symptoms, or attributes of an item retrieved by searching for the item's name.
Illustratively, when the user 110 inputs the query sentence 140 through the terminal device 120, the terminal device 120 may, for example, send the query sentence 140 to the server 130. The server queries the knowledge base according to the query statement, obtains the query result 150, and feeds back the query result to the terminal device 120. The terminal device 120 may then present the query results 150 to the user 110. The present disclosure is not limited in this regard.
The server 130 may be, for example, a server providing various services, such as a background management server providing support for a website or client application accessed by a user using a terminal device. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
In one embodiment, as shown in FIG. 1, the application scenario 100 may further include a database 160 in which a full knowledge base is maintained, which may include, for example, dialog segments from online consultations. The server 130 may access the database 160 over a network to query the database 160 for query results 150 according to the query statement 140.
In an embodiment, the server 130 may also filter the dialog segments in the full knowledge base maintained in the database to obtain dialog segments of higher quality, and store the selected dialog segments in a storage space other than the database 160 to generate a candidate information base 170. The candidate information base 170 can then be queried according to the query statement 140 to obtain the query result 150, which increases the reference value of the returned query results 150.
The database 160 may be, for example, a database independent of the server 130, or may be a data storage module integrated in the server 130, which is not limited in this disclosure.
It should be noted that, the method for determining candidate information and/or the method for determining the query result provided in the present disclosure may be performed by the server 130, or may be performed by another server communicatively connected to the server 130. Accordingly, the method for determining candidate information and/or the device for determining query results provided by the present disclosure may be disposed in the server 130, or may be disposed in another server communicatively connected to the server 130. The method for determining the candidate information and the method for determining the query result may be performed by the same server or may be performed by different servers, which is not limited in this disclosure.
It should be understood that the number and types of terminal devices, servers, and databases in fig. 1 are merely illustrative. There may be any number and type of terminal devices, servers, and databases as desired for implementation.
The method for determining candidate information and the method for determining query results provided by the present disclosure will be described in detail below with reference to FIGS. 2 to 8.
Fig. 2 is a flow diagram of a method of determining candidate information according to an embodiment of the present disclosure.
As shown in fig. 2, the method 200 of determining candidate information of this embodiment may include operations S210 to S230.
In operation S210, for each historical dialog segment of the plurality of historical dialog segments, feature information of the historical dialog segment is extracted.
According to embodiments of the present disclosure, a historical dialog segment may be a dialog generated by a user's online consultation. The consulted information may be attributes of an item, the disease name corresponding to certain symptoms, instructions for using an item, and the like. In one embodiment, the historical dialog segment may be a dialog generated by an online medical inquiry. It will be appreciated that the above historical dialogs are merely exemplary to facilitate understanding of the present disclosure, which is not limited in this regard.
According to an embodiment of the present disclosure, the fluency of each sentence in each historical dialog segment may be determined and taken as feature information. Entity words in the historical dialog segment and the associations between them may also be identified, and the entity words and their associations taken as feature information.
Illustratively, in an online consultation scenario, a historical dialog segment contains sentences input by two parties. When extracting the feature information, this embodiment may take the first sentences, i.e., the sentences of the first object in each historical dialog segment, as inputs to the first intention recognition model to obtain the first intention type of each first sentence. Likewise, the second sentences, i.e., the sentences of the second object in each historical dialog segment, are taken as inputs to the second intention recognition model to obtain the second intention type of each second sentence. The first intention recognition model and the second intention recognition model may be built on multi-class classification models so that the intention types are obtained from their outputs. After the intention types of the first and second sentences are obtained, the intention types of the first sentences and of the second sentences can each be tallied, and the richness of the historical dialog determined from the statistics: the more varied the intention types, the higher the richness.
For example, in an online consultation scenario, the first object may be a user and the second object may be a doctor. The first intention type may include several types such as cause, diagnosis, medication, diet recommendation, credits, and the like; the first intention types help clarify the user's condition and needs. The second intention type may include several types such as examination advice, medication advice, daily-life advice, diagnosis of a condition, collection of a condition, greeting, no intention, and the like. Based on the second intention types, the number of distinct intention types, their order, and the like may be counted. From the statistics over the first and second intention types, the amount of information provided by the doctor, its professionalism and credibility, the doctor-patient question-answer satisfaction, and the like can be determined. For example, if the first intention type includes a diet recommendation and the second intention type includes a daily-life recommendation, it may be determined that the doctor's answer meets a requirement of the user, and the doctor-patient question-answer satisfaction is finally determined from the ratio between the number of user requirements the doctor meets and the total number of user requirements. This embodiment may take the user's condition, the amount of information, professionalism, credibility, the doctor-patient question-answer satisfaction, and the like as feature information.
For example, a preset consultation flow and the intention stages it should contain may also be used. After the first and second intention types are obtained, it can be determined whether they include the preset intention type of each stage, so as to determine whether the dialog is complete and obtain the dialog completeness. The first sentences may also be compared with one another, and likewise the second sentences, to determine whether repeated dialog exists. The dialog completeness and/or the number of repeated dialogs may also be used as feature information.
According to an embodiment of the present disclosure, a first sentence and a second sentence may also be input into a predetermined classification model, which outputs a satisfaction category between the two. For example, a first sentence that is a question and the adjacent statement-type second sentence that follows it may be concatenated as the input of the predetermined classification model, which determines whether the statement can serve as a reply to the question; if so, the satisfaction category is "satisfied", and otherwise "not satisfied". After analyzing all sentences in the dialog segment, this embodiment can determine the doctor-patient question-answer satisfaction from the proportion of satisfied categories, or determine the amount of information in the doctor's answers from the number of satisfied categories. The doctor-patient question-answer satisfaction and the amount of information in the doctor's answers are taken as feature information.
For example, the first intention recognition model, the second intention recognition model, and the predetermined classification model may be built on the optimized distributed gradient boosting library XGBoost (eXtreme Gradient Boosting). At least one of these models may also be provided with a semantic understanding model and a logistic regression model, for example, to understand semantics and classify sentences. The semantic understanding model may include, for example, a recurrent neural network model. It is understood that the present disclosure does not limit the types of the first intention recognition model, the second intention recognition model, or the predetermined classification model.
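A minimal sketch of this feature-extraction step is given below. The model objects `first_intent_model`, `second_intent_model` and `qa_pair_classifier`, as well as their `predict` interfaces and the feature names, are hypothetical stand-ins for the intention-recognition and satisfaction models described above, not part of the original disclosure.

```python
from collections import Counter

def extract_features(dialog_segment, first_intent_model, second_intent_model,
                     qa_pair_classifier):
    """Turn one historical dialog segment into a small feature dictionary."""
    user_turns = [t["text"] for t in dialog_segment if t["role"] == "user"]
    doctor_turns = [t["text"] for t in dialog_segment if t["role"] == "doctor"]

    # Intention types of the two speakers' sentences.
    user_intents = [first_intent_model.predict(t) for t in user_turns]
    doctor_intents = [second_intent_model.predict(t) for t in doctor_turns]

    # Richness: the more distinct intention types, the richer the dialog.
    richness = len(set(user_intents) | set(doctor_intents))

    # Satisfaction: fraction of user turns whose following doctor turn is judged
    # a valid reply by the pair classifier (simplified one-to-one pairing).
    pairs = list(zip(user_turns, doctor_turns))
    satisfied = sum(qa_pair_classifier.predict(q + " [SEP] " + a) == "satisfied"
                    for q, a in pairs)
    satisfaction = satisfied / len(pairs) if pairs else 0.0

    # Repetition: number of repeated user sentences.
    repeats = sum(c - 1 for c in Counter(user_turns).values() if c > 1)

    return {"richness": richness,
            "satisfaction": satisfaction,
            "repeated_turns": repeats}
```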
In operation S220, a quality assessment value for each historical dialog segment is determined using a predetermined assessment model based on the characteristic information.
In operation S230, the historical dialog segments whose quality evaluation value is greater than a predetermined evaluation value threshold are determined among the plurality of historical dialog segments, and candidate information is obtained.
According to an embodiment of the present disclosure, the feature information of each historical dialog segment may be input into a predetermined evaluation model, which outputs the quality evaluation value of that segment. The predetermined evaluation model may be a Back Propagation (BP) neural network model or the like, or may be constructed, for example, based on XGBoost as described above.
In an embodiment, the first intention recognition model, the second intention recognition model, the predetermined classification model and the predetermined evaluation model may be integrated into an overall evaluation module that takes the historical dialog segment as input and outputs the quality evaluation value.
According to an embodiment of the present disclosure, the maximum quality evaluation value is set to 1, and the predetermined evaluation value threshold may be, for example, an arbitrary value not less than 0.5. Alternatively, the predetermined evaluation value threshold may be any value according to actual requirements, which is not limited in this disclosure.
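The sketch below illustrates operations S220 and S230 under the assumption that the predetermined evaluation model is an XGBoost regressor trained offline on labeled dialog quality; the feature names follow the earlier sketch and are assumptions.

```python
import numpy as np
import xgboost as xgb

FEATURE_NAMES = ["richness", "satisfaction", "repeated_turns"]

def select_candidates(dialog_segments, features, model: xgb.XGBRegressor,
                      threshold=0.5):
    """Keep only the dialog segments whose predicted quality exceeds the threshold."""
    matrix = np.array([[f[name] for name in FEATURE_NAMES] for f in features])
    scores = model.predict(matrix)  # predicted quality evaluation values
    return [seg for seg, s in zip(dialog_segments, scores) if s > threshold]
```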
With the method of this embodiment, the dialog segments with higher quality evaluations can be screened out from the historical dialog segments to serve as candidate information, improving the quality of the candidate information. When information is queried, this makes it convenient to provide the user with query results that are of high quality, have high reference value, and match the query statement well.
Based on the method for determining candidate information described in fig. 2 above, the present disclosure further provides a method for determining a query result, so as to obtain a query result satisfying a requirement from the candidate information. The method of determining the query result will be described in detail with reference to fig. 3.
Fig. 3 is a flow diagram of a method of determining query results according to an embodiment of the present disclosure.
As shown in fig. 3, the method 300 of determining a query result of this embodiment may include operations S310 to S330.
In operation S310, a query expression for a query statement is obtained based on the query statement.
According to an embodiment of the present disclosure, the query statement may be segmented into a plurality of words. The query expression may be obtained by eliminating stop words from the plurality of words and substituting the remaining words into a query expression template; for example, the remaining words may be joined with "AND" to obtain the query expression. The stop words may include, for example, prepositions, modal particles and auxiliary words. This embodiment may maintain a stop-word list, and stop words are eliminated by removing the words that belong to the stop-word list from the plurality of words.
It will be appreciated that methods in the related art may be employed to derive a query expression from a query statement, which is not limited by the present disclosure.
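A minimal sketch of operation S310 under these assumptions: segment the query statement, drop stop words, and join the remaining words with "AND". The `segment` parameter stands in for any word-segmentation tool (for Chinese text, e.g. jieba.lcut), and the stop-word list shown is only illustrative.

```python
STOP_WORDS = {"of", "the", "please", "is", "a"}  # illustrative stop-word list

def build_query_expression(query: str, segment) -> str:
    words = segment(query)                 # e.g. jieba.lcut(query) for Chinese
    kept = [w for w in words if w not in STOP_WORDS]
    return " AND ".join(kept)
```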
In operation S320, a plurality of dialog segments are obtained from the candidate information based on the query expression.
According to the embodiment of the disclosure, after obtaining the query expression, the query expression may be used as a query condition to query the dialogue segment from the determined candidate information, so as to obtain the dialogue segment meeting the query condition. The method for querying information based on the query expression is similar to that of the related art, and will not be described in detail herein. The operation S320 is different from the related art in that candidate information is selected from a plurality of history dialogue segments by the aforementioned method of determining candidate information.
In operation S330, a target dialog segment of the plurality of dialog segments is determined as a query result for the query statement.
According to the embodiment of the disclosure, after a plurality of dialogue segments are obtained from the candidate information, the dialogue segments can be used as query results and are sequentially arranged and fed back to the terminal equipment for the terminal equipment to display.
According to an embodiment of the present disclosure, after the plurality of dialog segments are obtained, the relevance of each dialog segment to the query information may also be determined, so as to select from them the dialog segments that are relevant, or highly relevant, to the query information as target dialog segments. Whether a dialog segment is a target dialog segment may be determined by whether its relevance to the query information is above a relevance threshold. The relevance may be determined by cosine similarity, the BM25 algorithm, and the like, which is not limited by the present disclosure.
For example, the dialog segment and the query statement may be input into a semantic understanding model to extract semantic features, which are then fed into a logistic regression classification model that outputs whether the two are relevant. The logistic regression model may be, for example, a binary classification model, and the classification result is either relevant or irrelevant.
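As one concrete realization of this relevance check, the sketch below uses cosine similarity between bag-of-words vectors; the disclosure also mentions BM25 and a semantic-model plus logistic-regression classifier as alternatives, and the 0.5 threshold here is an illustrative assumption.

```python
import math
from collections import Counter

def cosine_similarity(text_a: str, text_b: str, segment) -> float:
    va, vb = Counter(segment(text_a)), Counter(segment(text_b))
    common = set(va) & set(vb)
    dot = sum(va[w] * vb[w] for w in common)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

def filter_target_segments(query, segment_texts, segment, threshold=0.5):
    """Keep the dialog segments whose similarity to the query clears the threshold."""
    return [t for t in segment_texts
            if cosine_similarity(query, t, segment) >= threshold]
```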
With the method of this embodiment, query results are screened from candidate information with higher quality evaluations. Compared with the related-art approach of querying across all historical dialog segments, this improves the degree to which the screened query results match the query statement and increases their reference value.
Fig. 4 is a schematic diagram of determining a first keyword for candidate information according to an embodiment of the present disclosure.
According to embodiments of the present disclosure, to facilitate choosing from the candidate information a target dialog segment that matches the query statement, keywords may be added to the candidate information when it is determined, and whether it matches the query statement may then be judged based on the keywords. For example, a TF-IDF model or the like may be employed to extract keywords for each historical dialog segment serving as candidate information.
According to embodiments of the present disclosure, a subject-word determination model may be employed to determine the subject words of the candidate information, which are taken as first keywords for the candidate information. Alternatively, a first entity recognition model may be used to determine the entity words in the candidate information, which are taken as first keywords. Alternatively, the present disclosure may maintain a synonym library in advance; after the subject words or entity words are obtained, their synonyms may also be retrieved from the synonym library and used as first keywords. The subject-word determination model may be, for example, a Latent Dirichlet Allocation (LDA) model, and the first entity recognition model may be built from a bidirectional long short-term memory network combined with a conditional random field, from a dilated convolutional network (Dilated CNN) combined with a conditional random field, or from any other model. It will be appreciated that any combination of the foregoing methods may be employed to obtain the first keywords.
According to embodiments of the present disclosure, after the first entity words, subject words and/or synonyms are obtained by the above methods, they may be taken as initial words. Each initial word is then segmented at a predetermined granularity, and the resulting words are taken as first keywords. Using fine-grained words as first keywords improves the accuracy of the matching result when determining whether they match the query statement, which further improves the accuracy of the determined query results and the user experience.
According to an embodiment of the present disclosure, as shown in FIG. 4, the embodiment 400 may use the subject-word determination model 420 to obtain the subject words 441 of the target sentence 411 in the candidate information 410, and use the first entity recognition model 430 to determine the first entity words 442 in the other sentences 412 of the candidate information 410. This is because the beginning of a dialog segment typically contains the chief complaint, i.e., the personal demand information the user describes when seeking a consultation, such as the user's own illness or simple features of an item. The chief complaint is typically short and is best processed with a subject-word determination model suited to short text. Content other than the chief complaint typically spans multiple rounds of dialog with longer text, so an entity recognition model can be used to recognize the entity words. In this way, the accuracy of the determined keywords can be improved.
Illustratively, after the subject words 441 and the first entity words 442 are obtained, synonyms may be queried based only on the first entity words 442, i.e., the synonyms 443 of the first entity words 442 in the synonym library 450 are determined. The synonyms 443, the subject words 441 and the first entity words 442 are taken as initial words. This is because the chief complaint generally reflects the user's needs more directly, and querying synonyms of the subject words could blur those needs. After the initial words are obtained, each initial word may be segmented at a fine granularity to obtain a plurality of first keywords 460.
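The sketch below assembles the first keywords for one candidate dialog segment along these lines: subject words from the chief-complaint sentence, entity words from the other sentences, synonyms of the entity words only, then a fine-grained split. The parameters `lda_topic_words`, `ner_model`, `synonyms` and `fine_segment` are hypothetical stand-ins for the subject-word model, entity recognition model, synonym library and fine-grained segmenter described above.

```python
def first_keywords(target_sentence, other_sentences, lda_topic_words, ner_model,
                   synonyms, fine_segment):
    subject_words = lda_topic_words(target_sentence)              # chief complaint
    entity_words = [w for s in other_sentences for w in ner_model.extract(s)]
    syns = [s for w in entity_words for s in synonyms.get(w, [])]  # entity words only

    initial_words = subject_words + entity_words + syns
    keywords = []
    for word in initial_words:
        # fine-grained segmentation, e.g. "calf cramp" -> ["calf", "cramp"]
        keywords.extend(fine_segment(word))
    return initial_words, keywords
```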
Fig. 5 is a schematic diagram of determining weights of a first keyword according to an embodiment of the present disclosure.
According to embodiments of the present disclosure, after the first keywords are determined, a weight may further be assigned to each first keyword, so as to improve the accuracy of the matching result when determining, based on the first keywords, whether a dialog segment matches the query statement. This is because keywords with different attributes influence the relevance result to different degrees.
As shown in FIG. 5, when determining the weight of a first keyword, the embodiment 500 may first determine the attribute type 530 of the initial word 510 from which the first keyword 520 was segmented. The initial word 510 is then taken as the target initial word, and a weight for the target initial word (i.e., initial word weight 540) is determined based on the attribute type 530. The weight of the first keyword 520 is then determined from the ratio 550 between the number of characters 521 of the first keyword 520 and the number of characters 511 of the target initial word, together with the initial word weight 540.
For example, a plurality of attribute types may be preset according to actual requirements. Using the methods above, this embodiment can obtain the attribute type of each initial word from the model output at the same time the initial word is obtained. A mapping between attribute types and weights may be established; after the attribute type of a first keyword is obtained, its weight may be determined according to this mapping.
For example, a subject-word determination model constructed from a bidirectional recurrent neural network and a conditional random field may be employed. The input of the model is the target sentence, and the output is the subject words and the attribute type of each subject word. In an online-inquiry scenario in the medical field, the attribute types of subject words may include, for example, symptom, complication, intent, background, severity of illness, and so on. The keywords may be classified into several tiers according to attribute type, with different weights for different tiers. For example, keywords whose attribute types are symptom, disease, intent and the like may be placed in a third tier, keywords whose attribute types are accompanying disease, accompanying symptom and the like in a second tier, and keywords whose attribute type is background and the like in a first tier, with the weights of the three tiers decreasing in that order. Taking the target sentences "What causes calf cramps at six months of pregnancy", "What to do about four consecutive days of insomnia", and "Harms of Maijinli papaya and kudzu root tablets" as examples, the determined subject words and their tiers are shown in the following table.
Target sentence | Third tier | Second tier | First tier
What causes calf cramps at six months of pregnancy | pregnancy, calf cramp | (none) | six months
What to do about four consecutive days of insomnia | insomnia, sleeplessness | (none) | four consecutive days
Harms of Maijinli papaya and kudzu root tablets | Maijinli papaya and kudzu root tablets, harm | (none) | (none)
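The sketch below ties the tiered attribute-type weights to the character-ratio keyword weight described above. The numeric tier values and the attribute labels in the mapping are illustrative assumptions; the disclosure only requires that the third tier outweigh the second, which outweighs the first.

```python
TIER_WEIGHT = {"third": 3.0, "second": 2.0, "first": 1.0}  # illustrative values

ATTRIBUTE_TIER = {
    "symptom": "third", "disease": "third", "intent": "third",
    "accompanying_disease": "second", "accompanying_symptom": "second",
    "background": "first",
}

def initial_word_weight(attribute_type: str) -> float:
    return TIER_WEIGHT[ATTRIBUTE_TIER.get(attribute_type, "first")]

def first_keyword_weight(keyword: str, initial_word: str, attribute_type: str) -> float:
    # weight of the first keyword = (character-count ratio) * (initial word weight)
    ratio = len(keyword) / len(initial_word)
    return ratio * initial_word_weight(attribute_type)
```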
For example, a named entity recognition model may be employed as the first entity recognition model. The input of the model is other sentences, and the output is entity words and attribute types of the entity words included in the other sentences. In an on-line questioning scenario in the medical field, the attribute type of the entity word may include, for example, a disease name, a symptom type, a medicine, an examination name, a treatment name, and the like. Since the attribute type of the synonym of the first entity word is generally the same as the first entity word, the weight of the first entity word may be given to its synonym.
According to an embodiment of the present disclosure, for the case where the target initial word is a first entity word or a synonym, the weight determined from the attribute type may be taken as a first sub-weight. The first sub-weight is then adjusted according to the satisfaction category between the sentence to which the target initial word belongs and the other sentences. Specifically, a second sub-weight for the target initial word may be determined based on the satisfaction category, and the initial weight for the target initial word determined from the first and second sub-weights. The second sub-weight may be larger when the satisfaction category between the sentence to which the target initial word belongs and the other sentences is "satisfied", and smaller otherwise. Here, "satisfied" covers both satisfying another sentence and being satisfied by another sentence. This is because words in sentences that fall under the "satisfied" category provide higher reference value to the user; giving such words higher weight makes the finally determined query result more likely to be accurate information that helps the user.
For example, when determining the second sub-weight according to the satisfaction category, the type of the sentence to which the initial word belongs, e.g., a statement or a question, may be determined first. If it is a statement, the lowest second sub-weight is given; if it is a question that is not satisfied, the next-lowest second sub-weight is given; if it is a question that is satisfied, the highest second sub-weight is given. This is because a query statement is usually a question, and assigning higher weight to words in satisfied questions increases the likelihood that the query result will satisfy the user's requirement.
For example, after the first sub-weight and the second sub-weight are obtained, the product of the two sub-weights may be taken as the initial weight. Or the sum of the two sub-weights may be taken as the initial weight. As long as the initial weight is positively correlated with the first sub-weight and positively correlated with the second sub-weight, the present disclosure is not limited thereto.
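A minimal sketch of combining the two sub-weights for an entity-word or synonym initial word. The numeric second-sub-weight values are illustrative assumptions; the disclosure only requires that the initial weight be positively correlated with both sub-weights, so a product (shown here) or a sum would both qualify.

```python
SECOND_SUB_WEIGHT = {
    "statement": 0.5,             # declarative sentence: lowest
    "question_unsatisfied": 0.8,  # question without a satisfying answer
    "question_satisfied": 1.0,    # question with a satisfying answer: highest
}

def initial_weight(first_sub_weight: float, sentence_kind: str) -> float:
    return first_sub_weight * SECOND_SUB_WEIGHT[sentence_kind]
```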
According to an embodiment of the present disclosure, after the weight for the target initial word is obtained, the product of the aforementioned character-count ratio and that weight may be taken as the weight of the first keyword. Alternatively, the sum of this product and a predetermined value may be used as the weight of the first keyword. The present disclosure is not limited in this regard, as long as the weight of the first keyword is positively correlated with the character-count ratio.
According to an embodiment of the present disclosure, the product of the aforementioned character-count ratio and the initial word weight may be taken as an initial weight 560. The initial weight 560 is then adjusted based on the source 570 of the target initial word to obtain the weight of the first keyword (i.e., keyword weight 580). The source of the target initial word indicates whether it is a subject word, a first entity word or a synonym.
For example, the weights of first keywords obtained by segmenting subject words may be raised, or the weights of those obtained from first entity words or synonyms may be lowered. Because the chief complaint is structurally more similar to the query statement, raising the weights of keywords derived from the chief complaint, or lowering the weights of keywords derived from entity words in the body text and their synonyms, lets the chief-complaint keywords play a larger role when matching against the query statement, further improving the accuracy of the determined query results.
For example, the weights of the first keywords obtained by segmenting the subject words may be normalized, avoiding the situation where the weights of identical keywords are not comparable because the chief complaints they belong to differ in length. After normalization, the sum of the weights of all first keywords derived from the subject words of a target sentence equals a preset value, and this sum is the same across different target sentences.
For example, the first keywords may be deduplicated before the initial weights 560 are adjusted. When two identical first keywords obtained from the same candidate information have different weights, the one with the higher weight is kept and the one with the lower weight is removed. Deduplication may be performed separately per source: the first keywords obtained by segmenting subject words are deduplicated among themselves, and those obtained by segmenting first entity words and their synonyms are deduplicated among themselves, rather than mixing all first keywords before deduplication, which facilitates the subsequent adjustment of keyword weights by source.
According to an embodiment of the present disclosure, a word expressing the third intention type of the target sentence in the candidate information may also be used as a first keyword. The third intention type may be determined using a third intention recognition model: the target sentence in the candidate information is taken as the input of the third intention recognition model, and the third intention type is obtained from its output. The third intention recognition model is similar to the first and second intention recognition models described above and will not be described again here.
By determining keywords for candidate information and determining weights of the keywords, query results related to query sentences can be screened from the dialogue segments based on the keywords and the weights after the dialogue segments are obtained from the candidate information.
Fig. 6 is a schematic diagram of determining a second keyword for a query statement, according to an embodiment of the disclosure.
According to an embodiment of the present disclosure, when screening target dialog segments from the plurality of dialog segments obtained with the query expression, a plurality of second keywords for the query statement and the weight of each second keyword may be determined first. Then, for each of the plurality of dialog segments, its third keywords and their weights are determined. The third keywords are the first keywords obtained by the method above, and their weights are the first-keyword weights determined above. Finally, the dialog segments related to the query statement are determined among the plurality of dialog segments as target dialog segments, based on the second keywords and the third keywords.
For example, the words among the plurality of second keywords that also belong to the third keywords may be determined; for each such word, the product of its weight among the second keywords and its weight among the third keywords is computed as a weight product. The weight products of all such words are then summed to give the relevance between the query statement and the dialog segment. Finally, a predetermined number of dialog segments with the highest relevance to the query statement are selected from the plurality of dialog segments as target dialog segments; alternatively, the dialog segments whose relevance to the query statement exceeds a relevance threshold are selected as target dialog segments.
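A minimal sketch of this scoring scheme, assuming the second and third keywords are represented as dictionaries mapping word to weight; the top-k cut-off is an illustrative assumption.

```python
def relevance(second_kw: dict, third_kw: dict) -> float:
    """Sum of weight products over the words shared by query and dialog segment."""
    shared = set(second_kw) & set(third_kw)
    return sum(second_kw[w] * third_kw[w] for w in shared)

def top_target_segments(second_kw: dict, segments_with_kw, k=10):
    """segments_with_kw: iterable of (segment, third_keyword_weights) pairs."""
    scored = [(relevance(second_kw, third_kw), seg)
              for seg, third_kw in segments_with_kw]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [seg for _, seg in scored[:k]]
```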
As shown in FIG. 6, in this embodiment 600, when determining the second keywords and their weights, the query statement 610 may be segmented to obtain a plurality of first words 611 and their weights. For example, a dictionary- and statistics-based method may be used to segment the query statement and determine the weight of each resulting word; such a method may include tree-structured segmentation or segmentation against a predetermined dictionary, followed by determining the word weights with the TF-IDF algorithm. It is to be understood that the present disclosure does not limit the segmentation method, and any word segmentation method may be adopted according to actual requirements. The words obtained by the segmentation are taken as the first words.
According to an embodiment of the present disclosure, after the first words 611 are obtained, the synonym library 620 may also be queried for synonyms of the first words 611; the synonyms are taken as second words 621, and the first words 611 and second words 621 together are taken as second keywords, as shown in FIG. 6. Meanwhile, the weight of each second word 621 may be determined from the weight of the corresponding first word 611; for example, the weight of the first word 611 may be used as the weight of its synonym. In this way the query statement is expanded: for example, if a first word is "headache", its synonyms are obtained through the query. This expansion can improve the accuracy and the number of the finally determined query results.
In determining the second keyword, according to embodiments of the present disclosure, as shown in fig. 6, a fourth intent recognition model 630 may also be employed to determine a fourth intent type of the query statement 610, take a third word expressing the fourth intent type as part of the second keyword 690, and assign a first predetermined weight to the third word. The first predetermined weight may be, for example, a higher value, for example, may be greater than the weight of each of the first words. This is because the intention is to be able to more accurately reflect the user's needs. Then, when determining the relevance between the dialogue segment and the query statement, the third word may also be compared with the first keyword in the dialogue segment that represents the type of intent of the complaint content, and the relevance may be determined based on the comparison result. The fourth intent recognition model 630 is similar to the first and second intent recognition models described above, and will not be described here.
According to embodiments of the present disclosure, an intention word library 650 may also be maintained; it may, for example, take the form of a knowledge graph or otherwise record associations between intention words. After the third word 640 is obtained, this embodiment may query the intention word library 650 based on the third word 640 for a fourth word 660 associated with it, and derive the weight of the fourth word 660 from the weight of the third word 640; for example, the weight of the third word 640 may be assigned to its associated fourth word 660. The fourth word 660 is included as part of the second keywords 690. This further ensures that the second keywords fully express the user's intention.
According to embodiments of the present disclosure, when determining the second keywords, they may be screened from the entity words included in the query statement, because entity words generally characterize the query statement more accurately; this helps improve the efficiency and accuracy of the determined relevance.
For example, as shown in FIG. 6, a second entity recognition model 670 may be employed to determine second entity words 680 included in the query statement 610. A word belonging to the second entity word 680 of the plurality of first words 611 and second words 621 is then determined as part of the second keyword 690. Specifically, the intersection of the second entity word and the first word and the intersection of the second entity word and the second word are used as a part of the second keyword.
According to an embodiment of the present disclosure, the words among the first words and second words that belong to the second entity words 680 may further be taken as target words, and the target words filtered by weight so that those with higher weights are kept as part of the second keywords 690. The kept words are thus more representative of the query statement, further improving the efficiency and accuracy of the relevance determination. Specifically, the target words whose weight is greater than a weight threshold may be determined to be second keywords. The weight threshold may be set according to actual requirements, which is not limited by the present disclosure.
The weight threshold may be adjusted dynamically according to the query statement, so that an appropriate number of second keywords is retained for query statements of different lengths. For example, the weight threshold may depend on the number of first words 611 obtained by segmenting the query statement.
Illustratively, the weight threshold may be expressed as: weight threshold = constant N / number of first words × 0.1. The value of the constant N may be set according to actual requirements, for example 3, which is not limited in this disclosure.
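A minimal sketch of filtering the candidate second keywords with this dynamic threshold. N = 3 follows the example value above, and the exact placement of the 0.1 factor is an assumption, since the formula is only stated informally.

```python
def weight_threshold(num_first_words: int, n: float = 3.0) -> float:
    return n / num_first_words * 0.1

def filter_second_keywords(candidates: dict, num_first_words: int) -> dict:
    """candidates maps word -> weight for entity words drawn from the query."""
    thr = weight_threshold(num_first_words)
    return {w: wt for w, wt in candidates.items() if wt > thr}
```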
According to an embodiment of the present disclosure, while the user is entering the query statement, several recommendation labels may be displayed so that the user can input a more accurate and well-formed query. The recommendation labels may be determined in real time from the characters the user has already entered; for example, if those characters include "headache", the labels recommended to the user may include "cause", "invasiveness", "treatment", and the like. When the user selects a recommendation label, the fifth word indicated by that label is used as a second keyword and given a second predetermined weight, i.e., the weight of the fifth word is set to the second predetermined weight. The second predetermined weight may take a larger value and may be equal to or different from the first predetermined weight; the present disclosure does not limit the second predetermined weight or the recommendation-label information. Using the word represented by the user-selected recommendation label as a keyword improves the second keywords' ability to express the user's needs, which in turn helps improve the accuracy of the relevance determination.
According to the embodiment of the disclosure, after determining the plurality of second keywords included in the query statement, the query expression may also be determined according to the plurality of second keywords and weights of the plurality of second keywords, for example.
For example, several words with higher weights may be selected from the second keywords as query keywords, and the query keywords joined with "AND" to obtain the query expression.
For example, the mandatory words and optional words among the plurality of second keywords may also be determined according to their weights: a predetermined number of keywords with the highest weights may be selected as mandatory words, and the remaining keywords are optional words. A query expression is then obtained from the mandatory and optional words using an expression template; for example, the mandatory words may be joined with "AND" and the optional words with "OR". This improves the completeness and accuracy of the query expression, and in turn the accuracy and diversity of the plurality of dialog segments obtained by the query.
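A minimal sketch of assembling such an expression: the top-weighted second keywords are joined with "AND" as mandatory words and the rest with "OR" as optional words. The cut-off `top_k` and the way the two groups are combined into one expression are illustrative assumptions, since the disclosure only describes the two joining operators.

```python
def build_expression(second_keywords: dict, top_k: int = 3) -> str:
    ranked = sorted(second_keywords, key=second_keywords.get, reverse=True)
    mandatory, optional = ranked[:top_k], ranked[top_k:]
    expr = " AND ".join(mandatory)
    if optional:
        expr = f"({expr}) AND ({' OR '.join(optional)})"
    return expr
```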
Fig. 7 is a schematic diagram of determining a target dialog segment of a plurality of dialog segments, according to an embodiment of the disclosure.
According to the embodiment of the disclosure, after the third keyword and the weight thereof of each dialogue segment and the second keyword and the weight thereof of the query sentence are obtained, the relevance between each dialogue segment and the query sentence can be determined based on the information.
For example, the intersection of the second keywords and the third keywords may be determined first, and the ratio of the number of words in the intersection to the number of words in the union of the second and third keywords taken as the relevance. Alternatively, for each word in the intersection, the product of its weight among the third keywords and its weight among the second keywords may be computed, and the sum of these products taken as the value characterizing the relevance. Alternatively, the words in the intersection may be taken as target keywords; the sum of the target keywords' weights among the second keywords is taken as a first value, the sum of their weights among the third keywords as a second value, and the ratio between the first and second values as the relevance between each dialog segment and the query statement.
According to embodiments of the present disclosure, in determining the relevance, semantic similarity between the dialog segment and the query statement may also be considered, for example, to improve the accuracy of the determined relevance. In this embodiment, the foregoing value of the correlation determined according to the third keyword and the weight thereof and the second keyword and the weight thereof may be used as the first sub-similarity. And taking the semantic similarity between the dialogue segment and the query sentence as a second sub-similarity. Finally, a correlation is determined based on the first sub-similarity and the second sub-similarity.
By way of example, only the semantic similarity between the target sentence in the dialog segment and the query statement may be considered. This is because the structures of the target sentence in the dialog segment and of the query statement are relatively similar, so the determined semantic similarity is more accurate. The target sentence may be, for example, the complaint content. This embodiment may employ, for example, a semantic similarity algorithm to determine the semantic similarity. The semantic similarity algorithm may include, for example, a Deep Structured Semantic Model (DSSM), a CNN-DSSM model, or an LSTM-DSSM model, which is not limited by the present disclosure.
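As a stand-in for the DSSM-style models named above, the sketch below computes a plain bag-of-words cosine similarity between the query statement and the target sentence; the example sentences and the vectorization scheme are assumptions chosen only so the example runs end to end.

```python
# Illustrative stand-in for the semantic-similarity step: cosine similarity
# between simple bag-of-words vectors of the two sentences.
import math
from collections import Counter

def cosine_similarity(sentence_a: str, sentence_b: str) -> float:
    vec_a, vec_b = Counter(sentence_a.split()), Counter(sentence_b.split())
    common = set(vec_a) & set(vec_b)
    dot = sum(vec_a[w] * vec_b[w] for w in common)
    norm = math.sqrt(sum(v * v for v in vec_a.values())) * \
           math.sqrt(sum(v * v for v in vec_b.values()))
    return dot / norm if norm else 0.0

query = "what causes a headache in children"
target_sentence = "my child has a headache what is the cause"  # e.g. complaint content
print(round(cosine_similarity(query, target_sentence), 3))
```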
According to an embodiment of the disclosure, in determining the relevance, in addition to determining the relevance based on the keywords and weights as described above, the similarity between the intent of the query statement and the intent of the target sentence may also be fused in. In this way, the importance of intent matching is highlighted, and the likelihood that the query results obtained by screening meet the user's needs is improved. In this embodiment, the foregoing relevance value determined according to the third keyword and its weight and the second keywords and their weights may be used as the first sub-similarity. The similarity between the intent of the query statement and the intent of the target sentence in each dialog segment is taken as a third sub-similarity. Finally, the relevance is determined based on the first sub-similarity and the third sub-similarity.
For example, a word expressing the intent type may be selected from the second keywords, and a word expressing the intent type may be selected from the third keywords. The edit distance, cosine similarity, or the like between the two selected words may then be used as the similarity between the intent of the query statement and the intent of the target sentence in each dialog segment.
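The sketch below shows the edit-distance variant, normalized to a similarity score; the intent-word strings and the normalization rule are assumptions for illustration only.

```python
# Illustrative sketch of the intent-similarity step: a normalized edit distance
# between the intent word of the query and the intent word of the target sentence.

def edit_distance(a: str, b: str) -> int:
    dp = list(range(len(b) + 1))                  # one-row dynamic programming table
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,            # deletion
                                     dp[j - 1] + 1,        # insertion
                                     prev + (ca != cb))    # substitution
    return dp[-1]

def intent_similarity(query_intent_word: str, target_intent_word: str) -> float:
    distance = edit_distance(query_intent_word, target_intent_word)
    longest = max(len(query_intent_word), len(target_intent_word), 1)
    return 1.0 - distance / longest

print(round(intent_similarity("ask_cause", "ask_causes"), 3))  # 0.9
```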
According to an embodiment of the disclosure, in determining the relevance, in addition to determining the relevance based on the keywords and weights as described above, both the semantic similarity between the dialog segment and the query statement and the similarity between the intent of the query statement and the intent of the target sentence may be fused in.
As shown in fig. 7, in the embodiment 700, after the second keywords 711 and their weights are determined for the query statement 710 and the third keywords 721 and their weights are determined for each dialog segment 720, the first sub-similarity 730 may be obtained based on the second keywords 711 and their weights and the third keywords 721 and their weights. Meanwhile, a semantic similarity algorithm 740 may be employed to determine the semantic similarity between the query statement 710 and the target sentence in the dialog segment 720 as a second sub-similarity 750. A word expressing the intent type is determined from the second keywords of the query statement 710 as a first intent word 712, and a word expressing the intent type is determined from the third keywords of each dialog segment 720 as a second intent word 722. The similarity between the first intent word 712 and the second intent word 722 is then determined as a third sub-similarity 760. Finally, based on the first sub-similarity 730, the second sub-similarity 750, and the third sub-similarity 760, it is determined whether each dialog segment is relevant to the query statement.
Illustratively, the sum of the three sub-similarities may be taken as the relevance between the query statement and each dialog segment. If the relevance is above a relevance threshold, the dialog segment 720 can be determined to be relevant to the query statement 710, and the dialog segment 720 is taken as a target dialog segment. It is to be understood that, for example, the average value of the three sub-similarities may also be used as the relevance, or the arithmetic square root of the three sub-similarities may be used as the relevance, which is not limited by the present disclosure.
Illustratively, as shown in fig. 7, the first sub-similarity 730, the second sub-similarity 750, and the third sub-similarity 760 may also be input into a predetermined logistic regression model 770, and after processing by the predetermined logistic regression model 770, a classification result 780 for each dialog segment 720 is obtained. The classification result 780 is, for example, a binary classification result: relevant or irrelevant. In this way, a dialog segment whose classification result is "relevant" can be taken as a target dialog segment.
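A minimal sketch of fusing the three sub-similarities with a logistic regression classifier follows. The coefficients, the bias, and the decision threshold are assumed values; in practice the predetermined logistic regression model would be obtained by training on labelled (dialog segment, query statement) pairs.

```python
# Illustrative sketch: classify a dialog segment as relevant or irrelevant from
# its three sub-similarities using a (pre-trained, here assumed) logistic model.
import math

COEFFS = (2.0, 1.5, 1.0)   # assumed weights for sub-similarities 1-3
BIAS = -2.0                # assumed bias term

def is_relevant(sub1: float, sub2: float, sub3: float, threshold: float = 0.5) -> bool:
    z = BIAS + COEFFS[0] * sub1 + COEFFS[1] * sub2 + COEFFS[2] * sub3
    probability = 1.0 / (1.0 + math.exp(-z))     # sigmoid
    return probability >= threshold              # relevant / irrelevant

# A dialog segment with sub-similarities (0.8, 0.6, 0.7) would be kept as a target segment.
print(is_relevant(0.8, 0.6, 0.7))  # True
```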
Fig. 8 is a schematic diagram of ordering a plurality of targeted dialog segments according to an embodiment of the disclosure.
According to an embodiment of the disclosure, when a plurality of target dialog segments are obtained, the plurality of target dialog segments may, for example, be further ranked, so as to improve the efficiency with which the user finds a dialog segment meeting the user's needs and thereby improve the user experience.
Illustratively, the plurality of target dialog segments may be ordered according to the previously determined relevance from high to low.
For example, if the correlation is a classification result, the embodiment may first determine a correlation evaluation value for each of the plurality of target dialog segments using a ranking model. The plurality of target dialog segments are then ranked based on the relevance evaluation value.
The ranking model may be, for example, a logistic regression model whose input is a dialog segment and the query statement and whose output is a relevance evaluation value.
As shown in fig. 8, the ranking model in this embodiment 800 may also be, for example, a model that considers the partial order relationship between every two samples. Suppose there are n target dialog segments, where n is an integer greater than or equal to 2. The embodiment may combine the first dialog segment 801 through the nth dialog segment 803 two by two to obtain a plurality of dialog segment pairs. For example, the first dialog segment 801 and the second dialog segment 802 may be combined to obtain a dialog segment pair 811, the second dialog segment 802 and the nth dialog segment 803 may be combined to obtain a dialog segment pair 812, and the first dialog segment 801 and the nth dialog segment 803 may be combined to obtain a dialog segment pair 813. For each dialog segment pair, the embodiment may use the ranking model 820 to obtain the partial order relationship between the two dialog segments in the pair, for example, their relevance evaluation values relative to each other. Finally, the n dialog segments are ordered according to the pairwise partial order relationships, resulting in the ranking result 830. The ranking model 820 may be, for example, RankSVM or GBRank, which is not limited by the present disclosure.
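The sketch below shows how an ordering can be recovered from pairwise preferences. The pairwise_prefer function stands in for a trained pairwise ranking model such as RankSVM or GBRank; the keyword-overlap rule and the example segments are assumptions used only so the sketch runs.

```python
# Illustrative sketch: order target dialog segments from pairwise partial-order
# relationships produced by a pairwise preference function.
from itertools import combinations

def pairwise_prefer(query_words: set, seg_a: set, seg_b: set) -> bool:
    """Return True if segment A should rank ahead of segment B (stand-in model)."""
    return len(query_words & seg_a) >= len(query_words & seg_b)

def rank_segments(query_words: set, segments: dict) -> list:
    wins = {name: 0 for name in segments}
    for a, b in combinations(segments, 2):        # every pair of target segments
        if pairwise_prefer(query_words, segments[a], segments[b]):
            wins[a] += 1
        else:
            wins[b] += 1
    return sorted(segments, key=wins.get, reverse=True)

query = {"headache", "cause", "child"}
segments = {"seg1": {"headache", "fever"},
            "seg2": {"headache", "cause", "child"},
            "seg3": {"cause"}}
print(rank_segments(query, segments))  # ['seg2', 'seg1', 'seg3']
```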
For example, after the relevance evaluation values are determined, the weight of the relevance evaluation value of each dialog segment may also be determined based on the matching relationship between the words belonging to a target category in the query statement and the words belonging to the target category in each dialog segment. A weighted evaluation value for each dialog segment is then determined based on the weight of its relevance evaluation value. Finally, the plurality of target dialog segments are ranked based on the weighted evaluation values. The target category may include, for example, words describing user attribute information, disease names, symptom names, and the like. If matched words of the target category are included, the weight of the relevance evaluation value may be determined to be a third predetermined weight. Alternatively, the more matched words of the target category there are, the higher the weight of the relevance evaluation value. The number of matched words and the weight of the relevance evaluation value are positively correlated; the relationship may be, for example, exponential or proportional, which is not limited in the present disclosure.
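A minimal sketch of this re-weighting is given below; the base weight, the per-match increment, and the proportional relationship are assumptions, since the disclosure only requires that more matched target-category words lead to a higher weight.

```python
# Illustrative sketch: weight the relevance evaluation value by the number of
# matched target-category words (e.g. disease names, symptom names).

def weighted_evaluation(relevance_score: float, query_category_words: set,
                        segment_category_words: set, base_weight: float = 1.0,
                        step: float = 0.5) -> float:
    matched = query_category_words & segment_category_words
    weight = base_weight + step * len(matched)   # more matches -> higher weight
    return weight * relevance_score

print(round(weighted_evaluation(0.8, {"migraine", "child"}, {"migraine"}), 2))  # 1.2
```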
Based on the method for determining the candidate information, the disclosure also provides a device for determining the candidate information. The apparatus for determining candidate information will be described in detail below with reference to fig. 9.
Fig. 9 is a block diagram of an apparatus for determining candidate information according to an embodiment of the present disclosure.
As shown in fig. 9, the apparatus 900 for determining candidate information of this embodiment may include a feature information extraction module 910, a first evaluation value determination module 920, and a candidate information obtaining module 930.
The feature information extraction module 910 is configured to extract, for each historical dialog segment of a plurality of historical dialog segments, feature information of the historical dialog segment. In an embodiment, the feature information extraction module 910 may be configured to perform the operation S210 described above, which is not described herein again.
The first evaluation value determination module 920 is configured to determine a quality evaluation value of each historical dialog segment using a predetermined evaluation model based on the feature information. In an embodiment, the first evaluation value determining module 920 may be used to perform the operation S220 described above, which is not described herein.
The candidate information obtaining module 930 is configured to determine a historical dialog segment, of the plurality of historical dialog segments, for which the quality evaluation value is greater than a predetermined evaluation value threshold, and obtain candidate information. In an embodiment, the candidate information obtaining module 930 may be configured to perform the operation S240 described above, which is not described herein.
According to an embodiment of the present disclosure, the above-described feature information extraction module 910 may include a first intention determination sub-module, a second intention determination sub-module, a satisfaction category obtaining sub-module, and a feature obtaining sub-module. The first intention determination sub-module is used for taking a first sentence for a first object in each historical dialog segment as input of a first intention recognition model to obtain a first intention type of the first sentence. The second intention determination sub-module is used for taking a second sentence for a second object in each historical dialog segment as input of the first intention recognition model to obtain a second intention type of the second sentence. The satisfaction category obtaining sub-module is used for inputting the first sentence and the second sentence into a predetermined classification model to obtain a satisfaction category between the first sentence and the second sentence. The feature obtaining sub-module is used for determining the feature information of each historical dialog segment based on the first intention type, the second intention type, and the satisfaction category.
According to an embodiment of the present disclosure, the feature information extraction module 910 may further include a keyword determination module for determining a first keyword for the candidate information. The keyword determination module comprises an initial word obtaining sub-module and a keyword obtaining sub-module. Wherein the initial word obtaining sub-module is configured to obtain the initial word by at least one of: determining the subject word of the target sentence in the candidate information by adopting a subject word determining model; determining first entity words in other sentences except the target sentence in the candidate information by adopting a first entity recognition model; and determining synonyms of the first entity words in the synonym library. The keyword obtaining submodule is used for dividing each word in the initial words into words with a preset granularity to obtain first keywords.
According to an embodiment of the present disclosure, the feature information extraction module 910 may further include a weight determining module for determining a weight of the first keyword after the keyword determining module determines the first keyword for the candidate information. The weight determination module comprises a first determination sub-module, a second determination sub-module and a weight adjustment sub-module. The first determination submodule is used for determining weights for target initial words based on attribute types of the target initial words divided to obtain the first keywords. The second determining submodule is used for determining initial weights of the first keywords based on the ratio between the number of characters of the first keywords and the number of characters of the target initial words and the weights of the target initial words. The weight adjustment submodule is used for adjusting the initial weight based on the source of the target initial word to obtain the weight of the first keyword.
According to an embodiment of the present disclosure, the weight adjustment submodule is configured to normalize the initial weight when the source of the target initial word is a subject word, so that the sum of weights of all first keywords included in the subject word of the target sentence is a predetermined value.
According to an embodiment of the present disclosure, the weight adjustment submodule is configured to perform weight reduction processing on the initial weight when the source of the target initial word is a first entity word or a synonym.
According to an embodiment of the present disclosure, the first determination sub-module may include a first sub-weight determination unit, a second sub-weight determination unit, and an initial weight determination unit. The first sub-weight determination unit is used for determining a first sub-weight for the target initial word based on the attribute type of the target initial word. The second sub-weight determination unit is used for determining a second sub-weight for the target initial word based on the satisfaction category between the sentence to which the target initial word belongs and other sentences. The initial weight determination unit is used for determining an initial weight for the target initial word based on the first sub-weight and the second sub-weight.
According to an embodiment of the present disclosure, the keyword determining module may further include a third intent determining sub-module configured to determine a third intent type of the target sentence in the candidate information using a third intent recognition model, and determine a word expressing the third intent type as the first keyword.
Based on the method for determining the query result, the disclosure also provides a device for determining the query result. The apparatus for determining the result of the query will be described in detail with reference to fig. 10.
Fig. 10 is a block diagram of an apparatus for determining query results according to an embodiment of the present disclosure.
As shown in fig. 10, the apparatus 1000 for determining a query result of this embodiment may include an expression obtaining module 1010, a dialog segment obtaining module 1020, and a query result determination module 1030.
The expression obtaining module 1010 is configured to obtain a query expression for a query statement based on the query statement. In an embodiment, the expression obtaining module 1010 may be configured to perform the operation S310 described above, which is not described herein.
The dialog segment obtaining module 1020 is configured to obtain a plurality of dialog segments from the candidate information based on the query expression. Wherein the candidate information is determined using the means for determining candidate information described above. In an embodiment, the session obtaining module 1020 may be configured to perform the operation S320 described above, which is not described herein.
The query result determination module 1030 is configured to determine a target dialog segment of the plurality of dialog segments as a query result for the query statement. In an embodiment, the query result determining module 1030 may be used to perform the operation S330 described above, which is not described herein.
According to embodiments of the present disclosure, the query result determination module 1030 may be configured to determine a relevance between each of the plurality of dialog segments and the query statement, so as to determine the target dialog segment based on the relevance. The query result determination module 1030 may include a third determination sub-module, a fourth determination sub-module, and a relevance determination sub-module. The third determination sub-module is used for determining a plurality of second keywords for the query statement and a weight of each of the plurality of second keywords. The fourth determination sub-module is used for determining, for each dialog segment in the plurality of dialog segments, a third keyword of the dialog segment and a weight of the third keyword. The relevance determination sub-module is used for determining the relevance between each dialog segment and the query statement based on the second keywords and the third keywords.
According to an embodiment of the present disclosure, the correlation determination submodule may include: the device comprises a first sub-similarity determining unit, a second sub-similarity determining unit, a third sub-similarity determining unit and a correlation determining unit. The first sub-similarity determination unit is used for determining a first sub-similarity between the query sentence and each dialogue segment based on the weight of each second keyword and the weight of the third keyword in the plurality of second keywords. The second sub-similarity determining unit is used for determining semantic similarity between the query sentence and the target sentence in each dialogue segment as a second sub-similarity. The third sub-similarity determination unit is configured to determine, as a third sub-similarity, a similarity between the intent of the query sentence and the intent of the target sentence in each dialog segment. The relevance determining unit is used for determining whether each dialogue segment is relevant to the query sentence or not based on the first sub-similarity, the second sub-similarity and the third sub-similarity.
According to an embodiment of the present disclosure, the first sub-similarity determining unit includes a target determining subunit, a first value determining subunit, a second value determining subunit, and a similarity determining subunit. The target determining subunit is configured to determine the intersection between the plurality of second keywords and the third keywords to obtain target keywords. The first value determining subunit is configured to determine the sum of the weights of the target keywords among the plurality of second keywords to obtain a first value. The second value determining subunit is configured to determine the sum of the weights of the target keywords among the third keywords to obtain a second value. The similarity determining subunit is configured to determine the ratio of the first value to the second value as the first sub-similarity.
According to an embodiment of the present disclosure, the above-mentioned relevance determining unit is configured to obtain the classification result for each dialog segment by taking the first sub-similarity, the second sub-similarity, and the third sub-similarity as inputs of a predetermined logistic regression model. The classification result is either relevant or irrelevant.
According to an embodiment of the present disclosure, the apparatus 1000 for determining a query result may further include a second evaluation value determination module and a ranking module. The second evaluation value determination module is used for determining a relevance evaluation value of each target dialog segment in the plurality of target dialog segments by using a ranking model. The ranking module is used for ranking the plurality of target dialog segments based on the relevance evaluation values.
According to an embodiment of the present disclosure, the ranking module may include a weight determination sub-module, a weighted evaluation determination sub-module, and a ranking sub-module. The weight determination sub-module is used for determining the weight of the relevance evaluation value of each dialog segment based on the matching relationship between the words belonging to the target category in the query statement and the words belonging to the target category in each dialog segment. The weighted evaluation determination sub-module is used for determining a weighted evaluation value of each dialog segment based on the weight of the relevance evaluation value of each dialog segment. The ranking sub-module is used for ranking the plurality of target dialog segments based on the weighted evaluation values.
According to an embodiment of the present disclosure, the expression obtaining module 1010 is specifically configured to determine a query expression for a query sentence according to the weight of each second keyword and the plurality of second keywords.
The expression obtaining module 1010 described above may include a word determining sub-module and an expression determining sub-module according to an embodiment of the present disclosure. The word determining submodule is used for determining the necessary words and the unnecessary words in the plurality of second keywords according to the weight of each second keyword. The expression determination submodule is used for obtaining the query expression by adopting the expression template based on the necessary word and the unnecessary word.
According to an embodiment of the present disclosure, the above-described third determination submodule may include a first word obtaining unit, a second word obtaining unit, and a third word obtaining unit. The first word obtaining unit is used for carrying out word segmentation processing on the query sentence to obtain a plurality of first words and weights of the plurality of first words. The second word obtaining unit is used for obtaining the second word and the weight of the second word based on the synonyms of the first words and the weights of the first words in the preset synonym library. The third word obtaining unit is used for determining a fourth intention type of the query statement by adopting a fourth intention recognition model, obtaining a third word expressing the fourth intention type, and determining the weight of the third word as a first preset weight.
According to an embodiment of the present disclosure, the third determining submodule may further include an entity word determining unit, a target word determining unit, and a keyword determining unit. The entity word determining unit is used for determining second entity words included in the query statement by adopting a second entity recognition model. The target word determining unit is used for determining words belonging to the second entity word in the plurality of first words and the second words as target words. The keyword determining unit is used for determining that the word with the weight larger than the weight threshold value in the target words is the second keyword. Wherein the weight threshold is associated with a number of the plurality of first words.
According to an embodiment of the present disclosure, the above-mentioned third determination submodule may further include a fourth word obtaining unit configured to obtain weights of the fourth word and the fourth word based on the word associated with the third word and the weight of the third word in the intended word stock.
According to an embodiment of the present disclosure, the above-described third determination submodule may further include a fifth word determination unit and a weight determination unit. The fifth word determining unit is used for responding to selection of the displayed recommended labels and determining that the fifth word represented by the selected label is a second keyword aiming at the query statement. The weight determining unit is used for determining that the weight of the fifth word is a second preset weight.
It should be noted that, in the technical solutions of the present disclosure, the acquisition, storage, and application of the personal information of the users involved all comply with the provisions of relevant laws and regulations and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 11 illustrates a schematic block diagram of an example electronic device 1100 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 11, the device 1100 includes a computing unit 1101, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data required for the operation of the device 1100 can also be stored. The computing unit 1101, the ROM 1102, and the RAM 1103 are connected to each other via a bus 1104. An input/output (I/O) interface 1105 is also connected to the bus 1104.
Various components in device 1100 are connected to I/O interface 1105, including: an input unit 1106 such as a keyboard, a mouse, etc.; an output unit 1107 such as various types of displays, speakers, and the like; a storage unit 1108, such as a magnetic disk, optical disk, etc.; and a communication unit 1109 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 1109 allows the device 1100 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 1101 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1101 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 1101 performs the respective methods and processes described above, such as the method of determining candidate information and/or the method of determining query results. For example, in some embodiments, the method of determining candidate information and/or the method of determining query results may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1108. In some embodiments, some or all of the computer program may be loaded and/or installed onto the device 1100 via the ROM 1102 and/or the communication unit 1109. When the computer program is loaded into the RAM 1103 and executed by the computing unit 1101, one or more steps of the above-described method of determining candidate information and/or method of determining query results may be performed. Alternatively, in other embodiments, the computing unit 1101 may be configured to perform the method of determining candidate information and/or the method of determining query results by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs, where the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include a client and a server. The client and the server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in the cloud computing service system and overcomes the defects of difficult management and weak service scalability existing in traditional physical hosts and VPS services ("Virtual Private Server", or "VPS" for short). The server may also be a server of a distributed system or a server combined with a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (23)

1. A method of determining candidate information, comprising:
extracting characteristic information of each history session section in a plurality of history session sections;
based on the characteristic information, determining a quality evaluation value of each historical dialog segment by adopting a preset evaluation model;
determining a historical conversation segment, of which the quality evaluation value is larger than a preset evaluation value threshold, in the plurality of historical conversation segments, and obtaining the candidate information; and
determining a first keyword for the candidate information by:
the initial word is obtained by at least one of the following means:
determining the subject word of the target sentence in the candidate information by adopting a subject word determination model;
determining first entity words in other sentences except the target sentence in the candidate information by adopting a first entity recognition model;
determining synonyms of the first entity words in the synonym library;
dividing each word in the initial words into words with a preset granularity, and obtaining the first keywords;
the method further includes, after determining a first keyword for the candidate information, determining a weight of the first keyword by:
determining a weight for a target initial word of the first keyword based on the attribute type of the target initial word obtained by dividing;
determining initial weights of the first keywords based on the ratio between the number of characters of the first keywords and the number of characters of the target initial words and the weights of the target initial words; and
and adjusting the initial weight based on the source of the target initial word to obtain the weight of the first keyword.
2. The method of claim 1, wherein extracting the characteristic information of each historical dialog segment comprises:
taking a first sentence aiming at a first object in each historical dialog section as input of a first intention recognition model to obtain a first intention type of the first sentence;
taking a second sentence aiming at a second object in each history dialogue section as input of a first intention recognition model to obtain a second intention type of the second sentence;
inputting the first sentence and the second sentence into a preset classification model to obtain a satisfaction category between the first sentence and the second sentence; and
determining characteristic information of each historical dialog segment based on the first intent type, the second intent type, and the satisfaction category.
3. The method of claim 1, wherein, in the case where the source of the target initial word is the subject word, adjusting the initial weight comprises:
and carrying out normalization processing on the initial weights so that the sum of the weights of all the first keywords included in the subject words of the target sentences in the candidate information is a preset value.
4. The method of claim 1, wherein, in the event that the source of the target initial word is the first entity word or the synonym, adjusting the initial weight comprises:
carrying out weight reduction processing on the initial weight.
5. The method of claim 4, wherein determining weights for the target initial word if the target initial word is the first entity word comprises:
determining a first sub-weight for the target initial word based on the attribute type of the target initial word;
determining a second sub-weight for the target initial word based on the satisfaction category between the sentence to which the target initial word belongs and the other sentences; and
based on the first sub-weight and the second sub-weight, an initial weight for the target initial word is determined.
6. The method of any of claims 1-5, wherein determining a first keyword for the candidate information further comprises:
and determining a third intention type of the target sentence in the candidate information by adopting a third intention recognition model, and determining that the word expressing the third intention type is the first keyword.
7. A method of determining query results, comprising:
based on the query statement, obtaining a query expression for the query statement;
obtaining a plurality of dialogue segments from candidate information based on the query expression; and
determining a target dialog segment of the plurality of dialog segments as a query result for the query statement,
wherein the candidate information is determined by the method of any one of claims 1 to 6.
8. The method of claim 7, wherein determining a target dialog segment of the plurality of dialog segments comprises determining a relevance between each of the plurality of dialog segments and the query statement to determine the target dialog segment based on the relevance by:
determining a plurality of second keywords for the query statement and a weight of each of the plurality of second keywords;
determining a third keyword of each dialog segment and a weight of the third keyword for each dialog segment of the plurality of dialog segments; and
and determining relevance between each dialogue segment and the query statement based on the second keywords and the third keywords.
9. The method of claim 8, wherein determining the relevance between each of the dialog segments and the query statement comprises:
determining a first sub-similarity between the query sentence and each dialog segment based on the weight of each second keyword and the weight of the third keyword in the plurality of second keywords;
determining semantic similarity between the query sentence and the target sentence in each dialogue segment as a second sub-similarity;
determining the similarity between the intention of the query sentence and the intention of the target sentence in each dialogue section as a third sub-similarity; and
determining whether each dialog segment is relevant to the query statement based on the first sub-similarity, the second sub-similarity, and the third sub-similarity.
10. The method of claim 9, wherein determining a first sub-similarity between the query statement and the each dialog segment comprises:
determining intersections between the plurality of second keywords and the third keywords to obtain target keywords;
determining the sum of the weights of the target keywords in the plurality of second keywords to obtain a first value;
determining the sum of the weights of the target keywords in the third keywords to obtain a second value; and
a ratio between the first value and the second value is determined as the first sub-similarity.
11. The method of claim 9, wherein determining whether each of the dialog segments is related to the query statement comprises:
taking the first sub-similarity, the second sub-similarity and the third sub-similarity as inputs of a preset logistic regression model to obtain a classification result for each dialog segment,
wherein the classification result is either relevant or irrelevant.
12. The method of claim 7, wherein there are a plurality of target dialog segments; the method further comprises:
determining a relevance evaluation value of each target dialog segment in the plurality of target dialog segments by using a ranking model; and
ranking the plurality of target dialog segments based on the relevance evaluation values.
13. The method of claim 12, wherein ranking the plurality of target dialog segments comprises:
for each dialogue segment, determining the weight of the relevance evaluation value of each dialogue segment based on the matching relation between the words belonging to the target category in the query sentence and the words belonging to the target category in each dialogue segment;
determining a weighted evaluation value of each dialog segment based on the weight of the relevance evaluation value of each dialog segment; and
ranking the plurality of target dialog segments based on the weighted evaluation values.
14. The method of claim 8, wherein obtaining a query expression for the query statement comprises:
and determining a query expression for the query statement according to the weight of each second keyword and the plurality of second keywords.
15. The method of claim 14, wherein determining a query expression for the query statement comprises:
determining the necessary words and the unnecessary words in the plurality of second keywords according to the weight of each second keyword; and
and obtaining the query expression by adopting an expression template based on the necessary words and the unnecessary words.
16. The method of claim 8, wherein determining a plurality of second keywords for a query statement and a weight for each of the plurality of second keywords comprises:
word segmentation processing is carried out on the query sentence, and a plurality of first words and weights of the plurality of first words are obtained;
obtaining a second word and weights of the second word based on synonyms of the plurality of first words and weights of the plurality of first words in a preset synonym library; and
and determining a fourth intention type of the query statement by adopting a fourth intention recognition model, obtaining a third word expressing the fourth intention type, and determining the weight of the third word as a first preset weight.
17. The method of claim 16, wherein determining a plurality of second keywords for a query statement and a weight for each of the plurality of second keywords further comprises:
determining a second entity word included in the query statement by adopting a second entity recognition model;
determining words belonging to the second entity word in the plurality of first words and the second words as target words; and
determining the words with weights larger than a weight threshold value in the target words as the second keywords,
wherein the weight threshold is associated with a number of the plurality of first words.
18. The method of claim 16, wherein determining a plurality of second keywords for a query statement and a weight for each of the plurality of second keywords further comprises:
and obtaining a fourth word and the weight of the fourth word based on the word associated with the third word and the weight of the third word in the intention word library.
19. The method of claim 16, wherein determining a plurality of second keywords for a query statement and a weight for each of the plurality of second keywords further comprises:
responding to the selection of the displayed recommended label, and determining that the fifth word represented by the selected label is a second keyword aiming at the query statement; and
determining the weight of the fifth word as a second preset weight.
20. An apparatus for determining candidate information, comprising:
the characteristic information extraction module is used for extracting characteristic information of each history session section aiming at each history session section in the plurality of history session sections;
the first evaluation value determining module is used for determining a quality evaluation value of each historical dialog segment by adopting a predetermined evaluation model based on the characteristic information; and
a candidate information obtaining module, configured to determine a historical dialogue segment in which the quality evaluation value is greater than a predetermined evaluation value threshold value, from the plurality of historical dialogue segments, to obtain the candidate information;
wherein the candidate information obtaining module determines a first keyword for the candidate information by:
the initial word is obtained by at least one of the following means:
determining the subject word of the target sentence in the candidate information by adopting a subject word determination model;
determining first entity words in other sentences except the target sentence in the candidate information by adopting a first entity recognition model;
determining synonyms of the first entity words in the synonym library;
dividing each word in the initial words into words with a preset granularity, and obtaining the first keywords;
Further comprising, after determining the first keyword for the candidate information, determining a weight of the first keyword by:
determining a weight for a target initial word of the first keyword based on the attribute type of the target initial word obtained by dividing;
determining initial weights of the first keywords based on the ratio between the number of characters of the first keywords and the number of characters of the target initial words and the weights of the target initial words; and
and adjusting the initial weight based on the source of the target initial word to obtain the weight of the first keyword.
21. An apparatus for determining query results, comprising:
the expression obtaining module is used for obtaining a query expression aiming at the query statement based on the query statement;
a dialogue segment obtaining module, configured to obtain a plurality of dialogue segments from candidate information based on the query expression; and
a query result determination module for determining a target dialog segment of the plurality of dialog segments as a query result for the query statement,
wherein the candidate information is determined using the apparatus of claim 20.
22. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-19.
23. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-19.
CN202110722521.8A 2021-06-28 2021-06-28 Method for determining candidate information, method for determining query result, device and equipment Active CN113407813B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110722521.8A CN113407813B (en) 2021-06-28 2021-06-28 Method for determining candidate information, method for determining query result, device and equipment

Publications (2)

Publication Number Publication Date
CN113407813A CN113407813A (en) 2021-09-17
CN113407813B true CN113407813B (en) 2024-01-26

Family

ID=77679897

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110722521.8A Active CN113407813B (en) 2021-06-28 2021-06-28 Method for determining candidate information, method for determining query result, device and equipment

Country Status (1)

Country Link
CN (1) CN113407813B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006293731A (en) * 2005-04-12 2006-10-26 Fuji Xerox Co Ltd Question answering system, data retrieval method, and computer program
CN105786851A (en) * 2014-12-23 2016-07-20 北京奇虎科技有限公司 Question and answer knowledge base construction method as well as search provision method and apparatus
CN105786875A (en) * 2014-12-23 2016-07-20 北京奇虎科技有限公司 Method and device for providing question and answer pair data search results
CN106033466A (en) * 2015-03-20 2016-10-19 华为技术有限公司 Database query method and device
CN107220380A (en) * 2017-06-27 2017-09-29 北京百度网讯科技有限公司 Question and answer based on artificial intelligence recommend method, device and computer equipment
CN108090127A (en) * 2017-11-15 2018-05-29 北京百度网讯科技有限公司 Question and answer text evaluation model is established with evaluating the method, apparatus of question and answer text

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4654776B2 (en) * 2005-06-03 2011-03-23 富士ゼロックス株式会社 Question answering system, data retrieval method, and computer program
US20190260694A1 (en) * 2018-02-16 2019-08-22 Mz Ip Holdings, Llc System and method for chat community question answering

Also Published As

Publication number Publication date
CN113407813A (en) 2021-09-17

Similar Documents

Publication Publication Date Title
US11520812B2 (en) Method, apparatus, device and medium for determining text relevance
US9558264B2 (en) Identifying and displaying relationships between candidate answers
US10204121B1 (en) System and method for providing query recommendations based on search activity of a user base
CN107704512B (en) Financial product recommendation method based on social data, electronic device and medium
EP3729231A1 (en) Domain-specific natural language understanding of customer intent in self-help
US20130060769A1 (en) System and method for identifying social media interactions
TWI643076B (en) Financial analysis system and method for unstructured text data
US20140012840A1 (en) Generating search results
CN111460095B (en) Question-answering processing method and device, electronic equipment and storage medium
CN113407677B (en) Method, apparatus, device and storage medium for evaluating consultation dialogue quality
CN112100396A (en) Data processing method and device
US20220107980A1 (en) Providing an object-based response to a natural language query
CN113407813B (en) Method for determining candidate information, method for determining query result, device and equipment
US20220198358A1 (en) Method for generating user interest profile, electronic device and storage medium
WO2019192122A1 (en) Document topic parameter extraction method, product recommendation method and device, and storage medium
US20230143777A1 (en) Semantics-aware hybrid encoder for improved related conversations
CN115563242A (en) Automobile information screening method and device, electronic equipment and storage medium
CN112148988B (en) Method, apparatus, device and storage medium for generating information
CN113326438A (en) Information query method and device, electronic equipment and storage medium
CN114020867A (en) Method, device, equipment and medium for expanding search terms
WO2021051587A1 (en) Search result sorting method and apparatus based on semantic recognition, electronic device, and storage medium
CN112445959A (en) Retrieval method, retrieval device, computer-readable medium and electronic device
EA002016B1 (en) A method of searching for fragments with similar text and/or semantic contents in electronic documents stored on a data storage devices
CN116127053B (en) Entity word disambiguation, knowledge graph generation and knowledge recommendation methods and devices
CN114925185B (en) Interaction method, model training method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant