CN113407813A

CN113407813A - Method for determining candidate information, method, device and equipment for determining query result

Info

Publication number: CN113407813A
Application number: CN202110722521.8A
Authority: CN
Inventors: 刘子航; 王锴睿; 白亚楠; 李鹏飞; 欧阳宇; 王丛
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-06-28
Filing date: 2021-06-28
Publication date: 2021-09-17
Anticipated expiration: 2041-06-28
Also published as: CN113407813B

Abstract

The disclosure provides a method for determining candidate information, a method for determining query results, an apparatus, a device and a storage medium, which relate to the field of artificial intelligence, in particular to the technical fields of natural language processing and deep learning, and can be applied to smart medical scenarios and search scenarios. The method for determining candidate information is implemented as follows: for each historical dialog segment of a plurality of historical dialog segments, extracting feature information of the historical dialog segment; determining a quality evaluation value of each historical dialog segment by using a predetermined evaluation model based on the feature information; and determining, among the plurality of historical dialog segments, the historical dialog segments whose quality evaluation value is greater than a predetermined evaluation value threshold, to obtain the candidate information.

Description

Method for determining candidate information, method, device and equipment for determining query result

Technical Field

The present disclosure relates to the field of artificial intelligence, in particular to the technical fields of natural language processing and deep learning, and can be applied to smart medical scenarios and search scenarios. More particularly, it relates to a method of determining candidate information, a method of determining query results, an apparatus, a device, and a storage medium.

Background

In a search scenario, retrieved content tends to be generic. Targeted query results often cannot be provided because the query statements supplied by the user are incomplete. In scenarios where knowledge is obtained by querying dialog segments from online consultations, query results of reference value also often cannot be provided because the quality of the dialog segments is uneven.

Disclosure of Invention

Provided are a method for determining candidate information, a method for determining a query result, an apparatus, a device and a storage medium, which improve the quality of the candidate information and the accuracy of the query result.

According to an aspect of the present disclosure, there is provided a method of determining candidate information, including: for each historical dialog segment of a plurality of historical dialog segments, extracting feature information of the historical dialog segment; determining a quality evaluation value of each historical dialog segment by using a predetermined evaluation model based on the feature information; and determining, among the plurality of historical dialog segments, the historical dialog segments whose quality evaluation value is greater than a predetermined evaluation value threshold, to obtain candidate information.

According to another aspect of the present disclosure, there is provided a method of determining query results, comprising: obtaining a query expression for the query statement based on the query statement; obtaining a plurality of dialog segments from the candidate information based on the query expression; and determining a target dialog segment in the plurality of dialog segments as a query result for the query statement, wherein the candidate information is determined by adopting the method for determining the candidate information.

According to another aspect of the present disclosure, there is provided an apparatus for determining candidate information, including: a feature information extraction module for extracting, for each historical dialog segment of a plurality of historical dialog segments, feature information of the historical dialog segment; a first evaluation value determination module for determining a quality evaluation value of each historical dialog segment by using a predetermined evaluation model based on the feature information; and a candidate information obtaining module for determining, among the plurality of historical dialog segments, the historical dialog segments whose quality evaluation value is greater than a predetermined evaluation value threshold, to obtain candidate information.

According to another aspect of the present disclosure, there is provided an apparatus for determining a query result, including: an expression obtaining module for obtaining a query expression for a query statement based on the query statement; a dialog segment obtaining module for obtaining a plurality of dialog segments from candidate information based on the query expression; and a query result determining module for determining a target dialog segment among the plurality of dialog segments as a query result for the query statement, wherein the candidate information is determined by the apparatus for determining candidate information.

According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the methods of determining candidate information and/or determining query results provided by the present disclosure.

According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of determining candidate information and/or the method of determining a query result provided by the present disclosure.

According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method of determining candidate information and/or the method of determining query results provided by the present disclosure.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

FIG. 1 is a schematic diagram of an application scenario of a method for determining candidate information, a method for determining a query result, and an apparatus according to an embodiment of the present disclosure;

FIG. 2 is a schematic flow chart diagram of a method of determining candidate information according to an embodiment of the present disclosure;

FIG. 3 is a flow diagram of a method of determining query results according to an embodiment of the present disclosure;

FIG. 4 is a schematic diagram illustrating a principle of determining a first keyword for candidate information according to an embodiment of the present disclosure;

FIG. 5 is a schematic diagram illustrating a principle of determining a weight of a first keyword according to an embodiment of the present disclosure;

FIG. 6 is a schematic diagram illustrating a principle of determining a second keyword for a query statement according to an embodiment of the present disclosure;

FIG. 7 is a schematic diagram illustrating the principle of determining a target dialog segment among a plurality of dialog segments in accordance with an embodiment of the present disclosure;

FIG. 8 is a schematic diagram of ordering a plurality of target dialog segments according to an embodiment of the present disclosure;

FIG. 9 is a block diagram of an apparatus for determining candidate information according to an embodiment of the present disclosure;

FIG. 10 is a block diagram of an apparatus for determining query results according to an embodiment of the present disclosure; and

FIG. 11 is a block diagram of an electronic device for implementing a method of determining candidate information and/or a method of determining query results according to embodiments of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

The disclosure provides a method for determining candidate information, which comprises a characteristic information extraction stage, an evaluation value determination stage and a candidate information obtaining stage. In the feature information extraction stage, for each of a plurality of history dialog segments, feature information of each history dialog segment is extracted. In the evaluation value determination stage, a quality evaluation value of each history dialogue is determined using a predetermined evaluation model based on the feature information. In the candidate information obtaining stage, a history dialogue section of which the quality evaluation value is larger than a predetermined evaluation value threshold value is determined from a plurality of history dialogue sections, and candidate information is obtained.

An application scenario of the method and apparatus provided by the present disclosure will be described below with reference to fig. 1.

Fig. 1 is a schematic application scenario diagram of a method for determining candidate information, a method for determining a query result, and an apparatus according to an embodiment of the present disclosure.

As shown in fig. 1, the application scenario 100 of this embodiment may include a user 110, a terminal device 120, and a server 130. The terminal device 120 may be communicatively coupled to the server 130 via a network, which may include wired or wireless communication links.

The terminal device 120 may be various electronic devices having a display function and capable of providing a human-computer interaction interface, including but not limited to a smart phone, a tablet computer, a laptop portable computer, a desktop computer, and the like. User 110 may query for information through terminal device 120, for example, through interaction with terminal device 120. The information to be searched may be information in various fields such as information in a medical field and information in an educational field, and the information to be searched may be, for example, disease information searched for by a symptom, an attribute of an article searched for by an article name, or the like.

Illustratively, when the user 110 inputs the query statement 140 through the terminal device 120, the terminal device 120 may, for example, send the query statement 140 to the server 130. The server queries the knowledge base according to the query statement to obtain a query result 150, and feeds back the query result to the terminal device 120. Terminal device 120 may then present the query results 150 to user 110. The present disclosure is not limited thereto.

The server 130 may be, for example, a server that provides various services, such as a background management server that provides support for a website or client application that a user accesses using a terminal device. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.

In one embodiment, as shown in FIG. 1, the application scenario 100 may also include a database 160 in which a full knowledge base is maintained, which may include, for example, dialog segments from online consultations. The server 130 may access the database 160 over a network to query the database 160 for query results 150 in accordance with the query statement 140.

In an embodiment, the server 130 may also filter the dialog segments in the full knowledge base maintained in the database to obtain the dialog segments of higher quality, and store the filtered dialog segments in a storage space other than the database 160 to generate the candidate information base 170. In this manner, the candidate information base 170 may be queried according to the query statement 140 to obtain the query result 150, thereby improving the reference value of the query result 150.

The database 160 may be, for example, a database independent from the server 130, or may be a data storage module integrated in the server 130, which is not limited in this disclosure.

It should be noted that the method for determining candidate information and/or the method for determining query result provided by the present disclosure may be performed by the server 130, or may be performed by another server communicatively connected to the server 130. Accordingly, the method for determining candidate information and/or the apparatus for determining query results provided by the present disclosure may be disposed in the server 130, or may be disposed in another server communicatively connected to the server 130. The method for determining the candidate information and the method for determining the query result may be executed by the same server or different servers, which is not limited in this disclosure.

It should be understood that the number and type of terminal devices, servers, and databases in fig. 1 are merely illustrative. There may be any number and type of terminal devices, servers, and databases, as the implementation requires.

The method for determining candidate information, the method for determining the query result, and the overall principle of querying information provided by the present disclosure will be described in detail below with reference to fig. 2 to 8.

Fig. 2 is a flowchart illustrating a method of determining candidate information according to an embodiment of the disclosure.

As shown in fig. 2, the method 200 of determining candidate information of this embodiment may include operations S210 to S230.

In operation S210, for each of a plurality of history dialog segments, feature information of each history dialog segment is extracted.

According to an embodiment of the present disclosure, the history dialog may be a dialog generated by a user's online consultation. The information of online consultation can be the attribute of the article, the disease name corresponding to the symptom, the using method of the article and the like. In one embodiment, the historical dialog may be a dialog generated for an online inquiry. It is to be understood that the above history dialog is merely an example to facilitate understanding of the present disclosure, and the present disclosure is not limited thereto.

According to the embodiment of the present disclosure, the fluency of each sentence in each history dialog may be determined and taken as feature information. Alternatively, the entity words in the history dialog and the association relationships between the entity words may be identified, and the entity words and their association relationships are used as the feature information.

Illustratively, in an online consultation scenario, the history dialog includes sentences entered by two objects. When extracting the feature information, the embodiment may take a first sentence of the first object in each history dialog as an input of the first intention recognition model to obtain a first intention type of the first sentence. Likewise, a second sentence of the second object in each history dialog is taken as an input of the second intention recognition model to obtain a second intention type of the second sentence. The first and second intention recognition models can be constructed based on a multi-classification model, so that the intention type is derived from the model output. After the intention types of the first sentences and the second sentences are obtained, the intention types of the plurality of first sentences and the intention types of the plurality of second sentences can be counted, and the richness of the history dialog can be determined according to the counting result. The richness is high if the counted intention types are varied.
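
As an illustration only, the counting step described above can be sketched in Python as follows. The function name `intent_richness` and the choice to measure richness as the number of distinct intention types per speaker are assumptions made for this sketch; the disclosure does not fix a formula.

```python
from collections import Counter


def intent_richness(first_intents, second_intents):
    """Return (richness, per-speaker counts) for one history dialog.

    Richness is taken here (an assumption) as the number of distinct
    intention types of the first object plus the number of distinct
    intention types of the second object.
    """
    first_counts = Counter(first_intents)
    second_counts = Counter(second_intents)
    richness = len(first_counts) + len(second_counts)
    return richness, {"first": first_counts, "second": second_counts}


richness, counts = intent_richness(
    ["cause", "medication", "cause"],
    ["greeting", "medication advice"],
)
```

The intention labels are assumed to come from the first and second intention recognition models, which are not modeled here.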

For example, in an online medical consultation scenario, the first object may be a user and the second object may be a doctor. The first intention type may include a plurality of types, such as cause, diagnosis, medication, dietary recommendations, credits, and the like. Based on the first intention type, the user's condition and the like can be clarified. The second intention type may include a plurality of types, such as examination advice, medication advice, daily advice, diagnosis of a medical condition, collection of a medical condition, greeting, no intention, and the like. Based on the second intention type, the number of intention types, the order in which the intention types occur, and the like can be counted. From the statistics of the first intention type and the second intention type, the information amount, professionalism and credibility of the information provided by the doctor, the satisfaction degree of the doctor-patient question and answer, and the like can be determined. For example, if the first intention type includes a dietary recommendation and the second intention type includes daily advice, it may be determined that the doctor's response satisfies one requirement of the user; the satisfaction degree of the doctor-patient question and answer can then be determined as the ratio of the number of requirements satisfied by the doctor to the number of requirements raised by the user. This embodiment may use the user's condition, the information amount, the professionalism, the credibility, the satisfaction degree of the doctor-patient question and answer, and the like as the feature information.
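
The ratio described above can be sketched as follows. This is a minimal illustration: the table `SATISFIED_BY`, which maps a user intention type to the doctor intention types assumed to satisfy it, is hypothetical, and only the pairing of a dietary recommendation with daily advice is taken from the text.

```python
# Hypothetical mapping: user intention type -> doctor intention types that satisfy it.
# Only the "dietary recommendation" -> "daily advice" pairing comes from the description.
SATISFIED_BY = {
    "dietary recommendation": {"daily advice"},
    "medication": {"medication advice"},
    "diagnosis": {"diagnosis of a medical condition"},
}


def satisfaction_degree(user_intents, doctor_intents):
    """Ratio of user requirements met by the doctor to all user requirements."""
    # Only intention types listed in the mapping are counted as requirements.
    needs = {intent for intent in user_intents if intent in SATISFIED_BY}
    if not needs:
        return 0.0
    doctor = set(doctor_intents)
    met = sum(1 for need in needs if SATISFIED_BY[need] & doctor)
    return met / len(needs)


degree = satisfaction_degree(
    ["dietary recommendation", "medication"],
    ["greeting", "daily advice"],
)
```

In this example one of the two user requirements is met, so `degree` is 0.5.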

For example, a consultation flow and the intention stages that the flow should include may be predetermined. After the first intention type and the second intention type are obtained, it can be determined whether the intention types cover the predetermined intention types of all intention stages, so as to determine whether the dialog is complete and obtain the dialog integrity. It is also possible to compare the first sentences with each other and the second sentences with each other to determine whether there are repeated dialogs, and the like. The dialog integrity and/or the number of repeated dialogs may also be used as feature information.
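
A minimal sketch of these two features is given below. The stage list `REQUIRED_STAGES`, the choice to express integrity as the fraction of required stages present, and the exact-match comparison after lowercasing are all assumptions for illustration; the disclosure only states that completeness and repetition are checked.

```python
from collections import Counter

# Hypothetical intention stages that a complete consultation is assumed to contain.
REQUIRED_STAGES = (
    "collection of a medical condition",
    "diagnosis of a medical condition",
    "medication advice",
)


def dialog_integrity(intents, required=REQUIRED_STAGES):
    """Fraction of required intention stages present in the dialog (1.0 = complete)."""
    present = set(intents)
    return sum(1 for stage in required if stage in present) / len(required)


def repeated_count(sentences):
    """Number of sentences that repeat an earlier sentence of the same speaker."""
    counts = Counter(s.strip().lower() for s in sentences)
    return sum(c - 1 for c in counts.values() if c > 1)


integrity = dialog_integrity(
    ["greeting", "collection of a medical condition", "medication advice"]
)
repeats = repeated_count(["Take rest.", "take rest.", "Drink water."])
```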

According to an embodiment of the present disclosure, the first sentence and the second sentence may also be input into a predetermined classification model, via which a satisfaction category between the first sentence and the second sentence is obtained. For example, a first sentence that is a question and the second sentence that is a statement immediately following it may be spliced together as the input of the predetermined classification model. The predetermined classification model determines whether the statement can serve as a reply to the question; if so, the satisfaction category is "satisfied", otherwise it is "not satisfied". By analyzing all sentences in the dialog segment, the embodiment can determine the proportion of "satisfied" categories and thereby the satisfaction degree of the doctor-patient question and answer. Alternatively, the information amount of the doctor's answers can be determined according to the number of "satisfied" categories. The satisfaction degree of the doctor-patient question and answer and the information amount of the doctor's answers are used as feature information.
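
The pairing and counting described above can be sketched as follows. The classifier is passed in as a callable `is_reply`, which stands in for the predetermined classification model; the speaker labels, the question-mark test for detecting questions, and the toy classifier are assumptions made for this sketch.

```python
def satisfaction_ratio(turns, is_reply):
    """Return (proportion of satisfied pairs, number of satisfied pairs).

    turns: list of (speaker, sentence) tuples in dialog order.
    is_reply: callable(question, answer) -> bool, standing in for the
        predetermined classification model.
    """
    # Pair each user question with the doctor sentence that immediately follows it.
    pairs = [
        (question, answer)
        for (speaker_q, question), (speaker_a, answer) in zip(turns, turns[1:])
        if speaker_q == "user"
        and speaker_a == "doctor"
        and question.rstrip().endswith("?")
    ]
    if not pairs:
        return 0.0, 0
    satisfied = sum(1 for question, answer in pairs if is_reply(question, answer))
    return satisfied / len(pairs), satisfied


def toy_is_reply(question, answer):
    """Toy stand-in: treat answers of at least three words as replies."""
    return len(answer.split()) >= 3


ratio, satisfied = satisfaction_ratio(
    [
        ("user", "What should I eat?"),
        ("doctor", "Eat light food daily."),
        ("user", "Is it serious?"),
        ("doctor", "Hello."),
    ],
    toy_is_reply,
)
```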

For example, the first intention recognition model, the second intention recognition model, and the predetermined classification model may be constructed based on XGBoost, an optimized distributed gradient boosting library. Alternatively, at least one of the first intention recognition model, the second intention recognition model and the predetermined classification model may include, for example, a semantic understanding model and a logistic regression model, to realize the understanding of semantics and the classification of sentences. The semantic understanding model may comprise, for example, a recurrent neural network model. It is to be understood that the present disclosure does not limit the types of the first intention recognition model, the second intention recognition model, and the predetermined classification model.

In operation S220, a quality assessment value for each history dialog is determined using a predetermined assessment model based on the feature information.

In operation S230, a history session in which the quality evaluation value is greater than a predetermined evaluation value threshold among the plurality of history sessions is determined, and candidate information is obtained.

According to the embodiment of the present disclosure, the feature information of each history dialog may be input into a predetermined evaluation model, and the quality evaluation value of each history dialog may be obtained by the output of the predetermined evaluation model. The predetermined evaluation model may be a Back Propagation (BP) neural network model, or the like. The predetermined evaluation model may be constructed, for example, based on the aforementioned XGBoost.

In an embodiment, the aforementioned first intention recognition model, second intention recognition model, predetermined classification model, and predetermined evaluation model may be integrated into a single evaluation module, for example, so that a history dialog is taken as the input of the evaluation module and the quality evaluation value is output by the evaluation module.

According to the embodiment of the present disclosure, if the maximum quality evaluation value is set to 1, the predetermined evaluation value threshold may be, for example, any value not less than 0.5. Alternatively, the predetermined evaluation value threshold may be set to any value according to actual requirements, which is not limited in this disclosure.

Through the method of the embodiment, the dialog segment with higher quality evaluation can be screened from the historical dialog segments to serve as the candidate information, and the quality of the candidate information can be improved. Therefore, when the information is queried, the query result with high quality, high reference value and higher matching degree with the query statement is provided for the user conveniently.
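
Operations S210 to S230 can be sketched as a simple filtering loop. In this sketch `extract_features` and `evaluate` are placeholder callables standing in for the feature extraction and the predetermined evaluation model; the toy implementations (two precomputed features averaged into a score) are assumptions, and only the threshold comparison follows the description directly.

```python
def select_candidates(segments, extract_features, evaluate, threshold=0.5):
    """Keep the history dialog segments whose quality evaluation value exceeds the threshold."""
    candidates = []
    for segment in segments:
        features = extract_features(segment)  # operation S210
        score = evaluate(features)            # operation S220
        if score > threshold:                 # operation S230
            candidates.append(segment)
    return candidates


# Toy stand-ins for illustration only.
def toy_extract(segment):
    return (segment["satisfaction"], segment["integrity"])


def toy_evaluate(features):
    return sum(features) / len(features)


history = [
    {"id": 1, "satisfaction": 0.9, "integrity": 0.8},
    {"id": 2, "satisfaction": 0.2, "integrity": 0.5},
]
candidate_information = select_candidates(history, toy_extract, toy_evaluate)
```

With these toy inputs only the segment with `id` 1 is kept. In practice the evaluation callable could wrap a trained model such as the BP neural network or XGBoost model mentioned above.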

Based on the method for determining candidate information described in fig. 2 above, the present disclosure also provides a method for determining a query result to obtain a query result satisfying a requirement from the candidate information. The method for determining the query result will be described in detail below with reference to fig. 3.

FIG. 3 is a flowchart illustrating a method of determining query results according to an embodiment of the present disclosure.

As shown in FIG. 3, the method 300 of determining a query result of this embodiment may include operations S310 to S330.

In operation S310, based on the query statement, a query expression for the query statement is obtained.

According to the embodiment of the disclosure, a plurality of words can be obtained by performing word segmentation processing on the query statement. The query expression can be obtained by removing stop words from the plurality of words and substituting the remaining words into a query expression template. For example, the remaining words may be connected by "and" or "or" to obtain the query expression. The stop words may include, for example, prepositions, modal particles, auxiliary words, and the like. The embodiment can maintain a stop word list, and the stop words can be removed by deleting from the plurality of words those that belong to the stop word list.
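
A minimal sketch of this step follows. Whitespace splitting stands in for a real word segmenter, and the stop word list `STOP_WORDS` and the uppercase `AND`/`OR` operator syntax are assumptions chosen for illustration rather than the template used by the disclosure.

```python
# Hypothetical stop word list; a real system would maintain a much larger one.
STOP_WORDS = {"the", "a", "of", "for", "is", "what", "to"}


def build_query_expression(query, operator="AND"):
    """Segment the query, drop stop words, and join the remaining words."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    # Whitespace splitting stands in for word segmentation.
    words = [word.strip("?.,!") for word in query.lower().split()]
    kept = [word for word in words if word and word not in STOP_WORDS]
    return f" {operator} ".join(kept)


expression = build_query_expression("What to eat for a cold?")
```

For the example query the remaining words are "eat" and "cold", giving the expression `eat AND cold`.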

It is to be understood that the method in the related art can be adopted to obtain the query expression according to the query statement, and the disclosure does not limit this.

In operation S320, a plurality of dialog segments are obtained from the candidate information based on the query expression.

According to the embodiment of the disclosure, after the query expression is obtained, the dialog segment can be queried from the candidate information determined in the foregoing by using the query expression as a query condition, so as to obtain the dialog segment meeting the query condition. The method for querying information based on query expression is similar to the related art, and is not described in detail here. The operation S320 is different from the related art in that candidate information is screened from a plurality of history dialog segments by the aforementioned method of determining candidate information.

In operation S330, a target dialog segment among the plurality of dialog segments is determined as a query result for the query statement.

According to the embodiment of the disclosure, after obtaining a plurality of dialog segments from the candidate information, the plurality of dialog segments can be used as query results, and are sequentially arranged and fed back to the terminal device for the terminal device to display.

According to the embodiment of the disclosure, after the plurality of dialog segments are obtained, the relevance of each dialog segment to the query statement may also be determined, so that the dialog segments that are relevant or highly relevant to the query statement are selected from the plurality of dialog segments as target dialog segments. Whether a dialog segment is a target dialog segment may be determined based on whether the relevance between the dialog segment and the query statement is above a relevance threshold. The relevance may be determined by using cosine similarity, the BM25 algorithm, and the like, which is not limited in this disclosure.
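
As one possible illustration of the cosine similarity option, the sketch below compares bag-of-words term counts of the query statement and each dialog segment. The whitespace tokenization and the threshold value 0.3 are assumptions; a production system would more likely compare learned embeddings or use BM25.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def select_targets(query, segments, threshold=0.3):
    """Keep the dialog segments whose relevance to the query exceeds the threshold."""
    return [s for s in segments if cosine_similarity(query, s) > threshold]


targets = select_targets(
    "cold cough medicine",
    ["medicine for cold and cough", "knee pain after running"],
)
```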

Illustratively, the dialog segment and the query statement can be input into a semantic understanding model to extract semantic features, and the semantic features are then input into a logistic regression classification model, which outputs a classification result indicating whether the two are relevant. The logistic regression model may be, for example, a binary classification model, and the classification results may include "relevant" and "irrelevant".

Through the method of the embodiment, the query result can be screened from the candidate information with higher quality evaluation when the information is queried, and compared with the technical scheme of querying the information from all historical dialog segments in the related art, the method can improve the matching degree of the screened query result and the query statement, improve the reference value of the query result and the like.

Fig. 4 is a schematic diagram illustrating a principle of determining a first keyword for candidate information according to an embodiment of the present disclosure.

According to the embodiment of the present disclosure, in order to facilitate picking out from the candidate information a target dialog segment that matches the query statement, keywords may be added to the candidate information when the candidate information is determined, and whether a dialog segment matches the query statement may then be determined based on the keywords. For example, a TF-IDF model or the like may be employed to extract the keywords of each history dialog segment that serves as candidate information.

According to the embodiment of the disclosure, a topic word determination model can be adopted to determine a topic word of the candidate information, and the topic word is taken as a first keyword for the candidate information. Alternatively, the entity words in the candidate information may be determined by using the first entity recognition model and used as first keywords. Alternatively, a synonym library may be maintained in advance; after the topic word or the entity word is obtained, the embodiment may further obtain the synonyms of the topic word or the entity word from the synonym library and use the synonyms as first keywords. The topic word determination model may be, for example, a Latent Dirichlet Allocation (LDA) model, and the first entity recognition model may be a model constructed from a bidirectional long short-term memory network model and a conditional random field model, a model constructed from a dilated convolutional network (DICNN) model and a conditional random field model, or any other model. It will be appreciated that any combination of the foregoing methods may be employed to derive the first keyword.

According to the embodiment of the disclosure, after the first entity words, the topic words and/or the synonyms are obtained through the above methods, these words can be used as initial words. Each initial word is then divided into words of a predetermined granularity, and the words obtained by the division are used as first keywords. By adopting fine-grained words as first keywords, the accuracy of the matching result can be improved when determining, based on the first keywords, whether a dialog segment matches the query statement, so that the accuracy of the determined query result is further improved and the user experience is improved.

According to an embodiment of the disclosure, as shown in fig. 4, the embodiment 400 may employ the topic word determination model 420 to obtain a topic word 441 of the target sentence 411 in the candidate information 410, and employ the first entity recognition model 430 to determine the first entity words 442 in the sentences 412 of the candidate information 410 other than the target sentence 411. This is because the opening of a dialog segment typically contains the chief complaint, that is, the personal need information the user describes when consulting, such as the user's own illness or a brief description of the features of an item. The chief complaint is typically short and is therefore handled by a topic word determination model, which is suited to short text. The content other than the chief complaint generally comprises multiple rounds of conversation with longer text, from which entity words can be recognized by an entity recognition model. In this way, the accuracy of the determined keywords can be improved.

Illustratively, after obtaining the topic word 441 and the first entity word 442, synonyms may be queried based only on the first entity word 442, i.e., the synonym 443 of the first entity word 442 is looked up in the synonym library 450. The synonym 443, the topic word 441, and the first entity word 442 are used as initial words. Synonyms are not queried for the topic word because the topic word generally reflects the user's needs more directly, and those needs may be obscured if synonyms are derived from it. After the initial words are obtained, fine-grained division may be performed on each of the initial words to obtain a plurality of first keywords 460.
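
The flow of fig. 4 can be sketched as follows. This is a minimal illustration only: the function name `build_first_keywords` and the callables `topic_model`, `entity_model` and `split_fine` are hypothetical stand-ins for the topic word determination model, the first entity recognition model and the fine-grained division, none of which the disclosure specifies at this level of detail.

```python
from typing import Callable, Dict, List, Tuple

# (first keyword, source of its initial word, initial word)
Keyword = Tuple[str, str, str]


def build_first_keywords(
    target_sentence: str,
    other_sentences: List[str],
    topic_model: Callable[[str], List[str]],    # hypothetical model wrapper
    entity_model: Callable[[str], List[str]],   # hypothetical model wrapper
    synonym_library: Dict[str, List[str]],
    split_fine: Callable[[str], List[str]],     # hypothetical fine-grained splitter
) -> List[Keyword]:
    # Topic words come only from the target sentence (the chief complaint).
    initial = [(w, "topic") for w in topic_model(target_sentence)]
    # Entity words come from every other sentence of the candidate information.
    entities = [w for s in other_sentences for w in entity_model(s)]
    initial += [(w, "entity") for w in entities]
    # Synonyms are looked up for entity words only, never for topic words.
    initial += [
        (syn, "synonym") for w in entities for syn in synonym_library.get(w, [])
    ]
    # Dividing every initial word yields the first keywords.
    return [
        (piece, source, word)
        for word, source in initial
        for piece in split_fine(word)
    ]
```

Keeping the source and the initial word next to each keyword is a design choice of this sketch; it makes the later weighting and per-source de-duplication steps straightforward.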

Fig. 5 is a schematic diagram of a principle of determining a weight of a first keyword according to an embodiment of the present disclosure.

According to an embodiment of the present disclosure, after the first keyword is determined, a weight may further be assigned to the first keyword, so that the accuracy of the matching result is improved when determining, based on the first keyword, whether a dialog segment matches the query sentence. This is because keywords with different attributes influence the relevance result to different degrees.

As shown in fig. 5, when determining the weight of the first keyword, the embodiment 500 may first take the initial word 510 from which the first keyword 520 was divided as a target initial word and determine the attribute type 530 of that word. A weight for the target initial word (i.e., initial word weight 540) is then determined based on the attribute type 530. Finally, the weight of the first keyword 520 is determined according to the initial word weight 540 and the ratio 550 between the number 521 of characters of the first keyword 520 and the number 511 of characters of the target initial word.

For example, a plurality of attribute types may be preset according to actual requirements. The embodiment may obtain the attribute type of the initial word from the model output at the same time as the initial word is obtained by the foregoing method. A mapping relationship between attribute types and weights may be established in advance. After the attribute type of the initial word is obtained, the weight for the initial word can be determined according to the mapping relationship.
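
As a minimal sketch of the weighting in fig. 5, the snippet below multiplies the initial word weight by the character-count ratio. The table `ATTRIBUTE_WEIGHTS` and its values are hypothetical; the disclosure only states that such a mapping between attribute types and weights may exist.

```python
from typing import Dict

# Hypothetical mapping from attribute type to initial word weight.
ATTRIBUTE_WEIGHTS: Dict[str, float] = {
    "symptom": 1.0,
    "concurrent_symptom": 0.6,
    "background": 0.3,
}


def keyword_weight(
    keyword: str,
    initial_word: str,
    attribute_type: str,
    table: Dict[str, float] = ATTRIBUTE_WEIGHTS,
) -> float:
    """Weight of a first keyword divided out of `initial_word`."""
    initial_word_weight = table[attribute_type]
    # Ratio between the character counts of the keyword and its initial word.
    ratio = len(keyword) / len(initial_word)
    return initial_word_weight * ratio


print(keyword_weight("cramp", "calf cramp", "symptom"))
```

A keyword that covers more of its initial word therefore inherits more of that word's weight.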

For example, a topic word determination model constructed from a bidirectional recurrent neural network model and a conditional random field model may be employed. The input of the model is the target sentence, and the output is the topic words and the attribute type of each topic word. In an online inquiry scenario in the medical field, the attribute types of the topic words may include, for example: symptoms, complications, intentions, background, degree of illness, and the like. For example, the topic words may be divided into multiple classes according to attribute type, with different weights for topic words in different classes. For example, topic words whose attribute types are symptom, disease, intention, and the like may be assigned to a third class, topic words whose attribute types are concurrent disease, concurrent symptom, and the like may be assigned to a second class, and topic words whose attribute types are background and the like may be assigned to a first class, with the weights of the three classes decreasing in sequence. Taking the target sentences 'How to get back when the shank is cramped in six months of pregnancy', 'How insomnia is difficult to sleep for four consecutive days' and 'Harm of Maijin Limuguage root slices' as examples, the determined topic words and the class of each topic word are shown in the following table.

Target sentence	Third class	Second class	First class
				How to get back when the shank is cramped in six months of pregnancy	Pregnant, cramp on shank	Six months old
How insomnia is difficult to sleep for four consecutive days		Insomnia and inability to sleep	Four consecutive days
				Harm of Maijin Limuguage root slices	Maijin Limu Gua Kudzuvine root slice damaging

For example, a named entity recognition model may be employed as the first entity recognition model. The input of the model is other sentences, and the output is entity words included in the other sentences and attribute types of the entity words. In an online inquiry scenario in the medical field, the attribute types of the entity words may include, for example, disease names, symptom types, medicines, examination names, treatment names, and the like. Since the attribute type of a synonym for a first entity word is generally the same as the first entity word, the weight of the first entity word may be given to its synonym.

According to an embodiment of the present disclosure, when the target initial word is the first entity word or its synonym, the weight determined based on the attribute type may be taken as a first sub-weight. The first sub-weight is then adjusted according to the satisfaction category between the sentence to which the target initial word belongs and the other sentences. In particular, a second sub-weight for the target initial word may be determined according to the satisfaction category, and the initial word weight of the target initial word is determined according to the first sub-weight and the second sub-weight. When the satisfaction category between the sentence to which the target initial word belongs and the other sentences is "satisfied", the second sub-weight is larger; otherwise, the second sub-weight is smaller. Here, "satisfied" covers both the sentence satisfying other sentences and the sentence being satisfied by other sentences. This is because words in a sentence whose satisfaction category is "satisfied" provide higher reference value to the user; giving such words a higher weight makes the finally determined query result more accurate and more helpful to the user.

For example, when the second sub-weight is determined according to the satisfaction category, it may first be determined whether the sentence to which the initial word belongs is a declarative sentence or a question sentence. If it is a declarative sentence, the lowest second sub-weight is given. If it is a question sentence that is not satisfied, the second lowest second sub-weight is given. If it is a question sentence that is satisfied, the highest second sub-weight is given. This is because the query sentence is usually a question sentence, and assigning higher weights to words in satisfied question sentences increases the likelihood that the query result meets the user's requirement.

For example, after the first sub-weight and the second sub-weight are obtained, the product of the two sub-weights may be used as the initial word weight. Alternatively, the sum of the two sub-weights may be used as the initial word weight. The present disclosure does not limit this, as long as the initial word weight is positively correlated with both the first sub-weight and the second sub-weight.
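
The two-step weighting for entity words and synonyms might look like the sketch below. The numeric values in `SECOND_SUB_WEIGHTS` are hypothetical; only their ordering (declarative sentence lowest, satisfied question highest) follows the description.

```python
# Hypothetical second sub-weights; only the ordering follows the description.
SECOND_SUB_WEIGHTS = {
    "declarative": 0.5,
    "unsatisfied_question": 0.8,
    "satisfied_question": 1.2,
}


def second_sub_weight(is_question: bool, is_satisfied: bool) -> float:
    """Pick the second sub-weight from the sentence type and satisfaction category."""
    if not is_question:
        return SECOND_SUB_WEIGHTS["declarative"]
    if is_satisfied:
        return SECOND_SUB_WEIGHTS["satisfied_question"]
    return SECOND_SUB_WEIGHTS["unsatisfied_question"]


def combine_sub_weights(first: float, second: float, mode: str = "product") -> float:
    """Combine the sub-weights; both modes grow with either sub-weight."""
    if mode == "product":
        return first * second
    if mode == "sum":
        return first + second
    raise ValueError(f"unknown mode: {mode}")


print(combine_sub_weights(1.0, second_sub_weight(True, True)))
```

Both combination modes satisfy the stated requirement of positive correlation, provided the sub-weights are positive.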

According to an embodiment of the disclosure, after the initial word weight is obtained, the product of the initial word weight and the ratio between the numbers of characters may be used as the weight of the first keyword. Alternatively, the sum of this product and a predetermined value may be used as the weight of the first keyword. The present disclosure does not limit this, as long as the weight of the first keyword is positively correlated with the ratio between the numbers of characters.

According to an embodiment of the present disclosure, the product of the aforementioned ratio between the numbers of characters and the initial word weight may instead be used as an initial weight 560. The initial weight 560 is then adjusted based on the source 570 of the target initial word to obtain the weight of the first keyword (i.e., keyword weight 580). The source of the target initial word indicates whether the target initial word is the aforementioned topic word, first entity word or synonym.

For example, the weight of a first keyword obtained by dividing the topic word may be increased, or the weight of a first keyword obtained by dividing the first entity word or the synonym may be decreased. Because the chief complaint is more similar in structure to the query sentence, increasing the weights of keywords obtained from the topic word, or decreasing the weights of keywords obtained from the entity words in the body of the dialog and from their synonyms, lets the keywords of the chief complaint play a greater role in matching with the query sentence, which further improves the accuracy of the determined query result.

For example, the weights of the first keywords obtained by dividing the topic words may be normalized, which avoids the weights of the same keyword being incomparable across different chief complaints. Through this normalization, the sum of the weights of all first keywords derived from the topic words of a target sentence is made equal to a predetermined value, so that this sum is the same for different target sentences.
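
A minimal sketch of such a normalization is given below. The function name `normalize_weights` and the default target sum of 1.0 are assumptions; the disclosure only requires that the sum equal some predetermined value.

```python
from typing import Dict


def normalize_weights(weights: Dict[str, float], total: float = 1.0) -> Dict[str, float]:
    """Rescale topic-derived keyword weights so that they sum to `total`."""
    current = sum(weights.values())
    if current == 0:
        # Nothing to rescale; return an unchanged copy.
        return dict(weights)
    return {word: total * value / current for word, value in weights.items()}


print(normalize_weights({"calf": 0.5, "cramp": 1.5}))
```

After this step, a keyword weight of 0.75 means the same share of the chief complaint regardless of how many topic words that complaint contains.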

For example, the first keywords may be de-duplicated before the initial weight 560 is adjusted. When two identical first keywords obtained from the same candidate information have different weights, the one with the higher weight may be kept and the one with the lower weight removed. The de-duplication may be performed separately for first keywords from different sources: the first keywords obtained by dividing the topic word are de-duplicated among themselves, and the first keywords obtained by dividing the first entity word and its synonyms are de-duplicated among themselves, rather than de-duplicating after mixing all the first keywords. This allows the weights of the first keywords to be adjusted afterwards according to their sources.
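
The per-source de-duplication can be sketched as follows. The tuple layout and the group names `"topic"` and `"entity_or_synonym"` are hypothetical conventions chosen for this illustration.

```python
from typing import Dict, List, Tuple


def dedupe_by_source(
    items: List[Tuple[str, str, float]],
) -> Dict[str, Dict[str, float]]:
    """items holds (keyword, source, weight); source is 'topic', 'entity' or 'synonym'.

    Duplicates are resolved inside each source group, keeping the higher weight,
    so a topic-derived keyword never competes with an entity-derived one.
    """
    best: Dict[str, Dict[str, float]] = {}
    for keyword, source, weight in items:
        group = "topic" if source == "topic" else "entity_or_synonym"
        bucket = best.setdefault(group, {})
        if weight > bucket.get(keyword, float("-inf")):
            bucket[keyword] = weight
    return best
```

Because the groups stay separate, a later step can raise the topic group or lower the entity group without first having to recover where each keyword came from.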

According to an embodiment of the present disclosure, a word expressing a third intention type of the target sentence in the candidate information may also be taken as a first keyword. The third intention type may be determined using a third intention recognition model. Specifically, the target sentence in the candidate information may be used as the input of the third intention recognition model, which outputs the third intention type. The third intention recognition model is similar to the first intention recognition model and the second intention recognition model described above, and will not be described in detail here.

According to embodiments of the present disclosure, by determining the keywords for the candidate information and the weights of those keywords, query results relevant to the query sentence can subsequently be screened out, based on the keywords and the weights, from the dialog segments obtained from the candidate information.

FIG. 6 is a schematic diagram illustrating a principle of determining a second keyword for a query statement according to an embodiment of the present disclosure.

According to an embodiment of the present disclosure, in screening a target dialog segment from a plurality of dialog segments obtained based on a query expression, for example, a plurality of second keywords for a query sentence and a weight of each of the plurality of second keywords may be determined first. A third keyword and a weight of the third keyword for each dialog segment are then determined for each dialog segment of the plurality of dialog segments. The third keyword is the first keyword obtained by the method, and the weight of the third keyword is the weight of the first keyword determined by the method. And finally, determining the dialog segment related to the query sentence in the dialog segments as the target dialog segment based on the second keywords and the third keywords.

For example, the words among the plurality of second keywords that also belong to the third keywords may be determined first. For each such word, the product of its weight as a second keyword and its weight as a third keyword is determined as a weight product. The weight products obtained for all such words are then added to obtain the relevance between the query sentence and the dialog segment. Finally, a predetermined number of dialog segments having the highest relevance to the query sentence are selected from the plurality of dialog segments as the target dialog segments. Alternatively, the dialog segments whose relevance to the query sentence is higher than a relevance threshold are selected from the plurality of dialog segments as the target dialog segments.
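
A minimal sketch of this screening step follows. The function names, the dictionary-based representation of keywords and weights, and the parameters `top_k` and `threshold` are assumptions made for illustration.

```python
from typing import Dict, List, Optional


def relevance(query_kw: Dict[str, float], segment_kw: Dict[str, float]) -> float:
    """Sum of weight products over the words shared by both keyword sets."""
    return sum(w * segment_kw[word] for word, w in query_kw.items() if word in segment_kw)


def select_targets(
    query_kw: Dict[str, float],
    segments: Dict[str, Dict[str, float]],
    top_k: Optional[int] = None,
    threshold: Optional[float] = None,
) -> List[str]:
    """Return segment ids, most relevant first, filtered by threshold and/or top_k."""
    scored = sorted(
        ((relevance(query_kw, kw), seg_id) for seg_id, kw in segments.items()),
        key=lambda pair: -pair[0],
    )
    if threshold is not None:
        scored = [(score, seg_id) for score, seg_id in scored if score > threshold]
    if top_k is not None:
        scored = scored[:top_k]
    return [seg_id for _, seg_id in scored]
```

For instance, with query keywords `{"headache": 0.6, "cause": 0.4}`, a segment with keywords `{"headache": 0.5, "cause": 0.5}` scores 0.6 × 0.5 + 0.4 × 0.5 = 0.5, while a segment sharing no keyword scores 0.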

As shown in fig. 6, in the embodiment 600, when determining the second keywords and their weights, word segmentation may be performed on the query sentence 610 to obtain a plurality of first words 611 and the respective weights of the plurality of first words 611. For example, a dictionary-based and statistics-based method may be used to segment the query sentence and determine the weight of each word obtained by the segmentation. Such a method may include a tree-based word segmentation method, or a method that segments the sentence based on a predetermined dictionary and then uses the TF-IDF algorithm to determine the weight of each word. It is understood that the present disclosure does not limit the word segmentation method, and any word segmentation method may be adopted according to actual requirements. The first words obtained by the word segmentation are taken as second keywords.

According to an embodiment of the present disclosure, as shown in fig. 6, after the first word 611 is obtained, a synonym of the first word 611 may be queried from the synonym library 620 and used as a second word 621, and the first word 611 and the second word 621 are both used as second keywords. The weight of the second word 621 may be determined according to the weight of the first word 611; for example, the weight of the first word 611 may be taken as the weight of its synonym. In this way, the query sentence is expanded: for example, when the first words include "headache", a synonym of "headache" found in the synonym library is added to the query. This expansion can improve the accuracy and the number of the finally determined query results.
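
The synonym expansion might be sketched as follows. The function name and the example synonym are hypothetical, and keeping the larger weight when a synonym is already present is an assumption of this sketch rather than something the disclosure states.

```python
from typing import Dict, List


def expand_with_synonyms(
    first_words: Dict[str, float],
    synonym_library: Dict[str, List[str]],
) -> Dict[str, float]:
    """Add synonyms of the first words; each synonym inherits its word's weight."""
    expanded = dict(first_words)
    for word, weight in first_words.items():
        for synonym in synonym_library.get(word, []):
            # Assumption: if the synonym already has a weight, keep the larger one.
            expanded[synonym] = max(weight, expanded.get(synonym, 0.0))
    return expanded


print(expand_with_synonyms({"headache": 0.7}, {"headache": ["cephalalgia"]}))
```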

According to an embodiment of the present disclosure, in determining the second keyword, as shown in fig. 6, a fourth intention type of the query sentence 610 may also be determined using the fourth intention recognition model 630, a third word 640 expressing the fourth intention type is taken as a part of the second keyword 690, and the third word is given a first predetermined weight. The first predetermined weight may be a relatively high value, for example, greater than the weights of the first words, because the intention reflects the user's needs more accurately. When determining the relevance between a dialog segment and the query sentence, the third word may then also be compared with the first keyword that represents the intention type of the chief complaint in the dialog segment, and the relevance is determined according to the comparison result. The fourth intention recognition model 630 is similar to the first intention recognition model and the second intention recognition model described above, and is not repeated here.

According to embodiments of the present disclosure, an intention word library 650 may also be maintained. The intention word library 650 may, for example, take the form of a knowledge graph, or may store associations between intention words. After obtaining the third word 640, the embodiment may query the intention word library 650 based on the third word 640 to find a fourth word 660 associated with the third word, and obtain the weight of the fourth word 660 based on the weight of the third word 640. For example, the weight of the third word 640 may be assigned to the fourth word 660 associated with it. The fourth word 660 is included as part of the second keyword 690. This further ensures that the second keyword sufficiently expresses the intention of the user.

According to an embodiment of the present disclosure, when determining the second keyword, the second keyword may be screened out from the entity words included in the query sentence, since the entity words can generally more accurately represent the query sentence. It may therefore be convenient to improve the efficiency and accuracy of the determined correlations.

For example, as shown in FIG. 6, a second entity recognition model 670 can be employed to determine a second entity word 680 included in the query statement 610. A word of the plurality of first words 611 and second words 621 belonging to second entity word 680 is then determined as part of second keyword 690. Specifically, the intersection of the second entity word and the first word and the intersection of the second entity word and the second word are used as a part of the second keyword.

According to the embodiment of the present disclosure, a word belonging to the second entity word 680 in the first word and the second word may be further used as a target word, the target word is screened according to the weight, and a target word with a higher weight is screened as a part of the second keyword 690. Therefore, the screened words can represent the query statement better, and the efficiency and the accuracy of determining the relevance are further improved. Specifically, the words with weights greater than the weight threshold in the target words may be determined as the second keywords. The weight threshold may be set according to actual requirements, which is not limited by this disclosure.

Illustratively, the weight threshold may be dynamically adjusted according to the query statement, for example, to filter out a more appropriate number of second keywords for query statements of different lengths. For example, the weight threshold may be associated with a number of the plurality of first words 611 resulting from the query statement participle.

Illustratively, the weight threshold may be expressed as: weight threshold = constant N / number of first words × 0.1. The value of the constant N may be set according to actual requirements, for example, may be 3, which is not limited in this disclosure.
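
The screening of second keywords with a dynamic threshold can be sketched as below. The sketch reads the threshold expression as the constant N divided by the number of first words and then multiplied by 0.1; that reading of the operators is an assumption, as are the function names.

```python
from typing import Dict, Set


def weight_threshold(num_first_words: int, n: float = 3.0, scale: float = 0.1) -> float:
    """Assumed reading of the expression: N / (number of first words) * 0.1."""
    if num_first_words <= 0:
        raise ValueError("the query must yield at least one first word")
    return n / num_first_words * scale


def filter_second_keywords(
    candidates: Dict[str, float],   # first words and second words with weights
    entity_words: Set[str],         # second entity words found in the query
    num_first_words: int,
    n: float = 3.0,
) -> Dict[str, float]:
    """Keep candidates that are entity words and exceed the dynamic threshold."""
    threshold = weight_threshold(num_first_words, n)
    return {
        word: weight
        for word, weight in candidates.items()
        if word in entity_words and weight > threshold
    }
```

Under this reading, a query segmented into 6 first words with N = 3 gives a threshold of 0.05, so longer queries get a lower threshold and keep more of their low-weight words.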

According to an embodiment of the disclosure, when the user inputs the query sentence, a plurality of recommendation labels may be displayed to the user, which helps the user enter a query sentence that is more accurate and better conforms to the user's needs. The recommendation labels may be determined in real time based on the characters that the user has entered. For example, if the characters entered by the user include "headache," the labels recommended to the user may include "cause," "make sleepy," "treat," and the like. When the user selects a recommendation label, the fifth word indicated by the recommendation label may be used as a second keyword and given a second predetermined weight, that is, the weight of the fifth word is determined to be the second predetermined weight. The second predetermined weight may take a relatively large value and may be equal to or different from the aforementioned first predetermined weight; the disclosure does not limit the second predetermined weight or the content of the recommendation labels. Using the word represented by the recommendation label selected by the user as a keyword improves the ability of the second keywords to express the user's requirement, and thus improves the accuracy of the determined relevance.

According to the embodiment of the present disclosure, after the plurality of second keywords included in the query statement are determined, for example, the query expression may also be determined according to the plurality of second keywords and respective weights of the plurality of second keywords.

Illustratively, several words with higher weights can be selected from the second keywords as query keywords, and the query keywords are spliced in the form of "and" to obtain a query expression.

For example, required words and optional words among the plurality of second keywords may be determined according to the weights of the plurality of second keywords. For example, a predetermined number of words with the highest weights may be selected as the required words, with the remaining words being optional words. Then, based on the required words and the optional words, an expression template is used to obtain the query expression. For example, the required words may be spliced in the form of "and", and the optional words may be spliced in the form of "or", so as to obtain the query expression. In this way, the completeness and accuracy of the query expression can be improved, and the accuracy and diversity of the multiple dialog segments obtained by the query can be improved.
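
One possible way to assemble such an expression is sketched below. The disclosure does not give the expression template, so the template used here (the required group joined to the optional group with AND) and the parameter `num_required` are purely hypothetical.

```python
from typing import Dict


def build_query_expression(weights: Dict[str, float], num_required: int = 2) -> str:
    """Split keywords into required and optional words and splice them."""
    # Highest weight first; ties are broken alphabetically for determinism.
    ranked = sorted(weights, key=lambda word: (-weights[word], word))
    required = ranked[:num_required]
    optional = ranked[num_required:]
    parts = []
    if required:
        parts.append("(" + " AND ".join(required) + ")")
    if optional:
        parts.append("(" + " OR ".join(optional) + ")")
    # Hypothetical template: a real search backend may instead treat the
    # optional group as score-boosting rather than as a hard condition.
    return " AND ".join(parts)


print(build_query_expression(
    {"headache": 0.9, "cause": 0.8, "night": 0.3, "adult": 0.2}
))
```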

Fig. 7 is a schematic diagram of a principle of determining a target dialog segment among a plurality of dialog segments according to an embodiment of the present disclosure.

According to the embodiment of the disclosure, after the third keyword and the weight thereof of each dialog segment and the second keyword and the weight thereof of the query sentence are obtained, the correlation between each dialog segment and the query sentence can be determined based on the information.

For example, the intersection of the second keywords and the third keywords may be determined first, and the ratio of the number of words in the intersection to the number of words in the union of the second keywords and the third keywords is taken as the relevance. Alternatively, for each word in the intersection, the product of its weight as a third keyword and its weight as a second keyword may be computed, and the sum of these products may be used as the value characterizing the relevance. Alternatively, the words in the intersection may be used as target keywords, the sum of the weights of the target keywords as second keywords may be used as a first value, the sum of the weights of the target keywords as third keywords may be used as a second value, and the ratio between the first value and the second value may be used as the relevance between each dialog segment and the query sentence.
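
The count-based and ratio-based variants can be sketched as follows, with keywords and weights held in dictionaries. The function names are hypothetical, and returning 0.0 when a denominator would be zero is an assumption of this sketch.

```python
from typing import Dict


def overlap_relevance(query_kw: Dict[str, float], segment_kw: Dict[str, float]) -> float:
    """Number of shared words divided by the number of words in the union."""
    query_words, segment_words = set(query_kw), set(segment_kw)
    union = query_words | segment_words
    return len(query_words & segment_words) / len(union) if union else 0.0


def weight_ratio_relevance(query_kw: Dict[str, float], segment_kw: Dict[str, float]) -> float:
    """Ratio between the query-side and segment-side weight sums of shared words."""
    shared = set(query_kw) & set(segment_kw)
    first_value = sum(query_kw[word] for word in shared)
    second_value = sum(segment_kw[word] for word in shared)
    return first_value / second_value if second_value else 0.0
```

The count-based variant ignores weights entirely, so it suits cases where weights are unreliable, while the ratio-based variant compares how strongly each side emphasizes the shared words.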

According to the embodiment of the disclosure, when determining the relevance, for example, semantic similarity between the dialog segment and the query statement can be further considered, so that the accuracy of the determined relevance is improved. In this embodiment, the aforementioned value of the correlation determined according to the third keyword and the weight thereof and the second keyword and the weight thereof may be used as the first sub-similarity. And taking the semantic similarity between the dialog segment and the query statement as a second sub-similarity. Finally, a relevance is determined based on the first sub-similarity and the second sub-similarity.

Illustratively, only the semantic similarity between the target sentence in the dialog segment and the query sentence may be considered. This is because the target sentence in the dialog segment is more similar in structure to the query sentence, so the determined semantic similarity is more accurate. The target sentence may be, for example, the chief complaint. The embodiment may employ a semantic similarity algorithm to determine the semantic similarity. The semantic similarity algorithm may include, for example, a Deep Structured Semantic Model (DSSM), a CNN-DSSM model, or an LSTM-DSSM model, and the like, which is not limited by this disclosure.

According to the embodiment of the present disclosure, when determining the relevance, in addition to the foregoing determination of the relevance based on the keyword and the weight, the similarity between the intention of the query sentence and the intention of the target sentence may be further fused. Therefore, the importance of intention matching is highlighted, and the possibility that the screened query result can meet the user requirement is improved. In this embodiment, the aforementioned value of the correlation determined according to the third keyword and the weight thereof and the second keyword and the weight thereof may be used as the first sub-similarity. And taking the similarity between the intention of the query statement and the intention of the target statement in each dialog segment as a third sub-similarity. Finally, a correlation is determined based on the first and third sub-similarities.

For example, a word representing the intention type may be picked out from the second keywords, and a word representing the intention type may be picked out from the third keywords. The edit distance, the cosine similarity, or the like between the two selected words may then be taken as the similarity between the intention of the query sentence and the intention of the target sentence in each dialog segment.
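
As one hedged illustration, an edit distance can be turned into a similarity in the range 0 to 1 as shown below. The normalization by the longer word's length is an assumption; the disclosure does not say how the distance is mapped to a similarity.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def intent_similarity(first_intent_word: str, second_intent_word: str) -> float:
    """Assumed normalization: 1 minus distance over the longer word's length."""
    longest = max(len(first_intent_word), len(second_intent_word))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(first_intent_word, second_intent_word) / longest
```

Identical intention words then yield a similarity of 1.0, and words with no characters in common yield 0.0.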

According to an embodiment of the present disclosure, in determining the relevance, in addition to the foregoing determination based on the keywords and the weights, both the semantic similarity between the dialog segment and the query sentence and the similarity between the intention of the query sentence and the intention of the target sentence may be taken into account.

As shown in fig. 7, after determining the second keyword 711 and its weight for the query sentence 710 and the third keyword 721 and its weight for each dialog segment 720, the embodiment 700 may determine the first sub-similarity 730 based on the second keyword 711 and its weight and the third keyword 721 and its weight. Meanwhile, a semantic similarity algorithm 740 may be employed to determine the semantic similarity between the query sentence 710 and the target sentence in the dialog segment 720 as a second sub-similarity 750. A word representing the intention type among the second keywords of the query sentence 710 is determined as a first intention word 712, and a word representing the intention type among the third keywords of each dialog segment 720 is determined as a second intention word 722. The similarity between the first intention word 712 and the second intention word 722 is then determined as a third sub-similarity 760. Finally, whether each dialog segment is relevant to the query sentence is determined based on the first sub-similarity 730, the second sub-similarity 750, and the third sub-similarity 760.

Illustratively, the sum of the three sub-similarities may be taken as the relevance between the query sentence and each dialog segment. If the relevance of a dialog segment 720 is above the relevance threshold, the dialog segment 720 may be determined to be relevant to the query sentence 710 and taken as a target dialog segment. It is understood that, for example, the average of the three sub-similarities, or an arithmetic square root of the three sub-similarities, may instead be used as the relevance, and the disclosure is not limited thereto.

Illustratively, as shown in fig. 7, the first sub-similarity 730, the second sub-similarity 750 and the third sub-similarity 760 may be used as inputs of a predetermined logistic regression model 770, and a classification result 780 is obtained after being processed by the predetermined logistic regression model 770 as a classification result for each dialog segment 720. The classification result 780 is, for example, a binary classification result, either related or unrelated. In this way, the dialog segment whose classification result is relevant can be taken as the target dialog segment.
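
The classification step can be sketched with a hand-written logistic function. The coefficients, bias and cutoff below are hypothetical placeholders for the parameters of the predetermined logistic regression model 770, which would normally be learned from labeled data.

```python
import math
from typing import Sequence, Tuple


def classify_relevance(
    sub_similarities: Sequence[float],
    coefficients: Sequence[float] = (2.0, 1.5, 1.0),  # hypothetical values
    bias: float = -2.0,                                # hypothetical value
    cutoff: float = 0.5,
) -> Tuple[float, bool]:
    """Return (probability of 'related', binary label) for one dialog segment."""
    if len(sub_similarities) != len(coefficients):
        raise ValueError("one coefficient is needed per sub-similarity")
    z = bias + sum(c * s for c, s in zip(coefficients, sub_similarities))
    probability = 1.0 / (1.0 + math.exp(-z))
    return probability, probability >= cutoff


probability, related = classify_relevance((0.8, 0.7, 1.0))
print(round(probability, 3), related)
```

Compared with a plain sum or average, a learned model lets the three sub-similarities contribute unequally to the decision.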

FIG. 8 is a schematic diagram of ordering a plurality of target dialog segments according to an embodiment of the present disclosure.

According to the embodiment of the disclosure, under the condition that a plurality of target dialog segments are obtained, for example, the target dialog segments can be sequenced, so that the efficiency of finding the dialog segment meeting the requirement by the user is improved, and the user experience is improved.

Illustratively, the plurality of target dialog segments may be ordered according to the previously determined relevance from high to low.

For example, if the relevance is a binary classification result, the embodiment may first determine a relevance evaluation value for each of the target dialog segments by using a ranking model. The plurality of target dialog segments are then ranked based on the relevance evaluation values.

The ranking model may be, for example, a logistic regression model whose input is a dialog segment and the query sentence and whose output is a relevance evaluation value.

As shown in fig. 8, the ranking model in this embodiment 800 may also be, for example, a model that considers the partial order relationship between two samples. Suppose there are n target dialog segments, where n is an integer greater than or equal to 2. In this embodiment, the first dialog segment 801 to the nth dialog segment 803 may be combined two by two to obtain a plurality of dialog segment pairs. For example, the first dialog segment 801 and the second dialog segment 802 may be combined to yield a dialog segment pair 811, the second dialog segment 802 and the nth dialog segment 803 may be combined to yield a dialog segment pair 812, and the first dialog segment 801 and the nth dialog segment 803 may be combined to yield a dialog segment pair 813. Based on each dialog segment pair, this embodiment may use the ranking model 820 to derive a partial order relationship between the two dialog segments in the pair, e.g., a relevance evaluation value of the two dialog segments relative to each other. Finally, the n dialog segments are sorted according to the partial order relationships between every two of the n dialog segments, and a sorting result 830 is obtained. The ranking model 820 may be, for example, RankSVM, GBRank, etc., which is not limited by this disclosure.
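
One simple way to turn pairwise preferences into a full ordering is to count how many pairs each dialog segment wins, as sketched below. The callable `prefer` is a hypothetical stand-in for a trained pairwise ranking model, and win counting is an assumption of this sketch, since the disclosure does not say how the pairwise results are aggregated.

```python
from itertools import combinations
from typing import Callable, Dict, List


def rank_by_pairwise(
    segment_ids: List[str],
    prefer: Callable[[str, str], bool],  # True if the first segment ranks higher
) -> List[str]:
    """Sort unique segment ids by the number of pairwise comparisons they win."""
    wins: Dict[str, int] = {seg_id: 0 for seg_id in segment_ids}
    for first, second in combinations(segment_ids, 2):
        winner = first if prefer(first, second) else second
        wins[winner] += 1
    # sorted() is stable, so ties keep their original relative order.
    return sorted(segment_ids, key=lambda seg_id: -wins[seg_id])


# Toy stand-in for the ranking model: compare fixed, made-up scores.
scores = {"seg1": 0.2, "seg2": 0.9, "seg3": 0.5}
print(rank_by_pairwise(list(scores), lambda a, b: scores[a] >= scores[b]))
```

With n segments this makes n(n-1)/2 model calls, which is acceptable for the small candidate sets that survive the earlier screening.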

For example, after the relevance evaluation values are determined, a weight for the relevance evaluation value of each dialog segment may be determined based on the matching relationship between the words belonging to a target category in the query sentence and the words belonging to the target category in that dialog segment. A weighted evaluation value for each dialog segment is then determined based on the weight and the relevance evaluation value of that dialog segment. Finally, the plurality of target dialog segments are ranked based on the weighted evaluation values. The target category may include, for example, words describing user attribute information, disease names, symptom names, and the like. If a word of the target category matches, the weight of the relevance evaluation value may be determined to be a third predetermined weight. Alternatively, the more words of the target category that match, the higher the weight of the relevance evaluation value. The number of matched words and the weight of the relevance evaluation value are positively correlated; for example, they may be in an exponential relationship or a proportional relationship, which is not limited in this disclosure.
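
A minimal sketch of this re-weighting follows, assuming a linear relationship in which each matched target-category word adds a fixed `step` to a base weight of 1.0. Both the base and the step are hypothetical values.

```python
from typing import List, Set, Tuple


def weighted_ranking(
    query_target_words: Set[str],
    segments: List[Tuple[str, float, Set[str]]],
    step: float = 0.5,  # hypothetical increment per matched word
) -> List[str]:
    """segments holds (segment id, relevance evaluation value, target-category words)."""
    scored = []
    for seg_id, value, words in segments:
        matches = len(query_target_words & words)
        weight = 1.0 + step * matches  # grows with the number of matched words
        scored.append((weight * value, seg_id))
    scored.sort(key=lambda pair: -pair[0])
    return [seg_id for _, seg_id in scored]


print(weighted_ranking(
    {"pregnancy", "cramp"},
    [("a", 0.8, {"fever"}), ("b", 0.6, {"pregnancy", "cramp"}), ("c", 0.7, {"cramp"})],
))
```

In this toy input, segment "b" has the lowest raw evaluation value but matches both target-category words, so its weighted value of 1.2 moves it to the front.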

Based on the method for determining candidate information described above, the disclosure also provides an apparatus for determining candidate information. The apparatus for determining candidate information will be described in detail below with reference to fig. 9.

Fig. 9 is a block diagram of a structure of an apparatus for determining candidate information according to an embodiment of the present disclosure.

As shown in fig. 9, the apparatus 900 for determining candidate information of this embodiment may include a feature information extraction module 910, a first evaluation value determination module 920, and a candidate information obtaining module 930.

The feature information extraction module 910 is configured to extract, for each historical dialog segment of the plurality of historical dialog segments, feature information of each historical dialog segment. In an embodiment, the feature information extracting module 910 may be configured to perform the operation S210 described above, which is not described herein again.

The first evaluation value determining module 920 is configured to determine a quality evaluation value of each historical dialog segment using a predetermined evaluation model based on the feature information. In an embodiment, the first evaluation value determining module 920 may be configured to perform the operation S220 described above, which is not described herein again.

The candidate information obtaining module 930 is configured to determine, from among the plurality of historical dialog segments, a historical dialog segment whose quality evaluation value is greater than a predetermined evaluation value threshold, to obtain the candidate information. In an embodiment, the candidate information obtaining module 930 may be configured to perform the operation S240 described above, which is not described herein again.
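For illustration, the cooperation of the three modules may be sketched as a simple filter. The function determine_candidate_information and its callable parameters extract_features and evaluate_quality are hypothetical stand-ins for modules 910 and 920; the actual feature set and evaluation model are not reproduced here.

```python
from typing import Callable, Dict, List, Sequence


def determine_candidate_information(
    segments: Sequence[str],
    extract_features: Callable[[str], Dict[str, float]],
    evaluate_quality: Callable[[Dict[str, float]], float],
    threshold: float,
) -> List[str]:
    """Keep the historical dialog segments whose quality exceeds threshold."""
    candidates = []
    for segment in segments:
        features = extract_features(segment)   # role of module 910
        quality = evaluate_quality(features)   # role of module 920
        if quality > threshold:                # role of module 930
            candidates.append(segment)
    return candidates
```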

According to an embodiment of the present disclosure, the above-mentioned feature information extraction module 910 may include a first intention determination sub-module, a second intention determination sub-module, a satisfaction category obtaining sub-module, and a feature obtaining sub-module. The first intention determining submodule is used for taking a first statement aiming at the first object in each historical dialog segment as an input of a first intention recognition model and obtaining a first intention type of the first statement. The second intention determining submodule is used for taking a second statement aiming at a second object in each historical dialog segment as an input of the first intention recognition model and obtaining a second intention type of the second statement. The satisfaction category obtaining submodule is used for inputting the first statement and the second statement into a preset classification model and obtaining a satisfaction category between the first statement and the second statement. The feature obtaining submodule is used for determining feature information of each historical dialog segment based on the first intention type, the second intention type and the satisfaction category.

According to an embodiment of the present disclosure, the feature information extraction module 910 may further include a keyword determination module, configured to determine a first keyword for the candidate information. The keyword determination module includes an initial word obtaining sub-module and a keyword obtaining sub-module. The initial word obtaining sub-module is configured to obtain initial words in at least one of the following ways: determining a subject word of a target sentence in the candidate information using a subject word determination model; determining first entity words in sentences other than the target sentence in the candidate information using a first entity recognition model; and determining synonyms of the first entity words in a synonym library. The keyword obtaining sub-module is configured to divide each of the initial words into words of a predetermined granularity to obtain the first keyword.

According to an embodiment of the present disclosure, the feature information extraction module 910 may further include a weight determination module, configured to determine a weight of the first keyword after the keyword determination module determines the first keyword for the candidate information. The weight determination module includes a first determination submodule, a second determination submodule, and a weight adjustment submodule. The first determination submodule is configured to determine a weight for a target initial word, i.e., the initial word from which the first keyword is obtained by division, based on the attribute type of the target initial word. The second determination submodule is configured to determine an initial weight of the first keyword based on the weight of the target initial word and the ratio of the number of characters of the first keyword to the number of characters of the target initial word. The weight adjustment submodule is configured to adjust the initial weight based on the source of the target initial word to obtain the weight of the first keyword.

According to an embodiment of the present disclosure, the weight adjustment sub-module is configured to, when a source of the target initial word is a subject word, perform normalization processing on the initial weight, so that a sum of weights of all first keywords included in the subject word of the target sentence is a predetermined value.

According to an embodiment of the present disclosure, the weight adjustment submodule is configured to perform a weight reduction process on the initial weight if the source of the target initial word is the first entity word or the synonym.
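The weight computation performed by the weight determination module may be sketched as follows, under stated assumptions. Multiplying the character ratio by the weight of the target initial word, and reducing a weight by a constant factor, are assumptions of this sketch; the disclosure only states which quantities the weights are based on. All function names and the default values are hypothetical.

```python
from typing import Dict


def initial_weight(keyword: str, initial_word: str, initial_word_weight: float) -> float:
    """Assumption: initial weight = character ratio * weight of the initial word."""
    return len(keyword) / len(initial_word) * initial_word_weight


def normalize_subject_weights(weights: Dict[str, float], total: float = 1.0) -> Dict[str, float]:
    """Rescale keyword weights from a subject word so that they sum to total."""
    current = sum(weights.values())
    if current == 0:
        return dict(weights)  # nothing to rescale
    return {word: total * value / current for word, value in weights.items()}


def reduce_weight(weight: float, factor: float = 0.5) -> float:
    """Assumption: weight reduction multiplies by a constant factor below 1."""
    return weight * factor
```

For example, a keyword of two characters divided from a four-character initial word of weight 0.8 would receive an initial weight of about 0.4 under these assumptions.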

According to an embodiment of the present disclosure, the above-described first determination submodule may include a first sub-weight determination unit, a second sub-weight determination unit, and an initial weight determination unit. The first sub-weight determination unit is configured to determine a first sub-weight for the target initial word based on the attribute type of the target initial word. The second sub-weight determination unit is configured to determine a second sub-weight for the target initial word based on the satisfaction category between the sentence to which the target initial word belongs and other sentences. The initial weight determination unit is configured to determine an initial weight of the target initial word based on the first sub-weight and the second sub-weight.

According to an embodiment of the present disclosure, the keyword determination module may further include a third intention determination submodule, configured to determine a third intention type of the target sentence in the candidate information by using a third intention recognition model, and determine a word expressing the third intention type as the first keyword.

Based on the method for determining a query result, the present disclosure also provides an apparatus for determining a query result. The apparatus will be described in detail below with reference to fig. 10.

Fig. 10 is a block diagram of an apparatus for determining a query result according to an embodiment of the present disclosure.

As shown in fig. 10, the apparatus 1000 for determining a query result of this embodiment may include an expression obtaining module 1010, a dialog obtaining module 1020, and a query result determining module 1030.

The expression obtaining module 1010 is configured to obtain a query expression for a query statement based on the query statement. In an embodiment, the expression obtaining module 1010 may be configured to perform the operation S310 described above, which is not described herein again.

The dialog obtaining module 1020 is configured to obtain a plurality of dialog segments from the candidate information based on the query expression. The candidate information is determined using the apparatus for determining candidate information described above. In an embodiment, the dialog obtaining module 1020 may be configured to perform the operation S320 described above, which is not described herein again.

The query result determination module 1030 is configured to determine a target dialog segment of the plurality of dialog segments as a query result for the query statement. In an embodiment, the query result determining module 1030 may be configured to perform the operation S330 described above, which is not described herein again.

According to an embodiment of the present disclosure, the query result determining module 1030 may be configured to determine a relevance between each of the plurality of dialog segments and the query statement, so as to determine the target dialog segment based on the relevance. The query result determining module 1030 may include a third determination submodule, a fourth determination submodule, and a relevance determination submodule. The third determination submodule is configured to determine a plurality of second keywords for the query statement and a weight of each of the plurality of second keywords. The fourth determination submodule is configured to determine, for each dialog segment of the plurality of dialog segments, a third keyword of the dialog segment and a weight of the third keyword. The relevance determination submodule is configured to determine the relevance between each dialog segment and the query statement based on the plurality of second keywords and the third keyword.

According to an embodiment of the present disclosure, the relevance determination submodule may include a first sub-similarity determining unit, a second sub-similarity determining unit, a third sub-similarity determining unit, and a relevance determination unit. The first sub-similarity determining unit is configured to determine a first sub-similarity between the query statement and each dialog segment based on the weight of each of the plurality of second keywords and the weight of the third keyword. The second sub-similarity determining unit is configured to determine the semantic similarity between the query statement and the target statement in each dialog segment as a second sub-similarity. The third sub-similarity determining unit is configured to determine the similarity between the intention of the query statement and the intention of the target statement in each dialog segment as a third sub-similarity. The relevance determination unit is configured to determine whether each dialog segment is relevant to the query statement based on the first sub-similarity, the second sub-similarity, and the third sub-similarity.

According to an embodiment of the present disclosure, the above-described first sub-similarity determining unit includes a target determining subunit, a first value determining subunit, a second value determining subunit, and a similarity determining subunit. The target determining subunit is configured to determine the intersection between the plurality of second keywords and the third keyword to obtain target keywords. The first value determining subunit is configured to determine the sum of the weights of the target keywords in the plurality of second keywords to obtain a first value. The second value determining subunit is configured to determine the sum of the weights of the target keywords in the third keyword to obtain a second value. The similarity determining subunit is configured to determine the ratio of the first value to the second value as the first sub-similarity.
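The computation of the first sub-similarity may be sketched as below. The keywords and their weights are represented as dictionaries mapping each keyword to its weight; this representation, the function name first_sub_similarity, and the choice to return 0.0 when the second value is zero are assumptions of this sketch.

```python
from typing import Dict


def first_sub_similarity(
    second_keywords: Dict[str, float],
    third_keywords: Dict[str, float],
) -> float:
    """Ratio of matched query-side weight to matched segment-side weight."""
    # Target keywords: intersection of the query keywords and segment keywords.
    target = second_keywords.keys() & third_keywords.keys()
    # First value: sum of the target keywords' weights on the query side.
    first_value = sum(second_keywords[word] for word in target)
    # Second value: sum of the target keywords' weights on the segment side.
    second_value = sum(third_keywords[word] for word in target)
    # Assumption: with no overlap (or zero weight), the similarity is 0.0.
    return first_value / second_value if second_value else 0.0
```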

According to an embodiment of the present disclosure, the relevance determination unit is configured to take the first sub-similarity, the second sub-similarity, and the third sub-similarity as inputs of a predetermined logistic regression model to obtain a classification result for each dialog segment. The classification result is either relevant or irrelevant.
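A minimal sketch of such a classification is shown below. The coefficients WEIGHTS and BIAS and the cutoff of 0.5 are hypothetical values chosen only to make the example concrete; in practice they would be learned from labeled dialog segments, and the disclosure does not give them.

```python
import math

# Hypothetical coefficients for the three sub-similarities (not from the source).
WEIGHTS = (2.0, 2.0, 1.0)
BIAS = -2.5


def relevance_probability(s1: float, s2: float, s3: float) -> float:
    """Logistic function applied to a linear combination of the inputs."""
    z = WEIGHTS[0] * s1 + WEIGHTS[1] * s2 + WEIGHTS[2] * s3 + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def classify(s1: float, s2: float, s3: float, cutoff: float = 0.5) -> str:
    """Return the classification result for one dialog segment."""
    return "relevant" if relevance_probability(s1, s2, s3) >= cutoff else "irrelevant"
```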

According to an embodiment of the present disclosure, the apparatus 1000 for determining a query result may further include a second evaluation value determining module and a sorting module. The second evaluation value determination module is configured to determine a relevance evaluation value for each of the plurality of target dialog segments using the ranking model. The ranking module is used for ranking the target dialog segments based on the relevance evaluation value.

According to an embodiment of the present disclosure, the above ranking module may include a weight determination sub-module, a weighted evaluation determination sub-module, and a ranking sub-module. The weight determination sub-module is configured to determine, for each dialog segment, a weight of the relevance evaluation value of the dialog segment based on a matching relationship between words belonging to a target category in the query statement and words belonging to the target category in the dialog segment. The weighted evaluation determination sub-module is configured to determine a weighted evaluation value of each dialog segment based on the weight of the relevance evaluation value of the dialog segment. The ranking sub-module is configured to rank the target dialog segments based on the weighted evaluation values.

According to an embodiment of the present disclosure, the expression obtaining module 1010 is specifically configured to determine a query expression for the query statement according to the weight of each second keyword and the plurality of second keywords.

According to an embodiment of the present disclosure, the expression obtaining module 1010 may include a word determination submodule and an expression determination submodule. The word determination submodule is configured to determine required words and optional words among the plurality of second keywords according to the weight of each second keyword. The expression determination submodule is configured to obtain the query expression using an expression template based on the required words and the optional words.
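One possible way to build such a query expression is sketched below. The expression template used here (a leading "+" for each required word and an OR group for the optional words) is purely an assumption for illustration, as are the function name build_query_expression and the threshold value; the disclosure does not specify the template.

```python
from typing import Dict


def build_query_expression(
    keyword_weights: Dict[str, float],
    required_threshold: float = 0.5,
) -> str:
    """Split keywords into required and optional words and fill a template."""
    # Assumption: a keyword is required when its weight reaches the threshold.
    required = sorted(w for w, wt in keyword_weights.items() if wt >= required_threshold)
    optional = sorted(w for w, wt in keyword_weights.items() if wt < required_threshold)
    # Hypothetical template: "+a +b (c OR d)".
    parts = [f"+{word}" for word in required]
    if optional:
        parts.append("(" + " OR ".join(optional) + ")")
    return " ".join(parts)
```

With the weights fever 0.8, child 0.6, night 0.2 and cough 0.3, this sketch yields the expression "+child +fever (cough OR night)".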

According to an embodiment of the present disclosure, the third determination submodule may include a first word obtaining unit, a second word obtaining unit, and a third word obtaining unit. The first word obtaining unit is configured to perform word segmentation on the query statement to obtain a plurality of first words and respective weights of the plurality of first words. The second word obtaining unit is configured to obtain second words and weights of the second words based on synonyms of the first words in a predetermined synonym library and the respective weights of the first words. The third word obtaining unit is configured to determine a fourth intention type of the query statement using a fourth intention recognition model, obtain a third word expressing the fourth intention type, and determine the weight of the third word to be a first predetermined weight.

According to an embodiment of the present disclosure, the third determining sub-module may further include an entity word determining unit, a target word determining unit, and a keyword determining unit. The entity word determining unit is used for determining a second entity word included in the query statement by adopting a second entity recognition model. The target word determining unit is configured to determine, as a target word, a word belonging to the second entity word from the plurality of first words and the second word. The keyword determining unit is configured to determine, as the second keyword, a word with a weight greater than a weight threshold in the target words. Wherein the weight threshold is associated with a number of the plurality of first words.
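The selection of second keywords from the target words may be sketched as follows. The disclosure only states that the weight threshold is associated with the number of first words; taking the threshold to be the reciprocal of that number is an assumption made here for concreteness, and the function name select_second_keywords is hypothetical.

```python
from typing import Dict, Set


def select_second_keywords(
    first_words: Dict[str, float],
    second_words: Dict[str, float],
    entity_words: Set[str],
) -> Dict[str, float]:
    """Keep entity words whose weight exceeds a count-dependent threshold."""
    if not first_words:
        return {}
    # Assumption: the threshold is 1 / (number of first words).
    threshold = 1.0 / len(first_words)
    # Candidate words are the first words plus their synonym-derived second words.
    candidates = {**second_words, **first_words}
    return {
        word: weight
        for word, weight in candidates.items()
        if word in entity_words and weight > threshold
    }
```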

According to an embodiment of the present disclosure, the third determination submodule may further include a fourth word obtaining unit, configured to obtain a fourth word and a weight of the fourth word based on the word associated with the third word in an intention word bank and the weight of the third word.

According to an embodiment of the present disclosure, the third determining sub-module may further include a fifth word determining unit and a weight determining unit. The fifth word determining unit is used for responding to the selection of the displayed recommended labels, and determining the fifth words represented by the selected labels as the second key words for the query statement. The weight determination unit is used for determining the weight of the fifth word as a second predetermined weight.

It should be noted that, in the technical solution of the present disclosure, the acquisition, storage, application, and the like of the personal information of the users involved all comply with the relevant laws and regulations and do not violate public order and good customs.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

FIG. 11 shows a schematic block diagram of an example electronic device 1100 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 11, the device 1100 comprises a computing unit 1101, which may perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data necessary for the operation of the device 1100 may also be stored. The computing unit 1101, the ROM 1102, and the RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to the bus 1104.

A number of components in device 1100 connect to I/O interface 1105, including: an input unit 1106 such as a keyboard, a mouse, and the like; an output unit 1107 such as various types of displays, speakers, and the like; a storage unit 1108 such as a magnetic disk, optical disk, or the like; and a communication unit 1109 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 1109 allows the device 1100 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.

The computing unit 1101 can be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1101 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 1101 performs the respective methods and processes described above, such as the method of determining candidate information and/or the method of determining a query result. For example, in some embodiments, the method of determining candidate information and/or the method of determining a query result may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1108. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1100 via the ROM 1102 and/or the communication unit 1109. When the computer program is loaded into the RAM 1103 and executed by the computing unit 1101, one or more steps of the above-described method of determining candidate information and/or method of determining a query result may be performed. Alternatively, in other embodiments, the computing unit 1101 may be configured by any other suitable means (e.g., by means of firmware) to perform the method of determining candidate information and/or the method of determining a query result.

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and addresses the high management difficulty and weak service scalability of traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. A method of determining candidate information, comprising:

extracting, for each historical dialog segment of a plurality of historical dialog segments, feature information of the each historical dialog segment;

determining a quality evaluation value of the each historical dialog segment using a predetermined evaluation model based on the feature information; and

determining a historical dialog segment, among the plurality of historical dialog segments, whose quality evaluation value is greater than a predetermined evaluation value threshold, to obtain the candidate information.

2. The method of claim 1, wherein extracting feature information of the each historical dialog segment comprises:

taking a first statement aiming at a first object in each historical dialog segment as an input of a first intention recognition model, and obtaining a first intention type of the first statement;

taking a second statement aiming at a second object in each historical dialog segment as an input of a first intention recognition model, and obtaining a second intention type of the second statement;

inputting the first statement and the second statement into a predetermined classification model to obtain a satisfaction category between the first statement and the second statement; and

determining feature information for the each historical dialog segment based on the first intent type, the second intent type, and the satisfaction category.

3. The method of claim 1, further comprising determining a first keyword for the candidate information after obtaining the candidate information by:

obtaining initial words by at least one of:

determining the subject term of the target sentence in the candidate information by adopting a subject term determination model;

determining first entity words in other sentences except the target sentence in the candidate information by adopting a first entity recognition model;

determining synonyms of the first entity words in a synonym library;

and dividing each word in the initial words into words with preset granularity to obtain the first keyword.

4. The method of claim 3, further comprising, after determining a first keyword for the candidate information, determining a weight of the first keyword by:

determining a weight for a target initial word, from which the first keyword is obtained by division, based on the attribute type of the target initial word;

determining the initial weight of the first keyword based on the ratio of the number of the characters of the first keyword to the number of the characters of the target initial word and the weight of the target initial word; and

adjusting the initial weight based on the source of the target initial word to obtain the weight of the first keyword.

5. The method of claim 4, wherein, in the case that the source of the target initial word is the subject word, adjusting the initial weight comprises:

normalizing the initial weight so that the sum of the weights of all first keywords included in the subject word of the target sentence in the candidate information is a predetermined value.

6. The method of claim 4, wherein, in the case that the source of the target initial word is the first entity word or the synonym, adjusting the initial weight comprises:

performing weight reduction processing on the initial weight.

7. The method of claim 6, wherein, in the case that the target initial word is the first entity word, determining a weight for the target initial word comprises:

determining a first sub-weight for the target initial word based on the attribute type of the target initial word;

determining a second sub-weight for the target initial word based on a satisfaction category between the sentence to which the target initial word belongs and the other sentences; and

determining an initial weight for the target initial word based on the first and second sub-weights.

8. The method of any of claims 3-7, wherein determining a first keyword for the candidate information further comprises:

determining a third intention type of the target sentence in the candidate information using a third intention recognition model, and determining a word expressing the third intention type as the first keyword.

9. A method of determining query results, comprising:

obtaining a query expression for a query statement based on the query statement;

obtaining a plurality of dialog segments from candidate information based on the query expression; and

determining a target dialog segment of the plurality of dialog segments as a query result for the query statement,

wherein the candidate information is determined by the method of any one of claims 1 to 8.

10. The method of claim 9, wherein determining a target dialog segment of the plurality of dialog segments comprises determining a relevance between each of the plurality of dialog segments and the query statement to determine the target dialog segment based on the relevance by:

determining a plurality of second keywords for the query statement and a weight for each of the plurality of second keywords;

determining, for each dialog segment of the plurality of dialog segments, a third keyword of the each dialog segment and a weight of the third keyword; and

determining a relevance between the each dialog segment and the query statement based on the plurality of second keywords and the third keyword.

11. The method of claim 10, wherein determining a relevance between the each dialog segment and the query statement comprises:

determining a first sub-similarity between the query statement and each dialog segment based on the weight of each of the plurality of second keywords and the weight of the third keyword;

determining semantic similarity between the query statement and the target statement in each dialog segment as a second sub-similarity;

determining similarity between the intention of the query statement and the intention of the target statement in each dialog segment as a third sub-similarity; and

determining whether each dialog segment is relevant to the query statement based on the first sub-similarity, the second sub-similarity, and the third sub-similarity.

12. The method of claim 11, wherein determining the first sub-similarity between the query statement and each dialog segment comprises:

determining an intersection between the plurality of second keywords and the third keywords to obtain target keywords;

determining a sum of the weights that the target keywords have among the plurality of second keywords to obtain a first value;

determining a sum of the weights that the target keywords have among the third keywords to obtain a second value; and

determining a ratio between the first value and the second value as the first sub-similarity.
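
The ratio described in claim 12 can be illustrated with a minimal Python sketch. This is not part of the claims: the function name, the dictionary representation of keyword weights, and the choice to return 0.0 when the keyword sets do not overlap are assumptions made only for illustration.

```python
def first_sub_similarity(query_weights, segment_weights):
    """Weighted keyword-overlap ratio between a query and one dialog segment.

    query_weights:   dict mapping each second keyword to its weight.
    segment_weights: dict mapping each third keyword to its weight.
    """
    # Target keywords: the intersection of the two keyword sets.
    target_keywords = query_weights.keys() & segment_weights.keys()

    # First value: weights of the target keywords on the query side.
    first_value = sum(query_weights[w] for w in target_keywords)
    # Second value: weights of the target keywords on the segment side.
    second_value = sum(segment_weights[w] for w in target_keywords)

    # Assumption: with no overlap (or zero segment weight) the similarity is 0.
    if second_value == 0:
        return 0.0
    return first_value / second_value


query = {"headache": 0.6, "fever": 0.4}
segment = {"headache": 0.5, "cough": 0.5}
similarity = first_sub_similarity(query, segment)  # 0.6 / 0.5
```

Only "headache" is shared in this example, so the first value is 0.6 and the second value is 0.5.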

13. The method of claim 11, wherein determining whether each dialog segment is relevant to the query statement comprises:

obtaining a classification result for each dialog segment with the first sub-similarity, the second sub-similarity, and the third sub-similarity as inputs of a predetermined logistic regression model,

wherein the classification result is either relevant or not relevant.
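
As an illustration of claim 13, the sketch below feeds the three sub-similarities into a logistic regression decision function. The coefficients, bias, and 0.5 decision threshold are hypothetical values chosen for the example; the claims do not specify them, and a real model would learn them from labeled data.

```python
import math

# Hypothetical, hand-picked parameters (not taken from the source).
COEFFICIENTS = (1.5, 2.0, 1.0)  # one per sub-similarity
BIAS = -2.5


def classify_relevance(first_sim, second_sim, third_sim, threshold=0.5):
    """Return 'relevant' or 'not relevant' for one dialog segment."""
    features = (first_sim, second_sim, third_sim)
    # Linear combination of the three sub-similarities.
    z = BIAS + sum(c * x for c, x in zip(COEFFICIENTS, features))
    # Logistic (sigmoid) function maps the score to a probability.
    probability = 1.0 / (1.0 + math.exp(-z))
    return "relevant" if probability >= threshold else "not relevant"


strong_match = classify_relevance(1.0, 0.9, 1.0)
weak_match = classify_relevance(0.1, 0.1, 0.0)
```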

14. The method of claim 9, wherein there are a plurality of target dialog segments, and the method further comprises:

determining a relevance evaluation value of each dialog segment in the plurality of target dialog segments by using a ranking model; and

ranking the plurality of target dialog segments based on the relevance evaluation value.

15. The method of claim 14, wherein ranking the plurality of target dialog segments comprises:

for each dialog segment, determining a weight of the relevance evaluation value of the dialog segment based on a matching relationship between a word belonging to a target category in the query statement and a word belonging to the target category in the dialog segment;

determining a weighted evaluation value of each dialog segment based on the weight of the relevance evaluation value of each dialog segment; and

ranking the plurality of target dialog segments based on the weighted evaluation value.
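
The weighted ranking of claim 15 might look like the following sketch. The rule used here (a fixed boost when the query and the segment share at least one target-category word, otherwise a neutral weight of 1.0) and the field names `score` and `category_words` are assumptions for illustration; the claims only require that the weight depend on the matching relationship.

```python
def rank_segments(query_category_words, segments,
                  match_weight=1.5, default_weight=1.0):
    """Rank target dialog segments by weighted relevance evaluation value.

    segments: list of dicts with keys
      'id'             - identifier of the dialog segment
      'score'          - relevance evaluation value from the ranking model
      'category_words' - target-category words found in the segment
    """
    query_words = set(query_category_words)

    def weighted_score(segment):
        # Hypothetical rule: boost segments sharing a target-category word.
        matched = bool(query_words & set(segment["category_words"]))
        weight = match_weight if matched else default_weight
        return weight * segment["score"]

    # Highest weighted evaluation value first.
    return sorted(segments, key=weighted_score, reverse=True)


segments = [
    {"id": "A", "score": 0.8, "category_words": ["aspirin"]},
    {"id": "B", "score": 0.6, "category_words": ["ibuprofen"]},
]
ranked = rank_segments(["ibuprofen"], segments)
ranked_ids = [s["id"] for s in ranked]
```

Segment B has the lower raw score but matches the drug named in the query, so its weighted value (0.9) places it ahead of segment A (0.8).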

16. The method of claim 10, wherein obtaining a query expression for the query statement comprises:

determining the query expression for the query statement according to the plurality of second keywords and the weight of each second keyword.

17. The method of claim 16, wherein determining a query expression for the query statement comprises:

determining mandatory words and optional words among the plurality of second keywords according to the weight of each second keyword; and

obtaining the query expression by using an expression template based on the mandatory words and the optional words.
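
A minimal sketch of claim 17 follows. The weight cutoff of 0.5 and the expression template, in which a leading `+` marks a word that must appear and unmarked words are optional, are hypothetical; the claims do not define the template or how weights map to the two groups.

```python
def build_query_expression(keyword_weights, must_threshold=0.5):
    """Build a query expression from weighted second keywords.

    keyword_weights: dict mapping each second keyword to its weight.
    must_threshold:  hypothetical cutoff separating the two word groups.
    """
    # Order keywords from the highest weight to the lowest.
    ordered = sorted(keyword_weights.items(), key=lambda kv: kv[1], reverse=True)

    mandatory = [word for word, weight in ordered if weight >= must_threshold]
    optional = [word for word, weight in ordered if weight < must_threshold]

    # Hypothetical template: "+word" means the word is required.
    return " ".join([f"+{word}" for word in mandatory] + optional)


expression = build_query_expression({"headache": 0.7, "fever": 0.5, "night": 0.2})
```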

18. The method of claim 10, wherein determining a plurality of second keywords for a query statement and a weight for each of the plurality of second keywords comprises:

performing word segmentation processing on the query sentence to obtain a plurality of first words and respective weights of the first words;

obtaining a second word and the weight of the second word based on the synonyms of the first words and the respective weights of the first words in a preset synonym library; and

determining a fourth intention type of the query statement by using a fourth intention recognition model, obtaining a third word expressing the fourth intention type, and determining the weight of the third word as a first predetermined weight.
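
The keyword expansion of claim 18 can be sketched as below, assuming word segmentation and intention recognition have already been performed elsewhere. The discount factor applied to synonyms, the value of the first predetermined weight, and all function and parameter names are assumptions for illustration only.

```python
def expand_keywords(first_words, synonym_library, intent_word,
                    synonym_discount=0.8, first_predetermined_weight=1.0):
    """Collect weighted keyword candidates for a query statement.

    first_words:     dict of segmented words and their weights.
    synonym_library: dict mapping a word to a list of its synonyms.
    intent_word:     the third word expressing the recognized intention,
                     or None if no intention was recognized.
    """
    keywords = dict(first_words)

    # Second words: synonyms inherit a discounted weight (hypothetical rule).
    for word, weight in first_words.items():
        for synonym in synonym_library.get(word, []):
            derived = weight * synonym_discount
            keywords[synonym] = max(keywords.get(synonym, 0.0), derived)

    # Third word: the intention word gets the first predetermined weight.
    if intent_word is not None:
        keywords[intent_word] = first_predetermined_weight

    return keywords


keywords = expand_keywords(
    {"stomachache": 0.6},
    {"stomachache": ["abdominal pain"]},
    intent_word="treatment",
)
```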

19. The method of claim 18, wherein determining a plurality of second keywords for a query statement and a weight for each of the plurality of second keywords further comprises:

determining a second entity word included in the query statement by adopting a second entity recognition model;

determining, among the plurality of first words and the second word, a word that belongs to the second entity word as a target word; and

determining, among the target words, the words whose weight is greater than a weight threshold as the second keywords,

wherein the weight threshold is associated with a number of the plurality of first words.

20. The method of claim 18, wherein determining a plurality of second keywords for a query statement and a weight for each of the plurality of second keywords further comprises:

obtaining a fourth word and the weight of the fourth word based on a word associated with the third word in an intention word library and on the weight of the third word.

21. The method of claim 18, wherein determining a plurality of second keywords for a query statement and a weight for each of the plurality of second keywords further comprises:

in response to the selection of the displayed recommended label, determining that a fifth word represented by the selected label is a second keyword for the query statement; and

determining the weight of the fifth word to be a second predetermined weight.

22. An apparatus for determining candidate information, comprising:

a feature information extraction module configured to extract, for each historical dialog segment of a plurality of historical dialog segments, feature information of the historical dialog segment;

a first evaluation value determination module configured to determine, based on the feature information, a quality evaluation value of each historical dialog segment by using a predetermined evaluation model; and

a candidate information obtaining module configured to determine, among the plurality of historical dialog segments, the historical dialog segments whose quality evaluation value is greater than a predetermined evaluation value threshold, to obtain the candidate information.
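
The three modules of claim 22 correspond to a simple filter pipeline, sketched below. The feature extractor and evaluation model are passed in as callables because the claims do not fix them; `toy_features` and `toy_evaluate` are hypothetical stand-ins and do not represent the predetermined evaluation model of the disclosure.

```python
def select_candidates(segments, extract_features, evaluate, threshold):
    """Keep historical dialog segments whose quality value exceeds threshold."""
    candidates = []
    for segment in segments:
        features = extract_features(segment)   # feature information extraction
        quality = evaluate(features)           # quality evaluation value
        if quality > threshold:                # strictly greater, as claimed
            candidates.append(segment)
    return candidates


# Hypothetical stand-ins for illustration only.
def toy_features(segment):
    # A segment is a list of utterances; the first one is the question.
    return {"answer_chars": sum(len(turn) for turn in segment[1:])}


def toy_evaluate(features):
    # Longer answers score higher, capped at 1.0.
    return min(1.0, features["answer_chars"] / 100)


history = [
    ["What helps a sore throat?", "Drink water."],
    ["What helps a sore throat?",
     "Warm fluids and rest usually help; see a doctor if it lasts "
     "more than a week or you have a high fever."],
]
candidates = select_candidates(history, toy_features, toy_evaluate, threshold=0.5)
```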

23. An apparatus for determining query results, comprising:

an expression obtaining module configured to obtain a query expression for a query statement based on the query statement;

a dialog segment obtaining module, configured to obtain a plurality of dialog segments from candidate information based on the query expression; and

a query result determination module for determining a target dialog segment of the plurality of dialog segments as a query result for the query statement,

wherein the candidate information is determined using the apparatus of claim 22.

24. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor, wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-21.

25. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any of claims 1-21.

26. A computer program product comprising a computer program which, when executed by a processor, implements a method according to any one of claims 1 to 21.