CN110597968A - Reply selection method and device - Google Patents

Reply selection method and device

Info

Publication number
CN110597968A
CN110597968A CN201910350310.9A CN201910350310A
Authority
CN
China
Prior art keywords
reply
target
context
semantic
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910350310.9A
Other languages
Chinese (zh)
Inventor
崔一鸣
马文涛
陈致鹏
宋皓宇
王士进
胡国平
张伟男
刘挺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei Xunfei Institute Of Artificial Intelligence
iFlytek Co Ltd
Original Assignee
Hebei Xunfei Institute Of Artificial Intelligence
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hebei Xunfei Institute Of Artificial Intelligence, iFlytek Co Ltd filed Critical Hebei Xunfei Institute Of Artificial Intelligence
Priority to CN201910350310.9A priority Critical patent/CN110597968A/en
Publication of CN110597968A publication Critical patent/CN110597968A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems

Abstract

The application discloses a reply selection method and apparatus. In the method, after the target question asked by a questioner is obtained, person portrait information of the questioner is generated according to the target question; each candidate reply to the target question is then obtained; and one reply is selected from the obtained candidate replies, according to the target question and the person portrait information, to serve as the final reply to the target question. Because the person portrait information of the questioner who asks the target question is taken into account when the final reply is selected from the candidate replies, the selected reply content is more relevant to the individual characteristics of the questioner, the dialogue requirements of the questioner can be satisfied, and the reasonableness of the reply selection result is improved.

Description

Reply selection method and device
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a reply selection method and device.
Background
With the development of artificial intelligence and natural language processing technology, machines now have a certain ability to understand human language, which makes it possible for people to communicate with machines in natural language; as a result, various man-machine dialogue systems have appeared in recent years. Such dialogue systems can be divided into two categories according to whether they are task-oriented: one is the task-oriented type, which has a definite goal or task and aims to complete that task in the shortest interaction time or fewest turns, such as intelligent customer service and smartphone assistants; the other is the natural-interaction type, commonly known as the "chatbot", which has no specific goal and aims to converse, or even exchange emotional confidences, with humans.
In a natural-interaction man-machine dialogue system, a reply related to the dialogue content is retrieved or generated based on the context of the dialogue; however, the obtained reply may not be related to the questioner, and therefore may fail to satisfy the questioner's dialogue requirements.
Disclosure of Invention
The present disclosure provides a reply selection method and apparatus, which can obtain a reply related to the questioner so as to satisfy the dialogue requirements of the questioner.
The embodiment of the application provides a reply selection method, which comprises the following steps:
acquiring a target question asked by a questioner;
generating person portrait information of the questioner according to the target question;
acquiring each candidate reply to the target question;
and selecting, according to the target question and the portrait information, one reply from the candidate replies as the final reply to the target question.
An embodiment of the present application further provides a reply selection apparatus, including:
the target question acquiring unit is used for acquiring a target question asked by a questioner;
a portrait information generating unit, configured to generate person portrait information of the questioner based on the target question;
a candidate reply acquiring unit, configured to acquire each candidate reply to the target question;
and a candidate reply selecting unit, configured to select, according to the target question and the portrait information, one reply from the candidate replies as the final reply to the target question.
According to the reply selection method and apparatus provided by the embodiments of the application, after the target question asked by the questioner is obtained, person portrait information of the questioner can be generated according to the target question; each candidate reply to the target question is then obtained, and one reply can be selected from the candidate replies, according to the target question and the person portrait information, as the final reply to the target question. Because the person portrait information of the questioner who asks the target question is taken into account when the final reply is selected from the candidate replies, the selected reply content is more relevant to the individual characteristics of the questioner, the dialogue requirements of the questioner can be satisfied, and the reasonableness of the reply selection result is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the embodiments or the technical solutions in the prior art are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic flowchart of a reply selection method according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a reply generation model according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of generating at least one candidate reply to a target question according to a target context and portrait information according to an embodiment of the present disclosure;
fig. 4 is a schematic flowchart of generating at least one candidate reply to a target question according to an embodiment of the present application;
fig. 5 is a schematic flow chart illustrating a process of obtaining at least one candidate reply to a target question from a pre-constructed corpus of dialogues according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a correlation model provided in an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a reply selection model according to an embodiment of the present application;
FIG. 8 is a flowchart illustrating a process for selecting one reply from various replies to be selected according to the target context and portrait information according to an embodiment of the present application;
FIG. 9 is a flowchart illustrating a process of determining semantic relevance between a target context and portrait information and a reply to be selected, respectively, according to an embodiment of the present disclosure;
fig. 10 is a schematic structural diagram of a reply selection method according to an embodiment of the present application;
fig. 11 is a schematic composition diagram of a reply selection apparatus according to an embodiment of the present application.
Detailed Description
Generally, human-machine dialog systems can be divided into two classes, one being based on search techniques and the other being based on generative models.
When a retrieval-based dialogue system generates a reply to a question, the implementation process is as follows: first, a semantic representation result of the context of the dialogue between the current questioner and the machine is generated; then, on a corpus containing a large number of human dialogues, a semantic matching technique is used to match this semantic representation result against the semantic representation results corresponding to the human dialogues in the corpus, and at least one human dialogue with a high matching degree is retrieved as a dialogue semantically similar to the current dialogue context; further, the replies in the retrieved human dialogues are taken as candidate replies to the question asked by the current questioner; finally, by calculating the relevance between each candidate reply and the question, the candidate reply with the highest relevance is selected as the final reply to the question.
When a dialogue system based on a generative model generates a reply to a question, the implementation process is as follows: first, a large amount of human dialogue corpora is collected in advance as training data; then, a pre-constructed initial neural network model (such as an end-to-end sequence generation model) is trained with the training data, so that the trained model learns the transition patterns in human dialogue; the generative model can then be used to generate a reply to the question asked by the current questioner.
However, the applicant has found that both of the above methods retrieve or generate a reply based only on the context of the dialogue between the questioner and the machine; the obtained reply may be related to the dialogue content yet unrelated to the questioner. Human dialogue, however, is often related to the personal information of the interlocutors, so the replies selected by both methods, being unrelated to the questioner's personal information, may fail to satisfy the questioner's dialogue requirements.
By way of example: assume the question asked by the questioner is "what do you usually like to do on weekends". The replies selected by the two reply selection methods above may be "I like outdoor sports", "I like staying at home", and so on. The relationship between each of these replies and the question conforms to the turn-taking patterns of human dialogue, and each may be stored in the corpus together with the question. Yet although both replies are related to the question, their semantics are completely opposite. Because neither of the two existing reply selection methods can select, according to the questioner's own personal information, a reply that matches the questioner's own characteristics, the same question may receive a different reply each time one is selected, with no regularity, so the dialogue requirements of the questioner cannot be satisfied.
To remedy the above defects, the application provides a reply selection method: after the question asked by the questioner is obtained, person portrait information of the questioner is generated according to the target question; each candidate reply to the target question is then obtained; and one reply is selected from the candidate replies, according to the target question and the person portrait information, as the final reply to the target question. Compared with the two reply selection methods above, the individual characteristics of the questioner are fully considered when the final reply is selected from the candidate replies, so that the selected reply content is related not only to the man-machine dialogue between the questioner and the machine but also to the individual characteristics of the questioner; the dialogue requirements of the questioner can thus be satisfied, and the reasonableness of the reply selection result is improved.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
First embodiment
Referring to fig. 1, a schematic flow chart of a reply selection method provided in this embodiment is shown, where the method includes the following steps:
s101: and acquiring the target question posed by the questioner.
In this embodiment, a question that the questioner presents to the machine and for which the machine needs to select a reply is defined as a target question. It should be noted that this embodiment does not limit the way in which the questioner presents the target question: for example, the questioner may present the target question to the machine by voice input or by text input, that is, the target question may be in voice form or in text form. This embodiment also does not limit the language of the target question, such as Chinese or English. In addition, when the questioner presents the target question by text input, this embodiment does not limit the type of input method used by the questioner, such as the Sogou input method, the Baidu input method, and the like.
S102: based on the target question, the person image information of the questioner is generated.
In this embodiment, after the target question asked by the questioner is acquired in step S101, in order to improve the reasonableness of the reply selection result and enable the selected reply content to satisfy the questioner's dialogue requirements, person portrait information representing the personality characteristics of the questioner first needs to be acquired based on the semantic information of the target question, where the person portrait information may include one or more types of information such as the name (nickname), age, hobbies, and lifestyle habits of the questioner.
The portrait information of the questioner refers to information describing a person's individual characteristics, and can be acquired from a pre-constructed portrait information base. The portrait information base may pre-store a large amount of candidate portrait information, which may be information data collected from existing descriptions of people's characteristics, or information data simulated from existing person portrait information. When the portrait information is stored, the various types of portrait information may be stored as distinct descriptive words, complete descriptive sentences, and so on; the portrait information may include different types of information such as a person's name (nickname), age, hobbies, and lifestyle habits.
When storing portrait information in the portrait information base, the information may be stored according to different information types such as people's names, ages, and hobbies. For example, portrait information under the information type "name" may include person A, person B, person C, and so on; portrait information under the information type "age" may include various age groups; and other information types (e.g., hobbies, habits) may likewise include portrait information of the corresponding type.
Furthermore, the collected existing portrait information and/or the portrait information simulated from it can be stored in the portrait information base in the form of descriptive characters or words. Alternatively, the portrait information may be described by complete descriptive sentences and stored in the portrait information base as natural-language text. For example, assuming the portrait information under the information type "hobby" includes basketball, badminton, and outdoor hiking, it can be described by the complete descriptive sentence "I like basketball, badminton and outdoor hiking", and this natural text can be stored in the portrait information base.
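As a minimal illustrative sketch, the two storage forms described above (per-type descriptive entries and complete descriptive sentences) might be organized as follows; all names and entries below are hypothetical, not taken from the patent:

```python
# Hypothetical sketch of a portrait information base, stored by
# information type as described above. All entries are illustrative.
portrait_info_base = {
    "name": ["person A", "person B", "person C"],
    "age": ["under 18", "18-30", "31-45", "over 45"],
    "hobby": ["basketball", "badminton", "outdoor hiking"],
}

def hobby_sentence(hobbies):
    # Render per-type entries as one complete descriptive sentence,
    # the second storage form described in the text.
    return "I like " + ", ".join(hobbies[:-1]) + " and " + hobbies[-1]

print(hobby_sentence(portrait_info_base["hobby"]))
# I like basketball, badminton and outdoor hiking
```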
Based on this, after the portrait information base is constructed in advance, an optional implementation of this embodiment is that step S102 may specifically include: selecting, from the pre-constructed portrait information base, at least one piece of candidate portrait information whose semantics are close to those of the target question, as the portrait information of the questioner.
In this implementation, after a large amount of candidate portrait information has been pre-stored in the portrait information base, each piece of candidate portrait information may be semantically compared with the target question so as to find at least one piece of candidate portrait information that is semantically close to the target question; this candidate portrait information can serve as the portrait information representing the individual characteristics of the questioner. Subsequently, through step S103, at least one candidate reply to the target question can be reasonably generated according to the portrait information and the target question, and through step S104, a final reply satisfying the questioner's requirements can be selected from the candidate replies according to the portrait information and the target question.
Therefore, after the portrait information which embodies the individual characteristics of the questioner is obtained, the portrait information can be used as a basis for generating the to-be-selected reply of the target question, so that the generated to-be-selected reply can be more relevant to the characteristics of the questioner, and the dialogue requirements of the questioner are further met.
Next, this embodiment describes, through the following steps A1-A3, the specific process of "selecting, from the pre-constructed portrait information base, at least one piece of candidate portrait information semantically close to the target question" in the implementation above.
Step A1: generating a semantic representation result of the target question.
In this embodiment, a word segmentation method may first be used to segment the recognition text corresponding to the target question into its constituent words; the weight of each word in the recognition text may then be calculated, and a vector generation method may be used to generate the word vector corresponding to each word (for example, by looking the word up in a semantic dictionary). The semantic representation result of the target question may then be generated from the word vectors and the word weights, with the following calculation formula:

V = Σi=1..n wi·Ei    (1)

where V represents the semantic representation result of the target question; n represents the total number of words contained in the recognition text corresponding to the target question; Ei represents the word vector corresponding to the i-th word in the recognition text; and wi represents the weight of the i-th word in the recognition text, a greater weight indicating that the i-th word is more important in the recognition text.
By way of example: assuming the recognition text corresponding to the target question is "what do you usually like to do on weekends", the words obtained after segmenting the recognition text are "you", "weekends", "like", "do", and "what". The weights of these words in the recognition text may be calculated as 0.2, 0.3, 0.16, 0.18, and 0.14 respectively, and the word vectors corresponding to the words may be generated as E1, E2, E3, E4, E5. The semantic representation result of the target question can then be calculated by formula (1) as: V = 0.2E1 + 0.3E2 + 0.16E3 + 0.18E4 + 0.14E5.
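The computation in formula (1) and the worked example above can be sketched as follows; the 3-dimensional word vectors are illustrative assumptions, and only the weights come from the text:

```python
# Formula (1): the semantic representation V is the weighted sum of the
# word vectors of the recognition text, V = sum_i w_i * E_i.
def semantic_representation(word_vectors, weights):
    dim = len(word_vectors[0])
    v = [0.0] * dim
    for w, e in zip(weights, word_vectors):
        for k in range(dim):
            v[k] += w * e[k]
    return v

# Toy word vectors for the five segmented words (hypothetical values).
E = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0]]
w = [0.2, 0.3, 0.16, 0.18, 0.14]  # word weights from the example above

V = semantic_representation(E, w)  # V = 0.2*E1 + 0.3*E2 + ... + 0.14*E5
```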
Step A2: generating a semantic representation result of each piece of candidate portrait information in the portrait information base.
In this embodiment, how to generate the semantic representation result of candidate portrait information is described with reference to one piece of candidate portrait information in the portrait information base; the processing of the other candidate portrait information is similar and is not repeated.
Specifically, as in step A1, the candidate portrait information may first be segmented into words by a word segmentation method; the weight of each word in the candidate portrait information may then be calculated, and the word vector corresponding to each word may be generated. The semantic representation result of the candidate portrait information may then be generated from the word vectors and the word weights, with the following calculation formula:

V'h = Σj=1..n' γj·E'j    (2)

where V'h represents the semantic representation result of the h-th piece of candidate portrait information in the portrait information base; n' represents the total number of words contained in the h-th piece of candidate portrait information; E'j represents the word vector corresponding to the j-th word in the h-th piece of candidate portrait information; and γj represents the weight of the j-th word in the h-th piece of candidate portrait information, a greater weight indicating that the j-th word is more important.
By way of example: supposing the h-th piece of candidate portrait information in the portrait information base is "I like basketball, badminton and outdoor hiking", the words obtained after segmentation are: "I", "like", "basketball", ",", "badminton", "and", "outdoor", "hiking". The weights of these words in the candidate portrait information may be calculated as 0.1, 0.15, 0.2, 0.01, 0.2, 0.05, 0.1, and 0.14 respectively, and the corresponding word vectors may be generated as E'1, E'2, E'3, E'4, E'5, E'6, E'7, E'8. The semantic representation result of the h-th piece of candidate portrait information can then be calculated by formula (2) as: V'h = 0.1E'1 + 0.15E'2 + 0.2E'3 + 0.01E'4 + 0.2E'5 + 0.05E'6 + 0.1E'7 + 0.14E'8.
It should be noted that the present embodiment does not limit the execution order of steps a1 and a 2.
Step A3: selecting, according to the generated semantic representation results, at least one piece of candidate portrait information that is semantically close to the target question.
In this embodiment, after the semantic representation result V of the target question is generated through step A1, and the semantic representation result of each piece of candidate portrait information in the portrait information base (e.g., V'h for the h-th piece) is generated through step A2, the semantic distance between the target question and each piece of candidate portrait information can be obtained by calculating the cosine distance between the two semantic representation results, with the following calculation formula:

sim = cosine(V, V'h)    (3)
where sim is the semantic distance value used to measure the semantic distance between the target question and the h-th piece of candidate portrait information in the portrait information base; cosine represents the cosine distance calculation formula; V represents the semantic representation result of the target question; and V'h represents the semantic representation result of the h-th piece of candidate portrait information.
It can be understood that the larger the sim value in formula (3), the smaller the semantic distance between the target question and the h-th piece of candidate portrait information in the portrait information base, i.e., the semantically closer they are.
On this basis, formula (3) can be used to calculate a semantic distance value between the target question and each piece of candidate portrait information in the portrait information base. The semantic distance values may then be sorted from large to small and the candidate portrait information corresponding to the top preset number of values selected as the candidate portrait information semantically close to the target question; or the values may be sorted from small to large and the candidate portrait information corresponding to the last preset number of values selected; or the candidate portrait information corresponding to all semantic distance values higher than a preset threshold may be selected.
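The selection in step A3 can be sketched as follows; the cosine computation follows formula (3), while the candidate entries and vector values are illustrative assumptions:

```python
import math

def cosine(u, v):
    # Formula (3): sim = cosine(V, V'h); a larger value means the
    # candidate portrait information is semantically closer to the question.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def select_portrait_info(v_question, candidates, top_k=2):
    # Sort candidates by sim from large to small and keep the top_k,
    # one of the selection strategies described above.
    ranked = sorted(candidates.items(),
                    key=lambda item: cosine(v_question, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:top_k]]

# Hypothetical semantic representation results (toy 3-dimensional vectors).
v_q = [0.38, 0.62, 0.30]
portrait_base = {
    "I like basketball and hiking": [0.40, 0.60, 0.25],
    "I work as a teacher":          [0.90, 0.10, 0.05],
    "I enjoy weekend outings":      [0.35, 0.65, 0.35],
}
selected = select_portrait_info(v_q, portrait_base)
```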
S103: acquiring each candidate reply to the target question.
In this embodiment, after the target question asked by the questioner is acquired in step S101 and the portrait information of the questioner is generated in step S102, data processing may be performed on the target question and/or the portrait information, so as to acquire, according to the processing result, each candidate reply to the target question that satisfies the questioner's dialogue requirements.
This step S103 can adopt one of the following three implementation manners.
In a first implementation manner of S103, step S103 may specifically include: at least one candidate reply to the target question is generated based on the target question and the portrait information.
In this embodiment, it can be understood that the reply to the target question bears a certain relationship to the questioner's historical dialogue content before the target question was asked. Therefore, in order to improve the reasonableness of the generated candidate replies so that a final reply satisfying the questioner's dialogue requirements can subsequently be selected from them, after the person portrait information of the questioner is generated in step S102, corresponding encoding and decoding processes are performed on the portrait information and the context to which the target question belongs, and at least one candidate reply to the target question can be generated according to the decoding result.
It should be noted that, for this implementation of step S103, the specific implementation process of "generating at least one candidate reply to the target question according to the target question and the person portrait information of the questioner" will be described in the second embodiment.
In a second implementation manner of S103, step S103 may specifically include: and acquiring at least one reply to be selected of the target question from a pre-constructed dialogue corpus.
In this implementation, similarly, since the reply to the target question bears a certain relationship to the questioner's historical dialogue content before the target question was asked, in order to improve the reasonableness of the obtained candidate replies so that a final reply satisfying the questioner's dialogue requirements can subsequently be selected from them, after the target question is obtained in step S101, each group of corpus contexts semantically similar to the context to which the target question belongs is first retrieved from the pre-constructed dialogue corpus based on the semantic information of that context; then, after the reply corpora corresponding to the question corpora in each group of corpus contexts are obtained, at least one reply corpus is selected from them to serve as at least one candidate reply to the target question.
It should be noted that, in the present implementation manner of step S103, a specific implementation process of "obtaining at least one reply to be selected of the target question from the pre-constructed dialog corpus" will be described in the third embodiment.
In a third implementation of S103, step S103 may specifically include: generating at least one candidate reply to the target question according to the target question and the portrait information of the questioner, and also acquiring at least one candidate reply to the target question from the pre-constructed dialogue corpus. For the specific implementation, refer to the first and second implementations above.
S104: selecting, according to the target question and the portrait information, one reply from the candidate replies as the final reply to the target question.
In this embodiment, after the person portrait information of the questioner is generated in step S102 and each candidate reply to the target question is obtained in step S103, in order that the selected final reply satisfies the questioner's dialogue requirements, a semantic relevance calculation may be performed between each candidate reply on the one hand and the person portrait information and the context to which the target question belongs on the other; according to the calculation results, the reply with the highest relevance to the person portrait information and the target context may then be selected from the candidate replies as the final reply to the target question.
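A simplified sketch of this selection step follows. As an assumption for illustration (not necessarily the patent's exact scoring), the relevance of a candidate reply to the target context and to the portrait information is measured by cosine similarity and the two scores are averaged; all vector values are hypothetical:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def select_final_reply(v_context, v_portrait, candidate_replies):
    # Score each candidate reply by its average relevance to the target
    # context and the person portrait information; keep the highest-scoring.
    def score(v_reply):
        return 0.5 * cosine(v_context, v_reply) + 0.5 * cosine(v_portrait, v_reply)
    return max(candidate_replies, key=lambda text: score(candidate_replies[text]))

# Hypothetical semantic representation results (toy 3-dimensional vectors).
v_ctx = [0.5, 0.5, 0.0]       # context: "what do you like to do on weekends"
v_portrait = [0.9, 0.1, 0.0]  # portrait info leaning toward outdoor hobbies
replies = {
    "I like outdoor sports":  [0.8, 0.2, 0.0],
    "I like staying at home": [0.1, 0.2, 0.9],
}
final = select_final_reply(v_ctx, v_portrait, replies)
```

With a portrait that leans outdoors, the outdoor reply scores higher than the stay-at-home reply even though both are plausible answers to the question, which is exactly the disambiguation described in the example of the two semantically opposite replies.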
It should be noted that a specific implementation manner of the step S104 will be described in the fourth embodiment.
In summary, according to the reply selection method provided in this embodiment, after the target question posed by the questioner is obtained, the person image information of the questioner may be generated according to the target question; then each reply to be selected of the target question is obtained, and one reply is selected from the obtained replies to be selected according to the target question and the person image information, to serve as the final reply of the target question. In this way, when the final reply of the target question is selected from the replies to be selected, the person image information of the questioner who posed the target question is taken into account, so that the selected final reply content is more relevant to the individual characteristics of the questioner, the dialog requirement of the questioner can be met, and the rationality of the reply selection result is improved.
Second embodiment
The present embodiment will describe a specific implementation procedure of "generating at least one reply to be selected for the target question according to the target question and the person image information of the questioner" in the first implementation manner of step S103 in the first embodiment.
It is understood that the answer to the target question is greatly related to the dialog content between the questioner and the machine before the target question is presented, so in order to make the generated answer more suitable for the dialog requirement of the questioner, an alternative implementation is that "generating at least one alternative answer to the target question according to the target question and the person image information of the questioner" may specifically include: and generating at least one reply to be selected of the target question according to the target context and the character image information of the questioner.
Wherein the target context includes the target question and the historical dialog context before the target question, and may be defined as C. The target context C may include the target question and all the historical dialog texts before the target question, or may include a part of the historical dialog texts, counted backward from the target question and including it. Assume that the target context C includes m sentences, defined in chronological order from front to back as u_1, u_2, …, u_m, where u_m refers to the text content corresponding to the target question. It should be noted that the number m of sentences in the target context is the same as the number l of target-context sentences mentioned in the fourth embodiment.
Furthermore, the information data related to the target question can be obtained by performing data processing on the target context and the character image information of the questioner, so as to generate more reasonable candidate replies of the target question.
The reply generation model pre-constructed in this embodiment may be formed by a multi-layer network. As shown in Fig. 2, the model structure includes an input layer, a word-level coding layer, a sentence-level coding layer, a reply generation layer, and an output layer. The word-level coding layer and the sentence-level coding layer jointly form the encoding (Encoder) part of the model, and are used for encoding the input target context and person image information into a target context coding vector and a person image information coding vector. The reply generation layer serves as the decoding (Decoder) part of the model and is used for decoding the target context coding vector and the person image information coding vector; the output layer then generates the specific content of the reply to be selected for the target question based on the decoding result.
Specifically, in this embodiment, an alternative implementation manner is that, referring to fig. 3, "generating at least one candidate reply to the target question according to the target context and the person image information of the questioner" may specifically include the following steps S301 to S302:
s301: information related to the target question is extracted from the target context and the person image information of the questioner.
In this implementation, a pre-constructed reply generation model may be used to encode the target context and the person image information of the questioner, so as to remove redundant information from them and extract the information more relevant to the target question. This more relevant information is then decoded to obtain a corresponding decoding result, from which a reply to be selected with higher rationality can be generated.
Specifically, referring to Fig. 2, assume that the m sentences included in the target context (including the target question) are, in order of appearance, u_1, u_2, …, u_m, where u_m refers to the text content corresponding to the target question, and that the K descriptive sentences included in the person image information of the questioner are p_1, p_2, …, p_K. The target context and the person image information of the questioner may then be used as input data; after the input data is input to the pre-constructed reply generation model, the model may be used to extract information related to the target question from the target context and the person image information of the questioner. The specific implementation process includes the following steps B1-B2:
step B1: first sentence representation results of respective sentences in the target context are generated, and context representation results of the target context are obtained by focusing attention on information related to the target question in the respective first sentence representation results.
As shown in Fig. 2, first, the word vectors of the words included in the m sentences u_1, u_2, …, u_m of the target context may be input to the input layer of the reply generation model as input data. For example, suppose u_1 is "what are you doing …" and u_1 contains m_1 participles; then the word vectors corresponding to these participles may be input in sequence to the input layer of the reply generation model as input data. By analogy, the word vectors corresponding to the participles contained in u_i ("I am watching a drama …"), …, u_m ("what TV play are you watching …") may in turn be input to the input layer of the reply generation model as input data.
Then, the shared bidirectional Long Short-Term Memory network (BiLSTM) contained in the word-level coding layer of the reply generation model is used to encode the word vectors contained in each sentence of the target context input by the input layer, encoding each sentence separately to obtain its sentence coding result. The sentence coding result is the splicing of the network output vectors of the first node and the last node obtained after the BiLSTM network encodes the word vectors of the participles in the corresponding sentence. Here, the sentence coding result corresponding to each sentence is defined as the first sentence representation result of that sentence.
By way of example, as shown in Fig. 2, for the first sentence u_1 in the target context, after the shared BiLSTM in the word-level coding layer of the reply generation model encodes the word vectors contained in u_1, the network output vectors corresponding to the first node and the last node of the network, v_{1,1} and v_{1,m_1}, can be obtained. Splicing the two then gives the sentence coding result h_1 of sentence u_1; that is, the first sentence representation result h_1 of the first sentence u_1 in the target context is obtained.
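The splicing described above can be sketched as follows. This is a minimal numpy illustration, not the patent's implementation: the `token_outputs` matrix stands in for the per-token outputs of the shared BiLSTM, and the toy sizes are arbitrary.

```python
import numpy as np

def sentence_representation(token_outputs):
    # Splice ("concatenate") the network output vectors of the first and
    # last nodes to form the sentence coding result, as described for the
    # word-level coding layer.
    first = token_outputs[0]    # output at the first token, e.g. v_{1,1}
    last = token_outputs[-1]    # output at the last token
    return np.concatenate([first, last])

# Toy stand-in for BiLSTM outputs: 5 tokens, output size 4.
outputs = np.random.randn(5, 4)
h1 = sentence_representation(outputs)   # h_1: length 4 + 4 = 8
```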
Next, the sentence-level coding layer of the reply generation model may be used to further encode the first sentence representation results of the sentences in the target context, so as to focus on the information related to the target question in each first sentence representation result; the target context coding vector can then be obtained as the context representation result of the target context.
Specifically, as shown in Fig. 2, the sentence-level coding layer of the reply generation model may perform attention calculation between the first sentence representation results (h_1, h_2, …, h_m) of the sentences in the target context and the first sentence representation result h_m of the target question (i.e., the last sentence u_m in the target context), so as to focus on the information related to the target question in each first sentence representation result. Information unrelated to the target question can thus be removed, and only the attended information is encoded. In the encoding process, the weight of each sentence in the target context may first be calculated, with the specific calculation formulas as follows:
e_i = V^T · tanh(W·h_i + U·h_m)  (4)

α_i = exp(e_i) / Σ_{j=1}^{m} exp(e_j)  (5)

wherein V, W, and U each represent model parameters obtained by training the reply generation model; e_i represents the correlation between the i-th sentence in the target context and the target question (i.e., the m-th sentence in the target context); h_i represents the first sentence representation result of the i-th sentence in the target context; h_m represents the first sentence representation result of the target question; and α_i denotes the weight of the i-th sentence in the target context obtained after e_i is normalized by the softmax function. The value of the weight represents how highly the i-th sentence in the target context correlates with the target question: the larger the value of the weight, the higher the correlation between the i-th sentence and the target question, and conversely, the smaller the value of the weight, the lower the correlation.
After the weight of each sentence in the target context is calculated through the above formulas (4) and (5), the context representation result of the target context can be further calculated according to the weight value corresponding to each sentence, with the specific calculation formula as follows:

r_C = Σ_{i=1}^{m} α_i · h_i  (6)

wherein r_C represents the context representation result of the target context (the target context coding vector); α_i represents the weight of the i-th sentence in the target context; and h_i represents the first sentence representation result of the i-th sentence in the target context.
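The attention weighting of formulas (4) and (5) and the subsequent weighted-sum context representation can be sketched with numpy as follows. This is an illustrative toy: the random matrices stand in for the trained parameters V, W, U and for the first sentence representation results h_1, …, h_m.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(H, h_m, V, W, U):
    # Formula (4): score each first-sentence representation h_i against the
    # target-question representation h_m.
    e = np.array([V @ np.tanh(W @ h_i + U @ h_m) for h_i in H])
    # Formula (5): softmax normalization yields the per-sentence weights.
    alpha = softmax(e)
    # Weighted sum over the representations gives the context representation.
    return alpha @ H, alpha

d = 4
H = np.random.randn(3, d)                      # h_1, h_2, h_3 (h_3 plays h_m)
V = np.random.randn(d)
W, U = np.random.randn(d, d), np.random.randn(d, d)
context, alpha = attend(H, H[-1], V, W, U)
```

The weights always sum to 1, so the context representation is a convex combination of the sentence representations.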
Step B2: second sentence representation results of respective sentences in the character image information are generated, and an image representation result of the character image information is obtained by focusing on information related to the target question in the respective second sentence representation results.
As shown in Fig. 2, first, the word vectors of the words contained in the K sentences p_1, p_2, …, p_K of the person image information may be input to the input layer of the reply generation model as input data. For example, suppose p_1 is "I like … sports" and p_1 contains k_1 participles; then the word vectors corresponding to these participles may be input in sequence to the input layer of the reply generation model as input data. By analogy, the word vectors corresponding to the participles contained in p_i ("I have seen the … movie"), …, p_K (the martial-arts novel "… Demi-Gods and Semi-Devils") may in turn be input to the input layer of the reply generation model as input data.
Then, the shared bidirectional Long Short-Term Memory network (BiLSTM) contained in the word-level coding layer of the reply generation model is used to encode the word vectors contained in each sentence of the person image information input by the input layer, encoding each sentence separately to obtain its sentence coding result. The sentence coding result is the splicing of the network output vectors of the first node and the last node obtained after the BiLSTM network encodes the word vectors of the participles in the corresponding sentence. Here, the sentence coding result corresponding to each sentence is defined as the second sentence representation result of that sentence.
By way of example, as shown in Fig. 2, for the first sentence p_1 in the person image information, after the shared BiLSTM in the word-level coding layer of the reply generation model processes the word vectors contained in p_1, the network output vectors corresponding to the first node and the last node of the network, v'_{1,1} and v'_{1,k_1}, can be obtained. Splicing the two then gives the sentence coding result l_1 of sentence p_1; the second sentence representation result l_1 of the first sentence p_1 in the person image information is thus obtained.
Then, the sentence-level coding layer of the reply generation model may be used to further encode the second sentence representation results of the sentences in the person image information, so as to focus on the information related to the target question in each second sentence representation result; the person image information coding vector can then be obtained as the image representation result of the person image information.
Specifically, as shown in Fig. 2, the sentence-level coding layer of the reply generation model may perform attention calculation between the second sentence representation results (l_1, l_2, …, l_K) of the sentences in the person image information and the first sentence representation result h_m of the target question (i.e., the last sentence u_m in the target context), so as to focus on the information related to the target question in each second sentence representation result. Information unrelated to the target question can thus be removed, and only the attended information is encoded. In the encoding process, the weight of each sentence in the person image information may first be calculated, with the specific calculation formulas as follows:
e'_j = V_1^T · tanh(W'·l_j + U'·h_m)  (7)

α'_j = exp(e'_j) / Σ_{i=1}^{K} exp(e'_i)  (8)

wherein V_1, W', and U' all represent model parameters obtained by training the reply generation model; e'_j represents the correlation between the j-th sentence in the person image information and the target question (i.e., the m-th sentence in the target context); l_j represents the second sentence representation result of the j-th sentence in the person image information; h_m represents the first sentence representation result of the target question; and α'_j denotes the weight of the j-th sentence in the person image information obtained after e'_j is normalized by the softmax function. The value of the weight represents the correlation between the j-th sentence in the person image information and the target question: the larger the value of the weight, the higher the correlation, and conversely, the smaller the value of the weight, the lower the correlation.
After the weight of each sentence in the person image information is calculated through the above formulas (7) and (8), the image representation result of the person image information can be further calculated according to the weight value corresponding to each sentence, with the specific calculation formula as follows:

r_P = Σ_{j=1}^{K} α'_j · l_j  (9)

wherein r_P represents the image representation result of the person image information (the person image information coding vector); α'_j represents the weight of the j-th sentence in the person image information; and l_j represents the second sentence representation result of the j-th sentence in the person image information.
S302: and generating at least one reply to be selected of the target question according to the extracted related information.
In step S301, information related to the target question is extracted from the target context and the person image information, i.e., the target context coding vector and the person image information coding vector are acquired. Then, as shown in Fig. 2, the unidirectional Long Short-Term Memory network (LSTM) contained in the reply generation layer of the reply generation model may first be used to decode the target context coding vector and the person image information coding vector, so as to obtain the internal decoding state generated by the reply generation layer at each generation time. The specific calculation formula is as follows:

s_t = LSTM(s_{t-1}, y_{t-1}, [r_C; r_P])  (10)

wherein s_t represents the internal decoding state generated by the reply generation layer at the t-th generation time; s_{t-1} represents the internal decoding state generated by the reply generation layer at the (t-1)-th generation time; y_{t-1} represents the word vector corresponding to the word generated by the output layer at the (t-1)-th generation time (i.e., the (t-1)-th word in the reply to be selected output by the output layer); and [r_C; r_P] represents the splicing of the target context coding vector r_C and the person image information coding vector r_P.
Note that, in order to obtain the internal decoding state s_1 generated by the reply generation layer at the 1st generation time, the word vector y_0 may be set to a special symbol, and the internal decoding state s_0 of the reply generation layer at the 0th generation time may be set to the first sentence representation result h_m of the target question, so that the subsequent decoding steps can be performed.
Then, the output layer of the reply generation model may be used to obtain, from the internal decoding state generated by the reply generation layer at each generation time, the word probability distribution corresponding to that generation time, and further obtain the word at each generation time from the corresponding word probability distribution; these words constitute the reply to be selected in their order of generation.
Specifically, the output layer of the reply generation model may perform probability calculation over a pre-constructed candidate dictionary according to the internal decoding state s_t generated by the reply generation layer at the t-th generation time. When the candidate dictionary is constructed, the vocabulary frequently used in daily replies may be sorted by frequency of occurrence, the words with low frequency removed, and the remaining L high-frequency words used to form the candidate dictionary. Thus, using the candidate dictionary, the output layer can obtain from the internal decoding state s_t a probability distribution of length L corresponding to the t-th generation time; that is, the distribution contains the probability that the word generated at the t-th generation time is each of the L different words in the candidate dictionary. Further, in order to improve the accuracy of the reply generation result, the word corresponding to the maximum probability value may be selected as the word generated at the t-th generation time. The specific calculation formula of the word probability distribution generated at the t-th generation time is as follows:
z_t = softmax(s_t)  (11)

wherein z_t represents the probability distribution of the word generated at the t-th generation time, and s_t represents the internal decoding state generated by the reply generation layer at the t-th generation time.
It should be noted that, in the actual decoding process of the model, in order to improve the working efficiency of the model and reduce redundant computation as much as possible, a special symbol such as &lt;end&gt; may be preset as an identifier for stopping decoding: when the model generates this identifier during decoding, the decoding operation stops. Otherwise, if no such special symbol is set, decoding stops only when the preset maximum reply length is reached.
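The stop-symbol decoding described above can be sketched as a greedy loop. This is a hypothetical toy: `step` stands in for the LSTM state update of the reply generation layer, and the tiny vocabulary and deterministic state sequence are invented for illustration only.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def greedy_decode(step, s0, y0, vocab, end_token="<end>", max_len=20):
    # At each generation time: update the internal decoding state, turn it
    # into a word probability distribution (formula (11)), emit the
    # highest-probability word, and stop at <end> or at the maximum length.
    s, y, words = s0, y0, []
    for _ in range(max_len):
        s = step(s, y)
        z = softmax(s)
        w = vocab[int(np.argmax(z))]
        if w == end_token:
            break
        words.append(w)
        y = w
    return words

vocab = ["hello", "world", "<end>"]
states = [np.array([5.0, 0.0, 0.0]),   # strongly favors "hello"
          np.array([0.0, 5.0, 0.0]),   # strongly favors "world"
          np.array([0.0, 0.0, 5.0])]   # strongly favors "<end>"
clock = {"t": 0}

def step(s, y):
    out = states[min(clock["t"], len(states) - 1)]
    clock["t"] += 1
    return out

reply = greedy_decode(step, np.zeros(3), "<s>", vocab)  # → ["hello", "world"]
```

Decoding here stops at the third step because the state favors the `<end>` symbol, so only two words are emitted.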
As introduced above, when the reply generation model is used to encode and decode the target context and the person image information of the questioner, the individual characteristics of the questioner are fully considered: information irrelevant to the target question is removed in the encoding and decoding process, and only the attended information relevant to the target question is decoded. A reply with higher accuracy that better meets the requirements of the questioner can then be generated according to the decoding result.
It should be noted that, in order to further improve the accuracy of the reply generation result, so that the reply can meet the dialog requirement of the questioner, in an alternative implementation manner, the present embodiment may further generate at least one alternative reply to the target question through the following steps S401 to S403:
s401: and respectively determining probability distribution corresponding to the t-th generation moment by using N reply generation models, wherein N is more than or equal to 1.
In this implementation, in order to improve the accuracy of the reply result, a plurality of reply generation models may be trained in advance, and each reply generation model may be used to determine a probability distribution corresponding to the t-th generation time. For example, N reply generation models may be trained in advance to obtain N probability distributions corresponding to the t-th generation time.
The probability distribution includes the probability that the word generated at the t-th generation time is each candidate word in the candidate dictionary. The higher the probability for a certain candidate word, the more likely it is that the word generated at the t-th generation time is that candidate word; conversely, the lower the probability, the less likely it is.
By way of example, 5 reply generation models can be trained in advance, and the output layers of the 5 reply generation models can then be used to determine, through formula (11), the 5 probability distributions corresponding to the t-th generation time. Each probability distribution contains the probability that the word generated at the t-th generation time is each candidate word in the candidate dictionary.
In this embodiment, when the N reply generation models are used to determine the probability distribution corresponding to the t-th generation time, the candidate dictionaries that are used for the determination are the same.
S402: and according to the probability distribution determined by the N reply generation models respectively, selecting M candidate words from the candidate word dictionary as the words generated at the t-th generation moment, wherein M is more than or equal to 1.
After the N reply generation models respectively determine the probability distributions corresponding to the t-th generation time in step S401, i.e., after the probability of the word generated at the t-th generation time being each candidate word in the candidate dictionary is determined by each of the N models, it can be established that the N probability distributions have the same dimensions, since the same candidate dictionary is used, and that the probability value in each dimension corresponds to the probability that the word generated at the t-th generation time is a particular candidate word in the candidate dictionary. Therefore, the N probability distributions determined at this generation time can be averaged to obtain the average probability distribution corresponding to this generation time, which includes the average probability of the word generated at the t-th generation time being each candidate word in the candidate dictionary. That is, for each candidate word in the candidate dictionary, the N probability values of that candidate word are averaged to obtain its average probability, so that when the candidate dictionary includes L words, the average probability distribution includes the L average probabilities corresponding to the L words.

Further, after the average probability of the word generated at the t-th generation time being each candidate word in the candidate dictionary is determined, the M candidate words corresponding to the M highest average probabilities may be selected from the candidate dictionary as the words generated at the t-th generation time, wherein M is greater than or equal to 1. For example, if the value of M is 2, the 2 candidate words corresponding to the 2 highest average probabilities may be selected from the average probability distribution as the words generated at the t-th generation time.
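Steps S401-S402 can be sketched as follows: average the N per-model distributions over the shared candidate dictionary and keep the M candidate words with the highest average probability. The two toy distributions and the tiny vocabulary are invented for illustration.

```python
import numpy as np

def top_m_candidates(distributions, vocab, M=2):
    # All N distributions share the candidate dictionary, so they have the
    # same dimensions; averaging them per dimension gives the average
    # probability of each candidate word at this generation time.
    avg = np.mean(distributions, axis=0)
    # Select the M candidate words with the highest average probability.
    top = np.argsort(avg)[::-1][:M]
    return [vocab[i] for i in top]

vocab = ["yes", "no", "maybe", "ok"]
d1 = np.array([0.2, 0.6, 0.1, 0.1])   # model 1's distribution at time t
d2 = np.array([0.4, 0.4, 0.1, 0.1])   # model 2's distribution at time t
chosen = top_m_candidates([d1, d2], vocab, M=2)  # → ["no", "yes"]
```

Here the averaged distribution is [0.3, 0.5, 0.1, 0.1], so "no" and "yes" are kept as the two words for this generation time.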
S403: and combining the words generated at each generation moment to obtain at least one reply to be selected.
After M candidate words are selected from the candidate dictionary as the words generated at the t-th generation time through step S402, the words generated at each generation time may be combined to obtain at least one reply to be selected. Specifically, if the total number of generation times experienced in the process of generating the reply to be selected is n, and M candidate words are selected at each generation time, then combining the words generated at each generation time yields M^n replies to be selected; that is, M^n replies to be selected can be obtained.
By way of example, assume that 3 generation times are experienced in the process of generating a reply, and 2 candidate words are selected at each generation time; specifically, the candidate words selected at the 1st generation time are A and B, those at the 2nd generation time are C and D, and those at the 3rd generation time are E and F. Then, after the words generated at each generation time are combined, the number of replies to be selected is 2^3 = 8, and these 8 candidate replies are specifically: ACE, ACF, ADE, ADF, BCE, BCF, BDE, BDF.
It should be noted that, in the above contents, the number M of candidate words selected at each generation time is the same, but in this embodiment, the number M of candidate words selected at each generation time may also be different, and the combination manner between the candidate words is similar to that described above, and is not described here again.
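The combination of step S403 is a Cartesian product of the per-time candidate lists, which also covers the case where the number of candidates differs between generation times; a minimal sketch:

```python
from itertools import product

def combine(candidates_per_time):
    # One reply per element of the Cartesian product: with M candidates at
    # each of n generation times this yields M**n replies to be selected.
    return ["".join(words) for words in product(*candidates_per_time)]

replies = combine([["A", "B"], ["C", "D"], ["E", "F"]])
# 2**3 = 8 replies: ACE, ACF, ADE, ADF, BCE, BCF, BDE, BDF
```

Passing lists of different lengths, e.g. `[["A", "B"], ["C"], ["E", "F"]]`, handles the unequal-M case without changing the function.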
Next, the present embodiment will briefly describe the construction process of the reply generative model. Specifically, the method may comprise the following steps C1-C3:
step C1: model training data is formed.
In this embodiment, in order to construct the reply generation model, a large amount of candidate image information (for constructing an image information base, the related contents of which are shown in the first embodiment), a large amount of human dialog contexts, and the actual reply contents corresponding to the last question in each human dialog context need to be collected in advance, and these collected data are used as model training data.
Step C2: and constructing a reply generation model.
An initial reply generation model may be constructed and model parameters initialized.
It should be noted that the execution sequence of step C1 and step C2 is not limited in this embodiment.
Step C3: and training the reply generation model by using the pre-collected model training data.
In this embodiment, after the model training data is collected in step C1, the reply generative model constructed in step C2 may be trained by using the model training data, and the reply generative model may be obtained by training through multiple rounds of model training until the training end condition is satisfied.
Specifically, in each round of training, a group of human dialog contexts is selected from the model training data (with its last question defined as the sample question). The target context in the above embodiment is then replaced with this human dialog context, the person image information corresponding to the human dialog context is generated, and the sample reply corresponding to the sample question is predicted, in the manner described in the above embodiment. Then, the parameters of the reply generation model are updated according to the difference between the predicted sample reply content and the actual reply content corresponding to the human dialog context, thereby completing the current round of training of the reply generation model.
In this way, in the embodiment, the pre-constructed reply generation model is used to perform encoding and decoding processing on the character image information and the target context so as to extract information related to the target question from the target context and the character image information of the questioner and remove information unrelated to the target question, so that each reply to be selected, which has higher accuracy and better meets the requirement of the questioner, can be generated according to a decoding result obtained by decoding the extracted information related to the target question.
In summary, the embodiment generates at least one reply to be selected of the target question by using the person image information of the questioner and the target question. Therefore, when the to-be-selected reply of the target question is generated, the portrait information of the questioner who submits the target question is considered, so that the generated to-be-selected reply content is more relevant to the personality characteristics of the questioner, the final reply of the target question can be selected from the to-be-selected replies, the dialogue requirements of the questioner are met, and the reasonability of the reply generation result is improved.
Third embodiment
The present embodiment will describe a specific implementation process of "obtaining at least one reply to be selected for a target question from a pre-constructed dialog corpus" in the second implementation manner of step S103 in the first embodiment.
Referring to fig. 5, a schematic diagram of a process for obtaining at least one candidate reply of a target question from a pre-constructed corpus of dialogues according to the embodiment is shown, where the process includes the following steps:
s501: and acquiring a target context, wherein the target context comprises a target question asked by a questioner and historical dialogue texts before the target question.
In this embodiment, the target context may include the target question and all the historical dialog texts before the target question, or may include a part of the historical dialog texts, counted backward from the target question and including it. It is assumed that the target context includes m sentences, defined in chronological order from front to back as u_1, u_2, …, u_m, where u_m refers to the text content corresponding to the target question.
S502: and acquiring various groups of linguistic data context which are similar to the target context in semanteme, wherein the linguistic data context comprises a question linguistic data and historical dialogue upper text before the question linguistic data.
In this embodiment, after the target context is obtained in step S501, in order to obtain to-be-selected replies that can satisfy the dialogue requirement of the questioner, each group of corpus contexts semantically close to the target context is first obtained based on the semantic information of the target context. Each obtained corpus context includes a question corpus and the historical dialogue text before the question corpus. The corpus contexts may be collected in advance: multiple rounds of human-machine dialogue may be used as one group of corpus contexts, or multiple rounds of human-human dialogue may be used as one group, and the question corpus in a corpus context is a user question.
It should be noted that the corpus context may include the question corpus and all historical dialogue texts before the question corpus, or the corpus context may include the question corpus and only a part of the historical dialogue texts before it. Assume that a corpus context includes n sentences, defined in chronological order as v1, v2, ..., vn, where vn is the text content corresponding to the question corpus.
Next, the present embodiment will describe a specific process of "acquiring each set of corpus contexts semantically close to the target context" in step S502 through the following steps D-E.
Step D: and searching each group of contexts relevant to the target context from a pre-constructed dialogue corpus.
In this embodiment, after the target context is obtained in step S501, a text search method may be used to search each group of contexts related to the target context from the pre-constructed dialogue corpus; for example, a distributed search engine (Elasticsearch) or a full-text search server (Solr) may be used for this search. Then, through the subsequent step E, the groups of contexts semantically similar to the target context are further screened out from the searched groups of contexts.
The dialogue corpus may store a plurality of groups of contexts together with the reply corpus corresponding to the question corpus in each group (i.e., the last question in each group of contexts). The contexts and their corresponding reply corpora may be obtained by collecting people's daily dialogues and processing them for sensitive information. Specifically, when constructing the dialogue corpus, a large amount of daily dialogue data may be collected, for example real dialogue data of people on social network platforms (e.g., microblogs, post bars, etc.). Some sensitive data in the dialogue texts (e.g., telephone numbers or identification card numbers) may then be deleted or replaced. Each group of dialogue data after sensitive-information processing may then be used directly as a group of contexts, or a group of contexts may be extracted from a partially continuous segment of the dialogue data, with the last sentence in each group being a user question. The groups of contexts and the reply corpora of their user questions are then stored in the dialogue corpus.
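The sensitive-information step above can be sketched with simple pattern replacement. This is a minimal illustration, not the method's actual rules: the patterns (an 11-digit mobile number, an 18-character ID number) and the placeholder strings are assumptions, since the text only says such data "may be deleted or replaced".

```python
import re

# Hypothetical masking rules; the actual patterns are not specified in the text.
SENSITIVE_PATTERNS = [
    (re.compile(r"\b\d{11}\b"), "<PHONE>"),     # assumed: 11-digit mobile number
    (re.compile(r"\b\d{17}[\dXx]\b"), "<ID>"),  # assumed: 18-character ID number
]

def mask_sensitive(text: str) -> str:
    """Replace sensitive substrings before storing a dialogue in the corpus."""
    for pattern, placeholder in SENSITIVE_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

Each dialogue turn would be passed through `mask_sensitive` before being stored as part of a context group.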
In addition, when the dialogue corpus is constructed, besides the groups of contexts and corresponding reply corpora obtained by processing existing real dialogue data, other dialogue data may be simulated from the real dialogue data, and groups of contexts and their corresponding reply corpora may be obtained from that data to enrich the dialogue corpus.
Similarly, each group of contexts may be a text of a first preset length, and the reply corpus corresponding to each group of contexts may be a text of a second preset length, where the first preset length is usually greater than the second preset length.
It should be noted that, when searching for each group of contexts related to the target context from the dialogue corpus, for convenience of description, each context in the dialogue corpus may be defined as a sample context, and a correlation value between each sample context and the target context may be calculated, where the correlation value measures how relevant the corresponding sample context is to the target context: the larger the correlation value, the stronger the correlation between the two. Then, the correlation values are sorted from large to small, and the sample contexts corresponding to the first preset number of correlation values are selected as the groups of contexts semantically related to the target context; or the correlation values are sorted from small to large, and the sample contexts corresponding to the last preset number of correlation values are selected.
In addition, in order to search out as many groups of contexts semantically related to the target context as possible, the preset number may be set to a relatively large value when the system computation amount allows, for example 1000; that is, 1000 groups of contexts related to the target context may be selected, so that the groups of contexts semantically similar to the target context can be further obtained from these 1000 groups through the subsequent step E.
Step E: and screening out each group of contexts similar to the target context in semanteme from the searched each group of contexts to be used as each group of corpus contexts.
In this embodiment, after each group of contexts related to the target context is searched out from the pre-constructed dialog corpus through step D, the semantic similarity between each searched group of contexts and the target context may be calculated, and each group of contexts semantically similar to the target context is screened out according to the calculation result, that is, each group of contexts semantically similar to the target context is screened out as each group of corpus contexts.
Therefore, after each group of contexts related to the target context is searched, each group of contexts which are similar to the target context in semantics can be screened out to be used as a basis for acquiring the reply to be selected of the target problem, so that the acquired content of the reply to be selected is semantically related to the content of the target context, and the conversation requirement of a questioner is further met.
Next, the present embodiment will describe a specific process of "screening out groups of contexts semantically close to the target context from the searched groups of contexts" in step E through steps E1 to E3 described below.
Step E1: each set of searched contexts is defined as a search context.
In the present embodiment, for convenience of description, each group of contexts searched from the corpus of dialogues that is related to a target context is defined as a search context.
Step E2: and generating context characteristics corresponding to the search context.
In this embodiment, for each group of search contexts, in order to determine whether the search context is semantically similar to the target context, first, a context feature corresponding to the search context may be generated.
The context features corresponding to the search context comprise co-occurrence features and/or semantic features, the co-occurrence features represent the importance of co-occurrence words in the search context and the target context, and the semantic features represent the semantic similarity of the search context and the target context.
It should be noted that, in this embodiment, how to generate the context features corresponding to the search contexts is based on a certain group of search contexts in all search contexts, and the processing manners of other groups of search contexts are similar to the above, which is not described herein again.
The co-occurrence characteristics are described below.
One way to generate the co-occurrence features is: firstly, utilizing a word segmentation method to segment words of a search context to obtain each word contained in the search context, then deleting stop words which are contained in the search context and have no definite meaning, and then calculating the weight of the remaining words in the search context, wherein the larger the weight is, the higher the importance of the corresponding word in the search context is.
Similarly, the word segmentation method may be used to segment the target context to obtain each word included in it, after which stop words having no definite meaning, such as "of" and "in", are deleted, and the weight of each remaining word in the target context is calculated; the larger the weight value, the higher the importance of the corresponding word in the target context.
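The segment/remove-stop-words/weight pipeline just described can be sketched as follows. The text does not say which weighting scheme is used, so TF-IDF is an assumption here; the stop-word set, the function name `word_weights`, and its parameters are likewise illustrative.

```python
import math
from collections import Counter

STOP_WORDS = {"of", "in"}  # the stop-word examples given in the text

def word_weights(tokens, doc_freq, n_docs):
    """Weight the words remaining after stop-word removal.
    TF-IDF is assumed; the text only says 'calculate the weight'."""
    kept = [t for t in tokens if t not in STOP_WORDS]
    tf = Counter(kept)
    return {t: (tf[t] / len(kept)) * math.log(n_docs / (1 + doc_freq.get(t, 0)))
            for t in tf}
```

Applied to both the search context and the target context, this yields the per-word weights used by the co-occurrence feature below.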
In this embodiment, the word that both the search context and the target context have is defined as a co-occurrence word, which may be composed of at least one word.
Furthermore, the weights of the co-occurring words may be summed separately in the search context and in the target context, the two sums may be combined by a harmonic mean, and the result may be used as the co-occurrence feature corresponding to the search context, representing the importance of all the co-occurring words contained in the search context and the target context. The specific calculation formulas are as follows:

fs = Σ(i=1..N) wi(s)   (12)

fh = Σ(i=1..N) wi(h)   (13)

fw = 2·fs·fh/(fs + fh)   (14)

wherein wi(s) represents the weight of the ith co-occurrence word in the search context, and the greater the weight value, the higher the importance of the ith co-occurrence word in the search context; N represents the total number of co-occurring words in the search context and the target context; fs represents the total weight, in the search context, of all words co-occurring in the search context and the target context, and the greater the value, the higher the importance of all co-occurring words in the search context; wi(h) represents the weight of the ith co-occurrence word in the target context, and the greater the weight value, the higher the importance of the ith co-occurrence word in the target context; fh represents the total weight, in the target context, of all co-occurring words, and the greater the value, the higher the importance of the co-occurring words in the target context; fw is the co-occurrence feature corresponding to the search context, obtained as the harmonic mean of fs and fh, and the larger the value of fw, the higher the importance of all co-occurring words in the search context and the target context.
For example, assume that the words obtained after word segmentation and other preprocessing of the search context are "you", "now", "good" and "do", with weights in the search context calculated as 0.2, 0.3, 0.4 and 0.1 respectively, and that the words obtained after the same preprocessing of the target context are "you", "true", "good" and "how", where the weight of "you" in the target context is calculated as 0.2 and the weight of "good" as 0.3.
It can be seen that the co-occurring words in the search context and the target context are "you" and "good". Using formula (12), the total weight of the two co-occurring words in the search context is 0.2 + 0.4 = 0.6; using formula (13), their total weight in the target context is 0.2 + 0.3 = 0.5; and using formula (14), the co-occurrence feature corresponding to the search context is fw = 2 × 0.6 × 0.5/(0.6 + 0.5) ≈ 0.545, which characterizes the importance of all co-occurring words contained in the search context and the target context.
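Formulas (12)-(14) and the worked example above can be sketched in a few lines. The per-word weights are assumed to have been computed beforehand (e.g., as in the weighting step described earlier); the function name is illustrative.

```python
def cooccurrence_feature(search_weights, target_weights):
    """Formulas (12)-(14): harmonic mean of the co-occurring words'
    total weights in the search context and in the target context."""
    shared = set(search_weights) & set(target_weights)  # co-occurring words
    if not shared:
        return 0.0
    f_s = sum(search_weights[w] for w in shared)   # formula (12)
    f_h = sum(target_weights[w] for w in shared)   # formula (13)
    return 2 * f_s * f_h / (f_s + f_h)             # formula (14)
```

With the example weights (search context: "you" 0.2, "now" 0.3, "good" 0.4, "do" 0.1; target context: "you" 0.2, "good" 0.3), the function reproduces the value ≈ 0.545 computed above.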
The co-occurrence features corresponding to the search context are introduced above, and the semantic features corresponding to the search context are introduced below, specifically, the generation manner of the semantic features may include the following steps (1) to (3):
step (1): and generating a semantic representation result of the target context.
In this embodiment, after segmenting the target context to obtain each word included in it, a vector generation method may be used to generate the word vector corresponding to each word in the target context; for example, the word vectors may be obtained by querying a semantic dictionary. Then, the semantic representation result of the target context may be generated from the word vectors and the weights of the words in the target context, with the specific calculation formula as follows:

S = Σ(i=1..M) wi·Ei   (15)

wherein S represents the semantic representation result of the target context; M represents the total number of words contained in the target context; Ei represents the word vector corresponding to the ith word in the target context; and wi represents the weight of the ith word in the target context, where the greater the weight value, the higher the importance of the ith word in the target context.
Step (2): semantic representation results of the search context are generated.
In this embodiment, after segmenting the search context to obtain each word included in it, a vector generation method may be used to generate the word vector corresponding to each word in the search context; for example, the word vectors may be obtained by querying a semantic dictionary. Then, the semantic representation result of the search context may be generated from the word vectors and the weights of the words in the search context, with the specific calculation formula as follows:

H = Σ(j=1..M') γj·E'j   (16)

wherein H represents the semantic representation result of the search context; M' represents the total number of words contained in the search context; E'j represents the word vector corresponding to the jth word in the search context; and γj represents the weight of the jth word in the search context, where the greater the weight value, the higher the importance of the jth word in the search context.
It should be noted that, the execution order of steps (1) and (2) is not limited in the embodiments of the present application.
And (3): and generating semantic features corresponding to the search context according to the generated semantic representation result.
In this embodiment, after the semantic representation result S of the target context is generated through the step (1), and the semantic representation result H of the search context is generated through the step (2), the cosine distance between the semantic representation result of the target context and the semantic representation result of the search context may be calculated to obtain the semantic similarity between the search context and the target context, which is used as the semantic feature corresponding to the search context, and the specific calculation formula is as follows:
fm=cosine(S,H) (17)
wherein fm is the semantic feature corresponding to the search context, characterizing the semantic similarity between the search context and the target context; the value of fm is also used to measure the semantic distance between the search context and the target context; cosine denotes the cosine similarity calculation; S represents the semantic representation result of the target context; and H represents the semantic representation result of the search context.
It can be understood that the larger the value of fm in formula (17), the smaller the semantic distance between the search context and the target context, that is, the higher the semantic similarity between them.
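Formulas (15)-(17) can be sketched as follows. The toy two-dimensional word vectors stand in for the semantic-dictionary lookups described above; the function names are illustrative.

```python
import math

def semantic_representation(word_vectors, weights):
    """Formulas (15)/(16): weight each word vector and sum the results."""
    dim = len(next(iter(word_vectors.values())))
    rep = [0.0] * dim
    for word, vec in word_vectors.items():
        w = weights[word]
        for k in range(dim):
            rep[k] += w * vec[k]
    return rep

def cosine(a, b):
    """Formula (17): cosine similarity between two semantic representations."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)
```

Computing `cosine(S, H)` for the target-context representation S and the search-context representation H then yields the semantic feature fm.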
Step E3: and screening out each group of contexts similar to the target context in semanteme from each searched group of contexts according to the context characteristics corresponding to each searched group of contexts.
After the context features corresponding to each group of search contexts are generated through step E2, that is, after the co-occurrence features and/or semantic features corresponding to each group of search contexts are generated, the co-occurrence features and/or semantic features may be summed according to their respective weights, and the computation result is used to represent how close the semantics of the corresponding search context are to the target context, where the specific computation formula is as follows:
f = ww·fw + wm·fm   (18)

wherein f represents the semantic proximity value between the corresponding search context and the target context; ww represents the weight of the co-occurrence feature fw corresponding to the search context, where the larger ww is, the greater the importance of fw, and ww can be adjusted according to experimental results; wm represents the weight of the semantic feature fm corresponding to the search context, where the larger wm is, the greater the importance of fm, and wm can also be adjusted according to experimental results.
Specifically, after the semantic proximity value between each group of search contexts and the target context is calculated using formula (18), the proximity values may be sorted from large to small and the search contexts corresponding to the first preset number of values (or to values within a preset numerical range) selected as the groups of contexts semantically similar to the target context; or the proximity values may be sorted from small to large and the search contexts corresponding to the last preset number of values selected; or all search contexts whose proximity values are higher than a preset threshold may be selected as the groups of contexts semantically similar to the target context.
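Formula (18) and the top-N selection just described can be sketched as follows. The weight values used here are placeholders (the text says ww and wm are tuned experimentally), and the function names are illustrative.

```python
def semantic_proximity(f_w, f_m, w_w=0.5, w_m=0.5):
    """Formula (18): weighted sum of the co-occurrence and semantic features.
    The default weights are placeholders, not the method's tuned values."""
    return w_w * f_w + w_m * f_m

def top_k_contexts(scored_contexts, k):
    """Sort (context, proximity) pairs descending and keep the first k groups."""
    return sorted(scored_contexts, key=lambda item: item[1], reverse=True)[:k]
```

Each search context is scored with `semantic_proximity` and the highest-scoring groups are retained as the corpus contexts.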
Alternatively, each group of contexts that are semantically similar to the target context may be screened from the searched groups of contexts only according to the co-occurrence features corresponding to each group of search contexts or only according to the semantic features corresponding to each group of search contexts.
Specifically, in an alternative implementation, after the co-occurrence feature value fw corresponding to each group of search contexts is calculated using formula (14), the co-occurrence feature values may be sorted from large to small and the search contexts corresponding to the first preset number of values (or to values within a preset numerical range) selected as the groups of contexts semantically similar to the target context; or the co-occurrence feature values may be sorted from small to large and the search contexts corresponding to the last preset number of values selected; or all search contexts whose co-occurrence feature values are higher than a preset threshold may be selected as the groups of contexts semantically similar to the target context.
In another alternative implementation, after the semantic feature value fm corresponding to each group of search contexts is calculated using formula (17), the semantic feature values may be sorted from large to small and the search contexts corresponding to the first preset number of values (or to values within a preset numerical range) selected as the groups of contexts semantically similar to the target context; or the semantic feature values may be sorted from small to large and the search contexts corresponding to the last preset number of values selected; or all search contexts whose semantic feature values are higher than a preset threshold may be selected as the groups of contexts semantically similar to the target context.
It should be noted that, by using the co-occurrence words shared by the search contexts and the target context, the groups of contexts semantically close to the target context can be screened out more accurately from the groups of contexts searched out of the dialogue corpus, and used as the groups of corpus contexts.
S503: acquiring the reply corpora corresponding to the question corpora in each group of corpus contexts.
In this embodiment, after obtaining each group of corpus contexts semantically similar to the target context through step S502, the reply corpus corresponding to the question corpus in each group of corpus contexts may be obtained from the pre-constructed dialog corpus.
S504: selecting at least one reply corpus as at least one to-be-selected reply of the target question.
In this embodiment, after the reply corpora corresponding to the question corpora in each corpus context are obtained in step S503, the semantic correlation degree between each reply corpus and the target context may be calculated, and then at least one reply corpus may be selected from the calculation results to serve as at least one to-be-selected reply to the target question.
It should be noted that, because the answer to a question should be semantically highly relevant to the question and even to the context of the question, it should be ensured that the reply is a reasonable one that answers the key content of the question, rather than a high-frequency reply that is only weakly semantically related to the question, such as "I don't know".
Based on this, in this embodiment, an optional implementation of "selecting at least one reply corpus" in step S504 may include: selecting at least one reply corpus by analyzing the correlation between the target context and each reply corpus.
In this implementation manner, the existing or future semantic relevance calculation method may be used to calculate the relevance between the target context and each reply corpus, and then at least one reply corpus is selected from the calculated relevance, for example, the relevance between the target context and each reply corpus may be determined by using a pre-established relevance model or directly using a relevance calculation method, and then at least one reply corpus is selected according to the relevance determination result.
It should be noted that, in the following, how to determine the correlation between the target context and a certain reply corpus by using a pre-constructed correlation model will be described with reference to a certain reply corpus of all the reply corpuses acquired in step S503, and the processing manners of other reply corpuses are similar to that, and are not described in detail.
Specifically, the pre-constructed correlation model of the present embodiment may be formed by a multi-layer network, as shown in fig. 6, and the model structure includes an input layer, an embedding layer, a representation layer, and a matching layer.
The input layer is used for inputting the reply linguistic data and the target context. Specifically, as shown in fig. 6, the reply corpus may be input to the position of the "true reply" on the left side of the input layer in fig. 6, while the "target context" is input to the position of the "context" in the middle of the input layer in fig. 6.
Specifically, as shown in fig. 6, in the Embedding layer, a word vector corresponding to each word in the reply corpus and the target context may be queried by querying a pre-trained word vector dictionary (Embedding Matrix), that is, the word sequences of the reply corpus and the target context are converted into word vector sequences.
The presentation layer is used for encoding the word vector sequences corresponding to the reply corpus and the target context output by the embedding layer, so as to obtain the encoding vectors corresponding to the reply corpus and the target context respectively. For example, in the presentation layer, bag-of-words models (BOW), convolutional neural networks (CNN), recurrent neural networks (RNN), or the like may be used to encode the word vector sequences output by the embedding layer to obtain the respective encoding vectors; the specific encoding process is consistent with existing methods and is not described here again.
The matching layer is used for performing matching calculation on the encoding vectors of the reply corpus and the target context output by the presentation layer, and determining the correlation between the target context and the reply corpus according to the calculation result. Specifically, the cosine similarity between the two encoding vectors may be calculated as the matching result; the larger its value, the higher the correlation between the target context and the reply corpus. Alternatively, a pre-trained multi-layer fully-connected network (MLP) may be used to perform the matching calculation on the two encoding vectors to obtain the matching result.
Furthermore, after the matching results between the target context and each reply corpus are determined using the pre-constructed correlation model, the matching results may be sorted from large to small, and the reply corpora corresponding to the first preset number of matching result values selected as the to-be-selected replies of the target question; or the matching results may be sorted from small to large, and the reply corpora corresponding to the last preset number of values selected; or all reply corpora whose matching result values are higher than a preset threshold may be selected as the to-be-selected replies of the target question.
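The representation and matching layers can be sketched for the BOW variant: the encoding vector is obtained by pooling the word vectors, and matching is the cosine between the two encodings. This is a minimal stand-in for fig. 6, with mean pooling assumed as the BOW encoding and illustrative function names.

```python
def bow_encode(token_vectors):
    """Representation layer (BOW variant, mean pooling assumed):
    average the word vectors of a sequence into one encoding vector."""
    dim = len(token_vectors[0])
    return [sum(vec[k] for vec in token_vectors) / len(token_vectors)
            for k in range(dim)]

def match_score(context_vecs, reply_vecs):
    """Matching layer: cosine between the context and reply encodings."""
    a, b = bow_encode(context_vecs), bow_encode(reply_vecs)
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: sum(x * x for x in v) ** 0.5
    return dot / (norm(a) * norm(b))
```

Scoring every candidate reply corpus against the target context with `match_score` and keeping the top-scoring ones corresponds to the selection step described above; the MLP alternative would replace the cosine with a learned matcher.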
In this embodiment, an optional implementation manner is that the correlation model is obtained by training using model training data, where the model training data includes each sample context, and true reply and random reply of a sample question included in the sample context; wherein the sample context includes the sample question and a historical dialog context prior to the sample question.
Next, the present embodiment will describe a process of constructing a correlation model. The method specifically comprises the following steps F1-F3:
step F1: model training data is formed.
In this embodiment, in order to construct the correlation model, a large number of human dialogue contexts (which may constitute the dialogue corpus) first need to be collected in advance; for example, a large amount of real dialogue data of people on social network platforms such as microblogs and post bars may be collected. Then, the last question in each dialogue context is taken as a sample question, the sample question together with the historical dialogue text before it is defined as a sample context, and each sample question is associated with real reply content and random reply content. These collected data are taken as the model training data.
Step F2: and constructing a correlation model.
An initial correlation model may be constructed and model parameters initialized.
It should be noted that the execution sequence of step F1 and step F2 is not limited in this embodiment.
Step F3: and training the correlation model by using the pre-collected model training data.
In this embodiment, after the model training data is collected in step F1, the correlation model constructed in step F2 may be trained using the model training data, and the correlation model may be obtained by training through multiple rounds of model training until the training end condition is satisfied.
Specifically, during the current round of training, a sample context needs to be selected from the model training data, at this time, the target context in the above embodiment is replaced by the sample context, the reply corpus obtained through step S503 in the above embodiment is replaced by the true reply content included in the sample context, and the correlation between the sample context and the true reply content is determined according to the manner described in the above embodiment, and the specific flow may be shown in the left and middle two columns of block diagrams in fig. 6.
Meanwhile, the reply corpus acquired in step S503 in the above embodiment may be replaced with the random reply content corresponding to the sample context, and the correlation between the sample context and the random reply content is determined in the manner described in the above embodiment; the specific flow may be shown in the middle and right columns of block diagrams in fig. 6.
Then, based on the correlation between the sample context and the real reply content and the correlation between the sample context and the random reply content, the parameters of the correlation model are updated by comparing the difference between these two correlations, completing the current round of training of the correlation model.
In the present round of training, in an alternative implementation, an objective function may be used during training of the correlation model; for example, a hinge loss or binary cross entropy (binary_crossentropy) may be used as the objective function to enlarge the gap between the correlation of the sample context with the true reply and its correlation with the random reply, so that the correlation model gains the ability to distinguish reasonable replies from unreasonable ones.
Moreover, when the correlation model is trained using an objective function such as the hinge loss or binary cross entropy, the model parameters of the correlation model may be continuously updated according to the change of the objective function value, for example using a back-propagation algorithm, until the value meets the requirement (e.g., tends to 0 or changes only slightly), at which point updating stops and the training of the correlation model is complete.
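The margin-based objective described above (enlarging the gap between the true-reply correlation and the random-reply correlation) can be sketched as a hinge loss. The function name and the margin value are assumptions; the text only states that the gap between the two correlations is to be enlarged.

```python
def hinge_loss(score_true, score_random, margin=1.0):
    """Margin-based training objective: the loss is zero once the
    correlation of the true reply exceeds that of the random reply
    by at least `margin` (the margin value here is a placeholder)."""
    return max(0.0, margin - (score_true - score_random))
```

During each training round, this loss (or a binary cross entropy over true/random labels) would be minimized by back-propagation to update the correlation model's parameters.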
In this way, in the embodiment, the correlation between the target context and each reply corpus is determined using the pre-constructed correlation model, and at least one reply corpus is then selected according to the correlation determination results as the to-be-selected replies of the target question. The content of the obtained to-be-selected replies is therefore semantically related to the target context, ensuring that they answer the key content of the target question rather than being meaningless high-frequency replies that are only weakly semantically related to it, thereby meeting the dialogue requirements of the questioner.
In summary, in this embodiment, after the target context is obtained, each group of corpus contexts semantically similar to the target context is first obtained from the pre-constructed corpus; then, after the reply corpora corresponding to the question corpora in each group of corpus contexts are obtained from the corpus, at least one reply corpus is selected from them as the replies to be selected for the target question. Because the replies to be selected are acquired based on the groups of corpus contexts that are semantically similar to the target context, the acquired replies to be selected can respond to the key content of the target question, and the final reply to the target question can subsequently be chosen from among them, meeting the dialogue requirement of the questioner and improving the reasonableness of the obtained reply.
Fourth embodiment
The present embodiment will describe a specific implementation process of "selecting one reply from the candidate replies as a final reply to the target question" in step S104 in the first embodiment according to the target question and the portrait information.
It can be understood that the answer to the target question is closely related to the dialogue between the questioner and the machine before the target question was raised. Therefore, to make the selected final reply better fit the questioner's dialogue requirement, an alternative implementation is that step S104 may specifically include: selecting one reply from the replies to be selected according to the target context and the portrait information.
The target context includes the target question and the historical dialogue before it; for details of the target context, please refer to the introduction in S501 of the third embodiment.
Specifically, the semantic relevance between each reply to be selected obtained in step S103 and, respectively, the target context and the questioner's portrait information may be calculated, and the reply to be selected with the highest semantic relevance may then be chosen as the final reply to the target question. For example, a pre-constructed reply selection model may be used to calculate the semantic relevance between each reply to be selected and the target context and the questioner's portrait information, and the final reply to the target question is then selected according to the calculation results.
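The selection step described above reduces to an argmax over candidate scores. A minimal sketch, with `relevance_fn` standing in for whatever scoring the reply selection model produces (the function names are illustrative, not from the patent):

```python
def select_final_reply(candidate_replies, relevance_fn):
    # Score every candidate reply against the target context and the
    # questioner's portrait information (relevance_fn stands in for the
    # reply selection model), then keep the highest-scoring candidate.
    scores = [relevance_fn(reply) for reply in candidate_replies]
    best = max(range(len(scores)), key=scores.__getitem__)
    return candidate_replies[best], scores[best]
```

For example, `select_final_reply(replies, lambda r: model.score(context, persona, r))` would return the candidate with the highest semantic relevance together with its score.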
In this embodiment, an alternative implementation is that, referring to fig. 8, the implementation of S104, "selecting one reply from the replies to be selected according to the target context and the portrait information", may specifically include the following steps S801 to S802:
S801: for each reply to be selected, determine the semantic relevance between that reply and the target context and the portrait information.
It should be noted that the following takes one of the replies to be selected acquired in step S103 as an example to describe how to determine the semantic relevance between it and the target context and the portrait information using the pre-constructed reply selection model; the other replies to be selected are processed in a similar manner and are not described again.
In this implementation, a pre-constructed reply selection model can be used to calculate the semantic relevance between the reply to be selected and the target context and the questioner's portrait information.
The pre-constructed reply selection model may be formed by a multi-layer network; as shown in fig. 7, the model structure includes an input layer, multiple representation layers (a word-level representation layer, a sentence-level representation layer, and a context-level representation layer), a matching calculation layer, a dimension-reduction fusion layer, and an output layer.
Specifically, the input layer receives the target context, the portrait information, and the reply to be selected. The representation layers (word-level, sentence-level, and context-level), combined with the matching calculation layer, perform matching calculations between the reply to be selected and each of the target context and the portrait information at the word, sentence, and context levels. The dimension-reduction fusion layer then reduces the dimensionality of the matching results, and the output layer outputs the semantic relevance between the reply to be selected and the target context and the questioner's portrait information according to the dimension-reduction result.
Specifically, referring to fig. 7, assume that the target context includes l sentences, denoted u1, u2, …, ul in chronological order, where ul is the text content corresponding to the target question (the questioner's utterance and the machine's utterance in each round of dialogue are each regarded as one sentence); the questioner's portrait information includes k descriptive sentences, denoted p1, p2, …, pk, and the maximum length of each of these k sentences is m (i.e., each contains at most m words); the maximum length of each sentence in the target context and of the reply to be selected is n (i.e., each such sentence and the reply to be selected contain at most n words). Here, m may be the length of the longest sentence in the portrait information, and n may be the length of the longest sentence among the sentences of the target context and the replies to be selected.
The target context, the questioner's portrait information, and the reply to be selected serve as input data. After they are input into the pre-constructed reply selection model, the model can determine the semantic relevance between the reply to be selected and, respectively, the target context and the portrait information at the word, sentence, and context levels. The specific implementation includes the following steps S8011 to S8013:
S8011: generate the semantic representation result of the reply to be selected at the word level.
In this embodiment, a word segmentation method may first be used to segment the reply to be selected into its constituent words, and a vector generation method is then used to produce the word vector of each word; for example, the word vector of each word in the reply to be selected may be looked up in a pre-trained word vector dictionary. The word sequence of the reply to be selected can thus be converted into a word vector sequence and fed to the input layer of the reply selection model shown in fig. 7. Assuming that each word vector has dimension d, the tensor size of the word vector sequence of the reply to be selected is n×d.
Then, the shared LSTM in the word-level representation layer of the reply selection model further vectorizes the word vector sequence of the reply to be selected provided by the input layer, yielding the vector representation of the reply to be selected at the word level, which serves as its word-level semantic representation result. Specifically, assuming the LSTM network contains h network nodes, the output vectors of the nodes can be concatenated, so the tensor size of the word-level semantic representation result of the reply to be selected is n×h.
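This word-level step can be sketched as follows. The word-vector dictionary and `encode_word_level` are illustrative assumptions: a real implementation would use a trained embedding table and an actual shared LSTM, whereas here a running prefix-average stands in for the recurrent encoder purely to show the n×d input and n×h output shapes.

```python
def to_word_vectors(words, vector_dict, n, d):
    # Map each word to its d-dimensional word vector (zero vector for
    # unknown words), then pad/truncate the sentence to length n,
    # yielding the n x d input tensor of the reply selection model.
    seq = [vector_dict.get(w, [0.0] * d) for w in words[:n]]
    seq += [[0.0] * d] * (n - len(seq))
    return seq

def encode_word_level(seq, h):
    # Toy stand-in for the shared LSTM of the word-level representation
    # layer: at each position, emit the running average of the inputs,
    # truncated/padded to h dimensions, producing an n x h result.
    out, acc = [], [0.0] * len(seq[0])
    for t, vec in enumerate(seq, start=1):
        acc = [a + v for a, v in zip(acc, vec)]
        state = [a / t for a in acc]
        out.append((state + [0.0] * h)[:h])
    return out
```

The same pair of operations applies, with the appropriate maximum lengths, to the sentences of the target context (l×n×d) and of the portrait information (k×m×d).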
S8012: and generating a semantic representation result of the target context at least one of the word level, the sentence level and the context level, and determining the semantic correlation between the target context and the reply to be selected according to the generated semantic representation result.
In this embodiment, the word-level, sentence-level, and context-level representation layers of the reply selection model may be used to generate the semantic representation results of the target context at the word, sentence, and context levels respectively. After the semantic representation result at at least one of these three levels is generated, the semantic relevance between the target context and the reply to be selected can be determined from the generated semantic representation result(s).
Next, the present embodiment will respectively describe the generation processes of semantic representation results of the target context at the word level, sentence level and context level:
(1) The semantic representation result of the target context at the word level can be generated as follows:
First, a word segmentation method may be used to segment a sentence of the target context into its constituent words, and a vector generation method is then used to produce the word vector of each word; for example, the word vector of each word in the sentence may be looked up in a pre-trained word vector dictionary. The word sequence of the sentence can thus be converted into a word vector sequence and fed to the input layer of the reply selection model shown in fig. 7. Assuming that each word vector has dimension d and the maximum length of each sentence in the target context is n, the tensor size of the word vector sequence of the target context is l×n×d.
Then, the shared LSTM in the word-level representation layer of the reply selection model further vectorizes the word vector sequence of the target context provided by the input layer, yielding the vector representation of the target context at the word level, which serves as its word-level semantic representation result.
(2) The semantic representation result of the target context at the sentence level can be generated as follows:
Because the target context contains much information but only the target question is the most important for selecting a reply, after the word-level semantic representation result of each sentence in the target context is obtained, the sentence-level representation layer of the reply selection model can further vectorize these word-level representations so as to focus on the information related to the target question, i.e., on the words in each sentence of the target context that relate to the target question, thereby obtaining the sentence-level semantic representation result of each sentence in the target context.
Specifically, as shown in fig. 7, the sentence-level representation layer of the reply selection model may perform attention calculation between the word-level semantic representation result of each sentence in the target context and that of the target question (i.e., the last sentence ul of the target context), so as to focus on the information in each sentence's word-level representation that relates to the target question. Information unrelated to the target question can thus be discarded, and only the focused information undergoes further vectorization. In this vectorization, the weight of each word in each sentence of the target context is first calculated; the specific calculation formulas are as follows:

$$s_k(i,j)=\cos\bigl(q(i),\,u_k(j)\bigr) \tag{19}$$

$$e_k(j)=\max_{1\le i\le n} s_k(i,j),\qquad V_{att}(k,j)=\frac{\exp\bigl(e_k(j)\bigr)}{\sum_{j'=1}^{n}\exp\bigl(e_k(j')\bigr)} \tag{20}$$

where q(i) is the word vector of the i-th word in the target question; u_k(j) is the word vector of the j-th word of the k-th sentence in the target context; s_k(i,j) denotes the relevance between the j-th word of the k-th sentence in the target context and the i-th word of the target question; n denotes the preset maximum number of words in the target question; e_k(j) denotes the correlation coefficient of the j-th word of the k-th sentence in the target context, whose value is the maximum of the relevances between that word and each word of the target question; and V_att(k,j) is the weight of the j-th word of the k-th sentence in the target context, obtained by normalizing e_k(j) with the softmax function. This weight represents how strongly that word relates to the target question: the larger the weight, the higher the relevance between the j-th word of the k-th sentence and the target question, and conversely, the smaller the weight, the lower the relevance.
It can be seen that, through the above formulas (19) and (20), the weight of each word in each sentence of the target context can be calculated; the sentence-level semantic representation result of each sentence can then be calculated from the weights of its words, i.e., the sentence-level semantic representation result of the target context is obtained. The specific calculation formula is as follows:

$$U_k^{sent}=\sum_{j=1}^{n} V_{att}(k,j)\,u_k(j) \tag{21}$$

where U_k^{sent} denotes the sentence-level semantic representation result of the k-th sentence in the target context; V_att(k,j) denotes the weight of the j-th word of the k-th sentence in the target context; u_k(j) denotes the word vector of the j-th word of the k-th sentence in the target context; and n denotes the number of words in the k-th sentence of the target context.
It should be noted that, assuming that the number of network nodes included in the sentence-level representation layer of the reply selection model is still h, the tensor size of the semantic representation result of each sentence at the sentence level in the target context obtained in the above manner is: l x h.
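The attention computation of formulas (19)–(21) can be sketched as follows. Cosine similarity is assumed here as the relevance measure between word vectors (by analogy with the matching calculation layer; the patent does not reproduce the exact relevance function), and the weighted sum is taken directly over the word vectors as in formula (21).

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors (0.0 for zero vectors).
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0

def sentence_level_rep(sentence, question):
    # Eq. (19)-(20): each sentence word's relevance is its best match
    # against any question word, softmax-normalized into weights V_att.
    e = [max(cosine(w, q) for q in question) for w in sentence]
    mx = max(e)
    exps = [math.exp(x - mx) for x in e]
    z = sum(exps)
    v_att = [x / z for x in exps]
    # Eq. (21): weighted sum of the sentence's word vectors.
    d = len(sentence[0])
    return [sum(v_att[j] * sentence[j][i] for j in range(len(sentence)))
            for i in range(d)]
```

Words that resemble the target question dominate the weighted sum, which is exactly the "focus on question-related words" behavior described above; the same computation applies to the portrait information in formulas (25)–(27).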
(3) The semantic representation result of the target context at the context level can be generated as follows:
After the sentence-level semantic representation result of each sentence in the target context is obtained, the context-level representation layer of the reply selection model can further vectorize these sentence-level representations so as to link the semantics of the sentences in the target context, forming the context-level semantic representation result of the target context.
Specifically, as shown in fig. 7, the semantic representation results of each sentence at the sentence level in the target context input by the sentence-level representation layer may be further vectorized by using a shared LSTM included in the context-level representation layer of the reply selection model, so as to obtain a vector representation result at the context level corresponding to the target context, as the semantic representation result at the context level of the target context.
It should be noted that, assuming the LSTM hidden layer contains h' network nodes, the output vectors of the nodes can be concatenated, so the tensor size of the context-level semantic representation result of the target context is l×h'. The value of h' may be the same as or different from the value of h mentioned in steps (1) and (2) above; this embodiment does not limit it, but for convenience of subsequent calculation the two values are generally kept the same.
Further, after the semantic representation result of the target context at at least one of the word, sentence, and context levels is generated through steps (1), (2), and (3), the semantic relevance between the target context and the reply to be selected can be determined according to the generated semantic representation result(s).
Specifically, an alternative implementation manner may be that a semantic representation result of the target context at each representation level and a semantic representation result of the reply to be selected are subjected to correlation calculation to obtain a semantic correlation between the target context and the reply to be selected.
In this implementation, after the semantic representation results of the target context at the various representation levels are generated through steps (1), (2), and (3), and the word-level semantic representation result of the reply to be selected is generated through step S8011, as shown in fig. 7, the matching calculation layer of the reply selection model may perform matching calculations between the target context's representation at each level and the word-level representation of the reply to be selected. A specific way to do this is to calculate the cosine distance between the target context's semantic representation result at each representation level and the word-level semantic representation result of the reply to be selected, and then obtain the semantic similarity between the target context and the reply to be selected from the calculated cosine distances. The specific matching calculation formulas for the three representation levels are as follows:
$$M_1^k(j,i)=\cos\bigl(U_k^{word}(j),\,W_r(i)\bigr) \tag{22}$$

$$M_2^k(i)=\cos\bigl(U_k^{sent},\,W_r(i)\bigr) \tag{23}$$

$$M_3^k(i)=\cos\bigl(U_k^{ctx},\,W_r(i)\bigr) \tag{24}$$

where W_r(i) denotes the word-level semantic representation result of the i-th word in the reply to be selected; U_k^{word}(j) denotes the word-level semantic representation result of the j-th word of the k-th sentence in the target context; U_k^{sent} denotes the sentence-level semantic representation result of the k-th sentence in the target context; U_k^{ctx} denotes the context-level semantic representation result of the k-th sentence in the target context; cos denotes the cosine distance formula; M_1^k(j,i) denotes the cosine distance between the word-level representation of the j-th word of the k-th sentence in the target context and the word-level representation of the i-th word in the reply to be selected; M_2^k(i) denotes the cosine distance between the sentence-level representation of the k-th sentence in the target context and the word-level representation of the i-th word in the reply to be selected; and M_3^k(i) denotes the cosine distance between the context-level representation of the k-th sentence in the target context and the word-level representation of the i-th word in the reply to be selected.
It can be understood that the larger the value computed by formula (22), the smaller the semantic distance between the word-level representation of the j-th word of the k-th sentence in the target context and the word-level representation of the i-th word in the reply to be selected, i.e., the higher their semantic similarity; similarly, the larger the value computed by formula (23), the smaller the semantic distance between the sentence-level representation of the k-th sentence in the target context and the word-level representation of the i-th word in the reply to be selected, i.e., the higher their semantic similarity; and the larger the value computed by formula (24), the smaller the semantic distance between the context-level representation of the k-th sentence in the target context and the word-level representation of the i-th word in the reply to be selected, i.e., the higher their semantic similarity.
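The matching calculations of formulas (22)–(24) can be sketched as follows; the function and variable names are illustrative. Each formula pairs one level of the target context's representation with every word-level representation of the reply to be selected.

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors (0.0 for zero vectors).
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0

def match_reply(reply_words, ctx_words, ctx_sents, ctx_ctx):
    # Word-level match (eq. 22): every context word vs. every reply word.
    m1 = [[[cosine(w, r) for r in reply_words] for w in sent]
          for sent in ctx_words]
    # Sentence-level match (eq. 23): each sentence vector vs. each reply word.
    m2 = [[cosine(s, r) for r in reply_words] for s in ctx_sents]
    # Context-level match (eq. 24): each context-level vector vs. each reply word.
    m3 = [[cosine(c, r) for r in reply_words] for c in ctx_ctx]
    return m1, m2, m3
```

For a context of l sentences of at most n words and a reply of at most n words, the resulting match tensors have sizes l×n×n, l×n, and l×n respectively; they are what the dimension-reduction fusion layer later consumes.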
S8013: and generating semantic representation results of the character image information at least one representation level of word level, sentence level and context level, and determining semantic relevance between the character image information and the reply to be selected according to the generated semantic representation results.
In this embodiment, the word-level, sentence-level, and context-level representation layers of the reply selection model may be used to generate the semantic representation results of the character image information at the word, sentence, and context levels respectively. After the semantic representation result at at least one of these three levels is generated, the semantic relevance between the character image information and the reply to be selected can be determined from the generated semantic representation result(s).
Next, the present embodiment will respectively introduce the generation processes of semantic representation results of the character image information at the word level, sentence level and context level:
(1) The semantic representation result of the character image information at the word level can be generated as follows:
First, a word segmentation method may be used to segment the k descriptive sentences in the character image information into their constituent words, and a vector generation method is then used to produce the word vector of each word; for example, the word vector of each word in the k sentences may be looked up in a pre-trained word vector dictionary. The word sequences of the k sentences can thus be converted into word vector sequences and fed to the input layer of the reply selection model shown in fig. 7. Assuming that each word vector has dimension d and the maximum length of each sentence in the character image information is m, the tensor size of the word vector sequence of the character image information is k×m×d.
Then, the shared LSTM in the word-level representation layer of the reply selection model further vectorizes the word vector sequence of the character image information provided by the input layer, yielding the vector representation of the character image information at the word level, which serves as its word-level semantic representation result. Specifically, assuming the LSTM network contains h network nodes, the output vectors of the nodes can be concatenated, so the tensor size of the word-level semantic representation result of the character image information is k×m×h, i.e., the word-level semantic representation result of each sentence has tensor size m×h.
(2) The semantic representation result of the person portrait information at the sentence level can be generated as follows:
Since the character image information contains much information but not all of it contributes to selecting a reply, after the word-level semantic representation result of each sentence in the character image information is obtained, the sentence-level representation layer of the reply selection model can further vectorize these word-level representations so as to focus on the information related to the target question, i.e., on the words in each sentence of the character image information that relate to the target question, thereby obtaining the sentence-level semantic representation result of each sentence in the character image information.
Specifically, as shown in fig. 7, the sentence-level representation layer of the reply selection model may perform attention calculation between the word-level semantic representation result of each sentence in the character image information and that of the target question (i.e., the last sentence ul of the target context), so as to focus on the information in each sentence's word-level representation that relates to the target question. Information unrelated to the target question can thus be discarded, and only the focused information undergoes further vectorization. In this vectorization, the weight of each word in each sentence of the character image information is first calculated; the specific calculation formulas are as follows:

$$s'_k(i,j)=\cos\bigl(q(i),\,p_k(j)\bigr) \tag{25}$$

$$e'_k(j)=\max_{1\le i\le n} s'_k(i,j),\qquad V'_{att}(k,j)=\frac{\exp\bigl(e'_k(j)\bigr)}{\sum_{j'=1}^{m}\exp\bigl(e'_k(j')\bigr)} \tag{26}$$

where q(i) is the word vector of the i-th word in the target question; p_k(j) is the word vector of the j-th word of the k-th sentence in the character image information; s'_k(i,j) denotes the relevance between the j-th word of the k-th sentence in the character image information and the i-th word of the target question; n denotes the preset maximum number of words in the target question; e'_k(j) denotes the correlation coefficient of the j-th word of the k-th sentence in the character image information, whose value is the maximum of the relevances between that word and each word of the target question; and V'_att(k,j) is the weight of the j-th word of the k-th sentence in the character image information, obtained by normalizing e'_k(j) with the softmax function. This weight represents how strongly that word relates to the target question: the larger the weight, the higher the relevance between the j-th word of the k-th sentence and the target question, and conversely, the smaller the weight, the lower the relevance.
As can be seen, the above formulas (25) and (26) give the weight of each word in each sentence of the character image information; the sentence-level semantic representation result of each sentence can then be calculated from the weights of its words, i.e., the sentence-level semantic representation result of the character image information is obtained. The specific calculation formula is as follows:

$$P_k^{sent}=\sum_{j=1}^{m} V'_{att}(k,j)\,p_k(j) \tag{27}$$

where P_k^{sent} denotes the sentence-level semantic representation result of the k-th sentence in the character image information; V'_att(k,j) denotes the weight of the j-th word of the k-th sentence in the character image information; p_k(j) denotes the word vector of the j-th word of the k-th sentence in the character image information; and m denotes the number of words in the k-th sentence of the character image information.
It should be noted that, assuming that the number of network nodes included in the sentence-level representation layer of the reply selection model is still h, the tensor size of the semantic representation result of each sentence in the character image information at the sentence level obtained in the above manner is: k x h.
(3) The semantic representation result of the person image information at the context level may be generated as follows:
after the semantic representation result of each sentence in the character portrait information at the sentence level is obtained, a context level representation layer of the reply selection model can be utilized to further carry out vectorization processing on the semantic representation result of each sentence in the character portrait information at the sentence level so as to link the semantics of each sentence in the character portrait information and form the semantic representation result of the character portrait information at the context level.
Specifically, as shown in fig. 7, the semantic representation results of each sentence at the sentence level in the character image information input from the sentence-level representation layer may be further vectorized by using a shared LSTM included in the context-level representation layer of the reply selection model, to obtain a vector representation result at the context level corresponding to the character image information, which is used as the semantic representation result of the character image information at the context level.
It should be noted that, assuming that the number of network nodes included in the LSTM hidden layer is h', the obtained network output vectors of each node may be spliced, and then the tensor of the semantic representation result of the portrait information at the context level is obtained as follows: k x h'. The value of h' may be the same as or different from the value of h mentioned in the above steps (1) and (2), which is not limited in this embodiment, but for convenience of subsequent calculation, the two values may be generally the same.
Further, after the semantic representation result of the character image information at at least one of the word, sentence, and context levels is generated through steps (1), (2), and (3), the semantic relevance between the character image information and the reply to be selected can be determined according to the generated semantic representation result(s).
Specifically, an alternative implementation manner may be that a semantic representation result of the character image information at each representation level and a semantic representation result of the reply to be selected are subjected to correlation calculation to obtain a semantic correlation between the character image information and the reply to be selected.
In this implementation, after the semantic representation results of the character image information at the various representation levels are generated through steps (1), (2), and (3), and the word-level semantic representation result of the reply to be selected is generated through step S8011, as shown in fig. 7, the matching calculation layer of the reply selection model may perform matching calculations between the character image information's representation at each level and the word-level representation of the reply to be selected. A specific way to do this is to calculate the cosine distance between the character image information's semantic representation result at each representation level and the word-level semantic representation result of the reply to be selected, and then obtain the semantic similarity between the character image information and the reply to be selected from the calculated cosine distances. The specific matching calculation formulas for the three representation levels are as follows:
M_word(i, k, j) = cosine(W_r(i), W_p(k, j))  (28)
M_sent(i, k) = cosine(W_r(i), S_p(k))  (29)
M_ctx(i, k) = cosine(W_r(i), C_p(k))  (30)
wherein W_r(i) represents the word-level semantic representation result of the i-th word in the reply to be selected; W_p(k, j) represents the word-level semantic representation result of the j-th word of the k-th sentence in the character portrait information; S_p(k) represents the sentence-level semantic representation result of the k-th sentence in the character portrait information; C_p(k) represents the context-level semantic representation result of the k-th sentence in the character portrait information; cosine represents the cosine distance calculation formula; M_word(i, k, j) represents the cosine distance between the word-level semantic representation result of the j-th word of the k-th sentence in the character portrait information and the word-level semantic representation result of the i-th word in the reply to be selected; M_sent(i, k) represents the cosine distance between the sentence-level semantic representation result of the k-th sentence in the character portrait information and the word-level semantic representation result of the i-th word in the reply to be selected; and M_ctx(i, k) represents the cosine distance between the context-level semantic representation result of the k-th sentence in the character portrait information and the word-level semantic representation result of the i-th word in the reply to be selected.
It can be understood that the larger the value calculated by formula (28), the smaller the semantic distance between the word-level semantic representation result of the j-th word of the k-th sentence in the character portrait information and the word-level semantic representation result of the i-th word in the reply to be selected, i.e. the higher the semantic similarity between the two; similarly, the larger the value calculated by formula (29), the smaller the semantic distance between the sentence-level semantic representation result of the k-th sentence in the character portrait information and the word-level semantic representation result of the i-th word in the reply to be selected, i.e. the higher the semantic similarity between the two; and the larger the value calculated by formula (30), the smaller the semantic distance between the context-level semantic representation result of the k-th sentence in the character portrait information and the word-level semantic representation result of the i-th word in the reply to be selected, i.e. the higher the semantic similarity between the two.
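As a concrete illustration of formulas (28) to (30), the word-, sentence- and context-level cosine matching against the reply's word-level representations can be sketched as follows. All sizes and vectors here are hypothetical stand-ins, not the model's actual learned representations:

```python
import numpy as np

def cosine_matrix(a, b):
    """Pairwise cosine similarity between the rows of a (p x h) and b (q x h)."""
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_n @ b_n.T  # shape (p, q)

# Hypothetical sizes: reply of n words, portrait of k sentences x m words, hidden size h.
n, k, m, h = 4, 3, 5, 8
rng = np.random.default_rng(0)
reply_words    = rng.normal(size=(n, h))     # word-level reply representations W_r(i)
portrait_words = rng.normal(size=(k, m, h))  # word-level portrait representations
portrait_sents = rng.normal(size=(k, h))     # sentence-level portrait representations
portrait_ctx   = rng.normal(size=(k, h))     # context-level portrait representations

# Formula (28): word-level matching matrix, dimension n x k x m.
M_word = np.stack([cosine_matrix(reply_words, portrait_words[s]) for s in range(k)],
                  axis=1)
# Formulas (29) and (30): sentence- and context-level matrices, dimension n x k.
M_sent = cosine_matrix(reply_words, portrait_sents)
M_ctx  = cosine_matrix(reply_words, portrait_ctx)
```

The same computation against the target context (formulas (22) to (24)) differs only in the number of sentences and words per sentence.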
It should be noted that the execution order of step S8012 and step S8013 is not limited in the present embodiment.
As can be seen, formula (22) can calculate the matching results (i.e. cosine distances) between the word-level semantic representation results of all words contained in all sentences (i.e. l sentences) of the target context and the word-level semantic representation result of the reply to be selected; these matching results can form a word-level matching matrix between the reply to be selected and the target context, whose dimension is n x l x n. Similarly, formula (23) can calculate the matching results (cosine distances) between the sentence-level semantic representation results of all sentences (i.e. l sentences) of the target context and the word-level semantic representation result of the reply to be selected; these matching results can form a sentence-level matching matrix between the reply to be selected and the target context, whose dimension is n x l. Formula (24) can calculate the matching results (cosine distances) between the context-level semantic representation results of all sentences (i.e. l sentences) of the target context and the word-level semantic representation result of the reply to be selected; these matching results can form a context-level matching matrix between the reply to be selected and the target context, whose dimension is n x l.
Similarly, formula (28) can calculate the matching results (i.e. cosine distances) between the word-level semantic representation results of all words in all sentences (i.e. k sentences) of the character portrait information and the word-level semantic representation result of the reply to be selected; these matching results can form a word-level matching matrix between the reply to be selected and the character portrait information, whose dimension is n x k x m. Similarly, formula (29) can calculate the matching results (cosine distances) between the sentence-level semantic representation results of all sentences (i.e. k sentences) of the character portrait information and the word-level semantic representation result of the reply to be selected; these matching results can form a sentence-level matching matrix between the reply to be selected and the character portrait information, whose dimension is n x k. Formula (30) can calculate the matching results (cosine distances) between the context-level semantic representation results of all sentences (i.e. k sentences) of the character portrait information and the word-level semantic representation result of the reply to be selected; these matching results can form a context-level matching matrix between the reply to be selected and the character portrait information, whose dimension is n x k.
Furthermore, the matching matrices of the reply to be selected with the target context and with the character portrait information at the same representation level can be spliced respectively; the specific splicing formulas are as follows:
M1 = [word-level matching matrix with the target context; word-level matching matrix with the character portrait information]  (31)
M2 = [sentence-level matching matrix with the target context; sentence-level matching matrix with the character portrait information]  (32)
M3 = [context-level matching matrix with the target context; context-level matching matrix with the character portrait information]  (33)
wherein M1 represents the splicing result of the word-level matching matrices of the reply to be selected with the target context and the character portrait information; M2 represents the splicing result of the sentence-level matching matrices of the reply to be selected with the target context and the character portrait information; and M3 represents the splicing result of the context-level matching matrices of the reply to be selected with the target context and the character portrait information.
Further, since the splicing result M1 obtained by formula (31) is three-dimensional, as shown in fig. 7, the max-pooling method can be used to perform dimension reduction on M1; the specific processing formula is as follows:
M1' = Maxpooling(M1)  (34)
wherein M1' represents the result obtained by performing max-pooling dimension reduction on M1 at the word level. From the word-level matching matrix of the reply to be selected and the character portrait information, the maximum value of the matching results (which can be calculated by formula (28)) corresponding to each word in each sentence of the character portrait information is extracted, giving n x k values; likewise, from the word-level matching matrix of the reply to be selected and the target context, the maximum value of the matching results (which can be calculated by formula (22)) corresponding to each word in each sentence of the target context is extracted, giving n x l values.
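Continuing the sketch, the word-level splice of formula (31) and its max-pooling reduction in formula (34) can be illustrated as follows. The sizes are hypothetical, and the pooling keeps, for each (reply word, sentence) pair, the strongest word match:

```python
import numpy as np

rng = np.random.default_rng(1)
n, l, n_w, k, m = 4, 2, 6, 3, 5  # hypothetical sizes
# Reply x target-context word-level matrix, dimension n x l x n (here n_w words per sentence).
M_rc_word = rng.uniform(-1, 1, size=(n, l, n_w))
# Reply x character-portrait word-level matrix, dimension n x k x m.
M_rp_word = rng.uniform(-1, 1, size=(n, k, m))

# Formula (34): max-pool over the word axis of each sentence, then splice along the
# sentence axis, leaving one peak matching value per (reply word, sentence) pair.
M1_pooled = np.concatenate([M_rc_word.max(axis=2),   # n x l values from the context
                            M_rp_word.max(axis=2)],  # n x k values from the portrait
                           axis=1)                   # M1' with shape n x (l + k)
```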
Further, the M2, M3 and M1' obtained by the above formulas (32), (33) and (34) can be spliced, and the splicing result is used as the final matching matrix corresponding to the reply to be selected, which is used for characterizing the degree of relevance of the reply to be selected to the target context and the character portrait information; the specific splicing formula is as follows:
M=[M1';M2;M3] (35)
wherein M represents the final matching matrix corresponding to the reply to be selected, which characterizes the semantic relevance corresponding to the reply to be selected, and the tensor size of M is 3 x n x (k + l), where n represents the total number of words contained in the reply to be selected, and k and l represent the total numbers of sentences contained in the character portrait information and the target context, respectively; M1' represents the result of performing word-level max-pooling dimension reduction on M1, where M1 represents the splicing result of the word-level matching matrices of the reply to be selected with the target context and the character portrait information; M2 represents the splicing result of the sentence-level matching matrices of the reply to be selected with the target context and the character portrait information; and M3 represents the splicing result of the context-level matching matrices of the reply to be selected with the target context and the character portrait information.
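Assembling formula (35) from the three level-wise splices can be sketched as follows; the level-wise matrices are random placeholders standing in for the cosine results described above:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, l = 4, 3, 2
M1p = rng.uniform(-1, 1, size=(n, k + l))  # word level after max-pooling (formula (34))
M2  = rng.uniform(-1, 1, size=(n, k + l))  # sentence-level splice (formula (32))
M3  = rng.uniform(-1, 1, size=(n, k + l))  # context-level splice (formula (33))

# Formula (35): stack the three levels into the final matching tensor of size 3 x n x (k + l).
M = np.stack([M1p, M2, M3], axis=0)
```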
It should be noted that the above description relates to calculating the degree of relevance based on the semantic representation results of the target context and the character portrait information at three representation levels, i.e. the word level, the sentence level and the context level. In practice, the degree of relevance may also be calculated using the semantic representation results at any one or two of these representation levels, with the calculation result used as the semantic relevance M corresponding to the reply to be selected.
For example, if only the word-level semantic representation results of the target context and the character portrait information are generated in steps S8012 and S8013, only the word-level semantic relevance of the target context and the character portrait information to the reply to be selected may be determined in the above manner, and M1' may then be calculated using formulas (31) and (34) and used as the semantic relevance M corresponding to the reply to be selected. Alternatively, only formula (32) may be used to calculate M2, or only formula (33) may be used to calculate M3, as the semantic relevance M corresponding to the reply to be selected.
For another example, if only the word-level and sentence-level semantic representation results of the target context and the character portrait information are generated in steps S8012 and S8013, only the word-level and sentence-level semantic relevance of the target context and the character portrait information to the reply to be selected may be determined in the above manner, and the semantic relevance corresponding to the reply to be selected may be calculated as M = [M1'; M2] using formulas (31), (32), (34) and (35).
As can be seen, in the above manner, the semantic relevance M corresponding to each reply to be selected in all the replies to be selected acquired in step S103 may be calculated, so that in the subsequent step S802, the reply to be selected corresponding to the highest semantic relevance is selected as the final reply of the target question.
S802: and selecting one reply from the replies to be selected according to the semantic relevance corresponding to each reply to be selected.
In this embodiment, after the semantic relevance M corresponding to each reply to be selected is determined through step S801, for example M = [M1'; M2; M3], the BiLSTM and max-pooling contained in the dimension-reduction fusion layer of the reply selection model can be used to perform dimension reduction on M; the specific calculation formulas are as follows:
M'=Maxpooling(Bi-LSTM(M)) (36)
V=Maxpooling(Bi-LSTM(M')) (37)
wherein M' represents the result of performing one dimension reduction on M, i.e. the result obtained by inputting M into the BiLSTM of the dimension-reduction fusion layer and then performing max-pooling on the output values of the BiLSTM's network nodes; V represents the result of performing one dimension reduction on M', i.e. the result obtained by inputting M' into the BiLSTM of the dimension-reduction fusion layer and then performing max-pooling on the output values of the BiLSTM's network nodes, which amounts to performing two dimension reductions on M.
By the above formulas (36) and (37), the three-dimensional M can be reduced to the one-dimensional V, and the length of V is k + l.
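The shape flow of formulas (36) and (37) can be sketched as follows. The BiLSTM recurrence is deliberately abstracted away here (a real implementation would run a bidirectional LSTM over the leading axis before pooling), so only the pooling behaviour that collapses 3 x n x (k + l) down to a length-(k + l) vector is shown:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, l = 4, 3, 2
M = rng.uniform(-1, 1, size=(3, n, k + l))  # final matching tensor from formula (35)

# Formula (36): the BiLSTM pass is omitted in this sketch; max-pooling collapses
# the leading (level) axis: 3 x n x (k+l) -> n x (k+l).
M_prime = M.max(axis=0)
# Formula (37): a second pass collapses the word axis: n x (k+l) -> (k+l,).
V = M_prime.max(axis=0)
```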
Further, V may be input to an output layer, which may be an output layer with a sigmoid threshold function. The output layer can predict and output a probability value, which is a numerical value in the interval [0, 1]; the magnitude of the probability value represents the semantic relevance between the reply to be selected and the target context and character portrait information, i.e. the larger the probability value, the higher the semantic relevance between the reply to be selected and the target context and character portrait information. The specific calculation formula is as follows:
score=sigmoid(W*V+b) (38)
w, b represents model parameters obtained after the reply selection model is trained; score represents the probability value corresponding to the candidate reply.
Therefore, by the above method, the probability value corresponding to each reply to be selected acquired in S103 can be calculated. The probability values can then be sorted from large to small and the reply to be selected corresponding to the first-ranked probability value selected as the final reply of the target question; or the probability values can be sorted from small to large and the reply to be selected corresponding to the last-ranked probability value selected as the final reply of the target question.
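The output-layer scoring of formula (38) and the final selection can be sketched as follows. W, b and the per-candidate vectors V are random placeholders here, not trained parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(4)
dim = 5                       # corresponds to k + l in the text
W = rng.normal(size=(dim,))   # placeholder for the trained weight vector
b = 0.0                       # placeholder for the trained bias

candidates = [rng.normal(size=(dim,)) for _ in range(3)]  # one V per reply to be selected
scores = [float(sigmoid(W @ V + b)) for V in candidates]  # formula (38)
best = int(np.argmax(scores))  # the reply with the largest probability is the final reply
```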
In summary, in this embodiment, a pre-constructed reply selection model is used to perform relevance calculation between each reply to be selected and the target context and character portrait information at at least one of the three levels of word, sentence and context. When the final reply of the target question is selected based on this information, misguidance caused by unimportant local information in the target context and the character portrait information is avoided, so that a final reply more suitable for the target context can be selected. Moreover, when the final reply of the target question is selected from the replies to be selected, the character portrait information of the questioner who posed the target question is also considered, so that the content of the selected final reply is more relevant to the personal characteristics of the questioner, the conversation requirement of the questioner can be met, and the reasonability of the reply generation result is improved.
Fifth embodiment
For ease of understanding, this embodiment will introduce the overall implementation process of the reply selection method provided by the embodiments of the present application in combination with the schematic structural diagram shown in fig. 10.
As shown in fig. 10, the structure comprises a portrait management module, a reply generation module, a reply acquisition module and a reply selection module. The portrait management module is used for generating the character portrait information of the questioner of the target question; the reply generation module is used for generating at least one reply to be selected of the target question; the reply acquisition module is used for acquiring at least one reply to be selected of the target question; and the reply selection module is used for selecting one reply from the replies to be selected as the final reply of the target question.
Specifically, the overall implementation process of the embodiments of the present application is as follows. First, a target question input by a questioner in voice or text form can be obtained. Then, according to the obtained target question, the portrait management module can select, from a pre-constructed portrait information base, at least one piece of candidate portrait information that is semantically similar to the target question as the character portrait information of the questioner. Next, the reply generation module can perform encoding and decoding using the received character portrait information and a target context composed of the target question and the historical dialogue text before the target question, extracting the information in the target context and the questioner's character portrait information that is related to the target question and removing the information unrelated to it; each reply to be selected, with higher accuracy and better meeting the questioner's requirements, can then be generated according to the decoding result.
Meanwhile, according to the acquired target context, the reply acquisition module can also screen out, from a pre-constructed dialogue corpus, each group of contexts semantically similar to the target context as each group of corpus contexts, and acquire the reply corpora corresponding to the question corpora in each group of corpus contexts. The reply acquisition module can then perform relevance calculation between the acquired reply corpora and the target context composed of the target question and the historical dialogue text before the target question, so as to select at least one reply corpus as a reply to be selected of the target question according to the relevance determination result. This ensures that each acquired reply to be selected is a reasonable reply that is semantically related to the target context in content and replies to the key content of the target question, rather than a meaningless high-frequency reply that is only semantically related to the target context, thereby meeting the conversation requirement of the questioner.
Finally, the reply selection module can perform semantic relevance calculation between each acquired reply to be selected and the received character portrait information together with the target context composed of the target question and the historical dialogue text before the target question, determining the relevance between each reply to be selected and the character portrait information and target context at the three levels of word, sentence and context. The reply to be selected with the highest relevance to the character portrait information and the target context is then selected from the replies to be selected as the final reply of the target question. Since the character portrait information of the questioner who posed the target question is considered, the content of the selected final reply is more relevant to the personal characteristics of the questioner, which meets the conversation requirement of the questioner and improves the reasonability of the reply generation result.
Further, the selected final reply can be output in the form of voice and/or text. It should be noted that, for the specific reply generation process, the reply acquisition process, and the reply selection process, please refer to the above related embodiments, which are not described herein again.
Sixth embodiment
In this embodiment, a reply selection device will be described, and for related contents, refer to the above method embodiment.
Referring to fig. 11, a schematic composition diagram of a reply selection apparatus provided in this embodiment is shown, where the apparatus 1100 includes:
a target question acquisition unit 1101 for acquiring a target question posed by a questioner;
an image information generating unit 1102 for generating character image information of the questioner based on the target question;
a candidate reply obtaining unit 1103, configured to obtain each candidate reply of the target question;
a candidate reply selection unit 1104, configured to select one reply from among the candidate replies as a final reply to the target question according to the target question and the portrait information.
In an implementation manner of this embodiment, the portrait information generation unit 1102 is specifically configured to:
and selecting at least one piece of candidate portrait information which is similar to the target question in semantics from a pre-constructed portrait information base as the portrait information of the questioner.
In one implementation manner of this embodiment, the portrait information generation unit 1102 includes:
a first result generation subunit, configured to generate a semantic representation result of the target question;
a second result generation subunit operable to generate a semantic representation result for each piece of candidate portrait information in the portrait information base;
and the portrait information selection subunit is used for selecting at least one piece of candidate portrait information which is similar to the target question in semantic distance according to the generated semantic representation result.
In an implementation manner of this embodiment, the reply to be selected obtaining unit 1103 is specifically configured to:
generating at least one reply to be selected of the target question according to the target question and the portrait information;
and/or acquiring at least one reply to be selected of the target question from a pre-constructed dialogue corpus.
In an implementation manner of this embodiment, the reply-to-be-selected selecting unit 1104 is specifically configured to:
and selecting one reply from all replies to be selected according to a target context and the portrait information, wherein the target context comprises the target question and the historical conversation text before the target question.
In an implementation manner of this embodiment, the reply-to-be-selected selecting unit 1104 includes:
a semantic relevance determining subunit, configured to determine, for each reply to be selected in the replies to be selected, a semantic relevance between the target context and the character image information and the reply to be selected, respectively;
and the reply to be selected selecting subunit is used for selecting one reply from each reply to be selected according to the semantic relevance corresponding to each reply to be selected.
In an implementation manner of this embodiment, the semantic relevance determining subunit includes:
a third result generation subunit, configured to generate a semantic representation result of the reply to be selected at the word level;
a first relevance generating subunit, configured to generate a semantic representation result of the target context at least one representation level of a word level, a sentence level, and a context level, and determine a semantic relevance between the target context and the reply to be selected according to the generated semantic representation result;
and the second relevance generation subunit is used for generating a semantic representation result of the character portrait information at least one representation level of a word level, a sentence level and a context level, and determining the semantic relevance between the character portrait information and the reply to be selected according to the generated semantic representation result.
In an implementation manner of this embodiment, the first correlation degree generating subunit is specifically configured to:
performing correlation calculation on the semantic representation result of the target context at each representation level and the semantic representation result of the reply to be selected to obtain semantic correlation between the target context and the reply to be selected;
correspondingly, the second correlation degree generation subunit is specifically configured to:
and performing correlation calculation on the semantic representation result of the figure portrait information at each representation level and the semantic representation result of the reply to be selected to obtain the semantic correlation between the figure portrait information and the reply to be selected.
Further, an embodiment of the present application further provides a reply selection apparatus, including: a processor, a memory, a system bus;
the processor and the memory are connected through the system bus;
the memory is used for storing one or more programs, and the one or more programs comprise instructions which, when executed by the processor, cause the processor to execute any one of the implementation methods of the reply selection method.
Further, an embodiment of the present application further provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are run on a terminal device, the terminal device is caused to execute any implementation method of the above reply selection method.
Further, an embodiment of the present application further provides a computer program product, which when running on a terminal device, causes the terminal device to execute any implementation method of the above reply selection method.
From the above description of the embodiments, it is clear to those skilled in the art that all or part of the steps in the above embodiments may be implemented by software plus a necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network communication device such as a media gateway, etc.) to execute the method described in the embodiments or some parts of the embodiments of the present application.
It should be noted that, in the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant parts can be explained by referring to the method part.
It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (15)

1. A reply selection method, comprising:
acquiring a target question asked by a questioner;
generating character image information of the questioner according to the target question;
acquiring each reply to be selected of the target question;
and selecting one reply from all replies to be selected according to the target question and the portrait information as a final reply of the target question.
2. The method of claim 1, wherein generating the portrait information of the questioner based on the target question comprises:
and selecting at least one piece of candidate portrait information which is similar to the target question in semantics from a pre-constructed portrait information base as the portrait information of the questioner.
3. The method of claim 2, wherein selecting at least one candidate portrait information semantically similar to the target question from a pre-constructed portrait information library comprises:
generating a semantic representation result of the target question;
generating a semantic representation result of each candidate portrait information in the portrait information base;
and selecting at least one piece of candidate portrait information which is similar to the target problem in semantic distance according to the generated semantic representation result.
4. The method of claim 1, wherein the obtaining each candidate reply to the target question comprises:
generating at least one reply to be selected of the target question according to the target question and the portrait information;
and/or acquiring at least one reply to be selected of the target question from a pre-constructed dialogue corpus.
5. The method of any of claims 1 to 4, wherein selecting one of the responses from the candidate responses based on the target question and the portrait information comprises:
and selecting one reply from all replies to be selected according to a target context and the portrait information, wherein the target context comprises the target question and the historical conversation text before the target question.
6. The method of claim 5, wherein selecting one of the responses from the candidate responses based on the target context and the portrait information comprises:
for each reply to be selected in each reply to be selected, determining semantic relevancy between the target context and the portrait information and the reply to be selected respectively;
and selecting one reply from the replies to be selected according to the semantic relevance corresponding to each reply to be selected.
7. The method of claim 6, wherein determining semantic relatedness between the target context and the person representation information and the candidate reply, respectively, comprises:
generating semantic representation results of the to-be-selected replies at the word level;
generating semantic representation results of the target context at least one of a word level, a sentence level and a context level, and determining semantic relevance between the target context and the reply to be selected according to the generated semantic representation results;
generating a semantic representation result of the character image information at least one representation level of a word level, a sentence level and a context level, and determining semantic relevance between the character image information and the reply to be selected according to the generated semantic representation result.
8. The method of claim 7, wherein determining the semantic relevance between the target context and the candidate reply according to the generated semantic representation results comprises:
performing a correlation calculation between the semantic representation result of the target context at each representation level and the semantic representation result of the candidate reply, to obtain the semantic relevance between the target context and the candidate reply;
and correspondingly, determining the semantic relevance between the portrait information and the candidate reply according to the generated semantic representation results comprises:
performing a correlation calculation between the semantic representation result of the portrait information at each representation level and the semantic representation result of the candidate reply, to obtain the semantic relevance between the portrait information and the candidate reply.
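The selection procedure of claims 6-8 — score each candidate reply against the target context and against the portrait information at one or more representation levels, then pick the highest-scoring candidate — can be sketched as follows. This is a minimal illustration, not the patented model: the bag-of-words vectors, cosine correlation, and simple averaging are stand-ins for the learned word-, sentence-, and context-level representations the claims describe, and all function names are hypothetical.

```python
# Illustrative sketch of the multi-level relevance scoring and reply
# selection of claims 6-8. Bag-of-words + cosine are placeholders for
# learned semantic representations; names are hypothetical.
from collections import Counter
import math

def word_vector(text):
    # Word-level "semantic representation": a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(u, v):
    # Correlation calculation between two representations.
    dot = sum(u[w] * v[w] for w in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def relevance(texts, reply):
    # Represent the context (or the portrait information) at several
    # levels -- here each sentence separately plus the concatenated
    # whole -- score each level against the reply's word-level
    # representation, and average the per-level correlations.
    reply_vec = word_vector(reply)
    levels = [word_vector(t) for t in texts] + [word_vector(" ".join(texts))]
    return sum(cosine(lv, reply_vec) for lv in levels) / len(levels)

def select_reply(context, portrait, candidates):
    # Claim 6: combine the context score and the portrait score per
    # candidate; return the candidate with the highest overall relevance.
    return max(candidates,
               key=lambda r: relevance(context, r) + relevance(portrait, r))

context = ["do you like hiking", "what trails are near you"]
portrait = ["the questioner likes outdoor sports"]
candidates = ["i love hiking the mountain trails", "stocks fell sharply today"]
print(select_reply(context, portrait, candidates))
# -> i love hiking the mountain trails
```

A production system would replace `word_vector` with neural encoders producing distinct word-, sentence-, and context-level embeddings, and learn how to weight the per-level correlations instead of averaging them.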
9. A reply selection apparatus, comprising:
a target question acquiring unit configured to acquire a target question asked by a questioner;
a portrait information generating unit configured to generate portrait information of the questioner according to the target question;
a candidate reply acquiring unit configured to acquire each candidate reply to the target question;
and a candidate reply selecting unit configured to select one reply from the candidate replies according to the target question and the portrait information as the final reply to the target question.
10. The apparatus of claim 9, wherein the candidate reply selecting unit is specifically configured to:
select one reply from the candidate replies according to a target context and the portrait information, wherein the target context comprises the target question and historical conversation text preceding the target question.
11. The apparatus of claim 10, wherein the candidate reply selecting unit comprises:
a semantic relevance determining subunit configured to determine, for each candidate reply, the semantic relevance between the target context and the candidate reply, and between the portrait information and the candidate reply;
and a reply selecting subunit configured to select one reply from the candidate replies according to the semantic relevance corresponding to each candidate reply.
12. The apparatus of claim 11, wherein the semantic relevance determining subunit comprises:
a third result generating subunit configured to generate a word-level semantic representation result of the candidate reply;
a first relevance generating subunit configured to generate a semantic representation result of the target context at at least one representation level among a word level, a sentence level, and a context level, and to determine the semantic relevance between the target context and the candidate reply according to the generated semantic representation results;
and a second relevance generating subunit configured to generate a semantic representation result of the portrait information at at least one representation level among a word level, a sentence level, and a context level, and to determine the semantic relevance between the portrait information and the candidate reply according to the generated semantic representation results.
13. A reply selection device, comprising: a processor, a memory, and a system bus;
wherein the processor and the memory are connected through the system bus;
and the memory is configured to store one or more programs, the one or more programs comprising instructions which, when executed by the processor, cause the processor to perform the method of any one of claims 1-8.
14. A computer-readable storage medium having stored therein instructions that, when executed on a terminal device, cause the terminal device to perform the method of any one of claims 1-8.
15. A computer program product which, when run on a terminal device, causes the terminal device to perform the method of any one of claims 1-8.
CN201910350310.9A 2019-04-28 2019-04-28 Reply selection method and device Pending CN110597968A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910350310.9A CN110597968A (en) 2019-04-28 2019-04-28 Reply selection method and device

Publications (1)

Publication Number Publication Date
CN110597968A true CN110597968A (en) 2019-12-20

Family

ID=68852467

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910350310.9A Pending CN110597968A (en) 2019-04-28 2019-04-28 Reply selection method and device

Country Status (1)

Country Link
CN (1) CN110597968A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105068661A (en) * 2015-09-07 2015-11-18 百度在线网络技术(北京)有限公司 Man-machine interaction method and system based on artificial intelligence
WO2017010652A1 (en) * 2015-07-15 2017-01-19 포항공과대학교 산학협력단 Automatic question and answer method and device therefor
CN106448670A (en) * 2016-10-21 2017-02-22 竹间智能科技(上海)有限公司 Dialogue automatic reply system based on deep learning and reinforcement learning
CN108153876A (en) * 2017-12-26 2018-06-12 爱因互动科技发展(北京)有限公司 Intelligent answer method and system

Cited By (7)

Publication number Priority date Publication date Assignee Title
CN111752437A (en) * 2020-06-29 2020-10-09 上海寻梦信息技术有限公司 Comment method and device, electronic equipment and storage medium
CN111752437B (en) * 2020-06-29 2021-07-16 上海寻梦信息技术有限公司 Comment method and device, electronic equipment and storage medium
CN111930915A (en) * 2020-09-14 2020-11-13 腾讯科技(深圳)有限公司 Session information processing method, device, computer readable storage medium and equipment
CN111930915B (en) * 2020-09-14 2021-01-01 腾讯科技(深圳)有限公司 Session information processing method, device, computer readable storage medium and equipment
CN112528010A (en) * 2020-12-15 2021-03-19 建信金融科技有限责任公司 Knowledge recommendation method and device, computer equipment and readable storage medium
CN114363277A (en) * 2020-12-31 2022-04-15 万翼科技有限公司 Intelligent chatting method and device based on social relationship and related products
CN114363277B (en) * 2020-12-31 2023-08-01 万翼科技有限公司 Intelligent chat method and device based on social relationship and related products

Similar Documents

Publication Publication Date Title
CN110427617B (en) Push information generation method and device
CN108829822B (en) Media content recommendation method and device, storage medium and electronic device
CN108549658B (en) Deep learning video question-answering method and system based on attention mechanism on syntax analysis tree
CN110427461B (en) Intelligent question and answer information processing method, electronic equipment and computer readable storage medium
CN111046132A (en) Customer service question and answer processing method and system for retrieving multiple rounds of conversations
CN110069612B (en) Reply generation method and device
CN110597968A (en) Reply selection method and device
CN111984766A (en) Missing semantic completion method and device
CN111914067A (en) Chinese text matching method and system
CN109344242B (en) Dialogue question-answering method, device, equipment and storage medium
CN114428850B (en) Text retrieval matching method and system
CN113239666B (en) Text similarity calculation method and system
CN110399454B (en) Text coding representation method based on transformer model and multiple reference systems
CN113672708A (en) Language model training method, question and answer pair generation method, device and equipment
CN113297364A (en) Natural language understanding method and device for dialog system
CN111026840A (en) Text processing method, device, server and storage medium
CN111597341A (en) Document level relation extraction method, device, equipment and storage medium
CN111666400A (en) Message acquisition method and device, computer equipment and storage medium
CN113094475A (en) Dialog intention recognition system and method based on context attention flow
CN112364148A (en) Deep learning method-based generative chat robot
CN115497465A (en) Voice interaction method and device, electronic equipment and storage medium
CN114005446A (en) Emotion analysis method, related equipment and readable storage medium
CN111949762B (en) Method and system for context-based emotion dialogue and storage medium
CN111046157B (en) Universal English man-machine conversation generation method and system based on balanced distribution
CN113420136A (en) Dialogue method, system, electronic equipment, storage medium and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191220