CN111831806A

CN111831806A - Semantic integrity determination method and device, electronic equipment and storage medium

Info

Publication number: CN111831806A
Application number: CN202010634056.8A
Authority: CN
Inventors: 鲁骁; 邓雄文; 柯震; 纪鸿旭; 过群; 孟二利; 王斌
Original assignee: Beijing Xiaomi Pinecone Electronic Co Ltd
Current assignee: Beijing Xiaomi Pinecone Electronic Co Ltd
Priority date: 2020-07-02
Filing date: 2020-07-02
Publication date: 2020-10-27
Anticipated expiration: 2040-07-02
Also published as: CN111831806B

Abstract

The disclosure relates to a semantic integrity determination method and device, an electronic device and a storage medium. The method comprises: predicting a first text to be processed through a text prediction model, and determining a target probability table corresponding to a target element position, wherein the target probability table represents the prediction probability of each of a plurality of text elements contained in a preset element set appearing at the target element position; and determining, according to the target probability table, the prediction probability corresponding to each target text element in the element set, and determining whether the semantics of the first text to be processed are complete according to the prediction probability corresponding to each target text element, wherein the target text element is used for representing the semantic integrity of the first text to be processed. The method can accurately determine whether the semantics of the first text to be processed are complete, and prevents the man-machine conversation system from obtaining a text with incomplete semantics because the end of text input was misjudged, thereby improving the accuracy and efficiency of man-machine conversation.

Description

Semantic integrity determination method and device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of text recognition technologies, and in particular, to a semantic integrity determination method and apparatus, an electronic device, and a storage medium.

Background

In recent years, with the rise of artificial intelligence technology, man-machine conversation systems, one of its important research fields, have attracted much attention. During a man-machine conversation, the system needs to continuously acquire the conversation text input by the user and, once it detects that the input has ended, respond promptly based on that text. However, the system may not accurately detect whether the input of the conversation text has ended. If it erroneously determines that the input has ended, it intervenes before the user has finished and interrupts the user's input, so that it obtains a conversation text with incomplete semantics. The system may then give an erroneous response based on that text, which reduces the accuracy and efficiency of the man-machine conversation.

Disclosure of Invention

To overcome the problems in the related art, the present disclosure provides a semantic integrity determination method, apparatus, electronic device, and storage medium.

According to a first aspect of embodiments of the present disclosure, there is provided a semantic integrity determination method, the method including:

predicting a first text to be processed through a pre-trained text prediction model, and determining a target probability table corresponding to the position of a target element; wherein,

the target element position is the element position immediately following the last text element of the first text to be processed, and the target probability table is used for representing the prediction probability of each of a plurality of text elements contained in a preset element set appearing at the target element position;

and determining a prediction probability corresponding to each target text element in the element set according to the target probability table, and determining whether the semantics of the first text to be processed are complete according to the prediction probability corresponding to each target text element, wherein the target text element is used for representing the semantic integrity of the first text to be processed.

Optionally, the predicting the first text to be processed by using a pre-trained text prediction model to determine a target probability table corresponding to the target element position includes:

taking the first text to be processed as the input of the text prediction model to obtain a predicted text output by the text prediction model and a probability table corresponding to each element position in the predicted text;

the text elements included in the predicted text correspond one-to-one to the text elements included in the first text to be processed, and each text element of the predicted text is the text element predicted, from the corresponding text element of the first text to be processed, to appear at the element position immediately following that corresponding text element; the probability table corresponding to each element position in the predicted text represents the prediction probability of each text element appearing at that element position;

and determining a probability table corresponding to the last element position in the predicted text as the target probability table.

Optionally, the target text element comprises a plurality of terminal text elements for representing the end of the text;

determining, according to the target probability table, a prediction probability corresponding to each target text element in the element set, so as to determine whether the semantic meaning of the first text to be processed is complete according to the prediction probability corresponding to each target text element, including:

determining the prediction probability of each terminal text element according to the target probability table;

performing preset operation on the prediction probability of each terminal text element to obtain an operation result;

and determining whether the semantics of the first text to be processed is complete according to the operation result.

Optionally, the target text element includes a termination element, a particle element and a symbol element, the termination element is a text element for representing the end of the text, the particle element is a text element whose part of speech is a particle, and the symbol element is a text element for representing a punctuation mark;

determining a termination elements, b particle elements and c symbol elements contained in the element set, wherein a, b and c are integers, a is greater than 0, b is greater than 0, and c is greater than 0;

according to the target probability table, acquiring a first prediction probability corresponding to each termination element, a second prediction probability corresponding to each particle element and a third prediction probability corresponding to each symbol element;

determining a target prediction probability by using a first formula according to the first prediction probability, the second prediction probability and the third prediction probability;

the first formula is expressed as:

wherein P_x is the target prediction probability, P_j is the first prediction probability, P_k is the second prediction probability, and P_h is the third prediction probability;

if the target prediction probability is greater than or equal to a preset probability threshold, determining that the semantics of the first text to be processed are complete; or,

and if the target prediction probability is smaller than the probability threshold, determining that the semantics of the first text to be processed is incomplete.
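The first formula is not reproduced in this text. A reading consistent with the definitions of a, b and c above, and with the summation over all target text elements described in the detailed description, is the sum of the three groups of prediction probabilities. This is an assumption about the missing formula, not the formula as filed:

```latex
P_x = \sum_{j=1}^{a} P_j + \sum_{k=1}^{b} P_k + \sum_{h=1}^{c} P_h
```

Under this reading, P_x is compared with the preset probability threshold exactly as stated in the two conditions above.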

Optionally, the method further comprises:

and under the condition that the semantics of the first text to be processed are determined to be incomplete, performing a semantic supplementation step to obtain a target text with complete semantics corresponding to the first text to be processed.

Optionally, the semantic supplementing step includes:

determining a first time length by utilizing a preset corresponding relation according to the target prediction probability, wherein the preset corresponding relation is used for representing the corresponding relation between the target prediction probability and the time length;

determining a second text to be processed according to the first time length, wherein the second text to be processed comprises the first text to be processed and the text acquired within the first time length;

and taking the second text to be processed as the first text to be processed, and performing again the steps from predicting the first text to be processed through the pre-trained text prediction model and determining the target probability table corresponding to the target element position, through determining the prediction probability corresponding to each target text element in the element set according to the target probability table so as to determine whether the semantics of the first text to be processed are complete.
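The semantic supplementing step can be pictured as a loop. The Python sketch below is only an illustration under stated assumptions: the function names, the list-of-pairs format of the preset correspondence, the numeric values, and the `max_rounds` safety limit are all hypothetical, and the disclosure does not say whether a lower probability maps to a longer or shorter time length.

```python
def wait_duration(probability, correspondence):
    """Look up the first time length for a target prediction probability.

    correspondence: list of (minimum_probability, seconds) pairs sorted by
    descending minimum_probability (a hypothetical format).
    """
    for minimum, seconds in correspondence:
        if probability >= minimum:
            return seconds
    raise ValueError("no entry covers this probability")


def supplement(text, score, read_more, correspondence, threshold, max_rounds=5):
    """Score the text; while it is incomplete, wait for the mapped time
    length, append the text acquired in that time, and score again."""
    for _ in range(max_rounds):
        probability = score(text)
        if probability >= threshold:
            return text, True
        seconds = wait_duration(probability, correspondence)
        # The second text = first text + text acquired within the time length.
        text = text + read_more(seconds)
    return text, score(text) >= threshold


# Stand-ins for the model and the input stream (made-up values).
CORRESPONDENCE = [(0.3, 0.5), (0.0, 1.0)]
_chunks = iter([" to listen", " to music"])


def fake_score(text):
    """Pretend target prediction probability: high only for the full request."""
    return 0.9 if text.endswith("music") else 0.1


def fake_read_more(seconds):
    """Pretend the user typed the next chunk during the waiting time."""
    return next(_chunks)


result, complete = supplement("I want", fake_score, fake_read_more,
                              CORRESPONDENCE, threshold=0.5)
```

In a real system `score` would wrap the text prediction model and the threshold comparison described earlier, and `read_more` would block on the terminal's input for the given time length.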

Optionally, the text prediction model is obtained by:

training a preset time sequence convolution network through a plurality of groups of training data to obtain the text prediction model; wherein,

each group of training data includes input-end training data and output-end training data, the input-end training data comprise the 1st to n-th text elements in a training text, the output-end training data comprise the 2nd to (n+1)-th text elements in the training text, the training text comprises n+1 text elements arranged in sequence, the (n+1)-th text element in the training text is the target text element, n is an integer, and n is greater than 0.
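The input/output relationship of one group of training data amounts to shifting the training text by one element. A minimal Python sketch, assuming the training text is already split into a list of text elements and using the hypothetical helper name `make_training_pair`:

```python
def make_training_pair(training_text):
    """Build one group of training data from a training text.

    training_text: list of n + 1 text elements whose last element is a
    target text element (for example an end-of-sentence marker).
    """
    if len(training_text) < 2:
        raise ValueError("a training text needs at least 2 text elements (n > 0)")
    inputs = training_text[:-1]   # 1st to n-th text elements
    outputs = training_text[1:]   # 2nd to (n+1)-th text elements
    return inputs, outputs


inputs, outputs = make_training_pair(
    ["I", "want", "to", "listen", "to", "music", "<eos>"]
)
```

Each output element is the element that follows the input element at the same index, so the last output is the target text element.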

According to a second aspect of the embodiments of the present disclosure, there is provided a semantic integrity determination apparatus, the apparatus including:

the processing module is configured to predict a first text to be processed through a pre-trained text prediction model, and determine a target probability table corresponding to the position of a target element; wherein,

a determining module configured to determine, according to the target probability table, a prediction probability corresponding to each target text element in the element set, so as to determine whether the semantic meaning of the first text to be processed is complete according to the prediction probability corresponding to each target text element, where the target text element is used to represent the semantic integrity of the first text to be processed.

Optionally, the processing module includes:

the first obtaining sub-module is configured to take the first text to be processed as the input of the text prediction model, and obtain a predicted text output by the text prediction model and a probability table corresponding to each element position in the predicted text;

a first determining sub-module configured to determine a probability table corresponding to a last element position in the predicted text as the target probability table.

the determining module comprises:

a second determining sub-module configured to determine a prediction probability of each of the terminal text elements according to the target probability table;

the first calculation submodule is configured to perform preset operation on the prediction probability of each terminal text element to obtain an operation result;

the second determining submodule is further configured to determine whether the semantics of the first text to be processed are complete according to the operation result.

Optionally, the target text element includes a termination element, a particle element, and a symbol element, where the termination element is a text element for representing a text end, the particle element is a text element for representing a part of speech as a particle, and the symbol element is a text element for representing a punctuation mark, and the determining module includes:

a third determining submodule configured to determine a termination elements, b particle elements and c symbol elements included in the element set, where a, b and c are integers, and a > 0, b > 0 and c > 0;

a second obtaining sub-module configured to obtain, according to the target probability table, a first prediction probability corresponding to each of the termination elements, a second prediction probability corresponding to each of the particle elements, and a third prediction probability corresponding to each of the symbol elements;

a second calculation sub-module configured to determine a target prediction probability using a first formula based on the first prediction probability, the second prediction probability, and the third prediction probability;

the first formula is expressed as:

the third determining submodule is further configured to determine that the semantic meaning of the first text to be processed is complete if the target prediction probability is greater than or equal to a preset probability threshold; or,

the third determining submodule is further configured to determine that the semantics of the first text to be processed are incomplete if the target prediction probability is smaller than the probability threshold.

Optionally, the apparatus further comprises:

the execution module is configured to execute a semantic supplementing step to acquire a target text with complete semantics corresponding to the first text to be processed under the condition that the semantics of the first text to be processed are determined to be incomplete.

Optionally, the semantic supplementing step includes:

Optionally, the text prediction model is obtained by:

According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to perform the steps of the semantic integrity determination method provided by the first aspect of the present disclosure.

According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the steps of the semantic integrity determination method provided by the first aspect of the present disclosure.

The technical solution provided by the embodiments of the present disclosure can have the following beneficial effects. A first text to be processed is first predicted through a text prediction model to determine a target probability table corresponding to a target element position, where the target element position is the element position immediately following the last text element of the first text to be processed, and the target probability table represents the prediction probability of each of a plurality of text elements contained in a preset element set appearing at the target element position. The prediction probability corresponding to each target text element in the element set is then determined according to the target probability table, and whether the semantics of the first text to be processed are complete is determined according to these prediction probabilities, where the target text element is used for representing the semantic integrity of the first text to be processed. By using the text prediction model to predict the probability that the next text element of the first text to be processed is a target text element, the present disclosure can accurately determine whether the semantics of the first text to be processed are complete, and prevents the man-machine conversation system from obtaining a text with incomplete semantics because the end of text input was misjudged, thereby improving the accuracy and efficiency of man-machine conversation.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.

FIG. 1 is a flow diagram illustrating a semantic integrity determination method in accordance with an exemplary embodiment.

Fig. 2 is a flow chart illustrating a step 101 according to the embodiment shown in fig. 1.

Fig. 3 is a flow chart illustrating one step 102 according to the embodiment shown in fig. 1.

Fig. 4 is a flow chart illustrating another step 102 according to the embodiment shown in fig. 1.

FIG. 5 is a flow diagram illustrating another method of semantic integrity determination in accordance with an exemplary embodiment.

Fig. 6 is a block diagram illustrating a semantic integrity determination apparatus in accordance with an example embodiment.

FIG. 7 is a block diagram of a processing module shown in accordance with the embodiment shown in FIG. 6.

FIG. 8 is a block diagram illustrating a determination module according to the embodiment shown in FIG. 6.

FIG. 9 is a block diagram of another determination module shown in accordance with the embodiment shown in FIG. 6.

Fig. 10 is a block diagram illustrating another apparatus for semantic integrity determination in accordance with an example embodiment.

FIG. 11 is a block diagram illustrating an electronic device in accordance with an example embodiment.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

Before introducing the semantic integrity determination method, apparatus, electronic device, and storage medium provided by the present disclosure, an application scenario related to the embodiments of the present disclosure is first introduced. The application scenario may be a man-machine conversation scenario. In this scenario, a user may hold a man-machine conversation with the man-machine conversation system through a terminal. For example, when the user needs to use an intelligent assistant, intelligent customer service, map navigation, a smart speaker, or a service robot, the user may input a text stating the user's specific needs through the terminal, so that the man-machine conversation system responds according to the text input through the terminal. The terminal may be a mobile terminal such as a smart phone, a tablet computer, a smart watch, a smart bracelet, or a PDA (Personal Digital Assistant), or a fixed terminal such as a desktop computer.

FIG. 1 is a flow diagram illustrating a semantic integrity determination method in accordance with an exemplary embodiment. As shown in fig. 1, the method may include the steps of:

in step 101, a first text to be processed is predicted by a pre-trained text prediction model, and a target probability table corresponding to the position of a target element is determined.

The target element position is an element position which is behind and is close to the last text element of the first text to be processed, and the target probability table is used for representing the prediction probability of each text element in a plurality of text elements contained in the preset element set appearing at the target element position.

For example, the target probability table (or a probability table in general) in the present disclosure may be understood as a table storing the set of prediction probabilities that each text element included in the element set appears at the target element position (or at a given element position). For example, when the element set includes 30000 text elements, the target probability table includes the prediction probability of each of the 30000 text elements appearing at the target element position. The probability set records the correspondence between each text element and its prediction probability. In the embodiment of the present disclosure, for convenience of illustration, an element set containing 5 text elements is taken as an example. Table 1 shows the target probability table (or probability table) output by the text prediction model according to the first text to be processed in this case, where <eos> in Table 1 is the end-of-sentence marker.

TABLE 1

Text element	Prediction probability
Listening device	0.5
Playing with	0.35
。	0.05
<eos>	0.05
To master	0.05
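As a rough illustration, Table 1 can be held in memory as a mapping from text element to prediction probability. The Python sketch below assumes a plain dictionary; the names `TABLE_1` and `most_likely_element` are hypothetical, and the disclosure does not prescribe a data structure. The element labels are kept exactly as they appear in Table 1.

```python
import math

# Table 1 as a mapping: text element -> prediction probability.
TABLE_1 = {
    "Listening device": 0.5,
    "Playing with": 0.35,
    "。": 0.05,
    "<eos>": 0.05,
    "To master": 0.05,
}


def most_likely_element(table):
    """Return the text element with the highest prediction probability."""
    return max(table, key=table.get)


total = sum(TABLE_1.values())   # the five probabilities in Table 1 add up to 1
top = most_likely_element(TABLE_1)
eos_probability = TABLE_1["<eos>"]
```

With a realistic element set of tens of thousands of entries, an array indexed by element id would likely be preferred over a dictionary, but the lookup by text element works the same way.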

For example, during a man-machine conversation, the man-machine conversation system can acquire the text input by the user through the terminal in real time and take the currently acquired text as the first text to be processed. For example, when the terminal captures the user's voice input "I want to use map navigation", the man-machine conversation system may perform voice recognition through a preset voice recognition algorithm to obtain the first text to be processed, "I want to use map navigation". The text may also be obtained by performing image recognition on an image input by the user through the terminal, which is not specifically limited by the present disclosure. The first text to be processed is composed of a plurality of text elements in the order in which the user input them, and each text element corresponds to an element position, namely the position of that text element in the first text to be processed.

The specific type of the text elements of the first text to be processed depends on its language. When the language is Chinese, each text element may be an individual Chinese character; for example, when the first text to be processed is "I want to listen to song" (four characters in Chinese), it includes 4 text elements, namely "I", "want", "listen" and "song". When the language is English, each text element may be an English word; for example, when the first text to be processed is "I want to listen to music", it includes 6 text elements, namely "I", "want", "to", "listen", "to", and "music".
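The two splitting rules can be sketched in a few lines of Python. This is a minimal sketch, not the tokenizer of the disclosure: the function name and the language codes are hypothetical, and the Chinese string 我要听歌 is an assumed rendering of the four-character example above.

```python
def split_text_elements(text, language):
    """Split a text into text elements.

    Assumption: Chinese text is split into individual characters and English
    text into whitespace-separated words, mirroring the two examples above.
    """
    if language == "zh":
        return [ch for ch in text if not ch.isspace()]
    if language == "en":
        return text.split()
    raise ValueError(f"unsupported language: {language}")


english = split_text_elements("I want to listen to music", "en")
chinese = split_text_elements("我要听歌", "zh")
```

The element position of a text element is then simply its index in the resulting list.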

For example, after receiving the first text to be processed, the man-machine conversation system may input it into the pre-trained text prediction model. The text prediction model then predicts on the first text to be processed to determine the prediction probability of each text element contained in the element set appearing at the target element position, which yields the target probability table corresponding to the target element position. The target element position is the element position immediately following the last text element of the first text to be processed, that is, the position of the text element that would come next. For example, when the first text to be processed is "I want", the element position following the text element "want" is the target element position. In other words, the target probability table gives, for each text element in the element set, the prediction probability that it is the next text element of the first text to be processed. The element set may be understood as a preset dictionary table including a large number of text elements, and the dictionary table may be composed of all text elements included in the training data of the text prediction model.

It should be noted that man-machine conversation is generally an online service with strict requirements on latency and concurrency. To meet these requirements, the text prediction model can be obtained by training a time sequence convolution network. The resulting text prediction model has high concurrency performance, can be trained quickly on a GPU (Graphics Processing Unit), and keeps the latency of the man-machine conversation low.

In step 102, according to the target probability table, a prediction probability corresponding to each target text element in the element set is determined, so as to determine whether the semantics of the first text to be processed is complete according to the prediction probability corresponding to each target text element.

The target text element is used for representing the semantic integrity of the first text to be processed.

For example, target text elements that indicate that the semantics of the first text to be processed are complete may be preset in the man-machine conversation system. When a target text element appears at the target element position, it may be determined that the semantics of the first text to be processed are complete; when no target text element appears at the target element position, it may be determined that the semantics are incomplete. A target text element may be a preset termination element (e.g. the end-of-sentence marker <eos>), a particle, or a punctuation mark. After determining the target probability table, the man-machine conversation system may determine from it the prediction probability corresponding to each target text element, and then determine whether the semantics of the first text to be processed are complete according to these prediction probabilities. One way of doing so is as follows: first, the sum of the prediction probabilities corresponding to all target text elements is determined; if the sum is greater than or equal to a certain threshold, the semantics of the first text to be processed are determined to be complete, and if the sum is less than the threshold, the semantics are determined to be incomplete.
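The sum-and-threshold check described above can be sketched as follows. The function name, the choice of target text elements, and the threshold value 0.5 are assumptions made for illustration only; the disclosure leaves the threshold unspecified.

```python
def is_semantically_complete(target_table, target_elements, threshold):
    """Sum the prediction probabilities of the target text elements and
    compare the sum with the threshold (complete when sum >= threshold)."""
    total = sum(target_table.get(element, 0.0) for element in target_elements)
    return total >= threshold, total


# Target probability table taken from Table 1.
table_1 = {
    "Listening device": 0.5,
    "Playing with": 0.35,
    "。": 0.05,
    "<eos>": 0.05,
    "To master": 0.05,
}
targets = {"<eos>", "。"}  # assumed target text elements

complete_1, total_1 = is_semantically_complete(table_1, targets, threshold=0.5)

# A made-up table in which the end-of-sentence marker dominates.
table_2 = {"<eos>": 0.7, "。": 0.2, "Listening device": 0.1}
complete_2, total_2 = is_semantically_complete(table_2, targets, threshold=0.5)
```

For Table 1 the target elements carry only a small share of the probability, so the text would be treated as incomplete under this threshold, while the second table would be treated as complete.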

By adopting the scheme, the text prediction model is used for predicting the prediction probability that the next text element of the first text to be processed is the target text element, whether the semantics of the first text to be processed is complete or not can be accurately determined, the situation that the man-machine conversation system obtains the text with incomplete semantics due to the fact that the input end of the text is judged by mistake is avoided, and therefore the accuracy and the efficiency of man-machine conversation are improved.

Fig. 2 is a flow chart illustrating a step 101 according to the embodiment shown in fig. 1. As shown in fig. 2, step 101 may include the steps of:

in step 1011, the first text to be processed is used as an input of the text prediction model, and a predicted text output by the text prediction model and a probability table corresponding to each element position in the predicted text are obtained.

The text elements included in the predicted text correspond one-to-one to the text elements included in the first text to be processed. Each text element of the predicted text is the text element predicted, from the corresponding text element of the first text to be processed, to appear at the element position immediately following that corresponding text element. The probability table corresponding to each element position in the predicted text represents the prediction probability of each text element appearing at that element position.

In this step, the first text to be processed may first be used as the input of the text prediction model. For each text element in the first text to be processed, the text prediction model predicts the next text element based on that text element in combination with the text elements before it. That is, it determines the prediction probability that each text element in the element set appears at the element position following that text element, and thereby obtains the predicted text and the probability table corresponding to each element position in the predicted text. The predicted text is composed of the text elements that the text prediction model predicts as the likely next text element of each text element in the first text to be processed; for example, when the first text to be processed is "I want", the predicted text may be "to listen". The predicted text and the first text to be processed contain the same number of text elements, and each text element in the predicted text corresponds to an element position. The i-th text element in the predicted text is the (i + 1)-th text element of the first text to be processed as predicted by the text prediction model from the i-th text element of the first text to be processed in combination with the text elements before it. In other words, the i-th element position in the predicted text corresponds to the (i + 1)-th element position in the first text to be processed, where i is a positive integer.

In step 1012, a probability table corresponding to the last element position in the predicted text is determined as a target probability table.

Further, to determine whether the semantics of the first text to be processed are complete, it is in fact only necessary to determine whether a target text element representing complete semantics appears at the target element position. Since the i-th element position in the predicted text corresponds to the (i + 1)-th element position in the first text to be processed, the last element position in the predicted text is the target element position. Therefore, after the predicted text and the probability table corresponding to each element position in the predicted text are obtained, the probability table corresponding to the last element position in the predicted text may be used as the target probability table, which is then used to determine whether the semantics of the first text to be processed are complete.
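Steps 1011 and 1012 can be sketched in Python as follows. The sketch assumes the model output is available as one probability table (a dictionary) per element position, and that the predicted text is formed by taking the most probable element at each position; both points, the function names, and all numeric values are illustrative assumptions rather than details given in the disclosure.

```python
def select_target_table(position_tables):
    """Step 1012: the probability table at the last element position of the
    predicted text is taken as the target probability table."""
    if not position_tables:
        raise ValueError("the predicted text has no element positions")
    return position_tables[-1]


def predicted_text(position_tables):
    """Pick the most probable text element at each element position
    (an assumed way of forming the predicted text)."""
    return [max(table, key=table.get) for table in position_tables]


# Hypothetical output for a two-element input such as "I want".
position_tables = [
    {"want": 0.6, "listen": 0.1, "<eos>": 0.3},     # position after element 1
    {"want": 0.05, "listen": 0.8, "<eos>": 0.15},   # position after element 2
]

prediction = predicted_text(position_tables)
target_table = select_target_table(position_tables)
```

Only the last table matters for the completeness decision; the earlier tables describe positions that the first text to be processed already fills.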

Fig. 3 is a flow chart illustrating one implementation of step 102 according to the embodiment shown in fig. 1. As shown in fig. 3, the target text element includes a plurality of terminal text elements for characterizing the end of the text, and step 102 may include the following steps:

In step 1021, a prediction probability for each of the terminal text elements is determined based on the target probability table.

In one scenario, after determining the target probability table, the man-machine conversation system may examine the text elements in the element set and determine the m terminal text elements included in the element set, where m is an integer and m > 0. The prediction probability of each of the m terminal text elements is then obtained according to the target probability table.

In step 1022, a preset operation is performed on the prediction probability of each terminal text element to obtain an operation result.

In this step, the man-machine dialog system may perform a preset operation according to the prediction probability of each terminal text element to obtain an operation result for determining whether the semantics of the first to-be-processed text are complete. For example, the preset operation can be expressed as the following formula (1):

where P_f is the operation result and P_g is the prediction probability of a terminal text element.

In step 1023, it is determined whether the semantics of the first text to be processed are complete according to the operation result.

For example, if the operation result is greater than or equal to a given threshold, the probability that the next text element of the first text to be processed is a target text element is high, and the semantics of the first text to be processed are determined to be complete. If the operation result is smaller than the threshold, that probability is low, and the semantics of the first text to be processed are determined to be incomplete.
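Steps 1021 to 1023 might look like the following Python sketch. It assumes that the preset operation is a plain sum of the terminal-element probabilities, which is an assumption made only for illustration; the names `terminal_score` and `is_complete`, the second terminal element "</s>", and all numeric values are likewise hypothetical.

```python
from typing import Dict, Iterable


def terminal_score(table: Dict[str, float], terminals: Iterable[str]) -> float:
    """Assumed preset operation: sum the probabilities of the terminal elements."""
    return sum(table.get(t, 0.0) for t in terminals)


def is_complete(score: float, threshold: float) -> bool:
    """Step 1023: judge the text complete when the result reaches the threshold."""
    return score >= threshold


# Made-up target probability table with two terminal text elements.
table = {"<eos>": 0.5, "</s>": 0.25, "song": 0.25}
score = terminal_score(table, ["<eos>", "</s>"])
complete = is_complete(score, threshold=0.5)
```

Any other preset operation (for example, taking the maximum) could be swapped into `terminal_score` without changing the threshold comparison.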

Fig. 4 is a flow chart illustrating another implementation of step 102 according to the embodiment shown in fig. 1. As shown in fig. 4, the target text elements include a termination element, a particle element and a symbol element, where the termination element is a text element representing the end of a text, the particle element is a text element whose part of speech is a particle, and the symbol element is a text element representing a punctuation mark. Step 102 may include the following steps:

In step 1024, the a termination elements, b particle elements and c symbol elements included in the element set are determined,

where a, b and c are integers with a > 0, b > 0 and c > 0.

In step 1025, a first prediction probability corresponding to each termination element, a second prediction probability corresponding to each particle element, and a third prediction probability corresponding to each symbol element are obtained according to the target probability table.

In another scenario, particle elements and symbol elements are also text elements that can indicate that the semantics of a text are complete, so the termination elements, particle elements and symbol elements can together serve as the target text elements. After determining the target probability table, the man-machine conversation system may examine the text elements in the element set and determine the a termination elements, b particle elements and c symbol elements included in the element set. The man-machine conversation system can then obtain, from the target probability table, the first prediction probability corresponding to each termination element, the second prediction probability corresponding to each particle element, and the third prediction probability corresponding to each symbol element. For example, when the target text elements include 1 termination element "<eos>", 1 particle element "of" and 2 symbol elements "." and "!", the first prediction probability corresponding to the termination element "<eos>", the second prediction probability corresponding to the particle element "of", and the third prediction probabilities respectively corresponding to the symbol elements "." and "!" may be determined from the target probability table.

In step 1026, a target prediction probability is determined using a first formula based on the first prediction probability, the second prediction probability, and the third prediction probability.

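The first formula may be expressed as the following formula (2):

$$P_x = \sum_{j=1}^{a} P_j + \sum_{k=1}^{b} P_k + \sum_{h=1}^{c} P_h \tag{2}$$

where P_x is the target prediction probability, P_j is the first prediction probability (of the j-th termination element), P_k is the second prediction probability (of the k-th particle element), and P_h is the third prediction probability (of the h-th symbol element).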

In this step, the man-machine conversation system may use the first formula to sum the prediction probabilities corresponding to the a termination elements, the b particle elements and the c symbol elements, based on the first prediction probability, the second prediction probability and the third prediction probability, to obtain the target prediction probability. For example, when the target text elements include 1 termination element, 1 particle element and 2 symbol elements, if the prediction probability corresponding to the termination element is 0.1, the prediction probability corresponding to the particle element is 0.15, and the prediction probabilities corresponding to the 2 symbol elements are 0.05 and 0.1 respectively, the first formula gives the target prediction probability P_x = 0.1 + 0.15 + 0.05 + 0.1 = 0.4.
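The summation in the worked example can be reproduced with a short Python sketch. The function name `target_prediction_probability` and the non-target element "song" are hypothetical; the probabilities of the four target elements are those of the example above.

```python
import math
from typing import Dict, Sequence


def target_prediction_probability(
    table: Dict[str, float],
    termination: Sequence[str],
    particles: Sequence[str],
    symbols: Sequence[str],
) -> float:
    """Sum the prediction probabilities of all target text elements."""
    first = [table.get(e, 0.0) for e in termination]  # first prediction probabilities
    second = [table.get(e, 0.0) for e in particles]   # second prediction probabilities
    third = [table.get(e, 0.0) for e in symbols]      # third prediction probabilities
    # fsum limits floating-point rounding error when adding many small terms.
    return math.fsum(first + second + third)


# Probabilities from the worked example; "song" is a made-up non-target element.
table = {"<eos>": 0.1, "of": 0.15, ".": 0.05, "!": 0.1, "song": 0.6}
p_x = target_prediction_probability(table, ["<eos>"], ["of"], [".", "!"])
```

Because the result is a floating-point value, it is best compared with a tolerance (for example, `math.isclose(p_x, 0.4)`) rather than with exact equality.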

In another possible embodiment, the target prediction probability may be determined as follows: a weight value is first set for each target text element; then, from the prediction probability and the weight value corresponding to each target text element, the weighted sum of the prediction probabilities corresponding to all the target text elements is determined, and this weighted sum is used as the target prediction probability.
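A minimal sketch of the weighted variant is shown below. The weight values are purely hypothetical (the text does not specify how weights are chosen), as is the function name `weighted_target_probability`.

```python
import math
from typing import Dict


def weighted_target_probability(
    table: Dict[str, float], weights: Dict[str, float]
) -> float:
    """Weighted sum of the prediction probabilities of the target elements.

    `weights` maps each target text element to its weight; elements that are
    absent from the probability table contribute zero.
    """
    return math.fsum(w * table.get(elem, 0.0) for elem, w in weights.items())


table = {"<eos>": 0.1, "of": 0.15, ".": 0.05, "!": 0.1, "song": 0.6}
# Hypothetical weights: full weight for the termination element, less for others.
weights = {"<eos>": 1.0, "of": 0.5, ".": 0.8, "!": 0.8}
p_weighted = weighted_target_probability(table, weights)
```

Setting every weight to 1.0 reduces this to the unweighted sum of the first formula.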

In step 1027, if the target prediction probability is greater than or equal to a preset probability threshold, it is determined that the semantics of the first text to be processed are complete; or,

in step 1028, if the target prediction probability is smaller than the probability threshold, it is determined that the semantics of the first text to be processed are incomplete.

For example, the target prediction probability may be understood as a semantic integrity flag that the man-machine conversation system generates from the prediction probabilities corresponding to the target text elements in the element set and uses to determine whether the semantics of the first text to be processed are complete. If the target prediction probability is greater than or equal to the preset probability threshold, the probability that the next text element of the first text to be processed is a target text element is high, and the semantics of the first text to be processed are determined to be complete. If the target prediction probability is smaller than the probability threshold, that probability is low, and the semantics of the first text to be processed are determined to be incomplete. Because the target prediction probability fuses several different kinds of target text elements, whether the semantics of the first text to be processed are complete can be judged more accurately.

It should be noted that, when the set target text elements are different, the determined target prediction probabilities are also different, and the target text elements may be dynamically adjusted and expanded according to specific requirements, so as to further improve the accuracy of determining the semantic integrity of the first text to be processed. For example, when the language type of the first text to be processed is different, the definition and composition of the particle may be different, and the particle element in the target text element may be readjusted according to specific requirements.
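One way to make the target text elements adjustable per language, combined with the threshold decision of steps 1027 and 1028, is sketched below. Everything here is an assumption for illustration: the `TargetElements` structure, the language keys, and the particular particle and symbol lists are hypothetical examples and are not taken from the disclosure.

```python
import math
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class TargetElements:
    termination: FrozenSet[str]
    particles: FrozenSet[str]
    symbols: FrozenSet[str]

    def combined(self) -> FrozenSet[str]:
        """All target text elements of this configuration."""
        return self.termination | self.particles | self.symbols


# Hypothetical per-language configuration; the element lists are examples only.
TARGETS: Dict[str, TargetElements] = {
    "zh": TargetElements(frozenset({"<eos>"}),
                         frozenset({"的", "了", "吗"}),
                         frozenset({"。", "！", "？"})),
    "ja": TargetElements(frozenset({"<eos>"}),
                         frozenset({"ね", "よ", "か"}),
                         frozenset({"。", "！", "？"})),
}


def is_semantically_complete(table: Dict[str, float], lang: str,
                             threshold: float) -> bool:
    """Compare the summed target probability with the probability threshold."""
    p_x = math.fsum(table.get(e, 0.0) for e in TARGETS[lang].combined())
    return p_x >= threshold
```

With such a table, adding or removing a particle for a given language changes the resulting target prediction probability without touching the decision logic.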

FIG. 5 is a flow diagram illustrating another method of semantic integrity determination in accordance with an exemplary embodiment. As shown in fig. 5, the method further comprises the steps of:

In step 103, in a case that it is determined that the semantics of the first text to be processed are incomplete, a semantic supplementing step is performed to obtain a target text with complete semantics corresponding to the first text to be processed.

In this step, if it is determined that the semantics of the first text to be processed are incomplete, which indicates that the user has not finished inputting text through the terminal, the man-machine conversation system may continue to acquire the text input by the user through the terminal by performing the semantic supplementing step, so as to obtain the target text with complete semantics. If it is determined that the semantics of the first text to be processed are complete, which indicates that the user has finished inputting text through the terminal, the man-machine conversation system can take the first text to be processed as the target text. After the target text with complete semantics is determined, the man-machine conversation system can respond based on the target text, which ensures that the user obtains the required response and improves the user experience.

Take voice input as an example. When the text that the user wants to input through the terminal is "i want to listen to a song", if the user pauses during the voice input, the semantics of the first text to be processed acquired by the man-machine conversation system may be incomplete. For example, the first text to be processed acquired by the man-machine conversation system is "i want to listen", which lacks the object of "listen". The man-machine conversation system then determines that the semantics of the first text to be processed are incomplete, which indicates that the user has not finished inputting text through the terminal, and the system can continue to acquire the voice uttered by the user until a target text with complete semantics (i.e., "i want to listen to a song") is acquired. After the man-machine conversation system acquires the target text "i want to listen to a song", the terminal can be controlled to play the song.

Optionally, the semantic supplementing step comprises:

Step a): determining a first duration by using a preset correspondence according to the target prediction probability, where the preset correspondence represents the correspondence between the target prediction probability and the duration.

For example, if the semantics of the first text to be processed are incomplete, which indicates that the user has not finished inputting text through the terminal, the man-machine conversation system continues to collect the text input by the user through the terminal. To ensure that the man-machine conversation system can acquire a text with complete semantics, the first duration for which the system continues to acquire text can be determined from the target prediction probability by using the preset correspondence. The preset correspondence may be a negative correlation, that is, the lower the target prediction probability, the longer the first duration.

Step b): determining a second text to be processed within the first duration, where the second text to be processed comprises the first text to be processed and the text acquired within the first duration.

Step c): taking the second text to be processed as the first text to be processed, and executing step 101 to step 102 again.
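The negative correlation used in step a) could be realized, for instance, by a clamped linear mapping. The sketch below is one possible choice under stated assumptions: the linear form, the function name `wait_duration`, and the default bounds of 0.5 s and 3.0 s are all hypothetical and are not specified by the disclosure.

```python
def wait_duration(target_prob: float,
                  min_wait_s: float = 0.5,
                  max_wait_s: float = 3.0) -> float:
    """Map a target prediction probability to a collection duration in seconds.

    The mapping is linear and negatively correlated: probability 0 gives
    max_wait_s and probability 1 gives min_wait_s. Inputs outside [0, 1]
    are clamped to that range first.
    """
    p = min(max(target_prob, 0.0), 1.0)
    return max_wait_s - p * (max_wait_s - min_wait_s)
```

A lookup table of probability ranges and durations would satisfy the same negative correlation and may be easier to tune in practice.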

For example, after the first duration is determined, the man-machine conversation system may continue to collect the text input by the user through the terminal in the first duration, and combine the text acquired by the man-machine conversation system in the first duration with the first text to be processed to obtain the second text to be processed. Then, the man-machine conversation system may take the second text to be processed as a new first text to be processed, and repeatedly perform steps 101 to 102 according to the new first text to be processed to determine whether the semantics of the new first text to be processed are complete until the target text is obtained.
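The repetition of steps a) to c) can be sketched as a loop. The callables `score`, `collect` and `duration_for` are hypothetical stand-ins for steps 101 to 102, for the acquisition of further user input, and for the preset correspondence respectively; the `max_rounds` cap is an added safety assumption, not part of the described method.

```python
from typing import Callable, List, Tuple


def supplement_until_complete(
    first_text: List[str],
    score: Callable[[List[str]], float],
    collect: Callable[[float], List[str]],
    duration_for: Callable[[float], float],
    threshold: float,
    max_rounds: int = 5,
) -> Tuple[List[str], bool]:
    """Repeat steps a) to c) until the text is judged complete.

    score:        returns the target prediction probability of a text
    collect:      returns the text elements gathered within a duration
    duration_for: maps a target prediction probability to a duration
    Returns the final text and whether it was judged complete.
    """
    text = list(first_text)
    for _ in range(max_rounds):
        p = score(text)
        if p >= threshold:
            return text, True
        # Step b): append newly acquired elements to form the second text.
        text = text + collect(duration_for(p))
    # Step c) was repeated max_rounds times; report the last judgement.
    return text, score(text) >= threshold
```

Each round treats the extended text as the new first text to be processed, so the duration is recomputed from the latest target prediction probability.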

Optionally, the text prediction model is obtained by:

Training a preset time-sequence (temporal) convolutional network with multiple groups of training data to obtain the text prediction model.

Each group of training data comprises input-end training data and output-end training data. The input-end training data comprise the 1st to n-th text elements of a training text, and the output-end training data comprise the 2nd to (n + 1)-th text elements of the training text. The training text comprises n + 1 text elements arranged in sequence, the (n + 1)-th text element of the training text is a target text element, and n is an integer greater than 0.

Specifically, when the text prediction model is trained, a corpus data set including a large number of texts may be obtained in advance, and the corpus data set is then organized to determine the target text elements included in each text. A termination element or terminal text element among the target text elements (e.g., <eos>) may be appended in advance to the end of each text in the corpus data set. The particle elements and symbol elements contained in each text may be determined by a text recognition algorithm or by manual labeling. Each text in the organized corpus data set can then be used as the training text of one group of training data: the 1st to n-th text elements of the training text serve as the input-end training data of the group, and the 2nd to (n + 1)-th text elements of the training text serve as the output-end training data of the group. Then, using a sequence-to-sequence (Seq2Seq) approach, the preset time-sequence (temporal) convolutional network is pre-trained on a GPU with the multiple groups of training data; the pre-trained network is then optimized with a preset loss function (for example, a cross-entropy function), and the text prediction model is obtained when the loss function reaches its minimum.
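The construction of one group of training data can be illustrated with a short Python sketch. It assumes the training text is already split into text elements and that the termination element is written "<eos>"; the function name `make_training_pair` is hypothetical.

```python
from typing import List, Tuple

EOS = "<eos>"  # termination element appended to every training text


def make_training_pair(elements: List[str]) -> Tuple[List[str], List[str]]:
    """Build (input, output) sequences for next-element prediction.

    The termination element is appended so that the training text has n + 1
    elements; the input is elements 1..n and the output is elements 2..n+1.
    """
    if not elements:
        raise ValueError("training text must contain at least one element")
    training_text = elements + [EOS]
    return training_text[:-1], training_text[1:]


inputs, outputs = make_training_pair(
    ["i", "want", "to", "listen", "to", "a", "song"]
)
```

The two sequences have equal length and are offset by one position, so the element at each output position is the training target for the corresponding input position.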

In summary, the text prediction model is used to predict the probability that the next text element of the first text to be processed is a target text element, so that whether the semantics of the first text to be processed are complete can be accurately determined. This prevents the man-machine conversation system from obtaining a text with incomplete semantics because the end of the text input was misjudged, and thereby improves the accuracy and efficiency of the man-machine conversation.

Fig. 6 is a block diagram illustrating a semantic integrity determination apparatus in accordance with an example embodiment. As shown in fig. 6, the apparatus includes a processing module 201 and a determining module 202.

The processing module 201 is configured to predict a first text to be processed through a pre-trained text prediction model, and determine a target probability table corresponding to the target element position.

The determining module 202 is configured to determine, according to the target probability table, a prediction probability corresponding to each target text element in the element set, so as to determine whether the semantic meaning of the first text to be processed is complete according to the prediction probability corresponding to each target text element, where the target text element is used to represent the semantic integrity of the first text to be processed.

FIG. 7 is a block diagram of a processing module shown in accordance with the embodiment shown in FIG. 6. As shown in fig. 7, the processing module 201 includes a first obtaining sub-module 2011 and a first determining sub-module 2012.

The first obtaining sub-module 2011 is configured to use the first text to be processed as an input of the text prediction model, and obtain a predicted text output by the text prediction model and a probability table corresponding to each element position in the predicted text.

The first determining sub-module 2012 is configured to determine a probability table corresponding to a last element position in the predicted text as a target probability table.

FIG. 8 is a block diagram illustrating a determination module according to the embodiment shown in FIG. 6. As shown in fig. 8, the target text element includes a plurality of terminal text elements for characterizing the end of the text.

The determining module 202 includes a second determining sub-module 2021 and a first calculating sub-module 2022.

The second determining sub-module 2021 is configured to determine the prediction probability of each terminal text element according to the target probability table.

The first calculating sub-module 2022 is configured to perform a preset operation on the prediction probability of each terminal text element to obtain an operation result.

The second determining sub-module 2021 is further configured to determine whether the semantics of the first text to be processed are complete according to the operation result.

FIG. 9 is a block diagram illustrating another determining module according to the embodiment shown in FIG. 6. As shown in fig. 9, the target text elements include a termination element, a particle element and a symbol element, where the termination element is a text element representing the end of a text, the particle element is a text element whose part of speech is a particle, and the symbol element is a text element representing a punctuation mark. The determining module 202 includes a third determining sub-module 2023, a second obtaining sub-module 2024 and a second calculating sub-module 2025.

The third determining sub-module 2023 is configured to determine the a termination elements, b particle elements and c symbol elements included in the element set, where a, b and c are integers with a > 0, b > 0 and c > 0.

The second obtaining sub-module 2024 is configured to obtain, according to the target probability table, a first prediction probability corresponding to each termination element, a second prediction probability corresponding to each particle element, and a third prediction probability corresponding to each symbol element.

The second calculating sub-module 2025 is configured to determine a target prediction probability by using the first formula based on the first prediction probability, the second prediction probability and the third prediction probability.

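The first formula is expressed as:

$$P_x = \sum_{j=1}^{a} P_j + \sum_{k=1}^{b} P_k + \sum_{h=1}^{c} P_h$$

where P_x is the target prediction probability, P_j is the first prediction probability, P_k is the second prediction probability, and P_h is the third prediction probability.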

The third determining sub-module 2023 is further configured to determine that the semantics of the first text to be processed are complete if the target prediction probability is greater than or equal to a preset probability threshold; or,

the third determining sub-module 2023 is further configured to determine that the semantics of the first text to be processed are incomplete if the target prediction probability is smaller than the probability threshold.

Fig. 10 is a block diagram illustrating another apparatus for semantic integrity determination in accordance with an example embodiment. As shown in fig. 10, the apparatus 200 further includes:

The execution module 203 is configured to, in a case that it is determined that the semantics of the first text to be processed are incomplete, perform a semantic supplementing step to obtain a target text with complete semantics corresponding to the first text to be processed.

Optionally, the semantic supplementing step comprises:

determining a first duration by using a preset correspondence according to the target prediction probability, where the preset correspondence represents the correspondence between the target prediction probability and the duration;

determining a second text to be processed within the first duration, where the second text to be processed comprises the first text to be processed and the text acquired within the first duration; and

taking the second text to be processed as the first text to be processed, and performing again the steps from predicting the first text to be processed through the pre-trained text prediction model and determining the target probability table corresponding to the target element position, through determining, according to the target probability table, the prediction probability corresponding to each target text element in the element set so as to determine whether the semantics of the first text to be processed are complete.

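Optionally, the text prediction model is obtained by training a preset time-sequence (temporal) convolutional network with multiple groups of training data, in the manner described for the method embodiment above.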

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

The present disclosure also provides a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the steps of the semantic integrity determination method provided by the present disclosure.

FIG. 11 is a block diagram illustrating an electronic device in accordance with an example embodiment. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.

Referring to fig. 11, electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.

The processing component 802 generally controls the overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the semantic integrity determination method described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

The power component 806 provides power to the various components of the electronic device 800. The power component 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the electronic device 800.

The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the electronic device 800. For example, the sensor component 814 may detect the open/closed state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800. The sensor component 814 may also detect a change in the position of the electronic device 800 or of a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components for performing the semantic integrity determination methods described above.

In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the electronic device 800 to perform the semantic integrity determination method described above is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

In another exemplary embodiment, a computer program product is also provided, which comprises a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-mentioned semantic integrity determination method when executed by the programmable apparatus.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A method of semantic integrity determination, the method comprising:

2. The method of claim 1, wherein the predicting the first text to be processed by the pre-trained text prediction model to determine the target probability table corresponding to the target element position comprises:

3. The method of claim 1, wherein the target text element comprises a plurality of terminal text elements for characterizing the end of text;

4. The method according to claim 1, wherein the target text elements comprise a termination element, a particle element and a symbol element, the termination element is a text element for representing the end of the text, the particle element is a text element for representing a part of speech as a particle, and the symbol element is a text element for representing punctuation marks;

the first formula is expressed as:

5. The method according to any one of claims 1-4, further comprising:

6. The method of claim 5, wherein the semantic supplementing step comprises:

7. The method according to any of claims 1-4, wherein the text prediction model is obtained by:

8. An apparatus for semantic integrity determination, the apparatus comprising:

9. The apparatus of claim 8, wherein the processing module comprises:

10. The apparatus of claim 8, wherein the target text element comprises a plurality of terminal text elements for characterizing the end of text;

the determining module comprises:

11. The apparatus of claim 8, wherein the target text elements comprise a termination element, a particle element, and a symbol element, the termination element is a text element for representing an end of a text, the particle element is a text element for representing a part of speech as a particle, and the symbol element is a text element for representing a punctuation mark, and the determining module comprises:

the first formula is expressed as:

12. The apparatus according to any one of claims 8-11, further comprising:

13. The apparatus of claim 12, wherein the semantic supplementing step comprises:

14. The apparatus according to any of claims 8-11, wherein the text prediction model is obtained by:

15. An electronic device, comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to perform the steps of the method of any one of claims 1-7.

16. A computer-readable storage medium, on which computer program instructions are stored, which program instructions, when executed by a processor, carry out the steps of the method according to any one of claims 1 to 7.