CN111444324A

CN111444324A - Sentence-break-based multi-intent recognition method, device, equipment and storage medium

Info

Publication number: CN111444324A
Application number: CN202010146290.6A
Authority: CN
Inventors: 黄孟缘
Original assignee: Ping An Life Insurance Company of China Ltd
Current assignee: Ping An Life Insurance Company of China Ltd
Priority date: 2020-03-05
Filing date: 2020-03-05
Publication date: 2020-07-24

Abstract

The invention relates to the field of artificial intelligence and discloses a sentence-break-based multi-intent recognition method, device, equipment and storage medium. By finely dividing a target sentence and then performing intent recognition on the divided target sentence, the accuracy and efficiency of intent recognition for the target sentence are improved, as is the accuracy of subsequent semantic recognition. The method comprises the following steps: acquiring a target sentence input by a user; performing sentence breaking on the target sentence by using a preset sequence model to obtain a segmented corpus; acquiring word vectors of the segmented corpus in a preset intention rule base, wherein the preset intention rule base is established according to business data; calculating the matching rate between the word vector and preset word vectors to obtain a segmentation intent, and thereby obtaining a corpus identification intent; and classifying the corpus identification intent and feeding back the final corpus identification intent.

Description

Sentence-break-based multi-intent recognition method, device, equipment and storage medium

Technical Field

The invention relates to the field of artificial intelligence, in particular to a sentence-break-based multi-intent recognition method, device, equipment and storage medium.

Background

Man-machine conversation is an important research area in the field of artificial intelligence. Computers are used to understand and use the natural languages of human society, such as Chinese and English, so that humans and machines can communicate in natural language. The computer can thereby take over part of people's mental labor and act as an extension of the human brain.

During a man-machine conversation, the computer may answer questions, give certain parameters or determine options. Through the dialog, the user guides or defines the work of the computer and supervises the execution of tasks. This mode of interaction helps bring the user's intention, judgment and experience into the working process of the computer, enhances the flexibility of computer applications, and makes software easier to write. The computer's ability to understand and process the language input by the user is therefore important. Currently, man-machine interaction adopts batch processing, in which a batch of job control cards is used to complete jobs one by one in a set order.

While the computer performs a job, the key to understanding is the intent recognition that the computer performs on the language input by the user. In general, the computer converts a multi-intent recognition problem into a multi-label classification problem. This approach places high demands on the quantity and quality of the corpora, and its recognition accuracy in the open domain is often unstable, so the efficiency of intent recognition is low.

Disclosure of Invention

The invention provides a sentence-break-based multi-intention recognition method, device, equipment and storage medium, which are used for solving the problem of low accuracy of a computer in an intention recognition process and improving the accuracy and efficiency of intention recognition.

A first aspect of an embodiment of the present invention provides a sentence-break-based multi-intent recognition method, including: acquiring a target sentence input by a user; sentence breaking is carried out on the target sentence by utilizing a preset sequence model to obtain a segmented corpus; acquiring word vectors in the segmented corpus in a preset intention rule base, wherein the preset intention rule base is established according to business data; calculating the matching rate between the word vector and a preset word vector to obtain a segmentation intention, and obtaining a corpus identification intention; and classifying the corpus identification intention, and feeding back the final corpus identification intention.

Optionally, in a first implementation manner of the first aspect of the embodiment of the present invention, a word vector of the segmented corpus is extracted from a preset intention rule base; calculating the matching rate between the word vector and a preset word vector, wherein the preset word vector is arranged in the preset intention rule base and corresponds to the preset intentions of a plurality of service data; and selecting the preset word vector with the highest matching rate, taking the preset intention corresponding to the preset word vector as the segmentation intention of the segmented corpus, and acquiring the corpus identification intention.

Optionally, in a second implementation manner of the first aspect of the embodiment of the present invention, the preset word vector with the highest matching rate is selected, and the preset intention corresponding to the preset word vector is used as the segmentation intention of the segmented corpus; judging whether preset statement intents exist in the segmentation intents; if the preset statement intention exists in the segmentation intention, segmenting the segmentation corpus according to the preset statement intention to obtain segmented corpora after segmentation, judging the segmented corpora again until the preset statement intention does not exist in the segmented intention, and taking a plurality of segmented intents after segmentation as corpus identification intents; and if the preset statement intention does not exist in the segmentation intention, taking the segmentation intention as a corpus identification intention.

Optionally, in a third implementation manner of the first aspect of the embodiment of the present invention, the target statement is segmented according to the position of the delimiter in the target statement, so as to obtain a segmented statement; selecting a segmentation sequence of a word sequence in the segmented sentence; calculating segmentation probability, wherein the segmentation probability is the probability of segmenting the segmented sentences according to the segmentation sequence; and selecting the segmentation sequence with the highest segmentation probability as a segmentation result to obtain a segmented corpus.

Optionally, in a fourth implementation manner of the first aspect of the embodiment of the present invention, in the target statement, a position of the separator is located; and dividing the target sentence on two sides of the position of the separator to obtain a segmented sentence.

Optionally, in a fifth implementation manner of the first aspect of the embodiment of the present invention, a word sequence in the segmented sentence is extracted; a first special character and a second special character are added at the head and the tail of the word sequence respectively to obtain a new word sequence; directed edges are established between adjacent word nodes in the new word sequence to obtain a synthesized phrase; and if the synthesized phrase is a word in a preset dictionary, the directed edges between those word nodes are deleted and directed edges are established at the two ends of the word nodes, until the segmentation of the whole segmented sentence is completed and a selected segmentation sequence is obtained.

Optionally, in a sixth implementation manner of the first aspect of the embodiment of the present invention, the corpus identification intents of the segmented corpus are extracted; whether the corpus identification intents include mutually exclusive intents is judged, wherein mutually exclusive intents are an affirmative corpus identification intent and a negative corpus identification intent that appear simultaneously in the target sentence; if the corpus identification intents do not include mutually exclusive intents, the corpus identification intents are taken as the final corpus identification intent, and the final corpus identification intent is fed back; and if the corpus identification intents include mutually exclusive intents, the corpus identification intent carrying the turning intent is taken as the final corpus identification intent, and the final corpus identification intent is fed back.

A second aspect of an embodiment of the present invention provides a sentence-break-based multi-intent recognition apparatus, including: the first acquisition unit is used for acquiring a target sentence input by a user; the sentence breaking unit is used for breaking the target sentence by using a preset sequence model to obtain a segmented corpus; a second obtaining unit, configured to obtain word vectors in the segmented corpus from a preset intention rule base, where the preset intention rule base is established according to service data; the calculation unit is used for calculating the matching rate between the word vector and a preset word vector to obtain a segmentation intention and obtain a corpus identification intention; and the feedback unit is used for classifying the corpus identification intention and feeding back the final corpus identification intention.

Optionally, in a first implementation manner of the second aspect of the embodiment of the present invention, the calculating unit specifically includes: the extraction module is used for extracting word vectors of the segmented linguistic data from a preset intention rule base; the calculation module is used for calculating the matching rate between the word vectors and preset word vectors, the preset word vectors are arranged in the preset intention rule base, and the preset word vectors correspond to the preset intentions of the plurality of service data; and the third selecting module is used for selecting the preset word vector with the highest matching rate, taking the preset intention corresponding to the preset word vector as the segmentation intention of the segmented corpus, and acquiring the corpus identification intention.

Optionally, in a second implementation manner of the second aspect of the embodiment of the present invention, the selecting module is specifically configured to: selecting the preset word vector with the highest matching rate, and taking the preset intention corresponding to the preset word vector as the segmentation intention of the segmentation corpus; judging whether preset statement intents exist in the segmentation intents; if the preset statement intention exists in the segmentation intention, segmenting the segmentation corpus according to the preset statement intention to obtain segmented corpora after segmentation, judging the segmented corpora again until the preset statement intention does not exist in the segmented intention, and taking a plurality of segmented intents after segmentation as corpus identification intents; and if the preset statement intention does not exist in the segmentation intention, taking the segmentation intention as a corpus identification intention.

Optionally, in a third implementation manner of the second aspect of the embodiment of the present invention, the sentence segmentation unit specifically includes: the segmentation module is used for segmenting the target sentence according to the position of the separator in the target sentence to obtain a segmented sentence; the first selection module is used for selecting a segmentation sequence of the word sequence in the segmented sentence; the calculation module is used for calculating the segmentation probability, wherein the segmentation probability is the probability of segmenting the segmented sentences according to the segmentation sequence; and the second selection module is used for selecting the segmentation sequence with the highest segmentation probability as a segmentation result to obtain the segmented corpus.

Optionally, in a fourth implementation manner of the second aspect of the embodiment of the present invention, the dividing module is specifically configured to: locating a position of a separator in the target sentence; and dividing the target sentence on two sides of the position of the separator to obtain a segmented sentence.

Optionally, in a fifth implementation manner of the second aspect of the embodiment of the present invention, the first selecting module is specifically configured to: extract a word sequence in the segmented sentence; add a first special character and a second special character at the head and the tail of the word sequence respectively to obtain a new word sequence; establish directed edges between adjacent word nodes in the new word sequence to obtain a synthesized phrase; and if the synthesized phrase is a word in a preset dictionary, delete the directed edges between those word nodes and establish directed edges at the two ends of the word nodes, until the segmentation of the whole segmented sentence is completed and a selected segmentation sequence is obtained.

Optionally, in a sixth implementation manner of the second aspect of the embodiment of the present invention, the feedback unit is specifically configured to: extract the corpus identification intents of the segmented corpus; judge whether the corpus identification intents include mutually exclusive intents, wherein mutually exclusive intents are an affirmative corpus identification intent and a negative corpus identification intent that appear simultaneously in the target sentence; if the corpus identification intents do not include mutually exclusive intents, take the corpus identification intents as the final corpus identification intent and feed back the final corpus identification intent; and if the corpus identification intents include mutually exclusive intents, take the corpus identification intent carrying the turning intent as the final corpus identification intent and feed back the final corpus identification intent.

A third aspect of the embodiments of the present invention provides a sentence-based multi-intent recognition apparatus, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the sentence-based multi-intent recognition method according to any of the above embodiments when executing the computer program.

A fourth aspect of embodiments of the present invention provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to perform the method of the first aspect described above.

According to the technical scheme, the embodiment of the invention has the following advantages:

the embodiment of the invention provides a method, a device, equipment and a storage medium for identifying multiple intentions based on sentence breaks, which are used for acquiring a target sentence input by a user; sentence breaking is carried out on the target sentence by utilizing a preset sequence model to obtain a segmented corpus; acquiring word vectors in the segmented corpus in a preset intention rule base, wherein the preset intention rule base is established according to business data; calculating the matching rate between the word vector and a preset word vector to obtain a segmentation intention, and obtaining a corpus identification intention; and classifying the corpus identification intention, and feeding back the final corpus identification intention. According to the embodiment of the invention, the target sentence is finely divided, and then the intention recognition is carried out on the divided target sentence, so that the accuracy and efficiency of the intention recognition of the target sentence are improved, and the accuracy of the subsequent semantic recognition is also improved.

Drawings

FIG. 1 is a schematic diagram of an embodiment of a sentence break-based multi-intent recognition method according to the present invention;

FIG. 2 is a schematic diagram of another embodiment of a sentence break-based multi-intent recognition method according to the present invention;

FIG. 3 is a schematic diagram of an embodiment of a sentence break-based multiple intent recognition apparatus according to the present invention;

FIG. 4 is a schematic diagram of another embodiment of a sentence break-based multiple intent recognition apparatus according to the present invention;

FIG. 5 is a schematic diagram of an embodiment of a sentence break-based multiple intention recognition apparatus according to the present invention.

Detailed Description

The invention provides a sentence-break-based multi-intention recognition method, which is used for solving the problem of low accuracy of a computer in an intention recognition process and improving the accuracy and efficiency of intention recognition.

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Referring to fig. 1, an embodiment of a sentence break-based multi-intent recognition method according to the present invention includes:

101. Acquire a target sentence input by a user.

The server acquires a target sentence input by a user.

It should be noted that, in the process of human-computer interaction, the first action of the server is to acquire the target sentence input by the user. After acquiring it, the server analyzes the target sentence and identifies its intent, so that a more appropriate answer can be fed back according to the question or requirement raised by the user.

Here, the target sentence acquired by the server is not limited to a single sentence; it may be several separate sentences or sentences related by context. The target sentence can take various forms: a single character, a single phrase, or a long sentence consisting of several short sentences. The target sentence is not limited in this embodiment.

102. Perform sentence breaking on the target sentence by using a preset sequence model to obtain a segmented corpus.

The server performs sentence breaking on the target sentence by using a preset sequence model to obtain the segmented corpus.

It should be noted that, after the server receives the target sentence input by the user, the target sentence needs to be divided. If the input target sentence is too complex, the server has difficulty identifying its real intent and cannot feed back an accurate answer in time, so the server first segments the target sentence. The server coarsely divides the target sentence at the separators it contains to obtain segmented sentences, where the separators include punctuation marks such as the period, question mark, exclamation mark, semicolon and comma. After the target sentence has been divided at the separators, the server performs a finer corpus segmentation to obtain the segmented corpus.
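The coarse division at separators described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_on_separators` and the exact separator set (full-width Chinese punctuation plus ASCII equivalents) are assumptions, not details given in the patent.

```python
import re

# Assumed separator set: full-width Chinese punctuation and ASCII equivalents.
SEPARATORS = "。？！；，.?!;,"

def split_on_separators(target_sentence):
    """Cut the target sentence at every separator and drop empty pieces."""
    pattern = "[" + re.escape(SEPARATORS) + "]+"
    pieces = re.split(pattern, target_sentence)
    return [piece.strip() for piece in pieces if piece.strip()]

# Hypothetical user input containing two clauses.
segmented_sentences = split_on_separators("我想听音乐，不过先查一下保单。")
print(segmented_sentences)
```

Each returned piece corresponds to one segmented sentence, which would then be passed to the finer corpus segmentation step.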

103. Acquire word vectors of the segmented corpus in a preset intention rule base, wherein the preset intention rule base is established according to business data.

In the preset intention rule base, the server acquires the word vectors of the segmented corpus; the preset intention rule base is established according to business data.

After the server breaks the target sentence, the word vectors in the target sentence need to be extracted. Once the word vectors are extracted, the server matches them against the preset word vectors to obtain the corresponding segmentation intent. In addition, the preset intention rule base is built from the corresponding business data, which may be insurance-related data, medical-related data, and the like; the business data is not limited in the embodiment of the present invention.

104. Calculate the matching rate between the word vector and the preset word vectors to obtain a segmentation intent, and obtain the corpus identification intent.

The server calculates the matching rate between the word vector and the preset word vectors to obtain a segmentation intent and obtain a corpus identification intent.

It should be noted that the server performs intent recognition on the segmented corpus: it classifies the segmented corpus and matches it against the corresponding preset intents in the preset intention rule base to obtain the corresponding corpus identification intent. For example, if the preset intent matched during intent recognition is a music intent, the corpus identification intent of the segmented corpus is the music intent, and the server can feed back corresponding data according to the recognition result. Specifically, the server first extracts the word vector of the segmented corpus and then calculates the matching rate between this word vector and the preset word vectors. The preset word vectors are stored in the preset intention rule base, and each preset word vector corresponds to one preset intent, so the server can calculate the matching rate of the word vector against the different preset intents. The server selects the preset word vector with the highest matching rate and takes the corresponding preset intent as the segmentation intent of the segmented corpus; the segmentation intent obtained in this way is the most accurate.
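As a minimal sketch of the matching step, the following Python code picks the preset intent whose preset word vector best matches the word vector of a segmented corpus. The patent does not say how the matching rate is computed, so cosine similarity is used here purely as an assumption; the rule base contents, the vector values and the names `INTENT_RULE_BASE` and `segmentation_intent` are likewise hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity, used here as an assumed stand-in for the matching rate."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical preset intention rule base: preset intent -> preset word vector.
INTENT_RULE_BASE = {
    "music": [0.9, 0.1, 0.0],
    "insurance": [0.1, 0.9, 0.2],
    "medical": [0.0, 0.2, 0.9],
}

def segmentation_intent(word_vector, rule_base=INTENT_RULE_BASE):
    """Return the preset intent with the highest matching rate, and that rate."""
    rates = {intent: cosine(word_vector, preset)
             for intent, preset in rule_base.items()}
    best = max(rates, key=rates.get)
    return best, rates[best]

# Hypothetical word vector of one segmented corpus.
intent, rate = segmentation_intent([0.8, 0.2, 0.1])
print(intent, round(rate, 3))
```

In a real system the vectors would come from an embedding model trained on the business data; any similarity measure with a "higher is better" reading could replace `cosine` without changing the selection logic.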

105. Classify the corpus identification intents and feed back the final corpus identification intent.

The server classifies the corpus identification intents and feeds back the final corpus identification intent. Specifically, the server extracts the corpus identification intents of the segmented corpus; the server judges whether the corpus identification intents include mutually exclusive intents, where mutually exclusive intents are an affirmative corpus identification intent and a negative corpus identification intent that appear simultaneously in the target sentence; if the corpus identification intents do not include mutually exclusive intents, the server takes the corpus identification intents as the final corpus identification intent and feeds back the final corpus identification intent; if the corpus identification intents include mutually exclusive intents, the server takes the corpus identification intent carrying the turning intent as the final corpus identification intent and feeds back the final corpus identification intent.

It should be noted that the server classifies and integrates the multiple corpus identification intents recognized from the segmented corpus, and then judges whether the integrated intents include mutually exclusive intents. Mutually exclusive intents are an affirmative and a negative corpus identification intent that appear in the same target sentence; when they occur, the server cannot accurately judge which corpus identification intent should finally be output for the segmented corpus. If the corpus identification intents do not include mutually exclusive intents, each corpus identification intent is independent, and the server directly outputs all of them. If the corpus identification intents include mutually exclusive intents, the segmented corpus contains intents that contradict each other and the server cannot tell which is the real one; in that case the server takes the corpus identification intent carrying the turning intent as the final corpus identification intent of the segmented corpus and feeds it back.
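The handling of mutually exclusive intents can be illustrated with a short Python sketch. The data model is an assumption made for the example: each recognized intent carries a name, an affirmative/negative flag, and a flag saying whether its segment carries a turning marker (for instance a word such as "but"). The patent does not state what happens when mutually exclusive intents occur without any turning intent, so the sketch simply keeps all of them in that case.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RecognizedIntent:
    name: str          # hypothetical intent label, e.g. "buy_insurance"
    affirmative: bool  # True for an affirmative intent, False for a negative one
    has_turn: bool     # True if the segment carries a turning marker (assumed)

def final_intents(intents):
    """Resolve mutually exclusive intents in favour of the turning intent."""
    by_name = {}
    for item in intents:
        by_name.setdefault(item.name, []).append(item)
    result = []
    for group in by_name.values():
        polarities = {item.affirmative for item in group}
        if len(polarities) == 2:
            # Affirmative and negative versions of one intent: mutually exclusive.
            turning = [item for item in group if item.has_turn]
            # Fallback (assumption): keep the whole group if nothing is turning.
            result.extend(turning or group)
        else:
            # Independent intents are output unchanged.
            result.extend(group)
    return result

recognized = [
    RecognizedIntent("buy_insurance", True, False),
    RecognizedIntent("buy_insurance", False, True),
    RecognizedIntent("music", True, False),
]
print(final_intents(recognized))
```

Here the negative "buy_insurance" intent wins because it carries the turning marker, while the unrelated "music" intent is passed through.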

According to the embodiment of the invention, the target sentence is finely divided, and then the intention recognition is carried out on the divided target sentence, so that the accuracy and efficiency of the intention recognition of the target sentence are improved, and the accuracy of the subsequent semantic recognition is also improved.

Referring to fig. 2, another embodiment of the sentence-break-based multi-intent recognition method according to the embodiment of the present invention includes:

201. Acquire a target sentence input by a user.

The server acquires a target sentence input by a user.

202. Divide the target sentence according to the position of the separator in the target sentence to obtain a segmented sentence.

The server divides the target sentence according to the position of the separator in the target sentence to obtain the segmented sentence. Specifically, the server locates the position of the separator in the target sentence, and divides the target sentence on the two sides of that position to obtain a segmented sentence.

After the server acquires the target sentence, it first divides the target sentence in a simple way, namely according to the positions of the separators. This makes it easier for the server to identify the intent of the target sentence, so that the result of intent recognition is consistent with the meaning actually expressed.

203. Select a segmentation sequence of the word sequence in the segmented sentence.

The server selects a segmentation sequence of the word sequence in the segmented sentence. Specifically, the server extracts the word sequence in the segmented sentence; the server adds a first special character and a second special character at the head and the tail of the word sequence respectively to obtain a new word sequence; the server establishes a directed edge between adjacent word nodes in the new word sequence to obtain a synthesized phrase; if the synthesized phrase is a word in a preset dictionary, the server deletes the directed edges between those word nodes and establishes directed edges at the two ends of the word nodes, until the segmentation of the whole segmented sentence is completed and the selected segmentation sequence is obtained.

During corpus segmentation, different segmentation modes can give the same segmented sentence different meanings. For example, the segmented sentence "achievement obtained by research" can be divided as "research/institute/acquisition/achievement" or as "research institute/acquisition/achievement", and the two divisions have different meanings. Which division is correct must be determined from the context of the segmented sentence. Corpus segmentation therefore amounts to calculating the segmentation probability of each segmentation sequence; the segmentation sequence with the maximum segmentation probability is the most reasonable segmentation of the segmented sentence.

204. Calculate the segmentation probability, wherein the segmentation probability is the probability of segmenting the segmented sentence according to the segmentation sequence.

The server calculates the segmentation probability, wherein the segmentation probability is the probability of segmenting the segmented sentence according to the segmentation sequence.

205. Select the segmentation sequence with the highest segmentation probability as the segmentation result to obtain the segmented corpus.

The server selects the segmentation sequence with the highest segmentation probability as the segmentation result to obtain the segmented corpus.

After calculating the segmentation probabilities, the server selects the segmentation sequence with the highest segmentation probability as the segmentation result, and thus obtains the segmented corpus produced by that segmentation sequence. The highest segmentation probability means that this way of breaking the target sentence best fits the preset dictionary, that is, the synthesized phrases formed from the word sequence of the target sentence match the words in the preset dictionary with the highest probability. The synthesized phrases are therefore the most accurate and reasonable, and the target sentence segmented according to this segmentation sequence is the most reasonable.

For example: falseThe server is provided with a preset dictionary D and a segmented sentence t, and the aim of corpus segmentation is to divide a word sequence t of the segmented sentence into (t)₁,t₂,t₃…t_n) Cutting into segmented corpora s ═ (w)₁,w₂,w₃…w_j|w_j∈ D) and maximizes the segmentation probability p(s).

In the first step, the server needs to select a segmentation sequence, and for a word sequence, t ═ t (t)₁,t₂,t₃…t_n) In other words, a first special word and a second special word are added at the end of the first special word respectively, and the first special word is used<s>Indicating that the second special word is<\s>Shows that a new word sequence t ═ (t) is obtained₀,t₁,t₂,t₃…t_n,t_m)。

Second, the server sets the new word sequence t ═ t (t)₀,t₁,t₂,t₃…t_n,t_m) Middle adjacent byte point t_i-1And t_iEstablish a directed edge therebetween<t_i-1,t_i>。

Third, if the word node (t)_i,t_i+1,t_i+2…t_j) The composed word is a segmented corpus w in a preset dictionary D_jThe server deletes the directed edges between the word nodes and establishes directed edges at both ends of the byte point<t_i-1,t_w>And<t_w,t_j+1>obtaining the segmented corpus s ═ (w)₁,w₂,w₃…w_j|w_j∈ D), a plurality of segmented corpora constitute the selected segmentation sequence(s)₁,s₂,s₃…s_n)。

In the fourth step, after the segmentation sequences are obtained, the segmentation probability of each segmentation sequence is calculated using the formula:

p(s) = p(w₁, w₂, w₃, …, w_j)

Assuming that the words w in the segmented corpus s are all independent of one another, this can be converted into

p(s) = p(w₁) · p(w₂) · p(w₃) · … · p(w_j)

for calculation, where the probability p(w_j) of each word is set in the preset dictionary D, so that the segmentation probability can be calculated.

In the fifth step, the server selects the segmentation sequence with the highest segmentation probability as the segmentation result to obtain the segmented corpus.
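The five steps above can be sketched in code. The following is a minimal illustration, not the patented implementation: it assumes the preset dictionary is a plain mapping from word to probability, it treats the two special boundary words implicitly as positions 0 and n of the sequence, and it replaces explicit enumeration of all segmentation sequences with dynamic programming over the same word graph. The names `segment`, `word_probs`, `max_word_len` and `unknown_prob`, and the fallback probability for single characters missing from the dictionary, are all hypothetical.

```python
import math


def segment(chars, word_probs, max_word_len=4, unknown_prob=1e-8):
    """Return (words, probability) for the most probable segmentation.

    The probability of a segmentation is the product of the dictionary
    probabilities of its words (the independence assumption).
    """
    n = len(chars)
    # best[i] = (log-probability of the best segmentation of chars[:i],
    #            start index of the last word in that segmentation)
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = chars[start:end]
            if word in word_probs:
                p = word_probs[word]
            elif len(word) == 1:
                # Assumption: unknown single characters get a tiny
                # probability so that some path always exists.
                p = unknown_prob
            else:
                continue  # not a dictionary word: no edge in the graph
            score = best[start][0] + math.log(p)
            if score > best[end][0]:
                best[end] = (score, start)
    # Walk back from the end of the sentence to recover the words.
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(chars[start:end])
        end = start
    words.reverse()
    return words, math.exp(best[n][0])


# Hypothetical dictionary D with made-up probabilities.
D = {"研究": 0.02, "研究生": 0.005, "生命": 0.01, "命": 0.001}
words, prob = segment("研究生命", D)
print(words)  # ['研究', '生命']
```

Here the sequence 研究 / 生命 has probability 0.02 × 0.01, which exceeds the 0.005 × 0.001 of 研究生 / 命, so it is selected as the segmentation result.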

206. And acquiring word vectors in the segmented linguistic data in a preset intention rule base, wherein the preset intention rule base is established according to the business data.

207. And calculating the matching rate between the word vector and the preset word vector to obtain a segmentation intention, and obtaining the corpus identification intention.

The server calculates the matching rate between the word vector and the preset word vectors to obtain a segmentation intention, and obtains the corpus identification intention. Specifically, the server extracts the word vectors of the segmented corpus from the preset intention rule base; calculates the matching rate between each word vector and the preset word vectors, where the preset word vectors are set in the preset intention rule base and correspond to the preset intentions of a plurality of service data; and selects the preset word vector with the highest matching rate, takes the preset intention corresponding to that preset word vector as the segmentation intention of the segmented corpus, and obtains the corpus identification intention.
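As an illustration of selecting the preset intention with the highest matching rate, the sketch below assumes that the matching rate is the cosine similarity between vectors; the text does not specify the measure, so this choice, together with the names `cosine`, `best_intent` and the example intents, is hypothetical. The rule base is assumed to be a non-empty mapping from preset intention to preset word vector.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_intent(word_vector, preset_vectors):
    """Return (intent, matching_rate) for the best-matching preset vector.

    preset_vectors maps each preset intention to its preset word vector
    and is assumed to be non-empty.
    """
    scored = ((intent, cosine(word_vector, vec))
              for intent, vec in preset_vectors.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical two-dimensional rule base.
presets = {"surrender_policy": [1.0, 0.0], "check_balance": [0.0, 1.0]}
intent, rate = best_intent([0.9, 0.1], presets)
print(intent)  # surrender_policy
```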

Selecting the preset word vector with the highest matching rate, taking the preset intention corresponding to the preset word vector as the segmentation intention of the segmented corpus, and acquiring the corpus identification intention specifically includes the following steps: selecting the preset word vector with the highest matching rate, and taking the preset intention corresponding to that preset word vector as the segmentation intention of the segmented corpus; judging whether a preset statement intention exists in the segmentation intention; if a preset statement intention exists in the segmentation intention, segmenting the segmented corpus according to the preset statement intention to obtain a re-segmented corpus, judging the re-segmented corpus again until no preset statement intention exists in the resulting segmentation intentions, and taking the plurality of segmentation intentions obtained after segmentation as the corpus identification intention; and if no preset statement intention exists in the segmentation intention, taking the segmentation intention as the corpus identification intention.

After acquiring the segmentation intention, the server needs to judge whether a preset statement intention exists in it. If one exists, the segmentation intention can be segmented further into different intentions. Therefore, when a preset statement intention exists in the segmentation intention, the server segments out the part of the segmented corpus that carries the preset statement intention and judges the remaining segmented corpus again, until no preset statement intention exists in the resulting segmentation intentions; the server then takes all the segmentation intentions obtained after segmentation as the corpus identification intention and applies them in the subsequent steps. When no preset statement intention exists in the segmentation intention, the server has already segmented the corpus to the finest granularity and obtained all the segmentation intentions of the target sentence; all the segmentation intentions are taken as the corpus identification intention and applied in the subsequent steps.
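The repeated judging and re-segmentation described above can be sketched as a simple work-list loop. This is only one possible reading of the text: the functions `classify` (which returns the segmentation intention of a piece of corpus) and `split_by_intent` (which cuts a piece according to a preset statement intention), and the set `statement_intents`, are hypothetical stand-ins supplied by the caller, and each split is assumed to yield strictly shorter pieces so that the loop terminates.

```python
def refine_intents(corpus, classify, split_by_intent, statement_intents):
    """Split corpora until no preset statement intention remains.

    Returns the list of segmentation intentions in sentence order.
    """
    pending = [corpus]
    result = []
    while pending:
        piece = pending.pop(0)
        intent = classify(piece)
        if intent in statement_intents:
            parts = split_by_intent(piece, intent)
            if len(parts) > 1:
                # Re-judge the finer pieces, keeping their original order.
                pending = parts + pending
                continue
        # No preset statement intention (or nothing left to split).
        result.append(intent)
    return result
```

With a toy `classify` that labels any piece containing " and " as a statement intention, and a `split_by_intent` that splits on " and ", the input "cancel policy and refund premium" is refined into two separate intentions, one per clause.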

208. And classifying the corpus identification intention and feeding back the final corpus identification intention.

The sentence-break-based multi-intent recognition method in the embodiment of the present invention has been described above. With reference to fig. 3, the sentence-break-based multi-intent recognition apparatus in the embodiment of the present invention is described below; one embodiment of the apparatus includes:

a first obtaining unit 301, configured to obtain a target sentence input by a user;

a sentence-breaking unit 302, configured to perform sentence breaking on the target sentence by using a preset sequence model to obtain a segmented corpus;

a second obtaining unit 303, configured to obtain a word vector in the segmented corpus from a preset intention rule base, where the preset intention rule base is established according to service data;

the calculating unit 304 is configured to calculate a matching rate between the word vector and a preset word vector to obtain a segmentation intention, and obtain a corpus identification intention;

a feedback unit 305, configured to classify the corpus recognition intention and feed back a final corpus recognition intention.

In this embodiment of the present invention, the first obtaining unit 301 is configured to obtain a target sentence input by a user; the sentence breaking unit 302 is configured to break a sentence of the target sentence by using a preset sequence model to obtain a segmented corpus; the second obtaining unit 303 is configured to obtain word vectors in the segmented corpus from a preset intention rule base, where the preset intention rule base is established according to service data; the calculating unit 304 is configured to calculate a matching rate between the word vector and a preset word vector to obtain a segmentation intention, and obtain a corpus identification intention; the feedback unit 305 is configured to classify the corpus recognition intent and feed back a final corpus recognition intent.

Referring to fig. 4, another embodiment of the apparatus for recognizing multiple intentions based on sentence break according to the embodiment of the present invention includes:

Optionally, the calculating unit 304 specifically includes:

an extracting module 3041, configured to extract word vectors of the segmented corpus from a preset intention rule base;

a calculating module 3042, configured to calculate a matching rate between the word vector and a preset word vector, where the preset word vector is set in the preset intention rule base, and the preset word vector corresponds to preset intentions of a plurality of service data;

a third selecting module 3043, configured to select the preset word vector with the highest matching rate, use the preset intention corresponding to the preset word vector as a segmentation intention of the segmented corpus, and obtain a corpus identification intention.

Optionally, the third selecting module 3043 is specifically configured to:

selecting the preset word vector with the highest matching rate, and taking the preset intention corresponding to the preset word vector as the segmentation intention of the segmentation corpus;

judging whether preset statement intents exist in the segmentation intents;

if the preset statement intention exists in the segmentation intention, segmenting the segmentation corpus according to the preset statement intention to obtain segmented corpora after segmentation, judging the segmented corpora again until the preset statement intention does not exist in the segmented intention, and taking a plurality of segmented intents after segmentation as corpus identification intents;

and if the preset statement intention does not exist in the segmentation intention, taking the segmentation intention as a corpus identification intention.

Optionally, the sentence segmentation unit 302 specifically includes:

a dividing module 3021, configured to divide the target sentence according to the position of the delimiter in the target sentence, so as to obtain a segmented sentence;

a first selecting module 3022, configured to select a segmentation sequence of a word sequence in the segmented sentence;

a calculating module 3023, configured to calculate a segmentation probability, where the segmentation probability is a probability that the segmented sentence is segmented according to the segmentation sequence;

and the second selecting module 3024 is configured to select the segmentation sequence with the highest segmentation probability as a segmentation result, so as to obtain a segmented corpus.

Optionally, the dividing module 3021 is specifically configured to:

locating a position of a separator in the target sentence;

and dividing the target sentence on two sides of the position of the separator to obtain a segmented sentence.
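A minimal sketch of locating separators and dividing the target sentence on either side of them is given below. It assumes that separators are common Chinese and Western punctuation marks; the text does not list them, so the `SEPARATORS` set and the function name `split_on_separators` are hypothetical.

```python
import re

# Assumed separator set: full-width and ASCII punctuation marks.
SEPARATORS = "，。！？；,.!?;"


def split_on_separators(sentence, separators=SEPARATORS):
    """Divide a sentence at every run of separators, dropping empty parts."""
    pattern = "[" + re.escape(separators) + "]+"
    return [part.strip() for part in re.split(pattern, sentence) if part.strip()]


print(split_on_separators("我想退保，另外查一下保单。"))
# ['我想退保', '另外查一下保单']
```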

Optionally, the first selecting module 3022 is specifically configured to:

extracting a word sequence in the segmented sentence;

adding a first special character and a second special character at the head and the tail of the character sequence respectively to obtain a new character sequence;

establishing directed edges between the adjacent word nodes in the new word sequence to obtain a synthesized word group;

and if the synthesized phrase is a word in a preset dictionary, deleting the directed edges between the word nodes, and establishing the directed edges at the two ends of the word nodes until the segmentation of the whole segmented sentence is completed to obtain a selected segmentation sequence.

Optionally, the feedback unit 305 is specifically configured to:

extracting the corpus identification intention of the segmented corpus;

judging whether the corpus identification intentions include mutually exclusive intentions, wherein mutually exclusive intentions are an affirmative corpus identification intention and a negative corpus identification intention that appear simultaneously in the target sentence;

if the corpus identification intentions do not include mutually exclusive intentions, taking the corpus identification intentions as the final corpus identification intention, and feeding back the final corpus identification intention;

and if the corpus identification intentions include mutually exclusive intentions, taking the corpus identification intention that carries the turning intention as the final corpus identification intention, and feeding back the final corpus identification intention.
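The classification performed by the feedback unit can be illustrated as follows. This sketch assumes that each recognised intention carries a name, a polarity (affirmative or negative) and a flag saying whether it follows a turning word such as "but"; the `Intent` type, its fields and the function `resolve_mutual_exclusion` are hypothetical. When a mutually exclusive pair has no turning intention the text does not say what to do, so the sketch leaves that pair unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    name: str          # e.g. "buy_policy"
    affirmative: bool  # True for "want to", False for "do not want to"
    after_turn: bool   # True if the clause follows a turning word


def resolve_mutual_exclusion(intents):
    """Keep the turning intention when the same intention is both affirmed and negated."""
    by_name = {}
    for it in intents:
        by_name.setdefault(it.name, []).append(it)
    final = []
    for group in by_name.values():
        polarities = {it.affirmative for it in group}
        turned = [it for it in group if it.after_turn]
        if len(polarities) == 2 and turned:
            final.extend(turned)  # mutually exclusive: the turning intention wins
        else:
            final.extend(group)   # no conflict, or unspecified case left as-is
    return final


intents = [
    Intent("buy_policy", True, False),    # "I want to buy a policy"
    Intent("buy_policy", False, True),    # "but on second thought I don't"
    Intent("check_balance", True, False),
]
print([(i.name, i.affirmative) for i in resolve_mutual_exclusion(intents)])
# [('buy_policy', False), ('check_balance', True)]
```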

In this embodiment of the present invention, the first obtaining unit 301 is configured to obtain a target sentence input by a user; the sentence segmenting unit 302 is configured to segment the target sentence by using a preset sequence model to obtain a segmented corpus, and specifically, the segmenting module 3021 is configured to segment the target sentence according to the position of the separator in the target sentence to obtain a segmented sentence; the first selecting module 3022 is configured to select a segmentation sequence of a word sequence in the segmented sentence; the calculating module 3023 is configured to calculate a segmentation probability, where the segmentation probability is a probability that the segmented sentence is segmented according to the segmentation sequence; the second selecting module 3024 is configured to select the segmentation sequence with the highest segmentation probability as a segmentation result, so as to obtain a segmented corpus; the second obtaining unit 303 is configured to obtain word vectors in the segmented corpus from a preset intention rule base, where the preset intention rule base is established according to service data; the calculating unit 304 is configured to calculate a matching rate between the word vector and a preset word vector to obtain a segmentation intention, and obtain a corpus identification intention, and specifically, the extracting module 3041 is configured to extract the word vector of the segmentation corpus in a preset intention rule base; the calculating module 3042 is configured to calculate a matching rate between the word vector and a preset word vector, where the preset word vector is set in the preset intention rule base, and the preset word vector corresponds to preset intentions of a plurality of service data; the third selecting module 3043 is configured to select the preset word vector with the highest matching rate, use the preset intention corresponding to the preset word vector as a segmentation intention of the segmented corpus, and obtain a corpus identification intention; the feedback unit 305 is configured to classify the corpus recognition intent and feed back a final corpus recognition intent.

Figs. 3 to 4 above describe the sentence-break-based multi-intent recognition apparatus in the embodiment of the present invention in detail from the perspective of modular functional entities; the sentence-break-based multi-intent recognition device in the embodiment of the present invention is described in detail below from the perspective of hardware processing.

The following specifically describes each constituent element of the sentence-break-based multi-intent recognition apparatus with reference to fig. 5:

Fig. 5 is a schematic structural diagram of a sentence-break-based multi-intent recognition device 500 according to an embodiment of the present invention. The device may vary considerably with configuration or performance, and may include one or more processors (central processing units, CPUs) 501, a memory 509, and one or more storage media 508 (e.g., one or more mass storage devices) storing an application 507 or data 506. The memory 509 and the storage medium 508 may be transient storage or persistent storage. The program stored on the storage medium 508 may include one or more modules (not shown), each of which may include a series of instruction operations for the sentence-break-based multi-intent recognition device. Further, the processor 501 may be configured to communicate with the storage medium 508 and to execute, on the sentence-break-based multi-intent recognition device 500, the series of instruction operations in the storage medium 508.

The sentence-break-based multi-intent recognition device 500 may also include one or more power supplies 502, one or more wired or wireless network interfaces 503, one or more input-output interfaces 504, and/or one or more operating systems 505, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, etc. It will be understood by those skilled in the art that the device architecture shown in fig. 5 does not constitute a limitation of the sentence-break-based multi-intent recognition device, which may include more or fewer components than shown, may combine certain components, or may use a different arrangement of components.

The processor 501 is the control center of the sentence-break-based multi-intent recognition device and may perform processing according to the sentence-break-based multi-intent recognition method. The processor 501 connects the various parts of the whole device by means of various interfaces and lines, runs or executes the software programs and/or modules stored in the memory 509, and calls the data stored in the memory 509; by finely dividing the target sentence and then performing intent recognition on the divided target sentence, it improves the accuracy and efficiency of intent recognition of the target sentence as well as the accuracy of subsequent semantic recognition. The storage medium 508 and the memory 509 are both carriers for storing data; in the embodiment of the present invention, the storage medium 508 may be an internal memory with a small storage capacity but a high speed, and the memory 509 may be an external memory with a large storage capacity but a low storage speed.

The memory 509 may be used to store software programs and modules, and the processor 501 executes the various functional applications and data processing of the sentence-break-based multi-intent recognition device 500 by running the software programs and modules stored in the memory 509. The memory 509 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function, and the like; the data storage area may store data created through use of the sentence-break-based multi-intent recognition device, and the like. Further, the memory 509 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. The sentence-break-based multi-intent recognition program provided in the embodiment of the present invention and the received data stream are stored in the memory 509, and the processor 501 calls them from the memory 509 when they are needed.

When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, twisted pair) or wirelessly (e.g., infrared, wireless, microwave, etc.). A computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., compact disk), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

The above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A sentence-break-based multi-intent recognition method is characterized by comprising the following steps:

acquiring a target sentence input by a user;

sentence breaking is carried out on the target sentence by utilizing a preset sequence model to obtain a segmented corpus;

acquiring word vectors in the segmented corpus in a preset intention rule base, wherein the preset intention rule base is established according to business data;

calculating the matching rate between the word vector and a preset word vector to obtain a segmentation intention, and obtaining a corpus identification intention;

and classifying the corpus identification intention, and feeding back the final corpus identification intention.

2. The sentence-break-based multi-intent recognition method according to claim 1, wherein the calculating a matching rate between the word vector and a preset word vector to obtain a segmentation intent, and the obtaining the corpus recognition intent comprises:

extracting word vectors of the segmented linguistic data from a preset intention rule base;

calculating the matching rate between the word vector and a preset word vector, wherein the preset word vector is arranged in the preset intention rule base and corresponds to the preset intentions of a plurality of service data;

and selecting the preset word vector with the highest matching rate, taking the preset intention corresponding to the preset word vector as the segmentation intention of the segmented corpus, and acquiring the corpus identification intention.

3. The sentence-break-based multi-intent recognition method according to claim 2, wherein the selecting the preset word vector with the highest matching rate, taking the preset intent corresponding to the preset word vector as a segmentation intent of the segmented corpus, and obtaining the corpus recognition intent comprises:

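judging whether preset statement intents exist in the segmentation intents;

if the preset statement intention exists in the segmentation intention, segmenting the segmentation corpus according to the preset statement intention to obtain segmented corpora after segmentation, judging the segmented corpora again until the preset statement intention does not exist in the segmented intention, and taking a plurality of segmented intents after segmentation as corpus identification intents;

and if the preset statement intention does not exist in the segmentation intention, taking the segmentation intention as a corpus identification intention.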

4. The sentence-breaking-based multi-intent recognition method according to claim 1, wherein the step of breaking the target sentence by using a preset sequence model to obtain the segmented corpus comprises:

dividing the target sentence according to the position of the separator in the target sentence to obtain a segmented sentence;

selecting a segmentation sequence of a word sequence in the segmented sentence;

calculating segmentation probability, wherein the segmentation probability is the probability of segmenting the segmented sentences according to the segmentation sequence;

and selecting the segmentation sequence with the highest segmentation probability as a segmentation result to obtain a segmented corpus.

5. The sentence break-based multi-intent recognition method according to claim 4, wherein the segmenting the target sentence according to the position of the separator in the target sentence to obtain the segmented sentence comprises:

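locating a position of a separator in the target sentence;

and dividing the target sentence on two sides of the position of the separator to obtain a segmented sentence.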

6. The sentence-break-based multi-intent recognition method of claim 4, wherein the selecting a sliced sequence of word sequences in the segmented sentence comprises:

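extracting a word sequence in the segmented sentence;

adding a first special word and a second special word at the head and the tail of the word sequence respectively to obtain a new word sequence;

establishing directed edges between the adjacent word nodes in the new word sequence to obtain a synthesized phrase;

and if the synthesized phrase is a word in a preset dictionary, deleting the directed edges between the word nodes, and establishing the directed edges at the two ends of the word nodes until the segmentation of the whole segmented sentence is completed to obtain a selected segmentation sequence.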

7. The sentence-break-based multi-intent recognition method according to any of claims 1-6, wherein the classifying the corpus recognition intent and feeding back the final corpus recognition intent comprises:

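extracting the corpus identification intention of the segmented corpus;

judging whether the corpus identification intentions include mutually exclusive intentions, wherein mutually exclusive intentions are an affirmative corpus identification intention and a negative corpus identification intention that appear simultaneously in the target sentence;

if the corpus identification intentions do not include mutually exclusive intentions, taking the corpus identification intentions as the final corpus identification intention, and feeding back the final corpus identification intention;

and if the corpus identification intentions include mutually exclusive intentions, taking the corpus identification intention that carries the turning intention as the final corpus identification intention, and feeding back the final corpus identification intention.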

8. A sentence-break-based multi-intent recognition apparatus, comprising:

the first acquisition unit is used for acquiring a target sentence input by a user;

the sentence breaking unit is used for breaking the target sentence by using a preset sequence model to obtain a segmented corpus;

a second obtaining unit, configured to obtain word vectors in the segmented corpus from a preset intention rule base, where the preset intention rule base is established according to service data;

the calculation unit is used for calculating the matching rate between the word vector and a preset word vector to obtain a segmentation intention and obtain a corpus identification intention;

and the feedback unit is used for classifying the corpus identification intention and feeding back the final corpus identification intention.

9. A sentence-break-based multi-intent recognition device, comprising:

a memory having instructions stored therein and at least one processor, the memory and the at least one processor interconnected by a line;

the at least one processor invokes the instructions in the memory to cause the sentence-based multi-intent recognition device to perform the sentence-based multi-intent recognition method of any of claims 1-7.

10. A computer-readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the steps of the sentence-break-based multi-intent recognition method according to any of claims 1-7.