CN113591463B - Intention recognition method, device, electronic equipment and storage medium

Info

Publication number: CN113591463B
Application number: CN202110872245.3A
Authority: CN
Inventors: 马亿凯
Original assignee: Ping An Life Insurance Company of China Ltd
Current assignee: Ping An Life Insurance Company of China Ltd
Priority date: 2021-07-30
Filing date: 2021-07-30
Publication date: 2023-07-18
Anticipated expiration: 2041-07-30
Also published as: CN113591463A

Abstract

The invention relates to the technical field of artificial intelligence and provides an intention recognition method, an intention recognition device, electronic equipment and a storage medium. The method comprises: inputting a first incoming call voice into a pre-trained dialect recognition model corresponding to a target area to obtain a first score; when the first score is smaller than a preset dialect score threshold of the target area, inputting the first incoming call voice into each of a plurality of remaining dialect recognition models to obtain a plurality of second scores, and determining a target dialect recognition model; recognizing a second incoming call voice with the target dialect recognition model to obtain an intention text; and receiving intention confirmation information reported by the client to perform online voice answering. Because the first incoming call voice is input into each of the plurality of pre-trained remaining dialect recognition models when the first score is smaller than the preset dialect score threshold, and the target dialect recognition model determined in this way performs the intention recognition, the accuracy of intention recognition is improved.

Description

Intention recognition method, device, electronic equipment and storage medium

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to an intention recognition method, an intention recognition device, electronic equipment and a storage medium.

Background

With the rapid development of the internet, online business handling has become an important channel, particularly telephone customer service, in which a user handles business by dialing a telephone number. In the prior art, the incoming call voice is recognized by inputting it into a voice recognition model.

However, the inventor finds that in the relatively less developed central and western regions of China, most middle-aged and elderly people do not speak Mandarin in daily life but speak with local dialects mixed in. Existing speech recognition models do not take dialect recognition into account, so if incoming call speech is input directly into such a model, the local dialect cannot be recognized. As a result, telephone customer service cannot easily confirm the actual intention of the incoming call user, and both the accuracy and the efficiency of intention recognition are low.

Therefore, there is a need for a method of quickly and accurately identifying the intent of an incoming user.

Disclosure of Invention

In view of the foregoing, it is necessary to provide an intention recognition method, apparatus, electronic device and storage medium which, when a first score is smaller than a preset dialect score threshold, input the first incoming call voice into each of a plurality of pre-trained remaining dialect recognition models and determine a target dialect recognition model to perform intention recognition, thereby improving the accuracy of intention recognition.

A first aspect of the present invention provides an intention recognition method, the method comprising:

responding to a received user incoming call request, collecting first incoming call voice of an incoming call user in the user incoming call request and identifying a target area where the incoming call user is located;

inputting the first incoming call voice into a pre-trained dialect recognition model corresponding to the target area to obtain a first score;

when the first score is smaller than a preset dialect score threshold corresponding to the target area, inputting the first incoming call voice into each of a plurality of pre-trained remaining dialect recognition models to obtain a plurality of second scores, and determining a target dialect recognition model according to the plurality of second scores;

collecting second incoming call voice of the incoming call user in real time, and identifying the second incoming call voice by adopting the target dialect identification model to obtain an intention text;

sending the intention text to a client corresponding to the caller number in the caller request;

receiving intention confirmation information reported by the client, and confirming an intention feedback result according to the intention confirmation information;

and carrying out online voice answering according to the intention feedback result, and sending the intention feedback result to a client corresponding to the caller number in the caller request.

Optionally, the determining the target dialect recognition model according to the plurality of second scores includes:

selecting the second score with the highest score from the plurality of second scores, identifying a third attribution area corresponding to the second score with the highest score, and comparing the second score with the highest score with a preset dialect score threshold corresponding to the third attribution area;

when the second score with the highest score is greater than or equal to the preset dialect score threshold corresponding to the third attribution area, determining the pre-trained dialect recognition model corresponding to the third attribution area as the target dialect recognition model; or

when the second score with the highest score is smaller than the preset dialect score threshold corresponding to the third attribution area, determining a default language recognition model as the target dialect recognition model.
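The optional selection step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the region names, the score values and the `DEFAULT_MODEL` identifier are hypothetical, and the models are represented only by their region keys rather than by real recognizers.

```python
from typing import Dict

# Hypothetical identifier standing in for the default language recognition model.
DEFAULT_MODEL = "default"


def select_target_model(second_scores: Dict[str, float],
                        thresholds: Dict[str, float]) -> str:
    """Pick the region whose dialect model becomes the target model.

    second_scores maps a region to the score returned by that region's model;
    thresholds maps a region to its preset dialect score threshold.
    """
    if not second_scores:
        return DEFAULT_MODEL
    # The region with the highest second score plays the role of the
    # "third attribution area" in the text.
    best_region = max(second_scores, key=second_scores.get)
    if second_scores[best_region] >= thresholds[best_region]:
        return best_region
    # No dialect model is confident enough: fall back to the default model.
    return DEFAULT_MODEL
```

For example, with second scores `{"wu": 0.62, "xiang": 0.81}` and thresholds `{"wu": 0.7, "xiang": 0.75}`, the sketch returns `"xiang"`; lowering the Xiang score below 0.75 makes it return the default model.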

Optionally, the training process of the dialect recognition model includes:

collecting a corpus of a plurality of historical users in each region;

splitting each voice segment in the corpus into a plurality of words through a Gaussian mixture model, and extracting a voice characteristic value of each word and a corresponding voice score of each word;

constructing a sample data set containing positive samples and negative samples according to the voice characteristic values of the plurality of words, wherein the positive samples are the voice characteristic values of the words whose voice score is greater than or equal to a preset first voice score threshold of the corresponding region, and the negative samples are the voice characteristic values of the words whose voice score is smaller than or equal to a preset second voice score threshold of the corresponding region;

randomly dividing the sample dataset into a first number of training sets and a second number of test sets;

inputting the training set into a preset neural network for training to obtain a dialect recognition model;

inputting the test set into the dialect recognition model for testing to obtain a test passing rate;

judging whether the test passing rate is larger than a preset passing rate threshold value or not;

when the test passing rate is greater than or equal to the preset passing rate threshold value, finishing training of the dialect recognition model;

and when the test passing rate is smaller than the preset passing rate threshold value, increasing the number of the training sets and retraining the dialect recognition model based on the increased training sets until the test passing rate is larger than or equal to the preset passing rate threshold value.
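The split, train, test and retrain loop described above might look like the following sketch. It is not the patented training procedure: the neural network is abstracted as a caller-supplied `train_fn`, the names `train_until_pass`, `initial_train_size` and `step` are hypothetical, and the sketch assumes that "increasing the number of the training sets" means moving more samples from the test portion into the training portion, which the text does not specify.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[List[float], int]       # (feature values, label: 1 positive / 0 negative)
Model = Callable[[List[float]], int]   # returns the predicted label


def pass_rate(model: Model, test_set: Sequence[Sample]) -> float:
    """Fraction of test samples whose predicted label matches the true label."""
    if not test_set:
        return 0.0
    hits = sum(1 for features, label in test_set if model(features) == label)
    return hits / len(test_set)


def train_until_pass(samples: Sequence[Sample],
                     train_fn: Callable[[Sequence[Sample]], Model],
                     initial_train_size: int,
                     step: int,
                     pass_threshold: float,
                     seed: int = 0) -> Tuple[Model, float, int]:
    """Retrain with a growing training set until the pass rate meets the threshold."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # random division of the sample data set
    train_size = initial_train_size
    while True:
        training_set, test_set = shuffled[:train_size], shuffled[train_size:]
        model = train_fn(training_set)
        rate = pass_rate(model, test_set)
        if rate >= pass_threshold:
            return model, rate, train_size
        if train_size + step >= len(shuffled):
            # Assumption: stop rather than train on every sample with nothing left to test.
            raise RuntimeError("pass rate not reached before the test set was exhausted")
        train_size += step  # enlarge the training set and retrain
```

The explicit failure branch is a design choice of this sketch; the source only states that retraining continues until the pass rate threshold is met.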

Optionally, the identifying the target area where the incoming call user is located includes:

analyzing the incoming call request of the user to obtain the identity card number and the incoming call number of the incoming call user;

extracting a plurality of key fields from the identity card number, matching a first home region matched with the key fields from a preset region database, identifying an operator of the incoming call number, and acquiring a second home region of the incoming call number through a data interface query service corresponding to the operator;

when the first home region and the second home region are the same region, determining the first home region as the target region where the incoming call user is located; or

when the first home region and the second home region are not the same region, determining both the first home region and the second home region as target regions where the incoming call user is located.
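As a rough illustration of the region identification step above, the following Python sketch derives the first home region from the identity card number and compares it with a second home region returned by a carrier lookup. Several things here are assumptions: the source speaks of "a plurality of key fields" without naming them, so the sketch uses only the leading two digits (the province-level division code of a mainland China resident identity card number); `REGION_DB` is a tiny hypothetical excerpt of the preset region database; and `carrier_lookup` is a hypothetical stand-in for the operator's data interface query service.

```python
from typing import Callable, Dict, List

# Hypothetical excerpt of a preset region database keyed by the first two digits
# of the identity card number (province-level division code).
REGION_DB: Dict[str, str] = {
    "11": "Beijing",
    "31": "Shanghai",
    "43": "Hunan",
    "44": "Guangdong",
}


def region_from_id_number(id_number: str, region_db: Dict[str, str]) -> str:
    """First home region: match the leading key field against the region database."""
    return region_db.get(id_number[:2], "unknown")


def identify_target_regions(id_number: str,
                            caller_number: str,
                            carrier_lookup: Callable[[str], str],
                            region_db: Dict[str, str] = REGION_DB) -> List[str]:
    """Return one target region if both sources agree, otherwise both regions."""
    first_home = region_from_id_number(id_number, region_db)
    second_home = carrier_lookup(caller_number)  # second home region from the operator
    if first_home == second_home:
        return [first_home]
    return [first_home, second_home]
```

A caller whose identity card number starts with 43 and whose number is registered in Guangdong would yield `["Hunan", "Guangdong"]`, so both regional models would be consulted.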

Optionally, inputting the first incoming call voice into a pre-trained dialect recognition model corresponding to the target area, and obtaining a first score includes:

when the target area is the first home region, inputting the first incoming call voice into the pre-trained dialect recognition model corresponding to the first home region to obtain a first score; or

when the target area is the first home region and the second home region, inputting the first incoming call voice into the pre-trained dialect recognition model corresponding to the first home region and into the pre-trained dialect recognition model corresponding to the second home region; receiving a first dialect score output by the pre-trained dialect recognition model corresponding to the first home region, and receiving a second dialect score output by the pre-trained dialect recognition model corresponding to the second home region; calculating the product of the first dialect score and a preset first weight value to obtain a third dialect score corresponding to the first home region; calculating the product of the second dialect score and a preset second weight value to obtain a fourth dialect score corresponding to the second home region; and calculating the sum of the third dialect score and the fourth dialect score to obtain the first score.
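The two-region case above is a weighted sum of two model outputs. A minimal Python sketch follows; the default weight values 0.6 and 0.4 are purely hypothetical, since the text only says the weights are preset.

```python
def combined_first_score(first_dialect_score: float,
                         second_dialect_score: float,
                         first_weight: float = 0.6,
                         second_weight: float = 0.4) -> float:
    """Weighted sum of the scores from the two home-region dialect models."""
    third_dialect_score = first_dialect_score * first_weight      # first home region
    fourth_dialect_score = second_dialect_score * second_weight   # second home region
    return third_dialect_score + fourth_dialect_score
```

With dialect scores 0.8 and 0.5 and the hypothetical weights above, the first score is 0.8 × 0.6 + 0.5 × 0.4 = 0.68.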

Optionally, the confirming the intention feedback result according to the intention confirming information includes:

when the intention confirming information is correct, carrying out entity identification on the intention text to obtain a plurality of entities;

generating a first text vector containing contextual features according to the intention text, and generating entity feature vectors according to the entities;

converting the first text vector into second text vectors of multiple granularities through a convolution operation;

performing feature extraction on the second text vectors of multiple granularities to obtain semantic feature vectors, and splicing the semantic feature vectors and the entity feature vectors to obtain template feature vectors;

determining an intention category corresponding to the intention text according to the template feature vector;

determining the intention answer text matched with the intention category and the URL corresponding to the intention answer text from a preset database, and determining the intention answer text and the URL corresponding to the intention answer text as an intention feedback result.
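To make the vector pipeline above more concrete, here is a pure-Python sketch of multi-granularity convolution, feature extraction, splicing with the entity vector, and classification. It is a toy under heavy assumptions, not the model of the patent: the convolution uses a fixed uniform (averaging) kernel instead of learned kernels, feature extraction is taken to be max pooling over time, the classifier is a plain linear scorer, and all function names, kernel widths and category names are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def conv_uniform(tokens: Sequence[Vector], width: int) -> List[Vector]:
    """1-D convolution over the token axis with a uniform kernel of the given width."""
    dim = len(tokens[0])
    out = []
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        out.append([sum(tok[d] for tok in window) / width for d in range(dim)])
    return out


def max_pool(seq: Sequence[Vector]) -> Vector:
    """Maximum over the time axis for every feature dimension."""
    return [max(column) for column in zip(*seq)]


def template_vector(first_text_vector: Sequence[Vector],
                    entity_vector: Vector,
                    widths: Sequence[int] = (1, 2, 3)) -> Vector:
    """Semantic features at several granularities, spliced with the entity features."""
    semantic: Vector = []
    for width in widths:
        if width <= len(first_text_vector):  # widths longer than the text are skipped
            semantic.extend(max_pool(conv_uniform(first_text_vector, width)))
    return semantic + list(entity_vector)


def classify(template: Vector, category_weights: Dict[str, Vector]) -> str:
    """Return the category whose (hypothetical) linear weights score highest."""
    scores = {name: sum(w * x for w, x in zip(weights, template))
              for name, weights in category_weights.items()}
    return max(scores, key=scores.get)
```

Each kernel width corresponds to one granularity (single tokens, pairs, triples), and appending the entity vector at the end is what lets two texts with similar wording but different entities land in different categories.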

Optionally, the sending the intent feedback result to the client corresponding to the caller number in the caller request includes:

and converting the intention feedback result into a data list with a preset format, and sending the data list with the preset format to a client corresponding to the caller number in the caller request through a corresponding interface.

A second aspect of the present invention provides an intention recognition apparatus, the apparatus comprising:

the acquisition module is used for responding to a received user incoming call request, acquiring a first incoming call voice of an incoming call user in the user incoming call request and identifying a target area where the incoming call user is located;

the input module is used for inputting the first incoming call voice into a pre-trained dialect recognition model corresponding to the target area to obtain a first score;

the determining module is used for inputting the first incoming call voice into each of a plurality of pre-trained remaining dialect recognition models when the first score is smaller than a preset dialect score threshold corresponding to the target area, obtaining a plurality of second scores and determining a target dialect recognition model according to the plurality of second scores;

the recognition module is used for collecting second incoming call voice of the incoming call user in real time, and recognizing the second incoming call voice by adopting the target dialect recognition model to obtain an intention text;

the sending module is used for sending the intention text to a client corresponding to the caller number in the caller request;

the receiving module is used for receiving the intention confirming information reported by the client and confirming an intention feedback result according to the intention confirming information;

and the answering module is used for carrying out online voice answering according to the intention feedback result and sending the intention feedback result to a client corresponding to the caller number in the caller request.

A third aspect of the present invention provides an electronic device comprising a processor and a memory, the processor being arranged to implement the method of intent recognition when executing a computer program stored in the memory.

A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the intention recognition method.

In summary, the intention recognition method, device, electronic equipment and storage medium of the invention provide the following benefits. First, after the intention text of the caller is obtained, it is sent to the client corresponding to the caller number in the caller request for secondary confirmation, which improves the accuracy of the intention text. Second, after an accurate intention text is determined, the intention feedback result is obtained according to the intention text and online voice answering is carried out according to the intention feedback result. This solves the pain point that the true intention of the caller cannot be determined because the telephone customer service cannot understand the caller's dialect, avoids user dissatisfaction caused by communication barriers, and improves user satisfaction. Meanwhile, the intention feedback result is sent to the client corresponding to the caller number in the caller request for confirmation, which helps the telephone customer service better understand the true intention of the caller and communicate, improving the service quality and efficiency of the telephone customer service. Finally, when the first score is smaller than the preset dialect score threshold corresponding to the target region, the first incoming call voice is input into each of a plurality of pre-trained remaining dialect recognition models to obtain a plurality of second scores, and the target dialect recognition model is determined according to the second scores. The target dialect recognition model used for the subsequent recognition of the second incoming call voice is thus determined from two dimensions, namely whether the dialect corresponding to the target region is used and whether any dialect is used, which improves the accuracy of intention recognition for the second incoming call voice.

Drawings

Fig. 1 is a flowchart of an intention recognition method according to an embodiment of the present invention.

Fig. 2 is a block diagram of an intention recognition device according to a second embodiment of the present invention.

Fig. 3 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention.

Detailed Description

In order that the above-recited objects, features and advantages of the present invention will be more clearly understood, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. It should be noted that, without conflict, the embodiments of the present invention and features in the embodiments may be combined with each other.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.

Example 1

In this embodiment, the intention recognition method may be applied to an electronic device, and for an electronic device that needs to perform intention recognition, the function of intention recognition provided by the method of the present invention may be directly integrated on the electronic device, or may be run in the form of a software development kit (Software Development Kit, SDK) in the electronic device.

The embodiment of the application can acquire and process the related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technique and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results.

Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.

As shown in fig. 1, the intention recognition method specifically includes the following steps, the order of the steps in the flowchart may be changed according to different needs, and some may be omitted.

S11, responding to a received user incoming call request, collecting first incoming call voice of an incoming call user in the user incoming call request and identifying a target area where the incoming call user is located.

In this embodiment, the user incoming call request indicates that the user is handling business by dialing a telephone number. For example, a user who has a question about an insurance clause while applying for insurance may dial XXXXX, thereby sending a user incoming call request, to consult the information related to that clause. The target region represents the home region of the incoming call user.

In an alternative embodiment, the identifying the target area in which the incoming call user is located includes:

In this embodiment, when an incoming call request is received, the phone number of the incoming call user is extracted from the first incoming call language.

In this embodiment, a first home area corresponding to the identification card number and a second home area corresponding to the incoming call number in the incoming call request may be identified at the same time, and when the first home area and the second home area are not the same home area, the first home area is preferentially determined as the target area where the incoming call user is located.

In this embodiment, by identifying the target area of the caller, the pronunciation characteristics of the caller are considered in the subsequent voiceprint identification process, so that the accuracy of voiceprint identification is improved.

And S12, inputting the first incoming call voice into a pre-trained dialect recognition model corresponding to the target area to obtain a first score.

In this embodiment, the dialect recognition model is trained in advance. The received first incoming call voice of the incoming call user is input into the dialect recognition model for recognition to obtain a first score, and whether the first incoming call voice matches the dialect voice features corresponding to the target area is determined according to the first score.

In this embodiment, because dialects differ between regions, a separate dialect recognition model is trained in advance for each region. For example, a Northern dialect recognition model, a Wu dialect recognition model, a Xiang dialect recognition model, a Gan dialect recognition model, a Hakka dialect recognition model, a Min dialect recognition model and a Yue (Cantonese) dialect recognition model can be trained in advance.

Specifically, the training process of the dialect recognition model comprises the following steps:

collecting a corpus of a plurality of historical users in each region;

In this embodiment, the corpus of each region includes the historical incoming call voices of the users in that region, comprising both incoming call voices with high voice scores and incoming call voices with low voice scores. When the dialect recognition model is trained, the incoming call voices with high voice scores are used as positive samples, and those with low voice scores as negative samples, for training the dialect recognition model of the corresponding region. In the subsequent service process, the dialect recognition model is continuously updated by training it on new positive samples and new negative samples, so that the probability of a negative sample being recognized becomes lower and lower and the recognition rate of the dialect recognition model keeps improving.

In this embodiment, in the training process of the dialect recognition model, the incoming call voice with high voice score is considered as a positive sample, and the incoming call voice with low voice score is considered as a negative sample to continuously optimize the dialect recognition model, so that the probability of the negative sample being recognized is lower and lower, the accuracy of the dialect recognition model is improved, and the intention recognition accuracy of the incoming call user is further improved.
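The split into positive and negative samples by voice score could be sketched as follows. The sketch assumes that "high" means at or above a first threshold and "low" means at or below a second, lower threshold, so that words scoring between the two thresholds are left out of the sample set; the function name and the threshold values in the example are hypothetical.

```python
from typing import Dict, List, Tuple

ScoredWord = Tuple[List[float], float]  # (voice feature values, voice score)


def build_sample_set(words: List[ScoredWord],
                     first_threshold: float,
                     second_threshold: float) -> Dict[str, List[List[float]]]:
    """Label word features as positive or negative samples by voice score."""
    positives = [features for features, score in words if score >= first_threshold]
    negatives = [features for features, score in words if score <= second_threshold]
    # Words with a score strictly between the two thresholds are not used.
    return {"positive": positives, "negative": negatives}
```

For instance, with thresholds 0.8 and 0.3, a word scored 0.9 becomes a positive sample, a word scored 0.2 becomes a negative sample, and a word scored 0.5 is discarded.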

In other optional embodiments, the inputting the first incoming call voice into the pre-trained dialect recognition model corresponding to the target area, and obtaining the first score includes:

In this embodiment, the home region corresponding to the identity card number of the caller may be the region where the caller was born, while the home region corresponding to the mobile phone number may be the region where the caller grew up. To ensure the accuracy of the first score, the first home region corresponding to the identity card number and the second home region corresponding to the caller number may be identified at the same time. When the two are not the same home region, a first weight value may be set in advance for the first home region and a second weight value for the second home region, so that the first score is calculated from the two dimensions of birth region and growth region, which improves the accuracy of the first score.

And S13, when the first score is smaller than a preset dialect score threshold corresponding to the target area, respectively inputting the first incoming call voice into a plurality of residual dialect recognition models trained in advance to obtain a plurality of second scores, and determining a target dialect recognition model according to the plurality of second scores.

In this embodiment, dialect score thresholds may be set in advance for different dialects. After the first score of the first incoming call voice is obtained, it is compared with the preset dialect score threshold corresponding to the target area. When the first score is smaller than that threshold, it is determined that the language used by the incoming call user may not be the dialect corresponding to the target area; the first incoming call voice is then input into each of a plurality of pre-trained remaining dialect recognition models to obtain a plurality of second scores, and the dialect type of the incoming call user is determined according to the plurality of second scores.
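The threshold gate and the fan-out to the remaining models can be illustrated with a short Python sketch. The `Scorer` type is a hypothetical stand-in for a dialect recognition model that maps raw audio to a score, and returning an empty dictionary when the gate is not triggered is a convention of this sketch, not something the text prescribes.

```python
from typing import Callable, Dict

Scorer = Callable[[bytes], float]  # hypothetical: a dialect model that scores raw audio


def score_remaining_models(voice: bytes,
                           target_region: str,
                           first_score: float,
                           models: Dict[str, Scorer],
                           thresholds: Dict[str, float]) -> Dict[str, float]:
    """Return the second scores, or {} when the first score meets the threshold."""
    if first_score >= thresholds[target_region]:
        # The caller is taken to speak the target region's dialect; no fan-out needed.
        return {}
    # Score the same first incoming call voice with every model except the target region's.
    return {region: model(voice)
            for region, model in models.items()
            if region != target_region}
```

Only the models other than the target region's are consulted, because the target region's model has already produced the first score.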

In an alternative embodiment, the method further comprises:

and when the first score is greater than or equal to a preset dialect score threshold corresponding to the target region, determining a pre-trained dialect recognition model corresponding to the target region as a target dialect recognition model.

In this embodiment, when the first score is greater than or equal to a preset dialect score threshold corresponding to the target area, it is determined that the dialect used by the incoming call user is the dialect corresponding to the target area.

In an alternative embodiment, said determining the target dialect identification model based on the plurality of second scores comprises:

In this embodiment, the first score is compared with the preset dialect score threshold corresponding to the target area, and whether the dialect used by the caller is the dialect of the target area is determined according to the comparison result. When it is determined that the caller is not using the dialect of the target area, the first incoming call voice is input into each of the remaining dialect recognition models for recognition, the scores returned by the remaining dialect recognition models are sorted from high to low, and the second score with the highest score is compared with the preset dialect score threshold of the corresponding dialect. When the second score with the highest score is greater than or equal to the preset dialect score threshold corresponding to the third attribution area, the dialect used by the caller is determined to be the dialect of the corresponding dialect recognition model. When the second score with the highest score is smaller than that threshold, it is determined that the caller is not using a dialect and is possibly speaking Mandarin, and a default language recognition model is determined as the target dialect recognition model.

In this embodiment, comparing the first score with the preset dialect score threshold corresponding to the target area determines whether the incoming call user uses the dialect corresponding to the target area. When the user does not, comparing the second score with the highest score with the preset dialect score threshold corresponding to the third attribution area determines whether the incoming call user uses a dialect at all. The target dialect recognition model used for the subsequent recognition of the second incoming call voice is therefore determined from two dimensions, namely whether the dialect corresponding to the target area is used and whether any dialect is used, which improves the accuracy of intention recognition for the second incoming call voice.

S14, collecting second incoming call voice of the incoming call user in real time, and adopting the target dialect recognition model to recognize the second incoming call voice to obtain an intention text.

In this embodiment, after determining a target dialect recognition model for recognizing the second incoming call voice, the second incoming call voice is input into the target dialect recognition model for performing intention recognition, and an intention text is obtained.

S15, the intention text is sent to a client corresponding to the caller number in the caller request.

In this embodiment, after the intention text of the caller is obtained, it is sent to the client corresponding to the caller number in the caller request for secondary confirmation, that is, to confirm whether the intention text is the target intention. For example, the target intention may be the service that the caller needs to query or transact. When the caller confirms that the service to be queried or transacted in the intention text is the target intention, the accuracy of the intention text is improved.

S16, receiving intention confirmation information reported by the client, and confirming an intention feedback result according to the intention confirmation information.

In this embodiment, the intention confirmation information represents the caller's confirmation of the intention text recognized by the dialect recognition model. When the intention confirmation information reported by the client is received and indicates that the intention text is correct, the intention text is determined to be the target intention of the caller, and the intention feedback result is determined according to the intention text, which improves the accuracy of the intention feedback result.

In an optional embodiment, the confirming the intention feedback result according to the intention confirming information includes:

In this embodiment, the URL represents the address of a page associated with the intention answer text. During intention recognition, the intention texts may fall under a plurality of coarse-grained dialogue intentions, each of which includes a plurality of fine-grained intentions. For example, intention text 1 is "Hello, does life insurance include term insurance?", intention text 2 is "Does life insurance include whole life insurance?", and intention text 3 is "Does whole life insurance belong to life insurance?". The coarse-grained dialogue intention corresponding to intention texts 1, 2 and 3 is "which insurances life insurance includes"; the fine-grained intention corresponding to intention text 1 is term insurance, and the fine-grained intention corresponding to intention texts 2 and 3 is whole life insurance.

In this embodiment, in order to accurately identify the intention category that the caller wants to express in the intention text, entity features are added to assist intention classification. This increases the distinction between different intentions, improves the similarity of texts under the same intention and the accuracy of intention recognition, and further improves the accuracy of the intention feedback result.
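The feature construction recited in claim 1 (multi-granularity convolution of the text vector, feature extraction, then splicing with the entity feature vector) can be illustrated with a minimal sketch. The function names, the window widths, the use of a plain window average in place of learned convolution kernels, and the use of max pooling for feature extraction are all assumptions made for illustration; the source does not specify them.

```python
from typing import List, Sequence

Vector = List[float]


def conv1d_mean(tokens: Sequence[Vector], width: int) -> List[Vector]:
    """Slide a window of `width` tokens over the text and average each window.

    The average is only a stand-in for a learned convolution kernel (assumption).
    """
    if len(tokens) < width:
        # Text shorter than the window: use a single window over all tokens.
        return [[sum(col) / len(tokens) for col in zip(*tokens)]]
    return [
        [sum(col) / width for col in zip(*tokens[i:i + width])]
        for i in range(len(tokens) - width + 1)
    ]


def max_pool(windows: Sequence[Vector]) -> Vector:
    """Keep the strongest response per dimension across all windows."""
    return [max(col) for col in zip(*windows)]


def template_feature_vector(
    token_vectors: Sequence[Vector],
    entity_vector: Sequence[float],
    widths: Sequence[int] = (1, 2, 3),
) -> Vector:
    """Build one semantic vector per granularity, then splice on the entity features."""
    semantic: Vector = []
    for width in widths:
        semantic.extend(max_pool(conv1d_mean(token_vectors, width)))
    return semantic + list(entity_vector)
```

In this sketch each window width plays the role of one granularity, and the resulting template feature vector would be passed to whatever classifier determines the intention category.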

S17, performing online voice answering according to the intention feedback result, and sending the intention feedback result to a client corresponding to the caller number in the caller request.

In this embodiment, after an accurate intention text is determined, the intention feedback result is obtained according to the intention text, and online voice answering is performed according to the intention feedback result. This solves the pain point that the true intention of the incoming call user cannot be determined because the telephone customer service cannot understand the user's dialect, avoids user dissatisfaction caused by communication barriers, and improves user satisfaction. Meanwhile, the intention feedback result is sent to the client corresponding to the incoming call number in the incoming call request for confirmation, which assists the telephone customer service in better understanding the true intention of the incoming call user and communicating with the user, and improves the service quality and efficiency of the telephone customer service.

In an optional embodiment, the sending the intent feedback result to the client corresponding to the caller number in the caller request includes:

In this embodiment, the format of the data list may be preset. For example, the preset format may be a short message format, a PDF format, a picture format, or the like, and the intention feedback result is converted into a data list in the corresponding format.

In this embodiment, each preset format corresponds to one interface. For example, if the intention feedback result is converted into a data list in the short message format, that data list is sent to the client corresponding to the caller number in the caller request through the interface corresponding to the short message format. This avoids sending failures caused by the diversity of intention feedback result formats, supports a variety of formats, and improves the sending efficiency of the intention feedback result.
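The one-interface-per-format arrangement can be sketched as a small registry that pairs each preset format with a converter and a sending interface. Everything named here (FORMAT_REGISTRY, convert_to_sms, send_via_sms_interface, the "answer" and "url" keys, and the returned status string) is hypothetical and only illustrates the dispatch idea; a real sender would call the actual short message, PDF or picture delivery service.

```python
from typing import Callable, Dict, Tuple

Converter = Callable[[dict], bytes]
Sender = Callable[[str, bytes], str]


def convert_to_sms(result: dict) -> bytes:
    """Convert the intention feedback result into a short-message payload (assumed layout)."""
    return f"{result['answer']} {result['url']}".encode("utf-8")


def send_via_sms_interface(caller_number: str, payload: bytes) -> str:
    """Placeholder for the short-message interface; returns a status string."""
    return f"sms:{caller_number}:{len(payload)} bytes"


# One (converter, interface) pair per preset format; only "sms" is filled in here.
FORMAT_REGISTRY: Dict[str, Tuple[Converter, Sender]] = {
    "sms": (convert_to_sms, send_via_sms_interface),
}


def send_feedback(result: dict, caller_number: str, fmt: str) -> str:
    """Convert the result into the preset format and send it through that format's interface."""
    if fmt not in FORMAT_REGISTRY:
        raise ValueError(f"no interface registered for format {fmt!r}")
    convert, send = FORMAT_REGISTRY[fmt]
    return send(caller_number, convert(result))
```

Adding a PDF or picture format under this assumption would only require registering another converter and sender pair.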

In summary, the intention recognition method of this embodiment has the following effects. On the one hand, after the intention text of the caller is obtained, the intention text is sent to the client corresponding to the caller number in the caller request for secondary confirmation, so that the accuracy of the intention text is improved. On the other hand, after an accurate intention text is determined, the intention feedback result is obtained according to the intention text and online voice answering is performed according to the intention feedback result, which solves the pain point that the true intention of the caller cannot be determined because the telephone customer service cannot understand the caller's dialect, avoids user dissatisfaction caused by communication barriers, and improves user satisfaction; meanwhile, the intention feedback result is sent to the client corresponding to the caller number in the caller request for confirmation, which assists the telephone customer service in better understanding the true intention of the caller and communicating with the caller, and improves the service quality and efficiency of the telephone customer service. Finally, when the first score is smaller than the preset dialect score threshold corresponding to the target area, the first incoming call voice is respectively input into a plurality of pre-trained remaining dialect recognition models to obtain a plurality of second scores, and the target dialect recognition model is determined according to the plurality of second scores. The target dialect recognition model used for subsequent recognition of the second incoming call voice is thus determined after considering two dimensions, namely whether the dialect corresponding to the target area is used and whether a dialect is used at all, thereby improving the accuracy of intention recognition for the second incoming call voice.

Example II

In some embodiments, the intent recognition device 20 may include a plurality of functional modules comprised of program code segments. Program code for each program segment in the intent recognition device 20 may be stored in a memory of the electronic device and executed by the at least one processor to perform the functions of intent recognition (as described in detail with respect to fig. 1).

In this embodiment, the intention recognition device 20 may be divided into a plurality of functional modules according to the functions it performs. The functional modules may include: an acquisition module 201, an input module 202, a determining module 203, a recognition module 204, a sending module 205, a receiving module 206 and an answering module 207. A module referred to herein is a series of computer readable instructions stored in a memory that can be executed by at least one processor to perform a fixed function. In the present embodiment, the functions of the respective modules are described in detail below.

The acquisition module 201 is configured to, in response to a received user incoming call request, acquire a first incoming call voice of an incoming call user in the user incoming call request and identify a target area where the incoming call user is located.

In an alternative embodiment, the identifying, by the acquisition module 201, the target area in which the incoming call user is located includes:

In this embodiment, when an incoming call request is received, the phone number of the incoming call user is extracted from the first incoming call voice. In this embodiment, a first home area corresponding to the identity card number and a second home area corresponding to the incoming call number in the incoming call request may be identified at the same time, and when the first home area and the second home area are not the same home area, the first home area is preferentially determined as the target area where the incoming call user is located.
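The preference rule described above can be written as a minimal sketch. The function name and the handling of a missing region (None) are assumptions added for illustration. Note that claim 1 recites a variant in which both home regions are kept as target areas when they differ; this sketch follows the embodiment text instead.

```python
from typing import Optional


def resolve_target_area(
    first_home_area: Optional[str],
    second_home_area: Optional[str],
) -> Optional[str]:
    """Return the target area, preferring the identity-card home area.

    first_home_area:  home area matched from the identity card number.
    second_home_area: home area of the incoming call number.
    Falling back to the second home area when the first is missing is an
    assumption, not something stated in the source.
    """
    if first_home_area:
        # Same area or different areas: the first home area takes priority.
        return first_home_area
    return second_home_area
```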

The input module 202 is configured to input the first incoming call voice into a pre-trained dialect recognition model corresponding to the target area, so as to obtain a first score.
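When the target area covers both a first and a second home area, claim 1 describes the first score as the sum of two weighted dialect scores. The following sketch shows that arithmetic; the function name and the example weights are assumptions, since the source only says the weights are preset.

```python
def combined_first_score(
    first_dialect_score: float,
    second_dialect_score: float,
    first_weight: float,
    second_weight: float,
) -> float:
    """Weighted sum of the dialect scores from the two home-area models."""
    # Third dialect score: contribution of the first home area's model.
    third_dialect_score = first_dialect_score * first_weight
    # Fourth dialect score: contribution of the second home area's model.
    fourth_dialect_score = second_dialect_score * second_weight
    return third_dialect_score + fourth_dialect_score


# Example with hypothetical weights of 0.6 and 0.4.
example_score = combined_first_score(0.8, 0.5, 0.6, 0.4)
```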

collecting a corpus of a plurality of historical users in each region;

In other optional embodiments, the input module 202 inputs the first incoming call voice into a pre-trained dialect recognition model corresponding to the target area, and obtaining the first score includes:

The determining module 203 is configured to, when the first score is smaller than a preset dialect score threshold corresponding to the target area, input the first incoming call voice into a plurality of remaining dialect recognition models trained in advance, obtain a plurality of second scores, and determine a target dialect recognition model according to the plurality of second scores.

In an optional embodiment, the determining module 203 is further configured to determine the pre-trained dialect recognition model corresponding to the target area as the target dialect recognition model when the first score is greater than or equal to a preset dialect score threshold corresponding to the target area.
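The threshold logic of the determining module, together with the fallback recited in claim 1, can be sketched as follows. The function name, the dictionary-based inputs and the DEFAULT_MODEL key are assumptions for illustration. In practice the remaining models would only be run after the first score falls below its threshold; the sketch takes their scores as an argument to stay self-contained.

```python
from typing import Dict

DEFAULT_MODEL = "default"  # hypothetical key for the default language recognition model


def select_target_model(
    first_score: float,
    target_area: str,
    second_scores: Dict[str, float],
    thresholds: Dict[str, float],
) -> str:
    """Return the area whose dialect model should recognize the second incoming call voice.

    second_scores maps each remaining area to the score its model gave the
    first incoming call voice; thresholds maps each area to its preset
    dialect score threshold.
    """
    # The target area's own model is good enough: keep it.
    if first_score >= thresholds[target_area]:
        return target_area
    if not second_scores:
        return DEFAULT_MODEL
    # Otherwise take the best-scoring remaining model and check its own threshold.
    best_area = max(second_scores, key=second_scores.get)
    if second_scores[best_area] >= thresholds[best_area]:
        return best_area
    # No dialect model is confident enough: fall back to the default model.
    return DEFAULT_MODEL
```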

In an alternative embodiment, the determining module 203 determines the target dialect recognition model according to the plurality of second scores includes:

The recognition module 204 is configured to collect a second incoming call voice of the incoming call user in real time, and recognize the second incoming call voice by using the target dialect recognition model to obtain an intention text.

The sending module 205 is configured to send the intention text to a client corresponding to the caller number in the caller request.

The receiving module 206 is configured to receive the intention confirmation information reported by the client, and confirm the intention feedback result according to the intention confirmation information.

In an alternative embodiment, the receiving module 206 confirms the intent feedback result according to the intent confirmation information includes:

The answering module 207 is configured to perform online voice answering according to the intention feedback result, and send the intention feedback result to a client corresponding to the caller number in the caller request.

In an alternative embodiment, the sending, by the answering module 207, the intent feedback result to the client corresponding to the caller number in the incoming call request includes:

In summary, the intention recognition device of this embodiment has the following effects. On the one hand, after the intention text of the caller is obtained, the intention text is sent to the client corresponding to the caller number in the caller request for secondary confirmation, so that the accuracy of the intention text is improved. On the other hand, after an accurate intention text is determined, the intention feedback result is obtained according to the intention text and online voice answering is performed according to the intention feedback result, which solves the pain point that the true intention of the caller cannot be determined because the telephone customer service cannot understand the caller's dialect, avoids user dissatisfaction caused by communication barriers, and improves user satisfaction; meanwhile, the intention feedback result is sent to the client corresponding to the caller number in the caller request for confirmation, which assists the telephone customer service in better understanding the true intention of the caller and communicating with the caller, and improves the service quality and efficiency of the telephone customer service. Finally, when the first score is smaller than the preset dialect score threshold corresponding to the target area, the first incoming call voice is respectively input into a plurality of pre-trained remaining dialect recognition models to obtain a plurality of second scores, and the target dialect recognition model is determined according to the plurality of second scores. The target dialect recognition model used for subsequent recognition of the second incoming call voice is thus determined after considering two dimensions, namely whether the dialect corresponding to the target area is used and whether a dialect is used at all, thereby improving the accuracy of intention recognition for the second incoming call voice.

Example III

Fig. 3 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention. In the preferred embodiment of the invention, the electronic device 3 comprises a memory 31, at least one processor 32, at least one communication bus 33 and a transceiver 34.

It will be appreciated by those skilled in the art that the configuration of the electronic device shown in fig. 3 does not limit the embodiments of the present invention. Either a bus-type configuration or a star-type configuration is possible, and the electronic device 3 may also include more or fewer hardware or software components than shown, or a different arrangement of components.

In some embodiments, the electronic device 3 is an electronic device capable of automatically performing numerical calculation and/or information processing according to a preset or stored instruction, and its hardware includes, but is not limited to, a microprocessor, an application specific integrated circuit, a programmable gate array, a digital processor, an embedded device, and the like. The electronic device 3 may further include a client device, where the client device includes, but is not limited to, any electronic product that can interact with a client by way of a keyboard, a mouse, a remote control, a touch pad, or a voice control device, such as a personal computer, a tablet computer, a smart phone, a digital camera, etc.

It should be noted that the electronic device 3 is only an example; other existing or future electronic products that can be adapted to the present invention are also included within the scope of protection of the present invention and are incorporated herein by reference.

In some embodiments, the memory 31 is used to store program code and various data, such as the intent recognition device 20 installed in the electronic device 3, and to enable high-speed, automatic access to programs or data during operation of the electronic device 3. The memory 31 includes Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-time Programmable Read-Only Memory (OTPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other optical disc memory, magnetic disk memory, magnetic tape memory, or any other computer-readable medium that can be used to carry or store data.

In some embodiments, the at least one processor 32 may be comprised of an integrated circuit, for example, a single packaged integrated circuit, or may be comprised of multiple integrated circuits packaged with the same or different functions, including one or more central processing units (Central Processing unit, CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The at least one processor 32 is a Control Unit (Control Unit) of the electronic device 3, connects the respective components of the entire electronic device 3 using various interfaces and lines, and executes various functions of the electronic device 3 and processes data by running or executing programs or modules stored in the memory 31 and calling data stored in the memory 31.

In some embodiments, the at least one communication bus 33 is arranged to enable connected communication between the memory 31 and the at least one processor 32 or the like.

Although not shown, the electronic device 3 may further include a power source (such as a battery) for powering the various components, and optionally, the power source may be logically connected to the at least one processor 32 via a power management device, thereby implementing functions such as managing charging, discharging, and power consumption by the power management device. The power supply may also include one or more of any of a direct current or alternating current power supply, recharging device, power failure detection circuit, power converter or inverter, power status indicator, etc. The electronic device 3 may further include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be described herein.

It should be understood that the described embodiments are for illustrative purposes only, and the scope of the patent application is not limited by this configuration.

The integrated units implemented in the form of software functional modules described above may be stored in a computer readable storage medium. The software functional modules described above are stored in a storage medium and include instructions for causing a computer device (which may be a personal computer, an electronic device, or a network device, etc.) or a processor (processor) to perform portions of the methods described in the various embodiments of the invention.

In a further embodiment, in connection with fig. 2, the at least one processor 32 may execute the operating means of the electronic device 3 as well as various types of applications installed (such as the intent recognition device 20), program code, etc., e.g., the various modules described above.

The memory 31 has program code stored therein, and the at least one processor 32 can invoke the program code stored in the memory 31 to perform related functions. For example, each of the modules depicted in fig. 2 is a program code stored in the memory 31 and executed by the at least one processor 32 to perform the functions of the respective module for purposes of intent recognition.

Illustratively, the program code may be partitioned into one or more modules/units that are stored in the memory 31 and executed by the processor 32 to complete the present application. The one or more modules/units may be a series of computer readable instruction segments capable of performing the specified functions, which instruction segments describe the execution of the program code in the electronic device 3. For example, the program code may be divided into an acquisition module 201, an input module 202, a determining module 203, a recognition module 204, a sending module 205, a receiving module 206, and an answering module 207.

In one embodiment of the invention, the memory 31 stores a plurality of computer readable instructions that are executed by the at least one processor 32 to perform the function of intent recognition.

Specifically, the specific implementation method of the above instruction by the at least one processor 32 may refer to the description of the relevant steps in the corresponding embodiment of fig. 1, which is not repeated herein.

In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be other manners of division when actually implemented.

The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional modules in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in the form of hardware, or in the form of hardware plus software functional modules.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, it is therefore intended to include within the invention all changes that fall within the meaning and range of equivalency of the claims. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it will be obvious that the term "comprising" does not exclude other elements or that the singular does not exclude a plurality. The units or means stated in the invention may also be implemented by one unit or means, either by software or hardware. The terms first, second, etc. are used to denote a name, but not any particular order.

Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims

1. A method of intent recognition, the method comprising:

responding to a received user incoming call request, collecting first incoming call voice of an incoming call user in the user incoming call request and identifying a target area where the incoming call user is located, wherein the method comprises the following steps: analyzing the incoming call request of the user to obtain the identity card number and the incoming call number of the incoming call user; extracting a plurality of key fields from the identity card number, matching a first home region matched with the key fields from a preset region database, identifying an operator of the incoming call number, and acquiring a second home region of the incoming call number through a data interface query service corresponding to the operator; when the first home region and the second home region are the same region, determining the first home region as a target region where the incoming call user is located; or when the first home region and the second home region are not the same region, determining the first home region and the second home region as target regions where the incoming call users are located;

inputting the first incoming call voice into a pre-trained dialect recognition model corresponding to the target area to obtain a first score, wherein the method comprises the following steps of: when the target area is a first attribution area, inputting the first incoming call voice into a pre-trained dialect recognition model corresponding to the first attribution area to obtain a first score; or when the target area is a first home area and a second home area, inputting the first incoming call voice into a pre-trained dialect recognition model corresponding to the first home area, and sending the first incoming call voice into a pre-trained dialect recognition model corresponding to the second home area; receiving a first dialect score output by a pre-trained dialect recognition model corresponding to the first attribution area, and receiving a second dialect score output by a pre-trained dialect recognition model corresponding to the second attribution area; calculating the product of the first dialect score and a preset first weight value to obtain a third dialect score corresponding to a first attribution area; calculating the product of the second dialect score and a preset second weight value to obtain a fourth dialect score corresponding to a second attribution area; calculating the sum of the third dialect score and the fourth dialect score to obtain a first score;

When the first score is smaller than a preset dialect score threshold corresponding to the target area, the first incoming call voice is respectively input into a plurality of residual dialect recognition models trained in advance to obtain a plurality of second scores, and a target dialect recognition model is determined according to the plurality of second scores, wherein the method comprises the following steps: selecting a second score with the highest score from the plurality of second scores, identifying a third attribution area corresponding to the second score with the highest score, and comparing the second score with a preset dialect score threshold corresponding to the third attribution area; when the second score with the highest score is greater than or equal to a preset dialect score threshold value corresponding to the third attribution area, determining a pre-trained dialect recognition model corresponding to the third attribution area as a target dialect recognition model; or when the second score with the highest score is smaller than a preset dialect score threshold value corresponding to the third attribution area, determining a default language identification model as a target dialect identification model;

receiving intention confirmation information reported by the client, and confirming an intention feedback result according to the intention confirmation information, wherein the method comprises the following steps of: when the intention confirming information is correct, carrying out entity identification on the intention text to obtain a plurality of entities; generating a first text vector containing contextual features according to the intention text, and generating entity feature vectors according to the entities; converting the first text vector into a plurality of granularity second text vectors through convolution operation; feature extraction is carried out on the second text vectors with the multiple granularities to obtain semantic feature vectors, and the semantic feature vectors and the entity feature vectors are spliced to obtain template feature vectors; determining an intention category corresponding to the intention text according to the template feature vector; determining an intention answer text matched with the intention category and a URL corresponding to the intention answer text from a preset database, and determining the intention answer text and the URL corresponding to the intention answer text as an intention feedback result;

2. The intent recognition method as recited in claim 1, wherein the training process of the dialect recognition model includes:

collecting a corpus of a plurality of historical users in each region;

3. The method for identifying an intention as claimed in claim 1, wherein the sending the result of the intention feedback to the client corresponding to the caller number in the incoming call request comprises:

4. An intent recognition device, the device comprising:

the acquisition module is used for responding to the received user incoming call request, acquiring first incoming call voice of an incoming call user in the user incoming call request and identifying a target area where the incoming call user is located, and comprises the following steps: analyzing the incoming call request of the user to obtain the identity card number and the incoming call number of the incoming call user; extracting a plurality of key fields from the identity card number, matching a first home region matched with the key fields from a preset region database, identifying an operator of the incoming call number, and acquiring a second home region of the incoming call number through a data interface query service corresponding to the operator; when the first home region and the second home region are the same region, determining the first home region as a target region where the incoming call user is located; or when the first home region and the second home region are not the same region, determining the first home region and the second home region as target regions where the incoming call users are located;

the input module is configured to input the first incoming call voice into a pre-trained dialect recognition model corresponding to the target area, and obtain a first score, where the input module includes: when the target area is a first attribution area, inputting the first incoming call voice into a pre-trained dialect recognition model corresponding to the first attribution area to obtain a first score; or when the target area is a first home area and a second home area, inputting the first incoming call voice into a pre-trained dialect recognition model corresponding to the first home area, and sending the first incoming call voice into a pre-trained dialect recognition model corresponding to the second home area; receiving a first dialect score output by a pre-trained dialect recognition model corresponding to the first attribution area, and receiving a second dialect score output by a pre-trained dialect recognition model corresponding to the second attribution area; calculating the product of the first dialect score and a preset first weight value to obtain a third dialect score corresponding to a first attribution area; calculating the product of the second dialect score and a preset second weight value to obtain a fourth dialect score corresponding to a second attribution area; calculating the sum of the third dialect score and the fourth dialect score to obtain a first score;

The determining module is configured to, when the first score is smaller than a preset dialect score threshold corresponding to the target area, input the first incoming call voice into a plurality of remaining dialect recognition models trained in advance, obtain a plurality of second scores, and determine a target dialect recognition model according to the plurality of second scores, where the determining module includes: selecting a second score with the highest score from the plurality of second scores, identifying a third attribution area corresponding to the second score with the highest score, and comparing the second score with a preset dialect score threshold corresponding to the third attribution area; when the second score with the highest score is greater than or equal to a preset dialect score threshold value corresponding to the third attribution area, determining a pre-trained dialect recognition model corresponding to the third attribution area as a target dialect recognition model; or when the second score with the highest score is smaller than a preset dialect score threshold value corresponding to the third attribution area, determining a default language identification model as a target dialect identification model;

the receiving module is used for receiving the intention confirming information reported by the client and confirming an intention feedback result according to the intention confirming information, and comprises the following steps: when the intention confirming information is correct, carrying out entity identification on the intention text to obtain a plurality of entities; generating a first text vector containing contextual features according to the intention text, and generating entity feature vectors according to the entities; converting the first text vector into a plurality of granularity second text vectors through convolution operation; feature extraction is carried out on the second text vectors with the multiple granularities to obtain semantic feature vectors, and the semantic feature vectors and the entity feature vectors are spliced to obtain template feature vectors; determining an intention category corresponding to the intention text according to the template feature vector; determining an intention answer text matched with the intention category and a URL corresponding to the intention answer text from a preset database, and determining the intention answer text and the URL corresponding to the intention answer text as an intention feedback result;

5. An electronic device comprising a processor and a memory, wherein the processor is configured to implement the intent recognition method as claimed in any one of claims 1 to 3 when executing a computer program stored in the memory.

6. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the intention recognition method as claimed in any one of claims 1 to 3.