CN114020886A - Speech intention recognition method, device, equipment and storage medium - Google Patents

Speech intention recognition method, device, equipment and storage medium Download PDF

Info

Publication number
CN114020886A
Authority
CN
China
Prior art keywords
model
dialogue model
user
sentence
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111274209.3A
Other languages
Chinese (zh)
Inventor
毛星越
李婷
王坚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Pingan Integrated Financial Services Co ltd
Original Assignee
Shenzhen Pingan Integrated Financial Services Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Pingan Integrated Financial Services Co ltd filed Critical Shenzhen Pingan Integrated Financial Services Co ltd
Priority to CN202111274209.3A priority Critical patent/CN114020886A/en
Publication of CN114020886A publication Critical patent/CN114020886A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3343 Query execution using phonetics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Biology (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to artificial intelligence technology, and discloses a voice intention recognition method, which comprises the following steps: receiving a user query statement subjected to voice recognition; acquiring a preset node type label, extracting a keyword from the user query statement, calculating a matching value of the keyword and the node type label, and determining the node type of the user query statement according to the matching value; calling a pre-constructed recognition model framework comprising a basic dialogue model, a refined dialogue model and a customized dialogue model, and selecting the corresponding dialogue model to generate a reply sentence for the user query statement; and converting the reply sentence into voice based on phoneme conversion and transmitting the voice to the user. In addition, the invention also relates to blockchain technology, and the recognition model framework can be stored in the nodes of a blockchain. The invention also provides a voice intention recognition device, electronic equipment and a storage medium. The invention can improve the recognition accuracy of the voice intention of the user.

Description

Speech intention recognition method, device, equipment and storage medium
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a method and an apparatus for recognizing a speech intention, an electronic device, and a computer-readable storage medium.
Background
With the development of voice recognition technology, its applications have become increasingly widespread, and many services now use intelligent AI outbound calls to communicate with users, thereby realizing service automation.
At present, traditional intelligent AI outbound calling recognizes a user's intention based on a speech recognition model and replies accordingly. However, this approach usually uses a single model algorithm to recognize all of the user's conversations: it can meet service requirements in general scenarios, but as services become increasingly refined and subdivided, the general model exhibits a high recognition error rate and insufficient accuracy, which reduces service handling efficiency and degrades the user experience. Therefore, a more accurate recognition method is required.
Disclosure of Invention
The invention provides a voice intention recognition method, a voice intention recognition device and a computer readable storage medium, and mainly aims to improve the recognition accuracy of a voice intention of a user.
In order to achieve the above object, the present invention provides a speech intention recognition method, including:
receiving a user inquiry statement subjected to voice recognition;
acquiring a preset node type label, extracting a keyword from the user inquiry statement, calculating a matching value of the keyword and the node type label, and determining the node type of the user inquiry statement according to the matching value, wherein the node type comprises a basic type, a refinement type and a customization type;
calling a pre-constructed recognition model framework comprising a basic dialogue model, a refined dialogue model and a customized dialogue model, and selecting a corresponding dialogue model from the basic dialogue model, the refined dialogue model and the customized dialogue model to generate a reply sentence of the user inquiry sentence;
and converting the reply sentence into voice based on phoneme conversion and transmitting the voice to the user.
Optionally, the extracting keywords from the user query sentence includes:
performing word segmentation processing on the user query sentence to obtain a word set;
taking the words in the word set as vertexes and the grammatical relations among the words as edges, constructing a directed graph;
calculating the weight of each vertex in the directed graph according to a weight formula;
and selecting the vertexes whose weight values are larger than a preset threshold value to obtain the keywords.
Optionally, the calculating a matching value of the keyword and the node type tag includes:
converting the keywords and the node type labels into vectors to obtain word vectors and type vectors;
respectively extracting the characteristics of the word vector and the type vector to obtain word characteristics and type characteristics;
and calculating the similarity of the word features and the type features to obtain a matching value of the keyword and the node type label.
Optionally, the invoking a pre-built recognition model framework including a base dialogue model, a refined dialogue model and a customized dialogue model, and selecting a corresponding dialogue model from the base dialogue model, the refined dialogue model and the customized dialogue model to generate a reply sentence of the user query sentence includes:
if the node type is a basic type, calling a basic dialogue model in a pre-constructed recognition model frame to generate a reply statement of the user inquiry statement;
if the node type is a refinement type, calling a refinement dialogue model in a pre-constructed recognition model frame to generate a reply sentence of the user inquiry sentence;
and if the node type is the customized type, calling a customized dialogue model in a pre-constructed recognition model frame to generate a reply sentence of the user inquiry sentence.
Optionally, the invoking a basic dialogue model in a pre-constructed recognition model framework to generate a reply statement of the user query statement includes:
converting the user inquiry statement into a vector by using an input layer of the basic dialogue model to obtain a text vector;
performing semantic recognition on the text vector by utilizing a hidden layer of the basic dialogue model to obtain a semantic vector;
and decoding the semantic vector by utilizing the output layer of the basic dialogue model to obtain a reply sentence.
Optionally, before invoking the customized dialog model in the pre-built recognition model framework to generate the reply sentence of the user query sentence, the method further comprises:
collecting corpus data related to preset services;
expanding the corpus data based on a corpus enhancement method to obtain a corpus training set;
and training a preset language model by utilizing the corpus training set to obtain the customized dialogue model.
Optionally, the converting the reply sentence into voice and transmitting the voice to the user includes:
performing text analysis on the reply sentence to generate a phoneme text;
and converting the phoneme text into voice by using a preset voice library, and transmitting the voice to a user.
In order to solve the above problem, the present invention also provides a speech intention recognition apparatus, comprising:
the sentence receiving module is used for receiving the user inquiry sentences subjected to the voice recognition;
the node identification module is used for acquiring a preset node type label, extracting a keyword from the user inquiry statement, calculating a matching value of the keyword and the node type label, and determining the node type of the user inquiry statement according to the matching value;
the reply generation module is used for calling a pre-constructed recognition model framework comprising a basic dialogue model, a refined dialogue model and a customized dialogue model, and selecting a corresponding dialogue model from the basic dialogue model, the refined dialogue model and the customized dialogue model to generate a reply sentence of the user inquiry sentence;
and the voice reply module is used for converting the reply sentence into voice based on phoneme conversion and transmitting the voice to the user.
In order to solve the above problem, the present invention also provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, the computer program being executed by the at least one processor to enable the at least one processor to perform the voice intention recognition method described above.
In order to solve the above problem, the present invention also provides a computer-readable storage medium having at least one computer program stored therein, the at least one computer program being executed by a processor in an electronic device to implement the voice intention recognition method described above.
In the embodiments of the present invention, a user query sentence subjected to voice recognition is received; a preset node type tag is acquired, a keyword is extracted from the user query sentence, a matching value of the keyword and the node type tag is calculated, and the node type of the user query sentence is determined according to the matching value; a pre-constructed recognition model framework comprising a basic dialogue model, a refined dialogue model and a customized dialogue model is called, and the corresponding dialogue model is selected to generate a reply sentence for the user query sentence. Because different models are used to recognize intentions and generate replies for different service scenes, reply accuracy is improved. Meanwhile, node recognition is performed before each reply to ensure that the recognition model currently used is the optimum model for the node, which effectively improves the recognition rate of the user's conversation intention, raises user satisfaction, and flexibly meets a variety of user requirements. Therefore, the voice intention recognition method, apparatus, electronic device and computer-readable storage medium of the present invention can improve the recognition accuracy of the user's voice intention.
Drawings
Fig. 1 is a flowchart illustrating a voice intention recognition method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of model invocation according to an embodiment of the present invention;
FIG. 3 is a functional block diagram of a speech intent recognition apparatus according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device implementing the speech intention recognition method according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment of the application provides a voice intention recognition method. The execution subject of the voice intention recognition method includes, but is not limited to, at least one of electronic devices such as a server and a terminal that can be configured to execute the method provided by the embodiment of the present application. In other words, the voice intention recognition method may be performed by software or hardware installed in the terminal device or the server device, and the software may be a blockchain platform. The server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like. The server may be an independent server, or may be a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms.
Fig. 1 is a schematic flow chart of a speech intention recognition method according to an embodiment of the present invention. In this embodiment, the voice intention recognition method includes:
and S1, receiving the user inquiry sentence subjected to the voice recognition.
The user query statement in the embodiment of the invention refers to a text statement obtained, after voice recognition, from the dialogue voice sent by a user to the AI in an intelligent AI outbound service scene. For example, the intelligent AI asks after the call is connected: "May I ask what business you need to handle?" and the user answers: "Handle a vehicle loan." The recognized text "Handle a vehicle loan." is then the user query statement in the embodiment of the present invention.
Further, before receiving the user query sentence subjected to the speech recognition, the method further includes:
receiving inquiry voice sent by a user;
extracting sound features of the voice;
and decoding the sound characteristics and converting the sound characteristics into texts to obtain the user inquiry sentences.
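As a minimal illustration of the two front-end steps above (extracting sound features, then decoding them into text), the sketch below frames a waveform and computes one short-time energy value per frame. The patent does not specify the feature type or decoder, so the function names, the energy feature, and the decoder stub are illustrative assumptions only; real systems use MFCC/filterbank features and a trained acoustic model.

```python
def frame_signal(samples, frame_len=400, hop=160):
    """Split a 1-D sample list into overlapping frames."""
    return [samples[i:i + frame_len]
            for i in range(0, max(len(samples) - frame_len + 1, 1), hop)]

def short_time_energy(frame):
    """One scalar feature per frame (placeholder for real features)."""
    return sum(s * s for s in frame) / max(len(frame), 1)

def extract_features(samples):
    return [short_time_energy(f) for f in frame_signal(samples)]

def decode_to_text(features, recognizer=None):
    """Decoder stub: a real recognizer would map features to text."""
    if recognizer is not None:
        return recognizer(features)
    return ""  # no model attached in this sketch

feats = extract_features([0.1, -0.2, 0.05] * 1000)
```

With 16 kHz audio, the default frame length (400 samples) and hop (160 samples) correspond to the common 25 ms window / 10 ms shift choice.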
S2, acquiring a preset node type label, extracting a keyword from the user inquiry statement, calculating a matching value of the keyword and the node type label, and determining the node type of the user inquiry statement according to the matching value.
The node type refers to the service scene type of an actual service, and comprises a basic type, a refinement type and a customization type. The basic type refers to a common node that does not distinguish specific service lines, such as a client intention determination node or an objection processing node, and covers the most basic services of an actual service scenario. The refinement type refers to a service node that is more refined than the basic service type, such as a vehicle loan consultation node. The customized type refers to a specific service node generated according to specific requirements, such as a small-vehicle loan transaction node.
In detail, in the embodiment of the present invention, a matching value between the keyword and the node type tag is calculated, and a service type corresponding to the node type tag with the highest matching value is determined as the node type of the user query statement.
Further, the extracting keywords from the user query sentence includes:
performing word segmentation processing on the user query sentence to obtain a word set;
taking the words in the word set as vertexes and the grammatical relations among the words as edges, constructing a directed graph;
calculating the weight of each vertex in the directed graph according to a weight formula;
and selecting the vertexes whose weight values are larger than a preset threshold value to obtain the keywords.
Wherein the weight formula is as follows:
WS(V_i) = (1 - d) + d × Σ_{V_j ∈ In(V_i)} ( w_ji / Σ_{V_k ∈ Out(V_j)} w_jk ) × WS(V_j)

wherein WS(V_i) represents the weight of vertex V_i; d is a damping coefficient representing the probability of jumping from a given vertex to any other vertex in the graph, generally taken as 0.85; In(V_i) is the set of vertices pointing to V_i; Out(V_j) is the set of vertices pointed to by V_j; and w_ji is the connection weight between the two vertices V_j and V_i, typically 1.
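The weight formula above is the TextRank iteration; a compact sketch might look like the following. The graph construction is simplified to co-occurrence edges within a small window (the patent builds edges from grammatical relations, which is omitted here), and repeated edges simply contribute weight more than once.

```python
def textrank(edges, d=0.85, iters=30):
    """edges: list of (src, dst) directed pairs, each with weight 1.
    Iterates WS(V_i) = (1-d) + d * sum_{j in In(i)} WS(V_j)/|Out(V_j)|."""
    nodes = {n for e in edges for n in e}
    out_deg = {n: 0 for n in nodes}
    incoming = {n: [] for n in nodes}
    for src, dst in edges:
        out_deg[src] += 1
        incoming[dst].append(src)
    ws = {n: 1.0 for n in nodes}
    for _ in range(iters):
        ws = {n: (1 - d) + d * sum(ws[j] / out_deg[j] for j in incoming[n])
              for n in nodes}
    return ws

def extract_keywords(words, threshold=1.0, window=2):
    """Build co-occurrence edges in both directions, then rank."""
    edges = []
    for i, w in enumerate(words):
        for v in words[i + 1:i + window]:
            edges += [(w, v), (v, w)]
    scores = textrank(edges)
    return sorted((w for w, s in scores.items() if s > threshold),
                  key=lambda w: -scores[w])
```

The threshold plays the role of the "preset threshold value" in the text: only vertices whose converged weight exceeds it are kept as keywords.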
Further, the calculating a matching value of the keyword and the node type tag includes:
converting the keywords and the node type labels into vectors to obtain word vectors and type vectors;
respectively extracting the characteristics of the word vector and the type vector to obtain word characteristics and type characteristics;
and calculating the similarity of the word features and the type features to obtain a matching value of the keyword and the node type label.
Wherein the calculating the similarity of the word feature and the type feature is calculated by using the following formula:
L(X, Y) = ( Σ_i X_i Y_i ) / ( √(Σ_i X_i²) × √(Σ_i Y_i²) )

wherein L(X, Y) is the similarity value, X is the word feature, and Y is the type feature, with X_i and Y_i denoting their vector components.
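Assuming a cosine-style similarity between the word feature and the type feature (the original formula image is not reproduced in this text, so this is an assumption), the matching step can be sketched with a toy character-count vectorizer standing in for the real feature extractor; every name here is illustrative.

```python
import math

def char_vector(text, dims=26):
    """Toy feature extractor: lowercase-letter counts."""
    v = [0.0] * dims
    for ch in text.lower():
        if "a" <= ch <= "z":
            v[ord(ch) - ord("a")] += 1.0
    return v

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0

def match_node_type(keyword, labels):
    """Score every node type label and return (best_label, matching_value)."""
    scored = {lbl: cosine(char_vector(keyword), char_vector(lbl))
              for lbl in labels}
    best = max(scored, key=scored.get)
    return best, scored[best]
```

Taking the arg-max over the labels mirrors the text's rule of choosing the node type whose tag has the highest matching value.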
The embodiment of the invention judges which node type the voice intention of the current user belongs to by carrying out node identification on the inquiry sentence of the user, thereby facilitating the subsequent corresponding processing of the model.
And S3, calling a pre-constructed recognition model framework comprising a basic dialogue model, a refined dialogue model and a customized dialogue model, and selecting a corresponding dialogue model from the basic dialogue model, the refined dialogue model and the customized dialogue model to generate a reply sentence of the user inquiry sentence.
The pre-constructed recognition model framework in the embodiment of the invention is a system framework fusing a plurality of recognition dialogue models, and the recognition model framework comprises a basic dialogue model, a refined dialogue model and a customized dialogue model.
Optionally, to further ensure the security of the recognition model framework, the recognition model framework may be stored in a node of a block chain.
In detail, referring to fig. 2, the S3 includes:
s31, if the node type is a basic type, calling a basic dialogue model in a pre-constructed recognition model frame to generate a reply sentence of the user inquiry sentence;
s32, if the node type is a refinement type, calling a refinement dialogue model in a pre-constructed recognition model frame to generate a reply sentence of the user inquiry sentence;
and S33, if the node type is the customized type, calling a customized dialogue model in the pre-constructed recognition model framework to generate a reply statement of the user inquiry statement.
The basic dialogue model, the refined dialogue model and the customized dialogue model are all NLU-based language models, such as a seq2seq model, a BERT model or a translation model; these language models are trained with different corpora to obtain the dialogue models for the different service scenes. Each model includes an input layer, a hidden layer and an output layer.
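The selection among the three dialogue models in S31-S33 is a per-node dispatch; a sketch with placeholder reply functions standing in for the trained models (the key names mirror the node types in the text, and the fallback behavior is an assumption):

```python
def base_model(query):    return "[base reply to] " + query
def refined_model(query): return "[refined reply to] " + query
def custom_model(query):  return "[customized reply to] " + query

# Node type -> dialogue model, as in the recognition model framework.
DIALOGUE_MODELS = {
    "basic": base_model,
    "refinement": refined_model,
    "customized": custom_model,
}

def generate_reply(node_type, query):
    # Fall back to the basic model for unrecognized node types.
    model = DIALOGUE_MODELS.get(node_type, base_model)
    return model(query)
```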
Further, the invoking a basic dialogue model in a pre-constructed recognition model framework to generate a reply sentence of the user query sentence includes:
converting the user inquiry statement into a vector by using an input layer of the basic dialogue model to obtain a text vector;
performing semantic recognition on the text vector by utilizing a hidden layer of the basic dialogue model to obtain a semantic vector;
and decoding the semantic vector by utilizing the output layer of the basic dialogue model to obtain a reply sentence.
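The input-layer, hidden-layer, output-layer flow just described can be mimicked with a tiny retrieval toy: simple tokenization stands in for the input layer, keyword lookup for the hidden layer's semantic recognition, and a canned-reply table for the output layer's decoding. A real basic dialogue model would be a trained network such as seq2seq; the table and keys below are invented for illustration.

```python
REPLIES = {  # hypothetical semantic-key -> reply table
    "loan": "Sure, let me pull up your loan options.",
    "greeting": "Hello! What can I help you with?",
}

def input_layer(query):
    """Stand-in for vectorization: lowercase tokens."""
    return query.lower().split()

def hidden_layer(tokens):
    """Stand-in for semantic recognition: crude keyword matching."""
    for key in REPLIES:
        if key in tokens:
            return key
    return "greeting"

def output_layer(semantic_key):
    """Stand-in for decoding: look up a canned reply."""
    return REPLIES[semantic_key]

def basic_dialogue(query):
    return output_layer(hidden_layer(input_layer(query)))
```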
In an optional embodiment of the invention, the basic dialogue model can be constructed as a single-node classification model based on TextCNN, preprocessed and pre-trained with specific corpora, and superimposed with a rule-based model on top of the deep learning model, which enhances recognition stability and accuracy. Because the deep learning classification model serves as the recognition main body, prediction does not depend on a large accumulation of data; the overall recognition accuracy is greatly improved with less corpus data, and the node recognition process is greatly simplified compared with the prior art.
Optionally, before invoking the refined dialog model in the pre-constructed recognition model framework to generate the reply statement of the user query statement, the method further includes:
acquiring user reply sentences related to preset service classes from a database to obtain a quick-build training corpus;
and training a preset language model by using the quick-build training corpus to obtain the refined dialogue model.
Optionally, before invoking the customized dialog model in the pre-built recognition model framework to generate the reply statement of the user query statement, the method further includes:
collecting corpus data related to preset specific services;
expanding the corpus data based on a corpus enhancement method to obtain a corpus training set;
and training a preset language model by utilizing the corpus training set to obtain the customized dialogue model.
The corpus data is expanded based on a corpus enhancement method to obtain a corpus training set, including: performing word segmentation processing on the corpus data to obtain a word set; and splitting the semantic role type of each statement in the corpus data, and performing vocabulary replacement, vocabulary increase and decrease and statement structure conversion on the words in the word set according to the semantic roles to obtain a corpus training set.
For a specific service with little corpus data, the embodiment of the invention can provide a certain amount of corresponding corpora for the user, and the user can then enhance them with self-built corpora through this module: after word segmentation processing, the semantic role types of the sentences are split, and a small amount of business corpora is expanded into a large corpus for feature learning by means of vocabulary replacement, vocabulary increase and decrease, sentence structure conversion and the like according to the semantic roles. This greatly reduces the time needed for service-corpus accumulation and arrangement and for cold-start construction.
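The expansion step (vocabulary replacement, vocabulary increase and decrease) might be sketched as below. The synonym table, probabilities, and rules are all illustrative assumptions, not the patent's actual enhancement method, and the semantic-role splitting is omitted.

```python
import random

SYNONYMS = {"handle": ["transact", "process"], "loan": ["credit"]}

def full_replace(words):
    """Deterministic variant: first synonym for every known word."""
    return [SYNONYMS[w][0] if w in SYNONYMS else w for w in words]

def random_variant(words, rng):
    """Random synonym swaps, then maybe drop one word."""
    out = [rng.choice(SYNONYMS[w]) if w in SYNONYMS and rng.random() < 0.5
           else w for w in words]
    if len(out) > 2 and rng.random() < 0.5:
        i = rng.randrange(len(out))
        out = out[:i] + out[i + 1:]  # vocabulary decrease
    return out

def augment(sentences, n_variants=4, seed=7):
    """Expand a small corpus into a larger, deduplicated training set."""
    rng = random.Random(seed)
    out = set(sentences)
    for s in sentences:
        words = s.split()
        out.add(" ".join(full_replace(words)))
        for _ in range(n_variants):
            out.add(" ".join(random_variant(words, rng)))
    return sorted(out)
```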
And S4, converting the reply sentence into voice based on phoneme conversion and transmitting the voice to the user.
In detail, the converting the reply sentence into the voice based on the phoneme conversion and transmitting the voice to the user includes:
performing text analysis on the reply sentence to generate a phoneme text;
and converting the phoneme text into voice by using a preset voice library, and transmitting the voice to a user.
Further, the generating of the phoneme text by performing text analysis on the reply sentence includes:
performing word segmentation on the reply sentence to obtain a word sequence;
converting the word sequence into a phoneme sequence;
and inquiring whether the word sequence contains special words or not, determining phonemes of the special words according to a preset rule, and updating the phoneme sequence according to the phonemes to obtain a phoneme text.
Wherein the special words include, but are not limited to, polyphones, numbers, and abbreviations.
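A sketch of the phoneme-text step, using a toy pronunciation lexicon and simple special-word rules for digits and abbreviations; the lexicon and phoneme symbols are invented for illustration, and a real system would use a full voice library and polyphone disambiguation.

```python
import re

LEXICON = {"hello": ["HH", "AH", "L", "OW"], "ok": ["OW", "K", "EY"]}
DIGIT_PHONES = {"1": ["W", "AH", "N"], "2": ["T", "UW"]}

def word_to_phonemes(word):
    low = word.lower()
    if low in LEXICON:                       # ordinary dictionary word
        return LEXICON[low]
    if word.isdigit():                       # special rule: read digits out
        return [p for d in word for p in DIGIT_PHONES.get(d, ["?"])]
    if word.isalpha() and word.isupper() and len(word) > 1:
        return list(word)                    # abbreviation: spell it out
    return ["?"]                             # unknown word placeholder

def text_to_phoneme_seq(sentence):
    """Segment into words/numbers, then concatenate per-word phonemes."""
    tokens = re.findall(r"[A-Za-z]+|\d+", sentence)
    seq = []
    for t in tokens:
        seq += word_to_phonemes(t)
    return seq
```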
The embodiment of the invention determines the text structure of the sentence and the phoneme composition of each word by performing linguistic analysis, such as lexical and semantic analysis, on the text of the reply sentence.
The phoneme text is converted into voice by using a preset voice library, that is, synthesized speech in the voice library is called to convert the linguistic description into a speech waveform, which is output as voice.
According to the embodiment of the invention, the reply text generated by the model in the recognition model framework is converted into voice and delivered to the user by the AI robot, after which the user's response to that reply is received.
In another embodiment of the present invention, after converting the reply sentence into speech based on phoneme conversion and transmitting the speech to the user, the method may further include: receiving a reply sentence of the user; and returning to the step of identifying the node type of the user reply statement based on a preset rule until the user reply statement is a preset end statement, and ending the current call.
After the intelligent AI replies to the user, if no user reply sentence is received within a preset time, the end of the conversation can be confirmed; if a reply sentence is received within the preset time, receipt of the reply is confirmed. Since the user may continue to transact other services, the received reply may not be an ending sentence but a sentence whose intention needs to be identified again. Re-identifying the node type of the user reply sentence allows the corresponding service processing to continue to be provided for the user, improving efficiency.
After receiving the user's reply to the reply sentence, it is judged whether the node of the current reply belongs to the turn of the current node, that is, the node type of the reply sentence is identified again, and different dialogue models are called according to the node type to generate a reply. This continues until the user's reply sentence is recognized to carry an end intention, in which case a preset end sentence is selected as the reply and the current conversation is ended, or until the user ends the conversation, completing the current outbound call.
Wherein an ending sentence is a sentence carrying the intention of ending the current call; for example, when the AI robot asks "Is there anything else you need?", the user replies with a statement such as "No" or "That's all" that expresses the intention to end the conversation.
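The turn-taking loop described above (reply, re-identify the node type of each new user sentence, stop on an end intent or a timeout) can be sketched as follows; the end-word rule and the callback signatures are illustrative assumptions, not the patent's actual interfaces.

```python
END_WORDS = {"no", "nothing", "bye", "goodbye"}

def has_end_intent(sentence):
    """Toy end-intent rule standing in for real intent recognition."""
    return bool(sentence) and sentence.strip().lower() in END_WORDS

def run_call(get_user_sentence, identify_node_type, generate_reply,
             max_turns=10):
    """get_user_sentence() returns the next sentence or None (timeout)."""
    transcript = []
    for _ in range(max_turns):
        sentence = get_user_sentence()
        if sentence is None or has_end_intent(sentence):
            transcript.append("Thank you, goodbye.")  # preset end sentence
            break
        node_type = identify_node_type(sentence)      # re-identify per turn
        transcript.append(generate_reply(node_type, sentence))
    return transcript
```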
The embodiment of the invention can identify and reply the more refined service scene correspondingly.
In the embodiments of the present invention, a user query sentence subjected to voice recognition is received; a preset node type tag is acquired, a keyword is extracted from the user query sentence, a matching value of the keyword and the node type tag is calculated, and the node type of the user query sentence is determined according to the matching value; a pre-constructed recognition model framework comprising a basic dialogue model, a refined dialogue model and a customized dialogue model is called, and the corresponding dialogue model is selected to generate a reply sentence for the user query sentence. Because different models are used to recognize intentions and generate replies for different service scenes, reply accuracy is improved. Meanwhile, node recognition is performed before each reply to ensure that the recognition model currently used is the optimum model for the node, which effectively improves the recognition rate of the user's conversation intention, raises user satisfaction, and flexibly meets a variety of user requirements. Therefore, the voice intention recognition method, apparatus, electronic device and computer-readable storage medium of the present invention can improve the recognition accuracy of the user's voice intention.
Fig. 3 is a functional block diagram of a speech intention recognition apparatus according to an embodiment of the present invention.
The voice intention recognition apparatus 100 according to the present invention may be installed in an electronic device. According to the implemented functions, the voice intention recognition apparatus 100 may include a sentence receiving module 101, a node recognition module 102, a reply generation module 103, and a voice reply module 104. The module of the present invention, which may also be referred to as a unit, refers to a series of computer program segments that can be executed by a processor of an electronic device and that can perform a fixed function, and that are stored in a memory of the electronic device.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the sentence receiving module 101 is configured to receive a user query sentence subjected to voice recognition.
The user query sentence in the embodiment of the invention refers to a text sentence obtained, through voice recognition, from the dialogue voice a user addresses to the AI in an intelligent AI outbound-call service scenario. For example, after the call connects, the intelligent AI asks: "What business do you need to handle?" and the user answers: "Vehicle loan transaction." The answer "Vehicle loan transaction" may then be the user query sentence in the embodiment of the present invention.
Further, before receiving the user query sentence subjected to the speech recognition, the method further includes:
receiving inquiry voice sent by a user;
extracting sound features of the voice;
and decoding the sound characteristics and converting the sound characteristics into texts to obtain the user inquiry sentences.
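The three steps above can be sketched as a minimal Python pipeline. Everything here is illustrative, not the patent's implementation: `extract_features` computes per-frame log energy as a simple stand-in for real acoustic features (e.g., MFCCs), and the decoder is a stub in place of a trained acoustic model.

```python
import numpy as np

def extract_features(waveform, frame_size=400, hop=160):
    """Split the waveform into overlapping frames and compute per-frame
    log energy (a simple stand-in for real acoustic features)."""
    frames = [waveform[i:i + frame_size]
              for i in range(0, len(waveform) - frame_size + 1, hop)]
    return np.array([np.log(np.sum(f ** 2) + 1e-10) for f in frames])

def decode_to_text(features, decoder):
    """Decode acoustic features into text; `decoder` is whatever
    acoustic-model callable the ASR back end supplies (here a stub)."""
    return decoder(features)

# One second of fake audio at 16 kHz, and a stub decoder.
waveform = np.random.default_rng(0).standard_normal(16000)
features = extract_features(waveform)
query_sentence = decode_to_text(features, lambda f: "vehicle loan transaction")
```

With 400-sample frames and a 160-sample hop, one second of 16 kHz audio yields 98 feature frames; the decoded text is then the user query sentence passed to the modules below.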
The node identification module 102 is configured to obtain a preset node type tag, extract a keyword from the user query statement, calculate a matching value between the keyword and the node type tag, and determine a node type of the user query statement according to the matching value.
The node type refers to the service scenario type of the actual service and comprises a basic type, a refinement type and a customization type. The basic type refers to common nodes that do not distinguish between specific service lines, such as a client intention determination node or an objection handling node, and covers the most basic services of the actual service scenario. The refinement type refers to service nodes that are more refined than the basic type, such as a vehicle loan consultation node. The customization type refers to specific service nodes generated according to specific requirements, such as a small-vehicle loan transaction node.
Further, when extracting the keyword from the user query sentence, the node identification module 102 specifically performs the following operations:
performing word segmentation processing on the user query sentence to obtain a word set;
taking the words in the word set as vertexes, and constructing a directed graph whose edges follow the grammatical relations among the words;
calculating the weight of each vertex in the directed graph according to a weight formula;
and selecting the vertexes whose weights are larger than a preset threshold value to obtain the keywords.
Wherein the weight formula is as follows:

$$WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} \, WS(V_j)$$

wherein $WS(V_i)$ represents the weight of vertex $V_i$; $d$ is a damping coefficient representing the probability of jumping from a given vertex to any other vertex in the graph, generally taking the value 0.85; $In(V_i)$ is the set of vertices pointing to $V_i$; $Out(V_j)$ is the set of vertices that $V_j$ points to; and $w_{ji}$ is the connection weight between the two vertices $V_i$ and $V_j$, typically 1.
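The weight definitions above (damping coefficient d, In/Out vertex sets, unit connection weights) match the standard TextRank formulation, so the iteration can be sketched in pure Python; the toy word graph and the threshold value below are illustrative, not from the patent:

```python
def textrank(graph, d=0.85, iters=50):
    """graph maps each vertex to the list of vertices it points to
    (all connection weights w_ji taken as 1). Returns WS(V_i) per vertex."""
    vertices = list(graph)
    ws = {v: 1.0 for v in vertices}
    out_deg = {v: len(graph[v]) for v in vertices}
    incoming = {v: [u for u in vertices if v in graph[u]] for v in vertices}
    for _ in range(iters):  # simultaneous (Jacobi-style) update until near convergence
        ws = {v: (1 - d) + d * sum(ws[u] / out_deg[u] for u in incoming[v])
              for v in vertices}
    return ws

# Toy directed word graph built from grammatical relations between words.
graph = {"loan": ["vehicle", "transact"], "vehicle": ["loan"], "transact": ["loan"]}
weights = textrank(graph)
keywords = [v for v, w in weights.items() if w > 1.0]  # illustrative preset threshold
```

Vertices with many incoming edges ("loan" above) accumulate weight and survive the threshold, becoming the extracted keywords.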
Further, when calculating the matching value between the keyword and the node type tag, the node identification module 102 specifically performs the following operations:
converting the keywords and the node type labels into vectors to obtain word vectors and type vectors;
respectively extracting the characteristics of the word vector and the type vector to obtain word characteristics and type characteristics;
and calculating the similarity of the word features and the type features to obtain a matching value of the keyword and the node type label.
Wherein the calculating of the similarity of the word feature and the type feature uses the following formula:

$$L(X, Y) = \frac{\sum_i X_i Y_i}{\sqrt{\sum_i X_i^2}\,\sqrt{\sum_i Y_i^2}}$$

wherein $L(X, Y)$ is the similarity value, $X_i$ is the $i$-th component of the word feature $X$, and $Y_i$ is the $i$-th component of the type feature $Y$. The embodiment of the invention determines which node type the current user's voice intention belongs to by performing node recognition on the user's query sentence, which facilitates the subsequent processing by the corresponding model.
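As one concrete (hypothetical) realization of the matching step, the word feature can be compared against each type feature with a cosine-style similarity; the three-dimensional feature vectors below are stand-ins for the outputs of a real feature extractor:

```python
import numpy as np

def cosine(x, y):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def match_node_type(word_feature, type_features):
    """type_features maps each node type label to its feature vector.
    Returns the best-matching label and all matching values."""
    scores = {label: cosine(word_feature, vec)
              for label, vec in type_features.items()}
    return max(scores, key=scores.get), scores

# Hypothetical embeddings: the keyword feature and the three type features.
word_feature = np.array([0.9, 0.1, 0.0])
type_features = {
    "basic":         np.array([0.2, 0.1, 0.9]),
    "refinement":    np.array([0.95, 0.05, 0.0]),
    "customization": np.array([0.1, 0.9, 0.1]),
}
best, scores = match_node_type(word_feature, type_features)
```

The label with the highest matching value is taken as the node type of the user query sentence.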
The reply generation module 103 is configured to invoke a pre-constructed recognition model framework including a basic dialogue model, a refined dialogue model, and a customized dialogue model, and select a corresponding dialogue model from the basic dialogue model, the refined dialogue model, and the customized dialogue model to generate a reply statement of the user query statement.
The pre-constructed recognition model framework in the embodiment of the invention is a system framework fusing a plurality of recognition dialogue models, and the recognition model framework comprises a basic dialogue model, a refined dialogue model and a customized dialogue model.
In detail, the reply generation module 103 is specifically configured to:
if the node type is a basic type, calling a basic dialogue model in a pre-constructed recognition model frame to generate a reply statement of the user inquiry statement;
if the node type is a refinement type, calling a refinement dialogue model in a pre-constructed recognition model frame to generate a reply sentence of the user inquiry sentence;
and if the node type is the customized type, calling a customized dialogue model in a pre-constructed recognition model frame to generate a reply sentence of the user inquiry sentence.
The basic dialogue model, the refined dialogue model and the customized dialogue model are all NLU-based language models, such as a seq2seq model, a BERT model or a translation model; these language models are trained with different corpora to obtain dialogue models for different service scenarios. Each model comprises an input layer, a hidden layer and an output layer.
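The selection among the three dialogue models reduces to a dispatch on the node type; in this minimal sketch the lambda "models" are stubs standing in for the trained NLU dialogue models:

```python
class RecognitionModelFramework:
    """Routes a user query sentence to the dialogue model matching its node type."""

    def __init__(self, basic, refined, customized):
        # Keys follow the node types named in the text.
        self._models = {"basic": basic,
                        "refinement": refined,
                        "customization": customized}

    def generate_reply(self, node_type, query):
        model = self._models.get(node_type)
        if model is None:
            raise ValueError(f"unknown node type: {node_type}")
        return model(query)

# Stub dialogue models stand in for the trained seq2seq/BERT-style models.
framework = RecognitionModelFramework(
    basic=lambda q: "basic reply to: " + q,
    refined=lambda q: "refined reply to: " + q,
    customized=lambda q: "customized reply to: " + q,
)
reply = framework.generate_reply("refinement", "vehicle loan consultation")
```

The dispatch-table shape keeps the framework open: adding a new service scenario only means registering another trained model under a new node type.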
Further, the invoking a basic dialogue model in a pre-constructed recognition model framework to generate a reply sentence of the user query sentence includes:
converting the user inquiry statement into a vector by using an input layer of the basic dialogue model to obtain a text vector;
performing semantic recognition on the text vector by utilizing a hidden layer of the basic dialogue model to obtain a semantic vector;
and decoding the semantic vector by utilizing the output layer of the basic dialogue model to obtain the reply sentence.
Optionally, before invoking the refined dialog model in the pre-constructed recognition model framework to generate the reply statement of the user query statement, the method further includes:
acquiring user reply sentences related to preset service classes from a database to obtain a refinement training corpus;
and training a language model with the refinement training corpus to obtain the refined dialogue model.
Optionally, before invoking the customized dialog model in the pre-built recognition model framework to generate the reply statement of the user query statement, the method further includes:
collecting corpus data related to preset specific services;
expanding the corpus data based on a corpus enhancement method to obtain a corpus training set;
and training a language model by utilizing the corpus training set to obtain the customized dialogue model.
The corpus data is expanded based on a corpus enhancement method to obtain a corpus training set, including: performing word segmentation processing on the corpus data to obtain a word set; and splitting the semantic role type of each statement in the corpus data, and performing vocabulary replacement, vocabulary increase and decrease and statement structure conversion on the words in the word set according to the semantic roles to obtain a corpus training set.
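The vocabulary-replacement part of this corpus enhancement can be sketched as below; the synonym table is illustrative, and a full implementation would also perform the semantic-role-aware word addition/removal and sentence-structure conversion described above:

```python
import random

def augment(sentence, synonyms, n_variants=3, seed=0):
    """Expand one corpus sentence by vocabulary replacement.
    `synonyms` maps a word to interchangeable alternatives; it is a
    stand-in for the semantic-role-aware replacement in the text."""
    rng = random.Random(seed)  # seeded for reproducible expansion
    words = sentence.split()
    variants = {sentence}  # always keep the original sentence
    for _ in range(n_variants * 3):
        new = [rng.choice(synonyms.get(w, [w])) for w in words]
        variants.add(" ".join(new))
        if len(variants) > n_variants:
            break
    return sorted(variants)

# Illustrative synonym table for a specific-service corpus sentence.
expanded = augment("transact vehicle loan", {"transact": ["handle", "process"]})
```

The expanded set, original sentence included, forms the corpus training set used to train the customized dialogue model.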
The voice replying module 104 is configured to convert the reply sentence into voice based on the phoneme conversion and transmit the voice to the user.
In detail, the converting the reply sentence into the voice based on the phoneme conversion and transmitting the voice to the user includes:
performing text analysis on the reply sentence to generate a phoneme text;
and converting the phoneme text into voice by using a preset voice library, and transmitting the voice to a user.
Further, the generating of the phoneme text by performing text analysis on the reply sentence includes:
performing word segmentation on the reply sentence to obtain a word sequence;
converting the word sequence into a phoneme sequence;
and inquiring whether the word sequence contains special words or not, determining phonemes of the special words according to a preset rule, and updating the phoneme sequence according to the phonemes to obtain a phoneme text.
Wherein the special words include, but are not limited to, polyphones, numbers, and abbreviations.
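The phoneme-text construction, including the special-word rules for polyphones, numbers and abbreviations, can be sketched as below; the pronunciation lexicon (ARPAbet-style) and the rules table are both hypothetical:

```python
def text_to_phonemes(sentence, lexicon, special_rules):
    """Convert a word sequence into a phoneme sequence. Special words
    (polyphones, numbers, abbreviations) are resolved via `special_rules`
    before falling back to the pronunciation lexicon."""
    phonemes = []
    for word in sentence.split():
        if word in special_rules:          # preset rule wins for special words
            phonemes.extend(special_rules[word])
        else:                              # ordinary lexicon lookup
            phonemes.extend(lexicon.get(word, ["<unk>"]))
    return phonemes

# Illustrative pronunciation lexicon and special-word rules.
lexicon = {"loan": ["l", "ow", "n"],
           "vehicle": ["v", "iy", "hh", "ih", "k", "ah", "l"]}
special_rules = {"2": ["t", "uw"]}  # the digit 2 read as "two"
phoneme_text = text_to_phonemes("vehicle loan 2", lexicon, special_rules)
```

The resulting phoneme text is then fed to the voice library, which maps it to synthesized waveforms.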
The embodiment of the invention determines the text structure of the sentence and the phoneme composition of each word by performing linguistic analysis, such as lexical and semantic analysis, on the text of the reply sentence.
Converting the phoneme text into voice by using the preset voice library means calling the synthesized sounds in the voice library to convert the linguistic description into a voice waveform.
According to the embodiment of the invention, the reply text generated by the models in the recognition model framework is converted into voice and delivered to the user by the AI robot, after which the user's next reply is received.
Fig. 4 is a schematic structural diagram of an electronic device implementing a speech intention recognition method according to an embodiment of the present invention.
The electronic device 1 may comprise a processor 10, a memory 11, a communication bus 12 and a communication interface 13, and may further comprise a computer program, such as a speech intent recognition program, stored in the memory 11 and executable on the processor 10.
In some embodiments, the processor 10 may be composed of an integrated circuit, for example a single packaged integrated circuit, or of a plurality of integrated circuits packaged with the same or different functions, and may include one or more Central Processing Units (CPUs), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 10 is the control unit of the electronic device: it connects the various components of the whole electronic device through various interfaces and lines, and executes the various functions of the electronic device and processes its data by running or executing the programs or modules stored in the memory 11 (e.g., executing the voice intention recognition program) and calling the data stored in the memory 11.
The memory 11 includes at least one type of readable storage medium including flash memory, removable hard disks, multimedia cards, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disks, optical disks, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device, for example a removable hard disk of the electronic device. The memory 11 may also be an external storage device of the electronic device in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device. The memory 11 may be used not only to store application software installed in the electronic device and various types of data, such as codes of a voice intention recognition program, etc., but also to temporarily store data that has been output or is to be output.
The communication bus 12 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.
The communication interface 13 is used for communication between the electronic device and other devices, and includes a network interface and a user interface. Optionally, the network interface may include a wired interface and/or a wireless interface (e.g., WI-FI interface, bluetooth interface, etc.), which are typically used to establish a communication connection between the electronic device and other electronic devices. The user interface may be a Display (Display), an input unit such as a Keyboard (Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable, among other things, for displaying information processed in the electronic device and for displaying a visualized user interface.
Fig. 4 only shows an electronic device with components, and it will be understood by those skilled in the art that the structure shown in fig. 4 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
For example, although not shown, the electronic device may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so that functions of charge management, discharge management, power consumption management and the like are realized through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The speech intention recognition program stored in the memory 11 of the electronic device 1 is a combination of instructions that, when executed in the processor 10, may implement:
receiving a user inquiry statement subjected to voice recognition;
acquiring a preset node type label, extracting a keyword from the user inquiry statement, calculating a matching value of the keyword and the node type label, and determining the node type of the user inquiry statement according to the matching value, wherein the node type comprises a basic type, a refining type and a customization type;
calling a pre-constructed recognition model framework comprising a basic dialogue model, a refined dialogue model and a customized dialogue model, and selecting a corresponding dialogue model from the basic dialogue model, the refined dialogue model and the customized dialogue model to generate a reply sentence of the user inquiry sentence;
and converting the reply sentence into voice based on phoneme conversion and transmitting the voice to the user.
Specifically, the specific implementation method of the instruction by the processor 10 may refer to the description of the relevant steps in the embodiment corresponding to the drawings, which is not described herein again.
Further, the integrated modules/units of the electronic device 1, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. The computer readable storage medium may be volatile or non-volatile. For example, the computer-readable medium may include: any entity or device capable of carrying said computer program code, recording medium, U-disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM).
The present invention also provides a computer-readable storage medium, storing a computer program which, when executed by a processor of an electronic device, may implement:
receiving a user inquiry statement subjected to voice recognition;
acquiring a preset node type label, extracting a keyword from the user inquiry statement, calculating a matching value of the keyword and the node type label, and determining the node type of the user inquiry statement according to the matching value, wherein the node type comprises a basic type, a refining type and a customization type;
calling a pre-constructed recognition model framework comprising a basic dialogue model, a refined dialogue model and a customized dialogue model, and selecting a corresponding dialogue model from the basic dialogue model, the refined dialogue model and the customized dialogue model to generate a reply sentence of the user inquiry sentence;
and converting the reply sentence into voice based on phoneme conversion and transmitting the voice to the user.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The embodiment of the application can acquire and process related data based on an artificial intelligence technology. Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A method of speech intent recognition, the method comprising:
receiving a user inquiry statement subjected to voice recognition;
acquiring a preset node type label, extracting a keyword from the user inquiry statement, calculating a matching value of the keyword and the node type label, and determining the node type of the user inquiry statement according to the matching value, wherein the node type comprises a basic type, a refining type and a customization type;
Calling a pre-constructed recognition model framework comprising a basic dialogue model, a refined dialogue model and a customized dialogue model, and selecting a corresponding dialogue model from the basic dialogue model, the refined dialogue model and the customized dialogue model to generate a reply sentence of the user inquiry sentence;
and converting the reply sentence into voice based on phoneme conversion and transmitting the voice to the user.
2. The voice intention recognition method according to claim 1, wherein the extracting of the keyword from the user query sentence includes:
performing word segmentation processing on the user query sentence to obtain a word set;
taking the words in the word set as vertexes, and constructing a directed graph whose edges follow the grammatical relations among the words;
calculating the weight of each vertex in the directed graph according to a weight formula;
and selecting the vertexes whose weights are larger than a preset threshold value to obtain the keyword.
3. The speech intent recognition method of claim 2, wherein said computing a match value for the keyword to the node type label comprises:
converting the keywords and the node type labels into vectors to obtain word vectors and type vectors;
respectively extracting the characteristics of the word vector and the type vector to obtain word characteristics and type characteristics;
and calculating the similarity of the word features and the type features to obtain a matching value of the keyword and the node type label.
4. The speech intent recognition method of claim 1, wherein the invoking of a pre-built recognition model framework comprising a base dialogue model, a refined dialogue model, and a customized dialogue model, the selecting of a corresponding dialogue model from the base dialogue model, the refined dialogue model, and the customized dialogue model to generate a reply sentence to the user query sentence, comprises:
if the node type is a basic type, calling a basic dialogue model in a pre-constructed recognition model frame to generate a reply statement of the user inquiry statement;
if the node type is a refinement type, calling a refinement dialogue model in a pre-constructed recognition model frame to generate a reply sentence of the user inquiry sentence;
and if the node type is the customized type, calling a customized dialogue model in a pre-constructed recognition model frame to generate a reply sentence of the user inquiry sentence.
5. The speech intention recognition method of claim 1, wherein the invoking of the basic dialogue model in the pre-constructed recognition model framework to generate a reply sentence of the user inquiry sentence comprises:
converting the user inquiry statement into a vector by using an input layer of the basic dialogue model to obtain a text vector;
performing semantic recognition on the text vector by utilizing a hidden layer of the basic dialogue model to obtain a semantic vector;
and decoding the semantic vector by utilizing the output layer of the basic dialogue model to obtain a reply sentence.
6. The speech intention recognition method of any one of claims 1-5, wherein before invoking the customized dialogue model in the pre-constructed recognition model framework to generate a reply sentence of the user inquiry sentence, the method further comprises:
collecting corpus data related to preset services;
expanding the corpus data based on a corpus enhancement method to obtain a corpus training set;
and training a language model by utilizing the corpus training set to obtain the customized dialogue model.
7. The speech intention recognition method according to any one of claims 1 to 5, wherein the converting of the reply sentence into voice based on phoneme conversion and transmitting the voice to the user comprises:
performing text analysis on the reply sentence to generate a phoneme text;
and converting the phoneme text into voice by using a preset voice library, and transmitting the voice to a user.
8. A speech intent recognition apparatus, the apparatus comprising:
the sentence receiving module is used for receiving the user inquiry sentences subjected to the voice recognition;
the node identification module is used for acquiring a preset node type label, extracting a keyword from the user inquiry statement, calculating a matching value of the keyword and the node type label, and determining the node type of the user inquiry statement according to the matching value;
the reply generation module is used for calling a pre-constructed recognition model framework comprising a basic dialogue model, a refined dialogue model and a customized dialogue model, and selecting a corresponding dialogue model from the basic dialogue model, the refined dialogue model and the customized dialogue model to generate a reply sentence of the user inquiry sentence;
and the voice reply module is used for converting the reply sentence into voice based on phoneme conversion and transmitting the voice to the user.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and the number of the first and second groups,
a memory communicatively coupled to the at least one processor; wherein the content of the first and second substances,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the speech intent recognition method of any of claims 1-7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the speech intention recognition method according to any one of claims 1 to 7.
CN202111274209.3A 2021-10-29 2021-10-29 Speech intention recognition method, device, equipment and storage medium Pending CN114020886A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111274209.3A CN114020886A (en) 2021-10-29 2021-10-29 Speech intention recognition method, device, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN114020886A true CN114020886A (en) 2022-02-08

Family

ID=80059073

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111274209.3A Pending CN114020886A (en) 2021-10-29 2021-10-29 Speech intention recognition method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114020886A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115658891A (en) * 2022-10-18 2023-01-31 支付宝(杭州)信息技术有限公司 Intention identification method and device, storage medium and electronic equipment
CN115658891B (en) * 2022-10-18 2023-07-25 支付宝(杭州)信息技术有限公司 Method and device for identifying intention, storage medium and electronic equipment
CN117132392A (en) * 2023-10-23 2023-11-28 蓝色火焰科技成都有限公司 Vehicle loan fraud risk early warning method and system
CN117132392B (en) * 2023-10-23 2024-01-30 蓝色火焰科技成都有限公司 Vehicle loan fraud risk early warning method and system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination